Commit Graph

721 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
c9705de4a4 Reserve enough space for clock rates to hold min samples, if specified 2026-02-27 12:49:35 -06:00
Oleksandr Pavlyk
998ab125ce Don't override m_check_throttling if throttling threshold is non-positive
measure_cold class now directly inherits m_check_throttling from state.
This ensures that when `--jsonbin` is specified frequency data corresponding
to timing data are available to write out.
2026-02-20 16:34:53 -06:00
Oleksandr Pavlyk
731e0c2c30 Swapped data members m_sm_clock_rates and m_sm_clock_rate_accumulator
This places all std::vector members together. Added default initialization
to all std::vector members, and all other members with default constructors.

Exceptions are references and the `nvbench::launch m_launch;` member.
2026-02-19 15:33:57 -06:00
Oleksandr Pavlyk
4da9f431c0 Templatize write_out_values for different storage formats
This could be used to save data as float32_t, or float64_t.
This flexibility is useful for experimentation.
2026-02-19 15:32:00 -06:00
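Note on 4da9f431c0: the idea of templatizing a writer over the storage format can be sketched as below. This is only an illustration under assumptions, not the code from the commit: the name `write_out_values_sketch`, its signature, and the helper `serialized_size` are hypothetical, and `float`/`double` stand in for the `float32_t`/`float64_t` types mentioned in the message.

```cpp
#include <cstddef>
#include <ios>
#include <ostream>
#include <sstream>
#include <vector>

// Hypothetical sketch, not the nvbench implementation: convert each double
// sample to StorageT and append the raw bytes to a binary stream.
template <typename StorageT>
void write_out_values_sketch(std::ostream &out, const std::vector<double> &values)
{
  // Convert first so the whole buffer can be written with a single call.
  std::vector<StorageT> buffer;
  buffer.reserve(values.size());
  for (const double v : values)
  {
    buffer.push_back(static_cast<StorageT>(v));
  }
  out.write(reinterpret_cast<const char *>(buffer.data()),
            static_cast<std::streamsize>(buffer.size() * sizeof(StorageT)));
}

// Helper for the example: number of bytes produced for a given storage type.
template <typename StorageT>
std::size_t serialized_size(const std::vector<double> &values)
{
  std::ostringstream out(std::ios::binary);
  write_out_values_sketch<StorageT>(out, values);
  return out.str().size();
}
```

With this shape, the same call site can store times or frequencies in either precision, for example `write_out_values_sketch<float>(file, samples)` versus `write_out_values_sketch<double>(file, samples)`.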
Oleksandr Pavlyk
988420b5b1 Use write_out_values utility to save frequencies
The utility was already used to save times
2026-02-13 10:19:06 -06:00
Georgy Evtushenko
40b2f4ece2 Better place to stop freq timer? 2026-02-13 09:53:59 -06:00
Georgy Evtushenko
a487a38895 Dump frequencies 2026-02-13 08:49:41 -06:00
Nader Al Awar
dc59f98ecd Remove cupti from cuda-bench dependencies (#311) python-0.2.0 2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber
90ad8bcbc7 Merge pull request #296 from bernhardmgruber/compare_sub_results
Allow partial comparison in `nvbench_compare.py`
2026-02-03 20:02:34 +01:00
Bernhard Manfred Gruber
c6ef87575c Allow partial comparison in nvbench_compare.py
Fixes: #295
2026-02-03 16:32:11 +01:00
Nader Al Awar
d75fc74162 Merge branch 'main' into remove-cupti-python 2026-02-03 08:58:41 -06:00
Oleksandr Pavlyk
867d5d4276 Merge pull request #294 from oleksandr-pavlyk/add-docstrings 2026-02-03 08:51:55 -06:00
Oleksandr Pavlyk
8a128ed7d9 Merge pull request #309 from oleksandr-pavlyk/support-skipping-batched-runs 2026-02-02 17:57:45 -06:00
Nader Al Awar
4fa4296810 Remove cuda.pathfinder function 2026-02-02 16:43:45 -06:00
Nader Al Awar
f2d5730104 Disable CUPTI in cmake file 2026-02-02 16:03:15 -06:00
Nader Al Awar
6df5fc8c67 Remove cupti from cuda-bench dependencies 2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk
a33a454a2d Make skip_hot_measurement method const 2026-02-02 14:42:07 -06:00
Oleksandr Pavlyk
f049f10977 Fix typo 2026-02-02 14:41:42 -06:00
Oleksandr Pavlyk
cfb4a9b8b0 Fix for comment grammar 2026-02-02 12:58:15 -06:00
Oleksandr Pavlyk
27d6492355 Factor out check for whether to skip hot measurement to a nvbench::state private method 2026-02-02 12:43:39 -06:00
Oleksandr Pavlyk
cff6df9bb2 Renamed option to --no-batch to stay aligned with tag name 2026-02-02 12:28:39 -06:00
Oleksandr Pavlyk
8ff0557ad8 Replace use of py::handle to store global_registry
Use py::gil_safe_call_once_and_store facility pybind11 provides.
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
39c29026fd Move docstrings from PYI file to implementation
Added tests that docstrings exist and are not empty.

This closes #291
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
f1b9d44304 Support --no-batched CLI option
The option sets m_skip_batched boolean member in benchmark_base class.
Methods `bool get_skip_batched()` and `void set_skip_batched(bool)` added.

m_skip_batched is also added to state class. Similarly named methods
are added.

CLI help file documents `--no-batched` option.
2026-02-02 11:32:57 -06:00
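Note on f1b9d44304: the accessor pair named in the message could look roughly like the sketch below. The class name `benchmark_base_sketch`, the default value of the flag, and the `apply_cli_args` helper are assumptions made for illustration; only the method names `get_skip_batched` and `set_skip_batched` and the member name `m_skip_batched` come from the commit message.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in for the benchmark_base class named in the commit.
class benchmark_base_sketch
{
public:
  bool get_skip_batched() const { return m_skip_batched; }
  void set_skip_batched(bool value) { m_skip_batched = value; }

private:
  // Assumed default: batched runs are not skipped unless requested.
  bool m_skip_batched{false};
};

// Hypothetical option handling: set the flag when --no-batched is present.
void apply_cli_args(benchmark_base_sketch &bench, const std::vector<std::string> &args)
{
  for (const auto &arg : args)
  {
    if (arg == "--no-batched")
    {
      bench.set_skip_batched(true);
    }
  }
}
```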
Nader Al Awar
34a089f805 Add 89-real to list of architectures built for cuda-bench (#308) 2026-01-30 13:35:17 -06:00
Nader Al Awar
7b5887a4a6 Add 89-real to list of architectures built 2026-01-30 13:02:42 -06:00
Nader Al Awar
a5ad480dfe Add installation instructions to cuda-bench readme (#307)
Add installation instructions to `cuda-bench` readme
2026-01-30 10:02:56 -06:00
Nader Al Awar
edf0b80599 Add installation instructions 2026-01-30 09:32:44 -06:00
Nader Al Awar
a29748316d Fix pypi url to publish wheel (#306)
Fix pypi url to publish wheel
python-0.1.0
2026-01-29 16:03:48 -06:00
Nader Al Awar
bd775c8c14 Use inputs.component for consistency with cuda-cccl 2026-01-29 15:10:46 -06:00
Nader Al Awar
a8e8e176e9 Fix pypi url to publish wheel 2026-01-29 14:57:48 -06:00
Nader Al Awar
f66f76731c Replace all occurrences of pynvbench with cuda-bench (#305) 2026-01-29 14:13:44 -06:00
Nader Al Awar
fa1eed69c0 Rename test file to refer to cuda_bench 2026-01-29 13:53:29 -06:00
Nader Al Awar
c14a016e40 Replace a few more occurrences 2026-01-29 13:32:09 -06:00
Nader Al Awar
711c1e2eb1 Replace all occurrences of pynvbench with cuda-bench 2026-01-29 13:25:17 -06:00
Nader Al Awar
5e7adc5c3f Build multi architecture cuda wheels (#302)
* Add cuda architectures to build wheel for

* Package scripts in wheel

* Separate cuda major version extraction to fix architecture selection logic

* Add back statement printing cuda version

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-29 01:13:24 +00:00
Ashwin Srinath
a681e2185d Add multi-cuda wheel build (#289)
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f3fa93f388 Merge pull request #290 from oleksandr-pavlyk/debug/outstanding-changes
Make python nvbench benchmarks interruptible
2026-01-23 15:39:23 -06:00
Bernhard Manfred Gruber
2d4690e07d Merge pull request #298 from bernhardmgruber/ignore_device
Allow bypassing the device section check and comparing different devices
2025-12-10 18:24:26 +01:00
Bernhard Manfred Gruber
85548809d6 Allow bypassing the device section check and comparing different devices
Fixes: #297
2025-12-10 13:14:50 +01:00
Oleksandr Pavlyk
f6a9b245d3 Only trigger skipping of outstanding benchmarks on a KeyboardInterrupt exception; on other exceptions the benchmark is to continue execution 2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983 Replace main_arg_run_benchmarks with run_interriptible
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
8e6154511e Introduce runner->run_or_skip(bool &) and benchmark->run_or_skip(bool &)
These methods take a reference to a boolean whose value signals whether
pending benchmark instances are to be skipped.

void benchmark->run_or_skip(bool &) is called by Python to ensure
that KeyboardInterrupt is properly handled in scripts that contain
multiple benchmarks, or when a single-benchmark script is
executed on a machine with more than one device.
2025-12-08 14:24:32 -06:00
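Note on 8e6154511e: a skip flag passed by reference lets one interruption affect every benchmark that runs afterwards. The sketch below illustrates that idea only; `benchmark_sketch`, its `instances` and `executed` members, and the convention that an instance returns `false` to request a stop are all assumptions, not the nvbench API.

```cpp
#include <functional>
#include <vector>

// Hypothetical stand-in for a benchmark holding several pending instances.
struct benchmark_sketch
{
  // Each instance returns false to signal that remaining work should stop
  // (for example, after an interrupt was observed).
  std::vector<std::function<bool()>> instances;
  int executed = 0;

  // The flag is owned by the caller, so it carries over between benchmarks.
  void run_or_skip(bool &skip_remaining)
  {
    for (auto &instance : instances)
    {
      if (skip_remaining)
      {
        continue; // pending instance is skipped, not executed
      }
      ++executed;
      if (!instance())
      {
        skip_remaining = true;
      }
    }
  }
};
```

Because the caller owns the boolean, a driver loop over several benchmarks (or over several devices) can pass the same variable to each call, and everything queued after the interruption is skipped.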
Oleksandr Pavlyk
a7763bdd7a Remove debug outputs 2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8 Revert "Scripts to triage 284"
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f Use nvbench::stop_runner_loop to signal stop of runner loop
Add try/catch around Python calls to improve keyboard interrupt
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
e57f1ecf4c Introduce nvbench::stop_runner_loop exception. If application throws it, runner loop is stopped and other pending benchmark instances are skipped 2025-12-05 19:38:11 -06:00
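Note on e57f1ecf4c: the behavior described, where a dedicated exception stops the runner loop and the remaining instances are skipped, can be sketched as follows. The type `stop_runner_loop_sketch`, the `loop_summary` struct, and `run_all` are hypothetical names for illustration and do not reproduce the actual nvbench runner.

```cpp
#include <functional>
#include <stdexcept>
#include <vector>

// Hypothetical stand-in for the exception type named in the commit.
struct stop_runner_loop_sketch : std::runtime_error
{
  using std::runtime_error::runtime_error;
};

struct loop_summary
{
  int completed = 0; // instances that ran to completion
  int skipped   = 0; // instances that were never started
};

// Run instances in order; once one throws the stop exception, skip the rest.
loop_summary run_all(const std::vector<std::function<void()>> &instances)
{
  loop_summary summary;
  bool stop_requested = false;
  for (const auto &instance : instances)
  {
    if (stop_requested)
    {
      ++summary.skipped;
      continue;
    }
    try
    {
      instance();
      ++summary.completed;
    }
    catch (const stop_runner_loop_sketch &)
    {
      // Only this exception type stops the loop; others propagate as usual.
      stop_requested = true;
    }
  }
  return summary;
}
```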
Oleksandr Pavlyk
c286199adc Scripts to triage 284 2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42 Use pybind11==3.0.1, do not use pybind11_add_module 2025-12-05 19:38:11 -06:00
Jerry Hou
f651636501 entropy criterion optimizations (#286)
* entropy criterion optimizations

* online linear regression module

* online regression refactor

* revising ss_tot handling

---------

Co-authored-by: Jerry Hou <jerryhou@fb.com>
2025-12-06 01:02:21 +00:00