Commit Graph

129 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
e53a1a2654 Use median and IR/relative as cmp_time/ref_time and cmp_noise/ref_noise
These measures are less sensitive to outliers
2026-05-04 16:14:56 -05:00
Oleksandr Pavlyk
ea592b6444 Tweaks for nvbench_compare
1. For JSON files that contains repeated measurements of run-time
   axis values, make sure that scripts compares corresponding
   reference entries.

   If cmp had two states with the same name and ref had two, we
   would compare measurements for each state in cmp against the
   first state in ref.

   Change here introduces counters tracking how many times each
   particular axis value, and retrieve corresponding entry in ref.

Previously, I had

```

|  BlockSize  |  NumBlocks  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
|     2^8     |     64      |   1.776 ms |       0.46% |   1.777 ms |       0.40% |  1.024 us |   0.06% |   SAME   |
|     2^8     |     64      |   1.776 ms |       0.46% |   1.774 ms |       0.52% | -2.048 us |  -0.12% |   SAME   |
|     2^8     |     64      |   1.776 ms |       0.46% |   1.773 ms |       0.52% | -3.072 us |  -0.17% |   SAME   |
|     2^8     |     64      |   1.776 ms |       0.46% |   1.774 ms |       0.58% | -2.048 us |  -0.12% |   SAME   |
|     2^8     |     64      |   1.776 ms |       0.46% |   1.773 ms |       0.58% | -3.072 us |  -0.17% |   SAME   |
```

and now it becomes

```

|  BlockSize  |  NumBlocks  |   Ref Time |   Ref Noise |   Cmp Time |   Cmp Noise |      Diff |   %Diff |  Status  |
|-------------|-------------|------------|-------------|------------|-------------|-----------|---------|----------|
|     2^8     |     64      |   1.776 ms |       0.46% |   1.777 ms |       0.40% |  1.024 us |   0.06% |   SAME   |
|     2^8     |     64      |   1.773 ms |       0.64% |   1.774 ms |       0.52% |  1.024 us |   0.06% |   SAME   |
|     2^8     |     64      |   1.774 ms |       0.46% |   1.773 ms |       0.52% | -1.024 us |  -0.06% |   SAME   |
|     2^8     |     64      |   1.773 ms |       0.46% |   1.774 ms |       0.58% |  1.024 us |   0.06% |   SAME   |
|     2^8     |     64      |   1.774 ms |       0.52% |   1.773 ms |       0.58% | -1.024 us |  -0.06% |   SAME   |
```

With the following raw data expected

```
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' base.json
"0.0017756160497665405"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"

(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' test.json
"0.0017766400575637818"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
```

2. nvbench_compare changes from using min_noise = min(ref_noise, cmp_noise) to using max_noise = max(ref_noise, cmp_noise)
   Using larger of ref and cmp noise level as a reference against which to gauge timing difference ratio makes more sense.
2026-05-04 16:14:56 -05:00
Oleksandr Pavlyk
f392725015 Correct Python API signature of State.get_axis_values_as_strings (#346)
* Correct Python API signature of State.get_axis_values_as_strings

The C++ API has default boolean argument color, but Python API
declared no arguments.

Closes #345

* Also exercise invocation of get_axis_values_as_string with keyword argument value

* Remove use of cuda.core.experimental
2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk
a3364ca5c7 Port changes to the package from #323 (#337)
Fixed relative text alignment in docstrings to fix autodoc warnigns

Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented

Account for test_cpp_exceptions -> _test_cpp_exception, same for *_py_*

Make sure to reset __module__ of reexported symbols to be cuda.bench
2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk
b0a46f44c2 Modularize color handling (#336)
* Introduce function colorize to modularize colorization/no-color handling

* Use sns.set_theme instead of deprecated sns.set()

* Use str.format instead of legacy % syntax

* Simplified iteration over list

Use f-string (supported since Python 3.6) instead of str.format for
better readability and performance
2026-04-14 08:09:44 -05:00
Nader Al Awar
373970323f Merge pull request #331 from oleksandr-pavlyk/update-python-examples
Update python examples
2026-04-02 15:20:24 -04:00
Oleksandr Pavlyk
39730efbc3 Update requirements to reflect packages used by examples 2026-04-02 10:37:17 -05:00
Oleksandr Pavlyk
9f75642387 Add patch to cutlass.base_dsl.dsl.BaseDSL to work-around a bug
See https://github.com/NVIDIA/cutlass/issues/3142
2026-04-02 10:29:31 -05:00
Nader Al Awar
7a68e53df0 Rename flag from markdown to no-color 2026-04-01 17:01:29 -05:00
Nader Al Awar
7e5e784855 Add --markdown flag to nvbench_compare.py which can be use for github issues/prs 2026-04-01 14:53:13 -05:00
Oleksandr Pavlyk
93bc59d05c Renamed CUTLASS example to reflect that it uses CuteDSL 2026-04-01 08:24:29 -05:00
Oleksandr Pavlyk
e4cfddeb87 Rewrote cutlass_gemm example to use CuteDSL 2026-04-01 08:23:41 -05:00
Oleksandr Pavlyk
3f284b4004 Renamed cccl_* examples
cccl_parallel_* -> cuda_compute_*
cccl_cooperative_* -> cuda_coop_*
2026-04-01 08:20:20 -05:00
Oleksandr Pavlyk
5bdb30f4b6 Update to cccl_parallel_segmented_reduce example per changes in API
Update namespace changes. Use make_segmented_reduce factory function,
and update call signatures.
2026-04-01 08:18:15 -05:00
Oleksandr Pavlyk
d8739fc208 Update to cccl_cooperative_block_reduce example 2026-04-01 08:17:52 -05:00
Oleksandr Pavlyk
974eb5ee0f Replace use of cupy.cuda.ExternalStream with cupy.cuda.Stream.from_external 2026-04-01 08:17:12 -05:00
Oleksandr Pavlyk
7c60edcc0a cuda.core.experimental -> cuda.core 2026-04-01 08:16:04 -05:00
Oleksandr Pavlyk
836a6c12f4 Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete
Fix GCC16 sfinae incomplete warnings.

GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` is complete where using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependency, use of incomplete type breaks build due to use of `-Werror` flag due to `-Wsfinae-incomplete` warning emitted by GCC16.

This commit replaced affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.
2026-03-24 16:02:28 -05:00
Bernhard Manfred Gruber
4164909c52 Feedback 2026-02-28 01:19:18 +01:00
Bernhard Manfred Gruber
0abc8ec82b Extend nvbench_compare.py with --plot, axis/benchmark filtering, and dark mode
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2026-02-27 11:06:20 +01:00
Bernhard Manfred Gruber
800f640c20 Apply reviewer feedback 2026-02-26 19:23:51 +01:00
Bernhard Manfred Gruber
d3a0bec4a8 Feedback from review 2026-02-05 14:13:16 +01:00
Bernhard Manfred Gruber
28ed32bb47 Implement dark mode using style sheets 2026-02-05 14:00:33 +01:00
Bernhard Manfred Gruber
ec9759037d I have no idea what I am doing 2026-02-05 11:15:27 +01:00
Bernhard Manfred Gruber
ccde9fc4d4 More 2026-02-05 10:56:36 +01:00
Bernhard Manfred Gruber
0be190b407 Add a script to plot benchmark results 2026-02-05 10:36:52 +01:00
Nader Al Awar
dc59f98ecd Remove cupti from cuda-bench dependencies (#311) 2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber
c6ef87575c Allow partial comparison in nvbench_compare.py
Fixes: #295
2026-02-03 16:32:11 +01:00
Nader Al Awar
d75fc74162 Merge branch 'main' into remove-cupti-python 2026-02-03 08:58:41 -06:00
Nader Al Awar
4fa4296810 Remove cuda.pathfinder function 2026-02-02 16:43:45 -06:00
Nader Al Awar
f2d5730104 Disable CUPTI in cmake file 2026-02-02 16:03:15 -06:00
Nader Al Awar
6df5fc8c67 Remove cupti from cuda-bench dependencies 2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk
8ff0557ad8 Replace use of py::handle to store global_registry
Use py::gil_safe_call_once_and_store facility pybind11 provides.
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
39c29026fd Move docstrings from PYI file to implementation
Added tests that docstrings exist and are not empty.

This closes #291
2026-02-02 11:55:48 -06:00
Nader Al Awar
edf0b80599 Add installation instructions 2026-01-30 09:32:44 -06:00
Nader Al Awar
fa1eed69c0 Rename test file to refer to cuda_bench 2026-01-29 13:53:29 -06:00
Nader Al Awar
711c1e2eb1 Replace all occurences of pynvbench with cuda-bench 2026-01-29 13:25:17 -06:00
Nader Al Awar
5e7adc5c3f Build multi architecture cuda wheels (#302)
* Add cuda architectures to build wheel for

* Package scripts in wheel

* Separate cuda major version extraction to fix architecutre selection logic

* Add back statement printing cuda version

* [pre-commit.ci] auto code formatting

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-29 01:13:24 +00:00
Ashwin Srinath
a681e2185d Add multi-cuda wheel build (#289)
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f6a9b245d3 Only trigger skipping of outstanding benchmarks on KeyboardInterrupt exception, on others benchmakr is to continue execution 2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983 Replace main_arg_run_benchmarks with run_interriptible
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
a7763bdd7a Remove debug outputs 2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8 Revert "Scripts to triage 284"
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f Use nvbench::stop_runner_loop to signal stop of runner loop
Add try/catch around Python calls to improve keyboard interrup
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
c286199adc Scripts to triage 284 2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42 Use pybind11==3.0.1, do not use pybind11_add_module 2025-12-05 19:38:11 -06:00
Ashwin Srinath
77b7afc3c9 Remove the Python version file 2025-12-03 16:23:14 -05:00
Ashwin Srinath
29389b5791 Initial wheel build and publishing infrastructure 2025-12-03 10:15:32 -05:00
Asher Mancinelli
e91559edf0 Update README.md 2025-11-14 14:34:18 -08:00
Jaya Venkatesh
0f997271f7 added numba-cuda to requirements
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-16 14:54:08 -07:00