nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-12 09:15:47 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	e53a1a2654	Use median and IR/relative as cmp_time/ref_time and cmp_noise/ref_noise. These measures are less sensitive to outliers.	2026-05-04 16:14:56 -05:00
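To illustrate why the e53a1a2654 message calls these measures less sensitive to outliers, here is a minimal sketch. It assumes "IR" stands for the interquartile range and that the relative value is the range divided by the median; the log does not spell out either. The function names are hypothetical and are not taken from nvbench_compare.

```python
import statistics


def robust_summary(samples):
    """Return (median, interquartile range / median) for a list of timings."""
    median = statistics.median(samples)
    # quantiles(n=4) returns the three quartile cut points Q1, Q2, Q3.
    q1, _, q3 = statistics.quantiles(samples, n=4)
    return median, (q3 - q1) / median


def classic_summary(samples):
    """Return (mean, standard deviation / mean) for comparison."""
    mean = statistics.fmean(samples)
    return mean, statistics.stdev(samples) / mean


# Seven tightly clustered timings plus one large outlier (made-up values).
samples = [1.00, 1.01, 0.99, 1.00, 1.02, 0.98, 1.00, 5.00]

print("median, IR/relative :", robust_summary(samples))
print("mean, stdev/relative:", classic_summary(samples))
```

With a single outlier in the sample, the mean and the relative standard deviation shift noticeably, while the median and the relative interquartile range stay close to the values of the clustered timings.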
Oleksandr Pavlyk	ea592b6444	Tweaks for nvbench_compare	2026-05-04 16:14:56 -05:00

1. For JSON files that contain repeated measurements of run-time axis values, make sure that the script compares corresponding reference entries. If cmp had two states with the same name and ref had two, we would compare measurements for each state in cmp against the first state in ref. The change here introduces counters tracking how many times each particular axis value has occurred, and retrieves the corresponding entry in ref.

Previously, I had

```
| BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise |   Diff    | %Diff  | Status |
|-----------|-----------|----------|-----------|----------|-----------|-----------|--------|--------|
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.777 ms |   0.40%   |  1.024 us |  0.06% |  SAME  |
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.774 ms |   0.52%   | -2.048 us | -0.12% |  SAME  |
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.773 ms |   0.52%   | -3.072 us | -0.17% |  SAME  |
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.774 ms |   0.58%   | -2.048 us | -0.12% |  SAME  |
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.773 ms |   0.58%   | -3.072 us | -0.17% |  SAME  |
```

and now it becomes

```
| BlockSize | NumBlocks | Ref Time | Ref Noise | Cmp Time | Cmp Noise |   Diff    | %Diff  | Status |
|-----------|-----------|----------|-----------|----------|-----------|-----------|--------|--------|
|    2^8    |    64     | 1.776 ms |   0.46%   | 1.777 ms |   0.40%   |  1.024 us |  0.06% |  SAME  |
|    2^8    |    64     | 1.773 ms |   0.64%   | 1.774 ms |   0.52%   |  1.024 us |  0.06% |  SAME  |
|    2^8    |    64     | 1.774 ms |   0.46%   | 1.773 ms |   0.52%   | -1.024 us | -0.06% |  SAME  |
|    2^8    |    64     | 1.773 ms |   0.46%   | 1.774 ms |   0.58%   |  1.024 us |  0.06% |  SAME  |
|    2^8    |    64     | 1.774 ms |   0.52%   | 1.773 ms |   0.58%   | -1.024 us | -0.06% |  SAME  |
```

With the following raw data, this is expected:

```
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' base.json
"0.0017756160497665405"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
(py313) opavlyk@NV-22T4X34:~/repos/nvbench$ jq '. | .benchmarks[] | .states[] | .summaries[] | select(.tag == "nv/cold/time/gpu/median") | .data[] | .value' test.json
"0.0017766400575637818"
"0.001773568034172058"
"0.0017725440263748169"
"0.001773568034172058"
"0.0017725440263748169"
```

2. nvbench_compare changes from using min_noise = min(ref_noise, cmp_noise) to using max_noise = max(ref_noise, cmp_noise). Using the larger of the ref and cmp noise levels as the reference against which to gauge the timing difference ratio makes more sense.
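The two changes described in the ea592b6444 message can be sketched roughly as follows. This is an illustration under stated assumptions, not the script's actual code: the dictionary keys `name` and `median`, the function names `pair_states` and `status`, and the `FAST`/`SLOW` labels are hypothetical (the log itself only shows `SAME`), and the rule "difference within noise means SAME" is an assumed reading of how the noise threshold is used.

```python
from collections import defaultdict


def pair_states(ref_states, cmp_states):
    """Pair the k-th cmp state of a given name with the k-th ref state of that name."""
    # Group reference states by name, preserving their order in the file.
    ref_by_name = defaultdict(list)
    for state in ref_states:
        ref_by_name[state["name"]].append(state)

    seen = defaultdict(int)  # how many times each name has occurred in cmp so far
    pairs = []
    for cmp_state in cmp_states:
        name = cmp_state["name"]
        occurrence = seen[name]
        seen[name] += 1
        candidates = ref_by_name.get(name, [])
        if occurrence < len(candidates):
            pairs.append((candidates[occurrence], cmp_state))
        # States without a counterpart in ref are skipped in this sketch.
    return pairs


def status(ref_time, cmp_time, ref_noise, cmp_noise):
    """Classify a timing difference against the larger of the two noise levels."""
    frac_diff = (cmp_time - ref_time) / ref_time
    max_noise = max(ref_noise, cmp_noise)
    if abs(frac_diff) <= max_noise:
        return "SAME"
    return "FAST" if frac_diff < 0 else "SLOW"


# Two repeated measurements of the same state in each file.
ref = [{"name": "2^8/64", "median": 1.776e-3}, {"name": "2^8/64", "median": 1.773e-3}]
cmp = [{"name": "2^8/64", "median": 1.777e-3}, {"name": "2^8/64", "median": 1.774e-3}]

for ref_state, cmp_state in pair_states(ref, cmp):
    print(ref_state["median"], cmp_state["median"],
          status(ref_state["median"], cmp_state["median"], 0.0046, 0.0040))
```

Without the occurrence counter, every cmp state named `2^8/64` would be matched against the first ref entry, which is the behavior the first table in the commit message shows.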
Oleksandr Pavlyk	f392725015	Correct Python API signature of State.get_axis_values_as_strings (#346) * Correct Python API signature of State.get_axis_values_as_strings The C++ API has default boolean argument color, but Python API declared no arguments. Closes #345 * Also exercise invocation of get_axis_values_as_string with keyword argument value * Remove use of cuda.core.experimental	2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk	a3364ca5c7	Port changes to the package from #323 (#337) Fixed relative text alignment in docstrings to fix autodoc warnings Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions to start with underscore, signaling that these functions are internal and should not be documented Account for test_cpp_exceptions -> _test_cpp_exception, same for _py_ Make sure to reset __module__ of reexported symbols to be cuda.bench	2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk	b0a46f44c2	Modularize color handling (#336) * Introduce function colorize to modularize colorization/no-color handling * Use sns.set_theme instead of deprecated sns.set() * Use str.format instead of legacy % syntax * Simplified iteration over list Use f-string (supported since Python 3.6) instead of str.format for better readability and performance	2026-04-14 08:09:44 -05:00
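The idea of routing all color/no-color decisions through a single helper, as the b0a46f44c2 message describes, might look like the sketch below. The log does not show the real `colorize` signature, so the parameters, the color names, and the use of raw ANSI escape codes here are all assumptions made for illustration.

```python
# ANSI escape sequences for a few foreground colors (illustrative subset).
_ANSI_CODES = {
    "red": "\033[31m",
    "green": "\033[32m",
    "yellow": "\033[33m",
}
_ANSI_RESET = "\033[0m"


def colorize(text, color, enabled=True):
    """Wrap text in ANSI color codes, or return it unchanged when color is off."""
    if not enabled or color not in _ANSI_CODES:
        return text
    return f"{_ANSI_CODES[color]}{text}{_ANSI_RESET}"


# Callers no longer need their own "if no_color" branches.
print(colorize("SAME", "green", enabled=False))
print(colorize("SLOW", "red"))
```

Keeping the switch in one function means a flag such as `--no-color` only has to be checked in a single place.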
Nader Al Awar	373970323f	Merge pull request #331 from oleksandr-pavlyk/update-python-examples Update python examples	2026-04-02 15:20:24 -04:00
Oleksandr Pavlyk	39730efbc3	Update requirements to reflect packages used by examples	2026-04-02 10:37:17 -05:00
Oleksandr Pavlyk	9f75642387	Add patch to cutlass.base_dsl.dsl.BaseDSL to work-around a bug See https://github.com/NVIDIA/cutlass/issues/3142	2026-04-02 10:29:31 -05:00
Nader Al Awar	7a68e53df0	Rename flag from markdown to no-color	2026-04-01 17:01:29 -05:00
Nader Al Awar	7e5e784855	Add --markdown flag to nvbench_compare.py which can be used for GitHub issues/PRs	2026-04-01 14:53:13 -05:00
Oleksandr Pavlyk	93bc59d05c	Renamed CUTLASS example to reflect that it uses CuteDSL	2026-04-01 08:24:29 -05:00
Oleksandr Pavlyk	e4cfddeb87	Rewrote cutlass_gemm example to use CuteDSL	2026-04-01 08:23:41 -05:00
Oleksandr Pavlyk	3f284b4004	Renamed cccl_* examples cccl_parallel_* -> cuda_compute_* cccl_cooperative_* -> cuda_coop_*	2026-04-01 08:20:20 -05:00
Oleksandr Pavlyk	5bdb30f4b6	Update to cccl_parallel_segmented_reduce example per changes in API Update namespace changes. Use make_segmented_reduce factory function, and update call signatures.	2026-04-01 08:18:15 -05:00
Oleksandr Pavlyk	d8739fc208	Update to cccl_cooperative_block_reduce example	2026-04-01 08:17:52 -05:00
Oleksandr Pavlyk	974eb5ee0f	Replace use of cupy.cuda.ExternalStream with cupy.cuda.Stream.from_external	2026-04-01 08:17:12 -05:00
Oleksandr Pavlyk	7c60edcc0a	cuda.core.experimental -> cuda.core	2026-04-01 08:16:04 -05:00
Oleksandr Pavlyk	836a6c12f4	Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete Fix GCC16 sfinae incomplete warnings. GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` is complete when using `-std=c++17`. Since NVBench has to forward declare some types in header files to break a circular dependency, the use of an incomplete type breaks the build: GCC16 emits a `-Wsfinae-incomplete` warning, which the `-Werror` flag turns into an error. This commit replaced the affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx, with raw pointers.	2026-03-24 16:02:28 -05:00
Bernhard Manfred Gruber	4164909c52	Feedback	2026-02-28 01:19:18 +01:00
Bernhard Manfred Gruber	0abc8ec82b	Extend nvbench_compare.py with `--plot`, axis/benchmark filtering, and dark mode Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>	2026-02-27 11:06:20 +01:00
Bernhard Manfred Gruber	800f640c20	Apply reviewer feedback	2026-02-26 19:23:51 +01:00
Bernhard Manfred Gruber	d3a0bec4a8	Feedback from review	2026-02-05 14:13:16 +01:00
Bernhard Manfred Gruber	28ed32bb47	Implement dark mode using style sheets	2026-02-05 14:00:33 +01:00
Bernhard Manfred Gruber	ec9759037d	I have no idea what I am doing	2026-02-05 11:15:27 +01:00
Bernhard Manfred Gruber	ccde9fc4d4	More	2026-02-05 10:56:36 +01:00
Bernhard Manfred Gruber	0be190b407	Add a script to plot benchmark results	2026-02-05 10:36:52 +01:00
Nader Al Awar	dc59f98ecd	Remove cupti from cuda-bench dependencies (#311)	2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber	c6ef87575c	Allow partial comparison in nvbench_compare.py Fixes: #295	2026-02-03 16:32:11 +01:00
Nader Al Awar	d75fc74162	Merge branch 'main' into remove-cupti-python	2026-02-03 08:58:41 -06:00
Nader Al Awar	4fa4296810	Remove cuda.pathfinder function	2026-02-02 16:43:45 -06:00
Nader Al Awar	f2d5730104	Disable CUPTI in cmake file	2026-02-02 16:03:15 -06:00
Nader Al Awar	6df5fc8c67	Remove cupti from cuda-bench dependencies	2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk	8ff0557ad8	Replace use of py::handle to store global_registry Use py::gil_safe_call_once_and_store facility pybind11 provides.	2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk	39c29026fd	Move docstrings from PYI file to implementation Added tests that docstrings exist and are not empty. This closes #291	2026-02-02 11:55:48 -06:00
Nader Al Awar	edf0b80599	Add installation instructions	2026-01-30 09:32:44 -06:00
Nader Al Awar	fa1eed69c0	Rename test file to refer to cuda_bench	2026-01-29 13:53:29 -06:00
Nader Al Awar	711c1e2eb1	Replace all occurrences of pynvbench with cuda-bench	2026-01-29 13:25:17 -06:00
Nader Al Awar	5e7adc5c3f	Build multi architecture cuda wheels (#302) * Add cuda architectures to build wheel for * Package scripts in wheel * Separate cuda major version extraction to fix architecture selection logic * Add back statement printing cuda version * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-01-29 01:13:24 +00:00
Ashwin Srinath	a681e2185d	Add multi-cuda wheel build (#289) Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com> Co-authored-by: Nader Al Awar <naderalawar@gmail.com>	2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk	f6a9b245d3	Only trigger skipping of outstanding benchmarks on a KeyboardInterrupt exception; on other exceptions the benchmark is to continue execution	2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk	7e9a9a8983	Replace main_arg_run_benchmarks with run_interriptible This loop uses benchmark.run_or_skip to resolve #284 even for scripts that contain more than one benchmark, or when a script with a single benchmark is executed when more than one device is available.	2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk	a7763bdd7a	Remove debug outputs	2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk	b2a80c92b8	Revert "Scripts to triage 284" This reverts commit `c286199adc`.	2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk	ce9a76167f	Use nvbench::stop_runner_loop to signal stop of runner loop Add try/catch around Python calls to improve keyboard interrupt response.	2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk	c286199adc	Scripts to triage 284	2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk	de471e1d42	Use pybind11==3.0.1, do not use pybind11_add_module	2025-12-05 19:38:11 -06:00
Ashwin Srinath	77b7afc3c9	Remove the Python version file	2025-12-03 16:23:14 -05:00
Ashwin Srinath	29389b5791	Initial wheel build and publishing infrastructure	2025-12-03 10:15:32 -05:00
Asher Mancinelli	e91559edf0	Update README.md	2025-11-14 14:34:18 -08:00
Jaya Venkatesh	0f997271f7	added numba-cuda to requirements Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>	2025-09-16 14:54:08 -07:00


129 Commits