nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-11 17:00:01 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	d13a0fde32	Correct cuda cccl examples per change in api (#353)	2026-05-06 13:30:44 -05:00
Oleksandr Pavlyk	f392725015	Correct Python API signature of State.get_axis_values_as_strings (#346) * Correct Python API signature of State.get_axis_values_as_strings The C++ API has default boolean argument color, but Python API declared no arguments. Closes #345 * Also exercise invocation of get_axis_values_as_string with keyword argument value * Remove use of cuda.core.experimental	2026-05-04 08:40:29 -05:00
Oleksandr Pavlyk	a3364ca5c7	Port changes to the package from #323 (#337) Fixed relative text alignment in docstrings to fix autodoc warnings Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions to start with underscore, signaling that these functions are internal and should not be documented Account for test_cpp_exceptions -> _test_cpp_exception, same for _py_ Make sure to reset __module__ of reexported symbols to be cuda.bench	2026-04-22 08:28:15 -05:00
Oleksandr Pavlyk	b0a46f44c2	Modularize color handling (#336) * Introduce function colorize to modularize colorization/no-color handling * Use sns.set_theme instead of deprecated sns.set() * Use str.format instead of legacy % syntax * Simplified iteration over list Use f-string (supported since Python 3.6) instead of str.format for better readability and performance	2026-04-14 08:09:44 -05:00
pre-commit-ci[bot]	8d23e3e73c	[pre-commit.ci] pre-commit autoupdate (#333) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/mirrors-clang-format: v21.1.8 → v22.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v21.1.8...v22.1.2) - [github.com/astral-sh/ruff-pre-commit: v0.14.10 → v0.15.9](https://github.com/astral-sh/ruff-pre-commit/compare/v0.14.10...v0.15.9) - [github.com/codespell-project/codespell: v2.4.1 → v2.4.2](https://github.com/codespell-project/codespell/compare/v2.4.1...v2.4.2) * [pre-commit.ci] auto code formatting --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>	2026-04-13 16:24:55 +00:00
Oleksandr Pavlyk	e62c5b6f79	Correct description/hint entries for summaries with name "Noise" (#335) See #334	2026-04-13 11:13:37 -05:00
Nader Al Awar	373970323f	Merge pull request #331 from oleksandr-pavlyk/update-python-examples Update python examples	2026-04-02 15:20:24 -04:00
Oleksandr Pavlyk	39730efbc3	Update requirements to reflect packages used by examples	2026-04-02 10:37:17 -05:00
Oleksandr Pavlyk	9f75642387	Add patch to cutlass.base_dsl.dsl.BaseDSL to work-around a bug See https://github.com/NVIDIA/cutlass/issues/3142	2026-04-02 10:29:31 -05:00
Nader Al Awar	488173a242	Add `--no-color` flag to nvbench_compare.py which can be used for github issues and PRs python-0.2.1	2026-04-01 18:27:54 -04:00
Nader Al Awar	7a68e53df0	Rename flag from markdown to no-color	2026-04-01 17:01:29 -05:00
Nader Al Awar	7e5e784855	Add --markdown flag to nvbench_compare.py which can be used for github issues/prs	2026-04-01 14:53:13 -05:00
Oleksandr Pavlyk	93bc59d05c	Renamed CUTLASS example to reflect that it uses CuteDSL	2026-04-01 08:24:29 -05:00
Oleksandr Pavlyk	e4cfddeb87	Rewrote cutlass_gemm example to use CuteDSL	2026-04-01 08:23:41 -05:00
Oleksandr Pavlyk	3f284b4004	Renamed cccl_* examples cccl_parallel_* -> cuda_compute_* cccl_cooperative_* -> cuda_coop_*	2026-04-01 08:20:20 -05:00
Oleksandr Pavlyk	5bdb30f4b6	Update to cccl_parallel_segmented_reduce example per changes in API Update namespace changes. Use make_segmented_reduce factory function, and update call signatures.	2026-04-01 08:18:15 -05:00
Oleksandr Pavlyk	d8739fc208	Update to cccl_cooperative_block_reduce example	2026-04-01 08:17:52 -05:00
Oleksandr Pavlyk	974eb5ee0f	Replace use of cupy.cuda.ExternalStream with cupy.cuda.Stream.from_external	2026-04-01 08:17:12 -05:00
Oleksandr Pavlyk	7c60edcc0a	cuda.core.experimental -> cuda.core	2026-04-01 08:16:04 -05:00
Oleksandr Pavlyk	836a6c12f4	Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete Fix GCC16 sfinae incomplete warnings. GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` is complete where using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependency, use of incomplete type breaks build due to use of `-Werror` flag due to `-Wsfinae-incomplete` warning emitted by GCC16. This commit replaced affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.	2026-03-24 16:02:28 -05:00
Oleksandr Pavlyk	317dc6824e	Mark NVBench headers as SYSTEM for consuming targets + FIX (#330) * Mark NVBench headers as SYSTEM for consuming targets. Fixes #30. * As nvbench.main links to nvbench as INTERFACE only, it no longer consumes usage reqs of nvbench Because of this nvbench.main was no longer consuming dependence on CUDA::toolkit include dirs. This PR links nvbench.main to ${ctk_libraries} privately to reestablish that dependency * Implement use of pragma system_header in NVBench 1. Add code to nvbench/config.cuh.in to define NVBENCH_IMPLICIT_SYSTEM_HEADER_* preprocessor variable depending on compiler, unless NVBENCH_NO_IMPLICIT_SYSTEM_HEADER was defined. 2. Build NVBench targets with -DNVBENCH_NO_IMPLICIT_SYSTEM_HEADER 3. Modify each header file in nvbench/ folder to - include <nvbench/config.cuh> - Execute pragma <OPTIONAL_CMPLR> system_header guarded by checks for defined preprocessor variables - Do the above two steps before any other headers are included --------- Co-authored-by: Allison Piper <apiper@nvidia.com>	2026-03-23 15:10:41 -04:00
Oleksandr Pavlyk	9a91b9ef0c	Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end (#327) * Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end NVPW_* API has been deprecated since CTK 13.0. Followed advice in compilation message to replace NVPW_* API with CUPTI Profiler Host API. `libnvbench.so` no longer links to `nvperf_host` directly, only to `libcupti`. NVBench uses the CUPTI Host API to build a config image from metric names, and the Range Profiler API to collect and decode counters. The host API never collects data directly; it prepares and evaluates data produced by range profiling. Introduce `host_impl`/`profiler_init_guard` to manage CUPTI Host object and initialization/deinitialization, including safe move-assignment cleanup. `profiler_init_guard` initializes profiler, and throws if CUPTI returns an error code. `profiler_init_guard::finalize_profiler()` de-inits profiler and returns the error code. Destructor calls finalize_profiler, but ignores the status code. If user wants to explicitly de-initialize profiler and handle the error, he/she is advised to call `finalize_profiler()` directly. The guard has a boolean member variable to allow destructor to work even if user explicitly called finalize_profiler() method. The old counter-data prefix/scratch flow was replaced with the Range Profiler counter data image sizing/initialization path and decode flow. Host API metric filtering (base metrics + context scope) and Host-side evaluation to GPU values via cuptiProfilerHostEvaluateToGpuValues is implemented. - Host object: `host_impl::object` in `nvbench/cupti_profiler.cxx`. - Range profiler object: `host_impl::range_profiler_object`. - Config image: `m_config_image`. - Counter data image: `m_data_image`. 1) Host init + config image - `initialize_profiler_host()` creates the host object. - `initialize_config_image_host()` adds metrics and builds the config image. 
2) Range profiler enable + counter data image - `enable_range_profiler()` creates the range profiler object. - `initialize_counter_data_image()` sizes and initializes the data image using the range profiler object, matching the CUPTI samples. 3) Config + collect + decode - `set_range_profiler_config()` binds the config image + data image. - `start_user_loop()` / `stop_user_loop()` push/pop the user range and start/stop the range profiler. - `process_user_loop()` decodes counter data via `cuptiRangeProfilerDecodeData()`. 4) Evaluate metrics - `get_counter_values()` calls `cuptiProfilerHostEvaluateToGpuValues()` to convert counter data into metric values. The * Use class instead of struct in profiler_init_guard; forward declaration * Add SFINAE guards before accessing members not present in earlier CTK versions * Check if cupti_profiler_host.h exists, use old/new implementation based on that check 1. Reintroduced legacy `cupti_profiler_nvpw.cuh` and `cupti_profiler_nvpw.cuh`. 2. Moved profiler-host-API implementation to `cupti_profiler_host.cuh`, `cupti_profiler_host.cxx`. 3. Add `nvbench/cupti_profiler.cuh` which checks if `cupti_profiler_host.h` header is known and includes `cupti_profiler_host.cuh` or `cupti_profiler_nvpw.cuh` respectively. 4. In cmake, we check if ${nvbench_cupti_root}/include/cupti_profiler_host.h file exists. If it does not, `libnvbench.so` would have dependency on libnvperf_host and libnvperf_target in addition to dependency on libcupti. If the header exists, it would only depend on libcupti	2026-03-23 11:51:16 -04:00
Oleksandr Pavlyk	1d823c6975	Merge pull request #328 from oleksandr-pavlyk/set-type-axes-names-in-auto-throughput-example	2026-03-20 18:44:03 -05:00
Oleksandr Pavlyk	56cdaed0af	Merge pull request #299 from NVIDIA/pre-commit-ci-update-config [pre-commit.ci] pre-commit autoupdate	2026-03-20 16:15:20 -05:00
Oleksandr Pavlyk	a6e570083d	Merge pull request #329 from oleksandr-pavlyk/fix-fmt-target-name-in-tests Link against fmt::fmt target, not fmt	2026-03-20 08:49:05 -05:00
Oleksandr Pavlyk	4c278b08b3	Link against fmt::fmt target, not fmt. Consistent with nvbench/CMakeLists.txt Co-authored-by: Dominic Charrier <docharri@amd.com>	2026-03-19 14:53:06 -05:00
Oleksandr Pavlyk	49636c70b3	Set type-axes name to ItemsPerThread to replace auto-generated T	2026-03-19 14:35:46 -05:00
Bernhard Manfred Gruber	728212f9f1	Merge pull request #315 from bernhardmgruber/plot_diff_script Extend `nvbench_compare.py` with `--plot`, axis/benchmark filtering, and dark mode	2026-02-28 01:38:27 +01:00
Bernhard Manfred Gruber	4164909c52	Feedback	2026-02-28 01:19:18 +01:00
Oleksandr Pavlyk	5387d2005b	Merge pull request #322 from oleksandr-pavlyk/feature/save-frequencies Save frequencies when bulk-saving of times is enabled SM clock rates are now always collected, even if throttling threshold is set to zero	2026-02-27 13:30:11 -06:00
Oleksandr Pavlyk	c9705de4a4	Reserve enough space clock-rates for min samples, if specified	2026-02-27 12:49:35 -06:00
Bernhard Manfred Gruber	0abc8ec82b	Extend nvbench_compare.py with `--plot`, axis/benchmark filtering, and dark mode Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>	2026-02-27 11:06:20 +01:00
Oleksandr Pavlyk	ba7150e447	Merge pull request #314 from bernhardmgruber/plot_script Add a script to plot benchmark results	2026-02-26 12:59:16 -06:00
Bernhard Manfred Gruber	800f640c20	Apply reviewer feedback	2026-02-26 19:23:51 +01:00
Oleksandr Pavlyk	998ab125ce	Don't override m_check_throttling if throttling threshold is non-positive measure_cold class now directly inherits m_check_throttling from state. This ensures that when `--jsonbin` is specified frequency data corresponding to timing data are available to write out.	2026-02-20 16:34:53 -06:00
Oleksandr Pavlyk	731e0c2c30	Swapped data members m_sm_clock_rates and m_sm_clock_rate_accumulator This places all std::vector members together. Added default initialization to all std::vector members, and all other members with default constructors. Exceptions are references and nvbench::launch m_launch; member	2026-02-19 15:33:57 -06:00
Oleksandr Pavlyk	4da9f431c0	Templatize write_out_values for different storage formats This could be used to save data as float32_t, or float64_t. This flexibility is useful for experimentation.	2026-02-19 15:32:00 -06:00
Oleksandr Pavlyk	988420b5b1	Use write_out_values utility to save frequencies The utility was already used to save times	2026-02-13 10:19:06 -06:00
Georgy Evtushenko	40b2f4ece2	Better place to stop freq timer?	2026-02-13 09:53:59 -06:00
Georgy Evtushenko	a487a38895	Dump frequencies	2026-02-13 08:49:41 -06:00
Bernhard Manfred Gruber	d3a0bec4a8	Feedback from review	2026-02-05 14:13:16 +01:00
Bernhard Manfred Gruber	28ed32bb47	Implement dark mode using style sheets	2026-02-05 14:00:33 +01:00
Bernhard Manfred Gruber	ec9759037d	I have no idea what I am doing	2026-02-05 11:15:27 +01:00
Bernhard Manfred Gruber	ccde9fc4d4	More	2026-02-05 10:56:36 +01:00
Bernhard Manfred Gruber	0be190b407	Add a script to plot benchmark results	2026-02-05 10:36:52 +01:00
Nader Al Awar	dc59f98ecd	Remove cupti from cuda-bench dependencies (#311) python-0.2.0	2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber	90ad8bcbc7	Merge pull request #296 from bernhardmgruber/compare_sub_results Allow partial comparison in `nvbench_compare.py`	2026-02-03 20:02:34 +01:00
Bernhard Manfred Gruber	c6ef87575c	Allow partial comparison in nvbench_compare.py Fixes: #295	2026-02-03 16:32:11 +01:00
Nader Al Awar	d75fc74162	Merge branch 'main' into remove-cupti-python	2026-02-03 08:58:41 -06:00
Oleksandr Pavlyk	867d5d4276	Merge pull request #294 from oleksandr-pavlyk/add-docstrings	2026-02-03 08:51:55 -06:00

760 Commits