nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-04-19 14:28:53 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	df426a0bad	Add examples/axes.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	576c473481	Add implementation of and signature for State.getDevice. Make batch/sync arguments of State.exec keyword-only. Provide a default column_name value for the State.addElementCount method, so that it can be called as state.addElementCount(count) or as state.addElementCount(count, column_name="Descriptive Name")	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	2507bc2263	Add Python example based on C++ example/auto_throughput.cpp	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4950a50961	Add empty py.typed to signal mypy that package has type annotations	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c9f0785aed	Replace uses of deprecated typing.Tuple, typing.Callable, etc. Also use typing.Self to encode that `Benchmark.addInt64Axis` returns self.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	6f8bcdc774	Fix correctness of nvbench.State.getStream() method. Fix run-time exception `Fail: Unexpected error: RuntimeError: return_value_policy = copy, but type is non-copyable! (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)`, caused by an attempt to return a move-only `nvbench::cuda_stream` class instance using the default `pybind11::return_value_policy::copy`.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	e768ce28b6	Add Python stub file for cuda.nvbench API	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c49d718f65	Corrected nvbench.State.getBlockingKernel -> getBlockingKernelTimeout. Similar change for setBlockingKernelTimeout. Corrected a statement in a comment.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c184549cda	Import and reexport symbols from _nvbench one-by-one	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	b88cc78aeb	Add license header to py_nvbench.cpp. Also updated the comment explaining why calling `nvbench::benchmark_manager::get().initialize()` is necessary for running all tests.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	6552ef503c	Draft of Python API for NVBench. The prototype is based on pybind11 to minimize the boilerplate code needed to deal with the move-only semantics of many nvbench classes.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	a9fb32e25d	Merge pull request #254 from oleksandr-pavlyk/remove-cli-run-once-and-disable-blocking-kernel-options. Remove CLI run-once and disable-blocking-kernel options	2025-07-28 15:10:50 -05:00
Oleksandr Pavlyk	4ad3088a47	Update docs/cli_help.md. Spare users implementation details in the description of the `--profile` option. Co-authored-by: Allison Piper <apiper@nvidia.com>	2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk	25c604cf37	Fix typo, correct control-flow logic flaw. Verify that the `--profile` option results in setting `m_run_once`: `./build/bin/nvbench.example.cpp20.axes -b simple -d 0 --profile | grep "Pass: Cold"` prints `Pass: Cold: 1.006560ms GPU, 1.009277ms CPU, 0.00s total GPU, 0.00s total wall, 1x`, while the same command without `--profile` prints `Pass: Cold: 1.002844ms GPU, 1.011917ms CPU, 0.50s total GPU, 0.52s total wall, 499x`.	2025-07-28 14:40:02 -05:00
Oleksandr Pavlyk	5b6c3818f4	Corrected blocking kernel timeout message	2025-07-28 14:39:54 -05:00
Oleksandr Pavlyk	2ab5e2d1be	Run once disables blocking kernel (#252) * Measure cold must not use block_kernel for single runs. Per https://github.com/NVIDIA/nvbench/issues/242, we should not use the blocking kernel when --run-once or --profile is used, to avoid possible deadlocks when profiling with external tools, and also to avoid deadlocks when Python programs load the program on first execution. * Measure hot should not use blocking kernel during warmup. This change follows suit with measure_cold, where the same change was prompted by a deadlock; see https://github.com/NVIDIA/nvbench/pull/241 * Remove setting of CUDA_MODULE_LOADING=EAGER. This is no longer necessary, as neither the warm-up runs in regular runs nor the single run in run-once/profile mode uses the blocking kernel anymore.	2025-07-28 12:14:54 -07:00
Oleksandr Pavlyk	d160a2bafa	Replace --run-once in testing/CMakeLists.txt with --profile	2025-07-28 12:03:42 -05:00
Oleksandr Pavlyk	8416342af0	Remove mentions of --run-once and --disable-blocking-kernel from help. Text for --profile modified to be self-contained, i.e., not to refer to the removed --run-once and --disable-blocking-kernel options to explain what it does.	2025-07-28 07:55:25 -05:00
Oleksandr Pavlyk	3bb34b1b1f	Remove suggestion to use --disable-blocking-kernel. The text printed when the blocking kernel times out already suggests using the --profile option.	2025-07-28 07:54:16 -05:00
Oleksandr Pavlyk	281a08a57e	Remove CLI --run-once and --disable-blocking-kernel options. Removed option_parser::disable_blocking_kernel and option_parser::set_run_once methods. Added option_parser::enable_profile method instead, which calls `bench.set_run_once(true); bench.disable_blocking_kernel(true);`	2025-07-28 07:51:50 -05:00
Oleksandr Pavlyk	3de9dc95da	Merge pull request #250 from oleksandr-pavlyk/measure-cold-with-blocking-kernel-to-start-cpu-timer-in-kernel-timer-start. Include host work of benched fn in CPU time when using blocking kernel	2025-07-22 15:32:09 -05:00
Oleksandr Pavlyk	e5a04c825d	Fix description text for absolute standard deviation (#251). The entry with tag "nv/cold/time/cpu/stdev/absolute" stores the standard deviation of execution-duration measurements, not the relative standard deviation.	2025-07-22 15:17:30 -04:00
Oleksandr Pavlyk	2ab76a8d5c	Include host activity of benched fn in CPU time when blocking kernel is used. Based on the findings of https://github.com/NVIDIA/nvbench/issues/249, m_cpu_timer.start() is now called from the kernel_launcher_timer.start() method. Previously it was called from kernel_launcher_timer.stop() just before the unblock_stream() call, with the intention of honing in on the time to execute GPU work, but this excluded any host work performed by the benched function from the CPU time.	2025-07-21 15:36:19 -05:00
Bernhard Manfred Gruber	0c24f0250b	Avoid cuda/std types in host compiler headers (#246). Fixes: #245	2025-07-17 03:27:39 -07:00
pre-commit-ci[bot]	38ac5d7339	[pre-commit.ci] pre-commit autoupdate (#243)	2025-07-11 07:33:25 -04:00
Oleksandr Pavlyk	b8c664d22e	Do not use blocking kernel in warmup run of measure_cold (#241). See https://github.com/NVIDIA/nvbench/issues/240	2025-07-03 21:22:12 -07:00
Allard Hendriksen	53bf11a27d	Fix axes metadata assert (#239)	2025-07-03 09:32:44 -04:00
Oleksandr Pavlyk	c463a783bb	Allow kernel_generator to be stateful (#234). In Python, the kernel generator is a user-defined callable. We need to capture the Python object of that callable in the kernel generator provided for each benchmark. To this end, nvbench::benchmark has been modified to have a member of kernel_generator type (must be copy-constructible). Its constructor acquires an optional parameter of type `kernel_generator` whose default value is a default-constructed instance. nvbench::runner was modified to store a kernel_generator instance as well. Its run method creates a fresh copy of the stored instance for each invocation, just as it did before. nvbench tests/examples pass with this change.	2025-06-28 19:17:12 -07:00
Oleksandr Pavlyk	c2a30cf0d2	Set underlying type for enum class exec_tag to uint16_t (#233). This change reduces the size of an exec_tag instance from 4 bytes to 2 bytes; it also makes the underlying type of exec_tag explicit.	2025-06-28 18:03:25 -07:00
Allison Piper	8e3e0ad117	Include RAPIDS.cmake to work around network issues on CI. (#236) See also https://github.com/rapidsai/rmm/pull/1886	2025-06-24 17:03:30 -04:00
Oleksandr Pavlyk	bc8319d5d9	Fix obvious typo in getter for device_manager singleton docstring (#232)	2025-06-13 10:03:54 -04:00
Oleksandr Pavlyk	b1551d2eb7	Update json and fmt projects to latest versions (#229)	2025-05-27 12:49:35 -04:00
Allison Piper	26f52a7175	Add CUPTI paths to INSTALL_RPATH. (#230)	2025-05-22 12:56:22 -04:00
Allison Piper	b62c0d9d78	Update YouTube link URL. (#226)	2025-05-10 10:11:58 -04:00
Allison Piper	c5b8b3b494	Link GPU Mode talk from README. (#224)	2025-05-09 16:19:02 -04:00
Allison Piper	f44f5cc22c	Remove min-time/max-noise API. (#223) These are now owned by the stdrel stopping criterion and should not be exposed directly in the benchmark/state/etc. APIs. This will affect users who are calling `NVBENCH_BENCH(...).set_min_time(...)` or `NVBENCH_BENCH(...).set_max_noise(...)`. These calls can be updated to `NVBENCH_BENCH(...).set_criterion_param_float64("min-time", ...)` and `NVBENCH_BENCH(...).set_criterion_param_float64("max-noise", ...)`, respectively.	2025-05-08 10:02:54 -04:00
Allison Piper	a36e15f6ca	Fix issues with default stopping params. (#221)	2025-05-07 11:01:36 -04:00
Allison Piper	249a74f73b	Bump CI to CTK 12.9, regen devcontainers. (#219)	2025-05-02 12:05:50 -04:00
Allison Piper	9d189280de	Fix `get_config_count` for CPU-only benchmarks. (#218)	2025-05-01 12:34:35 -04:00
Sergey Pavlov	433376fd83	Restrict stopping criterion parameter usage on the command line (#174) * Restrict stopping criterion parameter usage on the command line. * Update docs for stopping criterion. * Add convenience benchmark_base API for criterion params. * Add more test cases for stopping criterion parsing. Co-authored-by: Sergey Pavlov <psvvsp89@gmail.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-04-30 15:53:45 -04:00
Elias Stehle	ca0e795b46	Merge pull request #113 from elstehle/fix/per-device-stream. Fixes cudaErrorInvalidValue when running on an nvbench-created CUDA stream	2025-04-30 15:40:33 -04:00
Allison Piper	4879607c70	Merge pull request #216 from alliepiper/disable_throttle_for_sync. Disable throttling when `sync` exec tag is used.	2025-04-24 19:02:39 -04:00
Allison Piper	e4057575c7	Disable throttling when `sync` exec tag is used.	2025-04-24 22:48:35 +00:00
Allison Piper	0573ffa9bd	Merge pull request #214 from PointKernel/fix-throttle-setters. Fix throttle setter return values and update customization example	2025-04-24 13:53:20 -04:00
Yunsong Wang	dbd12f61b8	Revert example change	2025-04-24 10:12:46 -07:00
Allison Piper	2938a94d49	Merge pull request #215 from alliepiper/dynamic_throttle_delay. Dynamically increase recovery delay for consecutive discards.	2025-04-24 10:32:45 -04:00
Allison Piper	d12614b5cb	Dynamically increase recovery delay for consecutive discards.	2025-04-24 14:11:31 +00:00
Yunsong Wang	797f91bc7e	Update example to show how to customize throttle threshold	2025-04-23 14:10:16 -07:00
Yunsong Wang	31efce1ec8	Fix throttle setters	2025-04-23 14:01:56 -07:00
Allison Piper	89bec09b82	Merge pull request #207 from alliepiper/throttle_followup. Throttling follow-up	2025-04-18 08:48:41 -04:00


714 Commits
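Commit 576c473481 makes the batch/sync arguments of `State.exec` keyword-only and gives `State.addElementCount` a default `column_name`. The following is a minimal sketch of that calling convention on a hypothetical stand-in class, `StateSketch`. It is not the cuda.nvbench `State`. The parameter names `batched`/`sync` and the default column name `"Elements"` are assumptions made for illustration.

```python
from __future__ import annotations

from collections.abc import Callable


class StateSketch:
    """Hypothetical stand-in for a benchmark state (not cuda.nvbench.State)."""

    def __init__(self) -> None:
        self.element_counts: dict[str, int] = {}
        self.last_exec_flags: tuple[bool, bool] | None = None

    def exec(
        self,
        launcher: Callable[[], None],
        *,
        batched: bool = True,
        sync: bool = False,
    ) -> None:
        # Parameters after the bare `*` are keyword-only: exec(fn, False) raises
        # TypeError instead of silently meaning batched=False.
        self.last_exec_flags = (batched, sync)
        launcher()

    def addElementCount(self, count: int, column_name: str = "Elements") -> None:
        # The default column name (a placeholder here) allows addElementCount(count).
        self.element_counts[column_name] = count


state = StateSketch()
state.addElementCount(1 << 20)
state.addElementCount(4096, column_name="Descriptive Name")
state.exec(lambda: None, sync=True)
```

Making the flags keyword-only means a stray positional boolean becomes an error rather than a silently different benchmark mode.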
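Commit c9f0785aed replaces deprecated `typing.Tuple`/`typing.Callable` with their modern equivalents and uses `typing.Self` to say that `Benchmark.addInt64Axis` returns `self`. The sketch below illustrates that pattern on a hypothetical `BenchmarkSketch` class, which is not the cuda.nvbench `Benchmark`. The axis names used in the chaining example are also made up.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # typing.Self needs Python 3.11+; with postponed annotations it is only
    # required by the type checker, so older interpreters can still run this.
    from typing import Self


class BenchmarkSketch:
    """Hypothetical benchmark builder (not the cuda.nvbench Benchmark class)."""

    def __init__(self, fn: Callable[[object], None]) -> None:
        self.fn = fn
        # Built-in generics replace the deprecated typing.Dict / typing.List.
        self.axes: dict[str, list[int]] = {}

    def addInt64Axis(self, name: str, values: Sequence[int]) -> Self:
        self.axes[name] = [int(v) for v in values]
        # Returning self, typed as Self, enables chaining and preserves the
        # subclass type for type checkers.
        return self


bench = (
    BenchmarkSketch(lambda state: None)
    .addInt64Axis("NumElements", [2**16, 2**20])
    .addInt64Axis("BlockSize", [128, 256])
)
```

Annotating the return as `Self` rather than `BenchmarkSketch` lets a type checker keep a subclass's type through the whole chain.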
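Commit 281a08a57e removes the separate `--run-once` and `--disable-blocking-kernel` options and adds `option_parser::enable_profile`, which sets both behaviors at once. The following is a Python analog of that design using `argparse`. `BenchSettings`, `enable_profile`, and `parse` are hypothetical names, not nvbench's C++ option parser.

```python
import argparse
from dataclasses import dataclass


@dataclass
class BenchSettings:
    """Hypothetical per-benchmark settings named after the two C++ setters in the commit."""

    run_once: bool = False
    disable_blocking_kernel: bool = False


def enable_profile(bench: BenchSettings) -> None:
    # Analog of option_parser::enable_profile: one flag now implies both settings.
    bench.run_once = True
    bench.disable_blocking_kernel = True


def parse(argv: list[str]) -> BenchSettings:
    parser = argparse.ArgumentParser(prog="bench")
    # Only --profile is offered; --run-once and --disable-blocking-kernel are gone.
    parser.add_argument("--profile", action="store_true")
    args = parser.parse_args(argv)
    bench = BenchSettings()
    if args.profile:
        enable_profile(bench)
    return bench
```

With a single flag, users cannot request a single run while the blocking kernel stays on, which is the combination the earlier commits identify as deadlock-prone.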
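Commit e5a04c825d clarifies that the "nv/cold/time/cpu/stdev/absolute" entry holds the absolute standard deviation of execution durations, not the relative one. The sketch below shows the difference. It assumes the sample (n - 1) standard deviation and that the relative value is the absolute value divided by the mean. nvbench's own computation may differ in detail. The function name and the sample durations are hypothetical.

```python
import statistics


def cold_time_stats(durations_s: list[float]) -> dict[str, float]:
    """Summarize per-sample execution durations given in seconds."""
    mean = statistics.fmean(durations_s)
    # Absolute spread: same unit as the samples (seconds).
    stdev_absolute = statistics.stdev(durations_s)  # sample (n - 1) standard deviation
    # Relative spread: unitless fraction of the mean.
    stdev_relative = stdev_absolute / mean
    return {"mean": mean, "stdev_absolute": stdev_absolute, "stdev_relative": stdev_relative}


stats = cold_time_stats([0.9e-3, 1.0e-3, 1.1e-3])
```

Here the absolute value is about 1.0e-4 s, while the relative value is about 0.1 (10 % of the mean), so mislabeling one as the other changes the reading by orders of magnitude.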
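Commit 2ab76a8d5c moves `m_cpu_timer.start()` from `kernel_launcher_timer.stop()` (just before `unblock_stream()`) to `kernel_launcher_timer.start()`, so host work done by the benched function counts toward CPU time. This simplified model uses `time.sleep` in place of host work and GPU work and reports the CPU time each placement would measure. The function name and the sleep-based model are assumptions, not nvbench code.

```python
import time


def simulate_cold_measurement(host_work_s: float, gpu_work_s: float) -> tuple[float, float]:
    """Return (old_cpu_time, new_cpu_time) for one simulated blocking-kernel measurement."""
    # kernel_launcher_timer.start(): stream blocked. New placement starts the CPU timer here.
    t_start = time.perf_counter()
    time.sleep(host_work_s)  # host-side work done by the benched function
    # kernel_launcher_timer.stop(): old placement started the CPU timer here,
    # just before unblock_stream().
    t_unblock = time.perf_counter()
    time.sleep(gpu_work_s)  # queued GPU work runs once the stream is unblocked
    t_end = time.perf_counter()  # synchronized; CPU timer stopped
    return t_end - t_unblock, t_end - t_start


old_cpu, new_cpu = simulate_cold_measurement(host_work_s=0.05, gpu_work_s=0.01)
```

The gap between the two values is the host work that the old placement silently dropped.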
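Commit c463a783bb lets `nvbench::runner` store a possibly stateful kernel generator and create a fresh copy of it for each run. This sketch shows the same idea in Python with two hypothetical classes, `CountingGenerator` and `RunnerSketch`. It uses `copy.deepcopy` because a shallow copy would share the mutable `seen` list, unlike a C++ value copy.

```python
import copy
from dataclasses import dataclass, field


@dataclass
class CountingGenerator:
    """Hypothetical stateful generator: remembers every configuration it has built."""

    seen: list[str] = field(default_factory=list)

    def __call__(self, config: str) -> str:
        self.seen.append(config)
        return f"{config}#{len(self.seen)}"


class RunnerSketch:
    """Hypothetical runner that keeps a prototype generator and copies it per run."""

    def __init__(self, generator: CountingGenerator) -> None:
        self._prototype = generator

    def run(self, configs: list[str]) -> list[str]:
        # deepcopy mirrors a C++ value copy: the `seen` list is duplicated too, so
        # state from one run never leaks into the prototype or the next run.
        gen = copy.deepcopy(self._prototype)
        return [gen(c) for c in configs]


prototype = CountingGenerator()
runner = RunnerSketch(prototype)
first = runner.run(["a", "b"])
second = runner.run(["c"])
```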
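Commit f44f5cc22c moves `min-time` and `max-noise` under the stdrel stopping criterion, set via `set_criterion_param_float64`. The following is a simplified criterion keyed by those two parameter names. It is not nvbench's stdrel implementation, which may apply additional checks. The class, its method names, the placeholder defaults, and the treatment of `max-noise` as a fraction are all assumptions.

```python
import statistics
from dataclasses import dataclass, field


@dataclass
class StdRelSketch:
    """Simplified stopping criterion keyed by the parameter names "min-time" and "max-noise"."""

    # Placeholder defaults; max-noise is a fraction here (0.005 == 0.5 %).
    params: dict[str, float] = field(
        default_factory=lambda: {"min-time": 0.5, "max-noise": 0.005}
    )
    samples: list[float] = field(default_factory=list)

    def set_param_float64(self, name: str, value: float) -> None:
        # Parameters are owned by the criterion, so unknown names are rejected.
        if name not in self.params:
            raise KeyError(f"unknown criterion parameter: {name}")
        self.params[name] = value

    def add_measurement(self, seconds: float) -> None:
        self.samples.append(seconds)

    def is_finished(self) -> bool:
        # Keep sampling until enough total time has been accumulated...
        if len(self.samples) < 2 or sum(self.samples) < self.params["min-time"]:
            return False
        # ...and the relative standard deviation is low enough.
        noise = statistics.stdev(self.samples) / statistics.fmean(self.samples)
        return noise <= self.params["max-noise"]


criterion = StdRelSketch()
criterion.set_param_float64("min-time", 0.0025)
criterion.set_param_float64("max-noise", 0.05)
```

Keeping the parameters inside the criterion object means a different criterion can define its own parameter names without the benchmark API changing.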
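Commit d12614b5cb says the recovery delay increases dynamically for consecutive discarded samples. It does not say how. This sketch assumes a doubling delay with a cap that resets once a sample is accepted. The doubling rule, the delay values, and the class `ThrottleRecoverySketch` are all hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ThrottleRecoverySketch:
    """Hypothetical policy: the wait before re-measuring grows with each consecutive discard."""

    base_delay_s: float = 0.05  # placeholder value, not nvbench's
    max_delay_s: float = 1.0  # placeholder cap
    consecutive_discards: int = 0

    def on_sample(self, discarded: bool) -> float:
        """Return how long to wait before the next measurement."""
        if not discarded:
            self.consecutive_discards = 0  # an accepted sample resets the escalation
            return 0.0
        self.consecutive_discards += 1
        # Double the delay for every consecutive discard, up to the cap.
        return min(self.base_delay_s * 2 ** (self.consecutive_discards - 1), self.max_delay_s)


policy = ThrottleRecoverySketch()
delays = [policy.on_sample(True) for _ in range(3)]
```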