nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-11 17:00:01 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	51fa07fab8	Avoid overloading get_int64_or_default as get_int64 Introduce get_int64_or_default method, and counterparts for float64 and string. Provided names for Python arguments. Tried generating Python stubs automatically with ``` stubgen -m cuda.nvbench._nvbench ``` Gave up on this, since it does not include doc-strings. It would be nice to compare auto-generated _nvbench.pyi with __init__.pyi for discrepancies though.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	dc7f9edfd4	Support nvbench.Benchmark.add_int64_power_of_two_axis	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	526856db4e	Fix typo in the method spelling	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	893cefb400	Fix the need to set PYTHONPATH, edited README Edit wheel.packages metadata to include namespace package "cuda". Updated README to remove the work-around of setting PYTHONPATH, as it is no longer necessary.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	a535a1d173	Fix type annotations in cuda.nvbench, and in examples	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	13ad115ca3	Add nvbench.Benchmark.set_run_once method	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	bd2b536ab4	cpu_only -> cpu_activity Change example to illustrate timing CPU work. First example does only CPU work (sleeps), use CPU-only timer. Second example does both CPU and GPU work (sleeps in either case). Use cold-run timer with/without sync tag to measure both CPU and GPU times.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	d09df0f754	Expand examples/cpu_only.py Benchmark function that sleeps for 1 second on the host using CPU-only timer, as well as CPU/GPU timer that does/doesn't use blocking kernel. All three methods must report consistent values close to 1 second.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e426368485	Correct propagating nvbench_main exceptions to Python python examples/cpu_only.py --run-once -d 0 --output foo.md used to trip SystemError, returned a result with an exception set. It now returns a clean NVBenchmarkError exception.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	9ab642cf69	Add suggestion to create conda environment with recent CMake to build nvbench	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e589518376	Change test and examples from using camelCase to using snake_case as implementation changed	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	81fff085b9	Change method naming from camelCase to snake_case This ensures names of Python API methods are consistent with those of C++ counterparts.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	11ae98389d	Replace use of py::object copy constructor with use of move constructor Change explicit constructor of benchmark_wrapper_t to use move-constructor of py::object instead of copy constructor by replacing `py::object(o)` with `py::object(std::move(o))`.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	d3071fb038	Addressed PR feedback re: definition of benchmark_wrapper_t See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183749750	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	c960ef75cc	Add examples/cpu_only.py based on code from PR feedback https://github.com/NVIDIA/nvbench/pull/237#issuecomment-3058594793	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	6b4da8c5cb	add comments to body of launcher_fn lambda in State.exec method	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	aa2b4d9960	Add Benchmark.setIsCPUOnly API	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	7f9d672cec	Raise Python exception if error is encountered while executing benchmarks Introduce new exception type to raise on errors that occurred while NVBench runs benchmarks.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	8c112d529f	Include Pybind11 headers before anything else See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183703828 for the rationale	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	6b1b2f3c30	Updated readme	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	203ef2046e	Add warm-up call to auto_throughput.py Add throughput.py example, which is based on the same kernel as auto_throughput.py but records global memory reads/writes amounts to output BWUtil metric measuring %SOL in bandwidth utilization.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	02ad6e5490	Implement Benchmark.setName	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	8589511f61	Corrected broken cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	394324023f	Add example for benchmarking CuPy function	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	707b24ffb5	Add examples/cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	883e5819b6	Use cuda.Stream.from_handle to create core.Stream from nvbench.CudaStream	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	b357af0092	Add examples/skip.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	964ec2e1bc	Add examples/exec_tag_sync.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4f15840832	Use state.add_summary to supplement integral TypeID with meaningful type name	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	9dba866426	Add State.add_summary method state.add_summary(column_name: str, value: Union[int, float, str]) This is used in examples/axes.py to map integral value from Int64Axis to string description.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	df426a0bad	Add examples/axes.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	576c473481	Add implementation of and signature for State.getDevice make batch/sync arguments of State.exec keyword-only Provide default column_name value for State.addElementCount method, so that it can be called state.addElementCount(count), or as state.addElementCount(count, column_name="Descriptive Name")	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	2507bc2263	Add Python example based on C++ example/auto_throughput.cpp	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4950a50961	Add empty py.typed to signal mypy that package has type annotations	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c9f0785aed	Replace uses of deprecated typing.Tuple, typing.Callable, etc. Also use typing.Self to encode that `Benchmark.addInt64Axis` returns self.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	6f8bcdc774	Fixed correctness of nvbench.State.getStream() method Fix run-time exception: ``` Fail: Unexpected error: RuntimeError: return_value_policy = copy, but type is non-copyable! (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details) ``` caused by an attempt to return a move-only `nvbench::cuda_stream` class instance using default `pybind11::return_value_policy::copy`.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	e768ce28b6	Add Python stub file for cuda.nvbench API	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c49d718f65	Corrected nvbench.State.getBlockingKernel -> getBlockingKernelTimeout Similar change for setBlockingKernelTimeout. Corrected statement in a comment.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c184549cda	Import and reexport symbols from _nvbench one-by-one	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	b88cc78aeb	Add license header to py_nvbench.cpp Also updated comment as to why calling `nvbench::benchmark_manager::get().initialize()` is necessary for running all tests.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	6552ef503c	Draft of Python API for NVBench The prototype is based on pybind11 to minimize boiler-plate code needed to deal with move-only semantics of many nvbench classes.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	a9fb32e25d	Merge pull request #254 from oleksandr-pavlyk/remove-cli-run-once-and-disable-blocking-kernel-options Remove cli run-once and disable-blocking-kernel options	2025-07-28 15:10:50 -05:00
Oleksandr Pavlyk	4ad3088a47	Update docs/cli_help.md Spare users of implementation details in description of `--profile` option Co-authored-by: Allison Piper <apiper@nvidia.com>	2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk	25c604cf37	Fix typo, corrected control flow logic flaw Verify that `--profile` option results in setting `m_run_once`: ``` (nvbench) opavlyk@ee09c48-lcedt:~/repos/nvbench$ ./build/bin/nvbench.example.cpp20.axes -b simple -d 0 --profile | grep "Pass: Cold" Pass: Cold: 1.006560ms GPU, 1.009277ms CPU, 0.00s total GPU, 0.00s total wall, 1x (nvbench) opavlyk@ee09c48-lcedt:~/repos/nvbench$ ./build/bin/nvbench.example.cpp20.axes -b simple -d 0 | grep "Pass: Cold" Pass: Cold: 1.002844ms GPU, 1.011917ms CPU, 0.50s total GPU, 0.52s total wall, 499x ```	2025-07-28 14:40:02 -05:00
Oleksandr Pavlyk	5b6c3818f4	Corrected blocking kernel timeout message	2025-07-28 14:39:54 -05:00
Oleksandr Pavlyk	2ab5e2d1be	Run once disables blocking kernel (#252) * Measure cold must not use block_kernel for single runs Per https://github.com/NVIDIA/nvbench/issues/242, we should not use blocking kernel when --run-once, or --profile is used to avoid possible deadlocks when profiling with external tools, also to avoid deadlocking when Python programs load the program on the first execution. * Measure hot should not use blocking kernel during warmup This change follows suit of measure_cold, where it is prompted by deadlock, see https://github.com/NVIDIA/nvbench/pull/241 * Remove setting of CUDA_MODULE_LOADING=EAGER This is no longer necessary as warm-up runs in regular runs, or the single run in (run-once/profile) no longer use blocking kernel.	2025-07-28 12:14:54 -07:00
Oleksandr Pavlyk	d160a2bafa	Replace --run-once in testing/CMakeLists.txt with --profile	2025-07-28 12:03:42 -05:00
Oleksandr Pavlyk	8416342af0	Remove mentions of --run-once and --disable-blocking-kernel from help Text for --profile modified to be self-consistent, i.e., not to refer to removed --run-once and --disable-blocking-kernel for explanation of what it does.	2025-07-28 07:55:25 -05:00
Oleksandr Pavlyk	3bb34b1b1f	Remove suggestion to use --disable-blocking-kernel The text printed when blocking kernel times out already suggests to use --profile option.	2025-07-28 07:54:16 -05:00
Oleksandr Pavlyk	281a08a57e	Remove CLI --run-once and --disable-blocking-kernel options Removed option_parser::disable_blocking_kernel and option_parser::set_run_once methods. Added option_parser::enable_profile method instead, which calls ``` bench.set_run_once(true); bench.disable_blocking_kernel(true); ```	2025-07-28 07:51:50 -05:00


594 Commits