nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-03-14 20:27:24 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	c747a19b98	Remove code setting up CUDA_MODULE_LOADING=EAGER in Python extension	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	985db4f144	Add examples/cccl_cooperative_block_reduce.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	445d881eda	Expand README. Make it explicit in the README that we build and locally install NVBench first, and then build the Python package, which uses the library as a dependency. The nvbench library is installed into the Python layout alongside the native extension.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	5e8c17c740	Fix mypy error in import statement used in cutlass_gemm example	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	b9554d7980	Fix method name typo in stub file discovered by mypy	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	5c01c34793	Fix mypy error in cutlass_gemm example	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	5428534124	Add license header to __init__.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	480614e847	Add license to stub file, add comment about syncing impl and stubs. Add comments to both files stating the need to keep the implementation and the Python stub file in sync. In the stub file, the comment documents the use of mypy's stubgen to generate stubs and compare them against the current stubs. It also calls out the need to keep docstrings and docstring examples in sync with the implementation.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	a69a3647b2	CUTLASS example added, license headers added, fixes - Add license header to each example file. - Fixed broken runs caused by type declarations. - Fixed hang in throughput.py when --run-once by doing a manual warm-up step, like in auto_throughput.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	c136efab65	Use absolute imports in cuda/nvbench/__init__.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	fc0249d188	Updated examples/axes.py to use get_float64_or_default	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	361c0337be	Use cuda-pathfinder instead of cuda-bindings for Pathfinder. Removed use of __all__ per PR feedback. Emit warnings.warn if version information could not be retrieved from the package metadata, e.g., the package has been renamed but the source code was not updated.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	51fa07fab8	Avoid overloading get_int64_or_default as get_int64 Introduce get_int64_or_default method, and counterparts for float64 and string. Provided names for Python arguments. Tried generating Python stubs automatically with ``` stubgen -m cuda.nvbench._nvbench ``` Gave up on this, since it does not include doc-strings. It would be nice to compare auto-generated _nvbench.pyi with __init__.pyi for discrepancies though.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	dc7f9edfd4	Support nvbench.Benchmark.add_int64_power_of_two_axis	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	526856db4e	Fix typo in the method spelling	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	893cefb400	Fix the need to set PYTHONPATH, edited README Edit wheel.packages metadata to include namespace package "cuda". Updated README to remove the work-around of setting PYTHONPATH, as it is no longer necessary.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	a535a1d173	Fix type annotations in cuda.nvbench, and in examples	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	13ad115ca3	Add nvbench.Benchmark.set_run_once method	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	bd2b536ab4	cpu_only -> cpu_activity. Change example to illustrate timing CPU work. The first example does only CPU work (sleeps) and uses the CPU-only timer. The second example does both CPU and GPU work (sleeps in either case). Use the cold-run timer with/without the sync tag to measure both CPU and GPU times.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	d09df0f754	Expand examples/cpu_only.py. Benchmark a function that sleeps for 1 second on the host using the CPU-only timer, as well as the CPU/GPU timer that does/doesn't use the blocking kernel. All three methods must report consistent values close to 1 second.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e426368485	Correct propagating nvbench_main exceptions to Python python examples/cpu_only.py --run-once -d 0 --output foo.md used to trip SystemError, returned a result with an exception set. It now returns a clean NVBenchmarkError exception.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	9ab642cf69	Add suggestion to create conda environment with recent CMake to build nvbench	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e589518376	Change test and examples from using camelCase to using snake_case as implementation changed	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	81fff085b9	Change method naming from camelCase to snake_case. This ensures names of Python API methods are consistent with those of C++ counterparts.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	11ae98389d	Replace use of py::object copy constructor with use of move constructor Change explicit constructor of benchmark_wrapper_t to use move-constructor of py::object instead of copy constructor by replacing `py::object(o)` with `py::object(std::move(o))`.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	d3071fb038	Addressed PR feedback re: definition of benchmark_wrapper_t See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183749750	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	c960ef75cc	Add examples/cpu_only.py based on code from PR feedback https://github.com/NVIDIA/nvbench/pull/237#issuecomment-3058594793	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	6b4da8c5cb	add comments to body of launcher_fn lambda in State.exec method	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	aa2b4d9960	Add Benchmark.setIsCPUOnly API	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	7f9d672cec	Raise Python exception if error is encountered while executing benchmarks Introduce new exception type to raise on errors that occurred while NVBench runs benchmarks.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	8c112d529f	Include Pybind11 headers before anything else See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183703828 for the rationale	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	6b1b2f3c30	Updated readme	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	203ef2046e	Add warm-up call to auto_throughput.py Add throughput.py example, which is based on the same kernel as auto_throughput.py but records global memory reads/writes amounts to output BWUtil metric measuring %SOL in bandwidth utilization.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	02ad6e5490	Implement Benchmark.setName	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	8589511f61	Corrected broken cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	394324023f	Add example for benchmarking CuPy function	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	707b24ffb5	Add examples/cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	883e5819b6	Use cuda.Stream.from_handle to create core.Stream from nvbench.CudaStream	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	b357af0092	Add examples/skip.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	964ec2e1bc	Add examples/exec_tag_sync.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4f15840832	Use state.add_summary to supplement integral TypeID with meaningful type name	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	9dba866426	Add State.add_summary method state.add_summary(column_name: str, value: Union[int, float, str]) This is used in examples/axes.py to map integral value from Int64Axis to string description.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	df426a0bad	Add examples/axes.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	576c473481	Add implementation of and signature for State.getDevice. Make batch/sync arguments of State.exec keyword-only. Provide default column_name value for State.addElementCount method, so that it can be called as state.addElementCount(count), or as state.addElementCount(count, column_name="Descriptive Name")	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	2507bc2263	Add Python example based on C++ example/auto_throughput.cpp	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4950a50961	Add empty py.typed to signal mypy that package has type annotations	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c9f0785aed	Replace uses of deprecated typing.Tuple, typing.Callable, etc. Also use typing.Self to encode that `Benchmark.addInt64Axis` returns self.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	6f8bcdc774	Fixed correctness of nvbench.State.getStream() method. Fix run-time exception: ``` Fail: Unexpected error: RuntimeError: return_value_policy = copy, but type is non-copyable! (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details) ``` caused by an attempt to return a move-only `nvbench::cuda_stream` class instance using the default `pybind11::return_value_policy::copy`.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	e768ce28b6	Add Python stub file for cuda.nvbench API	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	c49d718f65	Corrected nvbench.State.getBlockingKernel -> getBlockingKernelTimeout Similar change for setBlockingKernelTimeout. Corrected statement in a comment.	2025-07-28 15:37:04 -05:00

606 Commits