Commit Graph

597 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
c136efab65 Use absolute imports in cuda/nvbench/__init__.py 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
fc0249d188 Updated examples/axes.py to use get_float64_or_default 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
361c0337be Use cuda-pathfinder instead of cuda-bindings for Pathfinder
Removed use of __all__ per PR feedback. Emit warnings.warn if
version information could not be retrieved from the package metadata,
e.g., the package has been renamed but the source code was not updated.
2025-07-28 15:37:05 -05:00
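Below is a minimal sketch of the version lookup described above. It uses only the standard library's `importlib.metadata`. The distribution name is passed in as a parameter because the real name is not given here. The `"unknown"` fallback string is also an assumption, not necessarily what cuda.nvbench uses.

```python
import warnings
from importlib.metadata import PackageNotFoundError, version


def get_package_version(dist_name: str, fallback: str = "unknown") -> str:
    """Return the installed version of dist_name, warning if metadata is missing."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        # Metadata lookup failed, e.g. the distribution was renamed
        # but this code still asks for the old name.
        warnings.warn(
            f"Could not retrieve version of {dist_name!r} from package metadata",
            RuntimeWarning,
        )
        return fallback
```

Warning instead of raising keeps the package importable even when its metadata is inconsistent.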
Oleksandr Pavlyk
51fa07fab8 Avoid overloading get_int64_or_default as get_int64
Introduce get_int64_or_default method, and counterparts for
float64 and string.

Provided names for Python arguments.

Tried generating Python stubs automatically with

```
stubgen -m cuda.nvbench._nvbench
```

Gave up on this, since it does not include doc-strings.
It would be nice to compare auto-generated _nvbench.pyi with
__init__.pyi for discrepancies though.
2025-07-28 15:37:05 -05:00
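The split between a strict accessor and an `_or_default` accessor can be sketched with a toy state object. `ToyAxisState` and its dictionary of axis values are hypothetical stand-ins, not the nvbench implementation. Only the int64 pair is shown; the float64 and string counterparts would follow the same shape.

```python
from __future__ import annotations

from typing import Union

AxisValue = Union[int, float, str]


class ToyAxisState:
    """Hypothetical stand-in for a benchmark state that stores axis values by name."""

    def __init__(self, axis_values: dict[str, AxisValue]) -> None:
        self._values = dict(axis_values)

    def get_int64(self, name: str) -> int:
        # Strict accessor: a missing axis raises KeyError.
        value = self._values[name]
        if not isinstance(value, int):
            raise TypeError(f"axis {name!r} does not hold an int64 value")
        return value

    def get_int64_or_default(self, name: str, default_value: int) -> int:
        # Lenient accessor: a missing axis yields the caller's default.
        if name not in self._values:
            return default_value
        return self.get_int64(name)
```

With two names instead of one overloaded method, each name has exactly one signature. The call site then makes it explicit whether a missing axis is an error.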
Oleksandr Pavlyk
dc7f9edfd4 Support nvbench.Benchmark.add_int64_power_of_two_axis 2025-07-28 15:37:05 -05:00
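This sketch assumes the Python method mirrors the C++ `add_int64_power_of_two_axis`, which takes exponents and benchmarks the values 2**e. Under that assumption, the value expansion looks like this. `power_of_two_values` is a hypothetical helper, not part of nvbench.

```python
from __future__ import annotations


def power_of_two_values(exponents: list[int]) -> list[int]:
    """Expand exponents into the int64 values a power-of-two axis would cover."""
    for e in exponents:
        # 2**63 overflows a signed 64-bit integer, so cap the exponent at 62.
        if not 0 <= e <= 62:
            raise ValueError(f"exponent {e} does not give a positive int64 power of two")
    return [1 << e for e in exponents]
```

For example, `power_of_two_values(list(range(16, 29, 4)))` gives `[65536, 1048576, 16777216, 268435456]`.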
Oleksandr Pavlyk
526856db4e Fix typo in the method spelling 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
893cefb400 Fix the need to set PYTHONPATH, edited README
Edit wheel.packages metadata to include namespace package "cuda".
Updated README to remove the work-around of setting PYTHONPATH,
as it is no longer necessary.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
a535a1d173 Fix type annotations in cuda.nvbench, and in examples 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
13ad115ca3 Add nvbench.Benchmark.set_run_once method 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
bd2b536ab4 cpu_only -> cpu_activity
Change example to illustrate timing CPU work.

The first example does only CPU work (it sleeps) and uses the CPU-only timer.
The second example does both CPU and GPU work (sleeping in either case).
It uses the cold-run timer with and without the sync tag to measure both CPU and GPU times.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
d09df0f754 Expand examples/cpu_only.py
Benchmark a function that sleeps for 1 second on the host using the CPU-only
timer, as well as the CPU/GPU timer with and without a blocking kernel.

All three methods must report consistent values close to 1 second.
2025-07-28 15:37:05 -05:00
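The sanity check in this commit relies on a host-side sleep having a known duration. Below is a minimal standard-library sketch of that idea. It times `time.sleep` with `time.perf_counter` as a stand-in for the CPU-only timer and does not use nvbench itself. The 5% default tolerance is an arbitrary assumption.

```python
import time


def time_host_sleep(duration_s: float) -> float:
    """Return the wall-clock seconds spent in a host-side sleep of duration_s."""
    start = time.perf_counter()
    time.sleep(duration_s)
    return time.perf_counter() - start


def close_to(measured_s: float, expected_s: float, rel_tol: float = 0.05) -> bool:
    # Sleeps can overshoot slightly because of scheduling, so compare with a tolerance.
    return abs(measured_s - expected_s) <= rel_tol * expected_s
```

`close_to(time_host_sleep(1.0), 1.0)` mirrors the "close to 1 second" expectation for all three timing methods.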
Oleksandr Pavlyk
e426368485 Correct propagating nvbench_main exceptions to Python
python examples/cpu_only.py --run-once -d 0 --output foo.md

used to trip a SystemError ("returned a result with an exception set").

It now raises a clean NVBenchmarkError exception.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
9ab642cf69 Add suggestion to create conda environment with recent CMake to build nvbench 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
e589518376 Change tests and examples from camelCase to snake_case to match the changed implementation 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
81fff085b9 Change method naming from camelCase to snake_case
This ensures names of Python API methods are consistent with those of C++
counterparts.
2025-07-28 15:37:05 -05:00
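The renaming in these two commits is mechanical. Below is a hypothetical helper (not part of nvbench) that performs the same camelCase to snake_case conversion. It is checked against method names that appear elsewhere in this log.

```python
import re


def camel_to_snake(name: str) -> str:
    """Convert a camelCase identifier to snake_case."""
    # Split before a capital that starts a lowercase run: "CPUOnly" -> "CPU_Only".
    partial = re.sub(r"(.)([A-Z][a-z]+)", r"\1_\2", name)
    # Split between a lowercase letter or digit and a capital: "IsCPU" -> "Is_CPU".
    return re.sub(r"([a-z0-9])([A-Z])", r"\1_\2", partial).lower()
```

For example, `addInt64Axis` becomes `add_int64_axis` and `setIsCPUOnly` becomes `set_is_cpu_only`. The two-pass approach keeps acronyms such as `CPU` together instead of splitting them letter by letter.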
Oleksandr Pavlyk
11ae98389d Replace use of py::object copy constructor with use of move constructor
Change explicit constructor of benchmark_wrapper_t to use move-constructor
of py::object instead of copy constructor by replacing `py::object(o)` with
`py::object(std::move(o))`.
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
d3071fb038 Addressed PR feedback re: definition of benchmark_wrapper_t
See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183749750
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
c960ef75cc Add examples/cpu_only.py based on code from PR feedback
https://github.com/NVIDIA/nvbench/pull/237#issuecomment-3058594793
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
6b4da8c5cb add comments to body of launcher_fn lambda in State.exec method 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
aa2b4d9960 Add Benchmark.setIsCPUOnly API 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
7f9d672cec Raise Python exception if error is encountered while executing benchmarks
Introduce new exception type to raise on errors that occurred while
NVBench runs benchmarks.
2025-07-28 15:37:05 -05:00
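Below is a sketch of the translation pattern on the Python side. Failures from the native runner are re-raised as one dedicated exception type, chained to the original error. The class name follows the NVBenchmarkError mentioned in this log. However, deriving it from `RuntimeError` and the `run_and_translate` wrapper are assumptions for illustration. The log suggests the real translation happens in the C++ bindings.

```python
from __future__ import annotations

from typing import Callable


class NVBenchmarkError(RuntimeError):
    """Hypothetical Python-side error for failures while benchmarks run."""


def run_and_translate(native_main: Callable[[list[str]], None], argv: list[str]) -> None:
    # native_main stands in for the compiled entry point that runs benchmarks.
    try:
        native_main(argv)
    except NVBenchmarkError:
        raise  # already the dedicated type; do not wrap twice
    except Exception as exc:
        # Chain with "from" so the original cause stays visible in tracebacks.
        raise NVBenchmarkError(f"benchmark run failed: {exc}") from exc
```

Callers can then catch a single exception type, while `err.__cause__` still exposes the underlying failure.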
Oleksandr Pavlyk
8c112d529f Include Pybind11 headers before anything else
See https://github.com/NVIDIA/nvbench/pull/237#discussion_r2183703828
for the rationale
2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
6b1b2f3c30 Updated readme 2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk
203ef2046e Add warm-up call to auto_throughput.py
Add throughput.py example, which is based on the same kernel as
auto_throughput.py but records the amounts of global memory read and
written to output the BWUtil metric, measuring %SOL bandwidth utilization.
2025-07-28 15:37:04 -05:00
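The BWUtil figure is achieved global-memory bandwidth divided by the device's peak bandwidth. Below is a sketch of that arithmetic with made-up numbers. nvbench derives the metric itself from the recorded read/write amounts; this function is only illustrative.

```python
def bandwidth_utilization_percent(bytes_read: int, bytes_written: int,
                                  elapsed_s: float, peak_bytes_per_s: float) -> float:
    """Achieved global-memory bandwidth as a percentage of peak (%SOL)."""
    if elapsed_s <= 0 or peak_bytes_per_s <= 0:
        raise ValueError("elapsed time and peak bandwidth must be positive")
    achieved_bytes_per_s = (bytes_read + bytes_written) / elapsed_s
    return 100.0 * achieved_bytes_per_s / peak_bytes_per_s
```

Here is an example with hypothetical values. Reading and writing 1 GB each in 2 ms is 1 TB/s, which is 50% of a 2 TB/s peak: `bandwidth_utilization_percent(10**9, 10**9, 0.002, 2e12)`.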
Oleksandr Pavlyk
02ad6e5490 Implement Benchmark.setName 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
8589511f61 Corrected broken cccl_parallel_segmented_reduce.py 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
394324023f Add example for benchmarking CuPy function 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
707b24ffb5 Add examples/cccl_parallel_segmented_reduce.py 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
883e5819b6 Use cuda.Stream.from_handle to create core.Stream from nvbench.CudaStream 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
b357af0092 Add examples/skip.py 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
964ec2e1bc Add examples/exec_tag_sync.py 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
4f15840832 Use state.add_summary to supplement integral TypeID with meaningful type name 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
9dba866426 Add State.add_summary method
state.add_summary(column_name: str, value: Union[int, float, str])

This is used in examples/axes.py to map an integral value from an Int64Axis
to a string description.
2025-07-28 15:37:04 -05:00
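Below is a toy version of the pattern used in examples/axes.py. An integral axis value is mapped to a readable type name and recorded as an extra summary column. `ToySummaryState` and the `TYPE_NAMES` table are hypothetical. The real method is `state.add_summary(column_name, value)`, as given above.

```python
from __future__ import annotations

from typing import Union

SummaryValue = Union[int, float, str]

# Hypothetical mapping from integral type-id axis values to type names.
TYPE_NAMES = {0: "int8", 1: "int16", 2: "int32", 3: "int64", 4: "float32", 5: "float64"}


class ToySummaryState:
    """Hypothetical stand-in that records summary columns by name."""

    def __init__(self) -> None:
        self.summaries: dict[str, SummaryValue] = {}

    def add_summary(self, column_name: str, value: SummaryValue) -> None:
        self.summaries[column_name] = value


def record_type_name(state: ToySummaryState, type_id: int) -> None:
    # Fall back to a labelled integer so unknown ids remain visible in the table.
    state.add_summary("Type", TYPE_NAMES.get(type_id, f"unknown({type_id})"))
```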
Oleksandr Pavlyk
df426a0bad Add examples/axes.py 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
576c473481 Add implementation of and signature for State.getDevice
Make batch/sync arguments of State.exec keyword-only.

Provide a default column_name value for the State.addElementCount method,
so that it can be called as state.addElementCount(count) or as
state.addElementCount(count, column_name="Descriptive Name").
2025-07-28 15:37:04 -05:00
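Below is a sketch of the two call-shape changes. Parameters after a bare `*` can only be passed by keyword, and a defaulted `column_name` makes the short form valid. The following are assumptions for illustration, not the confirmed nvbench signatures: the class, the keyword names `batched` and `sync`, the snake_case method names, and the default column name `"Elements"`.

```python
from __future__ import annotations

from typing import Callable


class ToyExecState:
    """Hypothetical class mirroring the call shapes described in this commit."""

    def __init__(self) -> None:
        self.element_counts: dict[str, int] = {}
        self.last_exec_flags: tuple[bool, bool] | None = None

    def exec(self, launcher: Callable[[], None], *, batched: bool = True, sync: bool = False) -> None:
        # Everything after "*" is keyword-only, so exec(fn, True) is a TypeError.
        self.last_exec_flags = (batched, sync)
        launcher()

    def add_element_count(self, count: int, column_name: str = "Elements") -> None:
        # The default lets callers write add_element_count(n) for the common case.
        self.element_counts[column_name] = count
```

Keyword-only flags keep call sites such as `exec(fn, sync=True)` readable and prevent two booleans from being swapped by position.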
Oleksandr Pavlyk
2507bc2263 Add Python example based on C++ example/auto_throughput.cpp 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
4950a50961 Add empty py.typed to signal to mypy that the package has type annotations 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
c9f0785aed Replace uses of deprecated typing.Tuple, typing.Callable, etc.
Also use typing.Self to encode that `Benchmark.addInt64Axis` returns
self.
2025-07-28 15:37:04 -05:00
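Below is a sketch of the annotation style this commit moves to:
- builtin generics such as `list` and `tuple`;
- `Callable` from `collections.abc` instead of the deprecated `typing` aliases;
- `typing.Self` for a method that returns its own instance.

`ToyBenchmark` is hypothetical and uses snake_case names. Importing `Self` only under `TYPE_CHECKING` is one way to keep the sketch importable on Python versions older than 3.11.

```python
from __future__ import annotations

from collections.abc import Callable, Sequence
from typing import TYPE_CHECKING

if TYPE_CHECKING:
    from typing import Self  # Python 3.11+; only type checkers need it here


class ToyBenchmark:
    """Hypothetical builder showing builtin generics and a Self return type."""

    def __init__(self, fn: Callable[[object], None]) -> None:
        self.fn = fn
        self.axes: list[tuple[str, list[int]]] = []

    def add_int64_axis(self, name: str, values: Sequence[int]) -> Self:
        # Returning self lets calls be chained on one builder object.
        self.axes.append((name, list(values)))
        return self
```

Annotating the return as `Self` rather than `ToyBenchmark` keeps chaining correctly typed in subclasses.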
Oleksandr Pavlyk
6f8bcdc774 Fixed correctness of nvbench.State.getStream() method
Fix run-time exception:

```
Fail: Unexpected error: RuntimeError: return_value_policy = copy, but type is non-copyable! (#define PYBIND11_DETAILED_ERROR_MESSAGES or compile in debug mode for details)
```

caused by an attempt to return a move-only `nvbench::cuda_stream` class
instance using the default `pybind11::return_value_policy::copy`.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
e768ce28b6 Add Python stub file for cuda.nvbench API 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
c49d718f65 Corrected nvbench.State.getBlockingKernel -> getBlockingKernelTimeout
Similar change for setBlockingKernelTimeout.

Corrected statement in a comment.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
c184549cda Import and reexport symbols from _nvbench one-by-one 2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
b88cc78aeb Add license header to py_nvbench.cpp
Also updated comment as to why calling
`nvbench::benchmark_manager::get().initialize()` is necessary
for running all tests.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
6552ef503c Draft of Python API for NVBench
The prototype is based on pybind11 to minimize the boilerplate
code needed to deal with the move-only semantics of many nvbench
classes.
2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk
a9fb32e25d Merge pull request #254 from oleksandr-pavlyk/remove-cli-run-once-and-disable-blocking-kernel-options
Remove cli run-once and disable-blocking-kernel options
2025-07-28 15:10:50 -05:00
Oleksandr Pavlyk
4ad3088a47 Update docs/cli_help.md
Spare users the implementation details in the description of the `--profile` option

Co-authored-by: Allison Piper <apiper@nvidia.com>
2025-07-28 14:52:57 -05:00
Oleksandr Pavlyk
25c604cf37 Fix typo, corrected control flow logic flaw
Verify that `--profile` option results in setting `m_run_once`:

```
(nvbench) opavlyk@ee09c48-lcedt:~/repos/nvbench$ ./build/bin/nvbench.example.cpp20.axes -b simple -d 0 --profile | grep "Pass: Cold"
Pass: Cold: 1.006560ms GPU, 1.009277ms CPU, 0.00s total GPU, 0.00s total wall, 1x

(nvbench) opavlyk@ee09c48-lcedt:~/repos/nvbench$ ./build/bin/nvbench.example.cpp20.axes -b simple -d 0 | grep "Pass: Cold"
Pass: Cold: 1.002844ms GPU, 1.011917ms CPU, 0.50s total GPU, 0.52s total wall, 499x
```
2025-07-28 14:40:02 -05:00
Oleksandr Pavlyk
5b6c3818f4 Corrected blocking kernel timeout message 2025-07-28 14:39:54 -05:00
Oleksandr Pavlyk
2ab5e2d1be Run once disables blocking kernel (#252)
* Measure cold must not use the blocking kernel for single runs

Per https://github.com/NVIDIA/nvbench/issues/242, we should not
use the blocking kernel when --run-once or --profile is used, to avoid
possible deadlocks when profiling with external tools, and also to avoid
deadlocks when Python programs load the program on its first execution.

* Measure hot should not use blocking kernel during warmup

This change follows suit with measure_cold, where it was prompted
by a deadlock; see https://github.com/NVIDIA/nvbench/pull/241

* Remove setting of CUDA_MODULE_LOADING=EAGER

This is no longer necessary, as neither the warm-up runs of regular runs
nor the single run in run-once/profile mode uses the blocking kernel anymore.
2025-07-28 12:14:54 -07:00
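The decision these commits converge on can be written as a small predicate. This is a simplified sketch. `RunConfig`, `is_single_run`, and `use_blocking_kernel` are hypothetical names. Other inputs that affect the real decision, such as exec tags and the blocking-kernel timeout, are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class RunConfig:
    """Hypothetical view of the options relevant to blocking-kernel use."""

    profile: bool = False   # --profile on the command line
    run_once: bool = False  # single-run mode requested by other means


def is_single_run(cfg: RunConfig) -> bool:
    # --profile implies a single run.
    return cfg.profile or cfg.run_once


def use_blocking_kernel(cfg: RunConfig, is_warmup: bool) -> bool:
    # Neither warm-up runs nor single (run-once/profile) runs use the blocking
    # kernel, so work that happens on first execution, such as module loading
    # or an attached profiler, cannot stall behind it.
    return not (is_warmup or is_single_run(cfg))
```

Only regular timed runs, after warm-up, still see the blocking kernel under this sketch.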
Oleksandr Pavlyk
d160a2bafa Replace --run-once in testing/CMakeLists.txt with --profile 2025-07-28 12:03:42 -05:00