Oleksandr Pavlyk
93bc59d05c
Renamed CUTLASS example to reflect that it uses CuteDSL
2026-04-01 08:24:29 -05:00
Oleksandr Pavlyk
e4cfddeb87
Rewrote cutlass_gemm example to use CuteDSL
2026-04-01 08:23:41 -05:00
Oleksandr Pavlyk
3f284b4004
Renamed cccl_* examples
...
cccl_parallel_* -> cuda_compute_*
cccl_cooperative_* -> cuda_coop_*
2026-04-01 08:20:20 -05:00
Oleksandr Pavlyk
5bdb30f4b6
Update to cccl_parallel_segmented_reduce example per changes in API
...
Update namespace changes. Use make_segmented_reduce factory function,
and update call signatures.
2026-04-01 08:18:15 -05:00
Oleksandr Pavlyk
d8739fc208
Update to cccl_cooperative_block_reduce example
2026-04-01 08:17:52 -05:00
Oleksandr Pavlyk
974eb5ee0f
Replace use of cupy.cuda.ExternalStream with cupy.cuda.Stream.from_external
2026-04-01 08:17:12 -05:00
Oleksandr Pavlyk
7c60edcc0a
cuda.core.experimental -> cuda.core
2026-04-01 08:16:04 -05:00
Oleksandr Pavlyk
836a6c12f4
Merge pull request #326 from oleksandr-pavlyk/fix-sfinae-incomplete
...
Fix GCC16 sfinae incomplete warnings.
GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` is complete when using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependencies, using an incomplete type breaks the build: GCC16 emits a `-Wsfinae-incomplete` warning, which the `-Werror` flag turns into an error.
This commit replaces the affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx, with raw pointers.
2026-03-24 16:02:28 -05:00
Bernhard Manfred Gruber
4164909c52
Feedback
2026-02-28 01:19:18 +01:00
Bernhard Manfred Gruber
0abc8ec82b
Extend nvbench_compare.py with --plot, axis/benchmark filtering, and dark mode
...
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2026-02-27 11:06:20 +01:00
Bernhard Manfred Gruber
800f640c20
Apply reviewer feedback
2026-02-26 19:23:51 +01:00
Bernhard Manfred Gruber
d3a0bec4a8
Feedback from review
2026-02-05 14:13:16 +01:00
Bernhard Manfred Gruber
28ed32bb47
Implement dark mode using style sheets
2026-02-05 14:00:33 +01:00
Bernhard Manfred Gruber
ec9759037d
I have no idea what I am doing
2026-02-05 11:15:27 +01:00
Bernhard Manfred Gruber
ccde9fc4d4
More
2026-02-05 10:56:36 +01:00
Bernhard Manfred Gruber
0be190b407
Add a script to plot benchmark results
2026-02-05 10:36:52 +01:00
Nader Al Awar
dc59f98ecd
Remove cupti from cuda-bench dependencies (#311)
2026-02-03 14:16:26 -06:00
Bernhard Manfred Gruber
c6ef87575c
Allow partial comparison in nvbench_compare.py
...
Fixes: #295
2026-02-03 16:32:11 +01:00
Nader Al Awar
d75fc74162
Merge branch 'main' into remove-cupti-python
2026-02-03 08:58:41 -06:00
Nader Al Awar
4fa4296810
Remove cuda.pathfinder function
2026-02-02 16:43:45 -06:00
Nader Al Awar
f2d5730104
Disable CUPTI in cmake file
2026-02-02 16:03:15 -06:00
Nader Al Awar
6df5fc8c67
Remove cupti from cuda-bench dependencies
2026-02-02 15:37:13 -06:00
Oleksandr Pavlyk
8ff0557ad8
Replace use of py::handle to store global_registry
...
Use py::gil_safe_call_once_and_store facility pybind11 provides.
2026-02-02 11:55:48 -06:00
Oleksandr Pavlyk
39c29026fd
Move docstrings from PYI file to implementation
...
Added tests that docstrings exist and are not empty.
This closes #291
2026-02-02 11:55:48 -06:00
Nader Al Awar
edf0b80599
Add installation instructions
2026-01-30 09:32:44 -06:00
Nader Al Awar
fa1eed69c0
Rename test file to refer to cuda_bench
2026-01-29 13:53:29 -06:00
Nader Al Awar
711c1e2eb1
Replace all occurrences of pynvbench with cuda-bench
2026-01-29 13:25:17 -06:00
Nader Al Awar
5e7adc5c3f
Build multi architecture cuda wheels (#302)
...
* Add cuda architectures to build wheel for
* Package scripts in wheel
* Separate cuda major version extraction to fix architecture selection logic
* Add back statement printing cuda version
* [pre-commit.ci] auto code formatting
---------
Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
2026-01-29 01:13:24 +00:00
Ashwin Srinath
a681e2185d
Add multi-cuda wheel build (#289)
...
Co-authored-by: Ashwin Srinath <shwina@users.noreply.github.com>
Co-authored-by: Nader Al Awar <naderalawar@gmail.com>
2026-01-28 10:37:55 -05:00
Oleksandr Pavlyk
f6a9b245d3
Only trigger skipping of outstanding benchmarks on a KeyboardInterrupt exception; on other exceptions the benchmark is to continue execution
2025-12-08 14:46:59 -06:00
Oleksandr Pavlyk
7e9a9a8983
Replace main_arg_run_benchmarks with run_interruptible
...
This loop uses benchmark.run_or_skip to resolve #284 even
for scripts that contain more than one benchmark, or when
a script with a single benchmark is executed when more than
one device is available.
2025-12-08 14:29:27 -06:00
Oleksandr Pavlyk
a7763bdd7a
Remove debug outputs
2025-12-08 12:25:31 -06:00
Oleksandr Pavlyk
b2a80c92b8
Revert "Scripts to triage 284"
...
This reverts commit c286199adc.
2025-12-08 11:53:08 -06:00
Oleksandr Pavlyk
ce9a76167f
Use nvbench::stop_runner_loop to signal stop of runner loop
...
Add try/catch around Python calls to improve keyboard interrupt
response.
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
c286199adc
Scripts to triage 284
2025-12-05 19:38:11 -06:00
Oleksandr Pavlyk
de471e1d42
Use pybind11==3.0.1, do not use pybind11_add_module
2025-12-05 19:38:11 -06:00
Ashwin Srinath
77b7afc3c9
Remove the Python version file
2025-12-03 16:23:14 -05:00
Ashwin Srinath
29389b5791
Initial wheel build and publishing infrastructure
2025-12-03 10:15:32 -05:00
Asher Mancinelli
e91559edf0
Update README.md
2025-11-14 14:34:18 -08:00
Jaya Venkatesh
0f997271f7
added numba-cuda to requirements
...
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-16 14:54:08 -07:00
Jaya Venkatesh
bfa6a6c7c6
remove pynvjitlink references in examples
...
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-08 16:00:19 -07:00
Oleksandr Pavlyk
b5e4b4ba31
cuda.nvbench -> cuda.bench
...
Per PR review suggestion:
- `cuda.parallel` - device-wide algorithms/Thrust
- `cuda.cooperative` - Cooperative algorithms/CUB
- `cuda.bench` - Benchmarking/NVBench
2025-08-04 13:42:43 -05:00
Oleksandr Pavlyk
c2a2acc9b6
Change float64_t arg-type for set_throttle_threshold to float32_t
...
This matches the C++ method signature of set_throttle_threshold/set_throttle_recovery_delay,
which uses nvbench::float32_t
2025-08-04 12:14:52 -05:00
Oleksandr Pavlyk
584f48ac97
Remove warm-up invocations outside of launcher in examples/throughput and auto_throughput
2025-08-04 12:14:44 -05:00
Oleksandr Pavlyk
d8b0acc8d4
Export exception to nvbench namespace
2025-08-04 12:00:42 -05:00
Oleksandr Pavlyk
9dfdd8af89
Minimal test file
2025-08-04 11:59:17 -05:00
Oleksandr Pavlyk
6aff4712f8
Change permissions of test/run_1.py
2025-08-04 10:13:08 -05:00
Oleksandr Pavlyk
73e18419b2
Stub of __cuda_stream__ special method declares tuple[int, int] as return type
...
This is to indicate that the special method always returns a pair of integers
2025-08-04 10:11:33 -05:00
Oleksandr Pavlyk
a5e0a48f80
Add test functions for cpp/python exceptions
2025-08-04 10:09:10 -05:00
Oleksandr Pavlyk
40a2337a6b
Review fix: make nvbench_run_error constructible
...
Allow `throw nvbench_run_error("Msg");` to compile.
Add comment around definition of nvbench_run_error
2025-08-04 10:09:04 -05:00