Commit Graph

673 Commits

Author SHA1 Message Date
Oleksandr Pavlyk
de471e1d42 Use pybind11==3.0.1, do not use pybind11_add_module 2025-12-05 19:38:11 -06:00
Jerry Hou
f651636501 entropy criterion optimizations (#286)
* entropy criterion optimizations

* online linear regression module

* online regression refactor

* revising ss_tot handling

---------

Co-authored-by: Jerry Hou <jerryhou@fb.com>
2025-12-06 01:02:21 +00:00
Ashwin Srinath
a6995413ac Merge pull request #288 from shwina/wheel-build-and-publish-infra
Initial wheel build and publishing infrastructure
2025-12-04 04:37:07 -05:00
Ashwin Srinath
1d33536ce1 Re-enable other CI jobs 2025-12-03 16:42:30 -05:00
Ashwin Srinath
603a2df445 Remove workaround 2025-12-03 16:23:42 -05:00
Ashwin Srinath
77b7afc3c9 Remove the Python version file 2025-12-03 16:23:14 -05:00
Ashwin Srinath
3af11c8ee7 Expand the CI matrix back 2025-12-03 15:48:40 -05:00
Ashwin Srinath
cadfa7de61 We no longer need to install libnvidia-ml.so 2025-12-03 15:37:20 -05:00
Ashwin Srinath
7ad064ea4f Change to GPU runner for testing 2025-12-03 15:18:39 -05:00
Ashwin Srinath
b7eaf44ca3 Install libnvidia-ml.so.1 in test environment 2025-12-03 14:56:37 -05:00
Ashwin Srinath
c2c34c9378 Temporarily reduce CI matrix 2025-12-03 14:37:23 -05:00
Ashwin Srinath
a293af1d52 Try capturing the Python path before changing directories 2025-12-03 14:15:34 -05:00
Ashwin Srinath
a7f92b7436 Try an inner and outer script 2025-12-03 13:21:53 -05:00
Ashwin Srinath
9746aa14df Maybe fix to test script 2025-12-03 12:47:43 -05:00
Ashwin Srinath
d1efef03bc Fix wheel naming 2025-12-03 11:54:46 -05:00
Ashwin Srinath
618001143b Fixes to test script 2025-12-03 11:41:36 -05:00
Ashwin Srinath
8443a2059c Ensure test jobs find wheels correctly 2025-12-03 11:22:19 -05:00
Ashwin Srinath
f3df4104de Make wheels manylinux compliant 2025-12-03 11:22:12 -05:00
Ashwin Srinath
e15d9ebf58 Lint fixes 2025-12-03 11:07:03 -05:00
Ashwin Srinath
98e0b5994a Introduce build-and-test-python-wheels workflow 2025-12-03 11:06:11 -05:00
Ashwin Srinath
e9cf53a1a4 Add PR workflow for building and testing wheels 2025-12-03 10:30:27 -05:00
Ashwin Srinath
8b2afa6c16 Lint fixes 2025-12-03 10:17:23 -05:00
Ashwin Srinath
29389b5791 Initial wheel build and publishing infrastructure 2025-12-03 10:15:32 -05:00
Bernhard Manfred Gruber
34f1e2a7ee Merge pull request #285 from ashermancinelli/patch-1
Update README.md
2025-11-16 00:11:42 +01:00
Asher Mancinelli
e91559edf0 Update README.md 2025-11-14 14:34:18 -08:00
comeyrd
92d2e01cd1 Profile only the kernels involved in the benchmark (#277)
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
2025-10-21 13:51:37 -04:00
Allison Piper
9b133a94bc Remove GLOBAL tags from fmt targets. (#281)
Fixes #279.
2025-10-21 11:16:44 -04:00
Allison Piper
e6283df79c Build native arch by default, update rapids-cmake. (#280)
* Build native arch by default, update rapids-cmake.
* Add check that CXX and CUDA_HOST compiler match.
  Similar to CCCL, we need these to match to ensure that our warning flag detection functions properly.
* GCC only recognizes `unused-local-typedefs`.
  Clang recognizes both. Ensure that we set this for both compilers.
2025-10-21 10:41:36 -04:00
Bernhard Manfred Gruber
98d701c054 Diff device sections on mismatch in nvbench_compare.py (#278) 2025-10-15 08:58:08 -04:00
pre-commit-ci[bot]
7feda2cf3a [pre-commit.ci] pre-commit autoupdate (#276)
* [pre-commit.ci] pre-commit autoupdate

updates:
- [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0)
- [github.com/pre-commit/mirrors-clang-format: v20.1.7 → v21.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v20.1.7...v21.1.2)
- [github.com/astral-sh/ruff-pre-commit: v0.12.2 → v0.13.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.2...v0.13.3)

* Update matrix + devcontainers.

* Fix typo.

Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>

---------

Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com>
Co-authored-by: Allison Piper <alliepiper16@gmail.com>
Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>
2025-10-07 15:22:36 -04:00
Oleksandr Pavlyk
e7cc1e344c Add a benchmark example parametrized by typename and integral constant. (#275)
* Add a benchmark example parametrized by typename and integral constant.

Add a variation of copy_type_sweep kernel, where block size is controlled
via integral constant passed as template parameter.

* Addressed PR review feedback

* Use auto to gridSize

* Address PR review change request

* Add comment to use ceil_div with CCCL >= 2.8
2025-10-07 13:49:17 -04:00
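The commit above describes controlling block size through an integral constant passed as a template parameter, with a note about `ceil_div`. A minimal host-side sketch of that grid-size arithmetic follows. The names `ceil_div` (defined locally here) and `grid_size_for` are hypothetical illustrations, not the code from the NVBench example itself.

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical local helper: smallest q such that q * d >= n (d must be > 0).
constexpr std::size_t ceil_div(std::size_t n, std::size_t d)
{
  return (n + d - 1) / d;
}

// Hypothetical helper: the block size arrives as a compile-time constant
// (std::integral_constant), so it can be used as a template argument
// downstream; the grid size is then derived from the element count.
template <std::size_t BlockSize>
constexpr std::size_t grid_size_for(std::integral_constant<std::size_t, BlockSize>,
                                    std::size_t num_values)
{
  static_assert(BlockSize > 0, "block size must be positive");
  return ceil_div(num_values, BlockSize);
}
```

Because the block size is part of the type, each distinct value produces a separate instantiation, which is what lets a benchmark sweep over it as a type axis.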
Oleksandr Pavlyk
b88a45f417 Merge pull request #269 from jayavenkatesh19/main
remove pynvjitlink references in examples
2025-09-17 13:54:36 -05:00
Jaya Venkatesh
0f997271f7 added numba-cuda to requirements
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-16 14:54:08 -07:00
Jaya Venkatesh
bfa6a6c7c6 remove pynvjitlink references in examples
Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>
2025-09-08 16:00:19 -07:00
Allison Piper
4642df7006 Fix sccache checks when running locally. (#268) 2025-09-05 15:50:09 -04:00
Allison Piper
33a659ecd3 Add CTK 13.0 + Clang20 to CI. (#266) 2025-09-03 11:24:07 -04:00
Allison Piper
ebc1bd1795 Avoid unreachable code warning (#265) 2025-09-02 22:03:39 -04:00
Oleksandr Pavlyk
935bb0b633 Merge pull request #237 from oleksandr-pavlyk/add-pynvbench
Python package pynvbench introduced that exposes `cuda.bench` namespace. Repository provides a set of examples.
2025-08-06 12:22:55 -05:00
Oleksandr Pavlyk
b5e4b4ba31 cuda.nvbench -> cuda.bench
Per PR review suggestion:
   - `cuda.parallel`    - device-wide algorithms/Thrust
   - `cuda.cooperative` - Cooperative algorithms/CUB
   - `cuda.bench`       - Benchmarking/NVBench
2025-08-04 13:42:43 -05:00
Oleksandr Pavlyk
c2a2acc9b6 Change float64_t arg-type for set_throttle_threshold to float32_t
This matches the C++ method signatures of set_throttle_threshold/set_throttle_recovery_delay,
which use nvbench::float32_t
2025-08-04 12:14:52 -05:00
Oleksandr Pavlyk
584f48ac97 Remove warm-up invocations outside of launcher in examples/throughput and auto_throughput 2025-08-04 12:14:44 -05:00
Oleksandr Pavlyk
d8b0acc8d4 Export exception to nvbench namespace 2025-08-04 12:00:42 -05:00
Oleksandr Pavlyk
9dfdd8af89 Minimal test file 2025-08-04 11:59:17 -05:00
Oleksandr Pavlyk
6aff4712f8 Change permissions of test/run_1.py 2025-08-04 10:13:08 -05:00
Oleksandr Pavlyk
73e18419b2 Stub of __cuda_stream__ special method declares tuple[int, int] as return type
This is to indicate that the special method always returns a pair of integers
2025-08-04 10:11:33 -05:00
Oleksandr Pavlyk
a5e0a48f80 Add test functions for cpp/python exceptions 2025-08-04 10:09:10 -05:00
Oleksandr Pavlyk
40a2337a6b Review fix: make nvbench_run_error constructible
Allow `throw nvbench_run_error("Msg");` to compile.

Add comment around definition of nvbench_run_error
2025-08-04 10:09:04 -05:00
Oleksandr Pavlyk
4fc628c4d7 Python native extension to use CXX/CUDA standard of NVBench library
This fixes cryptic build failure with GNU compiler 14
2025-08-01 15:33:39 -05:00
Oleksandr Pavlyk
3fea652d16 Fix type in stub declaration for Benchmark.add_string_axis 2025-08-01 15:03:06 -05:00
Oleksandr Pavlyk
fa8dd48186 json_printer.cu changed to use write-out buffer of 4KB (#259)
* json_printer.cu changed to use write-out buffer of 4KB

The json_printer::do_process_bulk_data_float64 used to write
out one float32 value at a time. This PR introduces a buffer of 4KB
that is being filled with values until full, and then written out.

The 4KB value aligns with system memory page size and seems
appropriate for relatively small data sizes of duration measurements.

* Add explicit static cast from std::size_t to std::streamsize

The explicit cast avoids a narrowing error.

* Factor out writing array out to binary file into standalone function

This function is templated based on buffer-size. The function can be
reused to also write out frequency samples in the future.
2025-08-01 12:48:25 -07:00