nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-03-14 20:27:24 +00:00

Author	SHA1	Message	Date
Oleksandr Pavlyk	de471e1d42	Use pybind11==3.0.1, do not use pybind11_add_module	2025-12-05 19:38:11 -06:00
Jerry Hou	f651636501	entropy criterion optimizations (#286) * entropy criterion optimizations * online linear regression module * online regression refactor * revising ss_tot handling --------- Co-authored-by: Jerry Hou <jerryhou@fb.com>	2025-12-06 01:02:21 +00:00
Ashwin Srinath	a6995413ac	Merge pull request #288 from shwina/wheel-build-and-publish-infra Initial wheel build and publishing infrastructure	2025-12-04 04:37:07 -05:00
Ashwin Srinath	1d33536ce1	Re-enable other CI jobs	2025-12-03 16:42:30 -05:00
Ashwin Srinath	603a2df445	Remove workaround	2025-12-03 16:23:42 -05:00
Ashwin Srinath	77b7afc3c9	Remove the Python version file	2025-12-03 16:23:14 -05:00
Ashwin Srinath	3af11c8ee7	Expand the CI matrix back	2025-12-03 15:48:40 -05:00
Ashwin Srinath	cadfa7de61	We no longer need to install libnvidia-ml.so	2025-12-03 15:37:20 -05:00
Ashwin Srinath	7ad064ea4f	Change to GPU runner for testing	2025-12-03 15:18:39 -05:00
Ashwin Srinath	b7eaf44ca3	Install libnvidia-ml.so.1 in test environment	2025-12-03 14:56:37 -05:00
Ashwin Srinath	c2c34c9378	Temporarily reduce CI matrix	2025-12-03 14:37:23 -05:00
Ashwin Srinath	a293af1d52	Try capturing the Python path before changing directories	2025-12-03 14:15:34 -05:00
Ashwin Srinath	a7f92b7436	Try an inner and outer script	2025-12-03 13:21:53 -05:00
Ashwin Srinath	9746aa14df	Maybe fix to test script	2025-12-03 12:47:43 -05:00
Ashwin Srinath	d1efef03bc	Fix wheel naming	2025-12-03 11:54:46 -05:00
Ashwin Srinath	618001143b	Fixes to test script	2025-12-03 11:41:36 -05:00
Ashwin Srinath	8443a2059c	Ensure test jobs find wheels correctly	2025-12-03 11:22:19 -05:00
Ashwin Srinath	f3df4104de	Make wheels manylinux compliant	2025-12-03 11:22:12 -05:00
Ashwin Srinath	e15d9ebf58	Lint fixes	2025-12-03 11:07:03 -05:00
Ashwin Srinath	98e0b5994a	Introduce build-and-test-python-wheels workflow	2025-12-03 11:06:11 -05:00
Ashwin Srinath	e9cf53a1a4	Add PR workflow for building and testing wheels	2025-12-03 10:30:27 -05:00
Ashwin Srinath	8b2afa6c16	Lint fixes	2025-12-03 10:17:23 -05:00
Ashwin Srinath	29389b5791	Initial wheel build and publishing infrastructure	2025-12-03 10:15:32 -05:00
Bernhard Manfred Gruber	34f1e2a7ee	Merge pull request #285 from ashermancinelli/patch-1 Update README.md	2025-11-16 00:11:42 +01:00
Asher Mancinelli	e91559edf0	Update README.md	2025-11-14 14:34:18 -08:00
comeyrd	92d2e01cd1	Profile only the kernels involved in the benchmark (#277) Co-authored-by: Allison Piper <alliepiper16@gmail.com>	2025-10-21 13:51:37 -04:00
Allison Piper	9b133a94bc	Remove GLOBAL tags from fmt targets. (#281) Fixes #279.	2025-10-21 11:16:44 -04:00
Allison Piper	e6283df79c	Build native arch by default, update rapids-cmake. (#280) * Build native arch by default, update rapids-cmake. * Add check that CXX and CUDA_HOST compiler match. Similar to CCCL, we need these to match to ensure that our warning flag detection functions properly. * GCC only recognizes `unused-local-typedefs`. Clang recognizes both. Ensure that we set this for both compilers.	2025-10-21 10:41:36 -04:00
Bernhard Manfred Gruber	98d701c054	Diff device sections on mismatch in nvbench_compare.py (#278)	2025-10-15 08:58:08 -04:00
pre-commit-ci[bot]	7feda2cf3a	[pre-commit.ci] pre-commit autoupdate (#276 ) * [pre-commit.ci] pre-commit autoupdate updates: - [github.com/pre-commit/pre-commit-hooks: v5.0.0 → v6.0.0](https://github.com/pre-commit/pre-commit-hooks/compare/v5.0.0...v6.0.0) - [github.com/pre-commit/mirrors-clang-format: v20.1.7 → v21.1.2](https://github.com/pre-commit/mirrors-clang-format/compare/v20.1.7...v21.1.2) - [github.com/astral-sh/ruff-pre-commit: v0.12.2 → v0.13.3](https://github.com/astral-sh/ruff-pre-commit/compare/v0.12.2...v0.13.3) * Update matrix + devcontainers. * Fix typo. Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com> --------- Co-authored-by: pre-commit-ci[bot] <66853113+pre-commit-ci[bot]@users.noreply.github.com> Co-authored-by: Allison Piper <alliepiper16@gmail.com> Co-authored-by: Oleksandr Pavlyk <21087696+oleksandr-pavlyk@users.noreply.github.com>	2025-10-07 15:22:36 -04:00
Oleksandr Pavlyk	e7cc1e344c	Add an benchmark example parametrized by typename and integral constant. (#275 ) * Add an benchmark example parametrized by typename and integral constant. Add a variation of copy_type_sweep kernel, where block size is controlled via integral constant passed as template parameter. * Addressed PR review feedback * Use auto to gridSize * Address PR review change request * Add comment to use ceil_div with CCCL >= 2.8	2025-10-07 13:49:17 -04:00
Oleksandr Pavlyk	b88a45f417	Merge pull request #269 from jayavenkatesh19/main remove pynvjitlink references in examples	2025-09-17 13:54:36 -05:00
Jaya Venkatesh	0f997271f7	added numba-cuda to requirements Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>	2025-09-16 14:54:08 -07:00
Jaya Venkatesh	bfa6a6c7c6	remove pynvjitlink references in examples Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>	2025-09-08 16:00:19 -07:00
Allison Piper	4642df7006	Fix sccache checks when running locally. (#268)	2025-09-05 15:50:09 -04:00
Allison Piper	33a659ecd3	Add CTK 13.0 + Clang20 to CI. (#266)	2025-09-03 11:24:07 -04:00
Allison Piper	ebc1bd1795	Avoid unreachable code warning (#265)	2025-09-02 22:03:39 -04:00
Oleksandr Pavlyk	935bb0b633	Merge pull request #237 from oleksandr-pavlyk/add-pynvbench Python package pynvbench introduced that exposes `cuda.bench` namespace. Repository provides a set of examples.	2025-08-06 12:22:55 -05:00
Oleksandr Pavlyk	b5e4b4ba31	cuda.nvbench -> cuda.bench Per PR review suggestion: - `cuda.parallel` - device-wide algorithms/Thrust - `cuda.cooperative` - Cooperative algorithsm/CUB - `cuda.bench` - Benchmarking/NVBench	2025-08-04 13:42:43 -05:00
Oleksandr Pavlyk	c2a2acc9b6	Change float64_t arg-type for set_throttle_threshold to float32_t The C++ method signature of set_throttle_threshold/set_trottle_recovery_delay, which uses nvbench::float32_t	2025-08-04 12:14:52 -05:00
Oleksandr Pavlyk	584f48ac97	Remove warm-up invocations outside of launcher in examples/throughout and auto_throughput	2025-08-04 12:14:44 -05:00
Oleksandr Pavlyk	d8b0acc8d4	Export exception to nvbench namespace	2025-08-04 12:00:42 -05:00
Oleksandr Pavlyk	9dfdd8af89	Minimal test file	2025-08-04 11:59:17 -05:00
Oleksandr Pavlyk	6aff4712f8	Change permissions of test/run_1.py	2025-08-04 10:13:08 -05:00
Oleksandr Pavlyk	73e18419b2	Stub of __cuda_stream__ special method declare tuple[int, int] as return type This is to indicate that special method always returns a pair of integers	2025-08-04 10:11:33 -05:00
Oleksandr Pavlyk	a5e0a48f80	Add test test functions for cpp/python exceptions	2025-08-04 10:09:10 -05:00
Oleksandr Pavlyk	40a2337a6b	Review fix: make nvbenhch_run_error constructable Allow `throw nvbench_run_error("Msg");` to compile. Add comment around definition of nvbench_run_error	2025-08-04 10:09:04 -05:00
Oleksandr Pavlyk	4fc628c4d7	Python native extension to use CXX/CUDA standard of NVBench library This fixes cryptic build failure with GNU compiler 14	2025-08-01 15:33:39 -05:00
Oleksandr Pavlyk	3fea652d16	Fix type in stub declaration for Benchmark.add_string_axis	2025-08-01 15:03:06 -05:00
Oleksandr Pavlyk	fa8dd48186	json_printer.cu changed to use write-out buffer of 4KB (#259 ) * json_printer.cu changed to use write-out buffer of 4KB The json_printer::do_process_bulk_data_float64 used to write out one float32 value at a time. This PR introduces a buffer of 4KB that is being filled with values until full, and then written out. The 4KB value aligns with system memory page size and seems appropriate for relatively small datasizes of duration measurements. * Add explicit static cast from std::size_t to std::streamsize The explcit cast avoids narrowing error. * Factor out writing array out to binary file into standalone function This function is templated based on buffer-size. The function can be reused to also write-out frequence samples in the future.	2025-08-01 12:48:25 -07:00

673 Commits