nvbench

mirror of https://github.com/NVIDIA/nvbench.git synced 2026-05-12 01:10:01 +00:00

Author	SHA1	Message	Date
Jaya Venkatesh	0f997271f7	added numba-cuda to requirements Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>	2025-09-16 14:54:08 -07:00
Jaya Venkatesh	bfa6a6c7c6	remove pynvjitlink references in examples Signed-off-by: Jaya Venkatesh <jjayabaskar@nvidia.com>	2025-09-08 16:00:19 -07:00
Oleksandr Pavlyk	b5e4b4ba31	cuda.nvbench -> cuda.bench Per PR review suggestion: - `cuda.parallel` - device-wide algorithms/Thrust - `cuda.cooperative` - Cooperative algorithms/CUB - `cuda.bench` - Benchmarking/NVBench	2025-08-04 13:42:43 -05:00
Oleksandr Pavlyk	584f48ac97	Remove warm-up invocations outside of launcher in examples/throughput and auto_throughput	2025-08-04 12:14:44 -05:00
Oleksandr Pavlyk	453a1648aa	Improvements to readability of examples per PR review	2025-07-31 16:20:52 -05:00
Oleksandr Pavlyk	6b9050e404	Add example of benchmarking pytorch code	2025-07-28 15:57:11 -05:00
Oleksandr Pavlyk	985db4f144	Add examples/cccl_cooperative_block_reduce.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	5e8c17c740	Fix mypy error in import statement used in cutlass_gemm example	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	5c01c34793	Fix mypy error in cutlass_gemm example	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	a69a3647b2	CUTLASS example added, license headers added, fixes - Add license header to each example file. - Fixed broken runs caused by type declarations. - Fixed hang in throughput.py when --run-once by doing a manual warm-up step, like in auto_throughput.py	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	fc0249d188	Updated examples/axes.py to use get_float64_or_default	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	a535a1d173	Fix type annotations in cuda.nvbench, and in examples	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	bd2b536ab4	cpu_only -> cpu_activity Change example to illustrate timing CPU work. First example does only CPU work (sleeps), use CPU-only timer. Second example does both CPU and GPU work (sleeps in either case). Use cold-run timer with/without sync tag to measure both CPU and GPU times.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	d09df0f754	Expand examples/cpu_only.py Benchmark function that sleeps for 1 second on the host using CPU-only timer, as well as CPU/GPU timer that does/doesn't use blocking kernel. All three methods must report consistent values close to 1 second.	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	e589518376	Change test and examples from using camelCase to using snake_case as implementation changed	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	c960ef75cc	Add examples/cpu_only.py based on code from PR feedback https://github.com/NVIDIA/nvbench/pull/237#issuecomment-3058594793	2025-07-28 15:37:05 -05:00
Oleksandr Pavlyk	203ef2046e	Add warm-up call to auto_throughput.py Add throughput.py example, which is based on the same kernel as auto_throughput.py but records global memory reads/writes amounts to output BWUtil metric measuring %SOL in bandwidth utilization.	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	8589511f61	Corrected broken cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	394324023f	Add example for benchmarking CuPy function	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	707b24ffb5	Add examples/cccl_parallel_segmented_reduce.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	883e5819b6	Use cuda.Stream.from_handle to create core.Stream from nvbench.CudaStream	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	b357af0092	Add examples/skip.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	964ec2e1bc	Add examples/exec_tag_sync.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	4f15840832	Use state.add_summary to supplement integral TypeID with meaningful type name	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	df426a0bad	Add examples/axes.py	2025-07-28 15:37:04 -05:00
Oleksandr Pavlyk	2507bc2263	Add Python example based on C++ example/auto_throughput.cpp	2025-07-28 15:37:04 -05:00

26 Commits