* Correct Python API signature of State.get_axis_values_as_strings
The C++ API has default boolean argument color, but Python API
declared no arguments.
Closes #345
* Also exercise invocation of get_axis_values_as_strings with keyword argument value
* Remove use of cuda.core.experimental
Fixed relative text alignment in docstrings to fix autodoc warnings
Renamed cuda.bench.test_cpp_exception and cuda.bench.test_py_exception functions
to start with underscore, signaling that these functions are internal and should
not be documented
Account for the test_cpp_exception -> _test_cpp_exception rename, same for *_py_*
Make sure to reset __module__ of reexported symbols to be cuda.bench
* Introduce function colorize to modularize colorization/no-color handling
* Use sns.set_theme instead of deprecated sns.set()
* Use str.format instead of legacy % syntax
* Simplified iteration over list
Use f-string (supported since Python 3.6) instead of str.format for
better readability and performance
Fix GCC16 sfinae incomplete warnings.
GCC16 started requiring that the type `T` used in `std::reference_wrapper<T>` be complete when using `-std=c++17`. Since NVBench has to forward declare some types in header files to break circular dependencies, using an incomplete type breaks the build: GCC16 emits a `-Wsfinae-incomplete` warning, which the `-Werror` flag turns into an error.
This commit replaced affected uses of `std::reference_wrapper<const nvbench::benchmark_base>` in state.cxx, and `std::reference_wrapper<nvbench::printer_base>` in benchmark_base.cxx with raw pointers.
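A minimal sketch of the pattern described above, not the actual NVBench code: `benchmark_like` and `state_like` are hypothetical stand-ins for the forward-declared type and its holder. It shows that a non-owning raw pointer member is well-formed while the pointee type is still incomplete, and that an accessor can keep reference semantics for callers.

```cpp
#include <cassert>
#include <string>

// Forward declaration only: the type is incomplete at this point, as it
// would be in a header that breaks a circular dependency.
struct benchmark_like; // hypothetical stand-in

class state_like // hypothetical stand-in
{
public:
  // Taking the address of an object of incomplete type is allowed.
  explicit state_like(const benchmark_like &bench)
      : m_benchmark{&bench}
  {}

  // Callers still get a reference, as they would from a reference_wrapper.
  const benchmark_like &get_benchmark() const { return *m_benchmark; }

private:
  const benchmark_like *m_benchmark; // non-owning; set from a reference
};

// The full definition may appear later, e.g. in another header.
struct benchmark_like
{
  std::string name;
};

// Member access requires the complete type, which is available here.
inline std::string name_of(const state_like &s)
{
  return s.get_benchmark().name;
}
```

Like `std::reference_wrapper`, the pointer member keeps the holder copyable and assignable; the trade-off is that nothing in the type system rules out a null pointer, so the sketch only ever sets it from a reference.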
* Mark NVBench headers as SYSTEM for consuming targets.
Fixes #30.
* As nvbench.main links to nvbench as INTERFACE only, it no longer consumes the usage requirements of nvbench
Because of this, nvbench.main no longer picked up the CUDA::toolkit include directories.
This PR links nvbench.main to ${ctk_libraries} privately to reestablish that dependency
* Implement use of pragma system_header in NVBench
1. Add code to nvbench/config.cuh.in to define NVBENCH_IMPLICIT_SYSTEM_HEADER_*
preprocessor variable depending on compiler, unless NVBENCH_NO_IMPLICIT_SYSTEM_HEADER
is defined.
2. Build NVBench targets with -DNVBENCH_NO_IMPLICIT_SYSTEM_HEADER
3. Modify each header file in nvbench/ folder to
- include <nvbench/config.cuh>
- Execute pragma <OPTIONAL_CMPLR> system_header guarded
by checks for defined preprocessor variables
- Do the above two steps before any other headers are included
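The steps above could look roughly like the following sketch. The `_GCC`, `_CLANG` and `_MSVC` suffixes are assumptions made for illustration (the commit only names the `NVBENCH_IMPLICIT_SYSTEM_HEADER_*` family), and the two `constexpr` flags exist only so the example can be checked. In a real header the first part would live in the generated config header; when this snippet is compiled as a main file rather than included, compilers may warn that the pragma is ignored.

```cpp
// Part 1 (config header): pick a compiler-specific macro unless opted out.
// The suffixes below are hypothetical.
#if !defined(NVBENCH_NO_IMPLICIT_SYSTEM_HEADER)
#  if defined(__clang__)
#    define NVBENCH_IMPLICIT_SYSTEM_HEADER_CLANG
#  elif defined(__GNUC__)
#    define NVBENCH_IMPLICIT_SYSTEM_HEADER_GCC
#  elif defined(_MSC_VER)
#    define NVBENCH_IMPLICIT_SYSTEM_HEADER_MSVC
#  endif
#endif

// Part 2 (top of each header): issue the pragma before any other include.
#if defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_GCC)
#  pragma GCC system_header
#elif defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_CLANG)
#  pragma clang system_header
#elif defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_MSVC)
#  pragma system_header
#endif

#include <cassert>

// Flags used only to make the selection observable in this example.
#if defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_GCC) ||                            \
  defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_CLANG) ||                            \
  defined(NVBENCH_IMPLICIT_SYSTEM_HEADER_MSVC)
constexpr bool implicit_system_header_enabled = true;
#else
constexpr bool implicit_system_header_enabled = false;
#endif

#if defined(__clang__) || defined(__GNUC__) || defined(_MSC_VER)
constexpr bool known_compiler = true;
#else
constexpr bool known_compiler = false;
#endif
```

Building with `-DNVBENCH_NO_IMPLICIT_SYSTEM_HEADER`, as step 2 does for NVBench's own targets, skips both parts, so the project's own builds still see warnings from these headers.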
---------
Co-authored-by: Allison Piper <apiper@nvidia.com>
* Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end
NVPW_* API has been deprecated since CTK 13.0. Followed advice in compilation
message to replace NVPW_* API with CUPTI Profiler Host API.
`libnvbench.so` no longer links to `nvperf_host` directly, only to `libcupti`.
NVBench uses the **CUPTI Host API** to build a config image from metric names,
and the **Range Profiler API** to collect and decode counters. The host API never
collects data directly; it prepares and evaluates data produced by range profiling.
Introduce `host_impl`/`profiler_init_guard` to manage CUPTI Host object and
initialization/deinitialization, including safe move-assignment cleanup.
`profiler_init_guard` initializes profiler, and throws if CUPTI returns
an error code. `profiler_init_guard::finalize_profiler()` de-inits profiler
and returns the error code. Destructor calls finalize_profiler, but ignores
the status code. If the user wants to explicitly de-initialize the profiler and handle
the error, they should call `finalize_profiler()` directly. The guard
has a boolean member variable to allow destructor to work even if user explicitly
called finalize_profiler() method.
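The guard behaviour described above can be sketched as follows. This is an illustration under stated assumptions, not the NVBench implementation: `fake_profiler` stands in for the CUPTI init/de-init calls (0 means success), and `init_guard`, `finalize()` and `m_active` are hypothetical names. The move operations are an assumption as well, included to show how the boolean keeps cleanup from running twice.

```cpp
#include <cassert>
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins for the profiler init/de-init calls.
namespace fake_profiler
{
inline int &init_calls()   { static int n = 0; return n; }
inline int &deinit_calls() { static int n = 0; return n; }
inline int initialize()    { ++init_calls();   return 0; }
inline int deinitialize()  { ++deinit_calls(); return 0; }
} // namespace fake_profiler

class init_guard
{
public:
  // Initialize on construction; throw if an error code is returned.
  init_guard()
  {
    if (fake_profiler::initialize() != 0)
    {
      throw std::runtime_error("profiler initialization failed");
    }
    m_active = true;
  }

  init_guard(const init_guard &)            = delete;
  init_guard &operator=(const init_guard &) = delete;

  init_guard(init_guard &&other) noexcept
      : m_active{std::exchange(other.m_active, false)}
  {}

  init_guard &operator=(init_guard &&other) noexcept
  {
    if (this != &other)
    {
      finalize(); // release the current state first; status ignored
      m_active = std::exchange(other.m_active, false);
    }
    return *this;
  }

  // The destructor de-initializes but ignores the status code.
  ~init_guard() { finalize(); }

  // Explicit de-initialization; returns the status for the caller to handle.
  // Safe to call more than once thanks to the m_active flag.
  int finalize()
  {
    if (!m_active)
    {
      return 0;
    }
    m_active = false;
    return fake_profiler::deinitialize();
  }

private:
  bool m_active{false};
};
```

A caller that cares about the de-initialization status calls `finalize()` and checks the result; the destructor then finds the flag cleared and does nothing.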
The old counter-data prefix/scratch flow was replaced with the Range Profiler counter
data image sizing/initialization path and decode flow.
Host API metric filtering (base metrics + context scope) and Host-side evaluation to
GPU values via cuptiProfilerHostEvaluateToGpuValues are implemented.
- **Host object**: `host_impl::object` in `nvbench/cupti_profiler.cxx`.
- **Range profiler object**: `host_impl::range_profiler_object`.
- **Config image**: `m_config_image`.
- **Counter data image**: `m_data_image`.
1) **Host init + config image**
- `initialize_profiler_host()` creates the host object.
- `initialize_config_image_host()` adds metrics and builds the config image.
2) **Range profiler enable + counter data image**
- `enable_range_profiler()` creates the range profiler object.
- `initialize_counter_data_image()` sizes and initializes the data image using
the range profiler object, matching the CUPTI samples.
3) **Config + collect + decode**
- `set_range_profiler_config()` binds the config image + data image.
- `start_user_loop()` / `stop_user_loop()` push/pop the user range and
start/stop the range profiler.
- `process_user_loop()` decodes counter data via
`cuptiRangeProfilerDecodeData()`.
4) **Evaluate metrics**
- `get_counter_values()` calls `cuptiProfilerHostEvaluateToGpuValues()` to
convert counter data into metric values.
* Use class instead of struct in profiler_init_guard; forward declaration
* Add SFINAE guards before accessing members not present in earlier CTK versions
* Check if cupti_profiler_host.h exists, use old/new implementation based on that check
1. Reintroduced legacy `cupti_profiler_nvpw.cuh` and `cupti_profiler_nvpw.cxx`.
2. Moved profiler-host-API implementation to `cupti_profiler_host.cuh`, `cupti_profiler_host.cxx`.
3. Add `nvbench/cupti_profiler.cuh` which checks if `cupti_profiler_host.h` header is known and
includes `cupti_profiler_host.cuh` or `cupti_profiler_nvpw.cuh` respectively.
4. In CMake, we check if the ${nvbench_cupti_root}/include/cupti_profiler_host.h file exists.
If it does not, `libnvbench.so` depends on libnvperf_host and libnvperf_target
in addition to libcupti. If the header exists, it depends only on libcupti.
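One way the dispatch header in item 3 could make its choice is a `__has_include` check, sketched below. The commit does not say which mechanism is used, so this is an assumption; `EXAMPLE_USE_PROFILER_HOST_API` and `selected_backend` are hypothetical names, and the real header would include the chosen `.cuh` file rather than set a string.

```cpp
#include <cassert>
#include <cstring>

// Detect the CUPTI Profiler Host API header if the compiler supports
// __has_include; otherwise fall back to the legacy path.
#if defined(__has_include)
#  if __has_include(<cupti_profiler_host.h>)
#    define EXAMPLE_USE_PROFILER_HOST_API 1
#  endif
#endif
#ifndef EXAMPLE_USE_PROFILER_HOST_API
#  define EXAMPLE_USE_PROFILER_HOST_API 0
#endif

#if EXAMPLE_USE_PROFILER_HOST_API
// The real header would include cupti_profiler_host.cuh here.
constexpr const char *selected_backend = "cupti_profiler_host";
#else
// The real header would include cupti_profiler_nvpw.cuh here.
constexpr const char *selected_backend = "cupti_profiler_nvpw";
#endif
```

Whatever the header-side mechanism, it has to agree with the CMake-side file check in item 4, since that check decides which libraries are linked.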
measure_cold class now directly inherits m_check_throttling from state.
This ensures that when `--jsonbin` is specified, the frequency data corresponding
to the timing data are available to write out.
This places all std::vector members together. Added default initialization
to all std::vector members, and all other members with default constructors.
Exceptions are references and nvbench::launch m_launch; member