mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-05-12 01:10:01 +00:00
* Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end
The NVPW_* API has been deprecated since CTK 13.0. Followed the advice in the compilation
message to replace the NVPW_* API with the CUPTI Profiler Host API.
`libnvbench.so` no longer links to `nvperf_host` directly, only to `libcupti`.
NVBench uses the **CUPTI Host API** to build a config image from metric names,
and the **Range Profiler API** to collect and decode counters. The host API never
collects data directly; it prepares and evaluates data produced by range profiling.
Introduce `host_impl`/`profiler_init_guard` to manage the CUPTI Host object and
initialization/deinitialization, including safe move-assignment cleanup.
`profiler_init_guard` initializes the profiler and throws if CUPTI returns
an error code. `profiler_init_guard::finalize_profiler()` de-initializes the profiler
and returns the error code. The destructor calls `finalize_profiler()` but ignores
the status code. Users who want to de-initialize the profiler explicitly and handle
the error should call `finalize_profiler()` directly. The guard
has a boolean member variable so that the destructor remains safe after an explicit
call to `finalize_profiler()`.
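
A minimal sketch of the guard pattern described above, not the actual NVBench code. `mock_cupti` and `init_guard_sketch` are hypothetical stand-ins: the real guard calls CUPTI, while this version only counts init/deinit calls so the lifetime logic can be checked in isolation.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical stand-in for the CUPTI init/deinit calls (assumption:
// a status of 0 means success). No real CUPTI function is used here.
namespace mock_cupti
{
using status = int;
inline int &live_inits() { static int n = 0; return n; }
inline status init() { ++live_inits(); return 0; }
inline status deinit() { --live_inits(); return 0; }
} // namespace mock_cupti

class init_guard_sketch
{
public:
  // Initialize on construction; throw if the backend reports an error.
  init_guard_sketch()
  {
    if (mock_cupti::init() != 0)
    {
      throw std::runtime_error("profiler initialization failed");
    }
    m_active = true;
  }

  init_guard_sketch(const init_guard_sketch &)            = delete;
  init_guard_sketch &operator=(const init_guard_sketch &) = delete;

  // Moving transfers responsibility for de-initialization.
  init_guard_sketch(init_guard_sketch &&other) noexcept
      : m_active(std::exchange(other.m_active, false))
  {}

  // Move-assignment first releases what this guard already holds.
  init_guard_sketch &operator=(init_guard_sketch &&other) noexcept
  {
    if (this != &other)
    {
      (void)finalize_profiler();
      m_active = std::exchange(other.m_active, false);
    }
    return *this;
  }

  // The destructor cannot report errors, so the status is dropped.
  ~init_guard_sketch() { (void)finalize_profiler(); }

  // Explicit de-initialization: returns the status to the caller.
  // The flag makes a second call (or the destructor) a no-op.
  mock_cupti::status finalize_profiler() noexcept
  {
    if (!m_active)
    {
      return 0;
    }
    m_active = false;
    return mock_cupti::deinit();
  }

private:
  bool m_active{false};
};
```

A caller that cares about the de-initialization status calls `finalize_profiler()` and checks the result; everyone else lets the destructor handle it.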
The old counter-data prefix/scratch flow was replaced with the Range Profiler counter
data image sizing/initialization path and decode flow.
Host API metric filtering (base metrics + context scope) and Host-side evaluation to
GPU values via `cuptiProfilerHostEvaluateToGpuValues` are implemented.
- **Host object**: `host_impl::object` in `nvbench/cupti_profiler.cxx`.
- **Range profiler object**: `host_impl::range_profiler_object`.
- **Config image**: `m_config_image`.
- **Counter data image**: `m_data_image`.
1) **Host init + config image**
- `initialize_profiler_host()` creates the host object.
- `initialize_config_image_host()` adds metrics and builds the config image.
2) **Range profiler enable + counter data image**
- `enable_range_profiler()` creates the range profiler object.
- `initialize_counter_data_image()` sizes and initializes the data image using
the range profiler object, matching the CUPTI samples.
3) **Config + collect + decode**
- `set_range_profiler_config()` binds the config image + data image.
- `start_user_loop()` / `stop_user_loop()` push/pop the user range and
start/stop the range profiler.
- `process_user_loop()` decodes counter data via
`cuptiRangeProfilerDecodeData()`.
4) **Evaluate metrics**
- `get_counter_values()` calls `cuptiProfilerHostEvaluateToGpuValues()` to
convert counter data into metric values.
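
The four stages above can be read as a fixed call order. The sketch below only illustrates that order: the method names are taken from the list, but `profiler_flow_sketch`, `run_flow_sketch`, and the log strings are hypothetical, and every body is a stub that records a label instead of calling CUPTI.

```cpp
#include <string>
#include <vector>

// Hypothetical stand-in: each method logs its stage, nothing more.
struct profiler_flow_sketch
{
  std::vector<std::string> log;

  // 1) Host init + config image
  void initialize_profiler_host() { log.push_back("host: create object"); }
  void initialize_config_image_host() { log.push_back("host: build config image"); }

  // 2) Range profiler enable + counter data image
  void enable_range_profiler() { log.push_back("range: create object"); }
  void initialize_counter_data_image() { log.push_back("range: size + init data image"); }

  // 3) Config + collect + decode
  void set_range_profiler_config() { log.push_back("range: bind config + data image"); }
  void start_user_loop() { log.push_back("range: push range + start"); }
  void stop_user_loop() { log.push_back("range: pop range + stop"); }
  void process_user_loop() { log.push_back("range: decode counter data"); }

  // 4) Evaluate metrics
  void get_counter_values() { log.push_back("host: evaluate to GPU values"); }
};

// Runs the stages in the order given by the list and returns the log.
inline std::vector<std::string> run_flow_sketch()
{
  profiler_flow_sketch p;
  p.initialize_profiler_host();
  p.initialize_config_image_host();
  p.enable_range_profiler();
  p.initialize_counter_data_image();
  p.set_range_profiler_config();
  p.start_user_loop();
  // ... the measured kernel launches would run here ...
  p.stop_user_loop();
  p.process_user_loop();
  p.get_counter_values();
  return p.log;
}
```

Note that the host object appears only at the two ends of the sequence (building the config image, evaluating the decoded data), which matches the statement that the host API never collects data itself.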
* Use class instead of struct in profiler_init_guard; forward declaration
* Add SFINAE guards before accessing members not present in earlier CTK versions
* Check if cupti_profiler_host.h exists, use old/new implementation based on that check
1. Reintroduced legacy `cupti_profiler_nvpw.cuh` and `cupti_profiler_nvpw.cxx`.
2. Moved profiler-host-API implementation to `cupti_profiler_host.cuh`, `cupti_profiler_host.cxx`.
3. Added `nvbench/cupti_profiler.cuh`, which checks whether the `cupti_profiler_host.h` header is
available and includes `cupti_profiler_host.cuh` if it is, or `cupti_profiler_nvpw.cuh` otherwise.
4. In CMake, check whether `${nvbench_cupti_root}/include/cupti_profiler_host.h` exists.
If it does not, `libnvbench.so` depends on libnvperf_host and libnvperf_target
in addition to libcupti. If the header exists, it depends only on libcupti.
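
Two of the compatibility techniques mentioned above, header detection and SFINAE guards, can be sketched as follows. This is an illustration under assumptions, not the NVBench source: the text does not say how the header check is done, so `__has_include` is assumed here, and `SKETCH_HAS_PROFILER_HOST`, `params_old`, `params_new`, `newField`, and `set_new_field` are all hypothetical names.

```cpp
#include <cstddef>
#include <type_traits>
#include <utility>

// (a) Header detection. Assumption: __has_include is used to decide
// which implementation header a dispatch header would pull in.
#if defined(__has_include)
#  if __has_include(<cupti_profiler_host.h>)
#    define SKETCH_HAS_PROFILER_HOST 1
#  endif
#endif
#ifndef SKETCH_HAS_PROFILER_HOST
#  define SKETCH_HAS_PROFILER_HOST 0
#endif

// A real dispatch header would include cupti_profiler_host.cuh or
// cupti_profiler_nvpw.cuh here; the sketch only reports the choice.
constexpr const char *backend_name()
{
  return SKETCH_HAS_PROFILER_HOST ? "host" : "nvpw";
}

// (b) SFINAE guard for a struct member that only newer toolkits have.
template <typename...>
struct sketch_make_void { using type = void; };
template <typename... Ts>
using sketch_void_t = typename sketch_make_void<Ts...>::type;

// Hypothetical parameter structs from an "old" and a "new" toolkit.
struct params_old { std::size_t structSize; };
struct params_new { std::size_t structSize; int newField; };

// Detects whether T has a member named newField.
template <typename T, typename = void>
struct has_new_field : std::false_type {};
template <typename T>
struct has_new_field<T, sketch_void_t<decltype(std::declval<T &>().newField)>>
    : std::true_type {};

// Chosen when the member exists: set it.
template <typename T>
typename std::enable_if<has_new_field<T>::value>::type
set_new_field(T &p, int v) { p.newField = v; }

// Chosen when the member is absent: compile to a no-op.
template <typename T>
typename std::enable_if<!has_new_field<T>::value>::type
set_new_field(T &, int) {}
```

With this shape, the same call site `set_new_field(params, value)` compiles against both toolkit versions, and the header check keeps the choice of backend in one place.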