Files
nvbench/cmake/NVBenchCUPTI.cmake
Oleksandr Pavlyk 9a91b9ef0c Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end (#327)
* Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end

NVPW_* API has been deprecated since CTK 13.0. Followed advice in compliation
message to replace NVPW_* API with CUPTI Profiler Host API.

`libnvbench.so` no longer links to `nvperf_host` directly, only to `libcupti`.

NVBench uses the **CUPTI Host API** to build a config image from metric names,
and the **Range Profiler API** to collect and decode counters. The host API never
collects data directly; it prepares and evaluates data produced by range profiling.

Introduce `host_impl`/`profiler_init_guard` to manage CUPTI Host object and
initialization/deinitialization, including safe move-assignment cleanup.

`profiler_init_guard` initializes profiler, and throws if CUPTI returns
an error code. `profiler_init_guard::finalize_profiler()` de-inits profiler
and returns the error code. Destructor calls finalize_profiler, but ignores
the status code. If user wants to explicitly de-initialize profiler and handle
the error, he/she is advised to call `finalize_profiler()` directly. The guard
has a boolean member variable to allow destructor to work even if user explicitly
called finalize_profiler() method.

The old counter-data prefix/scratch flow was replaced with the Range Profiler counter
data image sizing/initialization path and decode flow.

Host API metric filtering (base metrics + context scope) and Host-side evaluation to
GPU values via cuptiProfilerHostEvaluateToGpuValues is implemented.

   - **Host object**: `host_impl::object` in `nvbench/cupti_profiler.cxx`.
   - **Range profiler object**: `host_impl::range_profiler_object`.
   - **Config image**: `m_config_image`.
   - **Counter data image**: `m_data_image`.

1) **Host init + config image**
   - `initialize_profiler_host()` creates the host object.
   - `initialize_config_image_host()` adds metrics and builds the config image.

2) **Range profiler enable + counter data image**
   - `enable_range_profiler()` creates the range profiler object.
   - `initialize_counter_data_image()` sizes and initializes the data image using
     the range profiler object, matching the CUPTI samples.

3) **Config + collect + decode**
   - `set_range_profiler_config()` binds the config image + data image.
   - `start_user_loop()` / `stop_user_loop()` push/pop the user range and
     start/stop the range profiler.
   - `process_user_loop()` decodes counter data via
     `cuptiRangeProfilerDecodeData()`.

4) **Evaluate metrics**
   - `get_counter_values()` calls `cuptiProfilerHostEvaluateToGpuValues()` to
     convert counter data into metric values.

The

* Use class instead of struct in profiler_init_guard; forward declaration

* Add SFINAE guards before accessing members not present in earlier CTK versions

* Check if cupti_profiler_host.h exists, use old/new implementation based on that check

1. Reintroduced legacy `cupti_profiler_nvpw.cuh` and `cupti_profiler_nvpw.cuh`.
2. Moved profiler-host-API implementation to `cupti_profiler_host.cuh`, `cupti_profiler_host.cxx`.
3. Add `nvbench/cupti_profiler.cuh` which checks if `cupti_profiler_host.h` header is known and
   includes `cupti_profiler_host.cuh` or `cupti_profiler_nvpw.cuh` respectively.
4. In cmake, we check if ${nvbench_cupti_root}/include/cupti_profiler_host.h file exists.
   If it does not, `libnvbench.so` would have dependency on libnvperf_host and libnvperf_target
   in addition to dependency on libcupti. If the header exists, it would only depend on libcupti
2026-03-23 11:51:16 -04:00

51 lines
1.6 KiB
CMake

# Since this file is installed, we need to make sure that the CUDAToolkit has
# been found by consumers:
if (NOT TARGET CUDA::toolkit)
find_package(CUDAToolkit REQUIRED)
endif()
if (EXISTS "${CUDAToolkit_LIBRARY_ROOT}/extras/CUPTI/lib64")
# NVIDIA installer layout:
set(nvbench_cupti_root "${CUDAToolkit_LIBRARY_ROOT}/extras/CUPTI")
else()
# Ubuntu package layout:
set(nvbench_cupti_root "${CUDAToolkit_LIBRARY_ROOT}")
endif()
# The CUPTI targets in FindCUDAToolkit are broken:
# - The dll locations are not specified
# - Dependent libraries nvperf_* are not linked.
# So we create our own targets:
function(nvbench_add_cupti_dep dep_name)
string(TOLOWER ${dep_name} dep_name_lower)
string(TOUPPER ${dep_name} dep_name_upper)
add_library(nvbench::${dep_name_lower} SHARED IMPORTED)
find_library(NVBench_${dep_name_upper}_LIBRARY ${dep_name_lower} REQUIRED
DOC "The full path to lib${dep_name_lower}.so from the CUDA Toolkit."
HINTS "${nvbench_cupti_root}/lib64"
)
mark_as_advanced(NVBench_${dep_name_upper}_LIBRARY)
set_target_properties(nvbench::${dep_name_lower} PROPERTIES
IMPORTED_LOCATION "${NVBench_${dep_name_upper}_LIBRARY}"
)
endfunction()
nvbench_add_cupti_dep(cupti)
target_include_directories(nvbench::cupti INTERFACE
"${nvbench_cupti_root}/include"
)
if (NOT EXISTS "${nvbench_cupti_root}/include/cupti_profiler_host.h")
# Profile Host API does not exist yet, need NVPERF libraries
# for NVPW_* API used in nvbench::cupti_profiler
nvbench_add_cupti_dep(nvperf_target)
nvbench_add_cupti_dep(nvperf_host)
target_link_libraries(nvbench::cupti INTERFACE
nvbench::nvperf_target
nvbench::nvperf_host
)
endif()