mirror of
https://github.com/NVIDIA/nvbench.git
synced 2026-04-19 22:38:52 +00:00
* Reworked cupti_profiler to use Host + Range Profiler APIs end-to-end
The NVPW_* API has been deprecated since CTK 13.0. Followed the advice in the compilation
message and replaced the NVPW_* API with the CUPTI Profiler Host API.
`libnvbench.so` no longer links to `nvperf_host` directly, only to `libcupti`.
NVBench uses the **CUPTI Host API** to build a config image from metric names,
and the **Range Profiler API** to collect and decode counters. The host API never
collects data directly; it prepares and evaluates data produced by range profiling.
Introduced `host_impl`/`profiler_init_guard` to manage the CUPTI Host object and
profiler initialization/deinitialization, including safe cleanup on move assignment.
`profiler_init_guard` initializes the profiler and throws if CUPTI returns
an error code. `profiler_init_guard::finalize_profiler()` de-initializes the profiler
and returns the error code. The destructor calls `finalize_profiler()` but ignores
the status code. Users who want to de-initialize the profiler explicitly and handle
the error should call `finalize_profiler()` directly. The guard keeps a boolean
member variable so that the destructor still works correctly after an explicit
`finalize_profiler()` call.
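The guard behavior described above can be sketched as follows. This is a minimal illustration, not the code from this commit: `fake_profiler_initialize()` and `fake_profiler_deinitialize()` are hypothetical stand-ins for the CUPTI calls (returning 0 on success), and the member name `m_active` and the exact move semantics are assumptions.

```cpp
#include <stdexcept>
#include <utility>

// Hypothetical stand-ins for the CUPTI init/deinit calls; 0 means success.
static int g_init_count = 0;
int fake_profiler_initialize() { ++g_init_count; return 0; }
int fake_profiler_deinitialize() { --g_init_count; return 0; }

class profiler_init_guard
{
public:
  // Initializes the profiler; throws if the (stand-in) call reports an error.
  profiler_init_guard()
  {
    if (fake_profiler_initialize() != 0)
    {
      throw std::runtime_error("profiler initialization failed");
    }
    m_active = true;
  }

  profiler_init_guard(const profiler_init_guard &)            = delete;
  profiler_init_guard &operator=(const profiler_init_guard &) = delete;

  // Assumed move semantics: ownership of the initialized state is transferred.
  profiler_init_guard(profiler_init_guard &&other) noexcept
      : m_active(std::exchange(other.m_active, false))
  {}

  profiler_init_guard &operator=(profiler_init_guard &&other) noexcept
  {
    if (this != &other)
    {
      (void)finalize_profiler(); // clean up the state currently owned
      m_active = std::exchange(other.m_active, false);
    }
    return *this;
  }

  // The destructor finalizes but ignores the status code.
  ~profiler_init_guard() { (void)finalize_profiler(); }

  // De-initializes at most once and returns the status code;
  // returns 0 if the guard was already finalized or moved from.
  int finalize_profiler() noexcept
  {
    if (!m_active)
    {
      return 0;
    }
    m_active = false;
    return fake_profiler_deinitialize();
  }

private:
  bool m_active{false}; // lets the destructor skip a second de-init
};
```

A caller that cares about the de-initialization status would call `guard.finalize_profiler()` and check the result; otherwise the destructor handles cleanup silently when the guard goes out of scope.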
The old counter-data prefix/scratch flow was replaced with the Range Profiler counter
data image sizing/initialization path and decode flow.
Implemented Host API metric filtering (base metrics + context scope) and host-side
evaluation to GPU values via `cuptiProfilerHostEvaluateToGpuValues()`.
- **Host object**: `host_impl::object` in `nvbench/cupti_profiler.cxx`.
- **Range profiler object**: `host_impl::range_profiler_object`.
- **Config image**: `m_config_image`.
- **Counter data image**: `m_data_image`.
1) **Host init + config image**
- `initialize_profiler_host()` creates the host object.
- `initialize_config_image_host()` adds metrics and builds the config image.
2) **Range profiler enable + counter data image**
- `enable_range_profiler()` creates the range profiler object.
- `initialize_counter_data_image()` sizes and initializes the data image using
the range profiler object, matching the CUPTI samples.
3) **Config + collect + decode**
- `set_range_profiler_config()` binds the config image + data image.
- `start_user_loop()` / `stop_user_loop()` push/pop the user range and
start/stop the range profiler.
- `process_user_loop()` decodes counter data via
`cuptiRangeProfilerDecodeData()`.
4) **Evaluate metrics**
- `get_counter_values()` calls `cuptiProfilerHostEvaluateToGpuValues()` to
convert counter data into metric values.
The
* Use class instead of struct in profiler_init_guard; forward declaration
* Add SFINAE guards before accessing members not present in earlier CTK versions
* Check if cupti_profiler_host.h exists, use old/new implementation based on that check
1. Reintroduced legacy `cupti_profiler_nvpw.cuh` and `cupti_profiler_nvpw.cxx`.
2. Moved profiler-host-API implementation to `cupti_profiler_host.cuh`, `cupti_profiler_host.cxx`.
3. Added `nvbench/cupti_profiler.cuh`, which checks whether the `cupti_profiler_host.h` header is
available and includes `cupti_profiler_host.cuh` if it is, or `cupti_profiler_nvpw.cuh` otherwise.
4. In CMake, we check whether `${nvbench_cupti_root}/include/cupti_profiler_host.h` exists.
If it does not, `libnvbench.so` depends on `libnvperf_host` and `libnvperf_target`
in addition to `libcupti`. If the header exists, it depends only on `libcupti`.
51 lines
1.6 KiB
CMake
# Since this file is installed, we need to make sure that the CUDAToolkit has
# been found by consumers:
if (NOT TARGET CUDA::toolkit)
  find_package(CUDAToolkit REQUIRED)
endif()

if (EXISTS "${CUDAToolkit_LIBRARY_ROOT}/extras/CUPTI/lib64")
  # NVIDIA installer layout:
  set(nvbench_cupti_root "${CUDAToolkit_LIBRARY_ROOT}/extras/CUPTI")
else()
  # Ubuntu package layout:
  set(nvbench_cupti_root "${CUDAToolkit_LIBRARY_ROOT}")
endif()

# The CUPTI targets in FindCUDAToolkit are broken:
# - The dll locations are not specified
# - Dependent libraries nvperf_* are not linked.
# So we create our own targets:
function(nvbench_add_cupti_dep dep_name)
  string(TOLOWER ${dep_name} dep_name_lower)
  string(TOUPPER ${dep_name} dep_name_upper)

  add_library(nvbench::${dep_name_lower} SHARED IMPORTED)

  find_library(NVBench_${dep_name_upper}_LIBRARY ${dep_name_lower} REQUIRED
    DOC "The full path to lib${dep_name_lower}.so from the CUDA Toolkit."
    HINTS "${nvbench_cupti_root}/lib64"
  )
  mark_as_advanced(NVBench_${dep_name_upper}_LIBRARY)

  set_target_properties(nvbench::${dep_name_lower} PROPERTIES
    IMPORTED_LOCATION "${NVBench_${dep_name_upper}_LIBRARY}"
  )
endfunction()

nvbench_add_cupti_dep(cupti)
target_include_directories(nvbench::cupti INTERFACE
  "${nvbench_cupti_root}/include"
)

if (NOT EXISTS "${nvbench_cupti_root}/include/cupti_profiler_host.h")
  # Profile Host API does not exist yet, need NVPERF libraries
  # for NVPW_* API used in nvbench::cupti_profiler
  nvbench_add_cupti_dep(nvperf_target)
  nvbench_add_cupti_dep(nvperf_host)
  target_link_libraries(nvbench::cupti INTERFACE
    nvbench::nvperf_target
    nvbench::nvperf_host
  )
endif()