Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627)

* Decouple configure/build/test tools from Docker

Create a two-layer tool architecture:
- Core tools (ck-configure, ck-build, ck-test): Environment-agnostic,
  work on any system with ROCm - no Docker dependency
- Container tools (ck-docker): Manage Docker containers and delegate
  to core tools via docker exec

Changes:
- Add ck-configure: New CMake configuration tool with preset support,
  native GPU detection, and flexible options
- Refactor ck-build: Remove Docker dependency, add --configure and
  --list options, call ninja directly
- Refactor ck-test: Remove Docker dependency, add CTest integration
  with --smoke/--regression/--all options
- Enhance common.sh: Add native GPU detection, build directory utils,
  and output helpers
- Update ck-docker: Add configure/build/test/exec commands that
  delegate to core tools inside container

This enables:
- Native development on ROCm hosts without Docker
- Simpler CI/CD integration
- Consistent behavior inside and outside containers

Co-Authored-By: Claude <noreply@anthropic.com>

* Add ck-rocprof: GPU profiling tool for rocprof-compute

Adds a command-line profiling tool to simplify GPU performance
analysis workflow using AMD rocprof-compute.

Features:
- Easy setup with automatic Python venv configuration
- Simple CLI: setup, run, analyze, compare, list
- Automatic GPU architecture detection
- Focus on LDS metrics (Block 12) for bank conflict analysis
- Comprehensive documentation with examples and troubleshooting

Usage:
  ck-rocprof setup                    # One-time environment setup
  ck-rocprof run <name> <executable>  # Profile executable
  ck-rocprof analyze <name> [block]   # Analyze metrics
  ck-rocprof compare <name1> <name2>  # Compare two runs
  ck-rocprof list                     # List available runs

* Make ck-rocprof documentation concise and improve Docker integration

- Streamlined documentation from 416 to 157 lines (62% reduction)
- Focused on essential commands, metrics, and workflows
- Enhanced script to run all operations inside Docker containers
- Fixed workload directory path and improved container management
- Added automatic rocprofiler-compute installation and dependency handling

* Add --no-roof flag to ck-rocprof profile command

Skip roofline analysis by default to speed up profiling. Roofline
analysis can add significant time to profiling runs but is not
needed for most LDS bank conflict analysis workflows.

* Make ck-rocprof work independently of Docker

Add native execution mode that runs rocprof-compute directly on the host
system when available, falling back to Docker mode when not.

Key changes:
- Auto-detect native mode when rocprof-compute is in PATH or common locations
- Add execution mode wrappers (exec_cmd, file_exists, dir_exists, etc.)
- Native mode stores venv at .ck-rocprof-venv in project root
- Native mode stores workloads at build/workloads/
- Support user-installed rocprofiler-compute (e.g., ~/.local/rocprofiler-compute)
- Add CK_FORCE_DOCKER env var to force Docker mode
- Update help message to show current execution mode
- Maintain full backward compatibility with existing Docker workflow

Tested successfully with rocprofiler-compute 3.4.0 installed from source
on MI300X GPU in native mode.
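The lookup order described above can be sketched as follows. This is a minimal, illustrative version (the real ROCPROF_CANDIDATES list in the diff below covers more ROCm versions):

```shell
#!/bin/bash
# Sketch of the native-mode auto-detection order:
# 1) explicit CK_ROCPROF_BIN override, 2) PATH, 3) common install locations.
find_rocprof() {
    if [ -n "${CK_ROCPROF_BIN:-}" ] && [ -x "${CK_ROCPROF_BIN}" ]; then
        echo "${CK_ROCPROF_BIN}"
        return 0
    fi
    if command -v rocprof-compute >/dev/null 2>&1; then
        command -v rocprof-compute
        return 0
    fi
    local candidate
    for candidate in "${HOME}/.local/rocprofiler-compute/3.4.0/bin/rocprof-compute" \
                     /opt/rocm/bin/rocprof-compute; do
        if [ -x "$candidate" ]; then
            echo "$candidate"
            return 0
        fi
    done
    return 1
}
```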

Co-Authored-By: Claude <noreply@anthropic.com>

* Add clean/status commands and improve ck-rocprof robustness

- Add 'clean' command to remove profiling runs (supports --all)
- Add 'status' command to show configuration and environment info
- Add workload name validation to prevent path traversal attacks
- Fix uv installation to use pip instead of curl for reliability
- Add cross-platform stat support for macOS compatibility
- Consolidate ROCPROF_CANDIDATES to avoid code duplication
- Expand help documentation with all profiling block descriptions
- Fix Docker wrapper script escaping issues
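The workload name validation amounts to a strict whitelist regex; a name passes only if it is purely alphanumeric, hyphens, and underscores, which already excludes "/", "..", and whitespace. A minimal sketch:

```shell
# Reject any workload name that could traverse paths or break shell quoting.
# The whitelist regex alone excludes ".", "..", "/", and spaces.
validate_workload_name() {
    local name="$1"
    [[ "$name" =~ ^[a-zA-Z0-9_-]+$ ]]
}
```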

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix analyze command to use correct workload path

rocprof-compute stores results directly in the workload directory
(pmc_perf.csv) rather than in a GPU architecture subdirectory.
Updated find_workload_path to detect this correctly.
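A minimal sketch of the detection logic, assuming the layout described above (the in-tree find_workload_path may differ in detail): check the workload directory itself first, then fall back to a per-architecture subdirectory.

```shell
# Locate profiling results: pmc_perf.csv sits directly in the workload dir,
# with an arch subdirectory (e.g. gfx950/) as a legacy fallback.
find_workload_path() {
    local dir="$1"
    if [ -f "${dir}/pmc_perf.csv" ]; then
        echo "${dir}"
        return 0
    fi
    local sub
    for sub in "${dir}"/*/; do
        if [ -f "${sub}pmc_perf.csv" ]; then
            echo "${sub%/}"
            return 0
        fi
    done
    return 1
}
```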

Co-Authored-By: Claude <noreply@anthropic.com>

* Address PR review security and robustness issues

Security fixes:
- Escape executable path in cmd_run to prevent shell injection
- Add workload name validation to cmd_analyze and cmd_compare
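cmd_run itself is not shown in this excerpt; the escaping technique is printf %q, roughly:

```shell
# Hypothetical sketch: escape a user-supplied executable path with printf %q
# before embedding it in a shell command string, so metacharacters cannot
# inject extra commands.
exe='./bin/app; rm -rf /tmp/x'    # hostile input, for illustration
safe=$(printf '%q' "$exe")        # backslash-escaped into a single word
cmd="cd /workspace/build && ${safe}"
echo "$cmd"                       # the ';' is now literal, not a separator
```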

Robustness improvements:
- Add error checking for uv package manager installation
- Use consistent project root detection (find_project_root || get_project_root)
- Use /opt/rocm instead of hardcoded /opt/rocm-7.0.1 in Docker mode
- Derive ROCM_REQUIREMENTS path from ROCPROF_BIN for flexibility
- Use gfx950 as fallback GPU consistent with common.sh

Documentation updates:
- Fix env var name GPU_TARGET -> CK_GPU_TARGET
- Update storage layout to reflect current structure (workloads/<name>/)
- Document clean and status commands
- Clarify native vs Docker default paths

Co-Authored-By: Claude <noreply@anthropic.com>

* Simplify ck-rocprof to native-only mode

Remove Docker mode from ck-rocprof. Docker users should run the tool
via `ck-docker exec ck-rocprof ...` instead.

This simplification:
- Removes ~210 lines of Docker-specific code
- Eliminates mode detection complexity
- Makes the script easier to maintain
- Provides clearer error messages when rocprof-compute is not found

The setup command now lists all searched locations when rocprof-compute
is not found, helping users understand how to install it.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add rocprofiler-compute source installation fallback

When rocprof-compute is not found in system locations, automatically
install rocprofiler-compute 3.4.0 from source as a fallback. This
eliminates the hard dependency on system ROCm packages.

Implementation details:
- Clone rocprofiler-compute from GitHub to ~/.local/
- Install dependencies via requirements.txt (not editable install)
- Create wrapper that sets PYTHONPATH to source directory
- Execute source script directly rather than importing as module

This approach matches the project's development workflow and works
around the incomplete pyproject.toml that prevents editable installs.

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
Author: Max Podkorytov
Date: 2026-01-29 17:20:22 -08:00 (committed by GitHub)
Parent: 05ef93a69d
Commit: 83b6155354
7 changed files with 1528 additions and 275 deletions


@@ -2,7 +2,8 @@
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK Build - Build Composable Kernel targets in Docker
# CK Build - Build Composable Kernel targets
# Environment-agnostic: works natively on ROCm hosts or inside containers
set -e
set -o pipefail
@@ -12,46 +13,51 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"
# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")
PROJECT_ROOT=$(find_project_root "${SCRIPT_DIR}" || get_project_root "${SCRIPT_DIR}")
BUILD_DIR=$(get_build_dir "${PROJECT_ROOT}")
# Help message
show_help() {
cat << EOF
CK Build - Build Composable Kernel targets in Docker
CK Build - Build Composable Kernel targets
Usage: ck-build [options] [target...]
Options:
-h, --help Show this help message
--name <name> Specify container name
--reconfigure Reconfigure CMake before building
-j <N> Parallel jobs (passed to ninja)
-v, --verbose Verbose output
--build-dir <dir> Build directory (default: ./build)
--clean Clean before building
--configure Auto-configure if build.ninja missing
--list List available targets
Arguments:
target Target(s) to build (default: all)
Environment:
CK_CONTAINER_NAME - Override default container name
GPU_TARGET - Override GPU target detection (e.g., gfx950, gfx942)
CK_BUILD_DIR - Override build directory
CK_GPU_TARGET - Override GPU target for auto-configure
Examples:
ck-build # Build all targets
ck-build test_amdgcn_mma # Build specific target
ck-build test_amdgcn_mma test_gemm # Build multiple targets
ck-build --reconfigure # Reconfigure CMake and build all
ck-build --configure # Auto-configure and build all
ck-build --clean test_amdgcn_mma # Clean and build target
ck-build -j 8 test_amdgcn_mma # Build with 8 parallel jobs
ck-build --list # List available targets
EOF
}
# Parse arguments
targets=()
reconfigure=false
clean=false
parallel_jobs=""
verbose=false
clean=false
auto_configure=false
list_targets=false
while [[ $# -gt 0 ]]; do
case $1 in
@@ -59,21 +65,35 @@ while [[ $# -gt 0 ]]; do
show_help
exit 0
;;
--name)
CONTAINER_NAME="$2"
-j)
require_arg "$1" "${2:-}"
parallel_jobs="$2"
shift 2
;;
--reconfigure)
reconfigure=true
-j*)
parallel_jobs="${1#-j}"
shift
;;
-v|--verbose)
verbose=true
shift
;;
--build-dir)
require_arg "$1" "${2:-}"
BUILD_DIR="$2"
shift 2
;;
--clean)
clean=true
shift
;;
-j)
parallel_jobs="-j $2"
shift 2
--configure)
auto_configure=true
shift
;;
--list)
list_targets=true
shift
;;
*)
targets+=("$1")
@@ -82,62 +102,62 @@ while [[ $# -gt 0 ]]; do
esac
done
# Ensure container is running
if ! container_is_running "${CONTAINER_NAME}"; then
echo "Container '${CONTAINER_NAME}' not running. Starting..."
"${SCRIPT_DIR}/ck-start" "${CONTAINER_NAME}"
# Handle --list
if [ "$list_targets" = true ]; then
if ! is_build_configured "${BUILD_DIR}"; then
error "Build not configured. Run 'ck-configure' first or use --configure"
exit 1
fi
info "Available targets:"
cd "${BUILD_DIR}"
ninja -t targets 2>/dev/null | grep -E '^[a-zA-Z_][a-zA-Z0-9_-]*:' | cut -d: -f1 | sort | head -100
echo ""
echo "(Showing first 100 targets. Use 'ninja -t targets' for full list)"
exit 0
fi
# Configure CMake if needed or requested
if [ "$reconfigure" = true ] || ! docker exec "${CONTAINER_NAME}" test -f /workspace/build/build.ninja 2>/dev/null; then
echo "Detecting GPU target..."
GPU_TARGET_DETECTED=$(detect_gpu_target "${CONTAINER_NAME}")
if [ "$reconfigure" = true ]; then
echo "Reconfiguring CMake from scratch for GPU target: ${GPU_TARGET_DETECTED}"
# Auto-configure if needed
if ! is_build_configured "${BUILD_DIR}"; then
if [ "$auto_configure" = true ]; then
info "Build not configured. Running ck-configure..."
"${SCRIPT_DIR}/ck-configure" --build-dir "${BUILD_DIR}"
echo ""
else
echo "Configuring build with CMake for GPU target: ${GPU_TARGET_DETECTED}"
error "Build not configured. Run 'ck-configure' first or use --configure"
exit 1
fi
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace || exit 1
rm -rf /workspace/build
mkdir /workspace/build
cd /workspace/build || exit 1
cmake .. -GNinja \
-DGPU_TARGETS=${GPU_TARGET_DETECTED} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
-DBUILD_TESTING=ON 2>&1 | tail -30
"
echo ""
fi
# Clean if requested
if [ "$clean" = true ]; then
echo "Cleaning build directory..."
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace/build || exit 1
ninja clean
"
info "Cleaning build directory..."
cd "${BUILD_DIR}"
ninja clean
echo ""
fi
# Build targets
if [ ${#targets[@]} -eq 0 ]; then
echo "Building all configured targets..."
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace/build || exit 1
ninja ${parallel_jobs} 2>&1
"
else
echo "Building targets: ${targets[*]}"
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace/build || exit 1
ninja ${parallel_jobs} ${targets[*]} 2>&1
"
# Build ninja command
ninja_cmd=(ninja -C "${BUILD_DIR}")
if [ -n "$parallel_jobs" ]; then
ninja_cmd+=("-j" "$parallel_jobs")
fi
if [ "$verbose" = true ]; then
ninja_cmd+=(-v)
fi
# Add targets
ninja_cmd+=("${targets[@]}")
# Build targets
if [ ${#targets[@]} -eq 0 ]; then
info "Building all configured targets..."
else
info "Building targets: ${targets[*]}"
fi
"${ninja_cmd[@]}"
echo ""
echo "Build complete"
info "Build complete"

script/tools/ck-configure (new executable file, 187 lines)

@@ -0,0 +1,187 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK Configure - Configure CMake build for Composable Kernel
# Environment-agnostic: works natively on ROCm hosts or inside containers
set -e
set -o pipefail
# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"
# Initialize configuration
PROJECT_ROOT=$(find_project_root "${SCRIPT_DIR}" || get_project_root "${SCRIPT_DIR}")
BUILD_DIR=$(get_build_dir "${PROJECT_ROOT}")
# Help message
show_help() {
cat << EOF
CK Configure - Configure CMake build for Composable Kernel
Usage: ck-configure [options]
Options:
-h, --help Show this help message
--preset <name> Use CMake preset (dev, dev-gfx908, dev-gfx90a, dev-gfx942, dev-gfx950)
--gpu <target> Override GPU_TARGETS (auto-detected if not specified)
--dtypes <types> Set DTYPES (e.g., fp16,fp32,bf16)
--build-type <type> CMAKE_BUILD_TYPE (default: Release)
--build-dir <dir> Build directory (default: ./build)
--clean Remove existing build directory before configuring
--list-presets List available CMake presets
-D <VAR>=<value> Pass additional CMake variable
Environment:
CK_GPU_TARGET - Override GPU target detection (e.g., gfx950, gfx942)
CK_BUILD_DIR - Override build directory
Examples:
ck-configure # Auto-detect GPU and configure
ck-configure --preset dev-gfx950 # Use CMake preset
ck-configure --gpu gfx942 # Configure for specific GPU
ck-configure --clean --preset dev # Clean and reconfigure
ck-configure -D BUILD_DEV=ON # Pass CMake variable
EOF
}
# Parse arguments
preset=""
gpu_target=""
dtypes=""
build_type="Release"
clean=false
list_presets=false
cmake_vars=()
while [[ $# -gt 0 ]]; do
case $1 in
-h|--help)
show_help
exit 0
;;
--preset)
require_arg "$1" "${2:-}"
preset="$2"
shift 2
;;
--gpu)
require_arg "$1" "${2:-}"
gpu_target="$2"
shift 2
;;
--dtypes)
require_arg "$1" "${2:-}"
dtypes="$2"
shift 2
;;
--build-type)
require_arg "$1" "${2:-}"
build_type="$2"
shift 2
;;
--build-dir)
require_arg "$1" "${2:-}"
BUILD_DIR="$2"
shift 2
;;
--clean)
clean=true
shift
;;
--list-presets)
list_presets=true
shift
;;
-D)
require_arg "$1" "${2:-}"
cmake_vars+=("-D$2")
shift 2
;;
-D*)
cmake_vars+=("$1")
shift
;;
*)
error "Unknown option: $1"
echo ""
show_help
exit 1
;;
esac
done
# Handle --list-presets
if [ "$list_presets" = true ]; then
echo "Available CMake presets:"
presets=$(list_cmake_presets "${PROJECT_ROOT}" 2>/dev/null)
if [ -n "$presets" ]; then
echo "$presets" | sed 's/^/ /'
else
echo " (No CMakePresets.json found or jq not available)"
fi
exit 0
fi
# Clean build directory if requested
if [ "$clean" = true ]; then
if [ -d "${BUILD_DIR}" ]; then
info "Removing existing build directory: ${BUILD_DIR}"
rm -rf "${BUILD_DIR}"
fi
fi
# Create build directory
mkdir -p "${BUILD_DIR}"
# Change to project root for CMake
cd "${PROJECT_ROOT}"
# Build CMake command
cmake_cmd=(cmake -S . -B "${BUILD_DIR}" -GNinja)
# Use preset if specified
if [ -n "$preset" ]; then
cmake_cmd+=(--preset "${preset}")
info "Using CMake preset: ${preset}"
else
# Manual configuration
# Detect GPU target if not specified
if [ -z "$gpu_target" ]; then
gpu_target=$(detect_gpu_native)
info "Auto-detected GPU target: ${gpu_target}"
else
info "Using specified GPU target: ${gpu_target}"
fi
cmake_cmd+=(-DGPU_TARGETS="${gpu_target}")
cmake_cmd+=(-DCMAKE_BUILD_TYPE="${build_type}")
cmake_cmd+=(-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++)
cmake_cmd+=(-DBUILD_TESTING=ON)
# Add DTYPES if specified
if [ -n "$dtypes" ]; then
cmake_cmd+=(-DDTYPES="${dtypes}")
info "Using DTYPES: ${dtypes}"
fi
fi
# Add any additional CMake variables
for var in "${cmake_vars[@]}"; do
cmake_cmd+=("$var")
done
# Run CMake
info "Configuring build in: ${BUILD_DIR}"
echo "Running: ${cmake_cmd[*]}"
echo ""
"${cmake_cmd[@]}"
echo ""
info "Configuration complete. Build directory: ${BUILD_DIR}"
info "Next: run 'ck-build' to build targets"


@@ -22,25 +22,29 @@ CK Docker Tool - Build and test composable_kernel in Docker
Usage: ck-docker <command> [options]
Commands:
start [name] Start Docker container
build [target] [--reconfigure] Build target (optionally reconfigure CMake)
test <test> [options] Run test
shell [name] Open shell in container
status [name] Check container status
stop [name] Stop and remove container
Container Management:
start [name] Start Docker container
stop [name] Stop and remove container
status [name] Check container status
shell [name] Open shell in container
Build/Test (delegates to core tools inside container):
configure [opts] Run ck-configure in container
build [opts] Run ck-build in container
test [opts] Run ck-test in container
exec <cmd> Run arbitrary command in container
Examples:
ck-docker start
ck-docker configure --preset dev-gfx950
ck-docker build test_amdgcn_mma
ck-docker build --reconfigure test_amdgcn_mma
ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
ck-docker test test_amdgcn_mma --filter '*Fp16*'
ck-docker shell
ck-docker exec rocminfo
Environment:
CK_CONTAINER_NAME - Override default container name (default: ck_<username>_<branch>)
CK_DOCKER_IMAGE - Override Docker image (default: rocm/composable_kernel:ck_ub24.04_rocm7.0.1)
GPU_TARGET - Override GPU target detection (e.g., gfx950, gfx942)
EOF
}
@@ -77,126 +81,38 @@ cmd_start() {
docker exec "${name}" bash -c "echo 'Working directory:' && pwd"
}
# Build target
cmd_build() {
local target=""
local name="${CONTAINER_NAME}"
local reconfigure=false
while [[ $# -gt 0 ]]; do
case $1 in
--name)
name="$2"
shift 2
;;
--reconfigure)
reconfigure=true
shift
;;
*)
target="$1"
shift
;;
esac
done
# Check if container is running
if ! container_is_running "${name}"; then
echo "Container '${name}' not running. Starting..."
cmd_start "${name}"
fi
# Reconfigure CMake if requested or if build.ninja doesn't exist
if [ "$reconfigure" = true ] || ! docker exec "${name}" test -f /workspace/build/build.ninja 2>/dev/null; then
echo "Detecting GPU target..."
local gpu_target=$(detect_gpu_target "${name}")
if [ "$reconfigure" = true ]; then
echo "Reconfiguring CMake from scratch for GPU target: ${gpu_target}"
else
echo "Configuring build with CMake for GPU target: ${gpu_target}"
fi
docker exec "${name}" bash -c "
cd /workspace || exit 1
rm -rf /workspace/build
mkdir /workspace/build
cd /workspace/build || exit 1
cmake .. -GNinja \
-DGPU_TARGETS=${gpu_target} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
-DBUILD_TESTING=ON 2>&1 | tail -30
"
fi
if [ -z "$target" ]; then
echo "Building all configured targets..."
else
echo "Building target: ${target}"
fi
docker exec "${name}" bash -c "
cd /workspace/build || exit 1
ninja ${target} 2>&1
"
echo "Build complete"
# Configure (delegate to ck-configure in container)
cmd_configure() {
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"
docker exec "${CONTAINER_NAME}" /workspace/script/tools/ck-configure "$@"
}
# Run test
# Build (delegate to ck-build in container)
cmd_build() {
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"
docker exec "${CONTAINER_NAME}" /workspace/script/tools/ck-build "$@"
}
# Test (delegate to ck-test in container)
cmd_test() {
local test_name=""
local name="${CONTAINER_NAME}"
local -a test_options=()
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"
docker exec "${CONTAINER_NAME}" /workspace/script/tools/ck-test "$@"
}
while [[ $# -gt 0 ]]; do
case $1 in
--name)
name="$2"
shift 2
;;
--gtest_*|--help)
test_options+=("$1")
shift
;;
*)
if [ -z "$test_name" ]; then
test_name="$1"
else
test_options+=("$1")
fi
shift
;;
esac
done
if [ -z "$test_name" ]; then
echo "Error: test_name required"
echo "Usage: ck-docker test <test_name> [--name container_name] [gtest_options]"
# Execute arbitrary command in container
cmd_exec() {
if [ $# -eq 0 ]; then
error "command required"
echo "Usage: ck-docker exec <command>"
return 1
fi
# Check if container is running
if ! container_is_running "${name}"; then
echo "Error: Container '${name}' not running"
echo "Start it with: ck-docker start --name ${name}"
return 1
fi
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"
if ! docker exec "${name}" test -f "/workspace/build/bin/${test_name}" 2>/dev/null; then
echo "Test executable not found. Building ${test_name}..."
cmd_build "${test_name}" --name "${name}"
fi
local docker_flags=()
[ -t 0 ] && [ -t 1 ] && docker_flags+=("-it")
echo "Running: ${test_name} ${test_options[*]}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
# Build the command with proper quoting
local cmd="cd /workspace/build && ./bin/${test_name}"
for opt in "${test_options[@]}"; do
cmd="${cmd} $(printf '%q' "$opt")"
done
docker exec "${name}" bash -c "${cmd}"
docker exec "${docker_flags[@]}" "${CONTAINER_NAME}" "$@"
}
# Shell
@@ -220,7 +136,7 @@ cmd_status() {
if [ -z "$name" ]; then
echo "Composable Kernel Docker Containers:"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "---"
docker ps -a --filter "ancestor=${docker_image}" \
--format "table {{.Names}}\t{{.Status}}\t{{.CreatedAt}}" || echo "No containers found"
else
@@ -262,6 +178,10 @@ case "${1:-}" in
shift
cmd_start "$@"
;;
configure)
shift
cmd_configure "$@"
;;
build)
shift
cmd_build "$@"
@@ -270,6 +190,10 @@ case "${1:-}" in
shift
cmd_test "$@"
;;
exec)
shift
cmd_exec "$@"
;;
shell)
shift
cmd_shell "$@"

script/tools/ck-rocprof (new executable file, 806 lines)

@@ -0,0 +1,806 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK ROCProf Tool - Profile CK applications with rocprof-compute
# Native-only tool. For Docker usage, run via: ck-docker exec ck-rocprof ...
set -e
set -o pipefail
# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"
# Initialize configuration
PROJECT_ROOT=$(find_project_root "${SCRIPT_DIR}" || get_project_root "${SCRIPT_DIR}")
# ============================================================================
# rocprof-compute detection
# ============================================================================
# Common rocprof-compute binary locations
# Order: user installs first, then system ROCm versions (newest first)
ROCPROF_CANDIDATES=(
"${HOME}/.local/rocprofiler-compute/3.4.0/bin/rocprof-compute"
"/opt/rocm/bin/rocprof-compute"
"/opt/rocm-7.2.0/bin/rocprof-compute"
"/opt/rocm-7.0.1/bin/rocprof-compute"
"/opt/rocm-6.2.0/bin/rocprof-compute"
"/opt/rocm-6.1.0/bin/rocprof-compute"
)
# Find rocprof-compute binary
find_rocprof_bin() {
# Check CK_ROCPROF_BIN first
if [ -n "${CK_ROCPROF_BIN:-}" ] && [ -f "${CK_ROCPROF_BIN}" ]; then
echo "${CK_ROCPROF_BIN}"
return 0
fi
# Check PATH
if command -v rocprof-compute &>/dev/null; then
command -v rocprof-compute
return 0
fi
# Check common ROCm locations and user installations
for bin in "${ROCPROF_CANDIDATES[@]}"; do
if [ -f "$bin" ]; then
echo "$bin"
return 0
fi
done
return 1
}
# Find ROCm requirements file
find_rocm_requirements() {
local rocprof_bin="${1:-$(find_rocprof_bin)}"
if [ -z "$rocprof_bin" ]; then
return 1
fi
# Requirements file is typically at ../libexec/rocprofiler-compute/requirements.txt
local rocm_dir
rocm_dir=$(dirname "$(dirname "$rocprof_bin")")
local req_file="${rocm_dir}/libexec/rocprofiler-compute/requirements.txt"
if [ -f "$req_file" ]; then
echo "$req_file"
return 0
fi
return 1
}
# ============================================================================
# Configuration
# ============================================================================
ROCPROF_BIN="${CK_ROCPROF_BIN:-$(find_rocprof_bin || echo "")}"
VENV_PATH="${CK_PROFILE_VENV:-${PROJECT_ROOT}/.ck-rocprof-venv}"
WORKLOAD_DIR="${CK_WORKLOAD_DIR:-$(get_build_dir "${PROJECT_ROOT}")/workloads}"
ROCM_REQUIREMENTS="${CK_ROCM_REQUIREMENTS:-$(find_rocm_requirements "${ROCPROF_BIN}" || echo "")}"
# ============================================================================
# Helper functions
# ============================================================================
# Get file/directory size
get_size() {
local path="$1"
du -sh "$path" 2>/dev/null | cut -f1
}
# Get file modification date (cross-platform: Linux and macOS)
get_date() {
local path="$1"
# Try GNU stat first (Linux), fall back to BSD stat (macOS)
if stat --version >/dev/null 2>&1; then
stat -c %y "$path" 2>/dev/null | cut -d' ' -f1
else
stat -f %Sm -t %Y-%m-%d "$path" 2>/dev/null
fi
}
# Help message
show_help() {
cat << EOF
CK ROCProf Tool - Profile CK applications with rocprof-compute
Usage: ck-rocprof <command> [options]
Commands:
setup One-time setup: create Python venv and install dependencies
run <name> <executable> [args] Profile executable and save results as <name>
analyze <name> [block] Analyze profiling results (default: block 12 - LDS metrics)
compare <name1> <name2> Compare two profiling runs
list List available profiling runs
clean <name> Remove a profiling run (use --all for all runs)
status Show current configuration and status
help Show this help message
Examples:
ck-rocprof setup
ck-rocprof run baseline ./bin/tile_example_gemm_universal
ck-rocprof analyze baseline
ck-rocprof analyze baseline 12
ck-rocprof compare baseline optimized
ck-rocprof list
ck-rocprof clean baseline
ck-rocprof status
Environment Variables:
CK_GPU_TARGET - Override GPU detection (e.g., gfx950, MI300X)
CK_PROFILE_VENV - Python venv path (default: \$PROJECT/.ck-rocprof-venv)
CK_ROCPROF_BIN - rocprof-compute binary path
CK_ROCM_REQUIREMENTS - Path to rocprofiler-compute requirements.txt
CK_WORKLOAD_DIR - Workload storage directory
Profiling Blocks (use with 'analyze <name> <block>'):
Block 2: System Speed-of-Light (SOL)
Block 6: Shader Engine (SE) utilization
Block 7: L2 Cache metrics
Block 11: Vector L1D Cache metrics
Block 12: LDS (Local Data Share) - DEFAULT
Block 16: Instruction mix statistics
Block 17: Compute Unit (CU) metrics
LDS Metrics (Block 12):
- 12.1.3: Bank Conflict Rate (% of peak)
- 12.2.9: Bank Conflicts/Access (conflicts/access)
- 12.2.12: Bank Conflict (cycles per kernel)
- 12.2.17: LDS Data FIFO Full Rate (cycles)
Notes:
- Workload names must be alphanumeric with hyphens/underscores only
- Profiling skips roofline analysis (--no-roof) for faster execution
- Results stored in workloads/<name>/
- For Docker usage, run via: ck-docker exec ck-rocprof ...
EOF
}
# Get rocprof-compute wrapper path
get_rocprof_wrapper() {
echo "${VENV_PATH}/bin/rocprof-compute"
}
# Validate workload name to prevent path traversal and shell injection
# Allowed: alphanumeric, hyphens, underscores
validate_workload_name() {
local name="$1"
if [[ ! "$name" =~ ^[a-zA-Z0-9_-]+$ ]]; then
error "Invalid workload name: '$name'"
echo "Names must contain only letters, numbers, hyphens, and underscores"
return 1
fi
# Prevent reserved names
if [[ "$name" == "." || "$name" == ".." ]]; then
error "Invalid workload name: '$name'"
return 1
fi
return 0
}
# Check if setup is complete
is_setup_complete() {
local wrapper
wrapper=$(get_rocprof_wrapper)
[ -d "${VENV_PATH}" ] && [ -f "${wrapper}" ]
}
# ============================================================================
# Source installation
# ============================================================================
# rocprofiler-compute source installation location
ROCPROF_SOURCE_VERSION="3.4.0"
ROCPROF_SOURCE_DIR="${HOME}/.local/rocprofiler-compute/${ROCPROF_SOURCE_VERSION}"
ROCPROF_SOURCE_BIN="${ROCPROF_SOURCE_DIR}/bin/rocprof-compute"
ROCPROF_REPO_URL="https://github.com/ROCm/rocprofiler-compute.git"
ROCPROF_REPO_BRANCH="release/rocprofiler-compute-v${ROCPROF_SOURCE_VERSION}"
# Install rocprofiler-compute from source
install_from_source() {
local install_dir="${ROCPROF_SOURCE_DIR}"
local src_dir="${install_dir}/src"
info "Installing rocprofiler-compute ${ROCPROF_SOURCE_VERSION} from source..."
echo "Install location: ${install_dir}"
echo ""
# Ensure uv is available
if ! command -v uv &>/dev/null; then
info "Installing uv package manager via pip..."
if ! python3 -m pip install --user uv; then
error "Failed to install uv package manager"
return 1
fi
export PATH="${HOME}/.local/bin:${PATH}"
if ! command -v uv &>/dev/null; then
error "uv installed but not found in PATH"
return 1
fi
fi
# Create installation directory
mkdir -p "${install_dir}"
# Clone repository
if [ -d "${src_dir}" ]; then
info "Source already exists, updating..."
git -C "${src_dir}" fetch --quiet
git -C "${src_dir}" checkout --quiet "${ROCPROF_REPO_BRANCH}" 2>/dev/null || \
git -C "${src_dir}" checkout --quiet "amd-mainline"
else
info "Cloning rocprofiler-compute repository..."
if ! git clone --quiet --branch "${ROCPROF_REPO_BRANCH}" --depth 1 "${ROCPROF_REPO_URL}" "${src_dir}" 2>/dev/null; then
# Fall back to amd-mainline if release branch doesn't exist
info "Release branch not found, using amd-mainline..."
git clone --quiet --branch "amd-mainline" --depth 1 "${ROCPROF_REPO_URL}" "${src_dir}"
fi
fi
# Create venv for source installation
local venv_dir="${install_dir}/venv"
if [ ! -d "${venv_dir}" ]; then
info "Creating Python virtual environment..."
uv venv "${venv_dir}"
fi
# Install dependencies from requirements.txt
info "Installing dependencies (this may take a minute)..."
uv pip install --python "${venv_dir}/bin/python" -r "${src_dir}/requirements.txt" --quiet
# Pin pandas to avoid CSV conversion bug
uv pip install --python "${venv_dir}/bin/python" 'pandas<3.0' --quiet
# Create bin directory and wrapper script
mkdir -p "${install_dir}/bin"
cat > "${ROCPROF_SOURCE_BIN}" << 'WRAPPER_EOF'
#!/bin/bash
# rocprof-compute wrapper for source installation
INSTALL_DIR="$(cd "$(dirname "$0")/.." && pwd)"
SRC_DIR="${INSTALL_DIR}/src/src"
VENV_DIR="${INSTALL_DIR}/venv"
# Set PYTHONPATH to source directory for module imports
export PYTHONPATH="${SRC_DIR}:${PYTHONPATH}"
# Execute rocprof-compute script with venv Python
exec "${VENV_DIR}/bin/python3" "${SRC_DIR}/rocprof-compute" "$@"
WRAPPER_EOF
chmod +x "${ROCPROF_SOURCE_BIN}"
info "rocprofiler-compute installed successfully!"
echo " Binary: ${ROCPROF_SOURCE_BIN}"
echo ""
}
# ============================================================================
# Commands
# ============================================================================
# Setup: Create Python venv and install rocprof-compute dependencies
cmd_setup() {
echo "Setting up rocprof-compute profiling environment..."
echo "==========================================="
# Check if rocprof-compute exists, install from source if not
if [ -z "${ROCPROF_BIN}" ] || [ ! -f "${ROCPROF_BIN}" ]; then
warn "rocprof-compute not found in standard locations"
echo ""
echo "Searched locations:"
for bin in "${ROCPROF_CANDIDATES[@]}"; do
echo " - $bin"
done
echo ""
# Check if we can install from source
if ! command -v git &>/dev/null; then
error "git is required to install from source"
return 1
fi
if ! command -v python3 &>/dev/null; then
error "python3 is required to install from source"
return 1
fi
echo "Installing rocprofiler-compute from source..."
echo ""
if ! install_from_source; then
error "Failed to install rocprofiler-compute from source"
return 1
fi
# Update configuration with source installation
ROCPROF_BIN="${ROCPROF_SOURCE_BIN}"
ROCM_REQUIREMENTS="${ROCPROF_SOURCE_DIR}/libexec/rocprofiler-compute/requirements.txt"
fi
info "Using rocprof-compute: ${ROCPROF_BIN}"
# Check requirements file (only needed for non-source installs that use separate venv)
if [ -z "${ROCM_REQUIREMENTS}" ] || [ ! -f "${ROCM_REQUIREMENTS}" ]; then
# For source installs, requirements are bundled
if [[ "${ROCPROF_BIN}" == "${ROCPROF_SOURCE_BIN}" ]]; then
ROCM_REQUIREMENTS="${ROCPROF_SOURCE_DIR}/libexec/rocprofiler-compute/requirements.txt"
else
error "ROCm requirements file not found"
local expected_path
expected_path="$(dirname "$(dirname "${ROCPROF_BIN}")")/libexec/rocprofiler-compute/requirements.txt"
echo "Expected at: ${expected_path}"
echo "Set CK_ROCM_REQUIREMENTS to override"
return 1
fi
fi
# Check GPU access
if [ ! -r /dev/kfd ]; then
warn "No read access to /dev/kfd - GPU profiling may fail"
warn "Add user to video/render group: sudo usermod -a -G video,render \$USER"
fi
# For source installations, the venv is already set up - just create wrapper
if [[ "${ROCPROF_BIN}" == "${ROCPROF_SOURCE_BIN}" ]]; then
# Source install already has everything set up
local wrapper
wrapper=$(get_rocprof_wrapper)
mkdir -p "$(dirname "${wrapper}")"
# For source install, wrapper just calls the source binary
cat > "${wrapper}" << WRAPPER_EOF
#!/bin/bash
# rocprof-compute wrapper (using source installation)
exec "${ROCPROF_BIN}" "\$@"
WRAPPER_EOF
chmod +x "${wrapper}"
info "Wrapper created at ${wrapper}"
# Create marker file for venv directory
mkdir -p "${VENV_PATH}/bin"
touch "${VENV_PATH}/.source-install"
else
# System install - need to set up venv with dependencies
# Install uv if needed
if ! command -v uv &>/dev/null; then
info "Installing uv package manager via pip..."
if ! python3 -m pip install --user uv; then
error "Failed to install uv package manager"
return 1
fi
export PATH="${HOME}/.local/bin:${PATH}"
if ! command -v uv &>/dev/null; then
error "uv installed but not found in PATH"
echo "Try adding ~/.local/bin to your PATH"
return 1
fi
fi
# Create venv
if [ -d "${VENV_PATH}" ]; then
info "Python venv already exists at ${VENV_PATH}"
else
info "Creating Python venv at ${VENV_PATH}..."
uv venv "${VENV_PATH}"
fi
# Install dependencies
info "Installing dependencies..."
uv pip install --python "${VENV_PATH}/bin/python" -r "${ROCM_REQUIREMENTS}"
uv pip install --python "${VENV_PATH}/bin/python" 'pandas<3.0'
# Create wrapper script
local wrapper
wrapper=$(get_rocprof_wrapper)
mkdir -p "$(dirname "${wrapper}")"
cat > "${wrapper}" << WRAPPER_EOF
#!/bin/bash
# rocprof-compute wrapper using venv Python
VENV_DIR="\$(cd "\$(dirname "\$0")/.." && pwd)"
exec "\${VENV_DIR}/bin/python" "${ROCPROF_BIN}" "\$@"
WRAPPER_EOF
chmod +x "${wrapper}"
info "Wrapper created at ${wrapper}"
fi
# Create workload directory
mkdir -p "${WORKLOAD_DIR}"
info "Workload directory: ${WORKLOAD_DIR}"
echo ""
info "Setup complete! You can now use:"
echo " ck-rocprof run <name> <executable>"
}
# Detect GPU architecture
detect_gpu_arch() {
# Allow override via environment variable
if [ -n "${CK_GPU_TARGET:-}" ]; then
echo "${CK_GPU_TARGET}"
return 0
fi
if command -v rocminfo &>/dev/null; then
# Try marketing name first (MI350, MI300X)
local marketing_name
marketing_name=$(rocminfo 2>/dev/null | grep 'Marketing Name:' | grep -oE 'MI[0-9]+[A-Z]*' | head -1)
if [ -n "$marketing_name" ]; then
echo "$marketing_name"
return 0
fi
# Fallback to gfx name
local gfx_name
gfx_name=$(rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1)
if [ -n "$gfx_name" ]; then
echo "$gfx_name"
return 0
fi
fi
# Try existing workload directories
if [ -d "${WORKLOAD_DIR}" ]; then
local first_dir
first_dir=$(find "${WORKLOAD_DIR}" -maxdepth 2 -type d \( -name 'gfx*' -o -name 'MI*' \) 2>/dev/null | head -1)
if [ -n "$first_dir" ]; then
basename "$first_dir"
return 0
fi
fi
# Final fallback - use gfx950 consistent with common.sh
echo "gfx950"
}
# Run profiling
cmd_run() {
# Validate argument count before shifting
if [ $# -lt 2 ]; then
error "name and executable required"
echo "Usage: ck-rocprof run <name> <executable> [args]"
return 1
fi
local name="$1"
local executable="$2"
shift 2
local -a exe_args=("$@")
# Validate workload name (prevents path traversal)
if ! validate_workload_name "$name"; then
return 1
fi
# Check setup
if ! is_setup_complete; then
error "Profiling environment not set up"
echo "Run: ck-rocprof setup"
return 1
fi
# Check if executable exists
if [ ! -f "$executable" ]; then
error "Executable not found: $executable"
return 1
fi
local wrapper
wrapper=$(get_rocprof_wrapper)
local gpu_arch
gpu_arch=$(detect_gpu_arch)
echo "Profiling: $executable ${exe_args[*]}"
echo "Run name: $name"
echo "GPU arch: $gpu_arch"
echo "==========================================="
# Build command with proper escaping to prevent shell injection
# --no-roof skips roofline analysis to speed up profiling
local escaped_executable
escaped_executable=$(printf '%q' "$executable")
local escaped_workload_dir
escaped_workload_dir=$(printf '%q' "${WORKLOAD_DIR}/${name}")
local cmd="${wrapper} profile --no-roof --path ${escaped_workload_dir} --name ${name} -- ${escaped_executable}"
for arg in "${exe_args[@]}"; do
cmd="${cmd} $(printf '%q' "$arg")"
done
# Run profiling
bash -c "${cmd}"
echo ""
info "Profiling complete"
echo "Results saved to: ${WORKLOAD_DIR}/${name}/"
echo ""
echo "Analyze with: ck-rocprof analyze ${name}"
}
# Find workload path for a given run name
find_workload_path() {
local name="$1"
local run_dir="${WORKLOAD_DIR}/${name}"
if [ ! -d "$run_dir" ]; then
return 1
fi
# Check if profiling data exists
if [ -f "${run_dir}/pmc_perf.csv" ]; then
echo "$run_dir"
return 0
fi
return 1
}
# Analyze profiling results
cmd_analyze() {
local name="${1:-}"
local block="${2:-12}" # Default to block 12 (LDS metrics)
if [ -z "$name" ]; then
error "name required"
echo "Usage: ck-rocprof analyze <name> [block]"
return 1
fi
# Validate workload name (prevents path traversal)
if ! validate_workload_name "$name"; then
return 1
fi
# Check setup
if ! is_setup_complete; then
error "Profiling environment not set up"
echo "Run: ck-rocprof setup"
return 1
fi
local wrapper
wrapper=$(get_rocprof_wrapper)
local workload_path
workload_path=$(find_workload_path "${name}" || true)
if [ -z "$workload_path" ]; then
error "Profiling results not found for '${name}'"
echo ""
echo "Available runs:"
cmd_list
return 1
fi
echo "Analyzing: ${name} (Block ${block})"
echo "==========================================="
echo ""
"${wrapper}" analyze --path "${workload_path}" --block "${block}"
}
# Compare two profiling runs
cmd_compare() {
local name1="${1:-}"
local name2="${2:-}"
if [ -z "$name1" ] || [ -z "$name2" ]; then
error "two run names required"
echo "Usage: ck-rocprof compare <name1> <name2>"
return 1
fi
# Validate workload names (prevents path traversal)
if ! validate_workload_name "$name1"; then
return 1
fi
if ! validate_workload_name "$name2"; then
return 1
fi
# Check setup
if ! is_setup_complete; then
error "Profiling environment not set up"
echo "Run: ck-rocprof setup"
return 1
fi
# Verify both runs exist
local path1
path1=$(find_workload_path "${name1}" || true)
local path2
path2=$(find_workload_path "${name2}" || true)
if [ -z "$path1" ]; then
error "Profiling results not found for '${name1}'"
return 1
fi
if [ -z "$path2" ]; then
error "Profiling results not found for '${name2}'"
return 1
fi
echo "Comparing profiling runs:"
echo " Baseline: ${name1}"
echo " Optimized: ${name2}"
echo "==========================================="
echo ""
echo "=== ${name1} - Block 12 (LDS) ==="
cmd_analyze "${name1}" 12 2>/dev/null | head -40
echo ""
echo "=== ${name2} - Block 12 (LDS) ==="
cmd_analyze "${name2}" 12 2>/dev/null | head -40
echo ""
echo "==========================================="
echo "For detailed analysis, run:"
echo " ck-rocprof analyze ${name1} 12"
echo " ck-rocprof analyze ${name2} 12"
}
# List available profiling runs
cmd_list() {
if [ ! -d "${WORKLOAD_DIR}" ]; then
echo "No profiling runs found (workload directory doesn't exist)"
return 0
fi
local runs
runs=$(find "${WORKLOAD_DIR}" -maxdepth 1 -mindepth 1 -type d -exec basename {} \; 2>/dev/null | sort)
if [ -z "$runs" ]; then
echo "No profiling runs found in ${WORKLOAD_DIR}"
return 0
fi
echo "Available profiling runs:"
echo "==========================================="
while IFS= read -r run; do
local path
path=$(find_workload_path "$run" || true)
if [ -n "$path" ]; then
local size
size=$(get_size "$path")
local date
date=$(get_date "$path")
printf " %-25s [%s, %s]\n" "$run" "$size" "$date"
else
printf " %-25s [no data]\n" "$run"
fi
done <<< "$runs"
echo ""
echo "Analyze with: ck-rocprof analyze <name>"
}
# Clean (remove) profiling runs
cmd_clean() {
local name="${1:-}"
if [ -z "$name" ]; then
error "name required (or use --all to remove all runs)"
echo "Usage: ck-rocprof clean <name>"
echo " ck-rocprof clean --all"
return 1
fi
if [ "$name" = "--all" ]; then
# Remove all profiling runs
if [ ! -d "${WORKLOAD_DIR}" ]; then
echo "No profiling runs to clean"
return 0
fi
echo "This will remove ALL profiling runs in ${WORKLOAD_DIR}"
read -r -p "Are you sure? [y/N] " confirm
if [[ ! "$confirm" =~ ^[Yy]$ ]]; then
echo "Cancelled"
return 0
fi
rm -rf "${WORKLOAD_DIR:?}"/*
info "All profiling runs removed"
else
# Validate name
if ! validate_workload_name "$name"; then
return 1
fi
local run_dir="${WORKLOAD_DIR}/${name}"
if [ ! -d "$run_dir" ]; then
error "Profiling run not found: ${name}"
return 1
fi
rm -rf "${run_dir}"
info "Removed profiling run: ${name}"
fi
}
# Show status information
cmd_status() {
echo "CK ROCProf Status"
echo "==========================================="
echo ""
# rocprof-compute binary
if [ -n "${ROCPROF_BIN}" ] && [ -f "${ROCPROF_BIN}" ]; then
echo "rocprof-compute: ${ROCPROF_BIN}"
else
echo "rocprof-compute: not found"
fi
echo ""
# Paths
echo "Paths:"
echo " Venv: ${VENV_PATH}"
echo " Workloads: ${WORKLOAD_DIR}"
echo ""
# Setup status
echo "Setup status:"
if is_setup_complete; then
echo " Profiling environment: ready"
else
echo " Profiling environment: not configured (run 'ck-rocprof setup')"
fi
echo ""
# Workload count
if [ -d "${WORKLOAD_DIR}" ]; then
local count
count=$(find "${WORKLOAD_DIR}" -maxdepth 1 -mindepth 1 -type d 2>/dev/null | wc -l)
echo "Profiling runs: ${count}"
else
echo "Profiling runs: 0"
fi
}
# ============================================================================
# Main command dispatcher
# ============================================================================
case "${1:-}" in
setup)
cmd_setup
;;
run)
shift
cmd_run "$@"
;;
analyze)
shift
cmd_analyze "$@"
;;
compare)
shift
cmd_compare "$@"
;;
list)
cmd_list
;;
clean)
shift
cmd_clean "$@"
;;
status)
cmd_status
;;
help|--help|-h)
show_help
;;
*)
if [ -z "${1:-}" ]; then
show_help
else
echo "Unknown command: ${1}"
echo ""
show_help
exit 1
fi
;;
esac

script/tools/ck-rocprof.md

@@ -0,0 +1,167 @@
# CK ROCProf Tool
GPU performance profiling for Composable Kernel applications using AMD rocprof-compute.
**Note:** This is a native-only tool. For Docker usage, run it via `ck-docker exec ck-rocprof ...`.
## Quick Start
```bash
# One-time setup (requires rocprofiler-compute installed)
./script/tools/ck-rocprof setup
# Profile executable
cd build
../script/tools/ck-rocprof run baseline ./bin/tile_example_gemm_universal
# Analyze LDS metrics
../script/tools/ck-rocprof analyze baseline
# Compare optimizations
../script/tools/ck-rocprof run optimized ./bin/tile_example_gemm_universal
../script/tools/ck-rocprof compare baseline optimized
```
## Commands
### `setup`
One-time setup: creates Python venv, installs dependencies, configures rocprof-compute.
### `run <name> <executable> [args]`
Profile executable and save results.
```bash
# Basic profiling
ck-rocprof run baseline ./bin/gemm_example
# With arguments
ck-rocprof run large_matrix ./bin/gemm_example -m 8192 -n 8192 -k 4096
# Test filtering
ck-rocprof run unit_test ./bin/test_gemm --gtest_filter="*Fp16*"
```
### `analyze <name> [block]`
Display profiling metrics (default: Block 12 - LDS).
```bash
ck-rocprof analyze baseline # LDS metrics
ck-rocprof analyze baseline 2 # L2 Cache
ck-rocprof analyze baseline 7 # Instruction Mix
```
### `compare <name1> <name2>`
Side-by-side comparison of two runs.
### `list`
List all profiling runs with size and date.
### `clean <name>` / `clean --all`
Remove profiling runs. Use `--all` to remove all runs.
### `status`
Show current configuration: mode (native/Docker), paths, setup status.
## Key LDS Metrics (Block 12)
**Target Values:**
- Bank Conflicts/Access: <0.01 (1% conflict rate)
- LDS bandwidth: >90% of peak
**Critical Metrics:**
- **12.2.9 Bank Conflicts/Access**: Direct conflict measure
- Baseline (naive): ~0.04 (4% conflicts)
- Optimized: <0.005 (<0.5% conflicts)
- **12.2.12 Bank Conflict Cycles**: Wasted cycles per kernel
- **12.2.17 LDS Data FIFO Full**: Memory system pressure
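The same ratio can also be estimated straight from the raw counters in `pmc_perf.csv`. The sketch below uses a synthetic sample file; the counter column names (`SQ_LDS_BANK_CONFLICT`, `SQ_LDS_IDX_ACTIVE`) are assumptions and should be checked against the actual CSV header before use:

```bash
# Synthetic stand-in for pmc_perf.csv. The column names are assumptions -
# verify them against the header of your real profiling output.
printf 'KernelName,SQ_LDS_BANK_CONFLICT,SQ_LDS_IDX_ACTIVE\nk1,2,100\nk2,2,100\n' \
    > /tmp/pmc_perf_sample.csv

# Locate the two counter columns by name, sum them across all kernel rows,
# and report the aggregate conflicts-per-access ratio.
awk -F, '
NR == 1 { for (i = 1; i <= NF; i++) {
              if ($i == "SQ_LDS_BANK_CONFLICT") c = i
              if ($i == "SQ_LDS_IDX_ACTIVE")   a = i }
          next }
c && a  { conflicts += $c; accesses += $a }
END     { if (accesses > 0) printf "conflicts/access: %.4f\n", conflicts / accesses }
' /tmp/pmc_perf_sample.csv
```

On the sample data this prints `conflicts/access: 0.0200`, i.e. a 2% conflict rate, which by the targets above would flag the kernel for optimization.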
## Optimization Workflow
```bash
# 1. Baseline
ck-rocprof run baseline ./bin/my_kernel
# 2. Check conflicts
ck-rocprof analyze baseline
# Look for Bank Conflicts/Access > 0.02
# 3. Optimize code (XOR transforms, padding, etc.)
# ... edit source ...
# 4. Test optimization
ninja my_kernel
ck-rocprof run optimized ./bin/my_kernel
# 5. Verify improvement
ck-rocprof compare baseline optimized
# Target: 8-10x reduction in conflicts
```
## Environment Variables
- `CK_PROFILE_VENV`: Python venv path (default: `$PROJECT/.ck-rocprof-venv`)
- `CK_ROCPROF_BIN`: rocprof-compute binary path (auto-detected from PATH or /opt/rocm)
- `CK_ROCM_REQUIREMENTS`: Path to rocprofiler-compute requirements.txt (auto-detected)
- `CK_WORKLOAD_DIR`: Results directory (default: `$PROJECT/build/workloads`)
- `CK_GPU_TARGET`: Override GPU detection (e.g., `gfx950`, `MI300X`)
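The precedence the tool follows for `CK_GPU_TARGET` can be sketched as a small standalone function (`detect_target` is an illustrative name, not part of the tool's API):

```bash
# Illustrative sketch of the GPU detection order: explicit env override first,
# then rocminfo, then the hard-coded gfx950 fallback.
detect_target() {
    if [ -n "${CK_GPU_TARGET:-}" ]; then
        echo "${CK_GPU_TARGET}"      # 1. explicit override always wins
    elif command -v rocminfo &>/dev/null; then
        rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1   # 2. probe hardware
    else
        echo "gfx950"                # 3. fallback when no GPU stack is present
    fi
}

CK_GPU_TARGET=gfx942 detect_target   # prints gfx942 regardless of hardware
```

Setting the override per-invocation, as in the last line, is handy when profiling results were collected on a different machine.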
## Interpreting Results
**Good Performance:**
```
Bank Conflicts/Access: <0.01
LDS bandwidth: >90% of peak
LDS Data FIFO Full: Minimal cycles
```
**Needs Optimization:**
```
Bank Conflicts/Access: >0.02
Bank Conflict Cycles: High MAX values
LDS Data FIFO Full: High memory pressure
```
## Troubleshooting
**"Profiling environment not set up"**
```bash
ck-rocprof setup
```
**"rocprof-compute not found"**
```bash
export CK_ROCPROF_BIN=/custom/path/rocprof-compute
ck-rocprof setup
```
**"Profiling results not found"**
```bash
ck-rocprof list # Check available runs
rocminfo | grep gfx # Verify GPU arch
export CK_GPU_TARGET=gfx950 # Override if needed
```
## Storage Layout
Results stored in `workloads/<name>/`:
- `pmc_perf.csv`: Performance counters (primary data file)
- `perfmon/`: Input metric files
- `out/`: Raw output data from profiler runs
- `log.txt`: Profiling log
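A run counts as complete once its `pmc_perf.csv` exists, which is the same check `list` performs. A minimal sketch of that check (the default `build/workloads` path is assumed):

```bash
# Minimal sketch: print only the runs that have profiling data, mirroring
# the pmc_perf.csv existence check behind the list command.
WORKLOADS="${CK_WORKLOAD_DIR:-build/workloads}"
for d in "${WORKLOADS}"/*/; do
    [ -f "${d}pmc_perf.csv" ] && basename "${d%/}"
done
```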
## Technical Details
- **Setup**: Creates isolated Python venv, installs dependencies
- **Profiling**: Runs `rocprof-compute profile --no-roof --path <dir> --name <name> -- <executable>`
- **Analysis**: Runs `rocprof-compute analyze --path <path> --block <block>`
- **GPU Support**: MI300/MI350 series, auto-detects architecture
## Related Tools
- `ck-docker`: Container management
- `rocprof-compute`: AMD GPU profiler v2
- `rocm-smi`: System monitoring
## License
Copyright (c) Advanced Micro Devices, Inc. SPDX-License-Identifier: MIT


@@ -2,7 +2,8 @@
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK Test - Build and test Composable Kernel in Docker
# CK Test - Run Composable Kernel tests
# Environment-agnostic: works natively on ROCm hosts or inside containers
set -e
set -o pipefail
@@ -12,155 +13,219 @@ SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"
# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")
PROJECT_ROOT=$(find_project_root "${SCRIPT_DIR}" || get_project_root "${SCRIPT_DIR}")
BUILD_DIR=$(get_build_dir "${PROJECT_ROOT}")
# Help message
show_help() {
cat << EOF
CK Test - Build and test Composable Kernel in Docker
CK Test - Run Composable Kernel tests
Usage: ck-test [options] <test_name> [test_options]
Usage: ck-test [options] [test_name] [-- gtest_options]
Options:
-h, --help Show this help message
--name <name> Specify container name
--reconfigure Reconfigure CMake before building
--build-dir <dir> Build directory (default: ./build)
--no-build Skip building, run test directly
--list List available tests
--smoke Run all smoke tests (via CTest -L SMOKE_TEST)
--regression Run all regression tests (via CTest -L REGRESSION_TEST)
--all Run all tests (via CTest)
--filter <pattern> Shorthand for --gtest_filter=<pattern>
Arguments:
test_name Name of test executable (required)
test_options Additional options passed to test (e.g., --gtest_filter=*)
test_name Name of test executable (optional for --smoke/--regression/--all)
gtest_options Additional options passed to test (after --)
Environment:
CK_CONTAINER_NAME - Override default container name
GPU_TARGET - Override GPU target detection (e.g., gfx950, gfx942)
CK_BUILD_DIR - Override build directory
Examples:
ck-test test_amdgcn_mma
ck-test test_amdgcn_mma --gtest_filter=*Fp16*
ck-test --name my_container test_amdgcn_mma
ck-test --reconfigure test_amdgcn_mma
ck-test test_amdgcn_mma # Build and run specific test
ck-test test_amdgcn_mma --filter '*Fp16*' # Run with gtest filter
ck-test test_amdgcn_mma -- --gtest_filter=*Fp16* # Explicit gtest options
ck-test --no-build test_amdgcn_mma # Run without rebuilding
ck-test --list # List available tests
ck-test --smoke # Run all smoke tests
ck-test --regression # Run all regression tests
ck-test --all # Run all tests
EOF
}
# Parse arguments
test_name=""
reconfigure=false
no_build=false
test_options=()
list_tests=false
run_smoke=false
run_regression=false
run_all=false
gtest_filter=""
gtest_options=()
parsing_gtest=false
while [[ $# -gt 0 ]]; do
if [ "$parsing_gtest" = true ]; then
gtest_options+=("$1")
shift
continue
fi
case $1 in
-h|--help)
show_help
exit 0
;;
--name)
CONTAINER_NAME="$2"
--build-dir)
require_arg "$1" "${2:-}"
BUILD_DIR="$2"
shift 2
;;
--reconfigure)
reconfigure=true
shift
;;
--no-build)
no_build=true
shift
;;
--gtest_*|--help)
test_options+=("$1")
--list)
list_tests=true
shift
;;
--smoke)
run_smoke=true
shift
;;
--regression)
run_regression=true
shift
;;
--all)
run_all=true
shift
;;
--filter)
require_arg "$1" "${2:-}"
gtest_filter="$2"
shift 2
;;
--)
parsing_gtest=true
shift
;;
--gtest_*)
gtest_options+=("$1")
shift
;;
*)
if [ -z "$test_name" ]; then
test_name="$1"
else
test_options+=("$1")
gtest_options+=("$1")
fi
shift
;;
esac
done
# Validate test name
# Add filter to gtest options if specified
if [ -n "$gtest_filter" ]; then
gtest_options+=("--gtest_filter=${gtest_filter}")
fi
# Validate mutual exclusivity of test suite options
suite_count=0
[ "$run_smoke" = true ] && suite_count=$((suite_count + 1))
[ "$run_regression" = true ] && suite_count=$((suite_count + 1))
[ "$run_all" = true ] && suite_count=$((suite_count + 1))
if [ "$suite_count" -gt 1 ]; then
error "Options --smoke, --regression, and --all are mutually exclusive"
exit 1
fi
# Check build is configured
if ! is_build_configured "${BUILD_DIR}"; then
error "Build not configured. Run 'ck-configure' first"
exit 1
fi
# Handle --list
if [ "$list_tests" = true ]; then
info "Available tests:"
if [ -d "${BUILD_DIR}/bin" ]; then
ls -1 "${BUILD_DIR}/bin/" 2>/dev/null | grep -E '^test_' | sort || echo " (No test binaries found)"
else
echo " (No bin directory found)"
fi
echo ""
echo "CTest labels:"
cd "${BUILD_DIR}"
ctest -N 2>/dev/null | head -20 || echo " (Run 'ctest -N' for full list)"
exit 0
fi
# Handle CTest-based test suites
if [ "$run_smoke" = true ] || [ "$run_regression" = true ] || [ "$run_all" = true ]; then
cd "${BUILD_DIR}"
ctest_cmd=(ctest --output-on-failure)
if [ "$run_smoke" = true ]; then
ctest_cmd+=(-L SMOKE_TEST)
info "Running smoke tests..."
elif [ "$run_regression" = true ]; then
ctest_cmd+=(-L REGRESSION_TEST)
info "Running regression tests..."
else
info "Running all tests..."
fi
exit_code=0
"${ctest_cmd[@]}" || exit_code=$?
echo ""
if [ $exit_code -eq 0 ]; then
info "Tests completed successfully"
else
error "Tests failed with exit code: ${exit_code}"
fi
exit $exit_code
fi
# Validate test name for individual test runs
if [ -z "$test_name" ]; then
echo "Error: test_name required"
error "test_name required (or use --smoke/--regression/--all for test suites)"
echo ""
show_help
exit 1
fi
# Ensure container is running
if ! container_is_running "${CONTAINER_NAME}"; then
echo "Container '${CONTAINER_NAME}' not running. Starting..."
"${SCRIPT_DIR}/ck-start" "${CONTAINER_NAME}"
echo ""
fi
# Configure CMake if needed or requested
if [ "$reconfigure" = true ] || ! docker exec "${CONTAINER_NAME}" test -f /workspace/build/build.ninja 2>/dev/null; then
echo "Detecting GPU target..."
GPU_TARGET_DETECTED=$(detect_gpu_target "${CONTAINER_NAME}")
if [ "$reconfigure" = true ]; then
echo "Reconfiguring CMake from scratch for GPU target: ${GPU_TARGET_DETECTED}"
else
echo "Configuring build with CMake for GPU target: ${GPU_TARGET_DETECTED}"
fi
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace || exit 1
rm -rf /workspace/build
mkdir /workspace/build
cd /workspace/build || exit 1
cmake .. -GNinja \
-DGPU_TARGETS=${GPU_TARGET_DETECTED} \
-DCMAKE_BUILD_TYPE=Release \
-DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
-DBUILD_TESTING=ON 2>&1 | tail -30
"
echo ""
fi
# Build test if needed (unless --no-build is specified)
if [ "$no_build" = false ]; then
if ! docker exec "${CONTAINER_NAME}" test -f "/workspace/build/bin/${test_name}" 2>/dev/null; then
echo "Building ${test_name}..."
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace/build || exit 1
ninja ${test_name} 2>&1
"
echo ""
else
echo "Test executable found, rebuilding to ensure latest version..."
docker exec "${CONTAINER_NAME}" bash -c "
cd /workspace/build || exit 1
ninja ${test_name} 2>&1
"
echo ""
fi
info "Building ${test_name}..."
"${SCRIPT_DIR}/ck-build" --build-dir "${BUILD_DIR}" "${test_name}"
echo ""
fi
# Verify test executable exists
test_binary="${BUILD_DIR}/bin/${test_name}"
if [ ! -f "$test_binary" ]; then
error "Test executable not found: ${test_binary}"
echo "Run 'ck-build ${test_name}' first"
exit 1
fi
# Run test
echo "Running: ${test_name} ${test_options[*]}"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Running: ${test_name} ${gtest_options[*]}"
echo "---"
# Build the command with proper quoting
cmd="cd /workspace/build && ./bin/${test_name}"
for opt in "${test_options[@]}"; do
cmd="${cmd} $(printf '%q' "$opt")"
done
docker exec "${CONTAINER_NAME}" bash -c "${cmd}"
cd "${BUILD_DIR}"
exit_code=0
"./bin/${test_name}" "${gtest_options[@]}" || exit_code=$?
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "---"
if [ $exit_code -eq 0 ]; then
echo "Test completed successfully"
info "Test completed successfully"
else
echo "Test failed with exit code: ${exit_code}"
error "Test failed with exit code: ${exit_code}"
fi
exit $exit_code


@@ -74,14 +74,14 @@ container_is_running() {
detect_gpu_target() {
local container="$1"
# Allow override via GPU_TARGET environment variable
if [ -n "${GPU_TARGET:-}" ]; then
echo "${GPU_TARGET}"
# Allow override via CK_GPU_TARGET environment variable
if [ -n "${CK_GPU_TARGET:-}" ]; then
echo "${CK_GPU_TARGET}"
return 0
fi
docker exec "${container}" bash -c "
rocminfo 2>/dev/null | grep -oP 'gfx[0-9a-z]+' | head -1 || echo 'gfx950'
rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1 || echo 'gfx950'
" | tr -d '\r\n'
}
@@ -95,3 +95,87 @@ ensure_container_running() {
"${script_dir}/ck-docker" start "${container}"
fi
}
# ============================================================================
# Native (non-Docker) utilities
# ============================================================================
# Output utilities
info() { echo "[info] $*"; }
warn() { echo "[warn] $*" >&2; }
error() { echo "[error] $*" >&2; }
# Require argument for option (validates $2 exists and is not another flag)
require_arg() {
local option="$1"
local value="$2"
if [ -z "$value" ] || [[ "$value" == -* ]]; then
error "Option $option requires an argument"
exit 1
fi
}
# Native GPU detection (no Docker required)
detect_gpu_native() {
# Allow override via CK_GPU_TARGET environment variable
if [ -n "${CK_GPU_TARGET:-}" ]; then
echo "${CK_GPU_TARGET}"
return 0
fi
# Try rocminfo if available
if command -v rocminfo &>/dev/null; then
local gpu
gpu=$(rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1)
if [ -n "$gpu" ]; then
echo "$gpu"
return 0
fi
fi
# Fallback
echo "gfx950"
}
# Get build directory (respects CK_BUILD_DIR env var)
get_build_dir() {
local project_root="${1:-$(get_project_root "$(dirname "${BASH_SOURCE[0]}")")}"
echo "${CK_BUILD_DIR:-${project_root}/build}"
}
# Check if build is configured (build.ninja exists)
is_build_configured() {
local build_dir="${1:-$(get_build_dir)}"
[ -f "${build_dir}/build.ninja" ]
}
# Find project root from any subdirectory (walks up to find .git)
find_project_root() {
local dir="${1:-$(pwd)}"
while [ "$dir" != "/" ]; do
if [ -d "$dir/.git" ]; then
echo "$dir"
return 0
fi
dir=$(dirname "$dir")
done
return 1
}
# List available CMake presets
list_cmake_presets() {
local project_root="${1:-$(find_project_root)}"
local presets_file="${project_root}/CMakePresets.json"
if [ ! -f "$presets_file" ]; then
return 1
fi
# Extract non-hidden preset names
if command -v jq &>/dev/null; then
jq -r '.configurePresets[] | select(.hidden != true) | .name' "$presets_file" 2>/dev/null
else
# Fallback: sed-based extraction (more portable than grep -P)
sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$presets_file" | grep -v '^use-'
fi
}