composable_kernel/script/tools/common.sh
Max Podkorytov 83b6155354 Add ck-rocprof: GPU profiling tool for rocprof-compute (#3627)
* Decouple configure/build/test tools from Docker

Create a two-layer tool architecture:
- Core tools (ck-configure, ck-build, ck-test): Environment-agnostic,
  work on any system with ROCm - no Docker dependency
- Container tools (ck-docker): Manage Docker containers and delegate
  to core tools via docker exec

Changes:
- Add ck-configure: New CMake configuration tool with preset support,
  native GPU detection, and flexible options
- Refactor ck-build: Remove Docker dependency, add --configure and
  --list options, call ninja directly
- Refactor ck-test: Remove Docker dependency, add CTest integration
  with --smoke/--regression/--all options
- Enhance common.sh: Add native GPU detection, build directory utils,
  and output helpers
- Update ck-docker: Add configure/build/test/exec commands that
  delegate to core tools inside container

This enables:
- Native development on ROCm hosts without Docker
- Simpler CI/CD integration
- Consistent behavior inside and outside containers

Co-Authored-By: Claude <noreply@anthropic.com>

* Add ck-rocprof: GPU profiling tool for rocprof-compute

Adds a command-line profiling tool that simplifies the GPU performance
analysis workflow using AMD rocprof-compute.

Features:
- Easy setup with automatic Python venv configuration
- Simple CLI: setup, run, analyze, compare, list
- Automatic GPU architecture detection
- Focus on LDS metrics (Block 12) for bank conflict analysis
- Comprehensive documentation with examples and troubleshooting

Usage:
  ck-rocprof setup                    # One-time environment setup
  ck-rocprof run <name> <executable>  # Profile executable
  ck-rocprof analyze <name> [block]   # Analyze metrics
  ck-rocprof compare <name1> <name2>  # Compare two runs
  ck-rocprof list                     # List available runs

* Make ck-rocprof documentation concise and improve Docker integration

- Streamlined documentation from 416 to 157 lines (62% reduction)
- Focused on essential commands, metrics, and workflows
- Enhanced script to run all operations inside Docker containers
- Fixed workload directory path and improved container management
- Added automatic rocprofiler-compute installation and dependency handling

* Add --no-roof flag to ck-rocprof profile command

Skip roofline analysis by default to speed up profiling. Roofline
analysis can add significant time to profiling runs but is not
needed for most LDS bank conflict analysis workflows.

* Make ck-rocprof work independently of Docker

Add native execution mode that runs rocprof-compute directly on the host
system when available, falling back to Docker mode when not.

Key changes:
- Auto-detect native mode when rocprof-compute is in PATH or common locations
- Add execution mode wrappers (exec_cmd, file_exists, dir_exists, etc.)
- Native mode stores venv at .ck-rocprof-venv in project root
- Native mode stores workloads at build/workloads/
- Support user-installed rocprofiler-compute (e.g., ~/.local/rocprofiler-compute)
- Add CK_FORCE_DOCKER env var to force Docker mode
- Update help message to show current execution mode
- Maintain full backward compatibility with existing Docker workflow
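The auto-detection described above can be sketched roughly as follows. This is an illustrative sketch, not the script's actual code; the candidate paths and the `find_rocprof` name are assumptions (the real list lives in `ROCPROF_CANDIDATES`):

```shell
#!/bin/bash
# Sketch: locate rocprof-compute, preferring PATH, then common install
# locations. Prints the first match found; fails if none exists.
find_rocprof() {
    # Illustrative candidate locations; the real list may differ.
    local candidates=(
        "$HOME/.local/rocprofiler-compute/rocprof-compute"
        "/opt/rocm/bin/rocprof-compute"
    )
    if command -v rocprof-compute >/dev/null 2>&1; then
        command -v rocprof-compute
        return 0
    fi
    local c
    for c in "${candidates[@]}"; do
        if [ -x "$c" ]; then
            echo "$c"
            return 0
        fi
    done
    return 1
}
```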

Tested successfully with rocprofiler-compute 3.4.0 installed from source
on MI300X GPU in native mode.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add clean/status commands and improve ck-rocprof robustness

- Add 'clean' command to remove profiling runs (supports --all)
- Add 'status' command to show configuration and environment info
- Add workload name validation to prevent path traversal attacks
- Fix uv installation to use pip instead of curl for reliability
- Add cross-platform stat support for macOS compatibility
- Consolidate ROCPROF_CANDIDATES to avoid code duplication
- Expand help documentation with all profiling block descriptions
- Fix Docker wrapper script escaping issues
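The path-traversal guard mentioned above could look roughly like this. A hedged sketch: `validate_workload_name` is a hypothetical name, and the exact accepted character set in ck-rocprof may differ:

```shell
#!/bin/bash
# Sketch: reject workload names that could escape the workloads/
# directory. Only plain names of letters, digits, dot, underscore, and
# hyphen pass; slashes, "..", "." and empty names are rejected.
validate_workload_name() {
    local name="$1"
    case "$name" in
        ""|"."|*/*|*..*) return 1 ;;   # empty, current dir, separators, traversal
    esac
    [[ "$name" =~ ^[A-Za-z0-9._-]+$ ]]
}
```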

Co-Authored-By: Claude <noreply@anthropic.com>

* Fix analyze command to use correct workload path

rocprof-compute stores results directly in the workload directory
(pmc_perf.csv) rather than in a GPU architecture subdirectory.
Updated find_workload_path to detect this correctly.
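The corrected lookup can be sketched as below. This is an assumption-laden illustration: the function body is reconstructed from the description, and the `gfx*` subdirectory probe is kept only as a fallback:

```shell
#!/bin/bash
# Sketch: locate a workload's results. rocprof-compute writes
# pmc_perf.csv directly into the workload directory, so check there
# first, then fall back to probing a gfx* architecture subdirectory.
find_workload_path() {
    local workload_dir="$1"
    if [ -f "${workload_dir}/pmc_perf.csv" ]; then
        echo "${workload_dir}"
        return 0
    fi
    local sub
    for sub in "${workload_dir}"/gfx*/; do
        if [ -f "${sub}pmc_perf.csv" ]; then
            echo "${sub%/}"
            return 0
        fi
    done
    return 1
}
```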

Co-Authored-By: Claude <noreply@anthropic.com>

* Address PR review security and robustness issues

Security fixes:
- Escape executable path in cmd_run to prevent shell injection
- Add workload name validation to cmd_analyze and cmd_compare
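The escaping fix can be illustrated with `printf '%q'`, which quotes a string so it survives one round of shell evaluation. A sketch only; the actual variable names and command construction in `cmd_run` may differ:

```shell
#!/bin/bash
# Sketch: quote an arbitrary executable path before embedding it in a
# shell command line, so spaces and metacharacters are not re-evaluated.
exe='/tmp/my app;rm -rf'       # hostile example path
quoted=$(printf '%q' "$exe")   # shell-safe quoted form
# The quoted form now round-trips safely through bash -c:
bash -c "printf '%s\n' $quoted"
```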

Robustness improvements:
- Add error checking for uv package manager installation
- Use consistent project root detection (find_project_root || get_project_root)
- Use /opt/rocm instead of hardcoded /opt/rocm-7.0.1 in Docker mode
- Derive ROCM_REQUIREMENTS path from ROCPROF_BIN for flexibility
- Use gfx950 as fallback GPU consistent with common.sh

Documentation updates:
- Fix env var name GPU_TARGET -> CK_GPU_TARGET
- Update storage layout to reflect current structure (workloads/<name>/)
- Document clean and status commands
- Clarify native vs Docker default paths

Co-Authored-By: Claude <noreply@anthropic.com>

* Simplify ck-rocprof to native-only mode

Remove Docker mode from ck-rocprof. Docker users should run the tool
via `ck-docker exec ck-rocprof ...` instead.

This simplification:
- Removes ~210 lines of Docker-specific code
- Eliminates mode detection complexity
- Makes the script easier to maintain
- Provides clearer error messages when rocprof-compute is not found

The setup command now lists all searched locations when rocprof-compute
is not found, helping users understand how to install it.

Co-Authored-By: Claude <noreply@anthropic.com>

* Add rocprofiler-compute source installation fallback

When rocprof-compute is not found in system locations, automatically
install rocprofiler-compute 3.4.0 from source as a fallback. This
eliminates the hard dependency on system ROCm packages.

Implementation details:
- Clone rocprofiler-compute from GitHub to ~/.local/
- Install dependencies via requirements.txt (not editable install)
- Create wrapper that sets PYTHONPATH to source directory
- Execute source script directly rather than importing as module

This approach matches the project's development workflow and works
around the incomplete pyproject.toml that prevents editable installs.
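The wrapper described above might be generated roughly like this. Everything here is a sketch: the entry-point filename inside the rocprofiler-compute checkout is an assumption, and a temp prefix stands in for the real `~/.local` install location so the sketch is safe to run:

```shell
#!/bin/bash
# Sketch: generate a wrapper that runs rocprof-compute from a source
# checkout by pointing PYTHONPATH at the source tree.
# PREFIX is a temp stand-in for ~/.local; the entry-point script name
# ("rocprof-compute" at the checkout root) is an assumption.
PREFIX=$(mktemp -d)
SRC_DIR="${PREFIX}/rocprofiler-compute"
mkdir -p "${SRC_DIR}"
WRAPPER="${PREFIX}/bin/rocprof-compute"
mkdir -p "$(dirname "${WRAPPER}")"
cat > "${WRAPPER}" <<EOF
#!/bin/bash
export PYTHONPATH="${SRC_DIR}:\${PYTHONPATH}"
exec python3 "${SRC_DIR}/rocprof-compute" "\$@"
EOF
chmod +x "${WRAPPER}"
```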

Co-Authored-By: Claude <noreply@anthropic.com>

---------

Co-authored-by: Claude <noreply@anthropic.com>
2026-01-29 17:20:22 -08:00


#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# Common utilities for CK Docker tools
# Shared configuration and helper functions
# Find project root (where .git directory is)
get_project_root() {
    local script_dir="$1"
    cd "${script_dir}/../.." && pwd
}

# Detect git branch and sanitize for Docker naming
get_sanitized_branch() {
    local project_root="$1"
    local branch
    branch=$(cd "${project_root}" && git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '_' | tr -cd 'a-zA-Z0-9_-' || echo "")
    branch=${branch:-unknown}
    # Handle detached HEAD state
    if [ "${branch}" = "HEAD" ]; then
        branch="detached"
    fi
    echo "${branch}"
}

# Get username with fallback
get_username() {
    echo "${USER:-$(whoami 2>/dev/null || echo "user")}"
}

# Generate default container name: ck_<username>_<branch>
get_default_container_name() {
    local project_root="$1"
    local user_name
    local git_branch
    user_name=$(get_username)
    git_branch=$(get_sanitized_branch "${project_root}")
    echo "ck_${user_name}_${git_branch}"
}

# Get container name (respects CK_CONTAINER_NAME env var)
get_container_name() {
    local project_root="$1"
    local default_name
    default_name=$(get_default_container_name "${project_root}")
    echo "${CK_CONTAINER_NAME:-${default_name}}"
}

# Get Docker image (respects CK_DOCKER_IMAGE env var)
get_docker_image() {
    echo "${CK_DOCKER_IMAGE:-rocm/composable_kernel:ck_ub24.04_rocm7.0.1}"
}

# Check if container exists (exact match)
container_exists() {
    local name="$1"
    docker ps -a --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}

# Check if container is running (exact match)
container_is_running() {
    local name="$1"
    docker ps --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}

# Detect GPU target in container
detect_gpu_target() {
    local container="$1"
    # Allow override via CK_GPU_TARGET environment variable
    if [ -n "${CK_GPU_TARGET:-}" ]; then
        echo "${CK_GPU_TARGET}"
        return 0
    fi
    docker exec "${container}" bash -c "
        rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1 || echo 'gfx950'
    " | tr -d '\r\n'
}

# Ensure container is running, start if needed
ensure_container_running() {
    local container="$1"
    local script_dir="$2"
    if ! container_is_running "${container}"; then
        echo "Container '${container}' not running. Starting with ck-docker..."
        "${script_dir}/ck-docker" start "${container}"
    fi
}

# ============================================================================
# Native (non-Docker) utilities
# ============================================================================

# Output utilities
info() { echo "[info] $*"; }
warn() { echo "[warn] $*" >&2; }
error() { echo "[error] $*" >&2; }

# Require argument for option (validates $2 exists and is not another flag)
require_arg() {
    local option="$1"
    local value="$2"
    if [ -z "$value" ] || [[ "$value" == -* ]]; then
        error "Option $option requires an argument"
        exit 1
    fi
}

# Native GPU detection (no Docker required)
detect_gpu_native() {
    # Allow override via CK_GPU_TARGET environment variable
    if [ -n "${CK_GPU_TARGET:-}" ]; then
        echo "${CK_GPU_TARGET}"
        return 0
    fi
    # Try rocminfo if available
    if command -v rocminfo &>/dev/null; then
        local gpu
        gpu=$(rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1)
        if [ -n "$gpu" ]; then
            echo "$gpu"
            return 0
        fi
    fi
    # Fallback
    echo "gfx950"
}

# Get build directory (respects CK_BUILD_DIR env var)
get_build_dir() {
    local project_root="${1:-$(get_project_root "$(dirname "${BASH_SOURCE[0]}")")}"
    echo "${CK_BUILD_DIR:-${project_root}/build}"
}

# Check if build is configured (build.ninja exists)
is_build_configured() {
    local build_dir="${1:-$(get_build_dir)}"
    [ -f "${build_dir}/build.ninja" ]
}

# Find project root from any subdirectory (walks up to find .git)
find_project_root() {
    local dir="${1:-$(pwd)}"
    while [ "$dir" != "/" ]; do
        if [ -d "$dir/.git" ]; then
            echo "$dir"
            return 0
        fi
        dir=$(dirname "$dir")
    done
    return 1
}

# List available CMake presets
list_cmake_presets() {
    local project_root="${1:-$(find_project_root)}"
    local presets_file="${project_root}/CMakePresets.json"
    if [ ! -f "$presets_file" ]; then
        return 1
    fi
    # Extract non-hidden preset names
    if command -v jq &>/dev/null; then
        jq -r '.configurePresets[] | select(.hidden != true) | .name' "$presets_file" 2>/dev/null
    else
        # Fallback: sed-based extraction (more portable than grep -P)
        sed -n 's/.*"name"[[:space:]]*:[[:space:]]*"\([^"]*\)".*/\1/p' "$presets_file" | grep -v '^use-'
    fi
}
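As a quick illustration of the override behavior, the snippet below inlines `detect_gpu_native` from the file above and exercises the `CK_GPU_TARGET` path, which needs no GPU or `rocminfo` to run:

```shell
#!/bin/bash
# Inlined copy of detect_gpu_native from common.sh above, exercised
# with the CK_GPU_TARGET override.
detect_gpu_native() {
    # Allow override via CK_GPU_TARGET environment variable
    if [ -n "${CK_GPU_TARGET:-}" ]; then
        echo "${CK_GPU_TARGET}"
        return 0
    fi
    # Try rocminfo if available
    if command -v rocminfo &>/dev/null; then
        local gpu
        gpu=$(rocminfo 2>/dev/null | grep -oE 'gfx[0-9a-z]+' | head -1)
        if [ -n "$gpu" ]; then
            echo "$gpu"
            return 0
        fi
    fi
    echo "gfx950"   # fallback, matching common.sh above
}

CK_GPU_TARGET=gfx942
detect_gpu_native   # prints gfx942
```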