composable_kernel/script/tools/ck-rocprof.md
2026-01-30 02:36:23 +00:00


CK ROCProf Tool

GPU performance profiling for Composable Kernel applications using AMD rocprof-compute.

Note: This is a native-only tool. To use it from Docker, run it through ck-docker exec ck-rocprof ...

Quick Start

# One-time setup (requires rocprofiler-compute to be installed)
./script/tools/ck-rocprof setup

# Profile executable
cd build
../script/tools/ck-rocprof run baseline ./bin/tile_example_gemm_universal

# Analyze LDS metrics
../script/tools/ck-rocprof analyze baseline

# Rebuild with your changes, profile again, and compare
../script/tools/ck-rocprof run optimized ./bin/tile_example_gemm_universal
../script/tools/ck-rocprof compare baseline optimized

Commands

setup

One-time setup: creates a Python venv, installs the rocprofiler-compute Python dependencies, and configures rocprof-compute.

run <name> <executable> [args]

Profile an executable (with optional arguments) and save the results under <name>.

# Basic profiling
ck-rocprof run baseline ./bin/gemm_example

# With arguments
ck-rocprof run large_matrix ./bin/gemm_example -m 8192 -n 8192 -k 4096

# Profile only the GoogleTest cases matching a filter
ck-rocprof run unit_test ./bin/test_gemm --gtest_filter="*Fp16*"
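To profile the same executable at several problem sizes, run can be called in a loop with one run name per size. This is a hypothetical helper, not part of ck-rocprof: the function name profile_sizes, the CK_ROCPROF_CMD override, and the gemm_m<size> naming scheme are assumptions of this sketch, and the -m/-n/-k flags are simply the ones used in the example above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: profile one executable at several square problem sizes,
# giving each run its own name so any two runs can be compared later.
# CK_ROCPROF_CMD is an assumption of this sketch (not a ck-rocprof variable);
# it defaults to the in-tree script, relative to the build directory.
profile_sizes() {
  local exe="$1" k="$2"
  shift 2
  local cmd="${CK_ROCPROF_CMD:-../script/tools/ck-rocprof}"
  local m
  for m in "$@"; do
    # Run name encodes the size, e.g. gemm_m4096 (illustrative naming scheme).
    "$cmd" run "gemm_m${m}" "$exe" -m "$m" -n "$m" -k "$k" || return 1
  done
}

# Usage (from the build directory): K=4096, M=N in {2048, 8192}
#   profile_sizes ./bin/gemm_example 4096 2048 8192
```

Each size gets its own workload directory, so any pair can then be passed to compare.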

analyze <name> [block]

Display the metrics of one analysis block (default: Block 12, LDS).

ck-rocprof analyze baseline        # LDS metrics
ck-rocprof analyze baseline 17     # L2 Cache
ck-rocprof analyze baseline 10     # Instruction Mix
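analyze takes one block per call. A hypothetical wrapper can print several blocks for the same run in sequence; analyze_blocks and the CK_ROCPROF_CMD override are assumptions of this sketch, not part of ck-rocprof.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print several analysis blocks for one run back to back.
# CK_ROCPROF_CMD is an assumption of this sketch; it defaults to the in-tree script.
analyze_blocks() {
  local name="$1"
  shift
  local cmd="${CK_ROCPROF_CMD:-../script/tools/ck-rocprof}"
  local block
  for block in "$@"; do
    # Header line so each block's output is easy to find.
    printf '=== %s: block %s ===\n' "$name" "$block"
    "$cmd" analyze "$name" "$block" || return 1
  done
}

# Usage: LDS first, then any other block IDs as extra arguments.
#   analyze_blocks baseline 12 <block> ...
```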

compare <name1> <name2>

Side-by-side comparison of two runs.

list

List all profiling runs with size and date.

clean <name> / clean --all

Remove the named profiling run, or all runs with --all.

status

Show current configuration: mode (native/Docker), paths, setup status.

Key LDS Metrics (Block 12)

Target Values:

  • Bank Conflicts/Access: <0.01 (under 1% conflict rate)
  • LDS bandwidth: >90% of peak

Critical Metrics:

  • 12.2.9 Bank Conflicts/Access: Direct measure of bank conflicts
    • Baseline (naive): ~0.04 (4% conflicts)
    • Optimized: <0.005 (<0.5% conflicts)
  • 12.2.12 Bank Conflict Cycles: Cycles lost to bank conflicts per kernel
  • 12.2.17 LDS Data FIFO Full: Indicator of memory-system pressure

Optimization Workflow

# 1. Baseline
ck-rocprof run baseline ./bin/my_kernel

# 2. Check conflicts
ck-rocprof analyze baseline
# Look for Bank Conflicts/Access > 0.02

# 3. Optimize code (XOR transforms, padding, etc.)
# ... edit source ...

# 4. Rebuild (from the build directory) and profile again
ninja my_kernel
ck-rocprof run optimized ./bin/my_kernel

# 5. Verify improvement
ck-rocprof compare baseline optimized
# Target: 8-10x reduction in conflicts
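The 8-10x target can be checked by dividing the baseline Bank Conflicts/Access value by the optimized one. With the typical values from Key LDS Metrics, 0.04 / 0.005 = 8, the low end of the target. A minimal sketch, assuming the two values are copied by hand from the analyze output (conflict_reduction is a hypothetical helper; it does not parse rocprof-compute output):

```shell
#!/usr/bin/env bash
# Hypothetical helper: reduction factor between two Bank Conflicts/Access
# readings (metric 12.2.9), computed as baseline / optimized.
conflict_reduction() {
  awk -v base="$1" -v opt="$2" 'BEGIN {
    b = base + 0; o = opt + 0          # force numeric interpretation
    if (o <= 0) { print "inf"; exit }  # no conflicts left after optimizing
    printf "%.1f\n", b / o
  }'
}

# Usage:
#   conflict_reduction 0.04 0.005   # prints 8.0
```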

Environment Variables

  • CK_PROFILE_VENV: Python venv path (default: $PROJECT/.ck-rocprof-venv)
  • CK_ROCPROF_BIN: rocprof-compute binary path (auto-detected from PATH or /opt/rocm)
  • CK_ROCM_REQUIREMENTS: Path to rocprofiler-compute requirements.txt (auto-detected)
  • CK_WORKLOAD_DIR: Results directory (default: $PROJECT/build/workloads)
  • CK_GPU_TARGET: Override GPU detection (e.g., gfx950, MI300X)
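The defaults above follow the usual shell fallback pattern: an exported value wins, otherwise the default applies. The sketch below illustrates that resolution; it is not the actual ck-rocprof code, resolve_ck_env is a hypothetical name, $PROJECT is assumed to be the repository root, and /opt/rocm/bin is assumed as the ROCm fallback location.

```shell
#!/usr/bin/env bash
# Sketch (not the actual ck-rocprof code): resolve the documented defaults,
# keeping any value the user already set. The project root is passed in.
resolve_ck_env() {
  if [ -z "$1" ]; then
    echo "usage: resolve_ck_env <project-root>" >&2
    return 1
  fi
  local project="$1"
  CK_PROFILE_VENV="${CK_PROFILE_VENV:-$project/.ck-rocprof-venv}"
  CK_WORKLOAD_DIR="${CK_WORKLOAD_DIR:-$project/build/workloads}"
  if [ -z "${CK_ROCPROF_BIN:-}" ]; then
    # PATH first, then the usual ROCm install prefix (assumed location).
    CK_ROCPROF_BIN="$(command -v rocprof-compute || echo /opt/rocm/bin/rocprof-compute)"
  fi
  export CK_PROFILE_VENV CK_WORKLOAD_DIR CK_ROCPROF_BIN
}
```

Calling it as resolve_ck_env "$(git rev-parse --show-toplevel)" from inside the checkout shows which paths the defaults would produce.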

Interpreting Results

Good Performance:

Bank Conflicts/Access: <0.01
LDS bandwidth: >90% of peak
LDS Data FIFO Full: Minimal cycles

Needs Optimization:

Bank Conflicts/Access: >0.02
Bank Conflict Cycles: High MAX values
LDS Data FIFO Full: High memory pressure
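The Bank Conflicts/Access thresholds above can be applied mechanically. In this hypothetical helper (classify_conflicts is not part of ck-rocprof), values between 0.01 and 0.02 are labeled "borderline", a label this sketch introduces because the section does not classify that range.

```shell
#!/usr/bin/env bash
# Hypothetical helper: classify a Bank Conflicts/Access value using
# <0.01 = good and >0.02 = needs optimization; anything between is "borderline".
classify_conflicts() {
  awk -v v="$1" 'BEGIN {
    x = v + 0
    if (x < 0.01)      print "good"
    else if (x > 0.02) print "needs-optimization"
    else               print "borderline"
  }'
}

# Usage:
#   classify_conflicts 0.04    # prints needs-optimization
#   classify_conflicts 0.005   # prints good
```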

Troubleshooting

"Profiling environment not set up"

ck-rocprof setup

"rocprof-compute not found"

export CK_ROCPROF_BIN=/custom/path/rocprof-compute
ck-rocprof setup

"Profiling results not found"

ck-rocprof list                    # Check available runs
rocminfo | grep gfx               # Verify GPU arch
export CK_GPU_TARGET=gfx950       # Override if needed
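The "not found" errors may come from CK_ROCPROF_BIN not pointing at an executable or from a GPU architecture mismatch. A hypothetical preflight sketch follows; check_rocprof_bin and first_gfx are illustrative names, and taking the first gfx name in the rocminfo output assumes a single-GPU or homogeneous system.

```shell
#!/usr/bin/env bash
# Hypothetical check: succeed if $CK_ROCPROF_BIN (or rocprof-compute on PATH)
# is an executable regular file.
check_rocprof_bin() {
  local bin="${CK_ROCPROF_BIN:-$(command -v rocprof-compute)}"
  if [ -n "$bin" ] && [ -f "$bin" ] && [ -x "$bin" ]; then
    echo "rocprof-compute: $bin"
  else
    echo "rocprof-compute not found; set CK_ROCPROF_BIN" >&2
    return 1
  fi
}

# Hypothetical helper: print the first gfx architecture name found on stdin,
# e.g. in rocminfo output.
first_gfx() {
  grep -oE 'gfx[0-9a-f]+' | head -n 1
}

# Usage:
#   check_rocprof_bin && export CK_GPU_TARGET="$(rocminfo | first_gfx)"
```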

Storage Layout

Results are stored in $CK_WORKLOAD_DIR/<name>/ (by default $PROJECT/build/workloads/<name>/):

  • pmc_perf.csv: Performance counters (primary data file)
  • perfmon/: Input metric files
  • out/: Raw output data from profiler runs
  • log.txt: Profiling log
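Runs that were interrupted may lack pmc_perf.csv, the primary data file. A hypothetical helper can list such run directories, assuming the workloads/<name>/pmc_perf.csv layout described above. incomplete_runs is an illustrative name, and falling back to build/workloads relative to the current directory is an assumption.

```shell
#!/usr/bin/env bash
# Hypothetical helper: print the names of run directories that have no
# pmc_perf.csv. Directory argument > $CK_WORKLOAD_DIR > ./build/workloads.
incomplete_runs() {
  local dir="${1:-${CK_WORKLOAD_DIR:-build/workloads}}"
  local run
  for run in "$dir"/*/; do
    [ -d "$run" ] || continue   # glob matched nothing
    if [ ! -f "${run}pmc_perf.csv" ]; then
      basename "$run"
    fi
  done
}
```

Any name it prints can then be removed with ck-rocprof clean <name>.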

Technical Details

  • Setup: Creates an isolated Python venv and installs dependencies
  • Profiling: Runs rocprof-compute profile --name <name> -- <executable>
  • Analysis: Runs rocprof-compute analyze --path <path> --block <block>
  • GPU Support: MI300/MI350 series, auto-detects architecture

Related Tools

  • ck-docker: Container management
  • rocprof-compute: AMD GPU profiler v2
  • rocm-smi: System monitoring
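To see what the Profiling and Analysis steps under Technical Details execute, a dry-run sketch can print the equivalent rocprof-compute commands without running them. show_rocprof_cmds is a hypothetical name, and the --path value $CK_WORKLOAD_DIR/<name> is an assumption based on the storage layout, not a confirmed detail of ck-rocprof.

```shell
#!/usr/bin/env bash
# Sketch: print (without running) the profile and analyze commands for one run.
# Arguments: <name> <block> <executable> [args...]
show_rocprof_cmds() {
  local name="$1" block="$2"
  shift 2
  local wl="${CK_WORKLOAD_DIR:-build/workloads}"
  # %q quotes each word so the printed line can be pasted back into a shell.
  printf 'rocprof-compute profile --name %q --' "$name"
  printf ' %q' "$@"
  printf '\n'
  printf 'rocprof-compute analyze --path %q --block %q\n' "$wl/$name" "$block"
}

# Usage:
#   show_rocprof_cmds baseline 12 ./bin/tile_example_gemm_universal
```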

License

Copyright (c) Advanced Micro Devices, Inc.
SPDX-License-Identifier: MIT