Add LLM-agnostic Docker and build analysis tools (#3576)

This commit introduces utility tools for building, testing, and analyzing
Composable Kernel. The tools are designed to be LLM-agnostic and can be
used with any AI assistant or directly from the command line.

Tools Added:
============

1. ck-docker - Docker container management
   - Start/stop ROCm-enabled containers
   - Build targets with CMake + Ninja
   - Run tests with gtest filters
   - Auto-detect GPU targets (gfx950, gfx942, etc.)
   - Per-user, per-branch container naming to avoid conflicts

2. ck-build-analysis - Build time profiling
   - Uses Clang's -ftime-trace for compilation analysis
   - Aggregates statistics across multiple trace files
   - Identifies template instantiation bottlenecks
   - Generates detailed Markdown reports with:
     * Compilation phase breakdown
     * Top expensive instantiations
     * Template family analysis
     * Data-driven optimization recommendations
   - Configurable granularity (0µs to 500µs)
   - PEP 723 compliant Python script with auto-dependency management via uv

Key Features:
=============

- LLM-agnostic design (works with any AI assistant)
- Zero-configuration setup with automatic dependency installation
- Comprehensive documentation in script/tools/README*.md
- Security hardening (input validation, no command injection)
- Multi-file trace aggregation for accurate build analysis
- Jinja2-based report generation for customizable output

Implementation:
===============

- script/tools/ck-docker - Main Docker orchestration script
- script/tools/ck-build-analysis - Build analysis orchestration
- script/tools/common.sh - Shared utilities (container mgmt, GPU detection)
- script/tools/analyze_build_trace.py - PEP 723 compliant Python analyzer
- script/tools/templates/ - Jinja2 templates for report generation
- script/tools/README*.md - Comprehensive documentation

Directory Structure:
====================

script/tools/
├── README.md                          # Main overview
├── README_ck-docker.md                # ck-docker documentation
├── README_ck-build-analysis.md        # ck-build-analysis documentation
├── ck-docker                          # Docker orchestration script
├── ck-build-analysis                  # Build analysis orchestration
├── common.sh                          # Shared utilities
├── analyze_build_trace.py             # Python analyzer (PEP 723)
└── templates/
    └── build_analysis_report.md.jinja # Report template

The tools follow Unix philosophy: do one thing well, compose easily,
and work from both CLI and programmatic contexts.
Max Podkorytov
2026-01-15 08:30:23 -08:00
committed by GitHub
parent f57395689b
commit 086a1f8861
8 changed files with 1426 additions and 0 deletions

script/tools/README.md Normal file

@@ -0,0 +1,78 @@
# Composable Kernel Tools
This directory contains utility tools for building, testing, and analyzing Composable Kernel.
These tools are designed to be LLM-agnostic and can be used with any AI assistant or directly from the command line.
## Available Tools
### ck-docker
Build and test composable_kernel in Docker with ROCm support.
See [README_ck-docker.md](README_ck-docker.md) for details.
**Quick start:**
```bash
# Add to PATH
export PATH="$PATH:$PWD/script/tools"
# Start container and build
ck-docker start
ck-docker build test_amdgcn_mma
ck-docker test test_amdgcn_mma
```
### ck-build-analysis
Analyze Composable Kernel build times using Clang's -ftime-trace profiler.
See [README_ck-build-analysis.md](README_ck-build-analysis.md) for details.
**Quick start:**
```bash
# Add to PATH
export PATH="$PATH:$PWD/script/tools"
# Analyze build time
ck-build-analysis example_convnd_fwd_xdl_fp8
```
## LLM Assistant Integration
These tools can be used as-is with any LLM assistant by providing the tool documentation to the assistant. The assistant can then invoke these tools on your behalf.
For example, you can ask:
- "Start the docker container"
- "Build and test test_amdgcn_mma"
- "Analyze build time for example_convnd_fwd_xdl_fp8"
The assistant will translate your natural language request into the appropriate tool invocation.
## Dependencies
- **ck-docker**: Requires Docker and ROCm-capable GPU (for running tests)
- **ck-build-analysis**: Requires Docker, automatically installs Python dependencies (jinja2) via `uv`
## Directory Structure
```
script/tools/
├── README.md # This file
├── README_ck-docker.md # Documentation for ck-docker
├── README_ck-build-analysis.md # Documentation for ck-build-analysis
├── ck-docker # Docker container management tool
├── ck-build-analysis # Build time analysis tool
├── common.sh # Shared utilities for bash scripts
├── analyze_build_trace.py # Python script for trace analysis (PEP 723 compliant)
└── templates/
└── build_analysis_report.md.jinja # Jinja2 template for analysis reports
```
## Contributing
When adding new tools to this directory:
1. Keep them LLM-agnostic (avoid hardcoding references to specific AI assistants)
2. Provide clear command-line usage documentation
3. Include examples for both CLI and LLM assistant usage
4. Follow the existing naming convention and structure


@@ -0,0 +1,168 @@
# ck-build-analysis
Analyze Composable Kernel build times using Clang's -ftime-trace profiler.
## Terminal Usage
Direct command-line usage:
```bash
# From composable_kernel directory
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1 --output=my_report.md
# Or add to PATH
export PATH="$PATH:$PWD/script/tools"
ck-build-analysis example_convnd_fwd_xdl_fp8
```
## LLM Assistant Integration
If using an LLM assistant, you can ask in natural language:
- "Analyze build time for example_convnd_fwd_xdl_fp8"
- "Profile the compilation of test_amdgcn_mma with 1us granularity"
- "Generate a build time report for example_gemm_xdl"
## Commands
```
ck-build-analysis <target> [options]
Options:
--granularity=N Time trace granularity in microseconds (default: 1)
--output=FILE Output report filename (default: build_time_analysis_report.md)
--name=NAME Docker container name (default: from CK_CONTAINER_NAME or auto-generated)
--no-reconfigure Skip CMake reconfiguration if build exists
--help Show this help message
```
## What It Does
1. **Configures CMake** with `-ftime-trace` and custom granularity
2. **Builds the target** using Ninja in Docker
3. **Analyzes the trace** JSON file for template instantiation patterns
4. **Generates a report** with:
- Compilation phase breakdown
- Top expensive individual instantiations
- Template families ranked by total time and count
- Key insights and optimization recommendations
- Complete statistics
## Configuration
- **Container**: Uses ck-docker container (auto-starts if needed)
- **Granularity**: Default 1us (100% template coverage, best balance)
- **Output**: Markdown report in project root
## Environment
```bash
export CK_CONTAINER_NAME=my_build # Override container name
export CK_BUILD_ANALYSIS_GRANULARITY=1 # Default granularity in microseconds
```
## Examples
```bash
# Complete template analysis with default granularity (1us - recommended)
ck-build-analysis example_convnd_fwd_xdl_fp8
# Quick daily check (10us granularity, captures most expensive templates)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10
# Maximum detail (0us granularity, includes LLVM internals)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=0
# High-level overview (500us granularity, major bottlenecks only)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=500
# Custom output filename
ck-build-analysis example_convnd_fwd_xdl_fp8 --output=fp8_conv_analysis.md
# Analyze test target
ck-build-analysis test_amdgcn_mma
# Use existing build (skip reconfigure)
ck-build-analysis example_convnd_fwd_xdl_fp8 --no-reconfigure
```
## Output
The report includes:
- **Executive Summary**: Total time, events, instantiations, unique templates
- **Compilation Phases**: InstantiateFunction, Frontend, Backend, Optimizer, etc.
- **Top 30 Individual Instantiations**: Most expensive single templates
- **Template Families**: Grouped by total time and instantiation count
- **Key Insights**: What's slow and why
- **Optimization Recommendations**: Short, medium, and long-term strategies
- **Detailed Statistics**: Averages, medians, distributions
## Granularity Trade-offs
| Granularity | Template Coverage | Use Case |
|-------------|-------------------|----------|
| **0us** | All templates + sub-us compiler internals | LLVM internals debugging, very large files, higher overhead |
| **1us (default)** | **All templates** | **Default: Complete template analysis with low overhead** |
| **10us** | Most expensive templates | Daily quick checks, smaller files, minimal overhead |
| **50-100us** | Top bottlenecks | Balanced detail/size, suitable for CI/CD |
| **500us** | High-level phases only | Not recommended for template analysis |
**Recommended default**: 1us captures all template instantiations with minimal overhead
## Notes
- **0us and 1us capture all templates** - 0us adds sub-microsecond compiler internals
- **1us is the sweet spot**: complete template coverage, filters noise, low overhead
- **10us is practical** for daily use: captures most expensive templates, smaller files
- **500us loses most template instantiation data** - only use for high-level phase breakdown
- Finer granularity = more events = larger files + higher build time overhead
- For template-heavy C++ codebases like CK: **use 1us for analysis, 10us for daily checks**
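As an illustration of the trade-off above, here is a minimal sketch (using made-up event durations, not real trace data) of how the granularity threshold filters `-ftime-trace` events — Clang omits events shorter than the configured granularity, which is why coarse settings lose the small template instantiations:

```python
# Hypothetical -ftime-trace events (durations in integer microseconds).
events = [
    {"name": "InstantiateFunction", "dur": 3},
    {"name": "InstantiateClass", "dur": 0},  # sub-microsecond
    {"name": "InstantiateFunction", "dur": 750},
]

def visible_events(events, granularity_us):
    """Events that survive a given -ftime-trace-granularity threshold."""
    return [e for e in events if e["dur"] >= granularity_us]

print(len(visible_events(events, 1)))    # 2: the sub-microsecond event is dropped
print(len(visible_events(events, 500)))  # 1: only the 750us instantiation remains
```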
## Implementation Details
### PEP 723 Compliance with Automatic Dependency Management
The analysis script (`analyze_build_trace.py`) is PEP 723 compliant with inline dependency metadata:
```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
# "jinja2>=3.0.0",
# ]
# ///
```
**The tool automatically installs and uses `uv`**, which provides:
- ✅ Zero-configuration dependency management
- ✅ Automatic installation of jinja2 from PEP 723 metadata
- ✅ Isolated dependency environment (no system pollution)
- ✅ Fast caching for subsequent runs
**No manual setup required!** The first time you run the tool, it will:
1. Detect whether `uv` is installed in the container
2. If not, install `pipx` from Ubuntu's package repositories and then run `pipx install uv`
3. Use `uv run` to execute the analysis with auto-managed dependencies
On subsequent runs, `uv` will already be available and dependencies will be cached.
`pipx` itself comes from Ubuntu's package manager for security and reliability; `uv` is then installed into an isolated `pipx` environment.
### Components
- **ck-build-analysis** - Main bash script that orchestrates Docker, CMake, and analysis
- **analyze_build_trace.py** - PEP 723 compliant Python script for trace analysis
- **templates/build_analysis_report.md.jinja** - Jinja2 template for report generation
### Standalone Usage
The Python script can also be run independently:
```bash
# With uv (recommended - auto-installs dependencies from PEP 723 metadata)
uv run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/
# With pipx (alternative - also auto-installs dependencies)
pipx run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/
```
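The heart of the analysis is grouping instantiation events into template families by the name before the first `<` or `(`. A minimal sketch of that grouping, using hypothetical `detail` strings rather than real trace output:

```python
import re
from collections import defaultdict

# Hypothetical "detail" strings as found in InstantiateFunction/Class events.
details = [
    "ck::SomeKernel<float, 4>",
    "ck::SomeKernel<_Float16, 8>",
    "std::tuple<int, int>",
]

family_counts = defaultdict(int)
for detail in details:
    match = re.match(r"^([^<(]+)", detail)  # name before '<' or '('
    if match:
        # Strip the ck:: prefix so families group by bare name.
        family = re.sub(r"^ck::", "", match.group(1).strip())
        family_counts[family] += 1

print(dict(family_counts))  # {'SomeKernel': 2, 'std::tuple': 1}
```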


@@ -0,0 +1,80 @@
# ck-docker
Build and test composable_kernel in Docker with ROCm support.
## Terminal Usage
Direct command-line usage:
```bash
# From composable_kernel directory
script/tools/ck-docker start
script/tools/ck-docker build test_amdgcn_mma
script/tools/ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
script/tools/ck-docker status
script/tools/ck-docker shell
# Or add to PATH
export PATH="$PATH:$PWD/script/tools"
ck-docker start
```
## LLM Assistant Integration
If using an LLM assistant, you can ask in natural language:
- "Start the docker container"
- "Build test_amdgcn_mma"
- "Run test_amdgcn_mma with filter *Fp16*"
- "Check container status"
- "Open a shell in the container"
## Commands
```
ck-docker start [name] Start Docker container
ck-docker build [target] [--reconfigure] Build target (optionally reconfigure CMake)
ck-docker test <name> [options] Run test
ck-docker shell [name] Interactive shell
ck-docker status [name] Check status
ck-docker stop [name] Stop container
```
## Configuration
- **Image**: rocm/composable_kernel:ck_ub24.04_rocm7.0.1
- **GPU**: Auto-detected via rocminfo (fallback: gfx950)
- **Compiler**: /opt/rocm/llvm/bin/clang++
- **Build**: Ninja + CMake (Release)
- **Mount**: Current directory → /workspace
- **Container Name**: Auto-generated as `ck_<username>_<branch>` to avoid clashes
## Environment
```bash
export CK_CONTAINER_NAME=my_build # Override default container name
export CK_DOCKER_IMAGE=rocm/composable_kernel:ck_ub24.04_rocm7.0.1 # Override Docker image
export GPU_TARGET=gfx942 # Override GPU target detection
```
## Examples
```bash
# Start container
ck-docker start
# Build and run test
ck-docker build test_amdgcn_mma
ck-docker test test_amdgcn_mma
# Force clean CMake reconfiguration and build
ck-docker build --reconfigure test_amdgcn_mma
# Custom container
ck-docker start my_build
ck-docker build test_amdgcn_mma --name my_build
ck-docker test test_amdgcn_mma --name my_build
# Debug
ck-docker shell
ck-docker status
```
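The per-user, per-branch naming described above (`ck_<username>_<branch>`) can be modeled as follows — an illustrative Python sketch of the scheme, not the actual `common.sh` implementation:

```python
import re

def container_name(user: str, branch: str) -> str:
    """Model of the ck_<username>_<branch> naming scheme.

    Docker container names only allow [a-zA-Z0-9_.-], so anything else
    in the branch name (e.g. '/') is replaced with an underscore.
    """
    safe_branch = re.sub(r"[^a-zA-Z0-9_.-]", "_", branch)
    return f"ck_{user}_{safe_branch}"

print(container_name("alice", "feature/new-gemm"))  # ck_alice_feature_new-gemm
```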


@@ -0,0 +1,347 @@
#!/usr/bin/env python3
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# /// script
# requires-python = ">=3.8"
# dependencies = [
#     "jinja2>=3.0.0",
# ]
# ///
"""
Build Time Analysis Tool for Composable Kernel

Analyzes Clang -ftime-trace output to identify template instantiation
bottlenecks and generate comprehensive build time reports.
"""
import json
import os
import re
import sys
from collections import defaultdict
from datetime import datetime

try:
    from jinja2 import Environment, FileSystemLoader
except ImportError:
    print("Error: jinja2 is required but not installed.", file=sys.stderr)
    print("Install with: apt-get install python3-jinja2", file=sys.stderr)
    print("Or with pip: pip install jinja2", file=sys.stderr)
    sys.exit(1)


def parse_arguments():
    """Parse command-line arguments."""
    if len(sys.argv) < 7:
        print(
            "Usage: analyze_build_trace.py <trace_files_or_dir> <output_file> <target> <granularity> <build_time> <template_dir>"
        )
        print(
            "  trace_files_or_dir: Comma-separated list of trace files OR directory containing .json files"
        )
        sys.exit(1)
    return {
        "trace_input": sys.argv[1],
        "output_file": sys.argv[2],
        "target": sys.argv[3],
        "granularity": sys.argv[4],
        "build_time": sys.argv[5],
        "template_dir": sys.argv[6],
    }


def find_trace_files(trace_input):
    """Find all trace files from input (file list, single file, or directory)."""
    trace_files = []
    # Check if it's a directory
    if os.path.isdir(trace_input):
        print(f"Scanning directory: {trace_input}")
        for root, dirs, files in os.walk(trace_input):
            for file in files:
                # Only look under CMakeFiles object directories, where clang
                # writes per-TU traces; matching .cpp.json/.hip.json also
                # excludes compile_commands.json
                if file.endswith((".cpp.json", ".hip.json")) and "CMakeFiles" in root:
                    trace_files.append(os.path.join(root, file))
        trace_files.sort()
    # Check if it's a comma-separated list
    elif "," in trace_input:
        trace_files = [f.strip() for f in trace_input.split(",")]
    # Single file
    else:
        trace_files = [trace_input]
    # Filter out non-existent files
    valid_files = [f for f in trace_files if os.path.isfile(f)]
    if not valid_files:
        print(f"Error: No valid trace files found in: {trace_input}", file=sys.stderr)
        sys.exit(1)
    print(f"Found {len(valid_files)} trace file(s)")
    return valid_files


def load_trace_data(trace_files):
    """Load and parse multiple trace JSON files."""
    all_data = []
    for trace_file in trace_files:
        print(f"  Loading: {trace_file}")
        try:
            with open(trace_file, "r") as f:
                data = json.load(f)
            # Get file basename for tracking
            file_name = os.path.basename(trace_file)
            all_data.append({"file": file_name, "path": trace_file, "data": data})
        except Exception as e:
            print(f"  Warning: Failed to load {trace_file}: {e}", file=sys.stderr)
    return all_data


def process_events(all_trace_data):
    """Process trace events from multiple files and extract statistics."""
    print("Processing events from all files...")
    template_stats = defaultdict(lambda: {"count": 0, "total_dur": 0})
    phase_stats = defaultdict(int)
    top_individual = []
    file_stats = []
    total_events = 0
    for trace_info in all_trace_data:
        file_name = trace_info["file"]
        data = trace_info["data"]
        events = data.get("traceEvents", [])
        file_template_time = 0
        file_event_count = len(events)
        total_events += file_event_count
        print(f"  Processing {file_name}: {file_event_count:,} events")
        for event in events:
            name = event.get("name", "")
            dur = int(event.get("dur", 0))  # Keep as integer microseconds
            if name and dur > 0:
                phase_stats[name] += dur
                if name in ["InstantiateFunction", "InstantiateClass"]:
                    detail = event.get("args", {}).get("detail", "")
                    top_individual.append(
                        {"detail": detail, "dur": dur, "type": name, "file": file_name}
                    )
                    file_template_time += dur
                    # Extract template name (everything before '<' or '(')
                    match = re.match(r"^([^<(]+)", detail)
                    if match:
                        template_name = match.group(1).strip()
                        # Normalize template names: strip the ck:: prefix,
                        # keep std:: as-is
                        template_name = re.sub(r"^ck::", "", template_name)
                        template_stats[template_name]["count"] += 1
                        template_stats[template_name]["total_dur"] += dur
        file_stats.append(
            {
                "name": file_name,
                "events": file_event_count,
                "template_time": file_template_time,
            }
        )
    return template_stats, phase_stats, top_individual, file_stats, total_events


def prepare_template_data(template_stats, phase_stats, top_individual, file_stats):
    """Prepare and calculate derived statistics for template rendering."""
    print("Sorting data...")
    # Sort data
    sorted_phases = sorted(phase_stats.items(), key=lambda x: x[1], reverse=True)
    top_individual.sort(key=lambda x: x["dur"], reverse=True)
    file_stats.sort(key=lambda x: x["template_time"], reverse=True)
    # Calculate totals
    total_template_time = sum(s["total_dur"] for s in template_stats.values())
    total_trace_time = sum(phase_stats.values())
    total_inst = sum(s["count"] for s in template_stats.values())
    # Prepare templates by time with calculated fields
    templates_by_time = []
    for name, stats in sorted(
        template_stats.items(), key=lambda x: x[1]["total_dur"], reverse=True
    ):
        templates_by_time.append(
            (
                name,
                {
                    "count": stats["count"],
                    "total_dur": stats["total_dur"],
                    "avg": stats["total_dur"] // stats["count"]
                    if stats["count"] > 0
                    else 0,
                    "pct": 100 * stats["total_dur"] / total_template_time
                    if total_template_time > 0
                    else 0,
                },
            )
        )
    # Prepare templates by count
    templates_by_count = []
    for name, stats in sorted(
        template_stats.items(), key=lambda x: x[1]["count"], reverse=True
    ):
        templates_by_count.append(
            (
                name,
                {
                    "count": stats["count"],
                    "total_dur": stats["total_dur"],
                    "avg": stats["total_dur"] // stats["count"]
                    if stats["count"] > 0
                    else 0,
                },
            )
        )
    # Add friendly type names to individual instantiations
    for inst in top_individual:
        inst["inst_type"] = "Func" if inst["type"] == "InstantiateFunction" else "Class"
    # Calculate additional metrics
    median_count = 0
    if len(template_stats) > 0:
        median_count = sorted([s["count"] for s in template_stats.values()])[
            len(template_stats) // 2
        ]
    top10_pct = 0
    if len(templates_by_time) >= 10:
        top10_pct = (
            100
            * sum(s[1]["total_dur"] for s in templates_by_time[:10])
            / total_template_time
        )
    return {
        "sorted_phases": sorted_phases,
        "top_individual": top_individual,
        "templates_by_time": templates_by_time,
        "templates_by_count": templates_by_count,
        "total_template_time": total_template_time,
        "total_trace_time": total_trace_time,
        "total_inst": total_inst,
        "median_count": median_count,
        "top10_pct": top10_pct,
        "unique_families": len(template_stats),
        "file_stats": file_stats,
    }


def setup_jinja_environment(template_dir):
    """Set up Jinja2 environment with custom filters."""
    env = Environment(loader=FileSystemLoader(template_dir))

    def format_number(value):
        """Format number with thousand separators."""
        return f"{value:,}"

    def truncate(value, length):
        """Truncate string to length with ellipsis."""
        if len(value) > length:
            return value[: length - 3] + "..."
        return value

    def pad(value, length):
        """Pad string to specified length."""
        return f"{value:<{length}}"

    def us_to_ms(value):
        """Convert microseconds to milliseconds."""
        return value / 1000.0

    def us_to_s(value):
        """Convert microseconds to seconds."""
        return value / 1000000.0

    env.filters["format_number"] = format_number
    env.filters["truncate"] = truncate
    env.filters["pad"] = pad
    env.filters["us_to_ms"] = us_to_ms
    env.filters["us_to_s"] = us_to_s
    return env


def generate_report(env, data, args, total_events, num_files):
    """Generate the final report using Jinja2 template."""
    print("Rendering report with Jinja2...")
    template = env.get_template("build_analysis_report.md.jinja")
    report_content = template.render(
        timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        target=args["target"],
        granularity=args["granularity"],
        build_time=args["build_time"],
        total_events=total_events,
        num_files=num_files,
        total_instantiations=data["total_inst"],
        unique_families=data["unique_families"],
        total_trace_time=data["total_trace_time"],
        total_template_time=data["total_template_time"],
        phases=data["sorted_phases"],
        top_individual=data["top_individual"],
        templates_by_time=data["templates_by_time"],
        templates_by_count=data["templates_by_count"],
        median_count=data["median_count"],
        top10_pct=data["top10_pct"],
        file_stats=data["file_stats"],
    )
    return report_content


def main():
    """Main entry point for the analysis tool."""
    args = parse_arguments()
    # Find and load trace files
    trace_files = find_trace_files(args["trace_input"])
    all_trace_data = load_trace_data(trace_files)
    # Process events from all files
    template_stats, phase_stats, top_individual, file_stats, total_events = (
        process_events(all_trace_data)
    )
    # Prepare template data
    data = prepare_template_data(
        template_stats, phase_stats, top_individual, file_stats
    )
    # Setup Jinja2 environment
    env = setup_jinja_environment(args["template_dir"])
    # Generate report
    report_content = generate_report(env, data, args, total_events, len(all_trace_data))
    # Write output
    with open(args["output_file"], "w") as f:
        f.write(report_content)
    print(f"Report generated: {args['output_file']}")
    print(f"Report size: {len(report_content):,} bytes")
    print(f"Analyzed {len(all_trace_data)} file(s) with {total_events:,} total events")


if __name__ == "__main__":
    main()

script/tools/ck-build-analysis Executable file

@@ -0,0 +1,237 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK Build Analysis Tool - Analyze build times using -ftime-trace

set -e
set -o pipefail

# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")

# Default settings
GRANULARITY="${CK_BUILD_ANALYSIS_GRANULARITY:-1}"
OUTPUT_FILE="build_time_analysis_report.md"
RECONFIGURE=true

# Help message
show_help() {
    cat << EOF
CK Build Analysis - Analyze build times using Clang -ftime-trace

Usage: ck-build-analysis <target> [options]

Arguments:
  target                Build target to analyze (e.g., example_convnd_fwd_xdl_fp8)

Options:
  --granularity=N       Time trace granularity in microseconds (default: 1)
  --output=FILE         Output report filename (default: build_time_analysis_report.md)
  --name=NAME           Docker container name (default: ${CONTAINER_NAME})
  --no-reconfigure      Skip CMake reconfiguration if build exists
  --help                Show this help message

Examples:
  ck-build-analysis example_convnd_fwd_xdl_fp8
  ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10
  ck-build-analysis test_amdgcn_mma --granularity=1 --output=mma_test_analysis.md

Granularity Guide:
  0            - Everything: All compiler events including sub-microsecond operations
                 Use for LLVM internals debugging. Large files, higher overhead.
  1 (default)  - Complete template coverage: Captures all template instantiations
                 Best balance - filters sub-microsecond noise, low overhead
  10           - Daily use: Captures most expensive templates, smaller files
                 Good for quick checks and routine analysis
  50-100       - Intermediate: Balanced between detail and file size
                 Suitable for CI/CD tracking
  500          - High-level only: Major compilation phases, minimal detail
                 Not recommended for template analysis (loses most instantiations)

Recommendation: Use 1us (default) for template analysis, 10us for quick checks.
EOF
}

# Parse arguments
TARGET=""
while [[ $# -gt 0 ]]; do
    case $1 in
        --granularity=*)
            GRANULARITY="${1#*=}"
            shift
            ;;
        --output=*)
            OUTPUT_FILE="${1#*=}"
            shift
            ;;
        --name=*)
            CONTAINER_NAME="${1#*=}"
            shift
            ;;
        --no-reconfigure)
            RECONFIGURE=false
            shift
            ;;
        --help|-h)
            show_help
            exit 0
            ;;
        -*)
            echo "Unknown option: $1"
            show_help
            exit 1
            ;;
        *)
            if [ -z "$TARGET" ]; then
                TARGET="$1"
            else
                echo "Error: Multiple targets specified"
                show_help
                exit 1
            fi
            shift
            ;;
    esac
done

if [ -z "$TARGET" ]; then
    echo "Error: No target specified"
    echo ""
    show_help
    exit 1
fi

# Validate OUTPUT_FILE to prevent path traversal
if [[ "$OUTPUT_FILE" =~ / ]] || [[ "$OUTPUT_FILE" =~ \.\. ]]; then
    echo "Error: OUTPUT_FILE must be a simple filename (no path separators or .. allowed)"
    echo "Invalid: $OUTPUT_FILE"
    exit 1
fi

echo "═══════════════════════════════════════════════════════════════"
echo "  CK Build Time Analysis"
echo "═══════════════════════════════════════════════════════════════"
echo "Target:      $TARGET"
echo "Granularity: ${GRANULARITY}us"
echo "Container:   $CONTAINER_NAME"
echo "Output:      $OUTPUT_FILE"
echo "═══════════════════════════════════════════════════════════════"
echo ""

# Ensure container is running
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"

# Configure CMake with -ftime-trace if needed
if [ "$RECONFIGURE" = true ] || ! docker exec "${CONTAINER_NAME}" test -f /workspace/build/build.ninja 2>/dev/null; then
    echo ""
    echo "Configuring CMake with -ftime-trace (granularity=${GRANULARITY}us)..."
    GPU_TARGET=$(detect_gpu_target "${CONTAINER_NAME}")
    docker exec -e GPU_TARGET="${GPU_TARGET}" -e GRANULARITY="${GRANULARITY}" "${CONTAINER_NAME}" bash -c '
        cd /workspace || exit 1
        rm -rf /workspace/build
        mkdir /workspace/build
        cd /workspace/build || exit 1
        cmake .. -GNinja \
            -DGPU_TARGETS="${GPU_TARGET}" \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
            -DCMAKE_CXX_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \
            -DCMAKE_HIP_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \
            -DBUILD_TESTING=ON 2>&1 | tail -20
    '
    echo "CMake configuration complete"
fi

# Build the target
echo ""
echo "Building target: $TARGET"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
BUILD_START=$(date +%s)
docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c 'cd /workspace/build && time ninja "${TARGET}" 2>&1'
BUILD_END=$(date +%s)
BUILD_TIME=$((BUILD_END - BUILD_START))
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Build completed in ${BUILD_TIME} seconds"

# Find all trace JSON files for the target
echo ""
echo "Locating trace files..."
# Count trace files
TRACE_COUNT=$(docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c '
    find /workspace/build -type f \( -name "*.cpp.json" -o -name "*.hip.json" \) 2>/dev/null | \
        grep -vF "compile_commands.json" | wc -l
')
if [ "$TRACE_COUNT" -eq 0 ]; then
    echo "Error: Could not find any trace files in /workspace/build"
    echo "Expected .cpp.json or .hip.json files from -ftime-trace compilation"
    exit 1
fi
echo "Found ${TRACE_COUNT} trace file(s) in build directory"
# We'll pass the build directory to the Python script
BUILD_DIR="/workspace/build"

# Generate analysis report
echo ""
echo "Generating analysis report..."
# Copy analysis script and templates to container
docker cp "${SCRIPT_DIR}/analyze_build_trace.py" "${CONTAINER_NAME}:/tmp/analyze_build_trace.py"
docker cp "${SCRIPT_DIR}/templates" "${CONTAINER_NAME}:/tmp/ck_build_analysis_templates"

# Check if uv is available, install if needed, and use for PEP 723 dependency management
if ! docker exec "${CONTAINER_NAME}" bash -c "command -v uv >/dev/null 2>&1 || test -x \$HOME/.local/bin/uv"; then
    echo "uv not found, installing via pipx..."
    docker exec "${CONTAINER_NAME}" bash -c "
        # Install pipx if not available
        if ! command -v pipx >/dev/null 2>&1; then
            apt-get update -qq && apt-get install -y -qq pipx >/dev/null 2>&1
        fi
        # Install uv via pipx
        pipx install uv >/dev/null 2>&1
    "
    echo "uv installed successfully"
fi
echo "Using uv run for automatic dependency management..."
# Ensure uv is in PATH (handles ~/.local/bin installation)
# Pass build directory instead of single file
docker exec -e BUILD_DIR="${BUILD_DIR}" -e OUTPUT_FILE="${OUTPUT_FILE}" -e TARGET="${TARGET}" -e GRANULARITY="${GRANULARITY}" -e BUILD_TIME="${BUILD_TIME}" "${CONTAINER_NAME}" bash -c 'export PATH="$HOME/.local/bin:$PATH" && uv run --no-project /tmp/analyze_build_trace.py "${BUILD_DIR}" "/workspace/${OUTPUT_FILE}" "${TARGET}" "${GRANULARITY}" "${BUILD_TIME}" /tmp/ck_build_analysis_templates'

# Copy report back to host
docker cp "${CONTAINER_NAME}:/workspace/${OUTPUT_FILE}" "${PROJECT_ROOT}/${OUTPUT_FILE}"
# Cleanup
docker exec "${CONTAINER_NAME}" rm -f /tmp/analyze_build_trace.py
docker exec "${CONTAINER_NAME}" rm -rf /tmp/ck_build_analysis_templates

echo ""
echo "═══════════════════════════════════════════════════════════════"
echo "  Analysis Complete!"
echo "═══════════════════════════════════════════════════════════════"
echo "Report: ${PROJECT_ROOT}/${OUTPUT_FILE}"
echo ""
echo "Summary:"
docker exec "${CONTAINER_NAME}" bash -c "head -20 /workspace/${OUTPUT_FILE} | tail -10"
echo ""
echo "View the full report:"
echo "  cat ${OUTPUT_FILE}"
echo "  or open it in your editor"
echo "═══════════════════════════════════════════════════════════════"

script/tools/ck-docker Executable file

@@ -0,0 +1,294 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# CK Docker Tool - Build and test composable_kernel in Docker with ROCm support
set -e
set -o pipefail
# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"
# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")
# Help message
show_help() {
cat << EOF
CK Docker Tool - Build and test composable_kernel in Docker
Usage: ck-docker <command> [options]
Commands:
start [name] Start Docker container
build [target] [--reconfigure] Build target (optionally reconfigure CMake)
test <test> [options] Run test
shell [name] Open shell in container
status [name] Check container status
stop [name] Stop and remove container
Examples:
ck-docker start
ck-docker build test_amdgcn_mma
ck-docker build --reconfigure test_amdgcn_mma
ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
ck-docker shell
Environment:
CK_CONTAINER_NAME - Override default container name (default: ck_<username>_<branch>)
CK_DOCKER_IMAGE - Override Docker image (default: rocm/composable_kernel:ck_ub24.04_rocm7.0.1)
GPU_TARGET - Override GPU target detection (e.g., gfx950, gfx942)
EOF
}
# Start container
cmd_start() {
    local name="${1:-${CONTAINER_NAME}}"
    local docker_image
    docker_image=$(get_docker_image)

    # Check if container exists and is running
    if container_exists "${name}"; then
        if container_is_running "${name}"; then
            echo "Container '${name}' is already running"
            return 0
        else
            echo "Starting existing container '${name}'..."
            docker start "${name}"
            echo "Container started"
            return 0
        fi
    fi

    echo "Creating new Docker container '${name}'..."
    docker run -d \
        --name "${name}" \
        --device=/dev/kfd --device=/dev/dri \
        --security-opt seccomp=unconfined \
        --group-add video \
        -v "${PROJECT_ROOT}":/workspace \
        -w /workspace \
        "${docker_image}" \
        tail -f /dev/null
    echo "Container '${name}' started successfully"
    docker exec "${name}" bash -c "echo 'Working directory:' && pwd"
}
# Build target
cmd_build() {
    local target=""
    local name="${CONTAINER_NAME}"
    local reconfigure=false

    while [[ $# -gt 0 ]]; do
        case $1 in
            --name)
                name="$2"
                shift 2
                ;;
            --reconfigure)
                reconfigure=true
                shift
                ;;
            *)
                target="$1"
                shift
                ;;
        esac
    done

    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Container '${name}' not running. Starting..."
        cmd_start "${name}"
    fi

    # Reconfigure CMake if requested or if build.ninja doesn't exist
    if [ "$reconfigure" = true ] || ! docker exec "${name}" test -f /workspace/build/build.ninja 2>/dev/null; then
        echo "Detecting GPU target..."
        local gpu_target
        gpu_target=$(detect_gpu_target "${name}")
        if [ "$reconfigure" = true ]; then
            echo "Reconfiguring CMake from scratch for GPU target: ${gpu_target}"
        else
            echo "Configuring build with CMake for GPU target: ${gpu_target}"
        fi
        docker exec "${name}" bash -c "
            cd /workspace || exit 1
            rm -rf /workspace/build
            mkdir /workspace/build
            cd /workspace/build || exit 1
            cmake .. -GNinja \
                -DGPU_TARGETS=${gpu_target} \
                -DCMAKE_BUILD_TYPE=Release \
                -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
                -DBUILD_TESTING=ON 2>&1 | tail -30
        "
    fi

    if [ -z "$target" ]; then
        echo "Building all configured targets..."
    else
        echo "Building target: ${target}"
    fi
    docker exec "${name}" bash -c "
        cd /workspace/build || exit 1
        ninja ${target} 2>&1
    "
    echo "Build complete"
}
# Run test
cmd_test() {
    local test_name=""
    local name="${CONTAINER_NAME}"
    local -a test_options=()

    while [[ $# -gt 0 ]]; do
        case $1 in
            --name)
                name="$2"
                shift 2
                ;;
            --gtest_*|--help)
                test_options+=("$1")
                shift
                ;;
            *)
                if [ -z "$test_name" ]; then
                    test_name="$1"
                else
                    test_options+=("$1")
                fi
                shift
                ;;
        esac
    done

    if [ -z "$test_name" ]; then
        echo "Error: test_name required"
        echo "Usage: ck-docker test <test_name> [--name container_name] [gtest_options]"
        return 1
    fi

    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Error: Container '${name}' not running"
        echo "Start it with: ck-docker start ${name}"
        return 1
    fi

    if ! docker exec "${name}" test -f "/workspace/build/bin/${test_name}" 2>/dev/null; then
        echo "Test executable not found. Building ${test_name}..."
        cmd_build "${test_name}" --name "${name}"
    fi

    echo "Running: ${test_name} ${test_options[*]}"
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    # Build the command with proper quoting
    local cmd="cd /workspace/build && ./bin/${test_name}"
    for opt in "${test_options[@]}"; do
        cmd="${cmd} $(printf '%q' "$opt")"
    done
    docker exec "${name}" bash -c "${cmd}"
}
# Shell
cmd_shell() {
    local name="${1:-${CONTAINER_NAME}}"
    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Container '${name}' not running. Starting..."
        cmd_start "${name}"
    fi
    echo "Opening shell in '${name}' (type 'exit' to leave)..."
    docker exec -it "${name}" bash
}
# Status
cmd_status() {
    local name="${1:-}"
    local docker_image
    docker_image=$(get_docker_image)

    if [ -z "$name" ]; then
        echo "Composable Kernel Docker Containers:"
        echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
        docker ps -a --filter "ancestor=${docker_image}" \
            --format "table {{.Names}}\t{{.Status}}\t{{.CreatedAt}}" || echo "No containers found"
    else
        # Check container status
        if container_is_running "${name}"; then
            echo "Container '${name}' is RUNNING"
            docker ps --filter "name=^${name}$" --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
            echo ""
            echo "GPU Information:"
            docker exec "${name}" bash -c "rocm-smi --showproductname 2>/dev/null | head -10 || echo 'No GPU detected'"
        elif container_exists "${name}"; then
            echo "Container '${name}' exists but is STOPPED"
            echo "Start with: ck-docker start ${name}"
        else
            echo "Container '${name}' does NOT exist"
            echo "Create with: ck-docker start ${name}"
        fi
    fi
}
# Stop
cmd_stop() {
    local name="${1:-${CONTAINER_NAME}}"
    # Check if container exists
    if container_exists "${name}"; then
        echo "Stopping and removing container '${name}'..."
        docker stop "${name}" 2>/dev/null || true
        docker rm "${name}" 2>/dev/null || true
        echo "Container stopped and removed"
    else
        echo "Container '${name}' does not exist"
    fi
}
# Main command dispatcher
case "${1:-}" in
    start)
        shift
        cmd_start "$@"
        ;;
    build)
        shift
        cmd_build "$@"
        ;;
    test)
        shift
        cmd_test "$@"
        ;;
    shell)
        shift
        cmd_shell "$@"
        ;;
    status)
        shift
        cmd_status "$@"
        ;;
    stop)
        shift
        cmd_stop "$@"
        ;;
    help|--help|-h)
        show_help
        ;;
    *)
        echo "Unknown command: ${1:-}"
        echo ""
        show_help
        exit 1
        ;;
esac
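The quoting step in cmd_test is worth seeing in isolation: `printf '%q'` escapes each gtest option so that wildcards survive exactly one round of shell evaluation, the round performed by `bash -c` inside the container. A minimal standalone sketch:

```shell
#!/bin/bash
# Round-trip a gtest option through printf '%q', as cmd_test does
# before handing the assembled command line to `bash -c`.
opt='--gtest_filter=*Fp16*'
quoted=$(printf '%q' "$opt")

# One level of shell evaluation (what bash -c performs) restores the
# original argument, wildcards intact and unexpanded.
eval "printf '%s\n' ${quoted}"
```

Without the `%q` escaping, the `*` would be glob-expanded against the contents of /workspace/build before the test binary ever saw it.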

script/tools/common.sh Normal file

@@ -0,0 +1,97 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
# Common utilities for CK Docker tools
# Shared configuration and helper functions
# Find project root (where .git directory is)
get_project_root() {
    local script_dir="$1"
    cd "${script_dir}/../.." && pwd
}
# Detect git branch and sanitize for Docker naming
get_sanitized_branch() {
    local project_root="$1"
    local branch
    branch=$(cd "${project_root}" && git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '_' | tr -cd 'a-zA-Z0-9_-' || echo "")
    branch=${branch:-unknown}
    # Handle detached HEAD state
    if [ "${branch}" = "HEAD" ]; then
        branch="detached"
    fi
    echo "${branch}"
}
# Get username with fallback
get_username() {
    echo "${USER:-$(whoami 2>/dev/null || echo "user")}"
}
# Generate default container name: ck_<username>_<branch>
get_default_container_name() {
    local project_root="$1"
    local user_name
    local git_branch
    user_name=$(get_username)
    git_branch=$(get_sanitized_branch "${project_root}")
    echo "ck_${user_name}_${git_branch}"
}
# Get container name (respects CK_CONTAINER_NAME env var)
get_container_name() {
    local project_root="$1"
    local default_name
    default_name=$(get_default_container_name "${project_root}")
    echo "${CK_CONTAINER_NAME:-${default_name}}"
}
# Get Docker image (respects CK_DOCKER_IMAGE env var)
get_docker_image() {
    echo "${CK_DOCKER_IMAGE:-rocm/composable_kernel:ck_ub24.04_rocm7.0.1}"
}
# Check if container exists (exact match)
container_exists() {
    local name="$1"
    docker ps -a --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}
# Check if container is running (exact match)
container_is_running() {
    local name="$1"
    docker ps --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}
# Detect GPU target in container
detect_gpu_target() {
    local container="$1"
    # Allow override via GPU_TARGET environment variable
    if [ -n "${GPU_TARGET:-}" ]; then
        echo "${GPU_TARGET}"
        return 0
    fi
    docker exec "${container}" bash -c "
        rocminfo 2>/dev/null | grep -oP 'gfx[0-9a-z]+' | head -1 || echo 'gfx950'
    " | tr -d '\r\n'
}
# Ensure container is running, start if needed
ensure_container_running() {
    local container="$1"
    local script_dir="$2"
    if ! container_is_running "${container}"; then
        echo "Container '${container}' not running. Starting with ck-docker..."
        "${script_dir}/ck-docker" start "${container}"
    fi
}
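The branch sanitization in get_sanitized_branch can be exercised without git or Docker; a minimal sketch of the same tr pipeline (the `sanitize` helper name is illustrative, not part of common.sh):

```shell
#!/bin/bash
# Same pipeline get_sanitized_branch applies to `git rev-parse` output:
# slashes become underscores, then every character outside
# [a-zA-Z0-9_-] is dropped, yielding a valid Docker container name.
sanitize() {
    printf '%s' "$1" | tr '/' '_' | tr -cd 'a-zA-Z0-9_-'
}

sanitize 'feature/my.branch#1'   # -> feature_mybranch1
echo
```

The second `tr -cd` pass is what keeps names like `users/jane/fix-123` safe: the slashes are rewritten first, and the hyphen is in the allowed set, so it comes out as `users_jane_fix-123`.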


@@ -0,0 +1,125 @@
# Composable Kernel Build Time Analysis Report
**Generated:** {{ timestamp }}
**Target:** {{ target }}
**Granularity:** {{ granularity }}µs
**Files Analyzed:** {{ num_files }}
## Executive Summary
- **Wall Clock Time:** {{ build_time }} seconds
- **Trace Time:** {{ total_trace_time|us_to_s|round(1) }} seconds
- **Template Instantiation Time:** {{ total_template_time|us_to_s|round(1) }} seconds ({{ (100 * total_template_time / total_trace_time)|round(1) }}% of trace)
- **Total Events Captured:** {{ total_events|format_number }} (across {{ num_files }} file{{ 's' if num_files != 1 else '' }})
- **Total Template Instantiations:** {{ total_instantiations|format_number }}
- **Unique Template Families:** {{ unique_families }}
{% if num_files > 1 -%}
## Per-File Analysis
| File | Events | Template Time (ms) | % of Total |
|------|--------|-------------------|------------|
{% for file in file_stats[:20] -%}
| {{ file.name|truncate(50)|pad(50) }} | {{ "%7d"|format(file.events) }} | {{ "%17.2f"|format(file.template_time|us_to_ms) }} | {{ "%9.1f"|format(100 * file.template_time / total_template_time if total_template_time > 0 else 0) }}% |
{% endfor %}
{% endif -%}
## Compilation Phase Breakdown
| Phase | Time (ms) | Time (s) | % of Total |
|-------|-----------|----------|------------|
{% for phase, dur in phases[:20] -%}
| {{ phase|pad(40) }} | {{ "%9.2f"|format(dur|us_to_ms) }} | {{ "%8.2f"|format(dur|us_to_s) }} | {{ "%9.1f"|format(100 * dur / total_trace_time) }}% |
{% endfor %}
## Top 30 Most Expensive Individual Instantiations
{% if num_files > 1 -%}
| Rank | Template | Type | Time (ms) | File |
|------|----------|------|-----------|------|
{% for inst in top_individual[:30] -%}
| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(50) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} | {{ inst.file|truncate(20) }} |
{% endfor -%}
{% else -%}
| Rank | Template | Type | Time (ms) |
|------|----------|------|-----------|
{% for inst in top_individual[:30] -%}
| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(70) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} |
{% endfor -%}
{% endif %}
## Template Families by Total Time (Top 50)
| Rank | Template Family | Count | Total (ms) | Avg (ms) | % of Total |
|------|-----------------|-------|------------|----------|------------|
{% for name, stats in templates_by_time[:50] -%}
| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} | {{ "%9.1f"|format(stats.pct) }}% |
{% endfor %}
## Template Families by Instantiation Count (Top 50)
| Rank | Template Family | Count | Total (ms) | Avg (ms) |
|------|-----------------|-------|------------|----------|
{% for name, stats in templates_by_count[:50] -%}
| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} |
{% endfor %}
## Key Insights
### 1. Template Instantiation Impact
- Template instantiation accounts for {{ (100 * total_template_time / total_trace_time)|round(1) }}% of total trace time
{% if unique_families >= 10 -%}
- Top 10 template families account for {{ top10_pct|round(1) }}% of instantiation time
{% endif %}
### 2. Most Expensive Templates
{% if templates_by_time|length > 0 -%}
- **{{ templates_by_time[0][0] }}**: {{ templates_by_time[0][1].count|format_number }} instantiations, {{ (templates_by_time[0][1].total_dur|us_to_s)|round(2) }}s total
{% endif -%}
{% if templates_by_time|length > 1 -%}
- **{{ templates_by_time[1][0] }}**: {{ templates_by_time[1][1].count|format_number }} instantiations, {{ (templates_by_time[1][1].avg|us_to_ms)|round(2) }}ms average
{% endif %}
## Optimization Recommendations
### High-Impact Targets (by total time)
{% for name, stats in templates_by_time[:5] -%}
**{{ loop.index }}. {{ name }}** - {{ (stats.total_dur|us_to_s)|round(1) }}s total ({{ stats.pct|round(1) }}%)
- {{ stats.count|format_number }} instantiations, {{ (stats.avg|us_to_ms)|round(2) }}ms average
{% if stats.count > 100 -%}
- Strategy: Extern templates - High instantiation count suggests repeated compilation
{% elif stats.avg|us_to_ms > 50 -%}
- Strategy: Template specialization - High individual cost suggests complexity
{% else -%}
- Strategy: Explicit instantiation - Pre-instantiate common configurations
{% endif %}
{% endfor %}
### Frequently Instantiated (optimization candidates)
{% for name, stats in templates_by_count[:5] if stats.count > 100 -%}
**{{ name }}** - {{ stats.count|format_number }} times ({{ (stats.total_dur|us_to_s)|round(2) }}s total)
- Consider: Precompiled headers or extern templates to avoid recompilation
{% endfor %}
### Most Expensive Individual Instantiations
{% for inst in top_individual[:3] -%}
**{{ loop.index }}. {{ inst.detail|truncate(60) }}** - {{ (inst.dur|us_to_ms)|round(1) }}ms
- Strategy: Profile and simplify this specific instantiation
{% endfor %}
## Detailed Statistics
- **Total Unique Templates:** {{ unique_families }}
- **Total Instantiations:** {{ total_instantiations|format_number }}
{% if total_instantiations > 0 -%}
- **Average Instantiation Time:** {{ ((total_template_time // total_instantiations)|us_to_ms)|round(3) }}ms
{% endif -%}
{% if unique_families > 0 -%}
- **Median Template Family Count:** {{ median_count }}
{% endif %}
---
*Report generated using Clang -ftime-trace with {{ granularity }}µs granularity*
*Analysis tool: ck-build-analysis*
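The template above uses several non-built-in filters (`us_to_s`, `us_to_ms`, `format_number`, `pad`) that the analyzer must register on its Jinja2 Environment. Plausible definitions, consistent with how the template applies them, are sketched below; these are assumptions, not the actual analyze_build_trace.py code:

```python
# Hypothetical filter implementations matching the template's usage;
# the real analyze_build_trace.py may define them differently.

def us_to_s(us: float) -> float:
    """Microseconds to seconds."""
    return us / 1_000_000

def us_to_ms(us: float) -> float:
    """Microseconds to milliseconds."""
    return us / 1_000

def format_number(n: int) -> str:
    """Insert thousands separators: 1234567 -> '1,234,567'."""
    return f"{n:,}"

def pad(s, width: int) -> str:
    """Left-justify to a fixed width so Markdown table columns align."""
    return str(s).ljust(width)

# Registration on a jinja2.Environment would look like:
#   env.filters.update(us_to_s=us_to_s, us_to_ms=us_to_ms,
#                      format_number=format_number, pad=pad)
```

`truncate` is a Jinja2 built-in, so only the four filters above need registering before the template will render.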