mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-20 06:49:15 +00:00
Add LLM-agnostic Docker and build analysis tools (#3576)
This commit introduces utility tools for building, testing, and analyzing
Composable Kernel. The tools are designed to be LLM-agnostic and can be
used with any AI assistant or directly from the command line.
Tools Added:
============
1. ck-docker - Docker container management
   - Start/stop ROCm-enabled containers
   - Build targets with CMake + Ninja
   - Run tests with gtest filters
   - Auto-detect GPU targets (gfx950, gfx942, etc.)
   - Per-user, per-branch container naming to avoid conflicts
2. ck-build-analysis - Build time profiling
   - Uses Clang's -ftime-trace for compilation analysis
   - Aggregates statistics across multiple trace files
   - Identifies template instantiation bottlenecks
   - Generates detailed Markdown reports with:
     * Compilation phase breakdown
     * Top expensive instantiations
     * Template family analysis
     * Data-driven optimization recommendations
   - Configurable granularity (1µs to 500µs)
   - PEP 723 compliant Python script with auto-dependency management via uv
Key Features:
=============
- LLM-agnostic design (works with any AI assistant)
- Zero-configuration setup with automatic dependency installation
- Comprehensive documentation in script/tools/README*.md
- Security hardening (input validation, no command injection)
- Multi-file trace aggregation for accurate build analysis
- Jinja2-based report generation for customizable output
Implementation:
===============
- script/tools/ck-docker - Main Docker orchestration script
- script/tools/ck-build-analysis - Build analysis orchestration
- script/tools/common.sh - Shared utilities (container mgmt, GPU detection)
- script/tools/analyze_build_trace.py - PEP 723 compliant Python analyzer
- script/tools/templates/ - Jinja2 templates for report generation
- script/tools/README*.md - Comprehensive documentation
Directory Structure:
====================
script/tools/
├── README.md                          # Main overview
├── README_ck-docker.md                # ck-docker documentation
├── README_ck-build-analysis.md        # ck-build-analysis documentation
├── ck-docker                          # Docker orchestration script
├── ck-build-analysis                  # Build analysis orchestration
├── common.sh                          # Shared utilities
├── analyze_build_trace.py             # Python analyzer (PEP 723)
└── templates/
    └── build_analysis_report.md.jinja # Report template
The tools follow Unix philosophy: do one thing well, compose easily,
and work from both CLI and programmatic contexts.
78
script/tools/README.md
Normal file
@@ -0,0 +1,78 @@
# Composable Kernel Tools

This directory contains utility tools for building, testing, and analyzing Composable Kernel.

These tools are designed to be LLM-agnostic and can be used with any AI assistant or directly from the command line.

## Available Tools

### ck-docker

Build and test composable_kernel in Docker with ROCm support.

See [README_ck-docker.md](README_ck-docker.md) for details.

**Quick start:**
```bash
# Add to PATH
export PATH="$PATH:$PWD/script/tools"

# Start container and build
ck-docker start
ck-docker build test_amdgcn_mma
ck-docker test test_amdgcn_mma
```

### ck-build-analysis

Analyze Composable Kernel build times using Clang's -ftime-trace profiler.

See [README_ck-build-analysis.md](README_ck-build-analysis.md) for details.

**Quick start:**
```bash
# Add to PATH
export PATH="$PATH:$PWD/script/tools"

# Analyze build time
ck-build-analysis example_convnd_fwd_xdl_fp8
```

## LLM Assistant Integration

These tools can be used as-is with any LLM assistant by providing the tool documentation to the assistant. The assistant can then invoke these tools on your behalf.

For example, you can ask:
- "Start the docker container"
- "Build and test test_amdgcn_mma"
- "Analyze build time for example_convnd_fwd_xdl_fp8"

The assistant will translate your natural language request into the appropriate tool invocation.

## Dependencies

- **ck-docker**: Requires Docker and a ROCm-capable GPU (for running tests)
- **ck-build-analysis**: Requires Docker; Python dependencies (jinja2) are installed automatically via `uv`

## Directory Structure

```
script/tools/
├── README.md                          # This file
├── README_ck-docker.md                # Documentation for ck-docker
├── README_ck-build-analysis.md        # Documentation for ck-build-analysis
├── ck-docker                          # Docker container management tool
├── ck-build-analysis                  # Build time analysis tool
├── common.sh                          # Shared utilities for bash scripts
├── analyze_build_trace.py             # Python script for trace analysis (PEP 723 compliant)
└── templates/
    └── build_analysis_report.md.jinja # Jinja2 template for analysis reports
```

## Contributing

When adding new tools to this directory:
1. Keep them LLM-agnostic (avoid hardcoding references to specific AI assistants)
2. Provide clear command-line usage documentation
3. Include examples for both CLI and LLM assistant usage
4. Follow the existing naming convention and structure
168
script/tools/README_ck-build-analysis.md
Normal file
@@ -0,0 +1,168 @@
# ck-build-analysis

Analyze Composable Kernel build times using Clang's -ftime-trace profiler.

## Terminal Usage

Direct command-line usage:

```bash
# From composable_kernel directory
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1
script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1 --output=my_report.md

# Or add to PATH
export PATH="$PATH:$PWD/script/tools"
ck-build-analysis example_convnd_fwd_xdl_fp8
```

## LLM Assistant Integration

If using an LLM assistant, you can ask in natural language:
- "Analyze build time for example_convnd_fwd_xdl_fp8"
- "Profile the compilation of test_amdgcn_mma with 1us granularity"
- "Generate a build time report for example_gemm_xdl"

## Commands

```
ck-build-analysis <target> [options]

Options:
  --granularity=N    Time trace granularity in microseconds (default: 1)
  --output=FILE      Output report filename (default: build_time_analysis_report.md)
  --name=NAME        Docker container name (default: from CK_CONTAINER_NAME or auto-generated)
  --no-reconfigure   Skip CMake reconfiguration if build exists
  --help             Show this help message
```

## What It Does

1. **Configures CMake** with `-ftime-trace` and custom granularity
2. **Builds the target** using Ninja in Docker
3. **Analyzes the trace** JSON file for template instantiation patterns
4. **Generates a report** with:
   - Compilation phase breakdown
   - Top expensive individual instantiations
   - Template families ranked by total time and count
   - Key insights and optimization recommendations
   - Complete statistics

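The analysis step can be sketched in a few lines of Python. This is a minimal illustration, not the shipped analyzer: it assumes the trace is a JSON object with a `traceEvents` list whose instantiation events carry the template signature in `args.detail`, and the function name `summarize_instantiations` and the sample events are hypothetical.

```python
import re
from collections import defaultdict


def summarize_instantiations(trace):
    """Group InstantiateFunction/InstantiateClass events by template family."""
    families = defaultdict(lambda: {"count": 0, "total_dur": 0})
    for event in trace.get("traceEvents", []):
        if event.get("name") not in ("InstantiateFunction", "InstantiateClass"):
            continue
        dur = int(event.get("dur", 0))  # duration in microseconds
        detail = event.get("args", {}).get("detail", "")
        # The family is everything before the first '<' or '('.
        match = re.match(r"^([^<(]+)", detail)
        if dur <= 0 or not match:
            continue
        family = match.group(1).strip()
        families[family]["count"] += 1
        families[family]["total_dur"] += dur
    # Most expensive families first.
    return sorted(families.items(), key=lambda kv: kv[1]["total_dur"], reverse=True)


# Hypothetical trace with two instantiations of the same family.
sample_trace = {
    "traceEvents": [
        {"name": "InstantiateFunction", "dur": 120, "args": {"detail": "ck::foo<int>"}},
        {"name": "InstantiateClass", "dur": 30, "args": {"detail": "ck::foo<float>"}},
        {"name": "Frontend", "dur": 500},
    ]
}
ranked = summarize_instantiations(sample_trace)
print(ranked)
```

Grouping by the text before the first `<` or `(` collapses every specialization of a template into one family, which is what makes the "ranked by total time and count" view possible.
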
## Configuration

- **Container**: Uses ck-docker container (auto-starts if needed)
- **Granularity**: Default 1us (100% template coverage, best balance)
- **Output**: Markdown report in project root

## Environment

```bash
export CK_CONTAINER_NAME=my_build          # Override container name
export CK_BUILD_ANALYSIS_GRANULARITY=1     # Default granularity in microseconds
```

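The way the granularity setting is resolved can be illustrated as follows. This is a sketch under the assumption that an explicit `--granularity` value takes precedence over `CK_BUILD_ANALYSIS_GRANULARITY`, which in turn takes precedence over the default of 1; the helper `resolve_granularity` is hypothetical and not part of the tool.

```python
import os


def resolve_granularity(cli_value=None, environ=None):
    """Return the granularity in microseconds: CLI value, else env var, else 1."""
    environ = os.environ if environ is None else environ
    if cli_value is not None:
        return int(cli_value)
    # An unset or empty variable falls back to the default of 1.
    return int(environ.get("CK_BUILD_ANALYSIS_GRANULARITY") or "1")


default_value = resolve_granularity(environ={})
from_env = resolve_granularity(environ={"CK_BUILD_ANALYSIS_GRANULARITY": "10"})
from_cli = resolve_granularity("500", {"CK_BUILD_ANALYSIS_GRANULARITY": "10"})
print(default_value, from_env, from_cli)
```
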
## Examples

```bash
# Complete template analysis with default granularity (1us - recommended)
ck-build-analysis example_convnd_fwd_xdl_fp8

# Quick daily check (10us granularity, captures most expensive templates)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10

# Maximum detail (0us granularity, includes LLVM internals)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=0

# High-level overview (500us granularity, major bottlenecks only)
ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=500

# Custom output filename
ck-build-analysis example_convnd_fwd_xdl_fp8 --output=fp8_conv_analysis.md

# Analyze test target
ck-build-analysis test_amdgcn_mma

# Use existing build (skip reconfigure)
ck-build-analysis example_convnd_fwd_xdl_fp8 --no-reconfigure
```

## Output

The report includes:
- **Executive Summary**: Total time, events, instantiations, unique templates
- **Compilation Phases**: InstantiateFunction, Frontend, Backend, Optimizer, etc.
- **Top 30 Individual Instantiations**: Most expensive single templates
- **Template Families**: Grouped by total time and instantiation count
- **Key Insights**: What's slow and why
- **Optimization Recommendations**: Short, medium, and long-term strategies
- **Detailed Statistics**: Averages, medians, distributions

## Granularity Trade-offs

| Granularity | Template Coverage | Use Case |
|-------------|-------------------|----------|
| **0us** | All templates + sub-us compiler internals | LLVM internals debugging, very large files, higher overhead |
| **1us (default)** | **All templates** | **Default: Complete template analysis with low overhead** |
| **10us** | Most expensive templates | Daily quick checks, smaller files, minimal overhead |
| **50-100us** | Top bottlenecks | Balanced detail/size, suitable for CI/CD |
| **500us** | High-level phases only | Not recommended for template analysis |

**Recommended default**: 1us captures all template instantiations with minimal overhead

## Notes

- **0us and 1us capture all templates** - 0us adds sub-microsecond compiler internals
- **1us is the sweet spot**: complete template coverage, filters noise, low overhead
- **10us is practical** for daily use: captures most expensive templates, smaller files
- **500us loses most template instantiation data** - only use for high-level phase breakdown
- Finer granularity = more events = larger files + higher build time overhead
- For template-heavy C++ codebases like CK: **use 1us for analysis, 10us for daily checks**

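The effect of the granularity threshold can be approximated after the fact with a small filter. This is only an illustration: Clang applies the threshold while compiling, the inclusive comparison is an assumption, and the function name and the sample durations below are hypothetical.

```python
def filter_by_granularity(events, granularity_us):
    """Keep only events lasting at least granularity_us microseconds."""
    return [event for event in events if event["dur"] >= granularity_us]


# Hypothetical instantiation events with durations in microseconds.
events = [
    {"name": "InstantiateFunction", "dur": 3},
    {"name": "InstantiateClass", "dur": 40},
    {"name": "InstantiateFunction", "dur": 800},
]

for granularity in (1, 10, 500):
    kept = filter_by_granularity(events, granularity)
    print(f"{granularity}us -> {len(kept)} event(s) kept")
```

A coarser threshold drops the many short instantiations first, which is why the 500us setting is left with little more than the high-level phases.
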
## Implementation Details

### PEP 723 Compliance with Automatic Dependency Management

The analysis script (`analyze_build_trace.py`) is PEP 723 compliant with inline dependency metadata:

```python
# /// script
# requires-python = ">=3.8"
# dependencies = [
#   "jinja2>=3.0.0",
# ]
# ///
```

**The tool automatically installs and uses `uv`**, which provides:
- ✅ Zero-configuration dependency management
- ✅ Automatic installation of jinja2 from PEP 723 metadata
- ✅ Isolated dependency environment (no system pollution)
- ✅ Fast caching for subsequent runs

**No manual setup required!** The first time you run the tool, it will:
1. Detect if `uv` is installed in the container
2. If not, automatically install it via Ubuntu packages (pipx install uv)
3. Use `uv run` to execute the analysis with auto-managed dependencies

On subsequent runs, `uv` will already be available and dependencies will be cached.

Installation is done through Ubuntu's package manager for security and reliability.

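For readers unfamiliar with PEP 723, the sketch below shows how a runner might locate the inline metadata block. It is not how `uv` is implemented; the regular expression is adapted from the reference implementation given in PEP 723, and the helper name `read_script_metadata` is hypothetical.

```python
import re

# Pattern adapted from the reference implementation in PEP 723.
BLOCK_RE = re.compile(
    r"(?m)^# /// (?P<type>[a-zA-Z0-9-]+)$\s(?P<content>(^#(| .*)$\s)+)^# ///$"
)


def read_script_metadata(source):
    """Return the TOML text of the `script` metadata block, or None."""
    for match in BLOCK_RE.finditer(source):
        if match.group("type") == "script":
            lines = match.group("content").splitlines(keepends=True)
            # Strip the leading "# " (or a bare "#") from each comment line.
            return "".join(
                line[2:] if line.startswith("# ") else line[1:] for line in lines
            )
    return None


example = '''# /// script
# requires-python = ">=3.8"
# dependencies = [
#   "jinja2>=3.0.0",
# ]
# ///
print("hello")
'''
metadata = read_script_metadata(example)
print(metadata)
```

The returned text is plain TOML, so a runner can hand it to any TOML parser to read `requires-python` and `dependencies` before creating an environment.
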
### Components

- **ck-build-analysis** - Main bash script that orchestrates Docker, CMake, and analysis
- **analyze_build_trace.py** - PEP 723 compliant Python script for trace analysis
- **templates/build_analysis_report.md.jinja** - Jinja2 template for report generation

### Standalone Usage

The Python script can also be run independently:

```bash
# With uv (recommended - auto-installs dependencies from PEP 723 metadata)
uv run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/

# With pipx (alternative - also auto-installs dependencies)
pipx run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/
```
80
script/tools/README_ck-docker.md
Normal file
@@ -0,0 +1,80 @@
# ck-docker

Build and test composable_kernel in Docker with ROCm support.

## Terminal Usage

Direct command-line usage:

```bash
# From composable_kernel directory
script/tools/ck-docker start
script/tools/ck-docker build test_amdgcn_mma
script/tools/ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
script/tools/ck-docker status
script/tools/ck-docker shell

# Or add to PATH
export PATH="$PATH:$PWD/script/tools"
ck-docker start
```

## LLM Assistant Integration

If using an LLM assistant, you can ask in natural language:
- "Start the docker container"
- "Build test_amdgcn_mma"
- "Run test_amdgcn_mma with filter *Fp16*"
- "Check container status"
- "Open a shell in the container"

## Commands

```
ck-docker start [name]                    Start Docker container
ck-docker build [target] [--reconfigure]  Build target (optionally reconfigure CMake)
ck-docker test <name> [options]           Run test
ck-docker shell [name]                    Interactive shell
ck-docker status [name]                   Check status
ck-docker stop [name]                     Stop container
```

## Configuration

- **Image**: rocm/composable_kernel:ck_ub24.04_rocm7.0.1
- **GPU**: Auto-detected via rocminfo (fallback: gfx950)
- **Compiler**: /opt/rocm/llvm/bin/clang++
- **Build**: Ninja + CMake (Release)
- **Mount**: Current directory → /workspace
- **Container Name**: Auto-generated as `ck_<username>_<branch>` to avoid clashes

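The naming scheme can be illustrated with a short sketch. The real logic lives in `common.sh`, which is not shown here, so the function name and the rule for replacing unsupported characters are assumptions made only for illustration.

```python
import re


def default_container_name(username, branch):
    """Build a ck_<username>_<branch> name from a user and a git branch."""

    def clean(part):
        # Docker container names only allow [a-zA-Z0-9_.-]; replacing other
        # characters with "_" is an assumption, not the tool's documented rule.
        return re.sub(r"[^a-zA-Z0-9_.-]", "_", part)

    return f"ck_{clean(username)}_{clean(branch)}"


name = default_container_name("alice", "feature/build-tools")
print(name)
```

Including both the user and the branch in the name is what lets several developers, or several checkouts by one developer, share a machine without reusing each other's containers.
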
## Environment

```bash
export CK_CONTAINER_NAME=my_build                                    # Override default container name
export CK_DOCKER_IMAGE=rocm/composable_kernel:ck_ub24.04_rocm7.0.1   # Override Docker image
export GPU_TARGET=gfx942                                             # Override GPU target detection
```

## Examples

```bash
# Start container
ck-docker start

# Build and run test
ck-docker build test_amdgcn_mma
ck-docker test test_amdgcn_mma

# Force clean CMake reconfiguration and build
ck-docker build --reconfigure test_amdgcn_mma

# Custom container
ck-docker start my_build
ck-docker build test_amdgcn_mma --name my_build
ck-docker test test_amdgcn_mma --name my_build

# Debug
ck-docker shell
ck-docker status
```
347
script/tools/analyze_build_trace.py
Executable file
@@ -0,0 +1,347 @@
#!/usr/bin/env python3
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

# /// script
# requires-python = ">=3.8"
# dependencies = [
#   "jinja2>=3.0.0",
# ]
# ///
"""
Build Time Analysis Tool for Composable Kernel

Analyzes Clang -ftime-trace output to identify template instantiation
bottlenecks and generate comprehensive build time reports.
"""

import json
import os
import re
import sys
from collections import defaultdict
from datetime import datetime

try:
    from jinja2 import Environment, FileSystemLoader
except ImportError:
    print("Error: jinja2 is required but not installed.", file=sys.stderr)
    print("Install with: apt-get install python3-jinja2", file=sys.stderr)
    print("Or with pip: pip install jinja2", file=sys.stderr)
    sys.exit(1)


def parse_arguments():
    """Parse command-line arguments."""
    if len(sys.argv) < 7:
        print(
            "Usage: analyze_build_trace.py <trace_files_or_dir> <output_file> <target> <granularity> <build_time> <template_dir>"
        )
        print(
            "  trace_files_or_dir: Comma-separated list of trace files OR directory containing .json files"
        )
        sys.exit(1)

    return {
        "trace_input": sys.argv[1],
        "output_file": sys.argv[2],
        "target": sys.argv[3],
        "granularity": sys.argv[4],
        "build_time": sys.argv[5],
        "template_dir": sys.argv[6],
    }


def find_trace_files(trace_input):
    """Find all trace files from input (file list, single file, or directory)."""
    trace_files = []

    # Check if it's a directory
    if os.path.isdir(trace_input):
        print(f"Scanning directory: {trace_input}")
        for root, dirs, files in os.walk(trace_input):
            for file in files:
                # Keep only .cpp.json and .hip.json traces located under a
                # CMakeFiles directory; this skips compile_commands.json and
                # any other JSON files in the build tree
                if file.endswith((".cpp.json", ".hip.json")) and "CMakeFiles" in root:
                    trace_files.append(os.path.join(root, file))
        trace_files.sort()
    # Check if it's a comma-separated list
    elif "," in trace_input:
        trace_files = [f.strip() for f in trace_input.split(",")]
    # Single file
    else:
        trace_files = [trace_input]

    # Filter out non-existent files
    valid_files = [f for f in trace_files if os.path.isfile(f)]

    if not valid_files:
        print(f"Error: No valid trace files found in: {trace_input}", file=sys.stderr)
        sys.exit(1)

    print(f"Found {len(valid_files)} trace file(s)")
    return valid_files


def load_trace_data(trace_files):
    """Load and parse multiple trace JSON files."""
    all_data = []

    for trace_file in trace_files:
        print(f"  Loading: {trace_file}")
        try:
            with open(trace_file, "r") as f:
                data = json.load(f)
                # Get file basename for tracking
                file_name = os.path.basename(trace_file)
                all_data.append({"file": file_name, "path": trace_file, "data": data})
        except Exception as e:
            print(f"  Warning: Failed to load {trace_file}: {e}", file=sys.stderr)

    return all_data


def process_events(all_trace_data):
    """Process trace events from multiple files and extract statistics."""
    print("Processing events from all files...")

    template_stats = defaultdict(lambda: {"count": 0, "total_dur": 0})
    phase_stats = defaultdict(int)
    top_individual = []
    file_stats = []
    total_events = 0

    for trace_info in all_trace_data:
        file_name = trace_info["file"]
        data = trace_info["data"]
        events = data.get("traceEvents", [])

        file_template_time = 0
        file_event_count = len(events)
        total_events += file_event_count

        print(f"  Processing {file_name}: {file_event_count:,} events")

        for event in events:
            name = event.get("name", "")
            dur = int(event.get("dur", 0))  # Keep as integer microseconds

            if name and dur > 0:
                phase_stats[name] += dur

                if name in ["InstantiateFunction", "InstantiateClass"]:
                    detail = event.get("args", {}).get("detail", "")
                    top_individual.append(
                        {"detail": detail, "dur": dur, "type": name, "file": file_name}
                    )

                    file_template_time += dur

                    # Extract template name (everything before '<' or '(')
                    match = re.match(r"^([^<(]+)", detail)
                    if match:
                        template_name = match.group(1).strip()
                        # Normalize template names: drop the ck:: prefix
                        template_name = re.sub(r"^ck::", "", template_name)
                        # Note: this substitution is a no-op; std:: names are kept unchanged
                        template_name = re.sub(r"^std::", "std::", template_name)

                        template_stats[template_name]["count"] += 1
                        template_stats[template_name]["total_dur"] += dur

        file_stats.append(
            {
                "name": file_name,
                "events": file_event_count,
                "template_time": file_template_time,
            }
        )

    return template_stats, phase_stats, top_individual, file_stats, total_events


def prepare_template_data(template_stats, phase_stats, top_individual, file_stats):
    """Prepare and calculate derived statistics for template rendering."""
    print("Sorting data...")

    # Sort data
    sorted_phases = sorted(phase_stats.items(), key=lambda x: x[1], reverse=True)
    top_individual.sort(key=lambda x: x["dur"], reverse=True)
    file_stats.sort(key=lambda x: x["template_time"], reverse=True)

    # Calculate totals
    total_template_time = sum(s["total_dur"] for s in template_stats.values())
    total_trace_time = sum(phase_stats.values())
    total_inst = sum(s["count"] for s in template_stats.values())

    # Prepare templates by time with calculated fields
    templates_by_time = []
    for name, stats in sorted(
        template_stats.items(), key=lambda x: x[1]["total_dur"], reverse=True
    ):
        templates_by_time.append(
            (
                name,
                {
                    "count": stats["count"],
                    "total_dur": stats["total_dur"],
                    "avg": stats["total_dur"] // stats["count"]
                    if stats["count"] > 0
                    else 0,
                    "pct": 100 * stats["total_dur"] / total_template_time
                    if total_template_time > 0
                    else 0,
                },
            )
        )

    # Prepare templates by count
    templates_by_count = []
    for name, stats in sorted(
        template_stats.items(), key=lambda x: x[1]["count"], reverse=True
    ):
        templates_by_count.append(
            (
                name,
                {
                    "count": stats["count"],
                    "total_dur": stats["total_dur"],
                    "avg": stats["total_dur"] // stats["count"]
                    if stats["count"] > 0
                    else 0,
                },
            )
        )

    # Add friendly type names to individual instantiations
    for inst in top_individual:
        inst["inst_type"] = "Func" if inst["type"] == "InstantiateFunction" else "Class"

    # Calculate additional metrics
    median_count = 0
    if len(template_stats) > 0:
        median_count = sorted([s["count"] for s in template_stats.values()])[
            len(template_stats) // 2
        ]

    top10_pct = 0
    if len(templates_by_time) >= 10:
        top10_pct = (
            100
            * sum(s[1]["total_dur"] for s in templates_by_time[:10])
            / total_template_time
        )

    return {
        "sorted_phases": sorted_phases,
        "top_individual": top_individual,
        "templates_by_time": templates_by_time,
        "templates_by_count": templates_by_count,
        "total_template_time": total_template_time,
        "total_trace_time": total_trace_time,
        "total_inst": total_inst,
        "median_count": median_count,
        "top10_pct": top10_pct,
        "unique_families": len(template_stats),
        "file_stats": file_stats,
    }


def setup_jinja_environment(template_dir):
    """Set up Jinja2 environment with custom filters."""
    env = Environment(loader=FileSystemLoader(template_dir))

    def format_number(value):
        """Format number with thousand separators."""
        return f"{value:,}"

    def truncate(value, length):
        """Truncate string to length with ellipsis."""
        if len(value) > length:
            return value[: length - 3] + "..."
        return value

    def pad(value, length):
        """Pad string to specified length."""
        return f"{value:<{length}}"

    def us_to_ms(value):
        """Convert microseconds to milliseconds."""
        return value / 1000.0

    def us_to_s(value):
        """Convert microseconds to seconds."""
        return value / 1000000.0

    env.filters["format_number"] = format_number
    env.filters["truncate"] = truncate
    env.filters["pad"] = pad
    env.filters["us_to_ms"] = us_to_ms
    env.filters["us_to_s"] = us_to_s

    return env


def generate_report(env, data, args, total_events, num_files):
    """Generate the final report using Jinja2 template."""
    print("Rendering report with Jinja2...")

    template = env.get_template("build_analysis_report.md.jinja")

    report_content = template.render(
        timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"),
        target=args["target"],
        granularity=args["granularity"],
        build_time=args["build_time"],
        total_events=total_events,
        num_files=num_files,
        total_instantiations=data["total_inst"],
        unique_families=data["unique_families"],
        total_trace_time=data["total_trace_time"],
        total_template_time=data["total_template_time"],
        phases=data["sorted_phases"],
        top_individual=data["top_individual"],
        templates_by_time=data["templates_by_time"],
        templates_by_count=data["templates_by_count"],
        median_count=data["median_count"],
        top10_pct=data["top10_pct"],
        file_stats=data["file_stats"],
    )

    return report_content


def main():
    """Main entry point for the analysis tool."""
    args = parse_arguments()

    # Find and load trace files
    trace_files = find_trace_files(args["trace_input"])
    all_trace_data = load_trace_data(trace_files)

    # Process events from all files
    template_stats, phase_stats, top_individual, file_stats, total_events = (
        process_events(all_trace_data)
    )

    # Prepare template data
    data = prepare_template_data(
        template_stats, phase_stats, top_individual, file_stats
    )

    # Setup Jinja2 environment
    env = setup_jinja_environment(args["template_dir"])

    # Generate report
    report_content = generate_report(env, data, args, total_events, len(all_trace_data))

    # Write output
    with open(args["output_file"], "w") as f:
        f.write(report_content)

    print(f"Report generated: {args['output_file']}")
    print(f"Report size: {len(report_content):,} bytes")
    print(f"Analyzed {len(all_trace_data)} file(s) with {total_events:,} total events")


if __name__ == "__main__":
    main()
237
script/tools/ck-build-analysis
Executable file
@@ -0,0 +1,237 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

# CK Build Analysis Tool - Analyze build times using -ftime-trace

set -e
set -o pipefail

# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")

# Default settings
GRANULARITY="${CK_BUILD_ANALYSIS_GRANULARITY:-1}"
OUTPUT_FILE="build_time_analysis_report.md"
RECONFIGURE=true

# Help message
show_help() {
    cat << EOF
CK Build Analysis - Analyze build times using Clang -ftime-trace

Usage: ck-build-analysis <target> [options]

Arguments:
  target              Build target to analyze (e.g., example_convnd_fwd_xdl_fp8)

Options:
  --granularity=N     Time trace granularity in microseconds (default: 1)
  --output=FILE       Output report filename (default: build_time_analysis_report.md)
  --name=NAME         Docker container name (default: ${CONTAINER_NAME})
  --no-reconfigure    Skip CMake reconfiguration if build exists
  --help              Show this help message

Examples:
  ck-build-analysis example_convnd_fwd_xdl_fp8
  ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10
  ck-build-analysis test_amdgcn_mma --granularity=1 --output=mma_test_analysis.md

Granularity Guide:
  0            - Everything: All compiler events including sub-microsecond operations
                 Use for LLVM internals debugging. Large files, higher overhead.

  1 (default)  - Complete template coverage: Captures all template instantiations
                 Best balance - filters sub-microsecond noise, low overhead

  10           - Daily use: Captures most expensive templates, smaller files
                 Good for quick checks and routine analysis

  50-100       - Intermediate: Balanced between detail and file size
                 Suitable for CI/CD tracking

  500          - High-level only: Major compilation phases, minimal detail
                 Not recommended for template analysis (loses most instantiations)

Recommendation: Use 1us (default) for template analysis, 10us for quick checks.
EOF
}

# Parse arguments
TARGET=""
while [[ $# -gt 0 ]]; do
    case $1 in
        --granularity=*)
            GRANULARITY="${1#*=}"
            shift
            ;;
        --output=*)
            OUTPUT_FILE="${1#*=}"
            shift
            ;;
        --name=*)
            CONTAINER_NAME="${1#*=}"
            shift
            ;;
        --no-reconfigure)
            RECONFIGURE=false
            shift
            ;;
        --help|-h)
            show_help
            exit 0
            ;;
        -*)
            echo "Unknown option: $1"
            show_help
            exit 1
            ;;
        *)
            if [ -z "$TARGET" ]; then
                TARGET="$1"
            else
                echo "Error: Multiple targets specified"
                show_help
                exit 1
            fi
            shift
            ;;
    esac
done

if [ -z "$TARGET" ]; then
    echo "Error: No target specified"
    echo ""
    show_help
    exit 1
fi

# Validate OUTPUT_FILE to prevent path traversal
if [[ "$OUTPUT_FILE" =~ / ]] || [[ "$OUTPUT_FILE" =~ \.\. ]]; then
    echo "Error: OUTPUT_FILE must be a simple filename (no path separators or .. allowed)"
    echo "Invalid: $OUTPUT_FILE"
    exit 1
fi

|
||||
echo "═══════════════════════════════════════════════════════════════"
|
||||
echo " CK Build Time Analysis"
|
||||
echo "═══════════════════════════════════════════════════════════════"
|
||||
echo "Target: $TARGET"
|
||||
echo "Granularity: ${GRANULARITY}us"
|
||||
echo "Container: $CONTAINER_NAME"
|
||||
echo "Output: $OUTPUT_FILE"
|
||||
echo "═══════════════════════════════════════════════════════════════"
|
||||
echo ""

# Ensure container is running
ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}"

# Configure CMake with -ftime-trace if needed
if [ "$RECONFIGURE" = true ] || ! docker exec "${CONTAINER_NAME}" test -f /workspace/build/build.ninja 2>/dev/null; then
    echo ""
    echo "Configuring CMake with -ftime-trace (granularity=${GRANULARITY}us)..."

    GPU_TARGET=$(detect_gpu_target "${CONTAINER_NAME}")

    docker exec -e GPU_TARGET="${GPU_TARGET}" -e GRANULARITY="${GRANULARITY}" "${CONTAINER_NAME}" bash -c '
        cd /workspace || exit 1
        rm -rf /workspace/build
        mkdir /workspace/build
        cd /workspace/build || exit 1
        cmake .. -GNinja \
            -DGPU_TARGETS="${GPU_TARGET}" \
            -DCMAKE_BUILD_TYPE=Release \
            -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
            -DCMAKE_CXX_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \
            -DCMAKE_HIP_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \
            -DBUILD_TESTING=ON 2>&1 | tail -20
    '
    echo "CMake configuration complete"
fi

# Build the target
echo ""
echo "Building target: $TARGET"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"

BUILD_START=$(date +%s)
docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c 'cd /workspace/build && time ninja "${TARGET}" 2>&1'
BUILD_END=$(date +%s)
BUILD_TIME=$((BUILD_END - BUILD_START))

echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Build completed in ${BUILD_TIME} seconds"

# Find all trace JSON files for the target
echo ""
echo "Locating trace files..."

# Count trace files
TRACE_COUNT=$(docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c '
    find /workspace/build -type f \( -name "*.cpp.json" -o -name "*.hip.json" \) 2>/dev/null | \
        grep -vF "compile_commands.json" | wc -l
')

if [ "$TRACE_COUNT" -eq 0 ]; then
    echo "Error: Could not find any trace files in /workspace/build"
    echo "Expected .cpp.json or .hip.json files from -ftime-trace compilation"
    exit 1
fi

echo "Found ${TRACE_COUNT} trace file(s) in build directory"

# We'll pass the build directory to the Python script
BUILD_DIR="/workspace/build"

# Generate analysis report
echo ""
echo "Generating analysis report..."

# Copy analysis script and templates to container
docker cp "${SCRIPT_DIR}/analyze_build_trace.py" "${CONTAINER_NAME}:/tmp/analyze_build_trace.py"
docker cp "${SCRIPT_DIR}/templates" "${CONTAINER_NAME}:/tmp/ck_build_analysis_templates"

# Check if uv is available, install if needed, and use for PEP 723 dependency management
if ! docker exec "${CONTAINER_NAME}" bash -c "command -v uv >/dev/null 2>&1 || test -x \$HOME/.local/bin/uv"; then
    echo "uv not found, installing via pipx..."
    docker exec "${CONTAINER_NAME}" bash -c "
        # Install pipx if not available
        if ! command -v pipx >/dev/null 2>&1; then
            apt-get update -qq && apt-get install -y -qq pipx >/dev/null 2>&1
        fi
        # Install uv via pipx
        pipx install uv >/dev/null 2>&1
    "
    echo "uv installed successfully"
fi

echo "Using uv run for automatic dependency management..."
# Ensure uv is in PATH (handles ~/.local/bin installation)
# Pass build directory instead of single file
docker exec \
    -e BUILD_DIR="${BUILD_DIR}" \
    -e OUTPUT_FILE="${OUTPUT_FILE}" \
    -e TARGET="${TARGET}" \
    -e GRANULARITY="${GRANULARITY}" \
    -e BUILD_TIME="${BUILD_TIME}" \
    "${CONTAINER_NAME}" bash -c 'export PATH="$HOME/.local/bin:$PATH" && uv run --no-project /tmp/analyze_build_trace.py "${BUILD_DIR}" "/workspace/${OUTPUT_FILE}" "${TARGET}" "${GRANULARITY}" "${BUILD_TIME}" /tmp/ck_build_analysis_templates'

# Copy report back to host
docker cp "${CONTAINER_NAME}:/workspace/${OUTPUT_FILE}" "${PROJECT_ROOT}/${OUTPUT_FILE}"

# Cleanup
docker exec "${CONTAINER_NAME}" rm -f /tmp/analyze_build_trace.py
docker exec "${CONTAINER_NAME}" rm -rf /tmp/ck_build_analysis_templates

echo ""
echo "═══════════════════════════════════════════════════════════════"
echo " Analysis Complete!"
echo "═══════════════════════════════════════════════════════════════"
echo "Report: ${PROJECT_ROOT}/${OUTPUT_FILE}"
echo ""
echo "Summary:"
# Pass OUTPUT_FILE through the environment: the validation above rejects only
# "/" and "..", so other shell metacharacters must not reach the command text.
docker exec -e OUTPUT_FILE="${OUTPUT_FILE}" "${CONTAINER_NAME}" bash -c 'head -20 "/workspace/${OUTPUT_FILE}" | tail -10'
echo ""
echo "View the full report:"
echo " cat ${OUTPUT_FILE}"
echo " or open it in your editor"
echo "═══════════════════════════════════════════════════════════════"
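The option parsing and the OUTPUT_FILE check in ck-build-analysis can be tried without Docker. The following is a minimal sketch, not code from the commit: `option_value` and `is_simple_filename` are hypothetical helper names that wrap the same `${1#*=}` expansion and the same two regex tests the script uses.

```shell
#!/bin/bash
# Hypothetical helper: extract the value of a --key=value argument.
# "${1#*=}" removes the shortest prefix ending in "=", so any later "="
# characters stay part of the value.
option_value() {
    printf '%s\n' "${1#*=}"
}

# Hypothetical helper mirroring the OUTPUT_FILE validation: returns 0 for a
# plain filename and 1 if the name contains "/" or "..".
is_simple_filename() {
    local name="$1"
    if [[ "$name" =~ / ]] || [[ "$name" =~ \.\. ]]; then
        return 1
    fi
    return 0
}

option_value "--output=report.md"                      # prints: report.md
is_simple_filename "report.md" && echo "accepted"      # prints: accepted
is_simple_filename "../report.md" || echo "rejected"   # prints: rejected
```

This check only stops path traversal. A name such as `a;b.md` still passes it, so the value should be kept out of any command string that a shell will re-parse.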
294
script/tools/ck-docker
Executable file
@@ -0,0 +1,294 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

# CK Docker Tool - Build and test composable_kernel in Docker with ROCm support

set -e
set -o pipefail

# Find script directory and load common utilities
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
source "${SCRIPT_DIR}/common.sh"

# Initialize configuration
PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")

# Help message
show_help() {
    cat << EOF
CK Docker Tool - Build and test composable_kernel in Docker

Usage: ck-docker <command> [options]

Commands:
  start [name]                    Start Docker container
  build [target] [--reconfigure]  Build target (optionally reconfigure CMake)
  test <test> [options]           Run test
  shell [name]                    Open shell in container
  status [name]                   Check container status
  stop [name]                     Stop and remove container

Examples:
  ck-docker start
  ck-docker build test_amdgcn_mma
  ck-docker build --reconfigure test_amdgcn_mma
  ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
  ck-docker shell

Environment:
  CK_CONTAINER_NAME - Override default container name (default: ck_<username>_<branch>)
  CK_DOCKER_IMAGE   - Override Docker image (default: rocm/composable_kernel:ck_ub24.04_rocm7.0.1)
  GPU_TARGET        - Override GPU target detection (e.g., gfx950, gfx942)
EOF
}

# Start container
cmd_start() {
    local name="${1:-${CONTAINER_NAME}}"
    local docker_image=$(get_docker_image)

    # Check if container exists and is running
    if container_exists "${name}"; then
        if container_is_running "${name}"; then
            echo "Container '${name}' is already running"
            return 0
        else
            echo "Starting existing container '${name}'..."
            docker start "${name}"
            echo "Container started"
            return 0
        fi
    fi

    echo "Creating new Docker container '${name}'..."
    docker run -d \
        --name "${name}" \
        --device=/dev/kfd --device=/dev/dri \
        --security-opt seccomp=unconfined \
        --group-add video \
        -v "${PROJECT_ROOT}":/workspace \
        -w /workspace \
        "${docker_image}" \
        tail -f /dev/null

    echo "Container '${name}' started successfully"
    docker exec "${name}" bash -c "echo 'Working directory:' && pwd"
}

# Build target
cmd_build() {
    local target=""
    local name="${CONTAINER_NAME}"
    local reconfigure=false

    while [[ $# -gt 0 ]]; do
        case $1 in
            --name)
                name="$2"
                shift 2
                ;;
            --reconfigure)
                reconfigure=true
                shift
                ;;
            *)
                target="$1"
                shift
                ;;
        esac
    done

    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Container '${name}' not running. Starting..."
        cmd_start "${name}"
    fi

    # Reconfigure CMake if requested or if build.ninja doesn't exist
    if [ "$reconfigure" = true ] || ! docker exec "${name}" test -f /workspace/build/build.ninja 2>/dev/null; then
        echo "Detecting GPU target..."
        local gpu_target=$(detect_gpu_target "${name}")

        if [ "$reconfigure" = true ]; then
            echo "Reconfiguring CMake from scratch for GPU target: ${gpu_target}"
        else
            echo "Configuring build with CMake for GPU target: ${gpu_target}"
        fi

        # Pass the GPU target through the environment rather than interpolating
        # it into the command string (same pattern as ck-build-analysis).
        docker exec -e GPU_TARGET="${gpu_target}" "${name}" bash -c '
            cd /workspace || exit 1
            rm -rf /workspace/build
            mkdir /workspace/build
            cd /workspace/build || exit 1
            cmake .. -GNinja \
                -DGPU_TARGETS="${GPU_TARGET}" \
                -DCMAKE_BUILD_TYPE=Release \
                -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \
                -DBUILD_TESTING=ON 2>&1 | tail -30
        '
    fi

    if [ -z "$target" ]; then
        echo "Building all configured targets..."
    else
        echo "Building target: ${target}"
    fi

    # An empty TARGET expands to nothing, so ninja builds its default targets;
    # a non-empty TARGET is passed as a single quoted argument.
    docker exec -e TARGET="${target}" "${name}" bash -c '
        cd /workspace/build || exit 1
        ninja ${TARGET:+"${TARGET}"} 2>&1
    '

    echo "Build complete"
}

# Run test
cmd_test() {
    local test_name=""
    local name="${CONTAINER_NAME}"
    local -a test_options=()

    while [[ $# -gt 0 ]]; do
        case $1 in
            --name)
                name="$2"
                shift 2
                ;;
            --gtest_*|--help)
                test_options+=("$1")
                shift
                ;;
            *)
                if [ -z "$test_name" ]; then
                    test_name="$1"
                else
                    test_options+=("$1")
                fi
                shift
                ;;
        esac
    done

    if [ -z "$test_name" ]; then
        echo "Error: test_name required"
        echo "Usage: ck-docker test <test_name> [--name container_name] [gtest_options]"
        return 1
    fi

    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Error: Container '${name}' not running"
        # cmd_start takes the container name as a positional argument
        echo "Start it with: ck-docker start ${name}"
        return 1
    fi

    if ! docker exec "${name}" test -f "/workspace/build/bin/${test_name}" 2>/dev/null; then
        echo "Test executable not found. Building ${test_name}..."
        cmd_build "${test_name}" --name "${name}"
    fi

    echo "Running: ${test_name} ${test_options[*]}"
    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
    # Build the command with proper quoting (test name and options alike)
    local cmd="cd /workspace/build && ./bin/$(printf '%q' "${test_name}")"
    for opt in "${test_options[@]}"; do
        cmd="${cmd} $(printf '%q' "$opt")"
    done
    docker exec "${name}" bash -c "${cmd}"
}

# Shell
cmd_shell() {
    local name="${1:-${CONTAINER_NAME}}"

    # Check if container is running
    if ! container_is_running "${name}"; then
        echo "Container '${name}' not running. Starting..."
        cmd_start "${name}"
    fi

    echo "Opening shell in '${name}' (type 'exit' to leave)..."
    docker exec -it "${name}" bash
}

# Status
cmd_status() {
    local name="${1:-}"
    local docker_image=$(get_docker_image)

    if [ -z "$name" ]; then
        echo "Composable Kernel Docker Containers:"
        echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
        docker ps -a --filter "ancestor=${docker_image}" \
            --format "table {{.Names}}\t{{.Status}}\t{{.CreatedAt}}" || echo "No containers found"
    else
        # Check container status
        if container_is_running "${name}"; then
            echo "Container '${name}' is RUNNING"
            docker ps --filter "name=^${name}$" --format "table {{.Names}}\t{{.Status}}\t{{.Image}}"
            echo ""
            echo "GPU Information:"
            docker exec "${name}" bash -c "rocm-smi --showproductname 2>/dev/null | head -10 || echo 'No GPU detected'"
        elif container_exists "${name}"; then
            echo "Container '${name}' exists but is STOPPED"
            echo "Start with: ck-docker start ${name}"
        else
            echo "Container '${name}' does NOT exist"
            echo "Create with: ck-docker start ${name}"
        fi
    fi
}

# Stop
cmd_stop() {
    local name="${1:-${CONTAINER_NAME}}"

    # Check if container exists
    if container_exists "${name}"; then
        echo "Stopping and removing container '${name}'..."
        docker stop "${name}" 2>/dev/null || true
        docker rm "${name}" 2>/dev/null || true
        echo "Container stopped and removed"
    else
        echo "Container '${name}' does not exist"
    fi
}

# Main command dispatcher
case "${1:-}" in
    start)
        shift
        cmd_start "$@"
        ;;
    build)
        shift
        cmd_build "$@"
        ;;
    test)
        shift
        cmd_test "$@"
        ;;
    shell)
        shift
        cmd_shell "$@"
        ;;
    status)
        shift
        cmd_status "$@"
        ;;
    stop)
        shift
        cmd_stop "$@"
        ;;
    help|--help|-h)
        show_help
        ;;
    *)
        echo "Unknown command: ${1:-}"
        echo ""
        show_help
        exit 1
        ;;
esac
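Both scripts hand user-supplied strings to `bash -c` inside the container: cmd_test escapes each option with `printf '%q'`, while ck-build-analysis passes values with `docker exec -e`. The sketch below illustrates the two techniques with a plain local `bash -c` standing in for `docker exec`. It is an illustration under that assumption, and the sample value is made up.

```shell
#!/bin/bash
# A made-up value containing shell metacharacters.
arg='Fp16; echo INJECTED'

# Unsafe: the value is pasted into the command text, so the inner shell
# treats "; echo INJECTED" as a second command.
unsafe=$(bash -c "printf '%s' ${arg}")

# Technique 1: printf '%q' escapes the value so the inner shell reads it
# back as one literal word.
quoted=$(printf '%q' "$arg")
via_quote=$(bash -c "printf '%s' ${quoted}")

# Technique 2: pass the value in the environment and expand it inside a
# single-quoted command, so it never becomes part of the command text.
via_env=$(ARG="$arg" bash -c 'printf "%s" "${ARG}"')

echo "unsafe:    ${unsafe}"      # prints: unsafe:    Fp16INJECTED
echo "via_quote: ${via_quote}"   # prints: via_quote: Fp16; echo INJECTED
echo "via_env:   ${via_env}"     # prints: via_env:   Fp16; echo INJECTED
```

With `docker exec`, the environment variant corresponds to `-e ARG="$arg"` plus a single-quoted command string. It avoids building a command string at all, which is why it is usually the easier one to audit.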
97
script/tools/common.sh
Normal file
@@ -0,0 +1,97 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT

# Common utilities for CK Docker tools
# Shared configuration and helper functions

# Find project root (where .git directory is)
get_project_root() {
    local script_dir="$1"
    cd "${script_dir}/../.." && pwd
}

# Detect git branch and sanitize for Docker naming
get_sanitized_branch() {
    local project_root="$1"
    local branch

    branch=$(cd "${project_root}" && git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '_' | tr -cd 'a-zA-Z0-9_-' || echo "")
    branch=${branch:-unknown}

    # Handle detached HEAD state
    if [ "${branch}" = "HEAD" ]; then
        branch="detached"
    fi

    echo "${branch}"
}

# Get username with fallback
get_username() {
    echo "${USER:-$(whoami 2>/dev/null || echo "user")}"
}

# Generate default container name: ck_<username>_<branch>
get_default_container_name() {
    local project_root="$1"
    local user_name
    local git_branch

    user_name=$(get_username)
    git_branch=$(get_sanitized_branch "${project_root}")

    echo "ck_${user_name}_${git_branch}"
}

# Get container name (respects CK_CONTAINER_NAME env var)
get_container_name() {
    local project_root="$1"
    local default_name

    default_name=$(get_default_container_name "${project_root}")
    echo "${CK_CONTAINER_NAME:-${default_name}}"
}

# Get Docker image (respects CK_DOCKER_IMAGE env var)
get_docker_image() {
    echo "${CK_DOCKER_IMAGE:-rocm/composable_kernel:ck_ub24.04_rocm7.0.1}"
}

# Check if container exists (exact match)
container_exists() {
    local name="$1"
    docker ps -a --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}

# Check if container is running (exact match)
container_is_running() {
    local name="$1"
    docker ps --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
}

# Detect GPU target in container
detect_gpu_target() {
    local container="$1"

    # Allow override via GPU_TARGET environment variable
    if [ -n "${GPU_TARGET:-}" ]; then
        echo "${GPU_TARGET}"
        return 0
    fi

    # "head -1" exits 0 even when nothing matched, which would skip the
    # fallback; the trailing "grep ." fails on empty output so that the
    # gfx950 default is actually used.
    docker exec "${container}" bash -c "
        rocminfo 2>/dev/null | grep -oP 'gfx[0-9a-z]+' | head -1 | grep . || echo 'gfx950'
    " | tr -d '\r\n'
}

# Ensure container is running, start if needed
ensure_container_running() {
    local container="$1"
    local script_dir="$2"

    if ! container_is_running "${container}"; then
        echo "Container '${container}' not running. Starting with ck-docker..."
        "${script_dir}/ck-docker" start "${container}"
    fi
}
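Two details of common.sh are easy to check in isolation: the character filtering applied to branch names, and how a `|| echo default` fallback behaves at the end of a pipeline. The sketch below is a stand-alone illustration with hypothetical function names. It uses `grep -oE` instead of `grep -oP` as a portability assumption and reads canned text instead of real `rocminfo` output.

```shell
#!/bin/bash
# Same filtering as get_sanitized_branch: "/" becomes "_", then every
# character outside [a-zA-Z0-9_-] is deleted.
sanitize_branch() {
    printf '%s' "$1" | tr '/' '_' | tr -cd 'a-zA-Z0-9_-'
}

# Hypothetical helper: print the first gfx* token read from stdin, or the
# default given as $1. A pipeline's exit status is that of its last command,
# and "head -1" exits 0 even on empty input, so "|| echo" directly after it
# would never run. "grep ." fails on empty input and makes the fallback fire.
first_gfx_or_default() {
    grep -oE 'gfx[0-9a-z]+' | head -1 | grep . || echo "$1"
}

sanitize_branch "users/alice/fix.build#42"; echo                     # prints: users_alice_fixbuild42
printf 'Name: gfx942\nName: gfx90a\n' | first_gfx_or_default gfx950  # prints: gfx942
printf 'no gpu listed\n' | first_gfx_or_default gfx950               # prints: gfx950
```

Because the sanitizer deletes characters rather than replacing them, two different branch names can map to the same container name. Setting `CK_CONTAINER_NAME` avoids that when it matters.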
125
script/tools/templates/build_analysis_report.md.jinja
Normal file
@@ -0,0 +1,125 @@
# Composable Kernel Build Time Analysis Report

**Generated:** {{ timestamp }}
**Target:** {{ target }}
**Granularity:** {{ granularity }}µs
**Files Analyzed:** {{ num_files }}

## Executive Summary

- **Wall Clock Time:** {{ build_time }} seconds
- **Trace Time:** {{ total_trace_time|us_to_s|round(1) }} seconds
- **Template Instantiation Time:** {{ total_template_time|us_to_s|round(1) }} seconds ({{ (100 * total_template_time / total_trace_time if total_trace_time > 0 else 0)|round(1) }}% of trace)
- **Total Events Captured:** {{ total_events|format_number }} (across {{ num_files }} file{{ 's' if num_files != 1 else '' }})
- **Total Template Instantiations:** {{ total_instantiations|format_number }}
- **Unique Template Families:** {{ unique_families }}

{% if num_files > 1 -%}
## Per-File Analysis

| File | Events | Template Time (ms) | % of Total |
|------|--------|-------------------|------------|
{% for file in file_stats[:20] -%}
| {{ file.name|truncate(50)|pad(50) }} | {{ "%7d"|format(file.events) }} | {{ "%17.2f"|format(file.template_time|us_to_ms) }} | {{ "%9.1f"|format(100 * file.template_time / total_template_time if total_template_time > 0 else 0) }}% |
{% endfor %}

{% endif -%}
## Compilation Phase Breakdown

| Phase | Time (ms) | Time (s) | % of Total |
|-------|-----------|----------|------------|
{% for phase, dur in phases[:20] -%}
| {{ phase|pad(40) }} | {{ "%9.2f"|format(dur|us_to_ms) }} | {{ "%8.2f"|format(dur|us_to_s) }} | {{ "%9.1f"|format(100 * dur / total_trace_time if total_trace_time > 0 else 0) }}% |
{% endfor %}

## Top 30 Most Expensive Individual Instantiations

{% if num_files > 1 -%}
| Rank | Template | Type | Time (ms) | File |
|------|----------|------|-----------|------|
{% for inst in top_individual[:30] -%}
| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(50) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} | {{ inst.file|truncate(20) }} |
{% endfor -%}
{% else -%}
| Rank | Template | Type | Time (ms) |
|------|----------|------|-----------|
{% for inst in top_individual[:30] -%}
| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(70) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} |
{% endfor -%}
{% endif %}

## Template Families by Total Time (Top 50)

| Rank | Template Family | Count | Total (ms) | Avg (ms) | % of Total |
|------|-----------------|-------|------------|----------|------------|
{% for name, stats in templates_by_time[:50] -%}
| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} | {{ "%9.1f"|format(stats.pct) }}% |
{% endfor %}

## Template Families by Instantiation Count (Top 50)

| Rank | Template Family | Count | Total (ms) | Avg (ms) |
|------|-----------------|-------|------------|----------|
{% for name, stats in templates_by_count[:50] -%}
| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} |
{% endfor %}

## Key Insights

### 1. Template Instantiation Impact
- Template instantiation accounts for {{ (100 * total_template_time / total_trace_time if total_trace_time > 0 else 0)|round(1) }}% of total trace time
{% if unique_families >= 10 -%}
- Top 10 template families account for {{ top10_pct|round(1) }}% of instantiation time
{% endif %}

### 2. Most Expensive Templates
{% if templates_by_time|length > 0 -%}
- **{{ templates_by_time[0][0] }}**: {{ templates_by_time[0][1].count|format_number }} instantiations, {{ (templates_by_time[0][1].total_dur|us_to_s)|round(2) }}s total
{% endif -%}
{% if templates_by_time|length > 1 -%}
- **{{ templates_by_time[1][0] }}**: {{ templates_by_time[1][1].count|format_number }} instantiations, {{ (templates_by_time[1][1].avg|us_to_ms)|round(2) }}ms average
{% endif %}

## Optimization Recommendations

### High-Impact Targets (by total time)
{% for name, stats in templates_by_time[:5] -%}
**{{ loop.index }}. {{ name }}** - {{ (stats.total_dur|us_to_s)|round(1) }}s total ({{ stats.pct|round(1) }}%)
- {{ stats.count|format_number }} instantiations, {{ (stats.avg|us_to_ms)|round(2) }}ms average
{% if stats.count > 100 -%}
- Strategy: Extern templates - High instantiation count suggests repeated compilation
{% elif stats.avg|us_to_ms > 50 -%}
- Strategy: Template specialization - High individual cost suggests complexity
{% else -%}
- Strategy: Explicit instantiation - Pre-instantiate common configurations
{% endif %}

{% endfor %}
### Frequently Instantiated (optimization candidates)
{% for name, stats in templates_by_count[:5] if stats.count > 100 -%}
**{{ name }}** - {{ stats.count|format_number }} times ({{ (stats.total_dur|us_to_s)|round(2) }}s total)
- Consider: Precompiled headers or extern templates to avoid recompilation

{% endfor %}
### Most Expensive Individual Instantiations
{% for inst in top_individual[:3] -%}
**{{ loop.index }}. {{ inst.detail|truncate(60) }}** - {{ (inst.dur|us_to_ms)|round(1) }}ms
- Strategy: Profile and simplify this specific instantiation

{% endfor %}

## Detailed Statistics

- **Total Unique Templates:** {{ unique_families }}
- **Total Instantiations:** {{ total_instantiations|format_number }}
{% if total_instantiations > 0 -%}
- **Average Instantiation Time:** {{ ((total_template_time // total_instantiations)|us_to_ms)|round(3) }}ms
{% endif -%}
{% if unique_families > 0 -%}
- **Median Template Family Count:** {{ median_count }}
{% endif %}

---

*Report generated using Clang -ftime-trace with {{ granularity }}µs granularity*
*Analysis tool: ck-build-analysis*