From f6d1bb77e09184b3c1bb1d53a3538e20872f6222 Mon Sep 17 00:00:00 2001 From: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Date: Thu, 15 Jan 2026 08:30:23 -0800 Subject: [PATCH] Add LLM-agnostic Docker and build analysis tools (#3576) MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit This commit introduces utility tools for building, testing, and analyzing Composable Kernel. The tools are designed to be LLM-agnostic and can be used with any AI assistant or directly from the command line. Tools Added: ============ 1. ck-docker - Docker container management - Start/stop ROCm-enabled containers - Build targets with CMake + Ninja - Run tests with gtest filters - Auto-detect GPU targets (gfx950, gfx942, etc.) - Per-user, per-branch container naming to avoid conflicts 2. ck-build-analysis - Build time profiling - Uses Clang's -ftime-trace for compilation analysis - Aggregates statistics across multiple trace files - Identifies template instantiation bottlenecks - Generates detailed Markdown reports with: * Compilation phase breakdown * Top expensive instantiations * Template family analysis * Data-driven optimization recommendations - Configurable granularity (1µs to 500µs) - PEP 723 compliant Python script with auto-dependency management via uv Key Features: ============= - LLM-agnostic design (works with any AI assistant) - Zero-configuration setup with automatic dependency installation - Comprehensive documentation in script/tools/README*.md - Security hardening (input validation, no command injection) - Multi-file trace aggregation for accurate build analysis - Jinja2-based report generation for customizable output Implementation: =============== - script/tools/ck-docker - Main Docker orchestration script - script/tools/ck-build-analysis - Build analysis orchestration - script/tools/common.sh - Shared utilities (container mgmt, GPU detection) - script/tools/analyze_build_trace.py - PEP 723 compliant Python 
analyzer - script/tools/templates/ - Jinja2 templates for report generation - script/tools/README*.md - Comprehensive documentation Directory Structure: ==================== script/tools/ ├── README.md # Main overview ├── README_ck-docker.md # ck-docker documentation ├── README_ck-build-analysis.md # ck-build-analysis documentation ├── ck-docker # Docker orchestration script ├── ck-build-analysis # Build analysis orchestration ├── common.sh # Shared utilities ├── analyze_build_trace.py # Python analyzer (PEP 723) └── templates/ └── build_analysis_report.md.jinja # Report template The tools follow Unix philosophy: do one thing well, compose easily, and work from both CLI and programmatic contexts. [ROCm/composable_kernel commit: 086a1f8861ef8c81db854e7f2749458b69121617] --- script/tools/README.md | 78 ++++ script/tools/README_ck-build-analysis.md | 168 +++++++++ script/tools/README_ck-docker.md | 80 ++++ script/tools/analyze_build_trace.py | 347 ++++++++++++++++++ script/tools/ck-build-analysis | 237 ++++++++++++ script/tools/ck-docker | 294 +++++++++++++++ script/tools/common.sh | 97 +++++ .../templates/build_analysis_report.md.jinja | 125 +++++++ 8 files changed, 1426 insertions(+) create mode 100644 script/tools/README.md create mode 100644 script/tools/README_ck-build-analysis.md create mode 100644 script/tools/README_ck-docker.md create mode 100755 script/tools/analyze_build_trace.py create mode 100755 script/tools/ck-build-analysis create mode 100755 script/tools/ck-docker create mode 100644 script/tools/common.sh create mode 100644 script/tools/templates/build_analysis_report.md.jinja diff --git a/script/tools/README.md b/script/tools/README.md new file mode 100644 index 0000000000..e5bf91cedc --- /dev/null +++ b/script/tools/README.md @@ -0,0 +1,78 @@ +# Composable Kernel Tools + +This directory contains utility tools for building, testing, and analyzing Composable Kernel. 
+ +These tools are designed to be LLM-agnostic and can be used with any AI assistant or directly from the command line. + +## Available Tools + +### ck-docker + +Build and test composable_kernel in Docker with ROCm support. + +See [README_ck-docker.md](README_ck-docker.md) for details. + +**Quick start:** +```bash +# Add to PATH +export PATH="$PATH:$PWD/script/tools" + +# Start container and build +ck-docker start +ck-docker build test_amdgcn_mma +ck-docker test test_amdgcn_mma +``` + +### ck-build-analysis + +Analyze Composable Kernel build times using Clang's -ftime-trace profiler. + +See [README_ck-build-analysis.md](README_ck-build-analysis.md) for details. + +**Quick start:** +```bash +# Add to PATH +export PATH="$PATH:$PWD/script/tools" + +# Analyze build time +ck-build-analysis example_convnd_fwd_xdl_fp8 +``` + +## LLM Assistant Integration + +These tools can be used as-is with any LLM assistant by providing the tool documentation to the assistant. The assistant can then invoke these tools on your behalf. + +For example, you can ask: +- "Start the docker container" +- "Build and test test_amdgcn_mma" +- "Analyze build time for example_convnd_fwd_xdl_fp8" + +The assistant will translate your natural language request into the appropriate tool invocation. 
+ +## Dependencies + +- **ck-docker**: Requires Docker and ROCm-capable GPU (for running tests) +- **ck-build-analysis**: Requires Docker, automatically installs Python dependencies (jinja2) via `uv` + +## Directory Structure + +``` +script/tools/ +├── README.md # This file +├── README_ck-docker.md # Documentation for ck-docker +├── README_ck-build-analysis.md # Documentation for ck-build-analysis +├── ck-docker # Docker container management tool +├── ck-build-analysis # Build time analysis tool +├── common.sh # Shared utilities for bash scripts +├── analyze_build_trace.py # Python script for trace analysis (PEP 723 compliant) +└── templates/ + └── build_analysis_report.md.jinja # Jinja2 template for analysis reports +``` + +## Contributing + +When adding new tools to this directory: +1. Keep them LLM-agnostic (avoid hardcoding references to specific AI assistants) +2. Provide clear command-line usage documentation +3. Include examples for both CLI and LLM assistant usage +4. Follow the existing naming convention and structure diff --git a/script/tools/README_ck-build-analysis.md b/script/tools/README_ck-build-analysis.md new file mode 100644 index 0000000000..d52e4eb2c7 --- /dev/null +++ b/script/tools/README_ck-build-analysis.md @@ -0,0 +1,168 @@ +# ck-build-analysis + +Analyze Composable Kernel build times using Clang's -ftime-trace profiler. 
+ +## Terminal Usage + +Direct command-line usage: + +```bash +# From composable_kernel directory +script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 +script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1 +script/tools/ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=1 --output=my_report.md + +# Or add to PATH +export PATH="$PATH:$PWD/script/tools" +ck-build-analysis example_convnd_fwd_xdl_fp8 +``` + +## LLM Assistant Integration + +If using an LLM assistant, you can ask in natural language: +- "Analyze build time for example_convnd_fwd_xdl_fp8" +- "Profile the compilation of test_amdgcn_mma with 1us granularity" +- "Generate a build time report for example_gemm_xdl" + +## Commands + +``` +ck-build-analysis <target> [options] + +Options: + --granularity=N Time trace granularity in microseconds (default: 1) + --output=FILE Output report filename (default: build_time_analysis_report.md) + --name=NAME Docker container name (default: from CK_CONTAINER_NAME or auto-generated) + --no-reconfigure Skip CMake reconfiguration if build exists + --help Show this help message +``` + +## What It Does + +1. **Configures CMake** with `-ftime-trace` and custom granularity +2. **Builds the target** using Ninja in Docker +3. **Analyzes the trace** JSON file for template instantiation patterns +4.
**Generates a report** with: + - Compilation phase breakdown + - Top expensive individual instantiations + - Template families ranked by total time and count + - Key insights and optimization recommendations + - Complete statistics + +## Configuration + +- **Container**: Uses ck-docker container (auto-starts if needed) +- **Granularity**: Default 1us (100% template coverage, best balance) +- **Output**: Markdown report in project root + +## Environment + +```bash +export CK_CONTAINER_NAME=my_build # Override container name +export CK_BUILD_ANALYSIS_GRANULARITY=1 # Default granularity in microseconds +``` + +## Examples + +```bash +# Complete template analysis with default granularity (1us - recommended) +ck-build-analysis example_convnd_fwd_xdl_fp8 + +# Quick daily check (10us granularity, captures most expensive templates) +ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10 + +# Maximum detail (0us granularity, includes LLVM internals) +ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=0 + +# High-level overview (500us granularity, major bottlenecks only) +ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=500 + +# Custom output filename +ck-build-analysis example_convnd_fwd_xdl_fp8 --output=fp8_conv_analysis.md + +# Analyze test target +ck-build-analysis test_amdgcn_mma + +# Use existing build (skip reconfigure) +ck-build-analysis example_convnd_fwd_xdl_fp8 --no-reconfigure +``` + +## Output + +The report includes: +- **Executive Summary**: Total time, events, instantiations, unique templates +- **Compilation Phases**: InstantiateFunction, Frontend, Backend, Optimizer, etc. 
+- **Top 30 Individual Instantiations**: Most expensive single templates +- **Template Families**: Grouped by total time and instantiation count +- **Key Insights**: What's slow and why +- **Optimization Recommendations**: Short, medium, and long-term strategies +- **Detailed Statistics**: Averages, medians, distributions + +## Granularity Trade-offs + +| Granularity | Template Coverage | Use Case | +|-------------|-------------------|----------| +| **0us** | All templates + sub-us compiler internals | LLVM internals debugging, very large files, higher overhead | +| **1us (default)** | **All templates** | **Default: Complete template analysis with low overhead** | +| **10us** | Most expensive templates | Daily quick checks, smaller files, minimal overhead | +| **50-100us** | Top bottlenecks | Balanced detail/size, suitable for CI/CD | +| **500us** | High-level phases only | Not recommended for template analysis | + +**Recommended default**: 1us captures all template instantiations with minimal overhead + +## Notes + +- **0us and 1us capture all templates** - 0us adds sub-microsecond compiler internals +- **1us is the sweet spot**: complete template coverage, filters noise, low overhead +- **10us is practical** for daily use: captures most expensive templates, smaller files +- **500us loses most template instantiation data** - only use for high-level phase breakdown +- Finer granularity = more events = larger files + higher build time overhead +- For template-heavy C++ codebases like CK: **use 1us for analysis, 10us for daily checks** + +## Implementation Details + +### PEP 723 Compliance with Automatic Dependency Management + +The analysis script (`analyze_build_trace.py`) is PEP 723 compliant with inline dependency metadata: + +```python +# /// script +# requires-python = ">=3.8" +# dependencies = [ +# "jinja2>=3.0.0", +# ] +# /// +``` + +**The tool automatically installs and uses `uv`**, which provides: +- ✅ Zero-configuration dependency management +- ✅ Automatic 
installation of jinja2 from PEP 723 metadata + - ✅ Isolated dependency environment (no system pollution) + - ✅ Fast caching for subsequent runs + +**No manual setup required!** The first time you run the tool, it will: +1. Detect if `uv` is installed in the container +2. If not, install pipx from Ubuntu's package repositories and then install `uv` with `pipx install uv` +3. Use `uv run` to execute the analysis with auto-managed dependencies + +On subsequent runs, `uv` will already be available and dependencies will be cached. + +pipx itself is installed through Ubuntu's package manager for security and reliability; `uv` is then pulled from PyPI via pipx. + +### Components + +- **ck-build-analysis** - Main bash script that orchestrates Docker, CMake, and analysis +- **analyze_build_trace.py** - PEP 723 compliant Python script for trace analysis +- **templates/build_analysis_report.md.jinja** - Jinja2 template for report generation + +### Standalone Usage + +The Python script can also be run independently: + +```bash +# With uv (recommended - auto-installs dependencies from PEP 723 metadata) +uv run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/ + +# With pipx (alternative - also auto-installs dependencies) +pipx run script/tools/analyze_build_trace.py trace.json report.md target 100 22 templates/ +``` diff --git a/script/tools/README_ck-docker.md b/script/tools/README_ck-docker.md new file mode 100644 index 0000000000..c432c1dba9 --- /dev/null +++ b/script/tools/README_ck-docker.md @@ -0,0 +1,80 @@ +# ck-docker + +Build and test composable_kernel in Docker with ROCm support.
+ +## Terminal Usage + +Direct command-line usage: + +```bash +# From composable_kernel directory +script/tools/ck-docker start +script/tools/ck-docker build test_amdgcn_mma +script/tools/ck-docker test test_amdgcn_mma --gtest_filter=*Fp16* +script/tools/ck-docker status +script/tools/ck-docker shell + +# Or add to PATH +export PATH="$PATH:$PWD/script/tools" +ck-docker start +``` + +## LLM Assistant Integration + +If using an LLM assistant, you can ask in natural language: +- "Start the docker container" +- "Build test_amdgcn_mma" +- "Run test_amdgcn_mma with filter *Fp16*" +- "Check container status" +- "Open a shell in the container" + +## Commands + +``` +ck-docker start [name] Start Docker container +ck-docker build [target] [--reconfigure] Build target (optionally reconfigure CMake) +ck-docker test <target> [options] Run test +ck-docker shell [name] Interactive shell +ck-docker status [name] Check status +ck-docker stop [name] Stop container +``` + +## Configuration + +- **Image**: rocm/composable_kernel:ck_ub24.04_rocm7.0.1 +- **GPU**: Auto-detected via rocminfo (fallback: gfx950) +- **Compiler**: /opt/rocm/llvm/bin/clang++ +- **Build**: Ninja + CMake (Release) +- **Mount**: Current directory → /workspace +- **Container Name**: Auto-generated as `ck_<user>_<branch>` to avoid clashes + +## Environment + +```bash +export CK_CONTAINER_NAME=my_build # Override default container name +export CK_DOCKER_IMAGE=rocm/composable_kernel:ck_ub24.04_rocm7.0.1 # Override Docker image +export GPU_TARGET=gfx942 # Override GPU target detection +``` + +## Examples + +```bash +# Start container +ck-docker start + +# Build and run test +ck-docker build test_amdgcn_mma +ck-docker test test_amdgcn_mma + +# Force clean CMake reconfiguration and build +ck-docker build --reconfigure test_amdgcn_mma + +# Custom container +ck-docker start my_build +ck-docker build test_amdgcn_mma --name my_build +ck-docker test test_amdgcn_mma --name my_build + +# Debug +ck-docker shell +ck-docker status +``` diff --git
a/script/tools/analyze_build_trace.py b/script/tools/analyze_build_trace.py new file mode 100755 index 0000000000..3597132f32 --- /dev/null +++ b/script/tools/analyze_build_trace.py @@ -0,0 +1,347 @@ +#!/usr/bin/env python3 +# Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +# SPDX-License-Identifier: MIT + +# /// script +# requires-python = ">=3.8" +# dependencies = [ +# "jinja2>=3.0.0", +# ] +# /// +""" +Build Time Analysis Tool for Composable Kernel + +Analyzes Clang -ftime-trace output to identify template instantiation +bottlenecks and generate comprehensive build time reports. +""" + +import json +import os +import re +import sys +from collections import defaultdict +from datetime import datetime + +try: + from jinja2 import Environment, FileSystemLoader +except ImportError: + print("Error: jinja2 is required but not installed.", file=sys.stderr) + print("Install with: apt-get install python3-jinja2", file=sys.stderr) + print("Or with pip: pip install jinja2", file=sys.stderr) + sys.exit(1) + + +def parse_arguments(): + """Parse command-line arguments.""" + if len(sys.argv) < 7: + print( + "Usage: analyze_build_trace.py <trace_files_or_dir> <output_file> <target> <granularity> <build_time> <template_dir>" + ) + print( + " trace_files_or_dir: Comma-separated list of trace files OR directory containing .json files" + ) + sys.exit(1) + + return { + "trace_input": sys.argv[1], + "output_file": sys.argv[2], + "target": sys.argv[3], + "granularity": sys.argv[4], + "build_time": sys.argv[5], + "template_dir": sys.argv[6], + } + + +def find_trace_files(trace_input): + """Find all trace files from input (file list, single file, or directory).""" + trace_files = [] + + # Check if it's a directory + if os.path.isdir(trace_input): + print(f"Scanning directory: {trace_input}") + for root, dirs, files in os.walk(trace_input): + for file in files: + # Only .cpp.json/.hip.json traces under CMakeFiles object dirs; this also skips compile_commands.json and CMake's own JSON files + if file.endswith((".cpp.json", ".hip.json")) and "CMakeFiles" in root: + trace_files.append(os.path.join(root, 
file)) + trace_files.sort() + # Check if it's a comma-separated list + elif "," in trace_input: + trace_files = [f.strip() for f in trace_input.split(",")] + # Single file + else: + trace_files = [trace_input] + + # Filter out non-existent files + valid_files = [f for f in trace_files if os.path.isfile(f)] + + if not valid_files: + print(f"Error: No valid trace files found in: {trace_input}", file=sys.stderr) + sys.exit(1) + + print(f"Found {len(valid_files)} trace file(s)") + return valid_files + + +def load_trace_data(trace_files): + """Load and parse multiple trace JSON files.""" + all_data = [] + + for trace_file in trace_files: + print(f" Loading: {trace_file}") + try: + with open(trace_file, "r") as f: + data = json.load(f) + # Get file basename for tracking + file_name = os.path.basename(trace_file) + all_data.append({"file": file_name, "path": trace_file, "data": data}) + except Exception as e: + print(f" Warning: Failed to load {trace_file}: {e}", file=sys.stderr) + + return all_data + + +def process_events(all_trace_data): + """Process trace events from multiple files and extract statistics.""" + print("Processing events from all files...") + + template_stats = defaultdict(lambda: {"count": 0, "total_dur": 0}) + phase_stats = defaultdict(int) + top_individual = [] + file_stats = [] + total_events = 0 + + for trace_info in all_trace_data: + file_name = trace_info["file"] + data = trace_info["data"] + events = data.get("traceEvents", []) + + file_template_time = 0 + file_event_count = len(events) + total_events += file_event_count + + print(f" Processing {file_name}: {file_event_count:,} events") + + for event in events: + name = event.get("name", "") + dur = int(event.get("dur", 0)) # Keep as integer microseconds + + if name and dur > 0: + phase_stats[name] += dur + + if name in ["InstantiateFunction", "InstantiateClass"]: + detail = event.get("args", {}).get("detail", "") + top_individual.append( + {"detail": detail, "dur": dur, "type": name, "file": 
file_name} + ) + + file_template_time += dur + + # Extract template name (everything before '<' or '(') + match = re.match(r"^([^<(]+)", detail) + if match: + template_name = match.group(1).strip() + # Normalize template names + template_name = re.sub(r"^ck::", "", template_name) + template_name = re.sub(r"^std::", "std::", template_name) + + template_stats[template_name]["count"] += 1 + template_stats[template_name]["total_dur"] += dur + + file_stats.append( + { + "name": file_name, + "events": file_event_count, + "template_time": file_template_time, + } + ) + + return template_stats, phase_stats, top_individual, file_stats, total_events + + +def prepare_template_data(template_stats, phase_stats, top_individual, file_stats): + """Prepare and calculate derived statistics for template rendering.""" + print("Sorting data...") + + # Sort data + sorted_phases = sorted(phase_stats.items(), key=lambda x: x[1], reverse=True) + top_individual.sort(key=lambda x: x["dur"], reverse=True) + file_stats.sort(key=lambda x: x["template_time"], reverse=True) + + # Calculate totals + total_template_time = sum(s["total_dur"] for s in template_stats.values()) + total_trace_time = sum(phase_stats.values()) + total_inst = sum(s["count"] for s in template_stats.values()) + + # Prepare templates by time with calculated fields + templates_by_time = [] + for name, stats in sorted( + template_stats.items(), key=lambda x: x[1]["total_dur"], reverse=True + ): + templates_by_time.append( + ( + name, + { + "count": stats["count"], + "total_dur": stats["total_dur"], + "avg": stats["total_dur"] // stats["count"] + if stats["count"] > 0 + else 0, + "pct": 100 * stats["total_dur"] / total_template_time + if total_template_time > 0 + else 0, + }, + ) + ) + + # Prepare templates by count + templates_by_count = [] + for name, stats in sorted( + template_stats.items(), key=lambda x: x[1]["count"], reverse=True + ): + templates_by_count.append( + ( + name, + { + "count": stats["count"], + "total_dur": 
stats["total_dur"], + "avg": stats["total_dur"] // stats["count"] + if stats["count"] > 0 + else 0, + }, + ) + ) + + # Add friendly type names to individual instantiations + for inst in top_individual: + inst["inst_type"] = "Func" if inst["type"] == "InstantiateFunction" else "Class" + + # Calculate additional metrics + median_count = 0 + if len(template_stats) > 0: + median_count = sorted([s["count"] for s in template_stats.values()])[ + len(template_stats) // 2 + ] + + top10_pct = 0 + if len(templates_by_time) >= 10: + top10_pct = ( + 100 + * sum(s[1]["total_dur"] for s in templates_by_time[:10]) + / total_template_time + ) + + return { + "sorted_phases": sorted_phases, + "top_individual": top_individual, + "templates_by_time": templates_by_time, + "templates_by_count": templates_by_count, + "total_template_time": total_template_time, + "total_trace_time": total_trace_time, + "total_inst": total_inst, + "median_count": median_count, + "top10_pct": top10_pct, + "unique_families": len(template_stats), + "file_stats": file_stats, + } + + +def setup_jinja_environment(template_dir): + """Set up Jinja2 environment with custom filters.""" + env = Environment(loader=FileSystemLoader(template_dir)) + + def format_number(value): + """Format number with thousand separators.""" + return f"{value:,}" + + def truncate(value, length): + """Truncate string to length with ellipsis.""" + if len(value) > length: + return value[: length - 3] + "..." 
+ return value + + def pad(value, length): + """Pad string to specified length.""" + return f"{value:<{length}}" + + def us_to_ms(value): + """Convert microseconds to milliseconds.""" + return value / 1000.0 + + def us_to_s(value): + """Convert microseconds to seconds.""" + return value / 1000000.0 + + env.filters["format_number"] = format_number + env.filters["truncate"] = truncate + env.filters["pad"] = pad + env.filters["us_to_ms"] = us_to_ms + env.filters["us_to_s"] = us_to_s + + return env + + +def generate_report(env, data, args, total_events, num_files): + """Generate the final report using Jinja2 template.""" + print("Rendering report with Jinja2...") + + template = env.get_template("build_analysis_report.md.jinja") + + report_content = template.render( + timestamp=datetime.now().strftime("%Y-%m-%d %H:%M:%S"), + target=args["target"], + granularity=args["granularity"], + build_time=args["build_time"], + total_events=total_events, + num_files=num_files, + total_instantiations=data["total_inst"], + unique_families=data["unique_families"], + total_trace_time=data["total_trace_time"], + total_template_time=data["total_template_time"], + phases=data["sorted_phases"], + top_individual=data["top_individual"], + templates_by_time=data["templates_by_time"], + templates_by_count=data["templates_by_count"], + median_count=data["median_count"], + top10_pct=data["top10_pct"], + file_stats=data["file_stats"], + ) + + return report_content + + +def main(): + """Main entry point for the analysis tool.""" + args = parse_arguments() + + # Find and load trace files + trace_files = find_trace_files(args["trace_input"]) + all_trace_data = load_trace_data(trace_files) + + # Process events from all files + template_stats, phase_stats, top_individual, file_stats, total_events = ( + process_events(all_trace_data) + ) + + # Prepare template data + data = prepare_template_data( + template_stats, phase_stats, top_individual, file_stats + ) + + # Setup Jinja2 environment + env = 
setup_jinja_environment(args["template_dir"]) + + # Generate report + report_content = generate_report(env, data, args, total_events, len(all_trace_data)) + + # Write output + with open(args["output_file"], "w") as f: + f.write(report_content) + + print(f"Report generated: {args['output_file']}") + print(f"Report size: {len(report_content):,} bytes") + print(f"Analyzed {len(all_trace_data)} file(s) with {total_events:,} total events") + + +if __name__ == "__main__": + main() diff --git a/script/tools/ck-build-analysis b/script/tools/ck-build-analysis new file mode 100755 index 0000000000..cd06a1796f --- /dev/null +++ b/script/tools/ck-build-analysis @@ -0,0 +1,237 @@ +#!/bin/bash +# Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +# SPDX-License-Identifier: MIT + +# CK Build Analysis Tool - Analyze build times using -ftime-trace + +set -e +set -o pipefail + +# Find script directory and load common utilities +SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)" +source "${SCRIPT_DIR}/common.sh" + +# Initialize configuration +PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}") +CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}") + +# Default settings +GRANULARITY="${CK_BUILD_ANALYSIS_GRANULARITY:-1}" +OUTPUT_FILE="build_time_analysis_report.md" +RECONFIGURE=true + +# Help message +show_help() { + cat << EOF +CK Build Analysis - Analyze build times using Clang -ftime-trace + +Usage: ck-build-analysis <target> [options] + +Arguments: + target Build target to analyze (e.g., example_convnd_fwd_xdl_fp8) + +Options: + --granularity=N Time trace granularity in microseconds (default: 1) + --output=FILE Output report filename (default: build_time_analysis_report.md) + --name=NAME Docker container name (default: ${CONTAINER_NAME}) + --no-reconfigure Skip CMake reconfiguration if build exists + --help Show this help message + +Examples: + ck-build-analysis example_convnd_fwd_xdl_fp8 + ck-build-analysis example_convnd_fwd_xdl_fp8 --granularity=10 + ck-build-analysis 
test_amdgcn_mma --granularity=1 --output=mma_test_analysis.md + +Granularity Guide: + 0 - Everything: All compiler events including sub-microsecond operations + Use for LLVM internals debugging. Large files, higher overhead. + + 1 (default) - Complete template coverage: Captures all template instantiations + Best balance - filters sub-microsecond noise, low overhead + + 10 - Daily use: Captures most expensive templates, smaller files + Good for quick checks and routine analysis + + 50-100 - Intermediate: Balanced between detail and file size + Suitable for CI/CD tracking + + 500 - High-level only: Major compilation phases, minimal detail + Not recommended for template analysis (loses most instantiations) + + Recommendation: Use 1us (default) for template analysis, 10us for quick checks. +EOF +} + +# Parse arguments +TARGET="" +while [[ $# -gt 0 ]]; do + case $1 in + --granularity=*) + GRANULARITY="${1#*=}" + shift + ;; + --output=*) + OUTPUT_FILE="${1#*=}" + shift + ;; + --name=*) + CONTAINER_NAME="${1#*=}" + shift + ;; + --no-reconfigure) + RECONFIGURE=false + shift + ;; + --help|-h) + show_help + exit 0 + ;; + -*) + echo "Unknown option: $1" + show_help + exit 1 + ;; + *) + if [ -z "$TARGET" ]; then + TARGET="$1" + else + echo "Error: Multiple targets specified" + show_help + exit 1 + fi + shift + ;; + esac +done + +if [ -z "$TARGET" ]; then + echo "Error: No target specified" + echo "" + show_help + exit 1 +fi + +# Validate OUTPUT_FILE to prevent path traversal +if [[ "$OUTPUT_FILE" =~ / ]] || [[ "$OUTPUT_FILE" =~ \.\. ]]; then + echo "Error: OUTPUT_FILE must be a simple filename (no path separators or .. 
allowed)" + echo "Invalid: $OUTPUT_FILE" + exit 1 +fi + +echo "═══════════════════════════════════════════════════════════════" +echo " CK Build Time Analysis" +echo "═══════════════════════════════════════════════════════════════" +echo "Target: $TARGET" +echo "Granularity: ${GRANULARITY}us" +echo "Container: $CONTAINER_NAME" +echo "Output: $OUTPUT_FILE" +echo "═══════════════════════════════════════════════════════════════" +echo "" + +# Ensure container is running +ensure_container_running "${CONTAINER_NAME}" "${SCRIPT_DIR}" + +# Configure CMake with -ftime-trace if needed +if [ "$RECONFIGURE" = true ] || ! docker exec "${CONTAINER_NAME}" test -f /workspace/build/build.ninja 2>/dev/null; then + echo "" + echo "Configuring CMake with -ftime-trace (granularity=${GRANULARITY}us)..." + + GPU_TARGET=$(detect_gpu_target "${CONTAINER_NAME}") + + docker exec -e GPU_TARGET="${GPU_TARGET}" -e GRANULARITY="${GRANULARITY}" "${CONTAINER_NAME}" bash -c ' + cd /workspace || exit 1 + rm -rf /workspace/build + mkdir /workspace/build + cd /workspace/build || exit 1 + cmake .. -GNinja \ + -DGPU_TARGETS="${GPU_TARGET}" \ + -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \ + -DCMAKE_CXX_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \ + -DCMAKE_HIP_FLAGS="-ftime-trace -ftime-trace-granularity=${GRANULARITY}" \ + -DBUILD_TESTING=ON 2>&1 | tail -20 + ' + echo "CMake configuration complete" +fi + +# Build the target +echo "" +echo "Building target: $TARGET" +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + +BUILD_START=$(date +%s) +docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c 'cd /workspace/build && time ninja "${TARGET}" 2>&1' +BUILD_END=$(date +%s) +BUILD_TIME=$((BUILD_END - BUILD_START)) + +echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" +echo "Build completed in ${BUILD_TIME} seconds" + +# Find all trace JSON files for the target +echo "" +echo "Locating trace files..." 
+ +# Count trace files +TRACE_COUNT=$(docker exec -e TARGET="${TARGET}" "${CONTAINER_NAME}" bash -c ' + find /workspace/build -type f \( -name "*.cpp.json" -o -name "*.hip.json" \) 2>/dev/null | \ + grep -vF "compile_commands.json" | wc -l +') + +if [ "$TRACE_COUNT" -eq 0 ]; then + echo "Error: Could not find any trace files in /workspace/build" + echo "Expected .cpp.json or .hip.json files from -ftime-trace compilation" + exit 1 +fi + +echo "Found ${TRACE_COUNT} trace file(s) in build directory" + +# We'll pass the build directory to the Python script +BUILD_DIR="/workspace/build" + +# Generate analysis report +echo "" +echo "Generating analysis report..." + +# Copy analysis script and templates to container +docker cp "${SCRIPT_DIR}/analyze_build_trace.py" "${CONTAINER_NAME}:/tmp/analyze_build_trace.py" +docker cp "${SCRIPT_DIR}/templates" "${CONTAINER_NAME}:/tmp/ck_build_analysis_templates" + +# Check if uv is available, install if needed, and use for PEP 723 dependency management +if ! docker exec "${CONTAINER_NAME}" bash -c "command -v uv >/dev/null 2>&1 || test -x \$HOME/.local/bin/uv"; then + echo "uv not found, installing via pipx..." + docker exec "${CONTAINER_NAME}" bash -c " + # Install pipx if not available + if ! command -v pipx >/dev/null 2>&1; then + apt-get update -qq && apt-get install -y -qq pipx >/dev/null 2>&1 + fi + # Install uv via pipx + pipx install uv >/dev/null 2>&1 + " + echo "uv installed successfully" +fi + +echo "Using uv run for automatic dependency management..." 
+# Ensure uv is in PATH (handles ~/.local/bin installation) +# Pass build directory instead of single file +docker exec -e BUILD_DIR="${BUILD_DIR}" -e OUTPUT_FILE="${OUTPUT_FILE}" -e TARGET="${TARGET}" -e GRANULARITY="${GRANULARITY}" -e BUILD_TIME="${BUILD_TIME}" "${CONTAINER_NAME}" bash -c 'export PATH="$HOME/.local/bin:$PATH" && uv run --no-project /tmp/analyze_build_trace.py "${BUILD_DIR}" "/workspace/${OUTPUT_FILE}" "${TARGET}" "${GRANULARITY}" "${BUILD_TIME}" /tmp/ck_build_analysis_templates' + +# Copy report back to host +docker cp "${CONTAINER_NAME}:/workspace/${OUTPUT_FILE}" "${PROJECT_ROOT}/${OUTPUT_FILE}" + +# Cleanup +docker exec "${CONTAINER_NAME}" rm -f /tmp/analyze_build_trace.py +docker exec "${CONTAINER_NAME}" rm -rf /tmp/ck_build_analysis_templates + +echo "" +echo "═══════════════════════════════════════════════════════════════" +echo " Analysis Complete!" +echo "═══════════════════════════════════════════════════════════════" +echo "Report: ${PROJECT_ROOT}/${OUTPUT_FILE}" +echo "" +echo "Summary:" +docker exec "${CONTAINER_NAME}" bash -c "head -20 /workspace/${OUTPUT_FILE} | tail -10" +echo "" +echo "View the full report:" +echo " cat ${OUTPUT_FILE}" +echo " or open it in your editor" +echo "═══════════════════════════════════════════════════════════════" diff --git a/script/tools/ck-docker b/script/tools/ck-docker new file mode 100755 index 0000000000..82bf770011 --- /dev/null +++ b/script/tools/ck-docker @@ -0,0 +1,294 @@ +#!/bin/bash +# Copyright (c) Advanced Micro Devices, Inc., or its affiliates. 
+# SPDX-License-Identifier: MIT
+
+# CK Docker Tool - Build and test composable_kernel in Docker with ROCm support
+
+set -e
+set -o pipefail
+
+# Find script directory and load common utilities
+SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
+source "${SCRIPT_DIR}/common.sh"
+
+# Initialize configuration
+PROJECT_ROOT=$(get_project_root "${SCRIPT_DIR}")
+CONTAINER_NAME=$(get_container_name "${PROJECT_ROOT}")
+
+# Help message
+show_help() {
+    cat << EOF
+CK Docker Tool - Build and test composable_kernel in Docker
+
+Usage: ck-docker <command> [options]
+
+Commands:
+    start [name]                    Start Docker container
+    build [target] [--reconfigure]  Build target (optionally reconfigure CMake)
+    test <test_name> [options]      Run a test binary (builds it first if missing)
+    shell [name]                    Open shell in container
+    status [name]                   Check container status
+    stop [name]                     Stop and remove container
+
+Examples:
+    ck-docker start
+    ck-docker build test_amdgcn_mma
+    ck-docker build --reconfigure test_amdgcn_mma
+    ck-docker test test_amdgcn_mma --gtest_filter=*Fp16*
+    ck-docker shell
+
+Environment:
+    CK_CONTAINER_NAME - Override default container name (default: ck_<user>_<branch>)
+    CK_DOCKER_IMAGE   - Override Docker image (default: rocm/composable_kernel:ck_ub24.04_rocm7.0.1)
+    GPU_TARGET        - Override GPU target detection (e.g., gfx950, gfx942)
+EOF
+}
+
+# Start container
+cmd_start() {
+    local name="${1:-${CONTAINER_NAME}}"
+    local docker_image=$(get_docker_image)
+
+    # Check if container exists and is running
+    if container_exists "${name}"; then
+        if container_is_running "${name}"; then
+            echo "Container '${name}' is already running"
+            return 0
+        else
+            echo "Starting existing container '${name}'..."
+            docker start "${name}"
+            echo "Container started"
+            return 0
+        fi
+    fi
+
+    echo "Creating new Docker container '${name}'..."
+ docker run -d \ + --name "${name}" \ + --device=/dev/kfd --device=/dev/dri \ + --security-opt seccomp=unconfined \ + --group-add video \ + -v "${PROJECT_ROOT}":/workspace \ + -w /workspace \ + "${docker_image}" \ + tail -f /dev/null + + echo "Container '${name}' started successfully" + docker exec "${name}" bash -c "echo 'Working directory:' && pwd" +} + +# Build target +cmd_build() { + local target="" + local name="${CONTAINER_NAME}" + local reconfigure=false + + while [[ $# -gt 0 ]]; do + case $1 in + --name) + name="$2" + shift 2 + ;; + --reconfigure) + reconfigure=true + shift + ;; + *) + target="$1" + shift + ;; + esac + done + + # Check if container is running + if ! container_is_running "${name}"; then + echo "Container '${name}' not running. Starting..." + cmd_start "${name}" + fi + + # Reconfigure CMake if requested or if build.ninja doesn't exist + if [ "$reconfigure" = true ] || ! docker exec "${name}" test -f /workspace/build/build.ninja 2>/dev/null; then + echo "Detecting GPU target..." + local gpu_target=$(detect_gpu_target "${name}") + + if [ "$reconfigure" = true ]; then + echo "Reconfiguring CMake from scratch for GPU target: ${gpu_target}" + else + echo "Configuring build with CMake for GPU target: ${gpu_target}" + fi + + docker exec "${name}" bash -c " + cd /workspace || exit 1 + rm -rf /workspace/build + mkdir /workspace/build + cd /workspace/build || exit 1 + cmake .. -GNinja \ + -DGPU_TARGETS=${gpu_target} \ + -DCMAKE_BUILD_TYPE=Release \ + -DCMAKE_CXX_COMPILER=/opt/rocm/llvm/bin/clang++ \ + -DBUILD_TESTING=ON 2>&1 | tail -30 + " + fi + + if [ -z "$target" ]; then + echo "Building all configured targets..." 
+
+    else
+        echo "Building target: ${target}"
+    fi
+
+    docker exec "${name}" bash -c "
+        cd /workspace/build || exit 1
+        ninja ${target} 2>&1
+    "
+
+    echo "Build complete"
+}
+
+# Run test
+cmd_test() {
+    local test_name=""
+    local name="${CONTAINER_NAME}"
+    local -a test_options=()
+
+    while [[ $# -gt 0 ]]; do
+        case $1 in
+            --name)
+                name="$2"
+                shift 2
+                ;;
+            --gtest_*|--help)
+                test_options+=("$1")
+                shift
+                ;;
+            *)
+                if [ -z "$test_name" ]; then
+                    test_name="$1"
+                else
+                    test_options+=("$1")
+                fi
+                shift
+                ;;
+        esac
+    done
+
+    if [ -z "$test_name" ]; then
+        echo "Error: test_name required"
+        echo "Usage: ck-docker test <test_name> [--name container_name] [gtest_options]"
+        return 1
+    fi
+
+    # Check if container is running
+    if ! container_is_running "${name}"; then
+        echo "Error: Container '${name}' not running"
+        echo "Start it with: ck-docker start ${name}"
+        return 1
+    fi
+
+    if ! docker exec "${name}" test -f "/workspace/build/bin/${test_name}" 2>/dev/null; then
+        echo "Test executable not found. Building ${test_name}..."
+        cmd_build "${test_name}" --name "${name}"
+    fi
+
+    echo "Running: ${test_name} ${test_options[*]}"
+    echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
+    # Build the command with proper quoting
+    local cmd="cd /workspace/build && ./bin/${test_name}"
+    for opt in "${test_options[@]}"; do
+        cmd="${cmd} $(printf '%q' "$opt")"
+    done
+    docker exec "${name}" bash -c "${cmd}"
+}
+
+# Shell
+cmd_shell() {
+    local name="${1:-${CONTAINER_NAME}}"
+
+    # Check if container is running
+    if ! container_is_running "${name}"; then
+        echo "Container '${name}' not running. Starting..."
+        cmd_start "${name}"
+    fi
+
+    echo "Opening shell in '${name}' (type 'exit' to leave)..."
+ docker exec -it "${name}" bash +} + +# Status +cmd_status() { + local name="${1:-}" + local docker_image=$(get_docker_image) + + if [ -z "$name" ]; then + echo "Composable Kernel Docker Containers:" + echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━" + docker ps -a --filter "ancestor=${docker_image}" \ + --format "table {{.Names}}\t{{.Status}}\t{{.CreatedAt}}" || echo "No containers found" + else + # Check container status + if container_is_running "${name}"; then + echo "Container '${name}' is RUNNING" + docker ps --filter "name=^${name}$" --format "table {{.Names}}\t{{.Status}}\t{{.Image}}" + echo "" + echo "GPU Information:" + docker exec "${name}" bash -c "rocm-smi --showproductname 2>/dev/null | head -10 || echo 'No GPU detected'" + elif container_exists "${name}"; then + echo "Container '${name}' exists but is STOPPED" + echo "Start with: ck-docker start ${name}" + else + echo "Container '${name}' does NOT exist" + echo "Create with: ck-docker start ${name}" + fi + fi +} + +# Stop +cmd_stop() { + local name="${1:-${CONTAINER_NAME}}" + + # Check if container exists + if container_exists "${name}"; then + echo "Stopping and removing container '${name}'..." 
+ docker stop "${name}" 2>/dev/null || true + docker rm "${name}" 2>/dev/null || true + echo "Container stopped and removed" + else + echo "Container '${name}' does not exist" + fi +} + +# Main command dispatcher +case "${1:-}" in + start) + shift + cmd_start "$@" + ;; + build) + shift + cmd_build "$@" + ;; + test) + shift + cmd_test "$@" + ;; + shell) + shift + cmd_shell "$@" + ;; + status) + shift + cmd_status "$@" + ;; + stop) + shift + cmd_stop "$@" + ;; + help|--help|-h) + show_help + ;; + *) + echo "Unknown command: ${1:-}" + echo "" + show_help + exit 1 + ;; +esac diff --git a/script/tools/common.sh b/script/tools/common.sh new file mode 100644 index 0000000000..6683572c0f --- /dev/null +++ b/script/tools/common.sh @@ -0,0 +1,97 @@ +#!/bin/bash +# Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +# SPDX-License-Identifier: MIT + +# Common utilities for CK Docker tools +# Shared configuration and helper functions + +# Find project root (where .git directory is) +get_project_root() { + local script_dir="$1" + cd "${script_dir}/../.." 
&& pwd
+}
+
+# Detect git branch and sanitize for Docker naming
+get_sanitized_branch() {
+    local project_root="$1"
+    local branch
+
+    branch=$(cd "${project_root}" && git rev-parse --abbrev-ref HEAD 2>/dev/null | tr '/' '_' | tr -cd 'a-zA-Z0-9_-' || echo "")
+    branch=${branch:-unknown}
+
+    # Handle detached HEAD state
+    if [ "${branch}" = "HEAD" ]; then
+        branch="detached"
+    fi
+
+    echo "${branch}"
+}
+
+# Get username with fallback
+get_username() {
+    echo "${USER:-$(whoami 2>/dev/null || echo "user")}"
+}
+
+# Generate default container name: ck_<user>_<branch>
+get_default_container_name() {
+    local project_root="$1"
+    local user_name
+    local git_branch
+
+    user_name=$(get_username)
+    git_branch=$(get_sanitized_branch "${project_root}")
+
+    echo "ck_${user_name}_${git_branch}"
+}
+
+# Get container name (respects CK_CONTAINER_NAME env var)
+get_container_name() {
+    local project_root="$1"
+    local default_name
+
+    default_name=$(get_default_container_name "${project_root}")
+    echo "${CK_CONTAINER_NAME:-${default_name}}"
+}
+
+# Get Docker image (respects CK_DOCKER_IMAGE env var)
+get_docker_image() {
+    echo "${CK_DOCKER_IMAGE:-rocm/composable_kernel:ck_ub24.04_rocm7.0.1}"
+}
+
+# Check if container exists (exact match)
+container_exists() {
+    local name="$1"
+    docker ps -a --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
+}
+
+# Check if container is running (exact match)
+container_is_running() {
+    local name="$1"
+    docker ps --filter "name=^${name}$" --format '{{.Names}}' | grep -q "^${name}$"
+}
+
+# Detect GPU target in container
+detect_gpu_target() {
+    local container="$1"
+
+    # Allow override via GPU_TARGET environment variable
+    if [ -n "${GPU_TARGET:-}" ]; then
+        echo "${GPU_TARGET}"
+        return 0
+    fi
+
+    docker exec "${container}" bash -c "
+        rocminfo 2>/dev/null | grep -oP 'gfx[0-9a-z]+' | head -1 || echo 'gfx950'
+    " | tr -d '\r\n'
+}
+
+# Ensure container is running, start if needed
+ensure_container_running() {
+    local container="$1"
+    local script_dir="$2"
+
+    if ! container_is_running "${container}"; then
+        echo "Container '${container}' not running. Starting with ck-docker..."
+        "${script_dir}/ck-docker" start "${container}"
+    fi
+}
diff --git a/script/tools/templates/build_analysis_report.md.jinja b/script/tools/templates/build_analysis_report.md.jinja
new file mode 100644
index 0000000000..f91dce14a9
--- /dev/null
+++ b/script/tools/templates/build_analysis_report.md.jinja
@@ -0,0 +1,125 @@
+# Composable Kernel Build Time Analysis Report
+
+**Generated:** {{ timestamp }}
+**Target:** {{ target }}
+**Granularity:** {{ granularity }}µs
+**Files Analyzed:** {{ num_files }}
+
+## Executive Summary
+
+- **Wall Clock Time:** {{ build_time }} seconds
+- **Trace Time:** {{ total_trace_time|us_to_s|round(1) }} seconds
+- **Template Instantiation Time:** {{ total_template_time|us_to_s|round(1) }} seconds ({{ (100 * total_template_time / total_trace_time)|round(1) }}% of trace)
+- **Total Events Captured:** {{ total_events|format_number }} (across {{ num_files }} file{{ 's' if num_files != 1 else '' }})
+- **Total Template Instantiations:** {{ total_instantiations|format_number }}
+- **Unique Template Families:** {{ unique_families }}
+
+{% if num_files > 1 -%}
+## Per-File Analysis
+
+| File | Events | Template Time (ms) | % of Total |
+|------|--------|-------------------|------------|
+{% for file in file_stats[:20] -%}
+| {{ file.name|truncate(50)|pad(50) }} | {{ "%7d"|format(file.events) }} | {{ "%17.2f"|format(file.template_time|us_to_ms) }} | {{ "%9.1f"|format(100 * file.template_time / total_template_time if total_template_time > 0 else 0) }}% |
+{% endfor %}
+
+{% endif -%}
+## Compilation Phase Breakdown
+
+| Phase | Time (ms) | Time (s) | % of Total |
+|-------|-----------|----------|------------|
+{% for phase, dur in phases[:20] -%}
+| {{ phase|pad(40) }} | {{ "%9.2f"|format(dur|us_to_ms) }} | {{ "%8.2f"|format(dur|us_to_s) }} | {{ "%9.1f"|format(100 * dur / total_trace_time) }}% |
+{% endfor %}
+ +## Top 30 Most Expensive Individual Instantiations + +{% if num_files > 1 -%} +| Rank | Template | Type | Time (ms) | File | +|------|----------|------|-----------|------| +{% for inst in top_individual[:30] -%} +| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(50) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} | {{ inst.file|truncate(20) }} | +{% endfor -%} +{% else -%} +| Rank | Template | Type | Time (ms) | +|------|----------|------|-----------| +{% for inst in top_individual[:30] -%} +| {{ "%4d"|format(loop.index) }} | {{ inst.detail|truncate(70) }} | {{ inst.inst_type|pad(5) }} | {{ "%9.2f"|format(inst.dur|us_to_ms) }} | +{% endfor -%} +{% endif %} + +## Template Families by Total Time (Top 50) + +| Rank | Template Family | Count | Total (ms) | Avg (ms) | % of Total | +|------|-----------------|-------|------------|----------|------------| +{% for name, stats in templates_by_time[:50] -%} +| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} | {{ "%9.1f"|format(stats.pct) }}% | +{% endfor %} + +## Template Families by Instantiation Count (Top 50) + +| Rank | Template Family | Count | Total (ms) | Avg (ms) | +|------|-----------------|-------|------------|----------| +{% for name, stats in templates_by_count[:50] -%} +| {{ "%4d"|format(loop.index) }} | {{ name|truncate(43)|pad(43) }} | {{ "%5d"|format(stats.count) }} | {{ "%10.2f"|format(stats.total_dur|us_to_ms) }} | {{ "%8.2f"|format(stats.avg|us_to_ms) }} | +{% endfor %} + +## Key Insights + +### 1. Template Instantiation Impact +- Template instantiation accounts for {{ (100 * total_template_time / total_trace_time)|round(1) }}% of total trace time +{% if unique_families >= 10 -%} +- Top 10 template families account for {{ top10_pct|round(1) }}% of instantiation time +{% endif %} + +### 2. 
Most Expensive Templates +{% if templates_by_time|length > 0 -%} +- **{{ templates_by_time[0][0] }}**: {{ templates_by_time[0][1].count|format_number }} instantiations, {{ (templates_by_time[0][1].total_dur|us_to_s)|round(2) }}s total +{% endif -%} +{% if templates_by_time|length > 1 -%} +- **{{ templates_by_time[1][0] }}**: {{ templates_by_time[1][1].count|format_number }} instantiations, {{ (templates_by_time[1][1].avg|us_to_ms)|round(2) }}ms average +{% endif %} + +## Optimization Recommendations + +### High-Impact Targets (by total time) +{% for name, stats in templates_by_time[:5] -%} +**{{ loop.index }}. {{ name }}** - {{ (stats.total_dur|us_to_s)|round(1) }}s total ({{ stats.pct|round(1) }}%) + - {{ stats.count|format_number }} instantiations, {{ (stats.avg|us_to_ms)|round(2) }}ms average + {% if stats.count > 100 -%} + - Strategy: Extern templates - High instantiation count suggests repeated compilation + {% elif stats.avg|us_to_ms > 50 -%} + - Strategy: Template specialization - High individual cost suggests complexity + {% else -%} + - Strategy: Explicit instantiation - Pre-instantiate common configurations + {% endif %} + +{% endfor %} +### Frequently Instantiated (optimization candidates) +{% for name, stats in templates_by_count[:5] if stats.count > 100 -%} +**{{ name }}** - {{ stats.count|format_number }} times ({{ (stats.total_dur|us_to_s)|round(2) }}s total) + - Consider: Precompiled headers or extern templates to avoid recompilation + +{% endfor %} +### Most Expensive Individual Instantiations +{% for inst in top_individual[:3] -%} +**{{ loop.index }}. 
{{ inst.detail|truncate(60) }}** - {{ (inst.dur|us_to_ms)|round(1) }}ms + - Strategy: Profile and simplify this specific instantiation + +{% endfor %} + +## Detailed Statistics + +- **Total Unique Templates:** {{ unique_families }} +- **Total Instantiations:** {{ total_instantiations|format_number }} +{% if total_instantiations > 0 -%} +- **Average Instantiation Time:** {{ ((total_template_time // total_instantiations)|us_to_ms)|round(3) }}ms +{% endif -%} +{% if unique_families > 0 -%} +- **Median Template Family Count:** {{ median_count }} +{% endif %} + +--- + +*Report generated using Clang -ftime-trace with {{ granularity }}µs granularity* +*Analysis tool: ck-build-analysis*
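The per-family aggregation that feeds the report's "Template Families" tables can be sketched in a few lines of Python. This is an illustrative sketch only: `summarize_trace` and the group-by-prefix rule are assumptions modeled on the Chrome trace format that Clang's `-ftime-trace` emits (`InstantiateClass`/`InstantiateFunction` events with the entity name in `args.detail` and microsecond durations in `dur`), not the actual `analyze_build_trace.py` implementation.

```python
from collections import defaultdict

# Event names Clang's -ftime-trace uses for template instantiation work.
TEMPLATE_EVENTS = {"InstantiateClass", "InstantiateFunction"}

def summarize_trace(trace):
    """Return {template_family: total_duration_us} for instantiation events."""
    totals = defaultdict(int)
    for ev in trace.get("traceEvents", []):
        if ev.get("name") in TEMPLATE_EVENTS:
            detail = ev.get("args", {}).get("detail", "")
            family = detail.split("<", 1)[0]  # strip template arguments
            totals[family] += ev.get("dur", 0)  # durations are microseconds
    return dict(totals)

# A tiny hand-written trace in the shape -ftime-trace produces:
example = {
    "traceEvents": [
        {"name": "InstantiateClass", "dur": 1200,
         "args": {"detail": "ck::Tuple<int, float>"}},
        {"name": "InstantiateClass", "dur": 800,
         "args": {"detail": "ck::Tuple<float>"}},
        {"name": "Frontend", "dur": 9000},  # non-template phase, ignored
    ]
}
print(summarize_trace(example))  # {'ck::Tuple': 2000}
```

Running this over every `*.cpp.json` / `*.hip.json` trace in the build tree and sorting the merged totals is what produces rankings like "Template Families by Total Time" above.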