Build Trace Analysis

Simple, fast tools for analyzing Clang -ftime-trace build performance data.

Overview

This directory provides straightforward Python tools for analyzing the JSON trace files that Clang generates when compiling with -ftime-trace. The focus is on simplicity and speed: no caching and no extra layers, just fast parallel I/O and pandas DataFrames.

Quick Start

Configure a build directory named build-trace, then edit its CMakeCache.txt to add -ftime-trace to CMAKE_CXX_FLAGS. With -ftime-trace enabled, Clang writes a .json trace file alongside each compiled object file. These trace files contain detailed timing information about:

  • Template instantiations
  • Function parsing
  • Code generation phases
  • Optimization passes

These JSON files are what the analysis tools in script/analyze_build/ are designed to process.
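
A CMake build tree can contain other JSON files as well (for example compile_commands.json when CMAKE_EXPORT_COMPILE_COMMANDS is enabled), so it can be useful to check which files really are time traces. The following is a minimal sketch and not part of the tools described here: is_time_trace and find_trace_files are hypothetical helper names, and the check only looks for a top-level traceEvents list.

```python
import json
from pathlib import Path


def is_time_trace(path: Path) -> bool:
    """Return True if the file parses as JSON with a top-level "traceEvents" list."""
    try:
        with path.open("rb") as fh:
            data = json.load(fh)
    except (OSError, ValueError):
        # Unreadable or malformed files are simply not treated as traces.
        return False
    return isinstance(data, dict) and isinstance(data.get("traceEvents"), list)


def find_trace_files(build_dir: Path) -> list[Path]:
    """Recursively collect the .json files under build_dir that look like time traces."""
    return sorted(p for p in build_dir.rglob("*.json") if is_time_trace(p))
```

Because this sketch parses every candidate file in full, it suits a quick sanity check on a small build tree more than a scan of a complete build.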

# Analyze all trace files in a directory
# (paths assume build-trace sits at the repository root)
cd script/analyze_build
python examples/analyze_build.py ../../build-trace

# Analyze a single file
python examples/analyze_file.py ../../build-trace/some_file.json

Installation

Install required Python packages:

pip install pandas orjson tqdm

Performance Note: orjson provides a 1.65x speedup in JSON parsing. The parser uses it automatically when it is installed and otherwise falls back to the standard library json module.
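
The fallback described above is commonly written as a guarded import. This is a minimal sketch of that pattern, not the parser's actual code; load_trace_bytes is a hypothetical name.

```python
import json

try:
    import orjson  # optional dependency; used only when installed
except ImportError:
    orjson = None


def load_trace_bytes(data: bytes) -> dict:
    """Parse raw JSON bytes with orjson when available, else the standard json module."""
    if orjson is not None:
        return orjson.loads(data)
    return json.loads(data)
```

Both libraries accept bytes directly, so the caller can read the file in binary mode and skip a separate decoding step.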

Directory Structure

script/analyze_build/
├── trace_analysis/          # Core library
│   ├── __init__.py         # Main exports
│   ├── models.py           # TraceFile model
│   ├── parser.py           # Fast JSON parsing
│   └── transformer.py      # DataFrame conversion
├── examples/
│   ├── analyze_build.py    # Analyze all files in a directory
│   └── analyze_file.py     # Analyze a single file
├── notebooks/              # Jupyter notebooks for analysis
│   └── (existing notebooks)
└── README.md               # This file

Usage

Command-Line Analysis

Analyze all trace files:

python examples/analyze_build.py ../../build-trace

This will:

  • Find all .json files recursively
  • Process them in parallel using all CPU cores
  • Display comprehensive build statistics
  • Show top event types, slowest files, and template analysis

Analyze a single file:

python examples/analyze_file.py ../../build-trace/some_file.json

Python API

from pathlib import Path
from trace_analysis import TraceFile, TraceParser, TraceTransformer

# Parse a single file
trace_file = TraceFile.from_path(Path("build.json"))
events = TraceParser.parse(trace_file)

# Convert to DataFrames
events_df = TraceTransformer.to_events_dataframe(events)
templates_df = TraceTransformer.to_templates_dataframe(events)

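# Analyze
# Note: trace events nest (a parent's "dur" includes its children), so sums
# over all events overstate wall-clock time.
print(f"Total events: {len(events_df):,}")
print(f"Total time: {events_df['dur'].sum() / 1e6:.2f}s")
print(f"Template time: {templates_df['dur'].sum() / 1e6:.2f}s")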

Jupyter Notebooks

For interactive analysis, see the comprehensive example notebook:

notebooks/comprehensive_example.ipynb - Complete guide covering:

  • Single file analysis with detailed explanations
  • Multi-file parallel processing
  • Build-wide statistics and template analysis
  • Advanced analysis patterns (optimization targets, distributions, etc.)
  • Practical recommendations for improving build times

Quick example for custom notebooks:

from pathlib import Path
from concurrent.futures import ProcessPoolExecutor
from trace_analysis import TraceFile, TraceParser, TraceTransformer
import pandas as pd

def process_file(json_path):
    trace_file = TraceFile.from_path(json_path)
    events = TraceParser.parse(trace_file)
    return TraceTransformer.to_events_dataframe(events)

# Process all files in parallel
trace_dir = Path("../../build-trace")
json_files = list(trace_dir.rglob("*.json"))

with ProcessPoolExecutor() as executor:
    dfs = list(executor.map(process_file, json_files))

# Combine and analyze
events_df = pd.concat(dfs, ignore_index=True)

# Top event types
event_totals = events_df.groupby('name')['dur'].sum().sort_values(ascending=False)
print(event_totals.head(10))

Performance

Typical performance on 4,484 trace files (~46 GB):

  • Parsing: ~26 seconds (174 files/sec)
  • Memory: ~1-2 GB
  • Throughput: I/O limited (uses all CPU cores)

Why no caching?

  • Fresh analysis is faster than cache management overhead
  • Simpler code (60% less code than the cached version)
  • No cache invalidation issues
  • Catches changes immediately

Data Format

The trace files use the Chrome Trace Event Format:

{
  "traceEvents": [
    {
      "pid": 1234,
      "tid": 1234,
      "ts": 1000,
      "dur": 500,
      "ph": "X",
      "name": "InstantiateFunction",
      "args": {
        "detail": "template_name<Args...>"
      }
    }
  ],
  "beginningOfTime": 1234567890
}

Key fields:

  • name: Event type (e.g., "InstantiateClass", "ParseFunctionDefinition")
  • dur: Duration in microseconds
  • ts: Timestamp in microseconds
  • args.detail: Additional information (e.g., template name)
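
To make the field meanings concrete, the sketch below aggregates dur per event name with the standard library only and converts microseconds to seconds. The sample data and the total_seconds_by_name helper are illustrative assumptions, not part of the library.

```python
import json
from collections import defaultdict

# Small hand-written trace in the format shown above (illustrative values).
SAMPLE = """
{
  "traceEvents": [
    {"pid": 1, "tid": 1, "ts": 1000, "dur": 500, "ph": "X",
     "name": "InstantiateFunction", "args": {"detail": "foo<int>"}},
    {"pid": 1, "tid": 1, "ts": 2000, "dur": 1500, "ph": "X",
     "name": "InstantiateFunction", "args": {"detail": "foo<float>"}},
    {"pid": 1, "tid": 1, "ts": 4000, "dur": 250, "ph": "X",
     "name": "ParseFunctionDefinition", "args": {"detail": "bar"}}
  ],
  "beginningOfTime": 1234567890
}
"""


def total_seconds_by_name(trace: dict) -> dict[str, float]:
    """Sum "dur" (microseconds) per event name and convert the totals to seconds."""
    totals = defaultdict(int)
    for event in trace.get("traceEvents", []):
        if event.get("ph") == "X":  # "X" marks a complete event with a duration
            totals[event["name"]] += event.get("dur", 0)
    return {name: micros / 1e6 for name, micros in totals.items()}


totals = total_seconds_by_name(json.loads(SAMPLE))
print(totals)
```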

Library Components

TraceFile

Simple model for trace file metadata:

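from dataclasses import dataclass
from pathlib import Path

@dataclass
class TraceFile:
    path: Path        # Location of the .json trace file
    size_bytes: int   # File size in bytes
    mtime_ns: int     # Modification time in nanoseconds

    @classmethod
    def from_path(cls, path: Path) -> "TraceFile": ...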
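
The body of from_path is not shown here. One plausible way to fill such fields is a single stat() call, sketched below under that assumption; TraceFileSketch is a hypothetical stand-in and not the library's code.

```python
from dataclasses import dataclass
from pathlib import Path


@dataclass(frozen=True)
class TraceFileSketch:
    """Hypothetical stand-in for TraceFile, for illustration only."""

    path: Path
    size_bytes: int
    mtime_ns: int

    @classmethod
    def from_path(cls, path: Path) -> "TraceFileSketch":
        # Assumption: both metadata fields come from one stat() call.
        st = path.stat()
        return cls(path=path, size_bytes=st.st_size, mtime_ns=st.st_mtime_ns)
```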

TraceParser

Fast JSON parsing with orjson support:

from typing import Any, Dict, List

class TraceParser:
    @staticmethod
    def parse(trace_file: TraceFile) -> List[Dict[str, Any]]: ...

The parser automatically uses orjson, if available, for a 1.65x speedup.

TraceTransformer

Convert parsed events to pandas DataFrames:

from typing import Dict, List

import pandas as pd

class TraceTransformer:
    @staticmethod
    def to_events_dataframe(events: List[Dict]) -> pd.DataFrame: ...

    @staticmethod
    def to_templates_dataframe(events: List[Dict]) -> pd.DataFrame: ...

The events DataFrame includes all events with optimized dtypes. The templates DataFrame filters to template-related events and extracts template details.
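
As a rough illustration of the filtering step, the sketch below keeps events whose name starts with "Instantiate" and exposes args.detail as a template_detail column. The selection rule and the templates_from_events helper are assumptions made for this example; the real transformer may select and shape events differently.

```python
import pandas as pd


def templates_from_events(events: list[dict]) -> pd.DataFrame:
    """Keep "Instantiate*" events and expose args.detail as template_detail."""
    rows = [
        {
            "name": e["name"],
            "dur": e.get("dur", 0),
            # Events without args or detail get an empty string.
            "template_detail": (e.get("args") or {}).get("detail", ""),
        }
        for e in events
        if e.get("name", "").startswith("Instantiate")  # assumed selection rule
    ]
    return pd.DataFrame(rows, columns=["name", "dur", "template_detail"])
```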

Analysis Examples

Find Most Expensive Event Types

event_totals = events_df.groupby('name')['dur'].sum()
top_events = event_totals.sort_values(ascending=False).head(10)
print(top_events / 1e6)  # Convert to seconds

Find Slowest Files

# Requires a 'file_name' column identifying each event's source trace file
file_totals = events_df.groupby('file_name')['dur'].sum()
slowest = file_totals.sort_values(ascending=False).head(10)
print(slowest / 1e6)  # Convert to seconds
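
The grouping above relies on a file_name column. If your combined DataFrame lacks one, it can be added while concatenating the per-file frames. This is a minimal sketch; combine_with_file_name is a hypothetical helper, and it assumes you keep a mapping from each trace path to its events DataFrame.

```python
from pathlib import Path

import pandas as pd


def combine_with_file_name(frames: dict[Path, pd.DataFrame]) -> pd.DataFrame:
    """Concatenate per-file event frames, tagging each row with its source file name."""
    tagged = [df.assign(file_name=path.name) for path, df in frames.items()]
    if not tagged:
        return pd.DataFrame(columns=["file_name"])
    return pd.concat(tagged, ignore_index=True)
```

Using path.name keeps the labels short, but trace files with the same name in different directories would be merged; str(path) avoids that at the cost of longer labels.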

Analyze Template Instantiations

# Most frequently instantiated
template_counts = templates_df['template_detail'].value_counts()
print(template_counts.head(10))

# Most expensive by total time
template_totals = templates_df.groupby('template_detail')['dur'].sum()
print(template_totals.sort_values(ascending=False).head(10) / 1e6)

# Template time percentage
total_time = events_df['dur'].sum()
template_time = templates_df['dur'].sum()
print(f"Template time: {(template_time / total_time) * 100:.1f}%")

Build Time Analysis Philosophy

  • Use all CPU cores: The tools automatically use all available cores for parallel processing
  • Memory is cheap: 1-2 GB for 4,484 files is acceptable on modern systems
  • Fresh is fast: No cache overhead means consistent ~26s analysis time
  • Jupyter-friendly: Progress bars work automatically in notebooks
  • Simple is better: One straightforward approach, not multiple complex paths

References