Mirror of https://github.com/ROCm/composable_kernel.git
Synced 2026-05-14 10:09:41 +00:00

Extends build time analysis from ROCm/composable_kernel#3644 to handle multiple trace files across build directories (see #4229):

- pipeline.py: Generic pipeline framework with a fluent interface for composable data processing. Provides parallel processing, progress tracking, and error handling independent of trace-specific code. Processes thousands of trace files at default resolution in minutes, aggregating results into in-memory DataFrames for analysis.
- parse_build.py: Parses all trace files in a build directory.
- build_analysis_example.ipynb: Demonstrates pipeline aggregation across all build files.

The pipeline design improves the capability (composable operations), performance (parallel processing), and user-friendliness (fluent API) of the analysis modules. It enables analyzing compilation patterns across the entire codebase, with all trace data available in pandas DataFrames for interactive exploration.

---

🔁 Imported from [ROCm/composable_kernel#3704](https://github.com/ROCm/composable_kernel/pull/3704)
🧑‍💻 Originally authored by @shumway

Co-authored-by: John Shumway <jshumway@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
686 lines
22 KiB
Plaintext
{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Full Build Analysis\n",
    "\n",
    "This notebook demonstrates comprehensive build-wide analysis using the Pipeline API to process all trace files in parallel.\n",
    "\n",
    "We'll create three main DataFrames:\n",
    "1. **Metadata DataFrame**: One row per build unit with compilation statistics\n",
    "2. **Phase DataFrame**: Compilation phase breakdown for all build units\n",
    "3. **Template DataFrame**: Template instantiation events across the entire build\n",
    "\n",
    "All DataFrames are keyed by `build_unit` (the source file name) for easy cross-analysis."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Setup"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "\n",
    "from pathlib import Path\n",
    "import sys\n",
    "import pandas as pd\n",
    "import plotly.express as px\n",
    "\n",
    "# Add parent directory to path\n",
    "sys.path.insert(0, str(Path.cwd().parent))\n",
    "\n",
    "from trace_analysis import (\n",
    "    Pipeline,\n",
    "    find_trace_files,\n",
    "    parse_file,\n",
    "    get_trace_file,\n",
    ")\n",
    "from trace_analysis.build_helpers import extract_all_data, print_phase_hierarchy\n",
    "\n",
    "# Display settings\n",
    "pd.set_option(\"display.max_rows\", 100)\n",
    "pd.set_option(\"display.max_columns\", None)\n",
    "pd.set_option(\"display.width\", None)\n",
    "pd.set_option(\"display.max_colwidth\", None)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Find Trace Files"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Configure the path to your trace files\n",
    "TRACE_DIR = Path(\"../../../build-trace\")\n",
    "\n",
    "json_files = find_trace_files(TRACE_DIR)\n",
    "\n",
    "if not json_files:\n",
    "    print(f\"No trace files found in {TRACE_DIR}\")\n",
    "    print(\"\\nTo generate trace files:\")\n",
    "    print(\"1. Configure your build with: cmake -DCMAKE_CXX_FLAGS='-ftime-trace' ...\")\n",
    "    print(\"2. Build your project\")\n",
    "    print(\"3. Trace files will be generated alongside object files\")\n",
    "    raise ValueError(f\"No trace files found in {TRACE_DIR}\")\n",
    "else:\n",
    "    print(f\"Found {len(json_files):,} trace files\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Parse All Files in Parallel\n",
    "\n",
    "Parse all trace files using all available CPUs with progress tracking."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Parse all files in parallel, returning a list of raw trace dataframes.\n",
    "parsed_dfs = (\n",
    "    Pipeline(json_files)\n",
    "    .map(parse_file, workers=-1, desc=\"Parsing trace files\")\n",
    "    .collect()\n",
    ")\n",
    "\n",
    "print(f\"\\nParsed {len(parsed_dfs):,} files\")\n",
    "print(f\"Total events across all files: {sum(len(df) for df in parsed_dfs):,}\")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Create Three Analysis DataFrames\n",
    "\n",
    "Extract metadata, phase breakdown, and template events from all parsed files in parallel.\n",
    "This creates three DataFrames:\n",
    "1. **metadata_df**: One row per build unit with compilation statistics\n",
    "2. **phase_df**: Phase breakdown for all build units\n",
    "3. **template_df**: Template instantiation events across the build\n",
    "\n",
    "All DataFrames use a shared categorical `build_unit` column for efficient grouping and joining.\n",
    "\n",
    "📝 **TODO:**\n",
    "The details of this processing are all exposed in these notebook cells. We should add library functionality to simplify this.\n",
    "\n",
    "📝 **TODO:**\n",
    "This takes far too long; something is likely going wrong with the in-memory DataFrame processing."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Extract all three types of data in one parallel pass (this can take a few minutes)\n",
    "results = (\n",
    "    Pipeline(parsed_dfs)\n",
    "    .map(\n",
    "        extract_all_data, workers=-1, desc=\"Extracting metadata, phases, and templates\"\n",
    "    )\n",
    "    .collect()\n",
    ")\n",
    "\n",
    "print(f\"\\nExtracted data from {len(results):,} build units\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create shared categorical dtype for build_unit column\n",
    "build_units = [r[\"build_unit\"] for r in results]\n",
    "build_unit_dtype = pd.CategoricalDtype(\n",
    "    categories=sorted(set(build_units)), ordered=False\n",
    ")\n",
    "\n",
    "print(f\"Created categorical dtype with {len(build_unit_dtype.categories)} categories\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build metadata DataFrame with categorical build_unit\n",
    "metadata_df = pd.DataFrame(\n",
    "    [{\"build_unit\": r[\"build_unit\"], **r[\"metadata\"]} for r in results]\n",
    ")\n",
    "metadata_df[\"build_unit\"] = metadata_df[\"build_unit\"].astype(build_unit_dtype)\n",
    "\n",
    "# Build trace file mapping and store in DataFrame attributes\n",
    "trace_file_mapping = {r[\"build_unit\"]: r[\"trace_file_path\"] for r in results}\n",
    "metadata_df.attrs[\"trace_file_mapping\"] = trace_file_mapping\n",
    "\n",
    "print(f\"metadata_df: {metadata_df.shape[0]:,} rows (one per build unit)\")\n",
    "print(f\"Stored trace file mapping for {len(trace_file_mapping):,} build units\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build phase DataFrame with categorical build_unit\n",
    "phase_df = pd.concat(\n",
    "    [\n",
    "        r[\"phase\"].assign(\n",
    "            build_unit=pd.Categorical(\n",
    "                [r[\"build_unit\"]] * len(r[\"phase\"]), dtype=build_unit_dtype\n",
    "            )\n",
    "        )\n",
    "        for r in results\n",
    "    ],\n",
    "    ignore_index=True,\n",
    ")\n",
    "\n",
    "print(f\"phase_df: {phase_df.shape[0]:,} rows (phases across all build units)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Build template DataFrame with categorical build_unit\n",
    "template_df = pd.concat(\n",
    "    [\n",
    "        r[\"template\"].assign(\n",
    "            build_unit=pd.Categorical(\n",
    "                [r[\"build_unit\"]] * len(r[\"template\"]), dtype=build_unit_dtype\n",
    "            )\n",
    "        )\n",
    "        for r in results\n",
    "    ],\n",
    "    ignore_index=True,\n",
    ")\n",
    "\n",
    "print(f\"template_df: {template_df.shape[0]:,} rows (template events across build)\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Summary of created DataFrames\n",
    "print(\"\\n=== Created Analysis DataFrames ===\")\n",
    "print(f\"  metadata_df: {metadata_df.shape[0]:,} rows × {metadata_df.shape[1]} columns\")\n",
    "print(f\"  phase_df: {phase_df.shape[0]:,} rows × {phase_df.shape[1]} columns\")\n",
    "print(f\"  template_df: {template_df.shape[0]:,} rows × {template_df.shape[1]} columns\")\n",
    "print(\n",
    "    f\"\\nAll DataFrames share the same categorical build_unit dtype with {len(build_unit_dtype.categories)} categories\"\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build-Wide Metadata Analysis\n",
    "\n",
    "Analyze compilation statistics across all build units."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Overall build statistics\n",
    "print(\"=== Build-Wide Statistics ===\")\n",
    "print(f\"Total build units: {len(metadata_df):,}\")\n",
    "print(f\"Total compilation time: {metadata_df['total_wall_time_s'].sum():.1f} seconds\")\n",
    "print(f\"Average time per unit: {metadata_df['total_wall_time_s'].mean():.2f} seconds\")\n",
    "print(f\"Median time per unit: {metadata_df['total_wall_time_s'].median():.2f} seconds\")\n",
    "print(f\"Slowest unit: {metadata_df['total_wall_time_s'].max():.2f} seconds\")\n",
    "print(f\"Fastest unit: {metadata_df['total_wall_time_s'].min():.2f} seconds\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Top 20 slowest compilation units\n",
    "print(\"\\n=== Top 20 Slowest Compilation Units ===\")\n",
    "slowest = metadata_df.nlargest(20, \"total_wall_time_s\")[\n",
    "    [\"build_unit\", \"total_wall_time_s\"]\n",
    "]\n",
    "\n",
    "display(slowest.style.format({\"total_wall_time_s\": lambda x: f\"{x:.1f}\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Getting Trace Files for Interesting Build Units\n",
    "\n",
    "Now that we've identified interesting build units (e.g., slowest compilation times), we can easily get their trace files for deeper analysis."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Example: Get trace file for a specific build unit\n",
    "example_build_unit = slowest.iloc[0][\"build_unit\"]\n",
    "trace_file = get_trace_file(metadata_df, example_build_unit)\n",
    "\n",
    "print(f\"Build unit: {example_build_unit}\")\n",
    "print(f\"Trace file: {trace_file}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Get trace files for all slowest compilation units\n",
    "print(\"\\n=== Trace Files for Top 10 Slowest Compilation Units ===\\n\")\n",
    "for idx, row in slowest.head(10).iterrows():\n",
    "    build_unit = row[\"build_unit\"]\n",
    "    compile_time = row[\"total_wall_time_s\"]\n",
    "    trace_file = get_trace_file(metadata_df, build_unit)\n",
    "    print(f\"{compile_time:6.1f}s {build_unit}\")\n",
    "    print(f\"  → {trace_file}\\n\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Compilation time distribution\n",
    "fig = px.histogram(\n",
    "    metadata_df,\n",
    "    x=\"total_wall_time_s\",\n",
    "    nbins=600,\n",
    "    title=\"Distribution of Compilation Times\",\n",
    "    labels={\"total_wall_time_s\": \"Compilation Time (seconds)\"},\n",
    "    log_y=True,\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build-Wide Phase Analysis\n",
    "\n",
    "Analyze compilation phases across the entire build."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Add duration_ms column for easier reading\n",
    "phase_df[\"duration_ms\"] = phase_df[\"duration\"] / 1000.0\n",
    "\n",
    "print(f\"Total phase records: {len(phase_df):,}\")\n",
    "print(f\"Unique phases: {phase_df['name'].nunique()}\")\n",
    "print(f\"\\nPhases tracked: {sorted(phase_df['name'].unique())}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Cumulative phase time across entire build - hierarchical view\n",
    "print_phase_hierarchy(phase_df)"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Visualize cumulative phase breakdown\n",
    "phase_summary = (\n",
    "    phase_df.groupby([\"name\", \"parent\", \"depth\"]).agg({\"duration\": \"sum\"}).reset_index()\n",
    ")\n",
    "\n",
    "fig = px.sunburst(\n",
    "    phase_summary,\n",
    "    names=\"name\",\n",
    "    parents=\"parent\",\n",
    "    values=\"duration\",\n",
    "    title=\"Cumulative Phase Breakdown Across Entire Build\",\n",
    "    branchvalues=\"total\",\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Which build units spend most time in Frontend?\n",
    "frontend_time = phase_df[phase_df[\"name\"] == \"Frontend\"].nlargest(20, \"duration_ms\")[\n",
    "    [\"build_unit\", \"duration_ms\"]\n",
    "]\n",
    "\n",
    "# Convert to seconds (keep as float)\n",
    "frontend_time[\"duration_s\"] = frontend_time[\"duration_ms\"] / 1000\n",
    "frontend_time = frontend_time[[\"build_unit\", \"duration_s\"]]\n",
    "\n",
    "print(\"=== Top 20 Build Units by Frontend Time ===\")\n",
    "display(frontend_time.style.format({\"duration_s\": \"{:,.1f}\"}))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Which build units spend most time in Backend?\n",
    "backend_time = phase_df[phase_df[\"name\"] == \"Backend\"].nlargest(20, \"duration_ms\")[\n",
    "    [\"build_unit\", \"duration_ms\"]\n",
    "]\n",
    "\n",
    "backend_time[\"duration_s\"] = backend_time[\"duration_ms\"] / 1000\n",
    "backend_time = backend_time[[\"build_unit\", \"duration_s\"]]\n",
    "\n",
    "print(\"=== Top 20 Build Units by Backend Time ===\")\n",
    "display(backend_time.style.format({\"duration_s\": \"{:,.1f}\"}))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Build-Wide Template Analysis\n",
    "\n",
    "Analyze template instantiations across the entire build."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Overall template statistics\n",
    "print(\"=== Build-Wide Template Statistics ===\")\n",
    "print(f\"Total template instantiation events: {len(template_df):,}\")\n",
    "print(f\"Total template time: {template_df['dur'].sum() / 1_000_000:.1f} seconds\")\n",
    "print(f\"Average template time: {template_df['dur'].mean() / 1000:.2f} ms\")\n",
    "print(f\"Unique template names: {template_df['template_name'].nunique():,}\")\n",
    "print(f\"Unique namespaces: {template_df['namespace'].nunique()}\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Top templates by total time across build\n",
    "top_templates = template_df.groupby([\"namespace\", \"template_name\"]).agg(\n",
    "    {\"dur\": [\"count\", \"sum\", \"mean\"]}\n",
    ")\n",
    "top_templates.columns = [\"count\", \"total_dur\", \"avg_dur\"]\n",
    "top_templates[\"total_s\"] = top_templates[\"total_dur\"] / 1_000_000\n",
    "top_templates[\"avg_ms\"] = top_templates[\"avg_dur\"] / 1000\n",
    "top_templates = top_templates.sort_values(\"total_dur\", ascending=False).reset_index()\n",
    "\n",
    "print(\"\\n=== Top 30 Templates by Total Time Across Build ===\")\n",
    "display(\n",
    "    top_templates.head(30)[\n",
    "        [\"namespace\", \"template_name\", \"count\", \"total_s\", \"avg_ms\"]\n",
    "    ].style.format({\"count\": \"{:,.0f}\", \"total_s\": \"{:,.1f}\", \"avg_ms\": \"{:,.1f}\"})\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Template time by namespace\n",
    "namespace_summary = (\n",
    "    template_df.groupby(\"namespace\")\n",
    "    .agg({\"dur\": [\"count\", \"sum\", \"mean\"], \"param_count\": \"mean\"})\n",
    "    .round(2)\n",
    ")\n",
    "namespace_summary.columns = [\"count\", \"total_dur\", \"avg_dur\", \"avg_params\"]\n",
    "namespace_summary[\"total_s\"] = namespace_summary[\"total_dur\"] / 1_000_000\n",
    "namespace_summary = namespace_summary.sort_values(\"total_dur\", ascending=False)\n",
    "\n",
    "print(\"\\n=== Template Time by Namespace ===\")\n",
    "display(\n",
    "    namespace_summary.style.format(\n",
    "        {\n",
    "            \"count\": \"{:,d}\",\n",
    "            \"total_dur\": \"{:,.0f}\",\n",
    "            \"avg_dur\": \"{:,.0f}\",\n",
    "            \"avg_params\": \"{:,.2f}\",\n",
    "            \"total_s\": \"{:,.1f}\",\n",
    "        }\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# CK library templates analysis\n",
    "ck_templates = template_df[template_df[\"is_ck_type\"]].copy()\n",
    "\n",
    "print(\"=== CK Library Template Analysis ===\")\n",
    "print(f\"CK template instantiations: {len(ck_templates):,}\")\n",
    "print(f\"CK template time: {ck_templates['dur'].sum() / 1_000_000:.1f} seconds\")\n",
    "print(\n",
    "    f\"Percentage of total template time: {100 * ck_templates['dur'].sum() / template_df['dur'].sum():.1f}%\"\n",
    ")\n",
    "\n",
    "# Top CK templates\n",
    "ck_by_name = (\n",
    "    ck_templates.groupby(\"template_name\")\n",
    "    .agg({\"dur\": [\"count\", \"sum\", \"mean\"]})\n",
    "    .round(2)\n",
    ")\n",
    "ck_by_name.columns = [\"count\", \"total_dur\", \"avg_dur\"]\n",
    "ck_by_name[\"total_s\"] = ck_by_name[\"total_dur\"] / 1_000_000\n",
    "ck_by_name = ck_by_name.sort_values(\"total_dur\", ascending=False)\n",
    "\n",
    "print(\"\\n=== Top 20 CK Templates by Total Time ===\")\n",
    "display(\n",
    "    ck_by_name.head(20)[[\"count\", \"total_s\"]].style.format(\n",
    "        {\"count\": \"{:,d}\", \"total_s\": \"{:,.0f}\"}\n",
    "    )\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Cross-Analysis: Templates vs Compilation Time\n",
    "\n",
    "Analyze relationships between template instantiations and compilation time."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Template count per build unit\n",
    "template_counts = (\n",
    "    template_df.groupby(\"build_unit\").size().reset_index(name=\"template_count\")\n",
    ")\n",
    "\n",
    "# Template time per build unit\n",
    "template_time = (\n",
    "    template_df.groupby(\"build_unit\")[\"dur\"].sum().reset_index(name=\"template_time_us\")\n",
    ")\n",
    "template_time[\"template_time_s\"] = template_time[\"template_time_us\"] / 1_000_000\n",
    "\n",
    "# Merge with metadata\n",
    "analysis_df = (\n",
    "    metadata_df[[\"build_unit\", \"total_wall_time_s\"]]\n",
    "    .merge(template_counts, on=\"build_unit\", how=\"left\")\n",
    "    .merge(\n",
    "        template_time[[\"build_unit\", \"template_time_s\"]], on=\"build_unit\", how=\"left\"\n",
    "    )\n",
    ")\n",
    "\n",
    "# Fill NaN with 0 for units with no templates\n",
    "analysis_df[\"template_count\"] = analysis_df[\"template_count\"].fillna(0)\n",
    "analysis_df[\"template_time_s\"] = analysis_df[\"template_time_s\"].fillna(0)\n",
    "\n",
    "print(\"=== Template Count vs Compilation Time ===\")\n",
    "print(\n",
    "    f\"Correlation: {analysis_df['template_count'].corr(analysis_df['total_wall_time_s']):.3f}\"\n",
    ")\n",
    "\n",
    "# Top units by template count\n",
    "print(\"\\n=== Top 20 Build Units by Template Count ===\")\n",
    "display(analysis_df.nlargest(20, \"template_count\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scatter plot: template count vs compilation time\n",
    "fig = px.scatter(\n",
    "    analysis_df,\n",
    "    x=\"template_count\",\n",
    "    y=\"total_wall_time_s\",\n",
    "    hover_data=[\"build_unit\"],\n",
    "    title=\"Template Count vs Compilation Time\",\n",
    "    labels={\n",
    "        \"template_count\": \"Number of Template Instantiations\",\n",
    "        \"total_wall_time_s\": \"Compilation Time (seconds)\",\n",
    "    },\n",
    "    trendline=\"ols\",\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Scatter plot: template time vs compilation time\n",
    "# Note: The total instantiation time double-counts nested templates.\n",
    "fig = px.scatter(\n",
    "    analysis_df,\n",
    "    x=\"template_time_s\",\n",
    "    y=\"total_wall_time_s\",\n",
    "    hover_data=[\"build_unit\"],\n",
    "    title=\"Template Instantiation Time vs Total Compilation Time\",\n",
    "    labels={\n",
    "        \"template_time_s\": \"Template Instantiation Time (seconds)\",\n",
    "        \"total_wall_time_s\": \"Total Compilation Time (seconds)\",\n",
    "    },\n",
    "    trendline=\"ols\",\n",
    ")\n",
    "fig.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Summary\n",
    "\n",
    "This notebook demonstrated:\n",
    "- Parallel parsing of all trace files using the Pipeline API\n",
    "- Parallel extraction of metadata, phases, and templates in a single pass\n",
    "- Creating three consistently-keyed DataFrames with shared categorical dtype\n",
    "- **Trace file mapping** stored in metadata_df.attrs for easy lookup\n",
    "- Build-wide metadata analysis\n",
    "- Cumulative phase analysis with visualizations\n",
    "- Build-wide template analysis\n",
    "- Cross-analysis between templates and compilation time\n",
    "- **Using get_trace_file() to locate trace files for interesting build units**\n",
    "\n",
    "The shared categorical `build_unit` dtype enables efficient grouping and joining across all three DataFrames for comprehensive build analysis.\n",
    "\n",
    "The trace file mapping allows you to quickly download or analyze the raw trace JSON for any build unit of interest."
   ]
  }
 ],
 "metadata": {
  "kernelspec": {
   "display_name": ".venv",
   "language": "python",
   "name": "python3"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.12.3"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}