[rocm-libraries] ROCm/rocm-libraries#5383 (commit b660b8c)

[CK_TILE] Add CShuffleLds microbenchmark suite

## Summary

Microbenchmarks isolating LDS store/load operations in CShuffleEpilogue
for bank conflict analysis.

## Motivation

CShuffleEpilogue performs an LDS store (MFMA registers → LDS) and an LDS
load (LDS → registers, for coalesced global writes). This suite isolates
each operation to:
- Identify which operation causes bank conflicts
- Measure pure LDS bandwidth per access pattern
- Validate access patterns across MFMA tile sizes and wave layouts
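To make "bank conflict" concrete, the sketch below counts how many lanes of a strided LDS access land in the same bank. The worst case is the conflict degree. `lds_conflict_degree` is a hypothetical helper and is not part of this suite. It assumes 32 banks of one dword each, evaluated over 32 lanes at a time. Real bank counts and lane grouping depend on the architecture and the access width, so treat both numbers as assumptions, not gfx950 facts.

```bash
#!/usr/bin/env bash
# Hypothetical helper: worst-case LDS bank-conflict degree for a strided access.
# ASSUMPTIONS: 32 banks, 4 bytes (one dword) per bank, 32 lanes evaluated together.
# Real bank counts and lane grouping are architecture-dependent.
lds_conflict_degree() {
  local stride_dwords=$1     # distance between consecutive lanes, in dwords
  local num_banks=${2:-32}   # assumed number of LDS banks
  local num_lanes=${3:-32}   # assumed number of lanes serviced together
  local -a hits=()
  local lane bank max=0
  for (( lane = 0; lane < num_lanes; lane++ )); do
    # dword address of this lane, mapped to a bank index
    bank=$(( (lane * stride_dwords) % num_banks ))
    hits[bank]=$(( ${hits[bank]:-0} + 1 ))
    if (( hits[bank] > max )); then
      max=${hits[bank]}
    fi
  done
  # 1 = conflict-free; N = N-way conflict (the access is serialized N times)
  echo "$max"
}
```

Under these assumptions, `lds_conflict_degree 1` (consecutive dwords) returns 1. `lds_conflict_degree 2` gives a 2-way conflict. `lds_conflict_degree 32` puts every lane in one bank, a 32-way conflict. Padding that row by one dword (`lds_conflict_degree 33`) makes the access conflict-free again.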

## Components

- **Microkernels** (`tile_load_store_microkernels.hpp`): `StoreTile<Setup>` and `LoadTile<Setup>`
- **Setup Adapters** (`benchmark_cshuffle_lds.hpp`): wire CShuffleEpilogue into the microkernels
- **Template** (`benchmark_template.cpp.in`): template from which the timed benchmarks are generated

## Build

```bash
cmake -G Ninja -B build -S . \
    -DGPU_TARGETS=gfx950 \
    -DBUILD_CK_EXAMPLES=ON \
    -DBUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS=ON

ninja -C build bench_lds_fp8_16x16x128_2x2_fp8
```

## New CMake Options

| Option | Default | Description |
|--------|---------|-------------|
| `BUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS` | OFF | Build the CShuffle LDS microbenchmarks |
| `BUILD_CK_TILE_FMHA_TESTS` | ON | Build the FMHA tests |
| `BUILD_CK_TILE_ENGINE` | ON | Build the tile engine |
| `BUILD_CK_TILE_ENGINE_TESTS` | ON | Build the tile engine tests |
| `BUILD_CK_EXAMPLES` | ON | Build the examples |
| `BUILD_CK_TUTORIALS` | ON | Build the tutorials |
| `BUILD_CK_DEVICE_INSTANCES` | ON | Build the device instances |
| `BUILD_CK_PROFILER` | ON | Build the profiler |

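Setting these guards to OFF reduces CMake configure time from ~150 s to ~5 s. The `--minimal` flag added to `cmake-ck-dev.sh` selects the `dev-minimal` preset, which the script documents as this fast (~5 s) configure path.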
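The function below sketches how the guards might be used to configure only what the Build section needs. It keeps `BUILD_CK_EXAMPLES` and `BUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS` ON and switches the other guards from the table OFF. The function name `configure_lds_bench` and the `CMAKE` override variable are hypothetical. It is also an assumption, not something this commit states, that every other guard can be disabled without affecting the benchmark targets.

```bash
#!/usr/bin/env bash
# Hypothetical wrapper: configure a build tree with only the LDS benchmarks enabled.
# ASSUMPTION: the benchmarks need BUILD_CK_EXAMPLES=ON (as in the Build section)
# and none of the other guards listed in the table.
configure_lds_bench() {
  local build_dir=${1:-build}
  local source_dir=${2:-.}
  local gpu_targets=${3:-gfx950}
  local -a guards=(
    -DBUILD_CK_EXAMPLES=ON
    -DBUILD_CK_TILE_CSHUFFLE_LDS_BENCHMARKS=ON
    -DBUILD_CK_TILE_FMHA_TESTS=OFF
    -DBUILD_CK_TILE_ENGINE=OFF
    -DBUILD_CK_TILE_ENGINE_TESTS=OFF
    -DBUILD_CK_TUTORIALS=OFF
    -DBUILD_CK_DEVICE_INSTANCES=OFF
    -DBUILD_CK_PROFILER=OFF
  )
  # CMAKE can be overridden (e.g. CMAKE=echo) for a dry run that only prints the command.
  "${CMAKE:-cmake}" -G Ninja -B "$build_dir" -S "$source_dir" \
    -DGPU_TARGETS="$gpu_targets" "${guards[@]}"
}
```

A typical call would be `configure_lds_bench build . gfx950 && ninja -C build bench_lds_fp8_16x16x128_2x2_fp8`. Running `CMAKE=echo configure_lds_bench` first shows the exact flags without configuring.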
Max Podkorytov
2026-04-15 03:44:07 +00:00
committed by assistant-librarian[bot]
parent 5348b577ed
commit 7dcc606adc
11 changed files with 629 additions and 74 deletions


@@ -1,6 +1,23 @@
#!/bin/bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT
#
# Usage: cmake-ck-dev.sh [--minimal|--preset=NAME] [SOURCE_DIR] [GPU_TARGET] [CMAKE_ARGS...]
#
# Flags (can appear anywhere):
# --minimal Use dev-minimal preset (fast ~5s vs ~150s configure)
# --preset=NAME Use custom CMake preset
#
# Positional arguments:
# SOURCE_DIR Source directory (default: ..)
# GPU_TARGET GPU target like gfx90a (default: gfx908;gfx90a;gfx942)
# CMAKE_ARGS Additional arguments passed to cmake
#
# Examples:
# cmake-ck-dev.sh # Default build
# cmake-ck-dev.sh --minimal .. gfx90a # Fast iteration build
# cmake-ck-dev.sh .. gfx90a --minimal # Flags can go anywhere
# cmake-ck-dev.sh --preset=dev-gfx942 .. # Custom preset
# exit when a command exits with non-zero status; also when an unbound variable is referenced
set -eu
@@ -13,6 +30,35 @@ IFS=$(printf '\n\t')
find . -name CMakeFiles -type d -exec rm -rfv {} +
find . -name CMakeCache.txt -type f -exec rm -rv {} +
# Default preset
PRESET="dev"
POSITIONAL_ARGS=()
# Parse all arguments, extracting flags and preserving positional args
while [ $# -gt 0 ]; do
case "$1" in
--minimal)
PRESET="dev-minimal"
echo "Using minimal preset (fast configure ~5s vs ~150s)"
shift
;;
--preset=*)
PRESET="${1#--preset=}"
echo "Using preset: $PRESET"
shift
;;
*)
# Preserve positional arguments
POSITIONAL_ARGS+=("$1")
shift
;;
esac
done
# Restore positional arguments
set -- "${POSITIONAL_ARGS[@]}"
# Parse positional arguments
if [ $# -ge 1 ]; then
MY_PROJECT_SOURCE="$1"
shift 1
@@ -38,4 +84,4 @@ else
REST_ARGS=("$@")
fi
-cmake "${MY_PROJECT_SOURCE}" --preset dev -DGPU_TARGETS="$GPU_TARGETS" "${REST_ARGS[@]}"
+cmake "${MY_PROJECT_SOURCE}" --preset "$PRESET" -DGPU_TARGETS="$GPU_TARGETS" "${REST_ARGS[@]}"
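The parsing loop in the second hunk picks out `--minimal` and `--preset=NAME` wherever they appear. Everything else passes through, in order, as positional arguments. Below is a standalone sketch of the same pattern that runs without invoking cmake. `parse_ck_dev_args` is a hypothetical name, and the sketch prints the chosen preset and remaining arguments instead of configuring.

It differs from the script in one place. It restores positionals with `${positional[@]+"${positional[@]}"}`, because under `set -u`, bash releases older than 4.4 treat `"${arr[@]}"` on an empty array as an unbound variable.

```bash
#!/usr/bin/env bash
set -eu
# Hypothetical standalone version of the flag-extraction loop in cmake-ck-dev.sh.
# Prints the chosen preset and the remaining positional arguments, one per line.
parse_ck_dev_args() {
  local preset="dev"
  local -a positional=()
  while [ $# -gt 0 ]; do
    case "$1" in
      --minimal)  preset="dev-minimal" ;;
      --preset=*) preset="${1#--preset=}" ;;
      *)          positional+=("$1") ;;  # keep everything else, in order
    esac
    shift
  done
  # Restore positionals; the ${a[@]+...} form is safe under set -u on older bash.
  set -- ${positional[@]+"${positional[@]}"}
  echo "preset=$preset"
  local arg
  for arg in "$@"; do
    echo "arg=$arg"
  done
}
```

For example, `parse_ck_dev_args .. gfx90a --minimal` selects `dev-minimal` and keeps `..` and `gfx90a` as positionals, matching the "flags can go anywhere" usage documented in the script header.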