mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-13 17:55:48 +00:00

Files

arai713 5d2fce819d [rocm-libraries] ROCm/rocm-libraries#4769 (commit 72ae66e)

[CK_TILE] Restructure Tile Engine's benchmarking and
 profiling (#4769)

## Motivation
This PR restructures the benchmarking and profiling code in CK Tile's
Tile Engine. It builds on the groundwork from the previous PR,
https://github.com/ROCm/composable_kernel/pull/3434, and follows the
plan outlined in this [design
document](https://amdcloud-my.sharepoint.com/:w:/r/personal/astharai_amd_com/Documents/Restructuring%20Tile%20Engine.docx?d=w14ea28a30718416988ed5ebb759bd3b2&csf=1&web=1&e=l3VBuX).
To reduce repeated code, PR 3434 implemented:

- Base class that centralizes common functionality and provides a
default implementation (Universal GEMM)
- Child classes for GEMM variants override virtual functions to handle
variant-specific behavior

The refactoring in this PR follows the same approach. It should greatly
reduce the duplicated code in Tile Engine and make it simpler to add new
operations, which improves scalability.

## Technical Details
The files have been refactored around new base structs for benchmarks,
profiling and problem descriptions. The new base structs are:

- GemmProblem
- GemmBenchmark
- GemmProfiler

Universal GEMM, Preshuffle GEMM, and Multi-D GEMM each have child classes
that inherit from these base structs and override only what differs
per variant.
Functions shared across the benchmarking and profiling files have been
moved into new utility files under the commons/ directory. The new
utility files are:

- utils.hpp: common functions for the benchmarking and profiling process
- benchmark_utils.py: common utility functions for the benchmark
generation

## Test Plan
I tested using the existing tests for Tile Engine.

## Test Result
All tests passed.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-04-14 17:51:20 +00:00

configs

Ck tile engine gemm unit tests expand test coverage (#3025)

2025-11-03 10:29:16 +01:00

CMakeLists.txt

[rocm-libraries] ROCm/rocm-libraries#4769 (commit 72ae66e)

2026-04-14 17:51:20 +00:00

extract_test_params.py

chore(copyright): update copyright header for test directory (#3243 )

2025-11-20 10:33:34 -08:00

README.md

Ck tile engine gemm unit tests expand test coverage (#3025)

2025-11-03 10:29:16 +01:00

test_gemm_simple.cpp

chore(copyright): update copyright header for test directory (#3243 )

2025-11-20 10:33:34 -08:00

README.md

GEMM Tile Engine Unit Tests

How It Works

This unit test system integrates tile_engine's kernel generation into automated testing:

- Uses tile_engine scripts directly: Same Python scripts that generate tile_engine kernels
- JSON-based configuration: Define test parameters in JSON files (like tile_engine)
- Build-time generation: CMake calls tile_engine scripts to generate kernel headers
- Individual test executables: Each kernel configuration becomes a separate test
- Tile_engine verification: Uses exact same error thresholds and validation as tile_engine

Tile Engine Integration

JSON Config → tile_engine Python scripts → Generated Headers → Test Executables

- --list_kernels: Get available kernel configurations from JSON
- --gen_individual: Generate all kernel headers in parallel during CMake configuration
- --gen_single: Generate individual kernel header for each configuration
- Same verification: Uses tile_engine's adaptive error thresholds and reference calculations
- Same patterns: Follows tile_engine's tensor initialization, stride calculation, and kernel launching

Config-Specific Test Parameters

Each test configuration can specify optimized problem sizes in its JSON file:

- test_params.problem_sizes: Array of {m, n, k, split_k} configurations
- CMake extraction: extract_test_params.py generates config-specific test parameter files
- Build integration: Each test target uses parameters appropriate for its kernel configuration
- Optimized testing: Different configs test different problem sizes that showcase their strengths
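
As a rough illustration of the `test_params.problem_sizes` layout described above, the following Python sketch parses a config string and pulls out the problem sizes. It is not the actual `extract_test_params.py`: the function name `extract_problem_sizes`, the sample values, and the fallback of `split_k` to 1 are assumptions made for the example. Only the `test_params.problem_sizes` key and the `m`, `n`, `k`, `split_k` fields come from this README.

```python
import json

# Hypothetical config text. Only test_params.problem_sizes and the
# m/n/k/split_k field names are taken from the README.
CONFIG_TEXT = """
{
  "test_params": {
    "problem_sizes": [
      {"m": 104, "n": 104, "k": 56, "split_k": 1},
      {"m": 200, "n": 152, "k": 80, "split_k": 1}
    ]
  }
}
"""


def extract_problem_sizes(config_text):
    """Return a list of (m, n, k, split_k) tuples from a JSON config string."""
    config = json.loads(config_text)
    sizes = config.get("test_params", {}).get("problem_sizes", [])
    # Assumption: a missing split_k falls back to 1 (no K splitting).
    return [(s["m"], s["n"], s["k"], s.get("split_k", 1)) for s in sizes]


problem_sizes = extract_problem_sizes(CONFIG_TEXT)
for m, n, k, split_k in problem_sizes:
    print(f"m={m} n={n} k={k} split_k={split_k}")
```

A config without a `test_params` section yields an empty list in this sketch, so a caller could fall back to default problem sizes.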

The key idea: the unit tests use tile_engine's exact kernel generation and verification methodology instead of a separate test infrastructure.

Test Configurations

1. Simple Test (`simple_test_config.json`)

- Purpose: Basic functionality validation
- Config: 128x128x64, warp 2x2x1, warp_tile 16x16x16
- Traits: compv3 + compv4 pipelines
- Coverage: ~2 kernels per datatype/layout

2. Small Datatype (`small_datatype_config.json`)

- Purpose: Optimized for fp8/fp16/bf16 data types
- Config: 128x128x32, warp 2x2x1, warp_tile 32x32x16
- Traits: compv3 pipeline only
- Coverage: All 4 layouts (rcr, rrr, ccr, crr) for fp8, fp16, bf16

3. Padding Coverage (`padding_coverage_config.json`)

- Purpose: Test padding behavior with all padding flags enabled
- Config: Fixed 64x64x32, warp 2x2x1, warp_tile 32x32x16
- Padding: All enabled (pad_m=true, pad_n=true, pad_k=true)
- Problem sizes: Vector-aligned but not tile-aligned (104×104×56, 200×152×80, 152×200×64)
- Coverage: 1 kernel configuration testing padding with irregular sizes
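
To make "vector-aligned but not tile-aligned" concrete, the sketch below checks the three listed problem sizes against the 64x64x32 tile. The vector width of 8 elements is an assumption chosen for illustration (the README does not state it), and `describe` is a hypothetical helper, not part of tile_engine. The "covered" extent is simply each dimension rounded up to a whole number of tiles.

```python
def ceil_div(a, b):
    """Integer ceiling division for positive integers."""
    return -(-a // b)


TILE = (64, 64, 32)      # tile_m, tile_n, tile_k from the padding config
VECTOR_WIDTH = 8         # assumed vector width in elements (not from the README)
PROBLEM_SIZES = [(104, 104, 56), (200, 152, 80), (152, 200, 64)]


def describe(size, tile=TILE, vec=VECTOR_WIDTH):
    """Return (vector_aligned, tile_aligned, covered_extent) for an (m, n, k) size."""
    vector_aligned = all(d % vec == 0 for d in size)
    tile_aligned = all(d % t == 0 for d, t in zip(size, tile))
    # Extent covered once each dimension is rounded up to whole tiles.
    covered = tuple(ceil_div(d, t) * t for d, t in zip(size, tile))
    return vector_aligned, tile_aligned, covered


for size in PROBLEM_SIZES:
    vector_aligned, tile_aligned, covered = describe(size)
    print(size, vector_aligned, tile_aligned, covered)
```

Under these assumptions every size is a multiple of 8 in all dimensions, yet each one leaves a partial tile in at least one dimension, which is the case the padding flags are meant to handle.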

4. Coverage Testing (Quick or Comprehensive)

- Purpose: Comprehensive testing across tile sizes, warp configurations, and trait combinations
- Quick (quick_coverage_config.json): Approximately 144 kernels
  - tile_m/n: [32, 64, 256], tile_k: [16, 32]
  - warp config: 2×2×1, warp_tile 16×16×16
  - Traits: 3 pipelines × 2 epilogues × 2 schedulers (persistent=false only)
  - Focused set testing trait combinations with multiple tile sizes
- Comprehensive (comprehensive_coverage_config.json): Several thousand kernels
  - tile_m/n: [16-256 step 16]
  - tile_k: [16, 32, 64]
  - warp_m/n: [1, 2, 4], warp_tile_m/n: [16, 32], warp_tile_k: [16, 32]
  - Traits: 3 pipelines × 2 epilogues × 2 schedulers × 2 persistent
  - Extensive coverage across all tile sizes, warp configurations, and trait combinations
  - Exact count varies based on validation filtering

Note: Use CMake option -DCOVERAGE_LEVEL=comprehensive to enable comprehensive testing (default is quick)
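
The trait counts above are Cartesian products, which the following sketch enumerates. Only `compv3` and `compv4` are named in this README; the third pipeline name and the epilogue and scheduler names used here are placeholders assumed for illustration, and `trait_combinations` is a hypothetical helper. The sketch covers trait combinations only and does not attempt to reproduce the total kernel counts, which also depend on tile and warp sweeps and on validation filtering.

```python
from itertools import product

# compv3 and compv4 appear in the README; every other name is a placeholder.
PIPELINES = ["mem", "compv3", "compv4"]
EPILOGUES = ["default", "cshuffle"]
SCHEDULERS = ["intrawave", "interwave"]


def trait_combinations(persistent_values):
    """Enumerate (pipeline, epilogue, scheduler, persistent) tuples."""
    return list(product(PIPELINES, EPILOGUES, SCHEDULERS, persistent_values))


# Quick level: persistent=false only.
quick_traits = trait_combinations([False])
# Comprehensive level: both persistent settings.
comprehensive_traits = trait_combinations([False, True])

print(len(quick_traits))          # 3 * 2 * 2 = 12
print(len(comprehensive_traits))  # 3 * 2 * 2 * 2 = 24
```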

Data Type Support

- ✅ fp8, fp16, bf16: Fully supported - all layouts (rcr, rrr, ccr, crr)
- ❌ fp64: Not supported (hardware MFMA limitation)
- ⏳ fp32, bf8, pk-int4-t: Not yet supported by gemm_instance_builder (will be added later)

Test Result Behavior

Tests automatically handle unsupported configurations through runtime validation:

- PASSED: Kernel executed correctly with results within error thresholds ✅
- SKIPPED: Kernel validation returned "Arguments not supported" (expected for certain problem sizes/configurations) ⚠️
- FAILED: Actual error or incorrect computation results ❌

When a kernel's IsSupportedArgument() check fails (e.g., due to vector alignment requirements, dimension constraints, or padding limitations), the test is automatically skipped rather than failed. This allows comprehensive testing across various problem sizes while gracefully handling configurations that don't meet specific kernel requirements.
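
The three outcomes can be summarized as a small decision function. The sketch below is written in Python for brevity and is not the logic in `test_gemm_simple.cpp`: the `Outcome` enum, the `classify` function, and its parameters are hypothetical, and the single scalar threshold is a simplification of tile_engine's adaptive error thresholds.

```python
from enum import Enum


class Outcome(Enum):
    PASSED = "PASSED"
    SKIPPED = "SKIPPED"
    FAILED = "FAILED"


def classify(is_supported, max_error, threshold):
    """Map a kernel run to a test outcome.

    is_supported stands in for the result of the kernel's argument
    support check; max_error and threshold stand in for the comparison
    against a reference result.
    """
    if not is_supported:
        # Unsupported arguments are skipped, not counted as failures.
        return Outcome.SKIPPED
    if max_error <= threshold:
        return Outcome.PASSED
    return Outcome.FAILED


print(classify(False, 0.0, 1e-3).value)
print(classify(True, 5e-4, 1e-3).value)
print(classify(True, 2e-2, 1e-3).value)
```

Checking support before comparing results keeps unsupported configurations from being reported as numerical failures.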
