Files
composable_kernel/experimental/builder
Márton Bidlek 683865895e [rocm-libraries] ROCm/rocm-libraries#5135 (commit 5ccc138)
Proof of concept for removing forward declarations
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

## Motivation

Currently, we forward declare CK device operation templates in
CK-Builder's reflection code:

9b168082b7/experimental/builder/include/ck_tile/builder/reflect/instance_traits_device_grouped_conv_bwd_weight_xdl_cshuffle.hpp (L13-L57)
This is mainly required to break a circular dependency in reflection.
The architecture of that is as follows:

MyDeviceOp implements GetInstanceString(). This is typically defined
directly in the class definition (no forward declaration).

GetInstanceString() calls instance_string<MyDeviceOp>()

instance_string<MyDeviceOp>() calls
InstanceTraits<MyDeviceOp>::instance_string()

InstanceTraits has a specialization for MyDeviceOp which implements
instance_string()

So order for GetInstanceString() to work properly, InstanceTraits must
already be defined. And for InstanceTraits to be defined, the device op
needs to be defined. In order to do that, we are currently using
aforementioned forward declaration.

## Technical Details

C++'s lazy template evaluation is used by calling into an as-of-yet
undefined function static member function of
`InstanceTraits<MyDeviceOp>` in `GetInstanceString()`, and then
specializing `InstanceTraits` only _after that_. The caveat here is that
both the device op itself as well as the instance traits specialization
must be in scope, otherwise there would be an undefined function error.
In practise, we can solve that either by placing the instance traits
directly into the file that defines `MyDeviceOp`, or possibly by using a
`.inc` file to keep the concerns separated.

## Test Plan

The results were verified by running the existing regression tests for
CK Builder

## Submission Checklist

- [ ] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
2026-03-09 16:35:26 +00:00
..

Builder

This directory contains the experimental builder feature for composable_kernel.

  • Status: In development (October 2025 - March 2026)

Overview

The builder provides a high-level, semantically-clear interface for constructing composable kernel operations, with an initial focus on convolution kernels for MIOpen. It leverages modern C++20 features (such as POD structs as non-type template parameters, concepts, and designated initializers) to simplify kernel instantiation and improve developer experience.

This project is a prototype for a more general builder pattern for all of composable_kernel (CK) and CK Tile, but is currently limited to formalizing the interface between MIOpen and CK.

Design Direction

The builder's primary goal is transparent dispatch across two backend implementations: old CK (template-heavy device operations) and CK Tile (modern tile-based API). MIOpen, the consumer library, should construct kernels through the builder without needing to know which backend provides the implementation.

Current state: The builder dispatches correctly, but each kernel variant (forward XDL, forward WMMA, backward weight XDL V3, etc.) has its own factory and its own algorithm descriptor shape. The result is 16+ per-variant facades rather than one unified facade. Unification across three axes — CK vs CK Tile backend, MFMA vs WMMA instruction set, and forward vs backward direction — is the central design challenge.

Three principles guide the design toward that unification:

  1. Unified vocabulary through reflection. The reflection system (reflect/) extracts kernel traits from both backends into a common ConvTraits representation. This shared vocabulary is the mechanism for discovering what algorithm parameters are truly variant-specific versus what can be expressed once and mapped to multiple backends.

  2. Expert overrides. Power users can pin to a specific backend or device operation when needed, bypassing automatic dispatch.

  3. Versioned API evolution. The builder uses semantic version strings ("0.0.0", "0.1.0") to manage API changes predictably. The ConvBuilder template defaults to the latest version but accepts explicit version pinning.

Design descriptions

Directory Structure

  • include/ck_tile/builder/ Core builder headers and public API.
  • include/ck_tile/builder/reflect Reflection mechanism.
  • include/ck_tile/builder/factory Compile-time dispatch from builder descriptors to our existing specialized convolution kernel implementations.
  • test/ Unit tests and example usage of the builder pattern.
  • CMakeLists.txt CMake configuration for building the experimental builder and its tests.

CMake Configuration

To enable the experimental builder, configure your build with:

cmake                                                                                             \
  -D CMAKE_PREFIX_PATH=/opt/rocm                                                                  \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                                                       \
  -D CMAKE_BUILD_TYPE=Release                                                                     \
  -D GPU_TARGETS="gfx942"                                                                         \
  -D CK_EXPERIMENTAL_BUILDER=ON                                                                   \
  -D CMAKE_CXX_STANDARD=20                                                                        \
  -G Ninja                                                                                        \
  ..

Note: The tests for WMMA builders are only built when CK_USE_WMMA is enabled. Add e.g. gfx1121 or any of the other gfx11/gfx12 architectures to the GPU targets. Alternatively, one can add flag -D CK_USE_WMMA=ON to build the tests. For the end-to-end tests that use the instances from builder, one needs an actual Navi card.

Building and Testing

The builder test suite is organized into two main categories:

Smoke Tests (Fast Unit Tests)

Quick unit tests that verify the builder's internal logic without compiling GPU kernels. These complete in under 1 second total and are suitable for frequent execution during development.

ninja smoke-builder

Regression Tests (Integration Tests)

Integration tests that compile actual GPU kernels to verify that the builder generates valid, compilable code. These are more expensive than smoke tests (can take minutes to compile) but cover more functionality.

ninja regression-builder

Running All Tests

To build and run the complete test suite:

ninja check-builder

Building Individual Tests

To build and run a specific test:

ninja test_ckb_conv_builder && bin/test_ckb_conv_builder

Test Organization

  • Smoke tests: Fast feedback during active development
  • Regression tests: Thorough validation before submitting changes
  • Factory tests: Expensive tests that build all MIOpen kernels (included in regression tests)

When adding new tests, please follow the convention where the CMake build target starts with a prefix test_ckb. This allows filtering of CK Builder tests from the full CK repository test suite.