Files
composable_kernel/experimental/builder
Robin Voetter 97d6e59580 [CK_BUILDER] Integrate CKB validation with CK verification (#3649)
* ck-builder: tensor copy function

This function copies one tensor to another, so that the memory
layout can be changed between them.

* ck-builder: fix ck::bhalf literals

These types don't work properly.

* ck-builder: abstract compare_elements in gpu_verification.hpp and make builder use it

This reduces the amount of duplicated code a bit.

* ck-builder: add flat tensor iterator

This "iterator" type pretends to be a pointer, useful for passing
tensors to functions expecting pointer-like types.

* ck-builder: integrate validation with ck gpu verification

By templating the gpu_verify function over iterators, we can use
the new FlatTensorIterator to adapt the function to multi-
dimensional tensors without changing either implementation
too much.

* ck-builder: add check_by_accumulations

This changes the gpu_verification.hpp code to also accept "iterator"
types for the relevant gpu_verify and gpu_reduce_max functions.

* ck: fix test_gpu_verification GenerateRandomData for bhalf

is_integer_it<bhalf_t> yields true, but it is not actually
an integer.

* ck: make gpu_verification kernels be proper persistent kernels

Previously these were using a hardcoded value for the grid size. This
commit changes that so that the grid size is automatically derived
from the kernel's occupancy and the number of multiprocessors on
the GPU.

* ck: clean up gpu_verification.hpp using block_reduce

This implements a small generic block reduce function, and rewrites
the rest of gpu_verification.hpp using that function to clean it up
a bit.

* ck-builder: doc typos

* ck-builder: update testing readme with validation interface.

* ck-builder: rebase fixes + review comments

* ck-builder: fix device integer generation with float types

Passing bfloat here causes a nans due to type_convert performing
a bitcast.

* ck: another bhalf_t bug

CK expects that int-generation with ck::bhalf_t yields bhalf integers,
not unsigned integers. This makes the logic of FillUniformRandInteger
compatible with GeneratorTensor_2<InDataType>, however idiotic that
may be.

[ROCm/composable_kernel commit: 42048bdb7d]
2026-01-28 17:41:02 +01:00
..

Builder

This directory contains the experimental builder feature for composable_kernel.

  • Status: In development (October 2025 - March 2026)

Overview

The builder provides a high-level, semantically-clear interface for constructing composable kernel operations, with an initial focus on convolution kernels for MIOpen. It leverages modern C++20 features (such as POD structs as non-type template parameters, concepts, and designated initializers) to simplify kernel instantiation and improve developer experience.

This project is a prototype for a more general builder pattern for all of composable_kernel (CK) and CK Tile, but is currently limited to formalizing the interface between MIOpen and CK.

Design descriptions

Directory Structure

  • include/ck_tile/builder/ Core builder headers and public API.
  • include/ck_tile/builder/reflect Reflection mechanism.
  • include/ck_tile/builder/factory Compile-time dispatch from builder descriptors to our exisitng specialized convolution kernel implementations.
  • test/ Unit tests and example usage of the builder pattern.
  • CMakeLists.txt CMake configuration for building the experimental builder and its tests.

CMake Configuration

To enable the experimental builder, configure your build with:

cmake                                                                                             \
  -D CMAKE_PREFIX_PATH=/opt/rocm                                                                  \
  -D CMAKE_CXX_COMPILER=/opt/rocm/bin/hipcc                                                       \
  -D CMAKE_BUILD_TYPE=Release                                                                     \
  -D GPU_TARGETS="gfx942"                                                                         \
  -D CK_EXPERIMENTAL_BUILDER=ON                                                                   \
  -D CMAKE_CXX_STANDARD=20                                                                        \
  -G Ninja                                                                                        \
  ..

Note: The tests for WMMA builders are only built when CK_USE_WMMA is enabled. Add e.g. gfx1121 or any of the other gfx11/gfx12 architectures to the GPU targets. Alternatively, one can add flag -D CK_USE_WMMA=ON to build the tests. For the end-to-end tests that use the instances from builder, one needs an actual Navi card.

Building and Testing

The builder test suite is organized into two main categories:

Smoke Tests (Fast Unit Tests)

Quick unit tests that verify the builder's internal logic without compiling GPU kernels. These complete in under 1 second total and are suitable for frequent execution during development.

ninja smoke-builder

Regression Tests (Integration Tests)

Integration tests that compile actual GPU kernels to verify that the builder generates valid, compilable code. These are more expensive than smoke tests (can take minutes to compile) but cover more fuctionality. )

ninja regression-builder

Running All Tests

To build and run the complete test suite:

ninja check-builder

Building Individual Tests

To build and run a specific test:

ninja test_ckb_conv_builder && bin/test_ckb_conv_builder

Test Organization

  • Smoke tests: Fast feedback during active development
  • Regression tests: Thorough validation before submitting changes
  • Factory tests: Expensive tests that build all MIOpen kernels (included in regression tests)

When adding new tests, please follow the convention where the CMake build target starts with a prefix test_ckb. This allows filtering of CK Builder tests from the full CK repository test suite.