mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-14 11:07:44 +00:00

Files

Robin Voetter cc75948d1c [CK_BUILDER] conv bwd weight testing (#3618)

* ck-builder: restructure testing conv

In order to prepare for bwd of conv testing, this commit moves some
files and types around so that we can reuse ckt::Args for both forward
and backwards convolution.

* ck-builder: decouple fwd_ck.hpp and fwd_reference.hpp from fwd.hpp

This will allow us to more easily include fwd.hpp from backwards
definitions, which is required for initializing bwd values.

* ck-builder: fix layout of test_ckb_conv_bwd_weight_xdl_cshuffle_v3

Turns out that the supplied layout isn't actually supported...

* ck-builder: ck and reference conv integration for bwd weight

* ck-builder: ck bwd weight execution test

* ck-builder: ckt::run support for ck-tile bwd weight

* ck-builder: ck tile bwd weight execution test

* ck-builder: extra debug printing in MatchesReference

* ck-builder: make ckt::run return RunResult

This type is more convenient than std::tuple, as it will allow us to
use google test matchers with this in the future.

* ck-builder: RunResult matcher

Using EXPECT_THAT(..., SuccessfulRun()) will generate a check and a nice error
message about how and why running an algorithm failed.

* ck-builder: doc fixes

* ck-builder: add missing headers

2026-01-26 23:50:15 +01:00

factory

[CK_BUILDER] Replace reference conv with old ck implementation (#3604)

2026-01-21 19:18:47 +01:00

reflect

[CK_BUILDER] Convert convolution traits to a struct with factory functions (#3547)

2026-01-15 10:03:21 +01:00

testing

[CK_BUILDER] conv bwd weight testing (#3618)

2026-01-26 23:50:15 +01:00

builder_utils.hpp

Fix copyright messages in experimental/builder. (#3253)

2025-11-20 17:40:55 -08:00

CMakeLists.txt

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

conv_algorithm_concepts.hpp

[CK_BUILDER] Convolution forward transfer concepts. (#3535)

2026-01-19 10:54:10 +01:00

conv_algorithm_limits.hpp

[CK_BUILDER] Convolution forward transfer concepts. (#3535)

2026-01-19 10:54:10 +01:00

conv_builder.hpp

[CK_BUILDER] Refactor builder factory code. (#3276)

2025-12-02 07:40:14 -08:00

conv_signature_concepts.hpp

[CK_BUILDER] Add bwd weight factories (#3509)

2026-01-13 18:12:38 +02:00

conv_signature_utils.hpp

[ck_builder] add utility functions to convolution (#3459)

2025-12-23 10:39:49 +01:00

README.md

Update README.md files to match recent code changes

2026-01-15 02:15:29 -05:00

types.hpp

[CK_BUILDER] Add bwd weight factories (#3509)

2026-01-13 18:12:38 +02:00

versions.hpp

Fix copyright messages in experimental/builder. (#3253)

2025-11-20 17:40:55 -08:00

README.md

Composable Kernel Builder Design Documentation

This directory contains the builder framework for Composable Kernel, which provides a compile-time, type-safe interface for constructing convolution operations with various configurations.

Convolution Signature
Convolution Algorithm
Convolution Factory

Convolution Signature

Overview

The convolution signature system provides a compile-time description of grouped convolution operations. A signature is a collection of properties that fully characterize a convolution kernel's mathematical and operational behavior, enabling:

Compile-time validation: Ensures type safety and correctness before kernel instantiation
Kernel selection: Matches user requirements to optimized implementations
Specialization: Enables optimized code paths for specific configurations
Composability: Supports building complex operations from simpler components

The signature leverages modern C++20 features, particularly concepts, to provide expressive, self-documenting interfaces with compile-time guarantees.

Architecture

The signature system is organized into a hierarchical structure:

┌─────────────────────────────────────────────────────────┐
│                    ConvSignature                        │
├─────────────────────────────────────────────────────────┤
│ Properties:                                             │
│   • spatial_dim: int           (1D, 2D, or 3D)          │
│   • direction: ConvDirection   (Fwd/BwdData/BwdWeight)  │
│   • data_type: DataType        (default data type)      │
│   • accumulation_data_type: DataType                    │
│   • input: ConvTensor          ──┐                      │
│   • weight: ConvTensor         ──│                      │
│   • output: ConvTensor         ──│                      │
└──────────────────────────────────┼──────────────────────┘
                                   │
                                   ▼
              ┌─────────────────────────────────────────┐
              │           ConvTensor                    │
              ├─────────────────────────────────────────┤
              │ ╔═════════════════════════════════════╗ │
              │ ║ TensorConfig (required)             ║ │
              │ ╠═════════════════════════════════════╣ │
              │ ║  • layout: ConvLayout               ║ │
              │ ║  • data_type: DataType (optional)   ║ │
              │ ║  • compute_type: DataType (optional)║ │
              │ ╚═════════════════════════════════════╝ │
              │                                         │
              │ ┌─────────────────────────────────────┐ │
              │ │ TensorOperation (optional)          │ │
              │ ├─────────────────────────────────────┤ │
              │ │  • elementwise_operation            │ │
              │ │  • auxiliary_operand_configs[]      │ │
              │ │    (each is also ConvTensor)  ◄───────┼─┐
              │ └─────────────────────────────────────┘ │ │
              └─────────────────────────────────────────┘ │
                                                          │
                                 Recursive ───────────────┘

Key Design Points:

ConvSignature contains three ConvTensor instances (input, weight, output)
All tensors share the same ConvTensor structure
Each ConvTensor has:
- TensorConfig (required): Defines layout as well as optional data and compute type overrides
- TensorOperation (optional): Defines fused elementwise operations
Auxiliary operands (e.g., bias) in TensorOperation also use the ConvTensor type
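
As a rough illustration of the hierarchy above, the following sketch models the signature as plain aggregates. All type and enumerator definitions here are hypothetical stand-ins written for this example (the real ones live in the builder headers and may differ), and the recursion for auxiliary operands is cut off after one level to keep the types simple.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-ins for the builder's enums; only a few values are listed.
enum class DataType { UNDEFINED_DATA_TYPE, FP32, FP16, BF16 };
enum class ConvLayout { UNDEFINED_TENSOR_LAYOUT, NHWGC, KYXGC, NHWGK, G_K_strided };
enum class ElementwiseOperation { PASS_THROUGH, SCALE, CLAMP, BIAS_BNORM_CLAMP };
enum class ConvDirection { FORWARD, BACKWARD_DATA, BACKWARD_WEIGHT };

// Required part of every tensor: a layout plus optional type overrides.
struct TensorConfig {
    ConvLayout layout;
    DataType data_type    = DataType::UNDEFINED_DATA_TYPE;
    DataType compute_type = DataType::UNDEFINED_DATA_TYPE;
};

// Auxiliary operand (e.g. bias). The diagram reuses ConvTensor here; this
// sketch stops after one level instead of recursing.
struct AuxTensor {
    TensorConfig config;
};

// Optional part of a tensor: a fused operation and its extra inputs.
template <std::size_t NumAux>
struct TensorOperation {
    ElementwiseOperation elementwise_operation = ElementwiseOperation::PASS_THROUGH;
    std::array<AuxTensor, NumAux> auxiliary_operand_configs{};
};

template <std::size_t NumAux = 0>
struct ConvTensor {
    TensorConfig config;
    TensorOperation<NumAux> operation{};
};

struct ConvSignature {
    int spatial_dim;
    ConvDirection direction         = ConvDirection::FORWARD;
    DataType data_type              = DataType::UNDEFINED_DATA_TYPE;
    DataType accumulation_data_type = DataType::FP32;
    ConvTensor<> input;
    ConvTensor<> weight;
    ConvTensor<1> output; // one auxiliary operand, e.g. a bias
};

// A 2D FP16 forward convolution whose output fuses an operation that takes a bias.
inline constexpr ConvSignature kExample{
    .spatial_dim = 2,
    .data_type   = DataType::FP16,
    .input       = {.config = {.layout = ConvLayout::NHWGC}},
    .weight      = {.config = {.layout = ConvLayout::KYXGC}},
    .output      = {.config    = {.layout = ConvLayout::NHWGK},
                    .operation = {.elementwise_operation = ElementwiseOperation::BIAS_BNORM_CLAMP,
                                  .auxiliary_operand_configs =
                                      {{{.config = {.layout = ConvLayout::G_K_strided}}}}}},
};
```

Because every member is a public literal type, a value such as `kExample` can be evaluated entirely at compile time, which is what makes a signature usable as input to compile-time checks.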

Core Components

1. Signature Level

The top-level signature contains global properties that apply to the entire convolution operation:

template <typename T>
concept ConvSignatureDescriptor = requires(T t) {
    { t.spatial_dim } -> std::convertible_to<unsigned int>;  // 1, 2, or 3
    { t.input } -> ConvTensorDescriptor;
    { t.weight } -> ConvTensorDescriptor;
    { t.output } -> ConvTensorDescriptor;
    requires ConvolutionDirectionWellDefinedIfProvided<T>;   // Optional direction
    requires detail::DataTypeWellDefinedIfProvided<T>; // Optional default data type
    requires detail::ElementwiseOpWellDefinedIfProvided<T>; // Optional default elementwise operation
};

Properties:

spatial_dim: Dimensionality of the convolution (1D, 2D, or 3D)
direction: Operation type (Optional, defaults to FORWARD)
- FORWARD: Standard forward convolution
- BACKWARD_DATA: Gradient computation w.r.t. input
- BACKWARD_WEIGHT: Gradient computation w.r.t. weights
data_type: Default data type for all tensors (FP32, FP16, BF16, FP8, I8, U8). Optional; defaults to UNDEFINED_DATA_TYPE, which indicates that the type should be inferred or specified per tensor. May be overridden by individual tensors.
elementwise_operation: Default elementwise operation for all tensors. Optional; defaults to PASS_THROUGH. May be overridden by individual tensors via their operation field.
accumulation_data_type: Type used for internal accumulation

2. Tensor Level

Each tensor (input, weight, output) has its own descriptor:

template <typename T>
concept ConvTensorDescriptor = requires(T t) {
    { t.config } -> TensorConfigDescriptor;
    requires ElementwiseOpWellDefinedIfProvided<T>;
};

A tensor descriptor encapsulates:

Configuration: Layout and data type information
operation: Fused elementwise operations on this tensor (optional; default provided by ConvSignatureDescriptor)

3. Tensor Configuration

Describes the memory layout and data types:

template <typename T>
concept TensorConfigDescriptor = requires(T t) {
    { t.layout } -> std::convertible_to<ConvLayout>;
    requires detail::DataTypeWellDefinedIfProvided<T>; // Override data type (Optional, default provided by ConvSignatureDescriptor)
};

Layout Types (dimension-specific):

Special Values:
- UNDEFINED_TENSOR_LAYOUT: Placeholder value indicating layout is not yet specified or should be inferred
1D Convolution:
- Input: GNCW, GNWC, NWGC, NGCW, G_NW_C_strided
- Weight: GKXC, GKCX, KXGC, G_K_X_C_strided
- Output: GNKW, GNWK, NWGK, NGKW, G_NW_K_strided
2D Convolution:
- Input: GNCHW, GNHWC, NHWGC, NGCHW, G_NHW_C_strided
- Weight: GKYXC, GKCYX, KYXGC, G_K_YX_C_strided
- Output: GNKHW, GNHWK, NHWGK, NGKHW, G_NHW_K_strided
3D Convolution:
- Input: GNCDHW, GNDHWC, NDHWGC, NGCDHW, G_NDHW_C_strided
- Weight: GKZYXC, GKCZYX, KZYXGC, G_K_ZYX_C_strided
- Output: GNKDHW, GNDHWK, NDHWGK, NGKDHW, G_NDHW_K_strided
Bias Tensors:
- GC, G_C_strided, G_K_strided

Where:

G = Groups
N = Batch size
C = Input channels
K = Output channels (filters)
W, H, D = Width, Height, Depth (spatial dimensions)
X, Y, Z = Filter dimensions
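
The letter legend above can be checked mechanically. The helper below is a hypothetical illustration (it is not part of the builder) that counts the spatial letters in a layout name and so recovers the dimensionality the layout belongs to. It operates on the layout name as a string, which assumes the names are spelled exactly as listed above.

```cpp
#include <string_view>

// Counts spatial letters in a layout name: W/H/D for activations and
// X/Y/Z for filters. Lowercase letters (as in "_strided") are ignored.
constexpr int spatial_rank(std::string_view layout) {
    int rank = 0;
    for (char c : layout) {
        switch (c) {
            case 'W': case 'H': case 'D':
            case 'X': case 'Y': case 'Z':
                ++rank;
                break;
            default:
                break;
        }
    }
    return rank;
}

static_assert(spatial_rank("GNWC") == 1);            // 1D input
static_assert(spatial_rank("G_NHW_C_strided") == 2); // 2D input, strided
static_assert(spatial_rank("GKZYXC") == 3);          // 3D weight
static_assert(spatial_rank("G_K_strided") == 0);     // bias has no spatial dims
```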

4. Tensor Operations

Describes fused elementwise operations applied to a tensor:

template <typename T>
concept TensorOperatorDescriptor = requires(T t) {
    { t.elementwise_operation } -> std::convertible_to<ElementwiseOperation>;
    requires AuxiliaryOperandConfigsWellDefinedIfProvided<T>;
};

Supported Operations:

PASS_THROUGH: No operation (identity)
SCALE: Multiply by a scalar
CLAMP: Clamp values to a range
BIAS_BNORM_CLAMP: Bias addition + batch normalization + clamp
SCALEADD_SCALEADD_RELU: Fused scale-add operations + ReLU activation

Auxiliary Operands: Some operations require additional tensor inputs (e.g., bias tensors, scaling factors). These are specified through auxiliary_operand_configs, which is an array of TensorConfigDescriptor objects describing the layout and data type of each auxiliary input.
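
To make the simpler operations concrete, here is a scalar reference sketch of PASS_THROUGH, SCALE, and CLAMP. The `OpParams` struct, its default values, and the `apply` function are assumptions made for this example. The fused operations (BIAS_BNORM_CLAMP, SCALEADD_SCALEADD_RELU) are left out because their exact formulas are not given in this document.

```cpp
#include <algorithm>

// Only the three simple operations are modeled in this sketch.
enum class ElementwiseOperation { PASS_THROUGH, SCALE, CLAMP };

// Hypothetical parameter bundle; the defaults are arbitrary example values.
struct OpParams {
    float scale = 1.0f; // used by SCALE
    float lo    = 0.0f; // lower bound for CLAMP
    float hi    = 6.0f; // upper bound for CLAMP
};

// Applies one elementwise operation to a single value.
constexpr float apply(ElementwiseOperation op, float x, OpParams p = {}) {
    switch (op) {
        case ElementwiseOperation::PASS_THROUGH: return x;
        case ElementwiseOperation::SCALE:        return p.scale * x;
        case ElementwiseOperation::CLAMP:        return std::clamp(x, p.lo, p.hi);
    }
    return x; // unreachable for the enumerators above
}

static_assert(apply(ElementwiseOperation::PASS_THROUGH, 3.0f) == 3.0f);
static_assert(apply(ElementwiseOperation::SCALE, 3.0f, {.scale = 2.0f}) == 6.0f);
static_assert(apply(ElementwiseOperation::CLAMP, 9.0f) == 6.0f);
```

In a fused kernel the same function would be applied to each element as it is loaded or stored, which is why the operation is attached to a specific tensor in the signature.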

Concepts and Validation

The signature system uses C++20 concepts for compile-time validation at multiple levels:

Constraint Concepts

// Spatial dimension must be 1, 2, or 3
template <auto N>
concept ConvSpatialDim = std::is_integral_v<decltype(N)> && (N == 1 || N == 2 || N == 3);

// Valid data types for convolution
template <DataType T>
concept ValidConvDataType = 
    (T == DataType::FP32) || (T == DataType::FP16) || (T == DataType::BF16) ||
    (T == DataType::FP8) || (T == DataType::I8) || (T == DataType::U8);

Validation Concept

// Validates a complete signature
template <auto Sig>
concept ValidConvSignature = requires {
    requires ConvSpatialDim<Sig.spatial_dim>;
    requires ValidConvDataType<Sig.data_type>;
};

Tensor Descriptors

The layout, data type, and elementwise operation are described per tensor. This multi-level hierarchy allows:

Flexibility: Each tensor can have independent layout and data type
Reusability: Common configurations can be shared across different signatures
Extensibility: New properties can be added to specific levels without affecting others
Clarity: Separates concerns (global properties vs. tensor-specific properties)

Optional Signature Fields

Several fields in the signature are optional:

direction: Defaults to FORWARD if not specified, reducing boilerplate for the common case
Tensor data_type: Falls back to signature's default, allowing mixed-precision with minimal specification
Tensor operation: Defaults to PASS_THROUGH, supporting both fused and non-fused operations with the same interface

This design follows the principle of "make the common case simple, the complex case possible."

Convolution Algorithm

Convolution Factory

The convolution factory builds a kernel instance from the convolution signature and the convolution algorithm. Both descriptions are dispatched to the relevant algorithm-specific factory, which creates the instance. The convolution factory design is described in a separate README.
