composable_kernel/experimental/builder/include/ck_tile/builder/reflect
Ville Pietilä e40687bfc3 [CK_BUILDER] Add bwd weight factories (#3509)
[ROCm/composable_kernel commit: 9908a87c31]
2026-01-13 18:12:38 +02:00

Convolution Reflection Directory

This directory contains tools for "reflecting" on convolution kernel instances. It allows developers to inspect the compile-time configuration of a kernel and generate detailed, human-readable descriptions.

See the main builder documentation for an overview.

Design Overview

The reflection system works by extracting properties from a convolution kernel type and formatting them into a string. This is useful for debugging, performance tuning, and generating documentation.

  1. Trait Extraction: The ConvTraits template (in conv_traits.hpp) is specialized for each kernel instance. It extracts low-level details like tile sizes, data layouts, and pipeline versions from the kernel's type definition.

  2. Description Generation: The describe<Instance>() function (in conv_description.hpp) uses ConvTraits to populate a ConvDescription (Description) object.

  3. Formatting: The ConvDescription class (which implements Description) contains methods like brief() and detailed() that format the extracted properties into well-structured strings for display.
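The three steps above can be sketched with toy types. This is a minimal illustration only: ToyConvInstance, ToyConvTraits, ToyDescription, and toy_describe are hypothetical names invented for this example, and the real ConvTraits and ConvDescription extract and format far more properties.

```cpp
#include <sstream>
#include <string>

// Hypothetical stand-in for a kernel instance type. Real instances
// carry many more compile-time parameters.
struct ToyConvInstance
{
    static constexpr int kMPerBlock = 128;
    static constexpr int kNPerBlock = 64;
    static constexpr int kKPerBlock = 32;
};

// Step 1: trait extraction. Reads compile-time values from the instance type.
template <typename Instance>
struct ToyConvTraits
{
    static constexpr int m_per_block = Instance::kMPerBlock;
    static constexpr int n_per_block = Instance::kNPerBlock;
    static constexpr int k_per_block = Instance::kKPerBlock;
};

// Step 3: formatting. The description object turns stored values into strings.
struct ToyDescription
{
    int m;
    int n;
    int k;

    // One-line summary.
    std::string brief() const
    {
        std::ostringstream os;
        os << "conv tile " << m << "x" << n << "x" << k;
        return os.str();
    }

    // Multi-line, indented summary.
    std::string detailed() const
    {
        std::ostringstream os;
        os << "Tile sizes\n"
           << "  M per block: " << m << "\n"
           << "  N per block: " << n << "\n"
           << "  K per block: " << k << "\n";
        return os.str();
    }
};

// Step 2: description generation. Uses the traits to populate the description.
template <typename Instance>
ToyDescription toy_describe()
{
    using Traits = ToyConvTraits<Instance>;
    return ToyDescription{Traits::m_per_block, Traits::n_per_block, Traits::k_per_block};
}
```

Calling toy_describe<ToyConvInstance>().brief() yields a one-line summary, while detailed() yields the indented form. Keeping extraction separate from formatting means a new kernel family only needs new traits, not a new formatter.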

Key Files

  • description.hpp: The generic Description base class. It declares the interface only and contains no implementation.

  • conv_description.hpp: The main entry point. Contains the ConvDescription struct and the describe() factory function.

  • conv_traits.hpp: Home of the ConvTraits template, which is the core of the property extraction mechanism.

  • tree_formatter.hpp: A simple utility for generating the indented, tree-like format used in the detailed() description.
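To illustrate what an indented, tree-like format involves, here is a minimal sketch of a recursive tree renderer. It is not the code in tree_formatter.hpp: ToyNode, render, and render_tree are hypothetical names, and the branch characters are an assumption chosen for the example.

```cpp
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical tree node: a label plus child nodes.
struct ToyNode
{
    std::string label;
    std::vector<ToyNode> children;
};

// Appends one node and its subtree to `out`. `prefix` carries the
// indentation inherited from the ancestors of this node.
inline void render(const ToyNode& node, const std::string& prefix, bool is_last, std::string& out)
{
    out += prefix + (is_last ? "`-- " : "|-- ") + node.label + "\n";
    // Children of a last sibling get blank indentation; otherwise the
    // vertical bar continues down to the next sibling.
    const std::string child_prefix = prefix + (is_last ? "    " : "|   ");
    for(std::size_t i = 0; i < node.children.size(); ++i)
    {
        render(node.children[i], child_prefix, i + 1 == node.children.size(), out);
    }
}

// Renders the root label on its own line, followed by the indented subtree.
inline std::string render_tree(const ToyNode& root)
{
    std::string out = root.label + "\n";
    for(std::size_t i = 0; i < root.children.size(); ++i)
    {
        render(root.children[i], "", i + 1 == root.children.size(), out);
    }
    return out;
}
```

Passing the prefix down the recursion keeps each line independent of its siblings, so the renderer needs no second pass to draw the connecting bars.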

Usage

To get a description of a convolution kernel instance, call the describe function and then call one of the formatting methods, such as brief() or detailed(), on the returned object:

#include <iostream>

#include "ck_tile/builder/reflect/conv_description.hpp"

// Assume MyConvFwdInstance is a type alias for a specific kernel instance
using MyConvFwdInstance = /* ... some kernel type ... */;

// Describe the instance
const auto description = ck_tile::reflect::conv::describe<MyConvFwdInstance>();

// Print the detailed description
std::cout << description.detailed() << std::endl;

Appendix: Current Limitations

Supported Instance Types

The reflection system (ckr::describe) currently supports the following convolution instance types:

  • Standard XDL Forward Convolution (DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle)
  • Large Tensor XDL Forward Convolution (DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor)
  • V3 XDL Forward Convolution (DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3)

These variants all share similar template parameter structures and are compatible with the current ConvTraits implementation.

Unsupported Instance Types

The following instance types are not yet supported by the reflection system:

  • DL (pre-XDL) Variants (DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK)

    • Uses a different internal structure with parameters like K0PerBlock, K1, M1PerThread, etc.
    • Missing standard members like kKPerBlock, kMPerXDL, kAK1
  • WMMA Variants (DeviceGroupedConvFwdMultipleD_Wmma_CShuffle)

    • Uses WMMA-specific parameters like MPerWmma, NPerWmma, MRepeat, NRepeat
    • Uses a different tile transfer structure that is incompatible with the current ConvTraits
  • Backward Weight Convolution (DeviceGroupedConvBwdWeight_Xdl_CShuffle)

    • Uses different layout naming: InLayout, WeiLayout, OutLayout instead of ALayout, BLayout, ELayout
    • Different specialization type: ConvBackwardWeightSpecialization vs ConvForwardSpecialization
    • Missing several members expected by forward convolution traits
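The "missing members" problem can be shown with a small detection idiom. The sketch below assumes, for illustration, that the members named above are static data members. has_xdl_members, ToyXdlInstance, and ToyDlInstance are hypothetical names, not part of the library.

```cpp
#include <type_traits>

// Primary template: by default, assume the members are absent.
template <typename T, typename = void>
struct has_xdl_members : std::false_type
{
};

// Chosen only when T exposes all three members that the forward XDL
// traits read. If any is missing, substitution fails silently (SFINAE)
// and the primary template is used instead.
template <typename T>
struct has_xdl_members<T,
                       std::void_t<decltype(T::kKPerBlock),
                                   decltype(T::kMPerXDL),
                                   decltype(T::kAK1)>> : std::true_type
{
};

// Toy XDL-style instance (values are arbitrary).
struct ToyXdlInstance
{
    static constexpr int kKPerBlock = 32;
    static constexpr int kMPerXDL   = 32;
    static constexpr int kAK1       = 8;
};

// Toy DL-style instance using the alternative parameter names.
struct ToyDlInstance
{
    static constexpr int K0PerBlock  = 16;
    static constexpr int K1          = 2;
    static constexpr int M1PerThread = 4;
};

static_assert(has_xdl_members<ToyXdlInstance>::value);
static_assert(!has_xdl_members<ToyDlInstance>::value);
```

A check like this could guard a describe-style entry point so that an unsupported instance produces a short, readable compile-time error instead of a long template failure.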

Future Work

To support these additional instance types, the reflection system would need:

  1. Specialized ConvTraits templates for each variant type
  2. Updated conv_layout, conv_data_type, and other helper functions to handle different parameter structures
  3. Conditional compilation or SFINAE techniques to select the appropriate trait extraction logic based on instance type
  4. Customized ConvDescription methods for more general kernels
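As a rough sketch of items 1 and 3, the example below selects a traits specialization by detecting which parameters an instance exposes. All names (is_xdl_like, is_wmma_like, ToyTraits, ToyXdl, ToyWmma) are hypothetical, and the real ConvTraits may be organized quite differently.

```cpp
#include <string>
#include <type_traits>

// Detects an XDL-style instance by the presence of kMPerXDL.
template <typename T, typename = void>
struct is_xdl_like : std::false_type
{
};
template <typename T>
struct is_xdl_like<T, std::void_t<decltype(T::kMPerXDL)>> : std::true_type
{
};

// Detects a WMMA-style instance by the presence of MPerWmma.
template <typename T, typename = void>
struct is_wmma_like : std::false_type
{
};
template <typename T>
struct is_wmma_like<T, std::void_t<decltype(T::MPerWmma)>> : std::true_type
{
};

// Primary template is left undefined: using it with an unsupported
// instance type is a compile-time error.
template <typename Instance, typename = void>
struct ToyTraits;

// Specialization for XDL-style instances.
template <typename Instance>
struct ToyTraits<Instance, std::enable_if_t<is_xdl_like<Instance>::value>>
{
    static constexpr int m_per_instruction = Instance::kMPerXDL;
    static std::string family() { return "XDL"; }
};

// Specialization for WMMA-style instances.
template <typename Instance>
struct ToyTraits<Instance, std::enable_if_t<is_wmma_like<Instance>::value>>
{
    static constexpr int m_per_instruction = Instance::MPerWmma;
    static std::string family() { return "WMMA"; }
};

// Toy instances (values are arbitrary).
struct ToyXdl
{
    static constexpr int kMPerXDL = 32;
};
struct ToyWmma
{
    static constexpr int MPerWmma = 16;
};
```

This sketch assumes the two detection conditions are mutually exclusive. A type exposing both members would make the specializations ambiguous, so a real implementation would need a stricter selection rule.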

For now, these unsupported types can still use GetInstanceString() through the base class pointer, but cannot use the ckr::describe reflection API.