[CK_BUILDER] Add Description::instance_string() method and update tests (#3340)

* Create Description::instance_string() function

To expose more reflection capabilities in MIOpen, we add the instance_string functionality to the ckr::Description class. This PR introduces a base class, adds the instance_string method, and implements the method by injecting the Traits::instance_string method through the ConvDescription constructor.

This will enable us to replace the specialized get_instance_string() method on device operations with a describe() method in a subsequent PR.
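
A minimal sketch of the injection pattern described above, under stated assumptions: `FakeKernelA`, `MiniDescription`, `InjectedDescription`, and `describe_sketch` are hypothetical stand-ins, not the CK types, and the real code obtains the string from the traits machinery rather than from a static member on the instance. The point is only that a non-template class can carry a type-dependent string if a template factory hands it a callable.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <utility>

// Hypothetical kernel type that knows its own instance string.
struct FakeKernelA
{
    static std::string instance_string() { return "FakeKernelA<256,128>"; }
};

// Simplified base interface with only the method this sketch needs.
class MiniDescription
{
public:
    virtual ~MiniDescription() = default;
    virtual std::string instance_string() const = 0;
};

// Non-template class: the kernel type is erased behind a std::function.
class InjectedDescription : public MiniDescription
{
public:
    explicit InjectedDescription(std::function<std::string()> getter)
        : getter_(std::move(getter))
    {
    }

    std::string instance_string() const override { return getter_(); }

private:
    std::function<std::string()> getter_;
};

// Template factory: the only place that sees the concrete kernel type.
template <typename Instance>
InjectedDescription describe_sketch()
{
    return InjectedDescription([]() { return Instance::instance_string(); });
}

// Usage: the result type carries no template parameter.
inline void injection_example()
{
    const InjectedDescription d = describe_sketch<FakeKernelA>();
    assert(d.instance_string() == "FakeKernelA<256,128>");
}
```

Storing a `std::function` costs a little at runtime, but it keeps the description class free of template parameters, which is what allows a common base class.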

* Test describe().instance_string()

Update the instance string tests to also call `ckr::describe<Instance>().instance_string()`. This documents that the XDL kernels are supported with describe(), but WMMA and DL kernels are not yet supported. Also update the namespace and add a HasConvTraits concept.
This commit is contained in:
John Shumway
2025-12-03 06:36:09 -08:00
committed by GitHub
parent e6a583416b
commit f29b67cf9b
18 changed files with 802 additions and 670 deletions


@@ -10,19 +10,21 @@ The reflection system works by extracting properties from a convolution kernel *
1. **Trait Extraction**: The `ConvTraits` template (in `conv_traits.hpp`) is specialized for each kernel instance. It extracts low-level details like tile sizes, data layouts, and pipeline versions from the kernel's type definition.
2. **Description Generation**: The `Describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` struct.
2. **Description Generation**: The `describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` (`Description`) object.
3. **Formatting**: The `ConvDescription` struct contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.
3. **Formatting**: The `ConvDescription` class (which implements `Description`) contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.
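
The three stages above can be sketched in miniature. Everything here (`ToyKernel`, `ToyTraits`, `ToyDescription`, `toy_describe`) is a hypothetical stand-in rather than the CK API, and the convolution direction is hard-coded for brevity.

```cpp
#include <cassert>
#include <sstream>
#include <string>

// Hypothetical kernel type whose configuration lives in template parameters.
template <int SpatialDim, int BlockSize>
struct ToyKernel
{
};

// Stage 1: trait extraction. The primary template is only declared;
// a partial specialization pulls the parameters out of the kernel type.
template <typename T>
struct ToyTraits;

template <int SpatialDim, int BlockSize>
struct ToyTraits<ToyKernel<SpatialDim, BlockSize>>
{
    static constexpr int spatial_dim       = SpatialDim;
    static constexpr int thread_block_size = BlockSize;
};

// Stage 3: formatting. The description holds runtime copies of the traits
// and turns them into text.
struct ToyDescription
{
    int spatial_dim;
    int thread_block_size;

    std::string brief() const
    {
        std::ostringstream oss;
        oss << spatial_dim << "D Forward convolution";
        return oss.str();
    }
};

// Stage 2: description generation. Copies compile-time traits into the struct.
template <typename Instance>
ToyDescription toy_describe()
{
    using Traits = ToyTraits<Instance>;
    return ToyDescription{.spatial_dim       = Traits::spatial_dim,
                          .thread_block_size = Traits::thread_block_size};
}

inline void pipeline_example()
{
    using Kernel                 = ToyKernel<2, 256>;
    const ToyDescription d       = toy_describe<Kernel>();
    const std::string one_liner  = d.brief();
    assert(one_liner == "2D Forward convolution");
}
```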
## Key Files
- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `Describe()` factory function.
- **`description.hpp`**: The general `Description` base class, a pure-virtual interface with no implementation of its own.
- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` class and the `describe()` factory function.
- **`conv_traits.hpp`**: Home of the `ConvTraits` template, which is the core of the property extraction mechanism.
- **`tree_formatter.hpp`**: A simple utility for generating the indented, tree-like format used in the `detailed()` description.
## Usage
To get a description of a convolution kernel instance, use the `Describe` function and call one of its formatting methods:
To get a description of a convolution kernel instance, use the `describe` function and call one of its formatting methods:
```cpp
#include "ck_tile/builder/reflect/conv_description.hpp"
@@ -36,3 +38,43 @@ const auto description = ck_tile::reflect::conv::Describe<MyConvFwdInstance>();
// Print the detailed description
std::cout << description.detailed() << std::endl;
```
## Appendix: Current Limitations
### Supported Instance Types
The reflection system (`ckr::describe`) currently supports the following convolution instance types:
- **Standard XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle`)
- **Large Tensor XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor`)
- **V3 XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`)
These variants all share similar template parameter structures and are compatible with the current `ConvTraits` implementation.
### Unsupported Instance Types
The following instance types are **not yet supported** by the reflection system:
- **DL (pre-XDL) Variants** (`DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK`)
- Uses different internal structure with parameters like `K0PerBlock`, `K1`, `M1PerThread`, etc.
- Missing standard members like `kKPerBlock`, `kMPerXDL`, `kAK1`
- **WMMA Variants** (`DeviceGroupedConvFwdMultipleD_Wmma_CShuffle`)
- Uses WMMA-specific parameters like `MPerWmma`, `NPerWmma`, `MRepeat`, `NRepeat`
- Different tile transfer structure incompatible with current `ConvTraits`
- **Backward Weight Convolution** (`DeviceGroupedConvBwdWeight_Xdl_CShuffle`)
- Uses different layout naming: `InLayout`, `WeiLayout`, `OutLayout` instead of `ALayout`, `BLayout`, `ELayout`
- Different specialization type: `ConvBackwardWeightSpecialization` vs `ConvForwardSpecialization`
- Missing several members expected by forward convolution traits
### Future Work
To support these additional instance types, the reflection system would need:
1. Specialized `ConvTraits` templates for each variant type
2. Updated `conv_layout`, `conv_data_type`, and other helper functions to handle different parameter structures
3. Conditional compilation or SFINAE techniques to select the appropriate trait extraction logic based on instance type
4. Customized `ConvDescription` methods for more general kernels
For now, these unsupported types can still use `GetInstanceString()` through the base class pointer, but cannot use the `ckr::describe` reflection API.


@@ -1,21 +1,19 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
/**
* @file
* @brief Provides utilities to reflect on convolution kernel instances and generate
* human-readable descriptions of their configuration.
*
* This file contains the necessary components to transform a convolution kernel's
* compile-time properties into a structured, descriptive format. This is primarily
* used for debugging, logging, and generating documentation.
*
* Key components:
* - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted
* properties and provides methods to format them into strings.
* - ck_tile::reflect::conv::Describe(): A factory function that creates a
* ConvDescription from a given kernel instance type.
*/
/// @file
/// @brief Provides utilities to reflect on convolution kernel instances and generate
/// human-readable descriptions of their configuration.
///
/// This file contains the necessary components to transform a convolution kernel's
/// compile-time properties into a structured, descriptive format. This is primarily
/// used for debugging, logging, and generating documentation.
///
/// Key components:
/// - ck_tile::reflect::conv::ConvDescription: A class that holds the extracted
/// properties and provides methods to format them into strings.
/// - ck_tile::reflect::describe(): A factory function that creates a
/// ConvDescription from a given kernel instance type.
#pragma once
@@ -24,14 +22,17 @@
#include <sstream>
#include <type_traits>
#include <variant>
#include <functional>
#include <ck_tile/builder/conv_signature_concepts.hpp>
#include <ck_tile/builder/reflect/conv_traits.hpp>
#include <ck_tile/builder/reflect/description.hpp>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/tree_formatter.hpp>
/// @brief Provides human-readable descriptions of convolution kernel instances
namespace ck_tile::reflect {
namespace ck_tile::reflect::conv {
namespace conv {
/// @brief Signature information for a convolution operation
/// Contains high-level properties that define the convolution's interface,
@@ -71,56 +72,68 @@ struct GemmAlgorithmInfo
/// @brief Provides human-readable descriptions of convolution kernel instances
/// Generates formatted text descriptions at various levels of detail for
/// understanding and documenting convolution kernel configurations.
struct ConvDescription
class ConvDescription : public Description
{
ConvSignatureInfo signature;
GemmAlgorithmInfo algorithm;
public:
/// @brief Constructor for ConvDescription
/// @param sig The signature information containing high-level convolution properties
/// @param algo The algorithm configuration containing low-level implementation details
/// @param instance_string_getter A callable that returns a string representation of the
/// instance
ConvDescription(ConvSignatureInfo sig,
GemmAlgorithmInfo algo,
std::function<std::string()> instance_string_getter)
: signature_(std::move(sig)),
algorithm_(std::move(algo)),
instance_string_getter_(std::move(instance_string_getter))
{
}
/// @brief Generate a brief one-line summary of the convolution
/// @return A concise description (e.g., "2D Forward convolution")
std::string brief() const
std::string brief() const override
{
std::ostringstream oss;
oss << signature.spatial_dim << "D " << signature.direction << " convolution";
oss << signature_.spatial_dim << "D " << signature_.direction << " convolution";
return oss.str();
}
/// @brief Generate a detailed hierarchical description of the convolution
/// @return A multi-line tree-formatted description covering signature and algorithm details
std::string detailed() const
std::string detailed() const override
{
TreeFormatter f;
f.writeLine(0, signature.spatial_dim, "D ", signature.direction, " Convolution Kernel");
f.writeLine(0, signature_.spatial_dim, "D ", signature_.direction, " Convolution Kernel");
f.writeLine(1, "Signature");
f.writeLine(2, "Tensor Type: ", signature.data_type);
f.writeLine(2, "Memory Layout: ", signature.layout);
f.writeLine(2, "Input elementwise operation: ", signature.input_element_op);
f.writeLine(2, "Weights elementwise operation: ", signature.weight_element_op);
f.writeLast(2, "Output elementwise operation: ", signature.output_element_op);
f.writeLine(2, "Tensor Type: ", signature_.data_type);
f.writeLine(2, "Memory Layout: ", signature_.layout);
f.writeLine(2, "Input elementwise operation: ", signature_.input_element_op);
f.writeLine(2, "Weights elementwise operation: ", signature_.weight_element_op);
f.writeLast(2, "Output elementwise operation: ", signature_.output_element_op);
f.writeLast(1, "Algorithm");
// Compute Block section
f.writeLine(2, "Thread block size: ", algorithm.thread_block_size);
f.writeLine(2, "Thread block size: ", algorithm_.thread_block_size);
f.writeLine(2,
"Data tile size: ",
algorithm.tile_dims.m,
algorithm_.tile_dims.m,
"×",
algorithm.tile_dims.n,
algorithm_.tile_dims.n,
"×",
algorithm.tile_dims.k);
f.writeLine(2, "Gemm padding: ", algorithm.padding);
f.writeLine(2, "Convolution specialization: ", algorithm.conv_specialization);
algorithm_.tile_dims.k);
f.writeLine(2, "Gemm padding: ", algorithm_.padding);
f.writeLine(2, "Convolution specialization: ", algorithm_.conv_specialization);
// Pipeline section
f.writeLine(2, "Pipeline version: ", algorithm.pipeline_version);
f.writeLine(2, "Pipeline scheduler: ", algorithm.pipeline_scheduler);
f.writeLine(2, "Pipeline version: ", algorithm_.pipeline_version);
f.writeLine(2, "Pipeline scheduler: ", algorithm_.pipeline_scheduler);
f.writeLine(2, "Warp Gemm parameters: ");
f.writeLine(
3, "subtile size: ", algorithm.warp_gemm.gemm_m, "×", algorithm.warp_gemm.gemm_n);
3, "subtile size: ", algorithm_.warp_gemm.gemm_m, "×", algorithm_.warp_gemm.gemm_n);
f.writeLast(3,
"Number of warp gemm iterations: ",
algorithm.warp_gemm.m_iter,
algorithm_.warp_gemm.m_iter,
"×",
algorithm.warp_gemm.n_iter);
algorithm_.warp_gemm.n_iter);
// Memory Access section
f.writeLast(2, "Memory access:");
@@ -128,152 +141,148 @@ struct ConvDescription
f.writeLine(3, "A Tile transfer: ");
f.writeLine(4,
"Tile dimensions: ",
algorithm.a_tile_transfer.tile_dimensions.k0,
algorithm_.a_tile_transfer.tile_dimensions.k0,
"×",
algorithm.a_tile_transfer.tile_dimensions.m_or_n,
algorithm_.a_tile_transfer.tile_dimensions.m_or_n,
"×",
algorithm.a_tile_transfer.tile_dimensions.k1,
algorithm_.a_tile_transfer.tile_dimensions.k1,
"×");
f.writeLine(
4, "The innermost K subdimension size: ", algorithm.a_tile_transfer.transfer_params.k1);
f.writeLine(4,
"The innermost K subdimension size: ",
algorithm_.a_tile_transfer.transfer_params.k1);
f.writeLine(4,
"Spatial thread distribution over the data tile: ",
algorithm.a_tile_transfer.transfer_params.thread_cluster_order[0],
algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[0],
"×",
algorithm.a_tile_transfer.transfer_params.thread_cluster_order[1],
algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[1],
"×",
algorithm.a_tile_transfer.transfer_params.thread_cluster_order[2]);
algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[2]);
f.writeLine(4,
"The order of accessing data tile axes: ",
algorithm.a_tile_transfer.transfer_params.src_access_order[0],
algorithm_.a_tile_transfer.transfer_params.src_access_order[0],
"×",
algorithm.a_tile_transfer.transfer_params.src_access_order[1],
algorithm_.a_tile_transfer.transfer_params.src_access_order[1],
"×",
algorithm.a_tile_transfer.transfer_params.src_access_order[2]);
algorithm_.a_tile_transfer.transfer_params.src_access_order[2]);
f.writeLine(4,
"Vectorized memory access axis index (with contiguous memory): ",
algorithm.a_tile_transfer.transfer_params.src_vector_dim);
algorithm_.a_tile_transfer.transfer_params.src_vector_dim);
f.writeLine(4,
"Vector access (GMEM read) instruction size: ",
algorithm.a_tile_transfer.transfer_params.src_scalar_per_vector);
algorithm_.a_tile_transfer.transfer_params.src_scalar_per_vector);
f.writeLine(4,
"Vector access (LDS write) instruction size: ",
algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
f.writeLast(4,
"LDS data layout padding (to prevent bank conflicts): ",
algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
f.writeLine(3, "B Tile transfer: ");
f.writeLine(4,
"Tile dimensions: ",
algorithm.b_tile_transfer.tile_dimensions.k0,
algorithm_.b_tile_transfer.tile_dimensions.k0,
"×",
algorithm.b_tile_transfer.tile_dimensions.m_or_n,
algorithm_.b_tile_transfer.tile_dimensions.m_or_n,
"×",
algorithm.b_tile_transfer.tile_dimensions.k1,
algorithm_.b_tile_transfer.tile_dimensions.k1,
"×");
f.writeLine(
4, "The innermost K subdimension size: ", algorithm.b_tile_transfer.transfer_params.k1);
f.writeLine(4,
"The innermost K subdimension size: ",
algorithm_.b_tile_transfer.transfer_params.k1);
f.writeLine(4,
"Spatial thread distribution over the data tile: ",
algorithm.b_tile_transfer.transfer_params.thread_cluster_order[0],
algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[0],
"×",
algorithm.b_tile_transfer.transfer_params.thread_cluster_order[1],
algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[1],
"×",
algorithm.b_tile_transfer.transfer_params.thread_cluster_order[2]);
algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[2]);
f.writeLine(4,
"The order of accessing data tile axes: ",
algorithm.b_tile_transfer.transfer_params.src_access_order[0],
algorithm_.b_tile_transfer.transfer_params.src_access_order[0],
"×",
algorithm.b_tile_transfer.transfer_params.src_access_order[1],
algorithm_.b_tile_transfer.transfer_params.src_access_order[1],
"×",
algorithm.b_tile_transfer.transfer_params.src_access_order[2]);
algorithm_.b_tile_transfer.transfer_params.src_access_order[2]);
f.writeLine(4,
"Vectorized memory access axis index (with contiguous memory): ",
algorithm.b_tile_transfer.transfer_params.src_vector_dim);
algorithm_.b_tile_transfer.transfer_params.src_vector_dim);
f.writeLine(4,
"Vector access (GMEM read) instruction size: ",
algorithm.b_tile_transfer.transfer_params.src_scalar_per_vector);
algorithm_.b_tile_transfer.transfer_params.src_scalar_per_vector);
f.writeLine(4,
"Vector access (LDS write) instruction size: ",
algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
f.writeLast(4,
"LDS data layout padding (to prevent bank conflicts): ",
algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
f.writeLast(3, "C Tile transfer: ");
f.writeLine(4,
"Data shuffle (number of gemm instructions per iteration): ",
algorithm.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
algorithm_.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
"×",
algorithm.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
algorithm_.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
f.writeLine(4,
"Spatial thread distribution used to store data: ",
algorithm.c_tile_transfer.thread_cluster_dims[0],
algorithm_.c_tile_transfer.thread_cluster_dims[0],
"×",
algorithm.c_tile_transfer.thread_cluster_dims[1],
algorithm_.c_tile_transfer.thread_cluster_dims[1],
"×",
algorithm.c_tile_transfer.thread_cluster_dims[2],
algorithm_.c_tile_transfer.thread_cluster_dims[2],
"×",
algorithm.c_tile_transfer.thread_cluster_dims[3]);
algorithm_.c_tile_transfer.thread_cluster_dims[3]);
f.writeLast(4,
"Vector access (GMEM write) instruction size: ",
algorithm.c_tile_transfer.scalar_per_vector);
algorithm_.c_tile_transfer.scalar_per_vector);
return f.getString();
}
/// @brief Generate an educational explanation of optimization choices
/// @return Educational content explaining why certain algorithm choices were made
/// @note Currently unimplemented - reserved for future enhancement
std::string explain() const
{
std::ostringstream oss;
// Placeholder for future implementation
return oss.str();
}
/// @brief Generate a string representation of the instance
/// @return A string that represents the instance
std::string instance_string() const override { return instance_string_getter_(); }
/// @brief Generate performance characteristics and use case guidance
/// @return Guidance on when this configuration is optimal and expected performance
/// @note Currently unimplemented - reserved for future enhancement
std::string suggest() const
{
std::ostringstream oss;
// Placeholder for future implementation
return oss.str();
}
private:
ConvSignatureInfo signature_;
GemmAlgorithmInfo algorithm_;
std::function<std::string()> instance_string_getter_;
};
} // namespace conv
/// @brief Helper concept to detect if a type has InstanceTraits specialization
/// @brief Helper concept to detect if a type has ConvTraits specialization
template <typename T>
concept HasInstanceTraits = requires { typename InstanceTraits<T>; };
concept HasConvTraits = requires { typename conv::ConvTraits<T>; };
/// @brief Factory function to create ConvDescription from a convolution instance type
/// @tparam Instance The convolution instance type (must have InstanceTraits specialization)
/// @return A ConvDescription object populated with the instance's configuration details
template <typename Instance>
requires HasInstanceTraits<Instance>
ConvDescription Describe()
template <HasConvTraits Instance>
conv::ConvDescription describe()
{
using Traits = ConvTraits<Instance>;
using Traits = conv::ConvTraits<Instance>;
return ConvDescription{
.signature = ConvSignatureInfo{.spatial_dim = Traits::spatial_dim,
.direction = Traits::direction,
.layout = Traits::layout,
.data_type = Traits::data_type,
.input_element_op = Traits::input_element_op,
.weight_element_op = Traits::weight_element_op,
.output_element_op = Traits::output_element_op},
.algorithm = GemmAlgorithmInfo{.thread_block_size = Traits::thread_block_size,
.tile_dims = Traits::tile_dims,
.warp_gemm = Traits::warp_gemm,
.a_tile_transfer = Traits::a_tile_transfer,
.b_tile_transfer = Traits::b_tile_transfer,
.c_tile_transfer = Traits::c_tile_transfer,
.pipeline_version = Traits::pipeline_version,
.pipeline_scheduler = Traits::pipeline_scheduler,
.conv_specialization = Traits::conv_specialization,
.padding = Traits::gemm_padding}};
return conv::ConvDescription(
conv::ConvSignatureInfo{
.spatial_dim = Traits::spatial_dim,
.direction = Traits::direction,
.layout = Traits::layout,
.data_type = Traits::data_type,
.input_element_op = Traits::input_element_op,
.weight_element_op = Traits::weight_element_op,
.output_element_op = Traits::output_element_op,
},
conv::GemmAlgorithmInfo{
.thread_block_size = Traits::thread_block_size,
.tile_dims = Traits::tile_dims,
.warp_gemm = Traits::warp_gemm,
.a_tile_transfer = Traits::a_tile_transfer,
.b_tile_transfer = Traits::b_tile_transfer,
.c_tile_transfer = Traits::c_tile_transfer,
.pipeline_version = Traits::pipeline_version,
.pipeline_scheduler = Traits::pipeline_scheduler,
.conv_specialization = Traits::conv_specialization,
.padding = Traits::gemm_padding,
},
[]() { return reflect::instance_string<Instance>(); });
}
} // namespace ck_tile::reflect::conv
} // namespace ck_tile::reflect


@@ -551,8 +551,7 @@ struct ConvTraits;
/// @details This is the primary specialization used to extract a comprehensive
/// set of traits directly from a fully-formed device kernel `Instance` type.
/// It uses `InstanceTraits` to access the kernel's template parameters.
template <typename Instance>
requires requires { typename InstanceTraits<Instance>; }
template <HasInstanceTraits Instance>
struct ConvTraits<Instance>
{
using InstTraits = InstanceTraits<Instance>;


@@ -0,0 +1,39 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
/// @file
/// @brief Provides a base class for generating human-readable descriptions of kernel instances.
///
/// This file contains the Description base class that defines a common interface for
/// all descriptor types. Derived classes implement specific formatting and explanation
/// logic for different kernel types (e.g., convolution, GEMM, etc.).
#pragma once
#include <string>
namespace ck_tile::reflect {
/// @brief Base class for generating human-readable descriptions of kernel instances
/// Defines a common interface for all descriptor types with methods for generating
/// descriptions at various levels of detail.
class Description
{
public:
/// @brief Virtual destructor for proper cleanup of derived classes
virtual ~Description() = default;
/// @brief Generate a brief one-line summary
/// @return A concise description of the kernel configuration
virtual std::string brief() const = 0;
/// @brief Generate a detailed hierarchical description
/// @return A multi-line tree-formatted description covering all configuration details
virtual std::string detailed() const = 0;
/// @brief Generate a string representation of the instance
/// @return A string that represents the instance
virtual std::string instance_string() const = 0;
};
} // namespace ck_tile::reflect
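
A base class like this lets generic code handle descriptions of different kernel families through one interface. The sketch below redeclares the interface locally so that it compiles on its own; `FixedDescription` and `summarize` are hypothetical names introduced for illustration and are not part of the PR.

```cpp
#include <cassert>
#include <memory>
#include <sstream>
#include <string>
#include <utility>
#include <vector>

// Local copy of the interface shape shown in the diff.
class Description
{
public:
    virtual ~Description() = default;
    virtual std::string brief() const           = 0;
    virtual std::string detailed() const        = 0;
    virtual std::string instance_string() const = 0;
};

// Hypothetical derived type that returns fixed strings; a real one would
// format traits extracted from a kernel instance.
class FixedDescription : public Description
{
public:
    FixedDescription(std::string brief, std::string instance)
        : brief_(std::move(brief)), instance_(std::move(instance))
    {
    }

    std::string brief() const override { return brief_; }
    std::string detailed() const override { return brief_ + "\n  instance: " + instance_; }
    std::string instance_string() const override { return instance_; }

private:
    std::string brief_;
    std::string instance_;
};

// Generic consumer: depends only on the base interface.
std::string summarize(const std::vector<std::unique_ptr<Description>>& items)
{
    std::ostringstream oss;
    for(const auto& d : items)
        oss << d->brief() << " [" << d->instance_string() << "]\n";
    return oss.str();
}

inline void summarize_example()
{
    std::vector<std::unique_ptr<Description>> items;
    items.push_back(std::make_unique<FixedDescription>("2D Forward convolution", "ConvA"));
    items.push_back(std::make_unique<FixedDescription>("GEMM", "GemmB"));
    assert(summarize(items) == "2D Forward convolution [ConvA]\nGEMM [GemmB]\n");
}
```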