mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
[CK_BUILDER] Add Description::instance_string() method and update tests (#3340)
* Create Description::instance_string() function

  To expose more reflection capabilities in MIOpen, we add the instance_string functionality to the ckr::Description class. This PR introduces a base class, adds the instance_string method, and implements the method by injecting the Traits::instance_string method through the ConvDescription constructor. This will enable us to replace the specialized get_instance_string() method on device operations with a describe() method in a subsequent PR.

* Test describe().instance_string()

  Update the instance string tests to also call `ckr::describe<Instance>().instance_string()`. This documents that the XDL kernels are supported with describe(), but WMMA and DL kernels are not yet supported. Also update the namespace and add a HasConvTraits concept.
@@ -10,19 +10,21 @@ The reflection system works by extracting properties from a convolution kernel *

 1. **Trait Extraction**: The `ConvTraits` template (in `conv_traits.hpp`) is specialized for each kernel instance. It extracts low-level details like tile sizes, data layouts, and pipeline versions from the kernel's type definition.

-2. **Description Generation**: The `Describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` struct.
+2. **Description Generation**: The `describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` (`Description`) object.

-3. **Formatting**: The `ConvDescription` struct contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.
+3. **Formatting**: The `ConvDescription` class (which implements `Description`) contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.

 ## Key Files

-- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `Describe()` factory function.
+- **`description.hpp`**: The generalized `Description` base class with no implementation.
+- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `describe()` factory function.
 - **`conv_traits.hpp`**: Home of the `ConvTraits` template, which is the core of the property extraction mechanism.
 - **`tree_formatter.hpp`**: A simple utility for generating the indented, tree-like format used in the `detailed()` description.

 ## Usage

-To get a description of a convolution kernel instance, use the `Describe` function and call one of its formatting methods:
+To get a description of a convolution kernel instance, use the `describe` function and call one of its formatting methods:

 ```cpp
 #include "ck_tile/builder/reflect/conv_description.hpp"
@@ -36,3 +38,43 @@ const auto description = ck_tile::reflect::conv::Describe<MyConvFwdInstance>();

 // Print the detailed description
 std::cout << description.detailed() << std::endl;
 ```

+## Appendix: Current Limitations
+
+### Supported Instance Types
+
+The reflection system (`ckr::describe`) currently supports the following convolution instance types:
+
+- **Standard XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle`)
+- **Large Tensor XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor`)
+- **V3 XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`)
+
+These variants all share similar template parameter structures and are compatible with the current `ConvTraits` implementation.
+
+### Unsupported Instance Types
+
+The following instance types are **not yet supported** by the reflection system:
+
+- **DL (pre-XDL) Variants** (`DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK`)
+  - Uses a different internal structure with parameters like `K0PerBlock`, `K1`, `M1PerThread`, etc.
+  - Missing standard members like `kKPerBlock`, `kMPerXDL`, `kAK1`
+
+- **WMMA Variants** (`DeviceGroupedConvFwdMultipleD_Wmma_CShuffle`)
+  - Uses WMMA-specific parameters like `MPerWmma`, `NPerWmma`, `MRepeat`, `NRepeat`
+  - Different tile transfer structure incompatible with the current `ConvTraits`
+
+- **Backward Weight Convolution** (`DeviceGroupedConvBwdWeight_Xdl_CShuffle`)
+  - Uses different layout naming: `InLayout`, `WeiLayout`, `OutLayout` instead of `ALayout`, `BLayout`, `ELayout`
+  - Different specialization type: `ConvBackwardWeightSpecialization` vs `ConvForwardSpecialization`
+  - Missing several members expected by the forward convolution traits
+
+### Future Work
+
+To support these additional instance types, the reflection system would need:
+
+1. Specialized `ConvTraits` templates for each variant type
+2. Updated `conv_layout`, `conv_data_type`, and other helper functions to handle different parameter structures
+3. Conditional compilation or SFINAE techniques to select the appropriate trait-extraction logic based on instance type
+4. Customized `ConvDescription` methods for more general kernels
+
+For now, these unsupported types can still use `GetInstanceString()` through the base class pointer, but cannot use the `ckr::describe` reflection API.
@@ -1,21 +1,19 @@
 // Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
 // SPDX-License-Identifier: MIT

-/**
- * @file
- * @brief Provides utilities to reflect on convolution kernel instances and generate
- * human-readable descriptions of their configuration.
- *
- * This file contains the necessary components to transform a convolution kernel's
- * compile-time properties into a structured, descriptive format. This is primarily
- * used for debugging, logging, and generating documentation.
- *
- * Key components:
- * - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted
- *   properties and provides methods to format them into strings.
- * - ck_tile::reflect::conv::Describe(): A factory function that creates a
- *   ConvDescription from a given kernel instance type.
- */
+/// @file
+/// @brief Provides utilities to reflect on convolution kernel instances and generate
+/// human-readable descriptions of their configuration.
+///
+/// This file contains the necessary components to transform a convolution kernel's
+/// compile-time properties into a structured, descriptive format. This is primarily
+/// used for debugging, logging, and generating documentation.
+///
+/// Key components:
+/// - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted
+///   properties and provides methods to format them into strings.
+/// - ck_tile::reflect::conv::Describe(): A factory function that creates a
+///   ConvDescription from a given kernel instance type.

 #pragma once

@@ -24,14 +22,17 @@
 #include <sstream>
 #include <type_traits>
 #include <variant>
+#include <functional>

 #include <ck_tile/builder/conv_signature_concepts.hpp>
 #include <ck_tile/builder/reflect/conv_traits.hpp>
+#include <ck_tile/builder/reflect/description.hpp>
 #include <ck_tile/builder/reflect/instance_traits.hpp>
 #include <ck_tile/builder/reflect/tree_formatter.hpp>

-namespace ck_tile::reflect::conv {
+/// @brief Provides human-readable descriptions of convolution kernel instances
+namespace ck_tile::reflect {
+
+namespace conv {

 /// @brief Signature information for a convolution operation
 /// Contains high-level properties that define the convolution's interface,
@@ -71,56 +72,68 @@ struct GemmAlgorithmInfo

 /// @brief Provides human-readable descriptions of convolution kernel instances
 /// Generates formatted text descriptions at various levels of detail for
 /// understanding and documenting convolution kernel configurations.
-struct ConvDescription
+class ConvDescription : public Description
 {
-    ConvSignatureInfo signature;
-    GemmAlgorithmInfo algorithm;
+public:
+    /// @brief Constructor for ConvDescription
+    /// @param sig The signature information containing high-level convolution properties
+    /// @param algo The algorithm configuration containing low-level implementation details
+    /// @param instance_string_getter A callable that returns a string representation of the
+    /// instance
+    ConvDescription(ConvSignatureInfo sig,
+                    GemmAlgorithmInfo algo,
+                    std::function<std::string()> instance_string_getter)
+        : signature_(std::move(sig)),
+          algorithm_(std::move(algo)),
+          instance_string_getter_(std::move(instance_string_getter))
+    {
+    }

     /// @brief Generate a brief one-line summary of the convolution
     /// @return A concise description (e.g., "2D Forward convolution")
-    std::string brief() const
+    std::string brief() const override
     {
         std::ostringstream oss;
-        oss << signature.spatial_dim << "D " << signature.direction << " convolution";
+        oss << signature_.spatial_dim << "D " << signature_.direction << " convolution";
         return oss.str();
     }

     /// @brief Generate a detailed hierarchical description of the convolution
     /// @return A multi-line tree-formatted description covering signature and algorithm details
-    std::string detailed() const
+    std::string detailed() const override
     {
         TreeFormatter f;
-        f.writeLine(0, signature.spatial_dim, "D ", signature.direction, " Convolution Kernel");
+        f.writeLine(0, signature_.spatial_dim, "D ", signature_.direction, " Convolution Kernel");
         f.writeLine(1, "Signature");
-        f.writeLine(2, "Tensor Type: ", signature.data_type);
-        f.writeLine(2, "Memory Layout: ", signature.layout);
-        f.writeLine(2, "Input elementwise operation: ", signature.input_element_op);
-        f.writeLine(2, "Weights elementwise operation: ", signature.weight_element_op);
-        f.writeLast(2, "Output elementwise operation: ", signature.output_element_op);
+        f.writeLine(2, "Tensor Type: ", signature_.data_type);
+        f.writeLine(2, "Memory Layout: ", signature_.layout);
+        f.writeLine(2, "Input elementwise operation: ", signature_.input_element_op);
+        f.writeLine(2, "Weights elementwise operation: ", signature_.weight_element_op);
+        f.writeLast(2, "Output elementwise operation: ", signature_.output_element_op);

         f.writeLast(1, "Algorithm");
         // Compute Block section
-        f.writeLine(2, "Thread block size: ", algorithm.thread_block_size);
+        f.writeLine(2, "Thread block size: ", algorithm_.thread_block_size);
         f.writeLine(2,
                     "Data tile size: ",
-                    algorithm.tile_dims.m,
+                    algorithm_.tile_dims.m,
                     "×",
-                    algorithm.tile_dims.n,
+                    algorithm_.tile_dims.n,
                     "×",
-                    algorithm.tile_dims.k);
-        f.writeLine(2, "Gemm padding: ", algorithm.padding);
-        f.writeLine(2, "Convolution specialization: ", algorithm.conv_specialization);
+                    algorithm_.tile_dims.k);
+        f.writeLine(2, "Gemm padding: ", algorithm_.padding);
+        f.writeLine(2, "Convolution specialization: ", algorithm_.conv_specialization);
         // Pipeline section
-        f.writeLine(2, "Pipeline version: ", algorithm.pipeline_version);
-        f.writeLine(2, "Pipeline scheduler: ", algorithm.pipeline_scheduler);
+        f.writeLine(2, "Pipeline version: ", algorithm_.pipeline_version);
+        f.writeLine(2, "Pipeline scheduler: ", algorithm_.pipeline_scheduler);
         f.writeLine(2, "Warp Gemm parameters: ");
         f.writeLine(
-            3, "subtile size: ", algorithm.warp_gemm.gemm_m, "×", algorithm.warp_gemm.gemm_n);
+            3, "subtile size: ", algorithm_.warp_gemm.gemm_m, "×", algorithm_.warp_gemm.gemm_n);
         f.writeLast(3,
                     "Number of warp gemm iterations: ",
-                    algorithm.warp_gemm.m_iter,
+                    algorithm_.warp_gemm.m_iter,
                     "×",
-                    algorithm.warp_gemm.n_iter);
+                    algorithm_.warp_gemm.n_iter);

         // Memory Access section
         f.writeLast(2, "Memory access:");
@@ -128,152 +141,148 @@ struct ConvDescription
         f.writeLine(3, "A Tile transfer: ");
         f.writeLine(4,
                     "Tile dimensions: ",
-                    algorithm.a_tile_transfer.tile_dimensions.k0,
+                    algorithm_.a_tile_transfer.tile_dimensions.k0,
                     "×",
-                    algorithm.a_tile_transfer.tile_dimensions.m_or_n,
+                    algorithm_.a_tile_transfer.tile_dimensions.m_or_n,
                     "×",
-                    algorithm.a_tile_transfer.tile_dimensions.k1,
+                    algorithm_.a_tile_transfer.tile_dimensions.k1,
                     "×");
-        f.writeLine(
-            4, "The innermost K subdimension size: ", algorithm.a_tile_transfer.transfer_params.k1);
+        f.writeLine(4,
+                    "The innermost K subdimension size: ",
+                    algorithm_.a_tile_transfer.transfer_params.k1);
         f.writeLine(4,
                     "Spatial thread distribution over the data tile: ",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[0],
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[0],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[1],
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[1],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[2]);
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[2]);
         f.writeLine(4,
                     "The order of accessing data tile axes: ",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[0],
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[0],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[1],
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[1],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[2]);
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[2]);
         f.writeLine(4,
                     "Vectorized memory access axis index (with contiguous memory): ",
-                    algorithm.a_tile_transfer.transfer_params.src_vector_dim);
+                    algorithm_.a_tile_transfer.transfer_params.src_vector_dim);
         f.writeLine(4,
                     "Vector access (GMEM read) instruction size: ",
-                    algorithm.a_tile_transfer.transfer_params.src_scalar_per_vector);
+                    algorithm_.a_tile_transfer.transfer_params.src_scalar_per_vector);
         f.writeLine(4,
                     "Vector access (LDS write) instruction size: ",
-                    algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
         f.writeLast(4,
                     "LDS data layout padding (to prevent bank conflicts): ",
-                    algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);

         f.writeLine(3, "B Tile transfer: ");
         f.writeLine(4,
                     "Tile dimensions: ",
-                    algorithm.b_tile_transfer.tile_dimensions.k0,
+                    algorithm_.b_tile_transfer.tile_dimensions.k0,
                     "×",
-                    algorithm.b_tile_transfer.tile_dimensions.m_or_n,
+                    algorithm_.b_tile_transfer.tile_dimensions.m_or_n,
                     "×",
-                    algorithm.b_tile_transfer.tile_dimensions.k1,
+                    algorithm_.b_tile_transfer.tile_dimensions.k1,
                     "×");
-        f.writeLine(
-            4, "The innermost K subdimension size: ", algorithm.b_tile_transfer.transfer_params.k1);
+        f.writeLine(4,
+                    "The innermost K subdimension size: ",
+                    algorithm_.b_tile_transfer.transfer_params.k1);
         f.writeLine(4,
                     "Spatial thread distribution over the data tile: ",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[0],
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[0],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[1],
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[1],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[2]);
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[2]);
         f.writeLine(4,
                     "The order of accessing data tile axes: ",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[0],
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[0],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[1],
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[1],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[2]);
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[2]);
         f.writeLine(4,
                     "Vectorized memory access axis index (with contiguous memory): ",
-                    algorithm.b_tile_transfer.transfer_params.src_vector_dim);
+                    algorithm_.b_tile_transfer.transfer_params.src_vector_dim);
         f.writeLine(4,
                     "Vector access (GMEM read) instruction size: ",
-                    algorithm.b_tile_transfer.transfer_params.src_scalar_per_vector);
+                    algorithm_.b_tile_transfer.transfer_params.src_scalar_per_vector);
         f.writeLine(4,
                     "Vector access (LDS write) instruction size: ",
-                    algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
         f.writeLast(4,
                     "LDS data layout padding (to prevent bank conflicts): ",
-                    algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);

         f.writeLast(3, "C Tile transfer: ");
         f.writeLine(4,
                     "Data shuffle (number of gemm instructions per iteration): ",
-                    algorithm.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
+                    algorithm_.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
                     "×",
-                    algorithm.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
+                    algorithm_.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
         f.writeLine(4,
                     "Spatial thread distribution used to store data: ",
-                    algorithm.c_tile_transfer.thread_cluster_dims[0],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[0],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[1],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[1],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[2],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[2],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[3]);
+                    algorithm_.c_tile_transfer.thread_cluster_dims[3]);
         f.writeLast(4,
                     "Vector access (GMEM write) instruction size: ",
-                    algorithm.c_tile_transfer.scalar_per_vector);
+                    algorithm_.c_tile_transfer.scalar_per_vector);
         return f.getString();
     }

     /// @brief Generate an educational explanation of optimization choices
     /// @return Educational content explaining why certain algorithm choices were made
     /// @note Currently unimplemented - reserved for future enhancement
     std::string explain() const
     {
         std::ostringstream oss;
         // Placeholder for future implementation
         return oss.str();
     }
+    /// @brief Generate a string representation of the instance
+    /// @return A string that represents the instance
+    std::string instance_string() const override { return instance_string_getter_(); }

     /// @brief Generate performance characteristics and use case guidance
     /// @return Guidance on when this configuration is optimal and expected performance
     /// @note Currently unimplemented - reserved for future enhancement
     std::string suggest() const
     {
         std::ostringstream oss;
         // Placeholder for future implementation
         return oss.str();
     }
+private:
+    ConvSignatureInfo signature_;
+    GemmAlgorithmInfo algorithm_;
+    std::function<std::string()> instance_string_getter_;
 };
+} // namespace conv

-/// @brief Helper concept to detect if a type has InstanceTraits specialization
+/// @brief Helper concept to detect if a type has ConvTraits specialization
 template <typename T>
-concept HasInstanceTraits = requires { typename InstanceTraits<T>; };
+concept HasConvTraits = requires { typename conv::ConvTraits<T>; };

 /// @brief Factory function to create ConvDescription from a convolution instance type
 /// @tparam Instance The convolution instance type (must have InstanceTraits specialization)
 /// @return A ConvDescription object populated with the instance's configuration details
-template <typename Instance>
-    requires HasInstanceTraits<Instance>
-ConvDescription Describe()
+template <HasConvTraits Instance>
+conv::ConvDescription describe()
 {
-    using Traits = ConvTraits<Instance>;
+    using Traits = conv::ConvTraits<Instance>;

-    return ConvDescription{
-        .signature = ConvSignatureInfo{.spatial_dim = Traits::spatial_dim,
-                                       .direction = Traits::direction,
-                                       .layout = Traits::layout,
-                                       .data_type = Traits::data_type,
-                                       .input_element_op = Traits::input_element_op,
-                                       .weight_element_op = Traits::weight_element_op,
-                                       .output_element_op = Traits::output_element_op},
-        .algorithm = GemmAlgorithmInfo{.thread_block_size = Traits::thread_block_size,
-                                       .tile_dims = Traits::tile_dims,
-                                       .warp_gemm = Traits::warp_gemm,
-                                       .a_tile_transfer = Traits::a_tile_transfer,
-                                       .b_tile_transfer = Traits::b_tile_transfer,
-                                       .c_tile_transfer = Traits::c_tile_transfer,
-                                       .pipeline_version = Traits::pipeline_version,
-                                       .pipeline_scheduler = Traits::pipeline_scheduler,
-                                       .conv_specialization = Traits::conv_specialization,
-                                       .padding = Traits::gemm_padding}};
+    return conv::ConvDescription(
+        conv::ConvSignatureInfo{
+            .spatial_dim = Traits::spatial_dim,
+            .direction = Traits::direction,
+            .layout = Traits::layout,
+            .data_type = Traits::data_type,
+            .input_element_op = Traits::input_element_op,
+            .weight_element_op = Traits::weight_element_op,
+            .output_element_op = Traits::output_element_op,
+        },
+        conv::GemmAlgorithmInfo{
+            .thread_block_size = Traits::thread_block_size,
+            .tile_dims = Traits::tile_dims,
+            .warp_gemm = Traits::warp_gemm,
+            .a_tile_transfer = Traits::a_tile_transfer,
+            .b_tile_transfer = Traits::b_tile_transfer,
+            .c_tile_transfer = Traits::c_tile_transfer,
+            .pipeline_version = Traits::pipeline_version,
+            .pipeline_scheduler = Traits::pipeline_scheduler,
+            .conv_specialization = Traits::conv_specialization,
+            .padding = Traits::gemm_padding,
+        },
+        []() { return reflect::instance_string<Instance>(); });
 }

-} // namespace ck_tile::reflect::conv
+} // namespace ck_tile::reflect
@@ -551,8 +551,7 @@ struct ConvTraits;
 /// @details This is the primary specialization used to extract a comprehensive
 /// set of traits directly from a fully-formed device kernel `Instance` type.
 /// It uses `InstanceTraits` to access the kernel's template parameters.
-template <typename Instance>
-    requires requires { typename InstanceTraits<Instance>; }
+template <HasInstanceTraits Instance>
 struct ConvTraits<Instance>
 {
     using InstTraits = InstanceTraits<Instance>;
@@ -0,0 +1,39 @@
+// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier: MIT
+
+/// @file
+/// @brief Provides a base class for generating human-readable descriptions of kernel instances.
+///
+/// This file contains the Description base class that defines a common interface for
+/// all descriptor types. Derived classes implement specific formatting and explanation
+/// logic for different kernel types (e.g., convolution, GEMM, etc.).
+
+#pragma once
+
+#include <string>
+
+namespace ck_tile::reflect {
+
+/// @brief Base class for generating human-readable descriptions of kernel instances
+/// Defines a common interface for all descriptor types with methods for generating
+/// descriptions at various levels of detail.
+class Description
+{
+public:
+    /// @brief Virtual destructor for proper cleanup of derived classes
+    virtual ~Description() = default;
+
+    /// @brief Generate a brief one-line summary
+    /// @return A concise description of the kernel configuration
+    virtual std::string brief() const = 0;
+
+    /// @brief Generate a detailed hierarchical description
+    /// @return A multi-line tree-formatted description covering all configuration details
+    virtual std::string detailed() const = 0;
+
+    /// @brief Generate a string representation of the instance
+    /// @return A string that represents the instance
+    virtual std::string instance_string() const = 0;
+};
+
+} // namespace ck_tile::reflect
@@ -103,18 +103,18 @@ add_ck_builder_test(test_ckb_conv_builder
 # compilable code. They are more expensive but catch real-world issues.

-# Verifies that GetInstanceString() methods produce valid kernel code.
+# Verifies that GetInstanceString() methods and other functions produce valid kernel code.
 # Tests various convolution types:
 # - Group convolution (v3, standard, large tensor, WMMA, DL variants)
 # - Backward weight group convolution (XDL)
-# Requires kernel compilation to validate the generated strings.
-add_ck_builder_test(test_ckb_get_instance_string
-    test_get_instance_string_fwd_grp_conv_v3.cpp
-    test_get_instance_string_fwd_grp_conv.cpp
-    test_get_instance_string_fwd_grp_conv_large_tensor.cpp
-    test_get_instance_string_fwd_grp_conv_wmma.cpp
-    test_get_instance_string_fwd_grp_conv_dl.cpp
-    test_get_instance_string_bwd_weight_grp_conv_xdl.cpp)
+# Requires kernel compilation to validate the generated strings through the base class.
+add_ck_builder_test(test_ckb_instance_string
+    test_instance_string_fwd_grp_conv_v3.cpp
+    test_instance_string_fwd_grp_conv.cpp
+    test_instance_string_fwd_grp_conv_large_tensor.cpp
+    test_instance_string_fwd_grp_conv_wmma.cpp
+    test_instance_string_fwd_grp_conv_dl.cpp
+    test_instance_string_bwd_weight_grp_conv_xdl.cpp)

 # Tests the forward convolution builder across multiple data types and dimensions.
 # Individual tests are split into separate files to enable parallel compilation.
@@ -183,7 +183,7 @@ endforeach()

 # Register all regression tests (integration tests with kernel compilation)
 set(CKB_REGRESSION_TESTS
-    test_ckb_get_instance_string
+    test_ckb_instance_string
     test_ckb_build_fwd_instances
     test_ckb_testing_utils
     test_ckb_factory_grouped_convolution_forward_convscale
@@ -13,7 +13,7 @@
 namespace {

 namespace ckb = ck_tile::builder;
-namespace ckr = ck_tile::reflect::conv;
+namespace ckr = ck_tile::reflect;
 namespace ckt = ck_tile::test;

 // Defines the signature of the convolution operation to be tested.
@@ -110,7 +110,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasBriefDescription)
     static constexpr const ConvSignature SIGNATURE;
     static constexpr const DefaultAlgorithm ALGORITHM;
     using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
-    EXPECT_THAT(ckr::Describe<Instance>().brief(), ckt::StringEqWithDiff("2D Forward convolution"));
+    EXPECT_THAT(ckr::describe<Instance>().brief(), ckt::StringEqWithDiff("2D Forward convolution"));
 }

 TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
@@ -118,7 +118,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
     static constexpr const ConvSignature SIGNATURE;
     static constexpr const DefaultAlgorithm ALGORITHM;
     using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
-    EXPECT_THAT(ckr::Describe<Instance>().detailed(),
+    EXPECT_THAT(ckr::describe<Instance>().detailed(),
                 ckt::StringEqWithDiff( //
                     "2D Forward Convolution Kernel\n"
                     "├─ Signature\n"
@@ -162,6 +162,23 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
                     " └─ Vector access (GMEM write) instruction size: 2"));
 }

+TEST(ConvDescriptionTest, DefaultInstanceHasInstanceString)
+{
+    static constexpr const ConvSignature SIGNATURE;
+    static constexpr const DefaultAlgorithm ALGORITHM;
+    using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
+
+    // Get the instance string from the description
+    std::string instance_str = ckr::describe<Instance>().instance_string();
+
+    // Verify that the instance string is not empty
+    EXPECT_FALSE(instance_str.empty());
+
+    // Verify that it contains the device operation name
+    // The exact format depends on the InstanceTraits implementation
+    EXPECT_THAT(instance_str, ::testing::HasSubstr("DeviceGroupedConvFwdMultipleABD"));
+}
+
 // NOTE: BackwardDataInstanceHasDetailedDescription test is disabled because ConvFactory
 // does not have a specialization for backward data convolutions. The test fails with:
 // "implicit instantiation of undefined template 'ck_tile::builder::ConvFactory<...>'"
@@ -195,4 +212,5 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
 // EXPECT_THAT(ckr::Describe<Builder>().detailed(),
 //             ckt::StringEqWithDiff("PLACEHOLDER"));
 // }
+
 } // namespace
@@ -1,86 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight/device_grouped_conv_bwd_weight_xdl_instance.hpp>

// Test GetInstanceString through base class pointer for backward weight XDL variant
TEST(GetInstanceString, ReturnsStringForBwdWeightGrpConvXdlInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple = ck::tensor_operation::device::instance::
        device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances<
            2,                                             // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC, // InLayout
            ck::tensor_operation::device::instance::GKYXC, // WeiLayout
            ck::tensor_operation::device::instance::GNHWK, // OutLayout
            ck::tensor_operation::device::instance::
                ConvBwdWeightDefault>; // ConvBwdWeightSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances
    // This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc.
    std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // InLayout
                               ",GKYXC"         // WeiLayout
                               ",GNHWK"         // OutLayout
                               ",fp16"          // InDataType
                               ",fp16"          // WeiDataType
                               ",fp16"          // OutDataType
                               ",fp32"          // AccDataType
                               ",PassThrough"   // InElementwiseOperation
                               ",PassThrough"   // WeiElementwiseOperation
                               ",PassThrough"   // OutElementwiseOperation
                               ",Default"       // ConvBackwardWeightSpecialization
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",4"             // K0PerBlock
                               ",8"             // K1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(1,4,8,2)"  // ABlockTransferThreadClusterLengths_K0_M_K1
                               ",Seq(0,3,1,2)"  // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(0,2,1,3)"  // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",2"             // ABlockTransferSrcScalarPerVector
                               ",4"             // ABlockTransferDstScalarPerVector_K1
                               ",true"          // ABlockLdsAddExtraM
                               ",Seq(1,4,8,2)"  // BBlockTransferThreadClusterLengths_K0_N_K1
                               ",Seq(0,3,1,2)"  // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(0,2,1,3)"  // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",2"             // BBlockTransferSrcScalarPerVector
                               ",4"             // BBlockTransferDstScalarPerVector_K1
                               ",true"          // BBlockLdsAddExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CBlockTransferClusterLengths
                               ",2"             // CBlockTransferScalarPerVector_NWaveNPerXdl
                               ",fp16"          // ComputeTypeA
                               ",fp16"          // ComputeTypeB
                               ",1"             // MaxTransposeTransferSrcScalarPerVector
                               ",1>";           // MaxTransposeTransferDstScalarPerVector

    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,90 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp>

// Test GetInstanceString through base class pointer for non-V3 variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",1"             // ABlockLdsExtraM
                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",1"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",Default"       // LoopScheduler
                               ",1>";           // NumGroupsToMerge
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,85 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp>

// Test GetInstanceString through base class pointer for DL variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvDlInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances<
            ck::tensor_operation::device::instance::GNHWC,           // InLayout
            ck::tensor_operation::device::instance::GKYXC,           // WeiLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // OutLayout
            ck::Tuple<>,                                             // DsDatatype
            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv2d_fwd_dl_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK"
                               "<2"                // NDimSpatial
                               ",fp16"             // ADataType
                               ",fp16"             // BDataType
                               ",EmptyTuple"       // DsDataType
                               ",fp16"             // EDataType
                               ",fp32"             // AccDataType
                               ",GNHWC"            // ALayout
                               ",GKYXC"            // BLayout
                               ",EmptyTuple"       // DsLayout
                               ",GNHWK"            // ELayout
                               ",PassThrough"      // AElementwiseOperation
                               ",PassThrough"      // BElementwiseOperation
                               ",PassThrough"      // CDEElementwiseOperation
                               ",Default"          // ConvForwardSpecialization
                               ",MNKPadding"       // GemmSpec
                               ",8"                // BlockSize
                               ",16"               // MPerBlock
                               ",4"                // NPerBlock
                               ",2"                // K0PerBlock
                               ",1"                // K1
                               ",1"                // M1PerThread
                               ",2"                // N1PerThread
                               ",1"                // KPerThread
                               ",Seq(4,2)"         // M1N1ThreadClusterM1Xs
                               ",Seq(1,1)"         // M1N1ThreadClusterN1Xs
                               ",Seq(2,1,2,1)"     // ABlockTransferThreadSliceLengths_K0_M0_M1_K1
                               ",Seq(1,1,8,1)"     // ABlockTransferThreadClusterLengths_K0_M0_M1_K1
                               ",Seq(1,2,0,3)"     // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,2,0,3)"     // ABlockTransferSrcAccessOrder
                               ",Seq(1,1,1,1)"     // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1
                               ",Seq(1,2,0,3)"     // ABlockTransferSrcVectorTensorContiguousDimOrder
                               ",Seq(1,1,1,1)"     // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1
                               ",Seq(1,1,1,1)"     // BBlockTransferThreadSliceLengths_K0_N0_N1_K1
                               ",Seq(2,1,4,1)"     // BBlockTransferThreadClusterLengths_K0_N0_N1_K1
                               ",Seq(1,2,0,3)"     // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,2,0,3)"     // BBlockTransferSrcAccessOrder
                               ",Seq(1,1,1,1)"     // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1
                               ",Seq(1,2,0,3)"     // BBlockTransferSrcVectorTensorContiguousDimOrder
                               ",Seq(1,1,1,1)"     // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1
                               ",Seq(0,1,2,3,4,5)" // CThreadTransferSrcDstAccessOrder
                               ",5"                // CThreadTransferSrcDstVectorDim
                               ",1>";              // CThreadTransferDstScalarPerVector

    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,89 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp>

// Test GetInstanceString through base class pointer for large tensor variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvLargeTensorInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple = ck::tensor_operation::device::instance::
        device_grouped_conv_fwd_xdl_large_tensor_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_large_tensor_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",1"             // ABlockLdsExtraM
                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",1"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",Default>";     // LoopScheduler
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,91 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp>

// Test GetInstanceString through base class pointer for V3 variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvV3Instance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_f16_comp_instances. This corresponds to the configuration with
    // BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc.
    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",256"           // BlockSize
                               ",128"           // MPerBlock
                               ",128"           // NPerBlock
                               ",64"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(8,32,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",8"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",0"             // ABlockLdsExtraM
                               ",Seq(8,32,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",8"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",0"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,32,1,8)" // CDEBlockTransferClusterLengths
                               ",8"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",Intrawave"     // BlkGemmPipeSched
                               ",v4"            // BlkGemmPipelineVer
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",false>";       // DirectLoad
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,90 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp>

// Test GetInstanceString through base class pointer for Wmma variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvWmmaInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::Tuple<>,                                             // DsDatatype
            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_wmma_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",128"           // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // K1
                               ",16"            // MPerWmma
                               ",16"            // NPerWmma
                               ",2"             // MRepeat
                               ",2"             // NRepeat
                               ",Seq(4,32,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",true"          // ABlockLdsExtraM
                               ",Seq(4,32,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",true"          // BBlockLdsExtraN
                               ",1"             // CShuffleMRepeatPerShuffle
                               ",1"             // CShuffleNRepeatPerShuffle
                               ",Seq(1,32,1,4)" // CDEShuffleBlockTransferClusterLengths
                               ",1"             // CDEShuffleBlockTransferScalarPerVector_NPerBlock
                               ",Default"       // LoopSched
                               ",v1>";          // PipelineVer

    EXPECT_EQ(instance_str, expected_str);
}
@@ -0,0 +1,90 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include "ck_tile/builder/reflect/instance_traits.hpp"
#include "ck_tile/builder/reflect/conv_description.hpp"
#include "ck/tensor_operation/gpu/device/device_base.hpp"
#include "ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight/device_grouped_conv_bwd_weight_xdl_instance.hpp"
#include <gtest/gtest.h>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple = ck::tensor_operation::device::instance::
    device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances<
        2,                                             // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC, // InLayout
        ck::tensor_operation::device::instance::GKYXC, // WeiLayout
        ck::tensor_operation::device::instance::GNHWK, // OutLayout
        ck::tensor_operation::device::instance::
            ConvBwdWeightDefault>; // ConvBwdWeightSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances
// This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc.
std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle"
                           "<2"             // NDimSpatial
                           ",GNHWC"         // InLayout
                           ",GKYXC"         // WeiLayout
                           ",GNHWK"         // OutLayout
                           ",fp16"          // InDataType
                           ",fp16"          // WeiDataType
                           ",fp16"          // OutDataType
                           ",fp32"          // AccDataType
                           ",PassThrough"   // InElementwiseOperation
                           ",PassThrough"   // WeiElementwiseOperation
                           ",PassThrough"   // OutElementwiseOperation
                           ",Default"       // ConvBackwardWeightSpecialization
                           ",64"            // BlockSize
                           ",64"            // MPerBlock
                           ",64"            // NPerBlock
                           ",4"             // K0PerBlock
                           ",8"             // K1
                           ",32"            // MPerXDL
                           ",32"            // NPerXDL
                           ",2"             // MXdlPerWave
                           ",2"             // NXdlPerWave
                           ",Seq(1,4,8,2)"  // ABlockTransferThreadClusterLengths_K0_M_K1
                           ",Seq(0,3,1,2)"  // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(0,2,1,3)"  // ABlockTransferSrcAccessOrder
                           ",2"             // ABlockTransferSrcVectorDim
                           ",2"             // ABlockTransferSrcScalarPerVector
                           ",4"             // ABlockTransferDstScalarPerVector_K1
                           ",true"          // ABlockLdsAddExtraM
                           ",Seq(1,4,8,2)"  // BBlockTransferThreadClusterLengths_K0_N_K1
                           ",Seq(0,3,1,2)"  // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(0,2,1,3)"  // BBlockTransferSrcAccessOrder
                           ",2"             // BBlockTransferSrcVectorDim
                           ",2"             // BBlockTransferSrcScalarPerVector
                           ",4"             // BBlockTransferDstScalarPerVector_K1
                           ",true"          // BBlockLdsAddExtraN
                           ",1"             // CShuffleMXdlPerWavePerShuffle
                           ",1"             // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)" // CBlockTransferClusterLengths
                           ",2"             // CBlockTransferScalarPerVector_NWaveNPerXdl
                           ",fp16"          // ComputeTypeA
                           ",fp16"          // ComputeTypeB
                           ",1"             // MaxTransposeTransferSrcScalarPerVector
                           ",1>";           // MaxTransposeTransferDstScalarPerVector

// Test GetInstanceString through base class pointer for backward weight XDL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForBwdWeightGrpConvXdl)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForBwdWeightGrpConvXdl test once ckr::describe
// supports backward weight convolution. The backward weight variant uses different layout naming
// (InLayout, WeiLayout, OutLayout) instead of (ALayout, BLayout, ELayout) and has a
// ConvBackwardWeightSpecialization parameter.

} // namespace
@@ -0,0 +1,95 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include "gtest/gtest.h"
#include "ck_tile/builder/reflect/instance_traits.hpp"
#include "ck_tile/builder/reflect/conv_description.hpp"
#include "ck/tensor_operation/gpu/device/device_base.hpp"
#include "ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp"

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
        2,                                                       // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,           // ALayout
        ck::tensor_operation::device::instance::GKYXC,           // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
        ck::tensor_operation::device::instance::GNHWK,           // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
                           "<2"             // NDimSpatial
                           ",GNHWC"         // ALayout
                           ",GKYXC"         // BLayout
                           ",EmptyTuple"    // DsLayout
                           ",GNHWK"         // ELayout
                           ",fp16"          // ADataType
                           ",fp16"          // BDataType
                           ",fp32"          // AccDataType
                           ",fp16"          // CShuffleDataType
                           ",EmptyTuple"    // DsDataType
                           ",fp16"          // EDataType
                           ",PassThrough"   // AElementwiseOperation
                           ",PassThrough"   // BElementwiseOperation
                           ",PassThrough"   // CDEElementwiseOperation
                           ",Default"       // ConvForwardSpecialization
                           ",MNKPadding"    // GemmSpec
                           ",1"             // NumGemmKPrefetchStage
                           ",64"            // BlockSize
                           ",64"            // MPerBlock
                           ",64"            // NPerBlock
                           ",32"            // KPerBlock
                           ",8"             // AK1
                           ",8"             // BK1
                           ",32"            // MPerXDL
                           ",32"            // NPerXDL
                           ",2"             // MXdlPerWave
                           ",2"             // NXdlPerWave
                           ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                           ",2"             // ABlockTransferSrcVectorDim
                           ",1"             // ABlockTransferSrcScalarPerVector
                           ",8"             // ABlockTransferDstScalarPerVector_AK1
                           ",1"             // ABlockLdsExtraM
                           ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                           ",2"             // BBlockTransferSrcVectorDim
                           ",1"             // BBlockTransferSrcScalarPerVector
                           ",8"             // BBlockTransferDstScalarPerVector_BK1
                           ",1"             // BBlockLdsExtraN
                           ",1"             // CShuffleMXdlPerWavePerShuffle
                           ",1"             // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                           ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                           ",fp16"          // AComputeDataType
                           ",fp16"          // BComputeDataType
                           ",Default"       // LoopScheduler
                           ",1>";           // NumGroupsToMerge

// Test GetInstanceString through base class pointer for standard XDL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConv)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConv)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,88 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances<
        ck::tensor_operation::device::instance::GNHWC,            // InLayout
        ck::tensor_operation::device::instance::GKYXC,            // WeiLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // OutLayout
        ck::Tuple<>,                                              // DsDatatype
        ck::tensor_operation::element_wise::PassThrough,          // CDEElementOp
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvSpec

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv2d_fwd_dl_f16_instances
std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK"
                           "<2"                 // NDimSpatial
                           ",fp16"              // ADataType
                           ",fp16"              // BDataType
                           ",EmptyTuple"        // DsDataType
                           ",fp16"              // EDataType
                           ",fp32"              // AccDataType
                           ",GNHWC"             // ALayout
                           ",GKYXC"             // BLayout
                           ",EmptyTuple"        // DsLayout
                           ",GNHWK"             // ELayout
                           ",PassThrough"       // AElementwiseOperation
                           ",PassThrough"       // BElementwiseOperation
                           ",PassThrough"       // CDEElementwiseOperation
                           ",Default"           // ConvForwardSpecialization
                           ",MNKPadding"        // GemmSpec
                           ",8"                 // BlockSize
                           ",16"                // MPerBlock
                           ",4"                 // NPerBlock
                           ",2"                 // K0PerBlock
                           ",1"                 // K1
                           ",1"                 // M1PerThread
                           ",2"                 // N1PerThread
                           ",1"                 // KPerThread
                           ",Seq(4,2)"          // M1N1ThreadClusterM1Xs
                           ",Seq(1,1)"          // M1N1ThreadClusterN1Xs
                           ",Seq(2,1,2,1)"      // ABlockTransferThreadSliceLengths_K0_M0_M1_K1
                           ",Seq(1,1,8,1)"      // ABlockTransferThreadClusterLengths_K0_M0_M1_K1
                           ",Seq(1,2,0,3)"      // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,2,0,3)"      // ABlockTransferSrcAccessOrder
                           ",Seq(1,1,1,1)"      // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1
                           ",Seq(1,2,0,3)"      // ABlockTransferSrcVectorTensorContiguousDimOrder
                           ",Seq(1,1,1,1)"      // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1
                           ",Seq(1,1,1,1)"      // BBlockTransferThreadSliceLengths_K0_N0_N1_K1
                           ",Seq(2,1,4,1)"      // BBlockTransferThreadClusterLengths_K0_N0_N1_K1
                           ",Seq(1,2,0,3)"      // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,2,0,3)"      // BBlockTransferSrcAccessOrder
                           ",Seq(1,1,1,1)"      // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1
                           ",Seq(1,2,0,3)"      // BBlockTransferSrcVectorTensorContiguousDimOrder
                           ",Seq(1,1,1,1)"      // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1
                           ",Seq(0,1,2,3,4,5)"  // CThreadTransferSrcDstAccessOrder
                           ",5"                 // CThreadTransferSrcDstVectorDim
                           ",1>";               // CThreadTransferDstScalarPerVector

// Test GetInstanceString through base class pointer for DL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvDl)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForFwdGrpConvDl test once ckr::describe supports DL
// variants. The DL variant uses a different internal structure (K0PerBlock, K1, M1PerThread, etc.)
// and is missing standard members like kKPerBlock, kMPerXDL, and kAK1.

} // namespace
@@ -0,0 +1,94 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_large_tensor_f16_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_large_tensor_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",1"              // NumGemmKPrefetchStage
                           ",64"             // BlockSize
                           ",64"             // MPerBlock
                           ",64"             // NPerBlock
                           ",32"             // KPerBlock
                           ",8"              // AK1
                           ",8"              // BK1
                           ",32"             // MPerXDL
                           ",32"             // NPerXDL
                           ",2"              // MXdlPerWave
                           ",2"              // NXdlPerWave
                           ",Seq(4,16,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",1"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",1"              // ABlockLdsExtraM
                           ",Seq(4,16,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",1"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",1"              // BBlockLdsExtraN
                           ",1"              // CShuffleMXdlPerWavePerShuffle
                           ",1"              // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)"  // CDEBlockTransferClusterLengths
                           ",1"              // CDEBlockTransferScalarPerVector_NPerBlock
                           ",fp16"           // AComputeDataType
                           ",fp16"           // BComputeDataType
                           ",Default>";      // LoopScheduler

// Test GetInstanceString through base class pointer for large tensor variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvLargeTensor)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvLargeTensor)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,96 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_f16_comp_instances. This corresponds to the configuration with
// BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc.
std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",256"            // BlockSize
                           ",128"            // MPerBlock
                           ",128"            // NPerBlock
                           ",64"             // KPerBlock
                           ",8"              // AK1
                           ",8"              // BK1
                           ",32"             // MPerXDL
                           ",32"             // NPerXDL
                           ",2"              // MXdlPerWave
                           ",2"              // NXdlPerWave
                           ",Seq(8,32,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",8"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",0"              // ABlockLdsExtraM
                           ",Seq(8,32,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",8"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",0"              // BBlockLdsExtraN
                           ",1"              // CShuffleMXdlPerWavePerShuffle
                           ",1"              // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,32,1,8)"  // CDEBlockTransferClusterLengths
                           ",8"              // CDEBlockTransferScalarPerVector_NPerBlock
                           ",Intrawave"      // BlkGemmPipeSched
                           ",v4"             // BlkGemmPipelineVer
                           ",fp16"           // AComputeDataType
                           ",fp16"           // BComputeDataType
                           ",false>";        // DirectLoad

// Test GetInstanceString through base class pointer for V3 variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvV3)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvV3)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,93 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::Tuple<>,                                              // DsDatatype
        ck::tensor_operation::element_wise::PassThrough,          // CDEElementOp
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvSpec

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_wmma_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",1"              // NumGemmKPrefetchStage
                           ",128"            // BlockSize
                           ",64"             // MPerBlock
                           ",64"             // NPerBlock
                           ",32"             // KPerBlock
                           ",8"              // K1
                           ",16"             // MPerWmma
                           ",16"             // NPerWmma
                           ",2"              // MRepeat
                           ",2"              // NRepeat
                           ",Seq(4,32,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",1"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",true"           // ABlockLdsExtraM
                           ",Seq(4,32,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",1"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",true"           // BBlockLdsExtraN
                           ",1"              // CShuffleMRepeatPerShuffle
                           ",1"              // CShuffleNRepeatPerShuffle
                           ",Seq(1,32,1,4)"  // CDEShuffleBlockTransferClusterLengths
                           ",1"              // CDEShuffleBlockTransferScalarPerVector_NPerBlock
                           ",Default"        // LoopSched
                           ",v1>";           // PipelineVer

// Test GetInstanceString through base class pointer for WMMA variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvWmma)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForFwdGrpConvWmma test once ckr::describe supports WMMA
// variants. The WMMA variant uses WMMA-specific parameters (MPerWmma, NPerWmma, MRepeat, NRepeat)
// and has a different tile transfer structure incompatible with the current ConvTraits.

} // namespace