diff --git a/experimental/builder/include/ck_tile/builder/reflect/README.md b/experimental/builder/include/ck_tile/builder/reflect/README.md index 6ef9ebe87a..8bb9c89c80 100644 --- a/experimental/builder/include/ck_tile/builder/reflect/README.md +++ b/experimental/builder/include/ck_tile/builder/reflect/README.md @@ -10,19 +10,21 @@ The reflection system works by extracting properties from a convolution kernel * 1. **Trait Extraction**: The `ConvTraits` template (in `conv_traits.hpp`) is specialized for each kernel instance. It extracts low-level details like tile sizes, data layouts, and pipeline versions from the kernel's type definition. -2. **Description Generation**: The `Describe()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` struct. +2. **Description Generation**: The `describe()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` object (a concrete `Description`). -3. **Formatting**: The `ConvDescription` struct contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display. +3. **Formatting**: The `ConvDescription` class (which implements `Description`) contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display. ## Key Files -- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `Describe()` factory function. +- **`description.hpp`**: An abstract `Description` base class that defines the common description interface; it contains no implementation. + +- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` class and the `describe()` factory function. - **`conv_traits.hpp`**: Home of the `ConvTraits` template, which is the core of the property extraction mechanism. - **`tree_formatter.hpp`**: A simple utility for generating the indented, tree-like format used in the `detailed()` description. 
## Usage -To get a description of a convolution kernel instance, use the `Describe` function and call one of its formatting methods: +To get a description of a convolution kernel instance, use the `describe` function and call one of the formatting methods on the returned description: ```cpp #include "ck_tile/builder/reflect/conv_description.hpp" @@ -36,3 +38,43 @@ const auto description = ck_tile::reflect::conv::Describe(); // Print the detailed description std::cout << description.detailed() << std::endl; ``` + +## Appendix: Current Limitations + +### Supported Instance Types + +The reflection system (`ckr::describe`) currently supports the following convolution instance types: + +- **Standard XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle`) +- **Large Tensor XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor`) +- **V3 XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`) + +These variants all share similar template parameter structures and are compatible with the current `ConvTraits` implementation. + +### Unsupported Instance Types + +The following instance types are **not yet supported** by the reflection system: + +- **DL (pre-XDL) Variants** (`DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK`) + - Uses different internal structure with parameters like `K0PerBlock`, `K1`, `M1PerThread`, etc. 
+ - Missing standard members like `kKPerBlock`, `kMPerXDL`, `kAK1` + +- **WMMA Variants** (`DeviceGroupedConvFwdMultipleD_Wmma_CShuffle`) + - Uses WMMA-specific parameters like `MPerWmma`, `NPerWmma`, `MRepeat`, `NRepeat` + - Different tile transfer structure incompatible with current `ConvTraits` + +- **Backward Weight Convolution** (`DeviceGroupedConvBwdWeight_Xdl_CShuffle`) + - Uses different layout naming: `InLayout`, `WeiLayout`, `OutLayout` instead of `ALayout`, `BLayout`, `ELayout` + - Different specialization type: `ConvBackwardWeightSpecialization` vs `ConvForwardSpecialization` + - Missing several members expected by forward convolution traits + +### Future Work + +To support these additional instance types, the reflection system would need: + +1. Specialized `ConvTraits` templates for each variant type +2. Updated `conv_layout`, `conv_data_type`, and other helper functions to handle different parameter structures +3. Conditional compilation or SFINAE techniques to select the appropriate trait extraction logic based on instance type +4. Customized `ConvDescription` methods for more general kernels + +For now, these unsupported types can still use `GetInstanceString()` through the base class pointer, but cannot use the `ckr::describe` reflection API. diff --git a/experimental/builder/include/ck_tile/builder/reflect/conv_description.hpp b/experimental/builder/include/ck_tile/builder/reflect/conv_description.hpp index 375e465721..be3c208ba8 100644 --- a/experimental/builder/include/ck_tile/builder/reflect/conv_description.hpp +++ b/experimental/builder/include/ck_tile/builder/reflect/conv_description.hpp @@ -1,21 +1,19 @@ // Copyright (c) Advanced Micro Devices, Inc., or its affiliates. // SPDX-License-Identifier: MIT -/** - * @file - * @brief Provides utilities to reflect on convolution kernel instances and generate - * human-readable descriptions of their configuration. 
- * - * This file contains the necessary components to transform a convolution kernel's - * compile-time properties into a structured, descriptive format. This is primarily - * used for debugging, logging, and generating documentation. - * - * Key components: - * - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted - * properties and provides methods to format them into strings. - * - ck_tile::reflect::conv::Describe(): A factory function that creates a - * ConvDescription from a given kernel instance type. - */ +/// @file +/// @brief Provides utilities to reflect on convolution kernel instances and generate +/// human-readable descriptions of their configuration. +/// +/// This file contains the necessary components to transform a convolution kernel's +/// compile-time properties into a structured, descriptive format. This is primarily +/// used for debugging, logging, and generating documentation. +/// +/// Key components: +/// - ck_tile::reflect::conv::ConvDescription: A class that holds the extracted +/// properties and provides methods to format them into strings. +/// - ck_tile::reflect::describe(): A factory function that creates a +/// ConvDescription from a given kernel instance type. #pragma once @@ -24,14 +22,17 @@ #include #include #include +#include #include #include +#include +#include #include -/// @brief Provides human-readable descriptions of convolution kernel instances +namespace ck_tile::reflect { -namespace ck_tile::reflect::conv { +namespace conv { /// @brief Signature information for a convolution operation /// Contains high-level properties that define the convolution's interface, @@ -71,56 +72,68 @@ struct GemmAlgorithmInfo /// @brief Provides human-readable descriptions of convolution kernel instances /// Generates formatted text descriptions at various levels of detail for /// understanding and documenting convolution kernel configurations. 
-struct ConvDescription +class ConvDescription : public Description { - ConvSignatureInfo signature; - GemmAlgorithmInfo algorithm; + public: + /// @brief Constructor for ConvDescription + /// @param sig The signature information containing high-level convolution properties + /// @param algo The algorithm configuration containing low-level implementation details + /// @param instance_string_getter A callable that returns a string representation of the + /// instance + ConvDescription(ConvSignatureInfo sig, + GemmAlgorithmInfo algo, + std::function instance_string_getter) + : signature_(std::move(sig)), + algorithm_(std::move(algo)), + instance_string_getter_(std::move(instance_string_getter)) + { + } /// @brief Generate a brief one-line summary of the convolution /// @return A concise description (e.g., "2D Forward convolution") - std::string brief() const + std::string brief() const override { std::ostringstream oss; - oss << signature.spatial_dim << "D " << signature.direction << " convolution"; + oss << signature_.spatial_dim << "D " << signature_.direction << " convolution"; return oss.str(); } /// @brief Generate a detailed hierarchical description of the convolution /// @return A multi-line tree-formatted description covering signature and algorithm details - std::string detailed() const + std::string detailed() const override { TreeFormatter f; - f.writeLine(0, signature.spatial_dim, "D ", signature.direction, " Convolution Kernel"); + f.writeLine(0, signature_.spatial_dim, "D ", signature_.direction, " Convolution Kernel"); f.writeLine(1, "Signature"); - f.writeLine(2, "Tensor Type: ", signature.data_type); - f.writeLine(2, "Memory Layout: ", signature.layout); - f.writeLine(2, "Input elementwise operation: ", signature.input_element_op); - f.writeLine(2, "Weights elementwise operation: ", signature.weight_element_op); - f.writeLast(2, "Output elementwise operation: ", signature.output_element_op); + f.writeLine(2, "Tensor Type: ", signature_.data_type); + 
f.writeLine(2, "Memory Layout: ", signature_.layout); + f.writeLine(2, "Input elementwise operation: ", signature_.input_element_op); + f.writeLine(2, "Weights elementwise operation: ", signature_.weight_element_op); + f.writeLast(2, "Output elementwise operation: ", signature_.output_element_op); f.writeLast(1, "Algorithm"); // Compute Block section - f.writeLine(2, "Thread block size: ", algorithm.thread_block_size); + f.writeLine(2, "Thread block size: ", algorithm_.thread_block_size); f.writeLine(2, "Data tile size: ", - algorithm.tile_dims.m, + algorithm_.tile_dims.m, "×", - algorithm.tile_dims.n, + algorithm_.tile_dims.n, "×", - algorithm.tile_dims.k); - f.writeLine(2, "Gemm padding: ", algorithm.padding); - f.writeLine(2, "Convolution specialization: ", algorithm.conv_specialization); + algorithm_.tile_dims.k); + f.writeLine(2, "Gemm padding: ", algorithm_.padding); + f.writeLine(2, "Convolution specialization: ", algorithm_.conv_specialization); // Pipeline section - f.writeLine(2, "Pipeline version: ", algorithm.pipeline_version); - f.writeLine(2, "Pipeline scheduler: ", algorithm.pipeline_scheduler); + f.writeLine(2, "Pipeline version: ", algorithm_.pipeline_version); + f.writeLine(2, "Pipeline scheduler: ", algorithm_.pipeline_scheduler); f.writeLine(2, "Warp Gemm parameters: "); f.writeLine( - 3, "subtile size: ", algorithm.warp_gemm.gemm_m, "×", algorithm.warp_gemm.gemm_n); + 3, "subtile size: ", algorithm_.warp_gemm.gemm_m, "×", algorithm_.warp_gemm.gemm_n); f.writeLast(3, "Number of warp gemm iterations: ", - algorithm.warp_gemm.m_iter, + algorithm_.warp_gemm.m_iter, "×", - algorithm.warp_gemm.n_iter); + algorithm_.warp_gemm.n_iter); // Memory Access section f.writeLast(2, "Memory access:"); @@ -128,152 +141,148 @@ struct ConvDescription f.writeLine(3, "A Tile transfer: "); f.writeLine(4, "Tile dimensions: ", - algorithm.a_tile_transfer.tile_dimensions.k0, + algorithm_.a_tile_transfer.tile_dimensions.k0, "×", - 
algorithm.a_tile_transfer.tile_dimensions.m_or_n, + algorithm_.a_tile_transfer.tile_dimensions.m_or_n, "×", - algorithm.a_tile_transfer.tile_dimensions.k1, + algorithm_.a_tile_transfer.tile_dimensions.k1, "×"); - f.writeLine( - 4, "The innermost K subdimension size: ", algorithm.a_tile_transfer.transfer_params.k1); + f.writeLine(4, + "The innermost K subdimension size: ", + algorithm_.a_tile_transfer.transfer_params.k1); f.writeLine(4, "Spatial thread distribution over the data tile: ", - algorithm.a_tile_transfer.transfer_params.thread_cluster_order[0], + algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[0], "×", - algorithm.a_tile_transfer.transfer_params.thread_cluster_order[1], + algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[1], "×", - algorithm.a_tile_transfer.transfer_params.thread_cluster_order[2]); + algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[2]); f.writeLine(4, "The order of accessing data tile axes: ", - algorithm.a_tile_transfer.transfer_params.src_access_order[0], + algorithm_.a_tile_transfer.transfer_params.src_access_order[0], "×", - algorithm.a_tile_transfer.transfer_params.src_access_order[1], + algorithm_.a_tile_transfer.transfer_params.src_access_order[1], "×", - algorithm.a_tile_transfer.transfer_params.src_access_order[2]); + algorithm_.a_tile_transfer.transfer_params.src_access_order[2]); f.writeLine(4, "Vectorized memory access axis index (with contiguous memory): ", - algorithm.a_tile_transfer.transfer_params.src_vector_dim); + algorithm_.a_tile_transfer.transfer_params.src_vector_dim); f.writeLine(4, "Vector access (GMEM read) instruction size: ", - algorithm.a_tile_transfer.transfer_params.src_scalar_per_vector); + algorithm_.a_tile_transfer.transfer_params.src_scalar_per_vector); f.writeLine(4, "Vector access (LDS write) instruction size: ", - algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1); + algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1); 
f.writeLast(4, "LDS data layout padding (to prevent bank conflicts): ", - algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1); + algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1); f.writeLine(3, "B Tile transfer: "); f.writeLine(4, "Tile dimensions: ", - algorithm.b_tile_transfer.tile_dimensions.k0, + algorithm_.b_tile_transfer.tile_dimensions.k0, "×", - algorithm.b_tile_transfer.tile_dimensions.m_or_n, + algorithm_.b_tile_transfer.tile_dimensions.m_or_n, "×", - algorithm.b_tile_transfer.tile_dimensions.k1, + algorithm_.b_tile_transfer.tile_dimensions.k1, "×"); - f.writeLine( - 4, "The innermost K subdimension size: ", algorithm.b_tile_transfer.transfer_params.k1); + f.writeLine(4, + "The innermost K subdimension size: ", + algorithm_.b_tile_transfer.transfer_params.k1); f.writeLine(4, "Spatial thread distribution over the data tile: ", - algorithm.b_tile_transfer.transfer_params.thread_cluster_order[0], + algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[0], "×", - algorithm.b_tile_transfer.transfer_params.thread_cluster_order[1], + algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[1], "×", - algorithm.b_tile_transfer.transfer_params.thread_cluster_order[2]); + algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[2]); f.writeLine(4, "The order of accessing data tile axes: ", - algorithm.b_tile_transfer.transfer_params.src_access_order[0], + algorithm_.b_tile_transfer.transfer_params.src_access_order[0], "×", - algorithm.b_tile_transfer.transfer_params.src_access_order[1], + algorithm_.b_tile_transfer.transfer_params.src_access_order[1], "×", - algorithm.b_tile_transfer.transfer_params.src_access_order[2]); + algorithm_.b_tile_transfer.transfer_params.src_access_order[2]); f.writeLine(4, "Vectorized memory access axis index (with contiguous memory): ", - algorithm.b_tile_transfer.transfer_params.src_vector_dim); + algorithm_.b_tile_transfer.transfer_params.src_vector_dim); f.writeLine(4, 
"Vector access (GMEM read) instruction size: ", - algorithm.b_tile_transfer.transfer_params.src_scalar_per_vector); + algorithm_.b_tile_transfer.transfer_params.src_scalar_per_vector); f.writeLine(4, "Vector access (LDS write) instruction size: ", - algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1); + algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1); f.writeLast(4, "LDS data layout padding (to prevent bank conflicts): ", - algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1); + algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1); f.writeLast(3, "C Tile transfer: "); f.writeLine(4, "Data shuffle (number of gemm instructions per iteration): ", - algorithm.c_tile_transfer.shuffle_params.m_gemms_per_shuffle, + algorithm_.c_tile_transfer.shuffle_params.m_gemms_per_shuffle, "×", - algorithm.c_tile_transfer.shuffle_params.n_gemms_per_shuffle); + algorithm_.c_tile_transfer.shuffle_params.n_gemms_per_shuffle); f.writeLine(4, "Spatial thread distribution used to store data: ", - algorithm.c_tile_transfer.thread_cluster_dims[0], + algorithm_.c_tile_transfer.thread_cluster_dims[0], "×", - algorithm.c_tile_transfer.thread_cluster_dims[1], + algorithm_.c_tile_transfer.thread_cluster_dims[1], "×", - algorithm.c_tile_transfer.thread_cluster_dims[2], + algorithm_.c_tile_transfer.thread_cluster_dims[2], "×", - algorithm.c_tile_transfer.thread_cluster_dims[3]); + algorithm_.c_tile_transfer.thread_cluster_dims[3]); f.writeLast(4, "Vector access (GMEM write) instruction size: ", - algorithm.c_tile_transfer.scalar_per_vector); + algorithm_.c_tile_transfer.scalar_per_vector); return f.getString(); } - /// @brief Generate an educational explanation of optimization choices - /// @return Educational content explaining why certain algorithm choices were made - /// @note Currently unimplemented - reserved for future enhancement - std::string explain() const - { - std::ostringstream oss; - // Placeholder for future 
implementation - return oss.str(); - } + /// @brief Generate a string representation of the instance + /// @return A string that represents the instance + std::string instance_string() const override { return instance_string_getter_(); } - /// @brief Generate performance characteristics and use case guidance - /// @return Guidance on when this configuration is optimal and expected performance - /// @note Currently unimplemented - reserved for future enhancement - std::string suggest() const - { - std::ostringstream oss; - // Placeholder for future implementation - return oss.str(); - } + private: + ConvSignatureInfo signature_; + GemmAlgorithmInfo algorithm_; + std::function instance_string_getter_; }; +} // namespace conv -/// @brief Helper concept to detect if a type has InstanceTraits specialization +/// @brief Helper concept to detect if a type has ConvTraits specialization template -concept HasInstanceTraits = requires { typename InstanceTraits; }; +concept HasConvTraits = requires { typename conv::ConvTraits; }; /// @brief Factory function to create ConvDescription from a convolution instance type -/// @tparam Instance The convolution instance type (must have InstanceTraits specialization) +/// @tparam Instance The convolution instance type (must have a `ConvTraits` specialization) /// @return A ConvDescription object populated with the instance's configuration details -template - requires HasInstanceTraits -ConvDescription Describe() +template +conv::ConvDescription describe() { - using Traits = ConvTraits; - return ConvDescription{ - .signature = ConvSignatureInfo{.spatial_dim = Traits::spatial_dim, - .direction = Traits::direction, - .layout = Traits::layout, - .data_type = Traits::data_type, - .input_element_op = Traits::input_element_op, - .weight_element_op = Traits::weight_element_op, - .output_element_op = Traits::output_element_op}, - .algorithm = GemmAlgorithmInfo{.thread_block_size = Traits::thread_block_size, - .tile_dims = Traits::tile_dims, - .warp_gemm = Traits::warp_gemm, - .a_tile_transfer = 
Traits::a_tile_transfer, - .b_tile_transfer = Traits::b_tile_transfer, - .c_tile_transfer = Traits::c_tile_transfer, - .pipeline_version = Traits::pipeline_version, - .pipeline_scheduler = Traits::pipeline_scheduler, - .conv_specialization = Traits::conv_specialization, - .padding = Traits::gemm_padding}}; + return conv::ConvDescription( + conv::ConvSignatureInfo{ + .spatial_dim = Traits::spatial_dim, + .direction = Traits::direction, + .layout = Traits::layout, + .data_type = Traits::data_type, + .input_element_op = Traits::input_element_op, + .weight_element_op = Traits::weight_element_op, + .output_element_op = Traits::output_element_op, + }, + conv::GemmAlgorithmInfo{ + .thread_block_size = Traits::thread_block_size, + .tile_dims = Traits::tile_dims, + .warp_gemm = Traits::warp_gemm, + .a_tile_transfer = Traits::a_tile_transfer, + .b_tile_transfer = Traits::b_tile_transfer, + .c_tile_transfer = Traits::c_tile_transfer, + .pipeline_version = Traits::pipeline_version, + .pipeline_scheduler = Traits::pipeline_scheduler, + .conv_specialization = Traits::conv_specialization, + .padding = Traits::gemm_padding, + }, + []() { return reflect::instance_string(); }); } -} // namespace ck_tile::reflect::conv +} // namespace ck_tile::reflect diff --git a/experimental/builder/include/ck_tile/builder/reflect/conv_traits.hpp b/experimental/builder/include/ck_tile/builder/reflect/conv_traits.hpp index a59c6c3045..316f570bcd 100644 --- a/experimental/builder/include/ck_tile/builder/reflect/conv_traits.hpp +++ b/experimental/builder/include/ck_tile/builder/reflect/conv_traits.hpp @@ -551,8 +551,7 @@ struct ConvTraits; /// @details This is the primary specialization used to extract a comprehensive /// set of traits directly from a fully-formed device kernel `Instance` type. /// It uses `InstanceTraits` to access the kernel's template parameters. 
-template - requires requires { typename InstanceTraits; } +template struct ConvTraits { using InstTraits = InstanceTraits; diff --git a/experimental/builder/include/ck_tile/builder/reflect/description.hpp b/experimental/builder/include/ck_tile/builder/reflect/description.hpp new file mode 100644 index 0000000000..c3a38964a7 --- /dev/null +++ b/experimental/builder/include/ck_tile/builder/reflect/description.hpp @@ -0,0 +1,39 @@ +// Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +// SPDX-License-Identifier: MIT + +/// @file +/// @brief Provides a base class for generating human-readable descriptions of kernel instances. +/// +/// This file contains the Description base class that defines a common interface for +/// all descriptor types. Derived classes implement specific formatting and explanation +/// logic for different kernel types (e.g., convolution, GEMM, etc.). + +#pragma once + +#include + +namespace ck_tile::reflect { + +/// @brief Base class for generating human-readable descriptions of kernel instances +/// Defines a common interface for all descriptor types with methods for generating +/// descriptions at various levels of detail. 
+class Description +{ + public: + /// @brief Virtual destructor for proper cleanup of derived classes + virtual ~Description() = default; + + /// @brief Generate a brief one-line summary + /// @return A concise description of the kernel configuration + virtual std::string brief() const = 0; + + /// @brief Generate a detailed hierarchical description + /// @return A multi-line tree-formatted description covering all configuration details + virtual std::string detailed() const = 0; + + /// @brief Generate a string representation of the instance + /// @return A string that represents the instance + virtual std::string instance_string() const = 0; +}; + +} // namespace ck_tile::reflect diff --git a/experimental/builder/test/CMakeLists.txt b/experimental/builder/test/CMakeLists.txt index a32c68e219..e43c88c7a7 100644 --- a/experimental/builder/test/CMakeLists.txt +++ b/experimental/builder/test/CMakeLists.txt @@ -103,18 +103,18 @@ add_ck_builder_test(test_ckb_conv_builder # compilable code. They are more expensive but catch real-world issues. -# Verifies that GetInstanceString() methods produce valid kernel code. +# Verifies that GetInstanceString() methods and the reflection API produce valid kernel code. # Tests various convolution types: # - Group convolution (v3, standard, large tensor, WMMA, DL variants) # - Backward weight group convolution (XDL) -# Requires kernel compilation to validate the generated strings. -add_ck_builder_test(test_ckb_get_instance_string - test_get_instance_string_fwd_grp_conv_v3.cpp - test_get_instance_string_fwd_grp_conv.cpp - test_get_instance_string_fwd_grp_conv_large_tensor.cpp - test_get_instance_string_fwd_grp_conv_wmma.cpp - test_get_instance_string_fwd_grp_conv_dl.cpp - test_get_instance_string_bwd_weight_grp_conv_xdl.cpp) +# Requires kernel compilation to validate the generated strings through the base class. 
+add_ck_builder_test(test_ckb_instance_string + test_instance_string_fwd_grp_conv_v3.cpp + test_instance_string_fwd_grp_conv.cpp + test_instance_string_fwd_grp_conv_large_tensor.cpp + test_instance_string_fwd_grp_conv_wmma.cpp + test_instance_string_fwd_grp_conv_dl.cpp + test_instance_string_bwd_weight_grp_conv_xdl.cpp) # Tests the forward convolution builder across multiple data types and dimensions. # Individual tests are split into separate files to enable parallel compilation. @@ -183,7 +183,7 @@ endforeach() # Register all regression tests (integration tests with kernel compilation) set(CKB_REGRESSION_TESTS - test_ckb_get_instance_string + test_ckb_instance_string test_ckb_build_fwd_instances test_ckb_testing_utils test_ckb_factory_grouped_convolution_forward_convscale diff --git a/experimental/builder/test/test_conv_description.cpp b/experimental/builder/test/test_conv_description.cpp index 20df86efd3..5480c2740a 100644 --- a/experimental/builder/test/test_conv_description.cpp +++ b/experimental/builder/test/test_conv_description.cpp @@ -13,7 +13,7 @@ namespace { namespace ckb = ck_tile::builder; -namespace ckr = ck_tile::reflect::conv; +namespace ckr = ck_tile::reflect; namespace ckt = ck_tile::test; // Defines the signature of the convolution operation to be tested. 
@@ -110,7 +110,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasBriefDescription) static constexpr const ConvSignature SIGNATURE; static constexpr const DefaultAlgorithm ALGORITHM; using Instance = ckb::ConvBuilder::Instance; - EXPECT_THAT(ckr::Describe().brief(), ckt::StringEqWithDiff("2D Forward convolution")); + EXPECT_THAT(ckr::describe().brief(), ckt::StringEqWithDiff("2D Forward convolution")); } TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription) @@ -118,7 +118,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription) static constexpr const ConvSignature SIGNATURE; static constexpr const DefaultAlgorithm ALGORITHM; using Instance = ckb::ConvBuilder::Instance; - EXPECT_THAT(ckr::Describe().detailed(), + EXPECT_THAT(ckr::describe().detailed(), ckt::StringEqWithDiff( // "2D Forward Convolution Kernel\n" "├─ Signature\n" @@ -162,6 +162,23 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription) " └─ Vector access (GMEM write) instruction size: 2")); } +TEST(ConvDescriptionTest, DefaultInstanceHasInstanceString) +{ + static constexpr const ConvSignature SIGNATURE; + static constexpr const DefaultAlgorithm ALGORITHM; + using Instance = ckb::ConvBuilder::Instance; + + // Get the instance string from the description + std::string instance_str = ckr::describe().instance_string(); + + // Verify that the instance string is not empty + EXPECT_FALSE(instance_str.empty()); + + // Verify that it contains the device operation name + // The exact format depends on the InstanceTraits implementation + EXPECT_THAT(instance_str, ::testing::HasSubstr("DeviceGroupedConvFwdMultipleABD")); +} + // NOTE: BackwardDataInstanceHasDetailedDescription test is disabled because ConvFactory // does not have a specialization for backward data convolutions. 
The test fails with: // "implicit instantiation of undefined template 'ck_tile::builder::ConvFactory<...>'" @@ -195,4 +212,5 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription) // EXPECT_THAT(ckr::Describe().detailed(), // ckt::StringEqWithDiff("PLACEHOLDER")); // } + } // namespace diff --git a/experimental/builder/test/test_get_instance_string_bwd_weight_grp_conv_xdl.cpp b/experimental/builder/test/test_get_instance_string_bwd_weight_grp_conv_xdl.cpp deleted file mode 100644 index 74de0903f4..0000000000 --- a/experimental/builder/test/test_get_instance_string_bwd_weight_grp_conv_xdl.cpp +++ /dev/null @@ -1,86 +0,0 @@ -// Copyright (c) Advanced Micro Devices, Inc., or its affiliates. -// SPDX-License-Identifier: MIT - -#include -#include -#include -#include - -// Test GetInstanceString through base class pointer for backward weight XDL variant -TEST(GetInstanceString, ReturnsStringForBwdWeightGrpConvXdlInstance) -{ - // Use the template helper to get a working instance configuration - using InstanceTuple = ck::tensor_operation::device::instance:: - device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances< - 2, // NDimSpatial - ck::tensor_operation::device::instance::GNHWC, // InLayout - ck::tensor_operation::device::instance::GKYXC, // WeiLayout - ck::tensor_operation::device::instance::GNHWK, // OutLayout - ck::tensor_operation::device::instance:: - ConvBwdWeightDefault>; // ConvBwdWeightSpecialization - - // Get the first instance from the tuple - using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type; - - // Define the base class type using the most general operator base - using BaseClass = ck::tensor_operation::device::BaseOperator; - - // Create an instance of the derived class - DeviceInstance device_instance; - - // Get a pointer to the base class - BaseClass* base_ptr = &device_instance; - - // Call GetInstanceString through the base class pointer - std::string instance_str = base_ptr->GetInstanceString(); - - // 
Expected complete instance string based on the first instance from - // device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances - // This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc. - std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle" - "<2" // NDimSpatial - ",GNHWC" // InLayout - ",GKYXC" // WeiLayout - ",GNHWK" // OutLayout - ",fp16" // InDataType - ",fp16" // WeiDataType - ",fp16" // OutDataType - ",fp32" // AccDataType - ",PassThrough" // InElementwiseOperation - ",PassThrough" // WeiElementwiseOperation - ",PassThrough" // OutElementwiseOperation - ",Default" // ConvBackwardWeightSpecialization - ",64" // BlockSize - ",64" // MPerBlock - ",64" // NPerBlock - ",4" // K0PerBlock - ",8" // K1 - ",32" // MPerXDL - ",32" // NPerXDL - ",2" // MXdlPerWave - ",2" // NXdlPerWave - ",Seq(1,4,8,2)" // ABlockTransferThreadClusterLengths_K0_M_K1 - ",Seq(0,3,1,2)" // ABlockTransferThreadClusterArrangeOrder - ",Seq(0,2,1,3)" // ABlockTransferSrcAccessOrder - ",2" // ABlockTransferSrcVectorDim - ",2" // ABlockTransferSrcScalarPerVector - ",4" // ABlockTransferDstScalarPerVector_K1 - ",true" // ABlockLdsAddExtraM - ",Seq(1,4,8,2)" // BBlockTransferThreadClusterLengths_K0_N_K1 - ",Seq(0,3,1,2)" // BBlockTransferThreadClusterArrangeOrder - ",Seq(0,2,1,3)" // BBlockTransferSrcAccessOrder - ",2" // BBlockTransferSrcVectorDim - ",2" // BBlockTransferSrcScalarPerVector - ",4" // BBlockTransferDstScalarPerVector_K1 - ",true" // BBlockLdsAddExtraN - ",1" // CShuffleMXdlPerWavePerShuffle - ",1" // CShuffleNXdlPerWavePerShuffle - ",Seq(1,16,1,4)" // CBlockTransferClusterLengths - ",2" // CBlockTransferScalarPerVector_NWaveNPerXdl - ",fp16" // ComputeTypeA - ",fp16" // ComputeTypeB - ",1" // MaxTransposeTransferSrcScalarPerVector - ",1>"; // MaxTransposeTransferDstScalarPerVector - - EXPECT_EQ(instance_str, expected_str); -} diff --git a/experimental/builder/test/test_get_instance_string_fwd_grp_conv.cpp 
b/experimental/builder/test/test_get_instance_string_fwd_grp_conv.cpp
deleted file mode 100644
index a26f2b0f39..0000000000
--- a/experimental/builder/test/test_get_instance_string_fwd_grp_conv.cpp
+++ /dev/null
@@ -1,90 +0,0 @@
-// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
-// SPDX-License-Identifier: MIT
-
-#include
-#include
-#include
-#include
-
-// Test GetInstanceString through base class pointer for non-V3 variant
-TEST(GetInstanceString, ReturnsStringForFwdGrpConvInstance)
-{
-    // Use the template helper to get a working instance configuration
-    using InstanceTuple =
-        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
-            2,                                                       // NDimSpatial
-            ck::tensor_operation::device::instance::GNHWC,           // ALayout
-            ck::tensor_operation::device::instance::GKYXC,           // BLayout
-            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
-            ck::tensor_operation::device::instance::GNHWK,           // ELayout
-            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization
-
-    // Get the first instance from the tuple
-    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
-
-    // Define the base class type using the most general operator base
-    using BaseClass = ck::tensor_operation::device::BaseOperator;
-
-    // Create an instance of the derived class
-    DeviceInstance device_instance;
-
-    // Get a pointer to the base class
-    BaseClass* base_ptr = &device_instance;
-
-    // Call GetInstanceString through the base class pointer
-    std::string instance_str = base_ptr->GetInstanceString();
-
-    // Expected complete instance string based on the first instance from
-    // device_grouped_conv_fwd_xdl_f16_instances
-    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
-                               "<2"             // NDimSpatial
-                               ",GNHWC"         // ALayout
-                               ",GKYXC"         // BLayout
-                               ",EmptyTuple"    // DsLayout
-                               ",GNHWK"         // ELayout
-                               ",fp16"          // ADataType
-                               ",fp16"          // BDataType
-                               ",fp32"          // AccDataType
-                               ",fp16"          // CShuffleDataType
-                               ",EmptyTuple"    // DsDataType
-                               ",fp16"          // EDataType
-                               ",PassThrough"   // AElementwiseOperation
-                               ",PassThrough"   // BElementwiseOperation
-                               ",PassThrough"   // CDEElementwiseOperation
-                               ",Default"       // ConvForwardSpecialization
-                               ",MNKPadding"    // GemmSpec
-                               ",1"             // NumGemmKPrefetchStage
-                               ",64"            // BlockSize
-                               ",64"            // MPerBlock
-                               ",64"            // NPerBlock
-                               ",32"            // KPerBlock
-                               ",8"             // AK1
-                               ",8"             // BK1
-                               ",32"            // MPerXDL
-                               ",32"            // NPerXDL
-                               ",2"             // MXdlPerWave
-                               ",2"             // NXdlPerWave
-                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
-                               ",2"             // ABlockTransferSrcVectorDim
-                               ",1"             // ABlockTransferSrcScalarPerVector
-                               ",8"             // ABlockTransferDstScalarPerVector_AK1
-                               ",1"             // ABlockLdsExtraM
-                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
-                               ",2"             // BBlockTransferSrcVectorDim
-                               ",1"             // BBlockTransferSrcScalarPerVector
-                               ",8"             // BBlockTransferDstScalarPerVector_BK1
-                               ",1"             // BBlockLdsExtraN
-                               ",1"             // CShuffleMXdlPerWavePerShuffle
-                               ",1"             // CShuffleNXdlPerWavePerShuffle
-                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
-                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
-                               ",fp16"          // AComputeDataType
-                               ",fp16"          // BComputeDataType
-                               ",Default"       // LoopScheduler
-                               ",1>";           // NumGroupsToMerge
-    EXPECT_EQ(instance_str, expected_str);
-}
diff --git a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_dl.cpp b/experimental/builder/test/test_get_instance_string_fwd_grp_conv_dl.cpp
deleted file mode 100644
index 9a1b5cb703..0000000000
--- a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_dl.cpp
+++ /dev/null
@@ -1,85 +0,0 @@
-// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
-// SPDX-License-Identifier: MIT
-
-#include
-#include
-#include
-#include
-
-// Test GetInstanceString through base class pointer for DL variant
-TEST(GetInstanceString, ReturnsStringForFwdGrpConvDlInstance)
-{
-    // Use the template helper to get a working instance configuration
-    using InstanceTuple =
-        ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances<
-            ck::tensor_operation::device::instance::GNHWC,           // InLayout
-            ck::tensor_operation::device::instance::GKYXC,           // WeiLayout
-            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
-            ck::tensor_operation::device::instance::GNHWK,           // OutLayout
-            ck::Tuple<>,                                             // DsDatatype
-            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
-            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec
-
-    // Get the first instance from the tuple
-    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
-
-    // Define the base class type using the most general operator base
-    using BaseClass = ck::tensor_operation::device::BaseOperator;
-
-    // Create an instance of the derived class
-    DeviceInstance device_instance;
-
-    // Get a pointer to the base class
-    BaseClass* base_ptr = &device_instance;
-
-    // Call GetInstanceString through the base class pointer
-    std::string instance_str = base_ptr->GetInstanceString();
-
-    // Expected complete instance string based on the first instance from
-    // device_grouped_conv2d_fwd_dl_f16_instances
-    std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK"
-                               "<2"                // NDimSpatial
-                               ",fp16"             // ADataType
-                               ",fp16"             // BDataType
-                               ",EmptyTuple"       // DsDataType
-                               ",fp16"             // EDataType
-                               ",fp32"             // AccDataType
-                               ",GNHWC"            // ALayout
-                               ",GKYXC"            // BLayout
-                               ",EmptyTuple"       // DsLayout
-                               ",GNHWK"            // ELayout
-                               ",PassThrough"      // AElementwiseOperation
-                               ",PassThrough"      // BElementwiseOperation
-                               ",PassThrough"      // CDEElementwiseOperation
-                               ",Default"          // ConvForwardSpecialization
-                               ",MNKPadding"       // GemmSpec
-                               ",8"                // BlockSize
-                               ",16"               // MPerBlock
-                               ",4"                // NPerBlock
-                               ",2"                // K0PerBlock
-                               ",1"                // K1
-                               ",1"                // M1PerThread
-                               ",2"                // N1PerThread
-                               ",1"                // KPerThread
-                               ",Seq(4,2)"         // M1N1ThreadClusterM1Xs
-                               ",Seq(1,1)"         // M1N1ThreadClusterN1Xs
-                               ",Seq(2,1,2,1)"     // ABlockTransferThreadSliceLengths_K0_M0_M1_K1
-                               ",Seq(1,1,8,1)"     // ABlockTransferThreadClusterLengths_K0_M0_M1_K1
-                               ",Seq(1,2,0,3)"     // ABlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,2,0,3)"     // ABlockTransferSrcAccessOrder
-                               ",Seq(1,1,1,1)"     // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1
-                               ",Seq(1,2,0,3)"     // ABlockTransferSrcVectorTensorContiguousDimOrder
-                               ",Seq(1,1,1,1)"     // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1
-                               ",Seq(1,1,1,1)"     // BBlockTransferThreadSliceLengths_K0_N0_N1_K1
-                               ",Seq(2,1,4,1)"     // BBlockTransferThreadClusterLengths_K0_N0_N1_K1
-                               ",Seq(1,2,0,3)"     // BBlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,2,0,3)"     // BBlockTransferSrcAccessOrder
-                               ",Seq(1,1,1,1)"     // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1
-                               ",Seq(1,2,0,3)"     // BBlockTransferSrcVectorTensorContiguousDimOrder
-                               ",Seq(1,1,1,1)"     // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1
-                               ",Seq(0,1,2,3,4,5)" // CThreadTransferSrcDstAccessOrder
-                               ",5"                // CThreadTransferSrcDstVectorDim
-                               ",1>";              // CThreadTransferDstScalarPerVector
-
-    EXPECT_EQ(instance_str, expected_str);
-}
diff --git a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_large_tensor.cpp b/experimental/builder/test/test_get_instance_string_fwd_grp_conv_large_tensor.cpp
deleted file mode 100644
index bb192b0177..0000000000
--- a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_large_tensor.cpp
+++ /dev/null
@@ -1,89 +0,0 @@
-// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
-// SPDX-License-Identifier: MIT
-
-#include
-#include
-#include
-#include
-
-// Test GetInstanceString through base class pointer for large tensor variant
-TEST(GetInstanceString, ReturnsStringForFwdGrpConvLargeTensorInstance)
-{
-    // Use the template helper to get a working instance configuration
-    using InstanceTuple = ck::tensor_operation::device::instance::
-        device_grouped_conv_fwd_xdl_large_tensor_f16_instances<
-            2,                                                       // NDimSpatial
-            ck::tensor_operation::device::instance::GNHWC,           // ALayout
-            ck::tensor_operation::device::instance::GKYXC,           // BLayout
-            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
-            ck::tensor_operation::device::instance::GNHWK,           // ELayout
-            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization
-
-    // Get the first instance from the tuple
-    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
-
-    // Define the base class type using the most general operator base
-    using BaseClass = ck::tensor_operation::device::BaseOperator;
-
-    // Create an instance of the derived class
-    DeviceInstance device_instance;
-
-    // Get a pointer to the base class
-    BaseClass* base_ptr = &device_instance;
-
-    // Call GetInstanceString through the base class pointer
-    std::string instance_str = base_ptr->GetInstanceString();
-
-    // Expected complete instance string based on the first instance from
-    // device_grouped_conv_fwd_xdl_large_tensor_f16_instances
-    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
-                               "<2"             // NDimSpatial
-                               ",GNHWC"         // ALayout
-                               ",GKYXC"         // BLayout
-                               ",EmptyTuple"    // DsLayout
-                               ",GNHWK"         // ELayout
-                               ",fp16"          // ADataType
-                               ",fp16"          // BDataType
-                               ",fp32"          // AccDataType
-                               ",fp16"          // CShuffleDataType
-                               ",EmptyTuple"    // DsDataType
-                               ",fp16"          // EDataType
-                               ",PassThrough"   // AElementwiseOperation
-                               ",PassThrough"   // BElementwiseOperation
-                               ",PassThrough"   // CDEElementwiseOperation
-                               ",Default"       // ConvForwardSpecialization
-                               ",MNKPadding"    // GemmSpec
-                               ",1"             // NumGemmKPrefetchStage
-                               ",64"            // BlockSize
-                               ",64"            // MPerBlock
-                               ",64"            // NPerBlock
-                               ",32"            // KPerBlock
-                               ",8"             // AK1
-                               ",8"             // BK1
-                               ",32"            // MPerXDL
-                               ",32"            // NPerXDL
-                               ",2"             // MXdlPerWave
-                               ",2"             // NXdlPerWave
-                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
-                               ",2"             // ABlockTransferSrcVectorDim
-                               ",1"             // ABlockTransferSrcScalarPerVector
-                               ",8"             // ABlockTransferDstScalarPerVector_AK1
-                               ",1"             // ABlockLdsExtraM
-                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
-                               ",2"             // BBlockTransferSrcVectorDim
-                               ",1"             // BBlockTransferSrcScalarPerVector
-                               ",8"             // BBlockTransferDstScalarPerVector_BK1
-                               ",1"             // BBlockLdsExtraN
-                               ",1"             // CShuffleMXdlPerWavePerShuffle
-                               ",1"             // CShuffleNXdlPerWavePerShuffle
-                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
-                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
-                               ",fp16"          // AComputeDataType
-                               ",fp16"          // BComputeDataType
-                               ",Default>";     // LoopScheduler
-    EXPECT_EQ(instance_str, expected_str);
-}
diff --git a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_v3.cpp b/experimental/builder/test/test_get_instance_string_fwd_grp_conv_v3.cpp
deleted file mode 100644
index 6e37dccf74..0000000000
--- a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_v3.cpp
+++ /dev/null
@@ -1,91 +0,0 @@
-// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
-// SPDX-License-Identifier: MIT
-
-#include
-#include
-#include
-#include
-
-// Test GetInstanceString through base class pointer for V3 variant
-TEST(GetInstanceString, ReturnsStringForFwdGrpConvV3Instance)
-{
-    // Use the template helper to get a working instance configuration
-    using InstanceTuple =
-        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances<
-            2,                                                       // NDimSpatial
-            ck::tensor_operation::device::instance::GNHWC,           // ALayout
-            ck::tensor_operation::device::instance::GKYXC,           // BLayout
-            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
-            ck::tensor_operation::device::instance::GNHWK,           // ELayout
-            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization
-
-    // Get the first instance from the tuple
-    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
-
-    // Define the base class type using the most general operator base
-    using BaseClass = ck::tensor_operation::device::BaseOperator;
-
-    // Create an instance of the derived class
-    DeviceInstance device_instance;
-
-    // Get a pointer to the base class
-    BaseClass* base_ptr = &device_instance;
-
-    // Call GetInstanceString through the base class pointer
-    std::string instance_str = base_ptr->GetInstanceString();
-
-    // Expected complete instance string based on the first instance from
-    // device_grouped_conv_fwd_xdl_f16_comp_instances This corresponds to the configuration with
-    // BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc.
-    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
-                               "<2"             // NDimSpatial
-                               ",GNHWC"         // ALayout
-                               ",GKYXC"         // BLayout
-                               ",EmptyTuple"    // DsLayout
-                               ",GNHWK"         // ELayout
-                               ",fp16"          // ADataType
-                               ",fp16"          // BDataType
-                               ",fp32"          // AccDataType
-                               ",fp16"          // CShuffleDataType
-                               ",EmptyTuple"    // DsDataType
-                               ",fp16"          // EDataType
-                               ",PassThrough"   // AElementwiseOperation
-                               ",PassThrough"   // BElementwiseOperation
-                               ",PassThrough"   // CDEElementwiseOperation
-                               ",Default"       // ConvForwardSpecialization
-                               ",MNKPadding"    // GemmSpec
-                               ",256"           // BlockSize
-                               ",128"           // MPerBlock
-                               ",128"           // NPerBlock
-                               ",64"            // KPerBlock
-                               ",8"             // AK1
-                               ",8"             // BK1
-                               ",32"            // MPerXDL
-                               ",32"            // NPerXDL
-                               ",2"             // MXdlPerWave
-                               ",2"             // NXdlPerWave
-                               ",Seq(8,32,1)"   // ABlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
-                               ",2"             // ABlockTransferSrcVectorDim
-                               ",8"             // ABlockTransferSrcScalarPerVector
-                               ",8"             // ABlockTransferDstScalarPerVector_AK1
-                               ",0"             // ABlockLdsExtraM
-                               ",Seq(8,32,1)"   // BBlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
-                               ",2"             // BBlockTransferSrcVectorDim
-                               ",8"             // BBlockTransferSrcScalarPerVector
-                               ",8"             // BBlockTransferDstScalarPerVector_BK1
-                               ",0"             // BBlockLdsExtraN
-                               ",1"             // CShuffleMXdlPerWavePerShuffle
-                               ",1"             // CShuffleNXdlPerWavePerShuffle
-                               ",Seq(1,32,1,8)" // CDEBlockTransferClusterLengths
-                               ",8"             // CDEBlockTransferScalarPerVector_NPerBlock
-                               ",Intrawave"     // BlkGemmPipeSched
-                               ",v4"            // BlkGemmPipelineVer
-                               ",fp16"          // AComputeDataType
-                               ",fp16"          // BComputeDataType
-                               ",false>";       // DirectLoad
-    EXPECT_EQ(instance_str, expected_str);
-}
diff --git a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_wmma.cpp b/experimental/builder/test/test_get_instance_string_fwd_grp_conv_wmma.cpp
deleted file mode 100644
index 8e1dc0f790..0000000000
--- a/experimental/builder/test/test_get_instance_string_fwd_grp_conv_wmma.cpp
+++ /dev/null
@@ -1,90 +0,0 @@
-// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
-// SPDX-License-Identifier: MIT
-
-#include
-#include
-#include
-#include
-
-// Test GetInstanceString through base class pointer for Wmma variant
-TEST(GetInstanceString, ReturnsStringForFwdGrpConvWmmaInstance)
-{
-    // Use the template helper to get a working instance configuration
-    using InstanceTuple =
-        ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
-            2,                                                       // NDimSpatial
-            ck::tensor_operation::device::instance::GNHWC,           // ALayout
-            ck::tensor_operation::device::instance::GKYXC,           // BLayout
-            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
-            ck::tensor_operation::device::instance::GNHWK,           // ELayout
-            ck::Tuple<>,                                             // DsDatatype
-            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
-            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec
-
-    // Get the first instance from the tuple
-    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
-
-    // Define the base class type using the most general operator base
-    using BaseClass = ck::tensor_operation::device::BaseOperator;
-
-    // Create an instance of the derived class
-    DeviceInstance device_instance;
-
-    // Get a pointer to the base class
-    BaseClass* base_ptr = &device_instance;
-
-    // Call GetInstanceString through the base class pointer
-    std::string instance_str = base_ptr->GetInstanceString();
-
-    // Expected complete instance string based on the first instance from
-    // device_grouped_conv_fwd_wmma_f16_instances
-    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
-                               "<2"             // NDimSpatial
-                               ",GNHWC"         // ALayout
-                               ",GKYXC"         // BLayout
-                               ",EmptyTuple"    // DsLayout
-                               ",GNHWK"         // ELayout
-                               ",fp16"          // ADataType
-                               ",fp16"          // BDataType
-                               ",fp32"          // AccDataType
-                               ",fp16"          // CShuffleDataType
-                               ",EmptyTuple"    // DsDataType
-                               ",fp16"          // EDataType
-                               ",PassThrough"   // AElementwiseOperation
-                               ",PassThrough"   // BElementwiseOperation
-                               ",PassThrough"   // CDEElementwiseOperation
-                               ",Default"       // ConvForwardSpecialization
-                               ",MNKPadding"    // GemmSpec
-                               ",1"             // NumGemmKPrefetchStage
-                               ",128"           // BlockSize
-                               ",64"            // MPerBlock
-                               ",64"            // NPerBlock
-                               ",32"            // KPerBlock
-                               ",8"             // K1
-                               ",16"            // MPerWmma
-                               ",16"            // NPerWmma
-                               ",2"             // MRepeat
-                               ",2"             // NRepeat
-                               ",Seq(4,32,1)"   // ABlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
-                               ",2"             // ABlockTransferSrcVectorDim
-                               ",1"             // ABlockTransferSrcScalarPerVector
-                               ",8"             // ABlockTransferDstScalarPerVector_AK1
-                               ",true"          // ABlockLdsExtraM
-                               ",Seq(4,32,1)"   // BBlockTransferThreadClusterLengths
-                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
-                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
-                               ",2"             // BBlockTransferSrcVectorDim
-                               ",1"             // BBlockTransferSrcScalarPerVector
-                               ",8"             // BBlockTransferDstScalarPerVector_BK1
-                               ",true"          // BBlockLdsExtraN
-                               ",1"             // CShuffleMRepeatPerShuffle
-                               ",1"             // CShuffleNRepeatPerShuffle
-                               ",Seq(1,32,1,4)" // CDEShuffleBlockTransferClusterLengths
-                               ",1"             // CDEShuffleBlockTransferScalarPerVector_NPerBlock
-                               ",Default"       // LoopSched
-                               ",v1>";          // PipelineVer
-
-    EXPECT_EQ(instance_str, expected_str);
-}
diff --git a/experimental/builder/test/test_instance_string_bwd_weight_grp_conv_xdl.cpp b/experimental/builder/test/test_instance_string_bwd_weight_grp_conv_xdl.cpp
new file mode 100644
index 0000000000..88a57a3735
--- /dev/null
+++ b/experimental/builder/test/test_instance_string_bwd_weight_grp_conv_xdl.cpp
@@ -0,0 +1,90 @@
+// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier: MIT
+
+#include
+#include "ck_tile/builder/reflect/instance_traits.hpp"
+#include "ck_tile/builder/reflect/conv_description.hpp"
+#include "ck/tensor_operation/gpu/device/device_base.hpp"
+#include "ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight/device_grouped_conv_bwd_weight_xdl_instance.hpp"
+
+namespace {
+
+namespace ckr = ck_tile::reflect;
+
+// Use the template helper to get a working instance configuration
+using InstanceTuple = ck::tensor_operation::device::instance::
+    device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances<
+        2,                                             // NDimSpatial
+        ck::tensor_operation::device::instance::GNHWC, // InLayout
+        ck::tensor_operation::device::instance::GKYXC, // WeiLayout
+        ck::tensor_operation::device::instance::GNHWK, // OutLayout
+        ck::tensor_operation::device::instance::
+            ConvBwdWeightDefault>; // ConvBwdWeightSpecialization
+
+// Get the first instance from the tuple
+using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
+
+// Expected complete instance string based on the first instance from
+// device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances
+// This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc.
+std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle"
+                           "<2"             // NDimSpatial
+                           ",GNHWC"         // InLayout
+                           ",GKYXC"         // WeiLayout
+                           ",GNHWK"         // OutLayout
+                           ",fp16"          // InDataType
+                           ",fp16"          // WeiDataType
+                           ",fp16"          // OutDataType
+                           ",fp32"          // AccDataType
+                           ",PassThrough"   // InElementwiseOperation
+                           ",PassThrough"   // WeiElementwiseOperation
+                           ",PassThrough"   // OutElementwiseOperation
+                           ",Default"       // ConvBackwardWeightSpecialization
+                           ",64"            // BlockSize
+                           ",64"            // MPerBlock
+                           ",64"            // NPerBlock
+                           ",4"             // K0PerBlock
+                           ",8"             // K1
+                           ",32"            // MPerXDL
+                           ",32"            // NPerXDL
+                           ",2"             // MXdlPerWave
+                           ",2"             // NXdlPerWave
+                           ",Seq(1,4,8,2)"  // ABlockTransferThreadClusterLengths_K0_M_K1
+                           ",Seq(0,3,1,2)"  // ABlockTransferThreadClusterArrangeOrder
+                           ",Seq(0,2,1,3)"  // ABlockTransferSrcAccessOrder
+                           ",2"             // ABlockTransferSrcVectorDim
+                           ",2"             // ABlockTransferSrcScalarPerVector
+                           ",4"             // ABlockTransferDstScalarPerVector_K1
+                           ",true"          // ABlockLdsAddExtraM
+                           ",Seq(1,4,8,2)"  // BBlockTransferThreadClusterLengths_K0_N_K1
+                           ",Seq(0,3,1,2)"  // BBlockTransferThreadClusterArrangeOrder
+                           ",Seq(0,2,1,3)"  // BBlockTransferSrcAccessOrder
+                           ",2"             // BBlockTransferSrcVectorDim
+                           ",2"             // BBlockTransferSrcScalarPerVector
+                           ",4"             // BBlockTransferDstScalarPerVector_K1
+                           ",true"          // BBlockLdsAddExtraN
+                           ",1"             // CShuffleMXdlPerWavePerShuffle
+                           ",1"             // CShuffleNXdlPerWavePerShuffle
+                           ",Seq(1,16,1,4)" // CBlockTransferClusterLengths
+                           ",2"             // CBlockTransferScalarPerVector_NWaveNPerXdl
+                           ",fp16"          // ComputeTypeA
+                           ",fp16"          // ComputeTypeB
+                           ",1"             // MaxTransposeTransferSrcScalarPerVector
+                           ",1>";           // MaxTransposeTransferDstScalarPerVector
+
+// Test GetInstanceString through base class pointer for backward weight XDL variant
+TEST(InstanceString, GetInstanceStringReturnsCorrectValueForBwdWeightGrpConvXdl)
+{
+    using BaseClass = ck::tensor_operation::device::BaseOperator;
+    DeviceInstance device_instance;
+    BaseClass* base_ptr = &device_instance;
+
+    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
+}
+
+// TODO: Add DescriptionReturnsCorrectValueForBwdWeightGrpConvXdl test once ckr::describe supports
+// backward weight convolution. The backward weight variant uses different layout naming (InLayout,
+// WeiLayout, OutLayout) instead of (ALayout, BLayout, ELayout) and has
+// ConvBackwardWeightSpecialization.
+
+} // namespace
diff --git a/experimental/builder/test/test_instance_string_fwd_grp_conv.cpp b/experimental/builder/test/test_instance_string_fwd_grp_conv.cpp
new file mode 100644
index 0000000000..9929f276a7
--- /dev/null
+++ b/experimental/builder/test/test_instance_string_fwd_grp_conv.cpp
@@ -0,0 +1,95 @@
+// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier: MIT
+
+#include "gtest/gtest.h"
+#include "ck_tile/builder/reflect/instance_traits.hpp"
+#include "ck_tile/builder/reflect/conv_description.hpp"
+#include "ck/tensor_operation/gpu/device/device_base.hpp"
+#include "ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp"
+
+namespace {
+
+namespace ckr = ck_tile::reflect;
+
+// Use the template helper to get a working instance configuration
+using InstanceTuple =
+    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
+        2,                                                       // NDimSpatial
+        ck::tensor_operation::device::instance::GNHWC,           // ALayout
+        ck::tensor_operation::device::instance::GKYXC,           // BLayout
+        ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
+        ck::tensor_operation::device::instance::GNHWK,           // ELayout
+        ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization
+
+// Get the first instance from the tuple
+using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
+
+// Expected complete instance string based on the first instance from
+// device_grouped_conv_fwd_xdl_f16_instances
+std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
+                           "<2"             // NDimSpatial
+                           ",GNHWC"         // ALayout
+                           ",GKYXC"         // BLayout
+                           ",EmptyTuple"    // DsLayout
+                           ",GNHWK"         // ELayout
+                           ",fp16"          // ADataType
+                           ",fp16"          // BDataType
+                           ",fp32"          // AccDataType
+                           ",fp16"          // CShuffleDataType
+                           ",EmptyTuple"    // DsDataType
+                           ",fp16"          // EDataType
+                           ",PassThrough"   // AElementwiseOperation
+                           ",PassThrough"   // BElementwiseOperation
+                           ",PassThrough"   // CDEElementwiseOperation
+                           ",Default"       // ConvForwardSpecialization
+                           ",MNKPadding"    // GemmSpec
+                           ",1"             // NumGemmKPrefetchStage
+                           ",64"            // BlockSize
+                           ",64"            // MPerBlock
+                           ",64"            // NPerBlock
+                           ",32"            // KPerBlock
+                           ",8"             // AK1
+                           ",8"             // BK1
+                           ",32"            // MPerXDL
+                           ",32"            // NPerXDL
+                           ",2"             // MXdlPerWave
+                           ",2"             // NXdlPerWave
+                           ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
+                           ",2"             // ABlockTransferSrcVectorDim
+                           ",1"             // ABlockTransferSrcScalarPerVector
+                           ",8"             // ABlockTransferDstScalarPerVector_AK1
+                           ",1"             // ABlockLdsExtraM
+                           ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
+                           ",2"             // BBlockTransferSrcVectorDim
+                           ",1"             // BBlockTransferSrcScalarPerVector
+                           ",8"             // BBlockTransferDstScalarPerVector_BK1
+                           ",1"             // BBlockLdsExtraN
+                           ",1"             // CShuffleMXdlPerWavePerShuffle
+                           ",1"             // CShuffleNXdlPerWavePerShuffle
+                           ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
+                           ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
+                           ",fp16"          // AComputeDataType
+                           ",fp16"          // BComputeDataType
+                           ",Default"       // LoopScheduler
+                           ",1>";           // NumGroupsToMerge
+
+// Test GetInstanceString through base class pointer for standard XDL variant
+TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConv)
+{
+    using BaseClass = ck::tensor_operation::device::BaseOperator;
+    DeviceInstance device_instance;
+    BaseClass* base_ptr = &device_instance;
+
+    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
+}
+
+TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConv)
+{ + EXPECT_EQ(ckr::describe().instance_string(), expected_str); +} + +} // namespace diff --git a/experimental/builder/test/test_instance_string_fwd_grp_conv_dl.cpp b/experimental/builder/test/test_instance_string_fwd_grp_conv_dl.cpp new file mode 100644 index 0000000000..4f018cca11 --- /dev/null +++ b/experimental/builder/test/test_instance_string_fwd_grp_conv_dl.cpp @@ -0,0 +1,88 @@ +// Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +// SPDX-License-Identifier: MIT + +#include +#include +#include +#include +#include + +namespace { + +namespace ckr = ck_tile::reflect; + +// Use the template helper to get a working instance configuration +using InstanceTuple = + ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances< + ck::tensor_operation::device::instance::GNHWC, // InLayout + ck::tensor_operation::device::instance::GKYXC, // WeiLayout + ck::tensor_operation::device::instance::Empty_Tuple, // DsLayout + ck::tensor_operation::device::instance::GNHWK, // OutLayout + ck::Tuple<>, // DsDatatype + ck::tensor_operation::element_wise::PassThrough, // CDEElementOp + ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec + +// Get the first instance from the tuple +using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type; + +// Expected complete instance string based on the first instance from +// device_grouped_conv2d_fwd_dl_f16_instances +std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK" + "<2" // NDimSpatial + ",fp16" // ADataType + ",fp16" // BDataType + ",EmptyTuple" // DsDataType + ",fp16" // EDataType + ",fp32" // AccDataType + ",GNHWC" // ALayout + ",GKYXC" // BLayout + ",EmptyTuple" // DsLayout + ",GNHWK" // ELayout + ",PassThrough" // AElementwiseOperation + ",PassThrough" // BElementwiseOperation + ",PassThrough" // CDEElementwiseOperation + ",Default" // ConvForwardSpecialization + ",MNKPadding" // GemmSpec + ",8" // BlockSize + ",16" // MPerBlock + ",4" // 
NPerBlock + ",2" // K0PerBlock + ",1" // K1 + ",1" // M1PerThread + ",2" // N1PerThread + ",1" // KPerThread + ",Seq(4,2)" // M1N1ThreadClusterM1Xs + ",Seq(1,1)" // M1N1ThreadClusterN1Xs + ",Seq(2,1,2,1)" // ABlockTransferThreadSliceLengths_K0_M0_M1_K1 + ",Seq(1,1,8,1)" // ABlockTransferThreadClusterLengths_K0_M0_M1_K1 + ",Seq(1,2,0,3)" // ABlockTransferThreadClusterArrangeOrder + ",Seq(1,2,0,3)" // ABlockTransferSrcAccessOrder + ",Seq(1,1,1,1)" // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1 + ",Seq(1,2,0,3)" // ABlockTransferSrcVectorTensorContiguousDimOrder + ",Seq(1,1,1,1)" // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1 + ",Seq(1,1,1,1)" // BBlockTransferThreadSliceLengths_K0_N0_N1_K1 + ",Seq(2,1,4,1)" // BBlockTransferThreadClusterLengths_K0_N0_N1_K1 + ",Seq(1,2,0,3)" // BBlockTransferThreadClusterArrangeOrder + ",Seq(1,2,0,3)" // BBlockTransferSrcAccessOrder + ",Seq(1,1,1,1)" // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1 + ",Seq(1,2,0,3)" // BBlockTransferSrcVectorTensorContiguousDimOrder + ",Seq(1,1,1,1)" // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1 + ",Seq(0,1,2,3,4,5)" // CThreadTransferSrcDstAccessOrder + ",5" // CThreadTransferSrcDstVectorDim + ",1>"; // CThreadTransferDstScalarPerVector + +// Test GetInstanceString through base class pointer for DL variant +TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvDl) +{ + using BaseClass = ck::tensor_operation::device::BaseOperator; + DeviceInstance device_instance; + BaseClass* base_ptr = &device_instance; + + EXPECT_EQ(base_ptr->GetInstanceString(), expected_str); +} + +// TODO: Add DescriptionReturnsCorrectValueForFwdGrpConvDl test once ckr::describe supports DL +// variants The DL variant uses different internal structure (K0PerBlock, K1, M1PerThread, etc.) 
and +// is missing standard members like kKPerBlock, kMPerXDL, kAK1 + +} // namespace diff --git a/experimental/builder/test/test_instance_string_fwd_grp_conv_large_tensor.cpp b/experimental/builder/test/test_instance_string_fwd_grp_conv_large_tensor.cpp new file mode 100644 index 0000000000..aecce25f1d --- /dev/null +++ b/experimental/builder/test/test_instance_string_fwd_grp_conv_large_tensor.cpp @@ -0,0 +1,94 @@ +// Copyright (c) Advanced Micro Devices, Inc., or its affiliates. +// SPDX-License-Identifier: MIT + +#include +#include +#include +#include +#include + +namespace { + +namespace ckr = ck_tile::reflect; + +// Use the template helper to get a working instance configuration +using InstanceTuple = + ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_large_tensor_f16_instances< + 2, // NDimSpatial + ck::tensor_operation::device::instance::GNHWC, // ALayout + ck::tensor_operation::device::instance::GKYXC, // BLayout + ck::tensor_operation::device::instance::Empty_Tuple, // DsLayout + ck::tensor_operation::device::instance::GNHWK, // ELayout + ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization + +// Get the first instance from the tuple +using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type; + +// Expected complete instance string based on the first instance from +// device_grouped_conv_fwd_xdl_large_tensor_f16_instances +std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor" + "<2" // NDimSpatial + ",GNHWC" // ALayout + ",GKYXC" // BLayout + ",EmptyTuple" // DsLayout + ",GNHWK" // ELayout + ",fp16" // ADataType + ",fp16" // BDataType + ",fp32" // AccDataType + ",fp16" // CShuffleDataType + ",EmptyTuple" // DsDataType + ",fp16" // EDataType + ",PassThrough" // AElementwiseOperation + ",PassThrough" // BElementwiseOperation + ",PassThrough" // CDEElementwiseOperation + ",Default" // ConvForwardSpecialization + ",MNKPadding" // GemmSpec + ",1" // 
NumGemmKPrefetchStage + ",64" // BlockSize + ",64" // MPerBlock + ",64" // NPerBlock + ",32" // KPerBlock + ",8" // AK1 + ",8" // BK1 + ",32" // MPerXDL + ",32" // NPerXDL + ",2" // MXdlPerWave + ",2" // NXdlPerWave + ",Seq(4,16,1)" // ABlockTransferThreadClusterLengths + ",Seq(1,0,2)" // ABlockTransferThreadClusterArrangeOrder + ",Seq(1,0,2)" // ABlockTransferSrcAccessOrder + ",2" // ABlockTransferSrcVectorDim + ",1" // ABlockTransferSrcScalarPerVector + ",8" // ABlockTransferDstScalarPerVector_AK1 + ",1" // ABlockLdsExtraM + ",Seq(4,16,1)" // BBlockTransferThreadClusterLengths + ",Seq(1,0,2)" // BBlockTransferThreadClusterArrangeOrder + ",Seq(1,0,2)" // BBlockTransferSrcAccessOrder + ",2" // BBlockTransferSrcVectorDim + ",1" // BBlockTransferSrcScalarPerVector + ",8" // BBlockTransferDstScalarPerVector_BK1 + ",1" // BBlockLdsExtraN + ",1" // CShuffleMXdlPerWavePerShuffle + ",1" // CShuffleNXdlPerWavePerShuffle + ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths + ",1" // CDEBlockTransferScalarPerVector_NPerBlock + ",fp16" // AComputeDataType + ",fp16" // BComputeDataType + ",Default>"; // LoopScheduler + +// Test GetInstanceString through base class pointer for large tensor variant +TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvLargeTensor) +{ + using BaseClass = ck::tensor_operation::device::BaseOperator; + DeviceInstance device_instance; + BaseClass* base_ptr = &device_instance; + + EXPECT_EQ(base_ptr->GetInstanceString(), expected_str); +} + +TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvLargeTensor) +{ + EXPECT_EQ(ckr::describe().instance_string(), expected_str); +} + +} // namespace diff --git a/experimental/builder/test/test_instance_string_fwd_grp_conv_v3.cpp b/experimental/builder/test/test_instance_string_fwd_grp_conv_v3.cpp new file mode 100644 index 0000000000..7eeaec8e25 --- /dev/null +++ b/experimental/builder/test/test_instance_string_fwd_grp_conv_v3.cpp @@ -0,0 +1,96 @@ +// Copyright (c) Advanced Micro 
Devices, Inc., or its affiliates. +// SPDX-License-Identifier: MIT + +#include +#include +#include +#include +#include + +namespace { + +namespace ckr = ck_tile::reflect; + +// Use the template helper to get a working instance configuration +using InstanceTuple = + ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances< + 2, // NDimSpatial + ck::tensor_operation::device::instance::GNHWC, // ALayout + ck::tensor_operation::device::instance::GKYXC, // BLayout + ck::tensor_operation::device::instance::Empty_Tuple, // DsLayout + ck::tensor_operation::device::instance::GNHWK, // ELayout + ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization + +// Get the first instance from the tuple +using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type; + +// Expected complete instance string based on the first instance from +// device_grouped_conv_fwd_xdl_f16_comp_instances This corresponds to the configuration with +// BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc. 
+std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
+                           "<2"             // NDimSpatial
+                           ",GNHWC"         // ALayout
+                           ",GKYXC"         // BLayout
+                           ",EmptyTuple"    // DsLayout
+                           ",GNHWK"         // ELayout
+                           ",fp16"          // ADataType
+                           ",fp16"          // BDataType
+                           ",fp32"          // AccDataType
+                           ",fp16"          // CShuffleDataType
+                           ",EmptyTuple"    // DsDataType
+                           ",fp16"          // EDataType
+                           ",PassThrough"   // AElementwiseOperation
+                           ",PassThrough"   // BElementwiseOperation
+                           ",PassThrough"   // CDEElementwiseOperation
+                           ",Default"       // ConvForwardSpecialization
+                           ",MNKPadding"    // GemmSpec
+                           ",256"           // BlockSize
+                           ",128"           // MPerBlock
+                           ",128"           // NPerBlock
+                           ",64"            // KPerBlock
+                           ",8"             // AK1
+                           ",8"             // BK1
+                           ",32"            // MPerXDL
+                           ",32"            // NPerXDL
+                           ",2"             // MXdlPerWave
+                           ",2"             // NXdlPerWave
+                           ",Seq(8,32,1)"   // ABlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
+                           ",2"             // ABlockTransferSrcVectorDim
+                           ",8"             // ABlockTransferSrcScalarPerVector
+                           ",8"             // ABlockTransferDstScalarPerVector_AK1
+                           ",0"             // ABlockLdsExtraM
+                           ",Seq(8,32,1)"   // BBlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
+                           ",2"             // BBlockTransferSrcVectorDim
+                           ",8"             // BBlockTransferSrcScalarPerVector
+                           ",8"             // BBlockTransferDstScalarPerVector_BK1
+                           ",0"             // BBlockLdsExtraN
+                           ",1"             // CShuffleMXdlPerWavePerShuffle
+                           ",1"             // CShuffleNXdlPerWavePerShuffle
+                           ",Seq(1,32,1,8)" // CDEBlockTransferClusterLengths
+                           ",8"             // CDEBlockTransferScalarPerVector_NPerBlock
+                           ",Intrawave"     // BlkGemmPipeSched
+                           ",v4"            // BlkGemmPipelineVer
+                           ",fp16"          // AComputeDataType
+                           ",fp16"          // BComputeDataType
+                           ",false>";       // DirectLoad
+
+// Test GetInstanceString through base class pointer for V3 variant
+TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvV3)
+{
+    using BaseClass = ck::tensor_operation::device::BaseOperator;
+    DeviceInstance device_instance;
+    BaseClass* base_ptr = &device_instance;
+
+    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
+}
+
+TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvV3)
+{
+    EXPECT_EQ(ckr::describe().instance_string(), expected_str);
+}
+
+} // namespace
diff --git a/experimental/builder/test/test_instance_string_fwd_grp_conv_wmma.cpp b/experimental/builder/test/test_instance_string_fwd_grp_conv_wmma.cpp
new file mode 100644
index 0000000000..717b770c52
--- /dev/null
+++ b/experimental/builder/test/test_instance_string_fwd_grp_conv_wmma.cpp
@@ -0,0 +1,93 @@
+// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier: MIT
+
+#include
+#include
+#include
+#include
+#include
+
+namespace {
+
+namespace ckr = ck_tile::reflect;
+
+// Use the template helper to get a working instance configuration
+using InstanceTuple =
+    ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
+        2,                                                       // NDimSpatial
+        ck::tensor_operation::device::instance::GNHWC,           // ALayout
+        ck::tensor_operation::device::instance::GKYXC,           // BLayout
+        ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
+        ck::tensor_operation::device::instance::GNHWK,           // ELayout
+        ck::Tuple<>,                                             // DsDatatype
+        ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
+        ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec
+
+// Get the first instance from the tuple
+using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;
+
+// Expected complete instance string based on the first instance from
+// device_grouped_conv_fwd_wmma_f16_instances
+std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
+                           "<2"             // NDimSpatial
+                           ",GNHWC"         // ALayout
+                           ",GKYXC"         // BLayout
+                           ",EmptyTuple"    // DsLayout
+                           ",GNHWK"         // ELayout
+                           ",fp16"          // ADataType
+                           ",fp16"          // BDataType
+                           ",fp32"          // AccDataType
+                           ",fp16"          // CShuffleDataType
+                           ",EmptyTuple"    // DsDataType
+                           ",fp16"          // EDataType
+                           ",PassThrough"   // AElementwiseOperation
+                           ",PassThrough"   // BElementwiseOperation
+                           ",PassThrough"   // CDEElementwiseOperation
+                           ",Default"       // ConvForwardSpecialization
+                           ",MNKPadding"    // GemmSpec
+                           ",1"             // NumGemmKPrefetchStage
+                           ",128"           // BlockSize
+                           ",64"            // MPerBlock
+                           ",64"            // NPerBlock
+                           ",32"            // KPerBlock
+                           ",8"             // K1
+                           ",16"            // MPerWmma
+                           ",16"            // NPerWmma
+                           ",2"             // MRepeat
+                           ",2"             // NRepeat
+                           ",Seq(4,32,1)"   // ABlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
+                           ",2"             // ABlockTransferSrcVectorDim
+                           ",1"             // ABlockTransferSrcScalarPerVector
+                           ",8"             // ABlockTransferDstScalarPerVector_AK1
+                           ",true"          // ABlockLdsExtraM
+                           ",Seq(4,32,1)"   // BBlockTransferThreadClusterLengths
+                           ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
+                           ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
+                           ",2"             // BBlockTransferSrcVectorDim
+                           ",1"             // BBlockTransferSrcScalarPerVector
+                           ",8"             // BBlockTransferDstScalarPerVector_BK1
+                           ",true"          // BBlockLdsExtraN
+                           ",1"             // CShuffleMRepeatPerShuffle
+                           ",1"             // CShuffleNRepeatPerShuffle
+                           ",Seq(1,32,1,4)" // CDEShuffleBlockTransferClusterLengths
+                           ",1"             // CDEShuffleBlockTransferScalarPerVector_NPerBlock
+                           ",Default"       // LoopSched
+                           ",v1>";          // PipelineVer
+
+// Test GetInstanceString through base class pointer for WMMA variant
+TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvWmma)
+{
+    using BaseClass = ck::tensor_operation::device::BaseOperator;
+    DeviceInstance device_instance;
+    BaseClass* base_ptr = &device_instance;
+
+    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
+}
+
+// TODO: Add a DescriptionReturnsCorrectValueForFwdGrpConvWmma test once ckr::describe supports
+// WMMA variants. The WMMA variant uses WMMA-specific parameters (MPerWmma, NPerWmma, MRepeat,
+// NRepeat) and has a different tile transfer structure incompatible with the current ConvTraits.
+
+} // namespace