mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
[CK_BUILDER] Add Description::instance_string() method and update tests (#3340)
* Create Description::instance_string() function

  To expose more reflection capabilities in MIOpen, we add the instance_string functionality to the ckr::Description class. This PR introduces a base class, adds the instance_string method, and implements the method by injecting the Traits::instance_string method through the ConvDescription constructor. This will enable us to replace the specialized get_instance_string() method on device operations with a describe() method in a subsequent PR.

* Test describe().instance_string()

  Update the instance string tests to also call `ckr::describe<Instance>().instance_string()`. This documents that the XDL kernels are supported with describe(), but WMMA and DL kernels are not yet supported. Also update the namespace and add a HasConvTraits concept.
@@ -10,19 +10,21 @@ The reflection system works by extracting properties from a convolution kernel *

 1. **Trait Extraction**: The `ConvTraits` template (in `conv_traits.hpp`) is specialized for each kernel instance. It extracts low-level details like tile sizes, data layouts, and pipeline versions from the kernel's type definition.

-2. **Description Generation**: The `Describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` struct.
+2. **Description Generation**: The `describe<Instance>()` function (in `conv_description.hpp`) uses `ConvTraits` to populate a `ConvDescription` (`Description`) object.

-3. **Formatting**: The `ConvDescription` struct contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.
+3. **Formatting**: The `ConvDescription` class (which implements `Description`) contains methods like `brief()` and `detailed()` that format the extracted properties into well-structured strings for display.

 ## Key Files

-- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `Describe()` factory function.
+- **`description.hpp`**: The generalized `Description` base class with no implementation.
+- **`conv_description.hpp`**: The main entry point. Contains the `ConvDescription` struct and the `describe()` factory function.
 - **`conv_traits.hpp`**: Home of the `ConvTraits` template, which is the core of the property extraction mechanism.
 - **`tree_formatter.hpp`**: A simple utility for generating the indented, tree-like format used in the `detailed()` description.

 ## Usage

-To get a description of a convolution kernel instance, use the `Describe` function and call one of its formatting methods:
+To get a description of a convolution kernel instance, use the `describe` function and call one of its formatting methods:

 ```cpp
 #include "ck_tile/builder/reflect/conv_description.hpp"
@@ -36,3 +38,43 @@ const auto description = ck_tile::reflect::conv::Describe<MyConvFwdInstance>();

 // Print the detailed description
 std::cout << description.detailed() << std::endl;
 ```

+## Appendix: Current Limitations
+
+### Supported Instance Types
+
+The reflection system (`ckr::describe`) currently supports the following convolution instance types:
+
+- **Standard XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle`)
+- **Large Tensor XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor`)
+- **V3 XDL Forward Convolution** (`DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`)
+
+These variants all share similar template parameter structures and are compatible with the current `ConvTraits` implementation.
+
+### Unsupported Instance Types
+
+The following instance types are **not yet supported** by the reflection system:
+
+- **DL (pre-XDL) Variants** (`DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK`)
+  - Uses a different internal structure with parameters like `K0PerBlock`, `K1`, `M1PerThread`, etc.
+  - Missing standard members like `kKPerBlock`, `kMPerXDL`, `kAK1`
+
+- **WMMA Variants** (`DeviceGroupedConvFwdMultipleD_Wmma_CShuffle`)
+  - Uses WMMA-specific parameters like `MPerWmma`, `NPerWmma`, `MRepeat`, `NRepeat`
+  - Different tile transfer structure incompatible with the current `ConvTraits`
+
+- **Backward Weight Convolution** (`DeviceGroupedConvBwdWeight_Xdl_CShuffle`)
+  - Uses different layout naming: `InLayout`, `WeiLayout`, `OutLayout` instead of `ALayout`, `BLayout`, `ELayout`
+  - Different specialization type: `ConvBackwardWeightSpecialization` vs `ConvForwardSpecialization`
+  - Missing several members expected by the forward convolution traits
+
+### Future Work
+
+To support these additional instance types, the reflection system would need:
+
+1. Specialized `ConvTraits` templates for each variant type
+2. Updated `conv_layout`, `conv_data_type`, and other helper functions to handle different parameter structures
+3. Conditional compilation or SFINAE techniques to select the appropriate trait-extraction logic based on instance type
+4. Customized `ConvDescription` methods for more general kernels
+
+For now, these unsupported types can still use `GetInstanceString()` through the base class pointer, but cannot use the `ckr::describe` reflection API.
@@ -1,21 +1,19 @@
 // Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
 // SPDX-License-Identifier: MIT

-/**
- * @file
- * @brief Provides utilities to reflect on convolution kernel instances and generate
- * human-readable descriptions of their configuration.
- *
- * This file contains the necessary components to transform a convolution kernel's
- * compile-time properties into a structured, descriptive format. This is primarily
- * used for debugging, logging, and generating documentation.
- *
- * Key components:
- * - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted
- *   properties and provides methods to format them into strings.
- * - ck_tile::reflect::conv::Describe(): A factory function that creates a
- *   ConvDescription from a given kernel instance type.
- */
+/// @file
+/// @brief Provides utilities to reflect on convolution kernel instances and generate
+/// human-readable descriptions of their configuration.
+///
+/// This file contains the necessary components to transform a convolution kernel's
+/// compile-time properties into a structured, descriptive format. This is primarily
+/// used for debugging, logging, and generating documentation.
+///
+/// Key components:
+/// - ck_tile::reflect::conv::ConvDescription: A struct that holds the extracted
+///   properties and provides methods to format them into strings.
+/// - ck_tile::reflect::conv::Describe(): A factory function that creates a
+///   ConvDescription from a given kernel instance type.

 #pragma once

@@ -24,14 +22,17 @@
 #include <sstream>
 #include <type_traits>
 #include <variant>
+#include <functional>

 #include <ck_tile/builder/conv_signature_concepts.hpp>
 #include <ck_tile/builder/reflect/conv_traits.hpp>
+#include <ck_tile/builder/reflect/description.hpp>
 #include <ck_tile/builder/reflect/instance_traits.hpp>
 #include <ck_tile/builder/reflect/tree_formatter.hpp>

-namespace ck_tile::reflect::conv {
+/// @brief Provides human-readable descriptions of convolution kernel instances
+namespace ck_tile::reflect {
+
+namespace conv {

 /// @brief Signature information for a convolution operation
 /// Contains high-level properties that define the convolution's interface,
@@ -71,56 +72,68 @@ struct GemmAlgorithmInfo

 /// @brief Provides human-readable descriptions of convolution kernel instances
 /// Generates formatted text descriptions at various levels of detail for
 /// understanding and documenting convolution kernel configurations.
-struct ConvDescription
+class ConvDescription : public Description
 {
-    ConvSignatureInfo signature;
-    GemmAlgorithmInfo algorithm;
+public:
+    /// @brief Constructor for ConvDescription
+    /// @param sig The signature information containing high-level convolution properties
+    /// @param algo The algorithm configuration containing low-level implementation details
+    /// @param instance_string_getter A callable that returns a string representation of the
+    /// instance
+    ConvDescription(ConvSignatureInfo sig,
+                    GemmAlgorithmInfo algo,
+                    std::function<std::string()> instance_string_getter)
+        : signature_(std::move(sig)),
+          algorithm_(std::move(algo)),
+          instance_string_getter_(std::move(instance_string_getter))
+    {
+    }

     /// @brief Generate a brief one-line summary of the convolution
     /// @return A concise description (e.g., "2D Forward convolution")
-    std::string brief() const
+    std::string brief() const override
     {
         std::ostringstream oss;
-        oss << signature.spatial_dim << "D " << signature.direction << " convolution";
+        oss << signature_.spatial_dim << "D " << signature_.direction << " convolution";
         return oss.str();
     }

     /// @brief Generate a detailed hierarchical description of the convolution
     /// @return A multi-line tree-formatted description covering signature and algorithm details
-    std::string detailed() const
+    std::string detailed() const override
     {
         TreeFormatter f;
-        f.writeLine(0, signature.spatial_dim, "D ", signature.direction, " Convolution Kernel");
+        f.writeLine(0, signature_.spatial_dim, "D ", signature_.direction, " Convolution Kernel");
         f.writeLine(1, "Signature");
-        f.writeLine(2, "Tensor Type: ", signature.data_type);
-        f.writeLine(2, "Memory Layout: ", signature.layout);
-        f.writeLine(2, "Input elementwise operation: ", signature.input_element_op);
-        f.writeLine(2, "Weights elementwise operation: ", signature.weight_element_op);
-        f.writeLast(2, "Output elementwise operation: ", signature.output_element_op);
+        f.writeLine(2, "Tensor Type: ", signature_.data_type);
+        f.writeLine(2, "Memory Layout: ", signature_.layout);
+        f.writeLine(2, "Input elementwise operation: ", signature_.input_element_op);
+        f.writeLine(2, "Weights elementwise operation: ", signature_.weight_element_op);
+        f.writeLast(2, "Output elementwise operation: ", signature_.output_element_op);

         f.writeLast(1, "Algorithm");
         // Compute Block section
-        f.writeLine(2, "Thread block size: ", algorithm.thread_block_size);
+        f.writeLine(2, "Thread block size: ", algorithm_.thread_block_size);
         f.writeLine(2,
                     "Data tile size: ",
-                    algorithm.tile_dims.m,
+                    algorithm_.tile_dims.m,
                     "×",
-                    algorithm.tile_dims.n,
+                    algorithm_.tile_dims.n,
                     "×",
-                    algorithm.tile_dims.k);
-        f.writeLine(2, "Gemm padding: ", algorithm.padding);
-        f.writeLine(2, "Convolution specialization: ", algorithm.conv_specialization);
+                    algorithm_.tile_dims.k);
+        f.writeLine(2, "Gemm padding: ", algorithm_.padding);
+        f.writeLine(2, "Convolution specialization: ", algorithm_.conv_specialization);
         // Pipeline section
-        f.writeLine(2, "Pipeline version: ", algorithm.pipeline_version);
-        f.writeLine(2, "Pipeline scheduler: ", algorithm.pipeline_scheduler);
+        f.writeLine(2, "Pipeline version: ", algorithm_.pipeline_version);
+        f.writeLine(2, "Pipeline scheduler: ", algorithm_.pipeline_scheduler);
         f.writeLine(2, "Warp Gemm parameters: ");
         f.writeLine(
-            3, "subtile size: ", algorithm.warp_gemm.gemm_m, "×", algorithm.warp_gemm.gemm_n);
+            3, "subtile size: ", algorithm_.warp_gemm.gemm_m, "×", algorithm_.warp_gemm.gemm_n);
         f.writeLast(3,
                     "Number of warp gemm iterations: ",
-                    algorithm.warp_gemm.m_iter,
+                    algorithm_.warp_gemm.m_iter,
                     "×",
-                    algorithm.warp_gemm.n_iter);
+                    algorithm_.warp_gemm.n_iter);

         // Memory Access section
         f.writeLast(2, "Memory access:");
@@ -128,152 +141,148 @@ struct ConvDescription
         f.writeLine(3, "A Tile transfer: ");
         f.writeLine(4,
                     "Tile dimensions: ",
-                    algorithm.a_tile_transfer.tile_dimensions.k0,
+                    algorithm_.a_tile_transfer.tile_dimensions.k0,
                     "×",
-                    algorithm.a_tile_transfer.tile_dimensions.m_or_n,
+                    algorithm_.a_tile_transfer.tile_dimensions.m_or_n,
                     "×",
-                    algorithm.a_tile_transfer.tile_dimensions.k1,
+                    algorithm_.a_tile_transfer.tile_dimensions.k1,
                     "×");
-        f.writeLine(
-            4, "The innermost K subdimension size: ", algorithm.a_tile_transfer.transfer_params.k1);
+        f.writeLine(4,
+                    "The innermost K subdimension size: ",
+                    algorithm_.a_tile_transfer.transfer_params.k1);
         f.writeLine(4,
                     "Spatial thread distribution over the data tile: ",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[0],
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[0],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[1],
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[1],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.thread_cluster_order[2]);
+                    algorithm_.a_tile_transfer.transfer_params.thread_cluster_order[2]);
         f.writeLine(4,
                     "The order of accessing data tile axes: ",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[0],
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[0],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[1],
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[1],
                     "×",
-                    algorithm.a_tile_transfer.transfer_params.src_access_order[2]);
+                    algorithm_.a_tile_transfer.transfer_params.src_access_order[2]);
         f.writeLine(4,
                     "Vectorized memory access axis index (with contiguous memory): ",
-                    algorithm.a_tile_transfer.transfer_params.src_vector_dim);
+                    algorithm_.a_tile_transfer.transfer_params.src_vector_dim);
         f.writeLine(4,
                     "Vector access (GMEM read) instruction size: ",
-                    algorithm.a_tile_transfer.transfer_params.src_scalar_per_vector);
+                    algorithm_.a_tile_transfer.transfer_params.src_scalar_per_vector);
         f.writeLine(4,
                     "Vector access (LDS write) instruction size: ",
-                    algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
         f.writeLast(4,
                     "LDS data layout padding (to prevent bank conflicts): ",
-                    algorithm.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.a_tile_transfer.transfer_params.dst_scalar_per_vector_k1);

         f.writeLine(3, "B Tile transfer: ");
         f.writeLine(4,
                     "Tile dimensions: ",
-                    algorithm.b_tile_transfer.tile_dimensions.k0,
+                    algorithm_.b_tile_transfer.tile_dimensions.k0,
                     "×",
-                    algorithm.b_tile_transfer.tile_dimensions.m_or_n,
+                    algorithm_.b_tile_transfer.tile_dimensions.m_or_n,
                     "×",
-                    algorithm.b_tile_transfer.tile_dimensions.k1,
+                    algorithm_.b_tile_transfer.tile_dimensions.k1,
                     "×");
-        f.writeLine(
-            4, "The innermost K subdimension size: ", algorithm.b_tile_transfer.transfer_params.k1);
+        f.writeLine(4,
+                    "The innermost K subdimension size: ",
+                    algorithm_.b_tile_transfer.transfer_params.k1);
         f.writeLine(4,
                     "Spatial thread distribution over the data tile: ",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[0],
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[0],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[1],
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[1],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.thread_cluster_order[2]);
+                    algorithm_.b_tile_transfer.transfer_params.thread_cluster_order[2]);
         f.writeLine(4,
                     "The order of accessing data tile axes: ",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[0],
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[0],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[1],
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[1],
                     "×",
-                    algorithm.b_tile_transfer.transfer_params.src_access_order[2]);
+                    algorithm_.b_tile_transfer.transfer_params.src_access_order[2]);
         f.writeLine(4,
                     "Vectorized memory access axis index (with contiguous memory): ",
-                    algorithm.b_tile_transfer.transfer_params.src_vector_dim);
+                    algorithm_.b_tile_transfer.transfer_params.src_vector_dim);
         f.writeLine(4,
                     "Vector access (GMEM read) instruction size: ",
-                    algorithm.b_tile_transfer.transfer_params.src_scalar_per_vector);
+                    algorithm_.b_tile_transfer.transfer_params.src_scalar_per_vector);
         f.writeLine(4,
                     "Vector access (LDS write) instruction size: ",
-                    algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
         f.writeLast(4,
                     "LDS data layout padding (to prevent bank conflicts): ",
-                    algorithm.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);
+                    algorithm_.b_tile_transfer.transfer_params.dst_scalar_per_vector_k1);

         f.writeLast(3, "C Tile transfer: ");
         f.writeLine(4,
                     "Data shuffle (number of gemm instructions per iteration): ",
-                    algorithm.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
+                    algorithm_.c_tile_transfer.shuffle_params.m_gemms_per_shuffle,
                     "×",
-                    algorithm.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
+                    algorithm_.c_tile_transfer.shuffle_params.n_gemms_per_shuffle);
         f.writeLine(4,
                     "Spatial thread distribution used to store data: ",
-                    algorithm.c_tile_transfer.thread_cluster_dims[0],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[0],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[1],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[1],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[2],
+                    algorithm_.c_tile_transfer.thread_cluster_dims[2],
                     "×",
-                    algorithm.c_tile_transfer.thread_cluster_dims[3]);
+                    algorithm_.c_tile_transfer.thread_cluster_dims[3]);
         f.writeLast(4,
                     "Vector access (GMEM write) instruction size: ",
-                    algorithm.c_tile_transfer.scalar_per_vector);
+                    algorithm_.c_tile_transfer.scalar_per_vector);
         return f.getString();
     }

     /// @brief Generate an educational explanation of optimization choices
     /// @return Educational content explaining why certain algorithm choices were made
     /// @note Currently unimplemented - reserved for future enhancement
     std::string explain() const
     {
         std::ostringstream oss;
         // Placeholder for future implementation
         return oss.str();
     }
+    /// @brief Generate a string representation of the instance
+    /// @return A string that represents the instance
+    std::string instance_string() const override { return instance_string_getter_(); }

     /// @brief Generate performance characteristics and use case guidance
     /// @return Guidance on when this configuration is optimal and expected performance
     /// @note Currently unimplemented - reserved for future enhancement
     std::string suggest() const
     {
         std::ostringstream oss;
         // Placeholder for future implementation
         return oss.str();
     }
+private:
+    ConvSignatureInfo signature_;
+    GemmAlgorithmInfo algorithm_;
+    std::function<std::string()> instance_string_getter_;
 };
+} // namespace conv

-/// @brief Helper concept to detect if a type has InstanceTraits specialization
+/// @brief Helper concept to detect if a type has ConvTraits specialization
 template <typename T>
-concept HasInstanceTraits = requires { typename InstanceTraits<T>; };
+concept HasConvTraits = requires { typename conv::ConvTraits<T>; };

 /// @brief Factory function to create ConvDescription from a convolution instance type
 /// @tparam Instance The convolution instance type (must have InstanceTraits specialization)
 /// @return A ConvDescription object populated with the instance's configuration details
-template <typename Instance>
-    requires HasInstanceTraits<Instance>
-ConvDescription Describe()
+template <HasConvTraits Instance>
+conv::ConvDescription describe()
 {
-    using Traits = ConvTraits<Instance>;
+    using Traits = conv::ConvTraits<Instance>;

-    return ConvDescription{
-        .signature = ConvSignatureInfo{.spatial_dim = Traits::spatial_dim,
-                                       .direction = Traits::direction,
-                                       .layout = Traits::layout,
-                                       .data_type = Traits::data_type,
-                                       .input_element_op = Traits::input_element_op,
-                                       .weight_element_op = Traits::weight_element_op,
-                                       .output_element_op = Traits::output_element_op},
-        .algorithm = GemmAlgorithmInfo{.thread_block_size = Traits::thread_block_size,
-                                       .tile_dims = Traits::tile_dims,
-                                       .warp_gemm = Traits::warp_gemm,
-                                       .a_tile_transfer = Traits::a_tile_transfer,
-                                       .b_tile_transfer = Traits::b_tile_transfer,
-                                       .c_tile_transfer = Traits::c_tile_transfer,
-                                       .pipeline_version = Traits::pipeline_version,
-                                       .pipeline_scheduler = Traits::pipeline_scheduler,
-                                       .conv_specialization = Traits::conv_specialization,
-                                       .padding = Traits::gemm_padding}};
+    return conv::ConvDescription(
+        conv::ConvSignatureInfo{
+            .spatial_dim = Traits::spatial_dim,
+            .direction = Traits::direction,
+            .layout = Traits::layout,
+            .data_type = Traits::data_type,
+            .input_element_op = Traits::input_element_op,
+            .weight_element_op = Traits::weight_element_op,
+            .output_element_op = Traits::output_element_op,
+        },
+        conv::GemmAlgorithmInfo{
+            .thread_block_size = Traits::thread_block_size,
+            .tile_dims = Traits::tile_dims,
+            .warp_gemm = Traits::warp_gemm,
+            .a_tile_transfer = Traits::a_tile_transfer,
+            .b_tile_transfer = Traits::b_tile_transfer,
+            .c_tile_transfer = Traits::c_tile_transfer,
+            .pipeline_version = Traits::pipeline_version,
+            .pipeline_scheduler = Traits::pipeline_scheduler,
+            .conv_specialization = Traits::conv_specialization,
+            .padding = Traits::gemm_padding,
+        },
+        []() { return reflect::instance_string<Instance>(); });
 }

-} // namespace ck_tile::reflect::conv
+} // namespace ck_tile::reflect
@@ -551,8 +551,7 @@ struct ConvTraits;
 /// @details This is the primary specialization used to extract a comprehensive
 /// set of traits directly from a fully-formed device kernel `Instance` type.
 /// It uses `InstanceTraits` to access the kernel's template parameters.
-template <typename Instance>
-    requires requires { typename InstanceTraits<Instance>; }
+template <HasInstanceTraits Instance>
 struct ConvTraits<Instance>
 {
     using InstTraits = InstanceTraits<Instance>;
@@ -0,0 +1,39 @@
+// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
+// SPDX-License-Identifier: MIT
+
+/// @file
+/// @brief Provides a base class for generating human-readable descriptions of kernel instances.
+///
+/// This file contains the Description base class that defines a common interface for
+/// all descriptor types. Derived classes implement specific formatting and explanation
+/// logic for different kernel types (e.g., convolution, GEMM, etc.).
+
+#pragma once
+
+#include <string>
+
+namespace ck_tile::reflect {
+
+/// @brief Base class for generating human-readable descriptions of kernel instances
+/// Defines a common interface for all descriptor types with methods for generating
+/// descriptions at various levels of detail.
+class Description
+{
+public:
+    /// @brief Virtual destructor for proper cleanup of derived classes
+    virtual ~Description() = default;
+
+    /// @brief Generate a brief one-line summary
+    /// @return A concise description of the kernel configuration
+    virtual std::string brief() const = 0;
+
+    /// @brief Generate a detailed hierarchical description
+    /// @return A multi-line tree-formatted description covering all configuration details
+    virtual std::string detailed() const = 0;
+
+    /// @brief Generate a string representation of the instance
+    /// @return A string that represents the instance
+    virtual std::string instance_string() const = 0;
+};
+
+} // namespace ck_tile::reflect
@@ -103,18 +103,18 @@ add_ck_builder_test(test_ckb_conv_builder
 # compilable code. They are more expensive but catch real-world issues.

-# Verifies that GetInstanceString() methods produce valid kernel code.
+# Verifies that GetInstanceString() methods and other functions produce valid kernel code.
 # Tests various convolution types:
 # - Group convolution (v3, standard, large tensor, WMMA, DL variants)
 # - Backward weight group convolution (XDL)
-# Requires kernel compilation to validate the generated strings.
-add_ck_builder_test(test_ckb_get_instance_string
-    test_get_instance_string_fwd_grp_conv_v3.cpp
-    test_get_instance_string_fwd_grp_conv.cpp
-    test_get_instance_string_fwd_grp_conv_large_tensor.cpp
-    test_get_instance_string_fwd_grp_conv_wmma.cpp
-    test_get_instance_string_fwd_grp_conv_dl.cpp
-    test_get_instance_string_bwd_weight_grp_conv_xdl.cpp)
+# Requires kernel compilation to validate the generated strings through the base class.
+add_ck_builder_test(test_ckb_instance_string
+    test_instance_string_fwd_grp_conv_v3.cpp
+    test_instance_string_fwd_grp_conv.cpp
+    test_instance_string_fwd_grp_conv_large_tensor.cpp
+    test_instance_string_fwd_grp_conv_wmma.cpp
+    test_instance_string_fwd_grp_conv_dl.cpp
+    test_instance_string_bwd_weight_grp_conv_xdl.cpp)

 # Tests the forward convolution builder across multiple data types and dimensions.
 # Individual tests are split into separate files to enable parallel compilation.
@@ -183,7 +183,7 @@ endforeach()

 # Register all regression tests (integration tests with kernel compilation)
 set(CKB_REGRESSION_TESTS
-    test_ckb_get_instance_string
+    test_ckb_instance_string
     test_ckb_build_fwd_instances
     test_ckb_testing_utils
     test_ckb_factory_grouped_convolution_forward_convscale
@@ -13,7 +13,7 @@
 namespace {

 namespace ckb = ck_tile::builder;
-namespace ckr = ck_tile::reflect::conv;
+namespace ckr = ck_tile::reflect;
 namespace ckt = ck_tile::test;

 // Defines the signature of the convolution operation to be tested.
@@ -110,7 +110,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasBriefDescription)
     static constexpr const ConvSignature SIGNATURE;
     static constexpr const DefaultAlgorithm ALGORITHM;
     using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
-    EXPECT_THAT(ckr::Describe<Instance>().brief(), ckt::StringEqWithDiff("2D Forward convolution"));
+    EXPECT_THAT(ckr::describe<Instance>().brief(), ckt::StringEqWithDiff("2D Forward convolution"));
 }

 TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
@@ -118,7 +118,7 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
     static constexpr const ConvSignature SIGNATURE;
     static constexpr const DefaultAlgorithm ALGORITHM;
     using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
-    EXPECT_THAT(ckr::Describe<Instance>().detailed(),
+    EXPECT_THAT(ckr::describe<Instance>().detailed(),
                 ckt::StringEqWithDiff( //
                     "2D Forward Convolution Kernel\n"
                     "├─ Signature\n"
@@ -162,6 +162,23 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
                     " └─ Vector access (GMEM write) instruction size: 2"));
 }

+TEST(ConvDescriptionTest, DefaultInstanceHasInstanceString)
+{
+    static constexpr const ConvSignature SIGNATURE;
+    static constexpr const DefaultAlgorithm ALGORITHM;
+    using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
+
+    // Get the instance string from the description
+    std::string instance_str = ckr::describe<Instance>().instance_string();
+
+    // Verify that the instance string is not empty
+    EXPECT_FALSE(instance_str.empty());
+
+    // Verify that it contains the device operation name
+    // The exact format depends on the InstanceTraits implementation
+    EXPECT_THAT(instance_str, ::testing::HasSubstr("DeviceGroupedConvFwdMultipleABD"));
+}
+
 // NOTE: BackwardDataInstanceHasDetailedDescription test is disabled because ConvFactory
 // does not have a specialization for backward data convolutions. The test fails with:
 // "implicit instantiation of undefined template 'ck_tile::builder::ConvFactory<...>'"
@@ -195,4 +212,5 @@ TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
 // EXPECT_THAT(ckr::Describe<Builder>().detailed(),
 //             ckt::StringEqWithDiff("PLACEHOLDER"));
 // }
+
 } // namespace
@@ -1,86 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight/device_grouped_conv_bwd_weight_xdl_instance.hpp>

// Test GetInstanceString through base class pointer for backward weight XDL variant
TEST(GetInstanceString, ReturnsStringForBwdWeightGrpConvXdlInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple = ck::tensor_operation::device::instance::
        device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances<
            2,                                             // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC, // InLayout
            ck::tensor_operation::device::instance::GKYXC, // WeiLayout
            ck::tensor_operation::device::instance::GNHWK, // OutLayout
            ck::tensor_operation::device::instance::
                ConvBwdWeightDefault>; // ConvBwdWeightSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances
    // This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc.
    std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // InLayout
                               ",GKYXC"         // WeiLayout
                               ",GNHWK"         // OutLayout
                               ",fp16"          // InDataType
                               ",fp16"          // WeiDataType
                               ",fp16"          // OutDataType
                               ",fp32"          // AccDataType
                               ",PassThrough"   // InElementwiseOperation
                               ",PassThrough"   // WeiElementwiseOperation
                               ",PassThrough"   // OutElementwiseOperation
                               ",Default"       // ConvBackwardWeightSpecialization
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",4"             // K0PerBlock
                               ",8"             // K1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(1,4,8,2)"  // ABlockTransferThreadClusterLengths_K0_M_K1
                               ",Seq(0,3,1,2)"  // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(0,2,1,3)"  // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",2"             // ABlockTransferSrcScalarPerVector
                               ",4"             // ABlockTransferDstScalarPerVector_K1
                               ",true"          // ABlockLdsAddExtraM
                               ",Seq(1,4,8,2)"  // BBlockTransferThreadClusterLengths_K0_N_K1
                               ",Seq(0,3,1,2)"  // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(0,2,1,3)"  // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",2"             // BBlockTransferSrcScalarPerVector
                               ",4"             // BBlockTransferDstScalarPerVector_K1
                               ",true"          // BBlockLdsAddExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CBlockTransferClusterLengths
                               ",2"             // CBlockTransferScalarPerVector_NWaveNPerXdl
                               ",fp16"          // ComputeTypeA
                               ",fp16"          // ComputeTypeB
                               ",1"             // MaxTransposeTransferSrcScalarPerVector
                               ",1>";           // MaxTransposeTransferDstScalarPerVector

    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,90 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp>

// Test GetInstanceString through base class pointer for non-V3 variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",1"             // ABlockLdsExtraM
                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",1"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",Default"       // LoopScheduler
                               ",1>";           // NumGroupsToMerge
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,85 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp>

// Test GetInstanceString through base class pointer for DL variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvDlInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances<
            ck::tensor_operation::device::instance::GNHWC,           // InLayout
            ck::tensor_operation::device::instance::GKYXC,           // WeiLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // OutLayout
            ck::Tuple<>,                                             // DsDatatype
            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv2d_fwd_dl_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK"
                               "<2"                // NDimSpatial
                               ",fp16"             // ADataType
                               ",fp16"             // BDataType
                               ",EmptyTuple"       // DsDataType
                               ",fp16"             // EDataType
                               ",fp32"             // AccDataType
                               ",GNHWC"            // ALayout
                               ",GKYXC"            // BLayout
                               ",EmptyTuple"       // DsLayout
                               ",GNHWK"            // ELayout
                               ",PassThrough"      // AElementwiseOperation
                               ",PassThrough"      // BElementwiseOperation
                               ",PassThrough"      // CDEElementwiseOperation
                               ",Default"          // ConvForwardSpecialization
                               ",MNKPadding"       // GemmSpec
                               ",8"                // BlockSize
                               ",16"               // MPerBlock
                               ",4"                // NPerBlock
                               ",2"                // K0PerBlock
                               ",1"                // K1
                               ",1"                // M1PerThread
                               ",2"                // N1PerThread
                               ",1"                // KPerThread
                               ",Seq(4,2)"         // M1N1ThreadClusterM1Xs
                               ",Seq(1,1)"         // M1N1ThreadClusterN1Xs
                               ",Seq(2,1,2,1)"     // ABlockTransferThreadSliceLengths_K0_M0_M1_K1
                               ",Seq(1,1,8,1)"     // ABlockTransferThreadClusterLengths_K0_M0_M1_K1
                               ",Seq(1,2,0,3)"     // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,2,0,3)"     // ABlockTransferSrcAccessOrder
                               ",Seq(1,1,1,1)"     // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1
                               ",Seq(1,2,0,3)"     // ABlockTransferSrcVectorTensorContiguousDimOrder
                               ",Seq(1,1,1,1)"     // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1
                               ",Seq(1,1,1,1)"     // BBlockTransferThreadSliceLengths_K0_N0_N1_K1
                               ",Seq(2,1,4,1)"     // BBlockTransferThreadClusterLengths_K0_N0_N1_K1
                               ",Seq(1,2,0,3)"     // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,2,0,3)"     // BBlockTransferSrcAccessOrder
                               ",Seq(1,1,1,1)"     // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1
                               ",Seq(1,2,0,3)"     // BBlockTransferSrcVectorTensorContiguousDimOrder
                               ",Seq(1,1,1,1)"     // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1
                               ",Seq(0,1,2,3,4,5)" // CThreadTransferSrcDstAccessOrder
                               ",5"                // CThreadTransferSrcDstVectorDim
                               ",1>";              // CThreadTransferDstScalarPerVector

    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,89 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp>

// Test GetInstanceString through base class pointer for large tensor variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvLargeTensorInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple = ck::tensor_operation::device::instance::
        device_grouped_conv_fwd_xdl_large_tensor_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_large_tensor_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",64"            // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",1"             // ABlockLdsExtraM
                               ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",1"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                               ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",Default>";     // LoopScheduler
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,91 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp>

// Test GetInstanceString through base class pointer for V3 variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvV3Instance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_xdl_f16_comp_instances. This corresponds to the configuration with
    // BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc.
    std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",256"           // BlockSize
                               ",128"           // MPerBlock
                               ",128"           // NPerBlock
                               ",64"            // KPerBlock
                               ",8"             // AK1
                               ",8"             // BK1
                               ",32"            // MPerXDL
                               ",32"            // NPerXDL
                               ",2"             // MXdlPerWave
                               ",2"             // NXdlPerWave
                               ",Seq(8,32,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",8"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",0"             // ABlockLdsExtraM
                               ",Seq(8,32,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",8"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",0"             // BBlockLdsExtraN
                               ",1"             // CShuffleMXdlPerWavePerShuffle
                               ",1"             // CShuffleNXdlPerWavePerShuffle
                               ",Seq(1,32,1,8)" // CDEBlockTransferClusterLengths
                               ",8"             // CDEBlockTransferScalarPerVector_NPerBlock
                               ",Intrawave"     // BlkGemmPipeSched
                               ",v4"            // BlkGemmPipelineVer
                               ",fp16"          // AComputeDataType
                               ",fp16"          // BComputeDataType
                               ",false>";       // DirectLoad
    EXPECT_EQ(instance_str, expected_str);
}
@@ -1,90 +0,0 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp>

// Test GetInstanceString through base class pointer for Wmma variant
TEST(GetInstanceString, ReturnsStringForFwdGrpConvWmmaInstance)
{
    // Use the template helper to get a working instance configuration
    using InstanceTuple =
        ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
            2,                                                       // NDimSpatial
            ck::tensor_operation::device::instance::GNHWC,           // ALayout
            ck::tensor_operation::device::instance::GKYXC,           // BLayout
            ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
            ck::tensor_operation::device::instance::GNHWK,           // ELayout
            ck::Tuple<>,                                             // DsDatatype
            ck::tensor_operation::element_wise::PassThrough,         // CDEElementOp
            ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvSpec

    // Get the first instance from the tuple
    using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

    // Define the base class type using the most general operator base
    using BaseClass = ck::tensor_operation::device::BaseOperator;

    // Create an instance of the derived class
    DeviceInstance device_instance;

    // Get a pointer to the base class
    BaseClass* base_ptr = &device_instance;

    // Call GetInstanceString through the base class pointer
    std::string instance_str = base_ptr->GetInstanceString();

    // Expected complete instance string based on the first instance from
    // device_grouped_conv_fwd_wmma_f16_instances
    std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
                               "<2"             // NDimSpatial
                               ",GNHWC"         // ALayout
                               ",GKYXC"         // BLayout
                               ",EmptyTuple"    // DsLayout
                               ",GNHWK"         // ELayout
                               ",fp16"          // ADataType
                               ",fp16"          // BDataType
                               ",fp32"          // AccDataType
                               ",fp16"          // CShuffleDataType
                               ",EmptyTuple"    // DsDataType
                               ",fp16"          // EDataType
                               ",PassThrough"   // AElementwiseOperation
                               ",PassThrough"   // BElementwiseOperation
                               ",PassThrough"   // CDEElementwiseOperation
                               ",Default"       // ConvForwardSpecialization
                               ",MNKPadding"    // GemmSpec
                               ",1"             // NumGemmKPrefetchStage
                               ",128"           // BlockSize
                               ",64"            // MPerBlock
                               ",64"            // NPerBlock
                               ",32"            // KPerBlock
                               ",8"             // K1
                               ",16"            // MPerWmma
                               ",16"            // NPerWmma
                               ",2"             // MRepeat
                               ",2"             // NRepeat
                               ",Seq(4,32,1)"   // ABlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                               ",2"             // ABlockTransferSrcVectorDim
                               ",1"             // ABlockTransferSrcScalarPerVector
                               ",8"             // ABlockTransferDstScalarPerVector_AK1
                               ",true"          // ABlockLdsExtraM
                               ",Seq(4,32,1)"   // BBlockTransferThreadClusterLengths
                               ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                               ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                               ",2"             // BBlockTransferSrcVectorDim
                               ",1"             // BBlockTransferSrcScalarPerVector
                               ",8"             // BBlockTransferDstScalarPerVector_BK1
                               ",true"          // BBlockLdsExtraN
                               ",1"             // CShuffleMRepeatPerShuffle
                               ",1"             // CShuffleNRepeatPerShuffle
                               ",Seq(1,32,1,4)" // CDEShuffleBlockTransferClusterLengths
                               ",1"             // CDEShuffleBlockTransferScalarPerVector_NPerBlock
                               ",Default"       // LoopSched
                               ",v1>";          // PipelineVer

    EXPECT_EQ(instance_str, expected_str);
}
@@ -0,0 +1,90 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include "ck_tile/builder/reflect/instance_traits.hpp"
#include "ck_tile/builder/reflect/conv_description.hpp"
#include "ck/tensor_operation/gpu/device/device_base.hpp"
#include "ck/library/tensor_operation_instance/gpu/grouped_conv_bwd_weight/device_grouped_conv_bwd_weight_xdl_instance.hpp"
#include <gtest/gtest.h>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple = ck::tensor_operation::device::instance::
    device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances<
        2,                                             // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC, // InLayout
        ck::tensor_operation::device::instance::GKYXC, // WeiLayout
        ck::tensor_operation::device::instance::GNHWK, // OutLayout
        ck::tensor_operation::device::instance::
            ConvBwdWeightDefault>; // ConvBwdWeightSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_bwd_weight_xdl_c_shuffle_f16_instances
// This corresponds to the configuration with BlockSize=64, MPerBlock=64, NPerBlock=64, etc.
std::string expected_str = "DeviceGroupedConvBwdWeight_Xdl_CShuffle"
                           "<2"             // NDimSpatial
                           ",GNHWC"         // InLayout
                           ",GKYXC"         // WeiLayout
                           ",GNHWK"         // OutLayout
                           ",fp16"          // InDataType
                           ",fp16"          // WeiDataType
                           ",fp16"          // OutDataType
                           ",fp32"          // AccDataType
                           ",PassThrough"   // InElementwiseOperation
                           ",PassThrough"   // WeiElementwiseOperation
                           ",PassThrough"   // OutElementwiseOperation
                           ",Default"       // ConvBackwardWeightSpecialization
                           ",64"            // BlockSize
                           ",64"            // MPerBlock
                           ",64"            // NPerBlock
                           ",4"             // K0PerBlock
                           ",8"             // K1
                           ",32"            // MPerXDL
                           ",32"            // NPerXDL
                           ",2"             // MXdlPerWave
                           ",2"             // NXdlPerWave
                           ",Seq(1,4,8,2)"  // ABlockTransferThreadClusterLengths_K0_M_K1
                           ",Seq(0,3,1,2)"  // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(0,2,1,3)"  // ABlockTransferSrcAccessOrder
                           ",2"             // ABlockTransferSrcVectorDim
                           ",2"             // ABlockTransferSrcScalarPerVector
                           ",4"             // ABlockTransferDstScalarPerVector_K1
                           ",true"          // ABlockLdsAddExtraM
                           ",Seq(1,4,8,2)"  // BBlockTransferThreadClusterLengths_K0_N_K1
                           ",Seq(0,3,1,2)"  // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(0,2,1,3)"  // BBlockTransferSrcAccessOrder
                           ",2"             // BBlockTransferSrcVectorDim
                           ",2"             // BBlockTransferSrcScalarPerVector
                           ",4"             // BBlockTransferDstScalarPerVector_K1
                           ",true"          // BBlockLdsAddExtraN
                           ",1"             // CShuffleMXdlPerWavePerShuffle
                           ",1"             // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)" // CBlockTransferClusterLengths
                           ",2"             // CBlockTransferScalarPerVector_NWaveNPerXdl
                           ",fp16"          // ComputeTypeA
                           ",fp16"          // ComputeTypeB
                           ",1"             // MaxTransposeTransferSrcScalarPerVector
                           ",1>";           // MaxTransposeTransferDstScalarPerVector

// Test GetInstanceString through base class pointer for backward weight XDL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForBwdWeightGrpConvXdl)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForBwdWeightGrpConvXdl test once ckr::describe
// supports backward weight convolution. The backward weight variant uses different layout naming
// (InLayout, WeiLayout, OutLayout) instead of (ALayout, BLayout, ELayout) and has a
// ConvBackwardWeightSpecialization parameter.

} // namespace
@@ -0,0 +1,95 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include "gtest/gtest.h"
#include "ck_tile/builder/reflect/instance_traits.hpp"
#include "ck_tile/builder/reflect/conv_description.hpp"
#include "ck/tensor_operation/gpu/device/device_base.hpp"
#include "ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_instance.hpp"

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_instances<
        2,                                                       // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,           // ALayout
        ck::tensor_operation::device::instance::GKYXC,           // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,     // DsLayout
        ck::tensor_operation::device::instance::GNHWK,           // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>; // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle"
                           "<2"             // NDimSpatial
                           ",GNHWC"         // ALayout
                           ",GKYXC"         // BLayout
                           ",EmptyTuple"    // DsLayout
                           ",GNHWK"         // ELayout
                           ",fp16"          // ADataType
                           ",fp16"          // BDataType
                           ",fp32"          // AccDataType
                           ",fp16"          // CShuffleDataType
                           ",EmptyTuple"    // DsDataType
                           ",fp16"          // EDataType
                           ",PassThrough"   // AElementwiseOperation
                           ",PassThrough"   // BElementwiseOperation
                           ",PassThrough"   // CDEElementwiseOperation
                           ",Default"       // ConvForwardSpecialization
                           ",MNKPadding"    // GemmSpec
                           ",1"             // NumGemmKPrefetchStage
                           ",64"            // BlockSize
                           ",64"            // MPerBlock
                           ",64"            // NPerBlock
                           ",32"            // KPerBlock
                           ",8"             // AK1
                           ",8"             // BK1
                           ",32"            // MPerXDL
                           ",32"            // NPerXDL
                           ",2"             // MXdlPerWave
                           ",2"             // NXdlPerWave
                           ",Seq(4,16,1)"   // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"    // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"    // ABlockTransferSrcAccessOrder
                           ",2"             // ABlockTransferSrcVectorDim
                           ",1"             // ABlockTransferSrcScalarPerVector
                           ",8"             // ABlockTransferDstScalarPerVector_AK1
                           ",1"             // ABlockLdsExtraM
                           ",Seq(4,16,1)"   // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"    // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"    // BBlockTransferSrcAccessOrder
                           ",2"             // BBlockTransferSrcVectorDim
                           ",1"             // BBlockTransferSrcScalarPerVector
                           ",8"             // BBlockTransferDstScalarPerVector_BK1
                           ",1"             // BBlockLdsExtraN
                           ",1"             // CShuffleMXdlPerWavePerShuffle
                           ",1"             // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)" // CDEBlockTransferClusterLengths
                           ",1"             // CDEBlockTransferScalarPerVector_NPerBlock
                           ",fp16"          // AComputeDataType
                           ",fp16"          // BComputeDataType
                           ",Default"       // LoopScheduler
                           ",1>";           // NumGroupsToMerge

// Test GetInstanceString through base class pointer for standard XDL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConv)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConv)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,88 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_dl_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv2d_fwd_dl_f16_instances<
        ck::tensor_operation::device::instance::GNHWC,            // InLayout
        ck::tensor_operation::device::instance::GKYXC,            // WeiLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // OutLayout
        ck::Tuple<>,                                              // DsDatatype
        ck::tensor_operation::element_wise::PassThrough,          // CDEElementOp
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvSpec

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv2d_fwd_dl_f16_instances
std::string expected_str = "DeviceGroupedConvFwdDlMultipleD_NHWC_KYXC_NHWK"
                           "<2"                 // NDimSpatial
                           ",fp16"              // ADataType
                           ",fp16"              // BDataType
                           ",EmptyTuple"        // DsDataType
                           ",fp16"              // EDataType
                           ",fp32"              // AccDataType
                           ",GNHWC"             // ALayout
                           ",GKYXC"             // BLayout
                           ",EmptyTuple"        // DsLayout
                           ",GNHWK"             // ELayout
                           ",PassThrough"       // AElementwiseOperation
                           ",PassThrough"       // BElementwiseOperation
                           ",PassThrough"       // CDEElementwiseOperation
                           ",Default"           // ConvForwardSpecialization
                           ",MNKPadding"        // GemmSpec
                           ",8"                 // BlockSize
                           ",16"                // MPerBlock
                           ",4"                 // NPerBlock
                           ",2"                 // K0PerBlock
                           ",1"                 // K1
                           ",1"                 // M1PerThread
                           ",2"                 // N1PerThread
                           ",1"                 // KPerThread
                           ",Seq(4,2)"          // M1N1ThreadClusterM1Xs
                           ",Seq(1,1)"          // M1N1ThreadClusterN1Xs
                           ",Seq(2,1,2,1)"      // ABlockTransferThreadSliceLengths_K0_M0_M1_K1
                           ",Seq(1,1,8,1)"      // ABlockTransferThreadClusterLengths_K0_M0_M1_K1
                           ",Seq(1,2,0,3)"      // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,2,0,3)"      // ABlockTransferSrcAccessOrder
                           ",Seq(1,1,1,1)"      // ABlockTransferSrcVectorTensorLengths_K0_M0_M1_K1
                           ",Seq(1,2,0,3)"      // ABlockTransferSrcVectorTensorContiguousDimOrder
                           ",Seq(1,1,1,1)"      // ABlockTransferDstVectorTensorLengths_K0_M0_M1_K1
                           ",Seq(1,1,1,1)"      // BBlockTransferThreadSliceLengths_K0_N0_N1_K1
                           ",Seq(2,1,4,1)"      // BBlockTransferThreadClusterLengths_K0_N0_N1_K1
                           ",Seq(1,2,0,3)"      // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,2,0,3)"      // BBlockTransferSrcAccessOrder
                           ",Seq(1,1,1,1)"      // BBlockTransferSrcVectorTensorLengths_K0_N0_N1_K1
                           ",Seq(1,2,0,3)"      // BBlockTransferSrcVectorTensorContiguousDimOrder
                           ",Seq(1,1,1,1)"      // BBlockTransferDstVectorTensorLengths_K0_N0_N1_K1
                           ",Seq(0,1,2,3,4,5)"  // CThreadTransferSrcDstAccessOrder
                           ",5"                 // CThreadTransferSrcDstVectorDim
                           ",1>";               // CThreadTransferDstScalarPerVector

// Test GetInstanceString through base class pointer for DL variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvDl)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForFwdGrpConvDl test once ckr::describe supports DL
// variants. The DL variant uses a different internal structure (K0PerBlock, K1, M1PerThread, etc.)
// and is missing standard members like kKPerBlock, kMPerXDL, and kAK1.

} // namespace
@@ -0,0 +1,94 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_large_tensor_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_large_tensor_f16_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_large_tensor_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleD_Xdl_CShuffle_Large_Tensor"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",1"              // NumGemmKPrefetchStage
                           ",64"             // BlockSize
                           ",64"             // MPerBlock
                           ",64"             // NPerBlock
                           ",32"             // KPerBlock
                           ",8"              // AK1
                           ",8"              // BK1
                           ",32"             // MPerXDL
                           ",32"             // NPerXDL
                           ",2"              // MXdlPerWave
                           ",2"              // NXdlPerWave
                           ",Seq(4,16,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",1"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",1"              // ABlockLdsExtraM
                           ",Seq(4,16,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",1"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",1"              // BBlockLdsExtraN
                           ",1"              // CShuffleMXdlPerWavePerShuffle
                           ",1"              // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,16,1,4)"  // CDEBlockTransferClusterLengths
                           ",1"              // CDEBlockTransferScalarPerVector_NPerBlock
                           ",fp16"           // AComputeDataType
                           ",fp16"           // BComputeDataType
                           ",Default>";      // LoopScheduler

// Test GetInstanceString through base class pointer for large tensor variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvLargeTensor)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvLargeTensor)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,96 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_xdl_comp_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_xdl_f16_comp_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvForwardSpecialization

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_xdl_f16_comp_instances. This corresponds to the configuration with
// BlockSize=256, MPerBlock=128, NPerBlock=128, KPerBlock=64, etc.
std::string expected_str = "DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",256"            // BlockSize
                           ",128"            // MPerBlock
                           ",128"            // NPerBlock
                           ",64"             // KPerBlock
                           ",8"              // AK1
                           ",8"              // BK1
                           ",32"             // MPerXDL
                           ",32"             // NPerXDL
                           ",2"              // MXdlPerWave
                           ",2"              // NXdlPerWave
                           ",Seq(8,32,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",8"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",0"              // ABlockLdsExtraM
                           ",Seq(8,32,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",8"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",0"              // BBlockLdsExtraN
                           ",1"              // CShuffleMXdlPerWavePerShuffle
                           ",1"              // CShuffleNXdlPerWavePerShuffle
                           ",Seq(1,32,1,8)"  // CDEBlockTransferClusterLengths
                           ",8"              // CDEBlockTransferScalarPerVector_NPerBlock
                           ",Intrawave"      // BlkGemmPipeSched
                           ",v4"             // BlkGemmPipelineVer
                           ",fp16"           // AComputeDataType
                           ",fp16"           // BComputeDataType
                           ",false>";        // DirectLoad

// Test GetInstanceString through base class pointer for V3 variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvV3)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

TEST(InstanceString, DescriptionReturnsCorrectValueForFwdGrpConvV3)
{
    EXPECT_EQ(ckr::describe<DeviceInstance>().instance_string(), expected_str);
}

} // namespace
@@ -0,0 +1,93 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <ck_tile/builder/reflect/instance_traits.hpp>
#include <ck_tile/builder/reflect/conv_description.hpp>
#include <ck/tensor_operation/gpu/device/device_base.hpp>
#include <ck/library/tensor_operation_instance/gpu/grouped_conv_fwd/device_grouped_conv_fwd_wmma_instance.hpp>

namespace {

namespace ckr = ck_tile::reflect;

// Use the template helper to get a working instance configuration
using InstanceTuple =
    ck::tensor_operation::device::instance::device_grouped_conv_fwd_wmma_f16_instances<
        2,                                                        // NDimSpatial
        ck::tensor_operation::device::instance::GNHWC,            // ALayout
        ck::tensor_operation::device::instance::GKYXC,            // BLayout
        ck::tensor_operation::device::instance::Empty_Tuple,      // DsLayout
        ck::tensor_operation::device::instance::GNHWK,            // ELayout
        ck::Tuple<>,                                              // DsDatatype
        ck::tensor_operation::element_wise::PassThrough,          // CDEElementOp
        ck::tensor_operation::device::instance::ConvFwdDefault>;  // ConvSpec

// Get the first instance from the tuple
using DeviceInstance = typename std::tuple_element<0, InstanceTuple>::type;

// Expected complete instance string based on the first instance from
// device_grouped_conv_fwd_wmma_f16_instances
std::string expected_str = "DeviceGroupedConvFwdMultipleD_Wmma_CShuffle"
                           "<2"              // NDimSpatial
                           ",GNHWC"          // ALayout
                           ",GKYXC"          // BLayout
                           ",EmptyTuple"     // DsLayout
                           ",GNHWK"          // ELayout
                           ",fp16"           // ADataType
                           ",fp16"           // BDataType
                           ",fp32"           // AccDataType
                           ",fp16"           // CShuffleDataType
                           ",EmptyTuple"     // DsDataType
                           ",fp16"           // EDataType
                           ",PassThrough"    // AElementwiseOperation
                           ",PassThrough"    // BElementwiseOperation
                           ",PassThrough"    // CDEElementwiseOperation
                           ",Default"        // ConvForwardSpecialization
                           ",MNKPadding"     // GemmSpec
                           ",1"              // NumGemmKPrefetchStage
                           ",128"            // BlockSize
                           ",64"             // MPerBlock
                           ",64"             // NPerBlock
                           ",32"             // KPerBlock
                           ",8"              // K1
                           ",16"             // MPerWmma
                           ",16"             // NPerWmma
                           ",2"              // MRepeat
                           ",2"              // NRepeat
                           ",Seq(4,32,1)"    // ABlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // ABlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // ABlockTransferSrcAccessOrder
                           ",2"              // ABlockTransferSrcVectorDim
                           ",1"              // ABlockTransferSrcScalarPerVector
                           ",8"              // ABlockTransferDstScalarPerVector_AK1
                           ",true"           // ABlockLdsExtraM
                           ",Seq(4,32,1)"    // BBlockTransferThreadClusterLengths
                           ",Seq(1,0,2)"     // BBlockTransferThreadClusterArrangeOrder
                           ",Seq(1,0,2)"     // BBlockTransferSrcAccessOrder
                           ",2"              // BBlockTransferSrcVectorDim
                           ",1"              // BBlockTransferSrcScalarPerVector
                           ",8"              // BBlockTransferDstScalarPerVector_BK1
                           ",true"           // BBlockLdsExtraN
                           ",1"              // CShuffleMRepeatPerShuffle
                           ",1"              // CShuffleNRepeatPerShuffle
                           ",Seq(1,32,1,4)"  // CDEShuffleBlockTransferClusterLengths
                           ",1"              // CDEShuffleBlockTransferScalarPerVector_NPerBlock
                           ",Default"        // LoopSched
                           ",v1>";           // PipelineVer

// Test GetInstanceString through base class pointer for WMMA variant
TEST(InstanceString, GetInstanceStringReturnsCorrectValueForFwdGrpConvWmma)
{
    using BaseClass = ck::tensor_operation::device::BaseOperator;
    DeviceInstance device_instance;
    BaseClass* base_ptr = &device_instance;

    EXPECT_EQ(base_ptr->GetInstanceString(), expected_str);
}

// TODO: Add a DescriptionReturnsCorrectValueForFwdGrpConvWmma test once ckr::describe supports WMMA
// variants. The WMMA variant uses WMMA-specific parameters (MPerWmma, NPerWmma, MRepeat, NRepeat)
// and has a different tile transfer structure incompatible with the current ConvTraits.

} // namespace