CK Instance Gen (#1145)

* Format
* Format
* Format
* Remove const
* Use the right template
* Format
* Format
* add row/col instances
* Add missing file
* fixed
* fixing block to etile error
* Format
* Updates
* Format
* fixed rrr layout
* generating a sample JSON file: currently contains includes, prologue/epilogue and instances
* version where the json is passed into the instances to generate a key
* updated run function to just launch kernel
* updated run function: only contains kernel object, json file is updated but still needs to be cleaned up, added front-end API to parse JSON into character buffer
* adding in testing files
* cleaned up comments, still need to work on including header files
* removed unneeded files
* removed/commented out JSON implementation
* added fusion(prologue/epilogue) into instance generation
* working on instance selection
* added instance selection, need to fix instance validation
* removed block2etile map validity check for testing purposes
* test running: failing due to incorrect files/input
* all grid descs/ptrs completed, but device file not found
* Update test and embed modules
* Restore older version
* added convolution operation, written test, debugging generated code for compilation
* attempting to include CK in host directory: _Float16 error
* CK header file issues
* slight fix
* don't crash when hip can't report total memory
* dump generated code to a file
* changing sizes
* creating tensor descriptors using CK methods: set up grid desc manually, also trying to set up an argument pointer - this needs to be fixed
* some fixes to call the device code
* separating test files for conv and gemm
* completed arg ptr, now have linking errors
* clang format fix
* resolved linker issues in conv test
* remove dependency on libutility from ck
* resolved num dim error
* properly passing arg ptr, errors with passing typenames: redefinition/redeclaration
* undo the commenting of device function
* hand created kernel code to find rtc issues
* dump the full src to file
* resolved redeclaration errors, cleaned up errors for Amber's kernel code
* debugging purposes: redeclaration error
* config files
* resolved errors for NumTensor and redeclaration, formatted version.h
* resolved most errors in manually added kernel and my own. error with calling kernel object: overloaded function type
* WIP: close to getting kernel compiled
* WIP: fixing rtc errors
* fixed sequence errors, formatting, still one error with run fcn
* yay: kernel compiles and runs
* updated templated/generated version to run and compile
* minor fixes
* working generated example, resolved memory access error due to padding
* adding in reference kernel, validation failing against reference
* debugging: printing kernel argsz
* reduced error in results
* debugged reference kernel and output errors, added to generated version, currently debugging prologue function issues
* working validation (using reference convolution) with prologue function for both hard-coded and generated version
* WIP: create an alt version that creates Argument on the device
* wip: added new duplicate files, fixed fusion templating errors from working example, setting up kernel arguments
* wip: making necessary methods device code
* added grid descs, working on grid pointers, errors with stl numerics
* wip: updating kernel args - issue, replacing some std functions
* replaced std::accumulate call with temp hardcoded version
* wip: args causing memory issue
* Construct Argument object inside the kernel and use it to call convolution device function. Code runs and verification passes
* adding object file dump
* temporary hardcoding of grid size, can remove device op inst + arg ptr
* minor fix for grid size
* added modified example where arg ptr is created on the device for generated version as well
* removed device op instance and arg ptr from modified examples
* moving device op file for testing purposes and to properly build CK
* commenting out print-outs
* adjust compiler args to produce a valid ELF file
* temporary removal of validation
* reverting compiler args back for working example
* retrieve necessary arguments from generated template parameters in correct format
* calculating grid size on host-side, still need to clean up process, pass parameters to host functions properly
* scaled up factory functions/wrapper structs to implement host-side launch parameter calculations using CK host side functions - in hard-coded example
* temporary change to generate ELF format binary object file
* removed unecessary code, added comments
* formatting fix
* cleaned up code, added new tests, restructured library: move helper into CK
* refactored launch parameter calculation to be more concise
* renamed files and variables for more clarity/uniformity
* more code cleaning, removed debug statements
* moved majority of my files into codegen directory, running properly
* updated Embed.cmake(string_view) in codegen directory
* updated host directory to match Embed.cmake as well
* added old tests in
* updated instance generation methods to be more concise
* removed layout from launch parameter calculation
* working test
* fixed issue with verification, all instances working
* updated verification in other tests
* removed duplicate matrix padder file, removed code dumps
* removed old hard-coded tests
* removed old host directory, all files in codegen directory now
* fixed copyright in files
* commenting out validation
* renamed files
* made changes for review: fixed copyright, renamed files for clarity, removed comments, refactored code
* updated headers
* removing duplicate file for fwd conv to gemm, merging with original file
* fix building codegen with clang++ directly
* resolving build error from conv_fwd_to_gemm
* fix for previous error
* renaming tests
* created common test file
* cleaned up code, added comments
* renamed device op
* fixed typos in comments
* removed extra space
* code cleanup: resolving Amber's comments
* removed wrapper struct for matrix padder, fixed template
* cleaned up if statements for better readability

---------

Co-authored-by: Paul <pfultz2@yahoo.com>
Co-authored-by: Jing Zhang <jizha@amd.com>
Co-authored-by: M. Amber Hassaan <amber_474@yahoo.com>
Co-authored-by: illsilin <Illia.Silin@amd.com>
Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
2026-04-19 22:39:03 +00:00 · 2024-06-25 14:37:35 -07:00
parent cb13839425
commit 3e9711f0cb
33 changed files with 3417 additions and 47 deletions
--- a/codegen/include/ck/host/device_gemm_multiple_d.hpp
+++ b/codegen/include/ck/host/device_gemm_multiple_d.hpp
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

--- a/codegen/include/ck/host/device_gemm_multiple_d/operation.hpp
+++ b/codegen/include/ck/host/device_gemm_multiple_d/operation.hpp
@@ -14,10 +14,15 @@ namespace ck {
 namespace host {
 namespace device_gemm_multiple_d {

+// defines all values needed for an instance of a GEMM operation
 struct Operation_Xdl_CShuffle
 {
-    static std::vector<std::vector<Operation_Xdl_CShuffle>> CreateOperations();
-    static std::vector<Operation_Xdl_CShuffle> CreateOperations(const Problem& prob);
+    // returns a vector of instances, only given fusion operators: will use default problem spec
+    static std::vector<std::vector<Operation_Xdl_CShuffle>>
+    CreateOperations(const std::string& prologue, const std::string& epilogue);
+    // returns a vector of instances, given a problem spec and fusion operators
+    static std::vector<Operation_Xdl_CShuffle>
+    CreateOperations(const Problem& prob, const std::string& prologue, const std::string& epilogue);
    TensorDesc A{};
    TensorDesc B{};
    DataType acc               = DataType::Float;
@@ -27,13 +32,21 @@ struct Operation_Xdl_CShuffle
    std::string a_elem_op           = PassThrough;
    std::string b_elem_op           = PassThrough;
    std::string cde_elem_op         = Bilinear;
+    std::string prologue            = "";
+    std::string epilogue            = "";
    std::string gemm_specialization = "ck::tensor_operation::device::GemmSpecialization::Default";
+    // tuning parameters
    operation::TileDesc tile_desc{};
    operation::BlockTransferDesc a_block_transfer{};
    operation::BlockTransferDesc b_block_transfer{};
    operation::CShuffleDesc cshuffle{};
    operation::CBlockTransferDesc c_block_transfer{};

+    // functions to update fusion operators if provided
+    void update_prologue(const std::string& prologue);
+    void update_epilogue(const std::string& epilogue);
+    /**constexpr**/ bool IsSupported(std::size_t MRaw_, std::size_t NRaw_, std::size_t KRaw_);
+    // returns a templated instance
    Solution ToSolution() const;
 };

--- a/codegen/include/ck/host/device_gemm_multiple_d/problem.hpp
+++ b/codegen/include/ck/host/device_gemm_multiple_d/problem.hpp
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

@@ -12,11 +12,14 @@ namespace ck {
 namespace host {
 namespace device_gemm_multiple_d {

+// defines the problem specification for a GEMM operation
 struct Problem
 {
-    std::size_t M                    = 0;
-    std::size_t N                    = 0;
-    std::size_t K                    = 0;
+    // dimensions for GEMM operation
+    std::size_t M = 0;
+    std::size_t N = 0;
+    std::size_t K = 0;
+    // layouts for tensors
    bool TransA                      = false;
    bool TransB                      = false;
    bool TransE                      = false;
@@ -29,9 +32,13 @@ struct Problem
    std::string BElementOp           = PassThrough;
    std::string CDEElementOp         = PassThrough;

+    // returns the correct device op file for the operation
    std::string GetIncludeHeader() const;

-    std::vector<Solution> GetSolutions(const std::string& arch) const;
+    // returns a list of instances based on the problem spec and provided fusion operations
+    std::vector<Solution> GetSolutions(const std::string& arch,
+                                       const std::string& prologue,
+                                       const std::string& epilogue) const;
 };

 } // namespace device_gemm_multiple_d
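Note on the fusion parameters: the declarations above only say that `update_prologue` and `update_epilogue` update the fusion operators "if provided"; the bodies are not part of these headers. The sketch below shows one plausible reading, in which an empty string means "not provided" and leaves the stored value untouched. `FusionOp`, `apply_fusion` and `fusion_example` are hypothetical stand-ins written for illustration, not CK types or functions.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Hypothetical stand-in for Operation_Xdl_CShuffle: only the fusion fields are modeled.
struct FusionOp
{
    std::string prologue = "";
    std::string epilogue = "";

    // Assumed behavior: an empty argument means "not provided", so the current value is kept.
    void update_prologue(const std::string& p)
    {
        if(!p.empty())
            prologue = p;
    }
    void update_epilogue(const std::string& e)
    {
        if(!e.empty())
            epilogue = e;
    }
};

// Hypothetical helper with the same shape as CreateOperations(prologue, epilogue):
// every instance in the list receives the same fusion operators.
inline std::vector<FusionOp> apply_fusion(std::vector<FusionOp> ops,
                                          const std::string& prologue,
                                          const std::string& epilogue)
{
    for(auto& op : ops)
    {
        op.update_prologue(prologue);
        op.update_epilogue(epilogue);
    }
    return ops;
}

// Usage: a non-empty prologue is stored; an empty epilogue leaves the default untouched.
inline void fusion_example()
{
    auto ops = apply_fusion({FusionOp{}, FusionOp{}}, "struct Prologue {};", "");
    assert(ops.size() == 2);
    assert(ops[0].prologue == "struct Prologue {};");
    assert(ops[1].epilogue.empty());
}
```

Passing the operators as source strings fits the runtime-compilation flow described in the commit message, where the generated kernel text is assembled on the host before it is compiled.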
--- a/codegen/include/ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_op.hpp
+++ b/codegen/include/ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_op.hpp
@@ -0,0 +1,60 @@
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.
+
+#pragma once
+
+#include <cstdlib>
+#include <vector>
+#include <string>
+#include "ck/host/types.hpp"
+#include "ck/host/operation/gemm.hpp"
+#include "ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp"
+
+namespace ck {
+namespace host {
+namespace conv {
+
+// defines the values needed for an instance of forward convolution and functions to return
+// (templated) instances
+struct Operation_Conv_Fwd_Xdl_Cshuffle
+{
+    // returns a vector of instances given the fusion operations, uses default values for problem
+    // spec
+    static std::vector<Operation_Conv_Fwd_Xdl_Cshuffle>
+    CreateOperations(const std::string& prologue, const std::string& epilogue);
+    // returns a vector of instances, provided with a problem spec and fusion operations
+    static std::vector<Operation_Conv_Fwd_Xdl_Cshuffle> CreateOperations(
+        const Problem_Conv_Fwd& prob, const std::string& prologue, const std::string& epilogue);
+    std::size_t NumDim;
+    TensorDesc A{};
+    TensorDesc B{};
+    DataType acc               = DataType::Float;
+    DataType cs_type           = DataType::Half;
+    std::vector<TensorDesc> Ds = {};
+    TensorDesc E{};
+    std::string a_elem_op   = PassThrough;
+    std::string b_elem_op   = PassThrough;
+    std::string cde_elem_op = PassThrough;
+    std::string prologue    = "";
+    std::string epilogue    = "";
+    std::string conv_specialization =
+        "ck::tensor_operation::device::ConvolutionForwardSpecialization::Default";
+    std::string gemm_specialization =
+        "ck::tensor_operation::device::GemmSpecialization::MNKPadding";
+    // tuning parameters
+    operation::TileDesc tile_desc{};
+    operation::BlockTransferDesc a_block_transfer{};
+    operation::BlockTransferDesc b_block_transfer{};
+    operation::CShuffleDesc cshuffle{};
+    operation::CBlockTransferDesc c_block_transfer{};
+
+    // functions to update fusion operations if they are provided
+    void update_prologue(const std::string& prologue);
+    void update_epilogue(const std::string& epilogue);
+    // returns a templated instance
+    Solution ToSolution() const;
+};
+
+} // namespace conv
+} // namespace host
+} // namespace ck
--- a/codegen/include/ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp
+++ b/codegen/include/ck/host/device_grouped_conv_fwd_multiple_d/conv_fwd_problem.hpp
@@ -0,0 +1,56 @@
+// SPDX-License-Identifier: MIT
+// Copyright (c) 2024, Advanced Micro Devices, Inc. All rights reserved.
+
+#pragma once
+
+#include <cstdlib>
+#include <vector>
+#include <memory>
+#include <sstream>
+#include <iterator>
+#include <numeric>
+#include "ck/host/types.hpp"
+
+namespace ck {
+namespace host {
+namespace conv {
+
+// defines the problem specification for a forward convolution operation
+struct Problem_Conv_Fwd
+{
+    std::size_t NumDim = 0;
+    // size of a forward convolution operation
+    std::size_t G                    = 0;
+    std::size_t N                    = 0;
+    std::size_t C                    = 0;
+    std::size_t Hi                   = 0;
+    std::size_t Wi                   = 0;
+    std::size_t Ho                   = 0;
+    std::size_t Wo                   = 0;
+    std::size_t K                    = 0;
+    std::size_t Y                    = 0;
+    std::size_t X                    = 0;
+    Layout ALayout                   = Layout::NHWGC;
+    Layout BLayout                   = Layout::GKYXC;
+    Layout ELayout                   = Layout::NHWGK;
+    std::vector<Layout> DsLayout     = {};
+    DataType ADataType               = DataType::Half;
+    DataType BDataType               = DataType::Half;
+    DataType EDataType               = DataType::Half;
+    std::vector<DataType> DsDataType = {};
+    std::string AElementOp           = "ck::tensor_operation::element_wise::PassThrough";
+    std::string BElementOp           = "ck::tensor_operation::element_wise::PassThrough";
+    std::string CDEElementOp         = "ck::tensor_operation::element_wise::PassThrough";
+
+    // returns the correct device op file for the operation
+    std::string GetIncludeHeader() const;
+
+    // returns a list of instances based on the problem spec and provided fusion operations
+    std::vector<Solution> GetSolutions(const std::string& arch,
+                                       const std::string& prologue,
+                                       const std::string& epilogue) const;
+};
+
+} // namespace conv
+} // namespace host
+} // namespace ck
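Note on the convolution sizes: the operation header above defaults `gemm_specialization` to `MNKPadding`, which pads the M, N and K extents of the GEMM that the forward convolution is lowered to. As a rough illustration, the usual implicit-GEMM view of a 2D forward convolution can be written as below. `ConvSizes`, `GemmSizes` and `to_gemm_sizes` are hypothetical names, and the sketch assumes `C` and `K` are per-group channel counts; the mapping CK actually uses lives in its conv-to-GEMM transform and is not shown in this diff.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical subset of Problem_Conv_Fwd: only the size fields used below.
struct ConvSizes
{
    std::size_t N  = 0; // batch size
    std::size_t C  = 0; // input channels (assumed per group)
    std::size_t Ho = 0; // output height
    std::size_t Wo = 0; // output width
    std::size_t K  = 0; // output channels (assumed per group)
    std::size_t Y  = 0; // filter height
    std::size_t X  = 0; // filter width
};

struct GemmSizes
{
    std::size_t M;
    std::size_t N;
    std::size_t K;
};

// Implicit-GEMM view of one group: each output pixel is a GEMM row, each output
// channel a GEMM column, and the reduction runs over C * Y * X.
inline GemmSizes to_gemm_sizes(const ConvSizes& p)
{
    return GemmSizes{p.N * p.Ho * p.Wo, p.K, p.C * p.Y * p.X};
}

// Usage: batch 2, 32 -> 64 channels, 28x28 output, 3x3 filter.
inline void conv_to_gemm_example()
{
    ConvSizes p;
    p.N  = 2;
    p.C  = 32;
    p.Ho = 28;
    p.Wo = 28;
    p.K  = 64;
    p.Y  = 3;
    p.X  = 3;
    const GemmSizes g = to_gemm_sizes(p);
    assert(g.M == 1568); // 2 * 28 * 28
    assert(g.N == 64);
    assert(g.K == 288); // 32 * 3 * 3
}
```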
--- a/codegen/include/ck/host/headers.hpp
+++ b/codegen/include/ck/host/headers.hpp
@@ -4,7 +4,6 @@
 #pragma once

 #include <string>
-#include <string_view>
 #include <utility>
 #include <unordered_map>
 #include <vector>
--- a/codegen/include/ck/host/operation/gemm.hpp
+++ b/codegen/include/ck/host/operation/gemm.hpp
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

--- a/codegen/include/ck/host/stringutils.hpp
+++ b/codegen/include/ck/host/stringutils.hpp
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

--- a/codegen/include/ck/host/types.hpp
+++ b/codegen/include/ck/host/types.hpp
@@ -1,5 +1,5 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

@@ -12,6 +12,7 @@
 namespace ck {
 namespace host {

+// holds the templated instance, substitutes values from instances into the template
 struct Solution
 {

@@ -33,6 +34,7 @@ struct Solution
    std::unordered_map<std::string, std::string> template_values;
 };

+// supported data types
 enum class DataType
 {
    Half,
@@ -40,22 +42,28 @@ enum class DataType
    Int8,
    Int32
 };
-
 std::string ToString(DataType dt);

+// supported layouts: gemm and fwd conv
 enum class Layout
 {
    Row,
-    Column
+    Column,
+    GKYXC,
+    GKCYX,
+    GNHWK,
+    GNHWC,
+    NHWGC,
+    NHWGK
 };
-
 std::string ToString(Layout dl);
+Layout ToLayout(bool Trans); // returns the layout for gemm

+// supported GEMM types
 enum class GemmType
 {
    Default
 };
-
 std::string ToString(GemmType gt);

 struct TensorDesc
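Note on `ToLayout`: the declaration above only states that it "returns the layout for gemm" given a transpose flag. A minimal sketch of the likely mapping follows, assuming a transposed operand is treated as column-major and a non-transposed one as row-major. The enum is reduced to the two GEMM layouts and `to_layout` is a hypothetical name, so this should not be read as the library's definition.

```cpp
#include <cassert>

// Reduced copy of the Layout enum: only the two GEMM layouts are modeled here.
enum class Layout
{
    Row,
    Column
};

// Assumed mapping: Trans == true selects column-major, otherwise row-major.
inline Layout to_layout(bool Trans) { return Trans ? Layout::Column : Layout::Row; }

// Usage: mirrors how the TransA / TransB / TransE flags of Problem could be converted.
inline void to_layout_example()
{
    assert(to_layout(false) == Layout::Row);
    assert(to_layout(true) == Layout::Column);
}
```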
--- a/codegen/include/ck/host/utils.hpp
+++ b/codegen/include/ck/host/utils.hpp
@@ -1,10 +1,12 @@
 // SPDX-License-Identifier: MIT
-// Copyright (c) 2018-2023, Advanced Micro Devices, Inc. All rights reserved.
+// Copyright (c) 2018-2024, Advanced Micro Devices, Inc. All rights reserved.

 #pragma once

 #include <cstdint>
 #include <unordered_set>
+#include <numeric>
+#include <iterator>

 namespace ck {
 namespace host {
@@ -12,6 +14,5 @@ namespace host {
 std::size_t integer_divide_ceil(std::size_t x, std::size_t y);

 const std::unordered_set<std::string>& get_xdlop_archs();
-
 } // namespace host
 } // namespace ck
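Note on `integer_divide_ceil`: only the declaration appears in `utils.hpp`. The conventional definition of a ceiling division on unsigned integers is sketched below, together with a hypothetical `tile_count` helper showing how such a function is typically used for the host-side grid-size calculation mentioned in the commit message. The tile sizes in the usage example are illustrative values, not CK tuning parameters.

```cpp
#include <cassert>
#include <cstddef>

// Conventional ceiling division for unsigned integers; the body is an assumption,
// since the diff only shows the declaration.
inline std::size_t integer_divide_ceil(std::size_t x, std::size_t y)
{
    assert(y != 0); // division by zero is a caller error
    return (x + y - 1) / y;
}

// Hypothetical helper: number of output tiles when an M x N result is split into
// m_per_block x n_per_block tiles, with partial tiles counted as full ones.
inline std::size_t tile_count(std::size_t M,
                              std::size_t N,
                              std::size_t m_per_block,
                              std::size_t n_per_block)
{
    return integer_divide_ceil(M, m_per_block) * integer_divide_ceil(N, n_per_block);
}

// Usage: a 1000 x 500 output with 256 x 128 tiles needs 4 * 4 = 16 tiles.
inline void tile_count_example()
{
    assert(integer_divide_ceil(10, 3) == 4);
    assert(tile_count(1000, 500, 256, 128) == 16);
}
```

The `(x + y - 1) / y` form can overflow when `x` is close to the maximum of `std::size_t`; for tensor extents that is not a practical concern, but `x / y + (x % y != 0)` avoids it if needed.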