mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-11 16:59:10 +00:00
[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility (#7829)

## Summary

CK source files must be compilable via **hipRTC (HIP runtime compilation)**, whose preprocessor does not accept non-ASCII bytes anywhere in a translation unit, **including in comments**. Bytes that are harmless under `hipcc` (em-dashes, smart quotes, multiplication signs, Greek letters, box-drawing glyphs, etc.) cause hipRTC to fail at preprocessing time. These regularly leak in via LLM-assisted authoring or copy/paste from formatted documents and silently break hipRTC paths that are not exercised by the default `hipcc`-based build matrix.

This PR (a) cleans every existing violation (53 files) and (b) adds a pre-checkin gate so new violations are rejected before merge.

## File extensions covered

Both the cleanup scan and the new Jenkins enforcement stage use the same predicate:

```
*.h  *.hpp  *.cpp  *.h.in  *.hpp.in  *.cpp.in  *.inc  *.cl
```

(excluding `*/build/*` and `*/include/rapidjson/*`). This is a strict superset of the existing `Clang Format` stage's predicate: `*.inc` is added so test-fixture include files are also gated. The local pre-commit hook's `c++/inc` type filter covers the same set.

## Why no enforcement today

CK is opted out of the rocm-libraries root `.pre-commit-config.yaml`, so the existing `pre-commit` workflow doesn't touch CK. The local CK `.pre-commit-config.yaml` only runs for developers who installed hooks. The **authoritative gate is therefore the new Jenkins stage** in this PR; the local hook is a convenience.

## Commit layout (bisect-friendly)

1. `79798aa6261`: **`[ck] Convert reflect/ rendering to ASCII for hipRTC compatibility`**
   Behavior change, isolated. `TreeFormatter` swaps `├─ / └─ / │ ` for `|- / +- / | ` (3-col width preserved so alignment is unchanged). `conv_description.hpp` swaps `×` for `x` as the dimension separator. `test_conv_description.cpp` expected strings updated in lockstep so the snapshot test stays green. This is the only commit in the series with observable runtime impact.

2. `738fdb0d81c`: **`[ck] Strip non-ASCII bytes from C++ sources for hipRTC compatibility`**
   Mechanical text cleanup across 53 files. Replacements happen in comments or in `std::cout` strings that are not asserted on by any test. None of the 174 `.inc` files in the tree required edits, but they were in the scan's predicate so the enforcement stage's predicate is a superset of what was scanned. Full replacement table in the commit message.

3. `1d7cd8ba235`: **`[ck] Enforce ASCII-only C/C++ sources for hipRTC compatibility`**
   - New `projects/composablekernel/script/check_ascii_only.sh` (modeled on `check_copyright_year.sh`).
   - New entry in `projects/composablekernel/.pre-commit-config.yaml` under the local-hooks block (`types_or: [c++, inc]`).
   - New `ASCII Only Check` parallel stage in `projects/composablekernel/Jenkinsfile`'s `Static checks` block, mirroring the existing `Clang Format` stage but with `*.inc` added to the find predicate. Always-on, no `RUN_CPPCHECK` gate.

The tree is buildable at every commit boundary. Commit 1 leaves 50 known violations; commit 2 leaves 0; commit 3 wires the gate.

## Demo

Script output on a synthesized violation:

```
$ printf '// em-dash test \xe2\x80\x94 here\n' > /tmp/bad.cpp
$ projects/composablekernel/script/check_ascii_only.sh /tmp/bad.cpp
ERROR: /tmp/bad.cpp contains non-ASCII bytes:
1:// em-dash test — here
Fix: replace with ASCII (em-dash -> --, smart quotes -> ", arrows -> ->, etc.)
$ echo $?
1
```

Full repo scan after the cleanup commits (note the `-name '*.inc'` clause):

```
$ cd projects/composablekernel && find . -type f \( -name '*.h' -o -name '*.hpp' -o -name '*.cpp' \
    -o -name '*.h.in' -o -name '*.hpp.in' -o -name '*.cpp.in' -o -name '*.inc' -o -name '*.cl' \) \
    -not -path '*/build/*' -not -path '*/include/rapidjson/*' -print0 \
  | xargs -0 -P 8 -n 64 script/check_ascii_only.sh
$ echo $?
0
```

## Test plan

- [ ] Jenkins PR build: confirm new `Static checks -> ASCII Only Check` stage runs green over the full predicate (incl. `*.inc`) and existing `Clang Format` stage is unaffected.
- [ ] `test_conv_description` passes against the ASCII tree-formatter output (touched in commit 1).
- [ ] Local: `pre-commit run ascii-only-checker --all-files` runs cleanly after installing CK pre-commit hooks via `script/install_precommit.sh`.
- [ ] Manually inject a non-ASCII byte in any `.cpp/.hpp/.inc` file, push: confirm Jenkins fails the new stage with a clear error.
- [ ] Spot-check a representative subset of touched files under hipRTC compilation to confirm no remaining hipRTC-blocking content (optional, since the static byte check is a sufficient condition for hipRTC preprocessor acceptance on this dimension).

🤖 Generated with [Claude Code](https://claude.com/claude-code)
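The byte-level rule the gate enforces (every byte of a source file must be in the 7-bit ASCII range) can be illustrated with a minimal C++ sketch. This is not the contents of `check_ascii_only.sh`, which is a shell script not reproduced here; the names `NonAsciiHit` and `find_first_non_ascii` are hypothetical and exist only for this illustration.

```cpp
#include <cstddef>
#include <optional>
#include <string_view>

// Hypothetical helper type (not part of CK): position of an offending byte.
struct NonAsciiHit
{
    std::size_t line;   // 1-based line number
    std::size_t column; // 1-based byte column within that line
};

// Returns the position of the first byte above 0x7F, or std::nullopt when the
// whole buffer is 7-bit ASCII. Columns count bytes, not decoded characters.
inline std::optional<NonAsciiHit> find_first_non_ascii(std::string_view text)
{
    std::size_t line   = 1;
    std::size_t column = 1;
    for(const char c : text)
    {
        if(static_cast<unsigned char>(c) > 0x7F)
        {
            return NonAsciiHit{line, column};
        }
        if(c == '\n')
        {
            ++line;
            column = 1;
        }
        else
        {
            ++column;
        }
    }
    return std::nullopt;
}
```

Called on the demo input above (`// em-dash test \xe2\x80\x94 here`), the sketch reports line 1, since the first byte of the UTF-8 em-dash sequence is `0xE2`. Casting to `unsigned char` before comparing matters because plain `char` may be signed, in which case bytes above `0x7F` would compare as negative.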
571 lines
28 KiB
C++
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT

#include <gtest/gtest.h>
#include <gmock/gmock.h>

#include "ck_tile/builder/conv_builder.hpp"
#include "ck_tile/builder/reflect/conv_description.hpp"
#include "ck_tile/builder/reflect/conv_describe.hpp"
#include "testing_utils.hpp"
#include "impl/conv_signature_types.hpp"
#include "impl/conv_algorithm_types.hpp"
#include "ck_tile/builder/conv_signature_utils.hpp"

namespace {

namespace ckb = ck_tile::builder;
namespace ckr = ck_tile::reflect;
namespace ckt = ck_tile::test;

struct TensorOp
{
    ckb::ElementwiseOperation elementwise_operation{ckb::ElementwiseOperation::PASS_THROUGH};
};

struct InvalidTensorOp
{
    int elementwise_operation = 7; // invalid value
};
static_assert(!ckb::TensorOperatorDescriptor<InvalidTensorOp>);

struct TensorConfig
{
    ckb::TensorLayout layout;
    ckb::DataType data_type{ckb::DataType::UNDEFINED_DATA_TYPE};
    ckb::DataType compute_type{ckb::DataType::UNDEFINED_DATA_TYPE};
};

struct TensorConfigNoDataType
{
    ckb::TensorLayout layout;
    ckb::DataType compute_type{ckb::DataType::UNDEFINED_DATA_TYPE};
};

struct ConvTensorNoDataType
{
    TensorConfigNoDataType config;
    TensorOp operation{};
};

struct ConvTensorSimple
{
    TensorConfig config;
};

struct ConvTensorWithOp
{
    TensorConfig config;
    TensorOp operation{};
};

struct ConvTensorWithInvalidOp
{
    TensorConfig config;
    InvalidTensorOp operation{};
};

// Defines the signature of the convolution operation to be tested.
// This includes dimensionality, direction, data layout, and data type.
struct ConvSignature
{
    using enum ckb::DataType;
    using enum ckb::TensorLayout;

    int spatial_dim = 2;
    ckb::DataType data_type = FP16;
    ckb::DataType accumulation_data_type = FP32;
    ConvTensorSimple input = {.config = {GNHWC}};
    ConvTensorSimple weight = {.config = {GKYXC}};
    ConvTensorSimple output = {.config = {GNHWK}};
};
static_assert(ckb::ConvSignatureDescriptor<ConvSignature>);

// Compile time tests for concepts
struct ConvSignatureWithOptionalParams
{
    using enum ckb::DataType;
    using enum ckb::TensorLayout;
    using enum ckb::ConvDirection;
    using enum ckb::ElementwiseOperation;

    int spatial_dim = 2;
    ckb::DataType data_type = FP16;
    ckb::DataType accumulation_data_type = FP32;
    ckb::ConvDirection direction = FORWARD;
    ConvTensorWithOp input = {
        .config = {GNHWC, FP16},
    };
    ConvTensorWithOp weight = {.config = {GKYXC, FP16}};
    ConvTensorWithOp output = {.config = {GNHWK, FP16}, .operation = {SCALE}};
};
static_assert(ckb::ConvSignatureDescriptor<ConvSignatureWithOptionalParams>);

struct ConvSignatureWithInvalidOptionalParams
{
    using enum ckb::DataType;
    using enum ckb::TensorLayout;

    int spatial_dim = 2;
    ckb::DataType data_type = FP16;
    ckb::DataType accumulation_data_type = FP32;
    ConvTensorWithInvalidOp input = {.config = {GNHWC}};
    ConvTensorWithInvalidOp weight = {.config = {GKYXC}};
    ConvTensorWithInvalidOp output = {.config = {GNHWK}};
};
static_assert(!ckb::ConvSignatureDescriptor<ConvSignatureWithInvalidOptionalParams>);

struct DefaultAlgorithm
{
    ckb::test::ThreadBlock thread_block{.block_size = 256,
                                        .tile_size = {.m = 256, .n = 256, .k = 32}};

    ckb::test::GridwiseFwdXdlGemm gridwise_gemm{
        .ak1 = 8,
        .bk1 = 8,
        .xdl_params = {.m_per_xdl = 16, .n_per_xdl = 16, .m_xdl_per_wave = 8, .n_xdl_per_wave = 8}};

    ckb::test::Transfer<> transfer{
        .a =
            {
                .block_transfer = {.k0 = 1, .m_n = 128, .k1 = 2},
                .lds_transfer = {.src_vector_dim = 2,
                                 .src_scalar_per_vector = 2,
                                 .lds_dst_scalar_per_vector = 2,
                                 .is_direct_load = false,
                                 .lds_padding = false},
                .thread_cluster_arrange_order = {.order = {0, 1, 2}},
                .src_access_order = {.order = {0, 1, 2}},

            },
        .b =
            {
                .block_transfer = {.k0 = 1, .m_n = 128, .k1 = 2},
                .lds_transfer = {.src_vector_dim = 2,
                                 .src_scalar_per_vector = 2,
                                 .lds_dst_scalar_per_vector = 2,
                                 .is_direct_load = false,
                                 .lds_padding = false},
                .thread_cluster_arrange_order = {.order = {0, 1, 2}},
                .src_access_order = {.order = {0, 1, 2}},
            },
        .c =
            {
                .thread_cluster_dims =
                    {.m_block = 1, .m_wave_per_xdl = 32, .n_block = 1, .n_wave_per_xdl = 8},
                .epilogue = {.m_xdl_per_wave_per_shuffle = 1,
                             .n_xdl_per_wave_per_shuffle = 1,
                             .scalar_per_vector = 2},
            },
    };

    ckb::ConvSpecialization fwd_specialization = ckb::ConvSpecialization::DEFAULT;
    ckb::GemmSpecialization gemm_specialization = ckb::GemmSpecialization::Default;
    ckb::test::BlockGemmPipeline block_gemm_pipeline{.pipeline_version = ckb::PipelineVersion::V4,
                                                     .scheduler =
                                                         ckb::PipelineScheduler::INTRAWAVE};
    size_t num_conv_groups_to_merge = 1;
};
static_assert(ckb::ConvAlgorithmDescriptor<DefaultAlgorithm>);

struct ConvSignatureUtilsTest1
{
    using enum ckb::DataType;
    using enum ckb::TensorLayout;
    using enum ckb::ConvDirection;
    using enum ckb::ElementwiseOperation;

    int spatial_dim = 2;
    ckb::DataType data_type = FP16;
    ckb::DataType accumulation_data_type = FP32;
    ckb::ConvDirection direction = FORWARD;
    ConvTensorWithOp input = {
        .config = {GNHWC, FP16},
    };
    ConvTensorWithOp weight = {.config = {GKYXC, FP16}};
    ConvTensorWithOp output = {.config = {GNHWK, UNDEFINED_DATA_TYPE}, .operation = {SCALE}};
};

static_assert(ckb::ConvSignatureDescriptor<ConvSignatureUtilsTest1>);

struct ConvSignatureUtilsTest2
{
    using enum ckb::DataType;
    using enum ckb::TensorLayout;
    using enum ckb::ConvDirection;
    using enum ckb::ElementwiseOperation;

    int spatial_dim = 2;
    ckb::DataType data_type = FP16;
    ckb::ElementwiseOperation elementwise_operation = CONV_INVSCALE;
    ckb::DataType accumulation_data_type = FP32;
    ckb::ConvDirection direction = FORWARD;
    ConvTensorSimple input = {
        .config = {GNHWC, FP16},
    };
    ConvTensorNoDataType weight = {.config = {GKYXC}, .operation = {POWER}};
    ConvTensorWithOp output = {.config = {GNHWK, BF16}, .operation = {GELU}};
};

static_assert(ckb::ConvSignatureDescriptor<ConvSignatureUtilsTest2>);

TEST(ConvUtilsTest, getDataType1)
{
    using enum ckb::DataType;
    static constexpr const ConvSignatureUtilsTest1 SIGNATURE;
    EXPECT_THAT(ckb::getInputDataType<SIGNATURE>(), FP16);
    EXPECT_THAT(ckb::getWeightDataType<SIGNATURE>(), FP16);
    EXPECT_THAT(ckb::getOutputDataType<SIGNATURE>(), FP16);
    EXPECT_THAT(ckb::getDataTypeIfCommon<SIGNATURE>(), FP16);
}

TEST(ConvUtilsTest, getDataType2)
{
    using enum ckb::DataType;
    static constexpr const ConvSignatureUtilsTest2 SIGNATURE;
    EXPECT_THAT(ckb::getInputDataType<SIGNATURE>(), FP16);
    EXPECT_THAT(ckb::getWeightDataType<SIGNATURE>(), FP16);
    EXPECT_THAT(ckb::getOutputDataType<SIGNATURE>(), BF16);
    EXPECT_THAT(ckb::getDataTypeIfCommon<SIGNATURE>(), UNDEFINED_DATA_TYPE);
}

TEST(ConvUtilsTest, getElementwiseOperation1)
{
    using enum ckb::ElementwiseOperation;
    static constexpr const ConvSignatureUtilsTest1 SIGNATURE;
    EXPECT_THAT(ckb::getInputElementwiseOperation<SIGNATURE>(), PASS_THROUGH);
    EXPECT_THAT(ckb::getWeightElementwiseOperation<SIGNATURE>(), PASS_THROUGH);
    EXPECT_THAT(ckb::getOutputElementwiseOperation<SIGNATURE>(), SCALE);
}

TEST(ConvUtilsTest, getElementwiseOperation2)
{
    using enum ckb::ElementwiseOperation;
    static constexpr const ConvSignatureUtilsTest2 SIGNATURE;
    EXPECT_THAT(ckb::getInputElementwiseOperation<SIGNATURE>(), CONV_INVSCALE);
    EXPECT_THAT(ckb::getWeightElementwiseOperation<SIGNATURE>(), POWER);
    EXPECT_THAT(ckb::getOutputElementwiseOperation<SIGNATURE>(), GELU);
}

TEST(ConvDescriptionTest, DefaultInstanceHasBriefDescription)
{
    static constexpr const ConvSignature SIGNATURE;
    static constexpr const DefaultAlgorithm ALGORITHM;
    using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
    EXPECT_THAT(ckr::describe<Instance>().brief(), ckt::StringEqWithDiff("2D Forward convolution"));
}

TEST(ConvDescriptionTest, DefaultInstanceHasDetailedDescription)
{
    static constexpr const ConvSignature SIGNATURE;
    static constexpr const DefaultAlgorithm ALGORITHM;
    using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
    EXPECT_THAT(ckr::describe<Instance>().detailed(),
                ckt::StringEqWithDiff( //
                    "2D Forward Convolution Kernel\n"
                    "|- Signature\n"
                    "| |- Tensor Type: FP16\n"
                    "| |- Input Layout: GNHWC\n"
                    "| |- Weight Layout: GKYXC\n"
                    "| |- Output Layout: GNHWK\n"
                    "| |- Input elementwise operation: PASS_THROUGH\n"
                    "| |- Weights elementwise operation: PASS_THROUGH\n"
                    "| +- Output elementwise operation: PASS_THROUGH\n"
                    "+- Algorithm\n"
                    " |- Thread block size: 256\n"
                    " |- Data tile size: 256x256x32\n"
                    " |- Gemm padding: DEFAULT\n"
                    " |- Convolution specialization: DEFAULT\n"
                    " |- Pipeline version: V4\n"
                    " |- Pipeline scheduler: INTRAWAVE\n"
                    " |- Warp Gemm parameters:\n"
                    " | |- subtile size: 16x16\n"
                    " | +- Number of warp gemm iterations: 8x8\n"
                    " +- Memory access:\n"
                    " |- A Tile transfer:\n"
                    " | |- Tile dimensions: 4x256x8\n"
                    " | |- The innermost K subdimension size: 8\n"
                    " | |- Thread cluster lengths (threads per axis): 1x128x2\n"
                    " | |- Spatial thread distribution over the data tile: 0x1x2\n"
                    " | |- The order of accessing data tile axes: 0x1x2\n"
                    " | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | |- Vector access (GMEM read) instruction size: 2\n"
                    " | |- Vector access (LDS write) instruction size: 2\n"
                    " | +- LDS data layout padding (to prevent bank conflicts): 0\n"
                    " |- B Tile transfer:\n"
                    " | |- Tile dimensions: 4x256x8\n"
                    " | |- The innermost K subdimension size: 8\n"
                    " | |- Thread cluster lengths (threads per axis): 1x128x2\n"
                    " | |- Spatial thread distribution over the data tile: 0x1x2\n"
                    " | |- The order of accessing data tile axes: 0x1x2\n"
                    " | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | |- Vector access (GMEM read) instruction size: 2\n"
                    " | |- Vector access (LDS write) instruction size: 2\n"
                    " | +- LDS data layout padding (to prevent bank conflicts): 0\n"
                    " +- C Tile transfer:\n"
                    " |- Data shuffle (number of gemm instructions per iteration): 1x1\n"
                    " |- Spatial thread distribution used to store data: 1x32x1x8\n"
                    " +- Vector access (GMEM write) instruction size: 2"));
}

// Test printing of optional parameters num_groups_to_merge,
// max_transpose_transfer_src_scalar_per_vector and max_transpose_transfer_dst_scalar_per_vector
TEST(ConvDescriptionTest, BwdWeightTwoStageWmmaV3DescriptionTest)
{
    using Instance =
        ck::tensor_operation::device::DeviceGroupedConvBwdWeightTwoStage_Wmma_CShuffleV3<
            2, // NDimSpatial
            ck::tensor_layout::convolution::GNHWC, // InLayout
            ck::tensor_layout::convolution::GKYXC, // WeiLayout
            ck::tensor_layout::convolution::GNHWK, // OutLayout
            ck::half_t, // InDataType
            ck::half_t, // WeiDataType
            ck::half_t, // OutDataType
            float, // AccDataType
            ck::tensor_operation::element_wise::PassThrough, // InElementwiseOperation
            ck::tensor_operation::element_wise::PassThrough, // WeiElementwiseOperation
            ck::tensor_operation::element_wise::PassThrough, // OutElementwiseOperation
            ck::tensor_operation::device::ConvolutionBackwardWeightSpecialization::
                Default, // ConvBackwardWeightSpecialization
            256, // BlockSize
            128, // MPerBlock
            128, // NPerBlock
            16, // K0PerBlock
            8, // AK1
            32, // MPerWMMA
            32, // NPerXDL
            4, // MRepeat
            4, // NRepeat
            ck::Sequence<4, 64, 1>, // ABlockTransferThreadClusterLengths_AK0_M_AK1
            ck::Sequence<1, 0, 2>, // ABlockTransferThreadClusterArrangeOrder_
            ck::Sequence<1, 0, 2>, // ABlockTransferSrcAccessOrder
            2, // ABlockTransferSrcVectorDim
            8, // ABlockTransferSrcScalarPerVector
            8, // ABlockTransferDstScalarPerVector_K1
            1, // ABlockLdsAddExtraM
            ck::Sequence<4, 64, 1>, // BBlockTransferThreadClusterLengths_BK0_N_BK1
            ck::Sequence<1, 0, 2>, // BBlockTransferThreadClusterArrangeOrder_
            ck::Sequence<1, 0, 2>, // BBlockTransferSrcAccessOrder_
            2, // BBlockTransferSrcVectorDim
            8, // BBlockTransferSrcScalarPerVector
            8, // BBlockTransferDstScalarPerVector_K1
            1, // BBlockLdsAddExtraN
            1, // CShuffleMXdlPerWavePerShuffle
            1, // CShuffleNXdlPerWavePerShuffle
            ck::Sequence<1,
                         32,
                         1,
                         8>, // CBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock_
            8, // CDEBlockTransferScalarPerVector_NPerBlock_
            ck::BlockGemmPipelineScheduler::Intrawave, // BlkGemmPipeSched
            ck::BlockGemmPipelineVersion::v1, // BlkGemmPipelineVer
            4, // NumGroupsToMerge
            ck::half_t, // AComputeDataType
            ck::half_t, // BComputeDataType
            1, // MaxTransposeTransferSrcScalarPerVector
            1>; // MaxTransposeTransferDstScalarPerVector

    EXPECT_THAT(ckr::describe<Instance>().detailed(),
                ckt::StringEqWithDiff( //
                    "2D Backward Weight Convolution Kernel\n"
                    "|- Signature\n"
                    "| |- Tensor Type: FP16\n"
                    "| |- Input Layout: GNHWC\n"
                    "| |- Weight Layout: GKYXC\n"
                    "| |- Output Layout: GNHWK\n"
                    "| |- Input elementwise operation: PASS_THROUGH\n"
                    "| |- Weights elementwise operation: PASS_THROUGH\n"
                    "| +- Output elementwise operation: PASS_THROUGH\n"
                    "+- Algorithm\n"
                    " |- Thread block size: 256\n"
                    " |- Data tile size: 128x128x16\n"
                    " |- Convolution specialization: DEFAULT\n"
                    " |- Pipeline version: V1\n"
                    " |- Pipeline scheduler: DEFAULT\n"
                    " |- Warp Gemm parameters:\n"
                    " | |- subtile size: 32x32\n"
                    " | +- Number of warp gemm iterations: 4x4\n"
                    " |- Memory access:\n"
                    " | |- A Tile transfer:\n"
                    " | | |- Tile dimensions: 2x128x8\n"
                    " | | |- The innermost K subdimension size: 8\n"
                    " | | |- Thread cluster lengths (threads per axis): 4x64x1\n"
                    " | | |- Spatial thread distribution over the data tile: 1x0x2\n"
                    " | | |- The order of accessing data tile axes: 1x0x2\n"
                    " | | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | | |- Vector access (GMEM read) instruction size: 8\n"
                    " | | |- Vector access (LDS write) instruction size: 8\n"
                    " | | +- LDS data layout padding (to prevent bank conflicts): 1\n"
                    " | |- B Tile transfer:\n"
                    " | | |- Tile dimensions: 2x128x8\n"
                    " | | |- The innermost K subdimension size: 8\n"
                    " | | |- Thread cluster lengths (threads per axis): 4x64x1\n"
                    " | | |- Spatial thread distribution over the data tile: 1x0x2\n"
                    " | | |- The order of accessing data tile axes: 1x0x2\n"
                    " | | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | | |- Vector access (GMEM read) instruction size: 8\n"
                    " | | |- Vector access (LDS write) instruction size: 8\n"
                    " | | +- LDS data layout padding (to prevent bank conflicts): 1\n"
                    " | +- C Tile transfer:\n"
                    " | |- Data shuffle (number of gemm instructions per iteration): 1x1\n"
                    " | |- Spatial thread distribution used to store data: 1x32x1x8\n"
                    " | +- Vector access (GMEM write) instruction size: 8\n"
                    " |- Max Transpose transfer src scalar per vector: 1\n"
                    " |- Max Transpose dst scalar per vector: 1\n"
                    " +- Num groups to merge: 4"));
}

// Test printing of optional parameters num_groups_to_merge,
// max_transpose_transfer_src_scalar_per_vector and max_transpose_dst_scalar_per_vector
TEST(ConvDescriptionTest, BwdWeightWmmaCshuffleV3DescriptionTest)
{
    using Instance = ck::tensor_operation::device::DeviceGroupedConvBwdWeight_Wmma_CShuffle<
        3, // NDimSpatial
        ck::tensor_layout::convolution::GNDHWC, // InLayout
        ck::tensor_layout::convolution::GKZYXC, // WeiLayout
        ck::tensor_layout::convolution::GNDHWK, // OutLayout
        ck::half_t, // InDataType
        ck::half_t, // WeiDataType
        ck::half_t, // OutDataType
        float, // AccDataType
        ck::tensor_operation::element_wise::PassThrough, // InElementwiseOperation
        ck::tensor_operation::element_wise::PassThrough, // WeiElementwiseOperation
        ck::tensor_operation::element_wise::PassThrough, // OutElementwiseOperation
        ck::tensor_operation::device::ConvolutionBackwardWeightSpecialization::
            Default, // ConvBackwardWeightSpecialization
        256, // BlockSize
        128, // MPerBlock
        128, // NPerBlock
        16, // K0PerBlock
        8, // K1
        32, // MPerWmma
        32, // NPerWmma
        4, // MRepeat
        4, // NRepeat
        ck::Sequence<4, 64, 1>, // ABlockTransferThreadClusterLengths_K0_M_K1
        ck::Sequence<1, 0, 2>, // ABlockTransferThreadClusterArrangeOrder_
        ck::Sequence<1, 0, 2>, // ABlockTransferSrcAccessOrder
        2, // ABlockTransferSrcVectorDim
        8, // ABlockTransferSrcScalarPerVector
        8, // ABlockTransferDstScalarPerVector_K1
        1, // ABlockLdsAddExtraM
        ck::Sequence<4, 64, 1>, // BBlockTransferThreadClusterLengths_K0_N_K1
        ck::Sequence<1, 0, 2>, // BBlockTransferThreadClusterArrangeOrder_
        ck::Sequence<1, 0, 2>, // BBlockTransferSrcAccessOrder_
        2, // BBlockTransferSrcVectorDim
        8, // BBlockTransferSrcScalarPerVector
        8, // BBlockTransferDstScalarPerVector_K1
        1, // BBlockLdsAddExtraN
        1, // CShuffleMXdlPerWavePerShuffle
        1, // CShuffleNXdlPerWavePerShuffle
        ck::Sequence<1,
                     32,
                     1,
                     8>, // CBlockTransferClusterLengths_MBlock_MPerBlock_NBlock_NPerBlock_
        8, // CDEBlockTransferScalarPerVector_NPerBlock_
        1, // NumGemmKPrefetchStage
        ck::LoopScheduler::Default, // BlkGemmPipeSched
        ck::PipelineVersion::v1, // BlkGemmPipelineVer
        false>; // BComputeDataType

    EXPECT_THAT(ckr::describe<Instance>().detailed(),
                ckt::StringEqWithDiff( //
                    "3D Backward Weight Convolution Kernel\n"
                    "|- Signature\n"
                    "| |- Tensor Type: FP16\n"
                    "| |- Input Layout: GNDHWC\n"
                    "| |- Weight Layout: GKZYXC\n"
                    "| |- Output Layout: GNDHWK\n"
                    "| |- Input elementwise operation: PASS_THROUGH\n"
                    "| |- Weights elementwise operation: PASS_THROUGH\n"
                    "| +- Output elementwise operation: PASS_THROUGH\n"
                    "+- Algorithm\n"
                    " |- Thread block size: 256\n"
                    " |- Data tile size: 128x128x16\n"
                    " |- Convolution specialization: DEFAULT\n"
                    " |- Pipeline version: V1\n"
                    " |- Pipeline scheduler: DEFAULT\n"
                    " |- Warp Gemm parameters:\n"
                    " | |- subtile size: 32x32\n"
                    " | +- Number of warp gemm iterations: 4x4\n"
                    " |- Memory access:\n"
                    " | |- A Tile transfer:\n"
                    " | | |- Tile dimensions: 2x128x8\n"
                    " | | |- The innermost K subdimension size: 8\n"
                    " | | |- Thread cluster lengths (threads per axis): 4x64x1\n"
                    " | | |- Spatial thread distribution over the data tile: 1x0x2\n"
                    " | | |- The order of accessing data tile axes: 1x0x2\n"
                    " | | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | | |- Vector access (GMEM read) instruction size: 8\n"
                    " | | |- Vector access (LDS write) instruction size: 8\n"
                    " | | +- LDS data layout padding (to prevent bank conflicts): 1\n"
                    " | |- B Tile transfer:\n"
                    " | | |- Tile dimensions: 2x128x8\n"
                    " | | |- The innermost K subdimension size: 8\n"
                    " | | |- Thread cluster lengths (threads per axis): 4x64x1\n"
                    " | | |- Spatial thread distribution over the data tile: 1x0x2\n"
                    " | | |- The order of accessing data tile axes: 1x0x2\n"
                    " | | |- Vectorized memory access axis index (with contiguous memory): 2\n"
                    " | | |- Vector access (GMEM read) instruction size: 8\n"
                    " | | |- Vector access (LDS write) instruction size: 8\n"
                    " | | +- LDS data layout padding (to prevent bank conflicts): 1\n"
                    " | +- C Tile transfer:\n"
                    " | |- Data shuffle (number of gemm instructions per iteration): 1x1\n"
                    " | |- Spatial thread distribution used to store data: 1x32x1x8\n"
                    " | +- Vector access (GMEM write) instruction size: 8\n"
                    " +- Num gemm k prefetch stage: 1"));
}

TEST(ConvDescriptionTest, DefaultInstanceHasInstanceString)
{
    static constexpr const ConvSignature SIGNATURE;
    static constexpr const DefaultAlgorithm ALGORITHM;
    using Instance = ckb::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;

    // Get the instance string from the description
    std::string instance_str = ckr::describe<Instance>().instance_string();

    // Verify that the instance string is not empty
    EXPECT_FALSE(instance_str.empty());

    // Verify that it contains the device operation name
    // The exact format depends on the InstanceTraits implementation
    EXPECT_THAT(instance_str, ::testing::HasSubstr("DeviceGroupedConvFwdMultipleABD"));
}

// NOTE: BackwardDataInstanceHasDetailedDescription test is disabled because ConvFactory
// does not have a specialization for backward data convolutions. The test fails with:
// "implicit instantiation of undefined template 'ck_tile::builder::ConvFactory<...>'"
//
// To enable this test, a ConvFactory specialization for backward data operations must be
// implemented first.
//
// TEST(ConvDescriptionTest, BackwardDataInstanceHasDetailedDescription)
// {
//     struct BackwardDataSignature
//     {
//         int spatial_dim = 2;
//         ckb::ConvDirection direction = ckb::ConvDirection::BACKWARD_DATA;
//         ckb::GroupConvLayout layout = ckb::GroupConvLayout2D::GNHWC_GKYXC_GNHWK;
//         ckb::DataType data_type = ckb::DataType::FP16;
//         ckb::ElementwiseOperation elementwise_operation =
//             ckb::ElementwiseOperation::PASS_THROUGH;
//         ckb::GroupConvDeviceOp device_operation =
//             ckb::BwdDataGroupConvDeviceOperation::DeviceGroupedConvBwdDataMultipleD_Xdl_CShuffle_v1;
//     };
//     static_assert(ckb::ConvSignatureDescriptor<BackwardDataSignature>);
//
//     static constexpr const BackwardDataSignature SIGNATURE;
//     static constexpr const DefaultAlgorithm ALGORITHM;
//     using Builder = ckb::ConvBuilder<SIGNATURE, ALGORITHM>;
//
//     // Verify Brief works
//     EXPECT_THAT(ckr::Describe<Builder>().brief(),
//                 ckt::StringEqWithDiff("2D Backward Data convolution"));
//
//     // Verify detailed works - to be updated once ConvFactory is implemented
//     EXPECT_THAT(ckr::Describe<Builder>().detailed(),
//                 ckt::StringEqWithDiff("PLACEHOLDER"));
// }

} // namespace