[CK_BUILDER] convolution testing (#3267)

* Add README.md for testing

* Add tensor_memory_manager.

* ck-builder: tensor memory manager rebase fixes

This fixes some issues caused by the API being changed recently.
Also, this streamlines the ckt namespace to always be ck_tile::builder::test,
as this is already being used by other tests

Really, this commit should be squashed into the previous,
but I'm keeping it separate for brevity.

* ck-builder: test arguments initial prototype

* ck-builder: test system initial prototype

* ck-builder: fix non-standardized copyright comments

* ck-builder: new prototype

* ck-builder: group testing inputs/outputs into a separate structure

This is basically the return of the tensor memory manager after all,
except that the design is more closely tied to the actual operation.
Using a struct allows us to add additional input/output tensors
without breaking code (by defaulting those new parameters). Note
that the tensors are split into a separate inputs/outputs because we
usually want to allocate the output _twice_: once for the real
computation and once for the reference computation.

* ck-builder: simplify prototype naming; start docs

* ck-builder: update testing readme

* ck-builder: testing documentation

* ck-builder: HipStatusMatcher

This matcher can be used to check HIP status codes and provide
nice and readable error messages.

* ck-builder: tensor_buffer.hpp tests

* ck-builder: conv_fwd.hpp tests

* ck-builder: add example end-to-end test in conv fwd 2d fp16

* ck-builder: simplify extent usage

* ck-builder: update testing doc

* ck-builder: skip end to end test on non-gfx9

* fix check_copyright_year interpreter

/bin/bash is not guaranteed to exist on Linux. Signed,
a NixOS user

* ck-builder: fix copyrights

* ck-builder: reduce conv fwd testing size

This test allocated 24GB of memory, too much for 16GB cards.

---------

Co-authored-by: John Shumway <jshumway@amd.com>
This commit is contained in:
Robin Voetter
2025-12-13 15:33:41 +01:00
committed by GitHub
parent 9707ddb444
commit 6219b12730
17 changed files with 1660 additions and 57 deletions


@@ -12,19 +12,21 @@ This project is a prototype for a more general builder pattern for all of compos
## Design descriptions
- [CK Builder design description](include/ck_tile/builder/README.md)
- [CK Builder factory design](include/ck_tile/builder/factory/README.md)
- [CK Builder testing design](include/ck_tile/builder/testing/README.md)
## Directory Structure
- `include/ck_tile/builder/`
Core builder headers and public API.
- `include/ck_tile/builder/reflect`
Reflection mechanism.
- `include/ck_tile/builder/factory`
Compile-time dispatch from builder descriptors to our existing specialized convolution kernel implementations.
- `test/`
Unit tests and example usage of the builder pattern.
- `CMakeLists.txt`
CMake configuration for building the experimental builder and its tests.
## CMake Configuration


@@ -0,0 +1,347 @@
# CK-Builder Testing Utilities
This directory contains testing utilities designed to simplify the process of writing unit tests for GPU kernels built with `ck_tile::builder`. These utilities enable a clean, expressive **Given-When-Then** testing pattern that separates test setup, execution, and validation.
See the [main builder documentation](../README.md) for an overview of the CK-Builder API components.
## Overview
Testing GPU kernels typically involves significant boilerplate: allocating device memory, initializing test data, launching kernels, and validating results. The utilities in this directory abstract away these repetitive tasks, allowing you to focus on defining test cases and verifying correctness.
The core components are:
- **`Args`**: A struct template that holds runtime parameters for a specific test case.
- **`Inputs`** and **`Outputs`**: Helper structs that group operation inputs and outputs.
- **`Validator`**: A utility that performs on-GPU validation and integrates with GoogleTest/GoogleMock.
Together, these components enable a structured approach to kernel testing that mirrors the Given-When-Then pattern commonly used in behavior-driven development.
## The Given-When-Then Testing Pattern
The Given-When-Then pattern organizes tests into three distinct phases:
1. **Given**: Set up the preconditions and test data
2. **When**: Execute the action being tested
3. **Then**: Verify the expected outcome
This structure makes tests easier to read, write, and maintain. Each phase has a clear purpose, and the testing utilities are designed to support this workflow.
### Given: Defining the Test Case
The "Given" phase establishes the context for your test. This includes both the compile-time characteristics of the kernel and the runtime parameters for the specific test case.
#### Operation Signature
The "signature" defines the **mathematical contract** that the kernel must satisfy. It specifies compile-time properties such as:
- Spatial dimensionality (1D, 2D, or 3D)
- Convolution direction (Forward, Backward Data, Backward Weight)
- Tensor memory layout (e.g., NHWC, NCHW)
- Data types (FP32, FP16, BF16, etc.)
- Fused element-wise operations (e.g., Bias, ReLU)
The format of the signature struct is enforced at compile time using C++20 concepts by the CK-Builder API, ensuring type safety and enabling compile-time optimizations. The design of these concepts and the required constraints are discussed in the [CK Builder design description](../include/ck_tile/builder/README.md).
```cpp
// Define our custom signature struct.
struct ConvSignature {
int spatial_dim = 2;
ck_tile::builder::ConvDirection direction =
ck_tile::builder::ConvDirection::FORWARD;
ck_tile::builder::GroupConvLayout2D layout =
ck_tile::builder::GroupConvLayout2D::NHWGC_GKYXC_NHWGK;
ck_tile::builder::DataType data_type =
ck_tile::builder::DataType::FP16;
ck_tile::builder::ElementwiseOperation elementwise_operation =
ck_tile::builder::ElementwiseOperation::NONE;
};
// Double-check that our structure is well-defined according to the CK-Builder API.
static_assert(ck_tile::builder::ConvSignatureDescriptor<ConvSignature>);
// Instantiate the signature with a configuration. These values are again checked
// by the CK-Builder API when a device operation is built.
constexpr auto SIGNATURE = ConvSignature{
.spatial_dim = 2,
.direction = ck_tile::builder::ConvDirection::FORWARD,
.layout = ck_tile::builder::GroupConvLayout2D::NHWGC_GKYXC_NHWGK,
.data_type = ck_tile::builder::DataType::FP16,
.elementwise_operation = ck_tile::builder::ElementwiseOperation::NONE,
};
```
#### Run-time Arguments
The `Args` struct template provides the **runtime parameters** for your test case. It is parameterized by the `SIGNATURE` and contains fields for tensor dimensions, strides, dilations, and other dynamic properties. Note that the exact parameters required by `Args` depend on the `SIGNATURE`: for example, a `SIGNATURE` that represents a forward convolution requires specifying the number of batches, groups, input and output channels, filter dimensions, filter strides, and so on. A `SIGNATURE` that represents a simple GEMM operation may instead require only the dimensions of the A-, B-, and C-matrices.
```cpp
ck_tile::builder::test::Args<SIGNATURE> args = {
.lengths = {
.batch_size = 128,
.groups = 1,
.input_channels = 64,
.output_channels = 128,
.image = {.width = 56, .height = 56},
.filter = {.width = 3, .height = 3},
},
.filter_strides = {.width = 1, .height = 1},
.filter_dilation = {.width = 1, .height = 1},
.input_left_pad = {.width = 1, .height = 1},
.input_right_pad = {.width = 1, .height = 1},
};
```
#### Tensor Memory Management
Tensor memory is passed using the `Inputs<SIGNATURE>` and `Outputs<SIGNATURE>` structures, which group all inputs and outputs for an operation. Note that these structures do not "own" the memory they refer to: they only logically group the tensors so that they can be passed as a common type. The number of inputs and outputs may differ depending on the `SIGNATURE`, and this grouping avoids having to pass additional values and accept additional parameters in those situations.
The exact fields in `Inputs` and `Outputs` again depend on the particular `SIGNATURE` they are constructed with. In general, these structures are intended to be freely constructible from external data and only serve to group relevant information. Automatic memory management can be performed using the `UniqueInputs<SIGNATURE>` and `UniqueOutputs<SIGNATURE>` structures instead. The `alloc_inputs` and `alloc_outputs` functions initialize these types: they take an `Args` structure and allocate the appropriate amounts of memory. `.get()` returns an instance of the corresponding `Inputs` or `Outputs`.
```cpp
auto inputs = ck_tile::builder::test::alloc_inputs(args);
auto outputs = ck_tile::builder::test::alloc_outputs(args);
```
Note that these functions merely _allocate_ memory: After allocation, the memory is still uninitialized.
#### Tensor Memory Initialization
Operation inputs can be initialized using `ck_tile::builder::test::init_inputs()`. Crucially, this operation accepts _all_ inputs, as well as the `args` structure. This is because initializing tensor memory is a context-dependent operation: we need to understand the operation in detail in order to generate inputs that do not overflow, do not produce NaNs, and are not all zeros. Passing the `args` allows `init_inputs` to generate suitable inputs for the operation at hand.
### When: Executing the Kernel
The "When" phase is where the kernel to be tested is actually executed. This involves selecting an algorithm and using the `Builder` to generate the kernel.
#### Operation Algorithm
The "algorithm" defines the **implementation strategy** for the kernel. It specifies low-level details such as:
- Thread block dimensions and tile sizes
- GEMM implementation (XDL or WMMA)
- Data transfer vectorization
- Pipeline scheduling
As with the signature struct, the format of the algorithm struct is enforced at compile time using C++20 concepts by the CK-Builder API. The design of these concepts and the required constraints are discussed in the [CK Builder factory design description](../include/ck_tile/builder/factory/README.md).
```cpp
// Define our custom algorithm struct.
struct ConvAlgorithm {
// Thread block configuration
ThreadBlock thread_block;
// Gridwise GEMM configuration
GridwiseXdlGemm gridwise_gemm;
// Block transfer configuration
Transfer transfer;
// Additional tuning parameters
// ...
};
// Double-check that our algorithm is well-defined according to the CK-Builder API.
static_assert(ck_tile::builder::ConvAlgorithmDescriptor<ConvAlgorithm>);
// Instantiate the algorithm with a configuration. Like with the signature struct
// the CK-Builder API will check that the values are correct when a device
// operation is built.
constexpr auto ALGORITHM = ConvAlgorithm{
.thread_block = /* ... */,
.gridwise_gemm = /* ... */,
.transfer = /* ... */,
// ...
};
```
#### Building the Kernel
The `Builder` combines the signature (what to compute) with the algorithm (how to compute it) to generate a kernel type which represents the operation. The implementation details, including invocation method, depend on the particular signature and algorithm.
```cpp
using Conv = ck_tile::builder::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
auto conv = Conv{};
```
#### Invoking the Kernel
After creating the kernel instance, it can be invoked by passing the instance, the arguments, the inputs, and the outputs to `run()`. This operation writes results into the buffers in `outputs`.
```cpp
ck_tile::builder::test::run(conv, args, inputs.get(), outputs.get());
```
### Then: Verifying the Results
The "Then" phase validates that the kernel produced the expected output. This is done by running a reference kernel and comparing the results.
#### Building the Reference Kernel
The reference kernel is just another kernel instance produced by the builder, one that has been externally verified to produce correct results. As this kernel also runs on the GPU, tests can be performed far more quickly than by comparing the outputs to a CPU-based reference implementation.
In order to obtain an instance of the reference kernel, the correct `ALGORITHM` needs to be passed to the `Builder`.
```cpp
struct ReferenceAlgorithm {
ck_tile::builder::ConvAlgorithmSpecialization specialization;
};
static_assert(ck_tile::builder::ConvAlgorithmDescriptor<ReferenceAlgorithm>);
constexpr auto REFERENCE_ALGORITHM = ReferenceAlgorithm{
.specialization = ck_tile::builder::ConvAlgorithmSpecialization::REFERENCE,
};
using ReferenceConv = ck_tile::builder::ConvBuilder<SIGNATURE, REFERENCE_ALGORITHM>::Instance;
auto reference_conv = ReferenceConv{};
```
This instance can then be invoked using `ck_tile::builder::test::run()`, the same way as the kernel under test. Note that a separate instance of the `Outputs` structure needs to be passed here in order to store the reference results.
```cpp
auto reference_outputs = ck_tile::builder::test::alloc_outputs(args);
ck_tile::builder::test::run(reference_conv, args, inputs.get(), reference_outputs.get());
```
#### `Validator<SIGNATURE>`
The `Validator` class encapsulates the validation logic. It performs on-GPU correctness checks by comparing two instances of the `Outputs` structure.
```cpp
ck_tile::builder::test::Validator<SIGNATURE> validator(outputs.get(), reference_outputs.get());
```
The `Validator` provides methods that return GoogleMock matchers, enabling clean integration with GoogleTest:
```cpp
EXPECT_THAT(validator.result(), validator.matches_reference_output());
```
The `matches_reference_output()` matcher checks that the output is numerically correct within acceptable tolerances. The `Validator` can also provide more detailed diagnostics, such as:
- Maximum absolute error
- Maximum relative error
- Number of mismatched elements
- Specific locations of errors
## Complete Example
Here's a complete test that demonstrates the Given-When-Then pattern:
```cpp
#include <gtest/gtest.h>
#include "ck_tile/builder/conv_signature_concepts.hpp"
#include "ck_tile/builder/conv_algorithm_concepts.hpp"
#include "ck_tile/builder/conv_builder.hpp"
#include "ck_tile/builder/testing/conv_fwd.hpp"
#include "ck_tile/builder/testing/validator.hpp"
// Define the convolution signature
struct ConvSignature {
int spatial_dim = 2;
ck_tile::builder::ConvDirection direction =
ck_tile::builder::ConvDirection::FORWARD;
ck_tile::builder::GroupConvLayout2D layout =
ck_tile::builder::GroupConvLayout2D::NHWGC_GKYXC_NHWGK;
ck_tile::builder::DataType data_type =
ck_tile::builder::DataType::FP16;
ck_tile::builder::ElementwiseOperation elementwise_operation =
ck_tile::builder::ElementwiseOperation::NONE;
};
static_assert(ck_tile::builder::ConvSignatureDescriptor<ConvSignature>);
constexpr auto SIGNATURE = ConvSignature{
.spatial_dim = 2,
.direction = ck_tile::builder::ConvDirection::FORWARD,
.layout = ck_tile::builder::GroupConvLayout2D::NHWGC_GKYXC_NHWGK,
.data_type = ck_tile::builder::DataType::FP16,
.elementwise_operation = ck_tile::builder::ElementwiseOperation::NONE,
};
// Define the convolution algorithm
struct ConvAlgorithm {
// Algorithm configuration details...
// (Omitted for brevity)
};
static_assert(ck_tile::builder::ConvAlgorithmDescriptor<ConvAlgorithm>);
constexpr auto ALGORITHM = ConvAlgorithm{/* ... */};
// Define the reference convolution algorithm
struct ReferenceAlgorithm {
ck_tile::builder::ConvAlgorithmSpecialization specialization;
};
static_assert(ck_tile::builder::ConvAlgorithmDescriptor<ReferenceAlgorithm>);
constexpr auto REFERENCE_ALGORITHM = ReferenceAlgorithm{
.specialization = ck_tile::builder::ConvAlgorithmSpecialization::REFERENCE,
};
// The actual test
TEST(ConvolutionTest, Forward2D_FP16) {
// ===== GIVEN: Set up the test case =====
// Define runtime parameters
ck_tile::builder::test::Args<SIGNATURE> args = {
.lengths = {
.batch_size = 128,
.groups = 1,
.input_channels = 64,
.output_channels = 128,
.image = {.width = 56, .height = 56},
.filter = {.width = 3, .height = 3},
},
.filter_strides = {.width = 1, .height = 1},
.filter_dilation = {.width = 1, .height = 1},
.input_left_pad = {.width = 1, .height = 1},
.input_right_pad = {.width = 1, .height = 1},
};
// Allocate GPU memory
auto inputs = ck_tile::builder::test::alloc_inputs(args);
auto outputs = ck_tile::builder::test::alloc_outputs(args);
auto reference_outputs = ck_tile::builder::test::alloc_outputs(args);
// Initialize inputs
ck_tile::builder::test::init_inputs(args, inputs);
// ===== WHEN: Execute the kernel =====
// Build the kernel
using Conv = ck_tile::builder::ConvBuilder<SIGNATURE, ALGORITHM>::Instance;
auto conv = Conv{};
// Compute actual results
ck_tile::builder::test::run(conv, args, inputs.get(), outputs.get());
// ===== THEN: Verify the results =====
// Build the reference kernel
using ReferenceConv = ck_tile::builder::ConvBuilder<SIGNATURE, REFERENCE_ALGORITHM>::Instance;
auto reference_conv = ReferenceConv{};
// Compute reference results
ck_tile::builder::test::run(reference_conv, args, inputs.get(), reference_outputs.get());
// Check the results
ck_tile::builder::test::Validator<SIGNATURE> validator(outputs.get(), reference_outputs.get());
EXPECT_THAT(validator.result(), validator.matches_reference_output());
}
```
## Benefits of This Approach
1. **Clarity**: The Given-When-Then structure makes tests self-documenting. Each phase has a clear purpose.
2. **Reduced Boilerplate**: The utilities handle memory management, initialization, and validation, eliminating repetitive code.
3. **Type Safety**: The use of C++20 concepts ensures that signatures and algorithms are well-formed at compile time.
4. **Flexibility**: The `Args` struct can be easily extended to support different test scenarios, `Inputs` and `Outputs` can be modified to support additional tensors where necessary, and alternatives to `init_inputs()` can be provided to support additional testing strategies.
5. **Integration**: The `Validator` integrates seamlessly with GoogleTest/GoogleMock, providing familiar assertion syntax.
6. **Maintainability**: Changes to the testing infrastructure are localized to the utility classes, not scattered across individual tests.
## Future Enhancements
Potential improvements to the testing utilities include:
- Performance benchmarking utilities
- Automatic test case generation from parameter ranges
- Enhanced error reporting with visual diffs
- Support for multi-GPU testing scenarios


@@ -0,0 +1,256 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
#include "ck_tile/builder/conv_signature_concepts.hpp"
#include "ck_tile/builder/factory/helpers/ck/conv_tensor_layout.hpp"
#include "ck_tile/builder/factory/helpers/ck/conv_elementwise_op.hpp"
#include "ck_tile/builder/testing/testing.hpp"
#include "ck_tile/builder/testing/extent.hpp"
#include "ck_tile/builder/testing/tensor_buffer.hpp"
#include "ck/library/utility/convolution_parameter.hpp"
#include "ck/library/utility/convolution_host_tensor_descriptor_helper.hpp"
/// This file implements common functionality for invoking/testing grouped
/// forward convolutions created through the CK Builder API. The main item
/// is the `Args` specialization, which contains a complete description
/// of a convolution operation.
///
/// It is not intended that this file contains implementation details for
/// actually launching a convolution operation. As this can be done
/// through different APIs depending on the kernel (CK, CK Tile, or a
/// reference implementation), the code dealing with that is split out
/// into a separate header for each implementation.
namespace ck_tile::builder::test {
/// @brief Convolution tensor dimensions.
///
/// This structure describes the lengths of a convolution problem. In fact,
/// it is a complete description of ALL input and output lengths of a
/// convolution problem, as it contains all of the combined parameters.
/// Note that this structure cannot also be used to describe tensor
/// strides: whereas the lengths are all governed by a common set of
/// parameters, the strides of the input, weight, and output tensors are
/// all independent.
template <int SPATIAL_DIM>
struct ConvTensorLengths
{
size_t batch_size = 1; // N
size_t groups = 1; // G
size_t input_channels = 1; // C
size_t output_channels = 1; // K
Extent<SPATIAL_DIM> image = {}; // W, H, D
Extent<SPATIAL_DIM> filter = {}; // X, Y, Z
};
/// @brief `Args` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see Args
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
struct Args<SIGNATURE>
{
constexpr static auto SPATIAL_DIM = SIGNATURE.spatial_dim;
constexpr static auto INPUT_TYPE = SIGNATURE.data_type;
constexpr static auto WEIGHT_TYPE = SIGNATURE.data_type;
constexpr static auto OUTPUT_TYPE = SIGNATURE.data_type;
// TODO: We shouldn't need to call into an internal namespace here.
using Ops = factory::internal::ElementwiseOps<SIGNATURE>;
// TODO: We shouldn't need to call into an internal namespace here.
using Layouts =
factory::internal::ConvTensorLayouts<SIGNATURE, SPATIAL_DIM, ConvDirection::FORWARD>;
ConvTensorLengths<SPATIAL_DIM> lengths;
// TODO: Tensor strides. This needs a new structure as well as some
// reworking of the make_*_descriptor() functions, as the current
// implementation (based on ConvParam in old CK / CK Tile) does not
// support strides at all.
Extent<SPATIAL_DIM> filter_strides;
Extent<SPATIAL_DIM> filter_dilation;
Extent<SPATIAL_DIM> input_left_pad;
Extent<SPATIAL_DIM> input_right_pad;
Ops::AElementwiseOp a_elementwise_op;
Ops::BElementwiseOp b_elementwise_op;
Ops::CDEElementwiseOp cde_elementwise_op;
/// This function returns the `TensorDescriptor` corresponding to
/// the input-tensor of the convolution problem. This can then
/// be used to, for example, allocate memory.
TensorDescriptor<INPUT_TYPE> make_input_descriptor() const
{
// TODO: We're using old CK functionality to compute the right
// values here, mainly because CK tile does not support the
// right tensor layouts here. We should probably change that
// because CK currently prints an annoying message about it,
// plus that would let us get rid of the `to_ck_conv_param()`
// function.
const auto param = to_ck_conv_param();
const auto desc = ck::utils::conv::make_input_host_tensor_descriptor_g_n_c_wis_packed<
typename Layouts::ALayout>(param);
return TensorDescriptor<INPUT_TYPE>(desc.GetLengths(), desc.GetStrides());
}
/// This function returns the `TensorDescriptor` corresponding to
/// the weight-tensor of the convolution problem. This can then
/// be used to, for example, allocate memory.
TensorDescriptor<WEIGHT_TYPE> make_weight_descriptor() const
{
// See note in implementation of `make_input_descriptor`.
const auto param = to_ck_conv_param();
const auto desc = ck::utils::conv::make_weight_host_tensor_descriptor_g_k_c_xs_packed<
typename Layouts::BLayout>(param);
return TensorDescriptor<WEIGHT_TYPE>(desc.GetLengths(), desc.GetStrides());
}
/// This function returns the `TensorDescriptor` corresponding to
/// the output-tensor of the convolution problem. This can then
/// be used to, for example, allocate memory.
TensorDescriptor<OUTPUT_TYPE> make_output_descriptor() const
{
// See note in implementation of `make_input_descriptor`.
const auto param = to_ck_conv_param();
const auto desc = ck::utils::conv::make_output_host_tensor_descriptor_g_n_k_wos_packed<
typename Layouts::ELayout>(param);
return TensorDescriptor<OUTPUT_TYPE>(desc.GetLengths(), desc.GetStrides());
}
/// Convert the Args structure into a CK conv_param structure. This
/// function is mainly used to be able to use the existing
/// CK-functionality to obtain tensor descriptors.
ck::utils::conv::ConvParam to_ck_conv_param() const
{
const auto to_vector = [](const auto& extent) {
if constexpr(SPATIAL_DIM == 1)
return std::vector<ck::index_t>{ck::index_t(extent.width)};
else if constexpr(SPATIAL_DIM == 2)
return std::vector<ck::index_t>{ck::index_t(extent.height),
ck::index_t(extent.width)};
else
return std::vector<ck::index_t>{ck::index_t(extent.depth),
ck::index_t(extent.height),
ck::index_t(extent.width)};
};
return ck::utils::conv::ConvParam(SPATIAL_DIM,
this->lengths.groups,
this->lengths.batch_size,
this->lengths.output_channels,
this->lengths.input_channels,
to_vector(this->lengths.filter),
to_vector(this->lengths.image),
to_vector(this->filter_strides),
to_vector(this->filter_dilation),
to_vector(this->input_left_pad),
to_vector(this->input_right_pad));
}
};
/// @brief `Inputs` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see Inputs
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
struct Inputs<SIGNATURE>
{
void* input;
void* weight;
};
/// @brief `Outputs` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see Outputs
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
struct Outputs<SIGNATURE>
{
void* output;
};
/// @brief `UniqueInputs` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see UniqueInputs
/// @see ValidUniqueInputs
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
struct UniqueInputs<SIGNATURE>
{
DeviceBuffer input_buf;
DeviceBuffer weight_buf;
/// @see ValidUniqueInputs
Inputs<SIGNATURE> get()
{
return {
.input = input_buf.get(),
.weight = weight_buf.get(),
};
}
};
/// @brief `UniqueOutputs` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see UniqueOutputs
/// @see ValidUniqueOutputs
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
struct UniqueOutputs<SIGNATURE>
{
DeviceBuffer output_buf;
/// @see ValidUniqueOutputs
Outputs<SIGNATURE> get()
{
return {
.output = output_buf.get(),
};
}
};
/// @brief `alloc_inputs()` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see alloc_inputs()
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE> &&
ValidUniqueInputs<SIGNATURE>
UniqueInputs<SIGNATURE> alloc_inputs(const Args<SIGNATURE>& args)
{
return {
.input_buf = alloc_tensor_buffer(args.make_input_descriptor()),
.weight_buf = alloc_tensor_buffer(args.make_weight_descriptor()),
};
}
/// @brief `alloc_outputs()` specialization for forward convolution.
///
/// @tparam SIGNATURE Forward convolution signature.
///
/// @see alloc_outputs()
template <auto SIGNATURE>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE> &&
ValidUniqueOutputs<SIGNATURE>
UniqueOutputs<SIGNATURE> alloc_outputs(const Args<SIGNATURE>& args)
{
return {
.output_buf = alloc_tensor_buffer(args.make_output_descriptor()),
};
}
} // namespace ck_tile::builder::test


@@ -0,0 +1,102 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
#include <span>
#include <cstddef>
#include "ck_tile/builder/testing/conv_fwd.hpp"
/// This file contains the implementation details for invoking/testing
/// grouped convolution operations in old CK. The main item is the
/// `run()` function, which is the main implementation used to invoke
/// CK grouped forward convolution kernels.
namespace ck_tile::builder::test {
/// @brief Concept for checking whether a convolution is invoked like old CK.
///
/// This concept is used to tell whether a convolution implementation is
/// likely to be an "old CK" implementation - that is, whether we should
/// invoke it as an old CK kernel. This is mainly used with `run()` to
/// differentiate which implementation should be invoked.
///
/// - SIGNATURE is the operation signature.
/// - Conv is a convolution instance created by the CK Builder API.
template <auto SIGNATURE, typename Conv>
concept IsCkConvInstance =
// TODO: This should be implemented by converting the signature into the
// type parameters for DeviceGroupedConvFwdMultipleABD. For now, it is
// left trivially true; improve when needed. We should probably also
// move this to the CK conv factory helper.
true;
/// @brief `run()` specialization for forward convolution and old CK.
///
/// @tparam SIGNATURE Forward convolution signature.
/// @throws std::runtime_error if the arguments weren't actually valid for the
/// operation. This should be caught and reported by the testing framework.
///
/// @see run()
template <auto SIGNATURE, typename Conv>
requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE> &&
IsCkConvInstance<SIGNATURE, Conv>
void run(Conv& conv,
const Args<SIGNATURE>& args,
const Inputs<SIGNATURE>& inputs,
const Outputs<SIGNATURE>& outputs)
{
constexpr auto spatial_dim = SIGNATURE.spatial_dim;
const auto copy = [](const auto& src, auto& dst) {
std::copy(src.begin(), src.end(), dst.begin());
};
const auto to_ck_lengths = [&](const auto& src) {
std::array<ck::index_t, spatial_dim + 3> result;
copy(src, result);
return result;
};
const auto to_ck_extent = [&](const auto& extent) {
std::array<ck::index_t, spatial_dim> result;
copy(extent, result);
return result;
};
const auto param = args.to_ck_conv_param();
const auto input_desc = args.make_input_descriptor();
const auto weight_desc = args.make_weight_descriptor();
const auto output_desc = args.make_output_descriptor();
auto ck_args = conv.MakeArgument(inputs.input,
inputs.weight,
{},
outputs.output,
to_ck_lengths(input_desc.get_lengths()),
to_ck_lengths(input_desc.get_strides()),
to_ck_lengths(weight_desc.get_lengths()),
to_ck_lengths(weight_desc.get_strides()),
{},
{},
to_ck_lengths(output_desc.get_lengths()),
to_ck_lengths(output_desc.get_strides()),
to_ck_extent(param.conv_filter_strides_),
to_ck_extent(param.conv_filter_dilations_),
to_ck_extent(param.input_left_pads_),
to_ck_extent(param.input_right_pads_),
args.a_elementwise_op,
args.b_elementwise_op,
args.cde_elementwise_op);
if(!conv.IsSupportedArgument(ck_args))
{
throw std::runtime_error("invalid argument");
}
conv.MakeInvoker().Run(ck_args, {});
}
} // namespace ck_tile::builder::test


@@ -0,0 +1,36 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
namespace ck_tile::builder::test {
/// This structure describes a 1-, 2-, or 3-D extent. It's used to
/// communicate 1-, 2-, or 3-D sizes and strides of tensors.
/// Depending on the dimension, the structure has the `width`,
/// `height`, and `depth` fields available.
template <int SPATIAL_DIM>
struct Extent;
template <>
struct Extent<1>
{
size_t width = 1;
};
template <>
struct Extent<2>
{
size_t width = 1;
size_t height = 1;
};
template <>
struct Extent<3>
{
size_t width = 1;
size_t height = 1;
size_t depth = 1;
};
} // namespace ck_tile::builder::test


@@ -0,0 +1,212 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
#include <stdexcept>
#include <memory>
#include <numeric>
#include <span>
#include <concepts>
#include <hip/hip_runtime.h>
#include "ck_tile/builder/conv_signature_concepts.hpp"
#include "ck_tile/builder/testing/type_traits.hpp"
#include "ck_tile/host/host_tensor.hpp"
/// This file deals with tensor memory allocation: Both the act of allocating
/// and (automatically) deallocating memory, as well as utilities for managing
/// the layout of tensor data in memory.
namespace ck_tile::builder::test {
/// @brief Automatic deleter for GPU memory.
///
/// This structure implements a C++ functor which can be used to configure
/// `std::unique_ptr` to automatically delete memory using `hipFree`.
///
/// @see DeviceBuffer
struct DeviceMemoryDeleter
{
/// @brief Deleter callback.
///
/// This function is invoked by `std::unique_ptr` when memory that the
/// pointer represents should be freed. In our implementation, we just
/// pass it directly to `hipFree`.
void operator()(std::byte* ptr) const
{
if(ptr)
(void)hipFree(ptr);
}
};
/// @brief HIP out-of-memory error
///
/// This is a derivation of `std::runtime_error` specialized for HIP
/// out-of-memory errors.
///
/// @see std::runtime_error
struct OutOfDeviceMemoryError : std::runtime_error
{
/// @brief Utility for formatting out-of-memory error messages
///
/// Returns a human-readable description of a HIP out-of-memory error.
///
/// @param status The status to report
static std::string format_error(hipError_t status)
{
return std::string("failed to allocate hip memory: ") + hipGetErrorString(status) + " (" +
std::to_string(status) + ")";
}
/// @brief Construct an out-of-memory error using `status` as message.
///
/// @param status A HIP error status that was encountered while allocating memory.
OutOfDeviceMemoryError(hipError_t status) : std::runtime_error(format_error(status)) {}
};
/// @brief Automatically managed GPU memory.
///
/// The `DeviceBuffer` is an automatically managed pointer for GPU memory. When
/// adopting a device pointer into a `DeviceBuffer`, it will automatically be
/// freed when the pointer goes out of scope. Memory can be allocated directly
/// into a `DeviceBuffer` using `alloc_buffer()` or `alloc_tensor_buffer()`.
///
/// Since this type is just an alias of `std::unique_ptr`, you can use that type's
/// functionality to manage memory further, such as `.reset()` to release the
/// memory.
///
/// @see alloc_buffer()
/// @see alloc_tensor_buffer()
using DeviceBuffer = std::unique_ptr<std::byte[], DeviceMemoryDeleter>;
/// @brief Allocate automatically managed GPU memory.
///
/// This function essentially acts like a managed version of `hipMalloc`,
/// allocating GPU memory on the currently active device, except that this
/// version returns an automatically managed pointer.
///
/// @param size The amount of memory to allocate in bytes.
/// @throws OutOfDeviceMemoryError if memory allocation failed.
///
/// @see DeviceBuffer
/// @see OutOfDeviceMemoryError
/// @see hipMalloc()
inline DeviceBuffer alloc_buffer(size_t size)
{
std::byte* d_buf = nullptr;
if(const auto status = hipMalloc(&d_buf, size); status != hipSuccess)
{
throw OutOfDeviceMemoryError(status);
}
return DeviceBuffer(d_buf);
}
/// @brief Type managing tensor data layout in memory.
///
/// This structure describes a tensor in memory. It does not actually hold any
/// reference to memory, it just describes how the memory should be laid out if it
/// were.
///
/// @note This type is very much like ck_tile::HostTensorDescriptor, except that it
/// also includes the data type of the elements of this tensor. This is mainly to
/// make the descriptor a _complete_ description of a tensor rather than just the
/// dimensions and strides, which helps reduce clutter in uses of this type.
///
/// @note All strides are still in _elements_.
///
/// @tparam DT The conceptual data type of the tensor elements. This need not be the
/// type that the data is actually stored as in memory.
template <DataType DT>
struct TensorDescriptor
{
// For now, the implementation of this type is based on
// `ck_tile::HostTensorDescriptor`, so that we can prototype without
// reimplementing the `HostTensorDescriptor` for the 3rd time. You can regard
// the use of `ck_tile::HostTensorDescriptor` here as an implementation detail.
/// The conceptual data type of the tensor elements. This need not be the type
/// that the data is actually stored as in memory.
constexpr static DataType data_type = DT;
/// @brief Create a tensor descriptor from lengths and strides.
///
/// @param lengths A sequence of tensor lengths, the conceptual dimensions of
/// the tensor in elements.
/// @param strides A sequence of in-memory strides of the tensor, measured in
/// elements. Each element of `strides` corresponds to the one at the same index
/// in `lengths`: the number of elements to skip in memory to find the next
/// element along that axis.
TensorDescriptor(std::span<const size_t> lengths, std::span<const size_t> strides)
: inner_descriptor_(lengths, strides)
{
// TODO: Validation of strides? For now we just delegate the details of the
// construction to the CK Tile HostTensorDescriptor.
}
/// Query the conceptual dimensions of the tensor.
///
/// @returns A span of tensor dimensions, one for every axis. Note that the order
/// does *not* correspond with memory layout, query the in-memory strides for
/// that.
///
/// @see get_strides()
std::span<const size_t> get_lengths() const { return inner_descriptor_.get_lengths(); }
/// Query the in-memory strides of the tensor.
///
/// @returns A span of tensor dimensions, one for every axis. Each element
/// corresponds directly with the stride in elements at the same index in the
/// tensor dimensions.
///
/// @see get_lengths()
std::span<const size_t> get_strides() const { return inner_descriptor_.get_strides(); }
/// @brief Compute total tensor size in elements.
///
/// This function returns the total size of the memory backing a tensor with
/// this descriptor in *elements*, including required extra size for strides.
///
/// @see get_element_space_size_in_bytes()
size_t get_element_space_size() const { return inner_descriptor_.get_element_space_size(); }
/// @brief Compute total tensor size in bytes.
///
/// This function is like `get_element_space_size()`, except that the returned
/// value is measured in *bytes* rather than *elements*. Use this function for
/// figuring out how much memory needs to be allocated for a particular tensor.
///
/// @see get_element_space_size()
size_t get_element_space_size_in_bytes() const
{
// For now, the backing type is the naive C++-type that represents the data
// type. When we are going to support packed types such as i4 and fp6, this
// is going to become more complicated.
return get_element_space_size() * data_type_sizeof(DT);
}
private:
ck_tile::HostTensorDescriptor inner_descriptor_;
};
/// @brief Allocate automatically managed GPU memory corresponding to a tensor descriptor.
///
/// This function is similar to `alloc_buffer()`, except that the required size is
/// derived automatically from a tensor descriptor. The returned buffer is valid for
/// tensors with that layout. Strides are also taken into account when computing the
/// required size.
///
/// @tparam DT The conceptual datatype of the elements of the tensor.
/// @param descriptor A descriptor of the memory layout of the tensor to allocate.
/// @throws OutOfDeviceMemoryError if memory allocation failed.
///
/// @see TensorDescriptor
/// @see DeviceBuffer
/// @see OutOfDeviceMemoryError
/// @see hipMalloc()
template <DataType DT>
DeviceBuffer alloc_tensor_buffer(const TensorDescriptor<DT>& descriptor)
{
return alloc_buffer(descriptor.get_element_space_size_in_bytes());
}
} // namespace ck_tile::builder::test


@@ -0,0 +1,260 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
#include <concepts>
/// This file is the main header for the CK-Builder testing system. A high-level
/// description of this testing system is documented in
/// `ck_tile/builder/testing/README.md`. This file mainly deals with the
/// documentation of the implementation details by forward-declaring and documenting
/// the relevant types.
///
/// The intention is that the basic testing strategy (explained in the testing
/// documentation) is available for every different type of device operation. This
/// requires us to provide some implementations on two fronts: Support for the
/// Args, Inputs, Outputs, UniqueInputs, and UniqueOutputs for all SIGNATUREs which
/// are supported by CK Builder, and support for invoking the different
/// implementations returned by CK Builder, depending on the Algorithm.
///
/// Different SIGNATUREs may require different arguments and different (amounts of)
/// input/output tensors. Rather than trying to cram all this in the same structure,
/// or to provide different types, we will use dependent typing to specialize the
/// implementation for the SIGNATURE at hand. For this reason, the Args, Inputs,
/// Outputs, UniqueInputs, and UniqueOutputs structures are all parameterized by the
/// SIGNATURE. The idea is to use C++20 concepts to limit the specialization to the
/// subset of SIGNATUREs that conceptually make sense for that implementation. For
/// example, to provide an implementation of the testing framework for forward
/// convolutions, we can use a concept to check whether the SIGNATURE is a valid
/// forward convolution signature:
///
/// template <auto SIGNATURE>
/// requires ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE>
/// struct Args<SIGNATURE> { ... }; // Similar for the other types
///
/// Invocation of instances is another matter: The Builder may return instances from
/// either CK or CK-Tile depending on the ALGORITHM configuration. The only place
/// where this matters is the implementation of `run()`, which needs to provide a
/// custom implementation for all instances which the Builder may return, including
/// the reference implementation. The strategy is the same here: Use concepts to
/// check whether the instance returned by the builder is of a particular type, and
/// overload the `run()` function for that concept:
///
/// template <auto SIGNATURE, typename Conv>
/// requires
/// // Check that the SIGNATURE is of the type that we expect
/// ValidConvSignature<SIGNATURE> && ConvDirectionIsForward<SIGNATURE> &&
/// // Also check that the instance is of a type which we can invoke here
/// IsCkConvInstance<SIGNATURE, Conv>
/// void run(Conv& conv, ...);
///
/// Note that this is only the suggested strategy; you may also use `if constexpr`
/// or similar to dispatch the correct implementation of the instance in the
/// implementation of the `run()` function for a particular group of device
/// operations.
///
/// The remainder of this file describes the types and functions that should be
/// overloaded for a particular device operation, and in which situation.
namespace ck_tile::builder::test {
/// @brief Run-time arguments corresponding to a signature.
///
/// The `Args` structure is the main point of runtime configuration for a device
/// operation. Depending on the SIGNATURE, it is used to provide the run-time
/// parameters for a device operation, for instance, for the tensor dimensions,
/// tensor strides, parameters such as padding, split-K batch size, fused
/// element-wise operator instances, etc. In short, a complete run-time
/// configuration of the tensor operation at hand.
///
/// This structure does not require additional member functions, any which are
/// provided should be considered implementation details of the Args structure for
/// that particular SIGNATURE.
///
/// @note A good indicator of the fields necessary here are the values that should
/// be passed to the CK `MakeArgument()` function or CK-Tile `HostArgs` structure
/// of the device operation that you are trying to implement. It is the intention
/// that this structure is an aggregate so that it can be initialized using C++20
/// designated initializers to keep the tests readable.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
template <auto SIGNATURE>
struct Args;
/// @brief Non-owning input collection corresponding to a signature.
///
/// The `Inputs` structure represents the collection of input tensor data on the
/// device, associated to a particular SIGNATURE. The exact fields in this structure
/// may again depend on the exact SIGNATURE. This structure is non-owning: its use
/// is intended as a way to pass all inputs around as a single value.
///
/// This structure does not require additional member functions, any which are
/// provided should be considered implementation details of the Inputs structure for
/// that particular SIGNATURE.
///
/// @note The implementation can just be a set of void-pointers which conceptually
/// represent the inputs of the device operation. It is the intention that this
/// structure is an aggregate so that it can be initialized using C++20
/// designated initializers to keep the tests readable.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
template <auto SIGNATURE>
struct Inputs;
/// @brief Non-owning outputs collection corresponding to a signature.
///
/// The `Outputs` structure represents the collection of output tensor data on the
/// device, associated to a particular SIGNATURE. The exact fields in this structure
/// may again depend on the exact SIGNATURE. This structure is non-owning: its use
/// is intended as a way to pass all outputs around as a single value.
///
/// This structure does not require additional member functions, any which are
/// provided should be considered implementation details of the Outputs structure for
/// that particular SIGNATURE.
///
/// @note The implementation can just be a set of void-pointers which conceptually
/// represent the outputs of the device operation. It is the intention that this
/// structure is an aggregate so that it can be initialized using C++20
/// designated initializers to keep the tests readable.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
template <auto SIGNATURE>
struct Outputs;
/// @brief RAII-enabled inputs collection corresponding to a signature.
///
/// The `UniqueInputs` structure is used to automatically manage the memory of a
/// set of inputs. Unlike the corresponding `Inputs` structure, the implementation
/// is opaque; the only requirements for this structure are that an instance can
/// be created using `alloc_inputs()` and that an instance of the corresponding
/// `Inputs` structure can be obtained using `.get()`.
///
/// @note The easiest way to implement this type is to use the `DeviceBuffer`
/// type to allocate individual device buffers for each input tensor.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
///
/// @see alloc_inputs()
/// @see ValidUniqueInputs
/// @see DeviceBuffer
template <auto SIGNATURE>
struct UniqueInputs;
/// @brief RAII-enabled outputs collection corresponding to a signature.
///
/// The `UniqueOutputs` structure is used to automatically manage the memory of a
/// set of outputs. Unlike the corresponding `Outputs` structure, the implementation
/// is opaque; the only requirements for this structure are that an instance can
/// be created using `alloc_outputs()` and that an instance of the corresponding
/// `Outputs` structure can be obtained using `.get()`.
///
/// @note The easiest way to implement this type is to use the `DeviceBuffer`
/// type to allocate individual device buffers for each output tensor.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
///
/// @see alloc_outputs()
/// @see ValidUniqueOutputs
/// @see DeviceBuffer
template <auto SIGNATURE>
struct UniqueOutputs;
/// @brief Concept to check the validity of `UniqueInputs`.
///
/// The `ValidUniqueInputs` concept can be used to check whether the definition
/// of `UniqueInputs` is valid for a particular SIGNATURE.
///
/// - SIGNATURE is the signature to specialize the structure for.
///
/// @see UniqueInputs
template <auto SIGNATURE>
concept ValidUniqueInputs = requires(UniqueInputs<SIGNATURE>& inputs) {
/// `.get()` is used to obtain a non-owning version of the `Inputs` collection.
{ inputs.get() } -> std::convertible_to<Inputs<SIGNATURE>>;
};
/// @brief Concept to check the validity of `UniqueOutputs`.
///
/// The `ValidUniqueOutputs` concept can be used to check whether the definition
/// of `UniqueOutputs` is valid for a particular SIGNATURE.
///
/// - SIGNATURE is the signature to specialize the structure for.
///
/// @see UniqueOutputs
template <auto SIGNATURE>
concept ValidUniqueOutputs = requires(UniqueOutputs<SIGNATURE>& outputs) {
/// `.get()` is used to obtain a non-owning version of the `Outputs` collection.
{ outputs.get() } -> std::convertible_to<Outputs<SIGNATURE>>;
};
/// @brief Allocate inputs corresponding to a signature.
///
/// The `alloc_inputs()` function is used to create an instance of
/// `UniqueInputs`. This function uses the `args` structure to compute the
/// amount of memory required and then allocate it on the device, for example
/// using `alloc_buffer` or `alloc_tensor_buffer`.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
///
/// @see Inputs
/// @see UniqueInputs
/// @see alloc_buffer()
/// @see alloc_tensor_buffer()
template <auto SIGNATURE>
requires ValidUniqueInputs<SIGNATURE>
UniqueInputs<SIGNATURE> alloc_inputs(const Args<SIGNATURE>& args);
/// @brief Allocate outputs corresponding to a signature.
///
/// The `alloc_outputs()` function is used to create an instance of
/// `UniqueOutputs`. This function uses the `args` structure to compute the
/// amount of memory required and then allocate it on the device, for example
/// using `alloc_buffer` or `alloc_tensor_buffer`.
///
/// @tparam SIGNATURE the signature to specialize the structure for.
///
/// @see Outputs
/// @see UniqueOutputs
/// @see alloc_buffer()
/// @see alloc_tensor_buffer()
template <auto SIGNATURE>
requires ValidUniqueOutputs<SIGNATURE>
UniqueOutputs<SIGNATURE> alloc_outputs(const Args<SIGNATURE>& args);
/// @brief Invoke a device operation created by CK Builder.
///
/// This is the main function used to invoke a particular device operation
/// instance created by the builder. It uses the `args`, `inputs`, and `outputs`
/// to configure the `operation` and invokes it immediately.
///
/// In practice, the `Operation` is usually a CK or CK Tile device operation
/// type, for example `DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3`.
/// This function implements the required functionality to invoke any relevant
/// type created by the builder.
///
/// @note Unlike the Args, Inputs, Outputs, and related structures, this function
/// is specialized for the different implementations that the builder may
/// return (see file-level documentation).
///
/// @pre The tensors in `inputs` should be allocated and initialized with the
/// appropriate values to perform the operation.
/// @pre The tensors in `outputs` should be allocated.
/// @post The tensors in `outputs` are overwritten with the outputs of the device
/// operation.
///
/// @tparam SIGNATURE the signature to specialize this function for
/// @tparam Operation the kernel of the operation to invoke. This type should be
/// one that is created using the Builder API.
/// @param operation An instance of the operation to invoke.
/// @param args The run-time arguments of the operation.
/// @param inputs The input tensor data. Will not be modified by this function.
/// @param outputs The output tensor data. The contents will be overwritten by
/// this function.
template <auto SIGNATURE, typename Operation>
void run(Operation& operation,
const Args<SIGNATURE>& args,
const Inputs<SIGNATURE>& inputs,
const Outputs<SIGNATURE>& outputs);
} // namespace ck_tile::builder::test


@@ -0,0 +1,43 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#pragma once
#include <cstddef>
#include "ck_tile/builder/types.hpp"
/// This file implements various backend-independent traits for
/// CK-Builder types.
namespace ck_tile::builder::test {
/// @brief Query the size of a data type in memory.
///
/// This function computes the size of a variant of `DataType` in memory.
/// This is more complicated than it seems. For most types, this is just
/// the size of the equivalent C++ type, but for sub-byte types, multiple
/// values are packed into a single byte. For now, we only care about
/// types which consist of an integral number of bytes, though.
///
/// @note The details of this function are likely going to change with the
/// support of sub-byte types.
///
/// @param data_type The type to query the in-memory size of.
/// @returns The number of bytes that an element of this data type requires
/// in memory.
constexpr size_t data_type_sizeof(DataType data_type)
{
switch(data_type)
{
case DataType::UNDEFINED_DATA_TYPE: return 0;
case DataType::FP32: return 4;
case DataType::FP16: return 2;
case DataType::BF16: return 2;
case DataType::FP8: return 1;
case DataType::INT32: return 4;
case DataType::I8: return 1;
case DataType::U8: return 1;
}
return 0; // unreachable for valid enum values; avoids falling off the end of a constexpr function
}
} // namespace ck_tile::builder::test


@@ -78,24 +78,27 @@ add_ck_builder_test(test_ckb_conv_builder
test_fwd_instance_traits.cpp
test_bwd_data_instance_traits.cpp
test_instance_traits_util.cpp
unit_device_buffer.cpp
unit_tensor_descriptor.cpp
unit_conv_elementwise_op.cpp
unit_conv_tensor_layout.cpp
unit_conv_tensor_type.cpp
unit_conv_thread_block.cpp
unit_conv_tuning_params.cpp)
# Tests the inline diff utility used for comparing strings in test assertions
add_ck_builder_test(test_ckb_inline_diff test_inline_diff.cpp)
unit_conv_tuning_params.cpp
unit_conv_fwd_testing.cpp)
target_link_libraries(test_ckb_conv_builder PRIVATE utility)
# Tests the inline diff utility used for comparing strings in test assertions
add_ck_builder_test(test_ckb_inline_diff test_inline_diff.cpp)
# Tests convolution trait selection and configuration
add_ck_builder_test(test_ckb_conv_traits
conv/ck/test_conv_traits.cpp)
# Tests convolution problem description and parameter handling
add_ck_builder_test(test_ckb_conv_description
test_conv_description.cpp)
# Tests convolution trait selection and configuration
add_ck_builder_test(test_ckb_conv_traits
conv/ck/test_conv_traits.cpp)
# Tests convolution problem description and parameter handling
add_ck_builder_test(test_ckb_conv_description
test_conv_description.cpp)
################################################################################
# REGRESSION TESTS - Integration Tests (With Kernel Compilation)
################################################################################
@@ -134,8 +137,8 @@ add_ck_builder_test(test_ckb_build_fwd_instances
conv/ck/test_ckb_conv_fwd_3d_fp32.cpp
conv/ck_tile/test_ckb_conv_fwd_2d_fp16_v3.cpp
conv/ck_tile/test_ckb_conv_bwd_weight_2d_fp16_v3.cpp
conv/ck_tile/test_ckb_conv_bwd_data_2d_fp16_v3.cpp
)
conv/ck_tile/test_ckb_conv_bwd_data_2d_fp16_v3.cpp)
target_link_libraries(test_ckb_build_fwd_instances PRIVATE utility)
################################################################################


@@ -4,46 +4,83 @@
#include "utils/ckb_conv_test_configs.hpp"
#include "utils/ckb_conv_test_utils.hpp"
#include "utils/conv_algorithm_type_utils.hpp"
#include "ck_tile/builder/testing/conv_fwd_ck.hpp"
#include "ck_tile/host/device_prop.hpp"
namespace {
namespace ckb = ck_tile::builder;
namespace ckt = ck_tile::builder::test;
namespace cku = ck_tile::builder::test_utils;
using namespace ck_tile::builder::test_utils;
constexpr auto SIGNATURE =
ckt::ConvSignature{.spatial_dim = 2,
.direction = ckb::ConvDirection::FORWARD,
.data_type = ckb::DataType::FP16,
.accumulation_data_type = ckb::DataType::FP32,
.input = {.config = {.layout = ckb::TensorLayout::GNHWC}},
.weight = {.config = {.layout = ckb::TensorLayout::GKYXC}},
.output = {.config = {.layout = ckb::TensorLayout::GNHWK}}};
TEST(FwdConvInstances,
Create_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3_Instance_2D_FP16_GNHWC)
constexpr auto ALGORITHM = cku::ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3{}
.with_thread_block(cku::FwdThreadBlock_256_256x256x32)
.with_gemm_config(cku::FwdGemmParams_Xdl_4x4_per_wave)
.with_transfer(cku::FwdTransfer_4x64x1)
.with_specializations(ckb::ConvFwdSpecialization::DEFAULT,
ckb::GemmSpecialization::MNKPadding)
.with_block_gemm(cku::BlockGemmDesc_v3_intrawave);
using Builder = ckb::ConvBuilder<SIGNATURE, ALGORITHM>;
using Instance = Builder::Instance;
TEST(Fwd2DFp16_CShufV3_GNHWC, Create)
{
using enum ck_tile::builder::ConvDirection;
using enum ck_tile::builder::DataType;
using enum ck_tile::builder::TensorLayout;
constexpr ConvSignature FwdConvSignature{.spatial_dim = 2,
.direction = FORWARD,
.data_type = FP16,
.accumulation_data_type = FP32,
.input = {.config = {.layout = GNHWC}},
.weight = {.config = {.layout = GKYXC}},
.output = {.config = {.layout = GNHWK}}};
constexpr auto FwdConvAlgorithm =
ConvAlgorithm_DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3{}
.with_thread_block(FwdThreadBlock_256_256x256x32)
.with_gemm_config(FwdGemmParams_Xdl_4x4_per_wave)
.with_transfer(FwdTransfer_4x64x1)
.with_specializations(ConvFwdSpecialization::FILTER_1X1_PAD0,
GemmSpecialization::MNKPadding)
.with_block_gemm(BlockGemmDesc_v3_intrawave);
using Builder = ConvBuilder<FwdConvSignature, FwdConvAlgorithm>;
const auto expected_transfer_parameters = to_string(FwdConvAlgorithm);
run_test<Builder>({"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3",
expected_transfer_parameters,
"Filter1x1Pad0",
"Intrawave",
"v3",
"GNHWC,GKYXC,EmptyTuple,GNHWK",
"PassThrough,PassThrough,PassThrough",
"MNKPadding"});
const auto expected_transfer_parameters = to_string(ALGORITHM);
cku::run_test<Builder>({"DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle_V3",
expected_transfer_parameters,
"Default",
"Intrawave",
"v3",
"GNHWC,GKYXC,EmptyTuple,GNHWK",
"PassThrough,PassThrough,PassThrough",
"MNKPadding"});
}
} // namespace
TEST(Fwd2DFp16_CShufV3_GNHWC, EndToEnd)
{
if(!ck_tile::get_device_name().starts_with("gfx9"))
{
GTEST_SKIP() << "unsupported architecture";
}
ckt::Args<SIGNATURE> args = {
.lengths =
{
.batch_size = 16,
.groups = 1,
.input_channels = 32,
.output_channels = 48,
.image =
{
.width = 56,
.height = 64,
},
.filter =
{
.width = 3,
.height = 5,
},
},
.filter_strides = {.width = 1, .height = 1},
.filter_dilation = {.width = 1, .height = 1},
.input_left_pad = {.width = 0, .height = 0},
.input_right_pad = {.width = 0, .height = 0},
.a_elementwise_op = {},
.b_elementwise_op = {},
.cde_elementwise_op = {},
};
auto inputs = alloc_inputs(args);
auto outputs = alloc_outputs(args);
auto conv = Instance{};
ckt::run(conv, args, inputs.get(), outputs.get());
}


@@ -5,6 +5,8 @@
#include "testing_utils.hpp"
using ck_tile::test::HipError;
using ck_tile::test::HipSuccess;
using ck_tile::test::InstanceMatcher;
using ck_tile::test::InstanceSet;
using ck_tile::test::StringEqWithDiff;
@@ -96,3 +98,12 @@ TEST(InstanceMatcher, ExplainMatchResult)
"Unexpected: 1\n"
"- python\n"));
}
TEST(HipStatusMatcher, Basic)
{
EXPECT_THAT(hipSuccess, HipSuccess());
EXPECT_THAT(hipErrorInvalidValue, HipError(hipErrorInvalidValue));
EXPECT_THAT(hipErrorInvalidValue, Not(HipSuccess()));
EXPECT_THAT(hipSuccess, Not(HipError(hipErrorInvalidValue)));
EXPECT_THAT(hipErrorOutOfMemory, Not(HipError(hipErrorInvalidValue)));
}


@@ -11,6 +11,11 @@
#include <vector>
#include <algorithm>
std::ostream& operator<<(std::ostream& os, hipError_t status)
{
return os << hipGetErrorString(status);
}
namespace ck_tile::test {
// Wagner-Fischer Algorithm for Computing Edit Distance and Inline Diff
@@ -297,4 +302,41 @@ void InstanceMatcher::DescribeNegationTo(std::ostream* os) const
*os << "is not equal to " << expected_;
}
bool HipStatusMatcher::MatchAndExplain(hipError_t actual,
::testing::MatchResultListener* listener) const
{
(void)listener;
return actual == expected_;
}
void HipStatusMatcher::DescribeTo(std::ostream* os) const { *os << hipGetErrorString(expected_); }
void HipStatusMatcher::DescribeNegationTo(std::ostream* os) const
{
if(expected_ == hipSuccess)
{
*os << "any error";
}
else
{
*os << "isn't equal to " << hipGetErrorString(expected_);
}
}
::testing::Matcher<hipError_t> HipSuccess()
{
return ::testing::MakeMatcher(new HipStatusMatcher(hipSuccess));
}
::testing::Matcher<hipError_t> HipError(hipError_t error)
{
return ::testing::MakeMatcher(new HipStatusMatcher(error));
}
} // namespace ck_tile::test


@@ -11,6 +11,16 @@
#include <vector>
#include <array>
/// @brief ostream overload for hipError_t
///
/// Google Test likes to print errors to ostream, and this provides integration
/// with that. Since we only expect to use this with CK-Builder's own tests,
/// providing this implementation should not be problematic, but if it starts to
/// clash with another implementation then we will need to provide it
/// in another way. Unfortunately, Google Test does not have a
/// dedicated function to override to provide printing support.
std::ostream& operator<<(std::ostream& os, hipError_t status);
namespace ck_tile::test {
static bool isTerminalOutput() { return isatty(fileno(stdout)) || isatty(fileno(stderr)); }
@@ -109,4 +119,35 @@ struct InstanceMatcher : public ::testing::MatcherInterface<InstanceSet>
::testing::Matcher<InstanceSet> InstancesMatch(const InstanceSet& expected);
/// @brief Google Test hipError_t matcher.
///
/// This is a custom Google Test matcher implementation which can be used to
/// compare HIP status codes. Use `HipSuccess()` or `HipError()` to obtain
/// an instance.
///
/// @see HipSuccess
/// @see HipError
/// @see ::testing::MatcherInterface
struct HipStatusMatcher : public ::testing::MatcherInterface<hipError_t>
{
HipStatusMatcher(hipError_t expected) : expected_(expected) {}
bool MatchAndExplain(hipError_t actual,
::testing::MatchResultListener* listener) const override;
void DescribeTo(std::ostream* os) const override;
void DescribeNegationTo(std::ostream* os) const override;
hipError_t expected_;
};
/// @brief Construct a Google Test matcher that checks that a HIP operation
/// was successful.
::testing::Matcher<hipError_t> HipSuccess();
/// @brief Construct a Google Test matcher that checks that a HIP operation
/// returned a particular error code.
///
/// @param error The error to expect.
::testing::Matcher<hipError_t> HipError(hipError_t error);
} // namespace ck_tile::test


@@ -0,0 +1,83 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "impl/conv_signature_types.hpp"
#include "testing_utils.hpp"
#include "ck_tile/builder/testing/conv_fwd.hpp"
#include <gtest/gtest.h>
#include <gmock/gmock.h>
#include <vector>
namespace ckb = ck_tile::builder;
namespace ckt = ck_tile::builder::test;
using ::testing::ElementsAreArray;
using ::testing::NotNull;
constexpr auto SIGNATURE =
ckt::ConvSignature{.spatial_dim = 2,
.direction = ckb::ConvDirection::FORWARD,
.data_type = ckb::DataType::BF16,
.accumulation_data_type = ckb::DataType::FP32,
.input = {.config = {.layout = ckb::TensorLayout::NHWGC}},
.weight = {.config = {.layout = ckb::TensorLayout::GKYXC}},
.output = {.config = {.layout = ckb::TensorLayout::NHWGK}}};
constexpr ckt::Args<SIGNATURE> ARGS = {
.lengths =
{
.batch_size = 17,
.groups = 5,
.input_channels = 13,
.output_channels = 44,
.image =
{
.width = 99,
.height = 125,
},
.filter =
{
.width = 9,
.height = 4,
},
},
.filter_strides = {.width = 1, .height = 1},
.filter_dilation = {.width = 1, .height = 1},
.input_left_pad = {.width = 0, .height = 0},
.input_right_pad = {.width = 0, .height = 0},
.a_elementwise_op = {},
.b_elementwise_op = {},
.cde_elementwise_op = {},
};
using Inputs = ckt::Inputs<SIGNATURE>;
using Outputs = ckt::Outputs<SIGNATURE>;
using UniqueInputs = ckt::UniqueInputs<SIGNATURE>;
using UniqueOutputs = ckt::UniqueOutputs<SIGNATURE>;
static_assert(ckt::ValidUniqueInputs<SIGNATURE>);
static_assert(ckt::ValidUniqueOutputs<SIGNATURE>);
TEST(ConvFwdTesting, MakeDescriptors)
{
const auto get_lengths = [](const auto& descriptor) {
const auto lengths = descriptor.get_lengths();
// Google Test cannot print std::span, so turn it into a vector for
// legibility.
return std::vector(lengths.begin(), lengths.end());
};
EXPECT_THAT(get_lengths(ARGS.make_input_descriptor()), ElementsAreArray({5, 17, 13, 125, 99}));
EXPECT_THAT(get_lengths(ARGS.make_weight_descriptor()), ElementsAreArray({5, 44, 13, 4, 9}));
EXPECT_THAT(get_lengths(ARGS.make_output_descriptor()), ElementsAreArray({5, 17, 44, 122, 91}));
}
TEST(ConvFwdTesting, Alloc)
{
auto inputs = alloc_inputs(ARGS);
auto outputs = alloc_outputs(ARGS);
EXPECT_THAT(inputs.get().input, NotNull());
EXPECT_THAT(inputs.get().weight, NotNull());
EXPECT_THAT(outputs.get().output, NotNull());
}


@@ -0,0 +1,81 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "ck_tile/builder/testing/tensor_buffer.hpp"
#include "testing_utils.hpp"
#include <gtest/gtest.h>
#include <gmock/gmock.h>
#include <vector>
namespace ckb = ck_tile::builder;
namespace ckt = ck_tile::builder::test;
using ck_tile::test::HipError;
using ck_tile::test::HipSuccess;
using ::testing::Eq;
using ::testing::IsNull;
using ::testing::NotNull;
using ::testing::Throws;
TEST(DeviceBuffer, DefaultToNull)
{
ckt::DeviceBuffer buffer;
EXPECT_THAT(buffer.get(), IsNull());
}
TEST(DeviceBuffer, AllocBuffer)
{
const auto size = 12345;
auto buffer = ckt::alloc_buffer(size);
// Pointer should be non-null
EXPECT_THAT(buffer.get(), NotNull());
    // More specifically, the pointer should be a device pointer
hipPointerAttribute_t attr;
EXPECT_THAT(hipPointerGetAttributes(&attr, buffer.get()), HipSuccess());
EXPECT_THAT(attr.devicePointer, NotNull());
EXPECT_THAT(attr.type, Eq(hipMemoryTypeDevice));
// Memory should be writable without error
EXPECT_THAT(hipMemset(buffer.get(), 0xFF, size), HipSuccess());
}
TEST(DeviceBuffer, AutoFree)
{
const auto size = 12345;
std::byte* ptr = nullptr;
{
auto buffer = ckt::alloc_buffer(size);
ptr = buffer.get();
}
    // Using a pointer after it has been freed should return an error in HIP.
EXPECT_THAT(hipMemset(ptr, 0xFF, size), HipError(hipErrorInvalidValue));
}
TEST(DeviceBuffer, ThrowsOnOom)
{
    const auto size = size_t{1} << 60; // 2^60 bytes (1 EiB), far beyond any device
auto check = [] { auto buffer = ckt::alloc_buffer(size); };
EXPECT_THAT(check, Throws<ckt::OutOfDeviceMemoryError>());
}
TEST(DeviceBuffer, AllocTensorBuffer)
{
std::vector<size_t> lengths = {128, 128, 128};
std::vector<size_t> strides = {128 * 128, 128, 1};
ckt::TensorDescriptor<ckb::DataType::FP32> descriptor(lengths, strides);
auto buffer = ckt::alloc_tensor_buffer(descriptor);
// Pointer should be non-null
EXPECT_THAT(buffer.get(), NotNull());
// Memory should be writable without error
EXPECT_THAT(hipMemset(buffer.get(), 0xFF, descriptor.get_element_space_size_in_bytes()),
HipSuccess());
}


@@ -0,0 +1,47 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "ck_tile/builder/testing/tensor_buffer.hpp"
#include "testing_utils.hpp"
#include <gtest/gtest.h>
#include <gmock/gmock.h>
#include <vector>
namespace ckb = ck_tile::builder;
namespace ckt = ck_tile::builder::test;
using ::testing::ElementsAreArray;
using ::testing::Ge;
TEST(TensorDescriptor, Basic)
{
constexpr auto dt = ckb::DataType::FP16;
std::vector<size_t> lengths = {123, 456, 789};
std::vector<size_t> strides = {456 * 789, 789, 1};
ckt::TensorDescriptor<dt> descriptor(lengths, strides);
EXPECT_THAT(descriptor.get_lengths(), ElementsAreArray(lengths));
EXPECT_THAT(descriptor.get_strides(), ElementsAreArray(strides));
}
TEST(TensorDescriptor, ComputeSize)
{
constexpr auto dt = ckb::DataType::FP32;
std::vector<size_t> lengths = {305, 130, 924};
std::vector<size_t> strides = {1000 * 1000, 1, 1000};
ckt::TensorDescriptor<dt> descriptor(lengths, strides);
// Compute the location of the last item in memory, then add one
// to get the minimum size.
size_t expected_size = 1;
for(size_t i = 0; i < lengths.size(); ++i)
{
expected_size += (lengths[i] - 1) * strides[i];
}
EXPECT_THAT(descriptor.get_element_space_size(), Ge(expected_size));
EXPECT_THAT(descriptor.get_element_space_size_in_bytes(),
Ge(expected_size * ckt::data_type_sizeof(dt)));
}


@@ -1,4 +1,4 @@
-#!/bin/bash
+#!/usr/bin/env bash
# Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
# SPDX-License-Identifier: MIT