[CK_BUILDER] Add GPU Reference Algorithm to CK Builder (#3381)

* [CK_BUILDER] Integrate GPU reference as ConvAlgorithm

Add GPU reference as a ConvAlgorithm specialization, enabling:
- Unified Builder API for reference and optimized kernels
- Future ckProfiler integration for validation
- First step toward numerical validation in Builder tests

Changes:
- Add ConvAlgorithmSpecialization::REFERENCE enum
- Add ConvAlgorithm_Reference struct
- Add IsReferenceAlgorithm concept
- Create 3 reference factories (Forward, BwdData, BwdWeight)
- Wire into conv_dispatcher
- Add proof-of-concept test (passing)

Test result: Can instantiate reference through Builder API

* Add GPU reference execution tests

- Reference kernel executes through Builder (459ms)
- Both reference and optimized can instantiate
- Tests passing

Next: Implement utilities for comparison

* Optimized Builder kernel execution works

- MakeArgument pattern implemented
- Builder-generated kernel executes successfully
- Tests passing (451ms execution)

Next: Add comparison

* VALIDATION COMPLETE: Builder == Reference

Builder-generated kernel output matches GPU reference!

Test: Validate_Optimized_vs_Reference_Forward_2D_FP16
Result: PASS ✓

This shows the Builder-generated kernel matches the GPU reference for the 2D FP16 forward case.

* Update to new Builder API

All tests passing

* Rename test file for clarity

test_builder_kernel_execution -> test_builder_kernel_validation

* Add all 3 directions support

- Forward, Backward Data, Backward Weight
- All reference factories working
- Dispatcher wired for all directions
- 9 tests passing

Tests:
- test_reference_execution: 3 tests (all directions)
- test_optimized_execution: 3 tests (all directions)
- test_builder_kernel_validation: 3 tests (fwd validated, bwd placeholders)

* Add backward direction support

- Backward data and weight dispatcher wiring
- Fix factories for new API
- All 3 directions tested
- 9 tests passing

* Refactor: Change IsReferenceAlgorithm from concept to consteval function

Address review feedback: Use consteval function in dispatcher instead of
concept, matching the pattern for other algorithms (Tile, XDL, WMMA, DL).

- Remove IsReferenceAlgorithm concept from conv_algorithm_concepts.hpp
- Add IsReferenceAlgorithm() consteval function to conv_dispatcher.hpp
- Update dispatcher to use function call: IsReferenceAlgorithm<T>()
- Remove redundant algorithm checks from reference factory requires clauses

All tests passing (9/9).

* Move Tile algorithm check outside direction block to support all directions

* Implement MakeInvokerPointer interface and add random input validation

- Implement full Argument/Invoker structs for old CK interface (not just nullptr)
- Refactor with reference_common.hpp to reduce code duplication
- Add random input validation tests: Builder vs direct GPU reference (all directions)
- Fix layout: GNHWC -> NHWGC to match reference kernel expectations
- All 12 tests pass with IDENTICAL results on random input
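
Why the layout fix matters can be seen from the linear offsets the two layouts imply. The sketch below assumes packed tensors whose layout name lists dimensions from slowest to fastest varying; the `Dims` struct and both helper functions are hypothetical and not part of CK.

```cpp
#include <cstddef>

// Hypothetical extents of a 2D grouped-convolution input tensor.
struct Dims
{
    std::size_t G, N, H, W, C;
};

// GNHWC: group is the slowest-varying dimension, channel the fastest.
constexpr std::size_t OffsetGNHWC(const Dims& d, std::size_t g, std::size_t n,
                                  std::size_t h, std::size_t w, std::size_t c)
{
    return (((g * d.N + n) * d.H + h) * d.W + w) * d.C + c;
}

// NHWGC: batch is the slowest-varying dimension, group sits next to channel.
constexpr std::size_t OffsetNHWGC(const Dims& d, std::size_t g, std::size_t n,
                                  std::size_t h, std::size_t w, std::size_t c)
{
    return (((n * d.H + h) * d.W + w) * d.G + g) * d.C + c;
}
```

With more than one group and more than one batch entry, the same logical element lands at different offsets, so a buffer filled for one layout is read incorrectly by a kernel that expects the other.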

* Move ConvAlgorithm_Reference to test/impl/conv_algorithm_types.hpp

Keep types.hpp for data types only (enums) and move algorithm descriptors
to conv_algorithm_types.hpp, as suggested in review.

* Add static_assert to ensure reference factories only accept PassThrough operations

Reference implementation doesn't support fused elementwise operations.
Add compile-time validation to fail early with clear error message if
non-PassThrough operations are specified on input, weight, or output.
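
A compile-time guard of this kind could look like the sketch below. The `PassThrough` and `Relu` functors, the `kAllPassThrough` helper, and the `ReferenceFactory` template are hypothetical simplifications; the real factory and its error messages are not shown in this diff.

```cpp
#include <type_traits>

// Hypothetical elementwise functors standing in for the CK operation types.
struct PassThrough
{
    template <typename T>
    constexpr T operator()(T x) const { return x; }
};

struct Relu
{
    template <typename T>
    constexpr T operator()(T x) const { return x < T{0} ? T{0} : x; }
};

// True only when every operation in the pack is PassThrough.
template <typename... Ops>
inline constexpr bool kAllPassThrough = (std::is_same_v<Ops, PassThrough> && ...);

// Hypothetical factory: instantiating it with a fused operation fails to
// compile, and the message names the offending tensor.
template <typename InOp, typename WeiOp, typename OutOp>
struct ReferenceFactory
{
    static_assert(std::is_same_v<InOp, PassThrough>,
                  "Reference kernel: input elementwise op must be PassThrough");
    static_assert(std::is_same_v<WeiOp, PassThrough>,
                  "Reference kernel: weight elementwise op must be PassThrough");
    static_assert(std::is_same_v<OutOp, PassThrough>,
                  "Reference kernel: output elementwise op must be PassThrough");

    static constexpr bool kAccepted = true;
};
```

Using one assertion per tensor, rather than a single combined check, lets the compiler report exactly which operation was rejected.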

* Add InstanceTraits support for reference kernels

- Store SIGNATURE/ALGORITHM/VERSION in Instance for reflection
- Create shared ReferenceCommonTraits base for common properties
- Add 3 direction-specific InstanceTraits specializations in one file
- Include data type and layouts in instance_string output
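
The shared-base arrangement can be sketched roughly as follows. Only the name `ReferenceCommonTraits`, the `kBlockSize` value of 0, and the output string format are taken from this commit (the format from the test expectations); the `Direction` enum, `ReferenceTraits`, and `MakeName` are hypothetical, and the data type and layouts are hard-coded here where the real traits would derive them from the kernel instance.

```cpp
#include <string>

// Shared base: properties every reference kernel reports identically.
struct ReferenceCommonTraits
{
    static constexpr int kBlockSize = 0; // not applicable to the reference kernel

    // Hypothetical helper that assembles the instance string.
    static std::string MakeName(const std::string& direction,
                                int spatial_dim,
                                const std::string& data_type,
                                const std::string& in_layout,
                                const std::string& wei_layout,
                                const std::string& out_layout)
    {
        return "GPU_Reference_" + direction + "_" + std::to_string(spatial_dim) + "D_" +
               data_type + "_" + in_layout + "_" + wei_layout + "_" + out_layout;
    }
};

enum class Direction { Forward, BackwardData, BackwardWeight };

// One specialization per direction; each adds only what differs.
template <Direction D>
struct ReferenceTraits;

template <>
struct ReferenceTraits<Direction::Forward> : ReferenceCommonTraits
{
    static std::string instance_string()
    {
        return MakeName("Forward", 2, "fp16", "NHWGC", "GKYXC", "NHWGK");
    }
};

template <>
struct ReferenceTraits<Direction::BackwardData> : ReferenceCommonTraits
{
    static std::string instance_string()
    {
        return MakeName("BackwardData", 2, "fp16", "NHWGC", "GKYXC", "NHWGK");
    }
};

template <>
struct ReferenceTraits<Direction::BackwardWeight> : ReferenceCommonTraits
{
    static std::string instance_string()
    {
        return MakeName("BackwardWeight", 2, "fp16", "NHWGC", "GKYXC", "NHWGK");
    }
};
```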

* Remove optimized kernel validation tests from reference-only branch

* Use existing layout helper and organize reference tests

Use LayoutToCK from conv_tensor_layout.hpp and move reference InstanceTraits
test to validation folder.

* Merge develop branch

Fix DataType switch for new mixed precision types.

* Fix comment spacing for CI

* Convert IsReferenceAlgorithm from function to concept

* Add reference tests to CI smoke tests

* Consolidate 3 reference factories into single unified factory
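
One way such a consolidation can work is to select the kernel type from the direction with `if constexpr` inside a single template. The factory source is not visible in this diff, so the sketch below is an assumption about the general technique; `UnifiedReferenceFactory` and the three kernel tag types are hypothetical names.

```cpp
#include <type_traits>

enum class ConvDirection { FORWARD, BACKWARD_DATA, BACKWARD_WEIGHT };

// Hypothetical tag types standing in for the three reference kernels.
struct RefForwardKernel {};
struct RefBackwardDataKernel {};
struct RefBackwardWeightKernel {};

// Single factory: the direction picks the kernel type at compile time,
// replacing three near-identical per-direction factories.
template <ConvDirection Dir>
struct UnifiedReferenceFactory
{
    static constexpr auto Select()
    {
        if constexpr (Dir == ConvDirection::FORWARD)
            return RefForwardKernel{};
        else if constexpr (Dir == ConvDirection::BACKWARD_DATA)
            return RefBackwardDataKernel{};
        else
            return RefBackwardWeightKernel{};
    }

    using Instance = decltype(Select());
};
```

Code shared by all directions, such as argument checking, then lives in one place, and only the kernel selection branches.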

---------

Co-authored-by: Ville Pietilä <188998872+vpietila-amd@users.noreply.github.com>
JH-Leon-KIM-AMD
2025-12-29 16:11:08 +02:00
committed by GitHub
parent 88ae445580
commit a0acc83a72
9 changed files with 1774 additions and 29 deletions


@@ -84,21 +84,29 @@ add_ck_builder_test(test_ckb_conv_builder
unit_conv_tensor_layout.cpp
unit_conv_tensor_type.cpp
unit_conv_thread_block.cpp
unit_conv_tuning_params.cpp
unit_conv_fwd_testing.cpp)
target_link_libraries(test_ckb_conv_builder PRIVATE utility)
unit_conv_tuning_params.cpp)
# Tests the inline diff utility used for comparing strings in tests assertions
add_ck_builder_test(test_ckb_inline_diff test_inline_diff.cpp)
# Tests the inline diff utility used for comparing strings in tests assertions
add_ck_builder_test(test_ckb_inline_diff test_inline_diff.cpp)
# Tests convolution trait selection and configuration
add_ck_builder_test(test_ckb_conv_traits
conv/ck/test_conv_traits.cpp)
# Tests convolution problem description and parameter handling
add_ck_builder_test(test_ckb_conv_description
test_conv_description.cpp)
# GPU reference validation tests (in validation/ folder)
# 1. Reference kernel execution and InstanceTraits
add_ck_builder_test(test_ckb_reference_execution
validation/test_reference_execution.cpp
validation/test_reference_instance_traits.cpp)
target_link_libraries(test_ckb_reference_execution PRIVATE utility)
# Note: Optimized kernel validation tests will be added after merging dev branch
# with kernel Run() implementation from colleague's work
# Tests convolution trait selection and configuration
add_ck_builder_test(test_ckb_conv_traits
conv/ck/test_conv_traits.cpp)
# Tests convolution problem description and parameter handling
add_ck_builder_test(test_ckb_conv_description
test_conv_description.cpp)
################################################################################
# REGRESSION TESTS - Integration Tests (With Kernel Compilation)
################################################################################
@@ -181,6 +189,7 @@ set(CKB_SMOKE_TESTS
test_ckb_inline_diff
test_ckb_conv_traits
test_ckb_conv_description
test_ckb_reference_execution
)
foreach(test_target ${CKB_SMOKE_TESTS})


@@ -479,4 +479,13 @@ using ConvAlgorithm_Tile_GroupedConvolutionKernel = ConvAlgorithmTemplate<TileTh
TileConvSpecialization_,
TileOptimizations_>;
// Reference algorithm descriptor - for GPU reference validation
// This is a simple algorithm that requires no complex configuration,
// just a specialization marker to identify it as a reference implementation.
struct ConvAlgorithm_Reference
{
static constexpr auto specialization = ckb::ConvAlgorithmSpecialization::REFERENCE;
// GPU reference uses simple algorithm, no tile configuration needed
};
} // namespace ck_tile::builder::test

File diff suppressed because it is too large


@@ -0,0 +1,117 @@
// Copyright (c) Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
// Test: Verify InstanceTraits works for Reference kernels
#include "ck_tile/builder/conv_builder.hpp"
#include "ck_tile/builder/types.hpp"
#include "ck_tile/builder/reflect/instance_traits_reference.hpp"
#include "impl/conv_algorithm_types.hpp"
#include "impl/conv_signature_types.hpp"
#include <gtest/gtest.h>
#include <iostream>
#include <string>
#include <type_traits>
namespace {
using namespace ck_tile::builder;
using namespace ck_tile::builder::test;
TEST(ReferenceInstanceTraits, Forward_2D_FP16)
{
// Create a reference forward kernel
constexpr ConvSignature sig{.spatial_dim = 2,
.direction = ConvDirection::FORWARD,
.data_type = DataType::FP16,
.accumulation_data_type = DataType::FP32,
.input = {.config = {.layout = TensorLayout::NHWGC}},
.weight = {.config = {.layout = TensorLayout::GKYXC}},
.output = {.config = {.layout = TensorLayout::NHWGK}}};
constexpr auto ref_alg = ConvAlgorithm_Reference{};
using RefKernel = ConvBuilder<sig, ref_alg>::Instance;
// Use InstanceTraits to query properties
using Traits = ck_tile::reflect::InstanceTraits<RefKernel>;
// Verify spatial dimension
EXPECT_EQ(Traits::kSpatialDim, 2);
// Verify direction
EXPECT_EQ(Traits::direction, ConvDirection::FORWARD);
// Verify data types
EXPECT_TRUE((std::is_same_v<Traits::ADataType, ck::half_t>));
EXPECT_TRUE((std::is_same_v<Traits::BDataType, ck::half_t>));
EXPECT_TRUE((std::is_same_v<Traits::EDataType, ck::half_t>));
// Verify layouts
EXPECT_TRUE((std::is_same_v<Traits::InLayout, ck::tensor_layout::convolution::NHWGC>));
EXPECT_TRUE((std::is_same_v<Traits::WeiLayout, ck::tensor_layout::convolution::GKYXC>));
EXPECT_TRUE((std::is_same_v<Traits::OutLayout, ck::tensor_layout::convolution::NHWGK>));
// Verify elementwise operations (always PassThrough for reference)
EXPECT_TRUE(
(std::is_same_v<Traits::AElementwiseOperation, ck_tile::element_wise::PassThrough>));
EXPECT_TRUE(
(std::is_same_v<Traits::BElementwiseOperation, ck_tile::element_wise::PassThrough>));
EXPECT_TRUE(
(std::is_same_v<Traits::CDEElementwiseOperation, ck_tile::element_wise::PassThrough>));
// Verify block size is 0 (N/A for reference)
EXPECT_EQ(Traits::kBlockSize, 0);
// Verify instance_string() - now includes data type and layouts!
std::string instance_str = Traits::instance_string();
EXPECT_EQ(instance_str, "GPU_Reference_Forward_2D_fp16_NHWGC_GKYXC_NHWGK");
std::cout << "✓ Forward InstanceTraits validated: " << instance_str << std::endl;
}
TEST(ReferenceInstanceTraits, BackwardData_2D_FP16)
{
constexpr ConvSignature sig{.spatial_dim = 2,
.direction = ConvDirection::BACKWARD_DATA,
.data_type = DataType::FP16,
.accumulation_data_type = DataType::FP32,
.input = {.config = {.layout = TensorLayout::NHWGC}},
.weight = {.config = {.layout = TensorLayout::GKYXC}},
.output = {.config = {.layout = TensorLayout::NHWGK}}};
constexpr auto ref_alg = ConvAlgorithm_Reference{};
using RefKernel = ConvBuilder<sig, ref_alg>::Instance;
using Traits = ck_tile::reflect::InstanceTraits<RefKernel>;
EXPECT_EQ(Traits::kSpatialDim, 2);
EXPECT_EQ(Traits::direction, ConvDirection::BACKWARD_DATA);
std::string instance_str = Traits::instance_string();
EXPECT_EQ(instance_str, "GPU_Reference_BackwardData_2D_fp16_NHWGC_GKYXC_NHWGK");
std::cout << "✓ Backward Data InstanceTraits validated: " << instance_str << std::endl;
}
TEST(ReferenceInstanceTraits, BackwardWeight_2D_FP16)
{
constexpr ConvSignature sig{.spatial_dim = 2,
.direction = ConvDirection::BACKWARD_WEIGHT,
.data_type = DataType::FP16,
.accumulation_data_type = DataType::FP32,
.input = {.config = {.layout = TensorLayout::NHWGC}},
.weight = {.config = {.layout = TensorLayout::GKYXC}},
.output = {.config = {.layout = TensorLayout::NHWGK}}};
constexpr auto ref_alg = ConvAlgorithm_Reference{};
using RefKernel = ConvBuilder<sig, ref_alg>::Instance;
using Traits = ck_tile::reflect::InstanceTraits<RefKernel>;
EXPECT_EQ(Traits::kSpatialDim, 2);
EXPECT_EQ(Traits::direction, ConvDirection::BACKWARD_WEIGHT);
std::string instance_str = Traits::instance_string();
EXPECT_EQ(instance_str, "GPU_Reference_BackwardWeight_2D_fp16_NHWGC_GKYXC_NHWGK");
std::cout << "✓ Backward Weight InstanceTraits validated: " << instance_str << std::endl;
}
} // namespace