[CK_TILE] Stream-K operator() Reboot (#3064)

* Persistent Stream-K Kernel Implementation

This change implements an operator() function in the
reboot::StreamKKernel class that is enabled when the Persistent flag is
set to true. In this case, the data-parallel portion and the Stream-K
portion of the kernel are fully persistent.

The changes were made in the reboot namespace. A future PR will remove
the old Stream-K kernel class and remove the reboot namespace.

* Unit Tests for Persistent Stream-K Kernel

This change contains the initial test suite for the Persistent Stream-K
Kernel. The files contain "reboot" in the name; a future PR will remove
tests for the old Stream-K Kernel and remove the "reboot" naming.

A future commit will add tests for the non-persistent kernel.

Also added estimate_num_wgs_per_tile to the StreamKTilePartitionerBase
class. This allows us to estimate the number of accumulations done per
macro tile in C to use during validation when computing relative and
absolute tolerance.
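
A minimal sketch of how such an estimate could be formed is shown below. It is
not the code added to StreamKTilePartitionerBase: the function and parameter
names are hypothetical, and it assumes each workgroup covers a fixed number of
K iterations, so the count for one tile is a ceiling division with a floor of 1.

```cpp
// Hypothetical helper, not the CK Tile implementation.
// iters_per_tile: K iterations needed to finish one macro tile in C.
// iters_per_wg:   K iterations assigned to each Stream-K workgroup (assumed fixed).
constexpr int sketch_wgs_per_tile(int iters_per_tile, int iters_per_wg)
{
    // Guard against a zero or negative share so the division is well defined.
    const int per_wg = iters_per_wg < 1 ? 1 : iters_per_wg;
    // Ceiling division: a partially covered range still needs a workgroup.
    const int wgs = (iters_per_tile + per_wg - 1) / per_wg;
    // At least one workgroup always writes to a tile.
    return wgs < 1 ? 1 : wgs;
}
```

A validation helper could treat this count like a split-K factor when choosing
tolerances, since more partial sums per tile generally mean more rounding error.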

* Adding implementation for the Non-Persistent Stream-K kernel

This change adds the operator() function for the Non-Persistent Stream-K
kernel. Persistency of the kernel is determined through a template argument.
The Non-Persistent kernel allocates additional workgroups for the
data-parallel section, leading to a different structure for processing the
data-parallel and Stream-K sections.
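
As a rough illustration of the difference in launch shape, the sketch below
compares the two modes. The function and parameter names are hypothetical and
not taken from the kernel. It assumes the persistent kernel reuses one set of
workgroups for both sections, while the non-persistent kernel adds one
workgroup per data-parallel tile.

```cpp
#include <cstdint>

// Hypothetical helper, not part of CK Tile: number of workgroups to launch.
// sk_wgs:   workgroups that cooperate on the Stream-K section.
// dp_tiles: macro tiles handled by the data-parallel section.
constexpr std::int64_t sketch_grid_size(bool persistent,
                                        std::int64_t sk_wgs,
                                        std::int64_t dp_tiles)
{
    // Assumption: a persistent launch loops over the DP tiles inside the
    // same workgroups; a non-persistent launch gives each DP tile its own
    // extra workgroup.
    return persistent ? sk_wgs : sk_wgs + dp_tiles;
}
```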

The TilePartitioner now provides access to whether Persistent has been set to
true or false in the StreamKKernel.

* Adding in the tests for the Non-Persistent Stream-K kernel

* Refactor Stream-K Reboot Unit Tests

This commit makes the following changes:
- Update test cases to determine M, N, and K based on the number of CUs.
  This ensures that each test case is one of Edge Case, SK Only, DP
  Only, or DP + 2 Tile SK regardless of the architecture.
- Since the DP + 2 Tile SK test case takes a long time to run, this change
  moves this case into a separate .inc file and labels it as an extended
  test.
- Since the extended test takes > 30 seconds to run, this test is added
  to the list of regression tests.
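
The idea of sizing a problem from the CU count can be sketched as follows. This
is not the test code from this commit: the struct, the function names, and the
choice to vary only M are assumptions made for illustration.

```cpp
#include <cstdint>

// Hypothetical problem shape used only in this sketch.
struct SketchDims
{
    std::int64_t M;
    std::int64_t N;
};

constexpr std::int64_t sketch_ceil_div(std::int64_t a, std::int64_t b)
{
    return (a + b - 1) / b;
}

// Number of macro tiles in C for a given problem and tile shape.
constexpr std::int64_t sketch_num_tiles(SketchDims d, std::int64_t m_tile, std::int64_t n_tile)
{
    return sketch_ceil_div(d.M, m_tile) * sketch_ceil_div(d.N, n_tile);
}

// Choose M and N so that C holds exactly `tiles` macro tiles.
// Assumption: one tile column in N, with M scaled to reach the target count.
constexpr SketchDims sketch_dims_for_tiles(std::int64_t tiles,
                                           std::int64_t m_tile,
                                           std::int64_t n_tile)
{
    return SketchDims{tiles * m_tile, n_tile};
}
```

Passing a tile count derived from the CU count (for example num_cus, or
num_cus - 1) keeps the ratio of tiles to CUs the same on every device. Which
ratio maps to which test category depends on the partitioner and is not
modeled here.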

* Fix spelling errors in comments for test cases

Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>

* Changes based on review

Removed const volatile for typenames
Set up alias for is_tuple_t
Naming changes for clarity: GemmCommon -> BaseGemm
Moved std::enable_if_t out of template parameters and changed to a return type for operator()
Added constructor for StreamKKernelArgs to clarify UniversalGemm inheritance
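
For reference, the general std::enable_if_t return-type pattern looks like the
sketch below. The struct name and return values are made up for illustration
and are not the StreamKKernel code.

```cpp
#include <type_traits>

template <bool Persistent>
struct KernelSketch
{
    // Before (shown as a comment): constraint in a defaulted template parameter.
    //   template <bool P = Persistent, typename = std::enable_if_t<P>>
    //   int operator()() const;

    // After: the constraint is the return type, so both overloads can coexist.
    template <bool P = Persistent>
    constexpr std::enable_if_t<P, int> operator()() const
    {
        return 1; // stands in for the persistent path
    }

    template <bool P = Persistent>
    constexpr std::enable_if_t<!P, int> operator()() const
    {
        return 2; // stands in for the non-persistent path
    }
};
```

With the defaulted-parameter form, two overloads that differ only in the
default argument would be redeclarations of the same template, which is one
common reason to prefer the return-type form.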

---------

Co-authored-by: Emily Martins <emily.martins@amd.com>
Co-authored-by: Christopher Millette <63608002+cgmillette@users.noreply.github.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
arai713
2025-10-27 09:14:17 -07:00
committed by GitHub
parent 0b68423015
commit 054fdb765c
20 changed files with 1122 additions and 0 deletions

@@ -0,0 +1,19 @@
// Copyright © Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "test_gemm_streamk_reboot_types.hpp"
#include "test_gemm_streamk_reboot_util.hpp"
#include "gtest/gtest.h"
template <typename Tuple>
class TestCkTileStreamKRebootBf16NonPersistent : public TestCkTileStreamKReboot<Tuple>
{
};
#define TEST_SUITE_NAME TestCkTileStreamKRebootBf16NonPersistent
TYPED_TEST_SUITE(TestCkTileStreamKRebootBf16NonPersistent, KernelTypesStreamKBf16NonPersistent);
#include "test_gemm_streamk_reboot_extended_cases.inc"
#undef TEST_SUITE_NAME

@@ -0,0 +1,19 @@
// Copyright © Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "test_gemm_streamk_reboot_types.hpp"
#include "test_gemm_streamk_reboot_util.hpp"
#include "gtest/gtest.h"
template <typename Tuple>
class TestCkTileStreamKRebootBf16Persistent : public TestCkTileStreamKReboot<Tuple>
{
};
#define TEST_SUITE_NAME TestCkTileStreamKRebootBf16Persistent
TYPED_TEST_SUITE(TestCkTileStreamKRebootBf16Persistent, KernelTypesStreamKBf16Persistent);
#include "test_gemm_streamk_reboot_extended_cases.inc"
#undef TEST_SUITE_NAME

@@ -0,0 +1,19 @@
// Copyright © Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "test_gemm_streamk_reboot_types.hpp"
#include "test_gemm_streamk_reboot_util.hpp"
#include "gtest/gtest.h"
template <typename Tuple>
class TestCkTileStreamKRebootFp16NonPersistent : public TestCkTileStreamKReboot<Tuple>
{
};
#define TEST_SUITE_NAME TestCkTileStreamKRebootFp16NonPersistent
TYPED_TEST_SUITE(TestCkTileStreamKRebootFp16NonPersistent, KernelTypesStreamKFp16NonPersistent);
#include "test_gemm_streamk_reboot_extended_cases.inc"
#undef TEST_SUITE_NAME

@@ -0,0 +1,19 @@
// Copyright © Advanced Micro Devices, Inc., or its affiliates.
// SPDX-License-Identifier: MIT
#include "test_gemm_streamk_reboot_types.hpp"
#include "test_gemm_streamk_reboot_util.hpp"
#include "gtest/gtest.h"
template <typename Tuple>
class TestCkTileStreamKRebootFp16Persistent : public TestCkTileStreamKReboot<Tuple>
{
};
#define TEST_SUITE_NAME TestCkTileStreamKRebootFp16Persistent
TYPED_TEST_SUITE(TestCkTileStreamKRebootFp16Persistent, KernelTypesStreamKFp16Persistent);
#include "test_gemm_streamk_reboot_extended_cases.inc"
#undef TEST_SUITE_NAME