[CK_TILE] Update Stream-K Reduction Strategy Enum (#4756)

## Motivation

Currently, Stream-K has three reduction options: 1) atomics, 2) the
reduction described in the Stream-K paper, and 3) a tree reduction. In
the strategy described in the original Stream-K paper, the starting
workgroup of each tile sequentially accumulates the partial results of
the other workgroups contributing to that tile, which requires a number
of steps linear in the number of contributing workgroups. Hence, for
clarity, this work updates the naming of the `StreamKReductionStrategy`
enum members to better describe the existing reduction strategy options.
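
To illustrate why the paper's strategy is called linear, the sketch below compares the number of dependent accumulation steps for a sequential reduction and a generic pairwise tree reduction over a tile's partial results. It is a host-side illustration only: `ReductionResult`, `linear_reduce`, and `tree_reduce` are hypothetical names, and the tree scheme shown is an assumed textbook variant, not necessarily how CK Tile implements its tree reduction.

```cpp
#include <cstddef>
#include <vector>

// Accumulated value plus the number of dependent (sequential) steps used.
struct ReductionResult
{
    float value;
    std::size_t steps;
};

// Linear: the starting workgroup adds each other partial one after another,
// so N partials need N - 1 dependent steps.
ReductionResult linear_reduce(const std::vector<float>& partials)
{
    ReductionResult r{0.0f, 0};
    if(partials.empty())
        return r;
    r.value = partials[0];
    for(std::size_t i = 1; i < partials.size(); ++i)
    {
        r.value += partials[i];
        ++r.steps;
    }
    return r;
}

// Tree (assumed pairwise scheme): additions within one level are independent
// of each other, so each level counts as a single step, giving ceil(log2(N)).
ReductionResult tree_reduce(std::vector<float> partials)
{
    ReductionResult r{0.0f, 0};
    if(partials.empty())
        return r;
    for(std::size_t stride = 1; stride < partials.size(); stride *= 2)
    {
        for(std::size_t i = 0; i + stride < partials.size(); i += 2 * stride)
        {
            partials[i] += partials[i + stride];
        }
        ++r.steps;
    }
    r.value = partials[0];
    return r;
}
```

With eight partial results, `linear_reduce` takes seven dependent steps while `tree_reduce` takes three; both produce the same sum.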

## Technical Details

Prior to this change, the enum was as follows:
```cpp
enum StreamKReductionStrategy : uint32_t
{
    Atomic        = 0u,
    Reduction     = 1u,
    TreeReduction = 2u
};
```
However, the distinction between `Reduction` and `TreeReduction` is not
very clear, since `Reduction` does not say which kind of reduction it
refers to, and the names are partly redundant.
Hence, the updated enum is as follows:
```cpp
enum StreamKReductionStrategy : uint32_t
{
    Atomic = 0u,
    Linear = 1u,
    Tree   = 2u
};
```
All references to `StreamKReductionStrategy` were updated to reflect
this change.
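
For code that selects a strategy from a command-line string, the rename might look like the following sketch. The enum mirrors the updated definition above, placed in a hypothetical `sketch` namespace; `parse_reduction_strategy` is an illustrative helper, not part of CK Tile, and accepting `"tree"` as an option string is an assumption, since the example help text in this change only lists `atomic/linear`.

```cpp
#include <cstdint>
#include <optional>
#include <string_view>

namespace sketch {

// Mirrors the updated enum from this change.
enum StreamKReductionStrategy : std::uint32_t
{
    Atomic = 0u,
    Linear = 1u,
    Tree   = 2u
};

// Hypothetical helper: maps an option string to a strategy and returns
// std::nullopt for anything unrecognized, including the old "reduction" name.
inline std::optional<StreamKReductionStrategy> parse_reduction_strategy(std::string_view s)
{
    if(s == "atomic")
        return StreamKReductionStrategy::Atomic;
    if(s == "linear")
        return StreamKReductionStrategy::Linear;
    if(s == "tree") // assumption: not listed in the example's help text
        return StreamKReductionStrategy::Tree;
    return std::nullopt;
}

} // namespace sketch
```

Returning `std::nullopt` for unknown strings is one possible design; it makes a stale `reduction` argument fail visibly rather than silently falling back to another strategy.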

## Test Plan

No new functionality was added, so no new tests were added; I just
validated existing tests and examples.

## Test Result

All tests passed locally.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

Author: Emily Martins
Date: 2026-02-23 23:40:08 -07:00
Committed by GitHub
Parent: 6286c71499
Commit: cf00dc87d0
12 changed files with 32 additions and 35 deletions

View File

@@ -22,7 +22,7 @@ args:
-a_layout tensor A data layout (default: R)
-b_layout tensor B data layout (default: C)
-c_layout tensor C data layout (default: R)
-reduction_strategy strategy for storing results in C tensor. atomic/reduction (default:atomic)
-reduction_strategy strategy for storing results in C tensor. atomic/linear (default:atomic)
-persistent_dp persistent strategy for data-parallel section. Set to 0 for non-persistent or to 1 for persistent. (default:0)
-stride_a tensor A stride (default:0)
-stride_b tensor B stride (default:0)

View File

@@ -65,7 +65,7 @@ auto createArgs(int argc, char* argv[])
.insert("c_layout", "R", "C tensor data layout - Row by default")
.insert("reduction_strategy",
"atomic",
"strategy for storing results in C tensor - atomic/reduction")
"strategy for storing results in C tensor - atomic/linear")
.insert("persistent_dp",
"0",
"0. Non-persistent data-parallel section, 1 Fully persistent kernel.")

View File

@@ -119,7 +119,7 @@ std::tuple<float, ck_tile::index_t> invokeGemm(ck_tile::DeviceMem& a_m_k_device_
DsLayout,
CLayout,
CDEElementWise,
ck_tile::StreamKReductionStrategy::Reduction>(
ck_tile::StreamKReductionStrategy::Linear>(
args,
ck_tile::stream_config{
nullptr, true, 1, warmup_iterations, repeat_iterations, true, flush_cache});
@@ -155,9 +155,9 @@ ck_tile::StreamKReductionStrategy getReductionStrategyValue(const std::string& s
{
return ck_tile::StreamKReductionStrategy::Atomic;
}
else if(strategy == "reduction")
else if(strategy == "linear")
{
return ck_tile::StreamKReductionStrategy::Reduction;
return ck_tile::StreamKReductionStrategy::Linear;
}
else
{

View File

@@ -112,7 +112,7 @@ std::tuple<float, ck_tile::index_t> gemm(const ck_tile::StreamKHostArgs& args,
hipGetErrorString(hipMemsetAsync(
args.e_ptr, 0, args.M * args.N * sizeof(CDataType), stream_config.stream_id_));
}
else if constexpr(ReductionStrategy == ck_tile::StreamKReductionStrategy::Reduction)
else if constexpr(ReductionStrategy == ck_tile::StreamKReductionStrategy::Linear)
{
// Reset sk flags to zero before each repetition of the kernel
workspace_data.SetZero();