[rocm-libraries] ROCm/rocm-libraries#8259 (commit df03f10)

Add cluster launch in test ck_tile mx gemm tdm wmma

## Motivation

Add a cluster launch test to test_ck_tile_mx_gemm_pipeline_tdm_wmma on
gfx1250, so that cluster launch performance can be checked on gfx1250
hardware.

## Technical Details

Add an out-of-bounds guard in RunGemm of MxGemmKernel. Cluster launch
pads the grid up to cluster boundaries, and the guard makes the padding
blocks return early.
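
To illustrate why the guard is needed, here is a minimal host-side sketch of the padding arithmetic. It is not the kernel code: `round_up_to_cluster`, `is_padding_block`, and `count_active_blocks` are hypothetical helpers, the tile and cluster sizes are made up, and the sketch assumes that `block_idx_m` and `block_idx_n` are the element offsets of a block's tile origin, which is what the comparison against `kargs.M` and `kargs.N` suggests.

```cpp
#include <cstdint>

// Hypothetical helper: round a block count up to a multiple of the cluster size.
constexpr std::uint32_t round_up_to_cluster(std::uint32_t blocks, std::uint32_t cluster)
{
    return (blocks + cluster - 1) / cluster * cluster;
}

// Hypothetical stand-in for the guard in RunGemm. block_idx_m / block_idx_n are
// assumed to be element offsets of the tile origin, compared against M and N.
constexpr bool is_padding_block(std::uint32_t block_idx_m,
                                std::uint32_t block_idx_n,
                                std::uint32_t M,
                                std::uint32_t N)
{
    return block_idx_m >= M || block_idx_n >= N;
}

// Count the blocks that do real work in a grid padded to cluster boundaries.
constexpr std::uint32_t count_active_blocks(std::uint32_t M,
                                            std::uint32_t N,
                                            std::uint32_t tile_m,
                                            std::uint32_t tile_n,
                                            std::uint32_t cluster_m,
                                            std::uint32_t cluster_n)
{
    const std::uint32_t grid_m = round_up_to_cluster((M + tile_m - 1) / tile_m, cluster_m);
    const std::uint32_t grid_n = round_up_to_cluster((N + tile_n - 1) / tile_n, cluster_n);

    std::uint32_t active = 0;
    for(std::uint32_t bm = 0; bm < grid_m; ++bm)
    {
        for(std::uint32_t bn = 0; bn < grid_n; ++bn)
        {
            // Padding blocks would return early in the kernel.
            if(!is_padding_block(bm * tile_m, bn * tile_n, M, N))
                ++active;
        }
    }
    return active;
}
```

With M = 384, N = 256, 128x128 tiles, and a 2x2 cluster, the M dimension needs 3 tiles but the grid is padded to 4, so the two blocks in the last row start at offset 384 and are skipped.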

Add ClusterEnable/ClusterDisable aliases and extend the tuple in
test_mx_gemm_pipeline_kernel_types.hpp with two kernel types that use
ClusterEnable, one for F8 CompTDMV1 and one for F8 CompTDMV2. The
existing F4 non-ClusterLaunch kernel types have an issue that still
needs to be fixed, so this PR does not include F4 cases.

Read ClusterLaunch from the tuple in test_mx_gemm_pipeline_util.hpp.
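
The alias and tuple changes described above can be sketched roughly as follows. This is an assumption-laden illustration, not the code in the PR: the alias definitions, the tag types `F8`, `CompTDMV1`, `CompTDMV2`, the tuple layout, and the `KernelTraits` helper are all placeholders, and the real headers carry more tuple elements than shown here.

```cpp
#include <tuple>
#include <type_traits>

// Assumed alias definitions; the actual ones in the test header may differ.
using ClusterEnable  = std::true_type;
using ClusterDisable = std::false_type;

// Placeholder tags standing in for the data type and pipeline parameters.
struct F8 {};
struct CompTDMV1 {};
struct CompTDMV2 {};

// Hypothetical kernel-type tuples with the cluster flag as the last element.
using KernelV1NoCluster = std::tuple<F8, CompTDMV1, ClusterDisable>;
using KernelV1Cluster   = std::tuple<F8, CompTDMV1, ClusterEnable>;
using KernelV2Cluster   = std::tuple<F8, CompTDMV2, ClusterEnable>;

// Hypothetical reader: pulls the compile-time parameters out of the tuple.
template <typename Tuple>
struct KernelTraits
{
    using DataType = std::tuple_element_t<0, Tuple>;
    using Pipeline = std::tuple_element_t<1, Tuple>;
    static constexpr bool ClusterLaunch = std::tuple_element_t<2, Tuple>::value;
};
```

Encoding the flag as a type keeps it in the same tuple as the other compile-time parameters, so a test fixture can read it the same way it reads the data type and the pipeline.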

Update invoke_mx_gemm to branch on ClusterLaunch: add cluster size
constants, and switch the GemmShape type, the TilePartitioner type, and
the kernel launch call depending on the flag.
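
A compile-time branch of this kind could look like the sketch below. Every name in it is hypothetical (`PlainShape`, `ClusteredShape`, `LaunchConfig`, `select_launch_path`), the cluster size of 2x2 is illustrative only, and the real invoke_mx_gemm selects ck_tile types and calls a launch function instead of returning an enum.

```cpp
#include <type_traits>

// Placeholder shape types; the real GemmShape types take many more parameters.
struct PlainShape
{
    static constexpr int cluster_m = 1;
    static constexpr int cluster_n = 1;
};

template <int ClusterM, int ClusterN>
struct ClusteredShape
{
    static constexpr int cluster_m = ClusterM;
    static constexpr int cluster_n = ClusterN;
};

// Hypothetical configuration that switches on the ClusterLaunch flag.
template <bool ClusterLaunch>
struct LaunchConfig
{
    // Illustrative cluster size constants, not the values used in the PR.
    static constexpr int ClusterSizeM = ClusterLaunch ? 2 : 1;
    static constexpr int ClusterSizeN = ClusterLaunch ? 2 : 1;

    // Type selection happens at compile time, so only one shape is instantiated.
    using GemmShape = std::conditional_t<ClusterLaunch,
                                         ClusteredShape<ClusterSizeM, ClusterSizeN>,
                                         PlainShape>;
};

enum class LaunchPath
{
    Plain,
    Cluster
};

// Stand-in for the launch call: reports which path would be taken.
template <bool ClusterLaunch>
constexpr LaunchPath select_launch_path()
{
    if constexpr(ClusterLaunch)
        return LaunchPath::Cluster;
    else
        return LaunchPath::Plain;
}
```

Using `std::conditional_t` for the types and `if constexpr` for the call keeps the non-cluster path unchanged for the existing kernel types.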

## Test Plan

Tested the changes on gfx1250 FFM.

## Test Result

The added kernel types (instances) passed the tests on gfx1250 FFM.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.
This commit is contained in:
jefyang1
2026-06-11 17:33:11 +00:00
committed by assistant-librarian[bot]
parent 359f664b25
commit 276863ca87
3 changed files with 52 additions and 9 deletions


@@ -231,6 +231,13 @@ struct MxGemmKernel
            bs_scale_ptr[i] = reinterpret_cast<const int32_t*>(kargs.bs_scale_ptr[i]);
        });
        // cluster launch pads grid to cluster boundaries; skip out-of-bound blocks
        if constexpr(BaseKernel::ClusterLaunch)
        {
            if(block_idx_m >= kargs.M || block_idx_n >= kargs.N)
                return;
        }
        const auto& as_block_window = BaseKernel::MakeABlockWindows(
            as_ptr, kargs, splitk_batch_offset.splitted_k, block_idx_m);
        const auto& bs_block_window = BaseKernel::MakeBBlockWindows(