[rocm-libraries] ROCm/rocm-libraries#5284 (commit 76b5b15)

[CK_BUILDER] Add
 DeviceGroupedConvFwdMultipleABD_Wmma_CShuffle_V3 to CK Builder (#5284)

Add factory, InstanceTraits, and conv traits support for the WMMA V3
forward convolution kernel, enabling the CK Builder to generate and
dispatch this kernel variant used by MIOpen on gfx11/gfx12 GPUs.

## Motivation

As reported in issue #4944, MIOpen includes WMMA V3 forward convolution
kernels, so this PR adds support for those kernels similarly to other
supported kernels.

## Technical Details

This follows the same implementation as the other kernels. I added some
support for reflection, but I left a few todos since we need to
generalize our convolution traits to generalize across WMMA/MFMA and
CK/CKTile.

## Test Plan

Added faster tests to `ninja smoke-builder` that check the
instance-traits logic, and I added longer tests that instantiate
kernels, following the existing pattern in other kernals.

## Test Result

I tested all code with `ninja check-builder` on a gfx1101 build and ran
on gfx1101.

Co-authored-by: Claude Opus 4.6 <noreply@anthropic.com>
This commit is contained in:
John Shumway
2026-03-10 23:43:03 +00:00
committed by assistant-librarian[bot]
parent 26d29374e5
commit 9f47b8a63d
15 changed files with 916 additions and 0 deletions

View File

@@ -344,6 +344,16 @@ constexpr GridwiseWmmaGemmABK1 GemmParamsABK1_Wmma_16x16_2x1_per_wave{.ak1
.m_wmma_per_wave = 2,
.n_wmma_per_wave = 1};
constexpr GridwiseWmmaGemmABK1 GemmParamsABK1_Wmma_16x16_4x2_per_wave{.ak1 = 8,
.bk1 = 8,
.m_per_wmma = 16,
.n_per_wmma = 16,
.m_wmma_per_wave = 4,
.n_wmma_per_wave = 2};
constexpr ThreadBlock ThreadBlock_64_64x64x32{.block_size = 64,
.tile_size = {.m = 64, .n = 64, .k = 32}};
constexpr ThreadBlock ThreadBlock_256_256x256x32{.block_size = 256,
.tile_size = {.m = 256, .n = 256, .k = 32}};