[CK] Unify the grouped convolution gridwise Run() functions (#4421)
## Motivation
`gridwise_gemm_wmma_cshuffle_v3.hpp` currently contains three separate
`Run()` overloads for grouped convolution, one for each type: forward,
backward weights, and backward data.
These overloads are very similar and should be unified into a single
`Run()` function that serves all types of grouped convolution.
## Technical Details
The three old `Run<>()` functions were replaced with a single unified
function.
The new `Run<>()` function is called from the following device
implementations:
- DeviceGroupedConvFwdMultipleABD_Wmma_CShuffle_V3
- DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffleV3
- DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3
- DeviceGroupedConvBwdWeightTwoStage_Wmma_CShuffleV3
- DeviceGroupedConvBwdWeight_Wmma_CShuffleV3
The DeviceGroupedConvFwdMultipleD_Wmma_CShuffle_V3_Large_Tensor
implementation uses a different `Run<>()` overload and was therefore not
modified.
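The general shape of such a unification can be sketched as follows. This is only an illustration under stated assumptions, not the code from this change: `ConvKind`, `Args`, `Offsets`, and the split-K handling for backward weights are hypothetical names and behavior chosen for the example, and the real gridwise `Run<>()` has a different signature. The idea shown is that the shared logic lives in one function template and the per-type differences are isolated in `if constexpr` branches.

```cpp
#include <cstddef>

// Hypothetical tag selecting the grouped convolution type (not a CK identifier).
enum class ConvKind { Forward, BackwardData, BackwardWeight };

// Hypothetical per-launch arguments; a real kernel carries descriptors and pointers.
struct Args
{
    std::size_t a_group_stride; // elements between consecutive groups in A
    std::size_t b_group_stride; // elements between consecutive groups in B
    std::size_t c_group_stride; // elements between consecutive groups in C
    std::size_t k_per_split;    // reduction length per split-K slice (assumed here
                                // to matter for backward weights only)
};

// Element offsets into the A, B, and C buffers for one work item.
struct Offsets
{
    std::size_t a, b, c;
};

// One function template instead of three near-identical overloads.
template <ConvKind Kind>
constexpr Offsets Run(const Args& args,
                      std::size_t group_idx,
                      [[maybe_unused]] std::size_t split_idx)
{
    // Part shared by all convolution types: select the group.
    Offsets o{group_idx * args.a_group_stride,
              group_idx * args.b_group_stride,
              group_idx * args.c_group_stride};

    // Type-specific part, resolved at compile time.
    if constexpr(Kind == ConvKind::BackwardWeight)
    {
        // Assumption for this sketch: each split-K slice advances A and B
        // along the reduction dimension, while C is shared between slices.
        o.a += split_idx * args.k_per_split;
        o.b += split_idx * args.k_per_split;
    }

    return o;
}
```

A caller in this sketch picks the variant through the template argument, for example `Run<ConvKind::Forward>(args, group_idx, 0)`, so instantiations that do not need the extra branch compile without it.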
## Test Plan
Run the following grouped convolution tests on `gfx1201`, as this
architecture is WMMA-capable:
- `test_grouped_convnd_fwd`
- `test_grouped_convnd_bwd_weight`
- `test_grouped_convnd_bwd_data`
The code was also compiled and tested on `gfx1100` to avoid CI
problems.
## Test Result
First part (unification of `Run<>()` function): All tests successful.
Second part (integration of single `Run<>()` function as a direct call):
All tests successful.
## Submission Checklist
- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.