mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
Refactors how the number of XDL (matrix multiply-accumulate) instructions per wave is calculated and used in the grouped convolution forward implementations, mainly to better support WMMA (Wave Matrix Multiply-Accumulate) instructions and 16x16 tiles.
The changes use MXdlPerWave instead of NXdlPerWave to increase the number of waves along the M dimension.
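
As a rough illustration of why the per-wave XDL count affects the wave count along a dimension, the sketch below assumes the usual tiling relationship in which a block tile is split into waves, and each wave issues a number of instruction tiles along that dimension. The helper `waves_along_dim` and the tile sizes are hypothetical and are not taken from this commit; the parameter names only mirror the `MPerBlock` / `MPerXdl` / `MXdlPerWave` naming style.

```cpp
// Hypothetical helper (not part of the commit): number of waves that cover
// one dimension of a block tile, assuming
//   per_block == waves * xdl_per_wave * per_xdl
// e.g. MPerBlock == MWaves * MXdlPerWave * MPerXdl for the M dimension.
constexpr int waves_along_dim(int per_block, int per_xdl, int xdl_per_wave)
{
    return per_block / (per_xdl * xdl_per_wave);
}

// Assumed example sizes: a 128-row block tile and a 16x16 instruction tile.
// With 4 instruction tiles per wave along M, 2 waves cover the block tile.
static_assert(waves_along_dim(128, 16, 4) == 2, "128 / (16 * 4) == 2");

// Halving the per-wave count along M doubles the number of waves along M.
static_assert(waves_along_dim(128, 16, 2) == 4, "128 / (16 * 2) == 4");
```

Under this assumption, choosing which of MXdlPerWave or NXdlPerWave to derive or adjust determines whether the extra waves are placed along M or along N.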
[ROCm/composable_kernel commit: cbc8335964]