composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-20 06:49:15 +00:00

Files

Bartłomiej Kocot cbc8335964 Improve XDL to WMMA porting for grouped conv fwd (#3456)

Refactors how the number of XDL (matrix multiply-accumulate) instructions per wave is calculated and used in the grouped convolution forward implementations, mainly to better support WMMA (Wave Matrix Multiply-Accumulate) instructions and 16x16 tiles.
The changes use MXdlPerWave instead of NXdlPerWave to increase the number of waves along the M dimension.
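
As a rough illustration of why the per-wave instruction count matters when moving from larger XDL tiles to 16x16 WMMA tiles, the sketch below assumes the usual tiling relation: waves along a dimension = block tile size / (instructions per wave x instruction tile size). The helper names and the block size of 128 are hypothetical and not taken from this commit; the actual calculation in the repository may differ.

```cpp
// Hypothetical helpers for illustration only; not part of composable_kernel.

// Number of waves needed to cover one block-tile dimension, assuming each wave
// issues `repeats_per_wave` matrix instructions of size `per_instruction`
// along that dimension.
constexpr int waves_along_dim(int per_block, int repeats_per_wave, int per_instruction)
{
    return per_block / (repeats_per_wave * per_instruction);
}

// Inverse view: instructions per wave needed to keep a fixed wave count.
constexpr int repeats_for_waves(int per_block, int waves, int per_instruction)
{
    return per_block / (waves * per_instruction);
}

// Assumed example: MPerBlock = 128, two instructions per wave along M.
// With a 32-wide tile the block needs 2 waves along M ...
static_assert(waves_along_dim(128, 2, 32) == 2, "32-wide tile");
// ... while a 16-wide tile with the same per-wave count needs 4 waves.
static_assert(waves_along_dim(128, 2, 16) == 4, "16-wide tile");
// Keeping 2 waves with 16-wide tiles would instead need 4 instructions per wave.
static_assert(repeats_for_waves(128, 2, 16) == 4, "fixed wave count");
```

Under these assumptions, halving the instruction tile while keeping the per-wave count fixed doubles the number of waves along M, which is one way to read the commit's stated goal of more waves per M dimension.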

2025-12-19 15:58:51 -07:00

Improve XDL to WMMA porting for grouped conv fwd (#3456)

2025-12-19 15:58:51 -07:00

ck_tile

[CK_BUILDER] Ck Tile Grouped convolution factory (#3352)

2025-12-08 10:32:56 +01:00