mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-30 19:57:40 +00:00
This PR remaps the 32x32 warp tile to a 16x16 warp tile for all CK kernels in wave32 mode. The logic is the same as in ROCm/composable_kernel#3421, and most of the changes are in the device classes. To reduce instance build time, VGPR estimation is implemented in ~10 gridwise classes. To pass all tests in CI, several tests were slightly adjusted.