1. Correct shuffle_b and MakeBFlatDramTileDistribution according to the WMMA warp layout.
2. Add FlatmmConfig16_Wmma for gfx11 and gfx12.

[ROCm/composable_kernel commit: df4ee556d6]