mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-04-19 22:39:03 +00:00
When multiple workgroups contribute to the same tile through atomics, round-off error can occur if the accumulator type differs from the C type. To compute an error tolerance for test validation, the Stream-K Tile Partitioner provides a function, estimate_num_wgs_per_tile, that estimates the number of workgroups per tile. This function only provides an estimate: in some DP+2TSK cases it returns 1 rather than the more accurate value of 2. This change updates estimate_num_wgs_per_tile to explicitly return 2 for DP+2TSK, which gives a better error tolerance and avoids test failures caused by round-off error.
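As a rough illustration of why the workgroup count matters for the tolerance, here is a minimal sketch. It assumes the tolerance scales linearly with the number of workgroups contributing to a tile. The helper name scaled_rel_tolerance and its parameters are hypothetical and are not part of the Composable Kernel API; the actual tolerance calculation in the tests may differ.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper for illustration only; not the Composable Kernel API.
// Assumption: every workgroup that atomically adds a partial result into C
// introduces rounding from the accumulator type to the C type, so the
// tolerance is scaled linearly by the number of contributing workgroups.
inline double scaled_rel_tolerance(double c_type_epsilon, int num_wgs_per_tile)
{
    assert(c_type_epsilon > 0.0);
    // Treat any estimate below 1 as a single contributor.
    const int contributors = std::max(1, num_wgs_per_tile);
    return c_type_epsilon * static_cast<double>(contributors);
}
```

Under this assumption, an estimate of 1 where 2 workgroups actually contribute gives a tolerance half as large as intended, which is the kind of underestimate that can make a correct result fail validation.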