[CK_TILE][REGRESSION] Correct blockSize in Generic2dBlockShape (c254f… (#2837)

* [CK_TILE][REGRESSION] Correct blockSize in Generic2dBlockShape (c254f3d7b4)

WarpPerBlock_M * WarpPerBlock_N is not equal to ThreadPerBlock_M * ThreadPerBlock_N / warpSize, so BlockSize should be calculated from WarpPerBlock_M * WarpPerBlock_N.

For compatibility with wave32, a GetBlockSize function is added to calculate the correct size on the host side.

* fix blocksize for all kernels related to Generic2dBlockShape

* remove constexpr for blocks
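
The reasoning in the message above can be illustrated with a minimal, self-contained sketch. It is not the CK_TILE implementation: `ToyBlockShape`, `toy_is_wave32`, `host_block_size`, and the chosen tile numbers are hypothetical names and values made up for illustration, and the 32/64 lane counts are an assumption about what the real `GetBlockSize<bool>` selects between. The assumed scenario is a thread layout that packs 64 threads per warp while the device runs in wave32 mode.

```cpp
#include <cassert>

// Hypothetical, simplified stand-in for a 2D block shape (NOT the CK_TILE type).
template <int WarpPerBlock_M_, int WarpPerBlock_N_, int ThreadPerWarp_M_, int ThreadPerWarp_N_>
struct ToyBlockShape
{
    static constexpr int WarpPerBlock_M   = WarpPerBlock_M_;
    static constexpr int WarpPerBlock_N   = WarpPerBlock_N_;
    static constexpr int ThreadPerBlock_M = WarpPerBlock_M_ * ThreadPerWarp_M_;
    static constexpr int ThreadPerBlock_N = WarpPerBlock_N_ * ThreadPerWarp_N_;

    // Block size derived from the warp count; the lane count per warp is
    // picked by the template flag (assumed: 32 for wave32, 64 otherwise).
    template <bool IsWave32>
    static constexpr int GetBlockSize()
    {
        return WarpPerBlock_M * WarpPerBlock_N * (IsWave32 ? 32 : 64);
    }
};

// Hypothetical stand-in for a runtime device query on the host.
inline bool toy_is_wave32(int wavefront_size) { return wavefront_size == 32; }

// Host-side dispatch: a runtime condition selects between two
// compile-time instantiations, mirroring the pattern in the diff.
template <typename Shape>
int host_block_size(int wavefront_size)
{
    return toy_is_wave32(wavefront_size) ? Shape::template GetBlockSize<true>()
                                         : Shape::template GetBlockSize<false>();
}

// 2 x 2 warps, each warp laid out as 4 x 16 = 64 threads.
using ExampleShape = ToyBlockShape<2, 2, 4, 16>;

// Dividing the thread count by a 32-lane warp size gives 8 "warps" ...
static_assert(ExampleShape::ThreadPerBlock_M * ExampleShape::ThreadPerBlock_N / 32 == 8,
              "thread-derived warp count");
// ... while the block really contains 4 warps.
static_assert(ExampleShape::WarpPerBlock_M * ExampleShape::WarpPerBlock_N == 4,
              "actual warp count");
```

With these toy numbers, `host_block_size<ExampleShape>(32)` yields 128 threads and `host_block_size<ExampleShape>(64)` yields 256. Because the wavefront size is only known at run time on the host, the dispatching function cannot itself be evaluated at compile time, even though each branch is.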
linqunAMD
2025-09-16 23:47:55 +08:00
committed by GitHub
parent 671adb59c5
commit b7a806f244
10 changed files with 63 additions and 26 deletions


@@ -95,7 +95,11 @@ struct AddRmsnorm2dRdquantFwd
         return dim3(integer_divide_ceil(hargs.m, Block_M));
     }
-    CK_TILE_HOST static constexpr auto BlockSize() { return Problem::BlockShape::BlockSize; }
+    CK_TILE_HOST static constexpr auto BlockSize()
+    {
+        return is_wave32() ? Problem::BlockShape::template GetBlockSize<true>()
+                           : Problem::BlockShape::template GetBlockSize<false>();
+    }
     // clang-format off
     template <typename T> struct t2s;