[CK_TILE][REGRESSION] Correct blockSize in Generic2dBlockShape (c254f… (#2837)

* [CK_TILE][REGRESSION] Correct blockSize in Generic2dBlockShape (c254f3d7b4)

WarpPerBlock_M * WarpPerBlock_N does not equal ThreadPerBlock_M * ThreadPerBlock_N / warpSize, so BlockSize should be calculated from WarpPerBlock_M * WarpPerBlock_N.

For compatibility with wave32, a GetBlockSize function is added to calculate the correct size on the host side.

* Fix the block size for all kernels that use Generic2dBlockShape

* remove constexpr for blocks
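
The reasoning in this message can be illustrated with a minimal sketch. It is not the ck_tile implementation: `BlockShapeSketch` and `WarpsFromThreads` are hypothetical names, and the sketch assumes the warp size is only known at run time on the host (64 on wave64, 32 on wave32), which would also explain why `blocks` can no longer be `constexpr`.

```cpp
#include <cassert>

// Hypothetical stand-in for a 2D block shape; NOT the actual ck_tile definition.
template <int WarpPerBlock_M, int WarpPerBlock_N>
struct BlockShapeSketch
{
    // The warp count is fixed at compile time by the shape.
    static constexpr int kWarpPerBlock = WarpPerBlock_M * WarpPerBlock_N;

    // Assumed host-side query: the warp size arrives as a runtime value,
    // so the result is not a compile-time constant.
    static int GetBlockSize(int warp_size)
    {
        assert(warp_size == 32 || warp_size == 64);
        return kWarpPerBlock * warp_size;
    }
};

// Warp count implied by a thread layout, i.e. the quantity the commit
// message says must not be used to derive the block size.
inline int WarpsFromThreads(int thread_per_block_m, int thread_per_block_n, int warp_size)
{
    return thread_per_block_m * thread_per_block_n / warp_size;
}
```

With a 1 x 4 warp layout and a 4 x 64 thread layout sized for wave64, `WarpsFromThreads(4, 64, 32)` yields 8 on wave32 while the shape only has 4 warps, so the block size is taken from the warp count: `BlockShapeSketch<1, 4>::GetBlockSize(32)` gives 128 threads and `GetBlockSize(64)` gives 256.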
linqunAMD
2025-09-16 23:47:55 +08:00
committed by GitHub
parent 671adb59c5
commit b7a806f244
10 changed files with 63 additions and 26 deletions


@@ -49,7 +49,7 @@ float smoothquant_(const S& s, A a)
using Kernel = ck_tile::Smoothquant<Pipeline>;
const dim3 grids = Kernel::GridSize(a);
-constexpr dim3 blocks = Kernel::BlockSize();
+const dim3 blocks = Kernel::BlockSize();
constexpr ck_tile::index_t kBlockPerCu = 1;
auto kargs = Kernel::MakeKargs(a);