[CK_TILE] fix example reduce, permute and elementwise on gfx11 & gfx12 (#2810)

1. Refine Reduce2dShape to support both wave32 and wave64
2. Fix example reduce, permute and elementwise on gfx11 and gfx12
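
To illustrate why a block size tied to a 64-lane wavefront breaks on wave32 targets, here is a minimal sketch. It assumes gfx11/gfx12 run with a 32-lane wavefront, as the wave32 wording above suggests. `block_size`, `kWave64` and `kWave32` are hypothetical names for illustration only and are not part of CK Tile.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (not a CK Tile API): threads per workgroup for a given
// number of warps per block and a given wavefront width.
constexpr std::int32_t block_size(std::int32_t block_warps, std::int32_t warp_size)
{
    return block_warps * warp_size;
}

// Assumed wavefront widths: 64 lanes (wave64) and 32 lanes (wave32).
constexpr std::int32_t kWave64 = 64;
constexpr std::int32_t kWave32 = 32;

// The same warp layout yields a different block size per wavefront width,
// so a hard-coded 512 only matches the wave64 case.
static_assert(block_size(8, kWave64) == 512, "8 warps on wave64");
static_assert(block_size(8, kWave32) == 256, "8 warps on wave32");
```

Under these assumptions, a launch configuration that hard-codes 512 threads would not match a kernel shaped for 8 warps on a wave32 target, which is presumably why the example now asks the kernel for its block size.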

---------

Co-authored-by: Illia Silin <98187287+illsilin@users.noreply.github.com>
linqunAMD
2025-09-11 12:41:20 +08:00
committed by GitHub
parent 80a61afb9b
commit 0b9a638f26
11 changed files with 38 additions and 22 deletions

@@ -137,8 +137,7 @@ bool run(const ck_tile::ArgParser& arg_parser)
// This is often a multiple of the wavefront size, 64 on CDNA.
// Here, it's explicitly set to 512. This should be consistent with Shape::kBlockSize.
// Shape::kBlockSize would be BlockWarps * warpSize (e.g., 8 * 64 = 512).
-constexpr ck_tile::index_t kBlockSize =
-    ck_tile::get_warp_size() * BlockWarps::at(ck_tile::number<0>{});
+const ck_tile::index_t kBlockSize = Kernel::BlockSize();
// kBlockPerCu: Hint for how many workgroups can be scheduled per Compute Unit (CU).
// This can influence occupancy and performance.