mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-07-03 13:48:30 +00:00
[CK_TILE] Use launched block size for GEMM occupancy query (#8531)

The grouped, grouped-quant, and stream-k GEMM kernels passed `kBlockSize` to the occupancy query, but on wave32 (gfx1250) we actually launch `kBlockSize/2`. The query therefore reported too low an occupancy, and the persistent/stream-k grid ended up undersized.

Pass `BlockSize().x` instead, as the universal and flatmm kernels already do. No-op on wave64.

Verified: builds and runs correctly on gfx1250 (grouped gemm); builds on gfx950 (stream-k).
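
For illustration only, here is a toy model of why querying with the wrong block size shrinks the grid. It is a sketch, not CK_TILE code: `toy_max_blocks_per_cu`, `launched_block_size`, and `persistent_grid` are hypothetical helpers, and the numbers (256 threads, 1024 resident threads per CU, 64 CUs) are made up. A real occupancy query also accounts for registers and LDS, which this model ignores.

```cpp
// Toy occupancy model: how many blocks of `block_size` threads fit on one
// compute unit holding at most `max_threads_per_cu` resident threads.
// Hypothetical stand-in for a real occupancy query.
constexpr int toy_max_blocks_per_cu(int block_size, int max_threads_per_cu)
{
    return max_threads_per_cu / block_size;
}

// Hypothetical helper following the description above: wave32 targets launch
// half of kBlockSize, wave64 targets launch kBlockSize unchanged.
constexpr int launched_block_size(int k_block_size, bool is_wave32)
{
    return is_wave32 ? k_block_size / 2 : k_block_size;
}

// Persistent grid size: one resident block per available slot on every CU.
constexpr int persistent_grid(int block_size_for_query, int num_cus, int max_threads_per_cu)
{
    return num_cus * toy_max_blocks_per_cu(block_size_for_query, max_threads_per_cu);
}

// Illustrative numbers only (assumptions, not hardware specs).
constexpr int kBlockSize       = 256;
constexpr int kMaxThreadsPerCu = 1024;
constexpr int kNumCus          = 64;

// wave32: querying with kBlockSize undersizes the grid by 2x in this model.
static_assert(persistent_grid(kBlockSize, kNumCus, kMaxThreadsPerCu) == 256);
static_assert(persistent_grid(launched_block_size(kBlockSize, true),
                              kNumCus, kMaxThreadsPerCu) == 512);

// wave64: launched size equals kBlockSize, so both queries agree (no-op).
static_assert(persistent_grid(launched_block_size(kBlockSize, false),
                              kNumCus, kMaxThreadsPerCu) ==
              persistent_grid(kBlockSize, kNumCus, kMaxThreadsPerCu));
```

In this model the wave32 grid doubles once the query sees the launched size, while the wave64 result is unchanged, which matches the "no-op on wave64" note.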