mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
## Motivation Tile engine CI build on the develop branch started failing after a recent change(https://github.com/ROCm/rocm-libraries/pull/5218) in `cshuffle_epilogue.hpp`. The `EightWave` constant was unconditionally computed as `(MWave * NWave == 8)` for all architectures, but this logic is only valid for gfx9*. On other architectures (e.g., gfx1201), `EightWave` must always be `false`, otherwise it leads to incorrect `BlockedXDLN_PerWarp` computation and build failures. ## Technical Details In `cshuffle_epilogue.hpp`, the `EightWave` static constexpr was set as: ```cpp static constexpr bool EightWave = (MWave * NWave == 8); ``` This was applied regardless of the target GPU architecture. The fix uses a preprocessor guard to make this architecture-aware: - **gfx9* (`__gfx9__`):** `EightWave` is evaluated as `(MWave * NWave == 8)` — true or false depending on the wave configuration - **All other architectures:** `EightWave` defaults to `false` ## Test Plan - Tile engine CI build on develop branch ## Test Result - *Pending CI* ## Submission Checklist - [x] Look over the contributing guidelines at https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests. --- > **Note:** This PR supersedes ROCm/rocm-libraries#5436, which is blocked pending a review approval from a reviewer currently on PTO. The same changes have been applied to this branch (`users/tlakshma/ck/develop-clone`) to allow merging. Co-authored-by: Claude Sonnet 4.6 <noreply@anthropic.com> Co-authored-by: Max Podkorytov <4273004+tenpercent@users.noreply.github.com> Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>