mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 11:16:59 +00:00
[GFX1250][CK_TILE] Add scale16 (ScaleBlockSize=16) support to MX GEMM TDM pipeline (#8202) Enables `ScaleBlockSize=16` end-to-end for the FP8/BF8 MX GEMM TDM pipeline, building on the scale16 warp-gemm layer already in develop. - **warp gemm:** add the 32x32x128 f8f6f4 scale16 traits and alias (2x2 grid of 16x16x128 scale16 intrinsic calls with per-subtile `SCALE_OPSEL`), and route 32x32 f8f6f4 through the dispatcher's `IsScale16` path. - **default policy:** select the warp gemm via the dispatcher with `IsScale16=(ScaleBlockSize==16)` so `WarpTile=16` and `WarpTile=32` each pick the matching scale16 path; guard WarpTile M/N to 16 or 32; scale-tile distribution for the scale16 layout. - **pipeline V1/V2:** thread `Problem::ScaleBlockSize` through the scale-window setup (replacing the hardcoded 32); expose `ScaleBlockSize` for the kernel. - **block gemm:** extract int64 (scale16) / int32 (scale32) scales by width. - **kernel:** scale16 descriptor order; reject unsupported `BlockScaleSize`. Test coverage for this path is in the stacked follow-up PR.