mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-06-29 03:07:02 +00:00
[CK_Tile] Add wmma_bf16f32_16x16x32_bf16 via fused-downconvert override (#8028)

## Summary

Adds `__builtin_amdgcn_wmma_bf16f32_16x16x32_bf16` (fp32 accumulate → bf16 output) to the CK Tile WMMA warp-gemm path.

**API only**: the unit test is split into a stacked PR (#8035) so this API change can be reviewed in isolation.

## Changes (4 files)

- **16-bit trait:** `wmma_intrinsic_downconvert` (calls the bf16f32 builtin: fp32 C in, bf16 C out) plus `COutDataType = bf16_t` / `COutVecType`.
- **`WarpGemmAttributeWmmaImpl` / `WarpGemmAttributeWmma`:** `mac_downconvert(c_fp32, a, b)` (kTransC-aware) returning the bf16 C-output vector.
- **`WarpGemmImpl`:** `mac_downconvert` tail handler producing a bf16 C-output tile from the fp32 accumulator tile, reusing `CWarpDstrEncoding` (output layout identical to the f32 C tile).

Verified on gfx1250 (via the stacked test PR #8035): the test passes; the existing WMMA warp-gemm test is unaffected (additive change only).
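
The fp32-accumulate, bf16-output contract described above can be illustrated with a scalar host-side reference. This is only a sketch, not the CK Tile implementation: `bf16_bits`, `bf16_to_float`, `float_to_bf16_rne`, and `reference_mac_downconvert` are hypothetical names introduced here, the row-major operand layout is an assumption (the real path uses per-lane vectors), and round-to-nearest-even is assumed for the downconvert because the PR text does not state the instruction's rounding mode. The argument order mirrors `mac_downconvert(c_fp32, a, b)`.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-ins; none of these names are CK Tile API.
using bf16_bits = std::uint16_t; // raw bf16 storage: the upper 16 bits of an fp32

constexpr int kM = 16, kN = 16, kK = 32; // tile shape taken from the builtin name

// Widen bf16 to fp32 by placing the 16 bits in the upper half of the word.
inline float bf16_to_float(bf16_bits h)
{
    const std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof(f));
    return f;
}

// Narrow fp32 to bf16. Assumption: round-to-nearest-even; the hardware
// instruction's rounding mode is not stated in the PR description.
inline bf16_bits float_to_bf16_rne(float f)
{
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof(u));
    if((u & 0x7fffffffu) > 0x7f800000u)
        return static_cast<bf16_bits>((u >> 16) | 0x0040u); // keep NaN, set quiet bit
    u += 0x7fffu + ((u >> 16) & 1u); // bias so that ties round to the even result
    return static_cast<bf16_bits>(u >> 16);
}

// D_bf16 = bf16(C_fp32 + A * B) for one 16x16x32 tile.
// Assumed layout: A is M x K, B is K x N, C and D are M x N, all row-major.
std::vector<bf16_bits> reference_mac_downconvert(const std::vector<float>& c_fp32,
                                                 const std::vector<bf16_bits>& a,
                                                 const std::vector<bf16_bits>& b)
{
    assert(c_fp32.size() == static_cast<std::size_t>(kM * kN));
    assert(a.size() == static_cast<std::size_t>(kM * kK));
    assert(b.size() == static_cast<std::size_t>(kK * kN));

    std::vector<bf16_bits> d(static_cast<std::size_t>(kM * kN));
    for(int m = 0; m < kM; ++m)
    {
        for(int n = 0; n < kN; ++n)
        {
            float acc = c_fp32[m * kN + n]; // accumulation stays in fp32
            for(int k = 0; k < kK; ++k)
                acc += bf16_to_float(a[m * kK + k]) * bf16_to_float(b[k * kN + n]);
            d[m * kN + n] = float_to_bf16_rne(acc); // single downconvert at the end
        }
    }
    return d;
}
```

A reference like this could serve as the expected-value side of a unit test: fill `a` and `b` with bf16 values, seed `c_fp32`, and compare the device output element by element after mapping it out of the warp's C distribution. The sketch uses the same index for the output tile as for the fp32 accumulator, which loosely mirrors the note that the output reuses `CWarpDstrEncoding`.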