Aviral Goel 267ca67001 [rocm-libraries] ROCm/rocm-libraries#8028 (commit c1cb112)
[CK_Tile] Add wmma_bf16f32_16x16x32_bf16 via fused-downconvert override (#8028)

## Summary

Adds `__builtin_amdgcn_wmma_bf16f32_16x16x32_bf16` (fp32 accumulate →
bf16 output) to the CK Tile WMMA warp-gemm path. **API only** — the unit
test is split into a stacked PR (#8035) so this API change can be
reviewed in isolation.

## Changes (4 files)

- **16-bit trait:** `wmma_intrinsic_downconvert` (calls the bf16f32
builtin — fp32 C in, bf16 C out) plus `COutDataType = bf16_t` /
`COutVecType`.
- **`WarpGemmAttributeWmmaImpl` / `WarpGemmAttributeWmma`:**
`mac_downconvert(c_fp32, a, b)` (kTransC-aware) returning the bf16
C-output vector.
- **`WarpGemmImpl`:** `mac_downconvert` tail handler producing a bf16
C-output tile from the fp32 accumulator tile, reusing
`CWarpDstrEncoding` (output layout identical to the f32 C tile).
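
For reviewers who want a mental model of the fused downconvert, the following is a minimal host-side reference sketch, not the CK Tile implementation. It computes `D = bf16(C_fp32 + A * B)` for one 16x16x32 tile. Everything in it is an assumption made for illustration: the helper names (`bf16_bits`, `bf16_to_float`, `float_to_bf16`, `mac_downconvert_ref`) are hypothetical, the matrices are stored row-major rather than in the per-lane register layout the builtin uses, and the fp32-to-bf16 rounding is taken to be round-to-nearest-even, which this PR does not state.

```cpp
#include <cstdint>
#include <cstring>
#include <vector>

// Hypothetical host-side stand-in for a bf16 value (raw 16-bit pattern).
using bf16_bits = std::uint16_t;

// bf16 is the upper 16 bits of an IEEE-754 binary32 value.
inline float bf16_to_float(bf16_bits h) {
    std::uint32_t u = static_cast<std::uint32_t>(h) << 16;
    float f;
    std::memcpy(&f, &u, sizeof f);
    return f;
}

// Assumption: round-to-nearest-even. NaN handling is omitted for brevity.
inline bf16_bits float_to_bf16(float f) {
    std::uint32_t u;
    std::memcpy(&u, &f, sizeof u);
    u += 0x7FFFu + ((u >> 16) & 1u);
    return static_cast<bf16_bits>(u >> 16);
}

constexpr int M = 16, N = 16, K = 32;

// Reference for one tile: accumulate in fp32, then downconvert once at the end.
// Assumed layouts: A is row-major MxK, B is row-major KxN, C and D are row-major MxN.
std::vector<bf16_bits> mac_downconvert_ref(const std::vector<float>& c_fp32,
                                           const std::vector<bf16_bits>& a,
                                           const std::vector<bf16_bits>& b) {
    std::vector<bf16_bits> d(M * N);
    for (int m = 0; m < M; ++m) {
        for (int n = 0; n < N; ++n) {
            float acc = c_fp32[m * N + n];  // fp32 C in
            for (int k = 0; k < K; ++k) {
                acc += bf16_to_float(a[m * K + k]) * bf16_to_float(b[k * N + n]);
            }
            d[m * N + n] = float_to_bf16(acc);  // bf16 C out
        }
    }
    return d;
}
```

The point of the sketch is the single rounding step: the accumulator stays in fp32 across all 32 products, and only the final result is narrowed to bf16. The summation order on hardware may differ, so bit-exact agreement with the builtin is not implied.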

Verified on gfx1250 (via the stacked test PR #8035): the test passes;
the existing WMMA warp-gemm test is unaffected (additive change only).
2026-06-05 05:01:31 +00:00