mirror of
https://github.com/amd/blis.git
synced 2026-05-12 18:15:37 +00:00
- The output of BF16 instructions is accumulated at the higher precision of FP32 and needs to be converted to the lower precision of BF16 after the GEMM operations. This is required in AI workloads where both input and output are in BF16 format.
- BF16 downscaling is implemented as post-ops inside the GEMM microkernels.

Change-Id: Id1606746e3db4f3ed88cba385a7709c8604002a8
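
The FP32-to-BF16 downscale described in this commit can be illustrated with a minimal scalar sketch in C. This is not the microkernel code from the commit: the names `bf16_t`, `f32_to_bf16_rne`, `bf16_to_f32` and `downscale_row_f32_to_bf16` are hypothetical, and round-to-nearest-even is an assumed rounding mode. It only shows the idea of keeping the upper 16 bits of each FP32 accumulator as a post-op once accumulation is done.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical storage type: BF16 is the upper 16 bits of an IEEE-754 FP32. */
typedef uint16_t bf16_t;

/* Convert one FP32 value to BF16, assuming round-to-nearest-even. */
static bf16_t f32_to_bf16_rne(float x)
{
    uint32_t bits;
    memcpy(&bits, &x, sizeof bits);          /* type-pun without UB */

    if ((bits & 0x7fffffffu) > 0x7f800000u) {
        /* NaN: keep the sign and force a quiet-NaN mantissa bit. */
        return (bf16_t)((bits >> 16) | 0x0040u);
    }

    /* Add 0x7fff plus the lowest kept bit so that ties round to even. */
    uint32_t lsb = (bits >> 16) & 1u;
    bits += 0x7fffu + lsb;
    return (bf16_t)(bits >> 16);
}

/* Widen BF16 back to FP32 by placing it in the upper 16 bits. */
static float bf16_to_f32(bf16_t h)
{
    uint32_t bits = (uint32_t)h << 16;
    float x;
    memcpy(&x, &bits, sizeof x);
    return x;
}

/* Hypothetical post-op: downscale one row of FP32 accumulators to BF16
 * after the GEMM accumulation has finished. */
static void downscale_row_f32_to_bf16(const float *acc, bf16_t *out, size_t n)
{
    for (size_t i = 0; i < n; ++i) {
        out[i] = f32_to_bf16_rne(acc[i]);
    }
}
```

In this sketch a caller would run `downscale_row_f32_to_bf16` over each row of the FP32 accumulator tile before storing the result to a BF16 output matrix. A vectorized kernel would do the same conversion on whole registers rather than element by element.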