- Currently the BF16 kernels use AVX512 VNNI instructions. To support AVX2, the BF16 input has to be converted to F32 and the F32 kernels executed instead.
- Added an unpack function for the B matrix, which unpacks the reordered BF16 B matrix and converts it to float.
- Added a kernel to convert matrix data from BF16 to F32 for the given input.
- Added a new path to the BF16 5LOOP to work with BF16 data, where the packed/unpacked A matrix is converted from BF16 to F32. The packed B matrix is converted from BF16 to F32, and the reordered B matrix is un-reordered and converted to F32 before being fed to the F32 micro-kernels.
- Removed AVX512 condition checks in the BF16 code path.
- Added the reorder reference code path to support BF16 on AVX2.
- Currently the F32 AVX2 kernels support only an F32 BIAS. Added BF16 support for the BIAS post-op in the F32 AVX2 kernels.
- Bug fix in the test input generation script.

AMD Internal : [SWLCSG - 3281]

Change-Id: I1f9d59bfae4d874bf9fdab9bcfec5da91eadb0fb