blis/addon at e6cc2a3e227a3f38f4a3c9edf22d003131878fc2 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-24 18:34:40 +00:00

Files

Meghana Vankadari 1072770c63 Implemented LPGEMV for bf16 datatype

1. The 5-loop LPGEMM path is inefficient when A or B is a vector
   (i.e., m == 1 or n == 1).

2. An efficient implementation is developed that accounts for
   reordering of the B matrix when m = 1 and for post-ops fusion.

3. When m = 1, the algorithm divides the GEMM workload along the n
   dimension at a granularity of NR. Each thread works on A: 1xk and
   B: kx(>=NR) and produces C: 1x(>=NR). The k loop is unrolled by 4,
   with a remainder loop for the leftover iterations.

4. When n = 1, the algorithm divides the GEMM workload along the m
   dimension at a granularity of MR. Each thread works on A: (>=MR)xk
   and B: kx1 and produces C: (>=MR)x1. Reordering of B is skipped
   when n = 1 so that the n = 1 kernel can process it efficiently.
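
The m = 1 case described in item 3 can be sketched as follows. This is a
minimal illustration, not the code from this commit: it uses float instead
of bf16, omits B reordering and post-ops, and the names partition_n,
gemv_m1, lpgemv_m1_demo and the value NR = 16 are all hypothetical. Unlike
the commit description, the last thread here may receive a tail narrower
than NR.

```c
#include <stddef.h>

/* Hypothetical block width; the real NR depends on the micro-kernel. */
#define NR 16

/* Split n columns among nt threads in whole NR-wide blocks.
   Thread tid receives columns [*start, *end); the range may be empty. */
static void partition_n(size_t n, size_t nt, size_t tid,
                        size_t *start, size_t *end)
{
    size_t blocks = (n + NR - 1) / NR;  /* last block may be partial */
    size_t base   = blocks / nt;        /* blocks every thread gets */
    size_t extra  = blocks % nt;        /* first `extra` threads get one more */
    size_t b0 = tid * base + (tid < extra ? tid : extra);
    size_t b1 = b0 + base + (tid < extra ? 1 : 0);
    *start = (b0 * NR < n) ? b0 * NR : n;  /* clamp to n for the tail */
    *end   = (b1 * NR < n) ? b1 * NR : n;
}

/* c[j] = sum over p of a[p] * b[p*ldb + j], for j in [j0, j1).
   The k loop is unrolled by 4 and followed by a remainder loop. */
static void gemv_m1(const float *a, const float *b, size_t ldb,
                    float *c, size_t k, size_t j0, size_t j1)
{
    for (size_t j = j0; j < j1; ++j) {
        float acc = 0.0f;
        size_t p = 0;
        for (; p + 4 <= k; p += 4) {
            acc += a[p]     * b[p * ldb + j];
            acc += a[p + 1] * b[(p + 1) * ldb + j];
            acc += a[p + 2] * b[(p + 2) * ldb + j];
            acc += a[p + 3] * b[(p + 3) * ldb + j];
        }
        for (; p < k; ++p)              /* remainder: k not a multiple of 4 */
            acc += a[p] * b[p * ldb + j];
        c[j] = acc;
    }
}

/* Runs the sketch with threads simulated sequentially and returns the
   number of entries that differ from a plain reference loop. */
static int lpgemv_m1_demo(void)
{
    enum { K = 6, N = 40, NT = 3 };
    float a[K], b[K * N], c[N], ref[N];
    int mismatches = 0;

    for (size_t p = 0; p < K; ++p) {
        a[p] = (float)(p + 1);
        for (size_t j = 0; j < N; ++j)
            b[p * N + j] = (float)((p + j) % 5);
    }
    for (size_t j = 0; j < N; ++j) {
        c[j] = -1.0f;                   /* sentinel: detects unwritten columns */
        ref[j] = 0.0f;
        for (size_t p = 0; p < K; ++p)
            ref[j] += a[p] * b[p * N + j];
    }
    for (size_t tid = 0; tid < NT; ++tid) {
        size_t j0, j1;
        partition_n(N, NT, tid, &j0, &j1);
        gemv_m1(a, b, N, c, K, j0, j1);
    }
    for (size_t j = 0; j < N; ++j)
        if (c[j] != ref[j])
            ++mismatches;
    return mismatches;
}
```

The n = 1 case in item 4 would follow the same pattern, with the rows of
A partitioned at a granularity of MR instead of the columns of B.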

AMD-Internal: [SWLCSG-2355]
Change-Id: I7497dad4c293587cbc171a5998b9f2817a4db880

2024-05-06 23:55:15 +05:30

Name            Last commit                                                        Date
aocl_gemm       Implemented LPGEMV for bf16 datatype                               2024-05-06 23:55:15 +05:30
gemmd           Code cleanup: spelling corrections                                 2023-11-09 00:16:30 -05:00
CMakeLists.txt  CMake: Enable builds for both static and shared builds for Linux.  2024-03-14 10:32:51 -04:00