blis/kernels/zen at 394eee90f69bdcaba1ea306d97fa58a255331e7c - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-07-17 00:57:20 +00:00

Files

Bhaskar Nallani 2ce47e6f5e Implemented optimal AVX512-variant of f32 LPGEMV

1. The 5-loop LPGEMM path is inefficient when A or B is a vector
   (i.e., m == 1 or n == 1).

2. An efficient implementation of lpgemv_rowvar_f32 is developed that
   accounts for the reordering of the B matrix when m = 1 and for post-op fusion.

3. When m = 1, the algorithm divides the GEMM workload along the n dimension
   at a granularity of NR. Each thread works on A:1xk and B:kx(>=NR) and
   produces C:1x(>=NR). The k loop is unrolled by 4, followed by a
   remainder loop.

4. When n = 1, the algorithm divides the GEMM workload along the m dimension
   at a granularity of MR. Each thread works on A:(>=MR)xk and B:kx1 and
   produces C:(>=MR)x1. When n = 1, reordering of B is avoided so that this
   case can be processed efficiently by a dedicated n = 1 kernel.

5. Fixed a few warnings raised when loading two f32 bias elements with
   _mm_load_sd through a float pointer; the pointer is now cast to (const double *).
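Items 3 and 4 describe splitting one dimension across threads at a granularity of NR (when m = 1) or MR (when n = 1). A minimal C sketch of such a split follows, assuming whole blocks are distributed as evenly as possible and only the block at the tail of the dimension may be partial. `partition_dim` and its parameters are hypothetical; the commit message does not say how BLIS actually balances the blocks.

```c
#include <assert.h>

/* Hypothetical helper (not the BLIS API): splits a dimension of length
 * `dim` among `nt` threads in whole blocks of `gran` elements, where
 * gran = NR when partitioning n (m == 1) and gran = MR when partitioning
 * m (n == 1). Thread `tid` receives the half-open range [*start, *end).
 * Only the block at the tail of the dimension can be partial. */
static void partition_dim(int dim, int gran, int nt, int tid,
                          int *start, int *end)
{
    assert(gran > 0 && nt > 0 && tid >= 0 && tid < nt);

    int nblocks = (dim + gran - 1) / gran;         /* ceil(dim / gran) */
    int active  = nt < nblocks ? nt : nblocks;     /* no more threads than blocks */

    if (tid >= active) {                           /* surplus thread: no work */
        *start = *end = 0;
        return;
    }

    int base  = nblocks / active;                  /* blocks every active thread gets */
    int extra = nblocks % active;                  /* first `extra` threads get one more */
    int b0    = tid * base + (tid < extra ? tid : extra);
    int nb    = base + (tid < extra ? 1 : 0);

    *start = b0 * gran;
    *end   = (b0 + nb) * gran;
    if (*end > dim)                                /* clip the partial tail block */
        *end = dim;
}
```

With n = 100, NR = 16 and four threads, this sketch yields the ranges [0,32), [32,64), [64,96) and [96,100): every thread except the last receives a whole multiple of NR columns. If n is smaller than NR, a single thread gets the whole (partial) block and the others stay idle.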

AMD-Internal: [SWLCSG-2391, SWLCSG-2353]
Change-Id: If1d0b8d59e0278f5f16b499de1d629e63da5b599

2024-03-04 23:53:23 +05:30
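The per-thread computation in items 3 and 4 of the commit message above can be illustrated with scalar C. This is a sketch under stated assumptions, not the AVX-512 kernel: it reads a plain row-major B instead of the reordered layout the commit mentions for m = 1, it uses a bias add as the only fused post-op, and the names `gemv_m1_slice` and `gemv_n1_slice` are hypothetical.

```c
#include <stddef.h>

/* Illustrative scalar sketches, not the BLIS kernels (which are AVX-512 and
 * read a reordered B for m == 1). All matrices here are plain row-major.
 * The function names and the optional `bias` post-op are hypothetical. */

/* m == 1: c[j] = sum_p a[p] * b[p*ldb + j] (+ bias[j]) for j in [j0, j1),
 * i.e. one thread's column slice of C(1 x n) = A(1 x k) * B(k x n). */
static void gemv_m1_slice(size_t k, size_t j0, size_t j1,
                          const float *a, const float *b, size_t ldb,
                          const float *bias, float *c)
{
    for (size_t j = j0; j < j1; ++j)
        c[j] = 0.0f;

    size_t p = 0;
    for (; p + 4 <= k; p += 4) {                   /* k unrolled by 4 */
        const float *b0 = b + p * ldb;             /* four consecutive rows of B */
        const float *b1 = b0 + ldb;
        const float *b2 = b1 + ldb;
        const float *b3 = b2 + ldb;
        for (size_t j = j0; j < j1; ++j)
            c[j] += a[p] * b0[j] + a[p + 1] * b1[j]
                  + a[p + 2] * b2[j] + a[p + 3] * b3[j];
    }
    for (; p < k; ++p)                             /* remainder loop: k % 4 rows */
        for (size_t j = j0; j < j1; ++j)
            c[j] += a[p] * b[p * ldb + j];

    if (bias != NULL)                              /* fused post-op */
        for (size_t j = j0; j < j1; ++j)
            c[j] += bias[j];
}

/* n == 1: c[i] = dot(row i of A, b) (+ bias[i]) for i in [i0, i1).
 * B is a contiguous k x 1 vector, so it is read in place without reordering. */
static void gemv_n1_slice(size_t k, size_t i0, size_t i1,
                          const float *a, size_t lda, const float *b,
                          const float *bias, float *c)
{
    for (size_t i = i0; i < i1; ++i) {
        const float *ai = a + i * lda;             /* row i of A */
        float acc = 0.0f;
        for (size_t p = 0; p < k; ++p)
            acc += ai[p] * b[p];
        c[i] = (bias != NULL) ? acc + bias[i] : acc;
    }
}
```

Keeping p in the outer loop of the m = 1 sketch mirrors a vector kernel that broadcasts one element of A against a row segment of B; a SIMD version would replace the inner j loop with NR-wide FMAs. The [j0, j1) and [i0, i1) ranges are what a per-thread split of n or m would hand to each thread.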

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Fixed out-of-bounds read in DTRSM small kernels

2024-02-02 10:31:12 +05:30

lpgemm

Implemented optimal AVX512-variant of f32 LPGEMV

2024-03-04 23:53:23 +05:30

util

CMake: Adding new portable CMake system.

2023-11-09 15:49:45 +05:30

bli_kernels_zen.h

Redesigned SGEMM SUP kernel to use mask load/store instructions

2023-11-10 01:23:48 -05:00