blis/kernels/zen4 at 27a9e2a0ff518ca7c613a889e456db6c4ba96c46 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

mkadavil 27a9e2a0ff u8s8s32 fringe kernel optimizations.

-The n fringe micro kernels use only a few zmm registers for computing
the output (e.g., 6x16 uses 6 zmm registers for output, as opposed to the
24 used in 6x64). This leaves many registers unused; putting them to work
allows a larger MR dimension and thus better reuse of the registers
loaded with B. Based on this idea, the existing n fringe kernels are
modified (6x16 -> 12x16, 6x32 -> 9x32). Note that the maximum number of
registers is deliberately not used, since a further increase in MR
requires more broadcasts from the unpacked A matrix and results in
cache-inefficient code.
-Compiler flags are updated for the AOCC build to generate loops with
64-byte alignment. This has been observed to improve performance slightly
when the k dimension is small.

AMD-Internal: [CPUPL-3173]
Change-Id: I199ce75ef71d994ffe0067dac1ed804dce1742ca
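
The register arithmetic behind the first point of the commit message can be sketched as follows. This is an illustration only, not code from the repository: the helper names acc_regs and spare_regs are hypothetical, and the sketch assumes s32 accumulators (16 lanes per 512-bit zmm register) and the 32 zmm registers that AVX-512 provides.

```c
#include <assert.h>

/* Assumption: s32 accumulators, so one 512-bit zmm register holds 16 lanes. */
enum { ZMM_LANES_S32 = 16, ZMM_TOTAL = 32 };

/* Hypothetical helper: zmm registers needed to hold an MR x NR tile of
 * s32 accumulators (NR is assumed to be a multiple of 16). */
int acc_regs(int mr, int nr)
{
    return mr * (nr / ZMM_LANES_S32);
}

/* Hypothetical helper: zmm registers left over for B loads, A broadcasts
 * and temporaries once the accumulators are allocated. */
int spare_regs(int mr, int nr)
{
    return ZMM_TOTAL - acc_regs(mr, nr);
}

/* Checks the figures quoted in the commit message and the new tile sizes. */
void check_commit_examples(void)
{
    assert(acc_regs(6, 16) == 6);   /* old 6x16 fringe kernel */
    assert(acc_regs(6, 64) == 24);  /* main 6x64 kernel */
    assert(acc_regs(12, 16) == 12); /* new 12x16 fringe kernel */
    assert(acc_regs(9, 32) == 18);  /* new 9x32 fringe kernel */
}
```

Under these assumptions the new tiles still leave 20 (12x16) and 14 (9x32) registers free, which matches the commit's statement that the maximum number of registers is not used. Each B register loaded in a k iteration is reused once per row of the tile, so doubling MR from 6 to 12 doubles that reuse.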

2023-04-03 05:35:18 -05:00

DAMAXV AVX512 micro kernel bug fix.

2022-06-13 10:52:53 +05:30

Added AVX512 8xk packing kernel

2023-03-27 23:18:32 -05:00

AVX-512 based col-preferred kernels for ZGEMM in native path

2023-03-28 23:05:06 -04:00

lpgemm

u8s8s32 fringe kernel optimizations.

2023-04-03 05:35:18 -05:00

bli_kernels_zen4.h

AVX-512 based col-preferred kernels for ZGEMM in native path

2023-03-28 23:05:06 -04:00

CMakeLists.txt

Integrated 32x6 DGEMM kernel for zen4 and added its related changes.

2023-01-19 23:11:36 +05:30