-This API supports applying element wise operations (eg: post-ops) on a
bfloat16 input matrix to get an output matrix of the same(bfloat16) or
upscaled data type (float).
-Benchmarking/testing framework for the same is added.
AMD Internal: SWLCSG-2947
Change-Id: I43f1c269be1a1997d4912d8a3a97be5e5f3442d2
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
for 6x64 bf16 microkernel and store function pointers to all the
generated kernels in a global function pointer array. This happens
only when gcc version is < 11.2
- When gcc version < 11.2, microkernel uses JIT-generated kernels.
otherwise, microkernel uses the intrinsics based implementation.
AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461
-Currently in aocl_gemm, gelu (both tanh and erf based) computation is
only supported as a post-op as part of low precision gemm api call (done
at micro-kernel level). However gelu computation alone without gemm is
required in certain cases for users of aocl_gemm.
-In order to support this, two new api's - aocl_gelu_tanh_f32 and
aocl_gelu_erf_f32 are introduced as part of aocl_gemm. These api's
computes element-wise gelu_tanh and gelu_erf respectively of a matrix/
vector of floats. Both the api's invokes ISA specific vectorized micro-
kernels (vectorized only when incx=1), and a cntx based mechanism
(similar to lpgemm_cntx) is used to dispatch to the appropriate kernel.
AMD-Internal: [CPUPL-3218]
Change-Id: Ifebbaf5566d7462288a9a67f479104268b0cc704
-Currently lpgemm can only be built using either zen3 or zen4 config.
The lpgemm kernel code is re-structured to support amdzen, and thus
multi machine deployment.
-The micro-kernel calls (gemm and pack) are currently hardcoded in the
lpgemm framework. This is removed and a new lpgemm_cntx based dispatch
mechanism is designed to support runtime configurability for
micro-kernels.
AMD-Internal: [CPUPL-2965]
Change-Id: I4bbcb4e5db767def1663caf5481f0b4c988149ef