Details:
- For a variable x, Using address of x in an instruction throws
exception if the difference between &x and access position is
larger than 2 GiB. To solve this issue all variables are stored
within the JIT code section and are accessed using relative addressing.
- Fixed a bug in B matrix pack function for s8s8s32os32 API.
- Fixed a bug in JIT code to apply bias on col-major matrices.
AMD-Internal: [SWLCSG-2820]
Change-Id: I82f117a0422c794cb9b1a4d65a89d60de4adfd96
-As it stands the bf16bf16f32ob16 API expects bias array to be of type
float. However actual use case requires the usage of bias array of bf16
type. The bf16 micro-kernels are updated to work with bf16 bias array by
upscaling it to float type and then using it in the post-ops workflow.
-Corrected register usage in bf16 JIT generator for bf16bf16f32ob16 API
when k > KC.
AMD-Internal: [SWLCSG-2604]
Change-Id: I404e566ff59d1f3730b569eb8bef865cb7a3b4a1
- Implemented optimized lpgemv for both m == 1 and n == 1 cases.
- Fixed few bugs in LPGEMV for bf16 and f32 datatypes.
- Fixed few bugs in JIT-based implementation of LPGEMM for BF16
datatype.
AMD-Internal: [SWLCSG-2354]
Change-Id: I245fd97c8f160b148656f782d241f86097a0cf38
SWISH post-op computes swish(x) = x / (1 + exp(-1 * alpha * x)).
SiLU = SWISH with alpha = 1. Adding the support for swish in JIT
based BF16 kernels.
AMD-Internal: [SWLCSG-2387]
Change-Id: I9eea0c801f5f067a5cfbd2941bc991708b86e45e
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
for 6x64 bf16 microkernel and store function pointers to all the
generated kernels in a global function pointer array. This happens
only when gcc version is < 11.2
- When gcc version < 11.2, microkernel uses JIT-generated kernels.
otherwise, microkernel uses the intrinsics based implementation.
AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461