amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-04 14:31:12 +00:00

Author	SHA1	Message	Date
Meghana Vankadari	c1e063e65c	Fix for offset issue while reading constants from JIT code Details: - For a variable x, using the address of x in an instruction throws an exception if the difference between &x and the access position is larger than 2 GiB. To solve this issue, all variables are stored within the JIT code section and are accessed using relative addressing. - Fixed a bug in B matrix pack function for s8s8s32os32 API. - Fixed a bug in JIT code to apply bias on col-major matrices. AMD-Internal: [SWLCSG-2820] Change-Id: I82f117a0422c794cb9b1a4d65a89d60de4adfd96	2024-06-24 07:14:15 -04:00
mkadavil	cd032225ca	BF16 bias support for bf16bf16f32ob16. - As it stands, the bf16bf16f32ob16 API expects the bias array to be of type float. However, the actual use case requires a bias array of bf16 type. The bf16 micro-kernels are updated to work with a bf16 bias array by upscaling it to float type and then using it in the post-ops workflow. - Corrected register usage in bf16 JIT generator for bf16bf16f32ob16 API when k > KC. AMD-Internal: [SWLCSG-2604] Change-Id: I404e566ff59d1f3730b569eb8bef865cb7a3b4a1	2024-05-23 04:48:20 +05:30
Meghana Vankadari	3a8b9270e7	Implemented lpgemv for AVX512-INT8 variants - Implemented optimized lpgemv for both m == 1 and n == 1 cases. - Fixed a few bugs in LPGEMV for bf16 and f32 datatypes. - Fixed a few bugs in JIT-based implementation of LPGEMM for BF16 datatype. AMD-Internal: [SWLCSG-2354] Change-Id: I245fd97c8f160b148656f782d241f86097a0cf38	2024-05-14 01:55:49 +05:30
mkadavil	ec67289601	SWISH post-op support for BF16 JIT based kernels. SWISH post-op computes swish(x) = x / (1 + exp(-1 * alpha * x)). SiLU = SWISH with alpha = 1. Adding the support for swish in JIT based BF16 kernels. AMD-Internal: [SWLCSG-2387] Change-Id: I9eea0c801f5f067a5cfbd2941bc991708b86e45e	2024-05-13 01:50:32 -04:00
Meghana Vankadari	da8fd8c301	Implemented JIT-based microkernel for bf16 datatype Details: - Added a new folder named JIT/ under addon/aocl_gemm/. This folder will contain all the JIT-related code. - Modified lpgemm_cntx_init code to generate main and fringe kernels for 6x64 bf16 microkernel and store function pointers to all the generated kernels in a global function pointer array. This happens only when gcc version is < 11.2. - When gcc version is < 11.2, the microkernel uses JIT-generated kernels; otherwise, it uses the intrinsics-based implementation. AMD-Internal: [SWLCSG-2622] Change-Id: I16256c797b2546a8cd2049680001947346260461	2024-03-13 05:55:18 +05:30

5 Commits