Commit Graph

5 Commits

Author SHA1 Message Date
mkadavil
f040ba617f Element wise operations API for bfloat16 input matrix in LPGEMM.
-This API supports applying element wise operations (eg: post-ops) on a
bfloat16 input matrix to get an output matrix of the same(bfloat16) or
upscaled data type (float).
-Benchmarking/testing framework for the same is added.

AMD Internal: SWLCSG-2947

Change-Id: I43f1c269be1a1997d4912d8a3a97be5e5f3442d2
2024-08-05 07:17:08 -04:00
Meghana Vankadari
da8fd8c301 Implemented JIT-based microkernel for bf16 datatype
Details:
- Added new folder named JIT/ under addon/aocl_gemm/. This folder
  will contain all the JIT related code.
- Modified lpgemm_cntx_init code to generate main and fringe kernels
  for 6x64 bf16 microkernel and store function pointers to all the
  generated kernels in a global function pointer array. This happens
  only when gcc version is < 11.2
- When gcc version < 11.2, microkernel uses JIT-generated kernels.
  otherwise, microkernel uses the intrinsics based implementation.

AMD-Internal: [SWLCSG-2622]
Change-Id: I16256c797b2546a8cd2049680001947346260461
2024-03-13 05:55:18 +05:30
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
mkadavil
e23765010d aocl_gelu_<tanh|erf>_f32 api's for gelu computation as part of lpgemm.
-Currently in aocl_gemm, gelu (both tanh and erf based) computation is
only supported as a post-op as part of low precision gemm api call (done
at micro-kernel level). However gelu computation alone without gemm is
required in certain cases for users of aocl_gemm.
-In order to support this, two new api's - aocl_gelu_tanh_f32 and
aocl_gelu_erf_f32 are introduced as part of aocl_gemm. These api's
computes element-wise gelu_tanh and gelu_erf respectively of a matrix/
vector of floats. Both the api's invokes ISA specific vectorized micro-
kernels (vectorized only when incx=1), and a cntx based mechanism
(similar to lpgemm_cntx) is used to dispatch to the appropriate kernel.

AMD-Internal: [CPUPL-3218]
Change-Id: Ifebbaf5566d7462288a9a67f479104268b0cc704
2023-04-17 05:15:56 -04:00
mkadavil
8dff49837d Lpgemm source restructuring to support amdzen config.
-Currently lpgemm can only be built using either zen3 or zen4 config.
The lpgemm kernel code is re-structured to support amdzen, and thus
multi machine deployment.
-The micro-kernel calls (gemm and pack) are currently hardcoded in the
lpgemm framework. This is removed and a new lpgemm_cntx based dispatch
mechanism is designed to support runtime configurability for
micro-kernels.

AMD-Internal: [CPUPL-2965]
Change-Id: I4bbcb4e5db767def1663caf5481f0b4c988149ef
2023-02-21 08:35:38 -05:00