Commit Graph

4 Commits

Author SHA1 Message Date
eashdash
ef134dc49f Added Trans A feature for all INT8 LPGEMM APIs
1. Added Trans A feature to handle column major inputs
   for A matrix.
2. Trans A is enabled by on-the-go pack of A matrix.
3. The on-the-go pack of A converts a column storage
   MCxKC block of A into row storage MCxKC block as
   LPGEMM kernels are row major kernels.
4. New pack routines are added for conversion of A matrix
   from column major storage to row major storage.
5. LPGEMM Cntx is updated with pack kernel function
   pointers.
6. Packing of A matrix:
   -  Converts column major input A to row major
      in blocks of MCxKC with newly added pack A
      functions when cs_a > 1.
7. Pack routines are added for AVX512 and AVX2
   INT8 LPGEMM APIs.
8. Trans A feature is now supported in:
   1. u8s8s32os32/os8
   2. u8s8s16os16/os8/ou8
   3. s8s8s32os32/os8
   4. s8s8s16os16/os8

AMD-Internal: SWLCSG-2582
Change-Id: I7ce331545525a9a09f3853280615b55fcf2edabf
2024-01-30 03:40:56 -05:00
Edward Smyth
ed5010d65b Code cleanup: AMD copyright notice
Standardize format of AMD copyright notice.

AMD-Internal: [CPUPL-3519]
Change-Id: I98530e58138765e5cd5bc0c97500506801eb0bf0
2023-11-23 08:54:31 -05:00
mkadavil
8dff49837d Lpgemm source restructuring to support amdzen config.
-Currently lpgemm can only be built using either zen3 or zen4 config.
The lpgemm kernel code is re-structured to support amdzen, and thus
multi machine deployment.
-The micro-kernel calls (gemm and pack) are currently hardcoded in the
lpgemm framework. This is removed and a new lpgemm_cntx based dispatch
mechanism is designed to support runtime configurability for
micro-kernels.

AMD-Internal: [CPUPL-2965]
Change-Id: I4bbcb4e5db767def1663caf5481f0b4c988149ef
2023-02-21 08:35:38 -05:00
mkadavil
6c112632a7 Low precision gemm integrated as aocl_gemm addon.
- Multi-Threaded int8 GEMM (Input - uint8_t, int8_t, Output - int32_t).
AVX512_vnni based micro-kernel for int8 gemm. Paralellization supported
along m and n dimensions.
- Multi-Threaded B matrix reorder support for sgemm. Reordering B matrix
is packing entire B matrix upfront before sgemm. It allows sgemm to
take advantage of packed B matrix without incurring packing costs during
runtime.
- Makefile updates to addon make rules to compile avx512 code for
selected files in addon folder.
- CPU features query enhancements to check for AVX512_VNNI flag.
- Bench for int8 gemm and sgemm with B matrix reorder. Supports
performance mode for benchmarking and accuracy mode for testing code
correctness.

AMD-Internal: [CPUPL-2102]

Change-Id: I8fb25f5c2fbd97d756f95b623332cb29e3b8d182
2022-06-09 10:28:38 -04:00