- Added a new GEMV kernel with MR = 8 which will be used
for cases where n=1.
- Modified GEMM and GEMV framework to choose right GEMV kernel
based on compile-time and run-time architecture parameters. This
had to be done since GEMV kernels are not stored-in/retrieved-from
the cntx.
- Added a pack kernel that packs A matrix from col-major to row-major
using AVX2 instructions.
AMD-Internal: [SWLCSG-3519]
Change-Id: Ibf7a8121d0bde37660eac58a160c5b9c9ebd2b5c
-New packing kernels for A matrix, both based on AVX512 and AVX2 ISA,
for both row and column major storage are added as part of this change.
Dependency on haswell A packing kernels are removed by this.
-Tiny GEMM thresholds are further tuned for BF16 and F32 APIs.
AMD-Internal: [SWLCSG-3380, SWLCSG-3415]
Change-Id: I7330defacbacc9d07037ce1baf4a441f941e59be
-Currently lpgemm sets the context (block sizes and micro-kernels) based
on the ISA of the machine it is being executed on. However this approach
does not give the flexibility to select a different context at runtime.
In order to enable runtime selection of context, the context
initialization is modified to read the AOCL_ENABLE_INSTRUCTIONS env
variable and set the context based on the same. As part of this commit,
only f32 context selection is enabled.
-Bug fixes in scale ops in f32 micro-kernels and GEMV path selection.
-Added vectorized f32 packing kernels for NR=16(AVX2) and NR=64(AVX512).
This is only for B matrix and helps remove dependency of f32 lpgemm api
on the BLIS packing framework.
AMD Internal: [CPUPL-5959]
Change-Id: I4b459aaf33c54423952f89905ba43cf119ce20f6
- Implemented the AVX512 packA kernel for col major inputs in F32 API
- Removed the work arounds for n = 1, mtag_a = PACK case, where the execution was
being directed to GEMM instead of GEMV.
Change-Id: I6fb700d96069213a762e8a83a209c5388a91050f