- Implemented an AVX512 rank-1 kernel for CGEMM that
handles column-major storage of A, B and C (without
transposition) when k = 1.
- This kernel is single-threaded and is called directly
from the BLAS layer for the inputs it supports.
- Defined custom BLAS and BLIS_IMPLI layers for CGEMM
(instead of using the macro definition) in order to
dispatch to this kernel at runtime, based on the
underlying architecture and the input constraints.
- Added unit tests for functional and memory testing
of the kernel.
- Updated the ZEN5 context to include the AVX512 CGEMM
SUP kernels, along with their cache-blocking parameters.
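
For reviewers, here is a scalar C sketch of what CGEMM computes when k = 1 and A, B and C are column-major and untransposed: C := alpha*A*B + beta*C reduces to a rank-1 update, C(i,j) = beta*C(i,j) + alpha*B(0,j)*A(i,0). The function name cgemm_k1_ref and its argument list are hypothetical and are not the BLIS or BLAS API. The sketch does not model the AVX512 kernel's vectorization or blocking. Hoisting alpha*B(0,j) out of the inner loop is an illustrative choice, and it can change rounding slightly compared with alpha*(A*B).

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical scalar reference for C := alpha*A*B + beta*C with k = 1.
 * All operands are column-major and untransposed:
 *   A is m x 1 (only column 0 is read, so lda is irrelevant),
 *   B is 1 x n with leading dimension ldb, so B(0,j) = b[j*ldb],
 *   C is m x n with leading dimension ldc, so C(i,j) = c[i + j*ldc]. */
static void cgemm_k1_ref(size_t m, size_t n,
                         float complex alpha,
                         const float complex *a,
                         const float complex *b, size_t ldb,
                         float complex beta,
                         float complex *c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j) {
        /* Hoist alpha * B(0,j): one complex multiply per column of C. */
        const float complex ab = alpha * b[j * ldb];
        float complex *cj = c + j * ldc;
        for (size_t i = 0; i < m; ++i) {
            /* BLAS convention: C is not read when beta == 0, so any
             * NaN/Inf already stored in C does not propagate. */
            const float complex cij = (beta == 0.0f) ? 0.0f : beta * cj[i];
            /* BLAS convention: A and B are not used when alpha == 0. */
            cj[i] = (alpha == 0.0f) ? cij : cij + ab * a[i];
        }
    }
}
```

Each column of C receives the column vector A scaled by a single complex scalar. That per-column broadcast is what makes k = 1 a natural candidate for a dedicated kernel rather than the general SUP path.
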
AMD-Internal: [CPUPL-6498]
Change-Id: I42a66c424325bd117ceb38970726a05e2896a46b