- Implemented bli_zgemm_4x4_avx2_k1_nn( ... ) kernel to replace
bli_zgemm_4x6_avx2_k1_nn( ... ) kernel in the BLAS layer of
ZGEMM. The kernel is built for handling the GEMM computation
with inputs having k = 1, and the transpose values for A and
B as N.
- The kernel dimension has been changed from 4x6 to 4x4,
due to the following reasons :
- The 1xNR block of B in the n-loop can be reused over multiple
MRx1 blocks of A in the m-loop during computation. Similar
analogy exists for the fringe cases.
- Every 1xNR block of B was scaled with alpha and stored in
registers before traversing in the m-dimension. Similar change
was done for fringe cases in n-dimension.
- These registers should not be modified during compute, hence
the kernel dimension was changed from 4x6 to 4x4.
- The check for early exit(with regards to BLAS mandate) has been
removed, since it is already present in the BLAS layer.
- The check for parallel ZGEMM has been moved post the redirection to
this kernel, since the kernel is single-threaded.
- The bli_kernels_zen.h file was updated with the new kernel signature.
AMD-Internal: [CPUPL-3622]
Change-Id: Iaf03b00d5075dd74cc412290d77a401986ba0bea