Details:
- SYRK for small matrices was implemented by reusing the small GEMM routine.
As a result, output was written to the full C matrix, so both the lower and
upper triangles of the symmetric C matrix held the same results. The BLAS
SYRK API spec requires that only the lower or the upper triangle of C be
written with results. This caused BLAS test failures, even though the BLIS
testsuite passed for the small SYRK operation.
- To fix the BLAS test failures for small-matrix SYRK, separate kernel
routines are implemented for small SYRK in both single and double precision.
The newly added small SYRK routines are in kernels/zen/3/bli_syrk_small.c.
The intermediate results for matrix C are now written to a scratch buffer.
The final results are then copied from the scratch buffer to matrix C, using
SIMD copies, into either the lower or the upper triangle of C.
- Source and header files frame/3/syrk/bli_syrk_front.c and
frame/3/syrk/bli_syrk_front.h are changed to invoke the new small SYRK
routines.
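
Illustration only, not the code in kernels/zen/3/bli_syrk_small.c: a minimal
scalar sketch of the scratch-buffer approach described above. The function
name syrk_small_ref, its signature, the column-major layout, and applying
beta during the copy step are all assumptions made for this example. The
actual kernels use SIMD for the copy; plain loops are used here for clarity.

```c
#include <stdlib.h>

/* Hypothetical reference helper (not a BLIS API).
 * Computes C := alpha*A*A^T + beta*C for column-major A (n x k) and
 * C (n x n), writing only the triangle selected by uplo ('L' or 'U').
 * Returns 0 on success, -1 if the scratch allocation fails. */
static int syrk_small_ref(char uplo, int n, int k, double alpha,
                          const double *a, int lda,
                          double beta, double *c, int ldc)
{
    if (n <= 0) return 0;

    double *scratch = malloc((size_t)n * (size_t)n * sizeof *scratch);
    if (scratch == NULL) return -1;

    /* GEMM-style step: the full n x n product goes to scratch, not to C. */
    for (int j = 0; j < n; ++j) {
        for (int i = 0; i < n; ++i) {
            double acc = 0.0;
            for (int p = 0; p < k; ++p)
                acc += a[i + p * lda] * a[j + p * lda];
            scratch[i + j * n] = alpha * acc;
        }
    }

    /* Copy step: only the requested triangle of C is touched. */
    for (int j = 0; j < n; ++j) {
        int i0 = (uplo == 'L') ? j : 0;      /* lower: rows j..n-1 */
        int i1 = (uplo == 'L') ? n : j + 1;  /* upper: rows 0..j   */
        for (int i = i0; i < i1; ++i) {
            double *dst = &c[i + j * ldc];
            /* With beta == 0, C is overwritten without being read. */
            *dst = (beta == 0.0) ? scratch[i + j * n]
                                 : scratch[i + j * n] + beta * (*dst);
        }
    }

    free(scratch);
    return 0;
}
```

With uplo = 'L', elements strictly above the diagonal of C keep whatever
values they held before the call, which is the behavior the BLAS tests check
for and which the GEMM-based path did not preserve.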
Change-Id: I9cfb1116c93d150aefac673fca033952ecac97cb