Details:
- In GEMM, whenever beta is zero, C = alpha * (A * B) must be computed
instead of C = beta * C + alpha * (A * B), and C must not be read: if C
holds NaN or Inf on entry, beta * C still yields NaN (0 * NaN = NaN).
Added checks on the value of beta at several levels inside the small_gemm
kernels to decide whether or not C is scaled by beta.
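A minimal sketch of the beta == 0 rule, assuming a plain column-major triple loop. ref_gemm and its parameter names are hypothetical; this is not the small_gemm kernel code, which works on BLIS objects and vector registers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical column-major reference kernel, NOT the BLIS small_gemm code:
 * computes C := alpha*(A*B) + beta*C, but never reads C when beta == 0. */
static void ref_gemm(size_t m, size_t n, size_t k, double alpha,
                     const double *a, size_t lda,
                     const double *b, size_t ldb,
                     double beta, double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);

    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double ab = 0.0;                       /* (A*B)(i,j) */
            for (size_t p = 0; p < k; ++p)
                ab += a[i + p * lda] * b[p + j * ldb];

            if (beta == 0.0)
                c[i + j * ldc] = alpha * ab;       /* C = alpha*(A*B); C not read */
            else
                c[i + j * ldc] = beta * c[i + j * ldc] + alpha * ab;
        }
    }
}
```

With beta == 0 and C filled with NaN on entry, the branch still produces alpha * (A * B), whereas evaluating beta * C + alpha * (A * B) unconditionally would propagate the NaN into the result.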
- Modified the small_gemm kernels to use BLIS-specific accessor functions
to retrieve the fields of objects.
- bli_gemm_check is now called before entering bli_gemm_small, so that
invalid inputs cause an early return.
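A sketch of the intended control flow: validate first, then try the small path, and fall back to the general path otherwise. All names here (sketch_err_t, gemm_check, gemm_small, gemm_front, gemm_path_t, the 8x8x8 size limit) are hypothetical stand-ins for BLIS's err_t, bli_gemm_check, bli_gemm_small and bli_gemm_front; they do not reproduce the real signatures or error codes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for BLIS's err_t; names and values are made up. */
typedef enum { ERR_SUCCESS = 0, ERR_NOT_IMPLEMENTED, ERR_BAD_DIM } sketch_err_t;

/* Which code path handled the call (for illustration only). */
typedef enum { PATH_NONE, PATH_SMALL, PATH_LARGE } gemm_path_t;

/* Hypothetical argument check (column-major leading dimensions). */
static sketch_err_t gemm_check(size_t m, size_t n, size_t k,
                               size_t lda, size_t ldb, size_t ldc)
{
    (void)n;  /* n does not constrain any leading dimension here */
    if (lda < m || ldb < k || ldc < m)
        return ERR_BAD_DIM;
    return ERR_SUCCESS;
}

/* Hypothetical small path: declines problems larger than 8x8x8. */
static sketch_err_t gemm_small(size_t m, size_t n, size_t k)
{
    if (m <= 8 && n <= 8 && k <= 8)
        return ERR_SUCCESS;          /* small kernel would run here */
    return ERR_NOT_IMPLEMENTED;      /* caller must use another path */
}

static sketch_err_t gemm_front(size_t m, size_t n, size_t k,
                               size_t lda, size_t ldb, size_t ldc,
                               gemm_path_t *path)
{
    assert(path != NULL);
    *path = PATH_NONE;

    /* Check first, so invalid inputs return before any kernel is entered. */
    sketch_err_t status = gemm_check(m, n, k, lda, ldb, ldc);
    if (status != ERR_SUCCESS)
        return status;

    /* status has the same type as gemm_small's return value. */
    status = gemm_small(m, n, k);
    if (status == ERR_SUCCESS) {
        *path = PATH_SMALL;
        return status;
    }

    *path = PATH_LARGE;              /* general path would run here */
    return ERR_SUCCESS;
}
```

Declaring status with the same type the small path returns means the comparison against the success code needs no conversion.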
- For corner cases inside the small_gemm kernels, a buffer called f_temp
is used to load and store data to and from registers. This buffer is
now populated with zeroes before use.
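One case where zero-filling such a buffer matters is a reduction over a tail shorter than the vector width. The sketch below emulates a register with plain arrays; VLEN, dot_tail and the f_temp_a/f_temp_b buffers are hypothetical and only model the idea, not the actual small_gemm corner-case code.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical vector width: 4 doubles, i.e. one 256-bit register. */
#define VLEN 4

static double dot_tail(size_t len, const double *a, const double *b)
{
    double f_temp_a[VLEN];  /* hypothetical stand-ins for f_temp */
    double f_temp_b[VLEN];
    double sum = 0.0;
    assert(len <= VLEN);

    /* Zero the buffers first (all-bits-zero is 0.0 in IEEE 754):
     * padding lanes then contribute 0 * 0 = 0 to the sum. */
    memset(f_temp_a, 0, sizeof f_temp_a);
    memset(f_temp_b, 0, sizeof f_temp_b);

    /* Partial "load": copy only the len valid elements. */
    memcpy(f_temp_a, a, len * sizeof *a);
    memcpy(f_temp_b, b, len * sizeof *b);

    /* Full-width multiply and horizontal sum, as a vector kernel would do. */
    for (size_t l = 0; l < VLEN; ++l)
        sum += f_temp_a[l] * f_temp_b[l];
    return sum;
}
```

Without the memset calls, lanes len..VLEN-1 would hold indeterminate stack contents and the horizontal sum would include them.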
- In bli_gemm_front, the datatype of the variable 'status' did not match
the return type of bli_gemm_small. Corrected the datatype of 'status'
to err_t.

Change-Id: I8b52ad55008f028d6c8b7e0d20f746a869d9daea
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-689,SWLCSG-104]