- The temporary buffer allocated for C matrix when downscaling is
enabled is not filled properly. This results in wrong gemm accumulation
when beta != 0, and thus wrong output after downscaling. The C panel
iterators used for filling the temporary buffer are updated to fix it.
- Low precision gemm bench updated for testing/benchmarking downscaling.
AMD-Internal: [CPUPL-2514]
Change-Id: Ib1ba25ba9df2d2997edaaf0763ff0113fb35d6eb