- Added 32x3n n-biased kernels to directly handle the n=3 case, which
was previously handled by the primary n-biased 32x8n kernel.
- Modified the n-biased fringe kernels to also handle the smaller
m-fringe cases. The kernels now cover the following ranges of m
for any value of n:
  - 16x8n    : m = [16, 31]
  - 8x8n     : m = [8, 15]
  - m_leftx8n: m = [1, 7]
- Updated the function pointer map for the n-biased kernels with finer
granularity, so that the smaller fringe kernels are invoked directly
based on the m-dimension.
- Added micro-kernel unit tests for all the dgemv_n kernels.
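For illustration only, a minimal C sketch of m-first kernel selection,
reading the fringe ranges as the closed intervals [1, 7], [8, 15] and
[16, 31] so that every m below 32 maps to exactly one fringe kernel.
It assumes that only m >= 32 is dispatched on n, with n >= 5 going to
32x8n; the enum names and select_dgemv_n_kernel() are hypothetical
stand-ins for the real function pointer map.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical identifiers standing in for the real kernel pointers. */
typedef enum {
    KER_32x8n,      /* primary n-biased kernel                 */
    KER_32x4n,
    KER_32x3n,      /* n == 3, previously routed to KER_32x8n  */
    KER_32x2n,
    KER_32x1n,
    KER_16x8n,      /* m in [16, 31], any n                    */
    KER_8x8n,       /* m in [8, 15],  any n                    */
    KER_MLEFTx8n    /* m in [1, 7],   any n                    */
} dgemv_n_kernel_id;

/* Assumed rule: small m goes straight to its fringe kernel; only
   m >= 32 falls through to a choice based on n. */
static dgemv_n_kernel_id select_dgemv_n_kernel(size_t m, size_t n)
{
    assert(m > 0 && n > 0); /* empty problems assumed to return earlier */

    if (m < 8)  return KER_MLEFTx8n;
    if (m < 16) return KER_8x8n;
    if (m < 32) return KER_16x8n;

    switch (n) {
    case 1:  return KER_32x1n;
    case 2:  return KER_32x2n;
    case 3:  return KER_32x3n;
    case 4:  return KER_32x4n;
    default: return KER_32x8n; /* assumed for n >= 5 */
    }
}
```

Checking m before n means a call such as m=20, n=3 lands on 16x8n
without first entering a 32-row kernel that would have no full block
to process.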
AMD-Internal: [CPUPL-6231]
Change-Id: Ibe88848c2c1bbb65b3e79fbc90a2800dc15f5119
- Developed new AVX512 DGEMV kernels for Zen4/5 architectures and
AVX2 kernels for Zen1/2/3 architectures. These kernels are written
from the ground up and are independent of fused kernels.
- The primary DGEMV kernel processes matrix A in blocks of 8 columns.
The remaining fringe columns (1 to 7) are handled by fringe kernels,
which the primary kernel invokes as needed.
- The kernels compute the dot product of each column of A with vector
x in chunks of 32 elements, accumulating partial results in registers.
Fringe elements are handled in chunks of 16, 8, and so on. The
accumulator registers are then reduced, and the results are added to
vector y.
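A scalar C sketch of this blocking, assuming column-major A with
leading dimension lda and an update of the form
y[j] += alpha * dot(A(:, j), x). Plain arrays stand in for the AVX512
accumulator registers, one routine parameterized by ncols plays both
the primary and the fringe kernel, and the 4, 2, 1 element steps after
16 and 8 are an assumed continuation. dot_cols() and
dgemv_dot_sketch() are hypothetical names, not the kernels' API.

```c
#include <assert.h>
#include <stddef.h>

/* Dot products of ncols (1..8) adjacent columns of column-major A with x.
   acc[j] models the accumulator registers for column j: 32 doubles,
   i.e. four 8-lane AVX512 registers. */
static void dot_cols(size_t m, size_t ncols, const double *a, size_t lda,
                     const double *x, double alpha, double *y)
{
    assert(ncols >= 1 && ncols <= 8);
    double acc[8][32] = {{0.0}};
    size_t i = 0;

    /* Main loop: 32 elements of each column per iteration. */
    for (; i + 32 <= m; i += 32)
        for (size_t j = 0; j < ncols; ++j)
            for (size_t k = 0; k < 32; ++k)
                acc[j][k] += a[j * lda + i + k] * x[i + k];

    /* Fringe elements: chunks of 16, 8, then (assumed) 4, 2, 1. */
    for (size_t w = 16; w >= 1; w /= 2)
        for (; i + w <= m; i += w)
            for (size_t j = 0; j < ncols; ++j)
                for (size_t k = 0; k < w; ++k)
                    acc[j][k] += a[j * lda + i + k] * x[i + k];

    /* Reduce each column's accumulators and add the result to y. */
    for (size_t j = 0; j < ncols; ++j) {
        double sum = 0.0;
        for (size_t k = 0; k < 32; ++k)
            sum += acc[j][k];
        y[j] += alpha * sum;
    }
}

/* Blocks of 8 columns, then one fringe block of 1..7 columns. */
static void dgemv_dot_sketch(size_t m, size_t n, double alpha,
                             const double *a, size_t lda,
                             const double *x, double *y)
{
    size_t j = 0;
    for (; j + 8 <= n; j += 8)
        dot_cols(m, 8, a + j * lda, lda, x, alpha, y + j);
    if (j < n)
        dot_cols(m, n - j, a + j * lda, lda, x, alpha, y + j);
}
```

With m=37, for example, one 32-element chunk runs, then a 4-element
and a 1-element fringe step finish each column before the reduction.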
AMD-Internal: [CPUPL-5835]
Change-Id: I5cb9eb1330db095931586a7028fd7676fbbecc61
- AVX512-specific native DGEMV kernels are added for Zen4/5
architectures to handle the NO_TRANSPOSE case; they are independent of
the AXPYF fused kernels.
- The following n-biased kernels perform beta scaling of the y vector
within the kernel itself and handle cases where n is less than 5:
  - bli_dgemv_n_zen_int_32x8n_avx512( ... )
  - bli_dgemv_n_zen_int_32x4n_avx512( ... )
  - bli_dgemv_n_zen_int_32x2n_avx512( ... )
  - bli_dgemv_n_zen_int_32x1n_avx512( ... )
- bli_dgemv_n_zen_int_16mx8_avx512( ... ) is biased towards the
m-dimension; for this kernel, beta scaling is performed beforehand
within the framework.
- Added unit-tests for the new kernels.
- The AVX2 path for Zen1/2/3 architectures still follows the old
approach of using a fused kernel, AXPYF, to perform the GEMV operation.
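A scalar C sketch contrasting the two beta-scaling conventions for
y := beta*y + alpha*A*x (NO_TRANSPOSE, column-major A). Assumptions:
the n-biased path is taken when n < 5 (one reading of the text above),
rows are walked in blocks of up to 32 to mirror the "32x..." naming,
and every function name is hypothetical. When beta is zero the sketch
writes zeros rather than multiplying, following the reference BLAS
convention that y need not be set on input in that case.

```c
#include <assert.h>
#include <stddef.h>

/* beta == 0 overwrites y so stale NaN/Inf values do not propagate. */
static double scale_by_beta(double beta, double yi)
{
    return (beta == 0.0) ? 0.0 : beta * yi;
}

/* n-biased style: beta scaling happens inside the kernel. Each block of
   up to 32 rows of y is scaled once, updated with every column, then
   stored back. */
static void gemv_n_beta_inside(size_t m, size_t n, double alpha,
                               const double *a, size_t lda,
                               const double *x, double beta, double *y)
{
    for (size_t i0 = 0; i0 < m; i0 += 32) {
        size_t mb = (m - i0 < 32) ? m - i0 : 32;
        double acc[32];
        for (size_t k = 0; k < mb; ++k)
            acc[k] = scale_by_beta(beta, y[i0 + k]);
        for (size_t j = 0; j < n; ++j) {
            double ax = alpha * x[j];
            for (size_t k = 0; k < mb; ++k)
                acc[k] += a[j * lda + i0 + k] * ax;
        }
        for (size_t k = 0; k < mb; ++k)
            y[i0 + k] = acc[k];
    }
}

/* m-biased style: the kernel only accumulates, so y must already
   hold beta*y when it is called. */
static void gemv_n_accumulate(size_t m, size_t n, double alpha,
                              const double *a, size_t lda,
                              const double *x, double *y)
{
    for (size_t j = 0; j < n; ++j) {
        double ax = alpha * x[j];
        for (size_t i = 0; i < m; ++i)
            y[i] += a[j * lda + i] * ax;
    }
}

/* Hypothetical framework-level dispatch with an assumed n < 5 cutoff. */
static void gemv_n_dispatch(size_t m, size_t n, double alpha,
                            const double *a, size_t lda,
                            const double *x, double beta, double *y)
{
    assert(lda >= m);
    if (n < 5) {
        gemv_n_beta_inside(m, n, alpha, a, lda, x, beta, y);
        return;
    }
    for (size_t i = 0; i < m; ++i)  /* framework scales y up front */
        y[i] = scale_by_beta(beta, y[i]);
    gemv_n_accumulate(m, n, alpha, a, lda, x, y);
}
```

Folding the scaling into the n-biased kernel avoids a separate pass
over y, which matters most when n is small and the matrix pass itself
is cheap.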
AMD-Internal: [CPUPL-5560]
Change-Id: I22bc2a865cd28b9cdcb383e17d1ff38bdd28de79