mirror of
https://github.com/amd/blis.git
- Developed new AVX512 DGEMV kernels for Zen4/5 architectures and AVX2
  kernels for Zen1/2/3 architectures. These kernels are written from the
  ground up and are independent of the fused kernels.
- The primary DGEMV kernel processes the matrix in blocks of 8 columns.
  Fringe columns (1 to 7 remaining) are handled by fringe kernels, which
  the primary kernel invokes as needed.
- The kernels compute the dot product of each column of matrix A with
  vector x in chunks of 32 elements, storing the partial results in
  accumulator registers. Fringe elements are handled in chunks of 16, 8,
  etc. The accumulator registers are then reduced and the results are
  added to vector y.

AMD-Internal: [CPUPL-5835]
Change-Id: I5cb9eb1330db095931586a7028fd7676fbbecc61
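
The column blocking and chunked accumulation described above can be illustrated with a minimal scalar sketch in C. It is not the kernel code from this change. It assumes the transposed form y := y + A^T x on a column-major matrix with no alpha/beta scaling, and it models each accumulator register as an array of 8 partial sums (the width of one 512-bit register of doubles). The names dgemv_t_sketch, dgemv_t_cols, COL_BLOCK and LANES are hypothetical, and the handling of the last fewer-than-8 rows is a guess at what "etc." covers.

```c
#include <stddef.h>

#define COL_BLOCK 8 /* columns per primary-kernel step (from the commit text) */
#define LANES 8     /* assumed: doubles per 512-bit accumulator register */

/* Hypothetical helper: dot products of nc (1..8) columns of A with x.
 * Partial sums are kept per lane, then reduced and added to y. */
static void dgemv_t_cols(size_t m, size_t nc, const double *a, size_t lda,
                         const double *x, double *y)
{
    double acc[COL_BLOCK][LANES] = {{0.0}};
    size_t i = 0;

    /* Rows in chunks of 32, then fringe chunks of 16 and 8. */
    for (size_t chunk = 32; chunk >= LANES; chunk /= 2) {
        for (; i + chunk <= m; i += chunk)
            for (size_t j = 0; j < nc; ++j)
                for (size_t k = 0; k < chunk; ++k)
                    acc[j][k % LANES] += a[j * lda + i + k] * x[i + k];
    }

    /* Assumed tail handling: fewer than 8 rows left, one element at a time. */
    for (; i < m; ++i)
        for (size_t j = 0; j < nc; ++j)
            acc[j][0] += a[j * lda + i] * x[i];

    /* Reduce each accumulator to a scalar and add it to y. */
    for (size_t j = 0; j < nc; ++j) {
        double sum = 0.0;
        for (size_t l = 0; l < LANES; ++l)
            sum += acc[j][l];
        y[j] += sum;
    }
}

/* Hypothetical driver: full blocks of 8 columns, then one fringe call
 * for the remaining 1 to 7 columns. */
void dgemv_t_sketch(size_t m, size_t n, const double *a, size_t lda,
                    const double *x, double *y)
{
    size_t j = 0;
    for (; j + COL_BLOCK <= n; j += COL_BLOCK)
        dgemv_t_cols(m, COL_BLOCK, a + j * lda, lda, x, y + j);
    if (j < n)
        dgemv_t_cols(m, n - j, a + j * lda, lda, x, y + j);
}
```

In the real kernels the inner loops would presumably be vector loads and fused multiply-adds on AVX512 or AVX2 registers, and each fringe width would get its own specialized routine; the sketch folds both into one function to keep the control flow visible.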