mirror of
https://github.com/amd/blis.git
synced 2026-05-13 02:25:39 +00:00
- Updated the final reduction of partial sums (AVX2 code section) to use
  scalar accumulation throughout, instead of the _mm256_hadd_pd( ... )
  intrinsic. This changes the order of the additions (their associativity)
  in the reduction step.
- Reverted to scalar code for the fringe cases in the AVX2 code for DNRM2
  and DZNRM2, to improve functional correctness.

AMD-Internal: [CPUPL-4049]
Change-Id: I9d320b39d23a0cbcc77fb24d951fced778ea5ea5
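Floating-point addition is not associative, so changing the reduction order can change the low-order bits of the result. The sketch below models the two orders in plain C, without intrinsics. It assumes that a typical _mm256_hadd_pd-based reduction of a four-lane double accumulator computes (p[0] + p[1]) + (p[2] + p[3]), and that the scalar replacement accumulates left to right. Neither function is taken from BLIS; `reduce_pairwise` and `reduce_sequential` are hypothetical names.

```c
/*
 * Two ways to reduce four partial sums (the four double lanes of a
 * 256-bit accumulator) to a single scalar. Hypothetical helpers for
 * illustration only; not BLIS code.
 */

/* Pairwise order, assumed to match a typical _mm256_hadd_pd-based
 * reduction: (p[0] + p[1]) + (p[2] + p[3]). */
double reduce_pairwise(const double p[4])
{
    double lo = p[0] + p[1];   /* sum of the low pair of lanes  */
    double hi = p[2] + p[3];   /* sum of the high pair of lanes */
    return lo + hi;
}

/* Scalar left-to-right accumulation: ((p[0] + p[1]) + p[2]) + p[3]. */
double reduce_sequential(const double p[4])
{
    double sum = 0.0;
    for (int i = 0; i < 4; ++i)
        sum += p[i];           /* one rounding step per lane */
    return sum;
}
```

For example, with p = {1e16, 1.0, -1e16, 1.0} and IEEE 754 double arithmetic (round to nearest, ties to even, no extended-precision intermediates), `reduce_pairwise` returns 0.0 while `reduce_sequential` returns 1.0, because 1e16 + 1.0 and -1e16 + 1.0 both round back to ±1e16 before the large terms cancel.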