blis/kernels/zen at 06f23c4fd4ee045fc17b67597a6e041dce91dcd3 - blis


mirror of https://github.com/amd/blis.git synced 2026-05-13 02:25:39 +00:00


Vignesh Balasubramanian 06f23c4fd4 Bugfix: Functional correctness of DNRM2_ and DZNRM2_ APIs

- Updated the final reduction of partial sums (AVX-2 code section)
  to use scalar accumulation entirely, instead of the
  _mm256_hadd_pd( ... ) intrinsic. This changes the associativity
  of the reduction step.

- Reverted to scalar code for the fringe cases in the AVX-2 kernel
  for DNRM2 and DZNRM2, to improve functional correctness.

AMD-Internal: [CPUPL-4049]
Change-Id: I9d320b39d23a0cbcc77fb24d951fced778ea5ea5

2023-11-07 10:21:41 -05:00