- Optimized the axpy2v implementation for the
double datatype by processing rows in multiples
of 4 and storing the final computed result only
once, at the end of the computation. Avoiding
the unnecessary intermediate stores improves
performance.
- Improved the use and reuse of vector registers
for faster computation.
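A rough scalar sketch of the loop structure described above, in C.
It assumes axpy2v computes z := z + alphax * x + alphay * y with
unit strides. The function name axpy2v_sketch is hypothetical, and
plain doubles stand in for the vector registers the real kernel
uses. Each group of 4 elements of z is loaded once, updated with
both scaled vectors, and stored once. A remainder loop handles
lengths that are not a multiple of 4.

```c
#include <stddef.h>

/* Hypothetical scalar sketch: z := z + alphax*x + alphay*y.
   Assumes unit stride; locals z0..z3 stand in for vector registers. */
void axpy2v_sketch(size_t n, double alphax, double alphay,
                   const double *x, const double *y, double *z)
{
    size_t i = 0;

    /* Main loop: process 4 elements per iteration. */
    for (; i + 4 <= n; i += 4) {
        /* Load z once. */
        double z0 = z[i], z1 = z[i + 1], z2 = z[i + 2], z3 = z[i + 3];

        /* Accumulate both updates before any store. */
        z0 += alphax * x[i]     + alphay * y[i];
        z1 += alphax * x[i + 1] + alphay * y[i + 1];
        z2 += alphax * x[i + 2] + alphay * y[i + 2];
        z3 += alphax * x[i + 3] + alphay * y[i + 3];

        /* Single store per element, at the end of the computation. */
        z[i] = z0; z[i + 1] = z1; z[i + 2] = z2; z[i + 3] = z3;
    }

    /* Remainder loop for n not a multiple of 4. */
    for (; i < n; ++i)
        z[i] += alphax * x[i] + alphay * y[i];
}
```

Compared with two separate axpyv-style passes (one for x, one for
y), this single pass loads and stores each element of z once
instead of twice.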
AMD-Internal: [CPUPL-1973]
Change-Id: I7b8ef94d0f67c1c666fdce26e9b2b7291365d2e9