- For the cases where AVX2 is available, an optimized function is called,
based on Blue's algorithm. The fallback method based on sumsqv is used
otherwise.
- Scaling is used to avoid overflow and underflow.
- Works correctly for negative increments.
- Cleaned up some white space in the AVX2 implementation for DNRM2.
AMD-Internal: [CPUPL-2551]
Change-Id: I0875234ea735540307168fe7efc3f10fe6c40ffc