- For the cases where AVX2 is available, an optimized function is called,
based on Blue's algorithm. The fallback method based on sumsqv is used
otherwise.
- Scaling is used to avoid overflow and underflow.
- Works correctly for negative increments.
AMD-Internal: [SWLCSG-1080]
Change-Id: I6bf2f42652ba6b8a8631a0a9e6f6297d5b3ea5d9