-DAMAXV AVX512 is giving wrong results when max element is present at
index in [n-u, n), where u < 32. This is a fallout of using wrong
start offset for the non-loop unrolled code.
-Functions for replacing NaN with negative numbers is replaced with
MACRO to avoid function call overhead and to remove static variables
used for stateful replacement numbers for NaN.
AMD-Internal: [CPUPL-2190]
Change-Id: Ie1435c38b264a271f869782793d0b52bbe6e1b2a