- Replaced switch case with if else, lookup table for switch case
is palced at the end of .text section which causes a huge jump.
- Reduced number of branches for tiny sizes.
- Cpuid query is slow, therefore added a new if statement which avoids cpuid
query for tiny sizes(<200).
- Redirected tiny sizes to AVX2 kernel.
AMD-Internal: [CPUPL-5407]
Change-Id: I8e73777b2f00c9dcff9775ddfcb7ca3f74fa901c