Details:
- Kernel is called directly from API call to avoid framework overhead in case of complex float and complex double precisions.
- Added SIMD code for complex float and complex double and unrolled for loop 5 times to improve performance
AMD-Internal: [CPUPL-1057]
Change-Id: I3b9d202398cacc0168882c9d6da2b450c27466a0