Threading related changes
--------------------------
- Created function bli_nthreads_l1 that dispatches the AOCL dynamic
logic for a L1 function based on the kernel ID and input datatypes.
- bli_nthreads_l1 gets the number of threads to be launched from the
rntm variable.
- Added aocl_'ker?'_dynamic function for DAXPYV, DSCALV, ZDSCALV and
DDOTV. This function contains the AOCL dynamic logic for the
respective kernels.
- Added handling for cases when number of elements (n) is less than
number of threads spawned (nt) in AOCL dynamic.
- Added function bli_thread_vector_partition that calculates the
amount of work the calling thread is supposed to perform on a
vector.
Interface changes
-----------------
- In BLIS impli layer of DSCALV, ZDSCALV and AXPYV, added logic to pick
kernel based on architecture ID and removed AVX2 flag check.
- Modified function signature of ZDSCALV. Alpha is passed as dcomplex
and only the real part of the alpha passed is used inside the kernel.
The change was done to facilitate kernel dispatch based on arch ID.
- Added n <= 0, BLAS exception in BLAS layer of DAXPYV and DDOTV.
Without this multithreaded code might crash because of minimum work
calculation.
Misc
-----
- Removed unused variables from ZSCAL2V and AXPYV kernels.
AMD-Internal: [CPUPL-3095]
Change-Id: I4fc7ef53d21f2d86846e86d88ed853deb8fe59e9