Mirror of https://github.com/amd/blis.git (synced 2026-05-13 10:35:38 +00:00)
- Updated the computational logic of the bli_zaxpbyv_zen_int( ... )
  kernel. The kernel takes one of two compute paths depending on the
  value of alpha, for both unit and non-unit strides. There are no
  constraints on the beta scaling of the 'y' vector.

- Updated the logic to support conjugation of 'x' in the computation.
  The kernel handles the conjugate and no-conjugate cases using the
  _mm256_fmsubadd_pd( ... ) and _mm256_addsub_pd( ... ) intrinsics.

- Updated the early return condition in the kernel to comply with
  the standard.

- Replaced the scalar computation with vector computation (using
  128-bit registers) when handling a single element (the fringe case)
  in unit-stride vectors, or when handling vectors with non-unit
  strides. A single dcomplex element occupies 128 bits in memory,
  which makes this optimization possible.

- Added accuracy and extreme-value tests with sufficient sizes and
  initializations to cover the required main and fringe cases of the
  computation.

AMD-Internal: [CPUPL-3623]
Change-Id: I7ae918856e7aba49424162290f3e3d592c244826
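
For reviewers, the operation the kernel computes, y := beta * y + alpha * conjx(x), can be sketched as a plain scalar reference in C. This is only an illustration of the expected results, for example as a baseline in accuracy tests, and not the vectorized kernel itself. The names `dcomplex_t` and `ref_zaxpbyv` are hypothetical stand-ins and are not BLIS identifiers. The kernel's alpha-dependent compute paths and its early return condition are not reproduced here.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical local stand-in for a double-precision complex element:
 * two doubles, i.e. 128 bits per element. */
typedef struct { double real; double imag; } dcomplex_t;

/* Hypothetical scalar reference (not part of BLIS):
 *   y[i] := beta * y[i] + alpha * conjx(x[i])   for i = 0 .. n-1
 * conj_x selects whether x is conjugated before scaling by alpha.
 * incx and incy are element strides (1 means contiguous). */
static void ref_zaxpbyv(bool conj_x, size_t n,
                        dcomplex_t alpha, const dcomplex_t *x, ptrdiff_t incx,
                        dcomplex_t beta, dcomplex_t *y, ptrdiff_t incy)
{
    for (size_t i = 0; i < n; ++i) {
        const dcomplex_t *xi = x + (ptrdiff_t)i * incx;
        dcomplex_t *yi = y + (ptrdiff_t)i * incy;

        /* Conjugation only flips the sign of the imaginary part of x. */
        double xr = xi->real;
        double xm = conj_x ? -xi->imag : xi->imag;

        /* Read y before overwriting it: both parts feed both outputs. */
        double yr = yi->real;
        double ym = yi->imag;

        /* Complex multiply-accumulate: beta*y + alpha*x. */
        yi->real = (beta.real * yr - beta.imag * ym)
                 + (alpha.real * xr - alpha.imag * xm);
        yi->imag = (beta.real * ym + beta.imag * yr)
                 + (alpha.real * xm + alpha.imag * xr);
    }
}
```

Calling `ref_zaxpbyv` once with `conj_x` false and once with it true on the same inputs gives the two results that the conjugate and no-conjugate paths of a vectorized implementation would be expected to match, for both unit and non-unit strides.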