blis/kernels/zen/1 at ea0324ab9558fe9bdf9579bb557706970e2ea2fb - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-13 10:35:38 +00:00

Files

Vignesh Balasubramanian a6a67fea2d ZAXPBYV optimizations for handling unit and non-unit strides

- Updated the computational logic of the bli_zaxpbyv_zen_int( ... )
  kernel. The kernel now takes one of two compute paths depending
  on the value of alpha, for both unit and non-unit strides. There
  are no constraints on the beta scaling of the 'y' vector.

- Updated the logic to support conjugation of 'x' in the
  computation. The kernel supports the conjugate and no-conjugate
  operations using the _mm256_fmsubadd_pd( ... ) and
  _mm256_addsub_pd( ... ) intrinsics.

- Updated the early return condition in the kernel to comply with
  the standard.

- Replaced the scalar computation with vector computation (using
  128-bit registers) when handling a single element (the fringe
  case) with unit stride, or vectors with non-unit strides. A single
  dcomplex element occupies 128 bits in memory, which makes this
  optimization possible.

- Added accuracy and extreme-value tests, with sizes and
  initializations chosen to cover the main and fringe cases of the
  computation.
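
For orientation, the operation the kernel implements can be written as
a plain scalar reference: y := beta * y + alpha * conjx(x). The sketch
below is not the BLIS kernel and does not use its signature. The name
zaxpbyv_ref, the int conjx flag, and the restriction to positive
strides are assumptions made for illustration. Splitting on
alpha == 0 is also an assumption about what the "two compute paths"
are; the commit message does not say which values of alpha are
distinguished.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* Reference semantics only: y := beta * y + alpha * conjx(x).
   Hypothetical helper, not the BLIS kernel or its signature. */
static void zaxpbyv_ref(int conjx, ptrdiff_t n,
                        double complex alpha,
                        const double complex *x, ptrdiff_t incx,
                        double complex beta,
                        double complex *y, ptrdiff_t incy)
{
    /* This sketch assumes positive strides. */
    assert(incx > 0 && incy > 0);

    /* Early return: an empty vector leaves y untouched. */
    if (n <= 0) return;

    if (alpha == 0.0) {
        /* Assumed split on alpha: only the beta scaling remains. */
        for (ptrdiff_t i = 0; i < n; ++i)
            y[i * incy] = beta * y[i * incy];
        return;
    }

    /* General path; the same loop serves unit and non-unit strides. */
    for (ptrdiff_t i = 0; i < n; ++i) {
        double complex xv = x[i * incx];
        if (conjx) xv = conj(xv);
        y[i * incy] = beta * y[i * incy] + alpha * xv;
    }
}
```

A reference like this is mainly useful as an oracle when checking a
vectorized kernel on both main and fringe sizes. Note that skipping the
alpha * x term when alpha is zero changes the result if 'x' holds NaN
or Inf, which is one reason extreme-value tests matter here.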

AMD-Internal: [CPUPL-3623]
Change-Id: I7ae918856e7aba49424162290f3e3d592c244826
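
The conjugate handling and the 128-bit fringe path described in the
commit message can be modeled on a single dcomplex element, which is
two doubles. The sketch below is a portable model of the lane
arithmetic, not the kernel's code: vec2d and every helper are
hypothetical stand-ins. addsub mirrors the per-lane behavior of the
addsub intrinsics (even lane subtracts, odd lane adds), and fmsubadd
mirrors the fmsubadd intrinsics (even lane a*b + c, odd lane a*b - c),
except that the multiply and add here are not fused. The operand
arrangement is one that yields the right product; the arrangement the
kernel actually uses is not given in the commit message.

```c
#include <assert.h>

/* One dcomplex as two lanes: { real, imag }. */
typedef struct { double lane[2]; } vec2d;
static_assert(sizeof(vec2d) == 16, "one dcomplex fills a 128-bit register");

static vec2d broadcast(double v) { vec2d r = {{v, v}}; return r; }
static vec2d swap(vec2d a) { vec2d r = {{a.lane[1], a.lane[0]}}; return r; }

static vec2d mul(vec2d a, vec2d b) {
    vec2d r = {{a.lane[0] * b.lane[0], a.lane[1] * b.lane[1]}};
    return r;
}
static vec2d add(vec2d a, vec2d b) {
    vec2d r = {{a.lane[0] + b.lane[0], a.lane[1] + b.lane[1]}};
    return r;
}

/* Model of addsub: lane 0 is a - b, lane 1 is a + b. */
static vec2d addsub(vec2d a, vec2d b) {
    vec2d r = {{a.lane[0] - b.lane[0], a.lane[1] + b.lane[1]}};
    return r;
}

/* Model of fmsubadd: lane 0 is a*b + c, lane 1 is a*b - c (unfused). */
static vec2d fmsubadd(vec2d a, vec2d b, vec2d c) {
    vec2d r = {{a.lane[0] * b.lane[0] + c.lane[0],
                a.lane[1] * b.lane[1] - c.lane[1]}};
    return r;
}

/* alpha * x, or alpha * conj(x) when conjx is nonzero. */
static vec2d zmul_one(int conjx, vec2d alpha, vec2d x) {
    vec2d ar = broadcast(alpha.lane[0]);
    vec2d ai = broadcast(alpha.lane[1]);
    vec2d xs = swap(x);      /* { xi, xr } */
    vec2d t  = mul(ar, x);   /* { ar*xr, ar*xi } */
    if (conjx)
        /* { ai*xi + ar*xr, ai*xr - ar*xi } = alpha * conj(x) */
        return fmsubadd(ai, xs, t);
    /* { ar*xr - ai*xi, ar*xi + ai*xr } = alpha * x */
    return addsub(t, mul(ai, xs));
}

/* y := beta * y + alpha * conjx(x) for a single element. */
static vec2d zaxpby_one(int conjx, vec2d alpha, vec2d x, vec2d beta, vec2d y) {
    return add(zmul_one(0, beta, y), zmul_one(conjx, alpha, x));
}
```

The point of the model is that conjugation costs no extra instruction:
switching between the subtract-then-add and add-then-subtract lane
patterns, with suitably arranged operands, selects alpha * x or
alpha * conj(x).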

2023-10-12 06:31:08 -04:00

bli_amaxv_zen_int.c

Code cleanup: spelling corrections

2023-04-19 12:44:56 -04:00

bli_axpbyv_zen_int10.c

Code cleanup: spelling corrections

2023-04-19 12:44:56 -04:00

bli_axpbyv_zen_int.c

ZAXPBYV optimizations for handling unit and non-unit strides

2023-10-12 06:31:08 -04:00

bli_axpyv_zen_int10.c

Code cleanup: spelling corrections