Files
blis/kernels
Harihara Sudhan S d4901f53ce Improved performance of double complex AXPYV kernel
- Increased unroll by reusing X registers that was previously used
  for performing shuffle.
- Added loops with smaller increment steps for better problem
  decomposition.
- Added X vector and Y vector prefetch to the kernel.
- Removed redundant code that handles fringe in incx = 1 and
  incy = 1. This remainder will be performed by the loop that handles
  non-unit stride cases.
- Vectorized loops that handle non-unit stride cases using SSE
  instructions.

AMD-Internal: [CPUPL-2773]
Change-Id: Ifb5dc128e17b4e21315789bfaa147e3a7ec976f0
2023-03-02 00:35:42 -05:00
..
2020-09-29 16:52:18 -05:00
2022-07-22 03:42:17 -04:00
2021-04-27 11:09:48 +05:30
2020-07-22 18:24:26 +05:30