blis/kernels
Harihara Sudhan S 18ae57305e ZAXPYF4 optimization
- Vectorized the alpha scaling of the X vector using SSE instructions.
  This works irrespective of incx.
- Added code to prefetch the A matrix and Y vector into the L1 cache.
- Vectorized the fringe-case and non-unit-stride computations with SSE
  instructions.
- Increased the unroll factor in unit-stride cases for better register
  utilization.
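
The first bullet can be illustrated with a minimal sketch. This is not the kernel code from this commit; the function name `scale_x_by_alpha`, the `out` buffer, and the local `dcomplex` typedef are assumptions made for illustration. The idea it shows: one double-complex element is 16 bytes and fills a 128-bit SSE register exactly, so each element of X can be loaded and scaled with one vector load whatever the value of incx. The sketch uses SSE2 intrinsics only and falls back to scalar code where SSE2 is unavailable.

```c
#include <assert.h>
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Local stand-in for a double-precision complex type (assumption). */
typedef struct { double real; double imag; } dcomplex;

/* Hypothetical helper: out[i] = alpha * x[i * incx] for i in [0, n). */
static void scale_x_by_alpha(dcomplex alpha, const dcomplex *x,
                             ptrdiff_t incx, dcomplex *out, int n)
{
    assert(n >= 0);
#if defined(__SSE2__)
    /* ar = (alpha.real, alpha.real); ai = (-alpha.imag, alpha.imag),
       listed as (low lane, high lane). */
    const __m128d ar = _mm_set1_pd(alpha.real);
    const __m128d ai = _mm_set_pd(alpha.imag, -alpha.imag);

    for (int i = 0; i < n; ++i) {
        /* One element = one register, so the stride only affects the address. */
        __m128d xv = _mm_loadu_pd(&x[i * incx].real);   /* (xr, xi) */
        __m128d xs = _mm_shuffle_pd(xv, xv, 1);         /* (xi, xr) */
        /* (ar*xr - ai*xi, ar*xi + ai*xr) */
        __m128d r = _mm_add_pd(_mm_mul_pd(ar, xv), _mm_mul_pd(ai, xs));
        _mm_storeu_pd(&out[i].real, r);
    }
#else
    /* Scalar fallback with the same result. */
    for (int i = 0; i < n; ++i) {
        const dcomplex v = x[i * incx];
        out[i].real = alpha.real * v.real - alpha.imag * v.imag;
        out[i].imag = alpha.real * v.imag + alpha.imag * v.real;
    }
#endif
}
```

For a fused factor of 4, such a helper would be called once with n = 4 before the main loop, so the loop body can reuse the scaled values without touching X again.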

AMD-Internal: [CPUPL-2773]
Change-Id: I217e6ce9e3f5753ebe271c684abd9a2274fd2715
2023-02-04 12:34:50 -05:00