blis/kernels
Hari Govind S 8998839c71 Optimisation of DGEMV Transpose Case for unit stride
- Added a new code section to handle inputs with a non-unit-strided y
  vector in the dgemv transpose case. Removed that handling from the
  respective kernels to avoid the repeated branching caused by
  condition checks within the 'for' loop.

- The check for beta equal to zero in the primary kernels is moved
  outside the 'for' loop to avoid repeated branching.

- The '_mm512_reduce_pd' operations in the primary kernel are replaced
  by a series of operations to reduce the number of instructions
  required to reduce the 8 registers.

- Changed the naming convention for the DGEMV transpose kernels.

- Modified the unit kernel test to avoid the y increment for dgemv
  transpose kernels during the test.
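
The first two items above can be illustrated with a minimal scalar sketch
in C. It is not the code from this commit: the function names
(dgemv_t_unit_y, dgemv_t), the column-major layout, the unit-stride x, and
the use of a temporary buffer for strided y are all assumptions made for
illustration. The sketch only shows the general idea of resolving the
stride of y and the beta == 0 case once, outside the hot loops.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical unit-stride kernel: y := beta*y + alpha*A^T*x.
 * A is m x n, column-major, leading dimension lda; x and y are
 * contiguous. The beta == 0 test is done once, outside the loops. */
static void dgemv_t_unit_y(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double beta, double *y)
{
    if (beta == 0.0) {
        /* y is overwritten and never read, so stale values are ignored. */
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t i = 0; i < m; ++i)
                dot += a[i + j * lda] * x[i];
            y[j] = alpha * dot;
        }
    } else {
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t i = 0; i < m; ++i)
                dot += a[i + j * lda] * x[i];
            y[j] = beta * y[j] + alpha * dot;
        }
    }
}

/* Hypothetical wrapper: the stride of y is handled once, here, so the
 * kernel above never tests incy. Returns 0 on success, -1 if the
 * temporary buffer cannot be allocated. */
static int dgemv_t(size_t m, size_t n, double alpha,
                   const double *a, size_t lda,
                   const double *x, double beta,
                   double *y, size_t incy)
{
    assert(incy != 0);  /* assumption: positive stride only */
    if (n == 0)
        return 0;
    if (incy == 1) {
        dgemv_t_unit_y(m, n, alpha, a, lda, x, beta, y);
        return 0;
    }
    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;
    if (beta != 0.0) {
        /* Gather strided y only when its old values are needed. */
        for (size_t j = 0; j < n; ++j)
            tmp[j] = y[j * incy];
    }
    dgemv_t_unit_y(m, n, alpha, a, lda, x, beta, tmp);
    for (size_t j = 0; j < n; ++j)  /* scatter back to strided y */
        y[j * incy] = tmp[j];
    free(tmp);
    return 0;
}
```

In this sketch the inner loops contain no branches on beta or incy, which
is the effect the commit message describes. A vectorised kernel would
replace the scalar dot product but could keep the same outer structure.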

AMD-Internal: [CPUPL-6565]
Change-Id: I1ac516d6b8f156ac53ac9f6eb18badd50e152e05
2025-03-06 05:15:58 -05:00