blis/kernels at 8998839c71ffd4d4b4c50273ddb63fc4f2976f93 - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-04-19 23:28:52 +00:00


Hari Govind S 8998839c71 Optimisation of DGEMV Transpose Case for unit stride

- Included a new code section to handle input having non-unit strided y
  vector for dgemv transpose case. Removed the same from the respective
  kernels to avoid repeated branching caused by condition checks within
  the 'for' loop.

- The check for beta equal to zero in the primary kernels is moved
  outside the 'for' loop to avoid repeated branching.

- The '_mm512_reduce_pd' operations in the primary kernel are replaced
  by a series of operations to reduce the number of instructions
  required to reduce the 8 registers.

- Changed the naming convention for DGEMV transpose kernels.

- Modified unit kernel test to avoid y increment for dgemv transpose
  kernels during the test.
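
As an illustration of the beta check described above, the following is a minimal scalar sketch in C, not the AVX-512 kernel from this commit. The names dgemv_t_ref and dot_col are hypothetical, and the sketch assumes a column-major m x n matrix with unit-stride x and y. It computes y := beta * y + alpha * A^T * x and tests beta once, before the loop over columns, instead of once per iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product of one column of A with x. */
static double dot_col(size_t m, const double *col, const double *x)
{
    double acc = 0.0;
    for (size_t i = 0; i < m; ++i)
        acc += col[i] * x[i];
    return acc;
}

/* Hypothetical reference routine (not a BLIS API):
 * y := beta * y + alpha * A^T * x
 * A is m x n, column-major, leading dimension lda; x and y are unit stride. */
void dgemv_t_ref(size_t m, size_t n, double alpha,
                 const double *a, size_t lda,
                 const double *x, double beta, double *y)
{
    assert(lda >= m);

    if (beta == 0.0) {
        /* beta is tested once, outside the loop. In this branch y is
         * overwritten and never read, so its old contents are ignored. */
        for (size_t j = 0; j < n; ++j)
            y[j] = alpha * dot_col(m, a + j * lda, x);
    } else {
        /* General case: scale the existing y and accumulate. */
        for (size_t j = 0; j < n; ++j)
            y[j] = beta * y[j] + alpha * dot_col(m, a + j * lda, x);
    }
}
```

Splitting the loop this way duplicates a small amount of code but leaves each loop body branch-free, which appears to be the trade-off the commit message describes.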

AMD-Internal: [CPUPL-6565]
Change-Id: I1ac516d6b8f156ac53ac9f6eb18badd50e152e05

2025-03-06 05:15:58 -05:00

armsve

Merge commit 'cfa3db3f' into amd-main

2024-07-08 06:09:11 -04:00

armv7a

Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.

2021-09-29 16:43:38 -05:00

armv8a

Armv8 Trash New Bulk Kernels

2021-10-08 02:35:58 +09:00

bgq

Replaced use of bool_t type with C99 bool.