mirror of
https://github.com/amd/blis.git
synced 2026-04-19 23:28:52 +00:00
- Added a new code section to handle inputs with a non-unit-strided y vector in the dgemv transpose case. The same handling is removed from the respective kernels to avoid the repeated branching caused by condition checks inside the 'for' loop.
- Moved the beta == 0 check in the primary kernels outside the 'for' loop to avoid repeated branching.
- Replaced the '_mm512_reduce_pd' operations in the primary kernel with a series of operations that reduces the 8 registers using fewer instructions.
- Changed the naming convention for the DGEMV transpose kernels.
- Modified the unit kernel test to avoid the y increment for dgemv transpose kernels during the test.

AMD-Internal: [CPUPL-6565]
Change-Id: I1ac516d6b8f156ac53ac9f6eb18badd50e152e05
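
For illustration only, the first two items can be sketched in portable scalar C. This is not the code from this change: the function names `dgemv_t_unit_y` and `dgemv_t_sketch` are hypothetical, the temporary-buffer approach for strided y is an assumption about one way such a code section could work, and the sketch assumes a column-major A and a positive `incy`. It shows the beta == 0 test chosen once outside the loop, and a kernel that only ever sees unit-stride y.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical scalar stand-in for a transpose kernel:
 * y := beta*y + alpha*A^T*x, with A an m x n column-major matrix
 * (leading dimension lda) and y of length n with unit stride. */
static void dgemv_t_unit_y(size_t m, size_t n, double alpha,
                           const double *a, size_t lda,
                           const double *x, double beta, double *y)
{
    if (beta == 0.0) {
        /* beta == 0 is decided once, outside the loop; y is never read. */
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t i = 0; i < m; ++i)
                dot += a[i + j * lda] * x[i];
            y[j] = alpha * dot;
        }
    } else {
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t i = 0; i < m; ++i)
                dot += a[i + j * lda] * x[i];
            y[j] = beta * y[j] + alpha * dot;
        }
    }
}

/* Hypothetical wrapper: handles non-unit incy up front so the kernel
 * itself needs no stride checks. Assumes incy >= 1.
 * Returns 0 on success, -1 if the temporary buffer cannot be allocated. */
static int dgemv_t_sketch(size_t m, size_t n, double alpha,
                          const double *a, size_t lda,
                          const double *x, double beta,
                          double *y, size_t incy)
{
    if (n == 0)
        return 0;
    if (incy == 1) {
        dgemv_t_unit_y(m, n, alpha, a, lda, x, beta, y);
        return 0;
    }

    double *tmp = malloc(n * sizeof *tmp);
    if (tmp == NULL)
        return -1;

    /* Pack strided y into a contiguous buffer only when it will be read. */
    if (beta != 0.0)
        for (size_t j = 0; j < n; ++j)
            tmp[j] = y[j * incy];

    dgemv_t_unit_y(m, n, alpha, a, lda, x, beta, tmp);

    /* Scatter the result back to the strided y. */
    for (size_t j = 0; j < n; ++j)
        y[j * incy] = tmp[j];

    free(tmp);
    return 0;
}
```

In this sketch the stride decision is made once per call instead of once per loop iteration, which is the branching cost the first two items describe.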