blis/kernels
Nallani Bhaskar e712673ab7 Performance fixes for gcc compiler in fringe kernels
Description:
1. In the m-fringe kernels of the int8 kernels, GCC avoids loading B
   into registers. Instead, GCC generates fma instructions that take
   B directly as a memory operand.

2. This causes a performance regression for larger n, since each fma
   has to reload the B input from memory again and again.

3. This is observed with GCC but not with Clang.

4. Inserted dummy shuffle instructions on the B data to explicitly
   tell the compiler that B needs to be kept in registers.

   AMD-Internal: SWLCSG-2948

Change-Id: Ibbf186fe6569e6265e2c2bb4ec3141ef323ea3e6
2024-08-05 14:31:22 -04:00
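A minimal sketch of the idea behind point 4, in C with GCC/Clang vector extensions. It is simplified and hypothetical: it uses int32 data instead of int8, a 2x4 tile, and the names `fringe_kernel_2x4`, `KEEP_IN_REG`, and `v4si` are illustrative, not BLIS code. It also swaps in a different pinning technique than the commit describes: instead of dummy shuffles, an empty inline-asm statement with a SIMD-register constraint (`"+x"` on x86-64, `"+w"` on AArch64) forces the loaded B row into a register without emitting any instruction. Whether this changes GCC's code generation for a given kernel should be confirmed by inspecting the generated assembly.

```c
#include <stdint.h>
#include <string.h>

/* GCC/Clang vector extension: four 32-bit ints held in one 128-bit SIMD register. */
typedef int32_t v4si __attribute__((vector_size(16)));

/* Hypothetical helper (not BLIS code): an empty asm statement whose "+x" (x86-64)
   or "+w" (AArch64) constraint requires v to be in a SIMD register at this point
   and marks it as rewritten, so later uses cannot be folded back into loads from
   the original memory. It emits no instruction; on other targets it does nothing. */
#if defined(__GNUC__) && defined(__x86_64__)
#define KEEP_IN_REG(v) __asm__ volatile("" : "+x"(v))
#elif defined(__GNUC__) && defined(__aarch64__)
#define KEEP_IN_REG(v) __asm__ volatile("" : "+w"(v))
#else
#define KEEP_IN_REG(v) ((void)0)
#endif

enum { NR = 4 };

/* Hypothetical 2x4 m-fringe kernel: c[i][j] += sum_p a[i][p] * b[p][j].
   a is 2 x k, b is k x NR, c is 2 x NR, all row-major. */
static void fringe_kernel_2x4(int k, const int32_t *a, const int32_t *b, int32_t *c)
{
    v4si acc0 = {0, 0, 0, 0};
    v4si acc1 = {0, 0, 0, 0};

    for (int p = 0; p < k; ++p) {
        v4si b0;
        memcpy(&b0, b + p * NR, sizeof b0); /* load one row of B */
        KEEP_IN_REG(b0);                    /* pin it in a register before reuse */

        /* Broadcast each A element and multiply-accumulate against the same B row. */
        v4si a0 = {a[0 * k + p], a[0 * k + p], a[0 * k + p], a[0 * k + p]};
        v4si a1 = {a[1 * k + p], a[1 * k + p], a[1 * k + p], a[1 * k + p]};
        acc0 += a0 * b0;
        acc1 += a1 * b0;
    }

    /* Add the accumulators into C. */
    v4si c0, c1;
    memcpy(&c0, c, sizeof c0);
    memcpy(&c1, c + NR, sizeof c1);
    c0 += acc0;
    c1 += acc1;
    memcpy(c, &c0, sizeof c0);
    memcpy(c + NR, &c1, sizeof c1);
}
```

Because the compiler must assume the asm statement reads `b0` from a register and writes a new value back, it cannot simply replace each use of `b0` with a memory operand pointing at `b`; the loaded B row should instead stay in a register and be reused for every row of A.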