blis/kernels
Nallani Bhaskar 4c2f436cce Performance fixes for GCC in fringe kernels
Description:
1. GCC avoids loading B into registers in the int8
   m-fringe kernels. Instead, it generates FMA
   instructions that take the B input as a memory operand.

2. This causes a performance regression for larger n,
   where each FMA must reload its input from memory
   on every use.

3. This behavior is observed with GCC but not with Clang.

4. Inserted dummy shuffle instructions on the B data to
   explicitly signal to the compiler that B must be kept
   in registers.

5. Moved packb_s4_to_bf16 under the JIT macro to resolve
   a compilation issue with GCC versions older than 11.2.
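The register-versus-memory-operand problem in items 1 to 4 can be illustrated with a minimal sketch. It rests on several assumptions. SSE2 `_mm_madd_epi16` (pairwise int16 multiply-add into int32) stands in for the int8 dot-product instructions the real kernels use. The names `fringe_kernel`, `fringe_kernel_ref`, `M_FRINGE` and the packed-B layout are hypothetical. Rather than the dummy shuffles the commit describes, whose exact form is not shown here, the sketch uses a different, commonly used technique: an empty GCC extended-asm statement with a `"+x"` constraint. This forces B into an XMM register and stops the compiler from folding the load into each multiply-add.

```c
#include <assert.h>
#include <stdint.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical fringe size: number of A rows handled by this kernel. */
#define M_FRINGE 3

/* Scalar reference. A is row-major, M_FRINGE x (2 * k_pairs).
 * Hypothetical packed B layout: for k-pair p and column j (0..3),
 * b_packed[8*p + 2*j + t] = B[2*p + t][j]. C is M_FRINGE x 4. */
void fringe_kernel_ref(int k_pairs, const int16_t *a,
                       const int16_t *b_packed, int32_t *c)
{
    for (int r = 0; r < M_FRINGE; ++r)
        for (int j = 0; j < 4; ++j) {
            int32_t sum = 0;
            for (int p = 0; p < k_pairs; ++p)
                for (int t = 0; t < 2; ++t)
                    sum += (int32_t)a[r * 2 * k_pairs + 2 * p + t] *
                           b_packed[8 * p + 2 * j + t];
            c[4 * r + j] = sum;
        }
}

void fringe_kernel(int k_pairs, const int16_t *a,
                   const int16_t *b_packed, int32_t *c)
{
    assert(k_pairs >= 0);
#if defined(__SSE2__) && defined(__GNUC__)
    __m128i acc[M_FRINGE];
    for (int r = 0; r < M_FRINGE; ++r)
        acc[r] = _mm_setzero_si128();

    for (int p = 0; p < k_pairs; ++p) {
        /* Load 4 columns x 2 k-values of B once per k-pair. */
        __m128i b = _mm_loadu_si128((const __m128i *)(b_packed + 8 * p));
        /* Empty asm with a "+x" (read-write SSE register) constraint: b must
         * be in an XMM register here, and afterwards the compiler treats its
         * value as unknown, so it cannot fold the load from b_packed into
         * each multiply-add below as a memory operand. */
        __asm__("" : "+x"(b));
        for (int r = 0; r < M_FRINGE; ++r) {
            const int16_t *ar = a + r * 2 * k_pairs + 2 * p;
            /* Broadcast the (A[r][2p], A[r][2p+1]) pair to all four lanes. */
            uint32_t pair = (uint32_t)(uint16_t)ar[0] |
                            ((uint32_t)(uint16_t)ar[1] << 16);
            __m128i av = _mm_set1_epi32((int)pair);
            /* Lane j += A[r][2p]*B[2p][j] + A[r][2p+1]*B[2p+1][j]. */
            acc[r] = _mm_add_epi32(acc[r], _mm_madd_epi16(av, b));
        }
    }
    for (int r = 0; r < M_FRINGE; ++r)
        _mm_storeu_si128((__m128i *)(c + 4 * r), acc[r]);
#else
    /* Portable fallback when SSE2 or GNU inline asm is unavailable. */
    fringe_kernel_ref(k_pairs, a, b_packed, c);
#endif
}
```

On an x86-64 build, the SSE2 path is taken. Comparing its output against `fringe_kernel_ref` on small inputs is a cheap check that a register-pinning tweak did not change results. Whether the compiler actually uses a register operand for B should be confirmed by inspecting the generated assembly, for example with `gcc -O2 -S`.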

AMD-Internal: SWLCSG-2948

Change-Id: I5bd1bad7ad129e0dde91ed78d49a4ede3bff456a
2024-08-05 08:13:06 -04:00