blis/kernels at 4c2f436cce130a0e5ebeceef9b9b2e59a2617ac5 - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-24 02:14:33 +00:00

Nallani Bhaskar 4c2f436cce Performance fixes for gcc compiler in fringe kernels

Description:
1. GCC avoids loading b into registers in the m fringe
   kernels of the int8 kernels. Instead, GCC generates
   FMAs with a memory operand for the B input.

2. This causes a performance regression for larger n,
   where each FMA has to load its input from memory
   again and again.

3. This is observed with GCC but not with Clang.

4. Inserted dummy shuffle instructions on the b data to
   tell the compiler explicitly that b needs to be in
   registers.

5. Moved packb_s4_to_bf16 under the JIT macro to resolve a
   compilation issue with GCC versions < 11.2.

AMD-Internal: SWLCSG-2948

Change-Id: I5bd1bad7ad129e0dde91ed78d49a4ede3bff456a

2024-08-05 08:13:06 -04:00
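
Item 4 of the description above can be illustrated with a small sketch. This is not the BLIS kernel: the function name `fringe_2x4`, the 2x4 float shape, and the use of SSE instead of the int8 AVX-512 path are all assumptions made to keep the example short and portable. It shows the general pattern of loading a row of b once, passing it through a shuffle that leaves the data unchanged, and reusing the register for every row of the fringe. Whether a given compiler keeps such an identity shuffle or folds the load back into the arithmetic depends on its version and flags, so the generated assembly should be inspected rather than assumed.

```c
#include <stddef.h>
#if defined(__SSE2__)
#include <emmintrin.h>
#endif

/* Hypothetical 2x4 "m fringe" update (illustrative only, not BLIS code):
 *   c[i][0..3] += sum over p of a[i][p] * b[p][0..3]
 * a is 2 x k, b is k x 4, c is 2 x 4, all row-major. */
static void fringe_2x4(const float *a, const float *b, float *c, size_t k)
{
#if defined(__SSE2__)
    __m128 c0 = _mm_loadu_ps(c);      /* accumulator for row 0 */
    __m128 c1 = _mm_loadu_ps(c + 4);  /* accumulator for row 1 */

    for (size_t p = 0; p < k; ++p) {
        /* Load one row of b a single time per k iteration. */
        __m128 b0 = _mm_loadu_ps(b + 4 * p);

        /* "Dummy" shuffle: selector 3,2,1,0 returns the lanes in their
         * original order, so the values are unchanged. The intent is to
         * make b0 an explicit register value; the compiler is still free
         * to optimize this away. */
        b0 = _mm_shuffle_ps(b0, b0, _MM_SHUFFLE(3, 2, 1, 0));

        /* Broadcast one element of a per row and reuse b0 for both rows. */
        __m128 a0 = _mm_set1_ps(a[p]);
        __m128 a1 = _mm_set1_ps(a[k + p]);
        c0 = _mm_add_ps(c0, _mm_mul_ps(a0, b0));
        c1 = _mm_add_ps(c1, _mm_mul_ps(a1, b0));
    }

    _mm_storeu_ps(c, c0);
    _mm_storeu_ps(c + 4, c1);
#else
    /* Scalar fallback so the sketch also builds on non-x86 targets. */
    for (size_t p = 0; p < k; ++p)
        for (size_t i = 0; i < 2; ++i)
            for (size_t j = 0; j < 4; ++j)
                c[4 * i + j] += a[i * k + p] * b[4 * p + j];
#endif
}
```

Calling `fringe_2x4(a, b, c, k)` accumulates into `c` in place. Comparing the output of `gcc -O2 -S` with and without the shuffle line is one way to check whether the multiply takes b from a register or from memory.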

..

Merge commit 'cfa3db3f' into amd-main

2024-07-08 06:09:11 -04:00

Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.

2021-09-29 16:43:38 -05:00

Armv8 Trash New Bulk Kernels

2021-10-08 02:35:58 +09:00

Replaced use of bool_t type with C99 bool.

2020-08-03 11:27:13 +05:30

BLIS: Missing clobbers (batch 7)

2023-11-22 17:51:46 -05:00

Added a dummy file to kernels/generic.

2017-11-21 12:34:20 -06:00

BUGFIX: Updated ZGEMM microkernel to handle alpha = 0 case

2024-06-20 02:58:43 -04:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

BLIS: Missing clobbers (batch 7)

2023-11-22 17:51:46 -05:00

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Optionally disable trsm diagonal pre-inversion.

2020-12-04 16:08:15 -06:00

BLIS library porting on to Windows:

2020-06-16 18:29:00 +05:30

Remove UT-Austin from copyright headers' clause 3.

2018-12-04 14:31:06 -06:00

Merge commit 'e366665c' into amd-main

2023-10-18 09:09:54 -04:00

Code cleanup: No newline at end of file

2023-04-21 10:02:48 -04:00

BLIS: Missing clobbers (batch 7)

2023-11-22 17:51:46 -05:00

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

DGEMV Optimizations for Tiny Sizes

2024-08-05 12:19:42 +05:30

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00

Added support for zen3 configuration

2020-07-22 18:24:26 +05:30

Performance fixes for gcc compiler in fringe kernels

2024-08-05 08:13:06 -04:00

Tuning the decision logic to choose SUP vs Native for ZGEMM

2024-08-03 19:08:07 +05:30

CMakeLists.txt

Removed -fno-tree-loop-vectorize from kernel flags

2024-07-19 00:49:52 -04:00