blis/config/zen2 at 45d82a1ebfb4ee2fcd7aa4e823708fc72052de6f - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-24 18:34:40 +00:00

Files

Shubham Sharma 75df1ef218 Removed -fno-tree-loop-vectorize from kernel flags

- This change is made in the CMake build system only.
- Removed -fno-tree-loop-vectorize from the global kernel flags
  and added it to the lpgemm-specific kernels only.
- If this flag is not used, gcc tries to auto-vectorize the
  code, which uses vector registers. If the auto-vectorized
  function also uses intrinsics, the total number of vector
  registers used by the intrinsic and auto-vectorized code
  exceeds the registers available on the machine. This
  causes reads and writes to the stack (register spills),
  which was causing a regression in lpgemm.
- If this flag is enabled globally, files that do not use
  any intrinsic code do not get auto-vectorized.
- To get optimal performance for both blis and lpgemm,
  this flag is enabled for lpgemm kernels only.

Change-Id: I14e5c18cd53b058bfc9d764a8eaf825b4d0a81c4
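
The flag scoping described in this commit can be sketched in CMake roughly as follows. This is a minimal illustration, not the contents of make_defs.cmake: the project name, target name, source file names, and the shared optimization flags are all hypothetical placeholders, and the actual BLIS build may attach the flag through different variables or targets.

```cmake
cmake_minimum_required(VERSION 3.11)
project(flag_scope_sketch C)

# Hypothetical source lists; the real BLIS file names and variables differ.
set(LPGEMM_KERNEL_SRCS lpgemm_kernel_example.c)   # intrinsics-based kernels
set(OTHER_KERNEL_SRCS  generic_kernel_example.c)  # plain C, benefits from auto-vectorization

add_library(kernels_sketch OBJECT
  ${LPGEMM_KERNEL_SRCS}
  ${OTHER_KERNEL_SRCS})

# Flags shared by every kernel file. Note that -fno-tree-loop-vectorize
# is deliberately NOT listed here, so plain C files stay auto-vectorized.
# (The flags shown are illustrative, not the ones BLIS uses for zen2.)
target_compile_options(kernels_sketch PRIVATE -O3 -mavx2 -mfma)

# Disable the loop auto-vectorizer only for the intrinsics-based lpgemm
# files, and only when compiling with gcc, which understands this flag.
if(CMAKE_C_COMPILER_ID STREQUAL "GNU")
  set_source_files_properties(${LPGEMM_KERNEL_SRCS}
    PROPERTIES COMPILE_OPTIONS "-fno-tree-loop-vectorize")
endif()
```

Using a per-source property rather than a target-wide option keeps the restriction local to the files that mix intrinsics with loops, which is the trade-off the commit message describes.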

2024-07-19 00:49:52 -04:00

bli_cntx_init_zen2.c

Bugfix and optimizations for ?AXPBYV API

2024-06-20 16:22:07 +05:30

bli_family_zen2.h

Tidy zen bli_cntx_init and bli_family files

2023-10-04 05:14:39 -04:00

make_defs.cmake

Removed -fno-tree-loop-vectorize from kernel flags

2024-07-19 00:49:52 -04:00

make_defs.mk

Code cleanup: AMD copyright notice

2023-11-23 08:54:31 -05:00