Mirror of https://github.com/amd/blis.git (synced 2026-05-11 09:39:59 +00:00)
- Certain sections of the f32 avx512 micro-kernel were observed to slow down when more post-ops are added. Analysis of the binary pointed to false dependencies in instructions being introduced in the presence of the extra post-ops. Adding vzeroupper at the beginning of the ir loop in the f32 micro-kernel fixes this issue.
- F32 gemm (lpgemm) thread factorization tuning for zen4/zen3 added.
- Alpha scaling (multiply instruction) by default was resulting in a performance regression when the k dimension is small and alpha=1 in s32 micro-kernels. Alpha scaling is now only done when alpha != 1.
- s16 micro-kernel performance was observed to regress when compiled with gcc for zen3 and older architectures supporting avx2. This issue is not observed when compiling using gcc with avx512 support enabled. The root cause was identified to be the -fgcse optimization flag in O2 when applied with avx2 support. This flag is now disabled for zen3 and older zen configs.

AMD-Internal: [CPUPL-3067]
Change-Id: I5aef9013432c037eb2edf28fdc89470a2eddad1c

For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:
<config>: <config>/<name>
means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
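
To make the mapping concrete, here is a minimal Python sketch of how a registry in this style could be expanded. It is not the logic used by the BLIS configure script. The sample registry and the names fam_all, fam_a, fam_b, cfg1, cfg2 and cfg3 are hypothetical and do not come from the real config_registry file. The sketch also assumes that '#' starts a comment, that an entry without a '/' uses its own kernel set, and that the names after the first '/' are the complete list of kernel sets for that entry.

```python
# Hypothetical registry text; these names are NOT taken from the real
# config_registry file.
SAMPLE = """
# families map to sub-configurations, which may themselves be families
fam_all: fam_a fam_b
fam_a:   cfg1 cfg2
fam_b:   cfg3
# singleton entries; names after the first '/' are kernel sets
cfg1:    cfg1
cfg2:    cfg2/cfg1
cfg3:    cfg3/cfg3/cfg1
"""


def parse_registry(text):
    """Parse registry text into {name: [(sub_config, [kernel_sets]), ...]}."""
    registry = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # assumption: '#' starts a comment
        if not line:
            continue
        name, _, rhs = line.partition(":")
        entries = []
        for item in rhs.split():
            sub, *kernels = item.split("/")
            # assumption: with no '/', the entry uses its own kernel set
            entries.append((sub, kernels or [sub]))
        registry[name.strip()] = entries
    return registry


def expand(registry, name, _seen=frozenset()):
    """Return {leaf sub-configuration: set of kernel sets} for a target."""
    if name in _seen:  # guard against cyclic family definitions
        return {}
    seen = _seen | {name}
    result = {}
    for sub, kernels in registry.get(name, [(name, [name])]):
        if sub == name or sub not in registry:
            # a leaf: record the kernel sets it asks for
            result.setdefault(sub, set()).update(kernels)
        else:
            # a family (or another named entry): expand it recursively
            for leaf, ks in expand(registry, sub, seen).items():
                result.setdefault(leaf, set()).update(ks)
    return result


if __name__ == "__main__":
    reg = parse_registry(SAMPLE)
    for leaf, ks in sorted(expand(reg, "fam_all").items()):
        print(leaf, "uses kernel sets:", ", ".join(sorted(ks)))
```

Under these assumptions, targeting the hypothetical family fam_all expands to the three leaf sub-configurations, with cfg2 and cfg3 both borrowing the kernel set of cfg1.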