mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
- Inefficient assembly is generated for the s16 gemm micro-kernel (intrinsics code) when it is compiled with gcc. When -fschedule-insns, -fschedule-insns2, and -ftree-pre are all enabled, as they are in the O2 optimization flags, gcc optimizes the code to reduce data stalls and, as a side effect, uses the stack to store intermediate C register output. Disabling -ftree-pre in gcc (for example, with -fno-tree-pre) fixes the issue, even when the other two flags remain enabled.
AMD-Internal: [CPUPL-2971]
Change-Id: Ibf0dcde20b5a18708a05faad34e684eb0a9a5463
For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:

    <config>: <config>/<name>

means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
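
To make the / notation concrete, here is a minimal Python sketch that reads
registry-style text and separates each sub-configuration from the kernel sets
listed after it. It is not the parser that configure uses. The function name
parse_registry, the sample entries (fam, alpha, beta), and the treatment of #
as a comment marker are assumptions made for illustration. The sketch records
only what is written after each /, and does not decide whether a
configuration's own kernel set is implied.

```python
def parse_registry(text):
    """Parse registry-style text into {name: [(sub_config, [kernel_sets])]}.

    Hypothetical helper for illustration; not part of BLIS.
    """
    registry = {}
    for raw in text.splitlines():
        # Assumption: '#' starts a comment that runs to the end of the line.
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        # Left of ':' is the configuration or family name.
        name, _, rhs = line.partition(":")
        entries = []
        for item in rhs.split():
            # "<config>/<name>": the first field is the sub-configuration,
            # any remaining fields are kernel sets made available to it.
            sub, *kernels = item.split("/")
            entries.append((sub, kernels))
        registry[name.strip()] = entries
    return registry


# Hypothetical entries; these are not copied from the real config_registry.
SAMPLE = """\
# a family that maps to two sub-configurations
fam: alpha beta
alpha: alpha
# beta borrows the kernel set associated with alpha
beta: beta/alpha
"""

registry = parse_registry(SAMPLE)
for name, entries in registry.items():
    for sub, kernels in entries:
        borrowed = ", ".join(kernels) if kernels else "(none listed)"
        print(f"{name}: sub-config {sub}, kernel sets after '/': {borrowed}")
```

In this sketch, the beta line is the case the / notation describes: if beta is
targeted at configure-time, the kernel set associated with alpha should be made
available to it.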