mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
- The n-fringe micro-kernels use only a few zmm registers to compute the output (e.g., 6x16 uses 6 zmm registers for output, as opposed to the 24 used in 6x64). This leaves many registers unused; putting them to work allows a larger MR dimension and thus better reuse of the registers loaded with B. Based on this idea, the existing n-fringe kernels are modified (6x16 -> 12x16, 6x32 -> 9x32). Note that not all available registers are used, because increasing MR further requires more broadcasts from the unpacked A matrix and results in cache-inefficient code.
- Compiler flags for the AOCC build are updated to generate loops with 64-byte alignment. This has been observed to improve performance slightly when the k dimension is small.

AMD-Internal: [CPUPL-3173]
Change-Id: I199ce75ef71d994ffe0067dac1ed804dce1742ca
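The register counts quoted in the commit message can be checked with a little arithmetic. The sketch below is illustrative only and is not part of BLIS: it assumes 32-bit output elements (16 per 512-bit zmm register, which matches the 6x16 -> 6 and 6x64 -> 24 figures above) and the 32 zmm registers that AVX-512 provides. The function name output_registers is hypothetical.

```python
ZMM_BITS = 512   # width of one zmm register
ZMM_COUNT = 32   # zmm0..zmm31 on AVX-512
ELEM_BITS = 32   # assumption: 32-bit output elements

def output_registers(mr, nr, elem_bits=ELEM_BITS):
    """Number of zmm registers needed to hold an MR x NR output tile."""
    lanes = ZMM_BITS // elem_bits      # elements per register
    regs_per_row = -(-nr // lanes)     # ceiling division
    return mr * regs_per_row

# Tile shapes mentioned in the commit message.
for mr, nr in [(6, 64), (6, 16), (12, 16), (6, 32), (9, 32)]:
    used = output_registers(mr, nr)
    print(f"{mr}x{nr}: {used} output registers, {ZMM_COUNT - used} left over")
```

Under these assumptions the new 12x16 and 9x32 shapes use 12 and 18 output registers, so registers remain free for the B loads and A broadcasts, consistent with the note that the maximum number of registers is deliberately not used.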
For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:
<config>: <config>/<name>
means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
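
To make the mapping concrete, here is a minimal sketch of how such lines could be read. It is not the logic used by the BLIS configure script; it assumes each line has the form "name: item item ...", that each item is a sub-configuration optionally followed by "/"-separated kernel set names, and that "#" starts a comment. The entries famA, cfgX, cfgY and kernB are hypothetical and do not come from the real config_registry.

```python
def parse_registry(text):
    """Map each configuration name to a list of (sub_config, kernel_sets)."""
    registry = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line or ":" not in line:
            continue
        name, _, rhs = line.partition(":")
        entries = []
        for item in rhs.split():
            # "cfg/ks1/ks2" -> sub-configuration "cfg", kernel sets ["ks1", "ks2"]
            sub_config, *kernel_sets = item.split("/")
            entries.append((sub_config, kernel_sets))
        registry[name.strip()] = entries
    return registry

# Hypothetical registry text, for illustration only.
SAMPLE = """
# a family made of two sub-configurations
famA: cfgX cfgY
# cfgX borrows the kernel set kernB
cfgX: cfgX/kernB
cfgY: cfgY
"""

reg = parse_registry(SAMPLE)
for name, entries in reg.items():
    print(name, "->", entries)
```

In this sketch, the cfgX entry records that the kernB kernel set should be made available whenever cfgX is targeted, which is the borrowing relationship the / notation expresses.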