mirror of
https://github.com/amd/blis.git
synced 2026-05-24 18:34:40 +00:00
- In order to reuse the 24x8 AVX512 DGEMM SUP kernels, 24x8 triangular AVX512 DGEMMT SUP kernels are added.
- Since the LCM of MR (24) and NR (8) is 24, the diagonal pattern repeats every 24x24 block of C. To cover this 24x24 block, 3 kernels are needed for one variant of DGEMMT, so a total of 6 kernels are needed to cover both the upper and lower variants.
- In order to maximize code reuse, the 24x8 kernels are broken into two parts: an 8x8 diagonal GEMM and a 16x8 full GEMM. The 8x8 diagonal GEMM is computed by an 8x8 diagonal kernel, and the 16x8 full GEMM is computed by the 24x8 DGEMM SUP kernel.
- Changes are made in the framework to enable the use of these kernels.

AMD-Internal: [CPUPL-5338]
Change-Id: I8e7007031e906f786b0c4fe12377ee439075207a
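The tiling argument in the commit message can be checked with a few lines of Python. This is only an illustrative sketch, not the kernel code: the function name `classify_panels` and the labels `diag`, `full`, and `skip` are hypothetical, and the sketch assumes MR is a multiple of NR (true for 24 and 8). It splits each 24x8 column panel of one 24x24 block of C into 8x8 sub-blocks and labels each as lying on the diagonal, fully inside the stored triangle, or outside it.

```python
from math import gcd

MR, NR = 24, 8  # micro-tile dimensions quoted in the commit message


def classify_panels(uplo="lower"):
    """Label the 8x8 sub-blocks of each 24x8 panel in one diagonal period.

    Hypothetical helper for illustration; assumes MR % NR == 0.
    """
    period = MR * NR // gcd(MR, NR)  # LCM(MR, NR): the pattern repeats here
    panels = []
    for j0 in range(0, period, NR):      # column offset of each 24x8 panel
        blocks = []
        for i0 in range(0, MR, NR):      # row offset of each 8x8 sub-block
            if i0 == j0:
                kind = "diag"            # sub-block crossed by the diagonal
            elif (i0 > j0) == (uplo == "lower"):
                kind = "full"            # entirely inside the stored triangle
            else:
                kind = "skip"            # entirely outside the stored triangle
            blocks.append(kind)
        panels.append(blocks)
    return period, panels


for uplo in ("lower", "upper"):
    period, panels = classify_panels(uplo)
    print(uplo, period, panels)
```

Each variant yields three distinct panel shapes, which is consistent with the 3 kernels per variant (6 in total) stated above. In the lower variant, the first panel is exactly one 8x8 diagonal sub-block followed by a 16x8 full region, the split the commit message describes; how the actual kernels handle the remaining panels is not stated there.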
For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:
<config>: <config>/<name>
means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
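As a rough illustration of the mapping described above, the following Python sketch parses lines of the form `<config>: <config>/<name>` into a dictionary. It is not part of BLIS: the function `parse_registry`, the `#` comment handling, and the sample names `family`, `cfgA`, `cfgB`, and `kernB` are all assumptions made for this example, so consult the real config_registry file for the actual entries and syntax details.

```python
def parse_registry(text):
    """Parse registry-style lines into {name: [(config, [borrowed kernel sets])]}.

    Hypothetical helper; assumes '#' starts a comment and that entries on the
    right-hand side are separated by whitespace.
    """
    registry = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blank lines
        if not line:
            continue
        name, sep, rhs = line.partition(":")
        if not sep:
            continue                          # ignore lines without a mapping
        entries = []
        for item in rhs.split():
            # "<config>/<name>" means <config> also gets <name>'s kernel set
            config, *kernels = item.split("/")
            entries.append((config, kernels))
        registry[name.strip()] = entries
    return registry


# Made-up sample data, not taken from the real config_registry file.
SAMPLE = """
# a family that maps to two sub-configurations
family: cfgA cfgB
cfgA: cfgA/kernB   # cfgA borrows the kernel set of kernB
cfgB: cfgB
"""

REGISTRY = parse_registry(SAMPLE)
for name, entries in REGISTRY.items():
    print(name, entries)
```

Keeping the borrowed kernel sets in a separate list per entry makes it easy to see which configurations depend on kernels defined elsewhere.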