Mirror of https://github.com/amd/blis.git (synced 2026-05-24 18:34:40 +00:00)
Existing Design:
- The GEMM AVX2 kernel performs the computation and updates a temporary C buffer.
- A portion of the temporary C buffer is copied to the output C buffer based on the UPLO parameter.
- For diagonal blocks, using GEMM kernels is not efficient.

New Design: Implemented in the current patch when UPLO='L'
- The GEMMT kernel is used for computation; a temporary buffer is not required.
- Only the required elements are computed, using masked load/store for all fringe cases.
- Exception: the AVX2 code path is used when the storage format is RRC, CRR, or CRC.
- AOCL-Dynamic is added based on dimension.
- A check for the AVX platform is added in the SUP interface; it returns to the native implementation if the hardware does not support the AVX platform.
- SUP ref_var2m is expanded for the dcomplex datatype to avoid the condition check that exists for the double datatype.

AMD_Internal: [CPUPL-5006]
Change-Id: I3e21404b732b8f2df9cbdba394303752fdf36286
For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:
    <config>: <config>/<name>
means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
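
As a rough illustration of the notation described above, here is a minimal Python sketch that reads registry-style lines and records which kernel sets each configuration borrows. It is not the parser used by the BLIS configure script. The names famA, cfgX, and cfgY are hypothetical and are not taken from the real config_registry file, and the handling of '#' comments is an assumption made for this example.

```python
# Hypothetical registry text. famA, cfgX, and cfgY are made-up names used
# only to illustrate the "<config>: <config>/<name>" notation.
SAMPLE = """
# family: sub-configurations (comment syntax assumed for this sketch)
famA: cfgX cfgY
cfgX: cfgX/cfgY
cfgY: cfgY
"""


def parse_registry(text):
    """Return (configs, borrowed) dicts keyed by the name left of the colon.

    configs[name]  -> list of sub-configuration names on the right-hand side
    borrowed[name] -> list of kernel-set names that follow a '/'
    """
    configs, borrowed = {}, {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, rhs = line.partition(":")
        name = name.strip()
        for entry in rhs.split():
            # "cfgX/cfgY" -> sub-configuration "cfgX", borrowed kernels ["cfgY"]
            cfg, *kernels = entry.split("/")
            configs.setdefault(name, []).append(cfg)
            borrowed.setdefault(name, []).extend(kernels)
    return configs, borrowed


configs, borrowed = parse_registry(SAMPLE)
print(configs["famA"])   # sub-configurations that make up the family
print(borrowed["cfgX"])  # kernel sets made available to cfgX
```

In this sketch, the line "cfgX: cfgX/cfgY" is read as "when cfgX is targeted at configure-time, also make the cfgY kernel set available", which mirrors the rule stated above. A line with no '/' simply records no borrowed kernel sets.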