Mirror of https://github.com/amd/blis.git (synced 2026-05-11 09:39:59 +00:00)
- Kernel block size is 12x4.
- Updated the zen4 config to enable these kernels in the zen4 path.
- Tuned MC, KC, NC for better performance for m/n/k sizes > 500.
- Updated CMakeLists.txt with ZGEMM kernels for the Windows build.

Kernel supports:
1. Preload and prebroadcast of A and B.
2. Prefetch of the C matrix.
3. K loop is subdivided into multiple loops to maintain distance between C prefetches.
4. Special case when the imaginary component of alpha/beta is zero.
5. Row/column/general stride of matrix C.

AMD-Internal: [CPUPL-2998]
Change-Id: I62e3c352d475b1add3f43270805fbcee00e2e440
For more information on sub-configurations and configuration families in BLIS, please read the Configuration Guide, which can be viewed in markdown-rendered form from the BLIS wiki page.
If you don't have time, or are impatient, take a look at the config_registry
file in the top-level directory of the BLIS distribution. It contains a
grammar-like mapping of configuration names, or families, to sub-configurations,
which may be other families. Keep in mind that the / notation:
<config>: <config>/<name>
means that the kernel set associated with <name> should be made available to
the configuration <config> if <config> is targeted at configure-time.
(Some configurations borrow kernels from other configurations, and this is how
we specify that requirement.)
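
To make the notation concrete, here is a minimal Python sketch of how one such line could be read. It is not part of BLIS and is not how the configure script parses the file. The function name parse_registry_line and the names famx, cfga, cfgb, and kset are hypothetical, and the sketch assumes that '#' starts a comment and that items on the right-hand side are separated by whitespace.

```python
def parse_registry_line(line):
    """Split one registry-style line into (name, [(subconfig, [kernel_sets])]).

    Illustrative only: assumes '#' begins a comment and that items after
    the colon are whitespace-separated, each optionally followed by
    '/'-separated kernel set names.
    """
    line = line.split("#", 1)[0].strip()  # drop any trailing comment
    if not line:
        return None  # blank or comment-only line

    name, _, rhs = line.partition(":")
    entries = []
    for item in rhs.split():
        # The first field is the sub-configuration; any remaining fields
        # are kernel sets to make available to the configuration.
        subconfig, *kernel_sets = item.split("/")
        entries.append((subconfig, kernel_sets))
    return name.strip(), entries


# Hypothetical entries, not taken from the real config_registry file.
family = parse_registry_line("famx: cfga cfgb")
borrower = parse_registry_line("cfga: cfga/kset")
```

Here famx parses as a family with two sub-configurations and no extra kernel sets, while cfga parses as a configuration that also requests the kernel set named kset.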