mirror of
https://github.com/amd/blis.git
synced 2026-05-13 02:25:39 +00:00
Details:
- Reorganized the way kernels are stored within the cntx_t structure so
that rather than having a function pointer for every supported size of
unrolled packm kernel (2xk, 3xk, 4xk, etc.), we store only two packm
kernels per datatype: one to pack MRxk micropanels and one to pack
NRxk micropanels.
- NOTE: The "bb" (broadcast B) reference kernels have been merged into
the "standard" kernels (packm [including 1er and unpackm], gemm,
trsm, gemmtrsm). The replication factor is controlled by
BLIS_BB[MN]_[sdcz] etc. Power9/10 needs testing since only a
replication factor of 1 has been tested. armsve also needs testing
since the MR value isn't available as a macro.
- Simplified the bli_cntx_*() APIs to conform to the new unified kernel
array within the cntx_t. Updated existing bli_cntx_init_<subconfig>()
function definitions for all subconfigurations.
- Consolidated all kernel id types (e.g. l1vkr_t, l1mkr_t, l3ukr_t,
etc.) into one kernel id type: ukr_t.
- Various edits, updates, and rewrites of reference kernels pursuant to
the aforementioned changes.
- Defined compile-time macro constants (BLIS_MR_[sdcz], BLIS_NR_[sdcz],
and friends) in bli_kernel_macro_defs.h, but only when the macro
BLIS_IN_REF_KERNEL is defined by the build system.
- Loose ends:
  - Still need to update documentation to reflect changes made in this
    commit, including:
    - docs/ConfigurationHowTo.md
    - docs/KernelsHowTo.md
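
Illustration (not part of the commit): a minimal C sketch of the idea
behind the first and fourth items above, namely a single kernel id type
indexing one unified kernel array, with only two packm entries (MRxk and
NRxk) instead of one entry per unrolled panel size. Every name here
(toy_cntx_t, toy_ukr_t, TOY_MR, TOY_NR, and the toy_* functions) is
hypothetical and heavily simplified; none of them is the actual BLIS
definition or API, and the block sizes are arbitrary.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins; NOT the real BLIS types, ids, or block sizes. */
enum { TOY_MR = 4, TOY_NR = 2 };

typedef enum { TOY_FLOAT, TOY_DOUBLE, TOY_NUM_DT } toy_dt_t;

/* One kernel id type instead of separate id types per kernel family. */
typedef enum {
    TOY_PACKM_MRXK_KER,   /* packs MR x k micropanels */
    TOY_PACKM_NRXK_KER,   /* packs NR x k micropanels */
    TOY_NUM_UKRS
} toy_ukr_t;

/* Generic function pointer used for storage; cast back before calling. */
typedef void (*toy_void_fp)(void);

/* Assumed signature of a double-precision packing kernel:
   cdim = rows/columns actually present in the panel (<= MR or NR),
   inca = stride across the panel dimension, lda = stride along k. */
typedef void (*toy_packm_d_fp)(size_t cdim, size_t k, const double* a,
                               size_t inca, size_t lda, double* p);

/* Unified kernel array: one slot per (kernel id, datatype) pair. */
typedef struct {
    toy_void_fp ukrs[TOY_NUM_UKRS][TOY_NUM_DT];
} toy_cntx_t;

/* Shared helper: copy a cdim x k block into a cdim_max x k micropanel,
   zero-padding the unused part of the panel dimension. */
static void toy_pack_panel_d(size_t cdim_max, size_t cdim, size_t k,
                             const double* a, size_t inca, size_t lda,
                             double* p)
{
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < cdim_max; ++i)
            p[j * cdim_max + i] = (i < cdim) ? a[i * inca + j * lda] : 0.0;
}

static void toy_packm_mrxk_d(size_t cdim, size_t k, const double* a,
                             size_t inca, size_t lda, double* p)
{
    toy_pack_panel_d(TOY_MR, cdim, k, a, inca, lda, p);
}

static void toy_packm_nrxk_d(size_t cdim, size_t k, const double* a,
                             size_t inca, size_t lda, double* p)
{
    toy_pack_panel_d(TOY_NR, cdim, k, a, inca, lda, p);
}

static void toy_cntx_set_ukr(toy_cntx_t* c, toy_ukr_t id, toy_dt_t dt,
                             toy_void_fp fp)
{
    assert(id < TOY_NUM_UKRS && dt < TOY_NUM_DT);
    c->ukrs[id][dt] = fp;
}

static toy_void_fp toy_cntx_get_ukr(const toy_cntx_t* c, toy_ukr_t id,
                                    toy_dt_t dt)
{
    assert(id < TOY_NUM_UKRS && dt < TOY_NUM_DT);
    return c->ukrs[id][dt];
}

/* Register only the two double-precision packm kernels. */
static void toy_cntx_init(toy_cntx_t* c)
{
    toy_cntx_set_ukr(c, TOY_PACKM_MRXK_KER, TOY_DOUBLE,
                     (toy_void_fp)toy_packm_mrxk_d);
    toy_cntx_set_ukr(c, TOY_PACKM_NRXK_KER, TOY_DOUBLE,
                     (toy_void_fp)toy_packm_nrxk_d);
}
```

A caller would fetch a kernel with toy_cntx_get_ukr(), cast it back to
toy_packm_d_fp, and invoke it on a block with at most TOY_MR (or TOY_NR)
rows; the zero-padding is what lets one kernel per panel width replace
the per-size 2xk, 3xk, 4xk variants in this sketch.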