Mirror of https://github.com/amd/blis.git
Details:
- Added the ability for the kernel developer to mark the gemm micro-kernel as preferring to access the micro-tile of C via contiguous rows (as opposed to contiguous columns). This property may be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS, which may be defined or left undefined. Leaving it undefined leads to the default assumption of column preference.
- Changed the conditionals in frame/3/*/*_front.c that induce transposition of the operation so that transposition is induced only when the storage of C disagrees with the preference of the micro-kernel. Previously, the only condition was that C be row-stored; that is, we assumed the micro-kernel preferred column-contiguous access to C.
- Added a "prefers_contig_rows" property to func_t objects, and updated the calls to bli_func_obj_create() in the _cntl.c files to support the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because it is ineffective. Since BLIS only requires left-side gemmtrsm/trsm kernels, the right-side case of trsm flips the A and B micro-panel operands, so any transposition done at the high level is undone at the low level.
- Tweaked the trmm and trmm3 _front.c files to eliminate a possibly redundant invocation of the bli_obj_swap() macro.