mirror of
https://github.com/amd/blis.git
synced 2026-05-11 01:30:00 +00:00
Added optional row preference to ukernel config.
Details:
- Added the ability for the kernel developer to indicate the gemm micro-kernel as having a preference for accessing the micro-tile of C via contiguous rows (as opposed to contiguous columns). This property may be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS, which may be defined or left undefined. Leaving it undefined leads to the default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition of the operation so that the transposition is induced only if there is disagreement between the storage of C and the preference of the micro-kernel. Previously, the only conditional that needed to be met was that C was row-stored, which is to say that we assumed the micro-kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated calls to bli_func_obj_create() in _cntl.c files in order to support the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because it is actually ineffective. This is because the right-side case of trsm flips the A and B micro-panel operands (since BLIS only requires left-side gemmtrsm/trsm kernels), meaning any transposition done at the high level is then undone at the low level.
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant invocation of the bli_obj_swap() macro.
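The changed transposition conditional can be illustrated with a minimal sketch. This is not the code from frame/3/*/*_front.c: the type demo_storage_t and both demo_ functions are hypothetical names invented here, and general-stride storage of C is ignored for simplicity. It only shows the rule stated in the commit message, namely that transposition is induced when the storage of C disagrees with the micro-kernel preference.

```c
#include <stdbool.h>

/* Hypothetical stand-in for how C is stored (not a BLIS type). */
typedef enum { DEMO_COL_STORED, DEMO_ROW_STORED } demo_storage_t;

/* Old rule as described in the commit message: transpose whenever C is
   row-stored, which implicitly assumes a column-preferring micro-kernel. */
bool demo_should_transpose_old(demo_storage_t c_storage)
{
    return c_storage == DEMO_ROW_STORED;
}

/* New rule as described in the commit message: transpose only when the
   storage of C disagrees with the micro-kernel's stated preference. */
bool demo_should_transpose(demo_storage_t c_storage,
                           bool ukr_prefers_contig_rows)
{
    bool c_is_row_stored = (c_storage == DEMO_ROW_STORED);
    return c_is_row_stored != ukr_prefers_contig_rows;
}
```

For a column-preferring micro-kernel the two functions agree; they differ only when the micro-kernel prefers contiguous rows, which is the case this commit adds.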
@@ -36,16 +36,53 @@
#define BLIS_KERNEL_MACRO_DEFS_H


// -- Construct kernel function names ------------------------------------------
// -- Define row access bools --------------------------------------------------

// In this section we consider each datatype-specific "prefers contiguous rows"
// macro. If it is defined, we re-define it to be 1 (TRUE); otherwise, we
// define it to be 0 (FALSE).

// gemm micro-kernels

#ifdef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif

#ifdef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif

#ifdef BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif

#ifdef BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif


// -- Define default kernel names ----------------------------------------------

// In this section we consider each datatype-specific micro-kernel macro;
// if it is undefined, we define it to be the corresponding reference kernel.
// In the case of complex gemm micro-kernels, we also define special _VIA_4M
// macros so that later on we can tell whether or not to employ the 4m
// implementations. Note that in order to properly determine whether 4m is a
// viable option, we need to be able to test the existence of the real gemm
// micro-kernels, which means we must consider the complex gemm micro-kernel
// cases *BEFORE* the real cases.
// In the case of complex gemm micro-kernels, we also define special macros so
// that later on we can tell whether or not to employ the 4m implementations.
// Note that in order to properly determine whether/ 4m is a viable option, we
// need to be able to test the existence of the real gemm micro-kernels, which
// means we must consider the complex gemm micro-kernel cases *BEFORE* the
// real cases.

//
// Level-3
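The "row access bools" normalization in the hunk above can be exercised in isolation. The sketch below copies the #ifdef/#undef/#define pattern from the diff for two datatypes and assumes, for illustration only, that a configuration's bli_kernel.h defined the double-precision macro with no value and left the single-precision one undefined. The two demo_ functions are hypothetical helpers added here, not part of BLIS.

```c
/* Assumed stand-in for a line in a configuration's bli_kernel.h:
   the macro is defined with no value. The sgemm macro is left undefined. */
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS

/* Normalization pattern from the diff: defined becomes 1, undefined becomes 0. */
#ifdef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif

#ifdef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif

/* Hypothetical helpers: after normalization the macros are always defined,
   so they can be used as ordinary integer expressions. */
int demo_sgemm_prefers_rows(void)
{
    return BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS;
}

int demo_dgemm_prefers_rows(void)
{
    return BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS;
}
```

Because every macro ends up defined as 0 or 1, downstream code can pass the value along as a plain boolean instead of repeating #ifdef tests; this is presumably how the "prefers_contig_rows" property mentioned in the commit message gets its value, though the diff shown here does not include that code.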