Added optional row preference to ukernel config.

Details:
- Added the ability for the kernel developer to indicate that the gemm
  micro-kernel prefers to access the micro-tile of C via contiguous
  rows (as opposed to contiguous columns). This property may be encoded
  in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS, which may
  be defined or left undefined. Leaving it undefined leads to the
  default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition
  of the operation so that the transposition is induced only if there
  is disagreement between the storage of C and the preference of the
  micro-kernel. Previously, the only condition that needed to be met
  was that C was row-stored, which is to say that we assumed the
  micro-kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated
  calls to bli_func_obj_create() in _cntl.c files in order to support
  the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because
  it is ineffective: the right-side case of trsm swaps the A and B
  micro-panel operands (since BLIS only requires left-side
  gemmtrsm/trsm kernels), so any transposition done at the high level
  is undone at the low level.
- Tweaked the trmm and trmm3 _front.c files to eliminate a possibly
  redundant invocation of the bli_obj_swap() macro.
Field G. Van Zee
2014-08-19 15:49:19 -05:00
parent 4cc2b464f2
commit d0eec4bddd
28 changed files with 399 additions and 212 deletions


@@ -36,16 +36,53 @@
#define BLIS_KERNEL_MACRO_DEFS_H
// -- Construct kernel function names ------------------------------------------
// -- Define row access bools --------------------------------------------------
// In this section we consider each datatype-specific "prefers contiguous rows"
// macro. If it is defined, we re-define it to be 1 (TRUE); otherwise, we
// define it to be 0 (FALSE).
// gemm micro-kernels
#ifdef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif
#ifdef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif
#ifdef BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif
#ifdef BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#undef BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS 1
#else
#define BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS 0
#endif
// -- Define default kernel names ----------------------------------------------
// In this section we consider each datatype-specific micro-kernel macro;
// if it is undefined, we define it to be the corresponding reference kernel.
// In the case of complex gemm micro-kernels, we also define special _VIA_4M
// macros so that later on we can tell whether or not to employ the 4m
// implementations. Note that in order to properly determine whether 4m is a
// viable option, we need to be able to test the existence of the real gemm
// micro-kernels, which means we must consider the complex gemm micro-kernel
// cases *BEFORE* the real cases.
// In the case of complex gemm micro-kernels, we also define special macros so
// that later on we can tell whether or not to employ the 4m implementations.
// Note that in order to properly determine whether 4m is a viable option, we
// need to be able to test the existence of the real gemm micro-kernels, which
// means we must consider the complex gemm micro-kernel cases *BEFORE* the
// real cases.
//
// Level-3