mirror of
https://github.com/amd/blis.git
synced 2026-05-11 01:30:00 +00:00
Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context.
User-level object APIs were unaffected and continue to be
"context-free"; however, these APIs were duplicated/mirrored so that
"context-aware" APIs now also exist, differentiated with an "_ex" suffix
(for "expert"). These new context-aware object APIs (along with the
lower-level, type-aware, BLAS-like APIs) take the address of a context
as their last parameter, after all other operands. Contexts, or
specifically, cntx_t object pointers, are passed all the way down the
function stack into the kernels and allow the code at any level to
query information about the runtime, such as kernel addresses and
blocksizes, in a thread-friendly manner: that is, one that preserves
thread safety even if the original source of the information stored in
the context changes at run-time (see the next bullet for more on this
"original source" of information).
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
bli_pool_reinit() in the definition of bli_mem_pool_init(), and used
contexts where appropriate to communicate cache and register
blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
the context and/or the global kernel structure:
  - Removed blocksize object pointer (blksz_t*) fields from all control
    tree node definitions and replaced them with blocksize id (bszid_t)
    values, which may be passed into a context query routine in order to
    extract the corresponding blocksize from the given context.
  - Removed micro-kernel function pointer (func_t*) fields from all
    control tree node definitions. Now, any code that needs these
    function pointers can query them from the local context, as
    identified by a level-3 micro-kernel id (l3ukr_t), level-1f kernel
    id (l1fkr_t), or level-1v kernel id (l1vkr_t).
  - Removed blksz_t object creation and initialization, as well as kernel
    function object creation and initialization, from all
    operation-specific control tree initialization files (bli_*_cntl.c),
    since this information will now live in the gks and, secondarily, in
    the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
blocksize multiples for each blocksize id (bszid_t) in the context
object.
- Removed the bool_t's that were required when a func_t was initialized.
These bools were meant to allow one to track the micro-kernel's storage
preferences (by rows or columns). This preference is now tracked
separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
files. This reorganization affects the frame/0, 1, 1d, 1m, 1f, 2, 3, and
util directories; its most obvious effect is that BLIS now compiles
noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
in an attempt to reduce overhead for memory-bound operations. This
includes removal of default use of object-based variants for level-2
operations. Now, by default, level-2 operations will directly call a
low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
- Defined a bli_mbool.c API to create and query "multi-bools," that is,
sets of bool_t values (one for each floating-point datatype), in the
same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
new parameter, which may be set indirectly via the aforementioned
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
statically allocate memory in macro-kernels and the induced methods'
virtual kernels to be used as temporary space to hold a single
micro-tile. These values are now output by the testsuite. The default
value of BLIS_STACK_BUF_MAX_SIZE is computed as
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up the top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
context for test modules that need those values: axpyf, dotxf,
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (i.e., those used to inline double
loops for level-1m-like operations on small matrices) in
frame/include/level0 to use more obscure local variable names in an
effort to avoid variable shadowing. (Thanks to Devin Matthews for
pointing out these gcc warnings, which are only output when using
-Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
of scalm. The conj argument optionally allows implicit conjugation of
the scalar before it is used to populate the object.
- Deprecated all type-aware mixed-domain and mixed-precision APIs. Note
that this does not preclude supporting mixed types via the object APIs,
where doing so incurs no API code bloat at all.
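The context-passing convention in the first bullet can be illustrated with a minimal, self-contained sketch. Everything below is hypothetical (the demo_ prefix marks names that are not part of BLIS). A context struct carries a kernel address and a blocksize, and an "_ex" expert routine takes the context as its last parameter and forwards it to the kernel. The context-free routine simply initializes a local context and calls the expert routine.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical context type; the real cntx_t holds much more. */
typedef struct demo_cntx_s demo_cntx_t;

/* Kernel signature: as described in the commit, the context comes last. */
typedef void (*demo_axpyv_ker_ft)(size_t n, double alpha, const double *x,
                                  double *y, const demo_cntx_t *cntx);

struct demo_cntx_s
{
    demo_axpyv_ker_ft axpyv_ker; /* kernel address, queryable at any level */
    size_t            mc;        /* an example cache blocksize */
};

/* Reference kernel: y := y + alpha * x. */
static void demo_axpyv_ref(size_t n, double alpha, const double *x,
                           double *y, const demo_cntx_t *cntx)
{
    (void)cntx; /* this simple kernel needs nothing from the context */
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}

/* Fill a context from the (hypothetical) active configuration. */
static void demo_cntx_init(demo_cntx_t *cntx)
{
    cntx->axpyv_ker = demo_axpyv_ref;
    cntx->mc        = 96; /* assumed value, for illustration only */
}

/* Expert ("_ex") API: the caller supplies the context as the last argument. */
static void demo_axpyv_ex(size_t n, double alpha, const double *x,
                          double *y, const demo_cntx_t *cntx)
{
    assert(cntx != NULL && cntx->axpyv_ker != NULL);
    cntx->axpyv_ker(n, alpha, x, y, cntx); /* pass the context down */
}

/* Context-free API: mirrors the _ex API without the context parameter. */
static void demo_axpyv(size_t n, double alpha, const double *x, double *y)
{
    demo_cntx_t cntx;
    demo_cntx_init(&cntx);
    demo_axpyv_ex(n, alpha, x, y, &cntx);
}
```

Because the kernel reads what it needs from the context it was handed, a later change to the active configuration does not alter what an in-flight call sees.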
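The thread-safety argument in the gks bullet rests on each computation working from its own copy of the kernel information. The sketch below shows that idea under stated assumptions. It uses a hypothetical global structure (demo_gks_active) and hypothetical init/set functions, none of which are BLIS APIs. A context initialized before a "hot swap" keeps the old values. A real implementation would also need to synchronize access to the global structure, and this sketch omits locking.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical set of blocksizes for one kernel configuration. */
typedef struct
{
    size_t mr, nr;     /* register blocksizes */
    size_t mc, kc, nc; /* cache blocksizes */
} demo_blkszs_t;

/* Hypothetical global kernel structure: the currently active configuration. */
static demo_blkszs_t demo_gks_active = { 4, 4, 96, 256, 4096 };

/* A context holds a private snapshot of the configuration. */
typedef struct
{
    demo_blkszs_t blkszs;
} demo_snap_cntx_t;

/* Copy the active configuration into the caller's context. */
static void demo_snap_cntx_init(demo_snap_cntx_t *cntx)
{
    assert(cntx != NULL);
    cntx->blkszs = demo_gks_active;
}

/* "Hot swap" the active configuration; existing contexts are untouched.
   NOTE: no locking here; concurrent use would need synchronization. */
static void demo_gks_set(demo_blkszs_t b)
{
    demo_gks_active = b;
}
```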
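The described behavior of bli_pool_reinit_if() is to reinitialize only when the required block size has grown. The toy pool below sketches that behavior. The demo_pool_* names and the return convention are hypothetical and are not the BLIS API. Unlike a real allocator, this toy frees every block on reinitialization, so it assumes no blocks are checked out at that moment.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy pool: num_blocks blocks of block_size bytes each. */
typedef struct
{
    void  **blocks;
    size_t  num_blocks;
    size_t  block_size;
} demo_pool_t;

/* Release all blocks and reset the pool to an empty state. */
static void demo_pool_finalize(demo_pool_t *p)
{
    for (size_t i = 0; i < p->num_blocks; ++i)
        free(p->blocks[i]);
    free(p->blocks);
    p->blocks     = NULL;
    p->num_blocks = 0;
    p->block_size = 0;
}

/* Returns 0 on success, -1 on allocation failure (pool left empty). */
static int demo_pool_init(demo_pool_t *p, size_t num_blocks, size_t block_size)
{
    assert(p != NULL && num_blocks > 0 && block_size > 0);
    p->blocks     = calloc(num_blocks, sizeof *p->blocks);
    p->num_blocks = 0;
    p->block_size = block_size;
    if (p->blocks == NULL) { demo_pool_finalize(p); return -1; }
    for (size_t i = 0; i < num_blocks; ++i)
    {
        p->blocks[i] = malloc(block_size);
        if (p->blocks[i] == NULL) { demo_pool_finalize(p); return -1; }
        p->num_blocks = i + 1;
    }
    return 0;
}

/* Reinitialize only if the required block size exceeds the current one.
   Returns 1 if reinitialized, 0 if nothing was needed, -1 on failure.
   ASSUMPTION: no blocks are checked out (a real pool must not free them). */
static int demo_pool_reinit_if(demo_pool_t *p, size_t num_blocks,
                               size_t req_block_size)
{
    assert(p != NULL);
    if (req_block_size <= p->block_size)
        return 0;
    demo_pool_finalize(p);
    return demo_pool_init(p, num_blocks, req_block_size) == 0 ? 1 : -1;
}
```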
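Replacing blksz_t pointers with blocksize ids means that a control tree node stores only an id. The value, along with the blocksize it must be a multiple of, is then looked up in the context for the datatype at hand. The sketch below shows such a table-driven lookup. The enum names, the table layout, and the example values are all assumptions, not BLIS's definitions.

```c
#include <assert.h>

/* Hypothetical datatype and blocksize ids. */
typedef enum { DEMO_FLOAT, DEMO_DOUBLE, DEMO_SCOMPLEX, DEMO_DCOMPLEX,
               DEMO_NUM_DT } demo_dt_t;
typedef enum { DEMO_MR, DEMO_NR, DEMO_KC, DEMO_MC, DEMO_NC,
               DEMO_NUM_BSZ } demo_bszid_t;

typedef struct
{
    int          blksz[DEMO_NUM_BSZ][DEMO_NUM_DT]; /* value per id and datatype */
    demo_bszid_t bmult[DEMO_NUM_BSZ];              /* id of the blocksize each
                                                      one must be a multiple of */
} demo_bsz_cntx_t;

/* A control tree node stores only an id, not a blocksize object. */
typedef struct
{
    demo_bszid_t bszid;
} demo_cntl_node_t;

/* Look up the blocksize for a given datatype and id. */
static int demo_cntx_get_blksz(demo_dt_t dt, demo_bszid_t id,
                               const demo_bsz_cntx_t *cntx)
{
    assert(dt < DEMO_NUM_DT && id < DEMO_NUM_BSZ);
    return cntx->blksz[id][dt];
}

/* Look up the value that blocksize id must be a multiple of. */
static int demo_cntx_get_bmult(demo_dt_t dt, demo_bszid_t id,
                               const demo_bsz_cntx_t *cntx)
{
    assert(id < DEMO_NUM_BSZ);
    return demo_cntx_get_blksz(dt, cntx->bmult[id], cntx);
}
```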
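The level-2 bullet says that level-2 operations now directly call a low-level loop over a level-1v or -1f kernel. The sketch below shows that structure for gemv. All demo_ names are hypothetical, and for brevity it assumes alpha = beta = 1 and no transposition. The loop walks the columns of A in chunks of the fusing factor and hands each chunk to an axpyf-style kernel, which computes y += A(:, j:j+b-1) x(j:j+b-1).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical axpyf-style kernel: y := y + A(:,0:b-1) * x(0:b-1), where A is
   m x b, column-major with leading dimension lda. This reference version
   performs b separate axpy updates; an optimized kernel would fuse them. */
static void demo_daxpyf_ref(size_t m, size_t b, const double *a, size_t lda,
                            const double *x, double *y)
{
    for (size_t j = 0; j < b; ++j)
        for (size_t i = 0; i < m; ++i)
            y[i] += a[i + j * lda] * x[j];
}

/* Hypothetical gemv (y := y + A x, no transpose) as a direct loop over the
   level-1f kernel, with no object-based variant in between. */
static void demo_dgemv_n(size_t m, size_t n, const double *a, size_t lda,
                         const double *x, double *y, size_t fuse_fac)
{
    assert(fuse_fac > 0 && lda >= m);
    for (size_t j = 0; j < n; j += fuse_fac)
    {
        /* The final chunk may have fewer than fuse_fac columns. */
        size_t b = (n - j < fuse_fac) ? n - j : fuse_fac;
        demo_daxpyf_ref(m, b, a + j * lda, lda, x + j, y);
    }
}
```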
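The commit does not describe how a multi-bool is laid out internally. The sketch below is a hypothetical version that stores one flag per floating-point datatype. As an assumed example, the flag records whether a micro-kernel prefers row storage for that datatype. None of the demo_ names are BLIS APIs.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical datatype ids for this sketch. */
typedef enum { DEMO_MB_FLOAT, DEMO_MB_DOUBLE, DEMO_MB_SCOMPLEX,
               DEMO_MB_DCOMPLEX, DEMO_MB_NUM_DT } demo_mb_dt_t;

/* Hypothetical multi-bool: one bool per floating-point datatype. */
typedef struct
{
    bool v[DEMO_MB_NUM_DT];
} demo_mbool_t;

/* Set the flag for each of the four datatypes. */
static void demo_mbool_init(demo_mbool_t *b, bool s, bool d, bool c, bool z)
{
    b->v[DEMO_MB_FLOAT]    = s;
    b->v[DEMO_MB_DOUBLE]   = d;
    b->v[DEMO_MB_SCOMPLEX] = c;
    b->v[DEMO_MB_DCOMPLEX] = z;
}

/* Query the flag for one datatype. */
static bool demo_mbool_get_dt(demo_mb_dt_t dt, const demo_mbool_t *b)
{
    assert(dt < DEMO_MB_NUM_DT);
    return b->v[dt];
}
```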
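As a worked example of the default BLIS_STACK_BUF_MAX_SIZE, assume 16 SIMD registers of 32 bytes each, as with the ymm registers on x86-64 AVX2. The default is then 2 * 16 * 32 = 1024 bytes. A 6 x 8 double-precision micro-tile needs 6 * 8 * 8 = 384 bytes, so it fits. The macro names below come from the commit message, but the register values are assumptions, and demo_tile_fits() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed example values: 16 SIMD registers of 32 bytes each
   (as on x86-64 with AVX2). A configuration would set its own. */
#ifndef BLIS_SIMD_NUM_REGISTERS
#define BLIS_SIMD_NUM_REGISTERS 16
#endif
#ifndef BLIS_SIMD_SIZE
#define BLIS_SIMD_SIZE 32
#endif

/* Default described in the commit message. */
#ifndef BLIS_STACK_BUF_MAX_SIZE
#define BLIS_STACK_BUF_MAX_SIZE ( 2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE )
#endif

/* Hypothetical helper: does an mr x nr micro-tile of elem_size-byte
   elements fit in a buffer of BLIS_STACK_BUF_MAX_SIZE bytes? */
static int demo_tile_fits(size_t mr, size_t nr, size_t elem_size)
{
    assert(elem_size > 0);
    return mr * nr * elem_size <= (size_t)BLIS_STACK_BUF_MAX_SIZE;
}
```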
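The new conj argument to setm conjugates the scalar before it fills the object. The sketch below shows those semantics for a double-precision complex, column-major matrix. The routine, the complex struct, and the conj enum are all hypothetical simplifications, not the BLIS setm interface.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for this sketch. */
typedef struct { double real; double imag; } demo_dcomplex;
typedef enum { DEMO_NO_CONJUGATE, DEMO_CONJUGATE } demo_conj_t;

/* Hypothetical, simplified setm: set every element of an m x n
   column-major matrix a (leading dimension lda) to alpha, conjugating
   alpha first if conjalpha == DEMO_CONJUGATE. */
static void demo_zsetm(demo_conj_t conjalpha, size_t m, size_t n,
                       demo_dcomplex alpha, demo_dcomplex *a, size_t lda)
{
    assert(lda >= m);
    demo_dcomplex alpha_use = alpha;
    if (conjalpha == DEMO_CONJUGATE)
        alpha_use.imag = -alpha.imag; /* implicit conjugation of the scalar */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            a[i + j * lda] = alpha_use;
}
```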
@@ -519,10 +519,6 @@

 // axpy2v kernels

-//#ifndef AXPY2V_KERNEL
-//#define AXPY2V_KERNEL AXPY2V_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SAXPY2V_KERNEL
 #define BLIS_SAXPY2V_KERNEL BLIS_SAXPY2V_KERNEL_REF
 #endif
@@ -541,10 +537,6 @@

 // dotaxpyv kernels

-//#ifndef DOTAXPYV_KERNEL
-//#define DOTAXPYV_KERNEL DOTAXPYV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SDOTAXPYV_KERNEL
 #define BLIS_SDOTAXPYV_KERNEL BLIS_SDOTAXPYV_KERNEL_REF
 #endif
@@ -563,10 +555,6 @@

 // axpyf kernels

-//#ifndef AXPYF_KERNEL
-//#define AXPYF_KERNEL AXPYF_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SAXPYF_KERNEL
 #define BLIS_SAXPYF_KERNEL BLIS_SAXPYF_KERNEL_REF
 #endif
@@ -585,10 +573,6 @@

 // dotxf kernels

-//#ifndef DOTXF_KERNEL
-//#define DOTXF_KERNEL DOTXF_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SDOTXF_KERNEL
 #define BLIS_SDOTXF_KERNEL BLIS_SDOTXF_KERNEL_REF
 #endif
@@ -607,10 +591,6 @@

 // dotxaxpyf kernels

-//#ifndef DOTXAXPYF_KERNEL
-//#define DOTXAXPYF_KERNEL DOTXAXPYF_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SDOTXAXPYF_KERNEL
 #define BLIS_SDOTXAXPYF_KERNEL BLIS_SDOTXAXPYF_KERNEL_REF
 #endif
@@ -633,10 +613,6 @@

 // addv kernels

-//#ifndef ADDV_KERNEL
-//#define ADDV_KERNEL ADDV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SADDV_KERNEL
 #define BLIS_SADDV_KERNEL BLIS_SADDV_KERNEL_REF
 #endif
@@ -655,10 +631,6 @@

 // axpyv kernels

-//#ifndef AXPYV_KERNEL
-//#define AXPYV_KERNEL AXPYV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SAXPYV_KERNEL
 #define BLIS_SAXPYV_KERNEL BLIS_SAXPYV_KERNEL_REF
 #endif
@@ -677,10 +649,6 @@

 // copyv kernels

-//#ifndef COPYV_KERNEL
-//#define COPYV_KERNEL COPYV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SCOPYV_KERNEL
 #define BLIS_SCOPYV_KERNEL BLIS_SCOPYV_KERNEL_REF
 #endif
@@ -699,10 +667,6 @@

 // dotv kernels

-//#ifndef DOTV_KERNEL
-//#define DOTV_KERNEL DOTV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SDOTV_KERNEL
 #define BLIS_SDOTV_KERNEL BLIS_SDOTV_KERNEL_REF
 #endif
@@ -721,10 +685,6 @@

 // dotxv kernels

-//#ifndef DOTXV_KERNEL
-//#define DOTXV_KERNEL DOTXV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SDOTXV_KERNEL
 #define BLIS_SDOTXV_KERNEL BLIS_SDOTXV_KERNEL_REF
 #endif
@@ -743,10 +703,6 @@

 // invertv kernels

-//#ifndef INVERTV_KERNEL
-//#define INVERTV_KERNEL INVERTV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SINVERTV_KERNEL
 #define BLIS_SINVERTV_KERNEL BLIS_SINVERTV_KERNEL_REF
 #endif
@@ -765,10 +721,6 @@

 // scal2v kernels

-//#ifndef SCAL2V_KERNEL
-//#define SCAL2V_KERNEL SCAL2V_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SSCAL2V_KERNEL
 #define BLIS_SSCAL2V_KERNEL BLIS_SSCAL2V_KERNEL_REF
 #endif
@@ -787,10 +739,6 @@

 // scalv kernels

-//#ifndef SCALV_KERNEL
-//#define SCALV_KERNEL SCALV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SSCALV_KERNEL
 #define BLIS_SSCALV_KERNEL BLIS_SSCALV_KERNEL_REF
 #endif
@@ -809,10 +757,6 @@

 // setv kernels

-//#ifndef SETV_KERNEL
-//#define SETV_KERNEL SETV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SSETV_KERNEL
 #define BLIS_SSETV_KERNEL BLIS_SSETV_KERNEL_REF
 #endif
@@ -831,10 +775,6 @@

 // subv kernels

-//#ifndef SUBV_KERNEL
-//#define SUBV_KERNEL SUBV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SSUBV_KERNEL
 #define BLIS_SSUBV_KERNEL BLIS_SSUBV_KERNEL_REF
 #endif
@@ -853,10 +793,6 @@

 // swapv kernels

-//#ifndef SWAPV_KERNEL
-//#define SWAPV_KERNEL SWAPV_KERNEL_REF
-//#endif
-
 #ifndef BLIS_SSWAPV_KERNEL
 #define BLIS_SSWAPV_KERNEL BLIS_SSWAPV_KERNEL_REF
 #endif
@@ -1106,42 +1042,42 @@

 // NOTE: These values determine high-level cache blocking for level-2
 // operations ONLY. So, if gemv is performed with a 2000x2000 matrix A and
-// MC = NC = 1000, then a total of four unblocked (or unblocked fused)
+// M2 = N2 = 1000, then a total of four unblocked (or unblocked fused)
 // gemv subproblems are called. The blocked algorithms are only useful in
 // that they provide the opportunity for packing vectors. (Matrices can also
 // be packed here, but this tends to be much too expensive in practice to
 // actually employ.)

-#ifndef BLIS_DEFAULT_L2_MC_S
-#define BLIS_DEFAULT_L2_MC_S 1000
+#ifndef BLIS_DEFAULT_M2_S
+#define BLIS_DEFAULT_M2_S 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_NC_S
-#define BLIS_DEFAULT_L2_NC_S 1000
+#ifndef BLIS_DEFAULT_N2_S
+#define BLIS_DEFAULT_N2_S 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_MC_D
-#define BLIS_DEFAULT_L2_MC_D 1000
+#ifndef BLIS_DEFAULT_M2_D
+#define BLIS_DEFAULT_M2_D 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_NC_D
-#define BLIS_DEFAULT_L2_NC_D 1000
+#ifndef BLIS_DEFAULT_N2_D
+#define BLIS_DEFAULT_N2_D 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_MC_C
-#define BLIS_DEFAULT_L2_MC_C 1000
+#ifndef BLIS_DEFAULT_M2_C
+#define BLIS_DEFAULT_M2_C 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_NC_C
-#define BLIS_DEFAULT_L2_NC_C 1000
+#ifndef BLIS_DEFAULT_N2_C
+#define BLIS_DEFAULT_N2_C 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_MC_Z
-#define BLIS_DEFAULT_L2_MC_Z 1000
+#ifndef BLIS_DEFAULT_M2_Z
+#define BLIS_DEFAULT_M2_Z 1000
 #endif

-#ifndef BLIS_DEFAULT_L2_NC_Z
-#define BLIS_DEFAULT_L2_NC_Z 1000
+#ifndef BLIS_DEFAULT_N2_Z
+#define BLIS_DEFAULT_N2_Z 1000
 #endif

 //
@@ -1150,74 +1086,74 @@

 // Global level-1f fusing factors.

-#ifndef BLIS_L1F_FUSE_FAC_S
-#define BLIS_L1F_FUSE_FAC_S 8
+#ifndef BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_1F_S 8
 #endif

-#ifndef BLIS_L1F_FUSE_FAC_D
-#define BLIS_L1F_FUSE_FAC_D 4
+#ifndef BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_1F_D 4
 #endif

-#ifndef BLIS_L1F_FUSE_FAC_C
-#define BLIS_L1F_FUSE_FAC_C 4
+#ifndef BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_1F_C 4
 #endif

-#ifndef BLIS_L1F_FUSE_FAC_Z
-#define BLIS_L1F_FUSE_FAC_Z 2
+#ifndef BLIS_DEFAULT_1F_Z
+#define BLIS_DEFAULT_1F_Z 2
 #endif

 // axpyf

-#ifndef BLIS_AXPYF_FUSE_FAC_S
-#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
+#ifndef BLIS_DEFAULT_AF_S
+#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
 #endif

-#ifndef BLIS_AXPYF_FUSE_FAC_D
-#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
+#ifndef BLIS_DEFAULT_AF_D
+#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
 #endif

-#ifndef BLIS_AXPYF_FUSE_FAC_C
-#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
+#ifndef BLIS_DEFAULT_AF_C
+#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
 #endif

-#ifndef BLIS_AXPYF_FUSE_FAC_Z
-#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#ifndef BLIS_DEFAULT_AF_Z
+#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z
 #endif

 // dotxf

-#ifndef BLIS_DOTXF_FUSE_FAC_S
-#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
+#ifndef BLIS_DEFAULT_DF_S
+#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
 #endif

-#ifndef BLIS_DOTXF_FUSE_FAC_D
-#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
+#ifndef BLIS_DEFAULT_DF_D
+#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
 #endif

-#ifndef BLIS_DOTXF_FUSE_FAC_C
-#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
+#ifndef BLIS_DEFAULT_DF_C
+#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
 #endif

-#ifndef BLIS_DOTXF_FUSE_FAC_Z
-#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#ifndef BLIS_DEFAULT_DF_Z
+#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z
 #endif

 // dotxaxpyf

-#ifndef BLIS_DOTXAXPYF_FUSE_FAC_S
-#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
+#ifndef BLIS_DEFAULT_XF_S
+#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
 #endif

-#ifndef BLIS_DOTXAXPYF_FUSE_FAC_D
-#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
+#ifndef BLIS_DEFAULT_XF_D
+#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
 #endif

-#ifndef BLIS_DOTXAXPYF_FUSE_FAC_C
-#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
+#ifndef BLIS_DEFAULT_XF_C
+#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
 #endif

-#ifndef BLIS_DOTXAXPYF_FUSE_FAC_Z
-#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#ifndef BLIS_DEFAULT_XF_Z
+#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z
 #endif

 //
@@ -1228,20 +1164,20 @@
 // non-contiguous vectors. Similar to that of KR, they can
 // typically be set to 1.

-#ifndef BLIS_DEFAULT_VR_S
-#define BLIS_DEFAULT_VR_S 1
+#ifndef BLIS_DEFAULT_VF_S
+#define BLIS_DEFAULT_VF_S 1
 #endif

-#ifndef BLIS_DEFAULT_VR_D
-#define BLIS_DEFAULT_VR_D 1
+#ifndef BLIS_DEFAULT_VF_D
+#define BLIS_DEFAULT_VF_D 1
 #endif

-#ifndef BLIS_DEFAULT_VR_C
-#define BLIS_DEFAULT_VR_C 1
+#ifndef BLIS_DEFAULT_VF_C
+#define BLIS_DEFAULT_VF_C 1
 #endif

-#ifndef BLIS_DEFAULT_VR_Z
-#define BLIS_DEFAULT_VR_Z 1
+#ifndef BLIS_DEFAULT_VF_Z
+#define BLIS_DEFAULT_VF_Z 1
 #endif


@@ -1313,98 +1249,4 @@
 #endif


-// -- Abbreiviated kernel blocksize macros -------------------------------------
-
-// Here, we shorten the blocksizes defined in bli_kernel.h so that they can
-// derived via the PASTEMAC macro.
-
-// Default (minimum) cache blocksizes
-
-#define bli_smc BLIS_DEFAULT_MC_S
-#define bli_skc BLIS_DEFAULT_KC_S
-#define bli_snc BLIS_DEFAULT_NC_S
-
-#define bli_dmc BLIS_DEFAULT_MC_D
-#define bli_dkc BLIS_DEFAULT_KC_D
-#define bli_dnc BLIS_DEFAULT_NC_D
-
-#define bli_cmc BLIS_DEFAULT_MC_C
-#define bli_ckc BLIS_DEFAULT_KC_C
-#define bli_cnc BLIS_DEFAULT_NC_C
-
-#define bli_zmc BLIS_DEFAULT_MC_Z
-#define bli_zkc BLIS_DEFAULT_KC_Z
-#define bli_znc BLIS_DEFAULT_NC_Z
-
-// Register blocksizes
-
-#define bli_smr BLIS_DEFAULT_MR_S
-#define bli_skr BLIS_DEFAULT_KR_S
-#define bli_snr BLIS_DEFAULT_NR_S
-
-#define bli_dmr BLIS_DEFAULT_MR_D
-#define bli_dkr BLIS_DEFAULT_KR_D
-#define bli_dnr BLIS_DEFAULT_NR_D
-
-#define bli_cmr BLIS_DEFAULT_MR_C
-#define bli_ckr BLIS_DEFAULT_KR_C
-#define bli_cnr BLIS_DEFAULT_NR_C
-
-#define bli_zmr BLIS_DEFAULT_MR_Z
-#define bli_zkr BLIS_DEFAULT_KR_Z
-#define bli_znr BLIS_DEFAULT_NR_Z
-
-// Extended (maximum) cache blocksizes
-
-#define bli_smaxmc BLIS_MAXIMUM_MC_S
-#define bli_smaxkc BLIS_MAXIMUM_KC_S
-#define bli_smaxnc BLIS_MAXIMUM_NC_S
-
-#define bli_dmaxmc BLIS_MAXIMUM_MC_D
-#define bli_dmaxkc BLIS_MAXIMUM_KC_D
-#define bli_dmaxnc BLIS_MAXIMUM_NC_D
-
-#define bli_cmaxmc BLIS_MAXIMUM_MC_C
-#define bli_cmaxkc BLIS_MAXIMUM_KC_C
-#define bli_cmaxnc BLIS_MAXIMUM_NC_C
-
-#define bli_zmaxmc BLIS_MAXIMUM_MC_Z
-#define bli_zmaxkc BLIS_MAXIMUM_KC_Z
-#define bli_zmaxnc BLIS_MAXIMUM_NC_Z
-
-// Extended (packing) register blocksizes
-
-#define bli_spackmr BLIS_PACKDIM_MR_S
-#define bli_spackkr BLIS_PACKDIM_KR_S
-#define bli_spacknr BLIS_PACKDIM_NR_S
-
-#define bli_dpackmr BLIS_PACKDIM_MR_D
-#define bli_dpackkr BLIS_PACKDIM_KR_D
-#define bli_dpacknr BLIS_PACKDIM_NR_D
-
-#define bli_cpackmr BLIS_PACKDIM_MR_C
-#define bli_cpackkr BLIS_PACKDIM_KR_C
-#define bli_cpacknr BLIS_PACKDIM_NR_C
-
-#define bli_zpackmr BLIS_PACKDIM_MR_Z
-#define bli_zpackkr BLIS_PACKDIM_KR_Z
-#define bli_zpacknr BLIS_PACKDIM_NR_Z
-
-// Level-1f fusing factors
-
-#define bli_saxpyf_fusefac BLIS_AXPYF_FUSE_FAC_S
-#define bli_daxpyf_fusefac BLIS_AXPYF_FUSE_FAC_D
-#define bli_caxpyf_fusefac BLIS_AXPYF_FUSE_FAC_C
-#define bli_zaxpyf_fusefac BLIS_AXPYF_FUSE_FAC_Z
-
-#define bli_sdotxf_fusefac BLIS_DOTXF_FUSE_FAC_S
-#define bli_ddotxf_fusefac BLIS_DOTXF_FUSE_FAC_D
-#define bli_cdotxf_fusefac BLIS_DOTXF_FUSE_FAC_C
-#define bli_zdotxf_fusefac BLIS_DOTXF_FUSE_FAC_Z
-
-#define bli_sdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_S
-#define bli_ddotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_D
-#define bli_cdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_C
-#define bli_zdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_Z
-
 #endif
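The NOTE in the level-2 blocking hunk says that with M2 = N2 = 1000, a 2000x2000 gemv is split into four unblocked subproblems, since ceil(2000/1000) * ceil(2000/1000) = 2 * 2 = 4. The sketch below makes that partitioning explicit with a hypothetical blocked gemv (y := y + A x, column-major, no packing). The demo_ names are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Number of blocked subproblems: ceil(m/m2) * ceil(n/n2). */
static size_t demo_num_subproblems(size_t m, size_t n, size_t m2, size_t n2)
{
    assert(m2 > 0 && n2 > 0);
    return ((m + m2 - 1) / m2) * ((n + n2 - 1) / n2);
}

/* Unblocked y := y + A x for an mb x nb column-major block. */
static void demo_gemv_unb(size_t mb, size_t nb, const double *a, size_t lda,
                          const double *x, double *y)
{
    for (size_t j = 0; j < nb; ++j)
        for (size_t i = 0; i < mb; ++i)
            y[i] += a[i + j * lda] * x[j];
}

/* Blocked y := y + A x using blocks of at most m2 x n2; returns the
   number of unblocked subproblems invoked. */
static size_t demo_gemv_blocked(size_t m, size_t n, const double *a,
                                size_t lda, const double *x, double *y,
                                size_t m2, size_t n2)
{
    size_t calls = 0;
    assert(m2 > 0 && n2 > 0 && lda >= m);
    for (size_t i0 = 0; i0 < m; i0 += m2)
    {
        size_t mb = (m - i0 < m2) ? m - i0 : m2; /* edge block may be smaller */
        for (size_t j0 = 0; j0 < n; j0 += n2)
        {
            size_t nb = (n - j0 < n2) ? n - j0 : n2;
            demo_gemv_unb(mb, nb, a + i0 + j0 * lda, lda, x + j0, y + i0);
            ++calls;
        }
    }
    return calls;
}
```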
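The renamed level-1f macros keep a two-level default chain. Each per-operation fusing factor (BLIS_DEFAULT_AF_*, BLIS_DEFAULT_DF_*, BLIS_DEFAULT_XF_*) falls back to the shared level-1f value (BLIS_DEFAULT_1F_*) unless a configuration defines it first. The snippet below reproduces the double-precision chain from the diff. As an assumed example, it overrides only the axpyf factor. The demo_fuse_fac_d() helper is hypothetical.

```c
#include <assert.h>

/* Assumed example override, as a kernel configuration header might
   provide before the defaults are processed. */
#define BLIS_DEFAULT_AF_D 8

/* Defaults, in the style of the new macros in the diff. */
#ifndef BLIS_DEFAULT_1F_D
#define BLIS_DEFAULT_1F_D 4
#endif

#ifndef BLIS_DEFAULT_AF_D
#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
#endif

#ifndef BLIS_DEFAULT_DF_D
#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
#endif

#ifndef BLIS_DEFAULT_XF_D
#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
#endif

/* Hypothetical helper: fusing factor for 'a' (axpyf), 'd' (dotxf),
   or 'x' (dotxaxpyf) in double precision. */
static int demo_fuse_fac_d(char op)
{
    switch (op)
    {
        case 'a': return BLIS_DEFAULT_AF_D;
        case 'd': return BLIS_DEFAULT_DF_D;
        case 'x': return BLIS_DEFAULT_XF_D;
        default:  assert(0 && "unknown level-1f operation"); return 0;
    }
}
```

Only the overridden value changes. The dotxf and dotxaxpyf factors still inherit the shared level-1f default.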