Mirror of https://github.com/amd/blis.git (synced 2026-03-24 03:07:22 +00:00)
Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context. User-
level object APIs were unaffected and continue to be "context-free";
however, these APIs were duplicated/mirrored so that "context-aware"
APIs now also exist, differentiated with an "_ex" suffix (for "expert").
These new context-aware object APIs (along with the lower-level, type-
aware, BLAS-like APIs) take the address of a context as a last
parameter, after all other operands. Contexts, or specifically, cntx_t
object pointers, are passed all the way down the function stack into
the kernels and allow the code at any level to query information about
the runtime, such as kernel addresses and blocksizes, in a thread-
friendly manner (that is, one that allows thread-safety, even if the
original source of the information stored in the context changes at
run-time; see next bullet for more on this "original source" of info).
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
usage of contexts where appropriate to communicate cache and register
blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
the context and/or the global kernel structure:
  - Removed blocksize object pointer (blksz_t*) fields from all control
    tree node definitions and replaced them with blocksize id (bszid_t)
    values instead, which may be passed into a context query routine in
    order to extract the corresponding blocksize from the given context.
  - Removed micro-kernel function pointer (func_t*) fields from all
    control tree node definitions. Now, any code that needs these function
    pointers can query them from the local context, as identified by a
    level-3 micro-kernel id (l3ukr_t), level-1f kernel id (l1fkr_t), or
    level-1v kernel id (l1vkr_t).
  - Removed blksz_t object creation and initialization, as well as kernel
    function object creation and initialization, from all operation-
    specific control tree initialization files (bli_*_cntl.c), since this
    information will now live in the gks and, secondarily, in the context.
  - Removed blocksize multiples from blksz_t objects. Now, we track
    blocksize multiples for each blocksize id (bszid_t) in the context
    object.
  - Removed the bool_t's that were required when a func_t was initialized.
    These bools were meant to allow one to track the micro-kernel's storage
    preferences (by rows or columns). This preference is now tracked
    separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
util directories, but has the most obvious effect of allowing BLIS
to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
in an attempt to reduce overhead for memory-bound operations. This
includes removal of default use of object-based variants for level-2
operations. Now, by default, level-2 operations will directly call a
low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
heterogeneous bool_t's (one for each floating-point datatype), in the
same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
new parameter, which may be set indirectly via the aforementioned
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
statically allocate memory in macro-kernels and the induced methods'
virtual kernels to be used as temporary space to hold a single
micro-tile. These values are now output by the testsuite. The default
value of BLIS_STACK_BUF_MAX_SIZE is computed as
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
context for test modules that need those values: axpyf, dotxf,
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (i.e., those used to inline double loops
for level-1m-like operations on small matrices) in frame/include/level0
to use more obscure local variable names in an effort to avoid variable
shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
of scalm. The semantic meaning of the conj argument is to optionally
allow implicit conjugation of the scalar prior to being populated into
the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
that this does not preclude supporting mixed types via the object APIs,
where it produces absolutely zero API code bloat.
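
The context and gks bullets above can be illustrated with a minimal sketch. Everything prefixed with sk_ is hypothetical and is not the BLIS API. The sketch assumes a context is a by-value copy of the active configuration, which is one simple way to get the "finish in a deterministic fashion" behavior described above. The blocksize values are taken from the defines in this diff purely for illustration, and no locking of the stand-in gks is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocksize ids, in the spirit of bszid_t. */
typedef enum { SK_MC, SK_KC, SK_NC, SK_MR, SK_NR, SK_NUM_BSZ } sk_bszid_t;

/* Hypothetical context: a private, by-value copy of runtime parameters. */
typedef struct { size_t blksz[SK_NUM_BSZ]; } sk_cntx_t;

/* Stand-in for the global kernel structure (no locking shown). */
static sk_cntx_t sk_gks;

/* Replace the active configuration held by the stand-in gks. */
void sk_gks_set(const sk_cntx_t *cfg) { sk_gks = *cfg; }

/* Initialize a context by copying the currently active configuration. */
void sk_gks_cntx_init(sk_cntx_t *cntx) { *cntx = sk_gks; }

/* Query a blocksize by id from a context rather than from global state. */
size_t sk_cntx_get_blksz(sk_bszid_t id, const sk_cntx_t *cntx)
{
    assert((int)id >= 0 && (int)id < (int)SK_NUM_BSZ);
    return cntx->blksz[id];
}

/* A type-aware routine takes the context last, after all other operands. */
void sk_daxpyv(size_t n, const double *alpha,
               const double *x, size_t incx,
               double *y, size_t incy,
               const sk_cntx_t *cntx)
{
    (void)cntx; /* unused in this sketch; a real kernel could query it */
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += (*alpha) * x[i * incx];
}
```

A caller would fill an sk_cntx_t with blocksizes (say MC=1080, KC=120, NC=8400, MR=4, NR=6), pass it to sk_gks_set(), and then call sk_gks_cntx_init() before starting a computation. A later sk_gks_set() with different values does not disturb contexts that were initialized earlier.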
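
The commit message describes bli_pool_reinit_if() only as reinitializing a memory pool if the necessary pool block size has increased. A rough sketch of that check follows. The sk_ names, the struct layout, and the generation counter are assumptions made for illustration; the sketch does bookkeeping only and allocates nothing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical pool descriptor; bookkeeping only, no real allocation. */
typedef struct {
    size_t   num_blocks;
    size_t   block_size;  /* bytes per block */
    unsigned generation;  /* bumped on every (re)initialization */
} sk_pool_t;

void sk_pool_init(sk_pool_t *pool, size_t num_blocks, size_t block_size)
{
    pool->num_blocks = num_blocks;
    pool->block_size = block_size;
    pool->generation = 1;
}

/* Unconditional reinitialization: adopt the new geometry. */
void sk_pool_reinit(sk_pool_t *pool, size_t num_blocks, size_t block_size)
{
    pool->num_blocks = num_blocks;
    pool->block_size = block_size;
    pool->generation += 1;
}

/* Reinitialize only if the required block size has grown.
   Returns true if a reinitialization took place. */
bool sk_pool_reinit_if(sk_pool_t *pool, size_t num_blocks, size_t block_size)
{
    assert(pool != NULL);
    if (block_size > pool->block_size) {
        sk_pool_reinit(pool, num_blocks, block_size);
        return true;
    }
    return false;
}
```

Under these assumptions, a request for an equal or smaller block size leaves the pool untouched, so only a switch to kernels with larger blocksizes pays the reinitialization cost.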
@@ -144,25 +144,25 @@

 // -- Default fusing factors for level-1f operations --

-#define BLIS_L1F_FUSE_FAC_S 8
-#define BLIS_L1F_FUSE_FAC_D 8
-#define BLIS_L1F_FUSE_FAC_C 4
-#define BLIS_L1F_FUSE_FAC_Z 2
+#define BLIS_DEFAULT_1F_S 8
+#define BLIS_DEFAULT_1F_D 8
+#define BLIS_DEFAULT_1F_C 4
+#define BLIS_DEFAULT_1F_Z 2

-#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z

-#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z

-#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z



@@ -173,8 +173,8 @@

 #include "bli_gemm_8x8.h"

-#define BLIS_DGEMM_UKERNEL bli_dgemm_8x8
-#define BLIS_ZGEMM_UKERNEL bli_zgemm_8x8
+#define BLIS_DGEMM_UKERNEL bli_dgemm_int_8x8
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_int_8x8

 // -- trsm-related --

@@ -51,87 +51,6 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-// #define BLIS_DEFAULT_MC_S 128
-// #define BLIS_DEFAULT_KC_S 384
-// #define BLIS_DEFAULT_NC_S 4096
-
-#define BLIS_DEFAULT_MC_D 1080
-#define BLIS_DEFAULT_KC_D 120
-#define BLIS_DEFAULT_NC_D 8400
-
-// #define BLIS_DEFAULT_MC_C 128
-// #define BLIS_DEFAULT_KC_C 256
-// #define BLIS_DEFAULT_NC_C 4096
-//
-// #define BLIS_DEFAULT_MC_Z 64
-// #define BLIS_DEFAULT_KC_Z 256
-// #define BLIS_DEFAULT_NC_Z 2048
-
-// -- Register blocksizes --
-
-// #define BLIS_DEFAULT_MR_S 8
-// #define BLIS_DEFAULT_NR_S 8
-
-#define BLIS_DEFAULT_MR_D 4
-#define BLIS_DEFAULT_NR_D 6
-
-// #define BLIS_DEFAULT_MR_C 8
-// #define BLIS_DEFAULT_NR_C 4
-//
-// #define BLIS_DEFAULT_MR_Z 8
-// #define BLIS_DEFAULT_NR_Z 4
-
-// NOTE: If the micro-kernel, which is typically unrolled to a factor
-// of f, handles leftover edge cases (ie: when k % f > 0) then these
-// register blocksizes in the k dimension can be defined to 1.
-
-//#define BLIS_DEFAULT_KR_S 1
-//#define BLIS_DEFAULT_KR_D 1
-//#define BLIS_DEFAULT_KR_C 1
-//#define BLIS_DEFAULT_KR_Z 1
-
-// -- Maximum cache blocksizes (for optimizing edge cases) --
-
-// NOTE: These cache blocksize "extensions" have the same constraints as
-// the corresponding default blocksizes above. When these values are
-// larger than the default blocksizes, blocksizes used at edge cases are
-// enlarged if such an extension would encompass the remaining portion of
-// the matrix dimension.
-
-//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
-//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
-//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
-
-//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
-//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
-//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
-
-//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
-//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
-//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
-
-//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
-//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
-//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
-
-// -- Packing register blocksize (for packed micro-panels) --
-
-// NOTE: These register blocksize "extensions" determine whether the
-// leading dimensions used within the packed micro-panels are equal to
-// or greater than their corresponding register blocksizes above.
-
-//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
-//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
-
-//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
-//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
-
-//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
-//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
-
-//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
-//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
-



@@ -149,23 +68,28 @@

 // -- gemm --

-#define BLIS_SGEMM_UKERNEL bli_sgemm_8x8_FMA4
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x8_fma4
 #define BLIS_DEFAULT_MC_S 128
 #define BLIS_DEFAULT_KC_S 384
 #define BLIS_DEFAULT_NC_S 4096
 #define BLIS_DEFAULT_MR_S 8
 #define BLIS_DEFAULT_NR_S 8

-#define BLIS_DGEMM_UKERNEL bli_dgemm_4x6_FMA4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x6_fma4
+#define BLIS_DEFAULT_MC_D 1080
+#define BLIS_DEFAULT_KC_D 120
+#define BLIS_DEFAULT_NC_D 8400
+#define BLIS_DEFAULT_MR_D 4
+#define BLIS_DEFAULT_NR_D 6

-#define BLIS_CGEMM_UKERNEL bli_cgemm_8x4_FMA4
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4_fma4
 #define BLIS_DEFAULT_MC_C 96
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 4096
 #define BLIS_DEFAULT_MR_C 8
 #define BLIS_DEFAULT_NR_C 4

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_4x4_FMA4
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4_fma4
 #define BLIS_DEFAULT_MC_Z 64
 #define BLIS_DEFAULT_KC_Z 192
 #define BLIS_DEFAULT_NC_Z 4096
@@ -51,28 +51,28 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
 #define BLIS_DEFAULT_MC_S 528
 #define BLIS_DEFAULT_KC_S 256
 #define BLIS_DEFAULT_NC_S 8400
 #define BLIS_DEFAULT_MR_S 16
 #define BLIS_DEFAULT_NR_S 3

-#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
 #define BLIS_DEFAULT_MC_D 264
 #define BLIS_DEFAULT_KC_D 256
 #define BLIS_DEFAULT_NC_D 8400
 #define BLIS_DEFAULT_MR_D 8
 #define BLIS_DEFAULT_NR_D 3

-#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
 #define BLIS_DEFAULT_MC_C 264
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 8400
 #define BLIS_DEFAULT_MR_C 4
 #define BLIS_DEFAULT_NR_C 2

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
 #define BLIS_DEFAULT_MC_Z 100
 #define BLIS_DEFAULT_KC_Z 320
 #define BLIS_DEFAULT_NC_Z 8400
@@ -1 +1 @@
-../../kernels/arm/neon
+../../kernels/arm
@@ -1 +1 @@
-../../kernels/arm/neon
+../../kernels/arm
@@ -67,26 +67,6 @@
 //#define BLIS_DEFAULT_KC_Z 384
 //#define BLIS_DEFAULT_NC_Z 4096

-// NOTE: If 4m blocksizes are not defined here, they will be determined
-// from the corresponding real domain blocksizes.
-#define BLIS_DEFAULT_4M_MC_C 384
-#define BLIS_DEFAULT_4M_KC_C 512
-#define BLIS_DEFAULT_4M_NC_C 4096
-
-#define BLIS_DEFAULT_4M_MC_Z 192
-#define BLIS_DEFAULT_4M_KC_Z 256
-#define BLIS_DEFAULT_4M_NC_Z 4096
-
-// NOTE: If 3m blocksizes are not defined here, they will be determined
-// from the corresponding real domain blocksizes.
-#define BLIS_DEFAULT_3M_MC_C 384
-#define BLIS_DEFAULT_3M_KC_C 512
-#define BLIS_DEFAULT_3M_NC_C 4096
-
-#define BLIS_DEFAULT_3M_MC_Z 192
-#define BLIS_DEFAULT_3M_KC_Z 256
-#define BLIS_DEFAULT_3M_NC_Z 4096
-
 // -- Register blocksizes --

 #define BLIS_DEFAULT_MR_S 8
@@ -101,56 +81,6 @@
 #define BLIS_DEFAULT_MR_Z 2
 #define BLIS_DEFAULT_NR_Z 2

-// NOTE: If the micro-kernel, which is typically unrolled to a factor
-// of f, handles leftover edge cases (ie: when k % f > 0) then these
-// register blocksizes in the k dimension can be defined to 1.
-
-//#define BLIS_DEFAULT_KR_S 1
-//#define BLIS_DEFAULT_KR_D 1
-//#define BLIS_DEFAULT_KR_C 1
-//#define BLIS_DEFAULT_KR_Z 1
-
-// -- Maximum cache blocksizes (for optimizing edge cases) --
-
-// NOTE: These cache blocksize "extensions" have the same constraints as
-// the corresponding default blocksizes above. When these values are
-// larger than the default blocksizes, blocksizes used at edge cases are
-// enlarged if such an extension would encompass the remaining portion of
-// the matrix dimension.
-
-//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
-//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
-//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
-
-//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
-//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
-//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
-
-//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
-//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
-//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
-
-//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
-//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
-//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
-
-// -- Packing register blocksize (for packed micro-panels) --
-
-// NOTE: These register blocksize "extensions" determine whether the
-// leading dimensions used within the packed micro-panels are equal to
-// or greater than their corresponding register blocksizes above.
-
-//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
-//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
-
-//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
-//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
-
-//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
-//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
-
-//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
-//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)



@@ -169,13 +99,13 @@

 // -- gemm --

-#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_8x4
-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x4

 // -- trsm-related --

-#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_opt_4x4
-#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_opt_4x4
+#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_asm_4x4
+#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_asm_4x4



@@ -184,23 +114,23 @@

 // -- axpy2v --

-#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_opt_var1
+#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_int_var1

 // -- dotaxpyv --

-#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_opt_var1
+#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_int_var1

 // -- axpyf --

-#define BLIS_DAXPYF_KERNEL bli_daxpyf_opt_var1
+#define BLIS_DAXPYF_KERNEL bli_daxpyf_int_var1

 // -- dotxf --

-#define BLIS_DDOTXF_KERNEL bli_ddotxf_opt_var1
+#define BLIS_DDOTXF_KERNEL bli_ddotxf_int_var1

 // -- dotxaxpyf --

-#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_opt_var1
+#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_int_var1



@@ -1 +1 @@
-../../kernels/x86_64/core2-sse3
+../../kernels/x86_64/penryn
@@ -89,21 +89,6 @@

 #endif

-/*
-#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4
-#define BLIS_DEFAULT_MC_C 96
-#define BLIS_DEFAULT_KC_C 256
-#define BLIS_DEFAULT_NC_C 4096
-#define BLIS_DEFAULT_MR_C 8
-#define BLIS_DEFAULT_NR_C 4
-
-#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4
-#define BLIS_DEFAULT_MC_Z 64
-#define BLIS_DEFAULT_KC_Z 192
-#define BLIS_DEFAULT_NC_Z 4096
-#define BLIS_DEFAULT_MR_Z 4
-#define BLIS_DEFAULT_NR_Z 4
-*/



@@ -1 +1 @@
-../../kernels/x86_64/avx2
+../../kernels/x86_64/haswell
@@ -149,7 +149,7 @@

 // -- gemm --

-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_d4x4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4

 // -- trsm-related --

@@ -42,6 +42,9 @@

 #define BLIS_SIMD_ALIGN_SIZE 32

+#define BLIS_SIMD_SIZE 64
+#define BLIS_SIMD_NUM_REGISTERS 32
+


 #endif
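
With the two values added in the hunk above (BLIS_SIMD_SIZE 64 and BLIS_SIMD_NUM_REGISTERS 32), the default formula from the commit message gives 2 * 32 * 64 = 4096. The sketch below reproduces that arithmetic and checks whether a micro-tile fits in a buffer of that size. It assumes BLIS_SIMD_SIZE is a size in bytes, which the commit message does not state; the sk_ helper names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Default from the commit message:
   2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE. */
size_t sk_stack_buf_max_size(size_t simd_num_registers, size_t simd_size)
{
    return 2 * simd_num_registers * simd_size;
}

/* Does an mr x nr micro-tile of elem_size-byte elements fit in
   a temporary buffer of buf_size bytes? */
bool sk_utile_fits(size_t mr, size_t nr, size_t elem_size, size_t buf_size)
{
    assert(elem_size > 0);
    return mr * nr * elem_size <= buf_size;
}
```

Under the byte assumption, a 30x8 tile of 8-byte elements needs 1920 bytes and fits in 4096, while a 30x16 tile of 16-byte elements needs 7680 bytes and does not. The second case is presumably why the commit message allows the value to be set directly.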
@@ -153,8 +153,8 @@

 #define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS

-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_30x8
-#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_30x16
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_30x16
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_30x8

 // -- trsm-related --

@@ -51,7 +51,7 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
 #define BLIS_DEFAULT_MC_S 2016
 #define BLIS_DEFAULT_KC_S 128
 #define BLIS_DEFAULT_NC_S 8400
@@ -59,7 +59,7 @@
 #define BLIS_DEFAULT_NR_S 3
 //#define BLIS_UPANEL_B_ALIGN_SIZE_S 4096

-#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
 //#define BLIS_DEFAULT_MC_D 768
 //#define BLIS_DEFAULT_KC_D 168
 #define BLIS_DEFAULT_MC_D 1008
@@ -69,14 +69,14 @@
 #define BLIS_DEFAULT_NR_D 3
 //#define BLIS_UPANEL_B_ALIGN_SIZE_D 4096

-#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
 #define BLIS_DEFAULT_MC_C 512
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 8400
 #define BLIS_DEFAULT_MR_C 4
 #define BLIS_DEFAULT_NR_C 2

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
 #define BLIS_DEFAULT_MC_Z 400
 #define BLIS_DEFAULT_KC_Z 160
 #define BLIS_DEFAULT_NC_Z 8400
@@ -1 +1 @@
-../../kernels/x86_64/avx
+../../kernels/x86_64/sandybridge
@@ -177,17 +177,17 @@
 // be packed here, but this tends to be much too expensive in practice to
 // actually employ.)

-//#define BLIS_DEFAULT_L2_MC_S 1000
-//#define BLIS_DEFAULT_L2_NC_S 1000
+//#define BLIS_DEFAULT_M2_S 1000
+//#define BLIS_DEFAULT_N2_S 1000

-//#define BLIS_DEFAULT_L2_MC_D 1000
-//#define BLIS_DEFAULT_L2_NC_D 1000
+//#define BLIS_DEFAULT_M2_D 1000
+//#define BLIS_DEFAULT_N2_D 1000

-//#define BLIS_DEFAULT_L2_MC_C 1000
-//#define BLIS_DEFAULT_L2_NC_C 1000
+//#define BLIS_DEFAULT_M2_C 1000
+//#define BLIS_DEFAULT_N2_C 1000

-//#define BLIS_DEFAULT_L2_MC_Z 1000
-//#define BLIS_DEFAULT_L2_NC_Z 1000
+//#define BLIS_DEFAULT_M2_Z 1000
+//#define BLIS_DEFAULT_N2_Z 1000



@@ -196,25 +196,25 @@

 // -- Default fusing factors for level-1f operations --

-//#define BLIS_L1F_FUSE_FAC_S 8
-//#define BLIS_L1F_FUSE_FAC_D 4
-//#define BLIS_L1F_FUSE_FAC_C 4
-//#define BLIS_L1F_FUSE_FAC_Z 2
+//#define BLIS_DEFAULT_1F_S 8
+//#define BLIS_DEFAULT_1F_D 4
+//#define BLIS_DEFAULT_1F_C 4
+//#define BLIS_DEFAULT_1F_Z 2

-//#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z

-//#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z

-//#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z



@@ -36,59 +36,87 @@



-void bli_saxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          float* restrict alpha,
-                          float* restrict x, inc_t incx,
-                          float* restrict y, inc_t incy )
+void bli_saxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       float* alpha,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_SAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }



-void bli_daxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          double* restrict alpha,
-                          double* restrict x, inc_t incx,
-                          double* restrict y, inc_t incy )
+void bli_daxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       double* alpha,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_DAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }



-void bli_caxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          scomplex* restrict alpha,
-                          scomplex* restrict x, inc_t incx,
-                          scomplex* restrict y, inc_t incy )
+void bli_caxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       scomplex* alpha,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_CAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }



-void bli_zaxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          dcomplex* restrict alpha,
-                          dcomplex* restrict x, inc_t incx,
-                          dcomplex* restrict y, inc_t incy )
+void bli_zaxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       dcomplex* alpha,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
 /*
   Template axpyv kernel implementation
@@ -193,11 +221,15 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Call the reference implementation if needed.
     if ( use_ref == TRUE )
     {
-        BLIS_ZAXPYV_KERNEL_REF( conjx,
-                                n,
-                                alpha,
-                                x, incx,
-                                y, incy );
+        BLIS_ZAXPYV_KERNEL_REF
+        (
+          conjx,
+          n,
+          alpha,
+          x, incx,
+          y, incy,
+          cntx
+        );
         return;
     }

@@ -219,7 +251,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Compute front edge cases if x and y were unaligned.
     for ( i = 0; i < n_pre; ++i )
     {
-        bli_zzzaxpys( *alpha, *xp, *yp );
+        bli_zaxpys( *alpha, *xp, *yp );

         xp += 1; yp += 1;
     }
@@ -228,7 +260,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
     for ( i = 0; i < n_iter; ++i )
     {
-        bli_zzzaxpys( *alpha, *xp, *yp );
+        bli_zaxpys( *alpha, *xp, *yp );

         xp += n_elem_per_iter;
         yp += n_elem_per_iter;
@@ -237,7 +269,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Compute tail edge cases, if applicable.
     for ( i = 0; i < n_left; ++i )
     {
-        bli_zzzaxpys( *alpha, *xp, *yp );
+        bli_zaxpys( *alpha, *xp, *yp );

         xp += 1; yp += 1;
     }
@@ -247,7 +279,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Compute front edge cases if x and y were unaligned.
     for ( i = 0; i < n_pre; ++i )
     {
-        bli_zzzaxpyjs( *alpha, *xp, *yp );
+        bli_zaxpyjs( *alpha, *xp, *yp );

         xp += 1; yp += 1;
     }
@@ -256,7 +288,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
     for ( i = 0; i < n_iter; ++i )
     {
-        bli_zzzaxpyjs( *alpha, *xp, *yp );
+        bli_zaxpyjs( *alpha, *xp, *yp );

         xp += n_elem_per_iter;
         yp += n_elem_per_iter;
@@ -265,7 +297,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Compute tail edge cases, if applicable.
     for ( i = 0; i < n_left; ++i )
     {
-        bli_zzzaxpyjs( *alpha, *xp, *yp );
+        bli_zaxpyjs( *alpha, *xp, *yp );

         xp += 1; yp += 1;
     }
@@ -36,66 +36,94 @@
 
 
 
-void bli_sdotv_opt_var1( conj_t conjx,
-                         conj_t conjy,
-                         dim_t n,
-                         float* restrict x, inc_t incx,
-                         float* restrict y, inc_t incy,
-                         float* restrict rho )
+void bli_sdotv_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       float* rho,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SDOTV_KERNEL_REF( conjx,
-                           conjy,
-                           n,
-                           x, incx,
-                           y, incy,
-                           rho );
+    BLIS_SDOTV_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      x, incx,
+      y, incy,
+      rho,
+      cntx
+    );
 }
 
 
 
-void bli_ddotv_opt_var1( conj_t conjx,
-                         conj_t conjy,
-                         dim_t n,
-                         double* restrict x, inc_t incx,
-                         double* restrict y, inc_t incy,
-                         double* restrict rho )
+void bli_ddotv_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       double* rho,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DDOTV_KERNEL_REF( conjx,
-                           conjy,
-                           n,
-                           x, incx,
-                           y, incy,
-                           rho );
+    BLIS_DDOTV_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      x, incx,
+      y, incy,
+      rho,
+      cntx
+    );
 }
 
 
 
-void bli_cdotv_opt_var1( conj_t conjx,
-                         conj_t conjy,
-                         dim_t n,
-                         scomplex* restrict x, inc_t incx,
-                         scomplex* restrict y, inc_t incy,
-                         scomplex* restrict rho )
+void bli_cdotv_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       scomplex* rho,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CDOTV_KERNEL_REF( conjx,
-                           conjy,
-                           n,
-                           x, incx,
-                           y, incy,
-                           rho );
+    BLIS_CDOTV_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      x, incx,
+      y, incy,
+      rho,
+      cntx
+    );
 }
 
 
 
-void bli_zdotv_opt_var1( conj_t conjx,
-                         conj_t conjy,
-                         dim_t n,
-                         dcomplex* restrict x, inc_t incx,
-                         dcomplex* restrict y, inc_t incy,
-                         dcomplex* restrict rho )
+void bli_zdotv_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       dcomplex* rho,
+       cntx_t* cntx
+     )
 {
 /*
    Template dotv kernel implementation
@@ -210,12 +238,16 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // Call the reference implementation if needed.
 if ( use_ref == TRUE )
 {
-    BLIS_ZDOTV_KERNEL_REF( conjx,
-                           conjy,
-                           n,
-                           x, incx,
-                           y, incy,
-                           rho );
+    BLIS_ZDOTV_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      x, incx,
+      y, incy,
+      rho,
+      cntx
+    );
     return;
 }
 
@@ -250,7 +282,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // Compute front edge cases if x and y were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
+    bli_zdots( *xp, *yp, dotxy );
 
     xp += 1; yp += 1;
 }
@@ -259,7 +291,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
+    bli_zdots( *xp, *yp, dotxy );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -268,7 +300,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
+    bli_zdots( *xp, *yp, dotxy );
 
     xp += 1; yp += 1;
 }
@@ -278,7 +310,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // Compute front edge cases if x and y were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
+    bli_zdotjs( *xp, *yp, dotxy );
 
     xp += 1; yp += 1;
 }
@@ -287,7 +319,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
+    bli_zdotjs( *xp, *yp, dotxy );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -296,7 +328,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
+    bli_zdotjs( *xp, *yp, dotxy );
 
     xp += 1; yp += 1;
 }
@@ -307,6 +339,6 @@ void bli_zdotv_opt_var1( conj_t conjx,
 if ( bli_is_conj( conjy ) )
     bli_zconjs( dotxy );
 
-bli_zzcopys( dotxy, *rho );
+bli_zcopys( dotxy, *rho );
 }
 

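In the dotv hunks above, the loops accumulate into dotxy with either bli_zdots or bli_zdotjs, and the final hunk conjugates dotxy when conjy is set before copying it to rho. One way these pieces can fit together is sketched below with C99 complex numbers. It assumes that the effective conjugation of x is toggled when y must be conjugated, which this diff does not show; sketch_zdotv and its parameters are hypothetical names, not the BLIS API.

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>
#include <stddef.h>

/* rho := conjx(x)^T conjy(y) for contiguous vectors of length n. */
double complex sketch_zdotv( bool conjx, bool conjy, size_t n,
                             const double complex* x,
                             const double complex* y )
{
    /* Assumption: conjugating y is handled indirectly. Toggle the
       conjugation applied to x, then conjugate the final result. */
    bool           conjx_use = ( conjx != conjy );
    double complex dotxy     = 0.0;
    size_t         i;

    assert( n == 0 || ( x != NULL && y != NULL ) );

    for ( i = 0; i < n; ++i )
    {
        if ( conjx_use ) dotxy += conj( x[ i ] ) * y[ i ]; /* "dotjs" */
        else             dotxy += x[ i ] * y[ i ];         /* "dots"  */
    }

    /* Mirrors: if ( bli_is_conj( conjy ) ) bli_zconjs( dotxy ); */
    if ( conjy ) dotxy = conj( dotxy );

    return dotxy;
}
```

The identity behind the toggle is x^T conj(y) = conj( conj(x)^T y ), so a single loop body without a conjugated y covers all four combinations of conjx and conjy.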
@@ -36,88 +36,108 @@
 
 
 
-void bli_saxpy2v_opt_var1(
-        conj_t conjx,
-        conj_t conjy,
-        dim_t n,
-        float* restrict alpha1,
-        float* restrict alpha2,
-        float* restrict x, inc_t incx,
-        float* restrict y, inc_t incy,
-        float* restrict z, inc_t incz
-      )
+void bli_saxpy2v_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       float* alpha1,
+       float* alpha2,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       float* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SAXPY2V_KERNEL_REF( conjx,
-                             conjy,
-                             n,
-                             alpha1,
-                             alpha2,
-                             x, incx,
-                             y, incy,
-                             z, incz );
+    BLIS_SAXPY2V_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      alpha1,
+      alpha2,
+      x, incx,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_daxpy2v_opt_var1(
-        conj_t conjx,
-        conj_t conjy,
-        dim_t n,
-        double* restrict alpha1,
-        double* restrict alpha2,
-        double* restrict x, inc_t incx,
-        double* restrict y, inc_t incy,
-        double* restrict z, inc_t incz
-      )
+void bli_daxpy2v_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       double* alpha1,
+       double* alpha2,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       double* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DAXPY2V_KERNEL_REF( conjx,
-                             conjy,
-                             n,
-                             alpha1,
-                             alpha2,
-                             x, incx,
-                             y, incy,
-                             z, incz );
+    BLIS_DAXPY2V_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      alpha1,
+      alpha2,
+      x, incx,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_caxpy2v_opt_var1(
-        conj_t conjx,
-        conj_t conjy,
-        dim_t n,
-        scomplex* restrict alpha1,
-        scomplex* restrict alpha2,
-        scomplex* restrict x, inc_t incx,
-        scomplex* restrict y, inc_t incy,
-        scomplex* restrict z, inc_t incz
-      )
+void bli_caxpy2v_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       scomplex* alpha1,
+       scomplex* alpha2,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       scomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CAXPY2V_KERNEL_REF( conjx,
-                             conjy,
-                             n,
-                             alpha1,
-                             alpha2,
-                             x, incx,
-                             y, incy,
-                             z, incz );
+    BLIS_CAXPY2V_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      alpha1,
+      alpha2,
+      x, incx,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_zaxpy2v_opt_var1(
-        conj_t conjx,
-        conj_t conjy,
-        dim_t n,
-        dcomplex* restrict alpha1,
-        dcomplex* restrict alpha2,
-        dcomplex* restrict x, inc_t incx,
-        dcomplex* restrict y, inc_t incy,
-        dcomplex* restrict z, inc_t incz
-      )
+void bli_zaxpy2v_opt_var1
+     (
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       dcomplex* alpha1,
+       dcomplex* alpha2,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       dcomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
 /*
    Template axpy2v kernel implementation
@@ -229,14 +249,18 @@ void bli_zaxpy2v_opt_var1(
 // Call the reference implementation if needed.
 if ( use_ref == TRUE )
 {
-    BLIS_ZAXPY2V_KERNEL_REF( conjx,
-                             conjy,
-                             n,
-                             alpha1,
-                             alpha2,
-                             x, incx,
-                             y, incy,
-                             z, incz );
+    BLIS_ZAXPY2V_KERNEL_REF
+    (
+      conjx,
+      conjy,
+      n,
+      alpha1,
+      alpha2,
+      x, incx,
+      y, incy,
+      z, incz,
+      cntx
+    );
     return;
 }
 
@@ -259,8 +283,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -272,8 +296,8 @@ void bli_zaxpy2v_opt_var1(
 // to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -283,8 +307,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -294,8 +318,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -307,8 +331,8 @@ void bli_zaxpy2v_opt_var1(
 // to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -318,8 +342,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzaxpys( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpys( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -329,8 +353,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -342,8 +366,8 @@ void bli_zaxpy2v_opt_var1(
 // to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -353,8 +377,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpys( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpys( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -364,8 +388,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -377,8 +401,8 @@ void bli_zaxpy2v_opt_var1(
 // to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -388,8 +412,8 @@ void bli_zaxpy2v_opt_var1(
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzaxpyjs( *alpha1, *xp, *zp );
-    bli_zzzaxpyjs( *alpha2, *yp, *zp );
+    bli_zaxpyjs( *alpha1, *xp, *zp );
+    bli_zaxpyjs( *alpha2, *yp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }

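Every kernel in this diff gains a trailing cntx_t* cntx parameter and forwards it unchanged to the reference kernel it calls. According to the commit message, the context lets code at any level query kernel addresses and blocksizes. The sketch below illustrates that pattern under stated assumptions: sketch_cntx_t is a hypothetical stand-in holding one kernel address and one blocksize, and none of the names are the BLIS API, since the real layout of cntx_t and its query functions are not shown here.

```c
#include <assert.h>
#include <stddef.h>

struct sketch_cntx;

/* Hypothetical kernel type: the context is the last parameter. */
typedef void (*sketch_daxpyv_ft)( size_t n, const double* alpha,
                                  const double* x, double* y,
                                  const struct sketch_cntx* cntx );

/* Hypothetical stand-in for cntx_t. */
typedef struct sketch_cntx
{
    sketch_daxpyv_ft daxpyv_ker; /* kernel address */
    size_t           blocksize;  /* elements per kernel call */
} sketch_cntx_t;

/* Reference kernel: y := y + alpha * x. */
void sketch_daxpyv_ref( size_t n, const double* alpha,
                        const double* x, double* y,
                        const sketch_cntx_t* cntx )
{
    size_t i;
    ( void )cntx; /* accepted but unused in this sketch */
    for ( i = 0; i < n; ++i ) y[ i ] += *alpha * x[ i ];
}

/* "Optimized" stub: just call the reference implementation,
   passing the context along as the s/d/c kernels in the diff do. */
void sketch_daxpyv_opt( size_t n, const double* alpha,
                        const double* x, double* y,
                        const sketch_cntx_t* cntx )
{
    sketch_daxpyv_ref( n, alpha, x, y, cntx );
}

/* Caller: queries the kernel and blocksize from the context
   instead of relying on compile-time macros. */
void sketch_daxpyv_blocked( size_t n, const double* alpha,
                            const double* x, double* y,
                            const sketch_cntx_t* cntx )
{
    size_t i;

    assert( cntx != NULL && cntx->daxpyv_ker != NULL );
    assert( cntx->blocksize > 0 );

    for ( i = 0; i < n; i += cntx->blocksize )
    {
        size_t b = ( n - i < cntx->blocksize ) ? n - i : cntx->blocksize;
        cntx->daxpyv_ker( b, alpha, x + i, y + i, cntx );
    }
}
```

Because each call receives its own context pointer, two threads can run with different kernel or blocksize choices without touching shared global state.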
@@ -36,87 +36,107 @@
 
 
 
-void bli_saxpyf_opt_var1(
-        conj_t conja,
-        conj_t conjx,
-        dim_t m,
-        dim_t b_n,
-        float* restrict alpha,
-        float* restrict a, inc_t inca, inc_t lda,
-        float* restrict x, inc_t incx,
-        float* restrict y, inc_t incy
-      )
+void bli_saxpyf_opt_var1
+     (
+       conj_t conja,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       float* alpha,
+       float* a, inc_t inca, inc_t lda,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SAXPYF_KERNEL_REF( conja,
-                            conjx,
-                            m,
-                            b_n,
-                            alpha,
-                            a, inca, lda,
-                            x, incx,
-                            y, incy );
+    BLIS_SAXPYF_KERNEL_REF
+    (
+      conja,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }
 
 
 
-void bli_daxpyf_opt_var1(
-        conj_t conja,
-        conj_t conjx,
-        dim_t m,
-        dim_t b_n,
-        double* restrict alpha,
-        double* restrict a, inc_t inca, inc_t lda,
-        double* restrict x, inc_t incx,
-        double* restrict y, inc_t incy
-      )
+void bli_daxpyf_opt_var1
+     (
+       conj_t conja,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       double* alpha,
+       double* a, inc_t inca, inc_t lda,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DAXPYF_KERNEL_REF( conja,
-                            conjx,
-                            m,
-                            b_n,
-                            alpha,
-                            a, inca, lda,
-                            x, incx,
-                            y, incy );
+    BLIS_DAXPYF_KERNEL_REF
+    (
+      conja,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }
 
 
 
-void bli_caxpyf_opt_var1(
-        conj_t conja,
-        conj_t conjx,
-        dim_t m,
-        dim_t b_n,
-        scomplex* restrict alpha,
-        scomplex* restrict a, inc_t inca, inc_t lda,
-        scomplex* restrict x, inc_t incx,
-        scomplex* restrict y, inc_t incy
-      )
+void bli_caxpyf_opt_var1
+     (
+       conj_t conja,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       scomplex* alpha,
+       scomplex* a, inc_t inca, inc_t lda,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CAXPYF_KERNEL_REF( conja,
-                            conjx,
-                            m,
-                            b_n,
-                            alpha,
-                            a, inca, lda,
-                            x, incx,
-                            y, incy );
+    BLIS_CAXPYF_KERNEL_REF
+    (
+      conja,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }
 
 
-void bli_zaxpyf_opt_var1(
-        conj_t conja,
-        conj_t conjx,
-        dim_t m,
-        dim_t b_n,
-        dcomplex* restrict alpha,
-        dcomplex* restrict a, inc_t inca, inc_t lda,
-        dcomplex* restrict x, inc_t incx,
-        dcomplex* restrict y, inc_t incy
-      )
+void bli_zaxpyf_opt_var1
+     (
+       conj_t conja,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       dcomplex* alpha,
+       dcomplex* a, inc_t inca, inc_t lda,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
 /*
    Template axpyf kernel implementation
@@ -243,14 +263,18 @@ void bli_zaxpyf_opt_var1(
 // Call the reference implementation if needed.
 if ( use_ref == TRUE )
 {
-    BLIS_ZAXPYF_KERNEL_REF( conja,
-                            conjx,
-                            m,
-                            b_n,
-                            alpha,
-                            a, inca, lda,
-                            x, incx,
-                            y, incy );
+    BLIS_ZAXPYF_KERNEL_REF
+    (
+      conja,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      x, incx,
+      y, incy,
+      cntx
+    );
     return;
 }
 
@@ -274,16 +298,16 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzcopys( *xp[ j ], alpha_x[ j ] );
-        bli_zzscals( *alpha, alpha_x[ j ] );
+        bli_zcopys( *xp[ j ], alpha_x[ j ] );
+        bli_zscals( *alpha, alpha_x[ j ] );
     }
 }
 else // if ( bli_is_conj( conjx ) )
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
-        bli_zzscals( *alpha, alpha_x[ j ] );
+        bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
+        bli_zscals( *alpha, alpha_x[ j ] );
     }
 }
 
@@ -296,7 +320,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += 1;
     }
@@ -312,7 +336,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += n_elem_per_iter;
     }
@@ -324,7 +348,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += 1;
     }
@@ -338,7 +362,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += 1;
     }
@@ -354,7 +378,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += n_elem_per_iter;
     }
@@ -366,7 +390,7 @@ void bli_zaxpyf_opt_var1(
 {
     for ( j = 0; j < b_n; ++j )
     {
-        bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
+        bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
 
         ap[ j ] += 1;
     }

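The axpyf hunks above show the fused structure of the kernel: alpha_x[ j ] is first set to alpha times x[ j ] (copy, then scale), and each element of y is then updated with one axpy per column j of a, for b_n columns. A minimal sketch of that computation for real doubles, without conjugation or alignment handling, could look as follows. The function name sketch_daxpyf and the bound SKETCH_MAX_FUSE are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_MAX_FUSE 8 /* hypothetical upper bound on b_n */

/* y := y + alpha * A * x, where A is m x b_n with row stride inca
   and column stride lda. */
void sketch_daxpyf( size_t m, size_t b_n, double alpha,
                    const double* a, size_t inca, size_t lda,
                    const double* x, size_t incx,
                    double* y, size_t incy )
{
    double alpha_x[ SKETCH_MAX_FUSE ];
    size_t i, j;

    assert( b_n <= SKETCH_MAX_FUSE );

    /* Scale x by alpha once, up front (copys followed by scals). */
    for ( j = 0; j < b_n; ++j )
        alpha_x[ j ] = alpha * x[ j * incx ];

    /* For each element of y, accumulate one axpy per column of A. */
    for ( i = 0; i < m; ++i )
        for ( j = 0; j < b_n; ++j )
            y[ i * incy ] += alpha_x[ j ] * a[ i * inca + j * lda ];
}
```

Precomputing alpha_x means alpha is multiplied b_n times in total rather than once per matrix element, and each y element is loaded and stored once for all b_n columns.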
@@ -36,87 +36,115 @@
 
 
 
-void bli_sdotaxpyv_opt_var1( conj_t conjxt,
-                             conj_t conjx,
-                             conj_t conjy,
-                             dim_t n,
-                             float* restrict alpha,
-                             float* restrict x, inc_t incx,
-                             float* restrict y, inc_t incy,
-                             float* restrict rho,
-                             float* restrict z, inc_t incz )
+void bli_sdotaxpyv_opt_var1
+     (
+       conj_t conjxt,
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       float* alpha,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       float* rho,
+       float* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SDOTAXPYV_KERNEL_REF( conjxt,
-                               conjx,
-                               conjy,
-                               n,
-                               alpha,
-                               x, incx,
-                               y, incy,
-                               rho,
-                               z, incz );
+    BLIS_SDOTAXPYV_KERNEL_REF
+    (
+      conjxt,
+      conjx,
+      conjy,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      rho,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_ddotaxpyv_opt_var1( conj_t conjxt,
-                             conj_t conjx,
-                             conj_t conjy,
-                             dim_t n,
-                             double* restrict alpha,
-                             double* restrict x, inc_t incx,
-                             double* restrict y, inc_t incy,
-                             double* restrict rho,
-                             double* restrict z, inc_t incz )
+void bli_ddotaxpyv_opt_var1
+     (
+       conj_t conjxt,
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       double* alpha,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       double* rho,
+       double* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DDOTAXPYV_KERNEL_REF( conjxt,
-                               conjx,
-                               conjy,
-                               n,
-                               alpha,
-                               x, incx,
-                               y, incy,
-                               rho,
-                               z, incz );
+    BLIS_DDOTAXPYV_KERNEL_REF
+    (
+      conjxt,
+      conjx,
+      conjy,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      rho,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_cdotaxpyv_opt_var1( conj_t conjxt,
-                             conj_t conjx,
-                             conj_t conjy,
-                             dim_t n,
-                             scomplex* restrict alpha,
-                             scomplex* restrict x, inc_t incx,
-                             scomplex* restrict y, inc_t incy,
-                             scomplex* restrict rho,
-                             scomplex* restrict z, inc_t incz )
+void bli_cdotaxpyv_opt_var1
+     (
+       conj_t conjxt,
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       scomplex* alpha,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       scomplex* rho,
+       scomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CDOTAXPYV_KERNEL_REF( conjxt,
-                               conjx,
-                               conjy,
-                               n,
-                               alpha,
-                               x, incx,
-                               y, incy,
-                               rho,
-                               z, incz );
+    BLIS_CDOTAXPYV_KERNEL_REF
+    (
+      conjxt,
+      conjx,
+      conjy,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      rho,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_zdotaxpyv_opt_var1( conj_t conjxt,
-                             conj_t conjx,
-                             conj_t conjy,
-                             dim_t n,
-                             dcomplex* restrict alpha,
-                             dcomplex* restrict x, inc_t incx,
-                             dcomplex* restrict y, inc_t incy,
-                             dcomplex* restrict rho,
-                             dcomplex* restrict z, inc_t incz )
+void bli_zdotaxpyv_opt_var1
+     (
+       conj_t conjxt,
+       conj_t conjx,
+       conj_t conjy,
+       dim_t n,
+       dcomplex* alpha,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       dcomplex* rho,
+       dcomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
 /*
    Template dotaxpyv kernel implementation
@@ -240,15 +268,19 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Call the reference implementation if needed.
 if ( use_ref == TRUE )
 {
-    BLIS_ZDOTAXPYV_KERNEL_REF( conjxt,
-                               conjx,
-                               conjy,
-                               n,
-                               alpha,
-                               x, incx,
-                               y, incy,
-                               rho,
-                               z, incz );
+    BLIS_ZDOTAXPYV_KERNEL_REF
+    (
+      conjxt,
+      conjx,
+      conjy,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      rho,
+      z, incz,
+      cntx
+    );
     return;
 }
 
@@ -285,8 +317,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -298,8 +330,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -309,8 +341,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -320,8 +352,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -333,8 +365,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -344,8 +376,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpys( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpys( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -355,8 +387,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -368,8 +400,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -379,8 +411,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdots( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdots( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -390,8 +422,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute front edge cases if x, y, and z were unaligned.
 for ( i = 0; i < n_pre; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -403,8 +435,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
 for ( i = 0; i < n_iter; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += n_elem_per_iter;
     yp += n_elem_per_iter;
@@ -414,8 +446,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 // Compute tail edge cases, if applicable.
 for ( i = 0; i < n_left; ++i )
 {
-    bli_zzzdotjs( *xp, *yp, dotxy );
-    bli_zzzaxpyjs( *alpha, *xp, *zp );
+    bli_zdotjs( *xp, *yp, dotxy );
+    bli_zaxpyjs( *alpha, *xp, *zp );
 
     xp += 1; yp += 1; zp += 1;
 }
@@ -426,6 +458,6 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
 if ( bli_is_conj( conjy ) )
     bli_zconjs( dotxy );
 
-bli_zzcopys( dotxy, *rho );
+bli_zcopys( dotxy, *rho );
 }
 

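Each loop body in the dotaxpyv hunks above performs two updates per element in a single pass over x: a dot-product accumulation into dotxy using x and y, and an axpy that adds alpha times x into z. After the loops, dotxy is copied to rho. A minimal real-valued sketch of that fusion is shown below; sketch_ddotaxpyv is a hypothetical name, and the conjugation and alignment cases handled by the template are left out.

```c
#include <assert.h>
#include <stddef.h>

/* Fused kernel: rho := x^T y and z := z + alpha * x in one pass. */
void sketch_ddotaxpyv( size_t n, double alpha,
                       const double* x, const double* y,
                       double* rho, double* z )
{
    double dotxy = 0.0;
    size_t i;

    assert( rho != NULL );

    for ( i = 0; i < n; ++i )
    {
        dotxy  += x[ i ] * y[ i ]; /* "dots":  dotxy += x * y    */
        z[ i ] += alpha * x[ i ];  /* "axpys": z += alpha * x    */
    }

    /* Mirrors the final copy of dotxy into rho. */
    *rho = dotxy;
}
```

Fusing the two operations means each element of x is loaded once and used twice, which is the point of offering dotaxpyv as a single kernel rather than calling dotv and axpyv back to back.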
@@ -36,115 +36,143 @@
 
 
 
-void bli_sdotxaxpyf_opt_var1( conj_t conjat,
-                              conj_t conja,
-                              conj_t conjw,
-                              conj_t conjx,
-                              dim_t m,
-                              dim_t b_n,
-                              float* restrict alpha,
-                              float* restrict a, inc_t inca, inc_t lda,
-                              float* restrict w, inc_t incw,
-                              float* restrict x, inc_t incx,
-                              float* restrict beta,
-                              float* restrict y, inc_t incy,
-                              float* restrict z, inc_t incz )
+void bli_sdotxaxpyf_opt_var1
+     (
+       conj_t conjat,
+       conj_t conja,
+       conj_t conjw,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       float* alpha,
+       float* a, inc_t inca, inc_t lda,
+       float* w, inc_t incw,
+       float* x, inc_t incx,
+       float* beta,
+       float* y, inc_t incy,
+       float* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SDOTXAXPYF_KERNEL_REF( conjat,
-                                conja,
-                                conjw,
-                                conjx,
-                                m,
-                                b_n,
-                                alpha,
-                                a, inca, lda,
-                                w, incw,
-                                x, incx,
-                                beta,
-                                y, incy,
-                                z, incz );
+    BLIS_SDOTXAXPYF_KERNEL_REF
+    (
+      conjat,
+      conja,
+      conjw,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      w, incw,
+      x, incx,
+      beta,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_ddotxaxpyf_opt_var1( conj_t conjat,
-                              conj_t conja,
-                              conj_t conjw,
-                              conj_t conjx,
-                              dim_t m,
-                              dim_t b_n,
-                              double* restrict alpha,
-                              double* restrict a, inc_t inca, inc_t lda,
-                              double* restrict w, inc_t incw,
-                              double* restrict x, inc_t incx,
-                              double* restrict beta,
-                              double* restrict y, inc_t incy,
-                              double* restrict z, inc_t incz )
+void bli_ddotxaxpyf_opt_var1
+     (
+       conj_t conjat,
+       conj_t conja,
+       conj_t conjw,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       double* alpha,
+       double* a, inc_t inca, inc_t lda,
+       double* w, inc_t incw,
+       double* x, inc_t incx,
+       double* beta,
+       double* y, inc_t incy,
+       double* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DDOTXAXPYF_KERNEL_REF( conjat,
-                                conja,
-                                conjw,
-                                conjx,
-                                m,
-                                b_n,
-                                alpha,
-                                a, inca, lda,
-                                w, incw,
-                                x, incx,
-                                beta,
-                                y, incy,
-                                z, incz );
+    BLIS_DDOTXAXPYF_KERNEL_REF
+    (
+      conjat,
+      conja,
+      conjw,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      w, incw,
+      x, incx,
+      beta,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_cdotxaxpyf_opt_var1( conj_t conjat,
-                              conj_t conja,
-                              conj_t conjw,
-                              conj_t conjx,
-                              dim_t m,
-                              dim_t b_n,
-                              scomplex* restrict alpha,
-                              scomplex* restrict a, inc_t inca, inc_t lda,
-                              scomplex* restrict w, inc_t incw,
-                              scomplex* restrict x, inc_t incx,
-                              scomplex* restrict beta,
-                              scomplex* restrict y, inc_t incy,
-                              scomplex* restrict z, inc_t incz )
+void bli_cdotxaxpyf_opt_var1
+     (
+       conj_t conjat,
+       conj_t conja,
+       conj_t conjw,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       scomplex* alpha,
+       scomplex* a, inc_t inca, inc_t lda,
+       scomplex* w, inc_t incw,
+       scomplex* x, inc_t incx,
+       scomplex* beta,
+       scomplex* y, inc_t incy,
+       scomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CDOTXAXPYF_KERNEL_REF( conjat,
-                                conja,
-                                conjw,
-                                conjx,
-                                m,
-                                b_n,
-                                alpha,
-                                a, inca, lda,
-                                w, incw,
-                                x, incx,
-                                beta,
-                                y, incy,
-                                z, incz );
+    BLIS_CDOTXAXPYF_KERNEL_REF
+    (
+      conjat,
+      conja,
+      conjw,
+      conjx,
+      m,
+      b_n,
+      alpha,
+      a, inca, lda,
+      w, incw,
+      x, incx,
+      beta,
+      y, incy,
+      z, incz,
+      cntx
+    );
 }
 
 
 
-void bli_zdotxaxpyf_opt_var1( conj_t conjat,
-                              conj_t conja,
-                              conj_t conjw,
-                              conj_t conjx,
-                              dim_t m,
-                              dim_t b_n,
-                              dcomplex* restrict alpha,
-                              dcomplex* restrict a, inc_t inca, inc_t lda,
-                              dcomplex* restrict w, inc_t incw,
-                              dcomplex* restrict x, inc_t incx,
-                              dcomplex* restrict beta,
-                              dcomplex* restrict y, inc_t incy,
-                              dcomplex* restrict z, inc_t incz )
+void bli_zdotxaxpyf_opt_var1
+     (
+       conj_t conjat,
+       conj_t conja,
+       conj_t conjw,
+       conj_t conjx,
+       dim_t m,
+       dim_t b_n,
+       dcomplex* alpha,
+       dcomplex* a, inc_t inca, inc_t lda,
+       dcomplex* w, inc_t incw,
+       dcomplex* x, inc_t incx,
+       dcomplex* beta,
+       dcomplex* y, inc_t incy,
+       dcomplex* z, inc_t incz,
+       cntx_t* cntx
+     )
 
 {
 /*
@@ -289,19 +317,23 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
// Call the reference implementation if needed.
|
||||
if ( use_ref == TRUE )
|
||||
{
|
||||
BLIS_ZDOTXAXPYF_KERNEL_REF( conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz );
|
||||
BLIS_ZDOTXAXPYF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -326,16 +358,16 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzcopys( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zzscals( *alpha, alpha_x[ j ] );
|
||||
bli_zcopys( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zscals( *alpha, alpha_x[ j ] );
|
||||
}
|
||||
}
|
||||
else // if ( bli_is_conj( conjx ) )
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zzscals( *alpha, alpha_x[ j ] );
|
||||
bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zscals( *alpha, alpha_x[ j ] );
|
||||
}
|
||||
}
|
||||
|
||||
@@ -366,8 +398,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -383,8 +415,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -396,8 +428,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -411,8 +443,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -428,8 +460,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -441,8 +473,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -456,8 +488,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -473,8 +505,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -486,8 +518,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -501,8 +533,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -518,8 +550,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -531,8 +563,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -555,8 +587,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
// scaling by beta.
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzscals( *beta, *yp[ j ] );
|
||||
bli_zzzaxpys( *alpha, At_w[ j ], *yp[ j ] );
|
||||
bli_zscals( *beta, *yp[ j ] );
|
||||
bli_zaxpys( *alpha, At_w[ j ], *yp[ j ] );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
@@ -36,95 +36,115 @@
|
||||
|
||||
|
||||
|
||||
void bli_sdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* restrict alpha,
|
||||
float* restrict a, inc_t inca, inc_t lda,
|
||||
float* restrict x, inc_t incx,
|
||||
float* restrict beta,
|
||||
float* restrict y, inc_t incy
|
||||
)
|
||||
void bli_sdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* alpha,
|
||||
float* a, inc_t inca, inc_t lda,
|
||||
float* x, inc_t incx,
|
||||
float* beta,
|
||||
float* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_SDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_SDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ddotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* restrict alpha,
|
||||
double* restrict a, inc_t inca, inc_t lda,
|
||||
double* restrict x, inc_t incx,
|
||||
double* restrict beta,
|
||||
double* restrict y, inc_t incy
|
||||
)
|
||||
void bli_ddotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* alpha,
|
||||
double* a, inc_t inca, inc_t lda,
|
||||
double* x, inc_t incx,
|
||||
double* beta,
|
||||
double* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_DDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_DDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_cdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a, inc_t inca, inc_t lda,
|
||||
scomplex* restrict x, inc_t incx,
|
||||
scomplex* restrict beta,
|
||||
scomplex* restrict y, inc_t incy
|
||||
)
|
||||
void bli_cdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* alpha,
|
||||
scomplex* a, inc_t inca, inc_t lda,
|
||||
scomplex* x, inc_t incx,
|
||||
scomplex* beta,
|
||||
scomplex* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_CDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_zdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a, inc_t inca, inc_t lda,
|
||||
dcomplex* restrict x, inc_t incx,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict y, inc_t incy
|
||||
)
|
||||
void bli_zdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* alpha,
|
||||
dcomplex* a, inc_t inca, inc_t lda,
|
||||
dcomplex* x, inc_t incx,
|
||||
dcomplex* beta,
|
||||
dcomplex* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template dotxf kernel implementation
|
||||
@@ -225,10 +245,14 @@ void bli_zdotxf_opt_var1(
|
||||
// If the vector lengths are zero, scale y by beta and return.
|
||||
if ( bli_zero_dim1( m ) )
|
||||
{
|
||||
bli_zzscalv( BLIS_NO_CONJUGATE,
|
||||
b_n,
|
||||
beta,
|
||||
y, incy );
|
||||
bli_zscalv_ex
|
||||
(
|
||||
BLIS_NO_CONJUGATE,
|
||||
b_n,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -265,15 +289,19 @@ void bli_zdotxf_opt_var1(
|
||||
// Call the reference implementation if needed.
|
||||
if ( use_ref == TRUE )
|
||||
{
|
||||
BLIS_ZDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_ZDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
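The zero-length early return in the dotxf hunks above can be illustrated with a small real-valued sketch. It assumes dotxf computes y := beta * y + alpha * A^T x for an m x b_n matrix A, so that with m == 0 every dot product is empty and only the scaling by beta remains. The name `toy_ddotxf` and its signature are hypothetical; conjugation parameters and the context argument are left out for brevity, and this is not the BLIS kernel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the BLIS kernel): y := beta * y + alpha * A^T x,
   where A is m x b_n with row stride inca and column stride lda. */
static void toy_ddotxf( size_t m, size_t b_n,
                        double alpha,
                        const double* a, size_t inca, size_t lda,
                        const double* x, size_t incx,
                        double beta,
                        double* y, size_t incy )
{
    assert( y != NULL );

    /* Zero-length vectors: each dot product is empty, so the update
       reduces to scaling y by beta. The general loop below would give
       the same result; the early return only skips the extra work. */
    if ( m == 0 )
    {
        for ( size_t j = 0; j < b_n; ++j )
            y[ j * incy ] *= beta;
        return;
    }

    for ( size_t j = 0; j < b_n; ++j )
    {
        double rho = 0.0;

        /* Dot product of column j of A with x. */
        for ( size_t i = 0; i < m; ++i )
            rho += a[ i * inca + j * lda ] * x[ i * incx ];

        y[ j * incy ] = beta * y[ j * incy ] + alpha * rho;
    }
}
```

In the diff itself the scaling step is delegated to `bli_zscalv_ex`, which now receives the context as its last argument; the sketch inlines that step only to stay self-contained.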
|
||||
|
||||
|
||||
@@ -36,37 +36,45 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemm_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a1,
|
||||
float* restrict b1,
|
||||
float* restrict beta,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a1,
|
||||
float* restrict b1,
|
||||
float* restrict beta,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_SGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_SGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_dgemm_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a1,
|
||||
double* restrict b1,
|
||||
double* restrict beta,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a1,
|
||||
double* restrict b1,
|
||||
double* restrict beta,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemm micro-kernel implementation
|
||||
@@ -106,6 +114,14 @@ void bli_dgemm_opt_mxn(
|
||||
information that may be useful when optimizing the gemm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
- cntx: The address of the runtime context. The context can be queried
|
||||
for implementation-specific values such as cache and register
|
||||
blocksizes. However, most micro-kernels intrinsically "know"
|
||||
these values already, and thus the cntx argument usually can
|
||||
be safely ignored. (The following template micro-kernel code
|
||||
does in fact query MR, NR, PACKMR, and PACKNR, as needed, but
|
||||
only because those values are not hard-coded, as they would be
|
||||
in a typical optimized micro-kernel implementation.)
|
||||
|
||||
Diagram for gemm
|
||||
|
||||
@@ -203,15 +219,19 @@ void bli_dgemm_opt_mxn(
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const dim_t mr = bli_dmr;
|
||||
const dim_t nr = bli_dnr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t cs_a = bli_dpackmr;
|
||||
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
|
||||
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_ab = 1;
|
||||
const inc_t cs_ab = bli_dmr;
|
||||
const inc_t cs_a = packmr;
|
||||
const inc_t rs_b = packnr;
|
||||
|
||||
const inc_t rs_ab = 1;
|
||||
const inc_t cs_ab = mr;
|
||||
|
||||
dim_t l, j, i;
|
||||
|
||||
@@ -291,36 +311,56 @@ void bli_cgemm_opt_mxn(
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a1,
|
||||
scomplex* restrict b1,
|
||||
scomplex* restrict beta,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_CGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_zgemm_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a1,
|
||||
dcomplex* restrict b1,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a1,
|
||||
dcomplex* restrict b1,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_ZGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_ZGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
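The gemm template above now reads MR, NR, PACKMR, and PACKNR from the context instead of from the `bli_dmr`-style macros. The following is a minimal sketch of that lookup pattern using a toy context. Every `toy_`-prefixed type and function is a hypothetical stand-in introduced for illustration, not the real `cntx_t` layout or the real `bli_cntx_get_blksz_def_dt` / `bli_cntx_get_blksz_max_dt` implementations; the only thing taken from the diff is that the default blocksize supplies MR/NR and the maximum blocksize supplies PACKMR/PACKNR.

```c
#include <assert.h>

/* Hypothetical stand-ins for BLIS types; not the library's definitions. */
typedef long toy_dim_t;
typedef enum { TOY_FLOAT = 0, TOY_DOUBLE, TOY_SCOMPLEX, TOY_DCOMPLEX, TOY_NUM_DT } toy_num_t;
typedef enum { TOY_MR = 0, TOY_NR, TOY_NUM_BSZ } toy_bszid_t;

/* A toy context: default and maximum blocksizes, per blocksize id and datatype. */
typedef struct
{
    toy_dim_t def[ TOY_NUM_BSZ ][ TOY_NUM_DT ];
    toy_dim_t max[ TOY_NUM_BSZ ][ TOY_NUM_DT ];
} toy_cntx_t;

/* Read-only queries; the context is never modified by the kernel. */
static toy_dim_t toy_cntx_get_blksz_def_dt( toy_num_t dt, toy_bszid_t id, const toy_cntx_t* cntx )
{
    return cntx->def[ id ][ dt ];
}

static toy_dim_t toy_cntx_get_blksz_max_dt( toy_num_t dt, toy_bszid_t id, const toy_cntx_t* cntx )
{
    return cntx->max[ id ][ dt ];
}

/* The strides the template micro-kernel derives from the queried values. */
typedef struct { toy_dim_t cs_a, rs_b, rs_ab, cs_ab; } toy_strides_t;

static toy_strides_t toy_dgemm_strides( const toy_cntx_t* cntx )
{
    const toy_num_t dt     = TOY_DOUBLE;
    const toy_dim_t mr     = toy_cntx_get_blksz_def_dt( dt, TOY_MR, cntx );
    const toy_dim_t packmr = toy_cntx_get_blksz_max_dt( dt, TOY_MR, cntx );
    const toy_dim_t packnr = toy_cntx_get_blksz_max_dt( dt, TOY_NR, cntx );
    toy_strides_t   s;

    /* The packing dimension should never be smaller than the register blocksize. */
    assert( packmr >= mr );

    s.cs_a  = packmr; /* column stride of the packed micro-panel of A */
    s.rs_b  = packnr; /* row stride of the packed micro-panel of B */
    s.rs_ab = 1;      /* the temporary MR x NR product is stored by columns */
    s.cs_ab = mr;
    return s;
}
```

Because the kernel only reads from the context it is handed, two threads holding different contexts would see different blocksizes without sharing any mutable global state, which appears to be the thread-friendliness the commit message describes.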
|
||||
|
||||
|
||||
@@ -36,18 +36,24 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a10,
|
||||
float* restrict a11,
|
||||
float* restrict b01,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a10,
|
||||
float* restrict a11,
|
||||
float* restrict b01,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_spacknr;
|
||||
const num_t dt = BLIS_FLOAT;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
float* restrict minus_one = bli_sm1;
|
||||
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_dgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a10,
|
||||
double* restrict a11,
|
||||
double* restrict b01,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a10,
|
||||
double* restrict a11,
|
||||
double* restrict b01,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemmtrsm_l micro-kernel implementation
|
||||
@@ -131,6 +139,14 @@ void bli_dgemmtrsm_l_opt_mxn(
|
||||
information that may be useful when optimizing the gemmtrsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
- cntx: The address of the runtime context. The context can be queried
|
||||
for implementation-specific values such as cache and register
|
||||
blocksizes. However, most micro-kernels intrinsically "know"
|
||||
these values already, and thus the cntx argument usually can
|
||||
be safely ignored. (The following template micro-kernel code
|
||||
does in fact query MR, NR, PACKMR, and PACKNR, as needed, but
|
||||
only because those values are not hard-coded, as they would be
|
||||
in a typical optimized micro-kernel implementation.)
|
||||
|
||||
Diagram for gemmtrsm_l
|
||||
|
||||
@@ -203,7 +219,11 @@ void bli_dgemmtrsm_l_opt_mxn(
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
double* restrict minus_one = bli_dm1;
|
||||
@@ -227,18 +247,24 @@ void bli_dgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_cgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a10,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b01,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_cgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a10,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b01,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_cpacknr;
|
||||
const num_t dt = BLIS_SCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
scomplex* restrict minus_one = bli_cm1;
|
||||
@@ -260,18 +286,24 @@ void bli_cgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_zgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a10,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b01,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a10,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b01,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_zpacknr;
|
||||
const num_t dt = BLIS_DCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dcomplex* restrict minus_one = bli_zm1;
|
||||
|
||||
@@ -36,18 +36,24 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a12,
|
||||
float* restrict a11,
|
||||
float* restrict b21,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a12,
|
||||
float* restrict a11,
|
||||
float* restrict b21,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_spacknr;
|
||||
const num_t dt = BLIS_FLOAT;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
float* restrict minus_one = bli_sm1;
|
||||
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_dgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a12,
|
||||
double* restrict a11,
|
||||
double* restrict b21,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a12,
|
||||
double* restrict a11,
|
||||
double* restrict b21,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemmtrsm_u micro-kernel implementation
|
||||
@@ -131,6 +139,14 @@ void bli_dgemmtrsm_u_opt_mxn(
|
||||
information that may be useful when optimizing the gemmtrsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
- cntx: The address of the runtime context. The context can be queried
|
||||
for implementation-specific values such as cache and register
|
||||
blocksizes. However, most micro-kernels intrinsically "know"
|
||||
these values already, and thus the cntx argument usually can
|
||||
be safely ignored. (The following template micro-kernel code
|
||||
does in fact query MR, NR, PACKMR, and PACKNR, as needed, but
|
||||
only because those values are not hard-coded, as they would be
|
||||
in a typical optimized micro-kernel implementation.)
|
||||
|
||||
Diagram for gemmtrsm_u
|
||||
|
||||
@@ -200,7 +216,11 @@ void bli_dgemmtrsm_u_opt_mxn(
|
||||
blis-devel mailing list.
|
||||
|
||||
*/
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
double* restrict minus_one = bli_dm1;
|
||||
@@ -224,18 +244,24 @@ void bli_dgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_cgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a12,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b21,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_cgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a12,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b21,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_cpacknr;
|
||||
const num_t dt = BLIS_SCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
scomplex* restrict minus_one = bli_cm1;
|
||||
@@ -257,18 +283,24 @@ void bli_cgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_zgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a12,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b21,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a12,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b21,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_zpacknr;
|
||||
const num_t dt = BLIS_DCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dcomplex* restrict minus_one = bli_zm1;
|
||||
|
||||
@@ -36,28 +36,36 @@
|
||||
|
||||
|
||||
|
||||
void bli_strsm_l_opt_mxn(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_strsm_l_opt_mxn
|
||||
(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_STRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_STRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_dtrsm_l_opt_mxn(
|
||||
double* restrict a11,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dtrsm_l_opt_mxn
|
||||
(
|
||||
double* restrict a11,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template trsm_l micro-kernel implementation
|
||||
@@ -100,6 +108,14 @@ void bli_dtrsm_l_opt_mxn(
|
||||
information that may be useful when optimizing the trsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
- cntx: The address of the runtime context. The context can be queried
|
||||
for implementation-specific values such as cache and register
|
||||
blocksizes. However, most micro-kernels intrinsically "know"
|
||||
these values already, and thus the cntx argument usually can
|
||||
be safely ignored. (The following template micro-kernel code
|
||||
does in fact query MR, NR, PACKMR, and PACKNR, as needed, but
|
||||
only because those values are not hard-coded, as they would be
|
||||
in a typical optimized micro-kernel implementation.)
|
||||
|
||||
Diagrams for trsm
|
||||
|
||||
@@ -142,14 +158,20 @@ void bli_dtrsm_l_opt_mxn(
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const dim_t m = bli_dmr;
|
||||
const dim_t n = bli_dnr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
|
||||
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = bli_dpackmr;
|
||||
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const inc_t cs_b = 1;
|
||||
const dim_t m = mr;
|
||||
const dim_t n = nr;
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = packmr;
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dim_t iter, i, j, l;
|
||||
dim_t n_behind;
|
||||
@@ -208,33 +230,45 @@ void bli_dtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_ctrsm_l_opt_mxn(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ctrsm_l_opt_mxn
|
||||
(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CTRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_CTRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ztrsm_l_opt_mxn(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ztrsm_l_opt_mxn
|
||||
(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_ZTRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_ZTRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -36,18 +36,24 @@



void bli_strsm_u_opt_mxn(
                          float* restrict a11,
                          float* restrict b11,
                          float* restrict c11, inc_t rs_c, inc_t cs_c,
                          auxinfo_t* data
                        )
void bli_strsm_u_opt_mxn
     (
       float* restrict a11,
       float* restrict b11,
       float* restrict c11, inc_t rs_c, inc_t cs_c,
       auxinfo_t* restrict data,
       cntx_t* restrict cntx
     )
{
    /* Just call the reference implementation. */
    BLIS_STRSM_U_UKERNEL_REF( a11,
                              b11,
                              c11, rs_c, cs_c,
                              data );
    BLIS_STRSM_U_UKERNEL_REF
    (
      a11,
      b11,
      c11, rs_c, cs_c,
      data,
      cntx
    );
}
|
||||
|
||||
|
||||
@@ -58,6 +64,13 @@ void bli_dtrsm_u_opt_mxn(
                          double* restrict c11, inc_t rs_c, inc_t cs_c,
                          auxinfo_t* data
                        )
     (
       double* restrict a11,
       double* restrict b11,
       double* restrict c11, inc_t rs_c, inc_t cs_c,
       auxinfo_t* restrict data,
       cntx_t* restrict cntx
     )
{
/*
  Template trsm_u micro-kernel implementation
|
||||
@@ -100,6 +113,14 @@ void bli_dtrsm_u_opt_mxn(
            information that may be useful when optimizing the trsm
            micro-kernel implementation. (See BLIS KernelsHowTo wiki for
            more info.)
  - cntx:   The address of the runtime context. The context can be queried
            for implementation-specific values such as cache and register
            blocksizes. However, most micro-kernels intrinsically "know"
            these values already, and thus the cntx argument usually can
            be safely ignored. (The following template micro-kernel code
            does in fact query MR, NR, PACKMR, and PACKNR, as needed, but
            only because those values are not hard-coded, as they would be
            in a typical optimized micro-kernel implementation.)

  Diagrams for trsm
|
||||
|
||||
@@ -141,14 +162,20 @@ void bli_dtrsm_u_opt_mxn(

  -FGVZ
*/
    const dim_t m = bli_dmr;
    const dim_t n = bli_dnr;
    const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
    const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );

    const inc_t rs_a = 1;
    const inc_t cs_a = bli_dpackmr;
    const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
    const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );

    const inc_t rs_b = bli_dpacknr;
    const inc_t cs_b = 1;
    const dim_t m = mr;
    const dim_t n = nr;

    const inc_t rs_a = 1;
    const inc_t cs_a = packmr;

    const inc_t rs_b = packnr;
    const inc_t cs_b = 1;

    dim_t iter, i, j, l;
    dim_t n_behind;
|
||||
@@ -207,33 +234,45 @@ void bli_dtrsm_u_opt_mxn(



void bli_ctrsm_u_opt_mxn(
                          scomplex* restrict a11,
                          scomplex* restrict b11,
                          scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
                          auxinfo_t* data
                        )
void bli_ctrsm_u_opt_mxn
     (
       scomplex* restrict a11,
       scomplex* restrict b11,
       scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
       auxinfo_t* restrict data,
       cntx_t* restrict cntx
     )
{
    /* Just call the reference implementation. */
    BLIS_CTRSM_U_UKERNEL_REF( a11,
                              b11,
                              c11, rs_c, cs_c,
                              data );
    BLIS_CTRSM_U_UKERNEL_REF
    (
      a11,
      b11,
      c11, rs_c, cs_c,
      data,
      cntx
    );
}



void bli_ztrsm_u_opt_mxn(
                          dcomplex* restrict a11,
                          dcomplex* restrict b11,
                          dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
                          auxinfo_t* data
                        )
void bli_ztrsm_u_opt_mxn
     (
       dcomplex* restrict a11,
       dcomplex* restrict b11,
       dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
       auxinfo_t* restrict data,
       cntx_t* restrict cntx
     )
{
    /* Just call the reference implementation. */
    BLIS_ZTRSM_U_UKERNEL_REF( a11,
                              b11,
                              c11, rs_c, cs_c,
                              data );
    BLIS_ZTRSM_U_UKERNEL_REF
    (
      a11,
      b11,
      c11, rs_c, cs_c,
      data,
      cntx
    );
}
|
||||
|
||||
|
||||
@@ -32,8 +32,10 @@

*/

//
// Prototype object-based fusing factor query routine.
//
dim_t bli_dotxaxpyf_fusefac( num_t dt );
#include "bli_l0_check.h"

#include "bli_l0_oapi.h"
#include "bli_l0_tapi.h"

// copysc
#include "bli_copysc.h"
|
||||
314
frame/0/bli_l0_check.c
Normal file
@@ -0,0 +1,314 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name of The University of Texas at Austin nor the names
      of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

//
// Define object-based check functions.
//

#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       obj_t* psi \
     ) \
{ \
    bli_l0_xxsc_check( chi, psi ); \
}

GENFRONT( addsc )
GENFRONT( copysc )
GENFRONT( divsc )
GENFRONT( mulsc )
GENFRONT( sqrtsc )
GENFRONT( subsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       obj_t* norm \
     ) \
{ \
    bli_l0_xx2sc_check( chi, norm ); \
}

GENFRONT( absqsc )
GENFRONT( normfsc )


void bli_getsc_check
     (
       obj_t* chi,
       double* zeta_r,
       double* zeta_i
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_noninteger_object( chi );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );
}


void bli_setsc_check
     (
       double zeta_r,
       double zeta_i,
       obj_t* chi
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_floating_object( chi );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );
}


void bli_unzipsc_check
     (
       obj_t* chi,
       obj_t* zeta_r,
       obj_t* zeta_i
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_noninteger_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_real_object( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_real_object( zeta_i );
    bli_check_error_code( e_val );

    e_val = bli_check_nonconstant_object( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_nonconstant_object( zeta_i );
    bli_check_error_code( e_val );

    e_val = bli_check_object_real_proj_of( chi, zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_object_real_proj_of( chi, zeta_i );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( zeta_i );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( zeta_i );
    bli_check_error_code( e_val );
}


void bli_zipsc_check
     (
       obj_t* zeta_r,
       obj_t* zeta_i,
       obj_t* chi
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_real_object( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_real_object( zeta_i );
    bli_check_error_code( e_val );

    e_val = bli_check_noninteger_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_nonconstant_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_object_real_proj_of( chi, zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_object_real_proj_of( chi, zeta_i );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( zeta_i );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( zeta_r );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( zeta_i );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );
}


// -----------------------------------------------------------------------------

void bli_l0_xxsc_check
     (
       obj_t* chi,
       obj_t* psi
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_noninteger_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_noninteger_object( psi );
    bli_check_error_code( e_val );

    e_val = bli_check_nonconstant_object( psi );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( psi );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( psi );
    bli_check_error_code( e_val );
}

void bli_l0_xx2sc_check
     (
       obj_t* chi,
       obj_t* absq
     )
{
    err_t e_val;

    // Check object datatypes.

    e_val = bli_check_noninteger_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_nonconstant_object( absq );
    bli_check_error_code( e_val );

    e_val = bli_check_real_object( absq );
    bli_check_error_code( e_val );

    e_val = bli_check_object_real_proj_of( chi, absq );
    bli_check_error_code( e_val );

    // Check object dimensions.

    e_val = bli_check_scalar_object( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_scalar_object( absq );
    bli_check_error_code( e_val );

    // Check object buffers (for non-NULLness).

    e_val = bli_check_object_buffer( chi );
    bli_check_error_code( e_val );

    e_val = bli_check_object_buffer( absq );
    bli_check_error_code( e_val );
}
|
||||
|
||||
134
frame/0/bli_l0_check.h
Normal file
@@ -0,0 +1,134 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name of The University of Texas at Austin nor the names
      of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/


//
// Prototype object-based check functions.
//

#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       obj_t* psi \
     );

GENTPROT( addsc )
GENTPROT( copysc )
GENTPROT( divsc )
GENTPROT( mulsc )
GENTPROT( sqrtsc )
GENTPROT( subsc )


#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       obj_t* absq \
     );

GENTPROT( absqsc )
GENTPROT( normfsc )


#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       double* zeta_r, \
       double* zeta_i \
     );

GENTPROT( getsc )


#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       double zeta_r, \
       double zeta_i, \
       obj_t* chi \
     );

GENTPROT( setsc )


#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* chi, \
       obj_t* zeta_r, \
       obj_t* zeta_i \
     );

GENTPROT( unzipsc )


#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
     ( \
       obj_t* zeta_r, \
       obj_t* zeta_i, \
       obj_t* chi \
     );

GENTPROT( zipsc )


// -----------------------------------------------------------------------------

void bli_l0_xxsc_check
     (
       obj_t* chi,
       obj_t* psi
     );

void bli_l0_xx2sc_check
     (
       obj_t* chi,
       obj_t* norm
     );
|
||||
288
frame/0/bli_l0_oapi.c
Normal file
@@ -0,0 +1,288 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name of The University of Texas at Austin nor the names
      of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

//
// Define object-based interfaces.
//

#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* absq \
     ) \
{ \
    num_t dt_chi; \
    num_t dt_absq_c = bli_obj_datatype_proj_to_complex( *absq ); \
\
    void* buf_chi; \
    void* buf_absq = bli_obj_buffer_at_off( *absq ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( chi, absq ); \
\
    /* If chi is a scalar constant, use dt_absq_c to extract the address of the
       corresponding constant value; otherwise, use the datatype encoded
       within the chi object and extract the buffer at the chi offset. */ \
    bli_set_scalar_dt_buffer( chi, dt_absq_c, dt_chi, buf_chi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_2 \
    ( \
      dt_chi, \
      opname, \
      buf_chi, \
      buf_absq \
    ); \
}

GENFRONT( absqsc )
GENFRONT( normfsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* psi \
     ) \
{ \
    num_t dt = bli_obj_datatype( *psi ); \
\
    conj_t conjchi = bli_obj_conj_status( *chi ); \
\
    void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
    void* buf_psi = bli_obj_buffer_at_off( *psi ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( chi, psi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_3 \
    ( \
      dt, \
      opname, \
      conjchi, \
      buf_chi, \
      buf_psi \
    ); \
}

GENFRONT( addsc )
GENFRONT( divsc )
GENFRONT( mulsc )
GENFRONT( subsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* psi \
     ) \
{ \
    num_t dt = bli_obj_datatype( *psi ); \
\
    void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
    void* buf_psi = bli_obj_buffer_at_off( *psi ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( chi, psi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_2 \
    ( \
      dt, \
      opname, \
      buf_chi, \
      buf_psi \
    ); \
}

GENFRONT( sqrtsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       double* zeta_r, \
       double* zeta_i \
     ) \
{ \
    num_t dt_chi = bli_obj_datatype( *chi ); \
    num_t dt_def = BLIS_DCOMPLEX; \
    num_t dt_use; \
\
    /* If chi is a constant object, default to using the dcomplex
       value to maximize precision, and since we don't know if the
       caller needs just the real or the real and imaginary parts. */ \
    void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
\
    /* The _check() routine prevents integer types, so we know that chi
       is either a constant or an actual floating-point type. */ \
    if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
    else dt_use = dt_chi; \
\
    /* Invoke the typed function. */ \
    bli_call_ft_3 \
    ( \
      dt_use, \
      opname, \
      buf_chi, \
      zeta_r, \
      zeta_i \
    ); \
}

GENFRONT( getsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       double zeta_r, \
       double zeta_i, \
       obj_t* chi \
     ) \
{ \
    num_t dt_chi = bli_obj_datatype( *chi ); \
\
    void* buf_chi = bli_obj_buffer_at_off( *chi ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_3 \
    ( \
      dt_chi, \
      opname, \
      zeta_r, \
      zeta_i, \
      buf_chi \
    ); \
}

GENFRONT( setsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* zeta_r, \
       obj_t* zeta_i \
     ) \
{ \
    num_t dt_chi; \
    num_t dt_zeta_c = bli_obj_datatype_proj_to_complex( *zeta_r ); \
\
    void* buf_chi; \
\
    void* buf_zeta_r = bli_obj_buffer_at_off( *zeta_r ); \
    void* buf_zeta_i = bli_obj_buffer_at_off( *zeta_i ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
\
    /* If chi is a scalar constant, use dt_zeta_c to extract the address of the
       corresponding constant value; otherwise, use the datatype encoded
       within the chi object and extract the buffer at the chi offset. */ \
    bli_set_scalar_dt_buffer( chi, dt_zeta_c, dt_chi, buf_chi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_3 \
    ( \
      dt_chi, \
      opname, \
      buf_chi, \
      buf_zeta_r, \
      buf_zeta_i \
    ); \
}

GENFRONT( unzipsc )


#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* zeta_r, \
       obj_t* zeta_i, \
       obj_t* chi \
     ) \
{ \
    num_t dt_chi = bli_obj_datatype( *chi ); \
\
    void* buf_zeta_r = bli_obj_buffer_for_1x1( dt_chi, *zeta_r ); \
    void* buf_zeta_i = bli_obj_buffer_for_1x1( dt_chi, *zeta_i ); \
\
    void* buf_chi = bli_obj_buffer_at_off( *chi ); \
\
    if ( bli_error_checking_is_enabled() ) \
        PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
\
    /* Invoke the typed function. */ \
    bli_call_ft_3 \
    ( \
      dt_chi, \
      opname, \
      buf_zeta_i, \
      buf_zeta_r, \
      buf_chi \
    ); \
}

GENFRONT( zipsc )
|
||||
|
||||
125
frame/0/bli_l0_oapi.h
Normal file
@@ -0,0 +1,125 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name of The University of Texas at Austin nor the names
      of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/


//
// Prototype object-based interfaces.
//

#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* absq \
     );

GENPROT( absqsc )
GENPROT( normfsc )


#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* psi \
     );

GENPROT( addsc )
GENPROT( divsc )
GENPROT( mulsc )
GENPROT( sqrtsc )
GENPROT( subsc )


#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       double* zeta_r, \
       double* zeta_i \
     );

GENPROT( getsc )


#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       double zeta_r, \
       double zeta_i, \
       obj_t* chi \
     );

GENPROT( setsc )


#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* chi, \
       obj_t* zeta_r, \
       obj_t* zeta_i \
     );

GENPROT( unzipsc )


#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
     ( \
       obj_t* zeta_r, \
       obj_t* zeta_i, \
       obj_t* chi \
     );

GENPROT( zipsc )
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
210
frame/0/bli_l0_tapi.c
Normal file
@@ -0,0 +1,210 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name of The University of Texas at Austin nor the names
      of its contributors may be used to endorse or promote products
      derived from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

//
// Define BLAS-like interfaces with typed operands.
//

#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, kername ) \
\
void PASTEMAC(ch,opname) \
     ( \
       conj_t conjchi, \
       ctype* chi, \
       ctype* psi \
     ) \
{ \
    ctype chi_conj; \
\
    PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
    PASTEMAC(ch,kername)( chi_conj, *psi ); \
}

INSERT_GENTFUNC_BASIC( addsc, adds )
INSERT_GENTFUNC_BASIC( divsc, invscals )
INSERT_GENTFUNC_BASIC( subsc, subs )


#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, kername ) \
\
void PASTEMAC(ch,opname) \
     ( \
       conj_t conjchi, \
       ctype* chi, \
       ctype* psi \
     ) \
{ \
    if ( PASTEMAC(ch,eq0)( *chi ) ) \
    { \
        /* Overwrite potential Infs and NaNs. */ \
        PASTEMAC(ch,set0s)( *psi ); \
    } \
    else \
    { \
        ctype chi_conj; \
\
        PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
        PASTEMAC(ch,kername)( chi_conj, *psi ); \
    } \
}

INSERT_GENTFUNC_BASIC( mulsc, scals )
|
||||
|
||||
|
||||
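The mulsc body above special-cases a zero scalar: instead of multiplying, it overwrites psi with zero. Under IEEE 754 arithmetic, `0 * NaN` and `0 * Inf` both evaluate to NaN, so a plain multiply would let a stale NaN or Inf in psi survive. A minimal sketch of the same guard for real doubles, where `scale_or_zero` is a hypothetical helper name and not a BLIS function:

```c
#include <assert.h>
#include <math.h>

/* Hypothetical helper: psi := chi * psi, except that chi == 0 forces psi to 0. */
void scale_or_zero( double chi, double* psi )
{
    if ( chi == 0.0 )
    {
        /* Overwrite rather than multiply, so NaN or Inf in psi is discarded. */
        *psi = 0.0;
    }
    else
    {
        *psi *= chi;
    }
}
```

With this guard, scaling a NaN-valued `psi` by zero yields exactly `0.0`, while nonzero scalars take the ordinary multiply path.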
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype*   chi, \
       ctype_r* absq \
     ) \
{ \
    ctype_r chi_r; \
    ctype_r chi_i; \
    ctype_r absq_i; \
\
    ( void )absq_i; \
\
    PASTEMAC2(ch,chr,gets)( *chi, chi_r, chi_i ); \
\
    /* absq   = chi_r * chi_r + chi_i * chi_i; \
       absq_r = 0.0; (thrown away) */ \
    PASTEMAC(ch,absq2ris)( chi_r, chi_i, *absq, absq_i ); \
\
    ( void )chi_i; \
}

INSERT_GENTFUNCR_BASIC0( absqsc )
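The comment inside absqsc states the formula it computes: the squared magnitude `chi_r * chi_r + chi_i * chi_i`, returned as a real value. A small worked sketch of that arithmetic, assuming a local struct `zdemo_t` as a stand-in for a double-precision complex scalar (the struct and the function name `absq_demo` are illustrative, not BLIS definitions):

```c
#include <assert.h>

/* Local stand-in for a double-precision complex scalar. */
typedef struct { double real; double imag; } zdemo_t;

/* absq := chi_r * chi_r + chi_i * chi_i; the result is purely real. */
void absq_demo( const zdemo_t* chi, double* absq )
{
    *absq = chi->real * chi->real + chi->imag * chi->imag;
}
```

For `chi = 3 + 4i` this gives `9 + 16 = 25`, the square of the magnitude 5 that a norm operation such as normfsc would report.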
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype*   chi, \
       ctype_r* norm \
     ) \
{ \
    /* norm = sqrt( chi_r * chi_r + chi_i * chi_i ); */ \
    PASTEMAC2(ch,chr,abval2s)( *chi, *norm ); \
}

INSERT_GENTFUNCR_BASIC0( normfsc )


#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype* chi, \
       ctype* psi \
     ) \
{ \
    /* NOTE: sqrtsc/sqrt2s differs from normfsc/abval2s in the complex domain. */ \
    PASTEMAC(ch,sqrt2s)( *chi, *psi ); \
}

INSERT_GENTFUNC_BASIC0( sqrtsc )


#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype*  chi, \
       double* zeta_r, \
       double* zeta_i \
     ) \
{ \
    PASTEMAC2(ch,d,gets)( *chi, *zeta_r, *zeta_i ); \
}

INSERT_GENTFUNC_BASIC0( getsc )


#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       double zeta_r, \
       double zeta_i, \
       ctype* chi \
     ) \
{ \
    PASTEMAC2(d,ch,sets)( zeta_r, zeta_i, *chi ); \
}

INSERT_GENTFUNC_BASIC0( setsc )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype*   chi, \
       ctype_r* zeta_r, \
       ctype_r* zeta_i \
     ) \
{ \
    PASTEMAC2(ch,chr,gets)( *chi, *zeta_r, *zeta_i ); \
}

INSERT_GENTFUNCR_BASIC0( unzipsc )


#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
     ( \
       ctype_r* zeta_r, \
       ctype_r* zeta_i, \
       ctype*   chi \
     ) \
{ \
    PASTEMAC2(chr,ch,sets)( *zeta_r, *zeta_i, *chi ); \
}

INSERT_GENTFUNCR_BASIC0( zipsc )
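The unzipsc and zipsc definitions above are inverses: one splits a scalar into separate real and imaginary parts, the other reassembles them. A minimal sketch of that round trip for a double-precision complex value, assuming a local struct `zpair_t` and the hypothetical function names `unzip_demo` and `zip_demo` (none of these are BLIS identifiers):

```c
#include <assert.h>

/* Local stand-in for a double-precision complex scalar. */
typedef struct { double real; double imag; } zpair_t;

/* Split chi into its real and imaginary parts. */
void unzip_demo( const zpair_t* chi, double* zeta_r, double* zeta_i )
{
    *zeta_r = chi->real;
    *zeta_i = chi->imag;
}

/* Rebuild chi from separately stored real and imaginary parts. */
void zip_demo( const double* zeta_r, const double* zeta_i, zpair_t* chi )
{
    chi->real = *zeta_r;
    chi->imag = *zeta_i;
}
```

Unlike getsc and setsc, which always exchange `double` values, the zip and unzip prototypes in the source use the real projection type `ctype_r` of the operand, so no precision conversion is implied. The sketch fixes that type to `double` for brevity.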
131
frame/0/bli_l0_tapi.h
Normal file
@@ -0,0 +1,131 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( addsc )
|
||||
INSERT_GENTPROT_BASIC( divsc )
|
||||
INSERT_GENTPROT_BASIC( mulsc )
|
||||
INSERT_GENTPROT_BASIC( subsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* absq \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( absqsc )
|
||||
INSERT_GENTPROTR_BASIC( normfsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( sqrtsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( getsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
ctype* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( unzipsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i, \
|
||||
ctype* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( zipsc )
|
||||
|
||||
@@ -34,66 +34,93 @@
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
// NOTE: This is one of the few functions in BLIS that is defined
|
||||
// with heterogeneous type support. This is done so that we have
|
||||
// an operation that can be used to typecast (copy-cast) a scalar
|
||||
// of one datatype to a scalar of another datatype.
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
conj_t conjchi,
|
||||
void* chi,
|
||||
void* psi
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY2_ALL(ftypes,copysc);
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi )
|
||||
{
|
||||
if ( bli_error_checking_is_enabled() )
|
||||
bli_copysc_check( chi, psi );
|
||||
|
||||
bli_copysc_unb_var1( chi, psi );
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with homogeneous-typed operands.
|
||||
//
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
conj_t conjchi = bli_obj_conj_status( *chi ); \
|
||||
\
|
||||
num_t dt_psi = bli_obj_datatype( *psi ); \
|
||||
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
|
||||
\
|
||||
num_t dt_chi; \
|
||||
void* buf_chi; \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, psi ); \
|
||||
\
|
||||
/* If chi is a scalar constant, use dt_psi to extract the address of the
|
||||
corresponding constant value; otherwise, use the datatype encoded
|
||||
within the chi object and extract the buffer at the chi offset. */ \
|
||||
bli_set_scalar_dt_buffer( chi, dt_psi, dt_chi, buf_chi ); \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_chi][dt_psi]; \
|
||||
\
|
||||
/* Invoke the void pointer-based function. */ \
|
||||
f( \
|
||||
conjchi, \
|
||||
buf_chi, \
|
||||
buf_psi \
|
||||
); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
|
||||
GENFRONT( copysc )
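The object-based copysc front end above selects its worker with `f = ftypes[dt_chi][dt_psi]`, a two-dimensional table of function pointers indexed by source and destination datatype, and then calls it through `void*` arguments. A minimal sketch of that dispatch technique for two real types follows. The enum `dt_demo_t`, the table contents, and `copy_cast_demo` are assumptions made for illustration; the real table is generated by `GENARRAY2_ALL` and covers more types.

```c
#include <assert.h>

/* Hypothetical datatype tags; BLIS uses num_t values instead. */
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT = 2 } dt_demo_t;

/* All workers share one void-pointer signature so they fit in one table. */
typedef void (*copy_fp)( const void* chi, void* psi );

static void copy_ss( const void* chi, void* psi ) { *(float*)psi  = *(const float*)chi; }
static void copy_sd( const void* chi, void* psi ) { *(double*)psi = (double)*(const float*)chi; }
static void copy_ds( const void* chi, void* psi ) { *(float*)psi  = (float)*(const double*)chi; }
static void copy_dd( const void* chi, void* psi ) { *(double*)psi = *(const double*)chi; }

/* Row index: source type. Column index: destination type. */
static const copy_fp ftypes_demo[DT_COUNT][DT_COUNT] =
{
    { copy_ss, copy_sd },
    { copy_ds, copy_dd },
};

/* Look up the worker for this type combination and invoke it. */
void copy_cast_demo( dt_demo_t dt_chi, const void* chi, dt_demo_t dt_psi, void* psi )
{
    copy_fp f = ftypes_demo[dt_chi][dt_psi];
    f( chi, psi );
}
```

The table lookup replaces a nested switch over both datatypes, which is what lets a single object-level entry point serve every supported type pair.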
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC2
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
) \
|
||||
void PASTEMAC2(chx,chy,varname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
void* chi, \
|
||||
void* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
ctype_x* chi_cast = chi; \
|
||||
ctype_y* psi_cast = psi; \
|
||||
\
|
||||
if ( bli_is_conj( conjchi ) ) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,copyjs)( *chi_cast, *psi_cast ); \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,copys)( *chi_cast, *psi_cast ); \
|
||||
} \
|
||||
}
|
||||
|
||||
// Define the basic set of functions unconditionally, and then also some
|
||||
// mixed datatype functions if requested.
|
||||
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
INSERT_GENTFUNC2_BASIC0( copysc )
|
||||
INSERT_GENTFUNC2_MIX_D0( copysc )
|
||||
INSERT_GENTFUNC2_MIX_P0( copysc )
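The typed copysc worker above casts its `void*` operands to concrete types and then either copies or conjugate-copies, depending on `conjchi`. A minimal sketch of one such type pair, single-precision complex to double-precision complex, is shown below. The structs `cdemo_t` and `zdemo2_t`, the `bool` flag in place of `conj_t`, and the name `copy_cz_demo` are all assumptions for illustration.

```c
#include <assert.h>
#include <stdbool.h>

/* Local stand-ins for single- and double-precision complex scalars. */
typedef struct { float  real; float  imag; } cdemo_t;
typedef struct { double real; double imag; } zdemo2_t;

/* psi := conj?( chi ), widening float components to double. */
void copy_cz_demo( bool conjchi, const void* chi, void* psi )
{
    const cdemo_t* chi_cast = chi;
    zdemo2_t*      psi_cast = psi;

    psi_cast->real = (double)chi_cast->real;

    if ( conjchi )
        psi_cast->imag = -(double)chi_cast->imag;  /* conjugate: negate imaginary part */
    else
        psi_cast->imag =  (double)chi_cast->imag;
}
```

Because the conversion and the optional conjugation happen in one step, this kind of routine can serve as a typecast between scalar datatypes, which is the purpose the NOTE at the top of the copysc file describes.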
|
||||
|
||||
|
||||
@@ -32,51 +32,37 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "bli_copysc_check.h"
|
||||
#include "bli_copysc_unb_var1.h"
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interface.
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi );
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with homogeneous-typed operands.
|
||||
//
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( copysc )
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
);
|
||||
GENFRONT( copysc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT2
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
);
|
||||
void PASTEMAC2(chx,chy,varname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
void* chi, \
|
||||
void* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT2_BASIC( copysc )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_D( copysc )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_P( copysc )
|
||||
#endif
|
||||
|
||||
|
||||
@@ -34,76 +34,78 @@
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
void* chi,
|
||||
double* zeta_r,
|
||||
double* zeta_i
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY(ftypes,getsc);
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x, \
|
||||
obj_t* y \
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x, y ); \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
num_t dt_def = BLIS_DCOMPLEX; \
|
||||
num_t dt_use; \
|
||||
\
|
||||
PASTEMAC0(varname)( x, \
|
||||
y ); \
|
||||
/* If chi is a constant object, default to using the dcomplex
|
||||
value to maximize precision, and since we don't know if the
|
||||
caller needs just the real or the real and imaginary parts. */ \
|
||||
void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
|
||||
\
|
||||
/* The _check() routine prevents integer types, so we know that chi
|
||||
is either a constant or an actual floating-point type. */ \
|
||||
if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
|
||||
else dt_use = dt_chi; \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_use]; \
|
||||
\
|
||||
/* Invoke the function. */ \
|
||||
f( \
|
||||
buf_chi, \
|
||||
zeta_r, \
|
||||
zeta_i \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( addv, addv_kernel )
|
||||
GENFRONT( getsc )
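The getsc front end above picks a datatype before dispatching: if chi is a constant object it falls back to dcomplex, on the reasoning that this maximizes precision and covers callers who want either the real part alone or both parts. A minimal sketch of that selection rule follows. It assumes, purely for illustration, that a constant scalar carries its value in every representation; the types `demo_tag_t` and `scalar_demo_t` and the function `getsc_demo` are hypothetical and do not reflect the actual `obj_t` layout.

```c
#include <assert.h>

/* Hypothetical datatype tags, including one for constant scalars. */
typedef enum { TAG_DOUBLE, TAG_DCOMPLEX, TAG_CONSTANT } demo_tag_t;

typedef struct { double real; double imag; } zval_t;

/* Assumed layout: a constant stores its value in both representations. */
typedef struct
{
    demo_tag_t tag;
    double     d;   /* valid for TAG_DOUBLE and TAG_CONSTANT   */
    zval_t     z;   /* valid for TAG_DCOMPLEX and TAG_CONSTANT */
} scalar_demo_t;

void getsc_demo( const scalar_demo_t* chi, double* zeta_r, double* zeta_i )
{
    /* Constants default to the double-complex representation. */
    demo_tag_t use = ( chi->tag == TAG_CONSTANT ) ? TAG_DCOMPLEX : chi->tag;

    if ( use == TAG_DCOMPLEX )
    {
        *zeta_r = chi->z.real;
        *zeta_i = chi->z.imag;
    }
    else
    {
        /* Real scalar: the imaginary part is zero. */
        *zeta_r = chi->d;
        *zeta_i = 0.0;
    }
}
```

Returning both parts as `double` regardless of the operand type keeps the caller-facing signature fixed, at the cost of a conversion for single-precision operands.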
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with homogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjx, \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx, \
|
||||
ctype* y, inc_t incy \
|
||||
void* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( conjx, \
|
||||
n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( addv, ADDV_KERNEL )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
#undef GENTFUNC2
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
ctype* chi_cast = chi; \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjx, \
|
||||
dim_t n, \
|
||||
ctype_x* x, inc_t incx, \
|
||||
ctype_y* y, inc_t incy \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( conjx, \
|
||||
n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(ch,d,gets)( *chi_cast, *zeta_r, *zeta_i ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC2_BASIC( addv, ADDV_KERNEL )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( addv, ADDV_KERNEL )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( addv, ADDV_KERNEL )
|
||||
#endif
|
||||
INSERT_GENTFUNC_BASIC( getsc )
|
||||
|
||||
@@ -32,42 +32,33 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x ); \
|
||||
\
|
||||
PASTEMAC0(varname)( x ); \
|
||||
}
|
||||
|
||||
GENFRONT( invertv, invertv_kernel )
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
GENFRONT( getsc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces.
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC(ch,varname)( n, \
|
||||
x, incx ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( invertv, INVERTV_KERNEL )
|
||||
void* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( getsc )
|
||||
101
frame/0/old/bli_setsc.c
Normal file
@@ -0,0 +1,101 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
double* zeta_r,
|
||||
double* zeta_i,
|
||||
void* chi
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY(ftypes,setsc);
|
||||
|
||||
//
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
obj_t* chi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_chi]; \
|
||||
\
|
||||
/* Invoke the function. */ \
|
||||
f( \
|
||||
zeta_r, \
|
||||
zeta_i, \
|
||||
buf_chi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( setsc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
void* chi \
|
||||
) \
|
||||
{ \
|
||||
ctype* chi_cast = chi; \
|
||||
\
|
||||
PASTEMAC2(d,ch,sets)( *zeta_r, *zeta_i, *chi_cast ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( setsc )
|
||||
|
||||
64
frame/0/old/bli_setsc.h
Normal file
@@ -0,0 +1,64 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
GENFRONT( setsc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
void* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setsc )
|
||||
@@ -38,22 +38,14 @@
|
||||
//
|
||||
// Define object-based interface.
|
||||
//
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x, \
|
||||
obj_t* y \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x, y ); \
|
||||
\
|
||||
PASTEMAC0(varname)( x, \
|
||||
y ); \
|
||||
}
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi )
|
||||
{
|
||||
if ( bli_error_checking_is_enabled() )
|
||||
bli_copysc_check( chi, psi );
|
||||
|
||||
GENFRONT( swapv, swapv_kernel )
|
||||
bli_copysc_unb_var1( chi, psi );
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
@@ -63,17 +55,17 @@ GENFRONT( swapv, swapv_kernel )
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx, \
|
||||
ctype* y, inc_t incy \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(ch,ch,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
|
||||
//
|
||||
@@ -83,23 +75,25 @@ INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
dim_t n, \
|
||||
ctype_x* x, inc_t incx, \
|
||||
ctype_y* y, inc_t incy \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(chx,chy,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC2_BASIC( swapv, SWAPV_KERNEL )
|
||||
// Define the basic set of functions unconditionally, and then also some
|
||||
// mixed datatype functions if requested.
|
||||
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
@@ -32,17 +32,15 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "bli_setv_check.h"
|
||||
|
||||
#include "bli_setv_kernel.h"
|
||||
#include "bli_setv_ref.h"
|
||||
#include "bli_copysc_check.h"
|
||||
#include "bli_copysc_unb_var1.h"
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interface.
|
||||
//
|
||||
void bli_setv( obj_t* beta,
|
||||
obj_t* x );
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi );
|
||||
|
||||
|
||||
//
|
||||
@@ -52,33 +50,33 @@ void bli_setv( obj_t* beta,
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* beta, \
|
||||
ctype* x, inc_t incx \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setv )
|
||||
INSERT_GENTPROT_BASIC( copysc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
#undef GENTPROT2
|
||||
#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
|
||||
\
|
||||
void PASTEMAC2(chb,chx,opname)( \
|
||||
dim_t n, \
|
||||
ctype_b* beta, \
|
||||
ctype_x* x, inc_t incx \
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT2_BASIC( setv )
|
||||
INSERT_GENTPROT2_BASIC( copysc )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_D( setv )
|
||||
INSERT_GENTPROT2_MIX_D( copysc )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_P( setv )
|
||||
INSERT_GENTPROT2_MIX_P( copysc )
|
||||
#endif
|
||||
|
||||
Some files were not shown because too many files have changed in this diff.