From 537a1f4f85ce1aa008901857cb3182e6b4546d7f Mon Sep 17 00:00:00 2001
From: "Field G. Van Zee"
Date: Mon, 11 Apr 2016 17:21:28 -0500
Subject: [PATCH] Implemented runtime contexts and reorganized code.

Details:
- Retrofitted a new data structure, known as a context, into virtually
  all internal APIs for computational operations in BLIS. The structure
  is now present within the type-aware APIs, as well as many supporting
  utility functions that require information stored in the context.
  User-level object APIs were unaffected and continue to be
  "context-free"; however, these APIs were duplicated/mirrored so that
  "context-aware" APIs now also exist, differentiated with an "_ex"
  suffix (for "expert"). These new context-aware object APIs (along with
  the lower-level, type-aware, BLAS-like APIs) take the address of a
  context as a last parameter, after all other operands. Contexts, or
  specifically, cntx_t object pointers, are passed all the way down the
  function stack into the kernels and allow the code at any level to
  query information about the runtime, such as kernel addresses and
  blocksizes, in a thread-friendly manner: that is, one that preserves
  thread safety even if the original source of the information stored
  in the context changes at run-time (see the next bullet for more on
  this "original source" of info). (Special thanks go to Lee Killough
  for suggesting the use of this kind of data structure in discussions
  that transpired during the early planning stages of BLIS, and also
  for suggesting such a perfectly appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
  structure" (gks). This data structure and API will allow the caller
  to initialize a context with the kernel addresses, blocksizes, and
  other information associated with the currently active kernel
  configuration.
  The currently active kernel configuration within the gks cannot be
  changed (for now), and is initialized with the traditional cpp macros
  that define kernel function names, blocksizes, and the like. However,
  in the future, the gks API will be expanded to allow runtime
  management of kernels and runtime parameters. The most obvious
  application of this new infrastructure is the runtime detection of
  hardware (and the implied selection of appropriate kernels). With
  contexts in place, kernels may even be "hot swapped" at runtime
  within the gks. Once execution enters a level-3 _front() function,
  the memory allocator will be reinitialized on-the-fly, if necessary,
  to accommodate the new kernels' blocksizes. If another application
  thread is executing with another (previously loaded) kernel, it will
  finish in a deterministic fashion because its kernel information was
  loaded into its context before computation began, and also because
  the blocks it checked out from the internal memory pools will be
  unaffected by the newer threads' reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much
  of the code enabling use of induced methods for complex domain matrix
  multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
  those APIs' functionality is now mostly subsumed within the global
  kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
  that will reinitialize a memory pool if the necessary pool block size
  has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
  bli_pool_reinit() in the definition of bli_mem_pool_init(), and
  placed usage of contexts where appropriate to communicate cache and
  register blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
  the context and/or the global kernel structure:
  - Removed blocksize object pointer (blksz_t*) fields from all control
    tree node definitions and replaced them with blocksize id (bszid_t)
    values, which may be passed into a context query routine in order
    to extract the corresponding blocksize from the given context.
  - Removed micro-kernel function pointer (func_t*) fields from all
    control tree node definitions. Now, any code that needs these
    function pointers can query them from the local context, as
    identified by a level-3 micro-kernel id (l3ukr_t), level-1f kernel
    id (l1fkr_t), or level-1v kernel id (l1vkr_t).
  - Removed blksz_t object creation and initialization, as well as
    kernel function object creation and initialization, from all
    operation-specific control tree initialization files
    (bli_*_cntl.c), since this information will now live in the gks
    and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
  blocksize multiples for each blocksize id (bszid_t) in the context
  object.
- Removed the bool_t's that were required when a func_t was
  initialized. These bools were meant to allow one to track the
  micro-kernel's storage preferences (by rows or columns). This
  preference is now tracked separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into
  single files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2,
  3, and util directories, but has the most obvious effect of allowing
  BLIS to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
  in an attempt to reduce overhead for memory-bound operations. This
  includes removal of default use of object-based variants for level-2
  operations. Now, by default, level-2 operations will directly call a
  low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in blk_blksz.c (renamed from
  bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
  respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
  heterogeneous bool_t's (one for each floating-point datatype), in the
  same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware:
  BLIS_SIMD_NUM_REGISTERS and BLIS_SIMD_SIZE. These values are needed
  in order to compute a third new parameter, which may be set
  indirectly via the aforementioned macros or directly:
  BLIS_STACK_BUF_MAX_SIZE. This value is used to statically allocate
  memory in macro-kernels and the induced methods' virtual kernels to
  be used as temporary space to hold a single micro-tile. These values
  are now output by the testsuite. The default value of
  BLIS_STACK_BUF_MAX_SIZE is computed as
  "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
  embarrassingly misleading "avx" and "avx2" directories to
  "sandybridge" and "haswell," respectively) and gave more consistent
  and meaningful names to many kernel files (as well as updating their
  interfaces to conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
  context for test modules that need those values: axpyf, dotxf,
  dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that
  will more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (i.e., those used to inline double
  loops for level-1m-like operations on small matrices) in
  frame/include/level0 to use more obscure local variable names in an
  effort to avoid variable shadowing. (Thanks to Devin Matthews for
  pointing out these gcc warnings, which are only output using
  -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors
  that of scalm.
  The semantic meaning of the conj argument is to optionally allow
  implicit conjugation of the scalar prior to being populated into the
  object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
  that this does not preclude supporting mixed types via the object
  APIs, where it produces absolutely zero API code bloat.
---
 config/bgq/bli_kernel.h | 36 +- config/bulldozer/bli_kernel.h | 94 +- config/carrizo/bli_kernel.h | 8 +- config/cortex-a15/kernels | 2 +- config/cortex-a9/kernels | 2 +- config/dunnington/bli_kernel.h | 88 +- config/dunnington/kernels | 2 +- config/haswell/bli_kernel.h | 15 - config/haswell/kernels | 2 +- config/loongson3a/bli_kernel.h | 2 +- config/mic/bli_config.h | 3 + config/mic/bli_kernel.h | 4 +- config/piledriver/bli_kernel.h | 8 +- config/sandybridge/kernels | 2 +- config/template/bli_kernel.h | 48 +- .../template/kernels/1/bli_axpyv_opt_var1.c | 124 +- config/template/kernels/1/bli_dotv_opt_var1.c | 142 ++- .../template/kernels/1f/bli_axpy2v_opt_var1.c | 216 ++-- .../template/kernels/1f/bli_axpyf_opt_var1.c | 188 +-- .../kernels/1f/bli_dotaxpyv_opt_var1.c | 226 ++-- .../kernels/1f/bli_dotxaxpyf_opt_var1.c | 300 +++-- .../template/kernels/1f/bli_dotxf_opt_var1.c | 196 +-- config/template/kernels/3/bli_gemm_opt_mxn.c | 148 ++- .../kernels/3/bli_gemmtrsm_l_opt_mxn.c | 120 +- .../kernels/3/bli_gemmtrsm_u_opt_mxn.c | 120 +- .../template/kernels/3/bli_trsm_l_opt_mxn.c | 118 +- .../template/kernels/3/bli_trsm_u_opt_mxn.c | 111 +- .../bli_dotxaxpyf_fusefac.h => 0/bli_l0.h} | 10 +- frame/0/bli_l0_check.c | 314 +++++ frame/0/bli_l0_check.h | 134 ++ frame/0/bli_l0_oapi.c | 288 +++++ frame/0/bli_l0_oapi.h | 125 ++ frame/0/bli_l0_tapi.c | 210 ++++ frame/0/bli_l0_tapi.h | 131 ++ frame/0/copysc/bli_copysc.c | 117 +- frame/0/copysc/bli_copysc.h | 50 +- frame/0/{ => old}/absqsc/bli_absqsc.c | 0 frame/0/{ => old}/absqsc/bli_absqsc.h | 0 frame/0/{ => old}/absqsc/bli_absqsc_check.c | 0 frame/0/{ => old}/absqsc/bli_absqsc_check.h | 0 .../0/{
=> old}/absqsc/bli_absqsc_unb_var1.c | 0 .../0/{ => old}/absqsc/bli_absqsc_unb_var1.h | 0 frame/0/{ => old}/addsc/bli_addsc.c | 0 frame/0/{ => old}/addsc/bli_addsc.h | 0 frame/0/{ => old}/addsc/bli_addsc_check.c | 0 frame/0/{ => old}/addsc/bli_addsc_check.h | 0 frame/0/{ => old}/addsc/bli_addsc_unb_var1.c | 0 frame/0/{ => old}/addsc/bli_addsc_unb_var1.h | 0 .../{1/addv/bli_addv.c => 0/old/bli_getsc.c} | 100 +- .../bli_invertv.c => 0/old/bli_getsc.h} | 43 +- frame/0/old/bli_setsc.c | 101 ++ frame/0/old/bli_setsc.h | 64 + .../bli_swapv.c => 0/old/copysc/bli_copysc.c} | 56 +- .../bli_setv.h => 0/old/copysc/bli_copysc.h} | 34 +- frame/0/{ => old}/copysc/bli_copysc_check.c | 0 frame/0/{ => old}/copysc/bli_copysc_check.h | 0 .../0/{ => old}/copysc/bli_copysc_unb_var1.c | 0 .../0/{ => old}/copysc/bli_copysc_unb_var1.h | 0 frame/0/{ => old}/divsc/bli_divsc.c | 0 frame/0/{ => old}/divsc/bli_divsc.h | 0 frame/0/{ => old}/divsc/bli_divsc_check.c | 0 frame/0/{ => old}/divsc/bli_divsc_check.h | 0 frame/0/{ => old}/divsc/bli_divsc_unb_var1.c | 0 frame/0/{ => old}/divsc/bli_divsc_unb_var1.h | 0 frame/0/{ => old}/getsc/bli_getsc.c | 0 frame/0/{ => old}/getsc/bli_getsc.h | 0 frame/0/{ => old}/getsc/bli_getsc_check.c | 0 frame/0/{ => old}/getsc/bli_getsc_check.h | 0 frame/0/{ => old}/mulsc/bli_mulsc.c | 0 frame/0/{ => old}/mulsc/bli_mulsc.h | 0 frame/0/{ => old}/mulsc/bli_mulsc_check.c | 0 frame/0/{ => old}/mulsc/bli_mulsc_check.h | 0 frame/0/{ => old}/mulsc/bli_mulsc_unb_var1.c | 0 frame/0/{ => old}/mulsc/bli_mulsc_unb_var1.h | 0 frame/0/{ => old}/normfsc/bli_normfsc.c | 0 frame/0/{ => old}/normfsc/bli_normfsc.h | 0 frame/0/{ => old}/normfsc/bli_normfsc_check.c | 0 frame/0/{ => old}/normfsc/bli_normfsc_check.h | 0 .../{ => old}/normfsc/bli_normfsc_unb_var1.c | 0 .../{ => old}/normfsc/bli_normfsc_unb_var1.h | 0 frame/0/{ => old}/setsc/bli_setsc.c | 0 frame/0/{ => old}/setsc/bli_setsc.h | 0 frame/0/{ => old}/setsc/bli_setsc_check.c | 0 frame/0/{ => old}/setsc/bli_setsc_check.h | 0 
frame/0/{ => old}/sqrtsc/bli_sqrtsc.c | 0 frame/0/{ => old}/sqrtsc/bli_sqrtsc.h | 0 frame/0/{ => old}/sqrtsc/bli_sqrtsc_check.c | 0 frame/0/{ => old}/sqrtsc/bli_sqrtsc_check.h | 0 .../0/{ => old}/sqrtsc/bli_sqrtsc_unb_var1.c | 0 .../0/{ => old}/sqrtsc/bli_sqrtsc_unb_var1.h | 0 frame/0/{ => old}/subsc/bli_subsc.c | 0 frame/0/{ => old}/subsc/bli_subsc.h | 0 frame/0/{ => old}/subsc/bli_subsc_check.c | 0 frame/0/{ => old}/subsc/bli_subsc_check.h | 0 frame/0/{ => old}/subsc/bli_subsc_unb_var1.c | 0 frame/0/{ => old}/subsc/bli_subsc_unb_var1.h | 0 frame/0/{ => old}/unzipsc/bli_unzipsc.c | 0 frame/0/{ => old}/unzipsc/bli_unzipsc.h | 0 frame/0/{ => old}/unzipsc/bli_unzipsc_check.c | 0 frame/0/{ => old}/unzipsc/bli_unzipsc_check.h | 0 .../{ => old}/unzipsc/bli_unzipsc_unb_var1.c | 0 .../{ => old}/unzipsc/bli_unzipsc_unb_var1.h | 0 frame/0/{ => old}/zipsc/bli_zipsc.c | 0 frame/0/{ => old}/zipsc/bli_zipsc.h | 0 frame/0/{ => old}/zipsc/bli_zipsc_check.c | 0 frame/0/{ => old}/zipsc/bli_zipsc_check.h | 0 frame/0/{ => old}/zipsc/bli_zipsc_unb_var1.c | 0 frame/0/{ => old}/zipsc/bli_zipsc_unb_var1.h | 0 frame/1/addv/bli_addv_ref.c | 145 --- frame/1/axpyv/bli_axpyv.c | 128 -- frame/1/axpyv/bli_axpyv_ref.c | 169 --- frame/1/bli_l1v.h | 58 + frame/1/bli_l1v_check.c | 348 ++++++ frame/1/bli_l1v_check.h | 155 +++ frame/1/bli_l1v_cntx.c | 89 ++ frame/1/bli_l1v_cntx.h | 57 + frame/1/bli_l1v_ft.h | 167 +++ frame/1/bli_l1v_ker.h | 151 +++ frame/1/bli_l1v_oapi.c | 370 ++++++ frame/1/bli_l1v_oapi.h | 137 +++ frame/1/bli_l1v_oapi_wc.c | 46 + frame/1/bli_l1v_oapi_woc.c | 46 + frame/1/bli_l1v_tapi.c | 289 +++++ frame/1/bli_l1v_tapi.h | 77 ++ frame/1/copyv/bli_copyv.c | 109 -- frame/1/copyv/bli_copyv_kernel.h | 61 - frame/1/dotv/bli_dotv.c | 121 -- frame/1/dotv/bli_dotv_ref.c | 177 --- frame/1/dotxv/bli_dotxv.c | 133 -- frame/1/dotxv/bli_dotxv_kernel.c | 153 --- frame/1/dotxv/bli_dotxv_kernel.h | 69 -- frame/1/dotxv/bli_dotxv_ref.c | 209 ---- frame/1/kernels/bli_addv_ref.c | 81 ++ 
frame/1/kernels/bli_axpyv_ref.c | 103 ++ frame/1/kernels/bli_copyv_ref.c | 81 ++ frame/1/kernels/bli_dotv_ref.c | 104 ++ frame/1/kernels/bli_dotxv_ref.c | 112 ++ frame/1/kernels/bli_invertv_ref.c | 63 + frame/1/kernels/bli_l1v_ref.h | 147 +++ frame/1/kernels/bli_scal2v_ref.c | 102 ++ .../bli_scalv_ref.c} | 73 +- frame/1/kernels/bli_setv_ref.c | 80 ++ frame/1/kernels/bli_subv_ref.c | 81 ++ frame/1/kernels/bli_swapv_ref.c | 67 + frame/1/{addv => kernels/old}/bli_addv_ref.h | 18 +- .../1/{axpyv => kernels/old}/bli_axpyv_ref.h | 22 +- .../1/{copyv => kernels/old}/bli_copyv_ref.h | 18 +- frame/1/{dotv => kernels/old}/bli_dotv_ref.h | 22 +- .../1/{dotxv => kernels/old}/bli_dotxv_ref.h | 28 +- .../old}/bli_invertv_ref.h | 6 +- .../{scal2v => kernels/old}/bli_scal2v_ref.h | 22 +- .../1/{scalv => kernels/old}/bli_scalv_ref.h | 23 +- frame/1/{setv => kernels/old}/bli_setv_ref.h | 22 +- frame/1/{subv => kernels/old}/bli_subv_ref.h | 18 +- .../1/{swapv => kernels/old}/bli_swapv_ref.h | 18 +- .../bli_subv_kernel.c => old/addv/bli_addv.c} | 102 +- frame/1/{ => old}/addv/bli_addv.h | 43 +- .../axpyv/bli_axpyv.c} | 126 +- frame/1/{ => old}/axpyv/bli_axpyv.h | 45 +- frame/1/{addv => old/check}/bli_addv_check.c | 0 frame/1/{addv => old/check}/bli_addv_check.h | 0 .../1/{axpyv => old/check}/bli_axpyv_check.c | 0 .../1/{axpyv => old/check}/bli_axpyv_check.h | 0 .../1/{copyv => old/check}/bli_copyv_check.c | 0 .../1/{copyv => old/check}/bli_copyv_check.h | 0 frame/1/{dotv => old/check}/bli_dotv_check.c | 0 frame/1/{dotv => old/check}/bli_dotv_check.h | 0 .../1/{dotxv => old/check}/bli_dotxv_check.c | 0 .../1/{dotxv => old/check}/bli_dotxv_check.h | 0 .../check}/bli_invertv_check.c | 0 .../check}/bli_invertv_check.h | 0 .../{scal2v => old/check}/bli_scal2v_check.c | 0 .../{scal2v => old/check}/bli_scal2v_check.h | 0 .../1/{scalv => old/check}/bli_scalv_check.c | 0 .../1/{scalv => old/check}/bli_scalv_check.h | 0 frame/1/{setv => old/check}/bli_setv_check.c | 0 frame/1/{setv => 
old/check}/bli_setv_check.h | 0 frame/1/{subv => old/check}/bli_subv_check.c | 0 frame/1/{subv => old/check}/bli_subv_check.h | 0 .../1/{swapv => old/check}/bli_swapv_check.c | 0 .../1/{swapv => old/check}/bli_swapv_check.h | 0 .../copyv/bli_copyv.c} | 102 +- frame/1/{ => old}/copyv/bli_copyv.h | 43 +- .../bli_dotv_kernel.c => old/dotv/bli_dotv.c} | 121 +- frame/1/{ => old}/dotv/bli_dotv.h | 49 +- frame/1/old/dotxv/bli_dotxv.c | 182 +++ frame/1/{ => old}/dotxv/bli_dotxv.h | 52 +- frame/1/old/invertv/bli_invertv.c | 114 ++ frame/1/{ => old}/invertv/bli_invertv.h | 19 +- .../scal2v/bli_scal2v.c} | 153 +-- frame/1/{ => old}/scal2v/bli_scal2v.h | 49 +- frame/1/old/scalv/bli_scalv.c | 140 +++ frame/1/{ => old}/scalv/bli_scalv.h | 54 +- frame/1/old/setv/bli_setv.c | 140 +++ .../old/setv/bli_setv.h} | 56 +- .../1/{ => old}/setv/old/bli_setv_unb_var2.c | 0 .../1/{ => old}/setv/old/bli_setv_unb_var2.h | 0 .../subv/bli_subv.c} | 102 +- frame/1/{ => old}/subv/bli_subv.h | 43 +- .../swapv/bli_swapv.c} | 95 +- frame/1/{ => old}/swapv/bli_swapv.h | 41 +- frame/1/packv/bli_packv_check.c | 15 +- frame/1/packv/bli_packv_check.h | 9 +- frame/1/packv/bli_packv_cntl.c | 29 +- frame/1/packv/bli_packv_cntl.h | 8 +- frame/1/packv/bli_packv_init.c | 55 +- frame/1/packv/bli_packv_init.h | 29 +- frame/1/packv/bli_packv_int.c | 5 +- frame/1/packv/bli_packv_int.h | 10 +- frame/1/packv/bli_packv_unb_var1.c | 47 +- frame/1/packv/bli_packv_unb_var1.h | 15 +- frame/1/scal2v/bli_scal2v.c | 130 -- frame/1/scal2v/bli_scal2v_kernel.c | 129 -- frame/1/scal2v/bli_scal2v_kernel.h | 64 - frame/1/scal2v/bli_scal2v_ref.c | 170 --- frame/1/scalv/bli_scalv.c | 124 -- frame/1/scalv/bli_scalv_int.c | 23 +- frame/1/scalv/bli_scalv_int.h | 3 +- frame/1/scalv/bli_scalv_kernel.c | 120 -- frame/1/scalv/bli_scalv_ref.c | 151 --- frame/1/setv/bli_setv.c | 121 -- frame/1/setv/bli_setv_kernel.c | 113 -- frame/1/setv/bli_setv_ref.c | 136 -- frame/1/subv/bli_subv.c | 109 -- frame/1/subv/bli_subv_kernel.h | 61 - 
frame/1/subv/bli_subv_ref.c | 145 --- frame/1/swapv/bli_swapv_ref.c | 128 -- frame/1/unpackv/bli_unpackv_check.c | 15 +- frame/1/unpackv/bli_unpackv_check.h | 9 +- frame/1/unpackv/bli_unpackv_int.c | 5 +- frame/1/unpackv/bli_unpackv_int.h | 1 + frame/1/unpackv/bli_unpackv_unb_var1.c | 47 +- frame/1/unpackv/bli_unpackv_unb_var1.h | 13 +- .../bli_dotxf_fusefac.c => 1d/bli_l1d.h} | 18 +- frame/1d/bli_l1d_check.c | 245 ++++ frame/1d/bli_l1d_check.h | 118 ++ frame/1d/bli_l1d_cntx.c | 66 + frame/1d/bli_l1d_cntx.h | 55 + frame/1d/bli_l1d_oapi.c | 291 +++++ .../bli_l1d_oapi.h} | 74 +- frame/1d/bli_l1d_oapi_wc.c | 46 + frame/1d/bli_l1d_oapi_woc.c | 46 + frame/1d/bli_l1d_tapi.c | 371 ++++++ frame/1d/bli_l1d_tapi.h | 127 ++ frame/1d/{addd => old}/bli_addd.c | 0 frame/1d/{addd => old}/bli_addd.h | 0 frame/1d/{addd => old}/bli_addd_check.c | 0 frame/1d/{addd => old}/bli_addd_check.h | 0 frame/1d/{addd => old}/bli_addd_unb_var1.c | 0 frame/1d/{addd => old}/bli_addd_unb_var1.h | 0 frame/1d/{axpyd => old}/bli_axpyd.c | 0 frame/1d/{axpyd => old}/bli_axpyd.h | 0 frame/1d/{axpyd => old}/bli_axpyd_check.c | 0 frame/1d/{axpyd => old}/bli_axpyd_check.h | 0 frame/1d/{axpyd => old}/bli_axpyd_unb_var1.c | 0 frame/1d/{axpyd => old}/bli_axpyd_unb_var1.h | 0 frame/1d/{copyd => old}/bli_copyd.c | 0 frame/1d/{copyd => old}/bli_copyd.h | 0 frame/1d/{copyd => old}/bli_copyd_check.c | 0 frame/1d/{copyd => old}/bli_copyd_check.h | 0 frame/1d/{copyd => old}/bli_copyd_unb_var1.c | 0 frame/1d/{copyd => old}/bli_copyd_unb_var1.h | 0 frame/1d/{invertd => old}/bli_invertd.c | 0 frame/1d/{invertd => old}/bli_invertd.h | 0 frame/1d/{invertd => old}/bli_invertd_check.c | 0 frame/1d/{invertd => old}/bli_invertd_check.h | 0 .../{invertd => old}/bli_invertd_unb_var1.c | 0 .../{invertd => old}/bli_invertd_unb_var1.h | 0 frame/1d/{scal2d => old}/bli_scal2d.c | 0 frame/1d/{scal2d => old}/bli_scal2d.h | 0 frame/1d/{scal2d => old}/bli_scal2d_check.c | 0 frame/1d/{scal2d => old}/bli_scal2d_check.h | 0 .../1d/{scal2d 
=> old}/bli_scal2d_unb_var1.c | 0 .../1d/{scal2d => old}/bli_scal2d_unb_var1.h | 0 frame/1d/{scald => old}/bli_scald.c | 0 frame/1d/{scald => old}/bli_scald.h | 0 frame/1d/{scald => old}/bli_scald_check.c | 0 frame/1d/{scald => old}/bli_scald_check.h | 0 frame/1d/{scald => old}/bli_scald_unb_var1.c | 0 frame/1d/{scald => old}/bli_scald_unb_var1.h | 0 frame/1d/{setd => old}/bli_setd.c | 0 frame/1d/{setd => old}/bli_setd.h | 0 frame/1d/{setd => old}/bli_setd_check.c | 0 frame/1d/{setd => old}/bli_setd_check.h | 0 frame/1d/{setd => old}/bli_setd_unb_var1.c | 0 frame/1d/{setd => old}/bli_setd_unb_var1.h | 0 frame/1d/{setid => old}/bli_setid.c | 0 frame/1d/{setid => old}/bli_setid.h | 0 frame/1d/{setid => old}/bli_setid_check.c | 0 frame/1d/{setid => old}/bli_setid_check.h | 0 frame/1d/{setid => old}/bli_setid_unb_var1.c | 0 frame/1d/{setid => old}/bli_setid_unb_var1.h | 0 frame/1d/{subd => old}/bli_subd.c | 0 frame/1d/{subd => old}/bli_subd.h | 0 frame/1d/{subd => old}/bli_subd_check.c | 0 frame/1d/{subd => old}/bli_subd_check.h | 0 frame/1d/{subd => old}/bli_subd_unb_var1.c | 0 frame/1d/{subd => old}/bli_subd_unb_var1.h | 0 frame/1f/axpy2v/bli_axpy2v.c | 133 -- frame/1f/axpy2v/bli_axpy2v_kernel.c | 150 --- frame/1f/axpy2v/bli_axpy2v_kernel.h | 69 -- frame/1f/axpy2v/bli_axpy2v_ref.c | 161 --- frame/1f/axpyf/bli_axpyf.c | 141 --- frame/1f/axpyf/bli_axpyf_kernel.h | 68 - frame/1f/bli_l1f.h | 50 + frame/1f/bli_l1f_check.c | 450 +++++++ frame/1f/bli_l1f_check.h | 116 ++ frame/1f/bli_l1f_cntx.c | 143 +++ frame/1f/bli_l1f_cntx.h | 50 + frame/1f/bli_l1f_ft.h | 153 +++ frame/1f/bli_l1f_ker.h | 140 +++ frame/1f/bli_l1f_oapi.c | 392 ++++++ frame/1f/bli_l1f_oapi.h | 121 ++ frame/1f/bli_l1f_oapi_wc.c | 46 + frame/1f/bli_l1f_oapi_woc.c | 46 + frame/1f/bli_l1f_tapi.c | 263 ++++ frame/1f/bli_l1f_tapi.h | 58 + frame/1f/dotaxpyv/bli_dotaxpyv.c | 139 --- frame/1f/dotaxpyv/bli_dotaxpyv_kernel.c | 155 --- frame/1f/dotaxpyv/bli_dotaxpyv_kernel.h | 71 -- frame/1f/dotaxpyv/bli_dotaxpyv_ref.c 
| 170 --- frame/1f/dotxaxpyf/bli_dotxaxpyf.c | 159 --- frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.c | 188 --- frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.h | 77 -- frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.c | 227 ---- frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.c | 208 ---- frame/1f/dotxf/bli_dotxf.c | 147 --- frame/1f/dotxf/bli_dotxf_kernel.h | 70 -- frame/1f/kernels/bli_axpy2v_ref.c | 80 ++ frame/1f/kernels/bli_axpyf_ref.c | 86 ++ .../kernels/bli_dotaxpyv_ref.c} | 83 +- frame/1f/kernels/bli_dotxaxpyf_ref_var1.c | 118 ++ frame/1f/kernels/bli_dotxaxpyf_ref_var2.c | 97 ++ frame/1f/kernels/bli_dotxf_ref.c | 86 ++ frame/1f/kernels/bli_l1f_ref.h | 160 +++ .../{axpy2v => kernels/old}/bli_axpy2v_ref.h | 28 +- .../1f/{axpyf => kernels/old}/bli_axpyf_ref.h | 25 +- .../old}/bli_dotaxpyv_ref.h | 29 +- .../old}/bli_dotxaxpyf_ref_var1.h | 35 +- .../old}/bli_dotxaxpyf_ref_var2.h | 33 +- .../1f/{dotxf => kernels/old}/bli_dotxf_ref.h | 28 +- frame/1f/old/axpy2v/bli_axpy2v.c | 184 +++ frame/1f/{ => old}/axpy2v/bli_axpy2v.h | 67 +- frame/1f/old/axpyf/bli_axpyf.c | 177 +++ frame/1f/{ => old}/axpyf/bli_axpyf.h | 62 +- .../{axpy2v => old/check}/bli_axpy2v_check.c | 16 +- .../{axpy2v => old/check}/bli_axpy2v_check.h | 4 +- .../1f/{axpyf => old/check}/bli_axpyf_check.c | 0 .../1f/{axpyf => old/check}/bli_axpyf_check.h | 0 .../check}/bli_dotaxpyv_check.c | 0 .../check}/bli_dotaxpyv_check.h | 0 .../check}/bli_dotxaxpyf_check.c | 0 .../check}/bli_dotxaxpyf_check.h | 0 .../1f/{dotxf => old/check}/bli_dotxf_check.c | 0 .../1f/{dotxf => old/check}/bli_dotxf_check.h | 0 frame/1f/old/dotaxpyv/bli_dotaxpyv.c | 186 +++ frame/1f/{ => old}/dotaxpyv/bli_dotaxpyv.h | 69 +- frame/1f/old/dotxaxpyf/bli_dotxaxpyf.c | 230 ++++ frame/1f/{ => old}/dotxaxpyf/bli_dotxaxpyf.h | 86 +- frame/1f/old/dotxf/bli_dotxf.c | 195 +++ frame/1f/{ => old}/dotxf/bli_dotxf.h | 67 +- frame/1m/bli_l1m.h | 56 + frame/1m/bli_l1m_check.c | 212 ++++ frame/1m/bli_l1m_check.h | 101 ++ frame/1m/bli_l1m_cntx.c | 83 ++ frame/1m/bli_l1m_cntx.h 
| 53 + .../bli_dotv_kernel.h => 1m/bli_l1m_ft.h} | 51 +- frame/1m/bli_l1m_oapi.c | 286 +++++ frame/1m/bli_l1m_oapi.h | 82 ++ frame/1m/bli_l1m_oapi_wc.c | 46 + frame/1m/bli_l1m_oapi_woc.c | 46 + frame/1m/bli_l1m_tapi.c | 375 ++++++ frame/1m/bli_l1m_tapi.h | 100 ++ frame/1m/bli_l1m_unb_var1.c | 368 ++++++ frame/1m/bli_l1m_unb_var1.h | 100 ++ frame/1m/{ => old}/addm/bli_addm.c | 0 frame/1m/{ => old}/addm/bli_addm.h | 0 frame/1m/{ => old}/addm/bli_addm_check.c | 0 frame/1m/{ => old}/addm/bli_addm_check.h | 0 frame/1m/{ => old}/addm/bli_addm_unb_var1.c | 3 +- frame/1m/{ => old}/addm/bli_addm_unb_var1.h | 2 +- frame/1m/{ => old}/axpym/bli_axpym.c | 0 frame/1m/{ => old}/axpym/bli_axpym.h | 0 frame/1m/{ => old}/axpym/bli_axpym_check.c | 0 frame/1m/{ => old}/axpym/bli_axpym_check.h | 0 frame/1m/{ => old}/axpym/bli_axpym_unb_var1.c | 3 +- frame/1m/{ => old}/axpym/bli_axpym_unb_var1.h | 2 +- frame/1m/old/bli_scalm.c | 83 ++ frame/1m/{scalm => old}/bli_scalm.h | 33 +- frame/1m/{scalm => old}/bli_scalm_check.c | 0 frame/1m/{scalm => old}/bli_scalm_check.h | 0 frame/1m/{scalm => old}/bli_scalm_unb_var1.c | 130 +- frame/1m/{scalm => old}/bli_scalm_unb_var1.h | 8 +- frame/1m/{ => old}/copym/bli_copym.c | 0 frame/1m/{ => old}/copym/bli_copym.h | 0 frame/1m/{ => old}/copym/bli_copym_check.c | 0 frame/1m/{ => old}/copym/bli_copym_check.h | 0 frame/1m/{ => old}/copym/bli_copym_unb_var1.c | 3 +- frame/1m/{ => old}/copym/bli_copym_unb_var1.h | 2 +- frame/1m/{ => old}/scal2m/bli_scal2m.c | 0 frame/1m/{ => old}/scal2m/bli_scal2m.h | 0 frame/1m/{ => old}/scal2m/bli_scal2m_check.c | 0 frame/1m/{ => old}/scal2m/bli_scal2m_check.h | 0 .../1m/{ => old}/scal2m/bli_scal2m_unb_var1.c | 38 +- .../1m/{ => old}/scal2m/bli_scal2m_unb_var1.h | 2 +- frame/1m/{ => old}/setm/bli_setm.c | 0 frame/1m/{ => old}/setm/bli_setm.h | 0 frame/1m/{ => old}/setm/bli_setm_check.c | 0 frame/1m/{ => old}/setm/bli_setm_check.h | 0 frame/1m/{ => old}/setm/bli_setm_unb_var1.c | 9 +- frame/1m/{ => 
old}/setm/bli_setm_unb_var1.h | 3 +- frame/1m/{ => old}/subm/bli_subm.c | 0 frame/1m/{ => old}/subm/bli_subm.h | 0 frame/1m/{ => old}/subm/bli_subm_check.c | 0 frame/1m/{ => old}/subm/bli_subm_check.h | 0 frame/1m/{ => old}/subm/bli_subm_unb_var1.c | 3 +- frame/1m/{ => old}/subm/bli_subm_unb_var1.h | 2 +- frame/1m/packm/bli_packm.h | 1 + frame/1m/packm/bli_packm_blk_var1.c | 150 ++- frame/1m/packm/bli_packm_blk_var1.c.old | 2 +- frame/1m/packm/bli_packm_blk_var1.h | 47 +- frame/1m/packm/bli_packm_check.c | 12 +- frame/1m/packm/bli_packm_check.h | 12 +- frame/1m/packm/bli_packm_cntl.c | 96 +- frame/1m/packm/bli_packm_cntl.h | 16 +- frame/1m/packm/bli_packm_cntx.c | 57 + .../packm/bli_packm_cntx.h} | 14 +- frame/1m/packm/bli_packm_cxk.c | 68 +- frame/1m/packm/bli_packm_cxk.h | 20 +- frame/1m/packm/bli_packm_cxk_3mis.c | 80 +- frame/1m/packm/bli_packm_cxk_3mis.h | 20 +- frame/1m/packm/bli_packm_cxk_4mi.c | 77 +- frame/1m/packm/bli_packm_cxk_4mi.h | 20 +- frame/1m/packm/bli_packm_cxk_rih.c | 106 +- frame/1m/packm/bli_packm_cxk_rih.h | 22 +- frame/1m/packm/bli_packm_init.c | 94 +- frame/1m/packm/bli_packm_init.h | 8 +- frame/1m/packm/bli_packm_int.c | 5 +- frame/1m/packm/bli_packm_int.h | 1 + frame/1m/packm/bli_packm_struc_cxk.c | 416 ++++--- frame/1m/packm/bli_packm_struc_cxk.h | 116 +- frame/1m/packm/bli_packm_struc_cxk_3mis.c | 596 +++++---- frame/1m/packm/bli_packm_struc_cxk_3mis.h | 116 +- frame/1m/packm/bli_packm_struc_cxk_4mi.c | 530 ++++---- frame/1m/packm/bli_packm_struc_cxk_4mi.h | 116 +- frame/1m/packm/bli_packm_struc_cxk_rih.c | 392 +++--- frame/1m/packm/bli_packm_struc_cxk_rih.h | 116 +- frame/1m/packm/bli_packm_unb_var1.c | 167 ++- frame/1m/packm/bli_packm_unb_var1.h | 31 +- ...ef_cxk_3mis.c => bli_packm_cxk_3mis_ref.c} | 153 +-- .../ukernels/bli_packm_cxk_3mis_ref.h} | 32 +- ..._ref_cxk_4mi.c => bli_packm_cxk_4mi_ref.c} | 153 +-- .../packm/ukernels/bli_packm_cxk_4mi_ref.h} | 43 +- ...li_packm_ref_cxk.c => bli_packm_cxk_ref.c} | 170 +-- 
frame/1m/packm/ukernels/bli_packm_cxk_ref.h | 57 + ..._ref_cxk_rih.c => bli_packm_cxk_rih_ref.c} | 171 +-- .../1m/packm/ukernels/bli_packm_cxk_rih_ref.h | 57 + .../1m/packm/ukernels/bli_packm_ref_cxk_rih.h | 56 - frame/1m/scalm/bli_scalm.c | 118 -- frame/1m/scalm/bli_scalm_int.c | 40 +- frame/1m/scalm/bli_scalm_int.h | 3 +- frame/1m/unpackm/bli_unpackm_blk_var2.c | 79 +- frame/1m/unpackm/bli_unpackm_blk_var2.h | 4 +- frame/1m/unpackm/bli_unpackm_check.c | 1 + frame/1m/unpackm/bli_unpackm_check.h | 1 + frame/1m/unpackm/bli_unpackm_cxk.c | 56 +- frame/1m/unpackm/bli_unpackm_cxk.h | 20 +- frame/1m/unpackm/bli_unpackm_int.c | 5 +- frame/1m/unpackm/bli_unpackm_int.h | 1 + frame/1m/unpackm/bli_unpackm_unb_var1.c | 31 +- frame/1m/unpackm/bli_unpackm_unb_var1.h | 21 +- ...npackm_ref_cxk.c => bli_unpackm_cxk_ref.c} | 152 +-- .../1m/unpackm/ukernels/bli_unpackm_cxk_ref.h | 55 + frame/2/bli_l2.h | 58 + frame/2/bli_l2_check.c | 415 +++++++ frame/2/bli_l2_check.h | 118 ++ frame/2/bli_l2_cntx.c | 206 ++++ frame/2/bli_l2_cntx.h | 56 + frame/2/bli_l2_ft.h | 166 +++ frame/2/bli_l2_oapi.c | 418 +++++++ frame/2/bli_l2_oapi.h | 103 ++ frame/2/bli_l2_oapi_wc.c | 46 + frame/2/bli_l2_oapi_woc.c | 46 + frame/2/bli_l2_tapi.c | 502 ++++++++ frame/2/bli_l2_tapi.h | 170 +++ frame/2/gemv/bli_gemv.h | 68 +- frame/2/gemv/bli_gemv_blk_var1.c | 14 +- frame/2/gemv/bli_gemv_blk_var2.c | 14 +- frame/2/gemv/bli_gemv_cntl.c | 51 +- frame/2/gemv/bli_gemv_cntl.h | 6 +- frame/2/gemv/{bli_gemv.c => bli_gemv_front.c} | 78 +- frame/2/gemv/bli_gemv_front.h | 63 + frame/2/gemv/bli_gemv_int.c | 5 +- frame/2/gemv/bli_gemv_int.h | 20 +- frame/2/gemv/bli_gemv_unb_var1.c | 180 +-- frame/2/gemv/bli_gemv_unb_var2.c | 207 +--- frame/2/gemv/bli_gemv_unf_var1.c | 186 +-- frame/2/gemv/bli_gemv_unf_var2.c | 211 +--- frame/2/gemv/bli_gemv_var.h | 90 ++ frame/2/gemv/bli_gemv_var_oapi.c | 95 ++ frame/2/gemv/bli_gemv_var_oapi.c.prev | 97 ++ frame/2/gemv/{ => old}/bli_gemv_blk_var1.h | 1 + frame/2/gemv/{ => 
old}/bli_gemv_blk_var2.h | 1 + frame/2/gemv/{ => old}/bli_gemv_check.c | 1 + frame/2/gemv/{ => old}/bli_gemv_check.h | 1 + .../gemv/old/bli_gemv_cntx.c} | 63 +- .../gemv/old/bli_gemv_cntx.h} | 4 +- .../gemv/old/bli_gemv_unb_var1.c} | 118 +- frame/2/gemv/{ => old}/bli_gemv_unb_var1.h | 0 .../gemv/old/bli_gemv_unb_var2.c} | 132 +- frame/2/gemv/{ => old}/bli_gemv_unb_var2.h | 0 .../gemv/old/bli_gemv_unf_var1.c} | 137 ++- frame/2/gemv/{ => old}/bli_gemv_unf_var1.h | 0 frame/2/gemv/old/bli_gemv_unf_var2.c | 216 ++++ frame/2/gemv/{ => old}/bli_gemv_unf_var2.h | 0 frame/2/ger/bli_ger.h | 63 +- frame/2/ger/bli_ger_blk_var1.c | 14 +- frame/2/ger/bli_ger_blk_var2.c | 14 +- frame/2/ger/bli_ger_cntl.c | 31 +- frame/2/ger/bli_ger_cntl.h | 6 +- frame/2/ger/{bli_ger.c => bli_ger_front.c} | 73 +- frame/2/ger/bli_ger_front.h | 61 + frame/2/ger/bli_ger_int.c | 5 +- frame/2/ger/bli_ger_int.h | 1 + frame/2/ger/bli_ger_unb_var1.c | 161 +-- frame/2/ger/bli_ger_unb_var2.c | 161 +-- .../ger/bli_ger_var.h} | 59 +- frame/2/ger/bli_ger_var_oapi.c | 89 ++ frame/2/ger/{ => old}/bli_ger_blk_var1.h | 1 + frame/2/ger/{ => old}/bli_ger_blk_var2.h | 1 + frame/2/ger/{ => old}/bli_ger_check.c | 1 + frame/2/ger/{ => old}/bli_ger_check.h | 1 + .../ger/old/bli_ger_cntx.c} | 39 +- .../ger/old/bli_ger_cntx.h} | 5 +- frame/2/ger/old/bli_ger_unb_var1.c | 148 +++ frame/2/ger/{ => old}/bli_ger_unb_var1.h | 0 .../ger/old/bli_ger_unb_var2.c} | 145 ++- frame/2/ger/{ => old}/bli_ger_unb_var2.h | 0 frame/2/hemv/bli_hemv.h | 74 +- frame/2/hemv/bli_hemv_blk_var1.c | 22 +- frame/2/hemv/bli_hemv_blk_var2.c | 22 +- frame/2/hemv/bli_hemv_blk_var3.c | 22 +- frame/2/hemv/bli_hemv_blk_var4.c | 22 +- frame/2/hemv/bli_hemv_cntl.c | 20 +- frame/2/hemv/bli_hemv_cntl.h | 6 +- frame/2/hemv/{bli_hemv.c => bli_hemv_front.c} | 78 +- frame/2/hemv/bli_hemv_front.h | 68 + frame/2/hemv/bli_hemv_int.c | 8 +- frame/2/hemv/bli_hemv_int.h | 1 + frame/2/hemv/bli_hemv_unb_var1.c | 258 ++-- frame/2/hemv/bli_hemv_unb_var2.c | 268 ++-- 
frame/2/hemv/bli_hemv_unb_var3.c | 258 ++-- frame/2/hemv/bli_hemv_unb_var4.c | 254 ++-- frame/2/hemv/bli_hemv_unf_var1.c | 280 ++--- frame/2/hemv/bli_hemv_unf_var1a.c | 246 ++-- frame/2/hemv/bli_hemv_unf_var3.c | 298 ++--- frame/2/hemv/bli_hemv_unf_var3a.c | 264 ++-- frame/2/hemv/bli_hemv_var.h | 102 ++ frame/2/hemv/bli_hemv_var_oapi.c | 101 ++ frame/2/hemv/{ => old}/bli_hemv_blk_var1.h | 1 + frame/2/hemv/{ => old}/bli_hemv_blk_var2.h | 1 + frame/2/hemv/{ => old}/bli_hemv_blk_var3.h | 1 + frame/2/hemv/{ => old}/bli_hemv_blk_var4.h | 1 + frame/2/hemv/{ => old}/bli_hemv_check.c | 1 + frame/2/hemv/{ => old}/bli_hemv_check.h | 1 + frame/2/hemv/old/bli_hemv_cntx.c | 68 + .../hemv/old/bli_hemv_cntx.h} | 6 +- frame/2/hemv/old/bli_hemv_unb_var1.c | 254 ++++ frame/2/hemv/{ => old}/bli_hemv_unb_var1.h | 1 + frame/2/hemv/old/bli_hemv_unb_var2.c | 260 ++++ frame/2/hemv/{ => old}/bli_hemv_unb_var2.h | 1 + frame/2/hemv/old/bli_hemv_unb_var3.c | 253 ++++ frame/2/hemv/{ => old}/bli_hemv_unb_var3.h | 1 + frame/2/hemv/old/bli_hemv_unb_var4.c | 253 ++++ frame/2/hemv/{ => old}/bli_hemv_unb_var4.h | 1 + frame/2/hemv/old/bli_hemv_unf_var1.c | 297 +++++ frame/2/hemv/{ => old}/bli_hemv_unf_var1.h | 1 + frame/2/hemv/old/bli_hemv_unf_var1a.c | 246 ++++ frame/2/hemv/{ => old}/bli_hemv_unf_var1a.h | 1 + frame/2/hemv/old/bli_hemv_unf_var3.c | 315 +++++ frame/2/hemv/{ => old}/bli_hemv_unf_var3.h | 1 + frame/2/hemv/old/bli_hemv_unf_var3a.c | 263 ++++ frame/2/hemv/{ => old}/bli_hemv_unf_var3a.h | 1 + frame/2/her/bli_her.h | 58 +- frame/2/her/bli_her_blk_var1.c | 15 +- frame/2/her/bli_her_blk_var2.c | 15 +- frame/2/her/bli_her_cntl.c | 20 +- frame/2/her/bli_her_cntl.h | 6 +- frame/2/her/{bli_her.c => bli_her_front.c} | 68 +- frame/2/her/bli_her_front.h | 58 + frame/2/her/bli_her_int.c | 8 +- frame/2/her/bli_her_int.h | 1 + frame/2/her/bli_her_unb_var1.c | 186 +-- frame/2/her/bli_her_unb_var2.c | 186 +-- .../her/bli_her_var.h} | 59 +- frame/2/her/bli_her_var_oapi.c | 84 ++ frame/2/her/{ => 
old}/bli_her_blk_var1.h | 1 + frame/2/her/{ => old}/bli_her_blk_var2.h | 1 + frame/2/her/{ => old}/bli_her_check.c | 1 + frame/2/her/{ => old}/bli_her_check.h | 1 + frame/2/her/old/bli_her_unb_var1.c | 227 ++++ frame/2/her/{ => old}/bli_her_unb_var1.h | 1 + frame/2/her/old/bli_her_unb_var2.c | 227 ++++ frame/2/her/{ => old}/bli_her_unb_var2.h | 1 + frame/2/her2/bli_her2.h | 70 +- frame/2/her2/bli_her2_blk_var1.c | 20 +- frame/2/her2/bli_her2_blk_var2.c | 20 +- frame/2/her2/bli_her2_blk_var3.c | 20 +- frame/2/her2/bli_her2_blk_var4.c | 20 +- frame/2/her2/bli_her2_cntl.c | 20 +- frame/2/her2/bli_her2_cntl.h | 6 +- frame/2/her2/{bli_her2.c => bli_her2_front.c} | 76 +- frame/2/her2/bli_her2_front.h | 61 + frame/2/her2/bli_her2_int.c | 36 +- frame/2/her2/bli_her2_int.h | 15 +- frame/2/her2/bli_her2_unb_var1.c | 236 ++-- frame/2/her2/bli_her2_unb_var2.c | 244 ++-- frame/2/her2/bli_her2_unb_var3.c | 246 ++-- frame/2/her2/bli_her2_unb_var4.c | 242 ++-- frame/2/her2/bli_her2_unf_var1.c | 228 ++-- frame/2/her2/bli_her2_unf_var4.c | 234 ++-- frame/2/her2/bli_her2_var.h | 97 ++ frame/2/her2/bli_her2_var_oapi.c | 97 ++ frame/2/her2/{ => old}/bli_her2_blk_var1.h | 1 + frame/2/her2/{ => old}/bli_her2_blk_var2.h | 1 + frame/2/her2/{ => old}/bli_her2_blk_var3.h | 1 + frame/2/her2/{ => old}/bli_her2_blk_var4.h | 1 + frame/2/her2/{ => old}/bli_her2_check.c | 31 +- frame/2/her2/{ => old}/bli_her2_check.h | 31 +- frame/2/her2/old/bli_her2_cntx.c | 60 + .../her2/old/bli_her2_cntx.h} | 6 +- frame/2/her2/old/bli_her2_unb_var1.c | 253 ++++ frame/2/her2/{ => old}/bli_her2_unb_var1.h | 14 +- frame/2/her2/old/bli_her2_unb_var2.c | 262 ++++ frame/2/her2/{ => old}/bli_her2_unb_var2.h | 14 +- frame/2/her2/old/bli_her2_unb_var3.c | 262 ++++ frame/2/her2/{ => old}/bli_her2_unb_var3.h | 14 +- frame/2/her2/old/bli_her2_unb_var4.c | 261 ++++ frame/2/her2/{ => old}/bli_her2_unb_var4.h | 14 +- frame/2/her2/old/bli_her2_unf_var1.c | 250 ++++ frame/2/her2/{ => old}/bli_her2_unf_var1.h | 14 +- 
frame/2/her2/old/bli_her2_unf_var4.c | 258 ++++ frame/2/her2/{ => old}/bli_her2_unf_var4.h | 14 +- frame/2/symv/bli_symv.h | 59 +- frame/2/symv/{bli_symv.c => bli_symv_front.c} | 77 +- frame/2/symv/bli_symv_front.h | 64 + frame/2/symv/{ => old}/bli_symv_check.c | 0 frame/2/symv/{ => old}/bli_symv_check.h | 0 frame/2/syr/bli_syr.h | 51 +- frame/2/syr/{bli_syr.c => bli_syr_front.c} | 65 +- frame/2/syr/bli_syr_front.h | 58 + frame/2/syr/{ => old}/bli_syr_check.c | 0 frame/2/syr/{ => old}/bli_syr_check.h | 0 frame/2/syr2/bli_syr2.h | 56 +- frame/2/syr2/{bli_syr2.c => bli_syr2_front.c} | 74 +- frame/2/syr2/bli_syr2_front.h | 61 + frame/2/syr2/{ => old}/bli_syr2_check.c | 16 +- frame/2/syr2/{ => old}/bli_syr2_check.h | 16 +- frame/2/trmv/bli_trmv.h | 64 +- frame/2/trmv/bli_trmv_cntl.c | 22 +- frame/2/trmv/bli_trmv_cntl.h | 6 +- frame/2/trmv/{bli_trmv.c => bli_trmv_front.c} | 69 +- frame/2/trmv/bli_trmv_front.h | 59 + frame/2/trmv/bli_trmv_int.c | 5 +- frame/2/trmv/bli_trmv_int.h | 1 + frame/2/trmv/bli_trmv_l_blk_var1.c | 15 +- frame/2/trmv/bli_trmv_l_blk_var2.c | 15 +- frame/2/trmv/bli_trmv_u_blk_var1.c | 15 +- frame/2/trmv/bli_trmv_u_blk_var2.c | 15 +- frame/2/trmv/bli_trmv_unb_var1.c | 225 ++-- frame/2/trmv/bli_trmv_unb_var2.c | 221 ++-- frame/2/trmv/bli_trmv_unf_var1.c | 267 ++-- frame/2/trmv/bli_trmv_unf_var2.c | 257 ++-- frame/2/trmv/bli_trmv_var.h | 88 ++ frame/2/trmv/bli_trmv_var_oapi.c | 87 ++ frame/2/trmv/{ => old}/bli_trmv_check.c | 1 + frame/2/trmv/{ => old}/bli_trmv_check.h | 1 + frame/2/trmv/{ => old}/bli_trmv_l_blk_var1.h | 1 + frame/2/trmv/{ => old}/bli_trmv_l_blk_var2.h | 1 + frame/2/trmv/{ => old}/bli_trmv_u_blk_var1.h | 1 + frame/2/trmv/{ => old}/bli_trmv_u_blk_var2.h | 1 + frame/2/trmv/old/bli_trmv_unb_var1.c | 226 ++++ frame/2/trmv/{ => old}/bli_trmv_unb_var1.h | 1 + frame/2/trmv/old/bli_trmv_unb_var2.c | 224 ++++ frame/2/trmv/{ => old}/bli_trmv_unb_var2.h | 1 + frame/2/trmv/old/bli_trmv_unf_var1.c | 293 +++++ frame/2/trmv/{ => 
old}/bli_trmv_unf_var1.h | 1 + frame/2/trmv/old/bli_trmv_unf_var2.c | 288 +++++ frame/2/trmv/{ => old}/bli_trmv_unf_var2.h | 1 + frame/2/trsv/bli_trsv.h | 64 +- frame/2/trsv/bli_trsv_cntl.c | 20 +- frame/2/trsv/bli_trsv_cntl.h | 6 +- frame/2/trsv/{bli_trsv.c => bli_trsv_front.c} | 69 +- frame/2/trsv/bli_trsv_front.h | 58 + frame/2/trsv/bli_trsv_int.c | 5 +- frame/2/trsv/bli_trsv_int.h | 1 + frame/2/trsv/bli_trsv_l_blk_var1.c | 17 +- frame/2/trsv/bli_trsv_l_blk_var2.c | 17 +- frame/2/trsv/bli_trsv_u_blk_var1.c | 17 +- frame/2/trsv/bli_trsv_u_blk_var2.c | 17 +- frame/2/trsv/bli_trsv_unb_var1.c | 233 ++-- frame/2/trsv/bli_trsv_unb_var2.c | 229 ++-- frame/2/trsv/bli_trsv_unf_var1.c | 279 ++--- frame/2/trsv/bli_trsv_unf_var2.c | 269 ++-- frame/2/trsv/bli_trsv_var.h | 88 ++ frame/2/trsv/bli_trsv_var_oapi.c | 87 ++ frame/2/trsv/{ => old}/bli_trsv_check.c | 1 + frame/2/trsv/{ => old}/bli_trsv_check.h | 1 + frame/2/trsv/{ => old}/bli_trsv_l_blk_var1.h | 1 + frame/2/trsv/{ => old}/bli_trsv_l_blk_var2.h | 1 + frame/2/trsv/{ => old}/bli_trsv_u_blk_var1.h | 1 + frame/2/trsv/{ => old}/bli_trsv_u_blk_var2.h | 1 + frame/2/trsv/old/bli_trsv_unb_var1.c | 234 ++++ frame/2/trsv/{ => old}/bli_trsv_unb_var1.h | 1 + frame/2/trsv/old/bli_trsv_unb_var2.c | 232 ++++ frame/2/trsv/{ => old}/bli_trsv_unb_var2.h | 1 + frame/2/trsv/old/bli_trsv_unf_var1.c | 302 +++++ frame/2/trsv/{ => old}/bli_trsv_unf_var1.h | 1 + frame/2/trsv/old/bli_trsv_unf_var2.c | 297 +++++ frame/2/trsv/{ => old}/bli_trsv_unf_var2.h | 1 + .../{1/addv/bli_addv_kernel.h => 3/bli_l3.h} | 48 +- frame/3/bli_l3_blocksize.c | 495 ++++++++ frame/3/bli_l3_blocksize.h | 57 + frame/3/bli_l3_check.c | 508 ++++++++ .../bli_ind_query.h => 3/bli_l3_check.h} | 156 ++- frame/3/bli_l3_cntx.c | 121 ++ .../bli_axpyf_fusefac.c => 3/bli_l3_cntx.h} | 16 +- frame/3/bli_l3_ft.h | 106 ++ frame/3/bli_l3_oapi.c | 169 +++ frame/3/bli_l3_oapi.h | 107 ++ frame/3/bli_l3_oapi_wc.c | 46 + frame/3/bli_l3_oapi_woc.c | 46 + frame/3/bli_l3_oft.h | 122 ++ 
frame/3/bli_l3_prune.c | 127 ++ .../bli_invertv_kernel.h => 3/bli_l3_prune.h} | 32 +- frame/3/bli_l3_tapi.c | 469 +++++++ frame/3/bli_l3_tapi.h | 206 ++++ frame/3/bli_l3_ukr.h | 91 ++ frame/3/bli_l3_ukr_oapi.c | 220 ++++ .../bli_l3_ukr_oapi.h} | 61 +- frame/3/bli_l3_ukr_tapi.c | 144 +++ frame/3/bli_l3_ukr_tapi.h | 56 + frame/3/gemm/bli_gemm.h | 46 +- frame/3/gemm/bli_gemm_blk_var1f.c | 22 +- frame/3/gemm/bli_gemm_blk_var2f.c | 22 +- frame/3/gemm/bli_gemm_blk_var3f.c | 20 +- frame/3/gemm/bli_gemm_cntl.c | 143 +-- frame/3/gemm/bli_gemm_cntl.h | 8 +- frame/3/gemm/bli_gemm_front.c | 19 +- frame/3/gemm/bli_gemm_front.h | 1 + frame/3/gemm/bli_gemm_int.c | 20 +- frame/3/gemm/bli_gemm_int.h | 1 + frame/3/gemm/bli_gemm_ker_var2.c | 111 +- frame/3/gemm/bli_gemm_var.h | 95 ++ frame/3/gemm/ind/bli_gemm_blk_var4f.c | 68 +- frame/3/gemm/ind/bli_gemm_blk_var4f.h | 1 + frame/3/gemm/ind/bli_gemm_ker_var3.c | 113 +- frame/3/gemm/ind/bli_gemm_ker_var3.h | 2 + frame/3/gemm/ind/bli_gemm_ker_var4.c | 110 +- frame/3/gemm/ind/bli_gemm_ker_var4.h | 2 + frame/3/gemm/{ => old}/bli_gemm_blk_var1f.h | 1 + frame/3/gemm/{ => old}/bli_gemm_blk_var2f.h | 1 + frame/3/gemm/{ => old}/bli_gemm_blk_var3f.h | 1 + frame/3/gemm/old/bli_gemm_cntx.c | 69 ++ frame/3/gemm/old/bli_gemm_cntx.h | 37 + frame/3/gemm/{ => old}/bli_gemm_ker_var2.h | 2 + frame/3/gemm/other/bli_gemm_cntl_exp.c | 123 -- frame/3/gemm/other/bli_gemm_ker_var5.c | 7 +- frame/3/gemm/other/bli_gemm_ker_var5.h | 1 + frame/3/hemm/bli_hemm.h | 32 - frame/3/hemm/bli_hemm_front.c | 19 +- frame/3/hemm/bli_hemm_front.h | 1 + frame/3/her2k/bli_her2k.h | 30 - frame/3/her2k/bli_her2k_front.c | 22 +- frame/3/her2k/bli_her2k_front.h | 1 + frame/3/herk/bli_herk.h | 36 +- frame/3/herk/bli_herk_blk_var1f.c | 22 +- frame/3/herk/bli_herk_blk_var2f.c | 22 +- frame/3/herk/bli_herk_blk_var3f.c | 20 +- frame/3/herk/bli_herk_front.c | 19 +- frame/3/herk/bli_herk_front.h | 1 + frame/3/herk/bli_herk_int.c | 5 +- frame/3/herk/bli_herk_int.h | 1 + 
frame/3/herk/bli_herk_l_ker_var2.c | 130 +- frame/3/herk/bli_herk_u_ker_var2.c | 130 +- frame/3/herk/bli_herk_var.h | 89 ++ frame/3/herk/{ => old}/bli_herk_blk_var1f.h | 1 + frame/3/herk/{ => old}/bli_herk_blk_var2f.h | 1 + frame/3/herk/{ => old}/bli_herk_blk_var3f.h | 1 + frame/3/herk/{ => old}/bli_herk_l_ker_var2.h | 2 + frame/3/herk/{ => old}/bli_herk_u_ker_var2.h | 2 + frame/3/{gemm => old}/bli_gemm.c | 20 +- frame/3/{gemm => old}/bli_gemm_blocksize.c | 34 +- frame/3/{gemm => old}/bli_gemm_blocksize.h | 6 +- frame/3/{gemm => old}/bli_gemm_check.c | 1 + frame/3/{gemm => old}/bli_gemm_check.h | 1 + frame/3/{gemm => old}/bli_gemm_ukernel.c | 7 +- frame/3/{gemm => old}/bli_gemm_ukernel.h | 4 +- .../{gemm/ukernels => old}/bli_gemm_ukr_ref.h | 0 .../ukernels => old}/bli_gemmtrsm_l_ukr_ref.c | 0 .../ukernels => old}/bli_gemmtrsm_l_ukr_ref.h | 0 .../ukernels => old}/bli_gemmtrsm_u_ukr_ref.c | 0 .../ukernels => old}/bli_gemmtrsm_u_ukr_ref.h | 0 frame/3/{trsm => old}/bli_gemmtrsm_ukernel.c | 18 +- frame/3/{trsm => old}/bli_gemmtrsm_ukernel.h | 4 +- frame/3/{hemm => old}/bli_hemm.c | 20 +- frame/3/{hemm => old}/bli_hemm_check.c | 1 + frame/3/{hemm => old}/bli_hemm_check.h | 1 + frame/3/{her2k => old}/bli_her2k.c | 20 +- frame/3/{her2k => old}/bli_her2k_check.c | 0 frame/3/{her2k => old}/bli_her2k_check.h | 0 frame/3/{herk => old}/bli_herk.c | 20 +- frame/3/{herk => old}/bli_herk_check.c | 1 + frame/3/{herk => old}/bli_herk_check.h | 1 + frame/3/{herk => old}/bli_herk_prune.c | 0 frame/3/{herk => old}/bli_herk_prune.h | 0 frame/3/{symm => old}/bli_symm.c | 20 +- frame/3/{symm => old}/bli_symm_check.c | 0 frame/3/{symm => old}/bli_symm_check.h | 0 frame/3/{syr2k => old}/bli_syr2k.c | 20 +- frame/3/{syr2k => old}/bli_syr2k_check.c | 0 frame/3/{syr2k => old}/bli_syr2k_check.h | 0 frame/3/{syrk => old}/bli_syrk.c | 20 +- frame/3/{syrk => old}/bli_syrk_check.c | 0 frame/3/{syrk => old}/bli_syrk_check.h | 0 frame/3/{trmm => old}/bli_trmm.c | 20 +- frame/3/{trmm3 => 
old}/bli_trmm3.c | 20 +- frame/3/{trmm3 => old}/bli_trmm3_check.c | 0 frame/3/{trmm3 => old}/bli_trmm3_check.h | 0 frame/3/{trmm => old}/bli_trmm_blocksize.c | 40 +- frame/3/{trmm => old}/bli_trmm_blocksize.h | 6 +- frame/3/{trmm => old}/bli_trmm_check.c | 1 + frame/3/{trmm => old}/bli_trmm_check.h | 1 + frame/3/{trmm => old}/bli_trmm_prune.c | 0 frame/3/{trmm => old}/bli_trmm_prune.h | 0 frame/3/{trsm => old}/bli_trsm.c | 22 +- frame/3/{trsm => old}/bli_trsm_blocksize.c | 30 +- frame/3/{trsm => old}/bli_trsm_blocksize.h | 6 +- frame/3/{trsm => old}/bli_trsm_check.c | 0 frame/3/{trsm => old}/bli_trsm_check.h | 0 .../ukernels => old}/bli_trsm_l_ukr_ref.c | 0 .../ukernels => old}/bli_trsm_l_ukr_ref.h | 0 frame/3/{trsm => old}/bli_trsm_prune.c | 0 frame/3/{trsm => old}/bli_trsm_prune.h | 0 .../ukernels => old}/bli_trsm_u_ukr_ref.c | 0 .../ukernels => old}/bli_trsm_u_ukr_ref.h | 0 frame/3/{trsm => old}/bli_trsm_ukernel.c | 20 +- frame/3/{trsm => old}/bli_trsm_ukernel.h | 4 +- frame/3/symm/bli_symm.h | 32 - frame/3/symm/bli_symm_front.c | 19 +- frame/3/symm/bli_symm_front.h | 1 + frame/3/syr2k/bli_syr2k.h | 30 - frame/3/syr2k/bli_syr2k_front.c | 22 +- frame/3/syr2k/bli_syr2k_front.h | 1 + frame/3/syrk/bli_syrk.h | 27 - frame/3/syrk/bli_syrk_front.c | 19 +- frame/3/syrk/bli_syrk_front.h | 1 + frame/3/trmm/bli_trmm.h | 43 +- frame/3/trmm/bli_trmm_blk_var1f.c | 22 +- frame/3/trmm/bli_trmm_blk_var2b.c | 22 +- frame/3/trmm/bli_trmm_blk_var2f.c | 22 +- frame/3/trmm/bli_trmm_blk_var3b.c | 20 +- frame/3/trmm/bli_trmm_blk_var3f.c | 20 +- frame/3/trmm/bli_trmm_front.c | 19 +- frame/3/trmm/bli_trmm_front.h | 1 + frame/3/trmm/bli_trmm_int.c | 5 +- frame/3/trmm/bli_trmm_int.h | 1 + frame/3/trmm/bli_trmm_ll_ker_var2.c | 144 ++- frame/3/trmm/bli_trmm_lu_ker_var2.c | 144 ++- frame/3/trmm/bli_trmm_rl_ker_var2.c | 144 ++- frame/3/trmm/bli_trmm_ru_ker_var2.c | 144 ++- frame/3/trmm/bli_trmm_var.h | 96 ++ frame/3/trmm/{ => old}/bli_trmm_blk_var1f.h | 1 + frame/3/trmm/{ => 
old}/bli_trmm_blk_var2b.h | 1 + frame/3/trmm/{ => old}/bli_trmm_blk_var2f.h | 1 + frame/3/trmm/{ => old}/bli_trmm_blk_var3b.h | 1 + frame/3/trmm/{ => old}/bli_trmm_blk_var3f.h | 1 + frame/3/trmm/{ => old}/bli_trmm_ll_ker_var2.h | 4 +- frame/3/trmm/{ => old}/bli_trmm_lu_ker_var2.h | 2 + frame/3/trmm/{ => old}/bli_trmm_rl_ker_var2.h | 2 + frame/3/trmm/{ => old}/bli_trmm_ru_ker_var2.h | 2 + frame/3/trmm/other/bli_trmm_ll_blk_var1.c | 131 -- frame/3/trmm/other/bli_trmm_ll_blk_var1.h | 41 - frame/3/trmm/other/bli_trmm_ll_blk_var4.c | 195 --- frame/3/trmm/other/bli_trmm_ll_blk_var4.h | 41 - frame/3/trmm/other/bli_trmm_lu_blk_var1.c | 128 -- frame/3/trmm/other/bli_trmm_lu_blk_var1.h | 41 - frame/3/trmm/other/bli_trmm_lu_blk_var4.c | 193 --- frame/3/trmm/other/bli_trmm_lu_blk_var4.h | 41 - frame/3/trmm3/bli_trmm3.h | 33 - frame/3/trmm3/bli_trmm3_front.c | 19 +- frame/3/trmm3/bli_trmm3_front.h | 1 + frame/3/trsm/bli_trsm.h | 52 +- frame/3/trsm/bli_trsm_blk_var1b.c | 21 +- frame/3/trsm/bli_trsm_blk_var1f.c | 21 +- frame/3/trsm/bli_trsm_blk_var2b.c | 29 +- frame/3/trsm/bli_trsm_blk_var2f.c | 29 +- frame/3/trsm/bli_trsm_blk_var3b.c | 22 +- frame/3/trsm/bli_trsm_blk_var3f.c | 22 +- frame/3/trsm/bli_trsm_cntl.c | 132 +- frame/3/trsm/bli_trsm_cntl.h | 12 +- frame/3/trsm/bli_trsm_front.c | 10 +- frame/3/trsm/bli_trsm_front.h | 1 + frame/3/trsm/bli_trsm_int.c | 5 +- frame/3/trsm/bli_trsm_int.h | 1 + frame/3/trsm/bli_trsm_ll_ker_var2.c | 165 +-- frame/3/trsm/bli_trsm_lu_ker_var2.c | 165 +-- frame/3/trsm/bli_trsm_rl_ker_var2.c | 170 +-- frame/3/trsm/bli_trsm_ru_ker_var2.c | 170 +-- frame/3/trsm/bli_trsm_var.h | 96 ++ frame/3/trsm/{ => old}/bli_trsm_blk_var1b.h | 0 frame/3/trsm/{ => old}/bli_trsm_blk_var1f.h | 0 frame/3/trsm/{ => old}/bli_trsm_blk_var2b.h | 0 frame/3/trsm/{ => old}/bli_trsm_blk_var2f.h | 0 frame/3/trsm/{ => old}/bli_trsm_blk_var3b.h | 0 frame/3/trsm/{ => old}/bli_trsm_blk_var3f.h | 0 frame/3/trsm/old/bli_trsm_cntx.c | 76 ++ frame/3/trsm/old/bli_trsm_cntx.h | 37 + 
frame/3/trsm/{ => old}/bli_trsm_ll_ker_var2.h | 1 + frame/3/trsm/{ => old}/bli_trsm_lu_ker_var2.h | 1 + frame/3/trsm/{ => old}/bli_trsm_rl_ker_var2.h | 3 +- frame/3/trsm/{ => old}/bli_trsm_ru_ker_var2.h | 3 +- frame/3/trsm/other/bli_trsm_l_blk_var4.c | 174 --- frame/3/trsm/other/bli_trsm_l_blk_var4.h | 41 - frame/3/trsm/other/bli_trsm_u_blk_var4.c | 178 --- frame/3/trsm/other/bli_trsm_u_blk_var4.h | 41 - .../3/{gemm => }/ukernels/bli_gemm_ukr_ref.c | 46 +- frame/3/ukernels/bli_gemmtrsm_ukr_ref.c | 102 ++ frame/3/ukernels/bli_l3_ukr_ref.h | 53 + frame/3/ukernels/bli_trsm_ukr_ref.c | 199 +++ frame/base/{bli_blocksize.c => bli_blksz.c} | 172 +-- frame/base/{bli_blocksize.h => bli_blksz.h} | 109 +- frame/base/bli_check.c | 34 +- frame/base/bli_check.h | 2 + frame/base/bli_cntx.c | 868 +++++++++++++ frame/base/bli_cntx.h | 347 ++++++ frame/base/bli_error.c | 2 + frame/base/bli_func.c | 78 +- frame/base/bli_func.h | 50 +- frame/base/bli_gks.c | 1001 +++++++++++++++ frame/base/bli_gks.h | 101 ++ frame/base/bli_info.c | 147 +-- frame/base/bli_info.h | 74 +- frame/base/bli_mbool.c | 72 ++ .../bli_ukr_query.h => base/bli_mbool.h} | 47 +- frame/base/bli_mem.c | 191 +-- frame/base/bli_mem.h | 23 +- frame/base/bli_pool.c | 42 +- frame/base/bli_pool.h | 10 +- frame/base/bli_threading.c | 72 +- frame/base/bli_threading.h | 51 +- frame/base/bli_threading_omp.c | 35 +- frame/base/bli_threading_pthreads.c | 37 +- frame/cntl/bli_cntl.h | 2 +- frame/cntl/bli_cntl_init.c | 6 - frame/compat/bla_amax.c | 19 +- frame/compat/bla_amax.h | 9 +- frame/compat/bla_asum.c | 19 +- frame/compat/bla_asum.h | 9 +- frame/compat/bla_axpy.c | 29 +- frame/compat/bla_axpy.h | 13 +- frame/compat/bla_copy.c | 25 +- frame/compat/bla_copy.h | 11 +- frame/compat/bla_dot.c | 49 +- frame/compat/bla_dot.h | 31 +- frame/compat/bla_gemm.c | 72 +- frame/compat/bla_gemm.h | 25 +- frame/compat/bla_gemv.c | 62 +- frame/compat/bla_gemv.h | 21 +- frame/compat/bla_ger.c | 54 +- frame/compat/bla_ger.h | 17 +- 
frame/compat/bla_hemm.c | 70 +- frame/compat/bla_hemm.h | 23 +- frame/compat/bla_hemv.c | 58 +- frame/compat/bla_hemv.h | 19 +- frame/compat/bla_her.c | 46 +- frame/compat/bla_her.h | 15 +- frame/compat/bla_her2.c | 54 +- frame/compat/bla_her2.h | 17 +- frame/compat/bla_her2k.c | 68 +- frame/compat/bla_her2k.h | 23 +- frame/compat/bla_herk.c | 60 +- frame/compat/bla_herk.h | 21 +- frame/compat/bla_nrm2.c | 19 +- frame/compat/bla_nrm2.h | 9 +- frame/compat/bla_scal.c | 33 +- frame/compat/bla_scal.h | 11 +- frame/compat/bla_swap.c | 23 +- frame/compat/bla_swap.h | 11 +- frame/compat/bla_symm.c | 70 +- frame/compat/bla_symm.h | 23 +- frame/compat/bla_symv.c | 58 +- frame/compat/bla_symv.h | 19 +- frame/compat/bla_syr.c | 46 +- frame/compat/bla_syr.h | 15 +- frame/compat/bla_syr2.c | 55 +- frame/compat/bla_syr2.h | 17 +- frame/compat/bla_syr2k.c | 76 +- frame/compat/bla_syr2k.h | 23 +- frame/compat/bla_syrk.c | 60 +- frame/compat/bla_syrk.h | 21 +- frame/compat/bla_trmm.c | 68 +- frame/compat/bla_trmm.h | 23 +- frame/compat/bla_trmv.c | 54 +- frame/compat/bla_trmv.h | 17 +- frame/compat/bla_trsm.c | 68 +- frame/compat/bla_trsm.h | 23 +- frame/compat/bla_trsv.c | 54 +- frame/compat/bla_trsv.h | 17 +- frame/compat/check/bla_gemm_check.c | 74 +- frame/compat/check/bla_gemm_check.h | 23 +- frame/compat/check/bla_gemv_check.c | 55 +- frame/compat/check/bla_gemv_check.h | 19 +- frame/compat/check/bla_ger_check.c | 41 +- frame/compat/check/bla_ger_check.h | 17 +- frame/compat/check/bla_hemm_check.c | 63 +- frame/compat/check/bla_hemm_check.h | 21 +- frame/compat/check/bla_hemv_check.c | 47 +- frame/compat/check/bla_hemv_check.h | 17 +- frame/compat/check/bla_her2_check.c | 47 +- frame/compat/check/bla_her2_check.h | 17 +- frame/compat/check/bla_her2k_check.c | 63 +- frame/compat/check/bla_her2k_check.h | 21 +- frame/compat/check/bla_her_check.c | 42 +- frame/compat/check/bla_her_check.h | 15 +- frame/compat/check/bla_herk_check.c | 58 +- frame/compat/check/bla_herk_check.h | 
19 +- frame/compat/check/bla_symm_check.c | 28 +- frame/compat/check/bla_symm_check.h | 21 +- frame/compat/check/bla_symv_check.c | 24 +- frame/compat/check/bla_symv_check.h | 17 +- frame/compat/check/bla_syr2_check.c | 24 +- frame/compat/check/bla_syr2_check.h | 17 +- frame/compat/check/bla_syr2k_check.c | 66 +- frame/compat/check/bla_syr2k_check.h | 21 +- frame/compat/check/bla_syr_check.c | 22 +- frame/compat/check/bla_syr_check.h | 15 +- frame/compat/check/bla_syrk_check.c | 61 +- frame/compat/check/bla_syrk_check.h | 19 +- frame/compat/check/bla_trmm_check.c | 83 +- frame/compat/check/bla_trmm_check.h | 23 +- frame/compat/check/bla_trmv_check.c | 67 +- frame/compat/check/bla_trmv_check.h | 19 +- frame/compat/check/bla_trsm_check.c | 30 +- frame/compat/check/bla_trsm_check.h | 23 +- frame/compat/check/bla_trsv_check.c | 26 +- frame/compat/check/bla_trsv_check.h | 19 +- frame/compat/f2c/bla_gbmv.h | 2 - frame/compat/f2c/bla_hbmv.h | 2 - frame/compat/f2c/bla_hpmv.h | 2 - frame/compat/f2c/bla_hpr.h | 2 - frame/compat/f2c/bla_hpr2.h | 2 - frame/compat/f2c/bla_lsame.h | 2 - frame/compat/f2c/bla_rot.h | 2 - frame/compat/f2c/bla_rotg.h | 2 - frame/compat/f2c/bla_rotm.h | 2 - frame/compat/f2c/bla_rotmg.h | 2 - frame/compat/f2c/bla_sbmv.h | 2 - frame/compat/f2c/bla_spmv.h | 2 - frame/compat/f2c/bla_spr.h | 2 - frame/compat/f2c/bla_spr2.h | 2 - frame/compat/f2c/bla_tbmv.h | 2 - frame/compat/f2c/bla_tbsv.h | 2 - frame/compat/f2c/bla_tpmv.h | 2 - frame/compat/f2c/bla_tpsv.h | 2 - frame/compat/f2c/bla_xerbla.h | 2 - frame/compat/f2c/util/bla_c_abs.h | 2 - frame/compat/f2c/util/bla_c_div.h | 2 - frame/compat/f2c/util/bla_d_abs.h | 2 - frame/compat/f2c/util/bla_d_cnjg.h | 2 - frame/compat/f2c/util/bla_d_imag.h | 2 - frame/compat/f2c/util/bla_d_sign.h | 2 - frame/compat/f2c/util/bla_f__cabs.c | 26 +- frame/compat/f2c/util/bla_f__cabs.h | 2 - frame/compat/f2c/util/bla_r_abs.h | 2 - frame/compat/f2c/util/bla_r_cnjg.h | 2 - frame/compat/f2c/util/bla_r_imag.h | 2 - 
frame/compat/f2c/util/bla_r_sign.h | 2 - frame/compat/f2c/util/bla_z_abs.h | 2 - frame/compat/f2c/util/bla_z_div.h | 2 - frame/include/bli_config_macro_defs.h | 24 +- frame/include/bli_genarray_macro_defs.h | 30 +- frame/include/bli_gentdef_macro_defs.h | 76 ++ frame/include/bli_gentfunc_macro_defs.h | 76 +- frame/include/bli_kernel_macro_defs.h | 272 +--- frame/include/bli_kernel_pre_macro_defs.h | 272 ++-- frame/include/bli_kernel_prototypes.h | 540 ++------ frame/include/bli_level3_type_defs.h | 119 -- frame/include/bli_macro_defs.h | 1 + .../bli_oapi_w_cntx.h} | 30 +- frame/include/bli_oapi_wo_cntx.h | 51 + frame/include/bli_obj_macro_defs.h | 86 +- frame/include/bli_param_macro_defs.h | 275 +++-- frame/include/bli_system.h | 1 + frame/include/bli_type_defs.h | 155 ++- frame/include/blis.h | 117 +- frame/include/level0/bli_adds_mxn.h | 40 +- frame/include/level0/bli_adds_mxn_uplo.h | 96 +- frame/include/level0/bli_copys_mxn.h | 40 +- frame/include/level0/bli_set0s_mxn.h | 32 +- frame/include/level0/bli_xpbys_mxn.h | 48 +- frame/include/level0/bli_xpbys_mxn_uplo.h | 176 +-- frame/include/level0/old/bli_set0ris_mxn.h | 40 +- frame/include/level0/ri/bli_absq2ris.h | 4 +- .../include/level0/ri/bli_scalris_mxn_uplo.h | 48 +- .../level0/rih/bli_scal2rihs_mxn_diag.h | 40 +- .../level0/rih/bli_scal2rihs_mxn_uplo.h | 196 +-- .../include/level0/rih/bli_setrihs_mxn_diag.h | 28 +- .../{ => old}/bli_kernel_post_macro_defs.h | 0 frame/include/old/bli_kernel_prototypes.h | 529 ++++++++ .../include/{ => old}/bli_kernel_type_defs.h | 14 +- frame/ind/bli_ind.c | 240 ++++ frame/ind/bli_ind.h | 52 +- .../{query/bli_ind_query.c => bli_l3_ind.c} | 200 +-- frame/ind/bli_l3_ind.h | 75 ++ frame/ind/cntl/bli_gemm3m1_cntl.c | 247 ---- frame/ind/cntl/bli_gemm3m2_cntl.c | 255 ---- frame/ind/cntl/bli_gemm3m3_cntl.c | 240 ---- frame/ind/cntl/bli_gemm3mh_cntl.c | 412 ------- frame/ind/cntl/bli_gemm4m1_cntl.c | 243 ---- frame/ind/cntl/bli_gemm4mb_cntl.c | 245 ---- 
frame/ind/cntl/bli_gemm4mh_cntl.c | 441 ------- frame/ind/cntl/bli_trsm3m1_cntl.c | 300 ----- frame/ind/cntl/bli_trsm4m1_cntl.c | 300 ----- frame/ind/cntx/bli_gemmind_cntx.c | 507 ++++++++ frame/ind/cntx/bli_gemmind_cntx.h | 100 ++ frame/ind/cntx/bli_trsmind_cntx.c | 144 +++ .../cntx/bli_trsmind_cntx.h} | 38 +- .../include/bli_packm_ind_pre_macro_defs.h | 108 +- .../{ => old}/bli_kernel_ind_prototypes.h | 0 frame/ind/oapi/bli_l3_3m4m_oapi.c | 382 ++++++ frame/ind/oapi/bli_l3_ind_oapi.c | 137 +++ .../{bli_oapi_ind.h => bli_l3_ind_oapi.h} | 38 +- frame/ind/oapi/bli_l3_nat_oapi.c | 225 ++++ frame/ind/oapi/{ => old}/bli_oapi_3m1.c | 24 +- frame/ind/oapi/{ => old}/bli_oapi_3m2.c | 16 +- frame/ind/oapi/{ => old}/bli_oapi_3m3.c | 16 +- frame/ind/oapi/{ => old}/bli_oapi_3mh.c | 32 +- frame/ind/oapi/old/bli_oapi_4m1.c | 280 +++++ frame/ind/oapi/{ => old}/bli_oapi_4mb.c | 20 +- frame/ind/oapi/{ => old}/bli_oapi_4mh.c | 40 +- .../bli_oapi_nat.c.old} | 93 +- frame/ind/query/bli_bsv_query.c | 178 --- frame/ind/query/bli_ukr_query.c | 228 ---- .../{bli_tapi_ind.c => bli_l3_ind_tapi.c} | 286 +++-- frame/ind/tapi/bli_l3_ind_tapi.h | 271 ++++ frame/ind/tapi/bli_tapi_ind.h | 251 ---- frame/ind/ukernels/gemm/bli_gemm3m1_ukr_ref.c | 113 +- frame/ind/ukernels/gemm/bli_gemm3m2_ukr_ref.c | 65 +- frame/ind/ukernels/gemm/bli_gemm3m3_ukr_ref.c | 65 +- frame/ind/ukernels/gemm/bli_gemm3mh_ukr_ref.c | 69 +- frame/ind/ukernels/gemm/bli_gemm4m1_ukr_ref.c | 125 +- frame/ind/ukernels/gemm/bli_gemm4mb_ukr_ref.c | 139 ++- frame/ind/ukernels/gemm/bli_gemm4mh_ukr_ref.c | 65 +- frame/ind/ukernels/gemm/bli_gemmind_ukr_ref.h | 20 +- .../ukernels/trsm/bli_gemmtrsm3m1_l_ukr_ref.c | 131 +- .../ukernels/trsm/bli_gemmtrsm3m1_u_ukr_ref.c | 151 ++- .../ukernels/trsm/bli_gemmtrsm4m1_l_ukr_ref.c | 132 +- .../ukernels/trsm/bli_gemmtrsm4m1_u_ukr_ref.c | 132 +- .../ukernels/trsm/bli_gemmtrsmind_x_ukr_ref.h | 22 +- .../ind/ukernels/trsm/bli_trsm3m1_l_ukr_ref.c | 48 +- .../ind/ukernels/trsm/bli_trsm3m1_u_ukr_ref.c | 
40 +- .../ind/ukernels/trsm/bli_trsm4m1_l_ukr_ref.c | 38 +- .../ind/ukernels/trsm/bli_trsm4m1_u_ukr_ref.c | 38 +- .../ind/ukernels/trsm/bli_trsmind_x_ukr_ref.h | 14 +- .../bli_trsmind_cntl.h => util/bli_util.h} | 13 +- frame/util/bli_util_check.c | 443 +++++++ frame/util/bli_util_check.h | 197 +++ frame/util/bli_util_oapi.c | 512 ++++++++ frame/util/bli_util_oapi.h | 179 +++ frame/util/bli_util_oapi_wc.c | 46 + frame/util/bli_util_oapi_woc.c | 46 + frame/util/bli_util_tapi.c | 423 +++++++ frame/util/bli_util_tapi.h | 194 +++ frame/util/bli_util_unb_var1.c | 1091 +++++++++++++++++ frame/util/bli_util_unb_var1.h | 195 +++ frame/util/{ => old}/amaxv/bli_amaxv.c | 0 frame/util/{ => old}/amaxv/bli_amaxv.h | 0 frame/util/{ => old}/amaxv/bli_amaxv_check.c | 0 frame/util/{ => old}/amaxv/bli_amaxv_check.h | 0 .../util/{ => old}/amaxv/bli_amaxv_unb_var1.c | 58 +- .../util/{ => old}/amaxv/bli_amaxv_unb_var1.h | 12 +- frame/util/{ => old}/asumv/bli_asumv.c | 0 frame/util/{ => old}/asumv/bli_asumv.h | 0 frame/util/{ => old}/asumv/bli_asumv_check.c | 0 frame/util/{ => old}/asumv/bli_asumv_check.h | 0 .../util/{ => old}/asumv/bli_asumv_unb_var1.c | 36 +- .../util/{ => old}/asumv/bli_asumv_unb_var1.h | 0 frame/util/{ => old}/mkherm/bli_mkherm.c | 0 frame/util/{ => old}/mkherm/bli_mkherm.h | 0 .../util/{ => old}/mkherm/bli_mkherm_check.c | 0 .../util/{ => old}/mkherm/bli_mkherm_check.h | 0 .../{ => old}/mkherm/bli_mkherm_unb_var1.c | 41 +- .../{ => old}/mkherm/bli_mkherm_unb_var1.h | 0 frame/util/{ => old}/mksymm/bli_mksymm.c | 0 frame/util/{ => old}/mksymm/bli_mksymm.h | 0 .../util/{ => old}/mksymm/bli_mksymm_check.c | 0 .../util/{ => old}/mksymm/bli_mksymm_check.h | 0 .../{ => old}/mksymm/bli_mksymm_unb_var1.c | 23 +- .../{ => old}/mksymm/bli_mksymm_unb_var1.h | 0 frame/util/{ => old}/mktrim/bli_mktrim.c | 0 frame/util/{ => old}/mktrim/bli_mktrim.h | 0 .../util/{ => old}/mktrim/bli_mktrim_check.c | 0 .../util/{ => old}/mktrim/bli_mktrim_check.h | 0 .../{ => 
old}/mktrim/bli_mktrim_unb_var1.c | 34 +- .../{ => old}/mktrim/bli_mktrim_unb_var1.h | 0 frame/util/{ => old}/norm1m/bli_norm1m.c | 0 frame/util/{ => old}/norm1m/bli_norm1m.h | 0 .../util/{ => old}/norm1m/bli_norm1m_check.c | 0 .../util/{ => old}/norm1m/bli_norm1m_check.h | 0 .../{ => old}/norm1m/bli_norm1m_unb_var1.c | 107 +- .../{ => old}/norm1m/bli_norm1m_unb_var1.h | 20 +- frame/util/{ => old}/norm1v/bli_norm1v.c | 0 frame/util/{ => old}/norm1v/bli_norm1v.h | 0 .../util/{ => old}/norm1v/bli_norm1v_check.c | 0 .../util/{ => old}/norm1v/bli_norm1v_check.h | 0 .../{ => old}/norm1v/bli_norm1v_unb_var1.c | 35 +- .../{ => old}/norm1v/bli_norm1v_unb_var1.h | 12 +- frame/util/{ => old}/normfm/bli_normfm.c | 0 frame/util/{ => old}/normfm/bli_normfm.h | 0 .../util/{ => old}/normfm/bli_normfm_check.c | 0 .../util/{ => old}/normfm/bli_normfm_check.h | 0 .../{ => old}/normfm/bli_normfm_unb_var1.c | 129 +- .../{ => old}/normfm/bli_normfm_unb_var1.h | 24 +- frame/util/{ => old}/normfv/bli_normfv.c | 0 frame/util/{ => old}/normfv/bli_normfv.h | 0 .../util/{ => old}/normfv/bli_normfv_check.c | 0 .../util/{ => old}/normfv/bli_normfv_check.h | 0 .../{ => old}/normfv/bli_normfv_unb_var1.c | 55 +- .../{ => old}/normfv/bli_normfv_unb_var1.h | 15 +- frame/util/{ => old}/normim/bli_normim.c | 0 frame/util/{ => old}/normim/bli_normim.h | 0 .../util/{ => old}/normim/bli_normim_check.c | 0 .../util/{ => old}/normim/bli_normim_check.h | 0 .../{ => old}/normim/bli_normim_unb_var1.c | 40 +- .../{ => old}/normim/bli_normim_unb_var1.h | 20 +- frame/util/{ => old}/normiv/bli_normiv.c | 0 frame/util/{ => old}/normiv/bli_normiv.h | 0 .../util/{ => old}/normiv/bli_normiv_check.c | 0 .../util/{ => old}/normiv/bli_normiv_check.h | 0 .../{ => old}/normiv/bli_normiv_unb_var1.c | 35 +- .../{ => old}/normiv/bli_normiv_unb_var1.h | 12 +- frame/util/{ => old}/printm/bli_fprintm.c | 19 +- frame/util/{ => old}/printm/bli_fprintm.h | 19 +- .../util/{ => old}/printm/bli_fprintm_check.c | 0 .../util/{ => 
old}/printm/bli_fprintm_check.h | 0 frame/util/{ => old}/printm/bli_printm.c | 34 +- frame/util/{ => old}/printm/bli_printm.h | 17 +- frame/util/{ => old}/printv/bli_fprintv.c | 17 +- frame/util/{ => old}/printv/bli_fprintv.h | 17 +- .../util/{ => old}/printv/bli_fprintv_check.c | 0 .../util/{ => old}/printv/bli_fprintv_check.h | 0 frame/util/{ => old}/printv/bli_printv.c | 30 +- frame/util/{ => old}/printv/bli_printv.h | 15 +- frame/util/{ => old}/randm/bli_randm.c | 0 frame/util/{ => old}/randm/bli_randm.h | 0 frame/util/{ => old}/randm/bli_randm_check.c | 0 frame/util/{ => old}/randm/bli_randm_check.h | 0 .../util/{ => old}/randm/bli_randm_unb_var1.c | 85 +- .../util/{ => old}/randm/bli_randm_unb_var1.h | 0 frame/util/{ => old}/randv/bli_randv.c | 0 frame/util/{ => old}/randv/bli_randv.h | 0 frame/util/{ => old}/randv/bli_randv_check.c | 0 frame/util/{ => old}/randv/bli_randv_check.h | 0 .../util/{ => old}/randv/bli_randv_unb_var1.c | 0 .../util/{ => old}/randv/bli_randv_unb_var1.h | 1 + frame/util/{ => old}/sumsqv/bli_sumsqv.c | 0 frame/util/{ => old}/sumsqv/bli_sumsqv.h | 0 .../util/{ => old}/sumsqv/bli_sumsqv_check.c | 0 .../util/{ => old}/sumsqv/bli_sumsqv_check.h | 0 .../{ => old}/sumsqv/bli_sumsqv_unb_var1.c | 52 +- .../{ => old}/sumsqv/bli_sumsqv_unb_var1.h | 14 +- kernels/arm/{neon => }/3/bli_gemm_opt_4x4.c | 80 +- kernels/armv7a/3/bli_gemm_opt_4x4.c | 172 +-- .../armv8a/{neon => }/3/bli_gemm_opt_4x4.c | 0 kernels/bgq/1/bli_axpyv_opt_var1.c | 16 +- kernels/bgq/1/bli_dotv_opt_var1.c | 18 +- kernels/bgq/1f/bli_axpyf_opt_var1.c | 94 +- kernels/bgq/3/bli_gemm_8x8.h | 69 -- .../3/{bli_gemm_8x8.c => bli_gemm_int_8x8.c} | 41 +- ...{bli_gemm_ref_4x4.c => bli_gemm_c99_4x4.c} | 22 +- ...m_u_ref_4x4.c => bli_gemmtrsm_l_c99_4x4.c} | 66 +- ...m_l_ref_4x4.c => bli_gemmtrsm_u_c99_4x4.c} | 66 +- ..._trsm_l_ref_4x4.c => bli_trsm_l_c99_4x4.c} | 18 +- ..._trsm_u_ref_4x4.c => bli_trsm_u_c99_4x4.c} | 18 +- kernels/loongson3a/3/bli_gemm_opt_d4x4.c | 79 +- 
kernels/mic/3/bli_dgemm_opt_30x8.c | 60 +- kernels/mic/3/bli_sgemm_opt_30x16.c | 20 +- kernels/nacl/pnacl/3/bli_gemm_opt.c | 46 +- kernels/{ => old}/x86/1m/bli_packm_2xk.c | 0 kernels/{ => old}/x86/1m/bli_packm_2xk.h | 0 kernels/{ => old}/x86/1m/bli_packm_4xk.c | 0 kernels/{ => old}/x86/1m/bli_packm_4xk.h | 0 kernels/{ => old}/x86/3/bli_gemm_opt_d2x4.c | 0 kernels/{ => old}/x86/3/bli_gemm_opt_d4x2.c | 0 .../{ => old}/x86/3/bli_gemmtrsm_l_opt_d4x2.c | 0 .../{ => old}/x86/3/bli_gemmtrsm_u_opt_d4x2.c | 0 kernels/{ => old}/x86/3/bli_trsm_l_opt_d4x2.c | 0 kernels/power7/3/bli_gemm_opt_8x4.c | 80 +- kernels/power7/3/bli_gemm_opt_8x4.h | 80 +- kernels/x86/3/bli_gemm_opt_d2x4.h | 49 - kernels/x86/3/bli_gemm_opt_d4x2.h | 51 - kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.h | 53 - kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.h | 53 - kernels/x86/3/bli_trsm_l_opt_d4x2.h | 47 - ...mm_4x6_FMA4.c => bli_gemm_asm_d4x6_fma4.c} | 80 +- .../{avx2 => haswell}/3/bli_gemm_asm_d12x4.c | 80 +- .../{avx2 => haswell}/3/bli_gemm_asm_d8x6.c | 80 +- .../1/bli_axpyv_int_var1.c} | 16 +- .../1/bli_dotv_int_var1.c} | 18 +- .../1f/bli_axpy2v_opt_var1.c | 22 +- .../1f/bli_axpyf_opt_var1.c | 23 +- .../1f/bli_dotaxpyv_opt_var1.c | 24 +- .../1f/bli_dotxaxpyf_opt_var1.c | 30 +- .../1f/bli_dotxf_opt_var1.c | 24 +- .../1f/old}/bli_axpyf_opt_var1.c.alt | 0 .../1f/old}/bli_dotxf_opt_var1.c.alt | 0 .../3/bli_gemm_asm_d4x4.c} | 101 +- .../3/bli_gemmtrsm_l_asm_d4x4.c} | 119 +- .../3/bli_gemmtrsm_u_asm_d4x4.c} | 119 +- .../3/bli_trsm_l_asm_d4x4.c} | 76 +- .../3/bli_trsm_u_asm_d4x4.c} | 75 +- ...li_gemm_new_d8x3.c => bli_gemm_asm_d8x3.c} | 80 +- .../3/bli_gemm_asm_d8x4.c | 80 +- .../3/bli_gemm_int_d8x4.c | 135 +- testsuite/input.operations | 2 +- testsuite/src/test_axpy2v.c | 19 +- testsuite/src/test_axpyf.c | 21 +- testsuite/src/test_dotaxpyv.c | 19 +- testsuite/src/test_dotxaxpyf.c | 21 +- testsuite/src/test_dotxf.c | 21 +- testsuite/src/test_gemm_ukr.c | 45 +- testsuite/src/test_gemmtrsm_ukr.c | 42 +- 
 testsuite/src/test_gemv.c | 4 +-
 testsuite/src/test_libblis.c | 236 ++--
 testsuite/src/test_libblis.h | 2 +-
 testsuite/src/test_trsm_ukr.c | 41 +-
 1337 files changed, 53442 insertions(+), 30646 deletions(-)
 rename frame/{1f/dotxaxpyf/bli_dotxaxpyf_fusefac.h => 0/bli_l0.h} (93%)
 create mode 100644 frame/0/bli_l0_check.c
 create mode 100644 frame/0/bli_l0_check.h
 create mode 100644 frame/0/bli_l0_oapi.c
 create mode 100644 frame/0/bli_l0_oapi.h
 create mode 100644 frame/0/bli_l0_tapi.c
 create mode 100644 frame/0/bli_l0_tapi.h
 rename frame/0/{ => old}/absqsc/bli_absqsc.c (100%)
 rename frame/0/{ => old}/absqsc/bli_absqsc.h (100%)
 rename frame/0/{ => old}/absqsc/bli_absqsc_check.c (100%)
 rename frame/0/{ => old}/absqsc/bli_absqsc_check.h (100%)
 rename frame/0/{ => old}/absqsc/bli_absqsc_unb_var1.c (100%)
 rename frame/0/{ => old}/absqsc/bli_absqsc_unb_var1.h (100%)
 rename frame/0/{ => old}/addsc/bli_addsc.c (100%)
 rename frame/0/{ => old}/addsc/bli_addsc.h (100%)
 rename frame/0/{ => old}/addsc/bli_addsc_check.c (100%)
 rename frame/0/{ => old}/addsc/bli_addsc_check.h (100%)
 rename frame/0/{ => old}/addsc/bli_addsc_unb_var1.c (100%)
 rename frame/0/{ => old}/addsc/bli_addsc_unb_var1.h (100%)
 rename frame/{1/addv/bli_addv.c => 0/old/bli_getsc.c} (52%)
 rename frame/{1/invertv/bli_invertv.c => 0/old/bli_getsc.h} (73%)
 create mode 100644 frame/0/old/bli_setsc.c
 create mode 100644 frame/0/old/bli_setsc.h
 rename frame/{1/swapv/bli_swapv.c => 0/old/copysc/bli_copysc.c} (68%)
 rename frame/{1/setv/bli_setv.h => 0/old/copysc/bli_copysc.h} (77%)
 rename frame/0/{ => old}/copysc/bli_copysc_check.c (100%)
 rename frame/0/{ => old}/copysc/bli_copysc_check.h (100%)
 rename frame/0/{ => old}/copysc/bli_copysc_unb_var1.c (100%)
 rename frame/0/{ => old}/copysc/bli_copysc_unb_var1.h (100%)
 rename frame/0/{ => old}/divsc/bli_divsc.c (100%)
 rename frame/0/{ => old}/divsc/bli_divsc.h (100%)
 rename frame/0/{ => old}/divsc/bli_divsc_check.c (100%)
 rename frame/0/{ => old}/divsc/bli_divsc_check.h (100%)
 rename frame/0/{ => old}/divsc/bli_divsc_unb_var1.c (100%)
 rename frame/0/{ => old}/divsc/bli_divsc_unb_var1.h (100%)
 rename frame/0/{ => old}/getsc/bli_getsc.c (100%)
 rename frame/0/{ => old}/getsc/bli_getsc.h (100%)
 rename frame/0/{ => old}/getsc/bli_getsc_check.c (100%)
 rename frame/0/{ => old}/getsc/bli_getsc_check.h (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc.c (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc.h (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc_check.c (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc_check.h (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc_unb_var1.c (100%)
 rename frame/0/{ => old}/mulsc/bli_mulsc_unb_var1.h (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc.c (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc.h (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc_check.c (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc_check.h (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc_unb_var1.c (100%)
 rename frame/0/{ => old}/normfsc/bli_normfsc_unb_var1.h (100%)
 rename frame/0/{ => old}/setsc/bli_setsc.c (100%)
 rename frame/0/{ => old}/setsc/bli_setsc.h (100%)
 rename frame/0/{ => old}/setsc/bli_setsc_check.c (100%)
 rename frame/0/{ => old}/setsc/bli_setsc_check.h (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc.c (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc.h (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc_check.c (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc_check.h (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc_unb_var1.c (100%)
 rename frame/0/{ => old}/sqrtsc/bli_sqrtsc_unb_var1.h (100%)
 rename frame/0/{ => old}/subsc/bli_subsc.c (100%)
 rename frame/0/{ => old}/subsc/bli_subsc.h (100%)
 rename frame/0/{ => old}/subsc/bli_subsc_check.c (100%)
 rename frame/0/{ => old}/subsc/bli_subsc_check.h (100%)
 rename frame/0/{ => old}/subsc/bli_subsc_unb_var1.c (100%)
 rename frame/0/{ => old}/subsc/bli_subsc_unb_var1.h (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc.c (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc.h (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc_check.c (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc_check.h (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc_unb_var1.c (100%)
 rename frame/0/{ => old}/unzipsc/bli_unzipsc_unb_var1.h (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc.c (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc.h (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc_check.c (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc_check.h (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc_unb_var1.c (100%)
 rename frame/0/{ => old}/zipsc/bli_zipsc_unb_var1.h (100%)
 delete mode 100644 frame/1/addv/bli_addv_ref.c
 delete mode 100644 frame/1/axpyv/bli_axpyv.c
 delete mode 100644 frame/1/axpyv/bli_axpyv_ref.c
 create mode 100644 frame/1/bli_l1v.h
 create mode 100644 frame/1/bli_l1v_check.c
 create mode 100644 frame/1/bli_l1v_check.h
 create mode 100644 frame/1/bli_l1v_cntx.c
 create mode 100644 frame/1/bli_l1v_cntx.h
 create mode 100644 frame/1/bli_l1v_ft.h
 create mode 100644 frame/1/bli_l1v_ker.h
 create mode 100644 frame/1/bli_l1v_oapi.c
 create mode 100644 frame/1/bli_l1v_oapi.h
 create mode 100644 frame/1/bli_l1v_oapi_wc.c
 create mode 100644 frame/1/bli_l1v_oapi_woc.c
 create mode 100644 frame/1/bli_l1v_tapi.c
 create mode 100644 frame/1/bli_l1v_tapi.h
 delete mode 100644 frame/1/copyv/bli_copyv.c
 delete mode 100644 frame/1/copyv/bli_copyv_kernel.h
 delete mode 100644 frame/1/dotv/bli_dotv.c
 delete mode 100644 frame/1/dotv/bli_dotv_ref.c
 delete mode 100644 frame/1/dotxv/bli_dotxv.c
 delete mode 100644 frame/1/dotxv/bli_dotxv_kernel.c
 delete mode 100644 frame/1/dotxv/bli_dotxv_kernel.h
 delete mode 100644 frame/1/dotxv/bli_dotxv_ref.c
 create mode 100644 frame/1/kernels/bli_addv_ref.c
 create mode 100644 frame/1/kernels/bli_axpyv_ref.c
 create mode 100644 frame/1/kernels/bli_copyv_ref.c
 create mode 100644 frame/1/kernels/bli_dotv_ref.c
 create mode 100644 frame/1/kernels/bli_dotxv_ref.c
 create mode 100644 frame/1/kernels/bli_invertv_ref.c
 create mode 100644 frame/1/kernels/bli_l1v_ref.h
 create mode 100644 frame/1/kernels/bli_scal2v_ref.c
 rename frame/1/{invertv/bli_invertv_ref.c => kernels/bli_scalv_ref.c} (70%)
 create mode 100644 frame/1/kernels/bli_setv_ref.c
 create mode 100644 frame/1/kernels/bli_subv_ref.c
 create mode 100644 frame/1/kernels/bli_swapv_ref.c
 rename frame/1/{addv => kernels/old}/bli_addv_ref.h (84%)
 rename frame/1/{axpyv => kernels/old}/bli_axpyv_ref.h (81%)
 rename frame/1/{copyv => kernels/old}/bli_copyv_ref.h (83%)
 rename frame/1/{dotv => kernels/old}/bli_dotv_ref.h (81%)
 rename frame/1/{dotxv => kernels/old}/bli_dotxv_ref.h (76%)
 rename frame/1/{invertv => kernels/old}/bli_invertv_ref.h (95%)
 rename frame/1/{scal2v => kernels/old}/bli_scal2v_ref.h (81%)
 rename frame/1/{scalv => kernels/old}/bli_scalv_ref.h (79%)
 rename frame/1/{setv => kernels/old}/bli_setv_ref.h (81%)
 rename frame/1/{subv => kernels/old}/bli_subv_ref.h (84%)
 rename frame/1/{swapv => kernels/old}/bli_swapv_ref.h (83%)
 rename frame/1/{subv/bli_subv_kernel.c => old/addv/bli_addv.c} (58%)
 rename frame/1/{ => old}/addv/bli_addv.h (75%)
 rename frame/1/{axpyv/bli_axpyv_kernel.c => old/axpyv/bli_axpyv.c} (52%)
 rename frame/1/{ => old}/axpyv/bli_axpyv.h (73%)
 rename frame/1/{addv => old/check}/bli_addv_check.c (100%)
 rename frame/1/{addv => old/check}/bli_addv_check.h (100%)
 rename frame/1/{axpyv => old/check}/bli_axpyv_check.c (100%)
 rename frame/1/{axpyv => old/check}/bli_axpyv_check.h (100%)
 rename frame/1/{copyv => old/check}/bli_copyv_check.c (100%)
 rename frame/1/{copyv => old/check}/bli_copyv_check.h (100%)
 rename frame/1/{dotv => old/check}/bli_dotv_check.c (100%)
 rename frame/1/{dotv => old/check}/bli_dotv_check.h (100%)
 rename frame/1/{dotxv => old/check}/bli_dotxv_check.c (100%)
 rename frame/1/{dotxv => old/check}/bli_dotxv_check.h (100%)
 rename frame/1/{invertv => old/check}/bli_invertv_check.c (100%)
 rename frame/1/{invertv => old/check}/bli_invertv_check.h (100%)
 rename frame/1/{scal2v => old/check}/bli_scal2v_check.c (100%)
 rename frame/1/{scal2v => old/check}/bli_scal2v_check.h (100%)
 rename frame/1/{scalv => old/check}/bli_scalv_check.c (100%)
 rename frame/1/{scalv => old/check}/bli_scalv_check.h (100%)
 rename frame/1/{setv => old/check}/bli_setv_check.c (100%)
 rename frame/1/{setv => old/check}/bli_setv_check.h (100%)
 rename frame/1/{subv => old/check}/bli_subv_check.c (100%)
 rename frame/1/{subv => old/check}/bli_subv_check.h (100%)
 rename frame/1/{swapv => old/check}/bli_swapv_check.c (100%)
 rename frame/1/{swapv => old/check}/bli_swapv_check.h (100%)
 rename frame/1/{addv/bli_addv_kernel.c => old/copyv/bli_copyv.c} (58%)
 rename frame/1/{ => old}/copyv/bli_copyv.h (75%)
 rename frame/1/{dotv/bli_dotv_kernel.c => old/dotv/bli_dotv.c} (55%)
 rename frame/1/{ => old}/dotv/bli_dotv.h (72%)
 create mode 100644 frame/1/old/dotxv/bli_dotxv.c
 rename frame/1/{ => old}/dotxv/bli_dotxv.h (70%)
 create mode 100644 frame/1/old/invertv/bli_invertv.c
 rename frame/1/{ => old}/invertv/bli_invertv.h (82%)
 rename frame/1/{copyv/bli_copyv_ref.c => old/scal2v/bli_scal2v.c} (51%)
 rename frame/1/{ => old}/scal2v/bli_scal2v.h (71%)
 create mode 100644 frame/1/old/scalv/bli_scalv.c
 rename frame/1/{ => old}/scalv/bli_scalv.h (68%)
 create mode 100644 frame/1/old/setv/bli_setv.c
 rename frame/{1m/packm/ukernels/bli_packm_ref_cxk.h => 1/old/setv/bli_setv.h} (65%)
 rename frame/1/{ => old}/setv/old/bli_setv_unb_var2.c (100%)
 rename frame/1/{ => old}/setv/old/bli_setv_unb_var2.h (100%)
 rename frame/1/{copyv/bli_copyv_kernel.c => old/subv/bli_subv.c} (58%)
 rename frame/1/{ => old}/subv/bli_subv.h (75%)
 rename frame/1/{swapv/bli_swapv_kernel.c => old/swapv/bli_swapv.c} (58%)
 rename frame/1/{ => old}/swapv/bli_swapv.h (76%)
 delete mode 100644 frame/1/scal2v/bli_scal2v.c
 delete mode 100644 frame/1/scal2v/bli_scal2v_kernel.c
 delete mode 100644 frame/1/scal2v/bli_scal2v_kernel.h
 delete mode 100644 frame/1/scal2v/bli_scal2v_ref.c
 delete mode 100644 frame/1/scalv/bli_scalv.c
 delete mode 100644 frame/1/scalv/bli_scalv_kernel.c
 delete mode 100644 frame/1/scalv/bli_scalv_ref.c
 delete mode 100644 frame/1/setv/bli_setv.c
 delete mode 100644 frame/1/setv/bli_setv_kernel.c
 delete mode 100644 frame/1/setv/bli_setv_ref.c
 delete mode 100644 frame/1/subv/bli_subv.c
 delete mode 100644 frame/1/subv/bli_subv_kernel.h
 delete mode 100644 frame/1/subv/bli_subv_ref.c
 delete mode 100644 frame/1/swapv/bli_swapv_ref.c
 rename frame/{1f/dotxf/bli_dotxf_fusefac.c => 1d/bli_l1d.h} (87%)
 create mode 100644 frame/1d/bli_l1d_check.c
 create mode 100644 frame/1d/bli_l1d_check.h
 create mode 100644 frame/1d/bli_l1d_cntx.c
 create mode 100644 frame/1d/bli_l1d_cntx.h
 create mode 100644 frame/1d/bli_l1d_oapi.c
 rename frame/{1m/packm/ukernels/bli_packm_ref_cxk_4mi.h => 1d/bli_l1d_oapi.h} (66%)
 create mode 100644 frame/1d/bli_l1d_oapi_wc.c
 create mode 100644 frame/1d/bli_l1d_oapi_woc.c
 create mode 100644 frame/1d/bli_l1d_tapi.c
 create mode 100644 frame/1d/bli_l1d_tapi.h
 rename frame/1d/{addd => old}/bli_addd.c (100%)
 rename frame/1d/{addd => old}/bli_addd.h (100%)
 rename frame/1d/{addd => old}/bli_addd_check.c (100%)
 rename frame/1d/{addd => old}/bli_addd_check.h (100%)
 rename frame/1d/{addd => old}/bli_addd_unb_var1.c (100%)
 rename frame/1d/{addd => old}/bli_addd_unb_var1.h (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd.c (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd.h (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd_check.c (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd_check.h (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd_unb_var1.c (100%)
 rename frame/1d/{axpyd => old}/bli_axpyd_unb_var1.h (100%)
 rename frame/1d/{copyd => old}/bli_copyd.c (100%)
 rename frame/1d/{copyd => old}/bli_copyd.h (100%)
 rename frame/1d/{copyd => old}/bli_copyd_check.c (100%)
 rename frame/1d/{copyd => old}/bli_copyd_check.h (100%)
 rename frame/1d/{copyd => old}/bli_copyd_unb_var1.c (100%)
 rename frame/1d/{copyd => old}/bli_copyd_unb_var1.h (100%)
 rename frame/1d/{invertd => old}/bli_invertd.c (100%)
 rename frame/1d/{invertd => old}/bli_invertd.h (100%)
 rename frame/1d/{invertd => old}/bli_invertd_check.c (100%)
 rename frame/1d/{invertd => old}/bli_invertd_check.h (100%)
 rename frame/1d/{invertd => old}/bli_invertd_unb_var1.c (100%)
 rename frame/1d/{invertd => old}/bli_invertd_unb_var1.h (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d.c (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d.h (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d_check.c (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d_check.h (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d_unb_var1.c (100%)
 rename frame/1d/{scal2d => old}/bli_scal2d_unb_var1.h (100%)
 rename frame/1d/{scald => old}/bli_scald.c (100%)
 rename frame/1d/{scald => old}/bli_scald.h (100%)
 rename frame/1d/{scald => old}/bli_scald_check.c (100%)
 rename frame/1d/{scald => old}/bli_scald_check.h (100%)
 rename frame/1d/{scald => old}/bli_scald_unb_var1.c (100%)
 rename frame/1d/{scald => old}/bli_scald_unb_var1.h (100%)
 rename frame/1d/{setd => old}/bli_setd.c (100%)
 rename frame/1d/{setd => old}/bli_setd.h (100%)
 rename frame/1d/{setd => old}/bli_setd_check.c (100%)
 rename frame/1d/{setd => old}/bli_setd_check.h (100%)
 rename frame/1d/{setd => old}/bli_setd_unb_var1.c (100%)
 rename frame/1d/{setd => old}/bli_setd_unb_var1.h (100%)
 rename frame/1d/{setid => old}/bli_setid.c (100%)
 rename frame/1d/{setid => old}/bli_setid.h (100%)
 rename frame/1d/{setid => old}/bli_setid_check.c (100%)
 rename frame/1d/{setid => old}/bli_setid_check.h (100%)
 rename frame/1d/{setid => old}/bli_setid_unb_var1.c (100%)
 rename frame/1d/{setid => old}/bli_setid_unb_var1.h (100%)
 rename frame/1d/{subd => old}/bli_subd.c (100%)
 rename frame/1d/{subd => old}/bli_subd.h (100%)
 rename frame/1d/{subd => old}/bli_subd_check.c (100%)
 rename frame/1d/{subd => old}/bli_subd_check.h (100%)
 rename frame/1d/{subd => old}/bli_subd_unb_var1.c (100%)
 rename frame/1d/{subd => old}/bli_subd_unb_var1.h (100%)
 delete mode 100644 frame/1f/axpy2v/bli_axpy2v.c
 delete mode 100644 frame/1f/axpy2v/bli_axpy2v_kernel.c
 delete mode 100644 frame/1f/axpy2v/bli_axpy2v_kernel.h
 delete mode 100644 frame/1f/axpy2v/bli_axpy2v_ref.c
 delete mode 100644 frame/1f/axpyf/bli_axpyf.c
 delete mode 100644 frame/1f/axpyf/bli_axpyf_kernel.h
 create mode 100644 frame/1f/bli_l1f.h
 create mode 100644 frame/1f/bli_l1f_check.c
 create mode 100644 frame/1f/bli_l1f_check.h
 create mode 100644 frame/1f/bli_l1f_cntx.c
 create mode 100644 frame/1f/bli_l1f_cntx.h
 create mode 100644 frame/1f/bli_l1f_ft.h
 create mode 100644 frame/1f/bli_l1f_ker.h
 create mode 100644 frame/1f/bli_l1f_oapi.c
 create mode 100644 frame/1f/bli_l1f_oapi.h
 create mode 100644 frame/1f/bli_l1f_oapi_wc.c
 create mode 100644 frame/1f/bli_l1f_oapi_woc.c
 create mode 100644 frame/1f/bli_l1f_tapi.c
 create mode 100644 frame/1f/bli_l1f_tapi.h
 delete mode 100644 frame/1f/dotaxpyv/bli_dotaxpyv.c
 delete mode 100644 frame/1f/dotaxpyv/bli_dotaxpyv_kernel.c
 delete mode 100644 frame/1f/dotaxpyv/bli_dotaxpyv_kernel.h
 delete mode 100644 frame/1f/dotaxpyv/bli_dotaxpyv_ref.c
 delete mode 100644 frame/1f/dotxaxpyf/bli_dotxaxpyf.c
 delete mode 100644 frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.c
 delete mode 100644 frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.h
 delete mode 100644 frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.c
 delete mode 100644 frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.c
 delete mode 100644 frame/1f/dotxf/bli_dotxf.c
 delete mode 100644 frame/1f/dotxf/bli_dotxf_kernel.h
 create mode 100644 frame/1f/kernels/bli_axpy2v_ref.c
 create mode 100644 frame/1f/kernels/bli_axpyf_ref.c
 rename frame/{1/invertv/bli_invertv_kernel.c => 1f/kernels/bli_dotaxpyv_ref.c} (64%)
 create mode 100644 frame/1f/kernels/bli_dotxaxpyf_ref_var1.c
 create mode 100644 frame/1f/kernels/bli_dotxaxpyf_ref_var2.c
 create mode 100644 frame/1f/kernels/bli_dotxf_ref.c
 create mode 100644 frame/1f/kernels/bli_l1f_ref.h
 rename frame/1f/{axpy2v => kernels/old}/bli_axpy2v_ref.h (76%)
 rename frame/1f/{axpyf => kernels/old}/bli_axpyf_ref.h (78%)
 rename frame/1f/{dotaxpyv => kernels/old}/bli_dotaxpyv_ref.h (75%)
 rename frame/1f/{dotxaxpyf => kernels/old}/bli_dotxaxpyf_ref_var1.h (70%)
 rename frame/1f/{dotxaxpyf => kernels/old}/bli_dotxaxpyf_ref_var2.h (70%)
 rename frame/1f/{dotxf => kernels/old}/bli_dotxf_ref.h (76%)
 create mode 100644 frame/1f/old/axpy2v/bli_axpy2v.c
 rename frame/1f/{ => old}/axpy2v/bli_axpy2v.h (62%)
 create mode 100644 frame/1f/old/axpyf/bli_axpyf.c
 rename frame/1f/{ => old}/axpyf/bli_axpyf.h (65%)
 rename frame/1f/{axpy2v => old/check}/bli_axpy2v_check.c (89%)
 rename frame/1f/{axpy2v => old/check}/bli_axpy2v_check.h (95%)
 rename frame/1f/{axpyf => old/check}/bli_axpyf_check.c (100%)
 rename frame/1f/{axpyf => old/check}/bli_axpyf_check.h (100%)
 rename frame/1f/{dotaxpyv => old/check}/bli_dotaxpyv_check.c (100%)
 rename frame/1f/{dotaxpyv => old/check}/bli_dotaxpyv_check.h (100%)
 rename frame/1f/{dotxaxpyf => old/check}/bli_dotxaxpyf_check.c (100%)
 rename frame/1f/{dotxaxpyf => old/check}/bli_dotxaxpyf_check.h (100%)
 rename frame/1f/{dotxf => old/check}/bli_dotxf_check.c (100%)
 rename frame/1f/{dotxf => old/check}/bli_dotxf_check.h (100%)
 create mode 100644 frame/1f/old/dotaxpyv/bli_dotaxpyv.c
 rename frame/1f/{ => old}/dotaxpyv/bli_dotaxpyv.h (62%)
 create mode 100644 frame/1f/old/dotxaxpyf/bli_dotxaxpyf.c
 rename frame/1f/{ => old}/dotxaxpyf/bli_dotxaxpyf.h (57%)
 create mode 100644 frame/1f/old/dotxf/bli_dotxf.c
 rename frame/1f/{ => old}/dotxf/bli_dotxf.h (63%)
 create mode 100644 frame/1m/bli_l1m.h
 create mode 100644 frame/1m/bli_l1m_check.c
 create mode 100644 frame/1m/bli_l1m_check.h
 create mode 100644 frame/1m/bli_l1m_cntx.c
 create mode 100644 frame/1m/bli_l1m_cntx.h
 rename frame/{1/dotv/bli_dotv_kernel.h => 1m/bli_l1m_ft.h} (66%)
 create mode 100644 frame/1m/bli_l1m_oapi.c
 create mode 100644 frame/1m/bli_l1m_oapi.h
 create mode 100644 frame/1m/bli_l1m_oapi_wc.c
 create mode 100644 frame/1m/bli_l1m_oapi_woc.c
 create mode 100644 frame/1m/bli_l1m_tapi.c
 create mode 100644 frame/1m/bli_l1m_tapi.h
 create mode 100644 frame/1m/bli_l1m_unb_var1.c
 create mode 100644 frame/1m/bli_l1m_unb_var1.h
 rename frame/1m/{ => old}/addm/bli_addm.c (100%)
 rename frame/1m/{ => old}/addm/bli_addm.h (100%)
 rename frame/1m/{ => old}/addm/bli_addm_check.c (100%)
 rename frame/1m/{ => old}/addm/bli_addm_check.h (100%)
 rename frame/1m/{ => old}/addm/bli_addm_unb_var1.c (98%)
 rename frame/1m/{ => old}/addm/bli_addm_unb_var1.h (97%)
 rename frame/1m/{ => old}/axpym/bli_axpym.c (100%)
 rename frame/1m/{ => old}/axpym/bli_axpym.h (100%)
 rename frame/1m/{ => old}/axpym/bli_axpym_check.c (100%)
 rename frame/1m/{ => old}/axpym/bli_axpym_check.h (100%)
 rename frame/1m/{ => old}/axpym/bli_axpym_unb_var1.c (99%)
 rename frame/1m/{ => old}/axpym/bli_axpym_unb_var1.h (97%)
 create mode 100644 frame/1m/old/bli_scalm.c
 rename frame/1m/{scalm => old}/bli_scalm.h (72%)
 rename frame/1m/{scalm => old}/bli_scalm_check.c (100%)
 rename frame/1m/{scalm => old}/bli_scalm_check.h (100%)
 rename frame/1m/{scalm => old}/bli_scalm_unb_var1.c (58%)
 rename frame/1m/{scalm => old}/bli_scalm_unb_var1.h (91%)
 rename frame/1m/{ => old}/copym/bli_copym.c (100%)
 rename frame/1m/{ => old}/copym/bli_copym.h (100%)
 rename frame/1m/{ => old}/copym/bli_copym_check.c (100%)
 rename frame/1m/{ => old}/copym/bli_copym_check.h (100%)
 rename frame/1m/{ => old}/copym/bli_copym_unb_var1.c (98%)
 rename frame/1m/{ => old}/copym/bli_copym_unb_var1.h (97%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m.c (100%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m.h (100%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m_check.c (100%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m_check.h (100%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m_unb_var1.c (90%)
 rename frame/1m/{ => old}/scal2m/bli_scal2m_unb_var1.h (97%)
 rename frame/1m/{ => old}/setm/bli_setm.c (100%)
 rename frame/1m/{ => old}/setm/bli_setm.h (100%)
 rename frame/1m/{ => old}/setm/bli_setm_check.c (100%)
 rename frame/1m/{ => old}/setm/bli_setm_check.h (100%)
 rename frame/1m/{ => old}/setm/bli_setm_unb_var1.c (96%)
 rename frame/1m/{ => old}/setm/bli_setm_unb_var1.h (97%)
 rename frame/1m/{ => old}/subm/bli_subm.c (100%)
 rename frame/1m/{ => old}/subm/bli_subm.h (100%)
 rename frame/1m/{ => old}/subm/bli_subm_check.c (100%)
 rename frame/1m/{ => old}/subm/bli_subm_check.h (100%)
 rename frame/1m/{ => old}/subm/bli_subm_unb_var1.c (98%)
 rename frame/1m/{ => old}/subm/bli_subm_unb_var1.h (97%)
 create mode 100644 frame/1m/packm/bli_packm_cntx.c
 rename frame/{1f/dotxaxpyf/bli_dotxaxpyf_fusefac.c => 1m/packm/bli_packm_cntx.h} (88%)
 rename frame/1m/packm/ukernels/{bli_packm_ref_cxk_3mis.c => bli_packm_cxk_3mis_ref.c} (95%)
 rename frame/1m/{unpackm/ukernels/bli_unpackm_ref_cxk.h => packm/ukernels/bli_packm_cxk_3mis_ref.h} (73%)
 rename frame/1m/packm/ukernels/{bli_packm_ref_cxk_4mi.c => bli_packm_cxk_4mi_ref.c} (94%)
 rename frame/{1/setv/bli_setv_kernel.h => 1m/packm/ukernels/bli_packm_cxk_4mi_ref.h} (72%)
 rename frame/1m/packm/ukernels/{bli_packm_ref_cxk.c => bli_packm_cxk_ref.c} (92%)
 create mode 100644 frame/1m/packm/ukernels/bli_packm_cxk_ref.h
 rename frame/1m/packm/ukernels/{bli_packm_ref_cxk_rih.c => bli_packm_cxk_rih_ref.c} (96%)
 create mode 100644 frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.h
 delete mode 100644 frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.h
 delete mode 100644 frame/1m/scalm/bli_scalm.c
 rename frame/1m/unpackm/ukernels/{bli_unpackm_ref_cxk.c => bli_unpackm_cxk_ref.c} (89%)
 create mode 100644 frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.h
 create mode 100644 frame/2/bli_l2.h
 create mode 100644 frame/2/bli_l2_check.c
 create mode 100644 frame/2/bli_l2_check.h
 create mode 100644 frame/2/bli_l2_cntx.c
 create mode 100644 frame/2/bli_l2_cntx.h
 create mode 100644 frame/2/bli_l2_ft.h
 create mode 100644 frame/2/bli_l2_oapi.c
 create mode 100644 frame/2/bli_l2_oapi.h
 create mode 100644 frame/2/bli_l2_oapi_wc.c
 create mode 100644 frame/2/bli_l2_oapi_woc.c
 create mode 100644 frame/2/bli_l2_tapi.c
 create mode 100644 frame/2/bli_l2_tapi.h
 rename frame/2/gemv/{bli_gemv.c => bli_gemv_front.c} (79%)
 create mode 100644 frame/2/gemv/bli_gemv_front.h
 create mode 100644 frame/2/gemv/bli_gemv_var.h
 create mode 100644 frame/2/gemv/bli_gemv_var_oapi.c
 create mode 100644 frame/2/gemv/bli_gemv_var_oapi.c.prev
 rename frame/2/gemv/{ => old}/bli_gemv_blk_var1.h (98%)
 rename frame/2/gemv/{ => old}/bli_gemv_blk_var2.h (98%)
 rename frame/2/gemv/{ => old}/bli_gemv_check.c (99%)
 rename frame/2/gemv/{ => old}/bli_gemv_check.h (98%)
 rename frame/{ind/cntl/bli_ind_cntl_init.c => 2/gemv/old/bli_gemv_cntx.c} (65%)
 rename frame/{3/gemm/other/bli_gemm_cntl_exp.h => 2/gemv/old/bli_gemv_cntx.h} (95%)
 rename frame/{1f/dotxf/bli_dotxf_ref.c => 2/gemv/old/bli_gemv_unb_var1.c} (60%)
 rename frame/2/gemv/{ => old}/bli_gemv_unb_var1.h (100%)
 rename frame/{1f/axpyf/bli_axpyf_ref.c => 2/gemv/old/bli_gemv_unb_var2.c} (54%)
 rename frame/2/gemv/{ => old}/bli_gemv_unb_var2.h (100%)
 rename frame/{1f/dotxf/bli_dotxf_kernel.c => 2/gemv/old/bli_gemv_unf_var1.c} (52%)
 rename frame/2/gemv/{ => old}/bli_gemv_unf_var1.h (100%)
 create mode 100644 frame/2/gemv/old/bli_gemv_unf_var2.c
 rename frame/2/gemv/{ => old}/bli_gemv_unf_var2.h (100%)
 rename frame/2/ger/{bli_ger.c => bli_ger_front.c} (76%)
 create mode 100644 frame/2/ger/bli_ger_front.h
 rename frame/{1m/packm/ukernels/bli_packm_ref_cxk_3mis.h => 2/ger/bli_ger_var.h} (69%)
 create mode 100644 frame/2/ger/bli_ger_var_oapi.c
 rename frame/2/ger/{ => old}/bli_ger_blk_var1.h (98%)
 rename frame/2/ger/{ => old}/bli_ger_blk_var2.h (98%)
 rename frame/2/ger/{ => old}/bli_ger_check.c (99%)
 rename frame/2/ger/{ => old}/bli_ger_check.h (98%)
 rename frame/{ind/query/bli_bsv_query.h => 2/ger/old/bli_ger_cntx.c} (72%)
 rename frame/{ind/cntl/bli_ind_cntl_init.h => 2/ger/old/bli_ger_cntx.h} (96%)
 create mode 100644 frame/2/ger/old/bli_ger_unb_var1.c
 rename frame/2/ger/{ => old}/bli_ger_unb_var1.h (100%)
 rename frame/{1f/axpyf/bli_axpyf_kernel.c => 2/ger/old/bli_ger_unb_var2.c} (52%)
 rename frame/2/ger/{ => old}/bli_ger_unb_var2.h (100%)
 rename frame/2/hemv/{bli_hemv.c => bli_hemv_front.c} (78%)
 create mode 100644 frame/2/hemv/bli_hemv_front.h
 create mode 100644 frame/2/hemv/bli_hemv_var.h
 create mode 100644 frame/2/hemv/bli_hemv_var_oapi.c
 rename frame/2/hemv/{ => old}/bli_hemv_blk_var1.h (98%)
 rename frame/2/hemv/{ => old}/bli_hemv_blk_var2.h (98%)
 rename frame/2/hemv/{ => old}/bli_hemv_blk_var3.h (98%)
 rename frame/2/hemv/{ => old}/bli_hemv_blk_var4.h (98%)
 rename frame/2/hemv/{ => old}/bli_hemv_check.c (99%)
 rename frame/2/hemv/{ => old}/bli_hemv_check.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_cntx.c
 rename frame/{1f/dotxf/bli_dotxf_fusefac.h => 2/hemv/old/bli_hemv_cntx.h} (94%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unb_var1.c
 rename frame/2/hemv/{ => old}/bli_hemv_unb_var1.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unb_var2.c
 rename frame/2/hemv/{ => old}/bli_hemv_unb_var2.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unb_var3.c
 rename frame/2/hemv/{ => old}/bli_hemv_unb_var3.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unb_var4.c
 rename frame/2/hemv/{ => old}/bli_hemv_unb_var4.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unf_var1.c
 rename frame/2/hemv/{ => old}/bli_hemv_unf_var1.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unf_var1a.c
 rename frame/2/hemv/{ => old}/bli_hemv_unf_var1a.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unf_var3.c
 rename frame/2/hemv/{ => old}/bli_hemv_unf_var3.h (98%)
 create mode 100644 frame/2/hemv/old/bli_hemv_unf_var3a.c
 rename frame/2/hemv/{ => old}/bli_hemv_unf_var3a.h (98%)
 rename frame/2/her/{bli_her.c => bli_her_front.c} (78%)
 create mode 100644 frame/2/her/bli_her_front.h
 rename frame/{1/axpyv/bli_axpyv_kernel.h => 2/her/bli_her_var.h} (65%)
 create mode 100644 frame/2/her/bli_her_var_oapi.c
 rename frame/2/her/{ => old}/bli_her_blk_var1.h (98%)
 rename frame/2/her/{ => old}/bli_her_blk_var2.h (98%)
 rename frame/2/her/{ => old}/bli_her_check.c (99%)
 rename frame/2/her/{ => old}/bli_her_check.h (98%)
 create mode 100644 frame/2/her/old/bli_her_unb_var1.c
 rename frame/2/her/{ => old}/bli_her_unb_var1.h (98%)
 create mode 100644 frame/2/her/old/bli_her_unb_var2.c
 rename frame/2/her/{ => old}/bli_her_unb_var2.h (98%)
 rename frame/2/her2/{bli_her2.c => bli_her2_front.c} (78%)
 create mode 100644 frame/2/her2/bli_her2_front.h
 create mode 100644 frame/2/her2/bli_her2_var.h
 create mode 100644 frame/2/her2/bli_her2_var_oapi.c
 rename frame/2/her2/{ => old}/bli_her2_blk_var1.h (98%)
 rename frame/2/her2/{ => old}/bli_her2_blk_var2.h (98%)
 rename frame/2/her2/{ => old}/bli_her2_blk_var3.h (98%)
 rename frame/2/her2/{ => old}/bli_her2_blk_var4.h (98%)
 rename frame/2/her2/{ => old}/bli_her2_check.c (86%)
 rename frame/2/her2/{ => old}/bli_her2_check.h (74%)
 create mode 100644 frame/2/her2/old/bli_her2_cntx.c
 rename frame/{1f/axpyf/bli_axpyf_fusefac.h => 2/her2/old/bli_her2_cntx.h} (94%)
 create mode 100644 frame/2/her2/old/bli_her2_unb_var1.c
 rename frame/2/her2/{ => old}/bli_her2_unb_var1.h (90%)
 create mode 100644 frame/2/her2/old/bli_her2_unb_var2.c
 rename frame/2/her2/{ => old}/bli_her2_unb_var2.h (90%)
 create mode 100644 frame/2/her2/old/bli_her2_unb_var3.c
 rename frame/2/her2/{ => old}/bli_her2_unb_var3.h (90%)
 create mode 100644 frame/2/her2/old/bli_her2_unb_var4.c
 rename frame/2/her2/{ => old}/bli_her2_unb_var4.h (90%)
 create mode 100644 frame/2/her2/old/bli_her2_unf_var1.c
 rename frame/2/her2/{ => old}/bli_her2_unf_var1.h (90%)
 create mode 100644 frame/2/her2/old/bli_her2_unf_var4.c
 rename frame/2/her2/{ => old}/bli_her2_unf_var4.h (90%)
 rename frame/2/symv/{bli_symv.c => bli_symv_front.c} (78%)
 create mode 100644 frame/2/symv/bli_symv_front.h
 rename frame/2/symv/{ => old}/bli_symv_check.c (100%)
 rename frame/2/symv/{ => old}/bli_symv_check.h (100%)
 rename frame/2/syr/{bli_syr.c => bli_syr_front.c} (80%)
 create mode 100644 frame/2/syr/bli_syr_front.h
 rename frame/2/syr/{ => old}/bli_syr_check.c (100%)
 rename frame/2/syr/{ => old}/bli_syr_check.h (100%)
 rename frame/2/syr2/{bli_syr2.c => bli_syr2_front.c} (77%)
 create mode 100644 frame/2/syr2/bli_syr2_front.h
 rename frame/2/syr2/{ => old}/bli_syr2_check.c (87%)
 rename frame/2/syr2/{ => old}/bli_syr2_check.h (84%)
 rename frame/2/trmv/{bli_trmv.c => bli_trmv_front.c} (78%)
 create mode 100644 frame/2/trmv/bli_trmv_front.h
 create mode 100644 frame/2/trmv/bli_trmv_var.h
 create mode 100644 frame/2/trmv/bli_trmv_var_oapi.c
 rename frame/2/trmv/{ => old}/bli_trmv_check.c (98%)
 rename frame/2/trmv/{ => old}/bli_trmv_check.h (98%)
 rename frame/2/trmv/{ => old}/bli_trmv_l_blk_var1.h (97%)
 rename frame/2/trmv/{ => old}/bli_trmv_l_blk_var2.h (97%)
 rename frame/2/trmv/{ => old}/bli_trmv_u_blk_var1.h (97%)
 rename frame/2/trmv/{ => old}/bli_trmv_u_blk_var2.h (97%)
 create mode 100644 frame/2/trmv/old/bli_trmv_unb_var1.c
 rename frame/2/trmv/{ => old}/bli_trmv_unb_var1.h (98%)
 create mode 100644 frame/2/trmv/old/bli_trmv_unb_var2.c
 rename frame/2/trmv/{ => old}/bli_trmv_unb_var2.h (98%)
 create mode 100644 frame/2/trmv/old/bli_trmv_unf_var1.c
 rename frame/2/trmv/{ => old}/bli_trmv_unf_var1.h (98%)
 create mode 100644 frame/2/trmv/old/bli_trmv_unf_var2.c
 rename frame/2/trmv/{ => old}/bli_trmv_unf_var2.h (98%)
 rename frame/2/trsv/{bli_trsv.c => bli_trsv_front.c} (77%)
 create mode 100644 frame/2/trsv/bli_trsv_front.h
 create mode 100644 frame/2/trsv/bli_trsv_var.h
 create mode 100644 frame/2/trsv/bli_trsv_var_oapi.c
 rename frame/2/trsv/{ => old}/bli_trsv_check.c (98%)
 rename frame/2/trsv/{ => old}/bli_trsv_check.h (98%)
 rename frame/2/trsv/{ => old}/bli_trsv_l_blk_var1.h (97%)
 rename frame/2/trsv/{ => old}/bli_trsv_l_blk_var2.h (97%)
 rename frame/2/trsv/{ => old}/bli_trsv_u_blk_var1.h (97%)
 rename frame/2/trsv/{ => old}/bli_trsv_u_blk_var2.h (97%)
 create mode 100644 frame/2/trsv/old/bli_trsv_unb_var1.c
 rename frame/2/trsv/{ => old}/bli_trsv_unb_var1.h (98%)
 create mode 100644 frame/2/trsv/old/bli_trsv_unb_var2.c
 rename frame/2/trsv/{ => old}/bli_trsv_unb_var2.h (98%)
 create mode 100644 frame/2/trsv/old/bli_trsv_unf_var1.c
 rename frame/2/trsv/{ => old}/bli_trsv_unf_var1.h (98%)
 create mode 100644 frame/2/trsv/old/bli_trsv_unf_var2.c
 rename frame/2/trsv/{ => old}/bli_trsv_unf_var2.h (98%)
 rename frame/{1/addv/bli_addv_kernel.h => 3/bli_l3.h} (70%)
 create mode 100644 frame/3/bli_l3_blocksize.c
 create mode 100644 frame/3/bli_l3_blocksize.h
 create mode 100644 frame/3/bli_l3_check.c
 rename frame/{ind/query/bli_ind_query.h => 3/bli_l3_check.h} (53%)
 create mode 100644 frame/3/bli_l3_cntx.c
 rename frame/{1f/axpyf/bli_axpyf_fusefac.c => 3/bli_l3_cntx.h} (87%)
 create mode 100644 frame/3/bli_l3_ft.h
 create mode 100644 frame/3/bli_l3_oapi.c
 create mode 100644 frame/3/bli_l3_oapi.h
 create mode 100644 frame/3/bli_l3_oapi_wc.c
 create mode 100644 frame/3/bli_l3_oapi_woc.c
 create mode 100644 frame/3/bli_l3_oft.h
 create mode 100644 frame/3/bli_l3_prune.c
 rename frame/{1/invertv/bli_invertv_kernel.h => 3/bli_l3_prune.h} (82%)
 create mode 100644 frame/3/bli_l3_tapi.c
 create mode 100644 frame/3/bli_l3_tapi.h
 create mode 100644 frame/3/bli_l3_ukr.h
 create mode 100644 frame/3/bli_l3_ukr_oapi.c
 rename frame/{1/scalv/bli_scalv_kernel.h => 3/bli_l3_ukr_oapi.h} (69%)
 create mode 100644 frame/3/bli_l3_ukr_tapi.c
 create mode 100644 frame/3/bli_l3_ukr_tapi.h
 create mode 100644 frame/3/gemm/bli_gemm_var.h
 rename frame/3/gemm/{ => old}/bli_gemm_blk_var1f.h (97%)
 rename frame/3/gemm/{ => old}/bli_gemm_blk_var2f.h (97%)
 rename frame/3/gemm/{ => old}/bli_gemm_blk_var3f.h (97%)
 create mode 100644 frame/3/gemm/old/bli_gemm_cntx.c
 create mode 100644 frame/3/gemm/old/bli_gemm_cntx.h
 rename frame/3/gemm/{ => old}/bli_gemm_ker_var2.h (97%)
 delete mode 100644 frame/3/gemm/other/bli_gemm_cntl_exp.c
 create mode 100644 frame/3/herk/bli_herk_var.h
 rename frame/3/herk/{ => old}/bli_herk_blk_var1f.h (97%)
 rename frame/3/herk/{ => old}/bli_herk_blk_var2f.h (97%)
 rename frame/3/herk/{ => old}/bli_herk_blk_var3f.h (97%)
 rename frame/3/herk/{ => old}/bli_herk_l_ker_var2.h (97%)
 rename frame/3/herk/{ => old}/bli_herk_u_ker_var2.h (97%)
 rename frame/3/{gemm => old}/bli_gemm.c (90%)
 rename frame/3/{gemm => old}/bli_gemm_blocksize.c (85%)
 rename frame/3/{gemm => old}/bli_gemm_blocksize.h (91%)
 rename frame/3/{gemm => old}/bli_gemm_check.c (99%)
 rename frame/3/{gemm => old}/bli_gemm_check.h (98%)
 rename frame/3/{gemm => old}/bli_gemm_ukernel.c (96%)
 rename frame/3/{gemm => old}/bli_gemm_ukernel.h (95%)
 rename frame/3/{gemm/ukernels => old}/bli_gemm_ukr_ref.h (100%)
 rename frame/3/{trsm/ukernels => old}/bli_gemmtrsm_l_ukr_ref.c (100%)
 rename frame/3/{trsm/ukernels => old}/bli_gemmtrsm_l_ukr_ref.h (100%)
 rename frame/3/{trsm/ukernels => old}/bli_gemmtrsm_u_ukr_ref.c (100%)
 rename frame/3/{trsm/ukernels => old}/bli_gemmtrsm_u_ukr_ref.h (100%)
 rename frame/3/{trsm => old}/bli_gemmtrsm_ukernel.c (91%)
 rename frame/3/{trsm => old}/bli_gemmtrsm_ukernel.h (95%)
 rename frame/3/{hemm => old}/bli_hemm.c (90%)
 rename frame/3/{hemm => old}/bli_hemm_check.c (99%)
 rename frame/3/{hemm => old}/bli_hemm_check.h (98%)
 rename frame/3/{her2k => old}/bli_her2k.c (90%)
 rename frame/3/{her2k => old}/bli_her2k_check.c (100%)
 rename frame/3/{her2k => old}/bli_her2k_check.h (100%)
 rename frame/3/{herk => old}/bli_herk.c (89%)
 rename frame/3/{herk => old}/bli_herk_check.c (99%)
 rename frame/3/{herk => old}/bli_herk_check.h (98%)
 rename frame/3/{herk => old}/bli_herk_prune.c (100%)
 rename frame/3/{herk => old}/bli_herk_prune.h (100%)
 rename frame/3/{symm => old}/bli_symm.c (90%)
 rename frame/3/{symm => old}/bli_symm_check.c (100%)
 rename frame/3/{symm => old}/bli_symm_check.h (100%)
 rename frame/3/{syr2k => old}/bli_syr2k.c (90%)
 rename frame/3/{syr2k => old}/bli_syr2k_check.c (100%)
 rename frame/3/{syr2k => old}/bli_syr2k_check.h (100%)
 rename frame/3/{syrk => old}/bli_syrk.c (89%)
 rename frame/3/{syrk => old}/bli_syrk_check.c (100%)
 rename frame/3/{syrk => old}/bli_syrk_check.h (100%)
 rename frame/3/{trmm => old}/bli_trmm.c (89%)
 rename frame/3/{trmm3 => old}/bli_trmm3.c (90%)
 rename frame/3/{trmm3 => old}/bli_trmm3_check.c (100%)
 rename frame/3/{trmm3 => old}/bli_trmm3_check.h (100%)
 rename frame/3/{trmm => old}/bli_trmm_blocksize.c (82%)
 rename frame/3/{trmm => old}/bli_trmm_blocksize.h (91%)
 rename frame/3/{trmm => old}/bli_trmm_check.c (99%)
 rename frame/3/{trmm => old}/bli_trmm_check.h (98%)
 rename frame/3/{trmm => old}/bli_trmm_prune.c (100%)
 rename frame/3/{trmm => old}/bli_trmm_prune.h (100%)
 rename frame/3/{trsm => old}/bli_trsm.c (88%)
 rename frame/3/{trsm => old}/bli_trsm_blocksize.c (86%)
 rename frame/3/{trsm => old}/bli_trsm_blocksize.h (91%)
 rename frame/3/{trsm => old}/bli_trsm_check.c (100%)
 rename frame/3/{trsm => old}/bli_trsm_check.h (100%)
 rename frame/3/{trsm/ukernels => old}/bli_trsm_l_ukr_ref.c (100%)
 rename frame/3/{trsm/ukernels => old}/bli_trsm_l_ukr_ref.h (100%)
 rename frame/3/{trsm => old}/bli_trsm_prune.c (100%)
 rename frame/3/{trsm => old}/bli_trsm_prune.h (100%)
 rename frame/3/{trsm/ukernels => old}/bli_trsm_u_ukr_ref.c (100%)
 rename frame/3/{trsm/ukernels => old}/bli_trsm_u_ukr_ref.h (100%)
 rename frame/3/{trsm => old}/bli_trsm_ukernel.c (90%)
 rename frame/3/{trsm => old}/bli_trsm_ukernel.h (95%)
 create mode 100644 frame/3/trmm/bli_trmm_var.h
 rename frame/3/trmm/{ => old}/bli_trmm_blk_var1f.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_blk_var2b.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_blk_var2f.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_blk_var3b.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_blk_var3f.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_ll_ker_var2.h (95%)
 rename frame/3/trmm/{ => old}/bli_trmm_lu_ker_var2.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_rl_ker_var2.h (97%)
 rename frame/3/trmm/{ => old}/bli_trmm_ru_ker_var2.h (97%)
 delete mode 100644 frame/3/trmm/other/bli_trmm_ll_blk_var1.c
 delete mode 100644 frame/3/trmm/other/bli_trmm_ll_blk_var1.h
 delete mode 100644 frame/3/trmm/other/bli_trmm_ll_blk_var4.c
 delete mode 100644 frame/3/trmm/other/bli_trmm_ll_blk_var4.h
 delete mode 100644 frame/3/trmm/other/bli_trmm_lu_blk_var1.c
 delete mode 100644 frame/3/trmm/other/bli_trmm_lu_blk_var1.h
 delete mode 100644 frame/3/trmm/other/bli_trmm_lu_blk_var4.c
 delete mode 100644 frame/3/trmm/other/bli_trmm_lu_blk_var4.h
 create mode 100644 frame/3/trsm/bli_trsm_var.h
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var1b.h (100%)
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var1f.h (100%)
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var2b.h (100%)
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var2f.h (100%)
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var3b.h (100%)
 rename frame/3/trsm/{ => old}/bli_trsm_blk_var3f.h (100%)
 create mode 100644 frame/3/trsm/old/bli_trsm_cntx.c
 create mode 100644 frame/3/trsm/old/bli_trsm_cntx.h
 rename frame/3/trsm/{ => old}/bli_trsm_ll_ker_var2.h (98%)
 rename frame/3/trsm/{ => old}/bli_trsm_lu_ker_var2.h (98%)
 rename frame/3/trsm/{ => old}/bli_trsm_rl_ker_var2.h (96%)
 rename frame/3/trsm/{ => old}/bli_trsm_ru_ker_var2.h (96%)
 delete mode 100644 frame/3/trsm/other/bli_trsm_l_blk_var4.c
 delete mode 100644 frame/3/trsm/other/bli_trsm_l_blk_var4.h
 delete mode 100644 frame/3/trsm/other/bli_trsm_u_blk_var4.c
 delete mode 100644 frame/3/trsm/other/bli_trsm_u_blk_var4.h
 rename frame/3/{gemm => }/ukernels/bli_gemm_ukr_ref.c (74%)
 create mode 100644 frame/3/ukernels/bli_gemmtrsm_ukr_ref.c
 create mode 100644 frame/3/ukernels/bli_l3_ukr_ref.h
 create mode 100644 frame/3/ukernels/bli_trsm_ukr_ref.c
 rename frame/base/{bli_blocksize.c => bli_blksz.c} (65%)
 rename frame/base/{bli_blocksize.h => bli_blksz.h} (62%)
 create mode 100644 frame/base/bli_cntx.c
 create mode 100644 frame/base/bli_cntx.h
 create mode 100644 frame/base/bli_gks.c
 create mode 100644 frame/base/bli_gks.h
 create mode 100644 frame/base/bli_mbool.c
 rename frame/{ind/query/bli_ukr_query.h => base/bli_mbool.h} (72%)
 create mode 100644 frame/include/bli_gentdef_macro_defs.h
 delete mode 100644 frame/include/bli_level3_type_defs.h
 rename frame/{ind/cntl/bli_gemmind_cntl.h => include/bli_oapi_w_cntx.h} (73%)
 create mode 100644 frame/include/bli_oapi_wo_cntx.h
 rename frame/include/{ => old}/bli_kernel_post_macro_defs.h (100%)
 create mode 100644 frame/include/old/bli_kernel_prototypes.h
 rename frame/include/{ => old}/bli_kernel_type_defs.h (95%)
 create mode 100644 frame/ind/bli_ind.c
 rename frame/ind/{query/bli_ind_query.c => bli_l3_ind.c} (59%)
 create mode 100644 frame/ind/bli_l3_ind.h
 delete mode 100644 frame/ind/cntl/bli_gemm3m1_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm3m2_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm3m3_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm3mh_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm4m1_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm4mb_cntl.c
 delete mode 100644 frame/ind/cntl/bli_gemm4mh_cntl.c
 delete mode 100644 frame/ind/cntl/bli_trsm3m1_cntl.c
 delete mode 100644 frame/ind/cntl/bli_trsm4m1_cntl.c
 create mode 100644 frame/ind/cntx/bli_gemmind_cntx.c
 create mode 100644 frame/ind/cntx/bli_gemmind_cntx.h
 create mode 100644 frame/ind/cntx/bli_trsmind_cntx.c
 rename frame/{1/swapv/bli_swapv_kernel.h => ind/cntx/bli_trsmind_cntx.h} (72%)
 rename frame/ind/include/{ => old}/bli_kernel_ind_prototypes.h (100%)
 create mode 100644 frame/ind/oapi/bli_l3_3m4m_oapi.c
 create mode 100644 frame/ind/oapi/bli_l3_ind_oapi.c
 rename frame/ind/oapi/{bli_oapi_ind.h => bli_l3_ind_oapi.h} (77%)
 create mode 100644 frame/ind/oapi/bli_l3_nat_oapi.c
 rename frame/ind/oapi/{ => old}/bli_oapi_3m1.c (92%)
 rename frame/ind/oapi/{ => old}/bli_oapi_3m2.c (93%)
 rename frame/ind/oapi/{ => old}/bli_oapi_3m3.c (93%)
 rename frame/ind/oapi/{ => old}/bli_oapi_3mh.c (86%)
 create mode 100644 frame/ind/oapi/old/bli_oapi_4m1.c
 rename frame/ind/oapi/{ => old}/bli_oapi_4mb.c (93%)
 rename frame/ind/oapi/{ => old}/bli_oapi_4mh.c (83%)
 rename frame/ind/oapi/{bli_oapi_4m1.c => old/bli_oapi_nat.c.old} (74%)
 delete mode 100644 frame/ind/query/bli_bsv_query.c
 delete mode 100644 frame/ind/query/bli_ukr_query.c
 rename
frame/ind/tapi/{bli_tapi_ind.c => bli_l3_ind_tapi.c} (69%) create mode 100644 frame/ind/tapi/bli_l3_ind_tapi.h delete mode 100644 frame/ind/tapi/bli_tapi_ind.h rename frame/{ind/cntl/bli_trsmind_cntl.h => util/bli_util.h} (86%) create mode 100644 frame/util/bli_util_check.c create mode 100644 frame/util/bli_util_check.h create mode 100644 frame/util/bli_util_oapi.c create mode 100644 frame/util/bli_util_oapi.h create mode 100644 frame/util/bli_util_oapi_wc.c create mode 100644 frame/util/bli_util_oapi_woc.c create mode 100644 frame/util/bli_util_tapi.c create mode 100644 frame/util/bli_util_tapi.h create mode 100644 frame/util/bli_util_unb_var1.c create mode 100644 frame/util/bli_util_unb_var1.h rename frame/util/{ => old}/amaxv/bli_amaxv.c (100%) rename frame/util/{ => old}/amaxv/bli_amaxv.h (100%) rename frame/util/{ => old}/amaxv/bli_amaxv_check.c (100%) rename frame/util/{ => old}/amaxv/bli_amaxv_check.h (100%) rename frame/util/{ => old}/amaxv/bli_amaxv_unb_var1.c (75%) rename frame/util/{ => old}/amaxv/bli_amaxv_unb_var1.h (87%) rename frame/util/{ => old}/asumv/bli_asumv.c (100%) rename frame/util/{ => old}/asumv/bli_asumv.h (100%) rename frame/util/{ => old}/asumv/bli_asumv_check.c (100%) rename frame/util/{ => old}/asumv/bli_asumv_check.h (100%) rename frame/util/{ => old}/asumv/bli_asumv_unb_var1.c (82%) rename frame/util/{ => old}/asumv/bli_asumv_unb_var1.h (100%) rename frame/util/{ => old}/mkherm/bli_mkherm.c (100%) rename frame/util/{ => old}/mkherm/bli_mkherm.h (100%) rename frame/util/{ => old}/mkherm/bli_mkherm_check.c (100%) rename frame/util/{ => old}/mkherm/bli_mkherm_check.h (100%) rename frame/util/{ => old}/mkherm/bli_mkherm_unb_var1.c (85%) rename frame/util/{ => old}/mkherm/bli_mkherm_unb_var1.h (100%) rename frame/util/{ => old}/mksymm/bli_mksymm.c (100%) rename frame/util/{ => old}/mksymm/bli_mksymm.h (100%) rename frame/util/{ => old}/mksymm/bli_mksymm_check.c (100%) rename frame/util/{ => old}/mksymm/bli_mksymm_check.h (100%) rename 
frame/util/{ => old}/mksymm/bli_mksymm_unb_var1.c (90%) rename frame/util/{ => old}/mksymm/bli_mksymm_unb_var1.h (100%) rename frame/util/{ => old}/mktrim/bli_mktrim.c (100%) rename frame/util/{ => old}/mktrim/bli_mktrim.h (100%) rename frame/util/{ => old}/mktrim/bli_mktrim_check.c (100%) rename frame/util/{ => old}/mktrim/bli_mktrim_check.h (100%) rename frame/util/{ => old}/mktrim/bli_mktrim_unb_var1.c (85%) rename frame/util/{ => old}/mktrim/bli_mktrim_unb_var1.h (100%) rename frame/util/{ => old}/norm1m/bli_norm1m.c (100%) rename frame/util/{ => old}/norm1m/bli_norm1m.h (100%) rename frame/util/{ => old}/norm1m/bli_norm1m_check.c (100%) rename frame/util/{ => old}/norm1m/bli_norm1m_check.h (100%) rename frame/util/{ => old}/norm1m/bli_norm1m_unb_var1.c (73%) rename frame/util/{ => old}/norm1m/bli_norm1m_unb_var1.h (80%) rename frame/util/{ => old}/norm1v/bli_norm1v.c (100%) rename frame/util/{ => old}/norm1v/bli_norm1v.h (100%) rename frame/util/{ => old}/norm1v/bli_norm1v_check.c (100%) rename frame/util/{ => old}/norm1v/bli_norm1v_check.h (100%) rename frame/util/{ => old}/norm1v/bli_norm1v_unb_var1.c (81%) rename frame/util/{ => old}/norm1v/bli_norm1v_unb_var1.h (87%) rename frame/util/{ => old}/normfm/bli_normfm.c (100%) rename frame/util/{ => old}/normfm/bli_normfm.h (100%) rename frame/util/{ => old}/normfm/bli_normfm_check.c (100%) rename frame/util/{ => old}/normfm/bli_normfm_check.h (100%) rename frame/util/{ => old}/normfm/bli_normfm_unb_var1.c (70%) rename frame/util/{ => old}/normfm/bli_normfm_unb_var1.h (77%) rename frame/util/{ => old}/normfv/bli_normfv.c (100%) rename frame/util/{ => old}/normfv/bli_normfv.h (100%) rename frame/util/{ => old}/normfv/bli_normfv_check.c (100%) rename frame/util/{ => old}/normfv/bli_normfv_check.h (100%) rename frame/util/{ => old}/normfv/bli_normfv_unb_var1.c (73%) rename frame/util/{ => old}/normfv/bli_normfv_unb_var1.h (84%) rename frame/util/{ => old}/normim/bli_normim.c (100%) rename frame/util/{ => 
old}/normim/bli_normim.h (100%) rename frame/util/{ => old}/normim/bli_normim_check.c (100%) rename frame/util/{ => old}/normim/bli_normim_check.h (100%) rename frame/util/{ => old}/normim/bli_normim_unb_var1.c (82%) rename frame/util/{ => old}/normim/bli_normim_unb_var1.h (80%) rename frame/util/{ => old}/normiv/bli_normiv.c (100%) rename frame/util/{ => old}/normiv/bli_normiv.h (100%) rename frame/util/{ => old}/normiv/bli_normiv_check.c (100%) rename frame/util/{ => old}/normiv/bli_normiv_check.h (100%) rename frame/util/{ => old}/normiv/bli_normiv_unb_var1.c (82%) rename frame/util/{ => old}/normiv/bli_normiv_unb_var1.h (87%) rename frame/util/{ => old}/printm/bli_fprintm.c (92%) rename frame/util/{ => old}/printm/bli_fprintm.h (84%) rename frame/util/{ => old}/printm/bli_fprintm_check.c (100%) rename frame/util/{ => old}/printm/bli_fprintm_check.h (100%) rename frame/util/{ => old}/printm/bli_printm.c (78%) rename frame/util/{ => old}/printm/bli_printm.h (86%) rename frame/util/{ => old}/printv/bli_fprintv.c (91%) rename frame/util/{ => old}/printv/bli_fprintv.h (86%) rename frame/util/{ => old}/printv/bli_fprintv_check.c (100%) rename frame/util/{ => old}/printv/bli_fprintv_check.h (100%) rename frame/util/{ => old}/printv/bli_printv.c (81%) rename frame/util/{ => old}/printv/bli_printv.h (88%) rename frame/util/{ => old}/randm/bli_randm.c (100%) rename frame/util/{ => old}/randm/bli_randm.h (100%) rename frame/util/{ => old}/randm/bli_randm_check.c (100%) rename frame/util/{ => old}/randm/bli_randm_check.h (100%) rename frame/util/{ => old}/randm/bli_randm_unb_var1.c (80%) rename frame/util/{ => old}/randm/bli_randm_unb_var1.h (100%) rename frame/util/{ => old}/randv/bli_randv.c (100%) rename frame/util/{ => old}/randv/bli_randv.h (100%) rename frame/util/{ => old}/randv/bli_randv_check.c (100%) rename frame/util/{ => old}/randv/bli_randv_check.h (100%) rename frame/util/{ => old}/randv/bli_randv_unb_var1.c (100%) rename frame/util/{ => 
old}/randv/bli_randv_unb_var1.h (99%) rename frame/util/{ => old}/sumsqv/bli_sumsqv.c (100%) rename frame/util/{ => old}/sumsqv/bli_sumsqv.h (100%) rename frame/util/{ => old}/sumsqv/bli_sumsqv_check.c (100%) rename frame/util/{ => old}/sumsqv/bli_sumsqv_check.h (100%) rename frame/util/{ => old}/sumsqv/bli_sumsqv_unb_var1.c (81%) rename frame/util/{ => old}/sumsqv/bli_sumsqv_unb_var1.h (86%) rename kernels/arm/{neon => }/3/bli_gemm_opt_4x4.c (85%) rename kernels/armv8a/{neon => }/3/bli_gemm_opt_4x4.c (100%) delete mode 100644 kernels/bgq/3/bli_gemm_8x8.h rename kernels/bgq/3/{bli_gemm_8x8.c => bli_gemm_int_8x8.c} (94%) rename kernels/c99/3/{bli_gemm_ref_4x4.c => bli_gemm_c99_4x4.c} (92%) rename kernels/c99/3/{bli_gemmtrsm_u_ref_4x4.c => bli_gemmtrsm_l_c99_4x4.c} (65%) rename kernels/c99/3/{bli_gemmtrsm_l_ref_4x4.c => bli_gemmtrsm_u_c99_4x4.c} (65%) rename kernels/c99/3/{bli_trsm_l_ref_4x4.c => bli_trsm_l_c99_4x4.c} (93%) rename kernels/c99/3/{bli_trsm_u_ref_4x4.c => bli_trsm_u_c99_4x4.c} (93%) rename kernels/{ => old}/x86/1m/bli_packm_2xk.c (100%) rename kernels/{ => old}/x86/1m/bli_packm_2xk.h (100%) rename kernels/{ => old}/x86/1m/bli_packm_4xk.c (100%) rename kernels/{ => old}/x86/1m/bli_packm_4xk.h (100%) rename kernels/{ => old}/x86/3/bli_gemm_opt_d2x4.c (100%) rename kernels/{ => old}/x86/3/bli_gemm_opt_d4x2.c (100%) rename kernels/{ => old}/x86/3/bli_gemmtrsm_l_opt_d4x2.c (100%) rename kernels/{ => old}/x86/3/bli_gemmtrsm_u_opt_d4x2.c (100%) rename kernels/{ => old}/x86/3/bli_trsm_l_opt_d4x2.c (100%) delete mode 100644 kernels/x86/3/bli_gemm_opt_d2x4.h delete mode 100644 kernels/x86/3/bli_gemm_opt_d4x2.h delete mode 100644 kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.h delete mode 100644 kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.h delete mode 100644 kernels/x86/3/bli_trsm_l_opt_d4x2.h rename kernels/x86_64/bulldozer/3/{bli_gemm_4x6_FMA4.c => bli_gemm_asm_d4x6_fma4.c} (98%) rename kernels/x86_64/{avx2 => haswell}/3/bli_gemm_asm_d12x4.c (97%) rename kernels/x86_64/{avx2 
=> haswell}/3/bli_gemm_asm_d8x6.c (97%)
 rename kernels/x86_64/{core2-sse3/1/bli_axpyv_opt_var1.c => penryn/1/bli_axpyv_int_var1.c} (93%)
 rename kernels/x86_64/{core2-sse3/1/bli_dotv_opt_var1.c => penryn/1/bli_dotv_int_var1.c} (91%)
 rename kernels/x86_64/{core2-sse3 => penryn}/1f/bli_axpy2v_opt_var1.c (93%)
 rename kernels/x86_64/{core2-sse3 => penryn}/1f/bli_axpyf_opt_var1.c (92%)
 rename kernels/x86_64/{core2-sse3 => penryn}/1f/bli_dotaxpyv_opt_var1.c (89%)
 rename kernels/x86_64/{core2-sse3 => penryn}/1f/bli_dotxaxpyf_opt_var1.c (92%)
 rename kernels/x86_64/{core2-sse3 => penryn}/1f/bli_dotxf_opt_var1.c (93%)
 rename kernels/x86_64/{core2-sse3/1f => penryn/1f/old}/bli_axpyf_opt_var1.c.alt (100%)
 rename kernels/x86_64/{core2-sse3/1f => penryn/1f/old}/bli_dotxf_opt_var1.c.alt (100%)
 rename kernels/x86_64/{core2-sse3/3/bli_gemm_opt_d4x4.c => penryn/3/bli_gemm_asm_d4x4.c} (97%)
 rename kernels/x86_64/{core2-sse3/3/bli_gemmtrsm_l_opt_d4x4.c => penryn/3/bli_gemmtrsm_l_asm_d4x4.c} (89%)
 rename kernels/x86_64/{core2-sse3/3/bli_gemmtrsm_u_opt_d4x4.c => penryn/3/bli_gemmtrsm_u_asm_d4x4.c} (88%)
 rename kernels/x86_64/{core2-sse3/3/bli_trsm_l_opt_d4x4.c => penryn/3/bli_trsm_l_asm_d4x4.c} (86%)
 rename kernels/x86_64/{core2-sse3/3/bli_trsm_u_opt_d4x4.c => penryn/3/bli_trsm_u_asm_d4x4.c} (86%)
 rename kernels/x86_64/piledriver/3/{bli_gemm_new_d8x3.c => bli_gemm_asm_d8x3.c} (98%)
 rename kernels/x86_64/{avx => sandybridge}/3/bli_gemm_asm_d8x4.c (99%)
 rename kernels/x86_64/{avx => sandybridge}/3/bli_gemm_int_d8x4.c (89%)

diff --git a/config/bgq/bli_kernel.h b/config/bgq/bli_kernel.h
index d2c9fe07a..040736d1a 100644
--- a/config/bgq/bli_kernel.h
+++ b/config/bgq/bli_kernel.h
@@ -144,25 +144,25 @@

 // -- Default fusing factors for level-1f operations --

-#define BLIS_L1F_FUSE_FAC_S 8
-#define BLIS_L1F_FUSE_FAC_D 8
-#define BLIS_L1F_FUSE_FAC_C 4
-#define BLIS_L1F_FUSE_FAC_Z 2
+#define BLIS_DEFAULT_1F_S 8
+#define BLIS_DEFAULT_1F_D 8
+#define BLIS_DEFAULT_1F_C 4
+#define BLIS_DEFAULT_1F_Z 2
-#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z

-#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z

-#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
+#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
+#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
+#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z

@@ -173,8 +173,8 @@

 #include "bli_gemm_8x8.h"

-#define BLIS_DGEMM_UKERNEL bli_dgemm_8x8
-#define BLIS_ZGEMM_UKERNEL bli_zgemm_8x8
+#define BLIS_DGEMM_UKERNEL bli_dgemm_int_8x8
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_int_8x8

 // -- trsm-related --

diff --git a/config/bulldozer/bli_kernel.h b/config/bulldozer/bli_kernel.h
index 99480ee2b..b750b6da2 100644
--- a/config/bulldozer/bli_kernel.h
+++ b/config/bulldozer/bli_kernel.h
@@ -51,87 +51,6 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-// #define BLIS_DEFAULT_MC_S 128
-// #define BLIS_DEFAULT_KC_S 384
-// #define BLIS_DEFAULT_NC_S 4096
-
-#define BLIS_DEFAULT_MC_D 1080
-#define BLIS_DEFAULT_KC_D 120
-#define BLIS_DEFAULT_NC_D 8400
-
-// #define BLIS_DEFAULT_MC_C 128
-// #define BLIS_DEFAULT_KC_C 256
-// #define BLIS_DEFAULT_NC_C 4096
-//
-// #define BLIS_DEFAULT_MC_Z 64
-// #define BLIS_DEFAULT_KC_Z 256
-// #define BLIS_DEFAULT_NC_Z 2048
-
-// -- Register blocksizes --
-
-// #define BLIS_DEFAULT_MR_S 8
-// #define BLIS_DEFAULT_NR_S 8
-
-#define BLIS_DEFAULT_MR_D 4
-#define BLIS_DEFAULT_NR_D 6
-
- // #define BLIS_DEFAULT_MR_C 8
- // #define BLIS_DEFAULT_NR_C 4
- //
- // #define BLIS_DEFAULT_MR_Z 8
- // #define BLIS_DEFAULT_NR_Z 4
-
-// NOTE: If the micro-kernel, which is typically unrolled to a factor
-// of f, handles leftover edge cases (ie: when k % f > 0) then these
-// register blocksizes in the k dimension can be defined to 1.
-
-//#define BLIS_DEFAULT_KR_S 1
-//#define BLIS_DEFAULT_KR_D 1
-//#define BLIS_DEFAULT_KR_C 1
-//#define BLIS_DEFAULT_KR_Z 1
-
-// -- Maximum cache blocksizes (for optimizing edge cases) --
-
-// NOTE: These cache blocksize "extensions" have the same constraints as
-// the corresponding default blocksizes above. When these values are
-// larger than the default blocksizes, blocksizes used at edge cases are
-// enlarged if such an extension would encompass the remaining portion of
-// the matrix dimension.
-
-//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
-//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
-//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
-
-//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
-//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
-//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
-
-//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
-//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
-//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
-
-//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
-//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
-//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
-
-// -- Packing register blocksize (for packed micro-panels) --
-
-// NOTE: These register blocksize "extensions" determine whether the
-// leading dimensions used within the packed micro-panels are equal to
-// or greater than their corresponding register blocksizes above.
-
-//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
-//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
-
-//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
-//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
-
-//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
-//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
-
-//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
-//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
-

@@ -149,23 +68,28 @@

 // -- gemm --

-#define BLIS_SGEMM_UKERNEL bli_sgemm_8x8_FMA4
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x8_fma4
 #define BLIS_DEFAULT_MC_S 128
 #define BLIS_DEFAULT_KC_S 384
 #define BLIS_DEFAULT_NC_S 4096
 #define BLIS_DEFAULT_MR_S 8
 #define BLIS_DEFAULT_NR_S 8

-#define BLIS_DGEMM_UKERNEL bli_dgemm_4x6_FMA4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x6_fma4
+#define BLIS_DEFAULT_MC_D 1080
+#define BLIS_DEFAULT_KC_D 120
+#define BLIS_DEFAULT_NC_D 8400
+#define BLIS_DEFAULT_MR_D 4
+#define BLIS_DEFAULT_NR_D 6

-#define BLIS_CGEMM_UKERNEL bli_cgemm_8x4_FMA4
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4_fma4
 #define BLIS_DEFAULT_MC_C 96
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 4096
 #define BLIS_DEFAULT_MR_C 8
 #define BLIS_DEFAULT_NR_C 4

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_4x4_FMA4
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4_fma4
 #define BLIS_DEFAULT_MC_Z 64
 #define BLIS_DEFAULT_KC_Z 192
 #define BLIS_DEFAULT_NC_Z 4096
diff --git a/config/carrizo/bli_kernel.h b/config/carrizo/bli_kernel.h
index cdd1301b8..241f08a81 100644
--- a/config/carrizo/bli_kernel.h
+++ b/config/carrizo/bli_kernel.h
@@ -51,28 +51,28 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
 #define BLIS_DEFAULT_MC_S 528
 #define BLIS_DEFAULT_KC_S 256
 #define BLIS_DEFAULT_NC_S 8400
 #define BLIS_DEFAULT_MR_S 16
 #define BLIS_DEFAULT_NR_S 3

-#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
 #define BLIS_DEFAULT_MC_D 264
 #define BLIS_DEFAULT_KC_D 256
 #define BLIS_DEFAULT_NC_D 8400
 #define BLIS_DEFAULT_MR_D 8
 #define BLIS_DEFAULT_NR_D 3

-#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
 #define BLIS_DEFAULT_MC_C 264
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 8400
 #define BLIS_DEFAULT_MR_C 4
 #define BLIS_DEFAULT_NR_C 2

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
 #define BLIS_DEFAULT_MC_Z 100
 #define BLIS_DEFAULT_KC_Z 320
 #define BLIS_DEFAULT_NC_Z 8400
diff --git a/config/cortex-a15/kernels b/config/cortex-a15/kernels
index 7a25007de..8528d8234 120000
--- a/config/cortex-a15/kernels
+++ b/config/cortex-a15/kernels
@@ -1 +1 @@
-../../kernels/arm/neon
\ No newline at end of file
+../../kernels/arm
\ No newline at end of file
diff --git a/config/cortex-a9/kernels b/config/cortex-a9/kernels
index 7a25007de..8528d8234 120000
--- a/config/cortex-a9/kernels
+++ b/config/cortex-a9/kernels
@@ -1 +1 @@
-../../kernels/arm/neon
\ No newline at end of file
+../../kernels/arm
\ No newline at end of file
diff --git a/config/dunnington/bli_kernel.h b/config/dunnington/bli_kernel.h
index 8dba1f7ac..f256bcf55 100644
--- a/config/dunnington/bli_kernel.h
+++ b/config/dunnington/bli_kernel.h
@@ -67,26 +67,6 @@
 //#define BLIS_DEFAULT_KC_Z 384
 //#define BLIS_DEFAULT_NC_Z 4096

-// NOTE: If 4m blocksizes are not defined here, they will be determined
-// from the corresponding real domain blocksizes.
-#define BLIS_DEFAULT_4M_MC_C 384
-#define BLIS_DEFAULT_4M_KC_C 512
-#define BLIS_DEFAULT_4M_NC_C 4096
-
-#define BLIS_DEFAULT_4M_MC_Z 192
-#define BLIS_DEFAULT_4M_KC_Z 256
-#define BLIS_DEFAULT_4M_NC_Z 4096
-
-// NOTE: If 3m blocksizes are not defined here, they will be determined
-// from the corresponding real domain blocksizes.
-#define BLIS_DEFAULT_3M_MC_C 384
-#define BLIS_DEFAULT_3M_KC_C 512
-#define BLIS_DEFAULT_3M_NC_C 4096
-
-#define BLIS_DEFAULT_3M_MC_Z 192
-#define BLIS_DEFAULT_3M_KC_Z 256
-#define BLIS_DEFAULT_3M_NC_Z 4096
-
 // -- Register blocksizes --

 #define BLIS_DEFAULT_MR_S 8
@@ -101,56 +81,6 @@
 #define BLIS_DEFAULT_MR_Z 2
 #define BLIS_DEFAULT_NR_Z 2

-// NOTE: If the micro-kernel, which is typically unrolled to a factor
-// of f, handles leftover edge cases (ie: when k % f > 0) then these
-// register blocksizes in the k dimension can be defined to 1.
-
-//#define BLIS_DEFAULT_KR_S 1
-//#define BLIS_DEFAULT_KR_D 1
-//#define BLIS_DEFAULT_KR_C 1
-//#define BLIS_DEFAULT_KR_Z 1
-
-// -- Maximum cache blocksizes (for optimizing edge cases) --
-
-// NOTE: These cache blocksize "extensions" have the same constraints as
-// the corresponding default blocksizes above. When these values are
-// larger than the default blocksizes, blocksizes used at edge cases are
-// enlarged if such an extension would encompass the remaining portion of
-// the matrix dimension.
-
-//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
-//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
-//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
-
-//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
-//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
-//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
-
-//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
-//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
-//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
-
-//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
-//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
-//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
-
-// -- Packing register blocksize (for packed micro-panels) --
-
-// NOTE: These register blocksize "extensions" determine whether the
-// leading dimensions used within the packed micro-panels are equal to
-// or greater than their corresponding register blocksizes above.
-
-//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
-//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
-
-//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
-//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
-
-//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
-//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
-
-//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
-//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)

@@ -169,13 +99,13 @@

 // -- gemm --

-#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_8x4
-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x4

 // -- trsm-related --

-#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_opt_4x4
-#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_opt_4x4
+#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_asm_4x4
+#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_asm_4x4

@@ -184,23 +114,23 @@

 // -- axpy2v --

-#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_opt_var1
+#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_int_var1

 // -- dotaxpyv --

-#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_opt_var1
+#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_int_var1

 // -- axpyf --

-#define BLIS_DAXPYF_KERNEL bli_daxpyf_opt_var1
+#define BLIS_DAXPYF_KERNEL bli_daxpyf_int_var1

 // -- dotxf --

-#define BLIS_DDOTXF_KERNEL bli_ddotxf_opt_var1
+#define BLIS_DDOTXF_KERNEL bli_ddotxf_int_var1

 // -- dotxaxpyf --

-#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_opt_var1
+#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_int_var1

diff --git a/config/dunnington/kernels b/config/dunnington/kernels
index 462f5cab3..0ce5cd870 120000
--- a/config/dunnington/kernels
+++ b/config/dunnington/kernels
@@ -1 +1 @@
-../../kernels/x86_64/core2-sse3
\ No newline at end of file
+../../kernels/x86_64/penryn
\ No newline at end of file
diff --git a/config/haswell/bli_kernel.h b/config/haswell/bli_kernel.h
index 9df503005..ba0440e64 100644
--- a/config/haswell/bli_kernel.h
+++ b/config/haswell/bli_kernel.h
@@ -89,21 +89,6 @@
 #endif

-/*
-#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4
-#define BLIS_DEFAULT_MC_C 96
-#define BLIS_DEFAULT_KC_C 256
-#define BLIS_DEFAULT_NC_C 4096
-#define BLIS_DEFAULT_MR_C 8
-#define BLIS_DEFAULT_NR_C 4
-
-#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4
-#define BLIS_DEFAULT_MC_Z 64
-#define BLIS_DEFAULT_KC_Z 192
-#define BLIS_DEFAULT_NC_Z 4096
-#define BLIS_DEFAULT_MR_Z 4
-#define BLIS_DEFAULT_NR_Z 4
-*/

diff --git a/config/haswell/kernels b/config/haswell/kernels
index 58026848b..53e3cdac8 120000
--- a/config/haswell/kernels
+++ b/config/haswell/kernels
@@ -1 +1 @@
-../../kernels/x86_64/avx2
\ No newline at end of file
+../../kernels/x86_64/haswell
\ No newline at end of file
diff --git a/config/loongson3a/bli_kernel.h b/config/loongson3a/bli_kernel.h
index cf1005fcc..b21a4062f 100644
--- a/config/loongson3a/bli_kernel.h
+++ b/config/loongson3a/bli_kernel.h
@@ -149,7 +149,7 @@

 // -- gemm --

-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_d4x4
+#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4

 // -- trsm-related --

diff --git a/config/mic/bli_config.h b/config/mic/bli_config.h
index 36b14cf4c..3e18aa7b1 100644
--- a/config/mic/bli_config.h
+++ b/config/mic/bli_config.h
@@ -42,6 +42,9 @@

 #define BLIS_SIMD_ALIGN_SIZE 32

+#define BLIS_SIMD_SIZE 64
+#define BLIS_SIMD_NUM_REGISTERS 32
+

 #endif

diff --git a/config/mic/bli_kernel.h b/config/mic/bli_kernel.h
index 880e97d35..8667bb678 100644
--- a/config/mic/bli_kernel.h
+++ b/config/mic/bli_kernel.h
@@ -153,8 +153,8 @@

 #define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS

-#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_30x8
-#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_30x16
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_30x16
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_30x8

 // -- trsm-related --

diff --git a/config/piledriver/bli_kernel.h b/config/piledriver/bli_kernel.h
index 6d41b514a..64ccf3c23 100644
--- a/config/piledriver/bli_kernel.h
+++ b/config/piledriver/bli_kernel.h
@@ -51,7 +51,7 @@
 // (b) MR (for zero-padding purposes when MR and NR are "swapped")
 //

-#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
+#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
 #define BLIS_DEFAULT_MC_S 2016
 #define BLIS_DEFAULT_KC_S 128
 #define BLIS_DEFAULT_NC_S 8400
@@ -59,7 +59,7 @@
 #define BLIS_DEFAULT_NR_S 3
 //#define BLIS_UPANEL_B_ALIGN_SIZE_S 4096

-#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
+#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
 //#define BLIS_DEFAULT_MC_D 768
 //#define BLIS_DEFAULT_KC_D 168
 #define BLIS_DEFAULT_MC_D 1008
@@ -69,14 +69,14 @@
 #define BLIS_DEFAULT_NR_D 3
 //#define BLIS_UPANEL_B_ALIGN_SIZE_D 4096

-#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
+#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
 #define BLIS_DEFAULT_MC_C 512
 #define BLIS_DEFAULT_KC_C 256
 #define BLIS_DEFAULT_NC_C 8400
 #define BLIS_DEFAULT_MR_C 4
 #define BLIS_DEFAULT_NR_C 2

-#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
+#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
 #define BLIS_DEFAULT_MC_Z 400
 #define BLIS_DEFAULT_KC_Z 160
 #define BLIS_DEFAULT_NC_Z 8400
diff --git a/config/sandybridge/kernels b/config/sandybridge/kernels
index 4fe1379f3..0132dbc87 120000
--- a/config/sandybridge/kernels
+++ b/config/sandybridge/kernels
@@ -1 +1 @@
-../../kernels/x86_64/avx
\ No newline at end of file
+../../kernels/x86_64/sandybridge
\ No newline at end of file
diff --git a/config/template/bli_kernel.h b/config/template/bli_kernel.h
index b62626306..999abf6fc 100644
--- a/config/template/bli_kernel.h
+++ b/config/template/bli_kernel.h
@@ -177,17 +177,17 @@
 // be packed here, but this tends to be much too expensive in practice to
 // actually employ.)
-//#define BLIS_DEFAULT_L2_MC_S 1000
-//#define BLIS_DEFAULT_L2_NC_S 1000
+//#define BLIS_DEFAULT_M2_S 1000
+//#define BLIS_DEFAULT_N2_S 1000

-//#define BLIS_DEFAULT_L2_MC_D 1000
-//#define BLIS_DEFAULT_L2_NC_D 1000
+//#define BLIS_DEFAULT_M2_D 1000
+//#define BLIS_DEFAULT_N2_D 1000

-//#define BLIS_DEFAULT_L2_MC_C 1000
-//#define BLIS_DEFAULT_L2_NC_C 1000
+//#define BLIS_DEFAULT_M2_C 1000
+//#define BLIS_DEFAULT_N2_C 1000

-//#define BLIS_DEFAULT_L2_MC_Z 1000
-//#define BLIS_DEFAULT_L2_NC_Z 1000
+//#define BLIS_DEFAULT_M2_Z 1000
+//#define BLIS_DEFAULT_N2_Z 1000

@@ -196,25 +196,25 @@

 // -- Default fusing factors for level-1f operations --

-//#define BLIS_L1F_FUSE_FAC_S 8
-//#define BLIS_L1F_FUSE_FAC_D 4
-//#define BLIS_L1F_FUSE_FAC_C 4
-//#define BLIS_L1F_FUSE_FAC_Z 2
+//#define BLIS_DEFAULT_1F_S 8
+//#define BLIS_DEFAULT_1F_D 4
+//#define BLIS_DEFAULT_1F_C 4
+//#define BLIS_DEFAULT_1F_Z 2

-//#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z

-//#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z

-//#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
-//#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
-//#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
-//#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
+//#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
+//#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
+//#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
+//#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z

diff --git a/config/template/kernels/1/bli_axpyv_opt_var1.c b/config/template/kernels/1/bli_axpyv_opt_var1.c
index bb320d13a..1480d54ec 100644
--- a/config/template/kernels/1/bli_axpyv_opt_var1.c
+++ b/config/template/kernels/1/bli_axpyv_opt_var1.c
@@ -36,59 +36,87 @@

-void bli_saxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          float* restrict alpha,
-                          float* restrict x, inc_t incx,
-                          float* restrict y, inc_t incy )
+void bli_saxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       float* alpha,
+       float* x, inc_t incx,
+       float* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_SAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_SAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }

-void bli_daxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          double* restrict alpha,
-                          double* restrict x, inc_t incx,
-                          double* restrict y, inc_t incy )
+void bli_daxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       double* alpha,
+       double* x, inc_t incx,
+       double* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_DAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_DAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }

-void bli_caxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          scomplex* restrict alpha,
-                          scomplex* restrict x, inc_t incx,
-                          scomplex* restrict y, inc_t incy )
+void bli_caxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       scomplex* alpha,
+       scomplex* x, inc_t incx,
+       scomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
     /* Just call the reference implementation. */
-    BLIS_CAXPYV_KERNEL_REF( conjx,
-                            n,
-                            alpha,
-                            x, incx,
-                            y, incy );
+    BLIS_CAXPYV_KERNEL_REF
+    (
+      conjx,
+      n,
+      alpha,
+      x, incx,
+      y, incy,
+      cntx
+    );
 }

-void bli_zaxpyv_opt_var1( conj_t conjx,
-                          dim_t n,
-                          dcomplex* restrict alpha,
-                          dcomplex* restrict x, inc_t incx,
-                          dcomplex* restrict y, inc_t incy )
+void bli_zaxpyv_opt_var1
+     (
+       conj_t conjx,
+       dim_t n,
+       dcomplex* alpha,
+       dcomplex* x, inc_t incx,
+       dcomplex* y, inc_t incy,
+       cntx_t* cntx
+     )
 {
 /*
   Template axpyv kernel implementation
@@ -193,11 +221,15 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
     // Call the reference implementation if needed.
     if ( use_ref == TRUE )
     {
-        BLIS_ZAXPYV_KERNEL_REF( conjx,
-                                n,
-                                alpha,
-                                x, incx,
-                                y, incy );
+        BLIS_ZAXPYV_KERNEL_REF
+        (
+          conjx,
+          n,
+          alpha,
+          x, incx,
+          y, incy,
+          cntx
+        );
         return;
     }

@@ -219,7 +251,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
         // Compute front edge cases if x and y were unaligned.
         for ( i = 0; i < n_pre; ++i )
         {
-            bli_zzzaxpys( *alpha, *xp, *yp );
+            bli_zaxpys( *alpha, *xp, *yp );

             xp += 1; yp += 1;
         }
@@ -228,7 +260,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
         // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
         for ( i = 0; i < n_iter; ++i )
         {
-            bli_zzzaxpys( *alpha, *xp, *yp );
+            bli_zaxpys( *alpha, *xp, *yp );

             xp += n_elem_per_iter;
             yp += n_elem_per_iter;
@@ -237,7 +269,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
         // Compute tail edge cases, if applicable.
         for ( i = 0; i < n_left; ++i )
         {
-            bli_zzzaxpys( *alpha, *xp, *yp );
+            bli_zaxpys( *alpha, *xp, *yp );

             xp += 1; yp += 1;
         }
@@ -247,7 +279,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
         // Compute front edge cases if x and y were unaligned.
         for ( i = 0; i < n_pre; ++i )
         {
-            bli_zzzaxpyjs( *alpha, *xp, *yp );
+            bli_zaxpyjs( *alpha, *xp, *yp );

             xp += 1; yp += 1;
         }
@@ -256,7 +288,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
         // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i ) { - bli_zzzaxpyjs( *alpha, *xp, *yp ); + bli_zaxpyjs( *alpha, *xp, *yp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -265,7 +297,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx, // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzaxpyjs( *alpha, *xp, *yp ); + bli_zaxpyjs( *alpha, *xp, *yp ); xp += 1; yp += 1; } diff --git a/config/template/kernels/1/bli_dotv_opt_var1.c b/config/template/kernels/1/bli_dotv_opt_var1.c index 5e61aff63..ded49839b 100644 --- a/config/template/kernels/1/bli_dotv_opt_var1.c +++ b/config/template/kernels/1/bli_dotv_opt_var1.c @@ -36,66 +36,94 @@ -void bli_sdotv_opt_var1( conj_t conjx, - conj_t conjy, - dim_t n, - float* restrict x, inc_t incx, - float* restrict y, inc_t incy, - float* restrict rho ) +void bli_sdotv_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + float* x, inc_t incx, + float* y, inc_t incy, + float* rho, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_SDOTV_KERNEL_REF( conjx, - conjy, - n, - x, incx, - y, incy, - rho ); + BLIS_SDOTV_KERNEL_REF + ( + conjx, + conjy, + n, + x, incx, + y, incy, + rho, + cntx + ); } -void bli_ddotv_opt_var1( conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict rho ) +void bli_ddotv_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + double* x, inc_t incx, + double* y, inc_t incy, + double* rho, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_DDOTV_KERNEL_REF( conjx, - conjy, - n, - x, incx, - y, incy, - rho ); + BLIS_DDOTV_KERNEL_REF + ( + conjx, + conjy, + n, + x, incx, + y, incy, + rho, + cntx + ); } -void bli_cdotv_opt_var1( conj_t conjx, - conj_t conjy, - dim_t n, - scomplex* restrict x, inc_t incx, - scomplex* restrict y, inc_t incy, - scomplex* restrict rho ) +void bli_cdotv_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + scomplex* x, inc_t incx, + scomplex* y, inc_t incy, + scomplex* rho, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_CDOTV_KERNEL_REF( conjx, - conjy, - n, - x, incx, - y, incy, - rho ); + BLIS_CDOTV_KERNEL_REF + ( + conjx, + conjy, + n, + x, incx, + y, incy, + rho, + cntx + ); } -void bli_zdotv_opt_var1( conj_t conjx, - conj_t conjy, - dim_t n, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict y, inc_t incy, - dcomplex* restrict rho ) +void bli_zdotv_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + dcomplex* x, inc_t incx, + dcomplex* y, inc_t incy, + dcomplex* rho, + cntx_t* cntx + ) { /* Template dotv kernel implementation @@ -210,12 +238,16 @@ void bli_zdotv_opt_var1( conj_t conjx, // Call the reference implementation if needed. if ( use_ref == TRUE ) { - BLIS_ZDOTV_KERNEL_REF( conjx, - conjy, - n, - x, incx, - y, incy, - rho ); + BLIS_ZDOTV_KERNEL_REF + ( + conjx, + conjy, + n, + x, incx, + y, incy, + rho, + cntx + ); return; } @@ -250,7 +282,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // Compute front edge cases if x and y were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); + bli_zdots( *xp, *yp, dotxy ); xp += 1; yp += 1; } @@ -259,7 +291,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. 
for ( i = 0; i < n_iter; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); + bli_zdots( *xp, *yp, dotxy ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -268,7 +300,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); + bli_zdots( *xp, *yp, dotxy ); xp += 1; yp += 1; } @@ -278,7 +310,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // Compute front edge cases if x and y were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); + bli_zdotjs( *xp, *yp, dotxy ); xp += 1; yp += 1; } @@ -287,7 +319,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); + bli_zdotjs( *xp, *yp, dotxy ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -296,7 +328,7 @@ void bli_zdotv_opt_var1( conj_t conjx, // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); + bli_zdotjs( *xp, *yp, dotxy ); xp += 1; yp += 1; } @@ -307,6 +339,6 @@ void bli_zdotv_opt_var1( conj_t conjx, if ( bli_is_conj( conjy ) ) bli_zconjs( dotxy ); - bli_zzcopys( dotxy, *rho ); + bli_zcopys( dotxy, *rho ); } diff --git a/config/template/kernels/1f/bli_axpy2v_opt_var1.c b/config/template/kernels/1f/bli_axpy2v_opt_var1.c index cff49de8b..5448fbd83 100644 --- a/config/template/kernels/1f/bli_axpy2v_opt_var1.c +++ b/config/template/kernels/1f/bli_axpy2v_opt_var1.c @@ -36,88 +36,108 @@ -void bli_saxpy2v_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - float* restrict alpha1, - float* restrict alpha2, - float* restrict x, inc_t incx, - float* restrict y, inc_t incy, - float* restrict z, inc_t incz - ) +void bli_saxpy2v_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + float* alpha1, + float* alpha2, + float* x, inc_t incx, + float* y, inc_t incy, + float* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference 
implementation. */ - BLIS_SAXPY2V_KERNEL_REF( conjx, - conjy, - n, - alpha1, - alpha2, - x, incx, - y, incy, - z, incz ); + BLIS_SAXPY2V_KERNEL_REF + ( + conjx, + conjy, + n, + alpha1, + alpha2, + x, incx, + y, incy, + z, incz, + cntx + ); } -void bli_daxpy2v_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict alpha1, - double* restrict alpha2, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict z, inc_t incz - ) +void bli_daxpy2v_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + double* alpha1, + double* alpha2, + double* x, inc_t incx, + double* y, inc_t incy, + double* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_DAXPY2V_KERNEL_REF( conjx, - conjy, - n, - alpha1, - alpha2, - x, incx, - y, incy, - z, incz ); + BLIS_DAXPY2V_KERNEL_REF + ( + conjx, + conjy, + n, + alpha1, + alpha2, + x, incx, + y, incy, + z, incz, + cntx + ); } -void bli_caxpy2v_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - scomplex* restrict alpha1, - scomplex* restrict alpha2, - scomplex* restrict x, inc_t incx, - scomplex* restrict y, inc_t incy, - scomplex* restrict z, inc_t incz - ) +void bli_caxpy2v_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + scomplex* alpha1, + scomplex* alpha2, + scomplex* x, inc_t incx, + scomplex* y, inc_t incy, + scomplex* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CAXPY2V_KERNEL_REF( conjx, - conjy, - n, - alpha1, - alpha2, - x, incx, - y, incy, - z, incz ); + BLIS_CAXPY2V_KERNEL_REF + ( + conjx, + conjy, + n, + alpha1, + alpha2, + x, incx, + y, incy, + z, incz, + cntx + ); } -void bli_zaxpy2v_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - dcomplex* restrict alpha1, - dcomplex* restrict alpha2, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict y, inc_t incy, - dcomplex* restrict z, inc_t incz - ) +void bli_zaxpy2v_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + dcomplex* alpha1, + dcomplex* alpha2, + dcomplex* x, inc_t incx, + dcomplex* y, inc_t incy, + dcomplex* z, inc_t incz, + cntx_t* cntx + ) { /* Template axpy2v kernel implementation @@ -229,14 +249,18 @@ void bli_zaxpy2v_opt_var1( // Call the reference implementation if needed. if ( use_ref == TRUE ) { - BLIS_ZAXPY2V_KERNEL_REF( conjx, - conjy, - n, - alpha1, - alpha2, - x, incx, - y, incy, - z, incz ); + BLIS_ZAXPY2V_KERNEL_REF + ( + conjx, + conjy, + n, + alpha1, + alpha2, + x, incx, + y, incy, + z, incz, + cntx + ); return; } @@ -259,8 +283,8 @@ void bli_zaxpy2v_opt_var1( // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -272,8 +296,8 @@ void bli_zaxpy2v_opt_var1( // to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -283,8 +307,8 @@ void bli_zaxpy2v_opt_var1( // Compute tail edge cases, if applicable. 
for ( i = 0; i < n_left; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -294,8 +318,8 @@ void bli_zaxpy2v_opt_var1( // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -307,8 +331,8 @@ void bli_zaxpy2v_opt_var1( // to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -318,8 +342,8 @@ void bli_zaxpy2v_opt_var1( // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzaxpys( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpys( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -329,8 +353,8 @@ void bli_zaxpy2v_opt_var1( // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -342,8 +366,8 @@ void bli_zaxpy2v_opt_var1( // to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -353,8 +377,8 @@ void bli_zaxpy2v_opt_var1( // Compute tail edge cases, if applicable. 
for ( i = 0; i < n_left; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpys( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpys( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -364,8 +388,8 @@ void bli_zaxpy2v_opt_var1( // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -377,8 +401,8 @@ void bli_zaxpy2v_opt_var1( // to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -388,8 +412,8 @@ void bli_zaxpy2v_opt_var1( // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzaxpyjs( *alpha1, *xp, *zp ); - bli_zzzaxpyjs( *alpha2, *yp, *zp ); + bli_zaxpyjs( *alpha1, *xp, *zp ); + bli_zaxpyjs( *alpha2, *yp, *zp ); xp += 1; yp += 1; zp += 1; } diff --git a/config/template/kernels/1f/bli_axpyf_opt_var1.c b/config/template/kernels/1f/bli_axpyf_opt_var1.c index 4100061bd..7a987d2e2 100644 --- a/config/template/kernels/1f/bli_axpyf_opt_var1.c +++ b/config/template/kernels/1f/bli_axpyf_opt_var1.c @@ -36,87 +36,107 @@ -void bli_saxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - float* restrict alpha, - float* restrict a, inc_t inca, inc_t lda, - float* restrict x, inc_t incx, - float* restrict y, inc_t incy - ) +void bli_saxpyf_opt_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + float* alpha, + float* a, inc_t inca, inc_t lda, + float* x, inc_t incx, + float* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_SAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); + BLIS_SAXPYF_KERNEL_REF + ( + conja, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + y, incy, + cntx + ); } -void bli_daxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy - ) +void bli_daxpyf_opt_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* x, inc_t incx, + double* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_DAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); + BLIS_DAXPYF_KERNEL_REF + ( + conja, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + y, incy, + cntx + ); } -void bli_caxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - scomplex* restrict alpha, - scomplex* restrict a, inc_t inca, inc_t lda, - scomplex* restrict x, inc_t incx, - scomplex* restrict y, inc_t incy - ) +void bli_caxpyf_opt_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + scomplex* alpha, + scomplex* a, inc_t inca, inc_t lda, + scomplex* x, inc_t incx, + scomplex* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); + BLIS_CAXPYF_KERNEL_REF + ( + conja, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + y, incy, + cntx + ); } -void bli_zaxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - dcomplex* restrict alpha, - dcomplex* restrict a, inc_t inca, inc_t lda, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict y, inc_t incy - ) +void bli_zaxpyf_opt_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + dcomplex* alpha, + dcomplex* a, inc_t inca, inc_t lda, + dcomplex* x, inc_t incx, + dcomplex* y, inc_t incy, + cntx_t* cntx + ) { /* Template axpyf kernel implementation @@ -243,14 +263,18 @@ void bli_zaxpyf_opt_var1( // Call the reference implementation if needed. if ( use_ref == TRUE ) { - BLIS_ZAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); + BLIS_ZAXPYF_KERNEL_REF + ( + conja, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + y, incy, + cntx + ); return; } @@ -274,16 +298,16 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzcopys( *xp[ j ], alpha_x[ j ] ); - bli_zzscals( *alpha, alpha_x[ j ] ); + bli_zcopys( *xp[ j ], alpha_x[ j ] ); + bli_zscals( *alpha, alpha_x[ j ] ); } } else // if ( bli_is_conj( conjx ) ) { for ( j = 0; j < b_n; ++j ) { - bli_zzcopyjs( *xp[ j ], alpha_x[ j ] ); - bli_zzscals( *alpha, alpha_x[ j ] ); + bli_zcopyjs( *xp[ j ], alpha_x[ j ] ); + bli_zscals( *alpha, alpha_x[ j ] ); } } @@ -296,7 +320,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += 1; } @@ -312,7 +336,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += n_elem_per_iter; } @@ -324,7 +348,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; 
++j ) { - bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += 1; } @@ -338,7 +362,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += 1; } @@ -354,7 +378,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += n_elem_per_iter; } @@ -366,7 +390,7 @@ void bli_zaxpyf_opt_var1( { for ( j = 0; j < b_n; ++j ) { - bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); + bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp ); ap[ j ] += 1; } diff --git a/config/template/kernels/1f/bli_dotaxpyv_opt_var1.c b/config/template/kernels/1f/bli_dotaxpyv_opt_var1.c index 7f7c12c62..7080505ac 100644 --- a/config/template/kernels/1f/bli_dotaxpyv_opt_var1.c +++ b/config/template/kernels/1f/bli_dotaxpyv_opt_var1.c @@ -36,87 +36,115 @@ -void bli_sdotaxpyv_opt_var1( conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - float* restrict alpha, - float* restrict x, inc_t incx, - float* restrict y, inc_t incy, - float* restrict rho, - float* restrict z, inc_t incz ) +void bli_sdotaxpyv_opt_var1 + ( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + float* alpha, + float* x, inc_t incx, + float* y, inc_t incy, + float* rho, + float* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_SDOTAXPYV_KERNEL_REF( conjxt, - conjx, - conjy, - n, - alpha, - x, incx, - y, incy, - rho, - z, incz ); + BLIS_SDOTAXPYV_KERNEL_REF + ( + conjxt, + conjx, + conjy, + n, + alpha, + x, incx, + y, incy, + rho, + z, incz, + cntx + ); } -void bli_ddotaxpyv_opt_var1( conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict alpha, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict rho, - double* restrict z, inc_t incz ) +void bli_ddotaxpyv_opt_var1 + ( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + double* alpha, + double* x, inc_t incx, + double* y, inc_t incy, + double* rho, + double* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_DDOTAXPYV_KERNEL_REF( conjxt, - conjx, - conjy, - n, - alpha, - x, incx, - y, incy, - rho, - z, incz ); + BLIS_DDOTAXPYV_KERNEL_REF + ( + conjxt, + conjx, + conjy, + n, + alpha, + x, incx, + y, incy, + rho, + z, incz, + cntx + ); } -void bli_cdotaxpyv_opt_var1( conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - scomplex* restrict alpha, - scomplex* restrict x, inc_t incx, - scomplex* restrict y, inc_t incy, - scomplex* restrict rho, - scomplex* restrict z, inc_t incz ) +void bli_cdotaxpyv_opt_var1 + ( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + scomplex* alpha, + scomplex* x, inc_t incx, + scomplex* y, inc_t incy, + scomplex* rho, + scomplex* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CDOTAXPYV_KERNEL_REF( conjxt, - conjx, - conjy, - n, - alpha, - x, incx, - y, incy, - rho, - z, incz ); + BLIS_CDOTAXPYV_KERNEL_REF + ( + conjxt, + conjx, + conjy, + n, + alpha, + x, incx, + y, incy, + rho, + z, incz, + cntx + ); } -void bli_zdotaxpyv_opt_var1( conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - dcomplex* restrict alpha, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict y, inc_t incy, - dcomplex* restrict rho, - dcomplex* restrict z, inc_t incz ) +void bli_zdotaxpyv_opt_var1 + ( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + dcomplex* alpha, + dcomplex* x, inc_t incx, + dcomplex* y, inc_t incy, + dcomplex* rho, + dcomplex* z, inc_t incz, + cntx_t* cntx + ) { /* Template dotaxpyv kernel implementation @@ -240,15 +268,19 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Call the reference implementation if needed. if ( use_ref == TRUE ) { - BLIS_ZDOTAXPYV_KERNEL_REF( conjxt, - conjx, - conjy, - n, - alpha, - x, incx, - y, incy, - rho, - z, incz ); + BLIS_ZDOTAXPYV_KERNEL_REF + ( + conjxt, + conjx, + conjy, + n, + alpha, + x, incx, + y, incy, + rho, + z, incz, + cntx + ); return; } @@ -285,8 +317,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -298,8 +330,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -309,8 +341,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute tail edge cases, if applicable. 
for ( i = 0; i < n_left; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -320,8 +352,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -333,8 +365,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -344,8 +376,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpys( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpys( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -355,8 +387,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -368,8 +400,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -379,8 +411,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute tail edge cases, if applicable. 
for ( i = 0; i < n_left; ++i ) { - bli_zzzdots( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdots( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -390,8 +422,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute front edge cases if x, y, and z were unaligned. for ( i = 0; i < n_pre; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -403,8 +435,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE. for ( i = 0; i < n_iter; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += n_elem_per_iter; yp += n_elem_per_iter; @@ -414,8 +446,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, // Compute tail edge cases, if applicable. for ( i = 0; i < n_left; ++i ) { - bli_zzzdotjs( *xp, *yp, dotxy ); - bli_zzzaxpyjs( *alpha, *xp, *zp ); + bli_zdotjs( *xp, *yp, dotxy ); + bli_zaxpyjs( *alpha, *xp, *zp ); xp += 1; yp += 1; zp += 1; } @@ -426,6 +458,6 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt, if ( bli_is_conj( conjy ) ) bli_zconjs( dotxy ); - bli_zzcopys( dotxy, *rho ); + bli_zcopys( dotxy, *rho ); } diff --git a/config/template/kernels/1f/bli_dotxaxpyf_opt_var1.c b/config/template/kernels/1f/bli_dotxaxpyf_opt_var1.c index 04b2b5ab4..fd4831fbe 100644 --- a/config/template/kernels/1f/bli_dotxaxpyf_opt_var1.c +++ b/config/template/kernels/1f/bli_dotxaxpyf_opt_var1.c @@ -36,115 +36,143 @@ -void bli_sdotxaxpyf_opt_var1( conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - float* restrict alpha, - float* restrict a, inc_t inca, inc_t lda, - float* restrict w, inc_t incw, - float* restrict x, inc_t incx, - float* restrict beta, - float* restrict y, inc_t incy, - float* restrict z, inc_t incz ) +void 
bli_sdotxaxpyf_opt_var1 + ( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + float* alpha, + float* a, inc_t inca, inc_t lda, + float* w, inc_t incw, + float* x, inc_t incx, + float* beta, + float* y, inc_t incy, + float* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_SDOTXAXPYF_KERNEL_REF( conjat, - conja, - conjw, - conjx, - m, - b_n, - alpha, - a, inca, lda, - w, incw, - x, incx, - beta, - y, incy, - z, incz ); + BLIS_SDOTXAXPYF_KERNEL_REF + ( + conjat, + conja, + conjw, + conjx, + m, + b_n, + alpha, + a, inca, lda, + w, incw, + x, incx, + beta, + y, incy, + z, incz, + cntx + ); } -void bli_ddotxaxpyf_opt_var1( conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict w, inc_t incw, - double* restrict x, inc_t incx, - double* restrict beta, - double* restrict y, inc_t incy, - double* restrict z, inc_t incz ) +void bli_ddotxaxpyf_opt_var1 + ( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* w, inc_t incw, + double* x, inc_t incx, + double* beta, + double* y, inc_t incy, + double* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_DDOTXAXPYF_KERNEL_REF( conjat, - conja, - conjw, - conjx, - m, - b_n, - alpha, - a, inca, lda, - w, incw, - x, incx, - beta, - y, incy, - z, incz ); + BLIS_DDOTXAXPYF_KERNEL_REF + ( + conjat, + conja, + conjw, + conjx, + m, + b_n, + alpha, + a, inca, lda, + w, incw, + x, incx, + beta, + y, incy, + z, incz, + cntx + ); } -void bli_cdotxaxpyf_opt_var1( conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - scomplex* restrict alpha, - scomplex* restrict a, inc_t inca, inc_t lda, - scomplex* restrict w, inc_t incw, - scomplex* restrict x, inc_t incx, - scomplex* restrict beta, - scomplex* restrict y, inc_t incy, - scomplex* restrict z, inc_t incz ) +void bli_cdotxaxpyf_opt_var1 + ( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + scomplex* alpha, + scomplex* a, inc_t inca, inc_t lda, + scomplex* w, inc_t incw, + scomplex* x, inc_t incx, + scomplex* beta, + scomplex* y, inc_t incy, + scomplex* z, inc_t incz, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CDOTXAXPYF_KERNEL_REF( conjat, - conja, - conjw, - conjx, - m, - b_n, - alpha, - a, inca, lda, - w, incw, - x, incx, - beta, - y, incy, - z, incz ); + BLIS_CDOTXAXPYF_KERNEL_REF + ( + conjat, + conja, + conjw, + conjx, + m, + b_n, + alpha, + a, inca, lda, + w, incw, + x, incx, + beta, + y, incy, + z, incz, + cntx + ); } -void bli_zdotxaxpyf_opt_var1( conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - dcomplex* restrict alpha, - dcomplex* restrict a, inc_t inca, inc_t lda, - dcomplex* restrict w, inc_t incw, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict beta, - dcomplex* restrict y, inc_t incy, - dcomplex* restrict z, inc_t incz ) +void bli_zdotxaxpyf_opt_var1 + ( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + dcomplex* alpha, + dcomplex* a, inc_t inca, inc_t lda, + dcomplex* w, inc_t incw, + dcomplex* x, inc_t incx, + dcomplex* beta, + dcomplex* y, inc_t incy, + dcomplex* z, inc_t incz, + cntx_t* cntx + ) { /* @@ -289,19 +317,23 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, // Call the reference implementation if needed. 
if ( use_ref == TRUE ) { - BLIS_ZDOTXAXPYF_KERNEL_REF( conjat, - conja, - conjw, - conjx, - m, - b_n, - alpha, - a, inca, lda, - w, incw, - x, incx, - beta, - y, incy, - z, incz ); + BLIS_ZDOTXAXPYF_KERNEL_REF + ( + conjat, + conja, + conjw, + conjx, + m, + b_n, + alpha, + a, inca, lda, + w, incw, + x, incx, + beta, + y, incy, + z, incz, + cntx + ); return; } @@ -326,16 +358,16 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzcopys( *xp[ j ], alpha_x[ j ] ); - bli_zzscals( *alpha, alpha_x[ j ] ); + bli_zcopys( *xp[ j ], alpha_x[ j ] ); + bli_zscals( *alpha, alpha_x[ j ] ); } } else // if ( bli_is_conj( conjx ) ) { for ( j = 0; j < b_n; ++j ) { - bli_zzcopyjs( *xp[ j ], alpha_x[ j ] ); - bli_zzscals( *alpha, alpha_x[ j ] ); + bli_zcopyjs( *xp[ j ], alpha_x[ j ] ); + bli_zscals( *alpha, alpha_x[ j ] ); } } @@ -366,8 +398,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -383,8 +415,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += n_elem_per_iter; } @@ -396,8 +428,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -411,8 +443,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp 
); ap[ j ] += 1; } @@ -428,8 +460,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += n_elem_per_iter; } @@ -441,8 +473,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, At_w[ j ] ); + bli_zdots( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -456,8 +488,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -473,8 +505,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += n_elem_per_iter; } @@ -486,8 +518,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdots( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdots( *ap[ j ], *wp, At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -501,8 +533,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -518,8 +550,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, 
At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += n_elem_per_iter; } @@ -531,8 +563,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, { for ( j = 0; j < b_n; ++j ) { - bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] ); - bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp ); + bli_zdotjs( *ap[ j ], *wp, At_w[ j ] ); + bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp ); ap[ j ] += 1; } @@ -555,8 +587,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat, // scaling by beta. for ( j = 0; j < b_n; ++j ) { - bli_zzscals( *beta, *yp[ j ] ); - bli_zzzaxpys( *alpha, At_w[ j ], *yp[ j ] ); + bli_zscals( *beta, *yp[ j ] ); + bli_zaxpys( *alpha, At_w[ j ], *yp[ j ] ); } } diff --git a/config/template/kernels/1f/bli_dotxf_opt_var1.c b/config/template/kernels/1f/bli_dotxf_opt_var1.c index 757614f3e..8f721309c 100644 --- a/config/template/kernels/1f/bli_dotxf_opt_var1.c +++ b/config/template/kernels/1f/bli_dotxf_opt_var1.c @@ -36,95 +36,115 @@ -void bli_sdotxf_opt_var1( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - float* restrict alpha, - float* restrict a, inc_t inca, inc_t lda, - float* restrict x, inc_t incx, - float* restrict beta, - float* restrict y, inc_t incy - ) +void bli_sdotxf_opt_var1 + ( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + float* alpha, + float* a, inc_t inca, inc_t lda, + float* x, inc_t incx, + float* beta, + float* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_SDOTXF_KERNEL_REF( conjat, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - beta, - y, incy ); + BLIS_SDOTXF_KERNEL_REF + ( + conjat, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + beta, + y, incy, + cntx + ); } -void bli_ddotxf_opt_var1( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict x, inc_t incx, - double* restrict beta, - double* restrict y, inc_t incy - ) +void bli_ddotxf_opt_var1 + ( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* x, inc_t incx, + double* beta, + double* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. */ - BLIS_DDOTXF_KERNEL_REF( conjat, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - beta, - y, incy ); + BLIS_DDOTXF_KERNEL_REF + ( + conjat, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + beta, + y, incy, + cntx + ); } -void bli_cdotxf_opt_var1( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - scomplex* restrict alpha, - scomplex* restrict a, inc_t inca, inc_t lda, - scomplex* restrict x, inc_t incx, - scomplex* restrict beta, - scomplex* restrict y, inc_t incy - ) +void bli_cdotxf_opt_var1 + ( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + scomplex* alpha, + scomplex* a, inc_t inca, inc_t lda, + scomplex* x, inc_t incx, + scomplex* beta, + scomplex* y, inc_t incy, + cntx_t* cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CDOTXF_KERNEL_REF( conjat, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - beta, - y, incy ); + BLIS_CDOTXF_KERNEL_REF + ( + conjat, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + beta, + y, incy, + cntx + ); } -void bli_zdotxf_opt_var1( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - dcomplex* restrict alpha, - dcomplex* restrict a, inc_t inca, inc_t lda, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict beta, - dcomplex* restrict y, inc_t incy - ) +void bli_zdotxf_opt_var1 + ( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + dcomplex* alpha, + dcomplex* a, inc_t inca, inc_t lda, + dcomplex* x, inc_t incx, + dcomplex* beta, + dcomplex* y, inc_t incy, + cntx_t* cntx + ) { /* Template dotxf kernel implementation @@ -225,10 +245,14 @@ void bli_zdotxf_opt_var1( // If the vector lengths are zero, scale r by beta and return. if ( bli_zero_dim1( m ) ) { - bli_zzscalv( BLIS_NO_CONJUGATE, - b_n, - beta, - y, incy ); + bli_zscalv_ex + ( + BLIS_NO_CONJUGATE, + b_n, + beta, + y, incy, + cntx + ); return; } @@ -265,15 +289,19 @@ void bli_zdotxf_opt_var1( // Call the reference implementation if needed. 
if ( use_ref == TRUE ) { - BLIS_ZDOTXF_KERNEL_REF( conjat, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - beta, - y, incy ); + BLIS_ZDOTXF_KERNEL_REF + ( + conjat, + conjx, + m, + b_n, + alpha, + a, inca, lda, + x, incx, + beta, + y, incy, + cntx + ); return; } diff --git a/config/template/kernels/3/bli_gemm_opt_mxn.c b/config/template/kernels/3/bli_gemm_opt_mxn.c index 97b88dfe6..79da4b1ac 100644 --- a/config/template/kernels/3/bli_gemm_opt_mxn.c +++ b/config/template/kernels/3/bli_gemm_opt_mxn.c @@ -36,37 +36,45 @@ -void bli_sgemm_opt_mxn( - dim_t k, - float* restrict alpha, - float* restrict a1, - float* restrict b1, - float* restrict beta, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_opt_mxn + ( + dim_t k, + float* restrict alpha, + float* restrict a1, + float* restrict b1, + float* restrict beta, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_SGEMM_UKERNEL_REF( k, - alpha, - a1, - b1, - beta, - c11, rs_c, cs_c, - data ); + BLIS_SGEMM_UKERNEL_REF + ( + k, + alpha, + a1, + b1, + beta, + c11, rs_c, cs_c, + data, + cntx + ); } -void bli_dgemm_opt_mxn( - dim_t k, - double* restrict alpha, - double* restrict a1, - double* restrict b1, - double* restrict beta, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_opt_mxn + ( + dim_t k, + double* restrict alpha, + double* restrict a1, + double* restrict b1, + double* restrict beta, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Template gemm micro-kernel implementation @@ -106,6 +114,14 @@ void bli_dgemm_opt_mxn( information that may be useful when optimizing the gemm micro-kernel implementation. (See BLIS KernelsHowTo wiki for more info.) + - cntx: The address of the runtime context. 
The context can be queried + for implementation-specific values such as cache and register + blocksizes. However, most micro-kernels intrinsically "know" + these values already, and thus the cntx argument usually can + be safely ignored. (The following template micro-kernel code + does in fact query MR, NR, PACKMR, and PACKNR, as needed, but + only because those values are not hard-coded, as they would be + in a typical optimized micro-kernel implementation.) Diagram for gemm @@ -203,15 +219,19 @@ void bli_dgemm_opt_mxn( -FGVZ */ - const dim_t mr = bli_dmr; - const dim_t nr = bli_dnr; + const num_t dt = BLIS_DOUBLE; - const inc_t cs_a = bli_dpackmr; + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); - const inc_t rs_b = bli_dpacknr; + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); - const inc_t rs_ab = 1; - const inc_t cs_ab = bli_dmr; + const inc_t cs_a = packmr; + const inc_t rs_b = packnr; + + const inc_t rs_ab = 1; + const inc_t cs_ab = mr; dim_t l, j, i; @@ -291,36 +311,56 @@ void bli_cgemm_opt_mxn( scomplex* restrict c11, inc_t rs_c, inc_t cs_c, auxinfo_t* data ) + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a1, + scomplex* restrict b1, + scomplex* restrict beta, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a1, - b1, - beta, - c11, rs_c, cs_c, - data ); + BLIS_CGEMM_UKERNEL_REF + ( + k, + alpha, + a1, + b1, + beta, + c11, rs_c, cs_c, + data, + cntx + ); } -void bli_zgemm_opt_mxn( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a1, - dcomplex* restrict b1, - dcomplex* restrict beta, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_opt_mxn + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a1, + dcomplex* restrict b1, + dcomplex* restrict beta, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a1, - b1, - beta, - c11, rs_c, cs_c, - data ); + BLIS_ZGEMM_UKERNEL_REF + ( + k, + alpha, + a1, + b1, + beta, + c11, rs_c, cs_c, + data, + cntx + ); } diff --git a/config/template/kernels/3/bli_gemmtrsm_l_opt_mxn.c b/config/template/kernels/3/bli_gemmtrsm_l_opt_mxn.c index c92f6df3b..1ae61aa97 100644 --- a/config/template/kernels/3/bli_gemmtrsm_l_opt_mxn.c +++ b/config/template/kernels/3/bli_gemmtrsm_l_opt_mxn.c @@ -36,18 +36,24 @@ -void bli_sgemmtrsm_l_opt_mxn( - dim_t k, - float* restrict alpha, - float* restrict a10, - float* restrict a11, - float* restrict b01, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemmtrsm_l_opt_mxn + ( + dim_t k, + float* restrict alpha, + float* restrict a10, + float* restrict a11, + float* restrict b01, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_spacknr; + const num_t dt = BLIS_FLOAT; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; float* restrict minus_one = bli_sm1; @@ -69,16 +75,18 @@ void bli_sgemmtrsm_l_opt_mxn( -void bli_dgemmtrsm_l_opt_mxn( - dim_t k, - 
double* restrict alpha, - double* restrict a10, - double* restrict a11, - double* restrict b01, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemmtrsm_l_opt_mxn + ( + dim_t k, + double* restrict alpha, + double* restrict a10, + double* restrict a11, + double* restrict b01, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Template gemmtrsm_l micro-kernel implementation @@ -131,6 +139,14 @@ void bli_dgemmtrsm_l_opt_mxn( information that may be useful when optimizing the gemmtrsm micro-kernel implementation. (See BLIS KernelsHowTo wiki for more info.) + - cntx: The address of the runtime context. The context can be queried + for implementation-specific values such as cache and register + blocksizes. However, most micro-kernels intrinsically "know" + these values already, and thus the cntx argument usually can + be safely ignored. (The following template micro-kernel code + does in fact query MR, NR, PACKMR, and PACKNR, as needed, but + only because those values are not hard-coded, as they would be + in a typical optimized micro-kernel implementation.) 
Diagram for gemmtrsm_l @@ -203,7 +219,11 @@ void bli_dgemmtrsm_l_opt_mxn( -FGVZ */ - const inc_t rs_b = bli_dpacknr; + const num_t dt = BLIS_DOUBLE; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; double* restrict minus_one = bli_dm1; @@ -227,18 +247,24 @@ void bli_dgemmtrsm_l_opt_mxn( -void bli_cgemmtrsm_l_opt_mxn( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a10, - scomplex* restrict a11, - scomplex* restrict b01, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemmtrsm_l_opt_mxn + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a10, + scomplex* restrict a11, + scomplex* restrict b01, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_cpacknr; + const num_t dt = BLIS_SCOMPLEX; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; scomplex* restrict minus_one = bli_cm1; @@ -260,18 +286,24 @@ void bli_cgemmtrsm_l_opt_mxn( -void bli_zgemmtrsm_l_opt_mxn( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a10, - dcomplex* restrict a11, - dcomplex* restrict b01, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemmtrsm_l_opt_mxn + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a10, + dcomplex* restrict a11, + dcomplex* restrict b01, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_zpacknr; + const num_t dt = BLIS_DCOMPLEX; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; dcomplex* restrict minus_one = bli_zm1; diff --git 
a/config/template/kernels/3/bli_gemmtrsm_u_opt_mxn.c b/config/template/kernels/3/bli_gemmtrsm_u_opt_mxn.c index c456726a0..58616a644 100644 --- a/config/template/kernels/3/bli_gemmtrsm_u_opt_mxn.c +++ b/config/template/kernels/3/bli_gemmtrsm_u_opt_mxn.c @@ -36,18 +36,24 @@ -void bli_sgemmtrsm_u_opt_mxn( - dim_t k, - float* restrict alpha, - float* restrict a12, - float* restrict a11, - float* restrict b21, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemmtrsm_u_opt_mxn + ( + dim_t k, + float* restrict alpha, + float* restrict a10, + float* restrict a11, + float* restrict b01, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_spacknr; + const num_t dt = BLIS_FLOAT; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; float* restrict minus_one = bli_sm1; @@ -69,16 +75,18 @@ void bli_sgemmtrsm_u_opt_mxn( -void bli_dgemmtrsm_u_opt_mxn( - dim_t k, - double* restrict alpha, - double* restrict a12, - double* restrict a11, - double* restrict b21, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemmtrsm_u_opt_mxn + ( + dim_t k, + double* restrict alpha, + double* restrict a10, + double* restrict a11, + double* restrict b01, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Template gemmtrsm_u micro-kernel implementation @@ -131,6 +139,14 @@ void bli_dgemmtrsm_u_opt_mxn( information that may be useful when optimizing the gemmtrsm micro-kernel implementation. (See BLIS KernelsHowTo wiki for more info.) + - cntx: The address of the runtime context. The context can be queried + for implementation-specific values such as cache and register + blocksizes. 
However, most micro-kernels intrinsically "know" + these values already, and thus the cntx argument usually can + be safely ignored. (The following template micro-kernel code + does in fact query MR, NR, PACKMR, and PACKNR, as needed, but + only because those values are not hard-coded, as they would be + in a typical optimized micro-kernel implementation.) Diagram for gemmtrsm_u @@ -200,7 +216,11 @@ void bli_dgemmtrsm_u_opt_mxn( blis-devel mailing list. */ - const inc_t rs_b = bli_dpacknr; + const num_t dt = BLIS_DOUBLE; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; double* restrict minus_one = bli_dm1; @@ -224,18 +244,24 @@ void bli_dgemmtrsm_u_opt_mxn( -void bli_cgemmtrsm_u_opt_mxn( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a12, - scomplex* restrict a11, - scomplex* restrict b21, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemmtrsm_u_opt_mxn + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a10, + scomplex* restrict a11, + scomplex* restrict b01, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_cpacknr; + const num_t dt = BLIS_SCOMPLEX; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; scomplex* restrict minus_one = bli_cm1; @@ -257,18 +283,24 @@ void bli_cgemmtrsm_u_opt_mxn( -void bli_zgemmtrsm_u_opt_mxn( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a12, - dcomplex* restrict a11, - dcomplex* restrict b21, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemmtrsm_u_opt_mxn + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a10, + dcomplex* restrict a11, + dcomplex* restrict b01, + dcomplex* restrict b11, + dcomplex* 
restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - const inc_t rs_b = bli_zpacknr; + const num_t dt = BLIS_DCOMPLEX; + + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); + + const inc_t rs_b = packnr; const inc_t cs_b = 1; dcomplex* restrict minus_one = bli_zm1; diff --git a/config/template/kernels/3/bli_trsm_l_opt_mxn.c b/config/template/kernels/3/bli_trsm_l_opt_mxn.c index 9ce740140..a28760b88 100644 --- a/config/template/kernels/3/bli_trsm_l_opt_mxn.c +++ b/config/template/kernels/3/bli_trsm_l_opt_mxn.c @@ -36,28 +36,36 @@ -void bli_strsm_l_opt_mxn( - float* restrict a11, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_strsm_l_opt_mxn + ( + float* restrict a11, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_STRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_STRSM_L_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } -void bli_dtrsm_l_opt_mxn( - double* restrict a11, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dtrsm_l_opt_mxn + ( + double* restrict a11, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Template trsm_l micro-kernel implementation @@ -100,6 +108,14 @@ void bli_dtrsm_l_opt_mxn( information that may be useful when optimizing the trsm micro-kernel implementation. (See BLIS KernelsHowTo wiki for more info.) + - cntx: The address of the runtime context. The context can be queried + for implementation-specific values such as cache and register + blocksizes. However, most micro-kernels intrinsically "know" + these values already, and thus the cntx argument usually can + be safely ignored. 
(The following template micro-kernel code + does in fact query MR, NR, PACKMR, and PACKNR, as needed, but + only because those values are not hard-coded, as they would be + in a typical optimized micro-kernel implementation.) Diagrams for trsm @@ -142,14 +158,20 @@ void bli_dtrsm_l_opt_mxn( -FGVZ */ - const dim_t m = bli_dmr; - const dim_t n = bli_dnr; + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); - const inc_t rs_a = 1; - const inc_t cs_a = bli_dpackmr; + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); - const inc_t rs_b = bli_dpacknr; - const inc_t cs_b = 1; + const dim_t m = mr; + const dim_t n = nr; + + const inc_t rs_a = 1; + const inc_t cs_a = packmr; + + const inc_t rs_b = packnr; + const inc_t cs_b = 1; dim_t iter, i, j, l; dim_t n_behind; @@ -208,33 +230,45 @@ void bli_dtrsm_l_opt_mxn( -void bli_ctrsm_l_opt_mxn( - scomplex* restrict a11, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ctrsm_l_opt_mxn + ( + scomplex* restrict a11, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_CTRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_CTRSM_L_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } -void bli_ztrsm_l_opt_mxn( - dcomplex* restrict a11, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ztrsm_l_opt_mxn + ( + dcomplex* restrict a11, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_ZTRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_ZTRSM_L_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } diff --git a/config/template/kernels/3/bli_trsm_u_opt_mxn.c b/config/template/kernels/3/bli_trsm_u_opt_mxn.c index a18887d85..ba0b46753 100644 --- a/config/template/kernels/3/bli_trsm_u_opt_mxn.c +++ b/config/template/kernels/3/bli_trsm_u_opt_mxn.c @@ -36,18 +36,24 @@ -void bli_strsm_u_opt_mxn( - float* restrict a11, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_strsm_u_opt_mxn + ( + float* restrict a11, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_STRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_STRSM_U_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } @@ -58,6 +64,13 @@ void bli_dtrsm_u_opt_mxn( double* restrict c11, inc_t rs_c, inc_t cs_c, auxinfo_t* data ) + ( + double* restrict a11, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Template trsm_u micro-kernel implementation @@ -100,6 +113,14 @@ void bli_dtrsm_u_opt_mxn( information that may be useful when optimizing the trsm micro-kernel implementation. (See BLIS KernelsHowTo wiki for more info.) + - cntx: The address of the runtime context. The context can be queried + for implementation-specific values such as cache and register + blocksizes. However, most micro-kernels intrinsically "know" + these values already, and thus the cntx argument usually can + be safely ignored. (The following template micro-kernel code + does in fact query MR, NR, PACKMR, and PACKNR, as needed, but + only because those values are not hard-coded, as they would be + in a typical optimized micro-kernel implementation.) 
Diagrams for trsm @@ -141,14 +162,20 @@ void bli_dtrsm_u_opt_mxn( -FGVZ */ - const dim_t m = bli_dmr; - const dim_t n = bli_dnr; + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); - const inc_t rs_a = 1; - const inc_t cs_a = bli_dpackmr; + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); - const inc_t rs_b = bli_dpacknr; - const inc_t cs_b = 1; + const dim_t m = mr; + const dim_t n = nr; + + const inc_t rs_a = 1; + const inc_t cs_a = packmr; + + const inc_t rs_b = packnr; + const inc_t cs_b = 1; dim_t iter, i, j, l; dim_t n_behind; @@ -207,33 +234,45 @@ void bli_dtrsm_u_opt_mxn( -void bli_ctrsm_u_opt_mxn( - scomplex* restrict a11, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ctrsm_u_opt_mxn + ( + scomplex* restrict a11, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_CTRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_CTRSM_U_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } -void bli_ztrsm_u_opt_mxn( - dcomplex* restrict a11, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ztrsm_u_opt_mxn + ( + dcomplex* restrict a11, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_ZTRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); + BLIS_ZTRSM_U_UKERNEL_REF + ( + a11, + b11, + c11, rs_c, cs_c, + data, + cntx + ); } diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.h b/frame/0/bli_l0.h similarity index 93% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.h rename to frame/0/bli_l0.h index 8591a1b6b..de3f263cf 100644 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.h +++ b/frame/0/bli_l0.h @@ -32,8 +32,10 @@ */ -// -// Prototype object-based fusing factor query routine. -// -dim_t bli_dotxaxpyf_fusefac( num_t dt ); +#include "bli_l0_check.h" +#include "bli_l0_oapi.h" +#include "bli_l0_tapi.h" + +// copysc +#include "bli_copysc.h" diff --git a/frame/0/bli_l0_check.c b/frame/0/bli_l0_check.c new file mode 100644 index 000000000..4c11c5dee --- /dev/null +++ b/frame/0/bli_l0_check.c @@ -0,0 +1,314 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ) \ +{ \ + bli_l0_xxsc_check( chi, psi ); \ +} + +GENFRONT( addsc ) +GENFRONT( copysc ) +GENFRONT( divsc ) +GENFRONT( mulsc ) +GENFRONT( sqrtsc ) +GENFRONT( subsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + obj_t* norm \ + ) \ +{ \ + bli_l0_xx2sc_check( chi, norm ); \ +} + +GENFRONT( absqsc ) +GENFRONT( normfsc ) + + +void bli_getsc_check + ( + obj_t* chi, + double* zeta_r, + double* zeta_i + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( chi ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); +} + + +void bli_setsc_check + ( + double zeta_r, + double zeta_i, + obj_t* chi + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( chi ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); +} + + +void bli_unzipsc_check + ( + obj_t* chi, + obj_t* zeta_r, + obj_t* zeta_i + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_object( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_object( zeta_i ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( zeta_i ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_real_proj_of( chi, zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_real_proj_of( chi, zeta_i ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( zeta_i ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( zeta_i ); + bli_check_error_code( e_val ); +} + + +void bli_zipsc_check + ( + obj_t* zeta_r, + obj_t* zeta_i, + obj_t* chi + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_real_object( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_object( zeta_i ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_real_proj_of( chi, zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_real_proj_of( chi, zeta_i ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( zeta_i ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( zeta_r ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( zeta_i ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); +} + + +// ----------------------------------------------------------------------------- + +void bli_l0_xxsc_check + ( + obj_t* chi, + obj_t* psi + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( psi ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( psi ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( psi ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( psi ); + bli_check_error_code( e_val ); +} + +void bli_l0_xx2sc_check + ( + obj_t* chi, + obj_t* absq + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( absq ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_object( absq ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_real_proj_of( chi, absq ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( absq ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( chi ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( absq ); + bli_check_error_code( e_val ); +} + diff --git a/frame/0/bli_l0_check.h b/frame/0/bli_l0_check.h new file mode 100644 index 000000000..94c0aa1b8 --- /dev/null +++ b/frame/0/bli_l0_check.h @@ -0,0 +1,134 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. 
+// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ); + +GENTPROT( addsc ) +GENTPROT( copysc ) +GENTPROT( divsc ) +GENTPROT( mulsc ) +GENTPROT( sqrtsc ) +GENTPROT( subsc ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + obj_t* absq \ + ); + +GENTPROT( absqsc ) +GENTPROT( normfsc ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + double* zeta_r, \ + double* zeta_i \ + ); + +GENTPROT( getsc ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + double zeta_r, \ + double zeta_i, \ + obj_t* chi \ + ); + +GENTPROT( setsc ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* chi, \ + obj_t* zeta_r, \ + obj_t* zeta_i \ + ); + +GENTPROT( unzipsc ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* zeta_r, \ + obj_t* zeta_i, \ + obj_t* chi \ + ); + +GENTPROT( zipsc ) + + +// ----------------------------------------------------------------------------- + +void bli_l0_xxsc_check + ( + obj_t* chi, + obj_t* psi + ); + +void bli_l0_xx2sc_check + ( + obj_t* chi, + obj_t* norm + ); diff --git a/frame/0/bli_l0_oapi.c b/frame/0/bli_l0_oapi.c new file mode 100644 index 000000000..6715eea23 --- /dev/null +++ b/frame/0/bli_l0_oapi.c @@ -0,0 +1,288 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* absq \ + ) \ +{ \ + num_t dt_chi; \ + num_t dt_absq_c = bli_obj_datatype_proj_to_complex( *absq ); \ +\ + void* buf_chi; \ + void* buf_absq = bli_obj_buffer_at_off( *absq ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, absq ); \ +\ + /* If chi is a scalar constant, use dt_absq_c to extract the address of the + corresponding constant value; otherwise, use the datatype encoded + within the chi object and extract the buffer at the chi offset. */ \ + bli_set_scalar_dt_buffer( chi, dt_absq_c, dt_chi, buf_chi ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_2 \ + ( \ + dt_chi, \ + opname, \ + buf_chi, \ + buf_absq \ + ); \ +} + +GENFRONT( absqsc ) +GENFRONT( normfsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *psi ); \ +\ + conj_t conjchi = bli_obj_conj_status( *chi ); \ +\ + void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \ + void* buf_psi = bli_obj_buffer_at_off( *psi ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, psi ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_3 \ + ( \ + dt, \ + opname, \ + conjchi, \ + buf_chi, \ + buf_psi \ + ); \ +} + +GENFRONT( addsc ) +GENFRONT( divsc ) +GENFRONT( mulsc ) +GENFRONT( subsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *psi ); \ +\ + void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \ + void* buf_psi = bli_obj_buffer_at_off( *psi ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, psi ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_2 \ + ( \ + dt, \ + opname, \ + buf_chi, \ + buf_psi \ + ); \ +} + +GENFRONT( sqrtsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + double* zeta_r, \ + double* zeta_i \ + ) \ +{ \ + num_t dt_chi = bli_obj_datatype( *chi ); \ + num_t dt_def = BLIS_DCOMPLEX; \ + num_t dt_use; \ +\ + /* If chi is a constant object, default to using the dcomplex + value to maximize precision, and since we don't know if the + caller needs just the real or the real and imaginary parts. 
*/ \ + void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \ +\ + /* The _check() routine prevents integer types, so we know that chi + is either a constant or an actual floating-point type. */ \ + if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \ + else dt_use = dt_chi; \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_3 \ + ( \ + dt_use, \ + opname, \ + buf_chi, \ + zeta_r, \ + zeta_i \ + ); \ +} + +GENFRONT( getsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + double zeta_r, \ + double zeta_i, \ + obj_t* chi \ + ) \ +{ \ + num_t dt_chi = bli_obj_datatype( *chi ); \ +\ + void* buf_chi = bli_obj_buffer_at_off( *chi ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_3 \ + ( \ + dt_chi, \ + opname, \ + zeta_r, \ + zeta_i, \ + buf_chi \ + ); \ +} + +GENFRONT( setsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* zeta_r, \ + obj_t* zeta_i \ + ) \ +{ \ + num_t dt_chi; \ + num_t dt_zeta_c = bli_obj_datatype_proj_to_complex( *zeta_r ); \ +\ + void* buf_chi; \ +\ + void* buf_zeta_r = bli_obj_buffer_at_off( *zeta_r ); \ + void* buf_zeta_i = bli_obj_buffer_at_off( *zeta_i ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \ +\ + /* If chi is a scalar constant, use dt_zeta_c to extract the address of the + corresponding constant value; otherwise, use the datatype encoded + within the chi object and extract the buffer at the chi offset. */ \ + bli_set_scalar_dt_buffer( chi, dt_zeta_c, dt_chi, buf_chi ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_3 \ + ( \ + dt_chi, \ + opname, \ + buf_chi, \ + buf_zeta_r, \ + buf_zeta_i \ + ); \ +} + +GENFRONT( unzipsc ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* zeta_r, \ + obj_t* zeta_i, \ + obj_t* chi \ + ) \ +{ \ + num_t dt_chi = bli_obj_datatype( *chi ); \ +\ + void* buf_zeta_r = bli_obj_buffer_for_1x1( dt_chi, *zeta_r ); \ + void* buf_zeta_i = bli_obj_buffer_for_1x1( dt_chi, *zeta_i ); \ +\ + void* buf_chi = bli_obj_buffer_at_off( *chi ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_3 \ + ( \ + dt_chi, \ + opname, \ + buf_zeta_r, \ + buf_zeta_i, \ + buf_chi \ + ); \ +} + +GENFRONT( zipsc ) + diff --git a/frame/0/bli_l0_oapi.h b/frame/0/bli_l0_oapi.h new file mode 100644 index 000000000..0289aac76 --- /dev/null +++ b/frame/0/bli_l0_oapi.h @@ -0,0 +1,125 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* absq \ + ); + +GENPROT( absqsc ) +GENPROT( normfsc ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ); + +GENPROT( addsc ) +GENPROT( divsc ) +GENPROT( mulsc ) +GENPROT( sqrtsc ) +GENPROT( subsc ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + double* zeta_r, \ + double* zeta_i \ + ); + +GENPROT( getsc ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + double zeta_r, \ + double zeta_i, \ + obj_t* chi \ + ); + +GENPROT( setsc ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* zeta_r, \ + obj_t* zeta_i \ + ); + +GENPROT( unzipsc ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* zeta_r, \ + obj_t* zeta_i, \ + obj_t* chi \ + ); + +GENPROT( zipsc ) + + + + + + + diff --git a/frame/0/bli_l0_tapi.c b/frame/0/bli_l0_tapi.c new file mode 100644 index 000000000..173189eba --- /dev/null 
+++ b/frame/0/bli_l0_tapi.c @@ -0,0 +1,210 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. 
+// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjchi, \ + ctype* chi, \ + ctype* psi \ + ) \ +{ \ + ctype chi_conj; \ +\ + PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \ + PASTEMAC(ch,kername)( chi_conj, *psi ); \ +} + +INSERT_GENTFUNC_BASIC( addsc, adds ) +INSERT_GENTFUNC_BASIC( divsc, invscals ) +INSERT_GENTFUNC_BASIC( subsc, subs ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjchi, \ + ctype* chi, \ + ctype* psi \ + ) \ +{ \ + if ( PASTEMAC(ch,eq0)( *chi ) ) \ + { \ + /* Overwrite potential Infs and NaNs. */ \ + PASTEMAC(ch,set0s)( *psi ); \ + } \ + else \ + { \ + ctype chi_conj; \ +\ + PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \ + PASTEMAC(ch,kername)( chi_conj, *psi ); \ + } \ +} + +INSERT_GENTFUNC_BASIC( mulsc, scals ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype_r* absq \ + ) \ +{ \ + ctype_r chi_r; \ + ctype_r chi_i; \ + ctype_r absq_i; \ +\ + ( void )absq_i; \ +\ + PASTEMAC2(ch,chr,gets)( *chi, chi_r, chi_i ); \ +\ + /* absq = chi_r * chi_r + chi_i * chi_i; \ + absq_i = 0.0; (thrown away) */ \ + PASTEMAC(ch,absq2ris)( chi_r, chi_i, *absq, absq_i ); \ +\ + ( void )chi_i; \ +} + +INSERT_GENTFUNCR_BASIC0( absqsc ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype_r* norm \ + ) \ +{ \ + /* norm = sqrt( chi_r * chi_r + chi_i * chi_i ); */ \ + PASTEMAC2(ch,chr,abval2s)( *chi, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC0( normfsc ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype* psi \ + ) \ +{ \ + /* NOTE: sqrtsc/sqrt2s differs from normfsc/abval2s in the complex domain.
*/ \ + PASTEMAC(ch,sqrt2s)( *chi, *psi ); \ +} + +INSERT_GENTFUNC_BASIC0( sqrtsc ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + double* zeta_r, \ + double* zeta_i \ + ) \ +{ \ + PASTEMAC2(ch,d,gets)( *chi, *zeta_r, *zeta_i ); \ +} + +INSERT_GENTFUNC_BASIC0( getsc ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + double zeta_r, \ + double zeta_i, \ + ctype* chi \ + ) \ +{ \ + PASTEMAC2(d,ch,sets)( zeta_r, zeta_i, *chi ); \ +} + +INSERT_GENTFUNC_BASIC0( setsc ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype_r* zeta_r, \ + ctype_r* zeta_i \ + ) \ +{ \ + PASTEMAC2(ch,chr,gets)( *chi, *zeta_r, *zeta_i ); \ +} + +INSERT_GENTFUNCR_BASIC0( unzipsc ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype_r* zeta_r, \ + ctype_r* zeta_i, \ + ctype* chi \ + ) \ +{ \ + PASTEMAC2(chr,ch,sets)( *zeta_r, *zeta_i, *chi ); \ +} + +INSERT_GENTFUNCR_BASIC0( zipsc ) + diff --git a/frame/0/bli_l0_tapi.h b/frame/0/bli_l0_tapi.h new file mode 100644 index 000000000..ebd474b58 --- /dev/null +++ b/frame/0/bli_l0_tapi.h @@ -0,0 +1,131 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjchi, \ + ctype* chi, \ + ctype* psi \ + ); + +INSERT_GENTPROT_BASIC( addsc ) +INSERT_GENTPROT_BASIC( divsc ) +INSERT_GENTPROT_BASIC( mulsc ) +INSERT_GENTPROT_BASIC( subsc ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype_r* absq \ + ); + +INSERT_GENTPROTR_BASIC( absqsc ) +INSERT_GENTPROTR_BASIC( normfsc ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype* psi \ + ); + +INSERT_GENTPROT_BASIC( sqrtsc ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + double* zeta_r, \ + double* zeta_i \ + ); + +INSERT_GENTPROT_BASIC( getsc ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + double zeta_r, \ + double zeta_i, \ + ctype* chi \ + ); + +INSERT_GENTPROT_BASIC( setsc ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* chi, \ + ctype_r* zeta_r, \ + ctype_r* zeta_i \ + ); + +INSERT_GENTPROTR_BASIC( unzipsc ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype_r* zeta_r, \ + ctype_r* zeta_i, \ + ctype* chi \ + ); + +INSERT_GENTPROTR_BASIC( zipsc ) + diff --git a/frame/0/copysc/bli_copysc.c b/frame/0/copysc/bli_copysc.c index 346bf4d89..132ac4120 100644 --- a/frame/0/copysc/bli_copysc.c +++ b/frame/0/copysc/bli_copysc.c @@ -34,66 +34,93 @@ #include "blis.h" +// NOTE: This is one of the few functions in BLIS that is defined +// with heterogeneous type support. This is done so that we have +// an operation that can be used to typecast (copy-cast) a scalar +// of one datatype to a scalar of another datatype. 
+ +typedef void (*FUNCPTR_T)( + conj_t conjchi, + void* chi, + void* psi + ); + +static FUNCPTR_T GENARRAY2_ALL(ftypes,copysc); // -// Define object-based interface. +// Define object-based interfaces. // -void bli_copysc( obj_t* chi, - obj_t* psi ) -{ - if ( bli_error_checking_is_enabled() ) - bli_copysc_check( chi, psi ); - bli_copysc_unb_var1( chi, psi ); -} - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#undef GENFRONT +#define GENFRONT( opname ) \ \ -void PASTEMAC(ch,opname)( \ - conj_t conjchi, \ - ctype* chi, \ - ctype* psi \ - ) \ +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ) \ { \ - PASTEMAC2(ch,ch,varname)( conjchi, \ - chi, \ - psi ); \ + conj_t conjchi = bli_obj_conj_status( *chi ); \ +\ + num_t dt_psi = bli_obj_datatype( *psi ); \ + void* buf_psi = bli_obj_buffer_at_off( *psi ); \ +\ + num_t dt_chi; \ + void* buf_chi; \ +\ + FUNCPTR_T f; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, psi ); \ +\ + /* If chi is a scalar constant, use dt_psi to extract the address of the + corresponding constant value; otherwise, use the datatype encoded + within the chi object and extract the buffer at the chi offset. */ \ + bli_set_scalar_dt_buffer( chi, dt_psi, dt_chi, buf_chi ); \ +\ + /* Index into the type combination array to extract the correct + function pointer. */ \ + f = ftypes[dt_chi][dt_psi]; \ +\ + /* Invoke the void pointer-based function. */ \ + f( \ + conjchi, \ + buf_chi, \ + buf_psi \ + ); \ } -INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 ) +GENFRONT( copysc ) // -// Define BLAS-like interfaces with heterogeneous-typed operands. +// Define BLAS-like interfaces with typed operands. 
// + #undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \ +#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjchi, \ - ctype_x* chi, \ - ctype_y* psi \ - ) \ +void PASTEMAC2(chx,chy,varname) \ + ( \ + conj_t conjchi, \ + void* chi, \ + void* psi \ + ) \ { \ - PASTEMAC2(chx,chy,varname)( conjchi, \ - chi, \ - psi ); \ + ctype_x* chi_cast = chi; \ + ctype_y* psi_cast = psi; \ +\ + if ( bli_is_conj( conjchi ) ) \ + { \ + PASTEMAC2(chx,chy,copyjs)( *chi_cast, *psi_cast ); \ + } \ + else \ + { \ + PASTEMAC2(chx,chy,copys)( *chi_cast, *psi_cast ); \ + } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 ) -#endif +INSERT_GENTFUNC2_BASIC0( copysc ) +INSERT_GENTFUNC2_MIX_D0( copysc ) +INSERT_GENTFUNC2_MIX_P0( copysc ) diff --git a/frame/0/copysc/bli_copysc.h b/frame/0/copysc/bli_copysc.h index be2c8f52a..1e72a2b51 100644 --- a/frame/0/copysc/bli_copysc.h +++ b/frame/0/copysc/bli_copysc.h @@ -32,51 +32,37 @@ */ -#include "bli_copysc_check.h" -#include "bli_copysc_unb_var1.h" - // -// Prototype object-based interface. +// Prototype object-based interfaces. // -void bli_copysc( obj_t* chi, - obj_t* psi ); - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ +#undef GENFRONT +#define GENFRONT( opname ) \ \ -void PASTEMAC(ch,opname)( \ - conj_t conjchi, \ - ctype* chi, \ - ctype* psi \ - ); - -INSERT_GENTPROT_BASIC( copysc ) +void PASTEMAC0(opname) \ + ( \ + obj_t* chi, \ + obj_t* psi \ + ); +GENFRONT( copysc ) // -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
+// Define BLAS-like interfaces with heterogeneous-typed operands. // + #undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ +#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjchi, \ - ctype_x* chi, \ - ctype_y* psi \ - ); +void PASTEMAC2(chx,chy,varname) \ + ( \ + conj_t conjchi, \ + void* chi, \ + void* psi \ + ); INSERT_GENTPROT2_BASIC( copysc ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT INSERT_GENTPROT2_MIX_D( copysc ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT INSERT_GENTPROT2_MIX_P( copysc ) -#endif diff --git a/frame/0/absqsc/bli_absqsc.c b/frame/0/old/absqsc/bli_absqsc.c similarity index 100% rename from frame/0/absqsc/bli_absqsc.c rename to frame/0/old/absqsc/bli_absqsc.c diff --git a/frame/0/absqsc/bli_absqsc.h b/frame/0/old/absqsc/bli_absqsc.h similarity index 100% rename from frame/0/absqsc/bli_absqsc.h rename to frame/0/old/absqsc/bli_absqsc.h diff --git a/frame/0/absqsc/bli_absqsc_check.c b/frame/0/old/absqsc/bli_absqsc_check.c similarity index 100% rename from frame/0/absqsc/bli_absqsc_check.c rename to frame/0/old/absqsc/bli_absqsc_check.c diff --git a/frame/0/absqsc/bli_absqsc_check.h b/frame/0/old/absqsc/bli_absqsc_check.h similarity index 100% rename from frame/0/absqsc/bli_absqsc_check.h rename to frame/0/old/absqsc/bli_absqsc_check.h diff --git a/frame/0/absqsc/bli_absqsc_unb_var1.c b/frame/0/old/absqsc/bli_absqsc_unb_var1.c similarity index 100% rename from frame/0/absqsc/bli_absqsc_unb_var1.c rename to frame/0/old/absqsc/bli_absqsc_unb_var1.c diff --git a/frame/0/absqsc/bli_absqsc_unb_var1.h b/frame/0/old/absqsc/bli_absqsc_unb_var1.h similarity index 100% rename from frame/0/absqsc/bli_absqsc_unb_var1.h rename to frame/0/old/absqsc/bli_absqsc_unb_var1.h diff --git a/frame/0/addsc/bli_addsc.c b/frame/0/old/addsc/bli_addsc.c similarity index 100% rename from frame/0/addsc/bli_addsc.c rename to frame/0/old/addsc/bli_addsc.c diff --git 
a/frame/0/addsc/bli_addsc.h b/frame/0/old/addsc/bli_addsc.h similarity index 100% rename from frame/0/addsc/bli_addsc.h rename to frame/0/old/addsc/bli_addsc.h diff --git a/frame/0/addsc/bli_addsc_check.c b/frame/0/old/addsc/bli_addsc_check.c similarity index 100% rename from frame/0/addsc/bli_addsc_check.c rename to frame/0/old/addsc/bli_addsc_check.c diff --git a/frame/0/addsc/bli_addsc_check.h b/frame/0/old/addsc/bli_addsc_check.h similarity index 100% rename from frame/0/addsc/bli_addsc_check.h rename to frame/0/old/addsc/bli_addsc_check.h diff --git a/frame/0/addsc/bli_addsc_unb_var1.c b/frame/0/old/addsc/bli_addsc_unb_var1.c similarity index 100% rename from frame/0/addsc/bli_addsc_unb_var1.c rename to frame/0/old/addsc/bli_addsc_unb_var1.c diff --git a/frame/0/addsc/bli_addsc_unb_var1.h b/frame/0/old/addsc/bli_addsc_unb_var1.h similarity index 100% rename from frame/0/addsc/bli_addsc_unb_var1.h rename to frame/0/old/addsc/bli_addsc_unb_var1.h diff --git a/frame/1/addv/bli_addv.c b/frame/0/old/bli_getsc.c similarity index 52% rename from frame/1/addv/bli_addv.c rename to frame/0/old/bli_getsc.c index 95ddf23e5..22c10fdb6 100644 --- a/frame/1/addv/bli_addv.c +++ b/frame/0/old/bli_getsc.c @@ -34,76 +34,78 @@ #include "blis.h" +typedef void (*FUNCPTR_T)( + void* chi, + double* zeta_r, + double* zeta_i + ); + +static FUNCPTR_T GENARRAY(ftypes,getsc); // -// Define object-based interface. +// Define object-based interfaces. 
// + #undef GENFRONT -#define GENFRONT( opname, varname ) \ +#define GENFRONT( opname ) \ \ void PASTEMAC0(opname)( \ - obj_t* x, \ - obj_t* y \ + obj_t* chi, \ + double* zeta_r, \ + double* zeta_i \ ) \ { \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x, y ); \ + num_t dt_chi = bli_obj_datatype( *chi ); \ + num_t dt_def = BLIS_DCOMPLEX; \ + num_t dt_use; \ \ - PASTEMAC0(varname)( x, \ - y ); \ + /* If chi is a constant object, default to using the dcomplex + value to maximize precision, and since we don't know if the + caller needs just the real or the real and imaginary parts. */ \ + void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \ +\ + FUNCPTR_T f; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \ +\ + /* The _check() routine prevents integer types, so we know that chi + is either a constant or an actual floating-point type. */ \ + if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \ + else dt_use = dt_chi; \ +\ + /* Index into the type combination array to extract the correct + function pointer. */ \ + f = ftypes[dt_use]; \ +\ + /* Invoke the function. */ \ + f( \ + buf_chi, \ + zeta_r, \ + zeta_i \ + ); \ } -GENFRONT( addv, addv_kernel ) +GENFRONT( getsc ) // -// Define BLAS-like interfaces with homogeneous-typed operands. +// Define BLAS-like interfaces with typed operands. // + #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ + void* chi, \ + double* zeta_r, \ + double* zeta_i \ ) \ { \ - PASTEMAC2(ch,ch,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( addv, ADDV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \ + ctype* chi_cast = chi; \ \ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC2(chx,chy,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ + PASTEMAC2(ch,d,gets)( *chi_cast, *zeta_r, *zeta_i ); \ } -INSERT_GENTFUNC2_BASIC( addv, ADDV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( addv, ADDV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( addv, ADDV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC( getsc ) diff --git a/frame/1/invertv/bli_invertv.c b/frame/0/old/bli_getsc.h similarity index 73% rename from frame/1/invertv/bli_invertv.c rename to frame/0/old/bli_getsc.h index 22e989177..161ac0728 100644 --- a/frame/1/invertv/bli_invertv.c +++ b/frame/0/old/bli_getsc.h @@ -32,42 +32,33 @@ */ -#include "blis.h" - // -// Define object-based interface. +// Prototype object-based interfaces. // + #undef GENFRONT -#define GENFRONT( opname, varname ) \ +#define GENFRONT( opname ) \ \ void PASTEMAC0(opname)( \ - obj_t* x \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x ); \ -\ - PASTEMAC0(varname)( x ); \ -} - -GENFRONT( invertv, invertv_kernel ) + obj_t* chi, \ + double* zeta_r, \ + double* zeta_i \ + ); +GENFRONT( getsc ) // -// Define BLAS-like interfaces. +// Prototype BLAS-like interfaces with typed operands. 
// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ - dim_t n, \ - ctype* x, inc_t incx \ - ) \ -{ \ - PASTEMAC(ch,varname)( n, \ - x, incx ); \ -} - -INSERT_GENTFUNC_BASIC( invertv, INVERTV_KERNEL ) + void* chi, \ + double* zeta_r, \ + double* zeta_i \ + ); +INSERT_GENTPROT_BASIC( getsc ) diff --git a/frame/0/old/bli_setsc.c b/frame/0/old/bli_setsc.c new file mode 100644 index 000000000..25f08e205 --- /dev/null +++ b/frame/0/old/bli_setsc.c @@ -0,0 +1,101 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + double* zeta_r, + double* zeta_i, + void* chi + ); + +static FUNCPTR_T GENARRAY(ftypes,setsc); + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname)( \ + double* zeta_r, \ + double* zeta_i, \ + obj_t* chi \ + ) \ +{ \ + num_t dt_chi = bli_obj_datatype( *chi ); \ +\ + void* buf_chi = bli_obj_buffer_at_off( *chi ); \ +\ + FUNCPTR_T f; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \ +\ + /* Index into the type combination array to extract the correct + function pointer. */ \ + f = ftypes[dt_chi]; \ +\ + /* Invoke the function. */ \ + f( \ + zeta_r, \ + zeta_i, \ + buf_chi \ + ); \ +} + +GENFRONT( setsc ) + + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + double* zeta_r, \ + double* zeta_i, \ + void* chi \ + ) \ +{ \ + ctype* chi_cast = chi; \ +\ + PASTEMAC2(d,ch,sets)( *zeta_r, *zeta_i, *chi_cast ); \ +} + +INSERT_GENTFUNC_BASIC( setsc ) + diff --git a/frame/0/old/bli_setsc.h b/frame/0/old/bli_setsc.h new file mode 100644 index 000000000..05efe4a9e --- /dev/null +++ b/frame/0/old/bli_setsc.h @@ -0,0 +1,64 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries.
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname)( \ + double* zeta_r, \ + double* zeta_i, \ + obj_t* chi \ + ); +GENFRONT( setsc ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + double* zeta_r, \ + double* zeta_i, \ + void* chi \ + ); + +INSERT_GENTPROT_BASIC( setsc ) diff --git a/frame/1/swapv/bli_swapv.c b/frame/0/old/copysc/bli_copysc.c similarity index 68% rename from frame/1/swapv/bli_swapv.c rename to frame/0/old/copysc/bli_copysc.c index 3e4d96517..346bf4d89 100644 --- a/frame/1/swapv/bli_swapv.c +++ b/frame/0/old/copysc/bli_copysc.c @@ -38,22 +38,14 @@ // // Define object-based interface. // -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x, y ); \ -\ - PASTEMAC0(varname)( x, \ - y ); \ -} +void bli_copysc( obj_t* chi, + obj_t* psi ) +{ + if ( bli_error_checking_is_enabled() ) + bli_copysc_check( chi, psi ); -GENFRONT( swapv, swapv_kernel ) + bli_copysc_unb_var1( chi, psi ); +} // @@ -63,17 +55,17 @@ GENFRONT( swapv, swapv_kernel ) #define GENTFUNC( ctype, ch, opname, varname ) \ \ void PASTEMAC(ch,opname)( \ - dim_t n, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ + conj_t conjchi, \ + ctype* chi, \ + ctype* psi \ ) \ { \ - PASTEMAC2(ch,ch,varname)( n, \ - x, incx, \ - y, incy ); \ + PASTEMAC2(ch,ch,varname)( conjchi, \ + chi, \ + psi ); \ } -INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL ) +INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 ) // @@ -83,23 +75,25 @@ INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL ) #define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \ \ void PASTEMAC2(chx,chy,opname)( \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ + conj_t conjchi, \ + ctype_x* chi, \ + ctype_y* psi \ ) \ { \ - PASTEMAC2(chx,chy,varname)( n, \ - x, incx, \ - y, incy ); \ + PASTEMAC2(chx,chy,varname)( conjchi, \ + chi, \ + psi ); \ } -INSERT_GENTFUNC2_BASIC( swapv, SWAPV_KERNEL ) +// Define the basic set of functions unconditionally, and then also some +// mixed 
datatype functions if requested. +INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( swapv, SWAPV_KERNEL ) +INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( swapv, SWAPV_KERNEL ) +INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 ) #endif diff --git a/frame/1/setv/bli_setv.h b/frame/0/old/copysc/bli_copysc.h similarity index 77% rename from frame/1/setv/bli_setv.h rename to frame/0/old/copysc/bli_copysc.h index cb49e9f36..be2c8f52a 100644 --- a/frame/1/setv/bli_setv.h +++ b/frame/0/old/copysc/bli_copysc.h @@ -32,17 +32,15 @@ */ -#include "bli_setv_check.h" - -#include "bli_setv_kernel.h" -#include "bli_setv_ref.h" +#include "bli_copysc_check.h" +#include "bli_copysc_unb_var1.h" // // Prototype object-based interface. // -void bli_setv( obj_t* beta, - obj_t* x ); +void bli_copysc( obj_t* chi, + obj_t* psi ); // @@ -52,33 +50,33 @@ void bli_setv( obj_t* beta, #define GENTPROT( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ - dim_t n, \ - ctype* beta, \ - ctype* x, inc_t incx \ + conj_t conjchi, \ + ctype* chi, \ + ctype* psi \ ); -INSERT_GENTPROT_BASIC( setv ) +INSERT_GENTPROT_BASIC( copysc ) // // Prototype BLAS-like interfaces with heterogeneous-typed operands. 
// #undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \ +#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ \ -void PASTEMAC2(chb,chx,opname)( \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx \ +void PASTEMAC2(chx,chy,opname)( \ + conj_t conjchi, \ + ctype_x* chi, \ + ctype_y* psi \ ); -INSERT_GENTPROT2_BASIC( setv ) +INSERT_GENTPROT2_BASIC( copysc ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( setv ) +INSERT_GENTPROT2_MIX_D( copysc ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( setv ) +INSERT_GENTPROT2_MIX_P( copysc ) #endif diff --git a/frame/0/copysc/bli_copysc_check.c b/frame/0/old/copysc/bli_copysc_check.c similarity index 100% rename from frame/0/copysc/bli_copysc_check.c rename to frame/0/old/copysc/bli_copysc_check.c diff --git a/frame/0/copysc/bli_copysc_check.h b/frame/0/old/copysc/bli_copysc_check.h similarity index 100% rename from frame/0/copysc/bli_copysc_check.h rename to frame/0/old/copysc/bli_copysc_check.h diff --git a/frame/0/copysc/bli_copysc_unb_var1.c b/frame/0/old/copysc/bli_copysc_unb_var1.c similarity index 100% rename from frame/0/copysc/bli_copysc_unb_var1.c rename to frame/0/old/copysc/bli_copysc_unb_var1.c diff --git a/frame/0/copysc/bli_copysc_unb_var1.h b/frame/0/old/copysc/bli_copysc_unb_var1.h similarity index 100% rename from frame/0/copysc/bli_copysc_unb_var1.h rename to frame/0/old/copysc/bli_copysc_unb_var1.h diff --git a/frame/0/divsc/bli_divsc.c b/frame/0/old/divsc/bli_divsc.c similarity index 100% rename from frame/0/divsc/bli_divsc.c rename to frame/0/old/divsc/bli_divsc.c diff --git a/frame/0/divsc/bli_divsc.h b/frame/0/old/divsc/bli_divsc.h similarity index 100% rename from frame/0/divsc/bli_divsc.h rename to frame/0/old/divsc/bli_divsc.h diff --git a/frame/0/divsc/bli_divsc_check.c b/frame/0/old/divsc/bli_divsc_check.c similarity index 100% rename from frame/0/divsc/bli_divsc_check.c rename to frame/0/old/divsc/bli_divsc_check.c diff 
--git a/frame/0/divsc/bli_divsc_check.h b/frame/0/old/divsc/bli_divsc_check.h similarity index 100% rename from frame/0/divsc/bli_divsc_check.h rename to frame/0/old/divsc/bli_divsc_check.h diff --git a/frame/0/divsc/bli_divsc_unb_var1.c b/frame/0/old/divsc/bli_divsc_unb_var1.c similarity index 100% rename from frame/0/divsc/bli_divsc_unb_var1.c rename to frame/0/old/divsc/bli_divsc_unb_var1.c diff --git a/frame/0/divsc/bli_divsc_unb_var1.h b/frame/0/old/divsc/bli_divsc_unb_var1.h similarity index 100% rename from frame/0/divsc/bli_divsc_unb_var1.h rename to frame/0/old/divsc/bli_divsc_unb_var1.h diff --git a/frame/0/getsc/bli_getsc.c b/frame/0/old/getsc/bli_getsc.c similarity index 100% rename from frame/0/getsc/bli_getsc.c rename to frame/0/old/getsc/bli_getsc.c diff --git a/frame/0/getsc/bli_getsc.h b/frame/0/old/getsc/bli_getsc.h similarity index 100% rename from frame/0/getsc/bli_getsc.h rename to frame/0/old/getsc/bli_getsc.h diff --git a/frame/0/getsc/bli_getsc_check.c b/frame/0/old/getsc/bli_getsc_check.c similarity index 100% rename from frame/0/getsc/bli_getsc_check.c rename to frame/0/old/getsc/bli_getsc_check.c diff --git a/frame/0/getsc/bli_getsc_check.h b/frame/0/old/getsc/bli_getsc_check.h similarity index 100% rename from frame/0/getsc/bli_getsc_check.h rename to frame/0/old/getsc/bli_getsc_check.h diff --git a/frame/0/mulsc/bli_mulsc.c b/frame/0/old/mulsc/bli_mulsc.c similarity index 100% rename from frame/0/mulsc/bli_mulsc.c rename to frame/0/old/mulsc/bli_mulsc.c diff --git a/frame/0/mulsc/bli_mulsc.h b/frame/0/old/mulsc/bli_mulsc.h similarity index 100% rename from frame/0/mulsc/bli_mulsc.h rename to frame/0/old/mulsc/bli_mulsc.h diff --git a/frame/0/mulsc/bli_mulsc_check.c b/frame/0/old/mulsc/bli_mulsc_check.c similarity index 100% rename from frame/0/mulsc/bli_mulsc_check.c rename to frame/0/old/mulsc/bli_mulsc_check.c diff --git a/frame/0/mulsc/bli_mulsc_check.h b/frame/0/old/mulsc/bli_mulsc_check.h similarity index 100% rename from 
frame/0/mulsc/bli_mulsc_check.h rename to frame/0/old/mulsc/bli_mulsc_check.h diff --git a/frame/0/mulsc/bli_mulsc_unb_var1.c b/frame/0/old/mulsc/bli_mulsc_unb_var1.c similarity index 100% rename from frame/0/mulsc/bli_mulsc_unb_var1.c rename to frame/0/old/mulsc/bli_mulsc_unb_var1.c diff --git a/frame/0/mulsc/bli_mulsc_unb_var1.h b/frame/0/old/mulsc/bli_mulsc_unb_var1.h similarity index 100% rename from frame/0/mulsc/bli_mulsc_unb_var1.h rename to frame/0/old/mulsc/bli_mulsc_unb_var1.h diff --git a/frame/0/normfsc/bli_normfsc.c b/frame/0/old/normfsc/bli_normfsc.c similarity index 100% rename from frame/0/normfsc/bli_normfsc.c rename to frame/0/old/normfsc/bli_normfsc.c diff --git a/frame/0/normfsc/bli_normfsc.h b/frame/0/old/normfsc/bli_normfsc.h similarity index 100% rename from frame/0/normfsc/bli_normfsc.h rename to frame/0/old/normfsc/bli_normfsc.h diff --git a/frame/0/normfsc/bli_normfsc_check.c b/frame/0/old/normfsc/bli_normfsc_check.c similarity index 100% rename from frame/0/normfsc/bli_normfsc_check.c rename to frame/0/old/normfsc/bli_normfsc_check.c diff --git a/frame/0/normfsc/bli_normfsc_check.h b/frame/0/old/normfsc/bli_normfsc_check.h similarity index 100% rename from frame/0/normfsc/bli_normfsc_check.h rename to frame/0/old/normfsc/bli_normfsc_check.h diff --git a/frame/0/normfsc/bli_normfsc_unb_var1.c b/frame/0/old/normfsc/bli_normfsc_unb_var1.c similarity index 100% rename from frame/0/normfsc/bli_normfsc_unb_var1.c rename to frame/0/old/normfsc/bli_normfsc_unb_var1.c diff --git a/frame/0/normfsc/bli_normfsc_unb_var1.h b/frame/0/old/normfsc/bli_normfsc_unb_var1.h similarity index 100% rename from frame/0/normfsc/bli_normfsc_unb_var1.h rename to frame/0/old/normfsc/bli_normfsc_unb_var1.h diff --git a/frame/0/setsc/bli_setsc.c b/frame/0/old/setsc/bli_setsc.c similarity index 100% rename from frame/0/setsc/bli_setsc.c rename to frame/0/old/setsc/bli_setsc.c diff --git a/frame/0/setsc/bli_setsc.h b/frame/0/old/setsc/bli_setsc.h similarity index 100% 
rename from frame/0/setsc/bli_setsc.h rename to frame/0/old/setsc/bli_setsc.h diff --git a/frame/0/setsc/bli_setsc_check.c b/frame/0/old/setsc/bli_setsc_check.c similarity index 100% rename from frame/0/setsc/bli_setsc_check.c rename to frame/0/old/setsc/bli_setsc_check.c diff --git a/frame/0/setsc/bli_setsc_check.h b/frame/0/old/setsc/bli_setsc_check.h similarity index 100% rename from frame/0/setsc/bli_setsc_check.h rename to frame/0/old/setsc/bli_setsc_check.h diff --git a/frame/0/sqrtsc/bli_sqrtsc.c b/frame/0/old/sqrtsc/bli_sqrtsc.c similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc.c rename to frame/0/old/sqrtsc/bli_sqrtsc.c diff --git a/frame/0/sqrtsc/bli_sqrtsc.h b/frame/0/old/sqrtsc/bli_sqrtsc.h similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc.h rename to frame/0/old/sqrtsc/bli_sqrtsc.h diff --git a/frame/0/sqrtsc/bli_sqrtsc_check.c b/frame/0/old/sqrtsc/bli_sqrtsc_check.c similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc_check.c rename to frame/0/old/sqrtsc/bli_sqrtsc_check.c diff --git a/frame/0/sqrtsc/bli_sqrtsc_check.h b/frame/0/old/sqrtsc/bli_sqrtsc_check.h similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc_check.h rename to frame/0/old/sqrtsc/bli_sqrtsc_check.h diff --git a/frame/0/sqrtsc/bli_sqrtsc_unb_var1.c b/frame/0/old/sqrtsc/bli_sqrtsc_unb_var1.c similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc_unb_var1.c rename to frame/0/old/sqrtsc/bli_sqrtsc_unb_var1.c diff --git a/frame/0/sqrtsc/bli_sqrtsc_unb_var1.h b/frame/0/old/sqrtsc/bli_sqrtsc_unb_var1.h similarity index 100% rename from frame/0/sqrtsc/bli_sqrtsc_unb_var1.h rename to frame/0/old/sqrtsc/bli_sqrtsc_unb_var1.h diff --git a/frame/0/subsc/bli_subsc.c b/frame/0/old/subsc/bli_subsc.c similarity index 100% rename from frame/0/subsc/bli_subsc.c rename to frame/0/old/subsc/bli_subsc.c diff --git a/frame/0/subsc/bli_subsc.h b/frame/0/old/subsc/bli_subsc.h similarity index 100% rename from frame/0/subsc/bli_subsc.h rename to 
frame/0/old/subsc/bli_subsc.h diff --git a/frame/0/subsc/bli_subsc_check.c b/frame/0/old/subsc/bli_subsc_check.c similarity index 100% rename from frame/0/subsc/bli_subsc_check.c rename to frame/0/old/subsc/bli_subsc_check.c diff --git a/frame/0/subsc/bli_subsc_check.h b/frame/0/old/subsc/bli_subsc_check.h similarity index 100% rename from frame/0/subsc/bli_subsc_check.h rename to frame/0/old/subsc/bli_subsc_check.h diff --git a/frame/0/subsc/bli_subsc_unb_var1.c b/frame/0/old/subsc/bli_subsc_unb_var1.c similarity index 100% rename from frame/0/subsc/bli_subsc_unb_var1.c rename to frame/0/old/subsc/bli_subsc_unb_var1.c diff --git a/frame/0/subsc/bli_subsc_unb_var1.h b/frame/0/old/subsc/bli_subsc_unb_var1.h similarity index 100% rename from frame/0/subsc/bli_subsc_unb_var1.h rename to frame/0/old/subsc/bli_subsc_unb_var1.h diff --git a/frame/0/unzipsc/bli_unzipsc.c b/frame/0/old/unzipsc/bli_unzipsc.c similarity index 100% rename from frame/0/unzipsc/bli_unzipsc.c rename to frame/0/old/unzipsc/bli_unzipsc.c diff --git a/frame/0/unzipsc/bli_unzipsc.h b/frame/0/old/unzipsc/bli_unzipsc.h similarity index 100% rename from frame/0/unzipsc/bli_unzipsc.h rename to frame/0/old/unzipsc/bli_unzipsc.h diff --git a/frame/0/unzipsc/bli_unzipsc_check.c b/frame/0/old/unzipsc/bli_unzipsc_check.c similarity index 100% rename from frame/0/unzipsc/bli_unzipsc_check.c rename to frame/0/old/unzipsc/bli_unzipsc_check.c diff --git a/frame/0/unzipsc/bli_unzipsc_check.h b/frame/0/old/unzipsc/bli_unzipsc_check.h similarity index 100% rename from frame/0/unzipsc/bli_unzipsc_check.h rename to frame/0/old/unzipsc/bli_unzipsc_check.h diff --git a/frame/0/unzipsc/bli_unzipsc_unb_var1.c b/frame/0/old/unzipsc/bli_unzipsc_unb_var1.c similarity index 100% rename from frame/0/unzipsc/bli_unzipsc_unb_var1.c rename to frame/0/old/unzipsc/bli_unzipsc_unb_var1.c diff --git a/frame/0/unzipsc/bli_unzipsc_unb_var1.h b/frame/0/old/unzipsc/bli_unzipsc_unb_var1.h similarity index 100% rename from 
frame/0/unzipsc/bli_unzipsc_unb_var1.h rename to frame/0/old/unzipsc/bli_unzipsc_unb_var1.h diff --git a/frame/0/zipsc/bli_zipsc.c b/frame/0/old/zipsc/bli_zipsc.c similarity index 100% rename from frame/0/zipsc/bli_zipsc.c rename to frame/0/old/zipsc/bli_zipsc.c diff --git a/frame/0/zipsc/bli_zipsc.h b/frame/0/old/zipsc/bli_zipsc.h similarity index 100% rename from frame/0/zipsc/bli_zipsc.h rename to frame/0/old/zipsc/bli_zipsc.h diff --git a/frame/0/zipsc/bli_zipsc_check.c b/frame/0/old/zipsc/bli_zipsc_check.c similarity index 100% rename from frame/0/zipsc/bli_zipsc_check.c rename to frame/0/old/zipsc/bli_zipsc_check.c diff --git a/frame/0/zipsc/bli_zipsc_check.h b/frame/0/old/zipsc/bli_zipsc_check.h similarity index 100% rename from frame/0/zipsc/bli_zipsc_check.h rename to frame/0/old/zipsc/bli_zipsc_check.h diff --git a/frame/0/zipsc/bli_zipsc_unb_var1.c b/frame/0/old/zipsc/bli_zipsc_unb_var1.c similarity index 100% rename from frame/0/zipsc/bli_zipsc_unb_var1.c rename to frame/0/old/zipsc/bli_zipsc_unb_var1.c diff --git a/frame/0/zipsc/bli_zipsc_unb_var1.h b/frame/0/old/zipsc/bli_zipsc_unb_var1.h similarity index 100% rename from frame/0/zipsc/bli_zipsc_unb_var1.h rename to frame/0/old/zipsc/bli_zipsc_unb_var1.h diff --git a/frame/1/addv/bli_addv_ref.c b/frame/1/addv/bli_addv_ref.c deleted file mode 100644 index 5a92bf7ad..000000000 --- a/frame/1/addv/bli_addv_ref.c +++ /dev/null @@ -1,145 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T addv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - dim_t n, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,addv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,addv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,addv_ref); -#endif -#endif - - -void bli_addv_ref( obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; - - // Invoke the function. - f( conjx, - n, - buf_x, inc_x, - buf_y, inc_y ); -} -*/ - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ -{ \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - if ( bli_is_conj( conjx ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,addjs)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,adds)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC0( addv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D0( addv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P0( addv_ref ) -#endif - diff --git a/frame/1/axpyv/bli_axpyv.c b/frame/1/axpyv/bli_axpyv.c deleted file mode 100644 index 48b4db337..000000000 --- a/frame/1/axpyv/bli_axpyv.c +++ /dev/null @@ -1,128 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - num_t dt_x; \ - obj_t alpha_local; \ -\ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, x, y ); \ -\ - /* Use the datatype of x as the target type for beta (since we do - not assume mixed domain/type support is enabled). */ \ - dt_x = bli_obj_datatype( *x ); \ -\ - /* Create an object to hold a copy-cast of alpha. */ \ - bli_obj_scalar_init_detached_copy_of( dt_x, \ - BLIS_NO_CONJUGATE, \ - alpha, \ - &alpha_local ); \ -\ - PASTEMAC0(varname)( &alpha_local, \ - x, \ - y ); \ -} - -GENFRONT( axpyv, axpyv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjx, \ - n, \ - alpha, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( axpyv, AXPYV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3 -#define GENTFUNC3( ctype_a, ctype_x, ctype_y, cha, chx, chy, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_a* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(cha,chx,chy,varname)( conjx, \ - n, \ - alpha, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC3_BASIC( axpyv, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( axpyv, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( axpyv, AXPYV_KERNEL ) -#endif - diff --git a/frame/1/axpyv/bli_axpyv_ref.c b/frame/1/axpyv/bli_axpyv_ref.c deleted file mode 100644 index 80c92fc08..000000000 --- a/frame/1/axpyv/bli_axpyv_ref.c +++ /dev/null @@ -1,169 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" -/* -#define FUNCPTR_T axpyv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpyv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpyv_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpyv_ref); -#endif -#endif - - -void bli_axpyv_ref( obj_t* alpha, - obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); - - // Index into the type combination array to extract the correct - // function pointer. 
- f = ftypes[dt_alpha][dt_x][dt_y]; - - // Invoke the function. - f( conjx, - n, - buf_alpha, - buf_x, inc_x, - buf_y, inc_y ); -} -*/ - -#undef GENTFUNC3 -#define GENTFUNC3( ctype_a, ctype_x, ctype_y, cha, chx, chy, varname, addvker ) \ -\ -void PASTEMAC3(cha,chx,chy,varname) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_a* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ -{ \ - ctype_a* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - /* If alpha is zero, return. */ \ - if ( PASTEMAC(cha,eq0)( *alpha_cast ) ) return; \ -\ - /* If alpha is one, use addv. */ \ - if ( PASTEMAC(cha,eq1)( *alpha_cast ) ) \ - { \ - PASTEMAC2(chx,chy,addvker)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ - return; \ - } \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - if ( bli_is_conj( conjx ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(cha,chx,chy,axpyjs)( *alpha_cast, *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(cha,chx,chy,axpys)( *alpha_cast, *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3_BASIC( axpyv_ref, ADDV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( axpyv_ref, ADDV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( axpyv_ref, ADDV_KERNEL ) -#endif - diff --git a/frame/1/bli_l1v.h b/frame/1/bli_l1v.h new file mode 100644 index 000000000..f557118f0 --- /dev/null +++ b/frame/1/bli_l1v.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "bli_l1v_cntx.h" +#include "bli_l1v_check.h" + +#include "bli_l1v_ft.h" + +// Prototype object APIs with and without contexts. 
+#include "bli_oapi_w_cntx.h" +#include "bli_l1v_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l1v_oapi.h" + +#include "bli_l1v_tapi.h" + +// Pack-related +#include "bli_packv.h" +#include "bli_unpackv.h" + +// Other +#include "bli_scalv_cntl.h" +#include "bli_scalv_int.h" + +// Reference kernel headers +#include "bli_l1v_ref.h" + diff --git a/frame/1/bli_l1v_check.c b/frame/1/bli_l1v_check.c new file mode 100644 index 000000000..b3ac34397 --- /dev/null +++ b/frame/1/bli_l1v_check.c @@ -0,0 +1,348 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1v_xy_check( x, y ); \ +} + +GENFRONT( addv ) +GENFRONT( copyv ) +GENFRONT( subv ) +GENFRONT( swapv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1v_axy_check( alpha, x, y ); \ +} + +GENFRONT( axpyv ) +GENFRONT( scal2v ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho \ + ) \ +{ \ + bli_l1v_dot_check( &BLIS_ONE, x, y, &BLIS_ONE, rho ); \ +} + +GENFRONT( dotv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* beta, \ + obj_t* rho \ + ) \ +{ \ + bli_l1v_dot_check( alpha, x, y, beta, rho ); \ +} + +GENFRONT( dotxv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ) \ +{ \ + bli_l1v_x_check( x ); \ +} + +GENFRONT( invertv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ) \ +{ \ + bli_l1v_ax_check( alpha, x ); \ +} + +GENFRONT( scalv ) +GENFRONT( setv ) + + +// 
----------------------------------------------------------------------------- + +void bli_l1v_xy_check + ( + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1v_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1v_dot_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* beta, + obj_t* rho + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( rho ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( rho ); + bli_check_error_code( e_val ); +} + +void bli_l1v_x_check + ( + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + +void bli_l1v_ax_check + ( + obj_t* alpha, + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + diff --git a/frame/1/bli_l1v_check.h b/frame/1/bli_l1v_check.h new file mode 100644 index 000000000..ab3cfeee9 --- /dev/null +++ b/frame/1/bli_l1v_check.h @@ -0,0 +1,155 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. +// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ); + +GENTPROT( addv ) +GENTPROT( copyv ) +GENTPROT( subv ) +GENTPROT( swapv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ); + +GENTPROT( axpyv ) +GENTPROT( scal2v ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho \ + ); + +GENTPROT( dotv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* beta, \ + obj_t* rho \ + ); + +GENTPROT( dotxv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ); + +GENTPROT( invertv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ); + +GENTPROT( scalv ) +GENTPROT( setv ) + + +// ----------------------------------------------------------------------------- + +void bli_l1v_xy_check + ( + obj_t* x, + obj_t* y + ); + +void bli_l1v_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ); + +void bli_l1v_dot_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* beta, + obj_t* rho + ); + +void bli_l1v_x_check + ( + obj_t* x + ); + +void 
bli_l1v_ax_check + ( + obj_t* alpha, + obj_t* x + ); + diff --git a/frame/1/bli_l1v_cntx.c b/frame/1/bli_l1v_cntx.c new file mode 100644 index 000000000..482441451 --- /dev/null +++ b/frame/1/bli_l1v_cntx.c @@ -0,0 +1,89 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. 
+// + +#undef GENFRONT +#define GENFRONT( opname, kertype ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. */ \ + bli_gks_cntx_set_l1v_ker( kertype, cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( addv, BLIS_ADDV_KER ) +GENFRONT( copyv, BLIS_COPYV_KER ) +GENFRONT( dotv, BLIS_DOTV_KER ) +GENFRONT( dotxv, BLIS_DOTXV_KER ) +GENFRONT( invertv, BLIS_INVERTV_KER ) +GENFRONT( setv, BLIS_SETV_KER ) +GENFRONT( subv, BLIS_SUBV_KER ) +GENFRONT( swapv, BLIS_SWAPV_KER ) + + +#undef GENFRONT +#define GENFRONT( opname, kertype, depname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname,_cntx_init)( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. */ \ + bli_gks_cntx_set_l1v_ker( kertype, cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( axpyv, BLIS_AXPYV_KER, addv ) +GENFRONT( scal2v, BLIS_SCAL2V_KER, setv ) +GENFRONT( scalv, BLIS_SCALV_KER, setv ) + diff --git a/frame/1/bli_l1v_cntx.h b/frame/1/bli_l1v_cntx.h new file mode 100644 index 000000000..6db0a29c1 --- /dev/null +++ b/frame/1/bli_l1v_cntx.h @@ -0,0 +1,57 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype context initialization functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); + +GENPROT( addv ) +GENPROT( axpyv ) +GENPROT( copyv ) +GENPROT( dotv ) +GENPROT( dotxv ) +GENPROT( invertv ) +GENPROT( scalv ) +GENPROT( scal2v ) +GENPROT( setv ) +GENPROT( subv ) +GENPROT( swapv ) + diff --git a/frame/1/bli_l1v_ft.h b/frame/1/bli_l1v_ft.h new file mode 100644 index 000000000..e206938ce --- /dev/null +++ b/frame/1/bli_l1v_ft.h @@ -0,0 +1,167 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#ifndef BLIS_L1V_FT_H +#define BLIS_L1V_FT_H + + +// +// -- Level-1v function types -------------------------------------------------- +// + +// addv, copyv, subv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( addv ) +INSERT_GENTDEF( copyv ) +INSERT_GENTDEF( subv ) + +// axpyv, scal2v + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( axpyv ) +INSERT_GENTDEF( scal2v ) + +// dotv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( dotv ) + +// dotxv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( dotxv ) + +// invertv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( invertv ) + +// scalv, setv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( scalv ) +INSERT_GENTDEF( setv ) + +// swapv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ 
+typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( swapv ) + + + + +#endif + diff --git a/frame/1/bli_l1v_ker.h b/frame/1/bli_l1v_ker.h new file mode 100644 index 000000000..33cc7e6ae --- /dev/null +++ b/frame/1/bli_l1v_ker.h @@ -0,0 +1,151 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + + +// +// Define template prototypes for level-1v kernels. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( addv_ker_name ) +INSERT_GENTPROT_BASIC( copyv_ker_name ) +INSERT_GENTPROT_BASIC( subv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); \ + +INSERT_GENTPROT_BASIC( axpyv_ker_name ) +INSERT_GENTPROT_BASIC( scal2v_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + cntx_t* cntx \ + ); \ + +INSERT_GENTPROT_BASIC( dotv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho, \ + cntx_t* cntx \ + ); \ + +INSERT_GENTPROT_BASIC( dotxv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); \ + +INSERT_GENTPROT_BASIC( invertv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); \ + +INSERT_GENTPROT_BASIC( scalv_ker_name ) +INSERT_GENTPROT_BASIC( setv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); \ + 
+INSERT_GENTPROT_BASIC( swapv_ker_name ) + diff --git a/frame/1/bli_l1v_oapi.c b/frame/1/bli_l1v_oapi.c new file mode 100644 index 000000000..6482d5cdf --- /dev/null +++ b/frame/1/bli_l1v_oapi.c @@ -0,0 +1,370 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. 
+#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, y ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_7 \ + ( \ + dt, \ + opname, \ + conjx, \ + n, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + cntx \ + ); \ +} + +GENFRONT( addv ) +GENFRONT( copyv ) +GENFRONT( subv ) + + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the void pointer-based function. 
*/ \ + bli_call_ft_8 \ + ( \ + dt, \ + opname, \ + conjx, \ + n, \ + buf_alpha, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + cntx \ + ); \ +} + +GENFRONT( axpyv ) +GENFRONT( scal2v ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ + void* buf_rho = bli_obj_buffer_at_off( *rho ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, y, rho ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_9 \ + ( \ + dt, \ + opname, \ + conjx, \ + conjy, \ + n, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + buf_rho, \ + cntx \ + ); \ +} + +GENFRONT( dotv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* beta, \ + obj_t* rho \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ + void* buf_rho = bli_obj_buffer_at_off( *rho ); \ +\ + void* buf_alpha; \ + void* buf_beta; \ +\ + obj_t alpha_local; \ + obj_t beta_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y, beta, rho ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as 
needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + beta, &beta_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + conjx, \ + conjy, \ + n, \ + buf_alpha, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + buf_beta, \ + buf_rho, \ + cntx \ + ); \ +} + +GENFRONT( dotxv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_4 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, inc_x, \ + cntx \ + ); \ +} + +GENFRONT( invertv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + /* conj_t conjalpha = bli_obj_conj_status( *alpha ); */ \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the void pointer-based function. 
*/ \ + bli_call_ft_6 \ + ( \ + dt, \ + opname, \ + BLIS_NO_CONJUGATE, /* internal conjugation applied during copy-cast. */ \ + n, \ + buf_alpha, \ + buf_x, inc_x, \ + cntx \ + ); \ +} + +GENFRONT( scalv ) +GENFRONT( setv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, y ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_6 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + cntx \ + ); \ +} + +GENFRONT( swapv ) + + +#endif + diff --git a/frame/1/bli_l1v_oapi.h b/frame/1/bli_l1v_oapi.h new file mode 100644 index 000000000..2f4da57d8 --- /dev/null +++ b/frame/1/bli_l1v_oapi.h @@ -0,0 +1,137 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( addv ) +GENTPROT( copyv ) +GENTPROT( subv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( axpyv ) +GENTPROT( scal2v ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( dotv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* beta, \ + obj_t* rho \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( dotxv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( invertv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( scalv ) +GENTPROT( 
setv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( swapv ) + diff --git a/frame/1/bli_l1v_oapi_wc.c b/frame/1/bli_l1v_oapi_wc.c new file mode 100644 index 000000000..33d1d6201 --- /dev/null +++ b/frame/1/bli_l1v_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1v_oapi.c" + diff --git a/frame/1/bli_l1v_oapi_woc.c b/frame/1/bli_l1v_oapi_woc.c new file mode 100644 index 000000000..8df8831d7 --- /dev/null +++ b/frame/1/bli_l1v_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1v_oapi.c" + diff --git a/frame/1/bli_l1v_tapi.c b/frame/1/bli_l1v_tapi.c new file mode 100644 index 000000000..af92aa92d --- /dev/null +++ b/frame/1/bli_l1v_tapi.c @@ -0,0 +1,289 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjx, \ + n, \ + x, incx, \ + y, incy, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( addv, BLIS_ADDV_KER ) +INSERT_GENTFUNC_BASIC( copyv, BLIS_COPYV_KER ) +INSERT_GENTFUNC_BASIC( subv, BLIS_SUBV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ 
+ conjx, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpyv, BLIS_AXPYV_KER ) +INSERT_GENTFUNC_BASIC( scal2v, BLIS_SCAL2V_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjx, \ + conjy, \ + n, \ + x, incx, \ + y, incy, \ + rho, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotv, BLIS_DOTV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + beta, \ + rho, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxv, BLIS_DOTXV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + n, \ + x, incx, \ + cntx_p \ + ); \ 
+\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( invertv, BLIS_INVERTV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjalpha, \ + n, \ + alpha, \ + x, incx, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( scalv, BLIS_SCALV_KER ) +INSERT_GENTFUNC_BASIC( setv, BLIS_SETV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + n, \ + x, incx, \ + y, incy, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( swapv, BLIS_SWAPV_KER ) + diff --git a/frame/1/bli_l1v_tapi.h b/frame/1/bli_l1v_tapi.h new file mode 100644 index 000000000..618d9a280 --- /dev/null +++ b/frame/1/bli_l1v_tapi.h @@ -0,0 +1,77 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Generate prototypes for level-1v operations. +// + +#undef addv_ker_name +#define addv_ker_name addv + +#undef axpyv_ker_name +#define axpyv_ker_name axpyv + +#undef copyv_ker_name +#define copyv_ker_name copyv + +#undef dotv_ker_name +#define dotv_ker_name dotv + +#undef dotxv_ker_name +#define dotxv_ker_name dotxv + +#undef invertv_ker_name +#define invertv_ker_name invertv + +#undef scalv_ker_name +#define scalv_ker_name scalv + +#undef scal2v_ker_name +#define scal2v_ker_name scal2v + +#undef setv_ker_name +#define setv_ker_name setv + +#undef subv_ker_name +#define subv_ker_name subv + +#undef swapv_ker_name +#define swapv_ker_name swapv + + +// Include the level-1v kernel API template. 
+ +#include "bli_l1v_ker.h" + diff --git a/frame/1/copyv/bli_copyv.c b/frame/1/copyv/bli_copyv.c deleted file mode 100644 index 56aa8b286..000000000 --- a/frame/1/copyv/bli_copyv.c +++ /dev/null @@ -1,109 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. 
-// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x, y ); \ -\ - PASTEMAC0(varname)( x, \ - y ); \ -} - -GENFRONT( copyv, copyv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC2(ch,ch,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( copyv, COPYV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC2(chx,chy,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC2_BASIC( copyv, COPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( copyv, COPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( copyv, COPYV_KERNEL ) -#endif - diff --git a/frame/1/copyv/bli_copyv_kernel.h b/frame/1/copyv/bli_copyv_kernel.h deleted file mode 100644 index 83c8c6a4c..000000000 --- a/frame/1/copyv/bli_copyv_kernel.h +++ /dev/null @@ -1,61 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_copyv_kernel( obj_t* x, - obj_t* y ); - - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( copyv_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( copyv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( copyv_kernel_void ) -#endif diff --git a/frame/1/dotv/bli_dotv.c b/frame/1/dotv/bli_dotv.c deleted file mode 100644 index 34a1b8499..000000000 --- a/frame/1/dotv/bli_dotv.c +++ /dev/null @@ -1,121 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* x, \ - obj_t* y, \ - obj_t* rho \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x, y, rho ); \ -\ - PASTEMAC0(varname)( x, \ - y, \ - rho ); \ -} - -GENFRONT( dotv, dotv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* rho \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjx, \ - conjy, \ - n, \ - x, incx, \ - y, incy, \ - rho ); \ -} - -INSERT_GENTFUNC_BASIC( dotv, DOTV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3 -#define GENTFUNC3( ctype_x, ctype_y, ctype_r, chx, chy, chr, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chr,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_r* rho \ - ) \ -{ \ - PASTEMAC3(chx,chy,chr,varname)( conjx, \ - conjy, \ - n, \ - x, incx, \ - y, incy, \ - rho ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3_BASIC( dotv, DOTV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( dotv, DOTV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( dotv, DOTV_KERNEL ) -#endif - diff --git a/frame/1/dotv/bli_dotv_ref.c b/frame/1/dotv/bli_dotv_ref.c deleted file mode 100644 index 64fd207b0..000000000 --- a/frame/1/dotv/bli_dotv_ref.c +++ /dev/null @@ -1,177 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T dotv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t n, - void* x, inc_t incx, - void* y, inc_t incy, - void* rho - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotv_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotv_ref); -#endif -#endif - - -void bli_dotv_ref( obj_t* x, - obj_t* y, - obj_t* rho ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_rho = bli_obj_datatype( *rho ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - void* buf_rho = bli_obj_buffer_at_off( *rho ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_rho]; - - // Invoke the function. 
- f( conjx, - conjy, - n, - buf_x, inc_x, - buf_y, inc_y, - buf_rho ); -} -*/ - -#undef GENTFUNC3 -#define GENTFUNC3( ctype_x, ctype_y, ctype_r, chx, chy, chr, varname ) \ -\ -void PASTEMAC3(chx,chy,chr,varname) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict rho \ - ) \ -{ \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_r* rho_cast = rho; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - ctype_r dotxy; \ - dim_t i; \ - conj_t conjx_use; \ -\ - if ( bli_zero_dim1( n ) ) \ - { \ - PASTEMAC(chr,set0s)( *rho_cast ); \ - return; \ - } \ -\ - PASTEMAC(chr,set0s)( dotxy ); \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - conjx_use = conjx; \ -\ - /* If y must be conjugated, we do so indirectly by first toggling the - effective conjugation of x and then conjugating the resulting dot - product. */ \ - if ( bli_is_conj( conjy ) ) \ - bli_toggle_conj( conjx_use ); \ -\ - if ( bli_is_conj( conjx_use ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chx,chy,chr,dotjs)( *chi1, *psi1, dotxy ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chx,chy,chr,dots)( *chi1, *psi1, dotxy ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -\ - if ( bli_is_conj( conjy ) ) \ - PASTEMAC(chr,conjs)( dotxy ); \ -\ - PASTEMAC2(chr,chr,copys)( dotxy, *rho_cast ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3_BASIC0( dotv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D0( dotv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P0( dotv_ref ) -#endif - diff --git a/frame/1/dotxv/bli_dotxv.c b/frame/1/dotxv/bli_dotxv.c deleted file mode 100644 index 08cb71b04..000000000 --- a/frame/1/dotxv/bli_dotxv.c +++ /dev/null @@ -1,133 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* x, \ - obj_t* y, \ - obj_t* beta, \ - obj_t* rho \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, x, y, beta, rho ); \ -\ - PASTEMAC0(varname)( alpha, \ - x, \ - y, \ - beta, \ - rho ); \ -} - -GENFRONT( dotxv, dotxv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* beta, \ - ctype* rho \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjx, \ - conjy, \ - n, \ - alpha, \ - x, incx, \ - y, incy, \ - beta, \ - rho ); \ -} - -INSERT_GENTFUNC_BASIC( dotxv, DOTXV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chr,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_r* beta, \ - ctype_r* rho \ - ) \ -{ \ - PASTEMAC3(chx,chy,chr,varname)( conjx, \ - conjy, \ - n, \ - alpha, \ - x, incx, \ - y, incy, \ - beta, \ - rho ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxv, DOTXV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxv, DOTXV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxv, DOTXV_KERNEL ) -#endif - diff --git a/frame/1/dotxv/bli_dotxv_kernel.c b/frame/1/dotxv/bli_dotxv_kernel.c deleted file mode 100644 index 247b767d2..000000000 --- a/frame/1/dotxv/bli_dotxv_kernel.c +++ /dev/null @@ -1,153 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T dotxv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* beta, - void* rho - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxv_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxv_kernel_void); -#endif -#endif - - -void bli_dotxv_kernel( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* beta, - obj_t* rho ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_rho = bli_obj_datatype( *rho ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - void* buf_rho = bli_obj_buffer_at_off( *rho ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of x and y. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of rho. - dt_beta = dt_rho; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_rho]; - - // Invoke the function. 
- f( conjx, - conjy, - n, - buf_alpha, - buf_x, inc_x, - buf_y, inc_y, - buf_beta, - buf_rho ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, varname, kername ) \ -\ -void PASTEMAC3(chx,chy,chr,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* beta, \ - void* rho \ - ) \ -{ \ - PASTEMAC3(chx,chy,chr,kername)( conjx, \ - conjy, \ - n, \ - alpha, \ - x, incx, \ - y, incy, \ - beta, \ - rho ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxv_kernel_void, DOTXV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxv_kernel_void, DOTXV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxv_kernel_void, DOTXV_KERNEL ) -#endif - diff --git a/frame/1/dotxv/bli_dotxv_kernel.h b/frame/1/dotxv/bli_dotxv_kernel.h deleted file mode 100644 index fe41cf168..000000000 --- a/frame/1/dotxv/bli_dotxv_kernel.h +++ /dev/null @@ -1,69 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_dotxv_kernel( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* beta, - obj_t* rho ); - - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, varname ) \ -\ -void PASTEMAC3(chx,chy,chr,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* beta, \ - void* rho \ - ); - -INSERT_GENTPROT3U12_BASIC( dotxv_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxv_kernel_void ) -#endif - diff --git a/frame/1/dotxv/bli_dotxv_ref.c b/frame/1/dotxv/bli_dotxv_ref.c deleted file mode 100644 index f2b2c2486..000000000 --- a/frame/1/dotxv/bli_dotxv_ref.c +++ /dev/null @@ -1,209 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T dotxv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* beta, - void* rho - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxv_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxv_ref); -#endif -#endif - - -void bli_dotxv_ref( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* beta, - obj_t* rho ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_rho = bli_obj_datatype( *rho ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - void* buf_rho = bli_obj_buffer_at_off( *rho ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of x and y. This is to - // prevent any unnecessary loss of information during computation. 
- dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of rho. - dt_beta = dt_rho; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_rho]; - - // Invoke the function. - f( conjx, - conjy, - n, - buf_alpha, - buf_x, inc_x, - buf_y, inc_y, - buf_beta, - buf_rho ); -} -*/ - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, varname ) \ -\ -void PASTEMAC3(chx,chy,chr,varname) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict beta, \ - ctype_r* restrict rho \ - ) \ -{ \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_r* beta_cast = beta; \ - ctype_r* rho_cast = rho; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - ctype_xy dotxy; \ - dim_t i; \ - conj_t conjx_use; \ -\ - /* If beta is zero, clear rho. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chr,eq0)( *beta_cast ) ) \ - { \ - PASTEMAC(chr,set0s)( *rho_cast ); \ - } \ - else \ - { \ - PASTEMAC2(chr,chr,scals)( *beta_cast, *rho_cast ); \ - } \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - PASTEMAC(chxy,set0s)( dotxy ); \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - /* If y must be conjugated, we do so indirectly by first toggling the - effective conjugation of x and then conjugating the resulting dot - product. 
*/ \ - conjx_use = conjx; \ -\ - if ( bli_is_conj( conjy ) ) \ - bli_toggle_conj( conjx_use ); \ -\ - if ( bli_is_conj( conjx_use ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chx,chy,chxy,dotjs)( *chi1, *psi1, dotxy ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chx,chy,chxy,dots)( *chi1, *psi1, dotxy ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -\ - if ( bli_is_conj( conjy ) ) \ - PASTEMAC(chxy,conjs)( dotxy ); \ -\ - PASTEMAC3(chxy,chxy,chr,axpys)( *alpha_cast, dotxy, *rho_cast ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC0( dotxv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D0( dotxv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P0( dotxv_ref ) -#endif - diff --git a/frame/1/kernels/bli_addv_ref.c b/frame/1/kernels/bli_addv_ref.c new file mode 100644 index 000000000..4a91667a2 --- /dev/null +++ b/frame/1/kernels/bli_addv_ref.c @@ -0,0 +1,81 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ + psi1 = y; \ +\ + if ( bli_is_conj( conjx ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,addjs)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,adds)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( addv_ref ) + diff --git a/frame/1/kernels/bli_axpyv_ref.c b/frame/1/kernels/bli_axpyv_ref.c new file mode 100644 index 000000000..4b29505cf --- /dev/null +++ b/frame/1/kernels/bli_axpyv_ref.c @@ -0,0 +1,103 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + /* If alpha is zero, return. 
*/ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) return; \ +\ + /* If alpha is one, use addv. */ \ + if ( PASTEMAC(ch,eq1)( *alpha ) ) \ + { \ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,addv_ft) addv_p = bli_cntx_get_l1v_ker_dt( dt, BLIS_ADDV_KER, cntx ); \ +\ + addv_p \ + ( \ + conjx, \ + n, \ + x, incx, \ + y, incy, \ + cntx \ + ); \ + return; \ + } \ +\ + chi1 = x; \ + psi1 = y; \ +\ + if ( bli_is_conj( conjx ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,axpyjs)( *alpha, *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,axpys)( *alpha, *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( axpyv_ref ) + diff --git a/frame/1/kernels/bli_copyv_ref.c b/frame/1/kernels/bli_copyv_ref.c new file mode 100644 index 000000000..b852f76e7 --- /dev/null +++ b/frame/1/kernels/bli_copyv_ref.c @@ -0,0 +1,81 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ + psi1 = y; \ +\ + if ( bli_is_conj( conjx ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,copyjs)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,copys)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( copyv_ref ) + diff --git a/frame/1/kernels/bli_dotv_ref.c b/frame/1/kernels/bli_dotv_ref.c new file mode 100644 index 000000000..b17480b07 --- /dev/null +++ b/frame/1/kernels/bli_dotv_ref.c @@ -0,0 +1,104 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + ctype dotxy; \ + dim_t i; \ + conj_t conjx_use; \ +\ + if ( bli_zero_dim1( n ) ) \ + { \ + PASTEMAC(ch,set0s)( *rho ); \ + return; \ + } \ +\ + PASTEMAC(ch,set0s)( dotxy ); \ +\ + chi1 = x; \ + psi1 = y; \ +\ + conjx_use = conjx; \ +\ + /* If y must be conjugated, we do so indirectly by first toggling the + effective conjugation of x and then conjugating the resulting dot + product. */ \ + if ( bli_is_conj( conjy ) ) \ + bli_toggle_conj( conjx_use ); \ +\ + if ( bli_is_conj( conjx_use ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,dotjs)( *chi1, *psi1, dotxy ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,dots)( *chi1, *psi1, dotxy ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +\ + if ( bli_is_conj( conjy ) ) \ + PASTEMAC(ch,conjs)( dotxy ); \ +\ + PASTEMAC(ch,copys)( dotxy, *rho ); \ +} + +INSERT_GENTFUNC_BASIC0( dotv_ref ) + diff --git a/frame/1/kernels/bli_dotxv_ref.c b/frame/1/kernels/bli_dotxv_ref.c new file mode 100644 index 000000000..b611533d4 --- /dev/null +++ b/frame/1/kernels/bli_dotxv_ref.c @@ -0,0 +1,112 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + ctype dotxy; \ + dim_t i; \ + conj_t conjx_use; \ +\ + /* If beta is zero, clear rho. Otherwise, scale by beta. 
*/ \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ + { \ + PASTEMAC(ch,set0s)( *rho ); \ + } \ + else \ + { \ + PASTEMAC(ch,scals)( *beta, *rho ); \ + } \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + PASTEMAC(ch,set0s)( dotxy ); \ +\ + chi1 = x; \ + psi1 = y; \ +\ + /* If y must be conjugated, we do so indirectly by first toggling the + effective conjugation of x and then conjugating the resulting dot + product. */ \ + conjx_use = conjx; \ +\ + if ( bli_is_conj( conjy ) ) \ + bli_toggle_conj( conjx_use ); \ +\ + if ( bli_is_conj( conjx_use ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,dotjs)( *chi1, *psi1, dotxy ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,dots)( *chi1, *psi1, dotxy ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +\ + if ( bli_is_conj( conjy ) ) \ + PASTEMAC(ch,conjs)( dotxy ); \ +\ + PASTEMAC(ch,axpys)( *alpha, dotxy, *rho ); \ +} + +INSERT_GENTFUNC_BASIC0( dotxv_ref ) + diff --git a/frame/1/kernels/bli_invertv_ref.c b/frame/1/kernels/bli_invertv_ref.c new file mode 100644 index 000000000..c7f3dbcb7 --- /dev/null +++ b/frame/1/kernels/bli_invertv_ref.c @@ -0,0 +1,63 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,inverts)( *chi1 ); \ +\ + chi1 += incx; \ + } \ +} + +INSERT_GENTFUNC_BASIC0( invertv_ref ) + diff --git a/frame/1/kernels/bli_l1v_ref.h b/frame/1/kernels/bli_l1v_ref.h new file mode 100644 index 000000000..f3857d841 --- /dev/null +++ b/frame/1/kernels/bli_l1v_ref.h @@ -0,0 +1,147 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( addv_ref ) +INSERT_GENTPROT_BASIC( copyv_ref ) +INSERT_GENTPROT_BASIC( subv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpyv_ref ) +INSERT_GENTPROT_BASIC( scal2v_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( invertv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( scalv_ref ) +INSERT_GENTPROT_BASIC( setv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( swapv_ref ) + diff --git a/frame/1/kernels/bli_scal2v_ref.c b/frame/1/kernels/bli_scal2v_ref.c new 
file mode 100644 index 000000000..3f739cd90 --- /dev/null +++ b/frame/1/kernels/bli_scal2v_ref.c @@ -0,0 +1,102 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + /* If alpha is zero, use setv. */ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + ctype* zero = PASTEMAC(ch,0); \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,setv_ft) setv_p = bli_cntx_get_l1v_ker_dt( dt, BLIS_SETV_KER, cntx ); \ +\ + setv_p \ + ( \ + BLIS_NO_CONJUGATE, \ + n, \ + zero, \ + y, incy, \ + cntx \ + ); \ + return; \ + } \ +\ + chi1 = x; \ + psi1 = y; \ +\ + if ( bli_is_conj( conjx ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,scal2js)( *alpha, *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,scal2s)( *alpha, *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( scal2v_ref ) + diff --git a/frame/1/invertv/bli_invertv_ref.c b/frame/1/kernels/bli_scalv_ref.c similarity index 70% rename from frame/1/invertv/bli_invertv_ref.c rename to frame/1/kernels/bli_scalv_ref.c index 52bbfa349..982313c9b 100644 --- a/frame/1/invertv/bli_invertv_ref.c +++ b/frame/1/kernels/bli_scalv_ref.c @@ -34,63 +34,58 @@ #include "blis.h" -/* -#define FUNCPTR_T invertv_fp - -typedef void (*FUNCPTR_T)( - dim_t n, - void* x, inc_t incx - ); - -static FUNCPTR_T GENARRAY(ftypes,invertv_ref); - - -void bli_invertv_ref( obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. 
- f = ftypes[dt_x]; - - // Invoke the function. - f( n, - buf_x, inc_x ); -} -*/ - - #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ void PASTEMAC(ch,varname) \ ( \ - dim_t n, \ - ctype* restrict x, inc_t incx \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ ) \ { \ - ctype* x_cast = x; \ ctype* chi1; \ + ctype alpha_conj; \ dim_t i; \ \ if ( bli_zero_dim1( n ) ) return; \ \ - chi1 = x_cast; \ + /* If alpha is one, return. */ \ + if ( PASTEMAC(ch,eq1)( *alpha ) ) return; \ +\ + /* If alpha is zero, use setv. */ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + ctype* zero = PASTEMAC(ch,0); \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,setv_ft) setv_p = bli_cntx_get_l1v_ker_dt( dt, BLIS_SETV_KER, cntx ); \ +\ + setv_p \ + ( \ + BLIS_NO_CONJUGATE, \ + n, \ + zero, \ + x, incx, \ + cntx \ + ); \ + return; \ + } \ +\ + PASTEMAC(ch,copycjs)( conjalpha, *alpha, alpha_conj ); \ +\ + chi1 = x; \ \ for ( i = 0; i < n; ++i ) \ { \ - PASTEMAC(ch,inverts)( *chi1 ); \ + PASTEMAC(ch,scals)( alpha_conj, *chi1 ); \ \ chi1 += incx; \ } \ } -INSERT_GENTFUNC_BASIC0( invertv_ref ) +INSERT_GENTFUNC_BASIC0( scalv_ref ) diff --git a/frame/1/kernels/bli_setv_ref.c b/frame/1/kernels/bli_setv_ref.c new file mode 100644 index 000000000..f01364339 --- /dev/null +++ b/frame/1/kernels/bli_setv_ref.c @@ -0,0 +1,80 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype alpha_conj; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ +\ + if ( PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,set0s)( *chi1 ); \ +\ + chi1 += incx; \ + } \ + } \ + else \ + { \ + PASTEMAC(ch,copycjs)( conjalpha, *alpha, alpha_conj ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,copys)( alpha_conj, *chi1 ); \ +\ + chi1 += incx; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( setv_ref ) + diff --git a/frame/1/kernels/bli_subv_ref.c b/frame/1/kernels/bli_subv_ref.c new file mode 100644 index 000000000..eca8f36dc --- /dev/null +++ b/frame/1/kernels/bli_subv_ref.c @@ -0,0 +1,81 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ + psi1 = y; \ +\ + if ( bli_is_conj( conjx ) ) \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,subjs)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ + else \ + { \ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,subs)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( subv_ref ) + diff --git a/frame/1/kernels/bli_swapv_ref.c b/frame/1/kernels/bli_swapv_ref.c new file mode 100644 index 000000000..8fe4a4b9a --- /dev/null +++ b/frame/1/kernels/bli_swapv_ref.c @@ -0,0 +1,67 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype* psi1; \ + dim_t i; \ +\ + if ( bli_zero_dim1( n ) ) return; \ +\ + chi1 = x; \ + psi1 = y; \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,swaps)( *chi1, *psi1 ); \ +\ + chi1 += incx; \ + psi1 += incy; \ + } \ +} + +INSERT_GENTFUNC_BASIC0( swapv_ref ) + diff --git a/frame/1/addv/bli_addv_ref.h b/frame/1/kernels/old/bli_addv_ref.h similarity index 84% rename from frame/1/addv/bli_addv_ref.h rename to frame/1/kernels/old/bli_addv_ref.h index 048eb015c..98392eccf 100644 --- a/frame/1/addv/bli_addv_ref.h +++ b/frame/1/kernels/old/bli_addv_ref.h @@ -32,29 +32,17 @@ */ -/* -void bli_addv_ref( obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT2_BASIC( addv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( addv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( addv_ref ) -#endif diff --git a/frame/1/axpyv/bli_axpyv_ref.h b/frame/1/kernels/old/bli_axpyv_ref.h similarity index 81% rename from frame/1/axpyv/bli_axpyv_ref.h rename to frame/1/kernels/old/bli_axpyv_ref.h index f8e600cd0..cf3b08196 100644 --- a/frame/1/axpyv/bli_axpyv_ref.h +++ b/frame/1/kernels/old/bli_axpyv_ref.h @@ -32,32 +32,18 @@ */ -/* -void bli_axpyv_ref( obj_t* alpha, - obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT3 #define GENTPROT3( ctype_a, ctype_x, ctype_y, cha, chx, chy, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ dim_t n, \ - ctype_a* restrict 
alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_a* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT3_BASIC( axpyv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( axpyv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( axpyv_ref ) -#endif - diff --git a/frame/1/copyv/bli_copyv_ref.h b/frame/1/kernels/old/bli_copyv_ref.h similarity index 83% rename from frame/1/copyv/bli_copyv_ref.h rename to frame/1/kernels/old/bli_copyv_ref.h index 7cc4c4259..19d5ec0de 100644 --- a/frame/1/copyv/bli_copyv_ref.h +++ b/frame/1/kernels/old/bli_copyv_ref.h @@ -32,29 +32,17 @@ */ -/* -void bli_copyv_ref( obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT2_BASIC( copyv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( copyv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( copyv_ref ) -#endif diff --git a/frame/1/dotv/bli_dotv_ref.h b/frame/1/kernels/old/bli_dotv_ref.h similarity index 81% rename from frame/1/dotv/bli_dotv_ref.h rename to frame/1/kernels/old/bli_dotv_ref.h index c637215c7..35a1eda5b 100644 --- a/frame/1/dotv/bli_dotv_ref.h +++ b/frame/1/kernels/old/bli_dotv_ref.h @@ -32,33 +32,19 @@ */ -/* -void bli_dotv_ref( obj_t* x, - obj_t* y, - obj_t* rho ); -*/ - #undef GENTPROT3 #define GENTPROT3( ctype_x, ctype_y, ctype_r, chx, chy, chr, varname ) \ \ -void PASTEMAC3(chx,chy,chr,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ conj_t conjy, \ dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict rho \ + ctype_x* x, inc_t incx, \ + 
ctype_y* y, inc_t incy, \ + ctype_r* rho \ ); INSERT_GENTPROT3_BASIC( dotv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( dotv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( dotv_ref ) -#endif - diff --git a/frame/1/dotxv/bli_dotxv_ref.h b/frame/1/kernels/old/bli_dotxv_ref.h similarity index 76% rename from frame/1/dotxv/bli_dotxv_ref.h rename to frame/1/kernels/old/bli_dotxv_ref.h index 560f69bd3..676dcce8e 100644 --- a/frame/1/dotxv/bli_dotxv_ref.h +++ b/frame/1/kernels/old/bli_dotxv_ref.h @@ -32,37 +32,21 @@ */ -/* -void bli_dotxv_ref( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* beta, - obj_t* rho ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, varname ) \ \ -void PASTEMAC3(chx,chy,chr,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ conj_t conjy, \ dim_t n, \ - ctype_xy* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict beta, \ - ctype_r* restrict rho \ + ctype_xy* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_r* beta, \ + ctype_r* rho \ ); INSERT_GENTPROT3U12_BASIC( dotxv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxv_ref ) -#endif - diff --git a/frame/1/invertv/bli_invertv_ref.h b/frame/1/kernels/old/bli_invertv_ref.h similarity index 95% rename from frame/1/invertv/bli_invertv_ref.h rename to frame/1/kernels/old/bli_invertv_ref.h index fa2562131..39796c5aa 100644 --- a/frame/1/invertv/bli_invertv_ref.h +++ b/frame/1/kernels/old/bli_invertv_ref.h @@ -32,10 +32,6 @@ */ -/* -void bli_invertv_ref( obj_t* x ); -*/ - #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ @@ -43,7 +39,7 @@ void bli_invertv_ref( obj_t* x ); void PASTEMAC(ch,varname) \ ( \ dim_t n, \ - ctype* restrict x, inc_t incx \ + ctype* x, 
inc_t incx \ ); INSERT_GENTPROT_BASIC( invertv_ref ) diff --git a/frame/1/scal2v/bli_scal2v_ref.h b/frame/1/kernels/old/bli_scal2v_ref.h similarity index 81% rename from frame/1/scal2v/bli_scal2v_ref.h rename to frame/1/kernels/old/bli_scal2v_ref.h index ef0ca47cd..b33fc94fb 100644 --- a/frame/1/scal2v/bli_scal2v_ref.h +++ b/frame/1/kernels/old/bli_scal2v_ref.h @@ -32,32 +32,18 @@ */ -/* -void bli_scal2v_ref( obj_t* beta, - obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT3 #define GENTPROT3( ctype_b, ctype_x, ctype_y, chb, chx, chy, varname ) \ \ -void PASTEMAC3(chb,chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_b* beta, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT3_BASIC( scal2v_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( scal2v_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( scal2v_ref ) -#endif - diff --git a/frame/1/scalv/bli_scalv_ref.h b/frame/1/kernels/old/bli_scalv_ref.h similarity index 79% rename from frame/1/scalv/bli_scalv_ref.h rename to frame/1/kernels/old/bli_scalv_ref.h index 9e22a56b6..43c79d65c 100644 --- a/frame/1/scalv/bli_scalv_ref.h +++ b/frame/1/kernels/old/bli_scalv_ref.h @@ -32,30 +32,17 @@ */ -/* -void bli_scalv_ref( obj_t* beta, - obj_t* x ); -*/ - #undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, varname ) \ +#define GENTPROT2( ctype_a, ctype_x, cha, chx, varname ) \ \ -void PASTEMAC2(chb,chx,varname) \ +void PASTEMAC(chx,varname) \ ( \ - conj_t conjbeta, \ + conj_t conjalpha, \ dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ + ctype_a* alpha, \ + ctype_x* x, inc_t incx \ ); INSERT_GENTPROT2_BASIC( scalv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( scalv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( 
scalv_ref ) -#endif - diff --git a/frame/1/setv/bli_setv_ref.h b/frame/1/kernels/old/bli_setv_ref.h similarity index 81% rename from frame/1/setv/bli_setv_ref.h rename to frame/1/kernels/old/bli_setv_ref.h index 3ae3a1bb0..54f494307 100644 --- a/frame/1/setv/bli_setv_ref.h +++ b/frame/1/kernels/old/bli_setv_ref.h @@ -32,29 +32,17 @@ */ -/* -void bli_setv_ref( obj_t* beta, - obj_t* x ); -*/ - #undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, varname ) \ +#define GENTPROT2( ctype_a, ctype_x, cha, chx, varname ) \ \ -void PASTEMAC2(chb,chx,varname) \ +void PASTEMAC(chx,varname) \ ( \ + conj_t conjalpha, \ dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ + ctype_a* alpha, \ + ctype_x* x, inc_t incx \ ); INSERT_GENTPROT2_BASIC( setv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( setv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( setv_ref ) -#endif - diff --git a/frame/1/subv/bli_subv_ref.h b/frame/1/kernels/old/bli_subv_ref.h similarity index 84% rename from frame/1/subv/bli_subv_ref.h rename to frame/1/kernels/old/bli_subv_ref.h index 033962169..6c24fda24 100644 --- a/frame/1/subv/bli_subv_ref.h +++ b/frame/1/kernels/old/bli_subv_ref.h @@ -32,29 +32,17 @@ */ -/* -void bli_subv_ref( obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT2_BASIC( subv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( subv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( subv_ref ) -#endif diff --git a/frame/1/swapv/bli_swapv_ref.h b/frame/1/kernels/old/bli_swapv_ref.h similarity index 83% rename from frame/1/swapv/bli_swapv_ref.h rename 
to frame/1/kernels/old/bli_swapv_ref.h index 15b4d6b37..eec6a52e8 100644 --- a/frame/1/swapv/bli_swapv_ref.h +++ b/frame/1/kernels/old/bli_swapv_ref.h @@ -32,28 +32,16 @@ */ -/* -void bli_swapv_ref( obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ \ -void PASTEMAC2(chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT2_BASIC( swapv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( swapv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( swapv_ref ) -#endif diff --git a/frame/1/subv/bli_subv_kernel.c b/frame/1/old/addv/bli_addv.c similarity index 58% rename from frame/1/subv/bli_subv_kernel.c rename to frame/1/old/addv/bli_addv.c index c5d97d54b..81a1ef151 100644 --- a/frame/1/subv/bli_subv_kernel.c +++ b/frame/1/old/addv/bli_addv.c @@ -34,8 +34,6 @@ #include "blis.h" -#define FUNCPTR_T subv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, dim_t n, @@ -43,26 +41,19 @@ typedef void (*FUNCPTR_T)( void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,subv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,subv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,subv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,addv_void); -void bli_subv_kernel( obj_t* x, - obj_t* y ) +// +// Define object-based interface. 
+// +void bli_addv( obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -71,13 +62,12 @@ void bli_subv_kernel( obj_t* x, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; + if ( bli_error_checking_is_enabled() ) + bli_addv_check( x, y ); - // Invoke the function. + // Invoke the void pointer-based function. f( conjx, n, buf_x, inc_x, @@ -85,31 +75,57 @@ void bli_subv_kernel( obj_t* x, } -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC2(chx,chy,kername)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ + PASTEMAC(ch,kername)( conjx, \ + n, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( subv_kernel_void, SUBV_KERNEL ) +INSERT_GENTFUNC_BASIC( addv_void, addv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( subv_kernel_void, SUBV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( subv_kernel_void, SUBV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + n, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( addv, BLIS_ADDV_KER ) diff --git a/frame/1/addv/bli_addv.h b/frame/1/old/addv/bli_addv.h similarity index 75% rename from frame/1/addv/bli_addv.h rename to frame/1/old/addv/bli_addv.h index 19f9ddd86..3fbd8337d 100644 --- a/frame/1/addv/bli_addv.h +++ b/frame/1/old/addv/bli_addv.h @@ -33,7 +33,6 @@ */ #include "bli_addv_check.h" -#include "bli_addv_kernel.h" #include "bli_addv_ref.h" @@ -45,7 +44,23 @@ void bli_addv( obj_t* x, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( addv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -59,27 +74,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( addv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( addv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( addv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( addv ) -#endif - diff --git a/frame/1/axpyv/bli_axpyv_kernel.c b/frame/1/old/axpyv/bli_axpyv.c similarity index 52% rename from frame/1/axpyv/bli_axpyv_kernel.c rename to frame/1/old/axpyv/bli_axpyv.c index 70034f778..5fb43d01e 100644 --- a/frame/1/axpyv/bli_axpyv_kernel.c +++ b/frame/1/old/axpyv/bli_axpyv.c @@ -34,8 +34,6 @@ #include "blis.h" -#define FUNCPTR_T axpyv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, dim_t n, @@ -44,27 +42,20 @@ typedef void (*FUNCPTR_T)( void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpyv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpyv_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpyv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,axpyv_void); -void bli_axpyv_kernel( obj_t* alpha, - obj_t* x, - obj_t* y ) +// +// Define object-based interface. 
+// +void bli_axpyv( obj_t* alpha, + obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -73,21 +64,25 @@ void bli_axpyv_kernel( obj_t* alpha, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - num_t dt_alpha; + obj_t alpha_local; void* buf_alpha; - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); + if ( bli_error_checking_is_enabled() ) + bli_axpyv_check( alpha, x, y ); - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_alpha][dt_x][dt_y]; + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); - // Invoke the function. + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. f( conjx, n, buf_alpha, @@ -96,33 +91,62 @@ void bli_axpyv_kernel( obj_t* alpha, } -#undef GENTFUNC3 -#define GENTFUNC3( ctype_a, ctype_x, ctype_y, cha, chx, chy, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC3(cha,chx,chy,kername)( conjx, \ - n, \ - alpha, \ - x, incx, \ - y, incy ); \ + PASTEMAC(ch,kername)( conjx, \ + n, \ + alpha, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3_BASIC( axpyv_kernel_void, AXPYV_KERNEL ) +INSERT_GENTFUNC_BASIC( axpyv_void, axpyv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( axpyv_kernel_void, AXPYV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( axpyv_kernel_void, AXPYV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + n, \ + alpha, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpyv, BLIS_AXPYV_KER ) + diff --git a/frame/1/axpyv/bli_axpyv.h b/frame/1/old/axpyv/bli_axpyv.h similarity index 73% rename from frame/1/axpyv/bli_axpyv.h rename to frame/1/old/axpyv/bli_axpyv.h index e8b644db2..a5a0cc556 100644 --- a/frame/1/axpyv/bli_axpyv.h +++ b/frame/1/old/axpyv/bli_axpyv.h @@ -33,7 +33,6 @@ */ #include "bli_axpyv_check.h" -#include "bli_axpyv_kernel.h" #include "bli_axpyv_ref.h" @@ -46,7 +45,24 @@ void bli_axpyv( obj_t* alpha, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( axpyv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -61,28 +77,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( axpyv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3 -#define GENTPROT3( ctype_a, ctype_x, ctype_y, cha, chx, chy, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_a* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3_BASIC( axpyv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( axpyv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( axpyv ) -#endif - diff --git a/frame/1/addv/bli_addv_check.c b/frame/1/old/check/bli_addv_check.c similarity index 100% rename from frame/1/addv/bli_addv_check.c rename to frame/1/old/check/bli_addv_check.c diff --git a/frame/1/addv/bli_addv_check.h b/frame/1/old/check/bli_addv_check.h similarity index 100% rename from frame/1/addv/bli_addv_check.h rename to frame/1/old/check/bli_addv_check.h diff --git a/frame/1/axpyv/bli_axpyv_check.c b/frame/1/old/check/bli_axpyv_check.c similarity index 100% rename from frame/1/axpyv/bli_axpyv_check.c rename to frame/1/old/check/bli_axpyv_check.c diff --git a/frame/1/axpyv/bli_axpyv_check.h b/frame/1/old/check/bli_axpyv_check.h similarity index 100% rename from frame/1/axpyv/bli_axpyv_check.h rename to frame/1/old/check/bli_axpyv_check.h diff --git a/frame/1/copyv/bli_copyv_check.c b/frame/1/old/check/bli_copyv_check.c similarity index 100% rename from frame/1/copyv/bli_copyv_check.c rename to frame/1/old/check/bli_copyv_check.c diff --git a/frame/1/copyv/bli_copyv_check.h b/frame/1/old/check/bli_copyv_check.h similarity index 100% rename from frame/1/copyv/bli_copyv_check.h rename to frame/1/old/check/bli_copyv_check.h diff --git a/frame/1/dotv/bli_dotv_check.c b/frame/1/old/check/bli_dotv_check.c similarity index 100% rename from frame/1/dotv/bli_dotv_check.c rename to frame/1/old/check/bli_dotv_check.c diff --git a/frame/1/dotv/bli_dotv_check.h b/frame/1/old/check/bli_dotv_check.h similarity index 100% rename from frame/1/dotv/bli_dotv_check.h rename to frame/1/old/check/bli_dotv_check.h diff 
--git a/frame/1/dotxv/bli_dotxv_check.c b/frame/1/old/check/bli_dotxv_check.c similarity index 100% rename from frame/1/dotxv/bli_dotxv_check.c rename to frame/1/old/check/bli_dotxv_check.c diff --git a/frame/1/dotxv/bli_dotxv_check.h b/frame/1/old/check/bli_dotxv_check.h similarity index 100% rename from frame/1/dotxv/bli_dotxv_check.h rename to frame/1/old/check/bli_dotxv_check.h diff --git a/frame/1/invertv/bli_invertv_check.c b/frame/1/old/check/bli_invertv_check.c similarity index 100% rename from frame/1/invertv/bli_invertv_check.c rename to frame/1/old/check/bli_invertv_check.c diff --git a/frame/1/invertv/bli_invertv_check.h b/frame/1/old/check/bli_invertv_check.h similarity index 100% rename from frame/1/invertv/bli_invertv_check.h rename to frame/1/old/check/bli_invertv_check.h diff --git a/frame/1/scal2v/bli_scal2v_check.c b/frame/1/old/check/bli_scal2v_check.c similarity index 100% rename from frame/1/scal2v/bli_scal2v_check.c rename to frame/1/old/check/bli_scal2v_check.c diff --git a/frame/1/scal2v/bli_scal2v_check.h b/frame/1/old/check/bli_scal2v_check.h similarity index 100% rename from frame/1/scal2v/bli_scal2v_check.h rename to frame/1/old/check/bli_scal2v_check.h diff --git a/frame/1/scalv/bli_scalv_check.c b/frame/1/old/check/bli_scalv_check.c similarity index 100% rename from frame/1/scalv/bli_scalv_check.c rename to frame/1/old/check/bli_scalv_check.c diff --git a/frame/1/scalv/bli_scalv_check.h b/frame/1/old/check/bli_scalv_check.h similarity index 100% rename from frame/1/scalv/bli_scalv_check.h rename to frame/1/old/check/bli_scalv_check.h diff --git a/frame/1/setv/bli_setv_check.c b/frame/1/old/check/bli_setv_check.c similarity index 100% rename from frame/1/setv/bli_setv_check.c rename to frame/1/old/check/bli_setv_check.c diff --git a/frame/1/setv/bli_setv_check.h b/frame/1/old/check/bli_setv_check.h similarity index 100% rename from frame/1/setv/bli_setv_check.h rename to frame/1/old/check/bli_setv_check.h diff --git 
a/frame/1/subv/bli_subv_check.c b/frame/1/old/check/bli_subv_check.c similarity index 100% rename from frame/1/subv/bli_subv_check.c rename to frame/1/old/check/bli_subv_check.c diff --git a/frame/1/subv/bli_subv_check.h b/frame/1/old/check/bli_subv_check.h similarity index 100% rename from frame/1/subv/bli_subv_check.h rename to frame/1/old/check/bli_subv_check.h diff --git a/frame/1/swapv/bli_swapv_check.c b/frame/1/old/check/bli_swapv_check.c similarity index 100% rename from frame/1/swapv/bli_swapv_check.c rename to frame/1/old/check/bli_swapv_check.c diff --git a/frame/1/swapv/bli_swapv_check.h b/frame/1/old/check/bli_swapv_check.h similarity index 100% rename from frame/1/swapv/bli_swapv_check.h rename to frame/1/old/check/bli_swapv_check.h diff --git a/frame/1/addv/bli_addv_kernel.c b/frame/1/old/copyv/bli_copyv.c similarity index 58% rename from frame/1/addv/bli_addv_kernel.c rename to frame/1/old/copyv/bli_copyv.c index a863284d8..ca2bdee47 100644 --- a/frame/1/addv/bli_addv_kernel.c +++ b/frame/1/old/copyv/bli_copyv.c @@ -34,8 +34,6 @@ #include "blis.h" -#define FUNCPTR_T addv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, dim_t n, @@ -43,26 +41,19 @@ typedef void (*FUNCPTR_T)( void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,addv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,addv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,addv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,copyv_void); -void bli_addv_kernel( obj_t* x, - obj_t* y ) +// +// Define object-based interface. 
+// +void bli_copyv( obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -71,13 +62,12 @@ void bli_addv_kernel( obj_t* x, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; + if ( bli_error_checking_is_enabled() ) + bli_copyv_check( x, y ); - // Invoke the function. + // Invoke the void pointer-based function. f( conjx, n, buf_x, inc_x, @@ -85,31 +75,57 @@ void bli_addv_kernel( obj_t* x, } -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC2(chx,chy,kername)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ + PASTEMAC(ch,kername)( conjx, \ + n, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( addv_kernel_void, ADDV_KERNEL ) +INSERT_GENTFUNC_BASIC( copyv_void, copyv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( addv_kernel_void, ADDV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( addv_kernel_void, ADDV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + n, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( copyv, BLIS_COPYV_KER ) diff --git a/frame/1/copyv/bli_copyv.h b/frame/1/old/copyv/bli_copyv.h similarity index 75% rename from frame/1/copyv/bli_copyv.h rename to frame/1/old/copyv/bli_copyv.h index b664fb55e..de193a5b9 100644 --- a/frame/1/copyv/bli_copyv.h +++ b/frame/1/old/copyv/bli_copyv.h @@ -33,7 +33,6 @@ */ #include "bli_copyv_check.h" -#include "bli_copyv_kernel.h" #include "bli_copyv_ref.h" @@ -45,7 +44,23 @@ void bli_copyv( obj_t* x, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( copyv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -59,27 +74,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( copyv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( copyv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( copyv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( copyv ) -#endif - diff --git a/frame/1/dotv/bli_dotv_kernel.c b/frame/1/old/dotv/bli_dotv.c similarity index 55% rename from frame/1/dotv/bli_dotv_kernel.c rename to frame/1/old/dotv/bli_dotv.c index 8f0970c08..f45401385 100644 --- a/frame/1/dotv/bli_dotv_kernel.c +++ b/frame/1/old/dotv/bli_dotv.c @@ -34,8 +34,6 @@ #include "blis.h" -#define FUNCPTR_T dotv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, conj_t conjy, @@ -45,29 +43,21 @@ typedef void (*FUNCPTR_T)( void* rho ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotv_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,dotv_void); -void bli_dotv_kernel( obj_t* x, - obj_t* y, - obj_t* rho ) +// +// Define object-based interface. 
+// +void bli_dotv( obj_t* x, + obj_t* y, + obj_t* rho ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_rho = bli_obj_datatype( *rho ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); conj_t conjy = bli_obj_conj_status( *y ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -78,51 +68,80 @@ void bli_dotv_kernel( obj_t* x, void* buf_rho = bli_obj_buffer_at_off( *rho ); - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_rho]; + if ( bli_error_checking_is_enabled() ) + bli_dotv_check( x, y, rho ); - // Invoke the function. + // Invoke the void pointer-based function. f( conjx, - conjy, n, - buf_x, inc_x, + buf_x, inc_x, buf_y, inc_y, buf_rho ); } -#undef GENTFUNC3 -#define GENTFUNC3( ctype_x, ctype_y, ctype_r, chx, chy, chr, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC3(chx,chy,chr,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* rho \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* rho \ + ) \ { \ - PASTEMAC3(chx,chy,chr,kername)( conjx, \ - conjy, \ - n, \ - x, incx, \ - y, incy, \ - rho ); \ + PASTEMAC(ch,kername)( conjx, \ + conjy, \ + n, \ + x, incx, \ + y, incy, \ + rho ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3_BASIC( dotv_kernel_void, DOTV_KERNEL ) +INSERT_GENTFUNC_BASIC( dotv_void, dotv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( dotv_kernel_void, DOTV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( dotv_kernel_void, DOTV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + conjy, \ + n, \ + x, incx, \ + y, incy, \ + rho ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotv, BLIS_DOTV_KER ) + diff --git a/frame/1/dotv/bli_dotv.h b/frame/1/old/dotv/bli_dotv.h similarity index 72% rename from frame/1/dotv/bli_dotv.h rename to frame/1/old/dotv/bli_dotv.h index ddc22967a..816a7ee4c 100644 --- a/frame/1/dotv/bli_dotv.h +++ b/frame/1/old/dotv/bli_dotv.h @@ -33,7 +33,6 @@ */ #include "bli_dotv_check.h" -#include "bli_dotv_kernel.h" #include "bli_dotv_ref.h" @@ -46,7 +45,25 @@ void bli_dotv( obj_t* x, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* rho \ + ); + +INSERT_GENTPROT_BASIC( dotv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
// #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -62,31 +79,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( dotv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3 -#define GENTPROT3( ctype_x, ctype_y, ctype_r, chx, chy, chr, opname ) \ -\ -void PASTEMAC3(chx,chy,chr,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_r* rho \ - ); - - -INSERT_GENTPROT3_BASIC( dotv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( dotv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( dotv ) -#endif - diff --git a/frame/1/old/dotxv/bli_dotxv.c b/frame/1/old/dotxv/bli_dotxv.c new file mode 100644 index 000000000..70fd55776 --- /dev/null +++ b/frame/1/old/dotxv/bli_dotxv.c @@ -0,0 +1,182 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjx, + conj_t conjy, + dim_t n, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* beta, + void* rho + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,dotxv_void); + + +// +// Define object-based interface. +// +void bli_dotxv( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* beta, + obj_t* rho ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t n = bli_obj_vector_dim( *x ); + + inc_t inc_x = bli_obj_vector_inc( *x ); + void* buf_x = bli_obj_buffer_at_off( *x ); + + inc_t inc_y = bli_obj_vector_inc( *y ); + void* buf_y = bli_obj_buffer_at_off( *y ); + + void* buf_rho = bli_obj_buffer_at_off( *rho ); + + obj_t alpha_local; + void* buf_alpha; + + obj_t beta_local; + void* buf_beta; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_dotxv_check( alpha, x, y, beta, rho ); + + // Create local copy-casts of the scalars (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + beta, + &beta_local ); + + // Extract the scalar buffers. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); + + // Invoke the void pointer-based function. 
+ f( conjx, + n, + buf_alpha, + buf_x, inc_x, + buf_y, inc_y, + buf_beta, + buf_rho ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* beta, \ + void* rho \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + beta, \ + rho ); \ +} + +INSERT_GENTFUNC_BASIC( dotxv_void, dotxv ) + + +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* beta, \ + ctype* rho \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + beta, \ + rho ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxv, BLIS_DOTXV_KER ) + + diff --git a/frame/1/dotxv/bli_dotxv.h b/frame/1/old/dotxv/bli_dotxv.h similarity index 70% rename from frame/1/dotxv/bli_dotxv.h rename to frame/1/old/dotxv/bli_dotxv.h index f48efbfdd..6a32154b4 100644 --- a/frame/1/dotxv/bli_dotxv.h +++ b/frame/1/old/dotxv/bli_dotxv.h @@ -33,7 +33,6 @@ */ #include "bli_dotxv_check.h" -#include "bli_dotxv_kernel.h" #include "bli_dotxv_ref.h" @@ -48,7 +47,27 @@ void bli_dotxv( obj_t* alpha, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. 
+// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* beta, \ + void* rho \ + ); + +INSERT_GENTPROT_BASIC( dotxv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -66,32 +85,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( dotxv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,chr,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_r* beta, \ - ctype_r* rho \ - ); - - -INSERT_GENTPROT3U12_BASIC( dotxv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxv ) -#endif - diff --git a/frame/1/old/invertv/bli_invertv.c b/frame/1/old/invertv/bli_invertv.c new file mode 100644 index 000000000..151cf2a88 --- /dev/null +++ b/frame/1/old/invertv/bli_invertv.c @@ -0,0 +1,114 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + dim_t n, + void* x, inc_t incx + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,invertv_void); + + +// +// Define object-based interface. +// +void bli_invertv( obj_t* x ) +{ + num_t dt = bli_obj_datatype( *x ); + + dim_t n = bli_obj_vector_dim( *x ); + + inc_t inc_x = bli_obj_vector_inc( *x ); + void* buf_x = bli_obj_buffer_at_off( *x ); + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_invertv_check( x ); + + // Invoke the void pointer-based function. + f( n, + buf_x, inc_x ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + void* x, inc_t incx \ + ) \ +{ \ + PASTEMAC(ch,kername)( n, \ + x, incx ); \ +} + +INSERT_GENTFUNC_BASIC( invertv_void, invertv ) + + +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + ctype* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( n, \ + x, incx ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( invertv, BLIS_INVERTV_KER ) + + diff --git a/frame/1/invertv/bli_invertv.h b/frame/1/old/invertv/bli_invertv.h similarity index 82% rename from frame/1/invertv/bli_invertv.h rename to frame/1/old/invertv/bli_invertv.h index 771ae10f1..a62be93eb 100644 --- a/frame/1/invertv/bli_invertv.h +++ b/frame/1/old/invertv/bli_invertv.h @@ -33,7 +33,6 @@ */ #include "bli_invertv_check.h" -#include "bli_invertv_kernel.h" #include "bli_invertv_ref.h" @@ -44,14 +43,28 @@ void bli_invertv( obj_t* x ); // -// Prototype BLAS-like interfaces. +// Prototype BLAS-like interfaces with void pointer operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ dim_t n, \ - ctype* x, inc_t incx \ + void* x, inc_t incx \ + ); + +INSERT_GENTPROT_BASIC( invertv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + ctype* x, inc_t incx \ ); INSERT_GENTPROT_BASIC( invertv ) diff --git a/frame/1/copyv/bli_copyv_ref.c b/frame/1/old/scal2v/bli_scal2v.c similarity index 51% rename from frame/1/copyv/bli_copyv_ref.c rename to frame/1/old/scal2v/bli_scal2v.c index abff268dc..826c2beeb 100644 --- a/frame/1/copyv/bli_copyv_ref.c +++ b/frame/1/old/scal2v/bli_scal2v.c @@ -34,36 +34,28 @@ #include "blis.h" -/* -#define FUNCPTR_T copyv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, dim_t n, + void* alpha, void* x, inc_t incx, void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,copyv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,copyv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,copyv_ref); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,scal2v_void); -void bli_copyv_ref( obj_t* x, - obj_t* y ) +// +// Define object-based interface. +// +void bli_scal2v( obj_t* alpha, + obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -72,74 +64,89 @@ void bli_copyv_ref( obj_t* x, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - FUNCPTR_T f; + obj_t alpha_local; + void* buf_alpha; - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; + FUNCPTR_T f = ftypes[dt]; - // Invoke the function. + if ( bli_error_checking_is_enabled() ) + bli_scal2v_check( alpha, x, y ); + + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). 
+ bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. f( conjx, n, + buf_alpha, buf_x, inc_x, buf_y, inc_y ); } -*/ -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC2(chx,chy,varname) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - if ( bli_is_conj( conjx ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,copyjs)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,copys)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ + PASTEMAC(ch,kername)( conjx, \ + n, \ + alpha, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC0( copyv_ref ) +INSERT_GENTFUNC_BASIC( scal2v_void, scal2v ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D0( copyv_ref ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P0( copyv_ref ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + n, \ + alpha, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( scal2v, BLIS_SCAL2V_KER ) + diff --git a/frame/1/scal2v/bli_scal2v.h b/frame/1/old/scal2v/bli_scal2v.h similarity index 71% rename from frame/1/scal2v/bli_scal2v.h rename to frame/1/old/scal2v/bli_scal2v.h index 38c4c9a69..5ae52ace1 100644 --- a/frame/1/scal2v/bli_scal2v.h +++ b/frame/1/old/scal2v/bli_scal2v.h @@ -33,20 +33,19 @@ */ #include "bli_scal2v_check.h" -#include "bli_scal2v_kernel.h" #include "bli_scal2v_ref.h" // // Prototype object-based interface. // -void bli_scal2v( obj_t* beta, +void bli_scal2v( obj_t* alpha, obj_t* x, obj_t* y ); // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -54,35 +53,27 @@ void bli_scal2v( obj_t* beta, void PASTEMAC(ch,opname)( \ conj_t conjx, \ dim_t n, \ - ctype* beta, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( scal2v_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* alpha, \ ctype* x, inc_t incx, \ ctype* y, inc_t incy \ ); INSERT_GENTPROT_BASIC( scal2v ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3 -#define GENTPROT3( ctype_b, ctype_x, ctype_y, chb, chx, chy, opname ) \ -\ -void PASTEMAC3(chb,chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3_BASIC( scal2v ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( scal2v ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( scal2v ) -#endif - diff --git a/frame/1/old/scalv/bli_scalv.c b/frame/1/old/scalv/bli_scalv.c new file mode 100644 index 000000000..de6993204 --- /dev/null +++ b/frame/1/old/scalv/bli_scalv.c @@ -0,0 +1,140 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjalpha, + dim_t n, + void* alpha, + void* x, inc_t incx + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,scalv_void); + + +// +// Define object-based interface. +// +void bli_scalv( obj_t* alpha, + obj_t* x ) +{ + num_t dt = bli_obj_datatype( *x ); + + dim_t n = bli_obj_vector_dim( *x ); + + inc_t inc_x = bli_obj_vector_inc( *x ); + void* buf_x = bli_obj_buffer_at_off( *x ); + + obj_t alpha_local; + void* buf_alpha; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_scalv_check( alpha, x ); + + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. + f( BLIS_NO_CONJUGATE, // conjugation applied during copy-cast. + n, + buf_alpha, + buf_x, inc_x ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjalpha, \ + n, \ + alpha, \ + x, incx ); \ +} + +INSERT_GENTFUNC_BASIC( scalv_void, scalv ) + + +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjalpha, \ + n, \ + alpha, \ + x, incx ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( scalv, BLIS_SCALV_KER ) + + diff --git a/frame/1/scalv/bli_scalv.h b/frame/1/old/scalv/bli_scalv.h similarity index 68% rename from frame/1/scalv/bli_scalv.h rename to frame/1/old/scalv/bli_scalv.h index db48a3cad..ad08a72cd 100644 --- a/frame/1/scalv/bli_scalv.h +++ b/frame/1/old/scalv/bli_scalv.h @@ -32,57 +32,45 @@ */ -#include "bli_scalv_cntl.h" #include "bli_scalv_check.h" -#include "bli_scalv_int.h" - -#include "bli_scalv_kernel.h" #include "bli_scalv_ref.h" // // Prototype object-based interface. // -void bli_scalv( obj_t* beta, +void bli_scalv( obj_t* alpha, obj_t* x ); // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ - conj_t conjbeta, \ + conj_t conjalpha, \ dim_t n, \ - ctype* beta, \ - ctype* x, inc_t incx \ + void* alpha, \ + void* x, inc_t incx \ + ); + +INSERT_GENTPROT_BASIC( scalv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx \ ); INSERT_GENTPROT_BASIC( scalv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \ -\ -void PASTEMAC2(chb,chx,opname)( \ - conj_t conjbeta, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx \ - ); - -INSERT_GENTPROT2_BASIC( scalv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( scalv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( scalv ) -#endif - diff --git a/frame/1/old/setv/bli_setv.c b/frame/1/old/setv/bli_setv.c new file mode 100644 index 000000000..de36559fe --- /dev/null +++ b/frame/1/old/setv/bli_setv.c @@ -0,0 +1,140 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjalpha, + dim_t n, + void* alpha, + void* x, inc_t incx + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,setv_void); + + +// +// Define object-based interface. +// +void bli_setv( obj_t* alpha, + obj_t* x ) +{ + num_t dt = bli_obj_datatype( *x ); + + dim_t n = bli_obj_vector_dim( *x ); + + inc_t inc_x = bli_obj_vector_inc( *x ); + void* buf_x = bli_obj_buffer_at_off( *x ); + + obj_t alpha_local; + void* buf_alpha; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_setv_check( alpha, x ); + + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. + f( BLIS_NO_CONJUGATE, // conjugation applied during copy-cast. + n, + buf_alpha, + buf_x, inc_x ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjalpha, \ + n, \ + alpha, \ + x, incx ); \ +} + +INSERT_GENTFUNC_BASIC( setv_void, setv ) + + +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjalpha, \ + n, \ + alpha, \ + x, incx ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( setv, BLIS_SCALV_KER ) + + diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk.h b/frame/1/old/setv/bli_setv.h similarity index 65% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk.h rename to frame/1/old/setv/bli_setv.h index bf1354703..89d9a2b55 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk.h +++ b/frame/1/old/setv/bli_setv.h @@ -32,25 +32,45 @@ */ +#include "bli_setv_check.h" +#include "bli_setv_ref.h" + + +// +// Prototype object-based interface. +// +void bli_setv( obj_t* alpha, + obj_t* x ); + + +// +// Prototype BLAS-like interfaces with void pointer operands. +// #undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ +#define GENTPROT( ctype, ch, opname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ); +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx \ + ); -INSERT_GENTPROT_BASIC( packm_ref_2xk ) -INSERT_GENTPROT_BASIC( packm_ref_3xk ) -INSERT_GENTPROT_BASIC( packm_ref_4xk ) -INSERT_GENTPROT_BASIC( packm_ref_6xk ) -INSERT_GENTPROT_BASIC( packm_ref_8xk ) -INSERT_GENTPROT_BASIC( packm_ref_10xk ) -INSERT_GENTPROT_BASIC( packm_ref_12xk ) -INSERT_GENTPROT_BASIC( packm_ref_14xk ) -INSERT_GENTPROT_BASIC( packm_ref_16xk ) -INSERT_GENTPROT_BASIC( packm_ref_30xk ) +INSERT_GENTPROT_BASIC( setv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx \ + ); + +INSERT_GENTPROT_BASIC( setv ) diff --git a/frame/1/setv/old/bli_setv_unb_var2.c b/frame/1/old/setv/old/bli_setv_unb_var2.c similarity index 100% rename from frame/1/setv/old/bli_setv_unb_var2.c rename to frame/1/old/setv/old/bli_setv_unb_var2.c diff --git a/frame/1/setv/old/bli_setv_unb_var2.h b/frame/1/old/setv/old/bli_setv_unb_var2.h similarity index 100% rename from frame/1/setv/old/bli_setv_unb_var2.h rename to frame/1/old/setv/old/bli_setv_unb_var2.h diff --git a/frame/1/copyv/bli_copyv_kernel.c b/frame/1/old/subv/bli_subv.c similarity index 58% rename from frame/1/copyv/bli_copyv_kernel.c rename to frame/1/old/subv/bli_subv.c index aab8c0604..734f7a7ab 100644 --- a/frame/1/copyv/bli_copyv_kernel.c +++ b/frame/1/old/subv/bli_subv.c @@ -34,8 +34,6 @@ #include "blis.h" -#define FUNCPTR_T copyv_fp - typedef void (*FUNCPTR_T)( conj_t conjx, dim_t n, @@ -43,26 +41,19 @@ typedef void (*FUNCPTR_T)( void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,copyv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,copyv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,copyv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,subv_void); -void bli_copyv_kernel( obj_t* x, - obj_t* y ) +// +// Define object-based interface. 
+// +void bli_subv( obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); conj_t conjx = bli_obj_conj_status( *x ); + dim_t n = bli_obj_vector_dim( *x ); inc_t inc_x = bli_obj_vector_inc( *x ); @@ -71,13 +62,12 @@ void bli_copyv_kernel( obj_t* x, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; + if ( bli_error_checking_is_enabled() ) + bli_subv_check( x, y ); - // Invoke the function. + // Invoke the void pointer-based function. f( conjx, n, buf_x, inc_x, @@ -85,31 +75,57 @@ void bli_copyv_kernel( obj_t* x, } -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC2(chx,chy,kername)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ + PASTEMAC(ch,kername)( conjx, \ + n, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( copyv_kernel_void, COPYV_KERNEL ) +INSERT_GENTFUNC_BASIC( subv_void, subv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( copyv_kernel_void, COPYV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( copyv_kernel_void, COPYV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + n, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( subv, BLIS_SUBV_KER ) diff --git a/frame/1/subv/bli_subv.h b/frame/1/old/subv/bli_subv.h similarity index 75% rename from frame/1/subv/bli_subv.h rename to frame/1/old/subv/bli_subv.h index 5a287d3ff..8c16752b7 100644 --- a/frame/1/subv/bli_subv.h +++ b/frame/1/old/subv/bli_subv.h @@ -33,7 +33,6 @@ */ #include "bli_subv_check.h" -#include "bli_subv_kernel.h" #include "bli_subv_ref.h" @@ -45,7 +44,23 @@ void bli_subv( obj_t* x, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( subv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -59,27 +74,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( subv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( subv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( subv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( subv ) -#endif - diff --git a/frame/1/swapv/bli_swapv_kernel.c b/frame/1/old/swapv/bli_swapv.c similarity index 58% rename from frame/1/swapv/bli_swapv_kernel.c rename to frame/1/old/swapv/bli_swapv.c index f8fe41b69..577d1fe9f 100644 --- a/frame/1/swapv/bli_swapv_kernel.c +++ b/frame/1/old/swapv/bli_swapv.c @@ -34,32 +34,22 @@ #include "blis.h" -#define FUNCPTR_T swapv_fp - typedef void (*FUNCPTR_T)( dim_t n, void* x, inc_t incx, void* y, inc_t incy ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,swapv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,swapv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,swapv_kernel_void); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,swapv_void); -void bli_swapv_kernel( obj_t* x, - obj_t* y ) +// +// Define object-based interface. +// +void bli_swapv( obj_t* x, + obj_t* y ) { - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); + num_t dt = bli_obj_datatype( *x ); dim_t n = bli_obj_vector_dim( *x ); @@ -69,42 +59,65 @@ void bli_swapv_kernel( obj_t* x, inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - FUNCPTR_T f; + FUNCPTR_T f = ftypes[dt]; - // Index into the type combination array to extract the correct - // function pointer. 
- f = ftypes[dt_x][dt_y]; + if ( bli_error_checking_is_enabled() ) + bli_swapv_check( x, y ); - // Invoke the function. + // Invoke the void pointer-based function. f( n, buf_x, inc_x, buf_y, inc_y ); } -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname, kername ) \ +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ \ -void PASTEMAC2(chx,chy,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC2(chx,chy,kername)( n, \ - x, incx, \ - y, incy ); \ + PASTEMAC(ch,kername)( n, \ + x, incx, \ + y, incy ); \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( swapv_kernel_void, SWAPV_KERNEL ) +INSERT_GENTFUNC_BASIC( swapv_void, swapv ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( swapv_kernel_void, SWAPV_KERNEL ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( swapv_kernel_void, SWAPV_KERNEL ) -#endif +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1v_ker_dt( dt, kerid, &cntx ); \ +\ + f( n, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( swapv, BLIS_SWAPV_KER ) diff --git a/frame/1/swapv/bli_swapv.h b/frame/1/old/swapv/bli_swapv.h similarity index 76% rename from frame/1/swapv/bli_swapv.h rename to frame/1/old/swapv/bli_swapv.h index c73118aec..2fe9ce438 100644 --- a/frame/1/swapv/bli_swapv.h +++ b/frame/1/old/swapv/bli_swapv.h @@ -33,7 +33,6 @@ */ #include "bli_swapv_check.h" -#include "bli_swapv_kernel.h" #include "bli_swapv_ref.h" @@ -45,7 +44,22 @@ void bli_swapv( obj_t* x, // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( swapv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -58,26 +72,3 @@ void PASTEMAC(ch,opname)( \ INSERT_GENTPROT_BASIC( swapv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( swapv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( swapv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( swapv ) -#endif - diff --git a/frame/1/packv/bli_packv_check.c b/frame/1/packv/bli_packv_check.c index 8737b8bbb..930ba84d0 100644 --- a/frame/1/packv/bli_packv_check.c +++ b/frame/1/packv/bli_packv_check.c @@ -34,9 +34,12 @@ #include "blis.h" -void bli_packv_check( obj_t* c, - obj_t* p, - packv_t* cntl ) +void bli_packv_check + ( + obj_t* c, + obj_t* p, + cntx_t* cntx + ) { err_t e_val; @@ -50,11 +53,5 @@ void bli_packv_check( obj_t* c, // We don't check for conformal dimensions between c and p because // p has not yet been initialized. - // Check control tree pointer - - // NOTE: We can't check the control tree until we stop interpreting a - // NULL value (in bli_packv_int()) as a request to skip the operation. 
- //e_val = bli_check_valid_cntl( ( void* )cntl ); - //bli_check_error_code( e_val ); } diff --git a/frame/1/packv/bli_packv_check.h b/frame/1/packv/bli_packv_check.h index fd7d7c850..42674a4fa 100644 --- a/frame/1/packv/bli_packv_check.h +++ b/frame/1/packv/bli_packv_check.h @@ -32,6 +32,9 @@ */ -void bli_packv_check( obj_t* c, - obj_t* p, - packv_t* cntl ); +void bli_packv_check + ( + obj_t* c, + obj_t* p, + cntx_t* cntx + ); diff --git a/frame/1/packv/bli_packv_cntl.c b/frame/1/packv/bli_packv_cntl.c index dbc353b43..cb1404ee9 100644 --- a/frame/1/packv/bli_packv_cntl.c +++ b/frame/1/packv/bli_packv_cntl.c @@ -36,32 +36,23 @@ packv_t* packv_cntl; -blksz_t* packv_mult_dim; - -void bli_packv_cntl_init() +void bli_packv_cntl_init( void ) { - packv_mult_dim = bli_blksz_obj_create( BLIS_DEFAULT_VR_S, 0, - BLIS_DEFAULT_VR_D, 0, - BLIS_DEFAULT_VR_C, 0, - BLIS_DEFAULT_VR_Z, 0 ); - packv_cntl = bli_packv_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT1, - packv_mult_dim, + BLIS_VF, BLIS_PACKED_VECTOR ); } -void bli_packv_cntl_finalize() +void bli_packv_cntl_finalize( void ) { bli_cntl_obj_free( packv_cntl ); - - bli_blksz_obj_free( packv_mult_dim ); } -packv_t* bli_packv_cntl_obj_create( impl_t impl_type, - varnum_t var_num, - blksz_t* mult_dim, - pack_t pack_schema ) +packv_t* bli_packv_cntl_obj_create( impl_t impl_type, + varnum_t var_num, + bszid_t bmid, + pack_t pack_schema ) { packv_t* cntl; @@ -69,7 +60,7 @@ packv_t* bli_packv_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->mult_dim = mult_dim; + cntl->bmid = bmid; cntl->pack_schema = pack_schema; return cntl; @@ -78,12 +69,12 @@ packv_t* bli_packv_cntl_obj_create( impl_t impl_type, void bli_packv_cntl_obj_init( packv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* mult_dim, + bszid_t bmid, pack_t pack_schema ) { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->mult_dim = mult_dim; + cntl->bmid = bmid; cntl->pack_schema = pack_schema; } diff --git 
a/frame/1/packv/bli_packv_cntl.h b/frame/1/packv/bli_packv_cntl.h index 2cac974fb..30259b424 100644 --- a/frame/1/packv/bli_packv_cntl.h +++ b/frame/1/packv/bli_packv_cntl.h @@ -36,12 +36,12 @@ struct packv_s { impl_t impl_type; varnum_t var_num; + bszid_t bmid; pack_t pack_schema; - blksz_t* mult_dim; }; typedef struct packv_s packv_t; -#define cntl_mult_dim( cntl ) cntl->mult_dim +#define cntl_bmid( cntl ) cntl->bmid #define cntl_sub_packv( cntl ) cntl->sub_packv #define cntl_sub_packv_x( cntl ) cntl->sub_packv_x @@ -53,11 +53,11 @@ void bli_packv_cntl_init( void ); void bli_packv_cntl_finalize( void ); packv_t* bli_packv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* mult_dim, + bszid_t bmid, pack_t pack_schema ); void bli_packv_cntl_obj_init( packv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* mult_dim, + bszid_t bmid, pack_t pack_schema ); diff --git a/frame/1/packv/bli_packv_init.c b/frame/1/packv/bli_packv_init.c index 8af926615..b8d176b86 100644 --- a/frame/1/packv/bli_packv_init.c +++ b/frame/1/packv/bli_packv_init.c @@ -34,9 +34,13 @@ #include "blis.h" -void bli_packv_init( obj_t* a, - obj_t* p, - packv_t* cntl ) +void bli_packv_init + ( + obj_t* a, + obj_t* p, + cntx_t* cntx, + packv_t* cntl + ) { // The purpose of packm_init() is to initialize an object P so that // a source object A can be packed into P via one of the packv @@ -45,12 +49,12 @@ void bli_packv_init( obj_t* a, // has not already been allocated previously. pack_t pack_schema; - blksz_t* mult_m; + bszid_t bmult_id; obj_t c; // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_packv_check( a, p, cntl ); + bli_packv_check( a, p, cntx ); // First check if we are to skip this operation because the control tree // is NULL, and if so, simply alias the object to its packed counterpart. @@ -103,26 +107,34 @@ void bli_packv_init( obj_t* a, // explicitly into _init_pack(). 
This allows external code generators // the option of bypassing usage of control trees altogether. pack_schema = cntl_pack_schema( cntl ); - mult_m = cntl_mult_dim( cntl ); + bmult_id = cntl_bmid( cntl ); // Initialize object p for the final packed vector. - bli_packv_init_pack( pack_schema, - mult_m, - &c, - p ); + bli_packv_init_pack + ( + pack_schema, + bmult_id, + &c, + p, + cntx + ); // Now p is ready to be packed. } -void bli_packv_init_pack( pack_t pack_schema, - blksz_t* mult_m, - obj_t* c, - obj_t* p ) +void bli_packv_init_pack + ( + pack_t pack_schema, + bszid_t bmult_id, + obj_t* c, + obj_t* p, + cntx_t* cntx + ) { - num_t datatype = bli_obj_datatype( *c ); - dim_t dim_c = bli_obj_vector_dim( *c ); - dim_t mult_m_dim = bli_blksz_get_def( datatype, mult_m ); + num_t dt = bli_obj_datatype( *c ); + dim_t dim_c = bli_obj_vector_dim( *c ); + dim_t bmult = bli_cntx_get_blksz_def_dt( dt, bmult_id, cntx ); mem_t* mem_p; dim_t m_p_pad; @@ -149,7 +161,7 @@ void bli_packv_init_pack( pack_t pack_schema, mem_p = bli_obj_pack_mem( *p ); // Compute the dimensions padded by the dimension multiples. - m_p_pad = bli_align_dim_to_mult( bli_obj_vector_dim( *p ), mult_m_dim ); + m_p_pad = bli_align_dim_to_mult( bli_obj_vector_dim( *p ), bmult ); // Compute the size of the packed buffer. 
size_p = m_p_pad * 1 * bli_obj_elem_size( *p ); @@ -199,8 +211,11 @@ void bli_packv_init_pack( pack_t pack_schema, } } -void bli_packv_release( obj_t* p, - packv_t* cntl ) +void bli_packv_release + ( + obj_t* p, + packv_t* cntl + ) { if ( !cntl_is_noop( cntl ) ) bli_obj_release_pack( p ); diff --git a/frame/1/packv/bli_packv_init.h b/frame/1/packv/bli_packv_init.h index 3730fe3eb..03d12903c 100644 --- a/frame/1/packv/bli_packv_init.h +++ b/frame/1/packv/bli_packv_init.h @@ -32,17 +32,28 @@ */ -void bli_packv_init( obj_t* a, - obj_t* p, - packv_t* cntl ); +void bli_packv_init + ( + obj_t* a, + obj_t* p, + cntx_t* cntx, + packv_t* cntl + ); -void bli_packv_init_pack( pack_t pack_schema, - blksz_t* mult_m, - obj_t* c, - obj_t* p ); +void bli_packv_init_pack + ( + pack_t pack_schema, + bszid_t bmult_id, + obj_t* c, + obj_t* p, + cntx_t* cntx + ); -void bli_packv_release( obj_t* p, - packv_t* cntl ); +void bli_packv_release + ( + obj_t* p, + packv_t* cntl + ); /* void bli_packv_init_cast( obj_t* a, diff --git a/frame/1/packv/bli_packv_int.c b/frame/1/packv/bli_packv_int.c index 72de4e0c9..4a2cf970f 100644 --- a/frame/1/packv/bli_packv_int.c +++ b/frame/1/packv/bli_packv_int.c @@ -38,6 +38,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* p, + cntx_t* cntx, packv_t* cntl ); static FUNCPTR_T vars[1][3] = @@ -48,6 +49,7 @@ static FUNCPTR_T vars[1][3] = void bli_packv_int( obj_t* a, obj_t* p, + cntx_t* cntx, packv_t* cntl ) { // The packv operation consists of an optional typecasting pre-process. @@ -69,7 +71,7 @@ void bli_packv_int( obj_t* a, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_packv_check( a, p, cntl ); + bli_packv_check( a, p, cntx ); // Sanity check; A should never have a zero dimension. If we must support // it, then we should fold it into the next alias-and-early-exit block. @@ -121,6 +123,7 @@ void bli_packv_int( obj_t* a, // Invoke the variant. 
f( a, p, + cntx, cntl ); } diff --git a/frame/1/packv/bli_packv_int.h b/frame/1/packv/bli_packv_int.h index 962979fa9..d917fb4ab 100644 --- a/frame/1/packv/bli_packv_int.h +++ b/frame/1/packv/bli_packv_int.h @@ -32,7 +32,11 @@ */ -void bli_packv_int( obj_t* c, - obj_t* p, - packv_t* cntl ); +void bli_packv_int + ( + obj_t* c, + obj_t* p, + cntx_t* cntx, + packv_t* cntl + ); diff --git a/frame/1/packv/bli_packv_unb_var1.c b/frame/1/packv/bli_packv_unb_var1.c index ec87dd6fe..9f54f5eb0 100644 --- a/frame/1/packv/bli_packv_unb_var1.c +++ b/frame/1/packv/bli_packv_unb_var1.c @@ -39,7 +39,8 @@ typedef void (*FUNCPTR_T)( dim_t m, void* c, inc_t incc, - void* p, inc_t incp + void* p, inc_t incp, + cntx_t* cntx ); static FUNCPTR_T GENARRAY(ftypes,packv_unb_var1); @@ -47,6 +48,7 @@ static FUNCPTR_T GENARRAY(ftypes,packv_unb_var1); void bli_packv_unb_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packv_t* cntl ) { num_t dt_cp = bli_obj_datatype( *c ); @@ -66,29 +68,40 @@ void bli_packv_unb_var1( obj_t* c, f = ftypes[dt_cp]; // Invoke the function. 
- f( dim_p, - buf_c, incc, - buf_p, incp ); + f + ( + dim_p, + buf_c, incc, + buf_p, incp, + cntx + ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, kername ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t m, \ - void* c, inc_t incc, \ - void* p, inc_t incp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t m, \ + void* c, inc_t incc, \ + void* p, inc_t incp, \ + cntx_t* cntx \ + ) \ { \ - ctype* c_cast = c; \ - ctype* p_cast = p; \ + const num_t dt = PASTEMAC(ch,type); \ \ - PASTEMAC2(ch,ch,kername)( BLIS_NO_CONJUGATE, \ - m, \ - c_cast, incc, \ - p_cast, incp ); \ + PASTECH(ch,copyv_ft) copyv_p = bli_cntx_get_l1v_ker_dt( dt, BLIS_COPYV_KER, cntx ); \ +\ + copyv_p \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + c, incc, \ + p, incp, \ + cntx \ + ); \ } -INSERT_GENTFUNC_BASIC( packv_unb_var1, COPYV_KERNEL ) +INSERT_GENTFUNC_BASIC0( packv_unb_var1 ) diff --git a/frame/1/packv/bli_packv_unb_var1.h b/frame/1/packv/bli_packv_unb_var1.h index 17a40e1d3..62338c1f5 100644 --- a/frame/1/packv/bli_packv_unb_var1.h +++ b/frame/1/packv/bli_packv_unb_var1.h @@ -34,18 +34,19 @@ void bli_packv_unb_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packv_t* cntl ); -void bli_packv_unb_var1_set_strides( obj_t* p, - packv_t* cntl ); #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t m, \ - void* c, inc_t incc, \ - void* p, inc_t incp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + dim_t m, \ + void* c, inc_t incc, \ + void* p, inc_t incp, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packv_unb_var1 ) diff --git a/frame/1/scal2v/bli_scal2v.c b/frame/1/scal2v/bli_scal2v.c deleted file mode 100644 index 3a1f0c11d..000000000 --- a/frame/1/scal2v/bli_scal2v.c +++ /dev/null @@ -1,130 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. 
-// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* beta, \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - num_t dt_x; \ - obj_t beta_local; \ -\ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( beta, x, y ); \ -\ - /* Use the datatype of x as the target type for beta (since we do - not assume mixed domain/type support is enabled). */ \ - dt_x = bli_obj_datatype( *x ); \ -\ - /* Create an object to hold a copy-cast of beta. */ \ - bli_obj_scalar_init_detached_copy_of( dt_x, \ - BLIS_NO_CONJUGATE, \ - beta, \ - &beta_local ); \ -\ - PASTEMAC0(varname)( &beta_local, \ - x, \ - y ); \ -} - -GENFRONT( scal2v, scal2v_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype* beta, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjx, \ - n, \ - beta, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( scal2v, SCAL2V_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3 -#define GENTFUNC3( ctype_b, ctype_x, ctype_y, chb, chx, chy, opname, varname ) \ -\ -void PASTEMAC3(chb,chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(chb,chx,chy,varname)( conjx, \ - n, \ - beta, \ - x, incx, \ - y, incy ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3_BASIC( scal2v, SCAL2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( scal2v, SCAL2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( scal2v, SCAL2V_KERNEL ) -#endif - diff --git a/frame/1/scal2v/bli_scal2v_kernel.c b/frame/1/scal2v/bli_scal2v_kernel.c deleted file mode 100644 index 13a78643e..000000000 --- a/frame/1/scal2v/bli_scal2v_kernel.c +++ /dev/null @@ -1,129 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T scal2v_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - dim_t n, - void* beta, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,scal2v_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,scal2v_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,scal2v_kernel_void); -#endif -#endif - - -void bli_scal2v_kernel( obj_t* beta, - obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. 
- f = ftypes[dt_beta][dt_x][dt_y]; - - // Invoke the function. - f( conjx, - n, - buf_beta, - buf_x, inc_x, - buf_y, inc_y ); -} - - -#undef GENTFUNC3 -#define GENTFUNC3( ctype_b, ctype_x, ctype_y, chb, chx, chy, varname, kername ) \ -\ -void PASTEMAC3(chb,chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(chb,chx,chy,kername)( conjx, \ - n, \ - beta, \ - x, incx, \ - y, incy ); \ -} - - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3_BASIC( scal2v_kernel_void, SCAL2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( scal2v_kernel_void, SCAL2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( scal2v_kernel_void, SCAL2V_KERNEL ) -#endif - diff --git a/frame/1/scal2v/bli_scal2v_kernel.h b/frame/1/scal2v/bli_scal2v_kernel.h deleted file mode 100644 index 3777e56f0..000000000 --- a/frame/1/scal2v/bli_scal2v_kernel.h +++ /dev/null @@ -1,64 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_scal2v_kernel( obj_t* beta, - obj_t* x, - obj_t* y ); - - -// -// Prototype the void pointer kernel wrappers. -// - -#undef GENTPROT3 -#define GENTPROT3( ctype_b, ctype_x, ctype_y, chb, chx, chy, varname ) \ -\ -void PASTEMAC3(chb,chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); - -INSERT_GENTPROT3_BASIC( scal2v_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( scal2v_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( scal2v_kernel_void ) -#endif - diff --git a/frame/1/scal2v/bli_scal2v_ref.c b/frame/1/scal2v/bli_scal2v_ref.c deleted file mode 100644 index d1cc87643..000000000 --- a/frame/1/scal2v/bli_scal2v_ref.c +++ /dev/null @@ -1,170 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T scal2v_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - dim_t n, - void* beta, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,scal2v_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,scal2v_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,scal2v_ref); -#endif -#endif - - -void bli_scal2v_ref( obj_t* beta, - obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_beta][dt_x][dt_y]; - - // Invoke the function. - f( conjx, - n, - buf_beta, - buf_x, inc_x, - buf_y, inc_y ); -} -*/ - - -#undef GENTFUNC3 -#define GENTFUNC3( ctype_b, ctype_x, ctype_y, chb, chx, chy, varname, setvker ) \ -\ -void PASTEMAC3(chb,chx,chy,varname) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ -{ \ - ctype_b* beta_cast = beta; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - /* If beta is zero, use setv. 
*/ \ - if ( PASTEMAC(chb,eq0)( *beta_cast ) ) \ - { \ - ctype_y* zero = PASTEMAC(chy,0); \ -\ - PASTEMAC2(chy,chy,setvker)( n, \ - zero, \ - y, incy ); \ - return; \ - } \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - if ( bli_is_conj( conjx ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chb,chx,chy,scal2js)( *beta_cast, *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC3(chb,chx,chy,scal2s)( *beta_cast, *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -} - - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3_BASIC( scal2v_ref, SETV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3_MIX_D( scal2v_ref, SETV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3_MIX_P( scal2v_ref, SETV_KERNEL ) -#endif - diff --git a/frame/1/scalv/bli_scalv.c b/frame/1/scalv/bli_scalv.c deleted file mode 100644 index 4c8623918..000000000 --- a/frame/1/scalv/bli_scalv.c +++ /dev/null @@ -1,124 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* beta, \ - obj_t* x \ - ) \ -{ \ - num_t dt_x; \ - obj_t beta_local; \ -\ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( beta, x ); \ -\ - /* Use the datatype of x as the target type for beta (since we do - not assume mixed domain/type support is enabled). */ \ - dt_x = bli_obj_datatype( *x ); \ -\ - /* Create an object to hold a copy-cast of beta. */ \ - bli_obj_scalar_init_detached_copy_of( dt_x, \ - BLIS_NO_CONJUGATE, \ - beta, \ - &beta_local ); \ -\ - PASTEMAC0(varname)( &beta_local, \ - x ); \ -} - -GENFRONT( scalv, scalv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjbeta, \ - dim_t n, \ - ctype* beta, \ - ctype* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(ch,ch,varname)( conjbeta, \ - n, \ - beta, \ - x, incx ); \ -} - -INSERT_GENTFUNC_BASIC( scalv, SCALV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, opname, varname ) \ -\ -void PASTEMAC2(chb,chx,opname)( \ - conj_t conjbeta, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(chb,chx,varname)( conjbeta, \ - n, \ - beta, \ - x, incx ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( scalv, SCALV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( scalv, SCALV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( scalv, SCALV_KERNEL ) -#endif - diff --git a/frame/1/scalv/bli_scalv_int.c b/frame/1/scalv/bli_scalv_int.c index e74172df6..6e72b2fa8 100644 --- a/frame/1/scalv/bli_scalv_int.c +++ b/frame/1/scalv/bli_scalv_int.c @@ -34,19 +34,19 @@ #include "blis.h" -#define FUNCPTR_T scalv_fp - -typedef void (*FUNCPTR_T)( obj_t* beta, - obj_t* x ); +typedef void (*FUNCPTR_T)( obj_t* alpha, + obj_t* x, + cntx_t* cntx ); static FUNCPTR_T vars[1][3] = { // unblocked optimized unblocked blocked - { bli_scalv_kernel, bli_scalv_kernel, NULL } + { bli_scalv_ex, bli_scalv_ex, NULL } }; -void bli_scalv_int( obj_t* beta, +void bli_scalv_int( obj_t* alpha, obj_t* x, + cntx_t* cntx, scalv_t* cntl ) { varnum_t n; @@ -58,13 +58,13 @@ void bli_scalv_int( obj_t* beta, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_scalv_int_check( beta, x, cntl ); + bli_scalv_check( alpha, x ); // First check if we are to skip this operation. if ( cntl_is_noop( cntl ) ) return; - // Return early if the beta scalar equals one. - if ( bli_obj_equals( beta, &BLIS_ONE ) ) return; + // Return early if the alpha scalar equals one. + if ( bli_obj_equals( alpha, &BLIS_ONE ) ) return; // Extract the variant number and implementation type. n = cntl_var_num( cntl ); @@ -74,7 +74,8 @@ void bli_scalv_int( obj_t* beta, f = vars[n][i]; // Invoke the variant. 
- f( beta, - x ); + f( alpha, + x, + cntx ); } diff --git a/frame/1/scalv/bli_scalv_int.h b/frame/1/scalv/bli_scalv_int.h index 1580c9581..a198d42f8 100644 --- a/frame/1/scalv/bli_scalv_int.h +++ b/frame/1/scalv/bli_scalv_int.h @@ -32,7 +32,8 @@ */ -void bli_scalv_int( obj_t* beta, +void bli_scalv_int( obj_t* alpha, obj_t* x, + cntx_t* cntx, scalv_t* cntl ); diff --git a/frame/1/scalv/bli_scalv_kernel.c b/frame/1/scalv/bli_scalv_kernel.c deleted file mode 100644 index 1840ca13b..000000000 --- a/frame/1/scalv/bli_scalv_kernel.c +++ /dev/null @@ -1,120 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T scalv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjbeta, - dim_t n, - void* beta, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,scalv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,scalv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,scalv_kernel_void); -#endif -#endif - - -void bli_scalv_kernel( obj_t* beta, - obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - conj_t conjbeta = bli_obj_conj_status( *beta ); - - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_beta][dt_x]; - - // Invoke the function. 
- f( conjbeta, - n, - buf_beta, - buf_x, inc_x ); -} - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname, kername ) \ -\ -void PASTEMAC2(chb,chx,varname)( \ - conj_t conjbeta, \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(chb,chx,kername)( conjbeta, \ - n, \ - beta, \ - x, incx ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( scalv_kernel_void, SCALV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( scalv_kernel_void, SCALV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( scalv_kernel_void, SCALV_KERNEL ) -#endif - diff --git a/frame/1/scalv/bli_scalv_ref.c b/frame/1/scalv/bli_scalv_ref.c deleted file mode 100644 index 46af42506..000000000 --- a/frame/1/scalv/bli_scalv_ref.c +++ /dev/null @@ -1,151 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T scalv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjbeta, - dim_t n, - void* beta, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,scalv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,scalv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,scalv_ref); -#endif -#endif - - -void bli_scalv_ref( obj_t* beta, - obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - conj_t conjbeta = bli_obj_conj_status( *beta ); - - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. 
- bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_beta][dt_x]; - - // Invoke the function. - f( conjbeta, - n, - buf_beta, - buf_x, inc_x ); -} -*/ - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname, setvker ) \ -\ -void PASTEMAC2(chb,chx,varname) \ - ( \ - conj_t conjbeta, \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ - ) \ -{ \ - ctype_b* beta_cast = beta; \ - ctype_x* x_cast = x; \ - ctype_x* chi1; \ - ctype_b beta_conj; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - /* If beta is one, return. */ \ - if ( PASTEMAC(chb,eq1)( *beta_cast ) ) return; \ -\ - /* If beta is zero, use setv. */ \ - if ( PASTEMAC(chb,eq0)( *beta_cast ) ) \ - { \ - ctype_x* zero = PASTEMAC(chb,0); \ -\ - PASTEMAC2(chx,chx,setvker)( n, \ - zero, \ - x, incx ); \ - return; \ - } \ -\ - PASTEMAC(chb,copycjs)( conjbeta, *beta_cast, beta_conj ); \ -\ - chi1 = x_cast; \ -\ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chb,chx,scals)( beta_conj, *chi1 ); \ -\ - chi1 += incx; \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( scalv_ref, SETV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( scalv_ref, SETV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( scalv_ref, SETV_KERNEL ) -#endif - diff --git a/frame/1/setv/bli_setv.c b/frame/1/setv/bli_setv.c deleted file mode 100644 index 9f37076d8..000000000 --- a/frame/1/setv/bli_setv.c +++ /dev/null @@ -1,121 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* beta, \ - obj_t* x \ - ) \ -{ \ - num_t dt_x; \ - obj_t beta_local; \ -/* - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( beta, x ); \ -*/ \ -\ - /* Use the datatype of x as the target type for beta (since we do - not assume mixed domain/type support is enabled). 
*/ \ - dt_x = bli_obj_datatype( *x ); \ -\ - /* Create an object to hold a copy-cast of beta. */ \ - bli_obj_scalar_init_detached_copy_of( dt_x, \ - BLIS_NO_CONJUGATE, \ - beta, \ - &beta_local ); \ -\ - PASTEMAC0(varname)( &beta_local, \ - x ); \ -} - -GENFRONT( setv, setv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - dim_t n, \ - ctype* beta, \ - ctype* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(ch,ch,varname)( n, \ - beta, \ - x, incx ); \ -} - -INSERT_GENTFUNC_BASIC( setv, SETV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, opname, varname ) \ -\ -void PASTEMAC2(chb,chx,opname)( \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(chb,chx,varname)( n, \ - beta, \ - x, incx ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( setv, SETV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( setv, SETV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( setv, SETV_KERNEL ) -#endif - diff --git a/frame/1/setv/bli_setv_kernel.c b/frame/1/setv/bli_setv_kernel.c deleted file mode 100644 index cb3564787..000000000 --- a/frame/1/setv/bli_setv_kernel.c +++ /dev/null @@ -1,113 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T setv_fp - -typedef void (*FUNCPTR_T)( - dim_t n, - void* beta, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,setv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,setv_kernel_void); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,setv_kernel_void); -#endif -#endif - - -void bli_setv_kernel( obj_t* beta, - obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - dim_t n = bli_obj_vector_dim( *x ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t inc_x = bli_obj_vector_inc( *x ); - - void* buf_beta; - num_t dt_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_beta][dt_x]; - - // Invoke the function. - f( n, - buf_beta, - buf_x, inc_x ); -} - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname, kername ) \ -\ -void PASTEMAC2(chb,chx,varname)( \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx \ - ) \ -{ \ - PASTEMAC2(chb,chx,kername)( n, \ - beta, \ - x, incx ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( setv_kernel_void, SETV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( setv_kernel_void, SETV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( setv_kernel_void, SETV_KERNEL ) -#endif diff --git a/frame/1/setv/bli_setv_ref.c b/frame/1/setv/bli_setv_ref.c deleted file mode 100644 index db9862606..000000000 --- a/frame/1/setv/bli_setv_ref.c +++ /dev/null @@ -1,136 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T setv_fp - -typedef void (*FUNCPTR_T)( - dim_t n, - void* beta, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,setv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,setv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,setv_ref); -#endif -#endif - - -void bli_setv_ref( obj_t* beta, - obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - dim_t n = bli_obj_vector_dim( *x ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t inc_x = bli_obj_vector_inc( *x ); - - void* buf_beta; - num_t dt_beta; - - FUNCPTR_T f; - - // If beta is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_beta][dt_x]; - - // Invoke the function. - f( n, - buf_beta, - buf_x, inc_x ); -} -*/ - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname ) \ -\ -void PASTEMAC2(chb,chx,varname) \ - ( \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ - ) \ -{ \ - ctype_b* beta_cast = beta; \ - ctype_x* chi1 = x; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - if ( PASTEMAC(chb,eq0)( *beta_cast ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC(chx,set0s)( *chi1 ); \ -\ - chi1 += incx; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chb,chx,copys)( *beta_cast, *chi1 ); \ -\ - chi1 += incx; \ - } \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC0( setv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D0( setv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P0( setv_ref ) -#endif diff --git a/frame/1/subv/bli_subv.c b/frame/1/subv/bli_subv.c deleted file mode 100644 index 3425967f9..000000000 --- a/frame/1/subv/bli_subv.c +++ /dev/null @@ -1,109 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( x, y ); \ -\ - PASTEMAC0(varname)( x, \ - y ); \ -} - -GENFRONT( subv, subv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC2(ch,ch,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( subv, SUBV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \ -\ -void PASTEMAC2(chx,chy,opname)( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC2(chx,chy,varname)( conjx, \ - n, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC2_BASIC( subv, SUBV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( subv, SUBV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( subv, SUBV_KERNEL ) -#endif - diff --git a/frame/1/subv/bli_subv_kernel.h b/frame/1/subv/bli_subv_kernel.h deleted file mode 100644 index 2f4a16a06..000000000 --- a/frame/1/subv/bli_subv_kernel.h +++ /dev/null @@ -1,61 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_subv_kernel( obj_t* x, - obj_t* y ); - - -// -// Prototype the void pointer kernel wrappers. -// - -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( subv_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( subv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( subv_kernel_void ) -#endif diff --git a/frame/1/subv/bli_subv_ref.c b/frame/1/subv/bli_subv_ref.c deleted file mode 100644 index 500f6f0ae..000000000 --- a/frame/1/subv/bli_subv_ref.c +++ /dev/null @@ -1,145 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T subv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - dim_t n, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,subv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,subv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,subv_ref); -#endif -#endif - - -void bli_subv_ref( obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; - - // Invoke the function. - f( conjx, - n, - buf_x, inc_x, - buf_y, inc_y ); -} -*/ - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ -{ \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - if ( bli_is_conj( conjx ) ) \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,subjs)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ - else \ - { \ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,subs)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC0( subv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D0( subv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P0( subv_ref ) -#endif - diff --git a/frame/1/swapv/bli_swapv_ref.c b/frame/1/swapv/bli_swapv_ref.c deleted file mode 100644 index 82848821d..000000000 --- a/frame/1/swapv/bli_swapv_ref.c +++ /dev/null @@ -1,128 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T swapv_fp - -typedef void (*FUNCPTR_T)( - dim_t n, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,swapv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,swapv_ref); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,swapv_ref); -#endif -#endif - - -void bli_swapv_ref( obj_t* x, - obj_t* y ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y]; - - // Invoke the function. 
- f( n, - buf_x, inc_x, - buf_y, inc_y ); -} -*/ - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname) \ - ( \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ -{ \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_x* chi1; \ - ctype_y* psi1; \ - dim_t i; \ -\ - if ( bli_zero_dim1( n ) ) return; \ -\ - chi1 = x_cast; \ - psi1 = y_cast; \ -\ - for ( i = 0; i < n; ++i ) \ - { \ - PASTEMAC2(chx,chy,swaps)( *chi1, *psi1 ); \ -\ - chi1 += incx; \ - psi1 += incy; \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC0( swapv_ref ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D0( swapv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P0( swapv_ref ) -#endif - diff --git a/frame/1/unpackv/bli_unpackv_check.c b/frame/1/unpackv/bli_unpackv_check.c index 4ca6b86f8..a487ba7c6 100644 --- a/frame/1/unpackv/bli_unpackv_check.c +++ b/frame/1/unpackv/bli_unpackv_check.c @@ -34,9 +34,12 @@ #include "blis.h" -void bli_unpackv_check( obj_t* p, - obj_t* a, - unpackv_t* cntl ) +void bli_unpackv_check + ( + obj_t* p, + obj_t* a, + cntx_t* cntx + ) { err_t e_val; @@ -58,11 +61,5 @@ void bli_unpackv_check( obj_t* p, e_val = bli_check_packv_schema_on_unpack( p ); bli_check_error_code( e_val ); - // Check control tree pointer - - // NOTE: We can't check the control tree until we stop interpreting a - // NULL value (in bli_unpackv_int()) as a request to skip the operation. 
- //e_val = bli_check_valid_cntl( ( void* )cntl ); - //bli_check_error_code( e_val ); } diff --git a/frame/1/unpackv/bli_unpackv_check.h b/frame/1/unpackv/bli_unpackv_check.h index 63bef2396..4410bc72b 100644 --- a/frame/1/unpackv/bli_unpackv_check.h +++ b/frame/1/unpackv/bli_unpackv_check.h @@ -32,6 +32,9 @@ */ -void bli_unpackv_check( obj_t* p, - obj_t* a, - unpackv_t* cntl ); +void bli_unpackv_check + ( + obj_t* p, + obj_t* a, + cntx_t* cntx + ); diff --git a/frame/1/unpackv/bli_unpackv_int.c b/frame/1/unpackv/bli_unpackv_int.c index ccddecab8..4f2d8bf63 100644 --- a/frame/1/unpackv/bli_unpackv_int.c +++ b/frame/1/unpackv/bli_unpackv_int.c @@ -38,6 +38,7 @@ typedef void (*FUNCPTR_T)( obj_t* p, obj_t* a, + cntx_t* cntx, unpackv_t* cntl ); static FUNCPTR_T vars[1][3] = @@ -48,6 +49,7 @@ static FUNCPTR_T vars[1][3] = void bli_unpackv_int( obj_t* p, obj_t* a, + cntx_t* cntx, unpackv_t* cntl ) { // The unpackv operation consists of an optional casting post-process. @@ -69,7 +71,7 @@ void bli_unpackv_int( obj_t* p, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_unpackv_check( p, a, cntl ); + bli_unpackv_check( p, a, cntx ); // Sanity check; A should never have a zero dimension. If we must support // it, then we should fold it into the next alias-and-early-exit block. @@ -123,6 +125,7 @@ void bli_unpackv_int( obj_t* p, // Invoke the variant. f( p, &c, + cntx, cntl ); // Now, if necessary, we cast the contents of c to vector a. 
If casting diff --git a/frame/1/unpackv/bli_unpackv_int.h b/frame/1/unpackv/bli_unpackv_int.h index ae94c4c81..701f12fef 100644 --- a/frame/1/unpackv/bli_unpackv_int.h +++ b/frame/1/unpackv/bli_unpackv_int.h @@ -34,6 +34,7 @@ void bli_unpackv_int( obj_t* p, obj_t* a, + cntx_t* cntx, unpackv_t* cntl ); /* diff --git a/frame/1/unpackv/bli_unpackv_unb_var1.c b/frame/1/unpackv/bli_unpackv_unb_var1.c index fbbda2440..65a3c0dc4 100644 --- a/frame/1/unpackv/bli_unpackv_unb_var1.c +++ b/frame/1/unpackv/bli_unpackv_unb_var1.c @@ -39,7 +39,8 @@ typedef void (*FUNCPTR_T)( dim_t m, void* p, inc_t incp, - void* c, inc_t incc + void* c, inc_t incc, + cntx_t* cntx ); static FUNCPTR_T GENARRAY(ftypes,unpackv_unb_var1); @@ -47,6 +48,7 @@ static FUNCPTR_T GENARRAY(ftypes,unpackv_unb_var1); void bli_unpackv_unb_var1( obj_t* p, obj_t* c, + cntx_t* cntx, unpackv_t* cntl ) { num_t dt_pc = bli_obj_datatype( *p ); @@ -66,29 +68,40 @@ void bli_unpackv_unb_var1( obj_t* p, f = ftypes[dt_pc]; // Invoke the function. 
- f( dim_c, - buf_p, incp, - buf_c, incc ); + f + ( + dim_c, + buf_p, incp, + buf_c, incc, + cntx + ); } #undef GENTFUNC -#define GENTFUNC( ctype_pc, chpc, varname, kername ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(chpc,varname)( \ - dim_t m, \ - void* p, inc_t incp, \ - void* c, inc_t incc \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t m, \ + void* p, inc_t incp, \ + void* c, inc_t incc, \ + cntx_t* cntx \ + ) \ { \ - ctype_pc* p_cast = p; \ - ctype_pc* c_cast = c; \ + const num_t dt = PASTEMAC(ch,type); \ \ - PASTEMAC2(chpc,chpc,kername)( BLIS_NO_CONJUGATE, \ - m, \ - p_cast, incp, \ - c_cast, incc ); \ + PASTECH(ch,copyv_ft) copyv_p = bli_cntx_get_l1v_ker_dt( dt, BLIS_COPYV_KER, cntx ); \ +\ + copyv_p \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + p, incp, \ + c, incc, \ + cntx \ + ); \ } -INSERT_GENTFUNC_BASIC( unpackv_unb_var1, COPYV_KERNEL ) +INSERT_GENTFUNC_BASIC0( unpackv_unb_var1 ) diff --git a/frame/1/unpackv/bli_unpackv_unb_var1.h b/frame/1/unpackv/bli_unpackv_unb_var1.h index dd34fa434..2a08d0020 100644 --- a/frame/1/unpackv/bli_unpackv_unb_var1.h +++ b/frame/1/unpackv/bli_unpackv_unb_var1.h @@ -34,16 +34,19 @@ void bli_unpackv_unb_var1( obj_t* p, obj_t* c, + cntx_t* cntx, unpackv_t* cntl ); #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t m, \ - void* p, inc_t incp, \ - void* c, inc_t incc \ - ); +void PASTEMAC(ch,varname) \ + ( \ + dim_t m, \ + void* p, inc_t incp, \ + void* c, inc_t incc, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( unpackv_unb_var1 ) diff --git a/frame/1f/dotxf/bli_dotxf_fusefac.c b/frame/1d/bli_l1d.h similarity index 87% rename from frame/1f/dotxf/bli_dotxf_fusefac.c rename to frame/1d/bli_l1d.h index 89755f265..ee40cdf9a 100644 --- a/frame/1f/dotxf/bli_dotxf_fusefac.c +++ b/frame/1d/bli_l1d.h @@ -32,16 +32,14 @@ */ -#include "blis.h" +#include "bli_l1d_cntx.h" +#include "bli_l1d_check.h" -// -// Define object-based fusing factor query routine. 
-// +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_l1d_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l1d_oapi.h" -static dim_t GENARRAY(factors,dotxf_fusefac); - -dim_t bli_dotxf_fusefac( num_t dt ) -{ - return factors[ dt ]; -} +#include "bli_l1d_tapi.h" diff --git a/frame/1d/bli_l1d_check.c b/frame/1d/bli_l1d_check.c new file mode 100644 index 000000000..3846d99ef --- /dev/null +++ b/frame/1d/bli_l1d_check.c @@ -0,0 +1,245 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1d_xy_check( x, y ); \ +} + +GENFRONT( addd ) +GENFRONT( copyd ) +GENFRONT( subd ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1d_axy_check( alpha, x, y ); \ +} + +GENFRONT( axpyd ) +GENFRONT( scal2d ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ) \ +{ \ + bli_l1d_x_check( x ); \ +} + +GENFRONT( invertd ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ) \ +{ \ + bli_l1d_ax_check( alpha, x ); \ +} + +GENFRONT( scald ) +GENFRONT( setd ) +GENFRONT( setid ) + + +// ----------------------------------------------------------------------------- + +void bli_l1d_xy_check + ( + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_conformal_dims( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1d_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_conformal_dims( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1d_x_check + ( + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + +void bli_l1d_ax_check + ( + obj_t* alpha, + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + diff --git a/frame/1d/bli_l1d_check.h b/frame/1d/bli_l1d_check.h new file mode 100644 index 000000000..c378c328a --- /dev/null +++ b/frame/1d/bli_l1d_check.h @@ -0,0 +1,118 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. +// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ); + +GENTPROT( addd ) +GENTPROT( copyd ) +GENTPROT( subd ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ); + +GENTPROT( axpyd ) +GENTPROT( scal2d ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ); + +GENTPROT( invertd ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ); + +GENTPROT( scald ) +GENTPROT( setd ) +GENTPROT( setid ) + + +// ----------------------------------------------------------------------------- + +void bli_l1d_xy_check + ( + obj_t* x, + obj_t* y + ); + +void bli_l1d_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ); + +void bli_l1d_x_check + ( + obj_t* x + ); + +void bli_l1d_ax_check + ( + obj_t* alpha, + obj_t* x + ); + diff --git a/frame/1d/bli_l1d_cntx.c b/frame/1d/bli_l1d_cntx.c new file mode 100644 index 000000000..d285995b1 --- /dev/null +++ b/frame/1d/bli_l1d_cntx.c @@ -0,0 +1,66 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. +// + +#undef GENFRONT +#define GENFRONT( opname, depname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. 
*/ \ + PASTEMAC(depname,_cntx_init)( cntx ); \ +} \ +\ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( addd, addv ) +GENFRONT( axpyd, axpyv ) +GENFRONT( copyd, copyv ) +GENFRONT( invertd, invertv ) +GENFRONT( scald, scalv ) +GENFRONT( scal2d, scal2v ) +GENFRONT( setd, setv ) +GENFRONT( setid, setv ) +GENFRONT( subd, subv ) + diff --git a/frame/1d/bli_l1d_cntx.h b/frame/1d/bli_l1d_cntx.h new file mode 100644 index 000000000..50db79738 --- /dev/null +++ b/frame/1d/bli_l1d_cntx.h @@ -0,0 +1,55 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype context initialization functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); + +GENPROT( addd ) +GENPROT( axpyd ) +GENPROT( copyd ) +GENPROT( invertd ) +GENPROT( scald ) +GENPROT( scal2d ) +GENPROT( setd ) +GENPROT( setid) +GENPROT( subd ) + diff --git a/frame/1d/bli_l1d_oapi.c b/frame/1d/bli_l1d_oapi.c new file mode 100644 index 000000000..b610c3184 --- /dev/null +++ b/frame/1d/bli_l1d_oapi.c @@ -0,0 +1,291 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + trans_t transx = bli_obj_conjtrans_status( *x ); \ + dim_t m = bli_obj_length( *y ); \ + dim_t n = bli_obj_width( *y ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t rs_y = bli_obj_row_stride( *y ); \ + inc_t cs_y = bli_obj_col_stride( *y ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, y ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_12 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + diagx, \ + transx, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + buf_y, rs_y, cs_y, \ + cntx \ + ); \ +} + +GENFRONT( addd ) +GENFRONT( copyd ) +GENFRONT( subd ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + trans_t transx = bli_obj_conjtrans_status( *x ); \ + dim_t m = bli_obj_length( *y ); \ + dim_t n = bli_obj_width( *y ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t rs_y = bli_obj_row_stride( *y ); \ + inc_t cs_y = bli_obj_col_stride( *y ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + diagx, \ + transx, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + buf_y, rs_y, cs_y, \ + cntx \ + ); \ +} + +GENFRONT( axpyd ) +GENFRONT( scal2d ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_7 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( invertd ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + /* conj_t conjalpha = bli_obj_conj_status( *alpha ); */ \ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_9 \ + ( \ + dt, \ + opname, \ + BLIS_NO_CONJUGATE, /* internal conjugation applied during copy-cast. */ \ + diagoffx, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( scald ) +GENFRONT( setd ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_8 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( setid ) + + +#endif + diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.h b/frame/1d/bli_l1d_oapi.h similarity index 66% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.h rename to frame/1d/bli_l1d_oapi.h index ec81874a3..78b76d669 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.h +++ b/frame/1d/bli_l1d_oapi.h @@ -32,24 +32,64 @@ */ + +// +// Prototype object-based interfaces. 
+// + #undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ +#define GENTPROT( opname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); -INSERT_GENTPROT_BASIC( packm_ref_2xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_4xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_6xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_8xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_10xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_12xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_14xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_16xk_4mi ) -INSERT_GENTPROT_BASIC( packm_ref_30xk_4mi ) +GENTPROT( addd ) +GENTPROT( copyd ) +GENTPROT( subd ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( axpyd ) +GENTPROT( scal2d ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( invertd ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( scald ) +GENTPROT( setd ) +GENTPROT( setid ) diff --git a/frame/1d/bli_l1d_oapi_wc.c b/frame/1d/bli_l1d_oapi_wc.c new file mode 100644 index 000000000..456b911ec --- /dev/null +++ b/frame/1d/bli_l1d_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1d_oapi.c" + diff --git a/frame/1d/bli_l1d_oapi_woc.c b/frame/1d/bli_l1d_oapi_woc.c new file mode 100644 index 000000000..81d4370a9 --- /dev/null +++ b/frame/1d/bli_l1d_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. 
+#include "bli_l1d_oapi.c" + diff --git a/frame/1d/bli_l1d_tapi.c b/frame/1d/bli_l1d_tapi.c new file mode 100644 index 000000000..5ef92603a --- /dev/null +++ b/frame/1d/bli_l1d_tapi.c @@ -0,0 +1,371 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. 
+// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + ctype* x1; \ + ctype* y1; \ + conj_t conjx; \ + dim_t n_elem; \ + dim_t offx, offy; \ + inc_t incx, incy; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( bli_is_outside_diag( diagoffx, transx, m, n ) ) return; \ +\ + /* Determine the distance to the diagonals, the number of diagonal + elements, and the diagonal increments. */ \ + bli_set_dims_incs_2d( diagoffx, transx, \ + m, n, rs_x, cs_x, rs_y, cs_y, \ + offx, offy, n_elem, incx, incy ); \ +\ + conjx = bli_extract_conj( transx ); \ +\ + if ( bli_is_nonunit_diag( diagx ) ) \ + { \ + x1 = x + offx; \ + y1 = y + offy; \ + } \ + else /* if ( bli_is_unit_diag( diagx ) ) */ \ + { \ + /* Simulate a unit diagonal for x with a zero increment over a unit + scalar. */ \ + x1 = PASTEMAC(ch,1); \ + incx = 0; \ + y1 = y + offy; \ + } \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Query the context for the operation's kernel address. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + x1, incx, \ + y1, incy, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC2( addd, addv, BLIS_ADDV_KER ) +INSERT_GENTFUNC_BASIC2( copyd, copyv, BLIS_COPYV_KER ) +INSERT_GENTFUNC_BASIC2( subd, subv, BLIS_SUBV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + ctype* x1; \ + ctype* y1; \ + conj_t conjx; \ + dim_t n_elem; \ + dim_t offx, offy; \ + inc_t incx, incy; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( bli_is_outside_diag( diagoffx, transx, m, n ) ) return; \ +\ + /* Determine the distance to the diagonals, the number of diagonal + elements, and the diagonal increments. */ \ + bli_set_dims_incs_2d( diagoffx, transx, \ + m, n, rs_x, cs_x, rs_y, cs_y, \ + offx, offy, n_elem, incx, incy ); \ +\ + conjx = bli_extract_conj( transx ); \ +\ + if ( bli_is_nonunit_diag( diagx ) ) \ + { \ + x1 = x + offx; \ + y1 = y + offy; \ + } \ + else /* if ( bli_is_unit_diag( diagx ) ) */ \ + { \ + /* Simulate a unit diagonal for x with a zero increment over a unit + scalar. */ \ + x1 = PASTEMAC(ch,1); \ + incx = 0; \ + y1 = y + offy; \ + } \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Query the context for the operation's kernel address. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + alpha, \ + x1, incx, \ + y1, incy, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC2( axpyd, axpyv, BLIS_AXPYV_KER ) +INSERT_GENTFUNC_BASIC2( scal2d, scal2v, BLIS_SCAL2V_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + ctype* x1; \ + dim_t n_elem; \ + dim_t offx; \ + inc_t incx; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( bli_is_outside_diag( diagoffx, BLIS_NO_TRANSPOSE, m, n ) ) return; \ +\ + /* Determine the distance to the diagonals, the number of diagonal + elements, and the diagonal increments. */ \ + bli_set_dims_incs_1d( diagoffx, \ + m, n, rs_x, cs_x, \ + offx, n_elem, incx ); \ +\ + x1 = x + offx; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Query the context for the operation's kernel address. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + n_elem, \ + x1, incx, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC2( invertd, invertv, BLIS_INVERTV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + ctype* x1; \ + dim_t n_elem; \ + dim_t offx; \ + inc_t incx; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( bli_is_outside_diag( diagoffx, BLIS_NO_TRANSPOSE, m, n ) ) return; \ +\ + /* Determine the distance to the diagonals, the number of diagonal + elements, and the diagonal increments. */ \ + bli_set_dims_incs_1d( diagoffx, \ + m, n, rs_x, cs_x, \ + offx, n_elem, incx ); \ +\ + x1 = x + offx; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Query the context for the operation's kernel address. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx_p ); \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjalpha, \ + n_elem, \ + alpha, \ + x1, incx, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC2( scald, scalv, BLIS_SCALV_KER ) +INSERT_GENTFUNC_BASIC2( setd, setv, BLIS_SETV_KER ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype_r* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + const num_t dt_r = PASTEMAC(chr,type); \ + cntx_t* cntx_p; \ +\ + ctype_r* x1; \ + dim_t n_elem; \ + dim_t offx; \ + inc_t incx; \ +\ + /* If the datatype is real, the entire operation is a no-op. */ \ + if ( bli_is_real( dt ) ) return; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( bli_is_outside_diag( diagoffx, BLIS_NO_TRANSPOSE, m, n ) ) return; \ +\ + /* Determine the distance to the diagonals, the number of diagonal + elements, and the diagonal increments. */ \ + bli_set_dims_incs_1d( diagoffx, \ + m, n, rs_x, cs_x, \ + offx, n_elem, incx ); \ +\ + /* Alternate implementation. (Substitute for remainder of function). */ \ + /* for ( i = 0; i < n_elem; ++i ) \ + { \ + ctype* chi11 = x1 + (i )*incx; \ +\ + PASTEMAC(ch,setis)( *alpha, *chi11 ); \ + } */ \ +\ + /* Acquire the address of the imaginary component of the first element, + and scale the increment for use in the real domain. Note that the + indexing into the imaginary field only needs to work for complex + datatypes since we return early for real domain types. */ \ + x1 = ( ctype_r* )( x + offx ) + 1; \ + incx = 2*incx; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Query the context for the operation's kernel address. */ \ + PASTECH2(chr,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt_r, kerid, cntx_p ); \ +\ + /* Invoke the kernel with the appropriate parameters.
*/ \ + f( \ + BLIS_NO_CONJUGATE, \ + n_elem, \ + alpha, \ + x1, incx, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNCR_BASIC2( setid, setv, BLIS_SETV_KER ) + diff --git a/frame/1d/bli_l1d_tapi.h b/frame/1d/bli_l1d_tapi.h new file mode 100644 index 000000000..be994a224 --- /dev/null +++ b/frame/1d/bli_l1d_tapi.h @@ -0,0 +1,127 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( addd ) +INSERT_GENTPROT_BASIC( copyd ) +INSERT_GENTPROT_BASIC( subd ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpyd ) +INSERT_GENTPROT_BASIC( scal2d ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( invertd ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( scald ) +INSERT_GENTPROT_BASIC( setd ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void 
PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + dim_t m, \ + dim_t n, \ + ctype_r* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( setid ) + diff --git a/frame/1d/addd/bli_addd.c b/frame/1d/old/bli_addd.c similarity index 100% rename from frame/1d/addd/bli_addd.c rename to frame/1d/old/bli_addd.c diff --git a/frame/1d/addd/bli_addd.h b/frame/1d/old/bli_addd.h similarity index 100% rename from frame/1d/addd/bli_addd.h rename to frame/1d/old/bli_addd.h diff --git a/frame/1d/addd/bli_addd_check.c b/frame/1d/old/bli_addd_check.c similarity index 100% rename from frame/1d/addd/bli_addd_check.c rename to frame/1d/old/bli_addd_check.c diff --git a/frame/1d/addd/bli_addd_check.h b/frame/1d/old/bli_addd_check.h similarity index 100% rename from frame/1d/addd/bli_addd_check.h rename to frame/1d/old/bli_addd_check.h diff --git a/frame/1d/addd/bli_addd_unb_var1.c b/frame/1d/old/bli_addd_unb_var1.c similarity index 100% rename from frame/1d/addd/bli_addd_unb_var1.c rename to frame/1d/old/bli_addd_unb_var1.c diff --git a/frame/1d/addd/bli_addd_unb_var1.h b/frame/1d/old/bli_addd_unb_var1.h similarity index 100% rename from frame/1d/addd/bli_addd_unb_var1.h rename to frame/1d/old/bli_addd_unb_var1.h diff --git a/frame/1d/axpyd/bli_axpyd.c b/frame/1d/old/bli_axpyd.c similarity index 100% rename from frame/1d/axpyd/bli_axpyd.c rename to frame/1d/old/bli_axpyd.c diff --git a/frame/1d/axpyd/bli_axpyd.h b/frame/1d/old/bli_axpyd.h similarity index 100% rename from frame/1d/axpyd/bli_axpyd.h rename to frame/1d/old/bli_axpyd.h diff --git a/frame/1d/axpyd/bli_axpyd_check.c b/frame/1d/old/bli_axpyd_check.c similarity index 100% rename from frame/1d/axpyd/bli_axpyd_check.c rename to frame/1d/old/bli_axpyd_check.c diff --git a/frame/1d/axpyd/bli_axpyd_check.h b/frame/1d/old/bli_axpyd_check.h similarity index 100% rename from frame/1d/axpyd/bli_axpyd_check.h rename to frame/1d/old/bli_axpyd_check.h diff --git 
a/frame/1d/axpyd/bli_axpyd_unb_var1.c b/frame/1d/old/bli_axpyd_unb_var1.c similarity index 100% rename from frame/1d/axpyd/bli_axpyd_unb_var1.c rename to frame/1d/old/bli_axpyd_unb_var1.c diff --git a/frame/1d/axpyd/bli_axpyd_unb_var1.h b/frame/1d/old/bli_axpyd_unb_var1.h similarity index 100% rename from frame/1d/axpyd/bli_axpyd_unb_var1.h rename to frame/1d/old/bli_axpyd_unb_var1.h diff --git a/frame/1d/copyd/bli_copyd.c b/frame/1d/old/bli_copyd.c similarity index 100% rename from frame/1d/copyd/bli_copyd.c rename to frame/1d/old/bli_copyd.c diff --git a/frame/1d/copyd/bli_copyd.h b/frame/1d/old/bli_copyd.h similarity index 100% rename from frame/1d/copyd/bli_copyd.h rename to frame/1d/old/bli_copyd.h diff --git a/frame/1d/copyd/bli_copyd_check.c b/frame/1d/old/bli_copyd_check.c similarity index 100% rename from frame/1d/copyd/bli_copyd_check.c rename to frame/1d/old/bli_copyd_check.c diff --git a/frame/1d/copyd/bli_copyd_check.h b/frame/1d/old/bli_copyd_check.h similarity index 100% rename from frame/1d/copyd/bli_copyd_check.h rename to frame/1d/old/bli_copyd_check.h diff --git a/frame/1d/copyd/bli_copyd_unb_var1.c b/frame/1d/old/bli_copyd_unb_var1.c similarity index 100% rename from frame/1d/copyd/bli_copyd_unb_var1.c rename to frame/1d/old/bli_copyd_unb_var1.c diff --git a/frame/1d/copyd/bli_copyd_unb_var1.h b/frame/1d/old/bli_copyd_unb_var1.h similarity index 100% rename from frame/1d/copyd/bli_copyd_unb_var1.h rename to frame/1d/old/bli_copyd_unb_var1.h diff --git a/frame/1d/invertd/bli_invertd.c b/frame/1d/old/bli_invertd.c similarity index 100% rename from frame/1d/invertd/bli_invertd.c rename to frame/1d/old/bli_invertd.c diff --git a/frame/1d/invertd/bli_invertd.h b/frame/1d/old/bli_invertd.h similarity index 100% rename from frame/1d/invertd/bli_invertd.h rename to frame/1d/old/bli_invertd.h diff --git a/frame/1d/invertd/bli_invertd_check.c b/frame/1d/old/bli_invertd_check.c similarity index 100% rename from frame/1d/invertd/bli_invertd_check.c rename 
to frame/1d/old/bli_invertd_check.c diff --git a/frame/1d/invertd/bli_invertd_check.h b/frame/1d/old/bli_invertd_check.h similarity index 100% rename from frame/1d/invertd/bli_invertd_check.h rename to frame/1d/old/bli_invertd_check.h diff --git a/frame/1d/invertd/bli_invertd_unb_var1.c b/frame/1d/old/bli_invertd_unb_var1.c similarity index 100% rename from frame/1d/invertd/bli_invertd_unb_var1.c rename to frame/1d/old/bli_invertd_unb_var1.c diff --git a/frame/1d/invertd/bli_invertd_unb_var1.h b/frame/1d/old/bli_invertd_unb_var1.h similarity index 100% rename from frame/1d/invertd/bli_invertd_unb_var1.h rename to frame/1d/old/bli_invertd_unb_var1.h diff --git a/frame/1d/scal2d/bli_scal2d.c b/frame/1d/old/bli_scal2d.c similarity index 100% rename from frame/1d/scal2d/bli_scal2d.c rename to frame/1d/old/bli_scal2d.c diff --git a/frame/1d/scal2d/bli_scal2d.h b/frame/1d/old/bli_scal2d.h similarity index 100% rename from frame/1d/scal2d/bli_scal2d.h rename to frame/1d/old/bli_scal2d.h diff --git a/frame/1d/scal2d/bli_scal2d_check.c b/frame/1d/old/bli_scal2d_check.c similarity index 100% rename from frame/1d/scal2d/bli_scal2d_check.c rename to frame/1d/old/bli_scal2d_check.c diff --git a/frame/1d/scal2d/bli_scal2d_check.h b/frame/1d/old/bli_scal2d_check.h similarity index 100% rename from frame/1d/scal2d/bli_scal2d_check.h rename to frame/1d/old/bli_scal2d_check.h diff --git a/frame/1d/scal2d/bli_scal2d_unb_var1.c b/frame/1d/old/bli_scal2d_unb_var1.c similarity index 100% rename from frame/1d/scal2d/bli_scal2d_unb_var1.c rename to frame/1d/old/bli_scal2d_unb_var1.c diff --git a/frame/1d/scal2d/bli_scal2d_unb_var1.h b/frame/1d/old/bli_scal2d_unb_var1.h similarity index 100% rename from frame/1d/scal2d/bli_scal2d_unb_var1.h rename to frame/1d/old/bli_scal2d_unb_var1.h diff --git a/frame/1d/scald/bli_scald.c b/frame/1d/old/bli_scald.c similarity index 100% rename from frame/1d/scald/bli_scald.c rename to frame/1d/old/bli_scald.c diff --git a/frame/1d/scald/bli_scald.h 
b/frame/1d/old/bli_scald.h similarity index 100% rename from frame/1d/scald/bli_scald.h rename to frame/1d/old/bli_scald.h diff --git a/frame/1d/scald/bli_scald_check.c b/frame/1d/old/bli_scald_check.c similarity index 100% rename from frame/1d/scald/bli_scald_check.c rename to frame/1d/old/bli_scald_check.c diff --git a/frame/1d/scald/bli_scald_check.h b/frame/1d/old/bli_scald_check.h similarity index 100% rename from frame/1d/scald/bli_scald_check.h rename to frame/1d/old/bli_scald_check.h diff --git a/frame/1d/scald/bli_scald_unb_var1.c b/frame/1d/old/bli_scald_unb_var1.c similarity index 100% rename from frame/1d/scald/bli_scald_unb_var1.c rename to frame/1d/old/bli_scald_unb_var1.c diff --git a/frame/1d/scald/bli_scald_unb_var1.h b/frame/1d/old/bli_scald_unb_var1.h similarity index 100% rename from frame/1d/scald/bli_scald_unb_var1.h rename to frame/1d/old/bli_scald_unb_var1.h diff --git a/frame/1d/setd/bli_setd.c b/frame/1d/old/bli_setd.c similarity index 100% rename from frame/1d/setd/bli_setd.c rename to frame/1d/old/bli_setd.c diff --git a/frame/1d/setd/bli_setd.h b/frame/1d/old/bli_setd.h similarity index 100% rename from frame/1d/setd/bli_setd.h rename to frame/1d/old/bli_setd.h diff --git a/frame/1d/setd/bli_setd_check.c b/frame/1d/old/bli_setd_check.c similarity index 100% rename from frame/1d/setd/bli_setd_check.c rename to frame/1d/old/bli_setd_check.c diff --git a/frame/1d/setd/bli_setd_check.h b/frame/1d/old/bli_setd_check.h similarity index 100% rename from frame/1d/setd/bli_setd_check.h rename to frame/1d/old/bli_setd_check.h diff --git a/frame/1d/setd/bli_setd_unb_var1.c b/frame/1d/old/bli_setd_unb_var1.c similarity index 100% rename from frame/1d/setd/bli_setd_unb_var1.c rename to frame/1d/old/bli_setd_unb_var1.c diff --git a/frame/1d/setd/bli_setd_unb_var1.h b/frame/1d/old/bli_setd_unb_var1.h similarity index 100% rename from frame/1d/setd/bli_setd_unb_var1.h rename to frame/1d/old/bli_setd_unb_var1.h diff --git a/frame/1d/setid/bli_setid.c 
b/frame/1d/old/bli_setid.c similarity index 100% rename from frame/1d/setid/bli_setid.c rename to frame/1d/old/bli_setid.c diff --git a/frame/1d/setid/bli_setid.h b/frame/1d/old/bli_setid.h similarity index 100% rename from frame/1d/setid/bli_setid.h rename to frame/1d/old/bli_setid.h diff --git a/frame/1d/setid/bli_setid_check.c b/frame/1d/old/bli_setid_check.c similarity index 100% rename from frame/1d/setid/bli_setid_check.c rename to frame/1d/old/bli_setid_check.c diff --git a/frame/1d/setid/bli_setid_check.h b/frame/1d/old/bli_setid_check.h similarity index 100% rename from frame/1d/setid/bli_setid_check.h rename to frame/1d/old/bli_setid_check.h diff --git a/frame/1d/setid/bli_setid_unb_var1.c b/frame/1d/old/bli_setid_unb_var1.c similarity index 100% rename from frame/1d/setid/bli_setid_unb_var1.c rename to frame/1d/old/bli_setid_unb_var1.c diff --git a/frame/1d/setid/bli_setid_unb_var1.h b/frame/1d/old/bli_setid_unb_var1.h similarity index 100% rename from frame/1d/setid/bli_setid_unb_var1.h rename to frame/1d/old/bli_setid_unb_var1.h diff --git a/frame/1d/subd/bli_subd.c b/frame/1d/old/bli_subd.c similarity index 100% rename from frame/1d/subd/bli_subd.c rename to frame/1d/old/bli_subd.c diff --git a/frame/1d/subd/bli_subd.h b/frame/1d/old/bli_subd.h similarity index 100% rename from frame/1d/subd/bli_subd.h rename to frame/1d/old/bli_subd.h diff --git a/frame/1d/subd/bli_subd_check.c b/frame/1d/old/bli_subd_check.c similarity index 100% rename from frame/1d/subd/bli_subd_check.c rename to frame/1d/old/bli_subd_check.c diff --git a/frame/1d/subd/bli_subd_check.h b/frame/1d/old/bli_subd_check.h similarity index 100% rename from frame/1d/subd/bli_subd_check.h rename to frame/1d/old/bli_subd_check.h diff --git a/frame/1d/subd/bli_subd_unb_var1.c b/frame/1d/old/bli_subd_unb_var1.c similarity index 100% rename from frame/1d/subd/bli_subd_unb_var1.c rename to frame/1d/old/bli_subd_unb_var1.c diff --git a/frame/1d/subd/bli_subd_unb_var1.h 
b/frame/1d/old/bli_subd_unb_var1.h similarity index 100% rename from frame/1d/subd/bli_subd_unb_var1.h rename to frame/1d/old/bli_subd_unb_var1.h diff --git a/frame/1f/axpy2v/bli_axpy2v.c b/frame/1f/axpy2v/bli_axpy2v.c deleted file mode 100644 index cf5ab476d..000000000 --- a/frame/1f/axpy2v/bli_axpy2v.c +++ /dev/null @@ -1,133 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha1, \ - obj_t* alpha2, \ - obj_t* x, \ - obj_t* y, \ - obj_t* z \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha1, alpha2, x, y, z ); \ -\ - PASTEMAC0(varname)( alpha1, \ - alpha2, \ - x, \ - y, \ - z ); \ -} - -GENFRONT( axpy2v, axpy2v_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype* alpha1, \ - ctype* alpha2, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjx, \ - conjy, \ - n, \ - alpha1, \ - alpha2, \ - x, incx, \ - y, incy, \ - z, incz ); \ -} - -INSERT_GENTFUNC_BASIC( axpy2v, AXPY2V_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chz,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* alpha1, \ - ctype_xy* alpha2, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_z* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(chx,chy,chz,varname)( conjx, \ - conjy, \ - n, \ - alpha1, \ - alpha2, \ - x, incx, \ - y, incy, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( axpy2v, AXPY2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpy2v, AXPY2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpy2v, AXPY2V_KERNEL ) -#endif - diff --git a/frame/1f/axpy2v/bli_axpy2v_kernel.c b/frame/1f/axpy2v/bli_axpy2v_kernel.c deleted file mode 100644 index 697215a25..000000000 --- a/frame/1f/axpy2v/bli_axpy2v_kernel.c +++ /dev/null @@ -1,150 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T axpy2v_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha1, - void* alpha2, - void* x, inc_t incx, - void* y, inc_t incy, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpy2v_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpy2v_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpy2v_kernel_void); -#endif -#endif - - -void bli_axpy2v_kernel( obj_t* alpha1, - obj_t* alpha2, - obj_t* x, - obj_t* y, - obj_t* z ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - num_t dt_alpha1; - void* buf_alpha1; - - num_t dt_alpha2; - void* buf_alpha2; - - FUNCPTR_T f; - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use 
the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha1, dt_x, dt_alpha1, buf_alpha1 ); - bli_set_scalar_dt_buffer( alpha2, dt_x, dt_alpha2, buf_alpha2 ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_alpha1][dt_x][dt_y]; - - // Invoke the function. - f( conjx, - conjy, - n, - buf_alpha1, - buf_alpha2, - buf_x, inc_x, - buf_y, inc_y, - buf_z, inc_z ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname, kername ) \ -\ -void PASTEMAC3(chx,chy,chz,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* alpha1, \ - void* alpha2, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(chx,chy,chz,kername)( conjx, \ - conjy, \ - n, \ - alpha1, \ - alpha2, \ - x, incx, \ - y, incy, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( axpy2v_kernel_void, AXPY2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpy2v_kernel_void, AXPY2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpy2v_kernel_void, AXPY2V_KERNEL ) -#endif - diff --git a/frame/1f/axpy2v/bli_axpy2v_kernel.h b/frame/1f/axpy2v/bli_axpy2v_kernel.h deleted file mode 100644 index 134131eda..000000000 --- a/frame/1f/axpy2v/bli_axpy2v_kernel.h +++ /dev/null @@ -1,69 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_axpy2v_kernel( obj_t* alpha1, - obj_t* alpha2, - obj_t* x, - obj_t* y, - obj_t* z ); - - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT3 -#define GENTPROT3( ctype_x, ctype_y, ctype_z, chx, chy, chz, varname ) \ -\ -void PASTEMAC3(chx,chy,chz,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* alpha1, \ - void* alpha2, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* z, inc_t incz \ - ); - -INSERT_GENTPROT3_BASIC( axpy2v_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( axpy2v_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( axpy2v_kernel_void ) -#endif - diff --git a/frame/1f/axpy2v/bli_axpy2v_ref.c b/frame/1f/axpy2v/bli_axpy2v_ref.c deleted file mode 100644 index 825d088c4..000000000 --- a/frame/1f/axpy2v/bli_axpy2v_ref.c +++ /dev/null @@ -1,161 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T axpy2v_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha1, - void* alpha2, - void* x, inc_t incx, - void* y, inc_t incy, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpy2v_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpy2v_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpy2v_ref); -#endif -#endif - - -void bli_axpy2v_ref( obj_t* alpha1, - obj_t* alpha2, - obj_t* x, - obj_t* y, - obj_t* z ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - num_t dt_alpha1; - void* buf_alpha1; - - num_t dt_alpha2; - void* buf_alpha2; - - FUNCPTR_T f; - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // 
within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha1, dt_x, dt_alpha1, buf_alpha1 ); - bli_set_scalar_dt_buffer( alpha2, dt_x, dt_alpha2, buf_alpha2 ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_alpha1][dt_x][dt_y]; - - // Invoke the function. - f( conjx, - conjy, - n, - buf_alpha1, - buf_alpha2, - buf_x, inc_x, - buf_y, inc_y, - buf_z, inc_z ); -} -*/ - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname, kername ) \ -\ -void PASTEMAC3(chx,chy,chz,varname) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* restrict alpha1, \ - ctype_xy* restrict alpha2, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_z* restrict z, inc_t incz \ - ) \ -{ \ - ctype_xy* alpha1_cast = alpha1; \ - ctype_xy* alpha2_cast = alpha2; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_z* z_cast = z; \ -\ - PASTEMAC3(chxy,chx,chz,kername)( conjx, \ - n, \ - alpha1_cast, \ - x_cast, incx, \ - z_cast, incz ); \ - PASTEMAC3(chxy,chy,chz,kername)( conjy, \ - n, \ - alpha2_cast, \ - y_cast, incy, \ - z_cast, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( axpy2v_ref, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpy2v_ref, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpy2v_ref, AXPYV_KERNEL ) -#endif - diff --git a/frame/1f/axpyf/bli_axpyf.c b/frame/1f/axpyf/bli_axpyf.c deleted file mode 100644 index 689a7e16b..000000000 --- a/frame/1f/axpyf/bli_axpyf.c +++ /dev/null @@ -1,141 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. 
-// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* x, \ - obj_t* y \ - ) \ -{ \ - obj_t a_local; \ -\ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, a, x, y ); \ -\ - bli_obj_alias_to( *a, a_local ); \ -\ - if ( bli_obj_has_trans( a_local ) ) \ - { \ - bli_obj_induce_trans( a_local ); \ - bli_obj_toggle_trans( a_local ); \ - } \ -\ - PASTEMAC0(varname)( alpha, \ - &a_local, \ - x, \ - y ); \ -} - -GENFRONT( axpyf, axpyf_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype* alpha, \ - ctype* a, inc_t inca, inc_t lda, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conja, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( axpyf, AXPYF_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(cha,chx,chy,varname)( conja, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - y, incy ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( axpyf, AXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpyf, AXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpyf, AXPYF_KERNEL ) -#endif - diff --git a/frame/1f/axpyf/bli_axpyf_kernel.h b/frame/1f/axpyf/bli_axpyf_kernel.h deleted file mode 100644 index 2d4fe74b2..000000000 --- a/frame/1f/axpyf/bli_axpyf_kernel.h +++ /dev/null @@ -1,68 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_axpyf_kernel( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* y ); - - -// -// Prototype the void pointer kernel wrappers. -// - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( axpyf_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( axpyf_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( axpyf_kernel_void ) -#endif - diff --git a/frame/1f/bli_l1f.h b/frame/1f/bli_l1f.h new file mode 100644 index 000000000..6052a4b5c --- /dev/null +++ b/frame/1f/bli_l1f.h @@ -0,0 +1,50 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "bli_l1f_cntx.h" +#include "bli_l1f_check.h" + +#include "bli_l1f_ft.h" + +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_l1f_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l1f_oapi.h" + +#include "bli_l1f_tapi.h" + +// Reference kernel headers +#include "bli_l1f_ref.h" + diff --git a/frame/1f/bli_l1f_check.c b/frame/1f/bli_l1f_check.c new file mode 100644 index 000000000..127348c8e --- /dev/null +++ b/frame/1f/bli_l1f_check.c @@ -0,0 +1,450 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. +// + +void bli_axpy2v_check + ( + obj_t* alphax, + obj_t* alphay, + obj_t* x, + obj_t* y, + obj_t* z + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_noninteger_object( alphax ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( alphay ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( z ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alphax ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( alphay ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( z ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, z ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alphax ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( alphay ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( z ); + bli_check_error_code( e_val ); +} + + +void bli_axpyf_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( x, bli_obj_width_after_trans( *a ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( y, bli_obj_length_after_trans( *a ) ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + + +void bli_dotaxpyv_check + ( + obj_t* alpha, + obj_t* xt, + obj_t* x, + obj_t* y, + obj_t* rho, + obj_t* z + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( xt ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( z ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( xt ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( z ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, xt ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, z ); + bli_check_error_code( e_val ); + + // Check object aliases. + + e_val = bli_check_object_alias_of( xt, x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( xt ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( rho ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( z ); + bli_check_error_code( e_val ); +} + + +void bli_dotxaxpyf_check + ( + obj_t* alpha, + obj_t* at, + obj_t* a, + obj_t* w, + obj_t* x, + obj_t* beta, + obj_t* y, + obj_t* z + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( at ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( w ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( z ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( at ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( w ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( z ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( w, z ); + bli_check_error_code( e_val ); + + e_val = bli_check_equal_vector_lengths( x, y ); + bli_check_error_code( e_val ); + + e_val = bli_check_conformal_dims( at, a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_length_equals( at, bli_obj_vector_dim( *w ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_width_equals( at, bli_obj_vector_dim( *y ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_length_equals( a, bli_obj_vector_dim( *z ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_width_equals( a, bli_obj_vector_dim( *x ) ); + bli_check_error_code( e_val ); + + // Check object aliases. 
+ + e_val = bli_check_object_alias_of( at, a ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( at ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( w ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( z ); + bli_check_error_code( e_val ); +} + + +void bli_dotxf_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( x, bli_obj_length_after_trans( *a ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( y, bli_obj_width_after_trans( *a ) ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + diff --git a/frame/1f/bli_l1f_check.h b/frame/1f/bli_l1f_check.h new file mode 100644 index 000000000..77996090e --- /dev/null +++ b/frame/1f/bli_l1f_check.h @@ -0,0 +1,116 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. +// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alphax, \ + obj_t* alphay, \ + obj_t* x, \ + obj_t* y, \ + obj_t* z \ + ); + +GENTPROT( axpy2v ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* y \ + ); + +GENTPROT( axpyf ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* xt, \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho, \ + obj_t* z \ + ); + +GENTPROT( dotaxpyv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* at, \ + obj_t* a, \ + obj_t* w, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + obj_t* z \ + ); + +GENTPROT( dotxaxpyf ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + ); + +GENTPROT( dotxf ) + diff --git a/frame/1f/bli_l1f_cntx.c b/frame/1f/bli_l1f_cntx.c new file mode 100644 index 000000000..379cbce7d --- /dev/null +++ b/frame/1f/bli_l1f_cntx.c @@ -0,0 +1,143 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. +// + +#undef GENFRONT +#define GENFRONT( opname, kertype, depname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname,_cntx_init)( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. 
*/ \ + bli_gks_cntx_set_l1f_ker( kertype, cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( axpy2v, BLIS_AXPY2V_KER, axpyv ) + + +#undef GENFRONT +#define GENFRONT( opname, kertype, depname1, depname2 ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname1,_cntx_init)( cntx ); \ + PASTEMAC(depname2,_cntx_init)( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. */ \ + bli_gks_cntx_set_l1f_ker( kertype, cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( dotaxpyv, BLIS_DOTAXPYV_KER, dotxv, axpyv ) + + +#undef GENFRONT +#define GENFRONT( opname, kertype, depname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname,_cntx_init)( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. */ \ + bli_gks_cntx_set_l1f_ker( kertype, cntx ); \ +\ + /* Initialize the context with the current architecture's level-1f + fusing blocksizes. */ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 1, \ + BLIS_AF, BLIS_AF, /* axpyf fusing factor */ \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( axpyf, BLIS_AXPYF_KER, axpyv ) + + +#undef GENFRONT +#define GENFRONT( opname, kertype, depname1, depname2 ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname1,_cntx_init)( cntx ); \ + PASTEMAC(depname2,_cntx_init)( cntx ); \ +\ + /* Initialize the context with the kernel associated with the current + operation. 
*/ \ + bli_gks_cntx_set_l1f_ker( kertype, cntx ); \ +\ + /* Initialize the context with the current architecture's level-1f + fusing blocksizes. */ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 2, \ + BLIS_DF, BLIS_DF, /* dotxf fusing factor */ \ + BLIS_XF, BLIS_XF, /* dotxaxpyf fusing factor */ \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( dotxf, BLIS_DOTXF_KER, dotv, dotxv ) +GENFRONT( dotxaxpyf, BLIS_DOTXAXPYF_KER, dotxf, axpyf ) + diff --git a/frame/1f/bli_l1f_cntx.h b/frame/1f/bli_l1f_cntx.h new file mode 100644 index 000000000..86b3af25f --- /dev/null +++ b/frame/1f/bli_l1f_cntx.h @@ -0,0 +1,50 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype context initialization functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); + +GENPROT( axpy2v ) +GENPROT( axpyf ) +GENPROT( dotaxpyv ) +GENPROT( dotxaxpyf ) +GENPROT( dotxf ) diff --git a/frame/1f/bli_l1f_ft.h b/frame/1f/bli_l1f_ft.h new file mode 100644 index 000000000..f8d15fc3c --- /dev/null +++ b/frame/1f/bli_l1f_ft.h @@ -0,0 +1,153 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#ifndef BLIS_L1F_FT_H +#define BLIS_L1F_FT_H + + +// +// -- Level-1f function types -------------------------------------------------- +// + +// axpy2v + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha1, \ + ctype* alpha2, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( axpy2v ) + +// axpyf + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( axpyf ) + +// dotaxpyv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( dotaxpyv ) + +// dotxf + 
+#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( dotxf ) + +// dotxaxpyf + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( dotxaxpyf ) + + + +#endif + diff --git a/frame/1f/bli_l1f_ker.h b/frame/1f/bli_l1f_ker.h new file mode 100644 index 000000000..953aaf0af --- /dev/null +++ b/frame/1f/bli_l1f_ker.h @@ -0,0 +1,140 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Define template prototypes for level-1f kernels. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alphax, \ + ctype* alphay, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpy2v_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpyf_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotaxpyv_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + 
conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxaxpyf_ker_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxf_ker_name ) + diff --git a/frame/1f/bli_l1f_oapi.c b/frame/1f/bli_l1f_oapi.c new file mode 100644 index 000000000..d9c2eda10 --- /dev/null +++ b/frame/1f/bli_l1f_oapi.c @@ -0,0 +1,392 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alphax, \ + obj_t* alphay, \ + obj_t* x, \ + obj_t* y, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ + void* buf_z = bli_obj_buffer_at_off( *z ); \ + inc_t inc_z = bli_obj_vector_inc( *z ); \ +\ + void* buf_alphax; \ + void* buf_alphay; \ +\ + obj_t alphax_local; \ + obj_t alphay_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alphax, alphay, x, y, z ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). 
*/ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alphax, &alphax_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alphay, &alphay_local ); \ + buf_alphax = bli_obj_buffer_for_1x1( dt, alphax_local ); \ + buf_alphay = bli_obj_buffer_for_1x1( dt, alphay_local ); \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_12 \ + ( \ + dt, \ + opname, \ + conjx, \ + conjy, \ + n, \ + buf_alphax, \ + buf_alphay, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + buf_z, inc_z, \ + cntx \ + ); \ +} + +GENFRONT( axpy2v ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conja = bli_obj_conj_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_vector_dim( *y ); \ + dim_t b_n = bli_obj_vector_dim( *x ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, a, x, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Support cases where matrix A requires a transposition. */ \ + if ( bli_obj_has_trans( *a ) ) { bli_swap_incs( rs_a, cs_a ); } \ +\ + /* Invoke the void pointer-based function. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + conja, \ + conjx, \ + m, \ + b_n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + cntx \ + ); \ +} + +GENFRONT( axpyf ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* xt, \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjxt = bli_obj_conj_status( *xt ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ + void* buf_z = bli_obj_buffer_at_off( *z ); \ + inc_t inc_z = bli_obj_vector_inc( *z ); \ + void* buf_rho = bli_obj_buffer_at_off( *rho ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, xt, x, y, rho, z ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the void pointer-based function. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + conjxt, \ + conjx, \ + conjy, \ + n, \ + buf_alpha, \ + buf_x, inc_x, \ + buf_y, inc_y, \ + buf_rho, \ + buf_z, inc_z, \ + cntx \ + ); \ +} + +GENFRONT( dotaxpyv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* at, \ + obj_t* a, \ + obj_t* w, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjat = bli_obj_conj_status( *at ); \ + conj_t conja = bli_obj_conj_status( *a ); \ + conj_t conjw = bli_obj_conj_status( *w ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_vector_dim( *z ); \ + dim_t b_n = bli_obj_vector_dim( *y ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_w = bli_obj_buffer_at_off( *w ); \ + inc_t inc_w = bli_obj_vector_inc( *w ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ + void* buf_z = bli_obj_buffer_at_off( *z ); \ + inc_t inc_z = bli_obj_vector_inc( *z ); \ +\ + void* buf_alpha; \ + void* buf_beta; \ +\ + obj_t alpha_local; \ + obj_t beta_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, at, a, w, x, beta, y, z ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + beta, &beta_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); \ +\ + /* Support cases where matrix A requires a transposition. 
*/ \ + if ( bli_obj_has_trans( *a ) ) { bli_swap_incs( rs_a, cs_a ); } \ +\ + /* Invoke the void pointer-based function. */ \ + bli_call_ft_20 \ + ( \ + dt, \ + opname, \ + conjat, \ + conja, \ + conjw, \ + conjx, \ + m, \ + b_n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_w, inc_w, \ + buf_x, inc_x, \ + buf_beta, \ + buf_y, inc_y, \ + buf_z, inc_z, \ + cntx \ + ); \ +} + +GENFRONT( dotxaxpyf ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + conj_t conjat = bli_obj_conj_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_vector_dim( *x ); \ + dim_t b_n = bli_obj_vector_dim( *y ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t inc_x = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t inc_y = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha; \ + void* buf_beta; \ +\ + obj_t alpha_local; \ + obj_t beta_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, a, x, beta, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + beta, &beta_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); \ +\ + /* Support cases where matrix A requires a transposition. */ \ + if ( bli_obj_has_trans( *a ) ) { bli_swap_incs( rs_a, cs_a ); } \ +\ + /* Invoke the void pointer-based function. 
*/ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + conjat, \ + conjx, \ + m, \ + b_n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, inc_x, \ + buf_beta, \ + buf_y, inc_y, \ + cntx \ + ); \ +} + +GENFRONT( dotxf ) + + +#endif + diff --git a/frame/1f/bli_l1f_oapi.h b/frame/1f/bli_l1f_oapi.h new file mode 100644 index 000000000..c69896ad7 --- /dev/null +++ b/frame/1f/bli_l1f_oapi.h @@ -0,0 +1,121 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alphax, \ + obj_t* alphay, \ + obj_t* x, \ + obj_t* y, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( axpy2v ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( axpyf ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* xt, \ + obj_t* x, \ + obj_t* y, \ + obj_t* rho, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( dotaxpyv ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* at, \ + obj_t* a, \ + obj_t* w, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + obj_t* z \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( dotxaxpyf ) + + +#undef GENTPROT +#define GENTPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENTPROT( dotxf ) + diff --git a/frame/1f/bli_l1f_oapi_wc.c b/frame/1f/bli_l1f_oapi_wc.c new file mode 100644 index 000000000..d4b6071c5 --- /dev/null +++ b/frame/1f/bli_l1f_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + 
libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. 
+#include "bli_l1f_oapi.c" + diff --git a/frame/1f/bli_l1f_oapi_woc.c b/frame/1f/bli_l1f_oapi_woc.c new file mode 100644 index 000000000..e60068949 --- /dev/null +++ b/frame/1f/bli_l1f_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. 
+#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1f_oapi.c" + diff --git a/frame/1f/bli_l1f_tapi.c b/frame/1f/bli_l1f_tapi.c new file mode 100644 index 000000000..a7efd91f8 --- /dev/null +++ b/frame/1f/bli_l1f_tapi.c @@ -0,0 +1,263 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alphax, \ + ctype* alphay, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1f_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjx, \ + conjy, \ + n, \ + alphax, \ + alphay, \ + x, incx, \ + y, incy, \ + z, incz, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpy2v, BLIS_AXPY2V_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1f_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conja, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + y, incy, \ + cntx_p \ + ); \ +\ + 
bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpyf, BLIS_AXPYF_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1f_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjxt, \ + conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + rho, \ + z, incz, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotaxpyv, BLIS_DOTAXPYV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1f_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjat, \ + conja, \ + conjw, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + w, incw, \ + x, incx, \ + beta, \ + y, incy, \ + z, incz, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxaxpyf, BLIS_DOTXAXPYF_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t 
incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx_p; \ +\ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + PASTECH2(ch,opname,_ft) f = bli_cntx_get_l1f_ker_dt( dt, kerid, cntx_p ); \ +\ + f \ + ( \ + conjat, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + beta, \ + y, incy, \ + cntx_p \ + ); \ +\ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxf, BLIS_DOTXF_KER ) + diff --git a/frame/1f/bli_l1f_tapi.h b/frame/1f/bli_l1f_tapi.h new file mode 100644 index 000000000..55740c263 --- /dev/null +++ b/frame/1f/bli_l1f_tapi.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Generate prototypes for level-1f operations. +// + +#undef axpy2v_ker_name +#define axpy2v_ker_name axpy2v + +#undef dotaxpyv_ker_name +#define dotaxpyv_ker_name dotaxpyv + +#undef axpyf_ker_name +#define axpyf_ker_name axpyf + +#undef dotxf_ker_name +#define dotxf_ker_name dotxf + +#undef dotxaxpyf_ker_name +#define dotxaxpyf_ker_name dotxaxpyf + +// Include the level-1f kernel API template. + +#include "bli_l1f_ker.h" + diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv.c b/frame/1f/dotaxpyv/bli_dotaxpyv.c deleted file mode 100644 index 84025bee4..000000000 --- a/frame/1f/dotaxpyv/bli_dotaxpyv.c +++ /dev/null @@ -1,139 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* xt, \ - obj_t* x, \ - obj_t* y, \ - obj_t* rho, \ - obj_t* z \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, xt, x, y, rho, z ); \ -\ - PASTEMAC0(varname)( alpha, \ - xt, \ - x, \ - y, \ - rho, \ - z ); \ -} - -GENFRONT( dotaxpyv, dotaxpyv_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. 
-// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* rho, \ - ctype* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjxt, \ - conjx, \ - conjy, \ - m, \ - alpha, \ - x, incx, \ - y, incy, \ - rho, \ - z, incz ); \ -} - -INSERT_GENTFUNC_BASIC( dotaxpyv, DOTAXPYV_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chz,opname)( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_x* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_xy* rho, \ - ctype_z* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(chx,chy,chz,varname)( conjxt, \ - conjx, \ - conjy, \ - m, \ - alpha, \ - x, incx, \ - y, incy, \ - rho, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotaxpyv, DOTAXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotaxpyv, DOTAXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotaxpyv, DOTAXPYV_KERNEL ) -#endif - diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.c b/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.c deleted file mode 100644 index faafb226b..000000000 --- a/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.c +++ /dev/null @@ -1,155 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T dotaxpyv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* rho, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotaxpyv_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotaxpyv_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotaxpyv_kernel_void); -#endif -#endif - - -void bli_dotaxpyv_kernel( obj_t* alpha, - obj_t* xt, - obj_t* x, - obj_t* y, - obj_t* rho, - obj_t* z ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_z = bli_obj_datatype( *z ); - - conj_t conjxt = bli_obj_conj_status( *xt ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - void* buf_rho = bli_obj_buffer_at_off( *rho ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_z]; - - // Invoke the function. 
- f( conjxt, - conjx, - conjy, - n, - buf_alpha, - buf_x, inc_x, - buf_y, inc_y, - buf_rho, - buf_z, inc_z ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname, kername ) \ -\ -void PASTEMAC3(chx,chy,chz,varname)( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* rho, \ - void* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(chx,chy,chz,kername)( conjxt, \ - conjx, \ - conjy, \ - m, \ - alpha, \ - x, incx, \ - y, incy, \ - rho, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotaxpyv_kernel_void, DOTAXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotaxpyv_kernel_void, DOTAXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotaxpyv_kernel_void, DOTAXPYV_KERNEL ) -#endif - diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.h b/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.h deleted file mode 100644 index b5e9cc802..000000000 --- a/frame/1f/dotaxpyv/bli_dotaxpyv_kernel.h +++ /dev/null @@ -1,71 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_dotaxpyv_kernel( obj_t* alpha, - obj_t* xt, - obj_t* x, - obj_t* y, - obj_t* rho, - obj_t* z ); - - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname ) \ -\ -void PASTEMAC3(chx,chy,chz,varname)( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* rho, \ - void* z, inc_t incz \ - ); - -INSERT_GENTPROT3U12_BASIC( dotaxpyv_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotaxpyv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotaxpyv_kernel_void ) -#endif - diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_ref.c b/frame/1f/dotaxpyv/bli_dotaxpyv_ref.c deleted file mode 100644 index b45fc44cb..000000000 --- a/frame/1f/dotaxpyv/bli_dotaxpyv_ref.c +++ /dev/null @@ -1,170 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T dotaxpyv_fp - -typedef void (*FUNCPTR_T)( - conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* rho, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotaxpyv_ref); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotaxpyv_ref); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotaxpyv_ref); -#endif -#endif - - -void bli_dotaxpyv_ref( obj_t* alpha, - obj_t* xt, - obj_t* x, - obj_t* y, - obj_t* rho, - obj_t* z ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_z = bli_obj_datatype( *z ); - - conj_t conjxt = bli_obj_conj_status( *xt ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - void* buf_rho = bli_obj_buffer_at_off( *rho ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // If alpha is a scalar 
constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_z]; - - // Invoke the function. - f( conjxt, - conjx, - conjy, - n, - buf_alpha, - buf_x, inc_x, - buf_y, inc_y, - buf_rho, - buf_z, inc_z ); -} -*/ - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname, dotxvker, axpyvker ) \ -\ -void PASTEMAC3(chx,chy,chz,varname) \ - ( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_x* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_xy* restrict rho, \ - ctype_z* restrict z, inc_t incz \ - ) \ -{ \ - ctype_xy* one = PASTEMAC(chxy,1); \ - ctype_xy* zero = PASTEMAC(chxy,0); \ - ctype_x* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_xy* rho_cast = rho; \ - ctype_z* z_cast = z; \ -\ - PASTEMAC3(chx,chy,chxy,dotxvker)( conjxt, \ - conjy, \ - m, \ - one, \ - x_cast, incx, \ - y_cast, incy, \ - zero, \ - rho_cast ); \ - PASTEMAC3(chx,chx,chz,axpyvker)( conjx, \ - m, \ - alpha_cast, \ - x_cast, incx, \ - z_cast, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC2( dotaxpyv_ref, DOTXV_KERNEL, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D2( dotaxpyv_ref, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P2( dotaxpyv_ref, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf.c b/frame/1f/dotxaxpyf/bli_dotxaxpyf.c deleted file mode 100644 index 7be080926..000000000 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf.c +++ /dev/null @@ -1,159 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* at, \ - obj_t* a, \ - obj_t* w, \ - obj_t* x, \ - obj_t* beta, \ - obj_t* y, \ - obj_t* z \ - ) \ -{ \ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, at, a, w, x, beta, y, z ); \ -\ - PASTEMAC0(varname)( alpha, \ - at, \ - a, \ - w, \ - x, \ - beta, \ - y, \ - z ); \ -} - -GENFRONT( dotxaxpyf, dotxaxpyf_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype* alpha, \ - ctype* a, inc_t inca, inc_t lda, \ - ctype* w, inc_t incw, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy, \ - ctype* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjat, \ - conja, \ - conjw, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - w, incw, \ - x, incx, \ - beta, \ - y, incy, \ - z, incz ); \ -} - -INSERT_GENTFUNC_BASIC( dotxaxpyf, DOTXAXPYF_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, opname, varname ) \ -\ -void PASTEMAC3(cha,chb,chc,opname)( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ab* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_b* w, inc_t incw, \ - ctype_b* x, inc_t incx, \ - ctype_c* beta, \ - ctype_c* y, inc_t incy, \ - ctype_c* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(cha,chb,chc,varname)( conjat, \ - conja, \ - conjw, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - w, incw, \ - x, incx, \ - beta, \ - y, incy, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxaxpyf, DOTXAXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxaxpyf, DOTXAXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxaxpyf, DOTXAXPYF_KERNEL ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.c b/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.c deleted file mode 100644 index 6e4e9b0b8..000000000 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.c +++ /dev/null @@ -1,188 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -#define FUNCPTR_T dotxaxpyf_fp - -typedef void (*FUNCPTR_T)( - conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, inc_t lda, - void* w, inc_t incw, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxaxpyf_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxaxpyf_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxaxpyf_kernel_void); -#endif -#endif - - -void bli_dotxaxpyf_kernel( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjat = bli_obj_conj_status( *at ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjw = bli_obj_conj_status( *w ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_vector_dim( *z ); - dim_t b_n = bli_obj_vector_dim( *y ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - inc_t inc_w = bli_obj_vector_inc( *w ); - void* buf_w = bli_obj_buffer_at_off( *w ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( conjat, - conja, - conjw, - conjx, - m, - b_n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_w, inc_w, - buf_x, inc_x, - buf_beta, - buf_y, inc_y, - buf_z, inc_z ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname, kername ) \ -\ -void PASTEMAC3(cha,chb,chc,varname)( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* w, inc_t incw, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy, \ - void* z, inc_t incz \ - ) \ -{ \ - PASTEMAC3(cha,chb,chc,kername)( conjat, \ - conja, \ - conjw, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - w, incw, \ - x, incx, \ - beta, \ - y, incy, \ - z, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxaxpyf_kernel_void, DOTXAXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxaxpyf_kernel_void, DOTXAXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxaxpyf_kernel_void, DOTXAXPYF_KERNEL ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.h b/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.h deleted file mode 100644 index 9d7f0b475..000000000 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_kernel.h +++ /dev/null @@ -1,77 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_dotxaxpyf_kernel( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ); - - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname ) \ -\ -void PASTEMAC3(cha,chb,chc,varname)( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* w, inc_t incw, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy, \ - void* z, inc_t incz \ - ); - -INSERT_GENTPROT3U12_BASIC( dotxaxpyf_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxaxpyf_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxaxpyf_kernel_void ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.c b/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.c deleted file mode 100644 index c0b9ff878..000000000 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.c +++ /dev/null @@ -1,227 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T dotxaxpyf_fp - -typedef void (*FUNCPTR_T)( - conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, inc_t lda, - void* w, inc_t incw, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxaxpyf_ref_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxaxpyf_ref_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxaxpyf_ref_var1); -#endif -#endif - - -void bli_dotxaxpyf_ref_var1( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjat = bli_obj_conj_status( *at ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjw = bli_obj_conj_status( *w ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_vector_dim( *z ); - dim_t b_n = bli_obj_vector_dim( *y ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - inc_t inc_w = bli_obj_vector_inc( *w ); - void* buf_w = bli_obj_buffer_at_off( *w ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( conjat, - conja, - conjw, - conjx, - m, - b_n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_w, inc_w, - buf_x, inc_x, - buf_beta, - buf_y, inc_y, - buf_z, inc_z ); -} -*/ - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname, dotxvker, axpyvker ) \ -\ -void PASTEMAC3(cha,chb,chc,varname) \ - ( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ab* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_b* restrict w, inc_t incw, \ - ctype_b* restrict x, inc_t incx, \ - ctype_c* restrict beta, \ - ctype_c* restrict y, inc_t incy, \ - ctype_c* restrict z, inc_t incz \ - ) \ -{ \ - ctype_ab* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_b* w_cast = w; \ - ctype_b* x_cast = x; \ - ctype_c* beta_cast = beta; \ - ctype_c* y_cast = y; \ - ctype_c* z_cast = z; \ - ctype_a* a1; \ - ctype_b* chi1; \ - ctype_b* w1; \ - ctype_c* psi1; \ - ctype_c* z1; \ - ctype_b conjx_chi1; \ - ctype_ab alpha_chi1; \ - dim_t i; \ -\ - /* A is m x n. */ \ - /* y = beta * y + alpha * A^T w; */ \ - /* z = z + alpha * A x; */ \ - for ( i = 0; i < b_n; ++i ) \ - { \ - a1 = a_cast + (0 )*inca + (i )*lda; \ - w1 = w_cast + (0 )*incw; \ - psi1 = y_cast + (i )*incy; \ -\ - PASTEMAC3(cha,chb,chc,dotxv)( conjat, \ - conjw, \ - m, \ - alpha_cast, \ - a1, inca, \ - w1, incw, \ - beta_cast, \ - psi1 ); \ - } \ -\ - for ( i = 0; i < b_n; ++i ) \ - { \ - a1 = a_cast + (0 )*inca + (i )*lda; \ - chi1 = x_cast + (i )*incx; \ - z1 = z_cast + (0 )*incz; \ -\ - PASTEMAC2(chb,chb,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chab,chb,chab,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ - PASTEMAC3(chab,cha,chc,axpyv)( conja, \ - m, \ - &alpha_chi1, \ - a1, inca, \ - z1, incz ); \ - } \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC2( dotxaxpyf_ref_var1, DOTXV_KERNEL, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D2( dotxaxpyf_ref_var1, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P2( dotxaxpyf_ref_var1, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.c b/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.c deleted file mode 100644 index 449dbfb5f..000000000 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.c +++ /dev/null @@ -1,208 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -/* -#define FUNCPTR_T dotxaxpyf_fp - -typedef void (*FUNCPTR_T)( - conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, inc_t lda, - void* w, inc_t incw, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy, - void* z, inc_t incz - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxaxpyf_ref_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxaxpyf_ref_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxaxpyf_ref_var2); -#endif -#endif - - -void bli_dotxaxpyf_ref_var2( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t conjat = bli_obj_conj_status( *at ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjw = bli_obj_conj_status( *w ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_vector_dim( *z ); - dim_t b_n = bli_obj_vector_dim( *y ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - inc_t inc_w = bli_obj_vector_inc( *w ); - void* buf_w = bli_obj_buffer_at_off( *w ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - - inc_t inc_z = bli_obj_vector_inc( *z ); - void* buf_z = bli_obj_buffer_at_off( *z ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( conjat, - conja, - conjw, - conjx, - m, - b_n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_w, inc_w, - buf_x, inc_x, - buf_beta, - buf_y, inc_y, - buf_z, inc_z ); -} -*/ - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname, dotxfker, axpyfker ) \ -\ -void PASTEMAC3(cha,chb,chc,varname) \ - ( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ab* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_b* restrict w, inc_t incw, \ - ctype_b* restrict x, inc_t incx, \ - ctype_c* restrict beta, \ - ctype_c* restrict y, inc_t incy, \ - ctype_c* restrict z, inc_t incz \ - ) \ -{ \ - ctype_ab* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_b* w_cast = w; \ - ctype_b* x_cast = x; \ - ctype_c* beta_cast = beta; \ - ctype_c* y_cast = y; \ - ctype_c* z_cast = z; \ -\ - /* A is m x n. */ \ - /* y = beta * y + alpha * A^T w; */ \ - /* z = z + alpha * A x; */ \ -\ - PASTEMAC3(cha,chb,chc,dotxfker)( conjat, \ - conjw, \ - m, \ - b_n, \ - alpha_cast, \ - a_cast, inca, lda, \ - w_cast, incw, \ - beta_cast, \ - y_cast, incy ); \ -\ - PASTEMAC3(cha,chb,chc,axpyfker)( conja, \ - conjx, \ - m, \ - b_n, \ - alpha_cast, \ - a_cast, inca, lda, \ - x_cast, incx, \ - z_cast, incz ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC2( dotxaxpyf_ref_var2, DOTXF_KERNEL, AXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D2( dotxaxpyf_ref_var2, DOTXF_KERNEL, AXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P2( dotxaxpyf_ref_var2, DOTXF_KERNEL, AXPYF_KERNEL ) -#endif - diff --git a/frame/1f/dotxf/bli_dotxf.c b/frame/1f/dotxf/bli_dotxf.c deleted file mode 100644 index b5c1775d1..000000000 --- a/frame/1f/dotxf/bli_dotxf.c +++ /dev/null @@ -1,147 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// -// Define object-based interface. -// -#undef GENFRONT -#define GENFRONT( opname, varname ) \ -\ -void PASTEMAC0(opname)( \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* x, \ - obj_t* beta, \ - obj_t* y \ - ) \ -{ \ - obj_t a_local; \ -\ - if ( bli_error_checking_is_enabled() ) \ - PASTEMAC(opname,_check)( alpha, a, x, beta, y ); \ -\ - bli_obj_alias_to( *a, a_local ); \ -\ - if ( bli_obj_has_trans( a_local ) ) \ - { \ - bli_obj_induce_trans( a_local ); \ - bli_obj_toggle_trans( a_local ); \ - } \ -\ - PASTEMAC0(varname)( alpha, \ - &a_local, \ - x, \ - beta, \ - y ); \ -} - -GENFRONT( dotxf, dotxf_kernel ) - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype* alpha, \ - ctype* a, inc_t inca, inc_t lda, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(ch,ch,ch,varname)( conjat, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - beta, \ - y, incy ); \ -} - -INSERT_GENTFUNC_BASIC( dotxf, DOTXF_KERNEL ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - PASTEMAC3(cha,chx,chy,varname)( conjat, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - beta, \ - y, incy ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxf, DOTXF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxf, DOTXF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxf, DOTXF_KERNEL ) -#endif - diff --git a/frame/1f/dotxf/bli_dotxf_kernel.h b/frame/1f/dotxf/bli_dotxf_kernel.h deleted file mode 100644 index de672caa2..000000000 --- a/frame/1f/dotxf/bli_dotxf_kernel.h +++ /dev/null @@ -1,70 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_dotxf_kernel( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); - - -// -// Prototype the void pointer kernel wrappers. -// - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( dotxf_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxf_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxf_kernel_void ) -#endif - diff --git a/frame/1f/kernels/bli_axpy2v_ref.c b/frame/1f/kernels/bli_axpy2v_ref.c new file mode 100644 index 000000000..e91a510cb --- /dev/null +++ b/frame/1f/kernels/bli_axpy2v_ref.c @@ -0,0 +1,80 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alphax, \ + ctype* alphay, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + /* Query the context for the kernel function pointer. 
*/ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,axpyv_ft) kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ +\ + kfp_av \ + ( \ + conjx, \ + n, \ + alphax, \ + x, incx, \ + z, incz, \ + cntx \ + ); \ +\ + kfp_av \ + ( \ + conjy, \ + n, \ + alphay, \ + y, incy, \ + z, incz, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( axpy2v_ref ) + diff --git a/frame/1f/kernels/bli_axpyf_ref.c b/frame/1f/kernels/bli_axpyf_ref.c new file mode 100644 index 000000000..228d53823 --- /dev/null +++ b/frame/1f/kernels/bli_axpyf_ref.c @@ -0,0 +1,86 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* a1; \ + ctype* chi1; \ + ctype* y1; \ + ctype alpha_chi1; \ + dim_t i; \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,axpyv_ft) kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ +\ + for ( i = 0; i < b_n; ++i ) \ + { \ + a1 = a + (0 )*inca + (i )*lda; \ + chi1 = x + (i )*incx; \ + y1 = y + (0 )*incy; \ +\ + PASTEMAC(ch,copycjs)( conjx, *chi1, alpha_chi1 ); \ + PASTEMAC(ch,scals)( *alpha, alpha_chi1 ); \ +\ + kfp_av \ + ( \ + conja, \ + m, \ + &alpha_chi1, \ + a1, inca, \ + y1, incy, \ + cntx \ + ); \ + } \ +} + +INSERT_GENTFUNC_BASIC0( axpyf_ref ) + diff --git a/frame/1/invertv/bli_invertv_kernel.c b/frame/1f/kernels/bli_dotaxpyv_ref.c similarity index 64% rename from frame/1/invertv/bli_invertv_kernel.c rename to frame/1f/kernels/bli_dotaxpyv_ref.c index 81ef38309..22893a5d4 100644 --- a/frame/1/invertv/bli_invertv_kernel.c +++ b/frame/1f/kernels/bli_dotaxpyv_ref.c @@ -34,48 +34,55 @@ #include "blis.h" -#define FUNCPTR_T invertv_fp - -typedef void (*FUNCPTR_T)( - dim_t n, - void* x, inc_t incx - ); - -static FUNCPTR_T 
GENARRAY(ftypes,invertv_kernel_void); - - -void bli_invertv_kernel( obj_t* x ) -{ - num_t dt_x = bli_obj_datatype( *x ); - - dim_t n = bli_obj_vector_dim( *x ); - - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - FUNCPTR_T f; - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x]; - - // Invoke the function. - f( n, - buf_x, inc_x ); -} - #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, kername ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t n, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ { \ - PASTEMAC(ch,kername)( n, \ - x, incx ); \ + ctype* one = PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,dotxv_ft) kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ + PASTECH(ch,axpyv_ft) kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ +\ + kfp_dv \ + ( \ + conjxt, \ + conjy, \ + m, \ + one, \ + x, incx, \ + y, incy, \ + zero, \ + rho, \ + cntx \ + ); \ +\ + kfp_av \ + ( \ + conjx, \ + m, \ + alpha, \ + x, incx, \ + z, incz, \ + cntx \ + ); \ } -INSERT_GENTFUNC_BASIC( invertv_kernel_void, INVERTV_KERNEL ) +INSERT_GENTFUNC_BASIC0( dotaxpyv_ref ) diff --git a/frame/1f/kernels/bli_dotxaxpyf_ref_var1.c b/frame/1f/kernels/bli_dotxaxpyf_ref_var1.c new file mode 100644 index 000000000..4d2851fed --- /dev/null +++ b/frame/1f/kernels/bli_dotxaxpyf_ref_var1.c @@ -0,0 +1,118 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* a1; \ + ctype* chi1; \ + ctype* w1; \ + ctype* psi1; \ + ctype* z1; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + dim_t i; \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,dotxv_ft) kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ + PASTECH(ch,axpyv_ft) kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ +\ + /* A is m x n. */ \ + /* y = beta * y + alpha * A^T w; */ \ + /* z = z + alpha * A x; */ \ + for ( i = 0; i < b_n; ++i ) \ + { \ + a1 = a + (0 )*inca + (i )*lda; \ + w1 = w + (0 )*incw; \ + psi1 = y + (i )*incy; \ +\ + kfp_dv \ + ( \ + conjat, \ + conjw, \ + m, \ + alpha, \ + a1, inca, \ + w1, incw, \ + beta, \ + psi1, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < b_n; ++i ) \ + { \ + a1 = a + (0 )*inca + (i )*lda; \ + chi1 = x + (i )*incx; \ + z1 = z + (0 )*incz; \ +\ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ +\ + kfp_av \ + ( \ + conja, \ + m, \ + &alpha_chi1, \ + a1, inca, \ + z1, incz, \ + cntx \ + ); \ + } \ +} + +INSERT_GENTFUNC_BASIC0( dotxaxpyf_ref_var1 ) + diff --git a/frame/1f/kernels/bli_dotxaxpyf_ref_var2.c b/frame/1f/kernels/bli_dotxaxpyf_ref_var2.c new file mode 100644 index 000000000..051e86f01 --- /dev/null +++ b/frame/1f/kernels/bli_dotxaxpyf_ref_var2.c @@ -0,0 +1,97 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ) \ +{ \ + /* A is m x n. 
*/ \ + /* y = beta * y + alpha * A^T w; */ \ + /* z = z + alpha * A x; */ \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,dotxf_ft) kfp_df = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXF_KER, cntx ); \ + PASTECH(ch,axpyf_ft) kfp_af = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPYF_KER, cntx ); \ +\ + kfp_df \ + ( \ + conjat, \ + conjw, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + w, incw, \ + beta, \ + y, incy, \ + cntx \ + ); \ +\ + kfp_af \ + ( \ + conja, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + z, incz, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( dotxaxpyf_ref_var2 ) + diff --git a/frame/1f/kernels/bli_dotxf_ref.c b/frame/1f/kernels/bli_dotxf_ref.c new file mode 100644 index 000000000..5e50847db --- /dev/null +++ b/frame/1f/kernels/bli_dotxf_ref.c @@ -0,0 +1,86 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* a1; \ + ctype* x1; \ + ctype* psi1; \ + dim_t i; \ +\ + /* Query the context for the kernel function pointer. */ \ + const num_t dt = PASTEMAC(ch,type); \ + PASTECH(ch,dotxv_ft) kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ +\ + for ( i = 0; i < b_n; ++i ) \ + { \ + a1 = a + (0 )*inca + (i )*lda; \ + x1 = x + (0 )*incx; \ + psi1 = y + (i )*incy; \ +\ + kfp_dv \ + ( \ + conjat, \ + conjx, \ + m, \ + alpha, \ + a1, inca, \ + x1, incx, \ + beta, \ + psi1, \ + cntx \ + ); \ + } \ +} + +INSERT_GENTFUNC_BASIC0( dotxf_ref ) + diff --git a/frame/1f/kernels/bli_l1f_ref.h b/frame/1f/kernels/bli_l1f_ref.h new file mode 100644 index 000000000..6a73ac5d1 --- /dev/null +++ b/frame/1f/kernels/bli_l1f_ref.h @@ -0,0 +1,160 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha1, \ + ctype* alpha2, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpy2v_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpyf_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotaxpyv_ref ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxaxpyf_ref_var1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxaxpyf_ref_var2 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + 
( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( dotxf_ref ) + diff --git a/frame/1f/axpy2v/bli_axpy2v_ref.h b/frame/1f/kernels/old/bli_axpy2v_ref.h similarity index 76% rename from frame/1f/axpy2v/bli_axpy2v_ref.h rename to frame/1f/kernels/old/bli_axpy2v_ref.h index e36ab2089..ad0d32374 100644 --- a/frame/1f/axpy2v/bli_axpy2v_ref.h +++ b/frame/1f/kernels/old/bli_axpy2v_ref.h @@ -32,37 +32,21 @@ */ -/* -void bli_axpy2v_ref( obj_t* alpha1, - obj_t* alpha2, - obj_t* x, - obj_t* y, - obj_t* z ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname ) \ \ -void PASTEMAC3(chx,chy,chz,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjx, \ conj_t conjy, \ dim_t n, \ - ctype_xy* restrict alpha1, \ - ctype_xy* restrict alpha2, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_z* restrict z, inc_t incz \ + ctype_xy* alpha1, \ + ctype_xy* alpha2, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_z* z, inc_t incz \ ); INSERT_GENTPROT3U12_BASIC( axpy2v_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( axpy2v_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( axpy2v_ref ) -#endif - diff --git a/frame/1f/axpyf/bli_axpyf_ref.h b/frame/1f/kernels/old/bli_axpyf_ref.h similarity index 78% rename from frame/1f/axpyf/bli_axpyf_ref.h rename to frame/1f/kernels/old/bli_axpyf_ref.h index 86fb32681..bc5fbc0e4 100644 --- a/frame/1f/axpyf/bli_axpyf_ref.h +++ b/frame/1f/kernels/old/bli_axpyf_ref.h @@ -32,36 +32,21 @@ */ -/* -void bli_axpyf_ref( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* y ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname ) \ \ -void 
PASTEMAC3(cha,chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conja, \ conj_t conjx, \ dim_t m, \ dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ + ctype_ax* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT3U12_BASIC( axpyf_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( axpyf_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( axpyf_ref ) -#endif - diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_ref.h b/frame/1f/kernels/old/bli_dotaxpyv_ref.h similarity index 75% rename from frame/1f/dotaxpyv/bli_dotaxpyv_ref.h rename to frame/1f/kernels/old/bli_dotaxpyv_ref.h index 40d41916d..4771d16c6 100644 --- a/frame/1f/dotaxpyv/bli_dotaxpyv_ref.h +++ b/frame/1f/kernels/old/bli_dotaxpyv_ref.h @@ -32,39 +32,22 @@ */ -/* -void bli_dotaxpyv_ref( obj_t* alpha, - obj_t* xt, - obj_t* x, - obj_t* y, - obj_t* rho, - obj_t* z ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, varname ) \ \ -void PASTEMAC3(chx,chy,chz,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjxt, \ conj_t conjx, \ conj_t conjy, \ dim_t m, \ - ctype_x* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_xy* restrict rho, \ - ctype_z* restrict z, inc_t incz \ + ctype_x* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_xy* rho, \ + ctype_z* z, inc_t incz \ ); INSERT_GENTPROT3U12_BASIC( dotaxpyv_ref ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotaxpyv_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotaxpyv_ref ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.h b/frame/1f/kernels/old/bli_dotxaxpyf_ref_var1.h similarity index 70% rename from 
frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.h rename to frame/1f/kernels/old/bli_dotxaxpyf_ref_var1.h index 21cad7147..e080fa742 100644 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var1.h +++ b/frame/1f/kernels/old/bli_dotxaxpyf_ref_var1.h @@ -32,22 +32,11 @@ */ -/* -void bli_dotxaxpyf_ref_var1( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname ) \ \ -void PASTEMAC3(cha,chb,chc,varname) \ +void PASTEMAC(cha,varname) \ ( \ conj_t conjat, \ conj_t conja, \ @@ -55,22 +44,14 @@ void PASTEMAC3(cha,chb,chc,varname) \ conj_t conjx, \ dim_t m, \ dim_t b_n, \ - ctype_ab* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_b* restrict w, inc_t incw, \ - ctype_b* restrict x, inc_t incx, \ - ctype_c* restrict beta, \ - ctype_c* restrict y, inc_t incy, \ - ctype_c* restrict z, inc_t incz \ + ctype_ab* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_b* w, inc_t incw, \ + ctype_b* x, inc_t incx, \ + ctype_c* beta, \ + ctype_c* y, inc_t incy, \ + ctype_c* z, inc_t incz \ ); INSERT_GENTPROT3U12_BASIC( dotxaxpyf_ref_var1 ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxaxpyf_ref_var1 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxaxpyf_ref_var1 ) -#endif - diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.h b/frame/1f/kernels/old/bli_dotxaxpyf_ref_var2.h similarity index 70% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.h rename to frame/1f/kernels/old/bli_dotxaxpyf_ref_var2.h index 7d89d8cea..744634963 100644 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_ref_var2.h +++ b/frame/1f/kernels/old/bli_dotxaxpyf_ref_var2.h @@ -32,20 +32,11 @@ */ -void bli_dotxaxpyf_ref_var2( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ); - #undef GENTPROT3U12 #define GENTPROT3U12( 
ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, varname ) \ \ -void PASTEMAC3(cha,chb,chc,varname) \ +void PASTEMAC(cha,varname) \ ( \ conj_t conjat, \ conj_t conja, \ @@ -53,22 +44,14 @@ void PASTEMAC3(cha,chb,chc,varname) \ conj_t conjx, \ dim_t m, \ dim_t b_n, \ - ctype_ab* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_b* restrict w, inc_t incw, \ - ctype_b* restrict x, inc_t incx, \ - ctype_c* restrict beta, \ - ctype_c* restrict y, inc_t incy, \ - ctype_c* restrict z, inc_t incz \ + ctype_ab* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_b* w, inc_t incw, \ + ctype_b* x, inc_t incx, \ + ctype_c* beta, \ + ctype_c* y, inc_t incy, \ + ctype_c* z, inc_t incz \ ); INSERT_GENTPROT3U12_BASIC( dotxaxpyf_ref_var2 ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxaxpyf_ref_var2 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxaxpyf_ref_var2 ) -#endif - diff --git a/frame/1f/dotxf/bli_dotxf_ref.h b/frame/1f/kernels/old/bli_dotxf_ref.h similarity index 76% rename from frame/1f/dotxf/bli_dotxf_ref.h rename to frame/1f/kernels/old/bli_dotxf_ref.h index daad2724a..e2ad196b8 100644 --- a/frame/1f/dotxf/bli_dotxf_ref.h +++ b/frame/1f/kernels/old/bli_dotxf_ref.h @@ -32,38 +32,22 @@ */ -/* -void bli_dotxf_ref( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); -*/ - #undef GENTPROT3U12 #define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname) \ +void PASTEMAC(chx,varname) \ ( \ conj_t conjat, \ conj_t conjx, \ dim_t m, \ dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict beta, \ - ctype_y* restrict y, inc_t incy \ + ctype_ax* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_x* x, inc_t incx, \ + ctype_y* beta, \ + ctype_y* y, inc_t incy \ ); INSERT_GENTPROT3U12_BASIC( dotxf_ref 
) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxf_ref ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxf_ref ) -#endif - diff --git a/frame/1f/old/axpy2v/bli_axpy2v.c b/frame/1f/old/axpy2v/bli_axpy2v.c new file mode 100644 index 000000000..a5f9a3196 --- /dev/null +++ b/frame/1f/old/axpy2v/bli_axpy2v.c @@ -0,0 +1,184 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjx, + conj_t conjy, + dim_t n, + void* alphax, + void* alphay, + void* x, inc_t incx, + void* y, inc_t incy, + void* z, inc_t incz + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,axpy2v_void); + + +// +// Define object-based interface. +// +void bli_axpy2v( obj_t* alphax, + obj_t* alphay, + obj_t* x, + obj_t* y, + obj_t* z ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t n = bli_obj_vector_dim( *x ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t inc_x = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t inc_y = bli_obj_vector_inc( *y ); + + void* buf_z = bli_obj_buffer_at_off( *z ); + inc_t inc_z = bli_obj_vector_inc( *z ); + + obj_t alphax_local; + void* buf_alphax; + + obj_t alphay_local; + void* buf_alphay; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_axpy2v_check( alphax, alphay, x, y, z ); + + // Create local copy-casts of the scalars (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alphax, + &alphax_local ); + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alphay, + &alphay_local ); + + // Extract the scalar buffers. 
+ buf_alphax = bli_obj_buffer_for_1x1( dt, alphax_local ); + buf_alphay = bli_obj_buffer_for_1x1( dt, alphay_local ); + + // Invoke the void pointer-based function. + f( conjx, + conjy, + n, + buf_alphax, + buf_alphay, + buf_x, inc_x, + buf_y, inc_y, + buf_z, inc_z ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* alphax, \ + void* alphay, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* z, inc_t incz \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjx, \ + conjy, \ + n, \ + alphax, \ + alphay, \ + x, incx, \ + y, incy, \ + z, incz ); \ +} + +INSERT_GENTFUNC_BASIC( axpy2v_void, axpy2v ) + + +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alphax, \ + ctype* alphay, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1f_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjx, \ + conjy, \ + n, \ + alphax, \ + alphay, \ + x, incx, \ + y, incy, \ + z, incz ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpy2v, BLIS_AXPY2V_KER ) + + diff --git a/frame/1f/axpy2v/bli_axpy2v.h b/frame/1f/old/axpy2v/bli_axpy2v.h similarity index 62% rename from frame/1f/axpy2v/bli_axpy2v.h rename to frame/1f/old/axpy2v/bli_axpy2v.h index 21de24bc3..89f4c145e 100644 --- a/frame/1f/axpy2v/bli_axpy2v.h +++ b/frame/1f/old/axpy2v/bli_axpy2v.h @@ -33,22 +33,21 @@ */ #include "bli_axpy2v_check.h" -#include "bli_axpy2v_kernel.h" #include "bli_axpy2v_ref.h" // // Prototype object-based interface. 
// -void bli_axpy2v( obj_t* alpha1, - obj_t* alpha2, - obj_t* x, - obj_t* y, - obj_t* z ); +void bli_axpy2v( obj_t* alphax, + obj_t* alphay, + obj_t* x, + obj_t* y, + obj_t* z ) // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -57,40 +56,32 @@ void PASTEMAC(ch,opname)( \ conj_t conjx, \ conj_t conjy, \ dim_t n, \ - ctype* alpha1, \ - ctype* alpha2, \ + void* alphax, \ + void* alphay, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* z, inc_t incz \ + ); + +INSERT_GENTPROT_BASIC( axpy2v_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alphax, \ + ctype* alphay, \ ctype* x, inc_t incx, \ ctype* y, inc_t incy, \ - ctype* z, inc_t incz \ + ctype* z, inc_t incz \ ); INSERT_GENTPROT_BASIC( axpy2v ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,chz,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* alpha1, \ - ctype_xy* alpha2, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_z* z, inc_t incz \ - ); - -INSERT_GENTPROT3U12_BASIC( axpy2v ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( axpy2v ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( axpy2v ) -#endif - diff --git a/frame/1f/old/axpyf/bli_axpyf.c b/frame/1f/old/axpyf/bli_axpyf.c new file mode 100644 index 000000000..12e9f6bc8 --- /dev/null +++ b/frame/1f/old/axpyf/bli_axpyf.c @@ -0,0 +1,177 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + void* alpha, + void* a, inc_t inca, inc_t lda, + void* x, inc_t incx, + void* y, inc_t incy + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,axpyf_void); + + +// +// Define object-based interface. 
+// +void bli_axpyf( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* y ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_vector_dim( *y ); + dim_t b_n = bli_obj_vector_dim( *x ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t inc_x = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t inc_y = bli_obj_vector_inc( *y ); + + obj_t alpha_local; + void* buf_alpha; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_axpyf_check( alpha, a, x, y ); + + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. + f( conja, + conjx, + m, + b_n, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, inc_x, + buf_y, inc_y ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ) \ +{ \ + PASTEMAC(ch,kername)( conja, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + y, incy ); \ +} + +INSERT_GENTFUNC_BASIC( axpyf_void, axpyf ) + + +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1f_ker_dt( dt, kerid, &cntx ); \ +\ + f( conja, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( axpyf, BLIS_AXPYF_KER ) + + diff --git a/frame/1f/axpyf/bli_axpyf.h b/frame/1f/old/axpyf/bli_axpyf.h similarity index 65% rename from frame/1f/axpyf/bli_axpyf.h rename to frame/1f/old/axpyf/bli_axpyf.h index 74b2b9b90..f1d23194c 100644 --- a/frame/1f/axpyf/bli_axpyf.h +++ b/frame/1f/old/axpyf/bli_axpyf.h @@ -33,22 +33,40 @@ */ #include "bli_axpyf_check.h" -#include "bli_axpyf_fusefac.h" -#include "bli_axpyf_kernel.h" #include "bli_axpyf_ref.h" // // Prototype object-based interface. // -void bli_axpyf( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* y ); +void bli_axpyf( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* y ) // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* x, inc_t incx, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( axpyf_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
// #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -61,36 +79,8 @@ void PASTEMAC(ch,opname)( \ ctype* alpha, \ ctype* a, inc_t inca, inc_t lda, \ ctype* x, inc_t incx, \ - ctype* y, inc_t incy \ + ctype* y, inc_t incy \ ); INSERT_GENTPROT_BASIC( axpyf ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( axpyf ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( axpyf ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( axpyf ) -#endif - diff --git a/frame/1f/axpy2v/bli_axpy2v_check.c b/frame/1f/old/check/bli_axpy2v_check.c similarity index 89% rename from frame/1f/axpy2v/bli_axpy2v_check.c rename to frame/1f/old/check/bli_axpy2v_check.c index 7f2b2951b..b4eadf112 100644 --- a/frame/1f/axpy2v/bli_axpy2v_check.c +++ b/frame/1f/old/check/bli_axpy2v_check.c @@ -34,8 +34,8 @@ #include "blis.h" -void bli_axpy2v_check( obj_t* alpha1, - obj_t* alpha2, +void bli_axpy2v_check( obj_t* alphax, + obj_t* alphay, obj_t* x, obj_t* y, obj_t* z ) @@ -44,10 +44,10 @@ void bli_axpy2v_check( obj_t* alpha1, // Check object datatypes. - e_val = bli_check_noninteger_object( alpha1 ); + e_val = bli_check_noninteger_object( alphax ); bli_check_error_code( e_val ); - e_val = bli_check_noninteger_object( alpha2 ); + e_val = bli_check_noninteger_object( alphay ); bli_check_error_code( e_val ); e_val = bli_check_floating_object( x ); @@ -61,10 +61,10 @@ void bli_axpy2v_check( obj_t* alpha1, // Check object dimensions. 
- e_val = bli_check_scalar_object( alpha1 ); + e_val = bli_check_scalar_object( alphax ); bli_check_error_code( e_val ); - e_val = bli_check_scalar_object( alpha2 ); + e_val = bli_check_scalar_object( alphay ); bli_check_error_code( e_val ); e_val = bli_check_vector_object( x ); @@ -84,10 +84,10 @@ void bli_axpy2v_check( obj_t* alpha1, // Check object buffers (for non-NULLness). - e_val = bli_check_object_buffer( alpha1 ); + e_val = bli_check_object_buffer( alphax ); bli_check_error_code( e_val ); - e_val = bli_check_object_buffer( alpha2 ); + e_val = bli_check_object_buffer( alphay ); bli_check_error_code( e_val ); e_val = bli_check_object_buffer( x ); diff --git a/frame/1f/axpy2v/bli_axpy2v_check.h b/frame/1f/old/check/bli_axpy2v_check.h similarity index 95% rename from frame/1f/axpy2v/bli_axpy2v_check.h rename to frame/1f/old/check/bli_axpy2v_check.h index ea1668ee0..8638a64b4 100644 --- a/frame/1f/axpy2v/bli_axpy2v_check.h +++ b/frame/1f/old/check/bli_axpy2v_check.h @@ -32,8 +32,8 @@ */ -void bli_axpy2v_check( obj_t* alpha1, - obj_t* alpha2, +void bli_axpy2v_check( obj_t* alphax, + obj_t* alphay, obj_t* x, obj_t* y, obj_t* z ); diff --git a/frame/1f/axpyf/bli_axpyf_check.c b/frame/1f/old/check/bli_axpyf_check.c similarity index 100% rename from frame/1f/axpyf/bli_axpyf_check.c rename to frame/1f/old/check/bli_axpyf_check.c diff --git a/frame/1f/axpyf/bli_axpyf_check.h b/frame/1f/old/check/bli_axpyf_check.h similarity index 100% rename from frame/1f/axpyf/bli_axpyf_check.h rename to frame/1f/old/check/bli_axpyf_check.h diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_check.c b/frame/1f/old/check/bli_dotaxpyv_check.c similarity index 100% rename from frame/1f/dotaxpyv/bli_dotaxpyv_check.c rename to frame/1f/old/check/bli_dotaxpyv_check.c diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv_check.h b/frame/1f/old/check/bli_dotaxpyv_check.h similarity index 100% rename from frame/1f/dotaxpyv/bli_dotaxpyv_check.h rename to frame/1f/old/check/bli_dotaxpyv_check.h diff --git 
a/frame/1f/dotxaxpyf/bli_dotxaxpyf_check.c b/frame/1f/old/check/bli_dotxaxpyf_check.c similarity index 100% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf_check.c rename to frame/1f/old/check/bli_dotxaxpyf_check.c diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_check.h b/frame/1f/old/check/bli_dotxaxpyf_check.h similarity index 100% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf_check.h rename to frame/1f/old/check/bli_dotxaxpyf_check.h diff --git a/frame/1f/dotxf/bli_dotxf_check.c b/frame/1f/old/check/bli_dotxf_check.c similarity index 100% rename from frame/1f/dotxf/bli_dotxf_check.c rename to frame/1f/old/check/bli_dotxf_check.c diff --git a/frame/1f/dotxf/bli_dotxf_check.h b/frame/1f/old/check/bli_dotxf_check.h similarity index 100% rename from frame/1f/dotxf/bli_dotxf_check.h rename to frame/1f/old/check/bli_dotxf_check.h diff --git a/frame/1f/old/dotaxpyv/bli_dotaxpyv.c b/frame/1f/old/dotaxpyv/bli_dotaxpyv.c new file mode 100644 index 000000000..280958735 --- /dev/null +++ b/frame/1f/old/dotaxpyv/bli_dotaxpyv.c @@ -0,0 +1,186 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* rho, + void* z, inc_t incz + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,dotaxpyv_void); + + +// +// Define object-based interface. +// +void bli_dotaxpyv( obj_t* alpha, + obj_t* xt, + obj_t* x, + obj_t* y, + obj_t* rho, + obj_t* z ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conjxt = bli_obj_conj_status( *xt ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t n = bli_obj_vector_dim( *x ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t inc_x = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t inc_y = bli_obj_vector_inc( *y ); + + void* buf_z = bli_obj_buffer_at_off( *z ); + inc_t inc_z = bli_obj_vector_inc( *z ); + + void* buf_rho = bli_obj_buffer_at_off( *z ); + + obj_t alpha_local; + void* buf_alpha; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_dotaxpyv_check( alpha, xt, x, y, rho, z ); + + // Create a local copy-cast of alpha (and apply internal conjugation + // if needed). 
+ bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + + // Extract the scalar buffer. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + + // Invoke the void pointer-based function. + f( conjxt, + conjx, + conjy, + n, + buf_alpha, + buf_x, inc_x, + buf_y, inc_y, + buf_rho, + buf_z, inc_z ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* rho, \ + void* z, inc_t incz \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjxt, \ + conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + rho, \ + z, incz ); \ +} + +INSERT_GENTFUNC_BASIC( dotaxpyv_void, dotaxpyv ) + + +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* rho, \ + ctype* z, inc_t incz \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t* cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1f_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjxt, \ + conjx, \ + conjy, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + rho, \ + z, incz ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotaxpyv, BLIS_DOTAXPYV_KER ) + + diff --git a/frame/1f/dotaxpyv/bli_dotaxpyv.h b/frame/1f/old/dotaxpyv/bli_dotaxpyv.h similarity index 62% rename from frame/1f/dotaxpyv/bli_dotaxpyv.h rename to frame/1f/old/dotaxpyv/bli_dotaxpyv.h index 2db74025f..e62a62137 100644 --- a/frame/1f/dotaxpyv/bli_dotaxpyv.h +++ b/frame/1f/old/dotaxpyv/bli_dotaxpyv.h @@ -33,23 +33,21 @@ */ #include 
"bli_dotaxpyv_check.h" -#include "bli_dotaxpyv_kernel.h" #include "bli_dotaxpyv_ref.h" // // Prototype object-based interface. // -void bli_dotaxpyv( obj_t* alpha, - obj_t* xt, - obj_t* x, - obj_t* y, - obj_t* rho, - obj_t* z ); +void bli_dotaxpyv( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* rho, + obj_t* z ) // -// Prototype BLAS-like interfaces with homogeneous-typed operands. +// Prototype BLAS-like interfaces with void pointer operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -58,43 +56,34 @@ void PASTEMAC(ch,opname)( \ conj_t conjxt, \ conj_t conjx, \ conj_t conjy, \ - dim_t m, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* rho, \ + void* z, inc_t incz \ + ); + +INSERT_GENTPROT_BASIC( dotaxpyv_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ ctype* alpha, \ ctype* x, inc_t incx, \ ctype* y, inc_t incy, \ ctype* rho, \ - ctype* z, inc_t incz \ + ctype* z, inc_t incz \ ); INSERT_GENTPROT_BASIC( dotaxpyv ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,chz,opname)( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_x* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_xy* rho, \ - ctype_z* z, inc_t incz \ - ); - - -INSERT_GENTPROT3U12_BASIC( dotaxpyv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotaxpyv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotaxpyv ) -#endif - diff --git a/frame/1f/old/dotxaxpyf/bli_dotxaxpyf.c b/frame/1f/old/dotxaxpyf/bli_dotxaxpyf.c new file mode 100644 index 000000000..f0e2123bc --- /dev/null +++ b/frame/1f/old/dotxaxpyf/bli_dotxaxpyf.c @@ -0,0 +1,230 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + void* alpha, + void* a, inc_t inca, inc_t lda, + void* w, inc_t incw, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy, + void* z, inc_t incz + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,dotxaxpyf_void); + + +// +// Define object-based interface. +// +void bli_dotxaxpyf( obj_t* alpha, + obj_t* at, + obj_t* a, + obj_t* w, + obj_t* x, + obj_t* beta, + obj_t* y, + obj_t* z ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conjat = bli_obj_conj_status( *at ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjw = bli_obj_conj_status( *w ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_vector_dim( *z ); + dim_t b_n = bli_obj_vector_dim( *y ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_w = bli_obj_buffer_at_off( *w ); + inc_t inc_w = bli_obj_vector_inc( *w ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t inc_x = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t inc_y = bli_obj_vector_inc( *y ); + + void* buf_z = bli_obj_buffer_at_off( *z ); + inc_t inc_z = bli_obj_vector_inc( *z ); + + obj_t alpha_local; + void* buf_alpha; + + obj_t beta_local; + void* buf_beta; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() 
) + bli_dotxaxpyf_check( alpha, at, a, w, x, beta, y, z ); + + // Create local copy-casts of the scalars (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + beta, + &beta_local ); + + // Extract the scalar buffers. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); + + // Support cases where matrix A requires a transposition. + if ( bli_obj_has_trans( *a ) ) { bli_swap_incs( rs_a, cs_a ); } + + // Invoke the void pointer-based function. + f( conjat, + conja, + conjw, + conjx, + m, + b_n, + buf_alpha, + buf_a, rs_a, cs_a, + buf_w, inc_w, + buf_x, inc_x, + buf_beta, + buf_y, inc_y, + buf_z, inc_z ); +} + + +// +// Define BLAS-like interfaces with void pointer operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* w, inc_t incw, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + void* z, inc_t incz \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjat, \ + conja, \ + conjw, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + w, incw, \ + x, incx, \ + beta, \ + y, incy, \ + z, incz ); \ +} + +INSERT_GENTFUNC_BASIC( dotxaxpyf_void, dotxaxpyf ) + + +// +// Define BLAS-like interfaces with typed operands. 
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* w, inc_t incw, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + ctype* z, inc_t incz \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1f_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjat, \ + conja, \ + conjw, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + w, incw, \ + x, incx, \ + beta, \ + y, incy, \ + z, incz ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxaxpyf, BLIS_DOTXAXPYF_KER ) + + diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf.h b/frame/1f/old/dotxaxpyf/bli_dotxaxpyf.h similarity index 57% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf.h rename to frame/1f/old/dotxaxpyf/bli_dotxaxpyf.h index dcb19c81d..ee1eae704 100644 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf.h +++ b/frame/1f/old/dotxaxpyf/bli_dotxaxpyf.h @@ -33,27 +33,24 @@ */ #include "bli_dotxaxpyf_check.h" -#include "bli_dotxaxpyf_fusefac.h" -#include "bli_dotxaxpyf_kernel.h" -#include "bli_dotxaxpyf_ref_var1.h" -#include "bli_dotxaxpyf_ref_var2.h" +#include "bli_dotxaxpyf_ref.h" // // Prototype object-based interface. // -void bli_dotxaxpyf( obj_t* alpha, - obj_t* at, - obj_t* a, - obj_t* w, - obj_t* x, - obj_t* beta, - obj_t* y, - obj_t* z ); +void bli_dotxaxpyf( obj_t* alpha, + obj_t* at, + obj_t* a, + obj_t* w, + obj_t* x, + obj_t* beta, + obj_t* y, + obj_t* z ); // -// Prototype BLAS-like interfaces with homogeneous-typed operands.
// #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -64,49 +61,40 @@ void PASTEMAC(ch,opname)( \ conj_t conjw, \ conj_t conjx, \ dim_t m, \ - dim_t n, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* w, inc_t incw, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + void* z, inc_t incz \ + ); + +INSERT_GENTPROT_BASIC( dotxaxpyf_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ ctype* alpha, \ ctype* a, inc_t inca, inc_t lda, \ ctype* w, inc_t incw, \ ctype* x, inc_t incx, \ ctype* beta, \ ctype* y, inc_t incy, \ - ctype* z, inc_t incz \ + ctype* z, inc_t incz \ ); INSERT_GENTPROT_BASIC( dotxaxpyf ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, opname ) \ -\ -void PASTEMAC3(cha,chb,chc,opname)( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - ctype_ab* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_b* w, inc_t incw, \ - ctype_b* x, inc_t incx, \ - ctype_c* beta, \ - ctype_c* y, inc_t incy, \ - ctype_c* z, inc_t incz \ - ); - - -INSERT_GENTPROT3U12_BASIC( dotxaxpyf ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxaxpyf ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxaxpyf ) -#endif - diff --git a/frame/1f/old/dotxf/bli_dotxf.c b/frame/1f/old/dotxf/bli_dotxf.c new file mode 100644 index 000000000..4ac1e41fe --- /dev/null +++ b/frame/1f/old/dotxf/bli_dotxf.c @@ -0,0 +1,195 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +typedef void (*FUNCPTR_T)( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + void* alpha, + void* a, inc_t inca, inc_t lda, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy + ); + +static FUNCPTR_T GENARRAY_MIN(ftypes,dotxf_void); + + +// +// Define object-based interface. 
+// +void bli_dotxf( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y ) +{ + num_t dt = bli_obj_datatype( *x ); + + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_vector_dim( *x ); + dim_t b_n = bli_obj_vector_dim( *y ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t inc_x = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t inc_y = bli_obj_vector_inc( *y ); + + obj_t alpha_local; + void* buf_alpha; + + obj_t beta_local; + void* buf_beta; + + FUNCPTR_T f = ftypes[dt]; + + if ( bli_error_checking_is_enabled() ) + bli_dotxf_check( alpha, a, x, beta, y ); + + // Create local copy-casts of the scalars (and apply internal conjugation + // if needed). + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + alpha, + &alpha_local ); + bli_obj_scalar_init_detached_copy_of( dt, + BLIS_NO_CONJUGATE, + beta, + &beta_local ); + + // Extract the scalar buffers. + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); + + // Support cases where matrix A requires a transposition. + if ( bli_obj_has_trans( *a ) ) { bli_swap_incs( rs_a, cs_a ); } + + // Invoke the void pointer-based function. + f( conja, + conjx, + m, + b_n, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, inc_x, + buf_beta, + buf_y, inc_y ); +} + + +// +// Define BLAS-like interfaces with void pointer operands.
+// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ) \ +{ \ + PASTEMAC(ch,kername)( conjat, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + beta, \ + y, incy ); \ +} + +INSERT_GENTFUNC_BASIC( dotxf_void, dotxf ) + + +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kerid ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype* alpha, \ + ctype* a, inc_t inca, inc_t lda, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ + cntx_t cntx; \ +\ + PASTECH2(ch,opname,_ker_t) f; \ +\ + PASTEMAC(opname,_cntx_init)( &cntx ); \ +\ + f = bli_cntx_get_l1f_ker_dt( dt, kerid, &cntx ); \ +\ + f( conjat, \ + conjx, \ + m, \ + b_n, \ + alpha, \ + a, inca, lda, \ + x, incx, \ + beta, \ + y, incy ); \ +\ + PASTEMAC(opname,_cntx_finalize)( &cntx ); \ +} + +INSERT_GENTFUNC_BASIC( dotxf, BLIS_DOTXF_KER ) + + diff --git a/frame/1f/dotxf/bli_dotxf.h b/frame/1f/old/dotxf/bli_dotxf.h similarity index 63% rename from frame/1f/dotxf/bli_dotxf.h rename to frame/1f/old/dotxf/bli_dotxf.h index 0b1b57b0a..d705ba2ff 100644 --- a/frame/1f/dotxf/bli_dotxf.h +++ b/frame/1f/old/dotxf/bli_dotxf.h @@ -33,23 +33,42 @@ */ #include "bli_dotxf_check.h" -#include "bli_dotxf_fusefac.h" -#include "bli_dotxf_kernel.h" #include "bli_dotxf_ref.h" // // Prototype object-based interface. // -void bli_dotxf( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); +void bli_dotxf( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y ); // -// Prototype BLAS-like interfaces with homogeneous-typed operands.
+// Prototype BLAS-like interfaces with void pointer operands. +// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + void* alpha, \ + void* a, inc_t inca, inc_t lda, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ); + +INSERT_GENTPROT_BASIC( dotxf_void ) + + +// +// Prototype BLAS-like interfaces with typed operands. // #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ @@ -63,38 +82,8 @@ void PASTEMAC(ch,opname)( \ ctype* a, inc_t inca, inc_t lda, \ ctype* x, inc_t incx, \ ctype* beta, \ - ctype* y, inc_t incy \ + ctype* y, inc_t incy \ ); INSERT_GENTPROT_BASIC( dotxf ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t inca, inc_t lda, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ); - - -INSERT_GENTPROT3U12_BASIC( dotxf ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( dotxf ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( dotxf ) -#endif - diff --git a/frame/1m/bli_l1m.h b/frame/1m/bli_l1m.h new file mode 100644 index 000000000..ff9c98459 --- /dev/null +++ b/frame/1m/bli_l1m.h @@ -0,0 +1,56 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "bli_l1m_cntx.h" +#include "bli_l1m_check.h" + +#include "bli_l1m_ft.h" + +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_l1m_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l1m_oapi.h" + +#include "bli_l1m_tapi.h" +#include "bli_l1m_unb_var1.h" + +// Pack-related +#include "bli_packm.h" +#include "bli_unpackm.h" + +// Other +#include "bli_scalm_cntl.h" +#include "bli_scalm_int.h" + diff --git a/frame/1m/bli_l1m_check.c b/frame/1m/bli_l1m_check.c new file mode 100644 index 000000000..d2ae6c5c4 --- /dev/null +++ b/frame/1m/bli_l1m_check.c @@ -0,0 +1,212 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. 
+// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1m_xy_check( x, y ); \ +} + +GENFRONT( addm ) +GENFRONT( copym ) +GENFRONT( subm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ) \ +{ \ + bli_l1m_axy_check( alpha, x, y ); \ +} + +GENFRONT( axpym ) +GENFRONT( scal2m ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ) \ +{ \ + bli_l1m_ax_check( alpha, x ); \ +} + +GENFRONT( scalm ) +GENFRONT( setm ) + + +// ----------------------------------------------------------------------------- + +void bli_l1m_xy_check + ( + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_conformal_dims( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1m_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_conformal_dims( x, y ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_l1m_ax_check + ( + obj_t* alpha, + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + // Check object properties. + + //e_val = bli_check_nonunit_diag( x ); + //bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + diff --git a/frame/1m/bli_l1m_check.h b/frame/1m/bli_l1m_check.h new file mode 100644 index 000000000..5556185be --- /dev/null +++ b/frame/1m/bli_l1m_check.h @@ -0,0 +1,101 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* y \ + ); + +GENPROT( addm ) +GENPROT( copym ) +GENPROT( subm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + ); + +GENPROT( axpym ) +GENPROT( scal2m ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + ); + +GENPROT( scalm ) +GENPROT( setm ) + + +// ----------------------------------------------------------------------------- + +void bli_l1m_xy_check + ( + obj_t* x, + obj_t* y + ); + +void bli_l1m_axy_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y + ); + +void bli_l1m_ax_check + ( + obj_t* alpha, + obj_t* x + ); + diff --git a/frame/1m/bli_l1m_cntx.c b/frame/1m/bli_l1m_cntx.c new file mode 100644 index 000000000..8569416fd --- /dev/null +++ b/frame/1m/bli_l1m_cntx.c @@ -0,0 +1,83 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. +// + +#undef GENFRONT +#define GENFRONT( opname, depname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. */ \ + PASTEMAC(depname,_cntx_init)( cntx ); \ +} \ +\ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( addm, addv ) +GENFRONT( axpym, axpyv ) +GENFRONT( scalm, scalv ) +GENFRONT( setm, setv ) +GENFRONT( subm, subv ) + + +#undef GENFRONT +#define GENFRONT( opname, depname1, depname2 ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernel dependencies. 
*/ \ + PASTEMAC(depname1,_cntx_init)( cntx ); \ + PASTEMAC(depname2,_cntx_init)( cntx ); \ +} \ +\ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( copym, copyv, setv ) +GENFRONT( scal2m, scal2v, setv ) + diff --git a/frame/1m/bli_l1m_cntx.h b/frame/1m/bli_l1m_cntx.h new file mode 100644 index 000000000..46524fa0b --- /dev/null +++ b/frame/1m/bli_l1m_cntx.h @@ -0,0 +1,53 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype context initialization functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); + +GENPROT( addm ) +GENPROT( axpym ) +GENPROT( copym ) +GENPROT( scalm ) +GENPROT( scal2m ) +GENPROT( setm ) +GENPROT( subm ) + diff --git a/frame/1/dotv/bli_dotv_kernel.h b/frame/1m/bli_l1m_ft.h similarity index 66% rename from frame/1/dotv/bli_dotv_kernel.h rename to frame/1m/bli_l1m_ft.h index c1a217361..381f18513 100644 --- a/frame/1/dotv/bli_dotv_kernel.h +++ b/frame/1m/bli_l1m_ft.h @@ -32,34 +32,43 @@ */ -void bli_dotv_kernel( obj_t* x, - obj_t* y, - obj_t* rho ); +#ifndef BLIS_L1M_FT_H +#define BLIS_L1M_FT_H // -// Prototype the void pointer kernel wrappers. 
+// -- Level-1m function types -------------------------------------------------- // -#undef GENTPROT3 -#define GENTPROT3( ctype_x, ctype_y, ctype_r, chx, chy, chr, varname ) \ +// packm + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ \ -void PASTEMAC3(chx,chy,chr,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* rho \ - ); +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( packm ) + + -INSERT_GENTPROT3_BASIC( dotv_kernel_void ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( dotv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( dotv_kernel_void ) #endif diff --git a/frame/1m/bli_l1m_oapi.c b/frame/1m/bli_l1m_oapi.c new file mode 100644 index 000000000..e467019c7 --- /dev/null +++ b/frame/1m/bli_l1m_oapi.c @@ -0,0 +1,286 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. 
+// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + trans_t transx = bli_obj_conjtrans_status( *x ); \ + dim_t m = bli_obj_length( *y ); \ + dim_t n = bli_obj_width( *y ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t rs_y = bli_obj_row_stride( *y ); \ + inc_t cs_y = bli_obj_col_stride( *y ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, y ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + buf_y, rs_y, cs_y, \ + cntx \ + ); \ +} + +GENFRONT( addm ) +GENFRONT( copym ) +GENFRONT( subm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + trans_t transx = bli_obj_conjtrans_status( *x ); \ + dim_t m = bli_obj_length( *y ); \ + dim_t n = bli_obj_width( *y ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t rs_y = bli_obj_row_stride( *y ); \ + inc_t cs_y = bli_obj_col_stride( *y ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y ); \ +\ + 
/* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + buf_y, rs_y, cs_y, \ + cntx \ + ); \ +} + +GENFRONT( axpym ) +GENFRONT( scal2m ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + /* conj_t conjalpha = bli_obj_conj_status( *alpha ); */ \ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ + obj_t x_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x ); \ +\ + /* Alias x to x_local so we can apply alpha if it is non-unit. */ \ + bli_obj_alias_to( *x, x_local ); \ +\ + /* If alpha is non-unit, apply it to the scalar attached to x. */ \ + if ( !bli_obj_equals( alpha, &BLIS_ONE ) ) \ + { \ + /* Create a local copy-cast of alpha (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ +\ + bli_obj_scalar_apply_scalar( &alpha_local, &x_local ); \ + } \ +\ + /* Grab the address of the internal scalar buffer for the scalar + attached to x. */ \ + buf_alpha = bli_obj_internal_scalar_buffer( x_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + BLIS_NO_CONJUGATE, /* internal conjugation applied during copy-cast. */ \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( scalm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + /* conj_t conjalpha = bli_obj_conj_status( *alpha ); */ \ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + BLIS_NO_CONJUGATE, /* internal conjugation applied during copy-cast. */ \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + buf_alpha, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( setm ) + + +#endif + diff --git a/frame/1m/bli_l1m_oapi.h b/frame/1m/bli_l1m_oapi.h new file mode 100644 index 000000000..98d9b06e7 --- /dev/null +++ b/frame/1m/bli_l1m_oapi.h @@ -0,0 +1,82 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( addm ) +GENPROT( copym ) +GENPROT( subm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( axpym ) +GENPROT( scal2m ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( scalm ) +GENPROT( setm ) + diff --git a/frame/1m/bli_l1m_oapi_wc.c b/frame/1m/bli_l1m_oapi_wc.c new file mode 100644 index 000000000..0e9aa5c7b --- /dev/null +++ b/frame/1m/bli_l1m_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1m_oapi.c" + diff --git a/frame/1m/bli_l1m_oapi_woc.c b/frame/1m/bli_l1m_oapi_woc.c new file mode 100644 index 000000000..74d4aed5b --- /dev/null +++ b/frame/1m/bli_l1m_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l1m_oapi.c" + diff --git a/frame/1m/bli_l1m_tapi.c b/frame/1m/bli_l1m_tapi.c new file mode 100644 index 000000000..c4dc5f9a8 --- /dev/null +++ b/frame/1m/bli_l1m_tapi.c @@ -0,0 +1,375 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, auxker ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ +\ + /* When the diagonal of an upper- or lower-stored matrix is unit, + we handle it with a separate post-processing step. 
*/ \ + if ( bli_is_upper_or_lower( uplox ) && \ + bli_is_unit_diag( diagx ) ) \ + { \ + PASTEMAC(ch,auxker) \ + ( \ + diagoffx, \ + diagx, \ + transx, \ + m, \ + n, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ + } \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC( addm, addd ) +INSERT_GENTFUNC_BASIC( subm, subd ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ +\ + /* When the diagonal of an upper- or lower-stored matrix is unit, + we handle it with a separate post-processing step. */ \ + if ( bli_is_upper_or_lower( uplox ) && \ + bli_is_unit_diag( diagx ) ) \ + { \ + doff_t diagoffy = diagoffx; \ + ctype* one = PASTEMAC(ch,1); \ +\ + if ( bli_does_trans( transx ) ) \ + bli_negate_diag_offset( diagoffy ); \ +\ + PASTEMAC(ch,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffy, \ + m, \ + n, \ + one, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ + } \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC0( copym ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* If alpha is zero, then the entire operation is a no-op. */ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + alpha, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ +\ + /* When the diagonal of an upper- or lower-stored matrix is unit, + we handle it with a separate post-processing step. */ \ + if ( bli_is_upper_or_lower( uplox ) && \ + bli_is_unit_diag( diagx ) ) \ + { \ + PASTEMAC(ch,axpyd) \ + ( \ + diagoffx, \ + diagx, \ + transx, \ + m, \ + n, \ + alpha, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ + } \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC0( axpym ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. 
*/ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* If alpha is zero, then we set the output matrix to zero. This + seemingly minor optimization is important because it will clear + any NaNs and Infs in x that would otherwise propagate. */ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + alpha, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ + return; \ + } \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + transx, \ + m, \ + n, \ + alpha, \ + x, rs_x, cs_x, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ +\ + /* When the diagonal of an upper- or lower-stored matrix is unit, + we handle it with a separate post-processing step. */ \ + if ( bli_is_upper_or_lower( uplox ) && \ + bli_is_unit_diag( diagx ) ) \ + { \ + doff_t diagoffy = diagoffx; \ +\ + if ( bli_does_trans( transx ) ) \ + bli_negate_diag_offset( diagoffy ); \ +\ + PASTEMAC(ch,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffy, \ + m, \ + n, \ + alpha, \ + y, rs_y, cs_y, \ + cntx_p \ + ); \ + } \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC0( scal2m ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. 
*/ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + conjalpha, \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + alpha, \ + x, rs_x, cs_x, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC0( scalm ) +INSERT_GENTFUNC_BASIC0( setm ) + diff --git a/frame/1m/bli_l1m_tapi.h b/frame/1m/bli_l1m_tapi.h new file mode 100644 index 000000000..4b0551995 --- /dev/null +++ b/frame/1m/bli_l1m_tapi.h @@ -0,0 +1,100 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( addm ) +INSERT_GENTPROT_BASIC( copym ) +INSERT_GENTPROT_BASIC( subm ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpym ) +INSERT_GENTPROT_BASIC( scal2m ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( scalm ) +INSERT_GENTPROT_BASIC( setm ) + diff --git a/frame/1m/bli_l1m_unb_var1.c b/frame/1m/bli_l1m_unb_var1.c new file mode 100644 index 000000000..3f7aac19e --- /dev/null +++ b/frame/1m/bli_l1m_unb_var1.c @@ -0,0 +1,368 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like 
+ libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. 
+// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* x1; \ + ctype* y1; \ + uplo_t uplox_eff; \ + conj_t conjx; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + inc_t ldy, incy; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Set various loop parameters. */ \ + bli_set_dims_incs_uplo_2m( diagoffx, diagx, transx, \ + uplox, m, n, rs_x, cs_x, rs_y, cs_y, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, incy, ldy, \ + ij0, n_shift ); \ +\ + if ( bli_is_zeros( uplox_eff ) ) return; \ +\ + /* Extract the conjugation component from the transx parameter. */ \ + conjx = bli_extract_conj( transx ); \ +\ + /* Query the kernel needed for this operation. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx ); \ +\ + /* Handle dense and upper/lower storage cases separately. */ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x1 = x + (j )*ldx + (0 )*incx; \ + y1 = y + (j )*ldy + (0 )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + else \ + { \ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x1 = x + (ij0+j )*ldx + (0 )*incx; \ + y1 = y + (ij0+j )*ldy + (0 )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. 
*/ \ + f( \ + conjx, \ + n_elem, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + x1 = x + (j )*ldx + (ij0+i )*incx; \ + y1 = y + (j )*ldy + (ij0+i )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC2( addm_unb_var1, addv, BLIS_ADDV_KER ) +INSERT_GENTFUNC_BASIC2( copym_unb_var1, copyv, BLIS_COPYV_KER ) +INSERT_GENTFUNC_BASIC2( subm_unb_var1, subv, BLIS_SUBV_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* x1; \ + ctype* y1; \ + uplo_t uplox_eff; \ + conj_t conjx; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + inc_t ldy, incy; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Set various loop parameters. */ \ + bli_set_dims_incs_uplo_2m( diagoffx, diagx, transx, \ + uplox, m, n, rs_x, cs_x, rs_y, cs_y, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, incy, ldy, \ + ij0, n_shift ); \ +\ + if ( bli_is_zeros( uplox_eff ) ) return; \ +\ + /* Extract the conjugation component from the transx parameter. */ \ + conjx = bli_extract_conj( transx ); \ +\ + /* Query the kernel needed for this operation. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx ); \ +\ + /* Handle dense and upper/lower storage cases separately. 
*/ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x1 = x + (j )*ldx + (0 )*incx; \ + y1 = y + (j )*ldy + (0 )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + alpha, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + else \ + { \ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x1 = x + (ij0+j )*ldx + (0 )*incx; \ + y1 = y + (ij0+j )*ldy + (0 )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + alpha, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + x1 = x + (j )*ldx + (ij0+i )*incx; \ + y1 = y + (j )*ldy + (ij0+i )*incy; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjx, \ + n_elem, \ + alpha, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ + } \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC2( axpym_unb_var1, axpyv, BLIS_AXPYV_KER ) +INSERT_GENTFUNC_BASIC2( scal2m_unb_var1, scal2v, BLIS_SCAL2V_KER ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, kername, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* x1; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Set various loop parameters. 
*/ \ + bli_set_dims_incs_uplo_1m( diagoffx, diagx, \ + uplox, m, n, rs_x, cs_x, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, \ + ij0, n_shift ); \ +\ + if ( bli_is_zeros( uplox_eff ) ) return; \ +\ + /* Query the kernel needed for this operation. */ \ + PASTECH2(ch,kername,_ft) f = bli_cntx_get_l1v_ker_dt( dt, kerid, cntx ); \ +\ + /* Handle dense and upper/lower storage cases separately. */ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x1 = x + (j )*ldx + (0 )*incx; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjalpha, \ + n_elem, \ + alpha, \ + x1, incx, \ + cntx \ + ); \ + } \ + } \ + else \ + { \ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x1 = x + (ij0+j )*ldx + (0 )*incx; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjalpha, \ + n_elem, \ + alpha, \ + x1, incx, \ + cntx \ + ); \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + x1 = x + (j )*ldx + (ij0+i )*incx; \ +\ + /* Invoke the kernel with the appropriate parameters. */ \ + f( \ + conjalpha, \ + n_elem, \ + alpha, \ + x1, incx, \ + cntx \ + ); \ + } \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC2( scalm_unb_var1, scalv, BLIS_SCALV_KER ) +INSERT_GENTFUNC_BASIC2( setm_unb_var1, setv, BLIS_SETV_KER ) + diff --git a/frame/1m/bli_l1m_unb_var1.h b/frame/1m/bli_l1m_unb_var1.h new file mode 100644 index 000000000..7a20d23ed --- /dev/null +++ b/frame/1m/bli_l1m_unb_var1.h @@ -0,0 +1,100 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( addm ) +INSERT_GENTPROT_BASIC( copym ) +INSERT_GENTPROT_BASIC( subm ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + trans_t transx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype* y, inc_t rs_y, inc_t cs_y, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( axpym ) +INSERT_GENTPROT_BASIC( scal2m ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( scalm ) +INSERT_GENTPROT_BASIC( setm ) + diff --git a/frame/1m/addm/bli_addm.c b/frame/1m/old/addm/bli_addm.c similarity index 100% rename from frame/1m/addm/bli_addm.c rename to frame/1m/old/addm/bli_addm.c diff --git a/frame/1m/addm/bli_addm.h b/frame/1m/old/addm/bli_addm.h similarity index 100% rename from frame/1m/addm/bli_addm.h rename to frame/1m/old/addm/bli_addm.h diff --git a/frame/1m/addm/bli_addm_check.c b/frame/1m/old/addm/bli_addm_check.c similarity index 100% rename from frame/1m/addm/bli_addm_check.c rename to frame/1m/old/addm/bli_addm_check.c diff --git a/frame/1m/addm/bli_addm_check.h b/frame/1m/old/addm/bli_addm_check.h similarity index 100% rename from frame/1m/addm/bli_addm_check.h rename to frame/1m/old/addm/bli_addm_check.h diff --git a/frame/1m/addm/bli_addm_unb_var1.c b/frame/1m/old/addm/bli_addm_unb_var1.c similarity index 98% rename 
from frame/1m/addm/bli_addm_unb_var1.c rename to frame/1m/old/addm/bli_addm_unb_var1.c index 075ee108a..5379c5029 100644 --- a/frame/1m/addm/bli_addm_unb_var1.c +++ b/frame/1m/old/addm/bli_addm_unb_var1.c @@ -61,7 +61,8 @@ static FUNCPTR_T GENARRAY2_MIN(ftypes,addm_unb_var1); void bli_addm_unb_var1( obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); diff --git a/frame/1m/addm/bli_addm_unb_var1.h b/frame/1m/old/addm/bli_addm_unb_var1.h similarity index 97% rename from frame/1m/addm/bli_addm_unb_var1.h rename to frame/1m/old/addm/bli_addm_unb_var1.h index f0cd79599..9c8ab3f4a 100644 --- a/frame/1m/addm/bli_addm_unb_var1.h +++ b/frame/1m/old/addm/bli_addm_unb_var1.h @@ -32,7 +32,7 @@ */ -void bli_addm_unb_var1( obj_t* x, obj_t* y ); +void bli_addm_unb_var1( obj_t* x, obj_t* y, cntx_t* cntx ); #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ diff --git a/frame/1m/axpym/bli_axpym.c b/frame/1m/old/axpym/bli_axpym.c similarity index 100% rename from frame/1m/axpym/bli_axpym.c rename to frame/1m/old/axpym/bli_axpym.c diff --git a/frame/1m/axpym/bli_axpym.h b/frame/1m/old/axpym/bli_axpym.h similarity index 100% rename from frame/1m/axpym/bli_axpym.h rename to frame/1m/old/axpym/bli_axpym.h diff --git a/frame/1m/axpym/bli_axpym_check.c b/frame/1m/old/axpym/bli_axpym_check.c similarity index 100% rename from frame/1m/axpym/bli_axpym_check.c rename to frame/1m/old/axpym/bli_axpym_check.c diff --git a/frame/1m/axpym/bli_axpym_check.h b/frame/1m/old/axpym/bli_axpym_check.h similarity index 100% rename from frame/1m/axpym/bli_axpym_check.h rename to frame/1m/old/axpym/bli_axpym_check.h diff --git a/frame/1m/axpym/bli_axpym_unb_var1.c b/frame/1m/old/axpym/bli_axpym_unb_var1.c similarity index 99% rename from frame/1m/axpym/bli_axpym_unb_var1.c rename to frame/1m/old/axpym/bli_axpym_unb_var1.c index 50c9b574c..41d09c6e5 100644 --- a/frame/1m/axpym/bli_axpym_unb_var1.c +++ 
b/frame/1m/old/axpym/bli_axpym_unb_var1.c @@ -63,7 +63,8 @@ static FUNCPTR_T GENARRAY3_MIN(ftypes,axpym_unb_var1); void bli_axpym_unb_var1( obj_t* alpha, obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); diff --git a/frame/1m/axpym/bli_axpym_unb_var1.h b/frame/1m/old/axpym/bli_axpym_unb_var1.h similarity index 97% rename from frame/1m/axpym/bli_axpym_unb_var1.h rename to frame/1m/old/axpym/bli_axpym_unb_var1.h index 61393b111..9bd2bbef1 100644 --- a/frame/1m/axpym/bli_axpym_unb_var1.h +++ b/frame/1m/old/axpym/bli_axpym_unb_var1.h @@ -32,7 +32,7 @@ */ -void bli_axpym_unb_var1( obj_t* alpha, obj_t* x, obj_t* y ); +void bli_axpym_unb_var1( obj_t* alpha, obj_t* x, obj_t* y, cntx_t* cntx ); #undef GENTPROT3 diff --git a/frame/1m/old/bli_scalm.c b/frame/1m/old/bli_scalm.c new file mode 100644 index 000000000..d5431f8e0 --- /dev/null +++ b/frame/1m/old/bli_scalm.c @@ -0,0 +1,83 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +extern scalm_t* scalm_cntl; + + +// +// Define object-based interface. +// +void bli_scalm( obj_t* alpha, + obj_t* x ) +{ + if ( bli_error_checking_is_enabled() ) + bli_scalm_check( alpha, x ); + + bli_scalm_int( alpha, + x, + scalm_cntl ); +} + + +// +// Define BLAS-like interfaces with typed operands. +// +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, varname ) \ +\ +void PASTEMAC(ch,opname)( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t rs_x, inc_t cs_x \ + ) \ +{ \ + PASTEMAC(ch,varname)( conjalpha, \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + alpha, \ + x, rs_x, cs_x ); \ +} + +INSERT_GENTFUNC_BASIC( scalm, scalm_unb_var1 ) + diff --git a/frame/1m/scalm/bli_scalm.h b/frame/1m/old/bli_scalm.h similarity index 72% rename from frame/1m/scalm/bli_scalm.h rename to frame/1m/old/bli_scalm.h index b17678a1b..21a71d6b4 100644 --- a/frame/1m/scalm/bli_scalm.h +++ b/frame/1m/old/bli_scalm.h @@ -42,7 +42,7 @@ // // Prototype object-based interface. 
// -void bli_scalm( obj_t* beta, +void bli_scalm( obj_t* alpha, obj_t* x ); @@ -53,41 +53,14 @@ void bli_scalm( obj_t* beta, #define GENTPROT( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ - conj_t conjbeta, \ + conj_t conjalpha, \ doff_t diagoffx, \ uplo_t uplox, \ dim_t m, \ dim_t n, \ - ctype* beta, \ + ctype* alpha, \ ctype* x, inc_t rs_x, inc_t cs_x \ ); INSERT_GENTPROT_BASIC( scalm ) - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \ -\ -void PASTEMAC2(chb,chx,opname)( \ - conj_t conjbeta, \ - doff_t diagoffx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t rs_x, inc_t cs_x \ - ); - -INSERT_GENTPROT2_BASIC( scalm ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( scalm ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( scalm ) -#endif - diff --git a/frame/1m/scalm/bli_scalm_check.c b/frame/1m/old/bli_scalm_check.c similarity index 100% rename from frame/1m/scalm/bli_scalm_check.c rename to frame/1m/old/bli_scalm_check.c diff --git a/frame/1m/scalm/bli_scalm_check.h b/frame/1m/old/bli_scalm_check.h similarity index 100% rename from frame/1m/scalm/bli_scalm_check.h rename to frame/1m/old/bli_scalm_check.h diff --git a/frame/1m/scalm/bli_scalm_unb_var1.c b/frame/1m/old/bli_scalm_unb_var1.c similarity index 58% rename from frame/1m/scalm/bli_scalm_unb_var1.c rename to frame/1m/old/bli_scalm_unb_var1.c index b79b66c53..ee7fbaf0b 100644 --- a/frame/1m/scalm/bli_scalm_unb_var1.c +++ b/frame/1m/old/bli_scalm_unb_var1.c @@ -37,33 +37,27 @@ #define FUNCPTR_T scalm_fp typedef void (*FUNCPTR_T)( - conj_t conjbeta, + conj_t conjalpha, doff_t diagoffx, + diag_t diagx, uplo_t uplox, dim_t m, dim_t n, - void* beta, + void* alpha, void* x, inc_t rs_x, inc_t cs_x ); -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the 
function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,scalm_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,scalm_unb_var1); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,scalm_unb_var1); -#endif -#endif +static FUNCPTR_T GENARRAY_MIN(ftypes,scalm_unb_var1); -void bli_scalm_unb_var1( obj_t* x ) +void bli_scalm_unb_var1( obj_t* alpha, + obj_t* x, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); doff_t diagoffx = bli_obj_diag_offset( *x ); + uplo_t diagx = bli_obj_diag( *x ); uplo_t uplox = bli_obj_uplo( *x ); dim_t m = bli_obj_length( *x ); @@ -73,66 +67,77 @@ void bli_scalm_unb_var1( obj_t* x ) inc_t rs_x = bli_obj_row_stride( *x ); inc_t cs_x = bli_obj_col_stride( *x ); - void* buf_beta; + void* buf_alpha; + + obj_t x_local; FUNCPTR_T f; + // Alias x to x_local so we can apply alpha if it is non-unit. + bli_obj_alias_to( *x, x_local ); + + // If alpha is non-unit, apply it to the scalar attached to x. + if ( !bli_obj_equals( alpha, &BLIS_ONE ) ) + { + bli_obj_scalar_apply_scalar( alpha, &x_local ); + } // Grab the address of the internal scalar buffer for the scalar // attached to x. - buf_beta = bli_obj_internal_scalar_buffer( *x ); + buf_alpha_x = bli_obj_internal_scalar_buffer( *x ); // Index into the type combination array to extract the correct // function pointer. - // NOTE: We use dt_x for both beta and x because beta was obtained + // NOTE: We use dt_x for both alpha and x because alpha was obtained // from the attached scalar of x, which is guaranteed to be of the // same datatype as x. f = ftypes[dt_x][dt_x]; // Invoke the function. - // NOTE: We unconditionally pass in BLIS_NO_CONJUGATE for beta + // NOTE: We unconditionally pass in BLIS_NO_CONJUGATE for alpha // because it would have already been conjugated by the front-end. 
f( BLIS_NO_CONJUGATE, diagoffx, + diagx, uplox, m, n, - buf_beta, + buf_alpha, buf_x, rs_x, cs_x ); } -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(chb,chx,varname)( \ - conj_t conjbeta, \ - doff_t diagoffx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* beta, \ - void* x, inc_t rs_x, inc_t cs_x \ - ) \ +void PASTEMAC(ch,varname)( \ + conj_t conjalpha, \ + doff_t diagoffx, \ + doff_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t rs_x, inc_t cs_x \ + ) \ { \ - ctype_b* beta_cast = beta; \ - ctype_x* x_cast = x; \ - ctype_x* x1; \ - uplo_t uplox_eff; \ - dim_t n_iter; \ - dim_t n_elem, n_elem_max; \ - inc_t ldx, incx; \ - dim_t j, i; \ - dim_t ij0, n_shift; \ + ctype* alpha_cast = alpha; \ + ctype* x_cast = x; \ + ctype* x1; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ \ if ( bli_zero_dim2( m, n ) ) return; \ \ - /* If beta is unit, the entire operation is a no-op. */ \ - if ( PASTEMAC(chb,eq1)( *beta_cast ) ) return; \ + /* If alpha is unit, the entire operation is a no-op. */ \ + if ( PASTEMAC(chb,eq1)( *alpha_cast ) ) return; \ \ - /* Set various loop parameters. Here, we assume diagx is BLIS_NONUNIT_DIAG - because in _check() we disallow scalm on unit diagonal matrices. */ \ - bli_set_dims_incs_uplo_1m( diagoffx, BLIS_NONUNIT_DIAG, \ + /* Set various loop parameters. 
*/ \ + bli_set_dims_incs_uplo_1m( diagoffx, diagx, \ uplox, m, n, rs_x, cs_x, \ uplox_eff, n_elem_max, n_iter, incx, ldx, \ ij0, n_shift ); \ @@ -148,10 +153,10 @@ void PASTEMAC2(chb,chx,varname)( \ \ x1 = x_cast + (j )*ldx + (0 )*incx; \ \ - PASTEMAC2(chb,chx,kername)( conjbeta, \ - n_elem, \ - beta_cast, \ - x1, incx ); \ + PASTEMAC(ch,kername)( conjalpha, \ + n_elem, \ + alpha_cast, \ + x1, incx ); \ } \ } \ else \ @@ -164,10 +169,10 @@ void PASTEMAC2(chb,chx,varname)( \ \ x1 = x_cast + (ij0+j )*ldx + (0 )*incx; \ \ - PASTEMAC2(chb,chx,kername)( conjbeta, \ - n_elem, \ - beta_cast, \ - x1, incx ); \ + PASTEMAC(ch,kername)( conjalpha, \ + n_elem, \ + alpha_cast, \ + x1, incx ); \ } \ } \ else if ( bli_is_lower( uplox_eff ) ) \ @@ -179,25 +184,14 @@ void PASTEMAC2(chb,chx,varname)( \ \ x1 = x_cast + (j )*ldx + (ij0+i )*incx; \ \ - PASTEMAC2(chb,chx,kername)( conjbeta, \ - n_elem, \ - beta_cast, \ - x1, incx ); \ + PASTEMAC(ch,kername)( conjalpha, \ + n_elem, \ + alpha_cast, \ + x1, incx ); \ } \ } \ } \ } - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC( scalm_unb_var1, SCALV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( scalm_unb_var1, SCALV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( scalm_unb_var1, SCALV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( scalm_unb_var1 ) diff --git a/frame/1m/scalm/bli_scalm_unb_var1.h b/frame/1m/old/bli_scalm_unb_var1.h similarity index 91% rename from frame/1m/scalm/bli_scalm_unb_var1.h rename to frame/1m/old/bli_scalm_unb_var1.h index cf8a94dd8..b8574f089 100644 --- a/frame/1m/scalm/bli_scalm_unb_var1.h +++ b/frame/1m/old/bli_scalm_unb_var1.h @@ -32,19 +32,21 @@ */ -void bli_scalm_unb_var1( obj_t* x ); +void bli_scalm_unb_var1( obj_t* alpha, + obj_t* x, + cntx_t* cntx ); #undef GENTPROT2 #define GENTPROT2( ctype_b, ctype_x, chb, chx, varname ) \ \ void PASTEMAC2(chb,chx,varname)( \ - conj_t conjbeta, \ + conj_t conjalpha, \ doff_t diagoffx, \ uplo_t uplox, \ dim_t m, \ dim_t n, \ - void* beta, \ + void* alpha, \ void* x, inc_t rs_x, inc_t cs_x \ ); diff --git a/frame/1m/copym/bli_copym.c b/frame/1m/old/copym/bli_copym.c similarity index 100% rename from frame/1m/copym/bli_copym.c rename to frame/1m/old/copym/bli_copym.c diff --git a/frame/1m/copym/bli_copym.h b/frame/1m/old/copym/bli_copym.h similarity index 100% rename from frame/1m/copym/bli_copym.h rename to frame/1m/old/copym/bli_copym.h diff --git a/frame/1m/copym/bli_copym_check.c b/frame/1m/old/copym/bli_copym_check.c similarity index 100% rename from frame/1m/copym/bli_copym_check.c rename to frame/1m/old/copym/bli_copym_check.c diff --git a/frame/1m/copym/bli_copym_check.h b/frame/1m/old/copym/bli_copym_check.h similarity index 100% rename from frame/1m/copym/bli_copym_check.h rename to frame/1m/old/copym/bli_copym_check.h diff --git a/frame/1m/copym/bli_copym_unb_var1.c b/frame/1m/old/copym/bli_copym_unb_var1.c similarity index 98% rename from frame/1m/copym/bli_copym_unb_var1.c rename to 
frame/1m/old/copym/bli_copym_unb_var1.c index c8ca22fc1..2faee991a 100644 --- a/frame/1m/copym/bli_copym_unb_var1.c +++ b/frame/1m/old/copym/bli_copym_unb_var1.c @@ -61,7 +61,8 @@ static FUNCPTR_T GENARRAY2_MIN(ftypes,copym_unb_var1); void bli_copym_unb_var1( obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); diff --git a/frame/1m/copym/bli_copym_unb_var1.h b/frame/1m/old/copym/bli_copym_unb_var1.h similarity index 97% rename from frame/1m/copym/bli_copym_unb_var1.h rename to frame/1m/old/copym/bli_copym_unb_var1.h index 958b71c83..af254f4ed 100644 --- a/frame/1m/copym/bli_copym_unb_var1.h +++ b/frame/1m/old/copym/bli_copym_unb_var1.h @@ -32,7 +32,7 @@ */ -void bli_copym_unb_var1( obj_t* x, obj_t* y ); +void bli_copym_unb_var1( obj_t* x, obj_t* y, cntx_t* cntx ); #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ diff --git a/frame/1m/scal2m/bli_scal2m.c b/frame/1m/old/scal2m/bli_scal2m.c similarity index 100% rename from frame/1m/scal2m/bli_scal2m.c rename to frame/1m/old/scal2m/bli_scal2m.c diff --git a/frame/1m/scal2m/bli_scal2m.h b/frame/1m/old/scal2m/bli_scal2m.h similarity index 100% rename from frame/1m/scal2m/bli_scal2m.h rename to frame/1m/old/scal2m/bli_scal2m.h diff --git a/frame/1m/scal2m/bli_scal2m_check.c b/frame/1m/old/scal2m/bli_scal2m_check.c similarity index 100% rename from frame/1m/scal2m/bli_scal2m_check.c rename to frame/1m/old/scal2m/bli_scal2m_check.c diff --git a/frame/1m/scal2m/bli_scal2m_check.h b/frame/1m/old/scal2m/bli_scal2m_check.h similarity index 100% rename from frame/1m/scal2m/bli_scal2m_check.h rename to frame/1m/old/scal2m/bli_scal2m_check.h diff --git a/frame/1m/scal2m/bli_scal2m_unb_var1.c b/frame/1m/old/scal2m/bli_scal2m_unb_var1.c similarity index 90% rename from frame/1m/scal2m/bli_scal2m_unb_var1.c rename to frame/1m/old/scal2m/bli_scal2m_unb_var1.c index 61eaa7826..fe3202cf7 100644 --- 
a/frame/1m/scal2m/bli_scal2m_unb_var1.c +++ b/frame/1m/old/scal2m/bli_scal2m_unb_var1.c @@ -43,7 +43,7 @@ typedef void (*FUNCPTR_T)( trans_t transx, dim_t m, dim_t n, - void* beta, + void* alpha, void* x, inc_t rs_x, inc_t cs_x, void* y, inc_t rs_y, inc_t cs_y ); @@ -61,9 +61,10 @@ static FUNCPTR_T GENARRAY3_MIN(ftypes,scal2m_unb_var1); #endif -void bli_scal2m_unb_var1( obj_t* beta, +void bli_scal2m_unb_var1( obj_t* alpha, obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); @@ -84,19 +85,19 @@ void bli_scal2m_unb_var1( obj_t* beta, inc_t cs_y = bli_obj_col_stride( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); - num_t dt_beta; - void* buf_beta; + num_t dt_alpha; + void* buf_alpha; FUNCPTR_T f; - // If beta is a scalar constant, use dt_x to extract the address of the + // If alpha is a scalar constant, use dt_x to extract the address of the // corresponding constant value; otherwise, use the datatype encoded - // within the beta object and extract the buffer at the beta offset. - bli_set_scalar_dt_buffer( beta, dt_x, dt_beta, buf_beta ); + // within the alpha object and extract the buffer at the alpha offset. + bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); // Index into the type combination array to extract the correct // function pointer. - f = ftypes[dt_beta][dt_x][dt_y]; + f = ftypes[dt_alpha][dt_x][dt_y]; // Invoke the function. 
f( diagoffx, @@ -105,7 +106,7 @@ void bli_scal2m_unb_var1( obj_t* beta, transx, m, n, - buf_beta, + buf_alpha, buf_x, rs_x, cs_x, buf_y, rs_y, cs_y ); } @@ -121,12 +122,12 @@ void PASTEMAC3(cha,chx,chy,varname)( \ trans_t transx, \ dim_t m, \ dim_t n, \ - void* beta, \ + void* alpha, \ void* x, inc_t rs_x, inc_t cs_x, \ void* y, inc_t rs_y, inc_t cs_y \ ) \ { \ - ctype_a* beta_cast = beta; \ + ctype_a* alpha_cast = alpha; \ ctype_x* x_cast = x; \ ctype_y* y_cast = y; \ ctype_x* x1; \ @@ -142,8 +143,8 @@ void PASTEMAC3(cha,chx,chy,varname)( \ \ if ( bli_zero_dim2( m, n ) ) return; \ \ - /* If beta is unit, then we can simply copy. */ \ - if ( PASTEMAC(cha,eq1)( *beta_cast ) ) \ + /* If alpha is unit, then we can simply copy. */ \ + if ( PASTEMAC(cha,eq1)( *alpha_cast ) ) \ { \ PASTEMAC2(chx,chy,copym)( diagoffx, \ diagx, \ @@ -176,14 +177,13 @@ void PASTEMAC3(cha,chx,chy,varname)( \ for ( j = 0; j < n_iter; ++j ) \ { \ n_elem = n_elem_max; \ -/*printf( "scal2m_unb_var1: dense: iter %u\n", j );*/ \ \ x1 = x_cast + (j )*ldx + (0 )*incx; \ y1 = y_cast + (j )*ldy + (0 )*incy; \ \ PASTEMAC3(cha,chx,chy,kername)( conjx, \ n_elem, \ - beta_cast, \ + alpha_cast, \ x1, incx, \ y1, incy ); \ } \ @@ -202,7 +202,7 @@ void PASTEMAC3(cha,chx,chy,varname)( \ \ PASTEMAC3(cha,chx,chy,kername)( conjx, \ n_elem, \ - beta_cast, \ + alpha_cast, \ x1, incx, \ y1, incy ); \ } \ @@ -220,7 +220,7 @@ void PASTEMAC3(cha,chx,chy,varname)( \ \ PASTEMAC3(cha,chx,chy,kername)( conjx, \ n_elem, \ - beta_cast, \ + alpha_cast, \ x1, incx, \ y1, incy ); \ } \ @@ -237,7 +237,7 @@ void PASTEMAC3(cha,chx,chy,varname)( \ PASTEMAC2(cha,chy,setd)( diagoffy, \ m, \ n, \ - beta_cast, \ + alpha_cast, \ y_cast, rs_y, cs_y ); \ } \ } \ diff --git a/frame/1m/scal2m/bli_scal2m_unb_var1.h b/frame/1m/old/scal2m/bli_scal2m_unb_var1.h similarity index 97% rename from frame/1m/scal2m/bli_scal2m_unb_var1.h rename to frame/1m/old/scal2m/bli_scal2m_unb_var1.h index dab9be8ca..bba8a9dec 100644 --- 
a/frame/1m/scal2m/bli_scal2m_unb_var1.h +++ b/frame/1m/old/scal2m/bli_scal2m_unb_var1.h @@ -32,7 +32,7 @@ */ -void bli_scal2m_unb_var1( obj_t* beta, obj_t* x, obj_t* y ); +void bli_scal2m_unb_var1( obj_t* beta, obj_t* x, obj_t* y, cntx_t* cntx ); #undef GENTPROT3 diff --git a/frame/1m/setm/bli_setm.c b/frame/1m/old/setm/bli_setm.c similarity index 100% rename from frame/1m/setm/bli_setm.c rename to frame/1m/old/setm/bli_setm.c diff --git a/frame/1m/setm/bli_setm.h b/frame/1m/old/setm/bli_setm.h similarity index 100% rename from frame/1m/setm/bli_setm.h rename to frame/1m/old/setm/bli_setm.h diff --git a/frame/1m/setm/bli_setm_check.c b/frame/1m/old/setm/bli_setm_check.c similarity index 100% rename from frame/1m/setm/bli_setm_check.c rename to frame/1m/old/setm/bli_setm_check.c diff --git a/frame/1m/setm/bli_setm_check.h b/frame/1m/old/setm/bli_setm_check.h similarity index 100% rename from frame/1m/setm/bli_setm_check.h rename to frame/1m/old/setm/bli_setm_check.h diff --git a/frame/1m/setm/bli_setm_unb_var1.c b/frame/1m/old/setm/bli_setm_unb_var1.c similarity index 96% rename from frame/1m/setm/bli_setm_unb_var1.c rename to frame/1m/old/setm/bli_setm_unb_var1.c index 535cc3168..9e8150d8b 100644 --- a/frame/1m/setm/bli_setm_unb_var1.c +++ b/frame/1m/old/setm/bli_setm_unb_var1.c @@ -37,6 +37,7 @@ #define FUNCPTR_T setm_fp typedef void (*FUNCPTR_T)( + conj_t conjbeta, doff_t diagoffx, diag_t diagx, uplo_t uplox, @@ -60,10 +61,12 @@ static FUNCPTR_T GENARRAY2_MIN(ftypes,setm_unb_var1); void bli_setm_unb_var1( obj_t* beta, - obj_t* x ) + obj_t* x, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); + conj_t conjbeta = bli_obj_conj_status( *beta ); doff_t diagoffx = bli_obj_diag_offset( *x ); diag_t diagx = bli_obj_diag( *x ); uplo_t uplox = bli_obj_uplo( *x ); @@ -90,7 +93,8 @@ void bli_setm_unb_var1( obj_t* beta, f = ftypes[dt_beta][dt_x]; // Invoke the function. 
- f( diagoffx, + f( conjbeta, + diagoffx, diagx, uplox, m, @@ -104,6 +108,7 @@ void bli_setm_unb_var1( obj_t* beta, #define GENTFUNC2( ctype_b, ctype_x, chb, chx, varname, kername ) \ \ void PASTEMAC2(chb,chx,varname)( \ + conj_t conjbeta, \ doff_t diagoffx, \ diag_t diagx, \ uplo_t uplox, \ diff --git a/frame/1m/setm/bli_setm_unb_var1.h b/frame/1m/old/setm/bli_setm_unb_var1.h similarity index 97% rename from frame/1m/setm/bli_setm_unb_var1.h rename to frame/1m/old/setm/bli_setm_unb_var1.h index 0ba6aaa14..dcaca0efd 100644 --- a/frame/1m/setm/bli_setm_unb_var1.h +++ b/frame/1m/old/setm/bli_setm_unb_var1.h @@ -32,8 +32,7 @@ */ -void bli_setm_unb_var1( obj_t* beta, - obj_t* x ); +void bli_setm_unb_var1( obj_t* beta, obj_t* x, cntx_t* cntx ); #undef GENTPROT2 diff --git a/frame/1m/subm/bli_subm.c b/frame/1m/old/subm/bli_subm.c similarity index 100% rename from frame/1m/subm/bli_subm.c rename to frame/1m/old/subm/bli_subm.c diff --git a/frame/1m/subm/bli_subm.h b/frame/1m/old/subm/bli_subm.h similarity index 100% rename from frame/1m/subm/bli_subm.h rename to frame/1m/old/subm/bli_subm.h diff --git a/frame/1m/subm/bli_subm_check.c b/frame/1m/old/subm/bli_subm_check.c similarity index 100% rename from frame/1m/subm/bli_subm_check.c rename to frame/1m/old/subm/bli_subm_check.c diff --git a/frame/1m/subm/bli_subm_check.h b/frame/1m/old/subm/bli_subm_check.h similarity index 100% rename from frame/1m/subm/bli_subm_check.h rename to frame/1m/old/subm/bli_subm_check.h diff --git a/frame/1m/subm/bli_subm_unb_var1.c b/frame/1m/old/subm/bli_subm_unb_var1.c similarity index 98% rename from frame/1m/subm/bli_subm_unb_var1.c rename to frame/1m/old/subm/bli_subm_unb_var1.c index 0d87cfe9b..a025b764a 100644 --- a/frame/1m/subm/bli_subm_unb_var1.c +++ b/frame/1m/old/subm/bli_subm_unb_var1.c @@ -61,7 +61,8 @@ static FUNCPTR_T GENARRAY2_MIN(ftypes,subm_unb_var1); void bli_subm_unb_var1( obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { num_t dt_x = bli_obj_datatype( *x ); num_t 
dt_y = bli_obj_datatype( *y ); diff --git a/frame/1m/subm/bli_subm_unb_var1.h b/frame/1m/old/subm/bli_subm_unb_var1.h similarity index 97% rename from frame/1m/subm/bli_subm_unb_var1.h rename to frame/1m/old/subm/bli_subm_unb_var1.h index 41c0adf3b..aaf7ea9f2 100644 --- a/frame/1m/subm/bli_subm_unb_var1.h +++ b/frame/1m/old/subm/bli_subm_unb_var1.h @@ -32,7 +32,7 @@ */ -void bli_subm_unb_var1( obj_t* x, obj_t* y ); +void bli_subm_unb_var1( obj_t* x, obj_t* y, cntx_t* cntx ); #undef GENTPROT2 #define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ diff --git a/frame/1m/packm/bli_packm.h b/frame/1m/packm/bli_packm.h index 26030bb2d..7a44ecb9f 100644 --- a/frame/1m/packm/bli_packm.h +++ b/frame/1m/packm/bli_packm.h @@ -33,6 +33,7 @@ */ #include "bli_packm_cntl.h" +#include "bli_packm_cntx.h" #include "bli_packm_check.h" #include "bli_packm_init.h" #include "bli_packm_int.h" diff --git a/frame/1m/packm/bli_packm_blk_var1.c b/frame/1m/packm/bli_packm_blk_var1.c index 3c0318bad..67c13c4e5 100644 --- a/frame/1m/packm/bli_packm_blk_var1.c +++ b/frame/1m/packm/bli_packm_blk_var1.c @@ -56,19 +56,46 @@ typedef void (*FUNCPTR_T)( inc_t is_p, dim_t pd_p, inc_t ps_p, void* packm_ker, + cntx_t* cntx, packm_thrinfo_t* thread ); static FUNCPTR_T GENARRAY(ftypes,packm_blk_var1); -extern func_t* packm_struc_cxk_kers; -extern func_t* packm_struc_cxk_4mi_kers; -extern func_t* packm_struc_cxk_3mis_kers; -extern func_t* packm_struc_cxk_rih_kers; + +static func_t packm_struc_cxk_kers[BLIS_NUM_PACK_SCHEMA_TYPES] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +// 0000 row/col panels + { { bli_spackm_struc_cxk, bli_cpackm_struc_cxk, + bli_dpackm_struc_cxk, bli_zpackm_struc_cxk, } }, +// 0001 row/col panels: 4m interleaved + { { NULL, bli_cpackm_struc_cxk_4mi, + NULL, bli_zpackm_struc_cxk_4mi, } }, +// 0010 row/col panels: 3m interleaved + { { NULL, bli_cpackm_struc_cxk_3mis, + NULL, bli_zpackm_struc_cxk_3mis, } }, +// 0011 row/col panels: 4m separated (NOT IMPLEMENTED) + { { 
NULL, NULL, + NULL, NULL, } }, +// 0100 row/col panels: 3m separated + { { NULL, bli_cpackm_struc_cxk_3mis, + NULL, bli_zpackm_struc_cxk_3mis, } }, +// 0101 row/col panels: real only + { { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, } }, +// 0110 row/col panels: imaginary only + { { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, } }, +// 0111 row/col panels: real+imaginary only + { { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, } }, +}; void bli_packm_blk_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packm_thrinfo_t* t ) { num_t dt_cp = bli_obj_datatype( *c ); @@ -108,9 +135,19 @@ void bli_packm_blk_var1( obj_t* c, FUNCPTR_T f; + // Treatment of kappa (ie: packing during scaling) depends on // whether we are executing an induced method. - if ( bli_is_ind_packed( schema ) ) + if ( bli_is_nat_packed( schema ) ) + { + // This branch is for native execution, where we assume that + // the micro-kernel will always apply the alpha scalar of the + // higher-level operation. Thus, we use BLIS_ONE for kappa so + // that the underlying packm implementation does not perform + // any scaling during packing. + buf_kappa = bli_obj_buffer_for_const( dt_cp, BLIS_ONE ); + } + else // if ( bli_is_ind_packed( schema ) ) { // The value for kappa we use will depend on whether the scalar // attached to A has a nonzero imaginary component. If it does, @@ -123,6 +160,7 @@ void bli_packm_blk_var1( obj_t* c, { if ( bli_obj_scalar_has_nonzero_imag( p ) ) { +//printf( "applying non-zero imag kappa\n" ); // Detach the scalar. bli_obj_scalar_detach( p, &kappa ); @@ -144,18 +182,10 @@ void bli_packm_blk_var1( obj_t* c, // Acquire the buffer to the kappa chosen above. buf_kappa = bli_obj_buffer_for_1x1( dt_cp, *kappa_p ); } - else // if ( bli_is_nat_packed( schema ) ) - { - // This branch if for native execution, where we assume that - // the micro-kernel will always apply the alpha scalar of the - // higher-level operation.
Thus, we use BLIS_ONE for kappa so - // that the underlying packm implementation does not perform - // any scaling during packing. - buf_kappa = bli_obj_buffer_for_const( dt_cp, BLIS_ONE ); - } // Choose the correct func_t object based on the pack_t schema. +#if 0 if ( bli_is_4mi_packed( schema ) ) packm_kers = packm_struc_cxk_4mi_kers; else if ( bli_is_3mi_packed( schema ) || bli_is_3ms_packed( schema ) ) packm_kers = packm_struc_cxk_3mis_kers; @@ -163,11 +193,39 @@ void bli_packm_blk_var1( obj_t* c, bli_is_io_packed( schema ) || bli_is_rpi_packed( schema ) ) packm_kers = packm_struc_cxk_rih_kers; else packm_kers = packm_struc_cxk_kers; +#else + func_t* cntx_packm_kers = bli_cntx_get_packm_ukr( cntx ); + + //if ( bli_func_is_null_dt( dt_cp, cntx_packm_kers ) ) + { + // If the packm structure-aware kernel func_t in the context is + // NULL (which is the default value after the context is created), + // we use the default lookup table to determine the right func_t + // for the current schema. + const dim_t i = bli_pack_schema_index( schema ); +//printf( "bli_packm_blk_var1: pack schema index = %lu (schema = %x)\n", i, schema ); + + packm_kers = &packm_struc_cxk_kers[ i ]; + } +#if 0 + else // cntx's packm func_t overrides + { + // If the packm structure-aware kernel func_t in the context is + // non-NULL (ie: assumed to be valid), we use that instead. + //packm_kers = bli_cntx_packm_ukrs( cntx ); + packm_kers = cntx_packm_kers; + } +#endif +#endif // Query the datatype-specific function pointer from the func_t object. 
- packm_ker = bli_func_obj_query( dt_cp, packm_kers ); + packm_ker = bli_func_get_dt( dt_cp, packm_kers ); +//bli_cntx_print( cntx ); +//printf( "bli_packm_blk_var1: packm_ker = %p\n", packm_ker ); +//printf( "bli_packm_blk_var1: cntx_packm_ker = %p\n", cntx_packm_kers ); +//printf( "bli_packm_blk_var1: local_table_entry = %p\n", &packm_struc_cxk_kers[ bli_pack_schema_index( schema ) ] ); // Index into the type combination array to extract the correct // function pointer. f = ftypes[dt_cp]; @@ -192,37 +250,40 @@ void bli_packm_blk_var1( obj_t* c, is_p, pd_p, ps_p, packm_ker, + cntx, t ); } #undef GENTFUNCR -#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kertype ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - trans_t transc, \ - pack_t schema, \ - bool_t invdiag, \ - bool_t revifup, \ - bool_t reviflo, \ - dim_t m, \ - dim_t n, \ - dim_t m_max, \ - dim_t n_max, \ - void* kappa, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, \ - dim_t pd_p, inc_t ps_p, \ - void* packm_ker, \ - packm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + trans_t transc, \ + pack_t schema, \ + bool_t invdiag, \ + bool_t revifup, \ + bool_t reviflo, \ + dim_t m, \ + dim_t n, \ + dim_t m_max, \ + dim_t n_max, \ + void* kappa, \ + void* c, inc_t rs_c, inc_t cs_c, \ + void* p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + dim_t pd_p, inc_t ps_p, \ + void* packm_ker, \ + cntx_t* cntx, \ + packm_thrinfo_t* thread \ + ) \ { \ - PASTECH(ch,kertype) packm_ker_cast = packm_ker; \ + PASTECH2(ch,opname,_ft) packm_ker_cast = packm_ker; \ \ ctype* restrict kappa_cast = kappa; \ ctype* restrict c_cast = c; \ @@ -454,7 +515,8 @@ PASTEMAC(ch,fprintm)( stdout, "packm_var2: a", m, n, \ kappa_cast, \ c_use, rs_c, cs_c, \ p_use, rs_p, cs_p, \ - 
is_p_use ); \ + is_p_use, \ + cntx ); \ } \ \ /* NOTE: This value is usually LESS than ps_p because triangular @@ -492,7 +554,8 @@ PASTEMAC(ch,fprintm)( stdout, "packm_var2: a", m, n, \ kappa_cast, \ c_use, rs_c, cs_c, \ p_use, rs_p, cs_p, \ - is_p_use ); \ + is_p_use, \ + cntx ); \ } \ \ p_inc = ps_p; \ @@ -527,7 +590,8 @@ PASTEMAC(ch,fprintm)( stdout, "packm_var2: a", m, n, \ kappa_cast, \ c_use, rs_c, cs_c, \ p_use, rs_p, cs_p, \ - is_p_use ); \ + is_p_use, \ + cntx ); \ } \ \ /* NOTE: This value is equivalent to ps_p. */ \ @@ -601,5 +665,5 @@ PASTEMAC(ch,fprintm)( stdout, "packm_var2: a", m, n, \ } \ } -INSERT_GENTFUNCR_BASIC( packm_blk_var1, packm_ker_t ) +INSERT_GENTFUNCR_BASIC( packm, packm_blk_var1 ) diff --git a/frame/1m/packm/bli_packm_blk_var1.c.old b/frame/1m/packm/bli_packm_blk_var1.c.old index 78d52c9ca..98b6bb233 100644 --- a/frame/1m/packm/bli_packm_blk_var1.c.old +++ b/frame/1m/packm/bli_packm_blk_var1.c.old @@ -147,7 +147,7 @@ void bli_packm_blk_var1( obj_t* c, #undef GENTFUNC #define GENTFUNC( ctype, ch, varname, kertype ) \ \ -void PASTEMAC(ch,varname)( \ +void PASTEMAC(ch,varname) \ struc_t strucc, \ doff_t diagoffc, \ diag_t diagc, \ diff --git a/frame/1m/packm/bli_packm_blk_var1.h b/frame/1m/packm/bli_packm_blk_var1.h index 482d3377f..a946443f5 100644 --- a/frame/1m/packm/bli_packm_blk_var1.h +++ b/frame/1m/packm/bli_packm_blk_var1.h @@ -34,34 +34,37 @@ void bli_packm_blk_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packm_thrinfo_t* t ); #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - trans_t transc, \ - pack_t schema, \ - bool_t invdiag, \ - bool_t revifup, \ - bool_t reviflo, \ - dim_t m, \ - dim_t n, \ - dim_t m_max, \ - dim_t n_max, \ - void* kappa, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, \ - dim_t pd_p, inc_t ps_p, \ - void* packm_ker, \ - packm_thrinfo_t* thread \ - ); 
+void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + trans_t transc, \ + pack_t schema, \ + bool_t invdiag, \ + bool_t revifup, \ + bool_t reviflo, \ + dim_t m, \ + dim_t n, \ + dim_t m_max, \ + dim_t n_max, \ + void* kappa, \ + void* c, inc_t rs_c, inc_t cs_c, \ + void* p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + dim_t pd_p, inc_t ps_p, \ + void* packm_ker, \ + cntx_t* cntx, \ + packm_thrinfo_t* thread \ + ); INSERT_GENTPROT_BASIC( packm_blk_var1 ) diff --git a/frame/1m/packm/bli_packm_check.c b/frame/1m/packm/bli_packm_check.c index 77de952aa..6a56b8676 100644 --- a/frame/1m/packm/bli_packm_check.c +++ b/frame/1m/packm/bli_packm_check.c @@ -35,9 +35,9 @@ #include "blis.h" -void bli_packm_init_check( obj_t* a, - obj_t* p, - packm_t* cntl ) +void bli_packm_init_check( obj_t* a, + obj_t* p, + cntx_t* cntx ) { err_t e_val; @@ -54,9 +54,9 @@ void bli_packm_init_check( obj_t* a, //bli_check_error_code( e_val ); } -void bli_packm_int_check( obj_t* a, - obj_t* p, - packm_t* cntl ) +void bli_packm_int_check( obj_t* a, + obj_t* p, + cntx_t* cntx ) { err_t e_val; diff --git a/frame/1m/packm/bli_packm_check.h b/frame/1m/packm/bli_packm_check.h index b3997e1b5..9974ced6b 100644 --- a/frame/1m/packm/bli_packm_check.h +++ b/frame/1m/packm/bli_packm_check.h @@ -32,10 +32,10 @@ */ -void bli_packm_init_check( obj_t* a, - obj_t* p, - packm_t* cntl ); +void bli_packm_init_check( obj_t* a, + obj_t* p, + cntx_t* cntx ); -void bli_packm_int_check( obj_t* a, - obj_t* p, - packm_t* cntl ); +void bli_packm_int_check( obj_t* a, + obj_t* p, + cntx_t* cntx ); diff --git a/frame/1m/packm/bli_packm_cntl.c b/frame/1m/packm/bli_packm_cntl.c index 1c3586121..73c0fbe2c 100644 --- a/frame/1m/packm/bli_packm_cntl.c +++ b/frame/1m/packm/bli_packm_cntl.c @@ -34,77 +34,13 @@ #include "blis.h" -blksz_t* packm_mult_ldim; -blksz_t* packm_mult_nvec; - -func_t* packm_struc_cxk_kers; -func_t* packm_struc_cxk_4mi_kers; -func_t* 
packm_struc_cxk_3mis_kers; -func_t* packm_struc_cxk_rih_kers; - packm_t* packm_cntl_row; packm_t* packm_cntl_col; -packm_t* packm_cntl_rpn; -packm_t* packm_cntl_cpn; - packm_t* packm_cntl; void bli_packm_cntl_init() { - // Create function pointer object for each datatype-specific packm - // kernel. - packm_struc_cxk_kers - = - bli_func_obj_create( bli_spackm_struc_cxk, FALSE, - bli_dpackm_struc_cxk, FALSE, - bli_cpackm_struc_cxk, FALSE, - bli_zpackm_struc_cxk, FALSE ); - - packm_struc_cxk_4mi_kers - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - bli_cpackm_struc_cxk_4mi, FALSE, - bli_zpackm_struc_cxk_4mi, FALSE ); - - packm_struc_cxk_3mis_kers - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - bli_cpackm_struc_cxk_3mis, FALSE, - bli_zpackm_struc_cxk_3mis, FALSE ); - - packm_struc_cxk_rih_kers - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - bli_cpackm_struc_cxk_rih, FALSE, - bli_zpackm_struc_cxk_rih, FALSE ); - - - // Create blocksize objects for m and n register blocking. We will attach - // these to the packm control node so they can be used to (a) allocate a - // block whose m and n dimension are multiples of mr and nr, and (b) know - // how much zero-padding is necessary for edge cases. - // NOTE: these alignments end up getting applied to matrices packed for - // level-2 operations, even though they are not needed, and/or smaller - // alignments may be sufficient. For simplicity, we choose to tweak the - // dimensions of all pack matrix buffers the same amount. - packm_mult_ldim - = - bli_blksz_obj_create( BLIS_DEFAULT_MR_S, 0, - BLIS_DEFAULT_MR_D, 0, - BLIS_DEFAULT_MR_C, 0, - BLIS_DEFAULT_MR_Z, 0 ); - - packm_mult_nvec - = - bli_blksz_obj_create( BLIS_DEFAULT_NR_S, 0, - BLIS_DEFAULT_NR_D, 0, - BLIS_DEFAULT_NR_C, 0, - BLIS_DEFAULT_NR_Z, 0 ); - // Generally speaking, the BLIS_PACKED_ROWS and BLIS_PACKED_COLUMNS // are used by the level-2 operations. These schemas amount to simple // copies to row or column storage. 
These simple schemas may be used @@ -121,8 +57,8 @@ void bli_packm_cntl_init() = bli_packm_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT1, // When packing to rows: - packm_mult_nvec, // - nvec multiple is used for m dimension - packm_mult_ldim, // - ldim multiple is used for n dimension + BLIS_VF, // used for m dimension + BLIS_VF, // used for n dimension FALSE, // do NOT invert diagonal FALSE, // do NOT iterate backwards if upper FALSE, // do NOT iterate backwards if lower @@ -135,8 +71,8 @@ void bli_packm_cntl_init() = bli_packm_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT1, // When packing to columns: - packm_mult_ldim, // - ldim multiple is used for m dimension - packm_mult_nvec, // - nvec multiple is used for n dimension + BLIS_VF, // used for m dimension + BLIS_VF, // used for n dimension FALSE, // do NOT invert diagonal FALSE, // do NOT iterate backwards if upper FALSE, // do NOT iterate backwards if lower @@ -151,22 +87,14 @@ void bli_packm_cntl_init() void bli_packm_cntl_finalize() { - bli_func_obj_free( packm_struc_cxk_kers ); - bli_func_obj_free( packm_struc_cxk_4mi_kers ); - bli_func_obj_free( packm_struc_cxk_3mis_kers ); - bli_func_obj_free( packm_struc_cxk_rih_kers ); - bli_cntl_obj_free( packm_cntl_row ); bli_cntl_obj_free( packm_cntl_col ); - - bli_blksz_obj_free( packm_mult_ldim ); - bli_blksz_obj_free( packm_mult_nvec ); } packm_t* bli_packm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* mr, - blksz_t* nr, + bszid_t bmid_m, + bszid_t bmid_n, bool_t does_invert_diag, bool_t rev_iter_if_upper, bool_t rev_iter_if_lower, @@ -179,8 +107,8 @@ packm_t* bli_packm_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->mr = mr; - cntl->nr = nr; + cntl->bmid_m = bmid_m; + cntl->bmid_n = bmid_n; cntl->does_invert_diag = does_invert_diag; cntl->rev_iter_if_upper = rev_iter_if_upper; cntl->rev_iter_if_lower = rev_iter_if_lower; @@ -193,8 +121,8 @@ packm_t* bli_packm_cntl_obj_create( impl_t impl_type, 
void bli_packm_cntl_obj_init( packm_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* mr, - blksz_t* nr, + bszid_t bmid_m, + bszid_t bmid_n, bool_t does_invert_diag, bool_t rev_iter_if_upper, bool_t rev_iter_if_lower, @@ -203,8 +131,8 @@ void bli_packm_cntl_obj_init( packm_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->mr = mr; - cntl->nr = nr; + cntl->bmid_m = bmid_m; + cntl->bmid_n = bmid_n; cntl->does_invert_diag = does_invert_diag; cntl->rev_iter_if_upper = rev_iter_if_upper; cntl->rev_iter_if_lower = rev_iter_if_lower; diff --git a/frame/1m/packm/bli_packm_cntl.h b/frame/1m/packm/bli_packm_cntl.h index 03c04c7bb..c27dba9a9 100644 --- a/frame/1m/packm/bli_packm_cntl.h +++ b/frame/1m/packm/bli_packm_cntl.h @@ -36,8 +36,8 @@ struct packm_s { impl_t impl_type; varnum_t var_num; - blksz_t* mr; - blksz_t* nr; + bszid_t bmid_m; + bszid_t bmid_n; bool_t does_invert_diag; bool_t rev_iter_if_upper; bool_t rev_iter_if_lower; @@ -46,8 +46,8 @@ struct packm_s }; typedef struct packm_s packm_t; -#define cntl_mr( cntl ) cntl->mr -#define cntl_nr( cntl ) cntl->nr +#define cntl_bmid_m( cntl ) cntl->bmid_m +#define cntl_bmid_n( cntl ) cntl->bmid_n #define cntl_does_invert_diag( cntl ) cntl->does_invert_diag #define cntl_rev_iter_if_upper( cntl ) cntl->rev_iter_if_upper @@ -67,8 +67,8 @@ void bli_packm_cntl_init( void ); void bli_packm_cntl_finalize( void ); packm_t* bli_packm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* mr_def, - blksz_t* nr_def, + bszid_t bmid_m, + bszid_t bmid_n, bool_t does_invert_diag, bool_t rev_iter_if_upper, bool_t rev_iter_if_lower, @@ -77,8 +77,8 @@ packm_t* bli_packm_cntl_obj_create( impl_t impl_type, void bli_packm_cntl_obj_init( packm_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* mr_def, - blksz_t* nr_def, + bszid_t bmid_m, + bszid_t bmid_n, bool_t does_invert_diag, bool_t rev_iter_if_upper, bool_t rev_iter_if_lower, diff --git a/frame/1m/packm/bli_packm_cntx.c 
b/frame/1m/packm/bli_packm_cntx.c new file mode 100644 index 000000000..787531f41 --- /dev/null +++ b/frame/1m/packm/bli_packm_cntx.c @@ -0,0 +1,57 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. 
+// + +void bli_packm_cntx_init( cntx_t* cntx ) +{ + bli_cntx_obj_create( cntx ); + + // Initialize the context with kernels that may be needed for the + // current operation. + bli_gks_cntx_set_l1v_ker( BLIS_COPYV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_INVERTV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCAL2V_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SETV_KER, cntx ); +} + +void bli_packm_cntx_finalize( cntx_t* cntx ) +{ + bli_cntx_obj_free( cntx ); +} diff --git a/frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.c b/frame/1m/packm/bli_packm_cntx.h similarity index 88% rename from frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.c rename to frame/1m/packm/bli_packm_cntx.h index 4820cf022..1ab4df826 100644 --- a/frame/1f/dotxaxpyf/bli_dotxaxpyf_fusefac.c +++ b/frame/1m/packm/bli_packm_cntx.h @@ -32,16 +32,16 @@ */ -#include "blis.h" // -// Define object-based fusing factor query routine. +// Prototype context initialization functions. // -static dim_t GENARRAY(factors,dotxaxpyf_fusefac); +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); -dim_t bli_dotxaxpyf_fusefac( num_t dt ) -{ - return factors[ dt ]; -} +GENPROT( packm ) diff --git a/frame/1m/packm/bli_packm_cxk.c b/frame/1m/packm/bli_packm_cxk.c index 7451d8318..c50b06456 100644 --- a/frame/1m/packm/bli_packm_cxk.c +++ b/frame/1m/packm/bli_packm_cxk.c @@ -36,13 +36,14 @@ #define FUNCPTR_T packm_cxk_fp -typedef void (*FUNCPTR_T)( - conj_t conja, - dim_t panel_len, - void* kappa, - void* a, inc_t inca, inc_t lda, - void* p, inc_t ldp - ); +typedef void (*FUNCPTR_T) + ( + conj_t conja, + dim_t panel_len, + void* kappa, + void* a, inc_t inca, inc_t lda, + void* p, inc_t ldp + ); #undef FUNCPTR_ARRAY_LENGTH #define FUNCPTR_ARRAY_LENGTH 18 @@ -155,14 +156,16 @@ static FUNCPTR_T ftypes[FUNCPTR_ARRAY_LENGTH][BLIS_NUM_FP_TYPES] = #undef GENTFUNC #define GENTFUNC( 
ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ num_t dt; \ FUNCPTR_T f; \ @@ -181,25 +184,32 @@ void PASTEMAC(ch,varname)( \ provided, we invoke the implementation. Otherwise, we use scal2m. */ \ if ( f != NULL ) \ { \ - f( conja, \ - panel_len, \ - kappa, \ - a, inca, lda, \ - p, ldp ); \ + f \ + ( \ + conja, \ + panel_len, \ + kappa, \ + a, inca, lda, \ + p, ldp \ + ); \ } \ else \ { \ /* Treat the micro-panel as panel_dim x panel_len and column-stored (unit row stride). */ \ - PASTEMAC3(ch,ch,ch,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - conja, \ - panel_dim, \ - panel_len, \ - kappa, \ - a, inca, lda, \ - p, 1, ldp ); \ + PASTEMAC(ch,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + conja, \ + panel_dim, \ + panel_len, \ + kappa, \ + a, inca, lda, \ + p, 1, ldp, \ + cntx \ + ); \ } \ } diff --git a/frame/1m/packm/bli_packm_cxk.h b/frame/1m/packm/bli_packm_cxk.h index 6a31a2a7b..322eaa4ad 100644 --- a/frame/1m/packm/bli_packm_cxk.h +++ b/frame/1m/packm/bli_packm_cxk.h @@ -32,20 +32,22 @@ */ -#include "bli_packm_ref_cxk.h" +#include "bli_packm_cxk_ref.h" #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packm_cxk ) diff --git a/frame/1m/packm/bli_packm_cxk_3mis.c b/frame/1m/packm/bli_packm_cxk_3mis.c index 
243934a82..80c388096 100644 --- a/frame/1m/packm/bli_packm_cxk_3mis.c +++ b/frame/1m/packm/bli_packm_cxk_3mis.c @@ -36,13 +36,14 @@ #define FUNCPTR_T packm_cxk_fp -typedef void (*FUNCPTR_T)( - conj_t conja, - dim_t panel_len, - void* kappa, - void* a, inc_t inca, inc_t lda, - void* p, inc_t is_p, inc_t ldp - ); +typedef void (*FUNCPTR_T) + ( + conj_t conja, + dim_t panel_len, + void* kappa, + void* a, inc_t inca, inc_t lda, + void* p, inc_t is_p, inc_t ldp + ); #undef FUNCPTR_ARRAY_LENGTH #define FUNCPTR_ARRAY_LENGTH 32 @@ -195,14 +196,16 @@ static FUNCPTR_T ftypes[FUNCPTR_ARRAY_LENGTH][BLIS_NUM_FP_TYPES] = #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ num_t dt; \ FUNCPTR_T f; \ @@ -221,11 +224,14 @@ void PASTEMAC(ch,varname)( \ provided, we invoke the implementation. Otherwise, we use scal2m. 
*/ \ if ( f != NULL ) \ { \ - f( conja, \ - panel_len, \ - kappa, \ - a, inca, lda, \ - p, is_p, ldp ); \ + f \ + ( \ + conja, \ + panel_len, \ + kappa, \ + a, inca, lda, \ + p, is_p, ldp \ + ); \ } \ else \ { \ @@ -258,13 +264,16 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict pi11_i = p_i + (i )*1 + (j )*ldp; \ ctype_r* restrict pi11_rpi = p_rpi + (i )*1 + (j )*ldp; \ \ - PASTEMAC(ch,scal2jri3s)( *kappa_r, \ - *kappa_i, \ - *alpha11_r, \ - *alpha11_i, \ - *pi11_r, \ - *pi11_i, \ - *pi11_rpi ); \ + PASTEMAC(ch,scal2jri3s) \ + ( \ + *kappa_r, \ + *kappa_i, \ + *alpha11_r, \ + *alpha11_i, \ + *pi11_r, \ + *pi11_i, \ + *pi11_rpi \ + ); \ } \ } \ } \ @@ -280,13 +289,16 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict pi11_i = p_i + (i )*1 + (j )*ldp; \ ctype_r* restrict pi11_rpi = p_rpi + (i )*1 + (j )*ldp; \ \ - PASTEMAC(ch,scal2ri3s)( *kappa_r, \ - *kappa_i, \ - *alpha11_r, \ - *alpha11_i, \ - *pi11_r, \ - *pi11_i, \ - *pi11_rpi ); \ + PASTEMAC(ch,scal2ri3s) \ + ( \ + *kappa_r, \ + *kappa_i, \ + *alpha11_r, \ + *alpha11_i, \ + *pi11_r, \ + *pi11_i, \ + *pi11_rpi \ + ); \ } \ } \ } \ diff --git a/frame/1m/packm/bli_packm_cxk_3mis.h b/frame/1m/packm/bli_packm_cxk_3mis.h index 73a94e860..f60e78c9e 100644 --- a/frame/1m/packm/bli_packm_cxk_3mis.h +++ b/frame/1m/packm/bli_packm_cxk_3mis.h @@ -32,20 +32,22 @@ */ -#include "bli_packm_ref_cxk_3mis.h" +#include "bli_packm_cxk_3mis_ref.h" #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_cxk_3mis ) diff --git a/frame/1m/packm/bli_packm_cxk_4mi.c b/frame/1m/packm/bli_packm_cxk_4mi.c 
index 45b0f6a22..c0291d245 100644 --- a/frame/1m/packm/bli_packm_cxk_4mi.c +++ b/frame/1m/packm/bli_packm_cxk_4mi.c @@ -36,13 +36,14 @@ #define FUNCPTR_T packm_cxk_fp -typedef void (*FUNCPTR_T)( - conj_t conja, - dim_t panel_len, - void* kappa, - void* a, inc_t inca, inc_t lda, - void* p, inc_t is_p, inc_t ldp - ); +typedef void (*FUNCPTR_T) + ( + conj_t conja, + dim_t panel_len, + void* kappa, + void* a, inc_t inca, inc_t lda, + void* p, inc_t is_p, inc_t ldp + ); #undef FUNCPTR_ARRAY_LENGTH #define FUNCPTR_ARRAY_LENGTH 32 @@ -192,18 +193,19 @@ static FUNCPTR_T ftypes[FUNCPTR_ARRAY_LENGTH][BLIS_NUM_FP_TYPES] = - #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ num_t dt; \ FUNCPTR_T f; \ @@ -222,11 +224,14 @@ void PASTEMAC(ch,varname)( \ provided, we invoke the implementation. Otherwise, we use scal2m. 
*/ \ if ( f != NULL ) \ { \ - f( conja, \ - panel_len, \ - kappa, \ - a, inca, lda, \ - p, is_p, ldp ); \ + f \ + ( \ + conja, \ + panel_len, \ + kappa, \ + a, inca, lda, \ + p, is_p, ldp \ + ); \ } \ else \ { \ @@ -257,12 +262,15 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp; \ ctype_r* restrict pi11_i = p_i + (i )*1 + (j )*ldp; \ \ - PASTEMAC(ch,scal2jris)( *kappa_r, \ - *kappa_i, \ - *alpha11_r, \ - *alpha11_i, \ - *pi11_r, \ - *pi11_i ); \ + PASTEMAC(ch,scal2jris) \ + ( \ + *kappa_r, \ + *kappa_i, \ + *alpha11_r, \ + *alpha11_i, \ + *pi11_r, \ + *pi11_i \ + ); \ } \ } \ } \ @@ -277,12 +285,15 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp; \ ctype_r* restrict pi11_i = p_i + (i )*1 + (j )*ldp; \ \ - PASTEMAC(ch,scal2ris)( *kappa_r, \ - *kappa_i, \ - *alpha11_r, \ - *alpha11_i, \ - *pi11_r, \ - *pi11_i ); \ + PASTEMAC(ch,scal2ris) \ + ( \ + *kappa_r, \ + *kappa_i, \ + *alpha11_r, \ + *alpha11_i, \ + *pi11_r, \ + *pi11_i \ + ); \ } \ } \ } \ diff --git a/frame/1m/packm/bli_packm_cxk_4mi.h b/frame/1m/packm/bli_packm_cxk_4mi.h index 388829ae8..dd9520d6d 100644 --- a/frame/1m/packm/bli_packm_cxk_4mi.h +++ b/frame/1m/packm/bli_packm_cxk_4mi.h @@ -32,20 +32,22 @@ */ -#include "bli_packm_ref_cxk_4mi.h" +#include "bli_packm_cxk_4mi_ref.h" #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_cxk_4mi ) diff --git a/frame/1m/packm/bli_packm_cxk_rih.c b/frame/1m/packm/bli_packm_cxk_rih.c index 2dca0a4f9..ec70c08c1 100644 --- a/frame/1m/packm/bli_packm_cxk_rih.c +++ 
b/frame/1m/packm/bli_packm_cxk_rih.c @@ -36,14 +36,15 @@ #define FUNCPTR_T packm_cxk_fp -typedef void (*FUNCPTR_T)( - conj_t conja, - pack_t schema, - dim_t panel_len, - void* kappa, - void* a, inc_t inca, inc_t lda, - void* p, inc_t ldp - ); +typedef void (*FUNCPTR_T) + ( + conj_t conja, + pack_t schema, + dim_t panel_len, + void* kappa, + void* a, inc_t inca, inc_t lda, + void* p, inc_t ldp + ); #undef FUNCPTR_ARRAY_LENGTH #define FUNCPTR_ARRAY_LENGTH 32 @@ -194,15 +195,17 @@ static FUNCPTR_T ftypes_rih[FUNCPTR_ARRAY_LENGTH][BLIS_NUM_FP_TYPES] = #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ num_t dt; \ FUNCPTR_T f; \ @@ -221,12 +224,15 @@ void PASTEMAC(ch,varname)( \ provided, we invoke the implementation. Otherwise, we use scal2m. 
*/ \ if ( f != NULL ) \ { \ - f( conja, \ - schema, \ - panel_len, \ - kappa, \ - a, inca, lda, \ - p, ldp ); \ + f \ + ( \ + conja, \ + schema, \ + panel_len, \ + kappa, \ + a, inca, lda, \ + p, ldp \ + ); \ } \ else \ { \ @@ -252,9 +258,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2jros)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2jros) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ @@ -267,9 +276,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2ros)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2ros) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ @@ -285,9 +297,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2jios)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2jios) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ @@ -300,9 +315,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2ios)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2ios) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ @@ -318,9 +336,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2jrpis)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2jrpis) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ @@ -333,9 +354,12 @@ void PASTEMAC(ch,varname)( \ ctype* restrict alpha11 = a_r + (i )*inca1 + (j )*lda1; \ ctype_r* restrict 
pi11_r = p_r + (i )*1 + (j )*ldp1; \ \ - PASTEMAC(ch,scal2rpis)( *kappa_cast, \ - *alpha11, \ - *pi11_r ); \ + PASTEMAC(ch,scal2rpis) \ + ( \ + *kappa_cast, \ + *alpha11, \ + *pi11_r \ + ); \ } \ } \ } \ diff --git a/frame/1m/packm/bli_packm_cxk_rih.h b/frame/1m/packm/bli_packm_cxk_rih.h index 5106b7b03..23462da6f 100644 --- a/frame/1m/packm/bli_packm_cxk_rih.h +++ b/frame/1m/packm/bli_packm_cxk_rih.h @@ -32,21 +32,23 @@ */ -#include "bli_packm_ref_cxk_rih.h" +#include "bli_packm_cxk_rih_ref.h" #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t panel_dim, \ - dim_t panel_len, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t panel_dim, \ + dim_t panel_len, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_cxk_rih ) diff --git a/frame/1m/packm/bli_packm_init.c b/frame/1m/packm/bli_packm_init.c index 4c35e0201..76afaac4d 100644 --- a/frame/1m/packm/bli_packm_init.c +++ b/frame/1m/packm/bli_packm_init.c @@ -36,6 +36,7 @@ void bli_packm_init( obj_t* a, obj_t* p, + cntx_t* cntx, packm_t* cntl ) { // The purpose of packm_init() is to initialize an object P so that @@ -49,13 +50,13 @@ void bli_packm_init( obj_t* a, packord_t pack_ord_if_up; packord_t pack_ord_if_lo; packbuf_t pack_buf_type; - blksz_t* mr; - blksz_t* nr; + bszid_t bmult_id_m; + bszid_t bmult_id_n; obj_t c; // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_packm_init_check( a, p, cntl ); + bli_packm_init_check( a, p, cntx ); // First check if we are to skip this operation because the control tree // is NULL, and if so, simply alias the object to its packed counterpart. 
@@ -132,13 +133,38 @@ void bli_packm_init( obj_t* a, } - // Extract various fields from the control tree and pass them in - // explicitly into _init_pack(). This allows external code generators - // the option of bypassing usage of control trees altogether. - schema = cntl_pack_schema( cntl ); - pack_buf_type = cntl_pack_buf_type( cntl ); - mr = cntl_mr( cntl ); - nr = cntl_nr( cntl ); + // Extract various fields from the control tree. + pack_buf_type = cntl_pack_buf_type( cntl ); + bmult_id_m = cntl_bmid_m( cntl ); + bmult_id_n = cntl_bmid_n( cntl ); + + // Extract the schema from the context, depending on whether we are + // preparing to pack a block of A or panel of B. For A and B, we must + // obtain the schema from the context since the induced methods reuse + // the same control trees used by native execution, and those induced + // methods specify the schema used by the current execution phase + // within the context (whereas the control tree does not change). + if ( pack_buf_type == BLIS_BUFFER_FOR_A_BLOCK ) + { + schema = bli_cntx_get_pack_schema_a( cntx ); +//printf( "bli_packm_init: pack schema a = %x\n", schema ); + } + else if ( pack_buf_type == BLIS_BUFFER_FOR_B_PANEL ) + { + schema = bli_cntx_get_pack_schema_b( cntx ); +//printf( "bli_packm_init: pack schema b = %x\n", schema ); + } + else // if ( pack_buf_type == BLIS_BUFFER_FOR_C_PANEL ) + { + // If we get a request to pack C for some reason, it is likely + // not part of an induced method, and so it would be safe (and + // necessary) to read the pack schema from the control tree. + schema = cntl_pack_schema( cntl ); +//printf( "bli_packm_init: pack schema c = %x\n", schema ); + } + + // Prepare a few other variables based on properties of the control + // tree. 
if ( cntl_does_invert_diag( cntl ) ) invert_diag = BLIS_INVERT_DIAG; else invert_diag = BLIS_NO_INVERT_DIAG; @@ -155,10 +181,11 @@ void bli_packm_init( obj_t* a, pack_ord_if_up, pack_ord_if_lo, pack_buf_type, - mr, - nr, + bmult_id_m, + bmult_id_n, &c, - p ); + p, + cntx ); // Now p is ready to be packed. } @@ -169,19 +196,20 @@ void bli_packm_init_pack( invdiag_t invert_diag, packord_t pack_ord_if_up, packord_t pack_ord_if_lo, packbuf_t pack_buf_type, - blksz_t* mr, - blksz_t* nr, + bszid_t bmult_id_m, + bszid_t bmult_id_n, obj_t* c, - obj_t* p ) + obj_t* p, + cntx_t* cntx ) { num_t dt = bli_obj_datatype( *c ); trans_t transc = bli_obj_onlytrans_status( *c ); dim_t m_c = bli_obj_length( *c ); dim_t n_c = bli_obj_width( *c ); - dim_t mr_def_dim = bli_blksz_get_def( dt, mr ); - dim_t mr_pack_dim = bli_blksz_get_max( dt, mr ); - dim_t nr_def_dim = bli_blksz_get_def( dt, nr ); - dim_t nr_pack_dim = bli_blksz_get_max( dt, nr ); + dim_t bmult_m_def = bli_cntx_get_blksz_def_dt( dt, bmult_id_m, cntx ); + dim_t bmult_m_pack = bli_cntx_get_blksz_max_dt( dt, bmult_id_m, cntx ); + dim_t bmult_n_def = bli_cntx_get_blksz_def_dt( dt, bmult_id_n, cntx ); + dim_t bmult_n_pack = bli_cntx_get_blksz_max_dt( dt, bmult_id_n, cntx ); mem_t* mem_p; dim_t m_p, n_p; @@ -255,8 +283,8 @@ void bli_packm_init_pack( invdiag_t invert_diag, // level-2 operations, but that's okay with us. m_p = bli_obj_length( *p ); n_p = bli_obj_width( *p ); - m_p_pad = bli_align_dim_to_mult( m_p, mr_def_dim ); - n_p_pad = bli_align_dim_to_mult( n_p, nr_def_dim ); + m_p_pad = bli_align_dim_to_mult( m_p, bmult_m_def ); + n_p_pad = bli_align_dim_to_mult( n_p, bmult_n_def ); // Save the padded dimensions into the packed object. It is important // to save these dimensions since they represent the actual dimensions @@ -325,13 +353,14 @@ void bli_packm_init_pack( invdiag_t invert_diag, dim_t ps_p, ps_p_orig; // The panel dimension (for each datatype) should be equal to the - // register blocksize in the m dimension. 
- m_panel = mr_def_dim; + // default (logical) blocksize multiple in the m dimension. + m_panel = bmult_m_def; // The "column stride" of a row panel packed object is interpreted as - // the column stride WITHIN a panel. Thus, this is equal to the panel - // pack dimension (which may be equal to the panel dimension). - cs_p = mr_pack_dim; + // the column stride WITHIN a panel. Thus, this is equal to the + // packing (storage) blocksize multiple (which may be equal to the + // default (logical) blocksize multiple. + cs_p = bmult_m_pack; // The "row stride" of a row panel packed object is interpreted // as the row stride WITHIN a panel. Thus, it is unit. @@ -417,13 +446,14 @@ void bli_packm_init_pack( invdiag_t invert_diag, dim_t ps_p, ps_p_orig; // The panel dimension (for each datatype) should be equal to the - // register blocksize in the n dimension. - n_panel = nr_def_dim; + // default (logical) blocksize multiple in the n dimension. + n_panel = bmult_n_def; // The "row stride" of a column panel packed object is interpreted as - // the row stride WITHIN a panel. Thus, this is equal to the panel - // pack dimension (which may be equal to the panel dimension). - rs_p = nr_pack_dim; + // the row stride WITHIN a panel. Thus, this is equal to the + // packing (storage) blocksize multiple (which may be equal to the + // default (logical) blocksize multiple. + rs_p = bmult_n_pack; // The "column stride" of a column panel packed object is interpreted // as the column stride WITHIN a panel. Thus, it is unit. 
diff --git a/frame/1m/packm/bli_packm_init.h b/frame/1m/packm/bli_packm_init.h index f5d2f1f6a..a21956ba2 100644 --- a/frame/1m/packm/bli_packm_init.h +++ b/frame/1m/packm/bli_packm_init.h @@ -34,6 +34,7 @@ void bli_packm_init( obj_t* a, obj_t* p, + cntx_t* cntx, packm_t* cntl ); void bli_packm_init_pack( invdiag_t invert_diag, @@ -41,10 +42,11 @@ void bli_packm_init_pack( invdiag_t invert_diag, packord_t pack_ord_if_up, packord_t pack_ord_if_lo, packbuf_t pack_buf_type, - blksz_t* mr, - blksz_t* nr, + bszid_t mr_id, + bszid_t nr_id, obj_t* c, - obj_t* p ); + obj_t* p, + cntx_t* cntx ); /* void bli_packm_init_cast( obj_t* a, diff --git a/frame/1m/packm/bli_packm_int.c b/frame/1m/packm/bli_packm_int.c index c90c0eaa5..6650fbcd7 100644 --- a/frame/1m/packm/bli_packm_int.c +++ b/frame/1m/packm/bli_packm_int.c @@ -38,6 +38,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* p, + cntx_t* cntx, packm_thrinfo_t* t ); static FUNCPTR_T vars[6][3] = @@ -53,6 +54,7 @@ static FUNCPTR_T vars[6][3] = void bli_packm_int( obj_t* a, obj_t* p, + cntx_t* cntx, packm_t* cntl, packm_thrinfo_t* thread ) { @@ -62,7 +64,7 @@ void bli_packm_int( obj_t* a, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_packm_int_check( a, p, cntl ); + bli_packm_int_check( a, p, cntx ); // Sanity check; A should never have a zero dimension. If we must support // it, then we should fold it into the next alias-and-early-exit block. @@ -122,6 +124,7 @@ void bli_packm_int( obj_t* a, // Invoke the variant with kappa_use. 
f( a, p, + cntx, thread ); // Barrier so that packing is done before computation diff --git a/frame/1m/packm/bli_packm_int.h b/frame/1m/packm/bli_packm_int.h index 35f990620..e46d131a5 100644 --- a/frame/1m/packm/bli_packm_int.h +++ b/frame/1m/packm/bli_packm_int.h @@ -34,6 +34,7 @@ void bli_packm_int( obj_t* a, obj_t* p, + cntx_t* cntx, packm_t* cntl, packm_thrinfo_t* thread ); diff --git a/frame/1m/packm/bli_packm_struc_cxk.c b/frame/1m/packm/bli_packm_struc_cxk.c index 1b3808c56..54fd1da72 100644 --- a/frame/1m/packm/bli_packm_struc_cxk.c +++ b/frame/1m/packm/bli_packm_struc_cxk.c @@ -37,23 +37,25 @@ #undef GENTFUNC #define GENTFUNC( ctype, ch, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ) \ { \ dim_t panel_dim; \ dim_t panel_len; \ @@ -89,56 +91,68 @@ void PASTEMAC(ch,varname)( \ { \ /* For micro-panels of general matrices, we can call the pack kernel front-end directly. 
*/ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ } \ else if ( bli_is_herm_or_symm( strucc ) ) \ { \ /* Call a helper function for micro-panels of Hermitian/symmetric matrices. */ \ - PASTEMAC(ch,packm_herm_cxk)( strucc, \ - diagoffc, \ - uploc, \ - conjc, \ - schema, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - ldp ); \ + PASTEMAC(ch,packm_herm_cxk) \ + ( \ + strucc, \ + diagoffc, \ + uploc, \ + conjc, \ + schema, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + ldp, \ + cntx \ + ); \ } \ else /* ( bli_is_triangular( strucc ) ) */ \ { \ /* Call a helper function for micro-panels of triangular matrices. 
*/ \ - PASTEMAC(ch,packm_tri_cxk)( strucc, \ - diagoffc, \ - diagc, \ - uploc, \ - conjc, \ - schema, \ - invdiag, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - ldp ); \ + PASTEMAC(ch,packm_tri_cxk) \ + ( \ + strucc, \ + diagoffc, \ + diagc, \ + uploc, \ + conjc, \ + schema, \ + invdiag, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + ldp, \ + cntx \ + ); \ } \ \ \ @@ -157,13 +171,18 @@ void PASTEMAC(ch,varname)( \ dim_t n_edge = n_panel_max; \ ctype* p_edge = p + (i )*rs_p; \ \ - PASTEMAC(ch,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero, \ - p_edge, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero, \ + p_edge, rs_p, cs_p, \ + cntx \ + ); \ } \ \ if ( n_panel != n_panel_max ) \ @@ -174,13 +193,18 @@ void PASTEMAC(ch,varname)( \ dim_t n_edge = n_panel_max - j; \ ctype* p_edge = p + (j )*cs_p; \ \ - PASTEMAC(ch,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero, \ - p_edge, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero, \ + p_edge, rs_p, cs_p, \ + cntx \ + ); \ } \ \ \ @@ -204,11 +228,16 @@ void PASTEMAC(ch,varname)( \ dim_t n_br = n_panel_max - j; \ ctype* p_br = p + (i )*rs_p + (j )*cs_p; \ \ - PASTEMAC(ch,setd)( 0, \ - m_br, \ - n_br, \ - one, \ - p_br, rs_p, cs_p ); \ + PASTEMAC(ch,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + m_br, \ + n_br, \ + one, \ + p_br, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ \ @@ -231,24 +260,26 @@ INSERT_GENTFUNC_BASIC( packm_struc_cxk, packm_cxk ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - 
doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ doff_t diagoffc_abs; \ dim_t i, j; \ @@ -284,12 +315,16 @@ void PASTEMAC(ch,varname)( \ } \ \ /* Pack the full panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ } \ else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \ { \ @@ -372,38 +407,50 @@ void PASTEMAC(ch,varname)( \ \ /* Pack to p10. For upper storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc10, \ - p10_dim, \ - p10_len, \ - kappa, \ - c10, incc10, ldc10, \ - p10, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc10, \ + p10_dim, \ + p10_len, \ + kappa, \ + c10, incc10, ldc10, \ + p10, ldp, \ + cntx \ + ); \ \ /* Pack to p12. For lower storage, this includes the unstored triangle of c11. 
*/ \ - PASTEMAC(ch,kername)( conjc12, \ - p12_dim, \ - p12_len, \ - kappa, \ - c12, incc12, ldc12, \ - p12, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc12, \ + p12_dim, \ + p12_len, \ + kappa, \ + c12, incc12, ldc12, \ + p12, ldp, \ + cntx \ + ); \ \ /* Pack the stored triangle of c11 to p11. */ \ { \ dim_t p11_m = panel_dim; \ dim_t p11_n = panel_dim; \ - dim_t j = diagoffc_abs; \ - ctype* restrict c11 = c + (j )*ldc; \ - ctype* restrict p11 = p + (j )*ldp; \ + dim_t j2 = diagoffc_abs; \ + ctype* restrict c11 = c + (j2 )*ldc; \ + ctype* restrict p11 = p + (j2 )*ldp; \ \ - PASTEMAC(ch,copym)( 0, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - conjc, \ - p11_m, \ - p11_n, \ - c11, rs_c, cs_c, \ - p11, rs_p, cs_p ); \ + PASTEMAC(ch,copym) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + conjc, \ + p11_m, \ + p11_n, \ + c11, rs_c, cs_c, \ + p11, rs_p, cs_p, \ + cntx \ + ); \ \ /* If source matrix c is Hermitian, we have to zero out the imaginary components of the diagonal of p11 in case the @@ -423,13 +470,18 @@ void PASTEMAC(ch,varname)( \ /* Now that the diagonal has been made explicitly Hermitian (if applicable), we can now safely scale the stored triangle specified by uploc. 
*/ \ - PASTEMAC(ch,scalm)( BLIS_NO_CONJUGATE, \ - 0, \ - uploc, \ - p11_m, \ - p11_n, \ - kappa, \ - p11, rs_p, cs_p ); \ + PASTEMAC(ch,scalm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + p11_m, \ + p11_n, \ + kappa, \ + p11, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } @@ -443,54 +495,69 @@ INSERT_GENTFUNC_BASIC( packm_herm_cxk, packm_cxk ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ /* Pack the panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ \ \ /* If the diagonal of c is implicitly unit, explicitly set the the diagonal of the packed panel to kappa. 
*/ \ if ( bli_is_unit_diag( diagc ) ) \ { \ - PASTEMAC(ch,setd)( diagoffp, \ - m_panel, \ - n_panel, \ - kappa, \ - p, rs_p, cs_p ); \ + PASTEMAC(ch,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + m_panel, \ + n_panel, \ + kappa, \ + p, rs_p, cs_p, \ + cntx \ + ); \ } \ \ /* If requested, invert the diagonal of the packed panel. */ \ if ( invdiag == TRUE ) \ { \ - PASTEMAC(ch,invertd)( diagoffp, \ - m_panel, \ - n_panel, \ - p, rs_p, cs_p ); \ + PASTEMAC(ch,invertd) \ + ( \ + diagoffp, \ + m_panel, \ + n_panel, \ + p, rs_p, cs_p, \ + cntx \ + ); \ } \ \ /* Set the region opposite the diagonal of p to zero. To do this, @@ -508,13 +575,18 @@ void PASTEMAC(ch,varname)( \ bli_toggle_uplo( uplop ); \ bli_shift_diag_offset_to_shrink_uplo( uplop, diagoffp ); \ \ - PASTEMAC(ch,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero, \ - p, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero, \ + p, rs_p, cs_p, \ + cntx \ + ); \ } \ \ } diff --git a/frame/1m/packm/bli_packm_struc_cxk.h b/frame/1m/packm/bli_packm_struc_cxk.h index 4686a0c47..506cdb881 100644 --- a/frame/1m/packm/bli_packm_struc_cxk.h +++ b/frame/1m/packm/bli_packm_struc_cxk.h @@ -35,23 +35,25 @@ #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t 
n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packm_struc_cxk ) @@ -60,24 +62,26 @@ INSERT_GENTPROT_BASIC( packm_struc_cxk ) #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packm_herm_cxk ) @@ -86,26 +90,28 @@ INSERT_GENTPROT_BASIC( packm_herm_cxk ) #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ 
+ dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packm_tri_cxk ) diff --git a/frame/1m/packm/bli_packm_struc_cxk_3mis.c b/frame/1m/packm/bli_packm_struc_cxk_3mis.c index 1ad507ec6..97c232e47 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_3mis.c +++ b/frame/1m/packm/bli_packm_struc_cxk_3mis.c @@ -37,23 +37,25 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ) \ { \ dim_t panel_dim; \ dim_t panel_len; \ @@ -89,56 +91,68 @@ void PASTEMAC(ch,varname)( \ { \ /* For micro-panels of general matrices, we can call the pack kernel front-end directly. 
*/ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ } \ else if ( bli_is_herm_or_symm( strucc ) ) \ { \ /* Call a helper function for micro-panels of Hermitian/symmetric matrices. */ \ - PASTEMAC(ch,packm_herm_cxk_3mis)( strucc, \ - diagoffc, \ - uploc, \ - conjc, \ - schema, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - is_p, ldp ); \ + PASTEMAC(ch,packm_herm_cxk_3mis) \ + ( \ + strucc, \ + diagoffc, \ + uploc, \ + conjc, \ + schema, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + is_p, ldp, \ + cntx \ + ); \ } \ else /* ( bli_is_triangular( strucc ) ) */ \ { \ /* Call a helper function for micro-panels of triangular matrices. 
*/ \ - PASTEMAC(ch,packm_tri_cxk_3mis)( strucc, \ - diagoffc, \ - diagc, \ - uploc, \ - conjc, \ - schema, \ - invdiag, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - is_p, ldp ); \ + PASTEMAC(ch,packm_tri_cxk_3mis) \ + ( \ + strucc, \ + diagoffc, \ + diagc, \ + uploc, \ + conjc, \ + schema, \ + invdiag, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + is_p, ldp, \ + cntx \ + ); \ } \ \ \ @@ -159,27 +173,42 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*rs_p; \ ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (i )*rs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_i, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_rpi, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_i, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_rpi, rs_p, cs_p, \ + cntx \ + ); \ } \ \ if ( n_panel != n_panel_max ) \ @@ -192,27 +221,42 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*cs_p; \ ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (j )*cs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ 
- zero_r, \ - p_edge_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_i, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_rpi, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_i, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_rpi, rs_p, cs_p, \ + cntx \ + ); \ } \ \ \ @@ -238,16 +282,26 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_br_r = ( ctype_r* )p + (i )*rs_p + (j )*cs_p; \ ctype_r* p_br_i = ( ctype_r* )p + is_p + (i )*rs_p + (j )*cs_p; \ \ - PASTEMAC(chr,setd)( 0, \ - m_br, \ - n_br, \ - one_r, \ - p_br_r, rs_p, cs_p ); \ - PASTEMAC(chr,setd)( 0, \ - m_br, \ - n_br, \ - zero_r, \ - p_br_i, rs_p, cs_p ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + m_br, \ + n_br, \ + one_r, \ + p_br_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + m_br, \ + n_br, \ + zero_r, \ + p_br_i, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } @@ -260,24 +314,26 @@ INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_3mis, packm_cxk_3mis ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* 
restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ doff_t diagoffc_abs; \ dim_t i, j; \ @@ -314,12 +370,16 @@ void PASTEMAC(ch,varname)( \ } \ \ /* Pack the full panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ } \ else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \ { \ @@ -407,21 +467,29 @@ void PASTEMAC(ch,varname)( \ \ /* Pack to p10. For upper storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc10, \ - p10_dim, \ - p10_len, \ - kappa, \ - c10, incc10, ldc10, \ - p10, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc10, \ + p10_dim, \ + p10_len, \ + kappa, \ + c10, incc10, ldc10, \ + p10, is_p, ldp, \ + cntx \ + ); \ \ /* Pack to p12. For lower storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc12, \ - p12_dim, \ - p12_len, \ - kappa, \ - c12, incc12, ldc12, \ - p12, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc12, \ + p12_dim, \ + p12_len, \ + kappa, \ + c12, incc12, ldc12, \ + p12, is_p, ldp, \ + cntx \ + ); \ \ /* Pack the stored triangle of c11 to p11. 
*/ \ { \ @@ -429,9 +497,9 @@ void PASTEMAC(ch,varname)( \ dim_t p11_n = panel_dim; \ inc_t rs_c11 = 2*rs_c; \ inc_t cs_c11 = 2*cs_c; \ - dim_t j = diagoffc_abs; \ - ctype* c11 = ( ctype* )c + (j )*ldc; \ - ctype_r* p11 = ( ctype_r* )p_r + (j )*ldp; \ + dim_t j2 = diagoffc_abs; \ + ctype* c11 = ( ctype* )c + (j2 )*ldc; \ + ctype_r* p11 = ( ctype_r* )p_r + (j2 )*ldp; \ ctype_r* c11_r = ( ctype_r* )c11; \ ctype_r* c11_i = ( ctype_r* )c11 + 1; \ ctype_r* p11_r = ( ctype_r* )p11; \ @@ -442,27 +510,35 @@ void PASTEMAC(ch,varname)( \ ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \ \ /* Copy the real part of the stored triangle of c11 to p11_r. */ \ - PASTEMAC(chr,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - BLIS_NO_TRANSPOSE, \ - p11_m, \ - p11_n, \ - alpha_r, \ - c11_r, rs_c11, cs_c11, \ - p11_r, rs_p, cs_p ); \ + PASTEMAC(chr,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + BLIS_NO_TRANSPOSE, \ + p11_m, \ + p11_n, \ + alpha_r, \ + c11_r, rs_c11, cs_c11, \ + p11_r, rs_p, cs_p, \ + cntx \ + ); \ \ /* Copy the imaginary part of the stored triangle of c11 to p11_i, scaling by -1 if conjugation on c was requested. */ \ - PASTEMAC(chr,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - BLIS_NO_TRANSPOSE, \ - p11_m, \ - p11_n, \ - alpha_i, \ - c11_i, rs_c11, cs_c11, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(chr,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + BLIS_NO_TRANSPOSE, \ + p11_m, \ + p11_n, \ + alpha_i, \ + c11_i, rs_c11, cs_c11, \ + p11_i, rs_p, cs_p, \ + cntx \ + ); \ \ /* If source matrix c is Hermitian, we have to zero out the imaginary components of the diagonal of p11 in case the @@ -481,23 +557,29 @@ void PASTEMAC(ch,varname)( \ part of c11 that was copied above. 
*/ \ if ( bli_is_upper( uploc ) ) \ { \ - PASTEMAC(ch,scalris_mxn_u)( 0, \ - p11_m, \ - p11_n, \ - &kappa_r, \ - &kappa_i, \ - p11_r, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(ch,scalris_mxn_u) \ + ( \ + 0, \ + p11_m, \ + p11_n, \ + &kappa_r, \ + &kappa_i, \ + p11_r, \ + p11_i, rs_p, cs_p \ + ); \ } \ else \ { \ - PASTEMAC(ch,scalris_mxn_l)( 0, \ - p11_m, \ - p11_n, \ - &kappa_r, \ - &kappa_i, \ - p11_r, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(ch,scalris_mxn_l) \ + ( \ + 0, \ + p11_m, \ + p11_n, \ + &kappa_r, \ + &kappa_i, \ + p11_r, \ + p11_i, rs_p, cs_p \ + ); \ } \ \ /* Update the p11 section of the ri panel. It simply needs @@ -512,9 +594,12 @@ void PASTEMAC(ch,varname)( \ ctype_r* pi11_i = p11_i + (i )*rs_p + (j )*cs_p; \ ctype_r* pi11_rpi = p11_rpi + (i )*rs_p + (j )*cs_p; \ \ - PASTEMAC(chr,add3s)( *pi11_r, \ - *pi11_i, \ - *pi11_rpi ); \ + PASTEMAC(chr,add3s) \ + ( \ + *pi11_r, \ + *pi11_i, \ + *pi11_rpi \ + ); \ } \ } \ } \ @@ -530,34 +615,40 @@ INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_3mis, packm_cxk_3mis ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, 
inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ /* Pack the panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ \ \ /* Tweak the panel according to its triangular structure */ \ @@ -590,16 +681,26 @@ void PASTEMAC(ch,varname)( \ ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \ dim_t i; \ \ - PASTEMAC(chr,setd)( diagoffp, \ - m_panel, \ - n_panel, \ - &kappa_r, \ - p_r, rs_p, cs_p ); \ - PASTEMAC(chr,setd)( diagoffp, \ - m_panel, \ - n_panel, \ - &kappa_i, \ - p_i, rs_p, cs_p ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + m_panel, \ + n_panel, \ + &kappa_r, \ + p_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + m_panel, \ + n_panel, \ + &kappa_i, \ + p_i, rs_p, cs_p, \ + cntx \ + ); \ \ /* Update the diagonal of the p11 section of the rpi panel. 
It simply needs to contain the sum of diagonals of p11_r @@ -646,27 +747,42 @@ void PASTEMAC(ch,varname)( \ bli_toggle_uplo( uplop ); \ bli_shift_diag_offset_to_shrink_uplo( uplop, diagoffp ); \ \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_i, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_rpi, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_i, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_rpi, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } diff --git a/frame/1m/packm/bli_packm_struc_cxk_3mis.h b/frame/1m/packm/bli_packm_struc_cxk_3mis.h index 6ac583b4e..e3419faeb 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_3mis.h +++ b/frame/1m/packm/bli_packm_struc_cxk_3mis.h @@ -35,23 +35,25 @@ #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t 
invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_struc_cxk_3mis ) @@ -60,24 +62,26 @@ INSERT_GENTPROTCO_BASIC( packm_struc_cxk_3mis ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_herm_cxk_3mis ) @@ -86,26 +90,28 @@ INSERT_GENTPROTCO_BASIC( packm_herm_cxk_3mis ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ); +void 
PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_tri_cxk_3mis ) diff --git a/frame/1m/packm/bli_packm_struc_cxk_4mi.c b/frame/1m/packm/bli_packm_struc_cxk_4mi.c index e84ff23e4..ae9f24fd9 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_4mi.c +++ b/frame/1m/packm/bli_packm_struc_cxk_4mi.c @@ -37,23 +37,25 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ) \ { \ dim_t panel_dim; \ dim_t panel_len; \ @@ -89,56 +91,68 @@ void PASTEMAC(ch,varname)( \ { \ /* For micro-panels of general matrices, we can call the pack kernel front-end directly. 
*/ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ } \ else if ( bli_is_herm_or_symm( strucc ) ) \ { \ /* Call a helper function for micro-panels of Hermitian/symmetric matrices. */ \ - PASTEMAC(ch,packm_herm_cxk_4mi)( strucc, \ - diagoffc, \ - uploc, \ - conjc, \ - schema, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - is_p, ldp ); \ + PASTEMAC(ch,packm_herm_cxk_4mi) \ + ( \ + strucc, \ + diagoffc, \ + uploc, \ + conjc, \ + schema, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + is_p, ldp, \ + cntx \ + ); \ } \ else /* ( bli_is_triangular( strucc ) ) */ \ { \ /* Call a helper function for micro-panels of triangular matrices. 
*/ \ - PASTEMAC(ch,packm_tri_cxk_4mi)( strucc, \ - diagoffc, \ - diagc, \ - uploc, \ - conjc, \ - schema, \ - invdiag, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - is_p, ldp ); \ + PASTEMAC(ch,packm_tri_cxk_4mi) \ + ( \ + strucc, \ + diagoffc, \ + diagc, \ + uploc, \ + conjc, \ + schema, \ + invdiag, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + is_p, ldp, \ + cntx \ + ); \ } \ \ \ @@ -158,20 +172,30 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_edge_r = ( ctype_r* )p + (i )*rs_p; \ ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*rs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_i, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_i, rs_p, cs_p, \ + cntx \ + ); \ } \ \ if ( n_panel != n_panel_max ) \ @@ -183,20 +207,30 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_edge_r = ( ctype_r* )p + (j )*cs_p; \ ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*cs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_i, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + 
cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_i, rs_p, cs_p, \ + cntx \ + ); \ } \ \ \ @@ -222,16 +256,26 @@ void PASTEMAC(ch,varname)( \ ctype_r* p_br_r = ( ctype_r* )p + (i )*rs_p + (j )*cs_p; \ ctype_r* p_br_i = ( ctype_r* )p + is_p + (i )*rs_p + (j )*cs_p; \ \ - PASTEMAC(chr,setd)( 0, \ - m_br, \ - n_br, \ - one_r, \ - p_br_r, rs_p, cs_p ); \ - PASTEMAC(chr,setd)( 0, \ - m_br, \ - n_br, \ - zero_r, \ - p_br_i, rs_p, cs_p ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + m_br, \ + n_br, \ + one_r, \ + p_br_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + m_br, \ + n_br, \ + zero_r, \ + p_br_i, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } @@ -244,24 +288,26 @@ INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_4mi, packm_cxk_4mi ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ doff_t diagoffc_abs; \ dim_t i, j; \ @@ -298,12 +344,16 @@ void PASTEMAC(ch,varname)( \ } \ \ /* Pack the full panel. 
*/ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ } \ else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \ { \ @@ -391,21 +441,29 @@ void PASTEMAC(ch,varname)( \ \ /* Pack to p10. For upper storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc10, \ - p10_dim, \ - p10_len, \ - kappa, \ - c10, incc10, ldc10, \ - p10, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc10, \ + p10_dim, \ + p10_len, \ + kappa, \ + c10, incc10, ldc10, \ + p10, is_p, ldp, \ + cntx \ + ); \ \ /* Pack to p12. For lower storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc12, \ - p12_dim, \ - p12_len, \ - kappa, \ - c12, incc12, ldc12, \ - p12, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc12, \ + p12_dim, \ + p12_len, \ + kappa, \ + c12, incc12, ldc12, \ + p12, is_p, ldp, \ + cntx \ + ); \ \ /* Pack the stored triangle of c11 to p11. */ \ { \ @@ -413,9 +471,9 @@ void PASTEMAC(ch,varname)( \ dim_t p11_n = panel_dim; \ inc_t rs_c11 = 2*rs_c; \ inc_t cs_c11 = 2*cs_c; \ - dim_t j = diagoffc_abs; \ - ctype* c11 = ( ctype* )c + (j )*ldc; \ - ctype_r* p11 = ( ctype_r* )p_r + (j )*ldp; \ + dim_t j2 = diagoffc_abs; \ + ctype* c11 = ( ctype* )c + (j2 )*ldc; \ + ctype_r* p11 = ( ctype_r* )p_r + (j2 )*ldp; \ ctype_r* c11_r = ( ctype_r* )c11; \ ctype_r* c11_i = ( ctype_r* )c11 + 1; \ ctype_r* p11_r = ( ctype_r* )p11; \ @@ -426,27 +484,35 @@ void PASTEMAC(ch,varname)( \ ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \ \ /* Copy the real part of the stored triangle of c11 to p11_r. 
*/ \ - PASTEMAC(chr,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - BLIS_NO_TRANSPOSE, \ - p11_m, \ - p11_n, \ - alpha_r, \ - c11_r, rs_c11, cs_c11, \ - p11_r, rs_p, cs_p ); \ + PASTEMAC(chr,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + BLIS_NO_TRANSPOSE, \ + p11_m, \ + p11_n, \ + alpha_r, \ + c11_r, rs_c11, cs_c11, \ + p11_r, rs_p, cs_p, \ + cntx \ + ); \ \ /* Copy the imaginary part of the stored triangle of c11 to p11_i, scaling by -1 if conjugation on c was requested. */ \ - PASTEMAC(chr,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - BLIS_NO_TRANSPOSE, \ - p11_m, \ - p11_n, \ - alpha_i, \ - c11_i, rs_c11, cs_c11, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(chr,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + BLIS_NO_TRANSPOSE, \ + p11_m, \ + p11_n, \ + alpha_i, \ + c11_i, rs_c11, cs_c11, \ + p11_i, rs_p, cs_p, \ + cntx \ + ); \ \ /* If source matrix c is Hermitian, we have to zero out the imaginary components of the diagonal of p11 in case the @@ -465,23 +531,29 @@ void PASTEMAC(ch,varname)( \ part of c11 that was copied above. 
*/ \ if ( bli_is_upper( uploc ) ) \ { \ - PASTEMAC(ch,scalris_mxn_u)( 0, \ - p11_m, \ - p11_n, \ - &kappa_r, \ - &kappa_i, \ - p11_r, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(ch,scalris_mxn_u) \ + ( \ + 0, \ + p11_m, \ + p11_n, \ + &kappa_r, \ + &kappa_i, \ + p11_r, \ + p11_i, rs_p, cs_p \ + ); \ } \ else \ { \ - PASTEMAC(ch,scalris_mxn_l)( 0, \ - p11_m, \ - p11_n, \ - &kappa_r, \ - &kappa_i, \ - p11_r, \ - p11_i, rs_p, cs_p ); \ + PASTEMAC(ch,scalris_mxn_l) \ + ( \ + 0, \ + p11_m, \ + p11_n, \ + &kappa_r, \ + &kappa_i, \ + p11_r, \ + p11_i, rs_p, cs_p \ + ); \ } \ /* PASTEMAC(chr,fprintm)( stdout, "packm_herm_cxk: ap_r copied", m_panel_max, n_panel_max, \ @@ -502,34 +574,40 @@ INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_4mi, packm_cxk_4mi ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ /* Pack the panel. 
*/ \ - PASTEMAC(ch,kername)( conjc, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, is_p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, is_p, ldp, \ + cntx \ + ); \ \ \ /* Tweak the panel according to its triangular structure */ \ @@ -548,16 +626,26 @@ void PASTEMAC(ch,varname)( \ ctype_r kappa_r = PASTEMAC(ch,real)( *kappa ); \ ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \ \ - PASTEMAC(chr,setd)( diagoffp, \ - m_panel, \ - n_panel, \ - &kappa_r, \ - p_r, rs_p, cs_p ); \ - PASTEMAC(chr,setd)( diagoffp, \ - m_panel, \ - n_panel, \ - &kappa_i, \ - p_i, rs_p, cs_p ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + m_panel, \ + n_panel, \ + &kappa_r, \ + p_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setd) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + m_panel, \ + n_panel, \ + &kappa_i, \ + p_i, rs_p, cs_p, \ + cntx \ + ); \ } \ \ \ @@ -591,20 +679,30 @@ void PASTEMAC(ch,varname)( \ bli_toggle_uplo( uplop ); \ bli_shift_diag_offset_to_shrink_uplo( uplop, diagoffp ); \ \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_r, rs_p, cs_p ); \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_i, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_r, rs_p, cs_p, \ + cntx \ + ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_i, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } diff --git a/frame/1m/packm/bli_packm_struc_cxk_4mi.h b/frame/1m/packm/bli_packm_struc_cxk_4mi.h index 10d7ada5c..ddc420420 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_4mi.h +++ b/frame/1m/packm/bli_packm_struc_cxk_4mi.h @@ -35,23 +35,25 @@ #undef GENTPROTCO #define GENTPROTCO( 
ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_struc_cxk_4mi ) @@ -60,24 +62,26 @@ INSERT_GENTPROTCO_BASIC( packm_struc_cxk_4mi ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_herm_cxk_4mi ) @@ -86,26 +90,28 @@ 
INSERT_GENTPROTCO_BASIC( packm_herm_cxk_4mi ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_tri_cxk_4mi ) diff --git a/frame/1m/packm/bli_packm_struc_cxk_rih.c b/frame/1m/packm/bli_packm_struc_cxk_rih.c index 038162a08..96985f335 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_rih.c +++ b/frame/1m/packm/bli_packm_struc_cxk_rih.c @@ -37,23 +37,25 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ 
+ bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ) \ { \ dim_t panel_dim; \ dim_t panel_len; \ @@ -92,57 +94,69 @@ void PASTEMAC(ch,varname)( \ { \ /* For micro-panels of general matrices, we can call the pack kernel front-end directly. */ \ - PASTEMAC(ch,kername)( conjc, \ - schema, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + schema, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ } \ else if ( bli_is_herm_or_symm( strucc ) ) \ { \ /* Call a helper function for micro-panels of Hermitian/symmetric matrices. */ \ - PASTEMAC(ch,packm_herm_cxk_rih)( strucc, \ - diagoffc, \ - uploc, \ - conjc, \ - schema, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - ldp ); \ + PASTEMAC(ch,packm_herm_cxk_rih) \ + ( \ + strucc, \ + diagoffc, \ + uploc, \ + conjc, \ + schema, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + ldp, \ + cntx \ + ); \ } \ else /* ( bli_is_triangular( strucc ) ) */ \ { \ /* Call a helper function for micro-panels of triangular matrices. 
*/ \ - PASTEMAC(ch,packm_tri_cxk_rih)( strucc, \ - diagoffc, \ - diagc, \ - uploc, \ - conjc, \ - schema, \ - invdiag, \ - m_panel, \ - n_panel, \ - m_panel_max, \ - n_panel_max, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, rs_c, cs_c, \ - incc, ldc, \ - p, rs_p, cs_p, \ - ldp ); \ + PASTEMAC(ch,packm_tri_cxk_rih) \ + ( \ + strucc, \ + diagoffc, \ + diagc, \ + uploc, \ + conjc, \ + schema, \ + invdiag, \ + m_panel, \ + n_panel, \ + m_panel_max, \ + n_panel_max, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, rs_c, cs_c, \ + incc, ldc, \ + p, rs_p, cs_p, \ + ldp, \ + cntx \ + ); \ } \ \ \ @@ -161,13 +175,18 @@ void PASTEMAC(ch,varname)( \ dim_t n_edge = n_panel_max; \ ctype_r* p_edge_r = ( ctype_r* )p + (i )*rs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_r, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + cntx \ + ); \ } \ \ if ( n_panel != n_panel_max ) \ @@ -178,13 +197,18 @@ void PASTEMAC(ch,varname)( \ dim_t n_edge = n_panel_max - j; \ ctype_r* p_edge_r = ( ctype_r* )p + (j )*cs_p; \ \ - PASTEMAC(chr,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_edge, \ - n_edge, \ - zero_r, \ - p_edge_r, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_edge, \ + n_edge, \ + zero_r, \ + p_edge_r, rs_p, cs_p, \ + cntx \ + ); \ } \ \ \ @@ -232,24 +256,26 @@ INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_rih, packm_cxk_rih ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t 
rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ bool_t row_stored; \ bool_t col_stored; \ @@ -286,13 +312,17 @@ void PASTEMAC(ch,varname)( \ } \ \ /* Pack the full panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - schema, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + schema, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ } \ else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \ { \ @@ -377,38 +407,49 @@ void PASTEMAC(ch,varname)( \ \ /* Pack to p10. For upper storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc10, \ - schema, \ - p10_dim, \ - p10_len, \ - kappa, \ - c10, incc10, ldc10, \ - p10, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc10, \ + schema, \ + p10_dim, \ + p10_len, \ + kappa, \ + c10, incc10, ldc10, \ + p10, ldp, \ + cntx \ + ); \ \ /* Pack to p12. For lower storage, this includes the unstored triangle of c11. */ \ - PASTEMAC(ch,kername)( conjc12, \ - schema, \ - p12_dim, \ - p12_len, \ - kappa, \ - c12, incc12, ldc12, \ - p12, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc12, \ + schema, \ + p12_dim, \ + p12_len, \ + kappa, \ + c12, incc12, ldc12, \ + p12, ldp, \ + cntx \ + ); \ \ /* Pack the stored triangle of c11 to p11. 
*/ \ { \ - dim_t j = diagoffc_abs; \ - ctype_r* restrict p_r = ( ctype_r* )p; \ - ctype* restrict c11 = c + (j )*ldc; \ - ctype_r* restrict p11_r = p_r + (j )*ldp; \ + dim_t j2 = diagoffc_abs; \ + /*ctype_r* restrict p_r = ( ctype_r* )p;*/ \ + ctype* restrict c11 = c + (j2 )*ldc; \ + ctype_r* restrict p11_r = p_r + (j2 )*ldp; \ \ - PASTEMAC(ch,scal2rihs_mxn_uplo)( schema, \ - uploc, \ - conjc, \ - panel_dim, \ - kappa, \ - c11, rs_c, cs_c, \ - p11_r, rs_p, cs_p ); \ + PASTEMAC(ch,scal2rihs_mxn_uplo) \ + ( \ + schema, \ + uploc, \ + conjc, \ + panel_dim, \ + kappa, \ + c11, rs_c, cs_c, \ + p11_r, rs_p, cs_p \ + ); \ \ /* If we are packing a micro-panel with Hermitian structure, we must take special care of the diagonal. Now, if kappa @@ -422,12 +463,15 @@ void PASTEMAC(ch,varname)( \ the result to the diagonal of p11. */ \ if ( bli_is_hermitian( strucc ) ) \ { \ - PASTEMAC3(ch,chr,ch,scal2rihs_mxn_diag)( schema, \ - panel_dim, \ - panel_dim, \ - kappa, \ - c11, rs_c, cs_c, \ - p11_r, rs_p, cs_p ); \ + PASTEMAC3(ch,chr,ch,scal2rihs_mxn_diag) \ + ( \ + schema, \ + panel_dim, \ + panel_dim, \ + kappa, \ + c11, rs_c, cs_c, \ + p11_r, rs_p, cs_p \ + ); \ } \ \ /* @@ -449,35 +493,41 @@ INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_rih, packm_cxk_rih ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t 
n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ) \ { \ /* Pack the panel. */ \ - PASTEMAC(ch,kername)( conjc, \ - schema, \ - panel_dim, \ - panel_len, \ - kappa, \ - c, incc, ldc, \ - p, ldp ); \ + PASTEMAC(ch,kername) \ + ( \ + conjc, \ + schema, \ + panel_dim, \ + panel_len, \ + kappa, \ + c, incc, ldc, \ + p, ldp, \ + cntx \ + ); \ \ \ /* Tweak the panel according to its triangular structure */ \ @@ -491,11 +541,14 @@ void PASTEMAC(ch,varname)( \ the diagonal of the packed panel to kappa. */ \ if ( bli_is_unit_diag( diagc ) ) \ { \ - PASTEMAC(ch,setrihs_mxn_diag)( schema, \ - panel_dim, \ - panel_dim, \ - kappa, \ - p11_r, rs_p, cs_p ); \ + PASTEMAC(ch,setrihs_mxn_diag) \ + ( \ + schema, \ + panel_dim, \ + panel_dim, \ + kappa, \ + p11_r, rs_p, cs_p \ + ); \ } \ \ \ @@ -518,13 +571,18 @@ void PASTEMAC(ch,varname)( \ bli_toggle_uplo( uplop ); \ bli_shift_diag_offset_to_shrink_uplo( uplop, diagoffp ); \ \ - PASTEMAC(chr,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m_panel, \ - n_panel, \ - zero_r, \ - p_r, rs_p, cs_p ); \ + PASTEMAC(chr,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m_panel, \ + n_panel, \ + zero_r, \ + p_r, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ } diff --git a/frame/1m/packm/bli_packm_struc_cxk_rih.h b/frame/1m/packm/bli_packm_struc_cxk_rih.h index de0ef0bf2..490d1935a 100644 --- a/frame/1m/packm/bli_packm_struc_cxk_rih.h +++ b/frame/1m/packm/bli_packm_struc_cxk_rih.h @@ -35,23 +35,25 @@ #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffp, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, \ - dim_t m_panel, \ - dim_t n_panel, \ - 
dim_t m_panel_max, \ - dim_t n_panel_max, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t is_p \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffp, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t is_p, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_struc_cxk_rih ) @@ -60,24 +62,26 @@ INSERT_GENTPROTCO_BASIC( packm_struc_cxk_rih ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_herm_cxk_rih ) @@ -86,26 +90,28 @@ INSERT_GENTPROTCO_BASIC( packm_herm_cxk_rih ) #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - conj_t conjc, \ - pack_t schema, \ - bool_t invdiag, 
\ - dim_t m_panel, \ - dim_t n_panel, \ - dim_t m_panel_max, \ - dim_t n_panel_max, \ - dim_t panel_dim, \ - dim_t panel_len, \ - ctype* restrict kappa, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - inc_t incc, inc_t ldc, \ - ctype* restrict p, inc_t rs_p, inc_t cs_p, \ - inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + conj_t conjc, \ + pack_t schema, \ + bool_t invdiag, \ + dim_t m_panel, \ + dim_t n_panel, \ + dim_t m_panel_max, \ + dim_t n_panel_max, \ + dim_t panel_dim, \ + dim_t panel_len, \ + ctype* restrict kappa, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + inc_t incc, inc_t ldc, \ + ctype* restrict p, inc_t rs_p, inc_t cs_p, \ + inc_t ldp, \ + cntx_t* cntx \ + ); INSERT_GENTPROTCO_BASIC( packm_tri_cxk_rih ) diff --git a/frame/1m/packm/bli_packm_unb_var1.c b/frame/1m/packm/bli_packm_unb_var1.c index d6e4fe1bf..227ad6f71 100644 --- a/frame/1m/packm/bli_packm_unb_var1.c +++ b/frame/1m/packm/bli_packm_unb_var1.c @@ -48,7 +48,8 @@ typedef void (*FUNCPTR_T)( dim_t n_max, void* kappa, void* c, inc_t rs_c, inc_t cs_c, - void* p, inc_t rs_p, inc_t cs_p + void* p, inc_t rs_p, inc_t cs_p, + cntx_t* cntx ); static FUNCPTR_T GENARRAY(ftypes,packm_unb_var1); @@ -56,6 +57,7 @@ static FUNCPTR_T GENARRAY(ftypes,packm_unb_var1); void bli_packm_unb_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packm_thrinfo_t* thread ) { num_t dt_cp = bli_obj_datatype( *c ); @@ -96,39 +98,45 @@ void bli_packm_unb_var1( obj_t* c, if( thread_am_ochief( thread ) ) { // Invoke the function. 
- f( strucc, - diagoffc, - diagc, - uploc, - transc, - m_p, - n_p, - m_max_p, - n_max_p, - buf_kappa, - buf_c, rs_c, cs_c, - buf_p, rs_p, cs_p ); + f + ( + strucc, + diagoffc, + diagc, + uploc, + transc, + m_p, + n_p, + m_max_p, + n_max_p, + buf_kappa, + buf_c, rs_c, cs_c, + buf_p, rs_p, cs_p, + cntx + ); } } #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - trans_t transc, \ - dim_t m, \ - dim_t n, \ - dim_t m_max, \ - dim_t n_max, \ - void* kappa, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* p, inc_t rs_p, inc_t cs_p \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + trans_t transc, \ + dim_t m, \ + dim_t n, \ + dim_t m_max, \ + dim_t n_max, \ + void* kappa, \ + void* c, inc_t rs_c, inc_t cs_c, \ + void* p, inc_t rs_p, inc_t cs_p, \ + cntx_t* cntx \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict c_cast = c; \ @@ -140,15 +148,19 @@ void PASTEMAC(ch,varname)( \ because the structure has already been "densified"), this ends up being the only action we take. Note that if kappa is unit, the data is simply copied (rather than scaled by one). */ \ - PASTEMAC3(ch,ch,ch,scal2m)( diagoffc, \ - diagc, \ - uploc, \ - transc, \ - m, \ - n, \ - kappa_cast, \ - c_cast, rs_c, cs_c, \ - p_cast, rs_p, cs_p ); \ + PASTEMAC(ch,scal2m) \ + ( \ + diagoffc, \ + diagc, \ + uploc, \ + transc, \ + m, \ + n, \ + kappa_cast, \ + c_cast, rs_c, cs_c, \ + p_cast, rs_p, cs_p, \ + cntx \ + ); \ \ /* If uploc is upper or lower, then the structure of c is necessarily non-dense (ie: Hermitian, symmetric, or triangular, where part of the @@ -178,15 +190,19 @@ void PASTEMAC(ch,varname)( \ (as specified by the original value of diagoffc). 
Notice that we use a diag parameter of non-unit since we can assume nothing about the neighboring off-diagonal. */ \ - PASTEMAC3(ch,ch,ch,scal2m)( diagoffc, \ - BLIS_NONUNIT_DIAG, \ - uploc, \ - transc, \ - m, \ - n, \ - kappa_cast, \ - c_cast, rs_c, cs_c, \ - p_cast, rs_p, cs_p ); \ + PASTEMAC(ch,scal2m) \ + ( \ + diagoffc, \ + BLIS_NONUNIT_DIAG, \ + uploc, \ + transc, \ + m, \ + n, \ + kappa_cast, \ + c_cast, rs_c, cs_c, \ + p_cast, rs_p, cs_p, \ + cntx \ + ); \ } \ else /* if ( bli_is_triangular( strucc ) ) */ \ { \ @@ -209,13 +225,18 @@ void PASTEMAC(ch,varname)( \ bli_shift_diag_offset_to_shrink_uplo( uplop, diagoffp ); \ \ /* Set the region opposite the diagonal of p to zero. */ \ - PASTEMAC2(ch,ch,setm)( diagoffp, \ - BLIS_NONUNIT_DIAG, \ - uplop, \ - m, \ - n, \ - zero, \ - p_cast, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffp, \ + BLIS_NONUNIT_DIAG, \ + uplop, \ + m, \ + n, \ + zero, \ + p_cast, rs_p, cs_p, \ + cntx \ + ); \ } \ } \ \ @@ -230,28 +251,38 @@ void PASTEMAC(ch,varname)( \ { \ ctype* p_edge = p_cast + (m )*rs_p; \ \ - PASTEMAC2(ch,ch,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_max - m, \ - n_max, \ - zero, \ - p_edge, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_max - m, \ + n_max, \ + zero, \ + p_edge, rs_p, cs_p, \ + cntx \ + ); \ } \ \ if ( n != n_max ) \ { \ ctype* p_edge = p_cast + (n )*cs_p; \ \ - PASTEMAC2(ch,ch,setm)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - m_max, \ - n_max - n, \ - zero, \ - p_edge, rs_p, cs_p ); \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + m_max, \ + n_max - n, \ + zero, \ + p_edge, rs_p, cs_p, \ + cntx \ + ); \ } \ } -INSERT_GENTFUNC_BASIC( packm, packm_unb_var1 ) +INSERT_GENTFUNC_BASIC0( packm_unb_var1 ) diff --git a/frame/1m/packm/bli_packm_unb_var1.h b/frame/1m/packm/bli_packm_unb_var1.h index f5527da70..d5fd4cbdc 100644 --- 
a/frame/1m/packm/bli_packm_unb_var1.h +++ b/frame/1m/packm/bli_packm_unb_var1.h @@ -34,26 +34,29 @@ void bli_packm_unb_var1( obj_t* c, obj_t* p, + cntx_t* cntx, packm_thrinfo_t* thread ); #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - trans_t transc, \ - dim_t m, \ - dim_t n, \ - dim_t m_max, \ - dim_t n_max, \ - void* kappa, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* p, inc_t rs_p, inc_t cs_p \ - ); +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + trans_t transc, \ + dim_t m, \ + dim_t n, \ + dim_t m_max, \ + dim_t n_max, \ + void* kappa, \ + void* c, inc_t rs_c, inc_t cs_c, \ + void* p, inc_t rs_p, inc_t cs_p, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( packm_unb_var1 ) diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.c b/frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.c similarity index 95% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.c rename to frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.c index 39b64fc00..dd6a4225e 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.c +++ b/frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.c @@ -37,13 +37,14 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -121,20 +122,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_2xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_2xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void 
PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -220,20 +222,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_4xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_4xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -327,20 +330,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_6xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_6xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -442,20 +446,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_8xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_8xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void 
PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -565,20 +570,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_10xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_10xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -696,20 +702,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_12xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_12xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -835,20 +842,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_14xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_14xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ 
const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -982,20 +990,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_16xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_16xk_3mis_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -1186,5 +1195,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_30xk_3mis ) +INSERT_GENTFUNCCO_BASIC0( packm_30xk_3mis_ref ) diff --git a/frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.h b/frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.h similarity index 73% rename from frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.h rename to frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.h index 17fa425c8..2158bb041 100644 --- a/frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.h +++ b/frame/1m/packm/ukernels/bli_packm_cxk_3mis_ref.h @@ -35,20 +35,22 @@ #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ); -INSERT_GENTPROT_BASIC( unpackm_ref_2xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_4xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_6xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_8xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_10xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_12xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_14xk ) -INSERT_GENTPROT_BASIC( unpackm_ref_16xk ) +INSERT_GENTPROT_BASIC( packm_2xk_3mis_ref ) 
+INSERT_GENTPROT_BASIC( packm_4xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_6xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_8xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_10xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_12xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_14xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_16xk_3mis_ref ) +INSERT_GENTPROT_BASIC( packm_30xk_3mis_ref ) diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.c b/frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.c similarity index 94% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.c rename to frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.c index b2dce50a3..35d2d9662 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_4mi.c +++ b/frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.c @@ -37,13 +37,14 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -116,20 +117,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_2xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_2xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -210,20 +212,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_4xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_4xk_4mi_ref ) 
#undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -312,20 +315,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_6xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_6xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -422,20 +426,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_8xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_8xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -540,20 +545,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_10xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_10xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, 
inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -666,20 +672,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_12xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_12xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -800,20 +807,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_14xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_14xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -942,20 +950,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_16xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_16xk_4mi_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t 
inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -1140,5 +1149,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_30xk_4mi ) +INSERT_GENTFUNCCO_BASIC0( packm_30xk_4mi_ref ) diff --git a/frame/1/setv/bli_setv_kernel.h b/frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.h similarity index 72% rename from frame/1/setv/bli_setv_kernel.h rename to frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.h index 661e6c545..506da3525 100644 --- a/frame/1/setv/bli_setv_kernel.h +++ b/frame/1m/packm/ukernels/bli_packm_cxk_4mi_ref.h @@ -32,30 +32,25 @@ */ -void bli_setv_kernel( obj_t* beta, - obj_t* x ); - - -// -// Prototype the void pointer kernel wrappers. -// - -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, varname ) \ +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC2(chb,chx,varname)( \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t is_p, inc_t ldp \ + ); -INSERT_GENTPROT2_BASIC( setv_kernel_void ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( setv_kernel_void ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( setv_kernel_void ) -#endif +INSERT_GENTPROT_BASIC( packm_2xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_4xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_6xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_8xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_10xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_12xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_14xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_16xk_4mi_ref ) +INSERT_GENTPROT_BASIC( packm_30xk_4mi_ref ) diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk.c b/frame/1m/packm/ukernels/bli_packm_cxk_ref.c similarity index 92% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk.c rename to 
frame/1m/packm/ukernels/bli_packm_cxk_ref.c index a46e2c78a..15abe2878 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk.c +++ b/frame/1m/packm/ukernels/bli_packm_cxk_ref.c @@ -37,13 +37,14 @@ #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -122,20 +123,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_2xk ) +INSERT_GENTFUNC_BASIC0( packm_2xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -222,20 +224,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_3xk ) +INSERT_GENTFUNC_BASIC0( packm_3xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -320,20 +323,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_4xk ) +INSERT_GENTFUNC_BASIC0( packm_4xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, 
\ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -407,20 +411,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_6xk ) +INSERT_GENTFUNC_BASIC0( packm_6xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -529,20 +534,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_8xk ) +INSERT_GENTFUNC_BASIC0( packm_8xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -632,20 +638,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_10xk ) +INSERT_GENTFUNC_BASIC0( packm_10xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -743,20 
+750,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_12xk ) +INSERT_GENTFUNC_BASIC0( packm_12xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -862,20 +870,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_14xk ) +INSERT_GENTFUNC_BASIC0( packm_14xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -989,20 +998,21 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_16xk ) +INSERT_GENTFUNC_BASIC0( packm_16xk_ref ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ ctype* restrict kappa_cast = kappa; \ ctype* restrict alpha1 = a; \ @@ -1172,5 +1182,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC0( packm_ref_30xk ) +INSERT_GENTFUNC_BASIC0( packm_30xk_ref ) diff --git a/frame/1m/packm/ukernels/bli_packm_cxk_ref.h b/frame/1m/packm/ukernels/bli_packm_cxk_ref.h new file mode 100644 index 000000000..3083c2e08 --- /dev/null +++ 
b/frame/1m/packm/ukernels/bli_packm_cxk_ref.h @@ -0,0 +1,57 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ); + +INSERT_GENTPROT_BASIC( packm_2xk_ref ) +INSERT_GENTPROT_BASIC( packm_3xk_ref ) +INSERT_GENTPROT_BASIC( packm_4xk_ref ) +INSERT_GENTPROT_BASIC( packm_6xk_ref ) +INSERT_GENTPROT_BASIC( packm_8xk_ref ) +INSERT_GENTPROT_BASIC( packm_10xk_ref ) +INSERT_GENTPROT_BASIC( packm_12xk_ref ) +INSERT_GENTPROT_BASIC( packm_14xk_ref ) +INSERT_GENTPROT_BASIC( packm_16xk_ref ) +INSERT_GENTPROT_BASIC( packm_30xk_ref ) + diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.c b/frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.c similarity index 96% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.c rename to frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.c index e3e0ec54c..e0bdeb250 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.c +++ b/frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.c @@ -37,14 +37,15 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -209,21 +210,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_2xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_2xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, 
\ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -410,21 +412,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_4xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_4xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -633,21 +636,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_6xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_6xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -878,21 +882,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_8xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_8xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; 
\ @@ -1145,21 +1150,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_10xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_10xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -1434,21 +1440,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_12xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_12xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -1745,21 +1752,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_14xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_14xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -2078,21 +2086,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_16xk_rih ) 
+INSERT_GENTFUNCCO_BASIC0( packm_16xk_rih_ref ) #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ) \ { \ const inc_t inca2 = 2 * inca; \ const inc_t lda2 = 2 * lda; \ @@ -2565,5 +2574,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC0( packm_ref_30xk_rih ) +INSERT_GENTFUNCCO_BASIC0( packm_30xk_rih_ref ) diff --git a/frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.h b/frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.h new file mode 100644 index 000000000..70d037e0a --- /dev/null +++ b/frame/1m/packm/ukernels/bli_packm_cxk_rih_ref.h @@ -0,0 +1,57 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conja, \ + pack_t schema, \ + dim_t n, \ + void* kappa, \ + void* a, inc_t inca, inc_t lda, \ + void* p, inc_t ldp \ + ); + +INSERT_GENTPROT_BASIC( packm_2xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_4xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_6xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_8xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_10xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_12xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_14xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_16xk_rih_ref ) +INSERT_GENTPROT_BASIC( packm_30xk_rih_ref ) + diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.h b/frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.h deleted file mode 100644 index a5c544da9..000000000 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_rih.h +++ /dev/null @@ -1,56 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - pack_t schema, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t ldp \ - ); - -INSERT_GENTPROT_BASIC( packm_ref_2xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_4xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_6xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_8xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_10xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_12xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_14xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_16xk_rih ) -INSERT_GENTPROT_BASIC( packm_ref_30xk_rih ) - diff --git a/frame/1m/scalm/bli_scalm.c b/frame/1m/scalm/bli_scalm.c deleted file mode 100644 index 3af5e7205..000000000 --- a/frame/1m/scalm/bli_scalm.c +++ /dev/null @@ -1,118 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - - -// -// Define object-based interface. -// -void bli_scalm( obj_t* beta, - obj_t* x ) -{ - if ( bli_error_checking_is_enabled() ) - bli_scalm_check( beta, x ); - - bli_scalm_int( beta, - x, - scalm_cntl ); -} - - -// -// Define BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjbeta, \ - doff_t diagoffx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - ctype* beta, \ - ctype* x, inc_t rs_x, inc_t cs_x \ - ) \ -{ \ - PASTEMAC2(ch,ch,varname)( conjbeta, \ - diagoffx, \ - uplox, \ - m, \ - n, \ - beta, \ - x, rs_x, cs_x ); \ -} - -INSERT_GENTFUNC_BASIC( scalm, scalm_unb_var1 ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_b, ctype_x, chb, chx, opname, varname ) \ -\ -void PASTEMAC2(chb,chx,opname)( \ - conj_t conjbeta, \ - doff_t diagoffx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - ctype_b* beta, \ - ctype_x* x, inc_t rs_x, inc_t cs_x \ - ) \ -{ \ - PASTEMAC2(chb,chx,varname)( conjbeta, \ - diagoffx, \ - uplox, \ - m, \ - n, \ - beta, \ - x, rs_x, cs_x ); \ -} - -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC( scalm, scalm_unb_var1 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( scalm, scalm_unb_var1 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( scalm, scalm_unb_var1 ) -#endif diff --git a/frame/1m/scalm/bli_scalm_int.c b/frame/1m/scalm/bli_scalm_int.c index ba8a17243..d0160d66d 100644 --- a/frame/1m/scalm/bli_scalm_int.c +++ b/frame/1m/scalm/bli_scalm_int.c @@ -36,19 +36,22 @@ #define FUNCPTR_T scalm_fp -typedef void (*FUNCPTR_T)( obj_t* x ); +typedef void (*FUNCPTR_T)( obj_t* alpha, + obj_t* x, + cntx_t* cntx ); static FUNCPTR_T vars[1][3] = { // unblocked optimized unblocked blocked - { bli_scalm_unb_var1, NULL, NULL } + { bli_scalm_ex, bli_scalm_ex, NULL } }; -void bli_scalm_int( obj_t* beta, +void bli_scalm_int( obj_t* alpha, obj_t* x, + cntx_t* cntx, scalm_t* cntl ) { - obj_t x_local; + //obj_t x_local; varnum_t n; impl_t i; FUNCPTR_T f; @@ -58,23 +61,28 @@ void bli_scalm_int( obj_t* beta, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_scalm_int_check( beta, x, cntl ); + bli_scalm_check( alpha, x ); // First check if we are to skip this operation. if ( cntl_is_noop( cntl ) ) return; - // Return early if both beta and the scalar attached to x are unit. - if ( bli_obj_equals( beta, &BLIS_ONE ) && + // Return early if both alpha and the scalar attached to x are unit. + if ( bli_obj_equals( alpha, &BLIS_ONE ) && bli_obj_scalar_equals( x, &BLIS_ONE ) ) return; - // Alias x to x_local so we can apply beta if it is non-unit. - bli_obj_alias_to( *x, x_local ); + // + // This code has been disabled since we've now added the alpha + // parameter back to the object interface to the underlying + // scalm variant. + // + // Alias x to x_local so we can apply alpha if it is non-unit. + //bli_obj_alias_to( *x, x_local ); - // If beta is non-unit, apply it to the scalar attached to x. 
- if ( !bli_obj_equals( beta, &BLIS_ONE ) ) - { - bli_obj_scalar_apply_scalar( beta, &x_local ); - } + // If alpha is non-unit, apply it to the scalar attached to x. + //if ( !bli_obj_equals( alpha, &BLIS_ONE ) ) + //{ + // bli_obj_scalar_apply_scalar( alpha, &x_local ); + //} // Extract the variant number and implementation type. n = cntl_var_num( cntl ); @@ -84,6 +92,8 @@ void bli_scalm_int( obj_t* beta, f = vars[n][i]; // Invoke the variant. - f( &x_local ); + f( alpha, + x, + cntx ); } diff --git a/frame/1m/scalm/bli_scalm_int.h b/frame/1m/scalm/bli_scalm_int.h index f6a6f8098..29047d2dd 100644 --- a/frame/1m/scalm/bli_scalm_int.h +++ b/frame/1m/scalm/bli_scalm_int.h @@ -32,7 +32,8 @@ */ -void bli_scalm_int( obj_t* beta, +void bli_scalm_int( obj_t* alpha, obj_t* x, + cntx_t* cntx, scalm_t* cntl ); diff --git a/frame/1m/unpackm/bli_unpackm_blk_var2.c b/frame/1m/unpackm/bli_unpackm_blk_var2.c index d0da146a0..ab2c2cf1c 100644 --- a/frame/1m/unpackm/bli_unpackm_blk_var2.c +++ b/frame/1m/unpackm/bli_unpackm_blk_var2.c @@ -48,7 +48,8 @@ typedef void (*FUNCPTR_T)( dim_t n_panel, void* p, inc_t rs_p, inc_t cs_p, dim_t pd_p, inc_t ps_p, - void* c, inc_t rs_c, inc_t cs_c + void* c, inc_t rs_c, inc_t cs_c, + cntx_t* cntx ); static FUNCPTR_T GENARRAY(ftypes,unpackm_blk_var2); @@ -56,6 +57,7 @@ static FUNCPTR_T GENARRAY(ftypes,unpackm_blk_var2); void bli_unpackm_blk_var2( obj_t* p, obj_t* c, + cntx_t* cntx, unpackm_t* cntl ) { num_t dt_cp = bli_obj_datatype( *c ); @@ -112,27 +114,30 @@ void bli_unpackm_blk_var2( obj_t* p, n_panel, buf_p, rs_p, cs_p, pd_p, ps_p, - buf_c, rs_c, cs_c ); + buf_c, rs_c, cs_c, + cntx ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname )( \ - struc_t strucc, \ - doff_t diagoffc, \ - diag_t diagc, \ - uplo_t uploc, \ - trans_t transc, \ - dim_t m, \ - dim_t n, \ - dim_t m_panel, \ - dim_t n_panel, \ - void* p, inc_t rs_p, inc_t cs_p, \ - dim_t pd_p, inc_t 
ps_p, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + struc_t strucc, \ + doff_t diagoffc, \ + diag_t diagc, \ + uplo_t uploc, \ + trans_t transc, \ + dim_t m, \ + dim_t n, \ + dim_t m_panel, \ + dim_t n_panel, \ + void* p, inc_t rs_p, inc_t cs_p, \ + dim_t pd_p, inc_t ps_p, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ ctype* restrict one = PASTEMAC(ch,1); \ ctype* restrict c_cast = c; \ @@ -226,25 +231,33 @@ void PASTEMAC(ch,varname )( \ if ( bli_intersects_diag_n( diagoffc_i, *m_panel_full, *n_panel_full ) && \ bli_is_upper_or_lower( uploc ) ) \ { \ - PASTEMAC3(ch,ch,ch,scal2m)( diagoffc_i, \ - diagc, \ - uploc, \ - transc, \ - *m_panel_full, \ - *n_panel_full, \ - one, \ - p_begin, rs_p, cs_p, \ - c_begin, rs_c, cs_c ); \ + PASTEMAC(ch,scal2m) \ + ( \ + diagoffc_i, \ + diagc, \ + uploc, \ + transc, \ + *m_panel_full, \ + *n_panel_full, \ + one, \ + p_begin, rs_p, cs_p, \ + c_begin, rs_c, cs_c, \ + cntx \ + ); \ } \ else \ { \ /* Pack the current panel. 
*/ \ - PASTEMAC(ch,unpackm_cxk)( BLIS_NO_CONJUGATE, \ - panel_dim_i, \ - panel_len, \ - one, \ - p_begin, ldp, \ - c_begin, incc, ldc ); \ + PASTEMAC(ch,unpackm_cxk) \ + ( \ + BLIS_NO_CONJUGATE, \ + panel_dim_i, \ + panel_len, \ + one, \ + p_begin, ldp, \ + c_begin, incc, ldc, \ + cntx \ + ); \ } \ \ /*PASTEMAC(ch,fprintm)( stdout, "p copied", *m_panel_full, *n_panel_full, \ @@ -253,5 +266,5 @@ void PASTEMAC(ch,varname )( \ \ } -INSERT_GENTFUNC_BASIC( unpackm, unpackm_blk_var2 ) +INSERT_GENTFUNC_BASIC0( unpackm_blk_var2 ) diff --git a/frame/1m/unpackm/bli_unpackm_blk_var2.h b/frame/1m/unpackm/bli_unpackm_blk_var2.h index ce165e8c5..1f783260a 100644 --- a/frame/1m/unpackm/bli_unpackm_blk_var2.h +++ b/frame/1m/unpackm/bli_unpackm_blk_var2.h @@ -34,6 +34,7 @@ void bli_unpackm_blk_var2( obj_t* p, obj_t* c, + cntx_t* cntx, unpackm_t* cntl ); @@ -52,7 +53,8 @@ void PASTEMAC(ch,varname)( \ dim_t n_panel, \ void* p, inc_t rs_p, inc_t cs_p, \ dim_t pd_p, inc_t ps_p, \ - void* c, inc_t rs_c, inc_t cs_c \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ ); INSERT_GENTPROT_BASIC( unpackm_blk_var2 ) diff --git a/frame/1m/unpackm/bli_unpackm_check.c b/frame/1m/unpackm/bli_unpackm_check.c index c9feec40a..87af08f43 100644 --- a/frame/1m/unpackm/bli_unpackm_check.c +++ b/frame/1m/unpackm/bli_unpackm_check.c @@ -36,6 +36,7 @@ void bli_unpackm_check( obj_t* p, obj_t* a, + cntx_t* cntx, unpackm_t* cntl ) { err_t e_val; diff --git a/frame/1m/unpackm/bli_unpackm_check.h b/frame/1m/unpackm/bli_unpackm_check.h index 745163368..217b03c4a 100644 --- a/frame/1m/unpackm/bli_unpackm_check.h +++ b/frame/1m/unpackm/bli_unpackm_check.h @@ -34,4 +34,5 @@ void bli_unpackm_check( obj_t* p, obj_t* a, + cntx_t* cntx, unpackm_t* cntl ); diff --git a/frame/1m/unpackm/bli_unpackm_cxk.c b/frame/1m/unpackm/bli_unpackm_cxk.c index 18b3d9f9a..a31a7f9dc 100644 --- a/frame/1m/unpackm/bli_unpackm_cxk.c +++ b/frame/1m/unpackm/bli_unpackm_cxk.c @@ -150,7 +150,7 @@ static FUNCPTR_T 
ftypes[FUNCPTR_ARRAY_LENGTH][BLIS_NUM_FP_TYPES] = #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, copyvker ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ void PASTEMAC(ch,opname)( \ conj_t conjp, \ @@ -158,7 +158,8 @@ void PASTEMAC(ch,opname)( \ dim_t n, \ void* beta, \ void* p, inc_t ldp, \ - void* a, inc_t inca, inc_t lda \ + void* a, inc_t inca, inc_t lda, \ + cntx_t* cntx \ ) \ { \ dim_t panel_dim; \ @@ -166,13 +167,17 @@ void PASTEMAC(ch,opname)( \ FUNCPTR_T f; \ \ /* If the panel dimension is unit, then we recognize that this allows - the kernel to reduce to a copyv, so we call that kernel directly. */ \ + the kernel to reduce to a copyv, so we call that directly. */ \ if ( m == 1 ) \ { \ - PASTEMAC2(ch,ch,copyvker)( conjp, \ - n, \ - p, 1, \ - a, lda ); \ + PASTEMAC(ch,copyv) \ + ( \ + conjp, \ + n, \ + p, 1, \ + a, lda, \ + cntx \ + ); \ return; \ } \ \ @@ -196,26 +201,33 @@ void PASTEMAC(ch,opname)( \ allow the kernel implementations to remain very simple. */ \ if ( f != NULL && m == panel_dim ) \ { \ - f( conjp, \ - n, \ - beta, \ - p, \ - a, inca, lda ); \ + f \ + ( \ + conjp, \ + n, \ + beta, \ + p, \ + a, inca, lda \ + ); \ } \ else \ { \ /* Treat the panel as m x n and column-stored (unit row stride). 
*/ \ - PASTEMAC3(ch,ch,ch,scal2m)( 0, \ - BLIS_NONUNIT_DIAG, \ - BLIS_DENSE, \ - conjp, \ - m, \ - n, \ - beta, \ - p, 1, ldp, \ - a, inca, lda ); \ + PASTEMAC(ch,scal2m) \ + ( \ + 0, \ + BLIS_NONUNIT_DIAG, \ + BLIS_DENSE, \ + conjp, \ + m, \ + n, \ + beta, \ + p, 1, ldp, \ + a, inca, lda, \ + cntx \ + ); \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_cxk, COPYV_KERNEL ) +INSERT_GENTFUNC_BASIC0( unpackm_cxk ) diff --git a/frame/1m/unpackm/bli_unpackm_cxk.h b/frame/1m/unpackm/bli_unpackm_cxk.h index beebe1940..89473913e 100644 --- a/frame/1m/unpackm/bli_unpackm_cxk.h +++ b/frame/1m/unpackm/bli_unpackm_cxk.h @@ -32,20 +32,22 @@ */ -#include "bli_unpackm_ref_cxk.h" +#include "bli_unpackm_cxk_ref.h" #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t m, \ - dim_t n, \ - void* beta, \ - void* p, inc_t ldp, \ - void* a, inc_t inca, inc_t lda \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t m, \ + dim_t n, \ + void* beta, \ + void* p, inc_t ldp, \ + void* a, inc_t inca, inc_t lda, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( unpackm_cxk ) diff --git a/frame/1m/unpackm/bli_unpackm_int.c b/frame/1m/unpackm/bli_unpackm_int.c index bee104d05..0ac17a194 100644 --- a/frame/1m/unpackm/bli_unpackm_int.c +++ b/frame/1m/unpackm/bli_unpackm_int.c @@ -38,6 +38,7 @@ typedef void (*FUNCPTR_T)( obj_t* p, obj_t* a, + cntx_t* cntx, unpackm_t* cntl ); static FUNCPTR_T vars[2][3] = @@ -49,6 +50,7 @@ static FUNCPTR_T vars[2][3] = void bli_unpackm_int( obj_t* p, obj_t* a, + cntx_t* cntx, unpackm_t* cntl, packm_thrinfo_t* thread ) { @@ -90,7 +92,7 @@ void bli_unpackm_int( obj_t* p, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_unpackm_check( p, a, cntl ); + bli_unpackm_check( p, a, cntx, cntl ); // Now, if we are not skipping the unpack operation, then the only // question left is whether we are to typecast matrix a after unpacking. 
@@ -126,6 +128,7 @@ void bli_unpackm_int( obj_t* p, if( thread_am_ochief( thread ) ) { f( p, &c, + cntx, cntl ); } thread_obarrier( thread ); diff --git a/frame/1m/unpackm/bli_unpackm_int.h b/frame/1m/unpackm/bli_unpackm_int.h index 47d10e171..06eed00a7 100644 --- a/frame/1m/unpackm/bli_unpackm_int.h +++ b/frame/1m/unpackm/bli_unpackm_int.h @@ -34,6 +34,7 @@ void bli_unpackm_int( obj_t* p, obj_t* a, + cntx_t* cntx, unpackm_t* cntl, packm_thrinfo_t* thread ); diff --git a/frame/1m/unpackm/bli_unpackm_unb_var1.c b/frame/1m/unpackm/bli_unpackm_unb_var1.c index 95a133eac..0794f6c4f 100644 --- a/frame/1m/unpackm/bli_unpackm_unb_var1.c +++ b/frame/1m/unpackm/bli_unpackm_unb_var1.c @@ -43,7 +43,8 @@ typedef void (*FUNCPTR_T)( dim_t m, dim_t n, void* p, inc_t rs_p, inc_t cs_p, - void* c, inc_t rs_c, inc_t cs_c + void* c, inc_t rs_c, inc_t cs_c, + cntx_t* cntx ); static FUNCPTR_T GENARRAY(ftypes,unpackm_unb_var1); @@ -51,6 +52,7 @@ static FUNCPTR_T GENARRAY(ftypes,unpackm_unb_var1); void bli_unpackm_unb_var1( obj_t* p, obj_t* c, + cntx_t* cntx, unpackm_t* cntl ) { num_t dt_pc = bli_obj_datatype( *p ); @@ -83,7 +85,9 @@ void bli_unpackm_unb_var1( obj_t* p, m_c, n_c, buf_p, rs_p, cs_p, - buf_c, rs_c, cs_c ); + buf_c, rs_c, cs_c, + cntx + ); } @@ -97,20 +101,25 @@ void PASTEMAC(ch,varname)( \ dim_t m, \ dim_t n, \ void* p, inc_t rs_p, inc_t cs_p, \ - void* c, inc_t rs_c, inc_t cs_c \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ ) \ { \ ctype* p_cast = p; \ ctype* c_cast = c; \ \ - PASTEMAC2(ch,ch,copym)( diagoffp,\ - BLIS_NONUNIT_DIAG, \ - uplop, \ - transp, \ - m, \ - n, \ - p_cast, rs_p, cs_p, \ - c_cast, rs_c, cs_c ); \ + PASTEMAC(ch,copym) \ + ( \ + diagoffp,\ + BLIS_NONUNIT_DIAG, \ + uplop, \ + transp, \ + m, \ + n, \ + p_cast, rs_p, cs_p, \ + c_cast, rs_c, cs_c, \ + cntx \ + ); \ } INSERT_GENTFUNC_BASIC( unpackm, unpackm_unb_var1 ) diff --git a/frame/1m/unpackm/bli_unpackm_unb_var1.h b/frame/1m/unpackm/bli_unpackm_unb_var1.h index a8e58cbd8..fcb98bda5 100644 --- 
a/frame/1m/unpackm/bli_unpackm_unb_var1.h +++ b/frame/1m/unpackm/bli_unpackm_unb_var1.h @@ -34,20 +34,23 @@ void bli_unpackm_unb_var1( obj_t* p, obj_t* c, + cntx_t* cntx, unpackm_t* cntl ); #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffp, \ - uplo_t uplop, \ - trans_t transp, \ - dim_t m, \ - dim_t n, \ - void* p, inc_t rs_p, inc_t cs_p, \ - void* c, inc_t rs_c, inc_t cs_c \ - ); +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffp, \ + uplo_t uplop, \ + trans_t transp, \ + dim_t m, \ + dim_t n, \ + void* p, inc_t rs_p, inc_t cs_p, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); INSERT_GENTPROT_BASIC( unpackm_unb_var1 ) diff --git a/frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.c b/frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.c similarity index 89% rename from frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.c rename to frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.c index f04f2e38c..fe88aaece 100644 --- a/frame/1m/unpackm/ukernels/bli_unpackm_ref_cxk.c +++ b/frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.c @@ -35,15 +35,16 @@ #include "blis.h" #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 2; \ \ @@ -103,21 +104,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_2xk, unpackm_ref_2xk ) +INSERT_GENTFUNC_BASIC0( unpackm_2xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, 
\ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 4; \ \ @@ -185,21 +187,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_4xk, unpackm_ref_4xk ) +INSERT_GENTFUNC_BASIC0( unpackm_4xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 6; \ \ @@ -275,21 +278,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_6xk, unpackm_ref_6xk ) +INSERT_GENTFUNC_BASIC0( unpackm_6xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 8; \ \ @@ -373,21 +377,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_8xk, unpackm_ref_8xk ) +INSERT_GENTFUNC_BASIC0( unpackm_8xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 10; \ \ @@ -479,21 +484,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_10xk, unpackm_ref_10xk ) +INSERT_GENTFUNC_BASIC0( unpackm_10xk_ref ) #undef GENTFUNC -#define GENTFUNC( 
ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 12; \ \ @@ -593,21 +599,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_12xk, unpackm_ref_12xk ) +INSERT_GENTFUNC_BASIC0( unpackm_12xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 14; \ \ @@ -715,21 +722,22 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_14xk, unpackm_ref_14xk ) +INSERT_GENTFUNC_BASIC0( unpackm_14xk_ref ) #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conjp, \ - dim_t n, \ - void* beta, \ - void* p, \ - void* a, inc_t inca, inc_t lda \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ) \ { \ const inc_t ldp = 16; \ \ @@ -845,5 +853,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( unpackm_ref_16xk, unpackm_ref_16xk ) +INSERT_GENTFUNC_BASIC0( unpackm_16xk_ref ) diff --git a/frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.h b/frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.h new file mode 100644 index 000000000..8756747ff --- /dev/null +++ b/frame/1m/unpackm/ukernels/bli_unpackm_cxk_ref.h @@ -0,0 +1,55 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + 
libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjp, \ + dim_t n, \ + void* beta, \ + void* p, \ + void* a, inc_t inca, inc_t lda \ + ); + +INSERT_GENTPROT_BASIC( unpackm_2xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_4xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_6xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_8xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_10xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_12xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_14xk_ref ) +INSERT_GENTPROT_BASIC( unpackm_16xk_ref ) + diff --git a/frame/2/bli_l2.h b/frame/2/bli_l2.h new file mode 100644 index 000000000..f251844cd --- /dev/null +++ b/frame/2/bli_l2.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "bli_l2_cntx.h" +#include "bli_l2_check.h" + +#include "bli_l2_ft.h" + +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_l2_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l2_oapi.h" + +#include "bli_l2_tapi.h" + +// Operation-specific headers +#include "bli_gemv.h" +#include "bli_ger.h" +#include "bli_hemv.h" +#include "bli_her.h" +#include "bli_her2.h" +#include "bli_symv.h" +#include "bli_syr.h" +#include "bli_syr2.h" +#include "bli_trmv.h" +#include "bli_trsv.h" diff --git a/frame/2/bli_l2_check.c b/frame/2/bli_l2_check.c new file mode 100644 index 000000000..a51d6bf1e --- /dev/null +++ b/frame/2/bli_l2_check.c @@ -0,0 +1,415 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +void bli_gemv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ) +{ + err_t e_val; + + // Perform checks common to gemv/hemv/symv/trmv/trsv. + + bli_xxmv_check( alpha, a, x, beta, y ); + + // Check object structure. + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_hemv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ) +{ + err_t e_val; + + // Perform checks common to gemv/hemv/symv/trmv/trsv. + + bli_xxmv_check( alpha, a, x, beta, y ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_hermitian_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_symv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ) +{ + err_t e_val; + + // Perform checks common to gemv/hemv/symv/trmv/trsv. 
+ + bli_xxmv_check( alpha, a, x, beta, y ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_symmetric_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_trmv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x + ) +{ + err_t e_val; + + // Perform checks common to gemv/hemv/symv/trmv/trsv. + + bli_xxmv_check( alpha, a, x, alpha, x ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_triangular_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_trsv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x + ) +{ + err_t e_val; + + // Perform checks common to gemv/hemv/symv/trmv/trsv. + + bli_xxmv_check( alpha, a, x, alpha, x ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_triangular_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_ger_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a + ) +{ + err_t e_val; + + // Perform checks common to ger/her/her2/syr/syr2. + + bli_xxr_check( alpha, x, y, a ); + + // Check object structure. + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_her_check + ( + obj_t* alpha, + obj_t* x, + obj_t* a + ) +{ + err_t e_val; + + // Perform checks common to ger/her/her2/syr/syr2. + + bli_xxr_check( alpha, x, x, a ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_hermitian_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_her2_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a + ) +{ + err_t e_val; + + // Perform checks common to ger/her/her2/syr/syr2. + + bli_xxr_check( alpha, x, y, a ); + + // Check squareness. 
+ + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_hermitian_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_syr_check + ( + obj_t* alpha, + obj_t* x, + obj_t* a + ) +{ + err_t e_val; + + // Perform checks common to ger/her/her2/syr/syr2. + + bli_xxr_check( alpha, x, x, a ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_symmetric_object( a ); + bli_check_error_code( e_val ); +} + + +void bli_syr2_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a + ) +{ + err_t e_val; + + // Perform checks common to ger/her/her2/syr/syr2. + + bli_xxr_check( alpha, x, y, a ); + + // Check squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + // Check object structure. + + e_val = bli_check_symmetric_object( a ); + bli_check_error_code( e_val ); +} + + +// ----------------------------------------------------------------------------- + +void bli_xxmv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( x, bli_obj_width_after_trans( *a ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( y, bli_obj_length_after_trans( *a ) ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); +} + +void bli_xxr_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + // Check object dimensions. 
+ + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_object( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( x, bli_obj_length_after_trans( *a ) ); + bli_check_error_code( e_val ); + + e_val = bli_check_vector_dim_equals( y, bli_obj_width_after_trans( *a ) ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( y ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); +} + diff --git a/frame/2/bli_l2_check.h b/frame/2/bli_l2_check.h new file mode 100644 index 000000000..286398391 --- /dev/null +++ b/frame/2/bli_l2_check.h @@ -0,0 +1,118 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + ); + +GENPROT( gemv ) +GENPROT( hemv ) +GENPROT( symv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a \ + ); + +GENPROT( ger ) +GENPROT( her2 ) +GENPROT( syr2 ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* a \ + ); + +GENPROT( her ) +GENPROT( syr ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x \ + ); + +GENPROT( trmv ) +GENPROT( trsv ) + + +// ----------------------------------------------------------------------------- + +void bli_xxmv_check + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y + ); + +void bli_xxr_check + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a + ); diff --git a/frame/2/bli_l2_cntx.c b/frame/2/bli_l2_cntx.c new file mode 100644 index 000000000..841217365 --- 
/dev/null +++ b/frame/2/bli_l2_cntx.c @@ -0,0 +1,206 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + /* Perform basic setup on the context. 
*/ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernels employed by the current + operation. */ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_AXPYF_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_DOTXF_KER, cntx );*/ \ + bli_axpyf_cntx_init( cntx ); \ + bli_dotxf_cntx_init( cntx ); \ +\ + /*bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_DOTXV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_SETV_KER, cntx );*/ \ + bli_axpyv_cntx_init( cntx ); \ + bli_dotxv_cntx_init( cntx ); \ + bli_scalv_cntx_init( cntx ); \ + bli_setv_cntx_init( cntx ); \ +\ + /* Initialize the context with packm-related kernels. */ \ + bli_packm_cntx_init( cntx ); \ +\ + /* Set the register and cache blocksizes and multiples, as well + as the execution method. */ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 4, \ + BLIS_N2, BLIS_N2, \ + BLIS_M2, BLIS_M2, \ + BLIS_AF, BLIS_AF, \ + BLIS_DF, BLIS_DF, \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + /* Free the context and all memory allocated to it. */ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( gemv ) +GENFRONT( trmv ) +GENFRONT( trsv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + /* Perform basic setup on the context. */ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernels employed by the current + operation. */ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx );*/ \ + bli_axpyv_cntx_init( cntx ); \ +\ + /* Initialize the context with packm-related kernels. */ \ + bli_packm_cntx_init( cntx ); \ +\ + /* Set the register and cache blocksizes and multiples, as well + as the execution method. 
*/ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 2, \ + BLIS_N2, BLIS_N2, \ + BLIS_M2, BLIS_M2, \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + /* Free the context and all memory allocated to it. */ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( ger ) +GENFRONT( her ) +GENFRONT( syr ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + /* Perform basic setup on the context. */ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernels employed by the current + operation. */ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_DOTAXPYV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_AXPYF_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_DOTXF_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_DOTXAXPYF_KER, cntx );*/ \ + bli_dotaxpyv_cntx_init( cntx ); \ + bli_axpyf_cntx_init( cntx ); \ + bli_dotxf_cntx_init( cntx ); \ + bli_dotxaxpyf_cntx_init( cntx ); \ +\ + /*bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_DOTXV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_SETV_KER, cntx );*/ \ + bli_axpyv_cntx_init( cntx ); \ + bli_dotxv_cntx_init( cntx ); \ + bli_scalv_cntx_init( cntx ); \ + bli_setv_cntx_init( cntx ); \ +\ + /* Initialize the context with packm-related kernels. */ \ + bli_packm_cntx_init( cntx ); \ +\ + /* Set the register and cache blocksizes and multiples, as well + as the execution method. */ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 5, \ + BLIS_N2, BLIS_N2, \ + BLIS_M2, BLIS_M2, \ + BLIS_AF, BLIS_AF, \ + BLIS_DF, BLIS_DF, \ + BLIS_XF, BLIS_XF, \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + /* Free the context and all memory allocated to it. 
*/ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( hemv ) +GENFRONT( symv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ) \ +{ \ + /* Perform basic setup on the context. */ \ + bli_cntx_obj_create( cntx ); \ +\ + /* Initialize the context with kernels employed by the current + operation. */ \ + /*bli_gks_cntx_set_l1f_ker( BLIS_AXPY2V_KER, cntx );*/ \ + /*bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx );*/ \ + bli_axpy2v_cntx_init( cntx ); \ + bli_axpyv_cntx_init( cntx ); \ +\ + /* Initialize the context with packm-related kernels. */ \ + bli_packm_cntx_init( cntx ); \ +\ + /* Set the register and cache blocksizes and multiples, as well + as the execution method. */ \ + bli_gks_cntx_set_blkszs( BLIS_NAT, 2, \ + BLIS_N2, BLIS_N2, \ + BLIS_M2, BLIS_M2, \ + cntx ); \ +} \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ) \ +{ \ + /* Free the context and all memory allocated to it. */ \ + bli_cntx_obj_free( cntx ); \ +} + +GENFRONT( her2 ) +GENFRONT( syr2 ) + diff --git a/frame/2/bli_l2_cntx.h b/frame/2/bli_l2_cntx.h new file mode 100644 index 000000000..8b6566f55 --- /dev/null +++ b/frame/2/bli_l2_cntx.h @@ -0,0 +1,56 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype context initialization functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); + +GENPROT( gemv ) +GENPROT( hemv ) +GENPROT( symv ) +GENPROT( trmv ) +GENPROT( trsv ) + +GENPROT( ger ) +GENPROT( her ) +GENPROT( her2 ) +GENPROT( syr ) +GENPROT( syr2 ) diff --git a/frame/2/bli_l2_ft.h b/frame/2/bli_l2_ft.h new file mode 100644 index 000000000..a20a3a3eb --- /dev/null +++ b/frame/2/bli_l2_ft.h @@ -0,0 +1,166 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#ifndef BLIS_L2_FT_H +#define BLIS_L2_FT_H + + +// +// -- Level-2 function types --------------------------------------------------- +// + +// gemv + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( gemv ) + +// ger + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( ger ) + +// hemv (and symv) + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( hemv ) + +// her (and syr) + +#undef GENTDEFR +#define GENTDEFR( ctype, ctype_r, ch, chr, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, /* complex alpha allows her variants to also perform syr. 
*/ \ + ctype* x, inc_t incx, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEFR( her ) + +// her2 (and syr2) + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( her2 ) + +// trmv (and trsv) + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTDEF( trmv ) + + +#endif diff --git a/frame/2/bli_l2_oapi.c b/frame/2/bli_l2_oapi.c new file mode 100644 index 000000000..f9d8dd2df --- /dev/null +++ b/frame/2/bli_l2_oapi.c @@ -0,0 +1,418 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + trans_t transa = bli_obj_conjtrans_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_length( *a ); \ + dim_t n = bli_obj_width( *a ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha; \ + void* buf_beta; \ +\ + obj_t alpha_local; \ + obj_t beta_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, a, x, beta, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal 
conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + beta, &beta_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + transa, \ + conjx, \ + m, \ + n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + buf_beta, \ + buf_y, incy, \ + cntx \ + ); \ +} + +GENFRONT( gemv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t m = bli_obj_length( *a ); \ + dim_t n = bli_obj_width( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y, a ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + conjx, \ + conjy, \ + m, \ + n, \ + buf_alpha, \ + buf_x, incx, \ + buf_y, incy, \ + buf_a, rs_a, cs_a, \ + cntx \ + ); \ +} + +GENFRONT( ger ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + conj_t conja = bli_obj_conj_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_length( *a ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha; \ + void* buf_beta; \ +\ + obj_t alpha_local; \ + obj_t beta_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, a, x, beta, y ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + beta, &beta_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ + buf_beta = bli_obj_buffer_for_1x1( dt, beta_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + uploa, \ + conja, \ + conjx, \ + m, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + buf_beta, \ + buf_y, incy, \ + cntx \ + ); \ +} + +GENFRONT( hemv ) +GENFRONT( symv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + dim_t m = bli_obj_length( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, a ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_10 \ + ( \ + dt, \ + opname, \ + uploa, \ + conjx, \ + m, \ + buf_alpha, \ + buf_x, incx, \ + buf_a, rs_a, cs_a, \ + cntx \ + ); \ +} + +GENFRONT( her ) +GENFRONT( syr ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ + dim_t m = bli_obj_length( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, x, y, a ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + uploa, \ + conjx, \ + conjy, \ + m, \ + buf_alpha, \ + buf_x, incx, \ + buf_y, incy, \ + buf_a, rs_a, cs_a, \ + cntx \ + ); \ +} + +GENFRONT( her2 ) +GENFRONT( syr2 ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + trans_t transa = bli_obj_conjtrans_status( *a ); \ + diag_t diaga = bli_obj_diag( *a ); \ + dim_t m = bli_obj_length( *a ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_alpha; \ +\ + obj_t alpha_local; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( alpha, a, x ); \ +\ + /* Create local copy-casts of scalars (and apply internal conjugation + as needed). */ \ + bli_obj_scalar_init_detached_copy_of( dt, BLIS_NO_CONJUGATE, \ + alpha, &alpha_local ); \ + buf_alpha = bli_obj_buffer_for_1x1( dt, alpha_local ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + uploa, \ + transa, \ + diaga, \ + m, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + cntx \ + ); \ +} + +GENFRONT( trmv ) +GENFRONT( trsv ) + + +#endif + diff --git a/frame/2/bli_l2_oapi.h b/frame/2/bli_l2_oapi.h new file mode 100644 index 000000000..cd95ea760 --- /dev/null +++ b/frame/2/bli_l2_oapi.h @@ -0,0 +1,103 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( gemv ) +GENPROT( hemv ) +GENPROT( symv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( ger ) +GENPROT( her2 ) +GENPROT( syr2 ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( her ) +GENPROT( syr ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( trmv ) +GENPROT( trsv ) + diff --git a/frame/2/bli_l2_oapi_wc.c b/frame/2/bli_l2_oapi_wc.c new file mode 100644 index 000000000..96ced1ede --- /dev/null +++ b/frame/2/bli_l2_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l2_oapi.c" + diff --git a/frame/2/bli_l2_oapi_woc.c b/frame/2/bli_l2_oapi_woc.c new file mode 100644 index 000000000..183349a42 --- /dev/null +++ b/frame/2/bli_l2_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l2_oapi.c" + diff --git a/frame/2/bli_l2_tapi.c b/frame/2/bli_l2_tapi.c new file mode 100644 index 000000000..24558fd9d --- /dev/null +++ b/frame/2/bli_l2_tapi.c @@ -0,0 +1,502 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + dim_t m_y, n_x; \ +\ + /* Determine the dimensions of y and x. */ \ + bli_set_dims_with_trans( transa, m, n, m_y, n_x ); \ +\ + /* If y has zero elements, return early. */ \ + if ( bli_zero_dim1( m_y ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. 
*/ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* If x has zero elements, or if alpha is zero, scale y by beta and + return early. */ \ + if ( bli_zero_dim1( n_x ) || PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m_y, \ + beta, \ + y, incy, \ + cntx_p \ + ); \ + return; \ + } \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. */ \ + if ( bli_does_notrans( transa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_does_trans( transa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + transa, \ + conjx, \ + m, \ + n, \ + alpha, \ + a, rs_a, cs_a, \ + x, incx, \ + beta, \ + y, incy, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC3( gemv, gemv, gemv_unf_var1, gemv_unf_var2 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* If x or y has zero elements, or if alpha is zero, return early. */ \ + if ( bli_zero_dim2( m, n ) || PASTEMAC(ch,eq0)( *alpha ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. 
*/ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. */ \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + conjx, \ + conjy, \ + m, \ + n, \ + alpha, \ + x, incx, \ + y, incy, \ + a, rs_a, cs_a, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC3( ger, ger, ger_unb_var1, ger_unb_var2 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, conjh, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* If x has zero elements, or if alpha is zero, scale y by beta and + return early. */ \ + if ( bli_zero_dim1( m ) || PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx_p \ + ); \ + return; \ + } \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. 
*/ \ + if ( bli_is_lower( uploa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_is_upper( uploa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + uploa, \ + conja, \ + conjx, \ + conjh, /* used by variants to distinguish hemv from symv */ \ + m, \ + alpha, \ + a, rs_a, cs_a, \ + x, incx, \ + beta, \ + y, incy, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC4( hemv, hemv, BLIS_CONJUGATE, hemv_unf_var1, hemv_unf_var3 ) +INSERT_GENTFUNC_BASIC4( symv, hemv, BLIS_NO_CONJUGATE, hemv_unf_var1, hemv_unf_var3 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname, ftname, conjh, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + dim_t m, \ + ctype_r* alpha, \ + ctype* x, inc_t incx, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + ctype alpha_local; \ +\ + /* If x has zero elements, or if alpha is zero, return early. */ \ + if ( bli_zero_dim1( m ) || PASTEMAC(chr,eq0)( *alpha ) ) return; \ +\ + /* Make a local copy of alpha, cast into the complex domain. This + allows us to use the same underlying her variants to implement + both her and syr operations. */ \ + PASTEMAC2(chr,ch,copys)( *alpha, alpha_local ); \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. 
*/ \ + if ( bli_is_lower( uploa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_is_upper( uploa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + uploa, \ + conjx, \ + conjh, /* used by variants to distinguish her from syr */ \ + m, \ + &alpha_local, \ + x, incx, \ + a, rs_a, cs_a, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNCR_BASIC4( her, her, BLIS_CONJUGATE, her_unb_var1, her_unb_var2 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, conjh, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* If x has zero elements, or if alpha is zero, return early. */ \ + if ( bli_zero_dim1( m ) || PASTEMAC(ch,eq0)( *alpha ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. 
*/ \ + if ( bli_is_lower( uploa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_is_upper( uploa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + uploa, \ + conjx, \ + conjh, /* used by variants to distinguish her2 from syr2 */ \ + m, \ + alpha, \ + x, incx, \ + a, rs_a, cs_a, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC4( syr, her, BLIS_NO_CONJUGATE, her_unb_var1, her_unb_var2 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, conjh, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* If x has zero elements, or if alpha is zero, return early. */ \ + if ( bli_zero_dim1( m ) || PASTEMAC(ch,eq0)( *alpha ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. 
*/ \ + if ( bli_is_lower( uploa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_is_upper( uploa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + uploa, \ + conjx, \ + conjy, \ + conjh, \ + m, \ + alpha, \ + x, incx, \ + y, incy, \ + a, rs_a, cs_a, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC4( her2, her2, BLIS_CONJUGATE, her2_unf_var1, her2_unf_var4 ) +INSERT_GENTFUNC_BASIC4( syr2, her2, BLIS_NO_CONJUGATE, her2_unf_var1, her2_unf_var4 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, ftname, rvarname, cvarname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + bli_cntx_init_local_if( opname, cntx, cntx_p ); \ +\ + /* If x has zero elements, return early. */ \ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* If alpha is zero, set x to zero and return early. */ \ + if ( PASTEMAC(ch,eq0)( *alpha ) ) \ + { \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + alpha, \ + x, incx, \ + cntx_p \ + ); \ + return; \ + } \ +\ + /* Declare a void function pointer for the current operation. */ \ + PASTECH2(ch,ftname,_ft) f; \ +\ + /* Choose the underlying implementation. 
*/ \ + if ( bli_does_notrans( transa ) ) \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,rvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,cvarname); \ + } \ + else /* if ( bli_does_trans( transa ) ) */ \ + { \ + if ( bli_is_row_stored( rs_a, cs_a ) ) f = PASTEMAC(ch,cvarname); \ + else /* column or general stored */ f = PASTEMAC(ch,rvarname); \ + } \ +\ + /* Invoke the variant chosen above, which loops over a level-1v or + level-1f kernel to implement the current operation. */ \ + f( \ + uploa, \ + transa, \ + diaga, \ + m, \ + alpha, \ + a, rs_a, cs_a, \ + x, incx, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + bli_cntx_finalize_local_if( opname, cntx ); \ +} + +INSERT_GENTFUNC_BASIC3( trmv, trmv, trmv_unf_var1, trmv_unf_var2 ) +INSERT_GENTFUNC_BASIC3( trsv, trmv, trsv_unf_var1, trsv_unf_var2 ) diff --git a/frame/2/bli_l2_tapi.h b/frame/2/bli_l2_tapi.h new file mode 100644 index 000000000..8c4575815 --- /dev/null +++ b/frame/2/bli_l2_tapi.h @@ -0,0 +1,170 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( gemv ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( ger ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( hemv ) +INSERT_GENTPROT_BASIC( symv ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void 
PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + dim_t m, \ + ctype_r* alpha, \ + ctype* x, inc_t incx, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( her ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syr ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( her2 ) +INSERT_GENTPROT_BASIC( syr2 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmv ) +INSERT_GENTPROT_BASIC( trsv ) diff --git a/frame/2/gemv/bli_gemv.h b/frame/2/gemv/bli_gemv.h index e91646a62..b7c39613c 100644 --- a/frame/2/gemv/bli_gemv.h +++ b/frame/2/gemv/bli_gemv.h @@ -33,72 +33,8 @@ */ #include "bli_gemv_cntl.h" -#include "bli_gemv_check.h" +#include "bli_gemv_front.h" #include "bli_gemv_int.h" -#include "bli_gemv_unb_var1.h" -#include "bli_gemv_unb_var2.h" - -#include "bli_gemv_unf_var1.h" -#include "bli_gemv_unf_var2.h" - -#include "bli_gemv_blk_var1.h" -#include "bli_gemv_blk_var2.h" - - -void bli_gemv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. 
-// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ); - -INSERT_GENTPROT_BASIC( gemv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( gemv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( gemv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( gemv ) -#endif +#include "bli_gemv_var.h" diff --git a/frame/2/gemv/bli_gemv_blk_var1.c b/frame/2/gemv/bli_gemv_blk_var1.c index 337ad8f01..8a06b528f 100644 --- a/frame/2/gemv/bli_gemv_blk_var1.c +++ b/frame/2/gemv/bli_gemv_blk_var1.c @@ -39,6 +39,7 @@ void bli_gemv_blk_var1( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ) { obj_t a1, a1_pack; @@ -60,7 +61,7 @@ void bli_gemv_blk_var1( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, m_trans, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and y1. bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -70,16 +71,16 @@ void bli_gemv_blk_var1( obj_t* alpha, // Initialize objects for packing A1 and y1 (if needed). bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y( cntl ) ); + cntx, cntl_sub_packv_y( cntl ) ); // Copy/pack A1, y1 (if needed). 
bli_packm_int( &a1, &a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y( cntl ) ); + cntx, cntl_sub_packv_y( cntl ) ); // y1 = beta * y1 + alpha * A1 * x; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -89,11 +90,12 @@ void bli_gemv_blk_var1( obj_t* alpha, x, beta, &y1_pack, + cntx, cntl_sub_gemv( cntl ) ); // Copy/unpack y1 (if y1 was packed). bli_unpackv_int( &y1_pack, &y1, - cntl_sub_unpackv_y( cntl ) ); + cntx, cntl_sub_unpackv_y( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/gemv/bli_gemv_blk_var2.c b/frame/2/gemv/bli_gemv_blk_var2.c index b1aed7f41..b3f9032e8 100644 --- a/frame/2/gemv/bli_gemv_blk_var2.c +++ b/frame/2/gemv/bli_gemv_blk_var2.c @@ -39,6 +39,7 @@ void bli_gemv_blk_var2( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ) { obj_t a1, a1_pack; @@ -58,14 +59,14 @@ void bli_gemv_blk_var2( obj_t* alpha, // y = beta * y; bli_scalv_int( beta, y, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition along the "k" dimension (n dimension of A). for ( i = 0; i < n_trans; i += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, n_trans, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and x1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -75,16 +76,16 @@ void bli_gemv_blk_var2( obj_t* alpha, // Initialize objects for packing A1 and x1 (if needed). bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x( cntl ) ); + cntx, cntl_sub_packv_x( cntl ) ); // Copy/pack A1, x1 (if needed). 
bli_packm_int( &a1, &a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x( cntl ) ); + cntx, cntl_sub_packv_x( cntl ) ); // y = y + alpha * A1 * x1; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -94,6 +95,7 @@ void bli_gemv_blk_var2( obj_t* alpha, &x1_pack, &BLIS_ONE, y, + cntx, cntl_sub_gemv( cntl ) ); } diff --git a/frame/2/gemv/bli_gemv_cntl.c b/frame/2/gemv/bli_gemv_cntl.c index 7b1ad8033..eabba1368 100644 --- a/frame/2/gemv/bli_gemv_cntl.c +++ b/frame/2/gemv/bli_gemv_cntl.c @@ -39,9 +39,6 @@ extern packm_t* packm_cntl; extern packv_t* packv_cntl; extern unpackv_t* unpackv_cntl; -blksz_t* gemv_mc; -blksz_t* gemv_nc; - gemv_t* gemv_cntl_bs_ke_dot; gemv_t* gemv_cntl_bs_ke_axpy; @@ -57,37 +54,22 @@ gemv_t* gemv_cntl_ge_axpy; void bli_gemv_cntl_init() { - // Create blocksize objects for each dimension. - gemv_mc - = - bli_blksz_obj_create( BLIS_DEFAULT_L2_MC_S, 0, - BLIS_DEFAULT_L2_MC_D, 0, - BLIS_DEFAULT_L2_MC_C, 0, - BLIS_DEFAULT_L2_MC_Z, 0 ); - gemv_nc - = - bli_blksz_obj_create( BLIS_DEFAULT_L2_NC_S, 0, - BLIS_DEFAULT_L2_NC_D, 0, - BLIS_DEFAULT_L2_NC_C, 0, - BLIS_DEFAULT_L2_NC_Z, 0 ); - - // Create control trees for the lowest-level kernels. These trees induce // operations on (persumably) relatively small block-subvector problems. 
gemv_cntl_bs_ke_dot = bli_gemv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, - NULL, NULL, NULL, - NULL ); + NULL, NULL, NULL ); gemv_cntl_bs_ke_axpy = bli_gemv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT2, + 0, NULL, NULL, NULL, - NULL, NULL, NULL, - NULL ); + NULL, NULL, NULL ); // Create control trees for problems with relatively small m dimension @@ -96,7 +78,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, scalv_cntl, // scale y up-front packm_cntl, // pack A1 (if needed) packv_cntl, // pack x1 (if needed) @@ -107,7 +89,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, scalv_cntl, // scale y up-front packm_cntl, // pack A1 (if needed) packv_cntl, // pack x1 (if needed) @@ -122,7 +104,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, NULL, // no scaling in blk_var1 packm_cntl, // pack A1 (if needed) NULL, // x is not partitioned in var1 @@ -133,7 +115,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, NULL, // no scaling in blk_var1 packm_cntl, // pack A1 (if needed) NULL, // x is not partitioned in var1 @@ -148,7 +130,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, NULL, // no scaling in blk_var1 NULL, // do not pack A1 NULL, // x is not partitioned in var1 @@ -159,7 +141,7 @@ void bli_gemv_cntl_init() = bli_gemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, NULL, // no scaling in blk_var1 NULL, // do not pack A1 NULL, // x is not partitioned in var1 @@ -170,9 +152,6 @@ void bli_gemv_cntl_init() void bli_gemv_cntl_finalize() { - bli_blksz_obj_free( gemv_mc ); - bli_blksz_obj_free( gemv_nc ); - bli_cntl_obj_free( gemv_cntl_bs_ke_dot ); bli_cntl_obj_free( gemv_cntl_bs_ke_axpy ); @@ -189,7 +168,7 @@ void 
bli_gemv_cntl_finalize() gemv_t* bli_gemv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a, packv_t* sub_packv_x, @@ -199,11 +178,11 @@ gemv_t* bli_gemv_cntl_obj_create( impl_t impl_type, { gemv_t* cntl; - cntl = ( gemv_t* ) bli_malloc( sizeof(gemv_t) ); + cntl = ( gemv_t* ) bli_malloc( sizeof(gemv_t) ); cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a = sub_packm_a; cntl->sub_packv_x = sub_packv_x; @@ -217,7 +196,7 @@ gemv_t* bli_gemv_cntl_obj_create( impl_t impl_type, void bli_gemv_cntl_obj_init( gemv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a, packv_t* sub_packv_x, @@ -227,7 +206,7 @@ void bli_gemv_cntl_obj_init( gemv_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a = sub_packm_a; cntl->sub_packv_x = sub_packv_x; diff --git a/frame/2/gemv/bli_gemv_cntl.h b/frame/2/gemv/bli_gemv_cntl.h index 1c6d09408..c0786d276 100644 --- a/frame/2/gemv/bli_gemv_cntl.h +++ b/frame/2/gemv/bli_gemv_cntl.h @@ -36,7 +36,7 @@ struct gemv_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct scalv_s* sub_scalv; struct packm_s* sub_packm_a; struct packv_s* sub_packv_x; @@ -58,7 +58,7 @@ void bli_gemv_cntl_init( void ); void bli_gemv_cntl_finalize( void ); gemv_t* bli_gemv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a, packv_t* sub_packv_x, @@ -68,7 +68,7 @@ gemv_t* bli_gemv_cntl_obj_create( impl_t impl_type, void bli_gemv_cntl_obj_init( gemv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a, packv_t* sub_packv_x, diff --git a/frame/2/gemv/bli_gemv.c b/frame/2/gemv/bli_gemv_front.c 
similarity index 79% rename from frame/2/gemv/bli_gemv.c rename to frame/2/gemv/bli_gemv_front.c index dec86fa40..e2dac2179 100644 --- a/frame/2/gemv/bli_gemv.c +++ b/frame/2/gemv/bli_gemv_front.c @@ -39,11 +39,15 @@ extern gemv_t* gemv_cntl_bs_ke_dot; extern gemv_t* gemv_cntl_ge_axpy; extern gemv_t* gemv_cntl_ge_dot; -void bli_gemv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ) +void bli_gemv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ) { gemv_t* gemv_cntl; num_t dt_targ_a; @@ -150,6 +154,7 @@ void bli_gemv( obj_t* alpha, x, &beta_local, y, + cntx, gemv_cntl ); } @@ -158,19 +163,21 @@ void bli_gemv( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. // #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -202,40 +209,9 @@ void PASTEMAC(ch,opname)( \ &ao, \ &xo, \ &betao, \ - &yo ); \ + &yo, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( gemv, gemv ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( gemv, gemv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( gemv, gemv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( gemv, gemv ) -#endif +INSERT_GENTFUNC_BASIC0( gemv_front ) diff --git a/frame/2/gemv/bli_gemv_front.h b/frame/2/gemv/bli_gemv_front.h new file mode 100644 index 000000000..5390839c1 --- /dev/null +++ b/frame/2/gemv/bli_gemv_front.h @@ -0,0 +1,63 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +void bli_gemv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( gemv_front ) + diff --git a/frame/2/gemv/bli_gemv_int.c b/frame/2/gemv/bli_gemv_int.c index 4c7ba0e5c..84caef7e4 100644 --- a/frame/2/gemv/bli_gemv_int.c +++ b/frame/2/gemv/bli_gemv_int.c @@ -41,6 +41,7 @@ typedef void (*FUNCPTR_T)( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ); static FUNCPTR_T vars[3][3] = @@ -58,6 +59,7 @@ void bli_gemv_int( trans_t transa, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ) { varnum_t n; @@ -73,7 +75,7 @@ void bli_gemv_int( trans_t transa, // Check parameters. We use the aliased copy of A so the transa parameter // is taken into account for dimension checking. if ( bli_error_checking_is_enabled() ) - bli_gemv_int_check( alpha, &a_local, &x_local, beta, y, cntl ); + bli_gemv_check( alpha, &a_local, &x_local, beta, y ); // If y has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *y ) ) return; @@ -98,6 +100,7 @@ void bli_gemv_int( trans_t transa, &x_local, beta, y, + cntx, cntl ); } diff --git a/frame/2/gemv/bli_gemv_int.h b/frame/2/gemv/bli_gemv_int.h index 13d2823cd..69415034c 100644 --- a/frame/2/gemv/bli_gemv_int.h +++ b/frame/2/gemv/bli_gemv_int.h @@ -32,11 +32,15 @@ */ -void bli_gemv_int( trans_t transa, - conj_t conjx, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - gemv_t* cntl ); +void bli_gemv_int + ( + trans_t transa, + conj_t conjx, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + gemv_t* cntl + ); diff --git a/frame/2/gemv/bli_gemv_unb_var1.c b/frame/2/gemv/bli_gemv_unb_var1.c index c4c63a3af..4b0c85a21 100644 --- a/frame/2/gemv/bli_gemv_unb_var1.c +++ b/frame/2/gemv/bli_gemv_unb_var1.c @@ -34,157 +34,65 @@ #include "blis.h" -#define FUNCPTR_T gemv_fp - -typedef void (*FUNCPTR_T)( - trans_t transa, - conj_t conjx, - dim_t m, - dim_t n, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unb_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unb_var1); -#endif -#endif - - -void bli_gemv_unb_var1( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - gemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t transa = bli_obj_conjtrans_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( transa, - conjx, - m, - n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_a* a1t; \ - ctype_x* x1; \ - ctype_y* psi1; \ - dim_t i; \ - dim_t n_elem, n_iter; \ - inc_t rs_at, cs_at; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* a1t; \ + ctype* x1; \ + ctype* psi1; \ + dim_t i; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ bli_set_dims_incs_with_trans( transa, \ m, n, rs_a, cs_a, \ n_iter, n_elem, rs_at, cs_at ); \ \ conja = bli_extract_conj( transa ); \ +\ + PASTECH(ch,dotxv_ft) kfp_dv; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ \ for ( i = 0; i < n_iter; ++i ) \ { \ - a1t = a_cast + (i )*rs_at + (0 )*cs_at; \ - x1 = x_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ + a1t = a + (i )*rs_at + (0 )*cs_at; \ + x1 = x + (0 )*incy; \ + psi1 = y + (i )*incy; \ \ /* psi1 = beta * psi1 + alpha * a1t * x1; */ \ - PASTEMAC3(cha,chx,chy,kername)( conja, \ - conjx, \ - n_elem, \ - alpha_cast, \ - a1t, cs_at, \ - x1, incx, \ - beta_cast, \ - psi1 ); \ + kfp_dv \ + ( \ + conja, \ + conjx, \ + n_elem, \ + alpha, \ + a1t, cs_at, \ + x1, incx, \ + beta, \ + psi1, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( gemv_unb_var1, DOTXV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( gemv_unb_var1, DOTXV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( gemv_unb_var1, DOTXV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( gemv_unb_var1 ) diff --git a/frame/2/gemv/bli_gemv_unb_var2.c b/frame/2/gemv/bli_gemv_unb_var2.c index 49620f55a..f14fc1bd6 100644 --- a/frame/2/gemv/bli_gemv_unb_var2.c +++ b/frame/2/gemv/bli_gemv_unb_var2.c @@ -34,125 +34,34 @@ #include "blis.h" -#define FUNCPTR_T gemv_fp - -typedef void (*FUNCPTR_T)( - trans_t transa, - conj_t conjx, - dim_t m, - dim_t n, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unb_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unb_var2); -#endif -#endif - - -void bli_gemv_unb_var2( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - gemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t transa = bli_obj_conjtrans_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( transa, - conjx, - m, - n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* a1; \ - ctype_x* chi1; \ - ctype_y* y1; \ - ctype_ax alpha_chi1; \ - dim_t i; \ - dim_t n_elem, n_iter; \ - inc_t rs_at, cs_at; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* a1; \ + ctype* chi1; \ + ctype* y1; \ + ctype alpha_chi1; \ + dim_t i; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ bli_set_dims_incs_with_trans( transa, \ m, n, rs_a, cs_a, \ @@ -161,49 +70,57 @@ void PASTEMAC3(cha,chx,chy,varname)( \ conja = bli_extract_conj( transa ); \ \ /* If beta is zero, use setv. Otherwise, scale by beta. 
*/ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( n_elem, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - n_elem, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < n_iter; ++i ) \ { \ - a1 = a_cast + (0 )*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - y1 = y_cast + (0 )*incy; \ + a1 = a + (0 )*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + y1 = y + (0 )*incy; \ \ /* y = y + alpha * chi1 * a1; */ \ - PASTEMAC2(chx,chax,copycjs)( conjx, *chi1, alpha_chi1 ); \ - PASTEMAC2(chax,chax,scals)( *alpha_cast, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, alpha_chi1 ); \ + PASTEMAC(ch,scals)( *alpha, alpha_chi1 ); \ \ - PASTEMAC3(chax,cha,chy,kername)( conja, \ - n_elem, \ - &alpha_chi1, \ - a1, rs_at, \ - y1, incy ); \ + kfp_av \ + ( \ + conja, \ + n_elem, \ + &alpha_chi1, \ + a1, rs_at, \ + y1, incy, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( gemv_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( gemv_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( gemv_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( gemv_unb_var2 ) diff --git a/frame/2/gemv/bli_gemv_unf_var1.c b/frame/2/gemv/bli_gemv_unf_var1.c index 47affb4be..87481ad3c 100644 --- a/frame/2/gemv/bli_gemv_unf_var1.c +++ b/frame/2/gemv/bli_gemv_unf_var1.c @@ -34,124 +34,33 @@ #include "blis.h" -#define FUNCPTR_T gemv_fp - -typedef void (*FUNCPTR_T)( - trans_t transa, - conj_t conjx, - dim_t m, - dim_t n, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unf_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unf_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unf_var1); -#endif -#endif - - -void bli_gemv_unf_var1( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - gemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t transa = bli_obj_conjtrans_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // 
The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( transa, - conjx, - m, - n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_a* A1; \ - ctype_x* x1; \ - ctype_y* y1; \ - dim_t i; \ - dim_t b_fuse, f; \ - dim_t n_elem, n_iter; \ - inc_t rs_at, cs_at; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* A1; \ + ctype* x1; \ + ctype* y1; \ + dim_t i; \ + dim_t b_fuse, f; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ bli_set_dims_incs_with_trans( transa, \ m, n, rs_a, cs_a, \ @@ -159,40 +68,37 @@ void PASTEMAC3(cha,chx,chy,varname)( \ \ conja = bli_extract_conj( transa ); \ 
\ - /* Query the fusing factor for the dotxf implementation. */ \ - b_fuse = PASTEMAC(chax,dotxf_fusefac); \ + PASTECH(ch,dotxf_ft) kfp_df; \ +\ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_df = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_DF, cntx ); \ \ for ( i = 0; i < n_iter; i += f ) \ { \ f = bli_determine_blocksize_dim_f( i, n_iter, b_fuse ); \ \ - A1 = a_cast + (i )*rs_at + (0 )*cs_at; \ - x1 = x_cast + (0 )*incy; \ - y1 = y_cast + (i )*incy; \ + A1 = a + (i )*rs_at + (0 )*cs_at; \ + x1 = x + (0 )*incy; \ + y1 = y + (i )*incy; \ \ /* y1 = beta * y1 + alpha * A1 * x; */ \ - PASTEMAC3(cha,chx,chy,kername)( conja, \ - conjx, \ - n_elem, \ - f, \ - alpha_cast, \ - A1, cs_at, rs_at, \ - x1, incx, \ - beta_cast, \ - y1, incy ); \ + kfp_df \ + ( \ + conja, \ + conjx, \ + n_elem, \ + f, \ + alpha, \ + A1, cs_at, rs_at, \ + x1, incx, \ + beta, \ + y1, incy, \ + cntx \ + ); \ \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( gemv_unf_var1, DOTXF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( gemv_unf_var1, DOTXF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( gemv_unf_var1, DOTXF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( gemv_unf_var1 ) diff --git a/frame/2/gemv/bli_gemv_unf_var2.c b/frame/2/gemv/bli_gemv_unf_var2.c index 1390b6c36..9228aabaa 100644 --- a/frame/2/gemv/bli_gemv_unf_var2.c +++ b/frame/2/gemv/bli_gemv_unf_var2.c @@ -34,125 +34,34 @@ #include "blis.h" -#define FUNCPTR_T gemv_fp - -typedef void (*FUNCPTR_T)( - trans_t transa, - conj_t conjx, - dim_t m, - dim_t n, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unf_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unf_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unf_var2); -#endif -#endif - - -void bli_gemv_unf_var2( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - gemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - conj_t transa = bli_obj_conjtrans_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // 
The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( transa, - conjx, - m, - n, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - trans_t transa, \ - conj_t conjx, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* A1; \ - ctype_x* x1; \ - ctype_y* y1; \ - dim_t i; \ - dim_t b_fuse, f; \ - dim_t n_elem, n_iter; \ - inc_t rs_at, cs_at; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* A1; \ + ctype* x1; \ + ctype* y1; \ + dim_t i; \ + dim_t b_fuse, f; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ bli_set_dims_incs_with_trans( transa, \ m, n, rs_a, cs_a, \ @@ -161,54 +70,60 @@ void 
PASTEMAC3(cha,chx,chy,varname)( \ conja = bli_extract_conj( transa ); \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( n_elem, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - n_elem, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ \ - /* Query the fusing factor for the axpyf implementation. */ \ - b_fuse = PASTEMAC(chax,axpyf_fusefac); \ + PASTECH(ch,axpyf_ft) kfp_af; \ +\ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_af = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPYF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_AF, cntx ); \ \ for ( i = 0; i < n_iter; i += f ) \ { \ f = bli_determine_blocksize_dim_f( i, n_iter, b_fuse ); \ \ - A1 = a_cast + (0 )*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - y1 = y_cast + (0 )*incy; \ + A1 = a + (0 )*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + y1 = y + (0 )*incy; \ \ /* y = y + alpha * A1 * x1; */ \ - PASTEMAC3(cha,chx,chy,kername)( conja, \ - conjx, \ - n_elem, \ - f, \ - alpha_cast, \ - A1, rs_at, cs_at, \ - x1, incx, \ - y1, incy ); \ + kfp_af \ + ( \ + conja, \ + conjx, \ + n_elem, \ + f, \ + alpha, \ + A1, rs_at, cs_at, \ + x1, incx, \ + y1, incy, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( gemv_unf_var2, AXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( gemv_unf_var2, AXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( gemv_unf_var2, AXPYF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( gemv_unf_var2 ) diff --git a/frame/2/gemv/bli_gemv_var.h b/frame/2/gemv/bli_gemv_var.h new file mode 100644 index 000000000..9dd3f5d71 --- /dev/null +++ b/frame/2/gemv/bli_gemv_var.h @@ -0,0 +1,90 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + cntx_t* cntx, \ + gemv_t* cntl \ + ); + +GENPROT( gemv_blk_var1 ) +GENPROT( gemv_blk_var2 ) + +GENPROT( gemv_unb_var1 ) +GENPROT( gemv_unb_var2 ) + +GENPROT( gemv_unf_var1 ) +GENPROT( gemv_unf_var2 ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( gemv_unb_var1 ) +INSERT_GENTPROT_BASIC( gemv_unb_var2 ) + +INSERT_GENTPROT_BASIC( gemv_unf_var1 ) +INSERT_GENTPROT_BASIC( gemv_unf_var2 ) + diff --git a/frame/2/gemv/bli_gemv_var_oapi.c b/frame/2/gemv/bli_gemv_var_oapi.c new file mode 100644 index 000000000..6d27452c2 --- /dev/null +++ b/frame/2/gemv/bli_gemv_var_oapi.c @@ -0,0 +1,95 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + cntx_t* cntx, \ + gemv_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *a ); \ +\ + trans_t transa = bli_obj_conjtrans_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ +\ + dim_t m = bli_obj_length( *a ); \ + dim_t n = bli_obj_width( *a ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ + void* buf_beta = bli_obj_buffer_for_1x1( dt, *beta ); \ +\ + /* Invoke the void pointer-based function for the given datatype. */ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + transa, \ + conjx, \ + m, \ + n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + buf_beta, \ + buf_y, incy, \ + cntx \ + ); \ +} \ + +GENFRONT( gemv_unb_var1 ) +GENFRONT( gemv_unb_var2 ) + +GENFRONT( gemv_unf_var1 ) +GENFRONT( gemv_unf_var2 ) + diff --git a/frame/2/gemv/bli_gemv_var_oapi.c.prev b/frame/2/gemv/bli_gemv_var_oapi.c.prev new file mode 100644 index 000000000..771cfbf12 --- /dev/null +++ b/frame/2/gemv/bli_gemv_var_oapi.c.prev @@ -0,0 +1,97 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( ftname, opname ) \ +\ +/*static gemv_vft GENARRAY(ftypes,gemv_unb_var1);*/ \ +static GENARRAY_VFP(ftname,opname); \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + cntx_t* cntx, \ + gemv_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *a ); \ +\ + trans_t transa = bli_obj_conjtrans_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ +\ + dim_t m = bli_obj_length( *a ); \ + dim_t n = bli_obj_width( *a ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ + void* buf_beta = bli_obj_buffer_for_1x1( dt, *beta ); \ +\ + PASTECH(ftname,_vft) f = PASTECH(opname,_vfp)[dt]; \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + f( \ + transa, \ + conjx, \ + m, \ + n, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + buf_beta, \ + buf_y, incy, \ + cntx \ + ); \ +} \ + +GENFRONT( gemv, gemv_unb_var1 ) +GENFRONT( gemv, gemv_unb_var2 ) + +GENFRONT( gemv, gemv_unf_var1 ) +GENFRONT( gemv, gemv_unf_var2 ) + diff --git a/frame/2/gemv/bli_gemv_blk_var1.h b/frame/2/gemv/old/bli_gemv_blk_var1.h similarity index 98% rename from frame/2/gemv/bli_gemv_blk_var1.h rename to frame/2/gemv/old/bli_gemv_blk_var1.h index 6308ea262..34914a199 100644 --- a/frame/2/gemv/bli_gemv_blk_var1.h +++ b/frame/2/gemv/old/bli_gemv_blk_var1.h @@ -37,5 +37,6 @@ void bli_gemv_blk_var1( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ); diff --git a/frame/2/gemv/bli_gemv_blk_var2.h b/frame/2/gemv/old/bli_gemv_blk_var2.h similarity index 98% rename from frame/2/gemv/bli_gemv_blk_var2.h rename to frame/2/gemv/old/bli_gemv_blk_var2.h index 4034c3593..7b68d77ea 100644 --- a/frame/2/gemv/bli_gemv_blk_var2.h +++ b/frame/2/gemv/old/bli_gemv_blk_var2.h @@ -37,5 +37,6 @@ void bli_gemv_blk_var2( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ); diff --git a/frame/2/gemv/bli_gemv_check.c b/frame/2/gemv/old/bli_gemv_check.c similarity index 99% rename from frame/2/gemv/bli_gemv_check.c rename to frame/2/gemv/old/bli_gemv_check.c index d6c8852f5..49834320a 100644 --- a/frame/2/gemv/bli_gemv_check.c +++ b/frame/2/gemv/old/bli_gemv_check.c @@ -106,6 +106,7 @@ void bli_gemv_int_check( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, gemv_t* cntl ) { err_t e_val; diff --git a/frame/2/gemv/bli_gemv_check.h b/frame/2/gemv/old/bli_gemv_check.h similarity index 98% rename from frame/2/gemv/bli_gemv_check.h rename to frame/2/gemv/old/bli_gemv_check.h index d256e8d9e..7b0d2938e 100644 --- a/frame/2/gemv/bli_gemv_check.h +++ b/frame/2/gemv/old/bli_gemv_check.h @@ -49,5 +49,6 @@ void bli_gemv_int_check( obj_t* alpha, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, 
gemv_t* cntl ); diff --git a/frame/ind/cntl/bli_ind_cntl_init.c b/frame/2/gemv/old/bli_gemv_cntx.c similarity index 65% rename from frame/ind/cntl/bli_ind_cntl_init.c rename to frame/2/gemv/old/bli_gemv_cntx.c index 5e244e52d..e43579262 100644 --- a/frame/ind/cntl/bli_ind_cntl_init.c +++ b/frame/2/gemv/old/bli_gemv_cntx.c @@ -34,55 +34,32 @@ #include "blis.h" -void bli_ind_cntl_init( void ) +void bli_gemv_cntx_init( cntx_t* cntx ) { - // Level-3 via 3mh - bli_gemm3mh_cntl_init(); + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); - // Level-3 via 3m3 - bli_gemm3m3_cntl_init(); + // Initialize the context with kernels for the current architecture. + bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_DOTXV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx ); - // Level-3 via 3m2 - bli_gemm3m2_cntl_init(); + bli_gks_cntx_set_l1f_ker( BLIS_AXPYF_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTXF_KER, cntx ); - // Level-3 via 3m1 - bli_gemm3m1_cntl_init(); - bli_trsm3m1_cntl_init(); - - // Level-3 via 4mh - bli_gemm4mh_cntl_init(); - - // Level-3 via 4mb - bli_gemm4mb_cntl_init(); - - // Level-3 via 4m1 - bli_gemm4m1_cntl_init(); - bli_trsm4m1_cntl_init(); + // Set the register and cache blocksizes and multiples, as well + // as the execution method. 
+ bli_gks_cntx_set_blkszs( BLIS_NAT, 4, + BLIS_N2, BLIS_N2, + BLIS_M2, BLIS_M2, + BLIS_AF, BLIS_AF, + BLIS_DF, BLIS_DF, + cntx ); } -void bli_ind_cntl_finalize( void ) +void bli_gemv_cntx_finalize( cntx_t* cntx ) { - // Level-3 via 3mh - bli_gemm3mh_cntl_finalize(); - - // Level-3 via 3m3 - bli_gemm3m3_cntl_finalize(); - - // Level-3 via 3m2 - bli_gemm3m2_cntl_finalize(); - - // Level-3 via 3m1 - bli_gemm3m1_cntl_finalize(); - bli_trsm3m1_cntl_finalize(); - - // Level-3 via 4mh - bli_gemm4mh_cntl_finalize(); - - // Level-3 via 4mb - bli_gemm4mb_cntl_finalize(); - - // Level-3 via 4m1 - bli_gemm4m1_cntl_finalize(); - bli_trsm4m1_cntl_finalize(); + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); } diff --git a/frame/3/gemm/other/bli_gemm_cntl_exp.h b/frame/2/gemv/old/bli_gemv_cntx.h similarity index 95% rename from frame/3/gemm/other/bli_gemm_cntl_exp.h rename to frame/2/gemv/old/bli_gemv_cntx.h index eaab8b8ba..7041c5439 100644 --- a/frame/3/gemm/other/bli_gemm_cntl_exp.h +++ b/frame/2/gemv/old/bli_gemv_cntx.h @@ -32,6 +32,6 @@ */ -void bli_gemm_cntl_init_exp( void ); -void bli_gemm_cntl_finalize_exp( void ); +void bli_gemv_cntx_init( void ); +void bli_gemv_cntx_finalize( void ); diff --git a/frame/1f/dotxf/bli_dotxf_ref.c b/frame/2/gemv/old/bli_gemv_unb_var1.c similarity index 60% rename from frame/1f/dotxf/bli_dotxf_ref.c rename to frame/2/gemv/old/bli_gemv_unb_var1.c index c57b2db14..241b0752a 100644 --- a/frame/1f/dotxf/bli_dotxf_ref.c +++ b/frame/2/gemv/old/bli_gemv_unb_var1.c @@ -34,59 +34,59 @@ #include "blis.h" -/* -#define FUNCPTR_T dotxf_fp +#define FUNCPTR_T gemv_fp typedef void (*FUNCPTR_T)( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, inc_t lda, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy + trans_t transa, + conj_t conjx, + dim_t m, + dim_t n, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy ); // 
If some mixed datatype functions will not be compiled, we initialize // the corresponding elements of the function array to NULL. #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxf_ref); +static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unb_var1); #else #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxf_ref); +static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unb_var1); #else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxf_ref); +static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unb_var1); #endif #endif -void bli_dotxf_ref( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ) +void bli_gemv_unb_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + gemv_t* cntl ) { num_t dt_a = bli_obj_datatype( *a ); num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); - conj_t conjat = bli_obj_conj_status( *a ); + conj_t transa = bli_obj_conjtrans_status( *a ); conj_t conjx = bli_obj_conj_status( *x ); - dim_t m = bli_obj_vector_dim( *x ); - dim_t b_n = bli_obj_vector_dim( *y ); + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); void* buf_a = bli_obj_buffer_at_off( *a ); inc_t rs_a = bli_obj_row_stride( *a ); inc_t cs_a = bli_obj_col_stride( *a ); - inc_t inc_x = bli_obj_vector_inc( *x ); void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); - inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); num_t dt_alpha; void* buf_alpha; @@ -110,57 +110,69 @@ void bli_dotxf_ref( obj_t* alpha, f = ftypes[dt_a][dt_x][dt_y]; // Invoke the function. 
- f( conjat, + f( transa, conjx, m, - b_n, + n, buf_alpha, buf_a, rs_a, cs_a, - buf_x, inc_x, + buf_x, incx, buf_beta, - buf_y, inc_y ); + buf_y, incy ); } -*/ #undef GENTFUNC3U12 #define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ \ -void PASTEMAC3(cha,chx,chy,varname) \ - ( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict beta, \ - ctype_y* restrict y, inc_t incy \ - ) \ +void PASTEMAC3(cha,chx,chy,varname)( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ) \ { \ + const num_t dt = PASTEMAC(ch,type); \ +\ ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ ctype_a* a_cast = a; \ ctype_x* x_cast = x; \ - ctype_y* beta_cast = beta; \ ctype_y* y_cast = y; \ - ctype_a* a1; \ + ctype_a* a1t; \ ctype_x* x1; \ ctype_y* psi1; \ dim_t i; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ - for ( i = 0; i < b_n; ++i ) \ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + bli_set_dims_incs_with_trans( transa, \ + m, n, rs_a, cs_a, \ + n_iter, n_elem, rs_at, cs_at ); \ +\ + conja = bli_extract_conj( transa ); \ +\ + for ( i = 0; i < n_iter; ++i ) \ { \ - a1 = a_cast + (0 )*inca + (i )*lda; \ - x1 = x_cast + (0 )*incx; \ + a1t = a_cast + (i )*rs_at + (0 )*cs_at; \ + x1 = x_cast + (0 )*incy; \ psi1 = y_cast + (i )*incy; \ \ - PASTEMAC3(cha,chx,chy,kername)( conjat, \ + /* psi1 = beta * psi1 + alpha * a1t * x1; */ \ + PASTEMAC3(cha,chx,chy,kername)( conja, \ conjx, \ - m, \ + n_elem, \ alpha_cast, \ - a1, inca, \ - x1, incx, \ + a1t, cs_at, \ + x1, incx, \ beta_cast, \ psi1 ); \ } \ @@ -168,13 +180,13 @@ void PASTEMAC3(cha,chx,chy,varname) \ // Define the basic set of functions unconditionally, and then also some // 
mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxf_ref, DOTXV_KERNEL ) +INSERT_GENTFUNC3U12_BASIC( gemv_unb_var1, DOTXV_KERNEL ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxf_ref, DOTXV_KERNEL ) +INSERT_GENTFUNC3U12_MIX_D( gemv_unb_var1, DOTXV_KERNEL ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxf_ref, DOTXV_KERNEL ) +INSERT_GENTFUNC3U12_MIX_P( gemv_unb_var1, DOTXV_KERNEL ) #endif diff --git a/frame/2/gemv/bli_gemv_unb_var1.h b/frame/2/gemv/old/bli_gemv_unb_var1.h similarity index 100% rename from frame/2/gemv/bli_gemv_unb_var1.h rename to frame/2/gemv/old/bli_gemv_unb_var1.h diff --git a/frame/1f/axpyf/bli_axpyf_ref.c b/frame/2/gemv/old/bli_gemv_unb_var2.c similarity index 54% rename from frame/1f/axpyf/bli_axpyf_ref.c rename to frame/2/gemv/old/bli_gemv_unb_var2.c index fb0558601..418ba1668 100644 --- a/frame/1f/axpyf/bli_axpyf_ref.c +++ b/frame/2/gemv/old/bli_gemv_unb_var2.c @@ -34,61 +34,66 @@ #include "blis.h" -/* -#define FUNCPTR_T axpyf_fp +#define FUNCPTR_T gemv_fp typedef void (*FUNCPTR_T)( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, inc_t lda, - void* x, inc_t incx, - void* y, inc_t incy + trans_t transa, + conj_t conjx, + dim_t m, + dim_t n, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy ); // If some mixed datatype functions will not be compiled, we initialize // the corresponding elements of the function array to NULL. 
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpyf_ref); +static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unb_var2); #else #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpyf_ref); +static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unb_var2); #else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpyf_ref); +static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unb_var2); #endif #endif -void bli_axpyf_ref( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* y ) +void bli_gemv_unb_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + gemv_t* cntl ) { num_t dt_a = bli_obj_datatype( *a ); num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); - conj_t conja = bli_obj_conj_status( *a ); + conj_t transa = bli_obj_conjtrans_status( *a ); conj_t conjx = bli_obj_conj_status( *x ); - dim_t m = bli_obj_vector_dim( *y ); - dim_t b_n = bli_obj_vector_dim( *x ); + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); void* buf_a = bli_obj_buffer_at_off( *a ); inc_t rs_a = bli_obj_row_stride( *a ); inc_t cs_a = bli_obj_col_stride( *a ); - inc_t inc_x = bli_obj_vector_inc( *x ); void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); - inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); num_t dt_alpha; void* buf_alpha; + num_t dt_beta; + void* buf_beta; + FUNCPTR_T f; // The datatype of alpha MUST be the type union of a and x. This is to @@ -96,74 +101,111 @@ void bli_axpyf_ref( obj_t* alpha, dt_alpha = bli_datatype_union( dt_a, dt_x ); buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + // The datatype of beta MUST be the same as the datatype of y. + dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + // Index into the type combination array to extract the correct // function pointer. f = ftypes[dt_a][dt_x][dt_y]; // Invoke the function. 
- f( conja, + f( transa, conjx, m, - b_n, + n, buf_alpha, buf_a, rs_a, cs_a, - buf_x, inc_x, - buf_y, inc_y ); + buf_x, incx, + buf_beta, + buf_y, incy ); } -*/ #undef GENTFUNC3U12 #define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ \ -void PASTEMAC3(cha,chx,chy,varname) \ - ( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ) \ +void PASTEMAC3(cha,chx,chy,varname)( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ) \ { \ + const num_t dt = PASTEMAC(ch,type); \ +\ ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ ctype_a* a_cast = a; \ ctype_x* x_cast = x; \ ctype_y* y_cast = y; \ + ctype_y* zero = PASTEMAC(chy,0); \ ctype_a* a1; \ ctype_x* chi1; \ ctype_y* y1; \ ctype_ax alpha_chi1; \ dim_t i; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ \ - for ( i = 0; i < b_n; ++i ) \ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + bli_set_dims_incs_with_trans( transa, \ + m, n, rs_a, cs_a, \ + n_elem, n_iter, rs_at, cs_at ); \ +\ + conja = bli_extract_conj( transa ); \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
*/ \ + if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ { \ - a1 = a_cast + (0 )*inca + (i )*lda; \ + /* y = 0; */ \ + PASTEMAC2(chy,chy,setv)( n_elem, \ + zero, \ + y_cast, incy ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ + n_elem, \ + beta_cast, \ + y_cast, incy ); \ + } \ +\ + for ( i = 0; i < n_iter; ++i ) \ + { \ + a1 = a_cast + (0 )*rs_at + (i )*cs_at; \ chi1 = x_cast + (i )*incx; \ y1 = y_cast + (0 )*incy; \ \ + /* y = y + alpha * chi1 * a1; */ \ PASTEMAC2(chx,chax,copycjs)( conjx, *chi1, alpha_chi1 ); \ PASTEMAC2(chax,chax,scals)( *alpha_cast, alpha_chi1 ); \ \ PASTEMAC3(chax,cha,chy,kername)( conja, \ - m, \ + n_elem, \ &alpha_chi1, \ - a1, inca, \ + a1, rs_at, \ y1, incy ); \ } \ } // Define the basic set of functions unconditionally, and then also some // mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( axpyf_ref, AXPYV_KERNEL ) +INSERT_GENTFUNC3U12_BASIC( gemv_unb_var2, AXPYV_KERNEL ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpyf_ref, AXPYV_KERNEL ) +INSERT_GENTFUNC3U12_MIX_D( gemv_unb_var2, AXPYV_KERNEL ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpyf_ref, AXPYV_KERNEL ) +INSERT_GENTFUNC3U12_MIX_P( gemv_unb_var2, AXPYV_KERNEL ) #endif diff --git a/frame/2/gemv/bli_gemv_unb_var2.h b/frame/2/gemv/old/bli_gemv_unb_var2.h similarity index 100% rename from frame/2/gemv/bli_gemv_unb_var2.h rename to frame/2/gemv/old/bli_gemv_unb_var2.h diff --git a/frame/1f/dotxf/bli_dotxf_kernel.c b/frame/2/gemv/old/bli_gemv_unf_var1.c similarity index 52% rename from frame/1f/dotxf/bli_dotxf_kernel.c rename to frame/2/gemv/old/bli_gemv_unf_var1.c index 3be191305..4c3552fc0 100644 --- a/frame/1f/dotxf/bli_dotxf_kernel.c +++ b/frame/2/gemv/old/bli_gemv_unf_var1.c @@ -34,58 +34,59 @@ #include "blis.h" -#define FUNCPTR_T dotxf_fp +#define FUNCPTR_T gemv_fp typedef void (*FUNCPTR_T)( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - void* 
alpha, - void* a, inc_t inca, inc_t lda, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy + trans_t transa, + conj_t conjx, + dim_t m, + dim_t n, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy ); // If some mixed datatype functions will not be compiled, we initialize // the corresponding elements of the function array to NULL. #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,dotxf_kernel_void); +static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unf_var1); #else #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,dotxf_kernel_void); +static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unf_var1); #else -static FUNCPTR_T GENARRAY3_MIN(ftypes,dotxf_kernel_void); +static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unf_var1); #endif #endif -void bli_dotxf_kernel( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ) +void bli_gemv_unf_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + gemv_t* cntl ) { num_t dt_a = bli_obj_datatype( *a ); num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); - conj_t conjat = bli_obj_conj_status( *a ); + conj_t transa = bli_obj_conjtrans_status( *a ); conj_t conjx = bli_obj_conj_status( *x ); - dim_t m = bli_obj_vector_dim( *x ); - dim_t b_n = bli_obj_vector_dim( *y ); + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); void* buf_a = bli_obj_buffer_at_off( *a ); inc_t rs_a = bli_obj_row_stride( *a ); inc_t cs_a = bli_obj_col_stride( *a ); - inc_t inc_x = bli_obj_vector_inc( *x ); void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); - inc_t inc_y = bli_obj_vector_inc( *y ); void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); num_t dt_alpha; void* buf_alpha; @@ -109,15 +110,15 @@ void bli_dotxf_kernel( obj_t* alpha, f = ftypes[dt_a][dt_x][dt_y]; // Invoke the function. 
- f( conjat, + f( transa, conjx, m, - b_n, + n, buf_alpha, buf_a, rs_a, cs_a, - buf_x, inc_x, + buf_x, incx, buf_beta, - buf_y, inc_y ); + buf_y, incy ); } @@ -125,37 +126,75 @@ void bli_dotxf_kernel( obj_t* alpha, #define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ \ void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ) \ { \ - PASTEMAC3(cha,chx,chy,kername)( conjat, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - beta, \ - y, incy ); \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_a* A1; \ + ctype_x* x1; \ + ctype_y* y1; \ + dim_t i; \ + dim_t b_fuse, f; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + bli_set_dims_incs_with_trans( transa, \ + m, n, rs_a, cs_a, \ + n_iter, n_elem, rs_at, cs_at ); \ +\ + conja = bli_extract_conj( transa ); \ +\ + /* Query the fusing factor for the dotxf implementation. 
*/ \ + b_fuse = PASTEMAC(chax,dotxf_fusefac); \ +\ + for ( i = 0; i < n_iter; i += f ) \ + { \ + f = bli_determine_blocksize_dim_f( i, n_iter, b_fuse ); \ +\ + A1 = a_cast + (i )*rs_at + (0 )*cs_at; \ + x1 = x_cast + (0 )*incy; \ + y1 = y_cast + (i )*incy; \ +\ + /* y1 = beta * y1 + alpha * A1 * x; */ \ + PASTEMAC3(cha,chx,chy,kername)( conja, \ + conjx, \ + n_elem, \ + f, \ + alpha_cast, \ + A1, cs_at, rs_at, \ + x1, incx, \ + beta_cast, \ + y1, incy ); \ +\ + } \ } // Define the basic set of functions unconditionally, and then also some // mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( dotxf_kernel_void, DOTXF_KERNEL ) +INSERT_GENTFUNC3U12_BASIC( gemv_unf_var1, DOTXF_KERNEL ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( dotxf_kernel_void, DOTXF_KERNEL ) +INSERT_GENTFUNC3U12_MIX_D( gemv_unf_var1, DOTXF_KERNEL ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( dotxf_kernel_void, DOTXF_KERNEL ) +INSERT_GENTFUNC3U12_MIX_P( gemv_unf_var1, DOTXF_KERNEL ) #endif diff --git a/frame/2/gemv/bli_gemv_unf_var1.h b/frame/2/gemv/old/bli_gemv_unf_var1.h similarity index 100% rename from frame/2/gemv/bli_gemv_unf_var1.h rename to frame/2/gemv/old/bli_gemv_unf_var1.h diff --git a/frame/2/gemv/old/bli_gemv_unf_var2.c b/frame/2/gemv/old/bli_gemv_unf_var2.c new file mode 100644 index 000000000..53eca9fb1 --- /dev/null +++ b/frame/2/gemv/old/bli_gemv_unf_var2.c @@ -0,0 +1,216 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T gemv_fp + +typedef void (*FUNCPTR_T)( + trans_t transa, + conj_t conjx, + dim_t m, + dim_t n, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx, + void* beta, + void* y, inc_t incy + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,gemv_unf_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,gemv_unf_var2); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,gemv_unf_var2); +#endif +#endif + + +void bli_gemv_unf_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + gemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + conj_t transa = bli_obj_conjtrans_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. + dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x][dt_y]; + + // Invoke the function. 
+ f( transa, + conjx, + m, + n, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC3(cha,chx,chy,varname)( \ + trans_t transa, \ + conj_t conjx, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* A1; \ + ctype_x* x1; \ + ctype_y* y1; \ + dim_t i; \ + dim_t b_fuse, f; \ + dim_t n_elem, n_iter; \ + inc_t rs_at, cs_at; \ + conj_t conja; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + bli_set_dims_incs_with_trans( transa, \ + m, n, rs_a, cs_a, \ + n_elem, n_iter, rs_at, cs_at ); \ +\ + conja = bli_extract_conj( transa ); \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. */ \ + if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC2(chy,chy,setv)( n_elem, \ + zero, \ + y_cast, incy ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ + n_elem, \ + beta_cast, \ + y_cast, incy ); \ + } \ +\ + /* Query the fusing factor for the axpyf implementation. 
*/ \ + b_fuse = PASTEMAC(chax,axpyf_fusefac); \ +\ + for ( i = 0; i < n_iter; i += f ) \ + { \ + f = bli_determine_blocksize_dim_f( i, n_iter, b_fuse ); \ +\ + A1 = a_cast + (0 )*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + y1 = y_cast + (0 )*incy; \ +\ + /* y = y + alpha * A1 * x1; */ \ + PASTEMAC3(cha,chx,chy,kername)( conja, \ + conjx, \ + n_elem, \ + f, \ + alpha_cast, \ + A1, rs_at, cs_at, \ + x1, incx, \ + y1, incy ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( gemv_unf_var2, AXPYF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( gemv_unf_var2, AXPYF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( gemv_unf_var2, AXPYF_KERNEL ) +#endif + diff --git a/frame/2/gemv/bli_gemv_unf_var2.h b/frame/2/gemv/old/bli_gemv_unf_var2.h similarity index 100% rename from frame/2/gemv/bli_gemv_unf_var2.h rename to frame/2/gemv/old/bli_gemv_unf_var2.h diff --git a/frame/2/ger/bli_ger.h b/frame/2/ger/bli_ger.h index b016e6c57..dc6f9e3f9 100644 --- a/frame/2/ger/bli_ger.h +++ b/frame/2/ger/bli_ger.h @@ -33,66 +33,7 @@ */ #include "bli_ger_cntl.h" -#include "bli_ger_check.h" +#include "bli_ger_front.h" #include "bli_ger_int.h" -#include "bli_ger_unb_var1.h" -#include "bli_ger_unb_var2.h" - -#include "bli_ger_blk_var1.h" -#include "bli_ger_blk_var2.h" - - -void bli_ger( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* a ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* a, inc_t rs_a, inc_t cs_a \ - ); - -INSERT_GENTPROT_BASIC( ger ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_a, ctype_xy, chx, chy, cha, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,cha,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_a* a, inc_t rs_a, inc_t cs_a \ - ); - -INSERT_GENTPROT3U12_BASIC( ger ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( ger ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( ger ) -#endif - +#include "bli_ger_var.h" diff --git a/frame/2/ger/bli_ger_blk_var1.c b/frame/2/ger/bli_ger_blk_var1.c index b52647928..c3fee9f51 100644 --- a/frame/2/ger/bli_ger_blk_var1.c +++ b/frame/2/ger/bli_ger_blk_var1.c @@ -38,6 +38,7 @@ void bli_ger_blk_var1( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ) { obj_t a1, a1_pack; @@ -59,7 +60,7 @@ void bli_ger_blk_var1( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, m_trans, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and x1. bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -69,16 +70,16 @@ void bli_ger_blk_var1( obj_t* alpha, // Initialize objects for packing A1 and x1 (if needed). bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x( cntl ) ); + cntx, cntl_sub_packv_x( cntl ) ); // Copy/pack A1, x1 (if needed). bli_packm_int( &a1, &a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x( cntl ) ); + cntx, cntl_sub_packv_x( cntl ) ); // A1 = A1 + alpha * x1 * y; bli_ger_int( BLIS_NO_CONJUGATE, @@ -87,11 +88,12 @@ void bli_ger_blk_var1( obj_t* alpha, &x1_pack, y, &a1_pack, + cntx, cntl_sub_ger( cntl ) ); // Copy/unpack A1 (if A1 was packed). 
bli_unpackm_int( &a1_pack, &a1, - cntl_sub_unpackm_a( cntl ), + cntx, cntl_sub_unpackm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/ger/bli_ger_blk_var2.c b/frame/2/ger/bli_ger_blk_var2.c index 3be4418a6..5bb5d5407 100644 --- a/frame/2/ger/bli_ger_blk_var2.c +++ b/frame/2/ger/bli_ger_blk_var2.c @@ -38,6 +38,7 @@ void bli_ger_blk_var2( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ) { obj_t a1, a1_pack; @@ -59,7 +60,7 @@ void bli_ger_blk_var2( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, n_trans, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and y1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -69,16 +70,16 @@ void bli_ger_blk_var2( obj_t* alpha, // Initialize objects for packing A1 and y1 (if needed). bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y( cntl ) ); + cntx, cntl_sub_packv_y( cntl ) ); // Copy/pack A1, y1 (if needed). bli_packm_int( &a1, &a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y( cntl ) ); + cntx, cntl_sub_packv_y( cntl ) ); // A1 = A1 + alpha * x * y1; bli_ger_int( BLIS_NO_CONJUGATE, @@ -87,11 +88,12 @@ void bli_ger_blk_var2( obj_t* alpha, x, &y1_pack, &a1_pack, + cntx, cntl_sub_ger( cntl ) ); // Copy/unpack A1 (if A1 was packed). 
bli_unpackm_int( &a1_pack, &a1, - cntl_sub_unpackm_a( cntl ), + cntx, cntl_sub_unpackm_a( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/ger/bli_ger_cntl.c b/frame/2/ger/bli_ger_cntl.c index 4e0921c49..5eeebfe24 100644 --- a/frame/2/ger/bli_ger_cntl.c +++ b/frame/2/ger/bli_ger_cntl.c @@ -38,9 +38,6 @@ extern packm_t* packm_cntl; extern packv_t* packv_cntl; extern unpackm_t* unpackm_cntl; -extern blksz_t* gemv_mc; -extern blksz_t* gemv_nc; - ger_t* ger_cntl_bs_ke_row; ger_t* ger_cntl_bs_ke_col; @@ -62,14 +59,16 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); ger_cntl_bs_ke_col = bli_ger_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT2, + 0, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); // Create control trees for problems with relatively small m dimension @@ -78,7 +77,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, NULL, // x is not partitioned in var2 packv_cntl, // pack y1 (if needed) packm_cntl, // pack A1 (if needed) @@ -88,7 +87,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, NULL, // x is not partitioned in var2 packv_cntl, // pack y1 (if needed) packm_cntl, // pack A1 (if needed) @@ -102,7 +101,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) NULL, // y is not partitioned in var1 packm_cntl, // pack A1 (if needed) @@ -112,7 +111,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) NULL, // y is not partitioned in var1 packm_cntl, // pack A1 (if needed) @@ -126,7 +125,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, NULL, // x is not partitioned in var2 packv_cntl, // pack y1 (if 
needed) NULL, // do not pack A1 @@ -136,7 +135,7 @@ void bli_ger_cntl_init() = bli_ger_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_nc, + BLIS_N2, NULL, // x is not partitioned in var2 packv_cntl, // pack y1 (if needed) NULL, // do not pack A1 @@ -162,7 +161,7 @@ void bli_ger_cntl_finalize() ger_t* bli_ger_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x, packv_t* sub_packv_y, packm_t* sub_packm_a, @@ -171,11 +170,11 @@ ger_t* bli_ger_cntl_obj_create( impl_t impl_type, { ger_t* cntl; - cntl = ( ger_t* ) bli_malloc( sizeof(ger_t) ); + cntl = ( ger_t* ) bli_malloc( sizeof(ger_t) ); cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x = sub_packv_x; cntl->sub_packv_y = sub_packv_y; cntl->sub_packm_a = sub_packm_a; @@ -188,7 +187,7 @@ ger_t* bli_ger_cntl_obj_create( impl_t impl_type, void bli_ger_cntl_obj_init( ger_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x, packv_t* sub_packv_y, packm_t* sub_packm_a, @@ -197,7 +196,7 @@ void bli_ger_cntl_obj_init( ger_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x = sub_packv_x; cntl->sub_packv_y = sub_packv_y; cntl->sub_packm_a = sub_packm_a; diff --git a/frame/2/ger/bli_ger_cntl.h b/frame/2/ger/bli_ger_cntl.h index 06879d039..7ca51942d 100644 --- a/frame/2/ger/bli_ger_cntl.h +++ b/frame/2/ger/bli_ger_cntl.h @@ -36,7 +36,7 @@ struct ger_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct packv_s* sub_packv_x; struct packv_s* sub_packv_y; struct packm_s* sub_packm_a; @@ -53,7 +53,7 @@ void bli_ger_cntl_init( void ); void bli_ger_cntl_finalize( void ); ger_t* bli_ger_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x, packv_t* sub_packv_y, packm_t* sub_packm_a, @@ -62,7 +62,7 @@ ger_t* bli_ger_cntl_obj_create( 
impl_t impl_type, void bli_ger_cntl_obj_init( ger_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x, packv_t* sub_packv_y, packm_t* sub_packm_a, diff --git a/frame/2/ger/bli_ger.c b/frame/2/ger/bli_ger_front.c similarity index 76% rename from frame/2/ger/bli_ger.c rename to frame/2/ger/bli_ger_front.c index c03b8358e..52413a6ef 100644 --- a/frame/2/ger/bli_ger.c +++ b/frame/2/ger/bli_ger_front.c @@ -39,10 +39,14 @@ extern ger_t* ger_cntl_bs_ke_col; extern ger_t* ger_cntl_ge_row; extern ger_t* ger_cntl_ge_col; -void bli_ger( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* a ) +void bli_ger_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a, + cntx_t* cntx + ) { ger_t* ger_cntl; num_t dt_targ_x; @@ -114,6 +118,7 @@ void bli_ger( obj_t* alpha, x, y, a, + cntx, ger_cntl ); } @@ -122,18 +127,20 @@ void bli_ger( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. // #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* a, inc_t rs_a, inc_t cs_a \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -161,39 +168,9 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( &alphao, \ &xo, \ &yo, \ - &ao ); \ + &ao, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( ger, ger ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_a, ctype_xy, chx, chy, cha, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,cha,opname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_a* a, inc_t rs_a, inc_t cs_a \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( ger, ger ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( ger, ger ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( ger, ger ) -#endif +INSERT_GENTFUNC_BASIC0( ger_front ) diff --git a/frame/2/ger/bli_ger_front.h b/frame/2/ger/bli_ger_front.h new file mode 100644 index 000000000..9a917e376 --- /dev/null +++ b/frame/2/ger/bli_ger_front.h @@ -0,0 +1,61 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +void bli_ger_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( ger_front ) + diff --git a/frame/2/ger/bli_ger_int.c b/frame/2/ger/bli_ger_int.c index 4367e9ff8..a0b2a6b78 100644 --- a/frame/2/ger/bli_ger_int.c +++ b/frame/2/ger/bli_ger_int.c @@ -40,6 +40,7 @@ typedef void (*FUNCPTR_T)( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ); static FUNCPTR_T vars[4][3] = @@ -57,6 +58,7 @@ void bli_ger_int( conj_t conjx, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ) { varnum_t n; @@ -69,7 +71,7 @@ void bli_ger_int( conj_t conjx, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_ger_int_check( alpha, x, y, a, cntl ); + bli_ger_check( alpha, x, y, a ); // If A has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *a ) ) return; @@ -123,6 +125,7 @@ void bli_ger_int( conj_t conjx, &x_local, &y_local, &a_local, + cntx, cntl ); } diff --git a/frame/2/ger/bli_ger_int.h b/frame/2/ger/bli_ger_int.h index 2dae7382d..d3fe25e49 100644 --- a/frame/2/ger/bli_ger_int.h +++ b/frame/2/ger/bli_ger_int.h @@ -38,5 +38,6 @@ void bli_ger_int( conj_t conjx, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ); diff --git a/frame/2/ger/bli_ger_unb_var1.c b/frame/2/ger/bli_ger_unb_var1.c index ef83d60fb..1dee4ce83 100644 --- a/frame/2/ger/bli_ger_unb_var1.c +++ b/frame/2/ger/bli_ger_unb_var1.c @@ -34,139 +34,56 @@ #include "blis.h" -#define FUNCPTR_T ger_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t m, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* a, inc_t rs_a, inc_t cs_a - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,ger_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,ger_unb_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,ger_unb_var1); -#endif -#endif - - -void bli_ger_unb_var1( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* a, - ger_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_a = bli_obj_datatype( *a ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; 
- - // The datatype of alpha MUST be the type union of x and y. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_a]; - - // Invoke the function. - f( conjx, - conjy, - m, - n, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_a, rs_a, cs_a ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_a, ctype_xy, chx, chy, cha, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,cha,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* a, inc_t rs_a, inc_t cs_a \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_a* a_cast = a; \ - ctype_a* a1t; \ - ctype_x* chi1; \ - ctype_y* y1; \ - ctype_xy alpha_chi1; \ - dim_t i; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* a1t; \ + ctype* chi1; \ + ctype* y1; \ + ctype alpha_chi1; \ + dim_t i; \ \ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ - a1t = a_cast + (i )*rs_a + (0 )*cs_a; \ - chi1 = x_cast + (i )*incx; \ - y1 = y_cast + (0 )*incy; \ + a1t = a + (i )*rs_a + (0 )*cs_a; \ + chi1 = x + (i )*incx; \ + y1 = y + (0 )*incy; \ \ /* a1t = a1t + alpha * chi1 * y; */ \ - PASTEMAC2(chx,chxy,copycjs)( conjx, *chi1, alpha_chi1 ); \ - PASTEMAC2(chxy,chxy,scals)( *alpha_cast, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, alpha_chi1 ); \ + PASTEMAC(ch,scals)( *alpha, alpha_chi1 ); \ \ - PASTEMAC3(chxy,chy,cha,kername)( conjy, \ - n, \ - &alpha_chi1, \ - y1, incy, \ - a1t, cs_a ); \ + kfp_av \ + ( \ + conjy, \ + n, \ + &alpha_chi1, \ + y1, incy, \ + a1t, cs_a, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( ger_unb_var1, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( ger_unb_var1, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( ger_unb_var1, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( ger_unb_var1 ) diff --git a/frame/2/ger/bli_ger_unb_var2.c b/frame/2/ger/bli_ger_unb_var2.c index b16c26454..afdbd363d 100644 --- a/frame/2/ger/bli_ger_unb_var2.c +++ b/frame/2/ger/bli_ger_unb_var2.c @@ -34,139 +34,56 @@ #include "blis.h" -#define FUNCPTR_T ger_fp - -typedef void (*FUNCPTR_T)( - conj_t conjx, - conj_t conjy, - dim_t m, - dim_t n, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* a, inc_t rs_a, inc_t cs_a - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,ger_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,ger_unb_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,ger_unb_var2); -#endif -#endif - - -void bli_ger_unb_var2( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* a, - ger_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_a = bli_obj_datatype( *a ); - - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *a ); - dim_t n = bli_obj_width( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of x and y. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_a]; - - // Invoke the function. 
- f( conjx, - conjy, - m, - n, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_a, rs_a, cs_a ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_a, ctype_xy, chx, chy, cha, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,cha,varname)( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* a, inc_t rs_a, inc_t cs_a \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_a* a_cast = a; \ - ctype_a* a1; \ - ctype_x* x1; \ - ctype_y* psi1; \ - ctype_xy alpha_psi1; \ - dim_t j; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim2( m, n ) ) return; \ + ctype* a1; \ + ctype* x1; \ + ctype* psi1; \ + ctype alpha_psi1; \ + dim_t j; \ \ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( j = 0; j < n; ++j ) \ { \ - a1 = a_cast + (0 )*rs_a + (j )*cs_a; \ - x1 = x_cast + (0 )*incx; \ - psi1 = y_cast + (j )*incy; \ + a1 = a + (0 )*rs_a + (j )*cs_a; \ + x1 = x + (0 )*incx; \ + psi1 = y + (j )*incy; \ \ /* a1 = a1 + alpha * psi1 * x; */ \ - PASTEMAC2(chy,chxy,copycjs)( conjy, *psi1, alpha_psi1 ); \ - PASTEMAC2(chxy,chxy,scals)( *alpha_cast, alpha_psi1 ); \ + PASTEMAC(ch,copycjs)( conjy, *psi1, alpha_psi1 ); \ + PASTEMAC(ch,scals)( *alpha, alpha_psi1 ); \ \ - PASTEMAC3(chxy,chx,cha,kername)( conjx, \ - m, \ - &alpha_psi1, \ - x1, incx, \ - a1, rs_a ); \ + kfp_av \ + ( \ + conjx, \ + m, \ + &alpha_psi1, \ + x1, incx, \ + a1, rs_a, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( ger_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( ger_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( ger_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( ger_unb_var2 ) diff --git a/frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.h b/frame/2/ger/bli_ger_var.h similarity index 69% rename from frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.h rename to frame/2/ger/bli_ger_var.h index 5e5d33bf2..5833ec3f4 100644 --- a/frame/1m/packm/ukernels/bli_packm_ref_cxk_3mis.h +++ b/frame/2/ger/bli_ger_var.h @@ -32,24 +32,51 @@ */ + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a, \ + cntx_t* cntx, \ + ger_t* cntl \ + ); + +GENPROT( ger_blk_var1 ) +GENPROT( ger_blk_var2 ) + +GENPROT( ger_unb_var1 ) +GENPROT( ger_unb_var2 ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - conj_t conja, \ - dim_t n, \ - void* kappa, \ - void* a, inc_t inca, inc_t lda, \ - void* p, inc_t is_p, inc_t ldp \ - ); +void PASTEMAC(ch,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); -INSERT_GENTPROT_BASIC( packm_ref_2xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_4xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_6xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_8xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_10xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_12xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_14xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_16xk_3mis ) -INSERT_GENTPROT_BASIC( packm_ref_30xk_3mis ) +INSERT_GENTPROT_BASIC( ger_unb_var1 ) +INSERT_GENTPROT_BASIC( ger_unb_var2 ) diff --git a/frame/2/ger/bli_ger_var_oapi.c b/frame/2/ger/bli_ger_var_oapi.c new file mode 100644 index 000000000..f03452dce --- /dev/null +++ b/frame/2/ger/bli_ger_var_oapi.c @@ -0,0 +1,89 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* y, \ + obj_t* a, \ + cntx_t* cntx, \ + ger_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *a ); \ +\ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ +\ + dim_t m = bli_obj_length( *a ); \ + dim_t n = bli_obj_width( *a ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_13 \ + ( \ + dt, \ + opname, \ + conjx, \ + conjy, \ + m, \ + n, \ + buf_alpha, \ + buf_x, incx, \ + buf_y, incy, \ + buf_a, rs_a, cs_a, \ + cntx \ + ); \ +} \ + +GENFRONT( ger_unb_var1 ) +GENFRONT( ger_unb_var2 ) + diff --git a/frame/2/ger/bli_ger_blk_var1.h b/frame/2/ger/old/bli_ger_blk_var1.h similarity index 98% rename from frame/2/ger/bli_ger_blk_var1.h rename to frame/2/ger/old/bli_ger_blk_var1.h index fb1138b77..a00436f5f 100644 --- a/frame/2/ger/bli_ger_blk_var1.h +++ b/frame/2/ger/old/bli_ger_blk_var1.h @@ -36,5 +36,6 @@ void bli_ger_blk_var1( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ); diff --git a/frame/2/ger/bli_ger_blk_var2.h b/frame/2/ger/old/bli_ger_blk_var2.h similarity index 98% rename from frame/2/ger/bli_ger_blk_var2.h rename to frame/2/ger/old/bli_ger_blk_var2.h index 37870421d..28a9ae9ff 100644 --- a/frame/2/ger/bli_ger_blk_var2.h +++ b/frame/2/ger/old/bli_ger_blk_var2.h @@ -36,5 +36,6 @@ void bli_ger_blk_var2( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ); diff --git a/frame/2/ger/bli_ger_check.c b/frame/2/ger/old/bli_ger_check.c similarity index 99% rename from frame/2/ger/bli_ger_check.c rename to frame/2/ger/old/bli_ger_check.c index e2994f385..d09824f11 100644 --- a/frame/2/ger/bli_ger_check.c +++ b/frame/2/ger/old/bli_ger_check.c @@ -97,6 +97,7 @@ void bli_ger_int_check( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ) { err_t e_val; diff --git a/frame/2/ger/bli_ger_check.h b/frame/2/ger/old/bli_ger_check.h similarity index 98% rename from frame/2/ger/bli_ger_check.h rename to frame/2/ger/old/bli_ger_check.h index cdd82ae3d..f69ec361c 100644 --- a/frame/2/ger/bli_ger_check.h +++ b/frame/2/ger/old/bli_ger_check.h @@ -46,4 +46,5 @@ void bli_ger_int_check( obj_t* alpha, obj_t* x, obj_t* y, obj_t* a, + cntx_t* cntx, ger_t* cntl ); diff --git a/frame/ind/query/bli_bsv_query.h b/frame/2/ger/old/bli_ger_cntx.c similarity index 72% rename from 
frame/ind/query/bli_bsv_query.h rename to frame/2/ger/old/bli_ger_cntx.c index d944148e9..9e9a57197 100644 --- a/frame/ind/query/bli_bsv_query.h +++ b/frame/2/ger/old/bli_ger_cntx.c @@ -32,30 +32,27 @@ */ -#ifndef BLIS_BSV_QUERY_H -#define BLIS_BSV_QUERY_H +#include "blis.h" - -typedef enum +void bli_ger_cntx_init( cntx_t* cntx ) { - BLIS_MC = 0, - BLIS_NC, - BLIS_KC, - BLIS_MR, - BLIS_NR, - BLIS_KR, -} bszid_t; + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); -#define BLIS_NUM_LEVEL3_BLKSZS 6 + // Initialize the context with kernels for the current architecture. + bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx ); -// ----------------------------------------------------------------------------- + // Set the register and cache blocksizes and multiples, as well + // as the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 2, + BLIS_N2, BLIS_N2, + BLIS_M2, BLIS_M2, + cntx ); +} -dim_t bli_bsv_get_avail_blksz_dt( bszid_t bsv, opid_t oper, num_t dt ); -dim_t bli_bsv_get_avail_blksz_max_dt( bszid_t bsv, opid_t oper, num_t dt ); - -blksz_t* bli_bsv_get_avail_blksz( bszid_t bsv, opid_t oper, num_t dt ); -blksz_t* bli_bsv_get_blksz( bszid_t bsv, ind_t method ); - - -#endif +void bli_ger_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. 
+ bli_cntx_obj_free( cntx ); +} diff --git a/frame/ind/cntl/bli_ind_cntl_init.h b/frame/2/ger/old/bli_ger_cntx.h similarity index 96% rename from frame/ind/cntl/bli_ind_cntl_init.h rename to frame/2/ger/old/bli_ger_cntx.h index 1547c94b0..901a3d2c0 100644 --- a/frame/ind/cntl/bli_ind_cntl_init.h +++ b/frame/2/ger/old/bli_ger_cntx.h @@ -32,5 +32,6 @@ */ -void bli_ind_cntl_init( void ); -void bli_ind_cntl_finalize( void ); +void bli_ger_cntx_init( void ); +void bli_ger_cntx_finalize( void ); + diff --git a/frame/2/ger/old/bli_ger_unb_var1.c b/frame/2/ger/old/bli_ger_unb_var1.c new file mode 100644 index 000000000..57d539f55 --- /dev/null +++ b/frame/2/ger/old/bli_ger_unb_var1.c @@ -0,0 +1,148 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static ger_vft GENARRAY(ftypes,ger_unb_var1); + +void bli_ger_unb_var1( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* a, + cntx_t* cntx, + ger_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_a = bli_obj_datatype( *a ); + + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of x and y. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. 
+ f( conjx, + conjy, + m, + n, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_a, rs_a, cs_a, + cntx ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC( ctype, ch, varname, kername, kerid ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_a* a_cast = a; \ + ctype_a* a1t; \ + ctype_x* chi1; \ + ctype_y* y1; \ + ctype_xy alpha_chi1; \ + dim_t i; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + a1t = a_cast + (i )*rs_a + (0 )*cs_a; \ + chi1 = x_cast + (i )*incx; \ + y1 = y_cast + (0 )*incy; \ +\ + /* a1t = a1t + alpha * chi1 * y; */ \ + PASTEMAC2(chx,chxy,copycjs)( conjx, *chi1, alpha_chi1 ); \ + PASTEMAC2(chxy,chxy,scals)( *alpha_cast, alpha_chi1 ); \ +\ + PASTEMAC(cha,kername) \ + ( \ + conjy, \ + n, \ + &alpha_chi1, \ + y1, incy, \ + a1t, cs_a, \ + cntx \ + ); \ + } \ +} + +INSERT_GENTFUNC3U12_BASIC( ger_unb_var1, AXPYV_KERNEL ) + diff --git a/frame/2/ger/bli_ger_unb_var1.h b/frame/2/ger/old/bli_ger_unb_var1.h similarity index 100% rename from frame/2/ger/bli_ger_unb_var1.h rename to frame/2/ger/old/bli_ger_unb_var1.h diff --git a/frame/1f/axpyf/bli_axpyf_kernel.c b/frame/2/ger/old/bli_ger_unb_var2.c similarity index 52% rename from frame/1f/axpyf/bli_axpyf_kernel.c rename to frame/2/ger/old/bli_ger_unb_var2.c index 8b087646f..eaaa418fb 100644 --- a/frame/1f/axpyf/bli_axpyf_kernel.c +++ b/frame/2/ger/old/bli_ger_unb_var2.c @@ -34,116 +34,125 @@ #include "blis.h" -#define FUNCPTR_T axpyf_fp +static ger_vft GENARRAY(ftypes,ger_unb_var2); -typedef void (*FUNCPTR_T)( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - void* alpha, - void* a, inc_t inca, 
inc_t lda, - void* x, inc_t incx, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,axpyf_kernel_void); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,axpyf_kernel_void); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,axpyf_kernel_void); -#endif -#endif - - -void bli_axpyf_kernel( obj_t* alpha, - obj_t* a, +void bli_ger_unb_var2( obj_t* alpha, obj_t* x, - obj_t* y ) + obj_t* y, + obj_t* a, + cntx_t* cntx, + ger_t* cntl ) { - num_t dt_a = bli_obj_datatype( *a ); num_t dt_x = bli_obj_datatype( *x ); num_t dt_y = bli_obj_datatype( *y ); + num_t dt_a = bli_obj_datatype( *a ); - conj_t conja = bli_obj_conj_status( *a ); conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); - dim_t m = bli_obj_vector_dim( *y ); - dim_t b_n = bli_obj_vector_dim( *x ); + dim_t m = bli_obj_length( *a ); + dim_t n = bli_obj_width( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); void* buf_a = bli_obj_buffer_at_off( *a ); inc_t rs_a = bli_obj_row_stride( *a ); inc_t cs_a = bli_obj_col_stride( *a ); - inc_t inc_x = bli_obj_vector_inc( *x ); - void* buf_x = bli_obj_buffer_at_off( *x ); - - inc_t inc_y = bli_obj_vector_inc( *y ); - void* buf_y = bli_obj_buffer_at_off( *y ); - num_t dt_alpha; void* buf_alpha; FUNCPTR_T f; - // The datatype of alpha MUST be the type union of a and x. This is to + // The datatype of alpha MUST be the type union of x and y. This is to // prevent any unnecessary loss of information during computation. 
- dt_alpha = bli_datatype_union( dt_a, dt_x ); + dt_alpha = bli_datatype_union( dt_x, dt_y ); buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); // Index into the type combination array to extract the correct // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; + f = ftypes[dt_a]; // Invoke the function. - f( conja, - conjx, + f( conjx, + conjy, m, - b_n, + n, buf_alpha, + buf_x, incx, + buf_y, incy, buf_a, rs_a, cs_a, - buf_x, inc_x, - buf_y, inc_y ); + cntx ); } #undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#define GENTFUNC( ctype, ch, varname, kername, kerid ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - void* alpha, \ - void* a, inc_t inca, inc_t lda, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(cha,varname) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + dim_t n, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ { \ - PASTEMAC3(cha,chx,chy,kername)( conja, \ - conjx, \ - m, \ - b_n, \ - alpha, \ - a, inca, lda, \ - x, incx, \ - y, incy ); \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_a* a_cast = a; \ + ctype_a* a1; \ + ctype_x* x1; \ + ctype_y* psi1; \ + ctype_xy alpha_psi1; \ + dim_t j; \ +\ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + for ( j = 0; j < n; ++j ) \ + { \ + a1 = a_cast + (0 )*rs_a + (j )*cs_a; \ + x1 = x_cast + (0 )*incx; \ + psi1 = y_cast + (j )*incy; \ +\ + /* a1 = a1 + alpha * psi1 * x; */ \ + PASTEMAC2(chy,chxy,copycjs)( conjy, *psi1, alpha_psi1 ); \ + PASTEMAC2(chxy,chxy,scals)( *alpha_cast, alpha_psi1 ); \ +\ + PASTEMAC(cha,kername) \ + ( \ + conjx, \ + m, \ + &alpha_psi1, \ + x1, incx, \ + a1, rs_a, \ + cntx \ + ); \ + } \ } // Define 
the basic set of functions unconditionally, and then also some // mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( axpyf_kernel_void, AXPYF_KERNEL ) +INSERT_GENTFUNC3U12_BASIC( ger_unb_var2, AXPYV_KERNEL ) #ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( axpyf_kernel_void, AXPYF_KERNEL ) +INSERT_GENTFUNC3U12_MIX_D( ger_unb_var2, AXPYV_KERNEL ) #endif #ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( axpyf_kernel_void, AXPYF_KERNEL ) +INSERT_GENTFUNC3U12_MIX_P( ger_unb_var2, AXPYV_KERNEL ) #endif diff --git a/frame/2/ger/bli_ger_unb_var2.h b/frame/2/ger/old/bli_ger_unb_var2.h similarity index 100% rename from frame/2/ger/bli_ger_unb_var2.h rename to frame/2/ger/old/bli_ger_unb_var2.h diff --git a/frame/2/hemv/bli_hemv.h b/frame/2/hemv/bli_hemv.h index fe004b0be..07b5ff0c0 100644 --- a/frame/2/hemv/bli_hemv.h +++ b/frame/2/hemv/bli_hemv.h @@ -33,78 +33,8 @@ */ #include "bli_hemv_cntl.h" -#include "bli_hemv_check.h" +#include "bli_hemv_front.h" #include "bli_hemv_int.h" -#include "bli_hemv_unb_var1.h" -#include "bli_hemv_unb_var2.h" -#include "bli_hemv_unb_var3.h" -#include "bli_hemv_unb_var4.h" - -#include "bli_hemv_unf_var1a.h" -#include "bli_hemv_unf_var3a.h" -#include "bli_hemv_unf_var1.h" -#include "bli_hemv_unf_var3.h" - -#include "bli_hemv_blk_var1.h" -#include "bli_hemv_blk_var2.h" -#include "bli_hemv_blk_var3.h" -#include "bli_hemv_blk_var4.h" - - -void bli_hemv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ); - -INSERT_GENTPROT_BASIC( hemv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( hemv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( hemv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( hemv ) -#endif +#include "bli_hemv_var.h" diff --git a/frame/2/hemv/bli_hemv_blk_var1.c b/frame/2/hemv/bli_hemv_blk_var1.c index 56b26e72c..a7ef35a4b 100644 --- a/frame/2/hemv/bli_hemv_blk_var1.c +++ b/frame/2/hemv/bli_hemv_blk_var1.c @@ -40,6 +40,7 @@ void bli_hemv_blk_var1( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { obj_t a11, a11_pack; @@ -73,14 +74,14 @@ void bli_hemv_blk_var1( conj_t conjh, // y = beta * y; bli_scalv_int( beta, y, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, x1, x0, y1, and y0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -98,20 +99,20 @@ void bli_hemv_blk_var1( conj_t conjh, // Initialize objects for packing A11, x1, and y1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack A11, x1, y1 (if needed). 
bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // y0 = y0 + alpha * A10' * x1; bli_gemv_int( bli_apply_conj( conjh, BLIS_TRANSPOSE ), @@ -121,6 +122,7 @@ void bli_hemv_blk_var1( conj_t conjh, &x1_pack, &BLIS_ONE, &y0, + cntx, cntl_sub_gemv_t_rp( cntl ) ); // y1 = y1 + alpha * A11 * x1; @@ -130,6 +132,7 @@ void bli_hemv_blk_var1( conj_t conjh, &x1_pack, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_hemv( cntl ) ); // y1 = y1 + alpha * A10 * x0; @@ -140,11 +143,12 @@ void bli_hemv_blk_var1( conj_t conjh, &x0, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_gemv_n_rp( cntl ) ); // Copy/unpack y1 (if y1 was packed). bli_unpackv_int( &y1_pack, &y1, - cntl_sub_unpackv_y1( cntl ) ); + cntx, cntl_sub_unpackv_y1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/hemv/bli_hemv_blk_var2.c b/frame/2/hemv/bli_hemv_blk_var2.c index 9ead6f7e9..0a7c22dd9 100644 --- a/frame/2/hemv/bli_hemv_blk_var2.c +++ b/frame/2/hemv/bli_hemv_blk_var2.c @@ -40,6 +40,7 @@ void bli_hemv_blk_var2( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { obj_t a11, a11_pack; @@ -74,14 +75,14 @@ void bli_hemv_blk_var2( conj_t conjh, // y = beta * y; bli_scalv_int( beta, y, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, A21, x1, x0, x2, y1, and y0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -101,20 +102,20 @@ void bli_hemv_blk_var2( conj_t conjh, // Initialize objects for packing A11, x1, and y1 (if needed). 
bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack A11, x1, y1 (if needed). bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // y1 = y1 + alpha * A10 * x0; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -124,6 +125,7 @@ void bli_hemv_blk_var2( conj_t conjh, &x0, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_gemv_n_rp( cntl ) ); // y1 = y1 + alpha * A11 * x1; @@ -133,6 +135,7 @@ void bli_hemv_blk_var2( conj_t conjh, &x1_pack, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_hemv( cntl ) ); // y1 = y1 + alpha * A21' * x2; @@ -143,11 +146,12 @@ void bli_hemv_blk_var2( conj_t conjh, &x2, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_gemv_t_cp( cntl ) ); // Copy/unpack y1 (if y1 was packed). bli_unpackv_int( &y1_pack, &y1, - cntl_sub_unpackv_y1( cntl ) ); + cntx, cntl_sub_unpackv_y1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/hemv/bli_hemv_blk_var3.c b/frame/2/hemv/bli_hemv_blk_var3.c index 06121f379..25b6d6b75 100644 --- a/frame/2/hemv/bli_hemv_blk_var3.c +++ b/frame/2/hemv/bli_hemv_blk_var3.c @@ -40,6 +40,7 @@ void bli_hemv_blk_var3( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { obj_t a11, a11_pack; @@ -73,14 +74,14 @@ void bli_hemv_blk_var3( conj_t conjh, // y = beta * y; bli_scalv_int( beta, y, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. 
b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, x1, x0, y1, and y0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -98,20 +99,20 @@ void bli_hemv_blk_var3( conj_t conjh, // Initialize objects for packing A11, x1, and y1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack A11, x1, y1 (if needed). bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // y1 = y1 + alpha * A21' * x2; bli_gemv_int( bli_apply_conj( conjh, BLIS_TRANSPOSE ), @@ -121,6 +122,7 @@ void bli_hemv_blk_var3( conj_t conjh, &x2, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_gemv_t_cp( cntl ) ); // y1 = y1 + alpha * A11 * x1; @@ -130,6 +132,7 @@ void bli_hemv_blk_var3( conj_t conjh, &x1_pack, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_hemv( cntl ) ); // y2 = y2 + alpha * A21 * x1; @@ -140,11 +143,12 @@ void bli_hemv_blk_var3( conj_t conjh, &x1_pack, &BLIS_ONE, &y2, + cntx, cntl_sub_gemv_n_cp( cntl ) ); // Copy/unpack y1 (if y1 was packed). 
bli_unpackv_int( &y1_pack, &y1, - cntl_sub_unpackv_y1( cntl ) ); + cntx, cntl_sub_unpackv_y1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/hemv/bli_hemv_blk_var4.c b/frame/2/hemv/bli_hemv_blk_var4.c index ae6adf4eb..52dc67852 100644 --- a/frame/2/hemv/bli_hemv_blk_var4.c +++ b/frame/2/hemv/bli_hemv_blk_var4.c @@ -40,6 +40,7 @@ void bli_hemv_blk_var4( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { obj_t a11, a11_pack; @@ -74,14 +75,14 @@ void bli_hemv_blk_var4( conj_t conjh, // y = beta * y; bli_scalv_int( beta, y, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, A21, x1, y1, y0, and y2. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -101,20 +102,20 @@ void bli_hemv_blk_var4( conj_t conjh, // Initialize objects for packing A11, x1, and y1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack A11, x1, y1 (if needed). 
bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // y0 = y0 + alpha * A10' * x1; bli_gemv_int( bli_apply_conj( conjh, BLIS_TRANSPOSE ), @@ -124,6 +125,7 @@ void bli_hemv_blk_var4( conj_t conjh, &x1_pack, &BLIS_ONE, &y0, + cntx, cntl_sub_gemv_t_rp( cntl ) ); // y1 = y1 + alpha * A11 * x1; @@ -133,6 +135,7 @@ void bli_hemv_blk_var4( conj_t conjh, &x1_pack, &BLIS_ONE, &y1_pack, + cntx, cntl_sub_hemv( cntl ) ); // y2 = y2 + alpha * A21 * x1; @@ -143,11 +146,12 @@ void bli_hemv_blk_var4( conj_t conjh, &x1_pack, &BLIS_ONE, &y2, + cntx, cntl_sub_gemv_n_cp( cntl ) ); // Copy/unpack y1 (if y1 was packed). bli_unpackv_int( &y1_pack, &y1, - cntl_sub_unpackv_y1( cntl ) ); + cntx, cntl_sub_unpackv_y1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/hemv/bli_hemv_cntl.c b/frame/2/hemv/bli_hemv_cntl.c index f49e3a856..a4f8ed263 100644 --- a/frame/2/hemv/bli_hemv_cntl.c +++ b/frame/2/hemv/bli_hemv_cntl.c @@ -44,8 +44,6 @@ extern gemv_t* gemv_cntl_rp_bs_axpy; extern gemv_t* gemv_cntl_cp_bs_dot; extern gemv_t* gemv_cntl_cp_bs_axpy; -extern blksz_t* gemv_mc; - hemv_t* hemv_cntl_bs_ke_lrow_ucol; hemv_t* hemv_cntl_bs_ke_lcol_urow; hemv_t* hemv_cntl_ge_lrow_ucol; @@ -60,16 +58,18 @@ void bli_hemv_cntl_init() = bli_hemv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); hemv_cntl_bs_ke_lcol_urow = bli_hemv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT3, + 0, NULL, NULL, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); // Create control trees for generally large problems. 
Here, we choose a @@ -78,7 +78,7 @@ void bli_hemv_cntl_init() = bli_hemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_mc, + BLIS_M2, scalv_cntl, // scale y up-front packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) @@ -93,7 +93,7 @@ void bli_hemv_cntl_init() = bli_hemv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_mc, + BLIS_M2, scalv_cntl, // scale y up-front packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) @@ -117,7 +117,7 @@ void bli_hemv_cntl_finalize() hemv_t* bli_hemv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -135,7 +135,7 @@ hemv_t* bli_hemv_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; @@ -153,7 +153,7 @@ hemv_t* bli_hemv_cntl_obj_create( impl_t impl_type, void bli_hemv_cntl_obj_init( hemv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -167,7 +167,7 @@ void bli_hemv_cntl_obj_init( hemv_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; diff --git a/frame/2/hemv/bli_hemv_cntl.h b/frame/2/hemv/bli_hemv_cntl.h index 407f3e106..2a2bdce0e 100644 --- a/frame/2/hemv/bli_hemv_cntl.h +++ b/frame/2/hemv/bli_hemv_cntl.h @@ -36,7 +36,7 @@ struct hemv_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct scalv_s* sub_scalv; struct packm_s* sub_packm_a11; struct packv_s* sub_packv_x1; @@ -56,7 +56,7 @@ void bli_hemv_cntl_init( void ); void bli_hemv_cntl_finalize( void ); hemv_t* bli_hemv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + 
bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -70,7 +70,7 @@ hemv_t* bli_hemv_cntl_obj_create( impl_t impl_type, void bli_hemv_cntl_obj_init( hemv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, diff --git a/frame/2/hemv/bli_hemv.c b/frame/2/hemv/bli_hemv_front.c similarity index 78% rename from frame/2/hemv/bli_hemv.c rename to frame/2/hemv/bli_hemv_front.c index d75b14639..6cba96866 100644 --- a/frame/2/hemv/bli_hemv.c +++ b/frame/2/hemv/bli_hemv_front.c @@ -39,11 +39,15 @@ extern hemv_t* hemv_cntl_bs_ke_lcol_urow; extern hemv_t* hemv_cntl_ge_lrow_ucol; extern hemv_t* hemv_cntl_ge_lcol_urow; -void bli_hemv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ) +void bli_hemv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ) { hemv_t* hemv_cntl; num_t dt_targ_a; @@ -138,7 +142,6 @@ void bli_hemv( obj_t* alpha, } } - // Invoke the internal back-end with the copy-casts of scalars and the // chosen control tree. Set conjh to BLIS_CONJUGATE to invoke the // Hermitian (and not symmetric) algorithms. @@ -148,6 +151,7 @@ void bli_hemv( obj_t* alpha, x, &beta_local, y, + cntx, hemv_cntl ); } @@ -156,19 +160,21 @@ void bli_hemv( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -197,39 +203,9 @@ void PASTEMAC(ch,opname)( \ &ao, \ &xo, \ &betao, \ - &yo ); \ + &yo, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( hemv, hemv ) +INSERT_GENTFUNC_BASIC0( hemv_front ) - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( hemv, hemv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv, hemv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv, hemv ) -#endif diff --git a/frame/2/hemv/bli_hemv_front.h b/frame/2/hemv/bli_hemv_front.h new file mode 100644 index 000000000..655026d2a --- /dev/null +++ b/frame/2/hemv/bli_hemv_front.h @@ -0,0 +1,68 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +void bli_hemv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ); + + +// +// Prototype BLAS-like interfaces with homogeneous-typed operands. 
+// +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( hemv_front ) + diff --git a/frame/2/hemv/bli_hemv_int.c b/frame/2/hemv/bli_hemv_int.c index d128bcb4f..60347627d 100644 --- a/frame/2/hemv/bli_hemv_int.c +++ b/frame/2/hemv/bli_hemv_int.c @@ -42,6 +42,7 @@ typedef void (*FUNCPTR_T)( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); static FUNCPTR_T vars[4][3] = @@ -59,6 +60,7 @@ void bli_hemv_int( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { varnum_t n; @@ -68,7 +70,10 @@ void bli_hemv_int( conj_t conjh, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_hemv_int_check( conjh, alpha, a, x, beta, y, cntl ); + { + if ( bli_is_conj( conjh ) ) bli_hemv_check( alpha, a, x, beta, y ); + else bli_symv_check( alpha, a, x, beta, y ); + } // If y has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *y ) ) return; @@ -112,6 +117,7 @@ void bli_hemv_int( conj_t conjh, x, beta, y, + cntx, cntl ); } diff --git a/frame/2/hemv/bli_hemv_int.h b/frame/2/hemv/bli_hemv_int.h index fb52fe280..004edd285 100644 --- a/frame/2/hemv/bli_hemv_int.h +++ b/frame/2/hemv/bli_hemv_int.h @@ -38,4 +38,5 @@ void bli_hemv_int( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/bli_hemv_unb_var1.c b/frame/2/hemv/bli_hemv_unb_var1.c index db1630590..a443a445f 100644 --- a/frame/2/hemv/bli_hemv_unb_var1.c +++ b/frame/2/hemv/bli_hemv_unb_var1.c @@ -34,135 +34,41 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unb_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unb_var1); -#endif -#endif - - -void bli_hemv_unb_var1( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. 
- f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername1, kername2 ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* one = PASTEMAC(chy,1); \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_behind; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* x0; \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -185,74 +91,88 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. 
*/ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ + PASTECH(ch,dotxv_ft) kfp_dv; \ +\ + /* Query the context for the kernel function pointers. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ + kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ \ /* Apply conjx to chi1 and and scale by alpha. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* y0 = y0 + alpha * a10t' * chi1; */ \ - PASTEMAC3(chax,cha,chy,kername1)( conj0, \ - n_behind, \ - &alpha_chi1, \ - a10t, cs_at, \ - y0, incy ); \ + kfp_av \ + ( \ + conj0, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + y0, incy, \ + cntx \ + ); \ \ /* psi1 = psi1 + alpha * a10t * x0; */ \ - PASTEMAC3(cha,chx,chax,kername2)( conj1, \ - conjx, \ - n_behind, \ - alpha_cast, \ - a10t, cs_at, \ - x0, incx, \ - one, \ - psi1 ); \ + kfp_dv \ + ( \ + conj1, \ + conjx, \ + n_behind, \ + alpha, \ + a10t, cs_at, \ + x0, incx, \ + one, \ + psi1, \ + cntx \ + ); \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unb_var1 ) diff --git a/frame/2/hemv/bli_hemv_unb_var2.c b/frame/2/hemv/bli_hemv_unb_var2.c index baf59345b..92b534979 100644 --- a/frame/2/hemv/bli_hemv_unb_var2.c +++ b/frame/2/hemv/bli_hemv_unb_var2.c @@ -34,137 +34,43 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unb_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unb_var2); -#endif -#endif - - -void bli_hemv_unb_var2( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - 
void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* one = PASTEMAC(chy,1); \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_behind; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = 
PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -187,78 +93,90 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,dotxv_ft) kfp_dv; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ n_ahead = m - i - 1; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ \ /* Apply conjx to chi1 and and scale by alpha. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* psi1 = psi1 + alpha * a10t * x0; */ \ - PASTEMAC3(cha,chx,chax,kername)( conj0, \ - conjx, \ - n_behind, \ - alpha_cast, \ - a10t, cs_at, \ - x0, incx, \ - one, \ - psi1 ); \ + kfp_dv \ + ( \ + conj0, \ + conjx, \ + n_behind, \ + alpha, \ + a10t, cs_at, \ + x0, incx, \ + one, \ + psi1, \ + cntx \ + ); \ \ /* psi1 = psi1 + alpha * a21' * x2; */ \ - PASTEMAC3(cha,chx,chax,kername)( conj1, \ - conjx, \ - n_ahead, \ - alpha_cast, \ - a21, rs_at, \ - x2, incx, \ - one, \ - psi1 ); \ + kfp_dv \ + ( \ + conj1, \ + conjx, \ + n_ahead, \ + alpha, \ + a21, rs_at, \ + x2, incx, \ + one, \ + psi1, \ + cntx \ + ); \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unb_var2, DOTXV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unb_var2, DOTXV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unb_var2, DOTXV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unb_var2 ) diff --git a/frame/2/hemv/bli_hemv_unb_var3.c b/frame/2/hemv/bli_hemv_unb_var3.c index 6b3ec2da8..eee9db0ae 100644 --- a/frame/2/hemv/bli_hemv_unb_var3.c +++ b/frame/2/hemv/bli_hemv_unb_var3.c @@ -34,135 +34,41 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unb_var3); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unb_var3); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unb_var3); -#endif -#endif - - -void bli_hemv_unb_var3( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* 
buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername1, kername2 ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* one = PASTEMAC(chy,1); \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = 
PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* alpha11; \ + ctype* a21; \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype* y2; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -185,73 +91,87 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ + PASTECH(ch,dotxv_ft) kfp_dv; \ +\ + /* Query the context for the kernel function pointers. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ + kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTXV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_ahead = m - i - 1; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ \ /* Apply conjx to chi1 and and scale by alpha. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ \ /* psi1 = psi1 + alpha * a21' * x2; */ \ - PASTEMAC3(cha,chx,chax,kername1)( conj0, \ - conjx, \ - n_ahead, \ - alpha_cast, \ - a21, rs_at, \ - x2, incx, \ - one, \ - psi1 ); \ + kfp_dv \ + ( \ + conj0, \ + conjx, \ + n_ahead, \ + alpha, \ + a21, rs_at, \ + x2, incx, \ + one, \ + psi1, \ + cntx \ + ); \ \ /* y2 = y2 + alpha * a21 * chi1; */ \ - PASTEMAC3(chax,cha,chy,kername2)( conj1, \ - n_ahead, \ - &alpha_chi1, \ - a21, rs_at, \ - y2, incy ); \ + kfp_av \ + ( \ + conj1, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + y2, incy, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unb_var3 ) diff --git a/frame/2/hemv/bli_hemv_unb_var4.c b/frame/2/hemv/bli_hemv_unb_var4.c index ead2d8538..bdb5b6e3d 100644 --- a/frame/2/hemv/bli_hemv_unb_var4.c +++ b/frame/2/hemv/bli_hemv_unb_var4.c @@ -34,136 +34,42 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unb_var4); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unb_var4); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unb_var4); -#endif -#endif - - -void bli_hemv_unb_var4( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - 
void* buf_alpha; - - num_t dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_behind; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) 
return; \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype* y2; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -186,72 +92,84 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointers. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ n_ahead = m - i - 1; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ \ /* Apply conjx to chi1 and and scale by alpha. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* y0 = y0 + alpha * a10t' * chi1; */ \ - PASTEMAC3(chax,cha,chy,kername)( conj0, \ - n_behind, \ - &alpha_chi1, \ - a10t, cs_at, \ - y0, incy ); \ + kfp_av \ + ( \ + conj0, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + y0, incy, \ + cntx \ + ); \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ \ /* y2 = y2 + alpha * a21 * chi1; */ \ - PASTEMAC3(chax,cha,chy,kername)( conj1, \ - n_ahead, \ - &alpha_chi1, \ - a21, rs_at, \ - y2, incy ); \ + kfp_av \ + ( \ + conj1, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + y2, incy, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unb_var4, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unb_var4, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unb_var4, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unb_var4 ) diff --git a/frame/2/hemv/bli_hemv_unf_var1.c b/frame/2/hemv/bli_hemv_unf_var1.c index ad669466f..2eebfc265 100644 --- a/frame/2/hemv/bli_hemv_unf_var1.c +++ b/frame/2/hemv/bli_hemv_unf_var1.c @@ -34,144 +34,50 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unf_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unf_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unf_var1); -#endif -#endif - - -void bli_hemv_unf_var1( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; - void* 
buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* one = PASTEMAC(chy,1); \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* A10; \ - ctype_a* A11; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* x1; \ - ctype_x* chi11; \ - ctype_y* y0; \ - ctype_y* y1; \ - ctype_y* y01; \ - ctype_y* psi11; \ - ctype_y* y21; \ - ctype_x conjx_chi11; \ - ctype_ax alpha_chi11; \ - ctype_a alpha11_temp; \ - dim_t i, k, j; \ - dim_t b_fuse, f; \ - dim_t n_behind; \ - dim_t 
f_ahead, f_behind; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* A10; \ + ctype* A11; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* x1; \ + ctype* chi11; \ + ctype* y0; \ + ctype* y1; \ + ctype* y01; \ + ctype* psi11; \ + ctype* y21; \ + ctype conjx_chi11; \ + ctype alpha_chi11; \ + ctype alpha11_temp; \ + dim_t i, k, j; \ + dim_t b_fuse, f; \ + dim_t n_behind; \ + dim_t f_ahead, f_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -194,51 +100,67 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ \ - /* Query the fusing factor for the dotxaxpyf implementation. */ \ - b_fuse = PASTEMAC(chax,dotxaxpyf_fusefac); \ + PASTECH(ch,dotxaxpyf_ft) kfp_xf; \ +\ + /* Query the context for the kernel function pointer and fusing factor. 
*/ \ + kfp_xf = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXAXPYF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_XF, cntx ); \ \ for ( i = 0; i < m; i += f ) \ { \ f = bli_determine_blocksize_dim_f( i, m, b_fuse ); \ n_behind = i; \ - A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - x0 = x_cast + (0 )*incx; \ - x1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - y1 = y_cast + (i )*incy; \ + A10 = a + (i )*rs_at + (0 )*cs_at; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + x0 = x + (0 )*incx; \ + x1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + y1 = y + (i )*incy; \ \ /* y1 = y1 + alpha * A10 * x0; (dotxf) */ \ /* y0 = y0 + alpha * A10' * x1; (axpyf) */ \ - PASTEMAC3(cha,chx,chy,kername)( conj0, \ - conj1, \ - conjx, \ - conjx, \ - n_behind, \ - f, \ - alpha_cast, \ - A10, cs_at, rs_at, \ - x0, incx, \ - x1, incx, \ - one, \ - y1, incy, \ - y0, incy ); \ + kfp_xf \ + ( \ + conj0, \ + conj1, \ + conjx, \ + conjx, \ + n_behind, \ + f, \ + alpha, \ + A10, cs_at, rs_at, \ + x0, incx, \ + x1, incx, \ + one, \ + y1, incy, \ + y0, incy, \ + cntx \ + ); \ \ /* y1 = y1 + alpha * A11 * x1; (variant 4) */ \ for ( k = 0; k < f; ++k ) \ @@ -254,52 +176,42 @@ void PASTEMAC3(cha,chx,chy,varname)( \ y21 = y1 + (k+1)*incy; \ \ /* y01 = y01 + alpha * a10t' * chi11; */ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi11, conjx_chi11 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi11, alpha_chi11 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi11, conjx_chi11 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi11, alpha_chi11 ); \ if ( bli_is_conj( conj1 ) ) \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + PASTEMAC(ch,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a10t + j*cs_at), 
*(y01 + j*incy) ); \ } \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi11 = psi11 + alpha * alpha11 * chi11; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ + PASTEMAC(ch,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ \ /* y21 = y21 + alpha * a21 * chi11; */ \ if ( bli_is_conj( conj0 ) ) \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + PASTEMAC(ch,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ } \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unf_var1, DOTXAXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var1, DOTXAXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var1, DOTXAXPYF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unf_var1 ) diff --git a/frame/2/hemv/bli_hemv_unf_var1a.c b/frame/2/hemv/bli_hemv_unf_var1a.c index 35cab687d..c8d75e0b7 100644 --- a/frame/2/hemv/bli_hemv_unf_var1a.c +++ b/frame/2/hemv/bli_hemv_unf_var1a.c @@ -34,135 +34,41 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unf_var1a); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unf_var1a); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unf_var1a); -#endif -#endif - - -void bli_hemv_unf_var1a( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t 
dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* zero = PASTEMAC(chy,0); \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_ax rho; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_behind; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* zero = PASTEMAC(ch,0); \ + 
ctype* a10t; \ + ctype* alpha11; \ + ctype* x0; \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype rho; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -185,70 +91,78 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,dotaxpyv_ft) kfp_vf; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_vf = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTAXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ \ /* Apply conjx to chi1 and and scale by alpha. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* psi1 = psi1 + alpha * a10t * x0; (dotv) */ \ /* y0 = y0 + alpha * a10t' * chi1; (axpyv) */ \ - PASTEMAC3(cha,chx,chy,kername)( conj0, \ - conj1, \ - conjx, \ - n_behind, \ - &alpha_chi1, \ - a10t, cs_at, \ - x0, incx, \ - &rho, \ - y0, incy ); \ - PASTEMAC3(chax,chax,chy,axpys)( *alpha_cast, rho, *psi1 ); \ + kfp_vf \ + ( \ + conj0, \ + conj1, \ + conjx, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + x0, incx, \ + &rho, \ + y0, incy, \ + cntx \ + ); \ + PASTEMAC(ch,axpys)( *alpha, rho, *psi1 ); \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unf_var1a, DOTAXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var1a, DOTAXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var1a, DOTAXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unf_var1a ) diff --git a/frame/2/hemv/bli_hemv_unf_var3.c b/frame/2/hemv/bli_hemv_unf_var3.c index 6599f2a82..1cf5a34a7 100644 --- a/frame/2/hemv/bli_hemv_unf_var3.c +++ b/frame/2/hemv/bli_hemv_unf_var3.c @@ -34,162 +34,50 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unf_var3); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unf_var3); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unf_var3); -#endif -#endif - - -void bli_hemv_unf_var3( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t dt_beta; 
- void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - -#if 0 - obj_t x_copy, y_copy; - - bli_obj_create( dt_x, m, 1, 0, 0, &x_copy ); - bli_obj_create( dt_y, m, 1, 0, 0, &y_copy ); - bli_copyv( x, &x_copy ); - bli_copyv( y, &y_copy ); - buf_x = bli_obj_buffer_at_off( x_copy ); - buf_y = bli_obj_buffer_at_off( y_copy ); - incx = 1; - incy = 1; -#endif - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -#if 0 - bli_copyv( &y_copy, y ); - bli_obj_free( &x_copy ); - bli_obj_free( &y_copy ); -#endif -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* one = PASTEMAC(chy,1); \ - ctype_y* zero = 
PASTEMAC(chy,0); \ - ctype_a* A11; \ - ctype_a* A21; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x1; \ - ctype_x* x2; \ - ctype_x* chi11; \ - ctype_y* y1; \ - ctype_y* y2; \ - ctype_y* y01; \ - ctype_y* psi11; \ - ctype_y* y21; \ - ctype_x conjx_chi11; \ - ctype_ax alpha_chi11; \ - ctype_a alpha11_temp; \ - dim_t i, k, j; \ - dim_t b_fuse, f; \ - dim_t n_ahead; \ - dim_t f_ahead, f_behind; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* A11; \ + ctype* A21; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x1; \ + ctype* x2; \ + ctype* chi11; \ + ctype* y1; \ + ctype* y2; \ + ctype* y01; \ + ctype* psi11; \ + ctype* y21; \ + ctype conjx_chi11; \ + ctype alpha_chi11; \ + ctype alpha11_temp; \ + dim_t i, k, j; \ + dim_t b_fuse, f; \ + dim_t n_ahead; \ + dim_t f_ahead, f_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -212,35 +100,47 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ \ - /* Query the fusing factor for the dotxaxpyf implementation. 
*/ \ - b_fuse = PASTEMAC(chax,dotxaxpyf_fusefac); \ + PASTECH(ch,dotxaxpyf_ft) kfp_xf; \ +\ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_xf = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXAXPYF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_XF, cntx ); \ \ for ( i = 0; i < m; i += f ) \ { \ f = bli_determine_blocksize_dim_f( i, m, b_fuse ); \ n_ahead = m - i - f; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+f)*incx; \ - y1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+f)*incy; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A21 = a + (i+f)*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + x2 = x + (i+f)*incx; \ + y1 = y + (i )*incy; \ + y2 = y + (i+f)*incy; \ \ /* y1 = y1 + alpha * A11 * x1; (variant 4) */ \ for ( k = 0; k < f; ++k ) \ @@ -256,68 +156,62 @@ void PASTEMAC3(cha,chx,chy,varname)( \ y21 = y1 + (k+1)*incy; \ \ /* y01 = y01 + alpha * a10t' * chi11; */ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi11, conjx_chi11 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi11, alpha_chi11 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi11, conjx_chi11 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi11, alpha_chi11 ); \ if ( bli_is_conj( conj0 ) ) \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + PASTEMAC(ch,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ } \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. 
*/ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* psi11 = psi11 + alpha * alpha11 * chi11; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ + PASTEMAC(ch,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ \ /* y21 = y21 + alpha * a21 * chi11; */ \ if ( bli_is_conj( conj1 ) ) \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + PASTEMAC(ch,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ } \ } \ \ /* y1 = y1 + alpha * A21' * x2; (dotxf) */ \ /* y2 = y2 + alpha * A21 * x1; (axpyf) */ \ - PASTEMAC3(cha,chx,chy,kername)( conj0, \ - conj1, \ - conjx, \ - conjx, \ - n_ahead, \ - f, \ - alpha_cast, \ - A21, rs_at, cs_at, \ - x2, incx, \ - x1, incx, \ - one, \ - y1, incy, \ - y2, incy ); \ + kfp_xf \ + ( \ + conj0, \ + conj1, \ + conjx, \ + conjx, \ + n_ahead, \ + f, \ + alpha, \ + A21, rs_at, cs_at, \ + x2, incx, \ + x1, incx, \ + one, \ + y1, incy, \ + y2, incy, \ + cntx \ + ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unf_var3, DOTXAXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var3, DOTXAXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var3, DOTXAXPYF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unf_var3 ) diff --git a/frame/2/hemv/bli_hemv_unf_var3a.c b/frame/2/hemv/bli_hemv_unf_var3a.c index c6974bb6b..17386be9f 100644 --- a/frame/2/hemv/bli_hemv_unf_var3a.c +++ b/frame/2/hemv/bli_hemv_unf_var3a.c @@ -34,153 +34,41 @@ #include "blis.h" -#define FUNCPTR_T hemv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conja, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx, - void* beta, - void* y, inc_t incy - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,hemv_unf_var3a); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,hemv_unf_var3a); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,hemv_unf_var3a); -#endif -#endif - - -void bli_hemv_unf_var3a( conj_t conjh, - obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y, - hemv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - - uplo_t uplo = bli_obj_uplo( *a ); - conj_t conja = bli_obj_conj_status( *a ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - num_t dt_alpha; - void* buf_alpha; - - num_t 
dt_beta; - void* buf_beta; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // The datatype of beta MUST be the same as the datatype of y. - dt_beta = dt_y; - buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); - -#if 0 - obj_t x_copy, y_copy; - - bli_obj_create( dt_x, m, 1, 0, 0, &x_copy ); - bli_obj_create( dt_y, m, 1, 0, 0, &y_copy ); - bli_copyv( x, &x_copy ); - bli_copyv( y, &y_copy ); - buf_x = bli_obj_buffer_at_off( x_copy ); - buf_y = bli_obj_buffer_at_off( y_copy ); - incx = 1; - incy = 1; -#endif - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x][dt_y]; - - // Invoke the function. - f( uplo, - conja, - conjx, - conjh, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx, - buf_beta, - buf_y, incy ); -#if 0 - bli_copyv( &y_copy, y ); - bli_obj_free( &x_copy ); - bli_obj_free( &y_copy ); -#endif -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - uplo_t uplo, \ - conj_t conja, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx, \ - void* beta, \ - void* y, inc_t incy \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_y* beta_cast = beta; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_y* zero = PASTEMAC(chy,0); \ - 
ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_ax rho; \ - ctype_x conjx_chi1; \ - ctype_ax alpha_chi1; \ - ctype_a alpha11_temp; \ - dim_t i; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* zero = PASTEMAC(ch,0); \ + ctype* alpha11; \ + ctype* a21; \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype* y2; \ + ctype rho; \ + ctype conjx_chi1; \ + ctype alpha_chi1; \ + ctype alpha11_temp; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -203,69 +91,77 @@ void PASTEMAC3(cha,chx,chy,varname)( \ } \ \ /* If beta is zero, use setv. Otherwise, scale by beta. */ \ - if ( PASTEMAC(chy,eq0)( *beta_cast ) ) \ + if ( PASTEMAC(ch,eq0)( *beta ) ) \ { \ /* y = 0; */ \ - PASTEMAC2(chy,chy,setv)( m, \ - zero, \ - y_cast, incy ); \ + PASTEMAC(ch,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y, incy, \ + cntx \ + ); \ } \ else \ { \ /* y = beta * y; */ \ - PASTEMAC2(chy,chy,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - beta_cast, \ - y_cast, incy ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta, \ + y, incy, \ + cntx \ + ); \ } \ +\ + PASTECH(ch,dotaxpyv_ft) kfp_vf; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_vf = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTAXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_ahead = m - i - 1; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ \ /* For hemv, explicitly set the imaginary component of alpha11 to zero. */ \ - PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_temp ); \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(cha,seti0s)( alpha11_temp ); \ + PASTEMAC(ch,seti0s)( alpha11_temp ); \ \ /* Apply conjx to chi1 and and scale by alpha. */ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC(ch,scal2s)( *alpha, conjx_chi1, alpha_chi1 ); \ \ /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ - PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + PASTEMAC(ch,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ \ /* psi1 = psi1 + alpha * a21' * x2; (dotv) */ \ /* y2 = y2 + alpha * a21 * chi1; (axpyv) */ \ - PASTEMAC3(cha,chx,chy,kername)( conj0, \ - conj1, \ - conjx, \ - n_ahead, \ - &alpha_chi1, \ - a21, rs_at, \ - x2, incx, \ - &rho, \ - y2, incy ); \ - PASTEMAC3(chax,chax,chy,axpys)( *alpha_cast, rho, *psi1 ); \ + kfp_vf \ + ( \ + conj0, \ + conj1, \ + conjx, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + x2, incx, \ + &rho, \ + y2, incy, \ + cntx \ + ); \ + PASTEMAC(ch,axpys)( *alpha, rho, *psi1 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( hemv_unf_var3a, DOTAXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var3a, DOTAXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var3a, DOTAXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( hemv_unf_var3a ) diff --git a/frame/2/hemv/bli_hemv_var.h b/frame/2/hemv/bli_hemv_var.h new file mode 100644 index 000000000..cf0e25bd4 --- /dev/null +++ b/frame/2/hemv/bli_hemv_var.h @@ -0,0 +1,102 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + cntx_t* cntx, \ + hemv_t* cntl \ + ); + +GENPROT( hemv_blk_var1 ) +GENPROT( hemv_blk_var2 ) +GENPROT( hemv_blk_var3 ) +GENPROT( hemv_blk_var4 ) + +GENPROT( hemv_unb_var1 ) +GENPROT( hemv_unb_var2 ) +GENPROT( hemv_unb_var3 ) +GENPROT( hemv_unb_var4 ) + +GENPROT( hemv_unf_var1 ) +GENPROT( hemv_unf_var3 ) +GENPROT( hemv_unf_var1a ) +GENPROT( hemv_unf_var3a ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( hemv_unb_var1 ) +INSERT_GENTPROT_BASIC( hemv_unb_var2 ) +INSERT_GENTPROT_BASIC( hemv_unb_var3 ) +INSERT_GENTPROT_BASIC( hemv_unb_var4 ) + +INSERT_GENTPROT_BASIC( hemv_unf_var1 ) +INSERT_GENTPROT_BASIC( hemv_unf_var3 ) +INSERT_GENTPROT_BASIC( hemv_unf_var1a ) +INSERT_GENTPROT_BASIC( hemv_unf_var3a ) + diff --git a/frame/2/hemv/bli_hemv_var_oapi.c b/frame/2/hemv/bli_hemv_var_oapi.c new file mode 100644 index 000000000..c0fc00ad4 --- /dev/null +++ b/frame/2/hemv/bli_hemv_var_oapi.c @@ -0,0 +1,101 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + obj_t* beta, \ + obj_t* y, \ + cntx_t* cntx, \ + hemv_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uplo = bli_obj_uplo( *a ); \ + conj_t conja = bli_obj_conj_status( *a ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ +\ + dim_t m = bli_obj_length( *a ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ + void* buf_beta = bli_obj_buffer_for_1x1( dt, *beta ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_15 \ + ( \ + dt, \ + opname, \ + uplo, \ + conja, \ + conjx, \ + conjh, \ + m, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + buf_beta, \ + buf_y, incy, \ + cntx \ + ); \ +} \ + +GENFRONT( hemv_unb_var1 ) +GENFRONT( hemv_unb_var2 ) +GENFRONT( hemv_unb_var3 ) +GENFRONT( hemv_unb_var4 ) + +GENFRONT( hemv_unf_var1 ) +GENFRONT( hemv_unf_var3 ) +GENFRONT( hemv_unf_var1a ) +GENFRONT( hemv_unf_var3a ) + diff --git a/frame/2/hemv/bli_hemv_blk_var1.h b/frame/2/hemv/old/bli_hemv_blk_var1.h similarity index 98% rename from frame/2/hemv/bli_hemv_blk_var1.h rename to frame/2/hemv/old/bli_hemv_blk_var1.h index fcee4fe45..3fa15d1ee 100644 --- a/frame/2/hemv/bli_hemv_blk_var1.h +++ b/frame/2/hemv/old/bli_hemv_blk_var1.h @@ -38,5 +38,6 @@ void bli_hemv_blk_var1( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/bli_hemv_blk_var2.h b/frame/2/hemv/old/bli_hemv_blk_var2.h similarity index 98% rename from frame/2/hemv/bli_hemv_blk_var2.h rename to frame/2/hemv/old/bli_hemv_blk_var2.h index 62929b972..38f32267a 100644 --- a/frame/2/hemv/bli_hemv_blk_var2.h +++ b/frame/2/hemv/old/bli_hemv_blk_var2.h @@ -38,5 +38,6 @@ void bli_hemv_blk_var2( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/bli_hemv_blk_var3.h b/frame/2/hemv/old/bli_hemv_blk_var3.h similarity index 98% rename from frame/2/hemv/bli_hemv_blk_var3.h rename to frame/2/hemv/old/bli_hemv_blk_var3.h index a841f2a16..b720c8c78 100644 --- a/frame/2/hemv/bli_hemv_blk_var3.h +++ b/frame/2/hemv/old/bli_hemv_blk_var3.h @@ -38,5 +38,6 @@ void bli_hemv_blk_var3( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/bli_hemv_blk_var4.h b/frame/2/hemv/old/bli_hemv_blk_var4.h similarity index 98% rename from frame/2/hemv/bli_hemv_blk_var4.h rename to frame/2/hemv/old/bli_hemv_blk_var4.h index 2836dcd5e..6e5bc3dbb 100644 --- 
a/frame/2/hemv/bli_hemv_blk_var4.h +++ b/frame/2/hemv/old/bli_hemv_blk_var4.h @@ -38,5 +38,6 @@ void bli_hemv_blk_var4( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/bli_hemv_check.c b/frame/2/hemv/old/bli_hemv_check.c similarity index 99% rename from frame/2/hemv/bli_hemv_check.c rename to frame/2/hemv/old/bli_hemv_check.c index fb71a2aba..3c1293c98 100644 --- a/frame/2/hemv/bli_hemv_check.c +++ b/frame/2/hemv/old/bli_hemv_check.c @@ -110,6 +110,7 @@ void bli_hemv_int_check( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ) { err_t e_val; diff --git a/frame/2/hemv/bli_hemv_check.h b/frame/2/hemv/old/bli_hemv_check.h similarity index 98% rename from frame/2/hemv/bli_hemv_check.h rename to frame/2/hemv/old/bli_hemv_check.h index de15959ae..8b1d67dc6 100644 --- a/frame/2/hemv/bli_hemv_check.h +++ b/frame/2/hemv/old/bli_hemv_check.h @@ -50,4 +50,5 @@ void bli_hemv_int_check( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_cntx.c b/frame/2/hemv/old/bli_hemv_cntx.c new file mode 100644 index 000000000..5dede7c22 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_cntx.c @@ -0,0 +1,68 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +void bli_hemv_cntx_init( cntx_t* cntx ) +{ + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with kernels for the current architecture. + bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_DOTXV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx ); + + bli_gks_cntx_set_l1f_ker( BLIS_DOTAXPYV_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_AXPYF_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTXF_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTXAXPYF_KER, cntx ); + + // Set the register and cache blocksizes and multiples, as well + // as the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 5, + BLIS_N2, BLIS_N2, + BLIS_M2, BLIS_M2, + BLIS_AF, BLIS_AF, + BLIS_DF, BLIS_DF, + BLIS_XF, BLIS_XF, + cntx ); +} + +void bli_hemv_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. 
+ bli_cntx_obj_free( cntx ); +} + diff --git a/frame/1f/dotxf/bli_dotxf_fusefac.h b/frame/2/hemv/old/bli_hemv_cntx.h similarity index 94% rename from frame/1f/dotxf/bli_dotxf_fusefac.h rename to frame/2/hemv/old/bli_hemv_cntx.h index 8a2ea6dda..c5bf2a75f 100644 --- a/frame/1f/dotxf/bli_dotxf_fusefac.h +++ b/frame/2/hemv/old/bli_hemv_cntx.h @@ -32,8 +32,6 @@ */ -// -// Prototype object-based fusing factor query routine. -// -dim_t bli_dotxf_fusefac( num_t dt ); +void bli_hemv_cntx_init( void ); +void bli_hemv_cntx_finalize( void ); diff --git a/frame/2/hemv/old/bli_hemv_unb_var1.c b/frame/2/hemv/old/bli_hemv_unb_var1.c new file mode 100644 index 000000000..9ef9c9ed9 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unb_var1.c @@ -0,0 +1,254 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unb_var1); + +void bli_hemv_unb_var1( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. + dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. 
+ f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername1, kername2 ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* one = PASTEMAC(chy,1); \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
+ */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* y0 = y0 + alpha * a10t' * chi1; */ \ + PASTEMAC(cha,kername1) \ + ( \ + conj0, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + y0, incy, \ + cntx \ + ); \ +\ + /* psi1 = psi1 + alpha * a10t * x0; */ \ + PASTEMAC(cha,kername2) \ + ( \ + conj1, \ + conjx, \ + n_behind, \ + alpha_cast, \ + a10t, cs_at, \ + x0, incx, \ + one, \ + psi1, \ + cntx \ + ); \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ +\ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested.
+INSERT_GENTFUNC3U12_BASIC2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P2( hemv_unb_var1, AXPYV_KERNEL, DOTXV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unb_var1.h b/frame/2/hemv/old/bli_hemv_unb_var1.h similarity index 98% rename from frame/2/hemv/bli_hemv_unb_var1.h rename to frame/2/hemv/old/bli_hemv_unb_var1.h index ffd996597..39442570d 100644 --- a/frame/2/hemv/bli_hemv_unb_var1.h +++ b/frame/2/hemv/old/bli_hemv_unb_var1.h @@ -39,6 +39,7 @@ void bli_hemv_unb_var1( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unb_var2.c b/frame/2/hemv/old/bli_hemv_unb_var2.c new file mode 100644 index 000000000..149eef4bf --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unb_var2.c @@ -0,0 +1,260 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unb_var2); + +void bli_hemv_unb_var2( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* one = PASTEMAC(chy,1); \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
+ */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + n_ahead = m - i - 1; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* psi1 = psi1 + alpha * a10t * x0; */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + conjx, \ + n_behind, \ + alpha_cast, \ + a10t, cs_at, \ + x0, incx, \ + one, \ + psi1, \ + cntx \ + ); \ +\ + /* psi1 = psi1 + alpha * a21' * x2; */ \ + PASTEMAC(cha,kername) \ + ( \ + conj1, \ + conjx, \ + n_ahead, \ + alpha_cast, \ + a21, rs_at, \ + x2, incx, \ + one, \ + psi1, \ + cntx \ + ); \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested.
+INSERT_GENTFUNC3U12_BASIC( hemv_unb_var2, DOTXV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unb_var2, DOTXV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unb_var2, DOTXV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unb_var2.h b/frame/2/hemv/old/bli_hemv_unb_var2.h similarity index 98% rename from frame/2/hemv/bli_hemv_unb_var2.h rename to frame/2/hemv/old/bli_hemv_unb_var2.h index 9ac58352a..7774d1bf6 100644 --- a/frame/2/hemv/bli_hemv_unb_var2.h +++ b/frame/2/hemv/old/bli_hemv_unb_var2.h @@ -39,6 +39,7 @@ void bli_hemv_unb_var2( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unb_var3.c b/frame/2/hemv/old/bli_hemv_unb_var3.c new file mode 100644 index 000000000..714ea1d96 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unb_var3.c @@ -0,0 +1,253 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unb_var3); + +void bli_hemv_unb_var3( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername1, kername2 ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* one = PASTEMAC(chy,1); \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
+ */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_ahead = m - i - 1; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ +\ + /* psi1 = psi1 + alpha * a21' * x2; */ \ + PASTEMAC(cha,kername1) \ + ( \ + conj0, \ + conjx, \ + n_ahead, \ + alpha_cast, \ + a21, rs_at, \ + x2, incx, \ + one, \ + psi1, \ + cntx \ + ); \ +\ + /* y2 = y2 + alpha * a21 * chi1; */ \ + PASTEMAC(cha,kername2) \ + ( \ + conj1, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + y2, incy, \ + cntx \ + ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested.
+INSERT_GENTFUNC3U12_BASIC2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P2( hemv_unb_var3, DOTXV_KERNEL, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unb_var3.h b/frame/2/hemv/old/bli_hemv_unb_var3.h similarity index 98% rename from frame/2/hemv/bli_hemv_unb_var3.h rename to frame/2/hemv/old/bli_hemv_unb_var3.h index 729094f0c..ddbfc9b87 100644 --- a/frame/2/hemv/bli_hemv_unb_var3.h +++ b/frame/2/hemv/old/bli_hemv_unb_var3.h @@ -39,6 +39,7 @@ void bli_hemv_unb_var3( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unb_var4.c b/frame/2/hemv/old/bli_hemv_unb_var4.c new file mode 100644 index 000000000..c233a6dd9 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unb_var4.c @@ -0,0 +1,253 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unb_var4); + +void bli_hemv_unb_var4( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
+ */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + n_ahead = m - i - 1; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* y0 = y0 + alpha * a10t' * chi1; */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + y0, incy, \ + cntx \ + ); \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ +\ + /* y2 = y2 + alpha * a21 * chi1; */ \ + PASTEMAC(cha,kername) \ + ( \ + conj1, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + y2, incy, \ + cntx \ + ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested.
+INSERT_GENTFUNC3U12_BASIC( hemv_unb_var4, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unb_var4, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unb_var4, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unb_var4.h b/frame/2/hemv/old/bli_hemv_unb_var4.h similarity index 98% rename from frame/2/hemv/bli_hemv_unb_var4.h rename to frame/2/hemv/old/bli_hemv_unb_var4.h index 1bde74d5e..c7b3fb543 100644 --- a/frame/2/hemv/bli_hemv_unb_var4.h +++ b/frame/2/hemv/old/bli_hemv_unb_var4.h @@ -39,6 +39,7 @@ void bli_hemv_unb_var4( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unf_var1.c b/frame/2/hemv/old/bli_hemv_unf_var1.c new file mode 100644 index 000000000..d685775bc --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unf_var1.c @@ -0,0 +1,297 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unf_var1); + +void bli_hemv_unf_var1( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* one = PASTEMAC(chy,1); \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* A10; \ + ctype_a* A11; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* x1; \ + ctype_x* chi11; \ + ctype_y* y0; \ + ctype_y* y1; \ + ctype_y* y01; \ + ctype_y* psi11; \ + ctype_y* y21; \ + ctype_x conjx_chi11; \ + ctype_ax alpha_chi11; \ + ctype_a alpha11_temp; \ + dim_t i, k, j; \ + dim_t b_fuse, f; \ + dim_t n_behind; \ + dim_t f_ahead, f_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. 
*/ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + /* Query the fusing factor for the dotxaxpyf implementation. */ \ + b_fuse = PASTEMAC(chax,dotxaxpyf_fusefac); \ +\ + for ( i = 0; i < m; i += f ) \ + { \ + f = bli_determine_blocksize_dim_f( i, m, b_fuse ); \ + n_behind = i; \ + A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + x0 = x_cast + (0 )*incx; \ + x1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + y1 = y_cast + (i )*incy; \ +\ + /* y1 = y1 + alpha * A10 * x0; (dotxf) */ \ + /* y0 = y0 + alpha * A10' * x1; (axpyf) */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + conj1, \ + conjx, \ + conjx, \ + n_behind, \ + f, \ + alpha_cast, \ + A10, cs_at, rs_at, \ + x0, incx, \ + x1, incx, \ + one, \ + y1, incy, \ + y0, incy, \ + cntx \ + ); \ +\ + /* y1 = y1 + alpha * A11 * x1; (variant 4) */ \ + for ( k = 0; k < f; ++k ) \ + { \ + f_behind = k; \ + f_ahead = f - k - 1; \ + a10t = A11 + (k )*rs_at + (0 )*cs_at; \ + alpha11 = A11 + (k )*rs_at + (k )*cs_at; \ + a21 = A11 + (k+1)*rs_at + (k )*cs_at; \ + chi11 = x1 + (k )*incx; \ + y01 = y1 + (0 )*incy; \ + psi11 = y1 + (k )*incy; \ + y21 = y1 + (k+1)*incy; \ +\ + /* y01 = y01 + alpha * a10t' * chi11; */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi11, conjx_chi11 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi11, 
alpha_chi11 ); \ + if ( bli_is_conj( conj1 ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + } \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi11 = psi11 + alpha * alpha11 * chi11; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ +\ + /* y21 = y21 + alpha * a21 * chi11; */ \ + if ( bli_is_conj( conj0 ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + } \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC3U12_BASIC( hemv_unf_var1, DOTXAXPYF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var1, DOTXAXPYF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var1, DOTXAXPYF_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unf_var1.h b/frame/2/hemv/old/bli_hemv_unf_var1.h similarity index 98% rename from frame/2/hemv/bli_hemv_unf_var1.h rename to frame/2/hemv/old/bli_hemv_unf_var1.h index 346886d5e..a06cf4e74 100644 --- a/frame/2/hemv/bli_hemv_unf_var1.h +++ b/frame/2/hemv/old/bli_hemv_unf_var1.h @@ -39,6 +39,7 @@ void bli_hemv_unf_var1( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unf_var1a.c b/frame/2/hemv/old/bli_hemv_unf_var1a.c new file mode 100644 index 000000000..f11bb0a91 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unf_var1a.c @@ -0,0 +1,246 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unf_var1a); + +void bli_hemv_unf_var1a( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_ax rho; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. 
*/ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* psi1 = psi1 + alpha * a10t * x0; (dotv) */ \ + /* y0 = y0 + alpha * a10t' * chi1; (axpyv) */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + conj1, \ + conjx, \ + n_behind, \ + &alpha_chi1, \ + a10t, cs_at, \ + x0, incx, \ + &rho, \ + y0, incy, \ + cntx \ + ); \ + PASTEMAC3(chax,chax,chy,axpys)( *alpha_cast, rho, *psi1 ); \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ +\ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC3U12_BASIC( hemv_unf_var1a, DOTAXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var1a, DOTAXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var1a, DOTAXPYV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unf_var1a.h b/frame/2/hemv/old/bli_hemv_unf_var1a.h similarity index 98% rename from frame/2/hemv/bli_hemv_unf_var1a.h rename to frame/2/hemv/old/bli_hemv_unf_var1a.h index 25c64bbfb..a5004bc8b 100644 --- a/frame/2/hemv/bli_hemv_unf_var1a.h +++ b/frame/2/hemv/old/bli_hemv_unf_var1a.h @@ -39,6 +39,7 @@ void bli_hemv_unf_var1a( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unf_var3.c b/frame/2/hemv/old/bli_hemv_unf_var3.c new file mode 100644 index 000000000..5223e86b8 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unf_var3.c @@ -0,0 +1,315 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unf_var3); + +void bli_hemv_unf_var3( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + +#if 0 + obj_t x_copy, y_copy; + + bli_obj_create( dt_x, m, 1, 0, 0, &x_copy ); + bli_obj_create( dt_y, m, 1, 0, 0, &y_copy ); + bli_copyv( x, &x_copy ); + bli_copyv( y, &y_copy ); + buf_x = bli_obj_buffer_at_off( x_copy ); + buf_y = bli_obj_buffer_at_off( y_copy ); + incx = 1; + incy = 1; +#endif + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +#if 0 + bli_copyv( &y_copy, y ); + bli_obj_free( &x_copy ); + bli_obj_free( &y_copy ); +#endif +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* one = PASTEMAC(chy,1); \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* A11; \ + ctype_a* A21; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x1; \ + ctype_x* x2; \ + ctype_x* chi11; \ + ctype_y* y1; \ + ctype_y* y2; \ + ctype_y* y01; \ + ctype_y* psi11; \ + ctype_y* y21; \ + ctype_x conjx_chi11; \ + ctype_ax alpha_chi11; \ + ctype_a alpha11_temp; \ + dim_t i, k, j; \ + dim_t b_fuse, f; \ + dim_t n_ahead; \ + dim_t f_ahead, f_behind; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is 
supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + /* Query the fusing factor for the dotxaxpyf implementation. */ \ + b_fuse = PASTEMAC(chax,dotxaxpyf_fusefac); \ +\ + for ( i = 0; i < m; i += f ) \ + { \ + f = bli_determine_blocksize_dim_f( i, m, b_fuse ); \ + n_ahead = m - i - f; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+f)*incx; \ + y1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+f)*incy; \ +\ + /* y1 = y1 + alpha * A11 * x1; (variant 4) */ \ + for ( k = 0; k < f; ++k ) \ + { \ + f_behind = k; \ + f_ahead = f - k - 1; \ + a10t = A11 + (k )*rs_at + (0 )*cs_at; \ + alpha11 = A11 + (k )*rs_at + (k )*cs_at; \ + a21 = A11 + (k+1)*rs_at + (k )*cs_at; \ + chi11 = x1 + (k )*incx; \ + y01 = y1 + (0 )*incy; \ + psi11 = y1 + (k )*incy; \ + y21 = y1 + (k+1)*incy; \ +\ + /* y01 = y01 + alpha * a10t' * chi11; */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi11, conjx_chi11 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi11, alpha_chi11 ); \ + if ( bli_is_conj( conj0 ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + } \ + else \ + { \ + for ( j = 0; j 
< f_behind; ++j ) \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a10t + j*cs_at), *(y01 + j*incy) ); \ + } \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* psi11 = psi11 + alpha * alpha11 * chi11; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, alpha11_temp, *psi11 ); \ +\ + /* y21 = y21 + alpha * a21 * chi11; */ \ + if ( bli_is_conj( conj1 ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chy,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi11, *(a21 + j*rs_at), *(y21 + j*incy) ); \ + } \ + } \ +\ + /* y1 = y1 + alpha * A21' * x2; (dotxf) */ \ + /* y2 = y2 + alpha * A21 * x1; (axpyf) */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + conj1, \ + conjx, \ + conjx, \ + n_ahead, \ + f, \ + alpha_cast, \ + A21, rs_at, cs_at, \ + x2, incx, \ + x1, incx, \ + one, \ + y1, incy, \ + y2, incy, \ + cntx \ + ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC3U12_BASIC( hemv_unf_var3, DOTXAXPYF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var3, DOTXAXPYF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var3, DOTXAXPYF_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unf_var3.h b/frame/2/hemv/old/bli_hemv_unf_var3.h similarity index 98% rename from frame/2/hemv/bli_hemv_unf_var3.h rename to frame/2/hemv/old/bli_hemv_unf_var3.h index 10031821f..2d53cc745 100644 --- a/frame/2/hemv/bli_hemv_unf_var3.h +++ b/frame/2/hemv/old/bli_hemv_unf_var3.h @@ -39,6 +39,7 @@ void bli_hemv_unf_var3( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/hemv/old/bli_hemv_unf_var3a.c b/frame/2/hemv/old/bli_hemv_unf_var3a.c new file mode 100644 index 000000000..5bb132280 --- /dev/null +++ b/frame/2/hemv/old/bli_hemv_unf_var3a.c @@ -0,0 +1,263 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static hemv_vft GENARRAY(ftypes,hemv_unf_var3a); + +void bli_hemv_unf_var3a( conj_t conjh, + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx, + hemv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + + uplo_t uplo = bli_obj_uplo( *a ); + conj_t conja = bli_obj_conj_status( *a ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + num_t dt_alpha; + void* buf_alpha; + + num_t dt_beta; + void* buf_beta; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // The datatype of beta MUST be the same as the datatype of y. 
+ dt_beta = dt_y; + buf_beta = bli_obj_buffer_for_1x1( dt_beta, *beta ); + +#if 0 + obj_t x_copy, y_copy; + + bli_obj_create( dt_x, m, 1, 0, 0, &x_copy ); + bli_obj_create( dt_y, m, 1, 0, 0, &y_copy ); + bli_copyv( x, &x_copy ); + bli_copyv( y, &y_copy ); + buf_x = bli_obj_buffer_at_off( x_copy ); + buf_y = bli_obj_buffer_at_off( y_copy ); + incx = 1; + incy = 1; +#endif + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a]; + + // Invoke the function. + f( uplo, + conja, + conjx, + conjh, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx, + buf_beta, + buf_y, incy ); +#if 0 + bli_copyv( &y_copy, y ); + bli_obj_free( &x_copy ); + bli_obj_free( &y_copy ); +#endif +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, varname, kername ) \ +\ +void PASTEMAC(cha,varname) \ + ( \ + uplo_t uplo, \ + conj_t conja, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx, \ + void* beta, \ + void* y, inc_t incy, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_y* beta_cast = beta; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_y* zero = PASTEMAC(chy,0); \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_ax rho; \ + ctype_x conjx_chi1; \ + ctype_ax alpha_chi1; \ + ctype_a alpha11_temp; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. 
*/ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ +\ + conj0 = bli_apply_conj( conjh, conja ); \ + conj1 = conja; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ +\ + conj0 = conja; \ + conj1 = bli_apply_conj( conjh, conja ); \ + } \ +\ + /* If beta is zero, use setv. Otherwise, scale by beta. */ \ + if ( PASTEMAC(cha,eq0)( *beta_cast ) ) \ + { \ + /* y = 0; */ \ + PASTEMAC(cha,setv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + zero, \ + y_cast, incy, \ + cntx \ + ); \ + } \ + else \ + { \ + /* y = beta * y; */ \ + PASTEMAC(cha,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + beta_cast, \ + y_cast, incy, \ + cntx \ + ); \ + } \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_ahead = m - i - 1; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ +\ + /* For hemv, explicitly set the imaginary component of alpha11 to + zero. */ \ + PASTEMAC2(cha,cha,copycjs)( conja, *alpha11, alpha11_temp ); \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(cha,seti0s)( alpha11_temp ); \ +\ + /* Apply conjx to chi1 and scale by alpha. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx_chi1 ); \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, conjx_chi1, alpha_chi1 ); \ +\ + /* psi1 = psi1 + alpha * alpha11 * chi1; */ \ + PASTEMAC3(chax,cha,chy,axpys)( alpha_chi1, alpha11_temp, *psi1 ); \ +\ + /* psi1 = psi1 + alpha * a21' * x2; (dotv) */ \ + /* y2 = y2 + alpha * a21 * chi1; (axpyv) */ \ + PASTEMAC(cha,kername) \ + ( \ + conj0, \ + conj1, \ + conjx, \ + n_ahead, \ + &alpha_chi1, \ + a21, rs_at, \ + x2, incx, \ + &rho, \ + y2, incy, \ + cntx \ + ); \ + PASTEMAC3(chax,chax,chy,axpys)( *alpha_cast, rho, *psi1 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC3U12_BASIC( hemv_unf_var3a, DOTAXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( hemv_unf_var3a, DOTAXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( hemv_unf_var3a, DOTAXPYV_KERNEL ) +#endif + diff --git a/frame/2/hemv/bli_hemv_unf_var3a.h b/frame/2/hemv/old/bli_hemv_unf_var3a.h similarity index 98% rename from frame/2/hemv/bli_hemv_unf_var3a.h rename to frame/2/hemv/old/bli_hemv_unf_var3a.h index cbacbfcd1..769ba1762 100644 --- a/frame/2/hemv/bli_hemv_unf_var3a.h +++ b/frame/2/hemv/old/bli_hemv_unf_var3a.h @@ -39,6 +39,7 @@ void bli_hemv_unf_var3a( conj_t conjh, obj_t* x, obj_t* beta, obj_t* y, + cntx_t* cntx, hemv_t* cntl ); diff --git a/frame/2/her/bli_her.h b/frame/2/her/bli_her.h index 340578dc8..fe9d2d84e 100644 --- a/frame/2/her/bli_her.h +++ b/frame/2/her/bli_her.h @@ -33,61 +33,7 @@ */ #include "bli_her_cntl.h" -#include "bli_her_check.h" +#include "bli_her_front.h" #include "bli_her_int.h" -#include "bli_her_unb_var1.h" -#include "bli_her_unb_var2.h" - -#include "bli_her_blk_var1.h" -#include "bli_her_blk_var2.h" - - -void bli_her( obj_t* alpha, - obj_t* x, - obj_t* c ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROTR -#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_r* alpha, \ - ctype* x, inc_t incx, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROTR_BASIC( her ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2R -#define GENTPROT2R( ctype_x, ctype_c, ctype_xr, chx, chc, chxr, opname ) \ -\ -void PASTEMAC2(chx,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_xr* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT2R_BASIC( her ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2R_MIX_D( her ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2R_MIX_P( her ) -#endif - +#include "bli_her_var.h" diff --git a/frame/2/her/bli_her_blk_var1.c b/frame/2/her/bli_her_blk_var1.c index 40630f7cb..2c84010ab 100644 --- a/frame/2/her/bli_her_blk_var1.c +++ b/frame/2/her/bli_her_blk_var1.c @@ -38,6 +38,7 @@ void bli_her_blk_var1( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ) { obj_t c11, c11_pack; @@ -70,7 +71,7 @@ void bli_her_blk_var1( conj_t conjh, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C10, x1, and x0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -84,16 +85,16 @@ void bli_her_blk_var1( conj_t conjh, // Initialize objects for packing C11 and x1 (if needed). bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack C11, x1 (if needed). 
bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // C10 = C10 + alpha * x1 * x0'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -102,6 +103,7 @@ void bli_her_blk_var1( conj_t conjh, &x1_pack, &x0, &c10, + cntx, cntl_sub_ger( cntl ) ); // C11 = C11 + alpha * x1 * x1'; @@ -109,11 +111,12 @@ void bli_her_blk_var1( conj_t conjh, alpha, &x1_pack, &c11_pack, + cntx, cntl_sub_her( cntl ) ); // Copy/unpack C11 (if C11 was packed). bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her/bli_her_blk_var2.c b/frame/2/her/bli_her_blk_var2.c index 1e0cdbf8e..9e88c67ec 100644 --- a/frame/2/her/bli_her_blk_var2.c +++ b/frame/2/her/bli_her_blk_var2.c @@ -38,6 +38,7 @@ void bli_her_blk_var2( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ) { obj_t c11, c11_pack; @@ -70,7 +71,7 @@ void bli_her_blk_var2( conj_t conjh, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C21, x1, and x2. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -84,16 +85,16 @@ void bli_her_blk_var2( conj_t conjh, // Initialize objects for packing C11 and x1 (if needed). bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack C11, x1 (if needed). 
bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // C21 = C21 + alpha * x2 * x1'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -102,6 +103,7 @@ void bli_her_blk_var2( conj_t conjh, &x2, &x1_pack, &c21, + cntx, cntl_sub_ger( cntl ) ); // C11 = C11 + alpha * x1 * x1'; @@ -109,11 +111,12 @@ void bli_her_blk_var2( conj_t conjh, alpha, &x1_pack, &c11_pack, + cntx, cntl_sub_her( cntl ) ); // Copy/unpack C11 (if C11 was packed). bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her/bli_her_cntl.c b/frame/2/her/bli_her_cntl.c index 1d118659b..c23156b83 100644 --- a/frame/2/her/bli_her_cntl.c +++ b/frame/2/her/bli_her_cntl.c @@ -43,8 +43,6 @@ extern ger_t* ger_cntl_cp_bs_col; extern ger_t* ger_cntl_bs_ke_row; extern ger_t* ger_cntl_bs_ke_col; -extern blksz_t* gemv_mc; - her_t* her_cntl_bs_ke_lrow_ucol; her_t* her_cntl_bs_ke_lcol_urow; @@ -60,14 +58,16 @@ void bli_her_cntl_init() = bli_her_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); her_cntl_bs_ke_lcol_urow = bli_her_cntl_obj_create( BLIS_UNBLOCKED, BLIS_VARIANT2, + 0, NULL, NULL, NULL, - NULL, NULL, NULL ); + NULL, NULL ); // Create control trees for generally large problems. 
Here, we choose @@ -77,7 +77,7 @@ void bli_her_cntl_init() = bli_her_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) NULL, // do NOT pack C11 ger_cntl_rp_bs_row, @@ -87,7 +87,7 @@ void bli_her_cntl_init() = bli_her_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) NULL, // do NOT pack C11 ger_cntl_cp_bs_col, @@ -106,7 +106,7 @@ void bli_her_cntl_finalize() her_t* bli_her_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packm_t* sub_packm_c11, ger_t* sub_ger, @@ -119,7 +119,7 @@ her_t* bli_her_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_packm_c11 = sub_packm_c11; cntl->sub_ger = sub_ger; @@ -132,7 +132,7 @@ her_t* bli_her_cntl_obj_create( impl_t impl_type, void bli_her_cntl_obj_init( her_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packm_t* sub_packm_c11, ger_t* sub_ger, @@ -141,7 +141,7 @@ void bli_her_cntl_obj_init( her_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_packm_c11 = sub_packm_c11; cntl->sub_ger = sub_ger; diff --git a/frame/2/her/bli_her_cntl.h b/frame/2/her/bli_her_cntl.h index c9dd06d53..779439378 100644 --- a/frame/2/her/bli_her_cntl.h +++ b/frame/2/her/bli_her_cntl.h @@ -36,7 +36,7 @@ struct her_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct packv_s* sub_packv_x1; struct packm_s* sub_packm_c11; struct ger_s* sub_ger; @@ -51,7 +51,7 @@ void bli_her_cntl_init( void ); void bli_her_cntl_finalize( void ); her_t* bli_her_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packm_t* sub_packm_c11, ger_t* sub_ger, @@ -60,7 +60,7 @@ 
her_t* bli_her_cntl_obj_create( impl_t impl_type, void bli_her_cntl_obj_init( her_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packm_t* sub_packm_c11, ger_t* sub_ger, diff --git a/frame/2/her/bli_her.c b/frame/2/her/bli_her_front.c similarity index 78% rename from frame/2/her/bli_her.c rename to frame/2/her/bli_her_front.c index 3add15163..eae28eb71 100644 --- a/frame/2/her/bli_her.c +++ b/frame/2/her/bli_her_front.c @@ -39,9 +39,13 @@ extern her_t* her_cntl_bs_ke_lcol_urow; extern her_t* her_cntl_ge_lrow_ucol; extern her_t* her_cntl_ge_lcol_urow; -void bli_her( obj_t* alpha, - obj_t* x, - obj_t* c ) +void bli_her_front + ( + obj_t* alpha, + obj_t* x, + obj_t* c, + cntx_t* cntx + ) { her_t* her_cntl; num_t dt_targ_x; @@ -53,7 +57,7 @@ void bli_her( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_her_check( BLIS_CONJUGATE, alpha, x, c ); + bli_her_check( alpha, x, c ); // Query the target datatypes of each object. @@ -115,7 +119,6 @@ void bli_her( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast scalar and the // chosen control tree. Set conjh to BLIS_CONJUGATE to invoke the // Hermitian (and not symmetric) algorithms. @@ -123,6 +126,7 @@ void bli_her( obj_t* alpha, &alpha_local, x, c, + cntx, her_cntl ); } @@ -131,16 +135,18 @@ void bli_her( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNCR -#define GENTFUNCR( ctype, ctype_r, ch, chr, opname, varname ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_r* alpha, \ - ctype* x, inc_t incx, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + dim_t m, \ + ctype_r* alpha, \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt_r = PASTEMAC(chr,type); \ const num_t dt = PASTEMAC(ch,type); \ @@ -163,37 +169,9 @@ void PASTEMAC(ch,opname)( \ \ PASTEMAC0(opname)( &alphao, \ &xo, \ - &co ); \ + &co, \ + cntx ); \ } -INSERT_GENTFUNCR_BASIC( her, her ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2R -#define GENTFUNC2R( ctype_x, ctype_c, ctype_xr, chx, chc, chxr, opname, varname ) \ -\ -void PASTEMAC2(chx,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_xr* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC2R_BASIC( her, her ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2R_MIX_D( her, her ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2R_MIX_P( her, her ) -#endif +INSERT_GENTFUNCR_BASIC0( her_front ) diff --git a/frame/2/her/bli_her_front.h b/frame/2/her/bli_her_front.h new file mode 100644 index 000000000..6f82f9307 --- /dev/null +++ b/frame/2/her/bli_her_front.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +void bli_her_front + ( + obj_t* alpha, + obj_t* x, + obj_t* c, + cntx_t* cntx + ); + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + dim_t m, \ + ctype_r* alpha, \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( her_front ) + diff --git a/frame/2/her/bli_her_int.c b/frame/2/her/bli_her_int.c index c96fab1f1..af099f891 100644 --- a/frame/2/her/bli_her_int.c +++ b/frame/2/her/bli_her_int.c @@ -40,6 +40,7 @@ typedef void (*FUNCPTR_T)( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); static FUNCPTR_T vars[4][3] = @@ -55,6 +56,7 @@ void bli_her_int( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ) { varnum_t n; @@ -65,7 +67,10 @@ void bli_her_int( conj_t conjh, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_her_int_check( conjh, alpha, x, c, cntl ); + { + if ( bli_is_conj( conjh ) ) bli_her_check( alpha, x, c ); + else bli_syr_check( alpha, x, c ); + } // If C or x has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *c ) ) return; @@ -98,6 +103,7 @@ void bli_her_int( conj_t conjh, alpha, &x_local, &c_local, + cntx, cntl ); } diff --git a/frame/2/her/bli_her_int.h b/frame/2/her/bli_her_int.h index 5fec45109..8c5358556 100644 --- a/frame/2/her/bli_her_int.h +++ b/frame/2/her/bli_her_int.h @@ -36,5 +36,6 @@ void bli_her_int( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her/bli_her_unb_var1.c b/frame/2/her/bli_her_unb_var1.c index 741c3d1dc..63216caea 100644 --- a/frame/2/her/bli_her_unb_var1.c +++ b/frame/2/her/bli_her_unb_var1.c @@ -34,121 +34,46 @@ #include "blis.h" -#define FUNCPTR_T her_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,her_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,her_unb_var1); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,her_unb_var1); -#endif -#endif - - -void bli_her_unb_var1( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* c, - her_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and 
extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_c, chx, chc, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(chx,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, /* complex alpha allows her variants to also perform syr. */ \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_x* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_c* c_cast = c; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_c* c10t; \ - ctype_c* gamma11; \ - ctype_x alpha_local; \ - ctype_x alpha_chi1; \ - ctype_x alpha_chi1_chi1; \ - ctype_x conjx0_chi1; \ - ctype_x conjx1_chi1; \ - dim_t i; \ - dim_t n_behind; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* x0; \ + ctype* chi1; \ + ctype* c10t; \ + ctype* gamma11; \ + ctype alpha_local; \ + ctype alpha_chi1; \ + ctype alpha_chi1_chi1; \ + ctype conjx0_chi1; \ + ctype conjx1_chi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ \ /* Eliminate unused variable warnings. */ \ ( void )conj0; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chx,eq0)( *alpha_cast ) ) return; \ \ /* Make a local copy of alpha and zero out the imaginary component if we are being invoked as her, since her requires alpha to be real. 
*/ \ - PASTEMAC2(chx,chx,copys)( *alpha_cast, alpha_local ); \ + PASTEMAC(ch,copys)( *alpha, alpha_local ); \ if ( bli_is_conj( conjh ) ) \ { \ - PASTEMAC(chx,seti0s)( alpha_local ); \ + PASTEMAC(ch,seti0s)( alpha_local ); \ } \ \ /* The algorithm will be expressed in terms of the lower triangular case; @@ -174,52 +99,51 @@ void PASTEMAC2(chx,chc,varname)( \ conjugation for the scalar and vector subproblems. */ \ conj0 = conjx; \ conj1 = bli_apply_conj( conjh, conjx ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + c10t = c + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ \ /* Apply conjx to chi1. */ \ - PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj1, *chi1, conjx1_chi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conj1, *chi1, conjx1_chi1 ); \ \ /* Compute scalar for vector subproblem. */ \ - PASTEMAC3(chx,chx,chx,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ \ /* Compute alpha * chi1 * conj(chi1) after chi1 has already been conjugated, if needed, by conjx. 
*/ \ - PASTEMAC3(chx,chx,chx,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ \ /* c10t = c10t + alpha * chi1 * x0'; */ \ - PASTEMAC3(chx,chx,chc,kername)( conj1, \ - n_behind, \ - &alpha_chi1, \ - x0, incx, \ - c10t, cs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_behind, \ + &alpha_chi1, \ + x0, incx, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(chi1); */ \ - PASTEMAC2(chx,chc,adds)( alpha_chi1_chi1, *gamma11 ); \ + PASTEMAC(ch,adds)( alpha_chi1_chi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2_BASIC( her_unb_var1, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( her_unb_var1, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( her_unb_var1, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her_unb_var1 ) diff --git a/frame/2/her/bli_her_unb_var2.c b/frame/2/her/bli_her_unb_var2.c index ea2475200..4967f4df5 100644 --- a/frame/2/her/bli_her_unb_var2.c +++ b/frame/2/her/bli_her_unb_var2.c @@ -34,121 +34,46 @@ #include "blis.h" -#define FUNCPTR_T her_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,her_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,her_unb_var2); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,her_unb_var2); -#endif -#endif - - -void bli_her_unb_var2( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* c, - her_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - - // If alpha is a scalar constant, use dt_x to extract the address of the - // corresponding constant value; otherwise, use the datatype encoded - // within the alpha object and extract the buffer at the alpha offset. - bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_c, chx, chc, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(chx,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, /* complex alpha allows her variants to also perform syr. 
*/ \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_x* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_c* c_cast = c; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_c* gamma11; \ - ctype_c* c21; \ - ctype_x alpha_local; \ - ctype_x alpha_chi1; \ - ctype_x alpha_chi1_chi1; \ - ctype_x conjx0_chi1; \ - ctype_x conjx1_chi1; \ - dim_t i; \ - dim_t n_ahead; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* chi1; \ + ctype* x2; \ + ctype* gamma11; \ + ctype* c21; \ + ctype alpha_local; \ + ctype alpha_chi1; \ + ctype alpha_chi1_chi1; \ + ctype conjx0_chi1; \ + ctype conjx1_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ \ /* Eliminate unused variable warnings. */ \ ( void )conj0; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chx,eq0)( *alpha_cast ) ) return; \ \ /* Make a local copy of alpha and zero out the imaginary component if we are being invoked as her, since her requires alpha to be real. */ \ - PASTEMAC2(chx,chx,copys)( *alpha_cast, alpha_local ); \ + PASTEMAC(ch,copys)( *alpha, alpha_local ); \ if ( bli_is_conj( conjh ) ) \ { \ - PASTEMAC(chx,seti0s)( alpha_local ); \ + PASTEMAC(ch,seti0s)( alpha_local ); \ } \ \ /* The algorithm will be expressed in terms of the lower triangular case; @@ -174,52 +99,51 @@ void PASTEMAC2(chx,chc,varname)( \ conjugation for the scalar and vector subproblems. */ \ conj0 = bli_apply_conj( conjh, conjx ); \ conj1 = conjx; \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_ahead = m - i - 1; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ - c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ + c21 = c + (i+1)*rs_ct + (i )*cs_ct; \ \ /* Apply conjx to chi1. */ \ - PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj1, *chi1, conjx1_chi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conj1, *chi1, conjx1_chi1 ); \ \ /* Compute scalar for vector subproblem. */ \ - PASTEMAC3(chx,chx,chx,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ \ /* Compute alpha * chi1 * conj(chi1) after chi1 has already been conjugated, if needed, by conjx. */ \ - PASTEMAC3(chx,chx,chx,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ \ /* c21 = c21 + alpha * x2 * conj(chi1); */ \ - PASTEMAC3(chx,chx,chc,kername)( conj1, \ - n_ahead, \ - &alpha_chi1, \ - x2, incx, \ - c21, rs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_ahead, \ + &alpha_chi1, \ + x2, incx, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(chi1); */ \ - PASTEMAC2(chx,chc,adds)( alpha_chi1_chi1, *gamma11 ); \ + PASTEMAC(ch,adds)( alpha_chi1_chi1, *gamma11 ); \ \ /* For her, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC2_BASIC( her_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( her_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( her_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her_unb_var2 ) diff --git a/frame/1/axpyv/bli_axpyv_kernel.h b/frame/2/her/bli_her_var.h similarity index 65% rename from frame/1/axpyv/bli_axpyv_kernel.h rename to frame/2/her/bli_her_var.h index 2db508ef1..3e65e2bc4 100644 --- a/frame/1/axpyv/bli_axpyv_kernel.h +++ b/frame/2/her/bli_her_var.h @@ -32,33 +32,50 @@ */ -void bli_axpyv_kernel( obj_t* alpha, - obj_t* x, - obj_t* y ); - // -// Prototype the void pointer kernel wrappers. +// Prototype object-based interfaces. // -#undef GENTPROT3 -#define GENTPROT3( ctype_a, ctype_x, ctype_y, cha, chx, chy, varname ) \ +#undef GENPROT +#define GENPROT( opname ) \ \ -void PASTEMAC3(cha,chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* c, \ + cntx_t* cntx, \ + her_t* cntl \ + ); -INSERT_GENTPROT3_BASIC( axpyv_kernel_void ) +GENPROT( her_blk_var1 ) +GENPROT( her_blk_var2 ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3_MIX_D( axpyv_kernel_void ) -#endif +GENPROT( her_unb_var1 ) +GENPROT( her_unb_var2 ) -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3_MIX_P( axpyv_kernel_void ) -#endif + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, /* complex alpha allows her variants to also perform syr. 
*/ \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( her_unb_var1 ) +INSERT_GENTPROTR_BASIC( her_unb_var2 ) diff --git a/frame/2/her/bli_her_var_oapi.c b/frame/2/her/bli_her_var_oapi.c new file mode 100644 index 000000000..a49cf62e0 --- /dev/null +++ b/frame/2/her/bli_her_var_oapi.c @@ -0,0 +1,84 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* x, \ + obj_t* c, \ + cntx_t* cntx, \ + her_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ +\ + uplo_t uplo = bli_obj_uplo( *c ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ +\ + dim_t m = bli_obj_length( *c ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_c = bli_obj_buffer_at_off( *c ); \ + inc_t rs_c = bli_obj_row_stride( *c ); \ + inc_t cs_c = bli_obj_col_stride( *c ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + uplo, \ + conjx, \ + conjh, \ + m, \ + buf_alpha, \ + buf_x, incx, \ + buf_c, rs_c, cs_c, \ + cntx \ + ); \ +} \ + +GENFRONT( her_unb_var1 ) +GENFRONT( her_unb_var2 ) + diff --git a/frame/2/her/bli_her_blk_var1.h b/frame/2/her/old/bli_her_blk_var1.h similarity index 98% rename from frame/2/her/bli_her_blk_var1.h rename to frame/2/her/old/bli_her_blk_var1.h index 70b55a82c..0a5153beb 100644 --- a/frame/2/her/bli_her_blk_var1.h +++ b/frame/2/her/old/bli_her_blk_var1.h @@ -36,5 +36,6 @@ void bli_her_blk_var1( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her/bli_her_blk_var2.h b/frame/2/her/old/bli_her_blk_var2.h similarity index 98% rename from frame/2/her/bli_her_blk_var2.h rename to frame/2/her/old/bli_her_blk_var2.h index bd86b72d9..d0ba7b384 100644 --- a/frame/2/her/bli_her_blk_var2.h +++ b/frame/2/her/old/bli_her_blk_var2.h @@ -36,5 +36,6 @@ void bli_her_blk_var2( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her/bli_her_check.c b/frame/2/her/old/bli_her_check.c similarity index 99% rename from frame/2/her/bli_her_check.c rename to frame/2/her/old/bli_her_check.c index 164d885ab..86adc5026 100644 --- a/frame/2/her/bli_her_check.c +++ b/frame/2/her/old/bli_her_check.c @@ -99,6 +99,7 @@ void bli_her_int_check( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ) { err_t e_val; diff --git a/frame/2/her/bli_her_check.h b/frame/2/her/old/bli_her_check.h similarity index 98% rename from frame/2/her/bli_her_check.h rename to frame/2/her/old/bli_her_check.h index ed9a4856a..0dcb7d9c3 100644 --- a/frame/2/her/bli_her_check.h +++ b/frame/2/her/old/bli_her_check.h @@ -46,5 +46,6 @@ void bli_her_int_check( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her/old/bli_her_unb_var1.c b/frame/2/her/old/bli_her_unb_var1.c new file mode 100644 
index 000000000..27eb63543 --- /dev/null +++ b/frame/2/her/old/bli_her_unb_var1.c @@ -0,0 +1,227 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#define FUNCPTR_T her_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,her_unb_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,her_unb_var1); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,her_unb_var1); +#endif +#endif + + +void bli_her_unb_var1( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* c, + her_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + + // If alpha is a scalar constant, use dt_x to extract the address of the + // corresponding constant value; otherwise, use the datatype encoded + // within the alpha object and extract the buffer at the alpha offset. + bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC2 +#define GENTFUNC2( ctype_x, ctype_c, chx, chc, varname, kername ) \ +\ +void PASTEMAC2(chx,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_x* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_c* c_cast = c; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_c* c10t; \ + ctype_c* gamma11; \ + ctype_x alpha_local; \ + ctype_x alpha_chi1; \ + ctype_x alpha_chi1_chi1; \ + ctype_x conjx0_chi1; \ + ctype_x conjx1_chi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conj0; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chx,eq0)( *alpha_cast ) ) return; \ +\ + /* Make a local copy of alpha and zero out the imaginary component if + we are being invoked as her, since her requires alpha to be real. */ \ + PASTEMAC2(chx,chx,copys)( *alpha_cast, alpha_local ); \ + if ( bli_is_conj( conjh ) ) \ + { \ + PASTEMAC(chx,seti0s)( alpha_local ); \ + } \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx, but only if we are being invoked + as her; for syr, conjx is unchanged. */ \ + conjx = bli_apply_conj( conjh, conjx ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx as needed to arrive at the effective + conjugation for the scalar and vector subproblems. 
*/ \ + conj0 = conjx; \ + conj1 = bli_apply_conj( conjh, conjx ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx to chi1. */ \ + PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj1, *chi1, conjx1_chi1 ); \ +\ + /* Compute scalar for vector subproblem. */ \ + PASTEMAC3(chx,chx,chx,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ +\ + /* Compute alpha * chi1 * conj(chi1) after chi1 has already been + conjugated, if needed, by conjx. */ \ + PASTEMAC3(chx,chx,chx,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ +\ + /* c10t = c10t + alpha * chi1 * x0'; */ \ + PASTEMAC3(chx,chx,chc,kername)( conj1, \ + n_behind, \ + &alpha_chi1, \ + x0, incx, \ + c10t, cs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(chi1); */ \ + PASTEMAC2(chx,chc,adds)( alpha_chi1_chi1, *gamma11 ); \ +\ + /* For her, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested.
+INSERT_GENTFUNC2_BASIC( her_unb_var1, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2_MIX_D( her_unb_var1, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2_MIX_P( her_unb_var1, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her/bli_her_unb_var1.h b/frame/2/her/old/bli_her_unb_var1.h similarity index 98% rename from frame/2/her/bli_her_unb_var1.h rename to frame/2/her/old/bli_her_unb_var1.h index 358c51041..6e7a84c53 100644 --- a/frame/2/her/bli_her_unb_var1.h +++ b/frame/2/her/old/bli_her_unb_var1.h @@ -37,6 +37,7 @@ void bli_her_unb_var1( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her/old/bli_her_unb_var2.c b/frame/2/her/old/bli_her_unb_var2.c new file mode 100644 index 000000000..33031cd45 --- /dev/null +++ b/frame/2/her/old/bli_her_unb_var2.c @@ -0,0 +1,227 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T her_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,her_unb_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,her_unb_var2); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,her_unb_var2); +#endif +#endif + + +void bli_her_unb_var2( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* c, + her_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + + // If alpha is a scalar constant, use dt_x to extract the address of the + // corresponding constant value; otherwise, use the datatype encoded + // within the alpha object and extract the buffer at the alpha offset. + bli_set_scalar_dt_buffer( alpha, dt_x, dt_alpha, buf_alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC2 +#define GENTFUNC2( ctype_x, ctype_c, chx, chc, varname, kername ) \ +\ +void PASTEMAC2(chx,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_x* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_c* c_cast = c; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_c* gamma11; \ + ctype_c* c21; \ + ctype_x alpha_local; \ + ctype_x alpha_chi1; \ + ctype_x alpha_chi1_chi1; \ + ctype_x conjx0_chi1; \ + ctype_x conjx1_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conj0; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chx,eq0)( *alpha_cast ) ) return; \ +\ + /* Make a local copy of alpha and zero out the imaginary component if + we are being invoked as her, since her requires alpha to be real. */ \ + PASTEMAC2(chx,chx,copys)( *alpha_cast, alpha_local ); \ + if ( bli_is_conj( conjh ) ) \ + { \ + PASTEMAC(chx,seti0s)( alpha_local ); \ + } \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx, but only if we are being invoked + as her; for syr, conjx is unchanged. */ \ + conjx = bli_apply_conj( conjh, conjx ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx as needed to arrive at the effective + conjugation for the scalar and vector subproblems. 
*/ \ + conj0 = bli_apply_conj( conjh, conjx ); \ + conj1 = conjx; \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_ahead = m - i - 1; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx to chi1. */ \ + PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj1, *chi1, conjx1_chi1 ); \ +\ + /* Compute scalar for vector subproblem. */ \ + PASTEMAC3(chx,chx,chx,scal2s)( alpha_local, conjx0_chi1, alpha_chi1 ); \ +\ + /* Compute alpha * chi1 * conj(chi1) after chi1 has already been + conjugated, if needed, by conjx. */ \ + PASTEMAC3(chx,chx,chx,scal2s)( alpha_chi1, conjx1_chi1, alpha_chi1_chi1 ); \ +\ + /* c21 = c21 + alpha * x2 * conj(chi1); */ \ + PASTEMAC3(chx,chx,chc,kername)( conj1, \ + n_ahead, \ + &alpha_chi1, \ + x2, incx, \ + c21, rs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(chi1); */ \ + PASTEMAC2(chx,chc,adds)( alpha_chi1_chi1, *gamma11 ); \ +\ + /* For her, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2_BASIC( her_unb_var2, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2_MIX_D( her_unb_var2, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2_MIX_P( her_unb_var2, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her/bli_her_unb_var2.h b/frame/2/her/old/bli_her_unb_var2.h similarity index 98% rename from frame/2/her/bli_her_unb_var2.h rename to frame/2/her/old/bli_her_unb_var2.h index 328ddf25f..b9dc6bf07 100644 --- a/frame/2/her/bli_her_unb_var2.h +++ b/frame/2/her/old/bli_her_unb_var2.h @@ -37,6 +37,7 @@ void bli_her_unb_var2( conj_t conjh, obj_t* alpha, obj_t* x, obj_t* c, + cntx_t* cntx, her_t* cntl ); diff --git a/frame/2/her2/bli_her2.h b/frame/2/her2/bli_her2.h index a9dc2637d..273b6841e 100644 --- a/frame/2/her2/bli_her2.h +++ b/frame/2/her2/bli_her2.h @@ -33,73 +33,7 @@ */ #include "bli_her2_cntl.h" -#include "bli_her2_check.h" +#include "bli_her2_front.h" #include "bli_her2_int.h" -#include "bli_her2_unb_var1.h" -#include "bli_her2_unb_var2.h" -#include "bli_her2_unb_var3.h" -#include "bli_her2_unb_var4.h" - -#include "bli_her2_unf_var1.h" -#include "bli_her2_unf_var4.h" - -#include "bli_her2_blk_var1.h" -#include "bli_her2_blk_var2.h" -#include "bli_her2_blk_var3.h" -#include "bli_her2_blk_var4.h" - - -void bli_her2( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( her2 ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT3U12_BASIC( her2 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( her2 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( her2 ) -#endif - +#include "bli_her2_var.h" diff --git a/frame/2/her2/bli_her2_blk_var1.c b/frame/2/her2/bli_her2_blk_var1.c index 663bf50bf..b71365785 100644 --- a/frame/2/her2/bli_her2_blk_var1.c +++ b/frame/2/her2/bli_her2_blk_var1.c @@ -40,6 +40,7 @@ void bli_her2_blk_var1( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ) { obj_t c11, c11_pack; @@ -75,7 +76,7 @@ void bli_her2_blk_var1( conj_t conjh, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C10, x1, x0, y1, and y0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -93,20 +94,20 @@ void bli_her2_blk_var1( conj_t conjh, // Initialize objects for packing C11, x1, and y1 (if needed). bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack C11, x1, y1 (if needed). 
bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // C10 = C10 + alpha * x1 * y0'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -115,6 +116,7 @@ void bli_her2_blk_var1( conj_t conjh, &x1_pack, &y0, &c10, + cntx, cntl_sub_ger_rp( cntl ) ); // C10 = C10 + conj(alpha) * y1 * x0'; @@ -124,6 +126,7 @@ void bli_her2_blk_var1( conj_t conjh, &y1_pack, &x0, &c10, + cntx, cntl_sub_ger_rp( cntl ) ); // C11 = C11 + alpha * x1 * y1' + conj(alpha) * y1 * x1'; @@ -133,11 +136,12 @@ void bli_her2_blk_var1( conj_t conjh, &x1_pack, &y1_pack, &c11_pack, + cntx, cntl_sub_her2( cntl ) ); // Copy/unpack C11 (if C11 was packed). bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her2/bli_her2_blk_var2.c b/frame/2/her2/bli_her2_blk_var2.c index 1757fa777..43de6417e 100644 --- a/frame/2/her2/bli_her2_blk_var2.c +++ b/frame/2/her2/bli_her2_blk_var2.c @@ -40,6 +40,7 @@ void bli_her2_blk_var2( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ) { obj_t c11, c11_pack; @@ -76,7 +77,7 @@ void bli_her2_blk_var2( conj_t conjh, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C10, C21, x1, x0, x2, and y1. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -96,20 +97,20 @@ void bli_her2_blk_var2( conj_t conjh, // Initialize objects for packing C11, x1, and y1 (if needed). 
bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack C11, x1, y1 (if needed). bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // C10 = C10 + conj(alpha) * y1 * x0'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -118,6 +119,7 @@ void bli_her2_blk_var2( conj_t conjh, &y1_pack, &x0, &c10, + cntx, cntl_sub_ger_rp( cntl ) ); // C21 = C21 + alpha * x2 * y1'; @@ -127,6 +129,7 @@ void bli_her2_blk_var2( conj_t conjh, &x2, &y1_pack, &c21, + cntx, cntl_sub_ger_cp( cntl ) ); // C11 = C11 + alpha * x1 * y1' + conj(alpha) * y1 * x1'; @@ -136,11 +139,12 @@ void bli_her2_blk_var2( conj_t conjh, &x1_pack, &y1_pack, &c11_pack, + cntx, cntl_sub_her2( cntl ) ); // Copy/unpack C11 (if C11 was packed). bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her2/bli_her2_blk_var3.c b/frame/2/her2/bli_her2_blk_var3.c index 5a12f3ce9..86382e7a5 100644 --- a/frame/2/her2/bli_her2_blk_var3.c +++ b/frame/2/her2/bli_her2_blk_var3.c @@ -40,6 +40,7 @@ void bli_her2_blk_var3( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ) { obj_t c11, c11_pack; @@ -76,7 +77,7 @@ void bli_her2_blk_var3( conj_t conjh, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C10, C21, x1, y1, y0, and y2. 
bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -96,20 +97,20 @@ void bli_her2_blk_var3( conj_t conjh, // Initialize objects for packing C11, x1, and y1 (if needed). bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack C11, x1, y1 (if needed). bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // C10 = C10 + alpha * x1 * y0'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -118,6 +119,7 @@ void bli_her2_blk_var3( conj_t conjh, &x1_pack, &y0, &c10, + cntx, cntl_sub_ger_rp( cntl ) ); // C21 = C21 + conj(alpha) * y2 * x1'; @@ -127,6 +129,7 @@ void bli_her2_blk_var3( conj_t conjh, &y2, &x1_pack, &c21, + cntx, cntl_sub_ger_cp( cntl ) ); // C11 = C11 + alpha * x1 * y1' + conj(alpha) * y1 * x1'; @@ -136,11 +139,12 @@ void bli_her2_blk_var3( conj_t conjh, &x1_pack, &y1_pack, &c11_pack, + cntx, cntl_sub_her2( cntl ) ); // Copy/unpack C11 (if C11 was packed). bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her2/bli_her2_blk_var4.c b/frame/2/her2/bli_her2_blk_var4.c index 8177115ff..ab8dce348 100644 --- a/frame/2/her2/bli_her2_blk_var4.c +++ b/frame/2/her2/bli_her2_blk_var4.c @@ -40,6 +40,7 @@ void bli_her2_blk_var4( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ) { obj_t c11, c11_pack; @@ -75,7 +76,7 @@ void bli_her2_blk_var4( conj_t conjh, { // Determine the current algorithmic blocksize. 
b_alg = bli_determine_blocksize_f( ij, mn, c, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for C11, C21, x1, x2, y1, and y2. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -93,20 +94,20 @@ void bli_her2_blk_var4( conj_t conjh, // Initialize objects for packing C11, x1, and y1 (if needed). bli_packm_init( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ) ); + cntx, cntl_sub_packm_c11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_init( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // Copy/pack C11, x1, y1 (if needed). bli_packm_int( &c11, &c11_pack, - cntl_sub_packm_c11( cntl ), + cntx, cntl_sub_packm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); bli_packv_int( &y1, &y1_pack, - cntl_sub_packv_y1( cntl ) ); + cntx, cntl_sub_packv_y1( cntl ) ); // C21 = C21 + alpha * x2 * y1'; bli_ger_int( BLIS_NO_CONJUGATE, @@ -115,6 +116,7 @@ void bli_her2_blk_var4( conj_t conjh, &x2, &y1_pack, &c21, + cntx, cntl_sub_ger_cp( cntl ) ); // C21 = C21 + conj(alpha) * y2 * x1'; @@ -124,6 +126,7 @@ void bli_her2_blk_var4( conj_t conjh, &y2, &x1_pack, &c21, + cntx, cntl_sub_ger_cp( cntl ) ); // C11 = C11 + alpha * x1 * y1' + conj(alpha) * y1 * x1'; @@ -133,11 +136,12 @@ void bli_her2_blk_var4( conj_t conjh, &x1_pack, &y1_pack, &c11_pack, + cntx, cntl_sub_her2( cntl ) ); // Copy/unpack C11 (if C11 was packed). 
bli_unpackm_int( &c11_pack, &c11, - cntl_sub_unpackm_c11( cntl ), + cntx, cntl_sub_unpackm_c11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); } diff --git a/frame/2/her2/bli_her2_cntl.c b/frame/2/her2/bli_her2_cntl.c index 3797c6223..ce9877b4b 100644 --- a/frame/2/her2/bli_her2_cntl.c +++ b/frame/2/her2/bli_her2_cntl.c @@ -41,8 +41,6 @@ extern unpackm_t* unpackm_cntl; extern ger_t* ger_cntl_rp_bs_row; extern ger_t* ger_cntl_cp_bs_col; -extern blksz_t* gemv_mc; - her2_t* her2_cntl_bs_ke_lrow_ucol; her2_t* her2_cntl_bs_ke_lcol_urow; @@ -58,16 +56,18 @@ void bli_her2_cntl_init() = bli_her2_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL ); + NULL ); her2_cntl_bs_ke_lcol_urow = bli_her2_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT4, + 0, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL ); + NULL ); // Create control trees for generally large problems. Here, we choose @@ -77,7 +77,7 @@ void bli_her2_cntl_init() = bli_her2_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) packv_cntl, // pack y1 (if needed) packm_cntl, // pack C11 (if needed) @@ -89,7 +89,7 @@ void bli_her2_cntl_init() = bli_her2_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT4, - gemv_mc, + BLIS_M2, packv_cntl, // pack x1 (if needed) packv_cntl, // pack y1 (if needed) packm_cntl, // pack C11 (if needed) @@ -110,7 +110,7 @@ void bli_her2_cntl_finalize() her2_t* bli_her2_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packv_t* sub_packv_y1, packm_t* sub_packm_c11, @@ -125,7 +125,7 @@ her2_t* bli_her2_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_packv_y1 = sub_packv_y1; cntl->sub_packm_c11 = sub_packm_c11; @@ -140,7 +140,7 @@ her2_t* bli_her2_cntl_obj_create( impl_t impl_type, void bli_her2_cntl_obj_init( her2_t* cntl, impl_t 
impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packv_t* sub_packv_y1, packm_t* sub_packm_c11, @@ -151,7 +151,7 @@ void bli_her2_cntl_obj_init( her2_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_packv_y1 = sub_packv_y1; cntl->sub_packm_c11 = sub_packm_c11; diff --git a/frame/2/her2/bli_her2_cntl.h b/frame/2/her2/bli_her2_cntl.h index fcd6f6289..8a6696fc8 100644 --- a/frame/2/her2/bli_her2_cntl.h +++ b/frame/2/her2/bli_her2_cntl.h @@ -36,7 +36,7 @@ struct her2_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct packv_s* sub_packv_x1; struct packv_s* sub_packv_y1; struct packm_s* sub_packm_c11; @@ -53,7 +53,7 @@ void bli_her2_cntl_init( void ); void bli_her2_cntl_finalize( void ); her2_t* bli_her2_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packv_t* sub_packv_y1, packm_t* sub_packm_c11, @@ -64,7 +64,7 @@ her2_t* bli_her2_cntl_obj_create( impl_t impl_type, void bli_her2_cntl_obj_init( her2_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packv_t* sub_packv_x1, packv_t* sub_packv_y1, packm_t* sub_packm_c11, diff --git a/frame/2/her2/bli_her2.c b/frame/2/her2/bli_her2_front.c similarity index 78% rename from frame/2/her2/bli_her2.c rename to frame/2/her2/bli_her2_front.c index d90ad6ced..c2b7cff6b 100644 --- a/frame/2/her2/bli_her2.c +++ b/frame/2/her2/bli_her2_front.c @@ -39,10 +39,14 @@ extern her2_t* her2_cntl_bs_ke_lcol_urow; extern her2_t* her2_cntl_ge_lrow_ucol; extern her2_t* her2_cntl_ge_lcol_urow; -void bli_her2( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_her2_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx + ) { her2_t* her2_cntl; num_t dt_targ_x; @@ -57,7 +61,7 @@ void bli_her2( obj_t* alpha, // Check parameters. 
if ( bli_error_checking_is_enabled() ) - bli_her2_check( BLIS_CONJUGATE, alpha, x, y, c ); + bli_her2_check( alpha, x, y, c ); // Query the target datatypes of each object. @@ -130,7 +134,6 @@ void bli_her2( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast scalar and the // chosen control tree. Set conjh to BLIS_CONJUGATE to invoke the // Hermitian (and not symmetric) algorithms. @@ -140,6 +143,7 @@ void bli_her2( obj_t* alpha, x, y, c, + cntx, her2_cntl ); } @@ -148,18 +152,20 @@ void bli_her2( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. // #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -186,39 +192,9 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( &alphao, \ &xo, \ &yo, \ - &co ); \ + &co, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( her2, her2 ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( her2, her2 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2, her2 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2, her2 ) -#endif +INSERT_GENTFUNC_BASIC0( her2_front ) diff --git a/frame/2/her2/bli_her2_front.h b/frame/2/her2/bli_her2_front.h new file mode 100644 index 000000000..2f86f566b --- /dev/null +++ b/frame/2/her2/bli_her2_front.h @@ -0,0 +1,61 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +void bli_her2_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( her2_front ) + diff --git a/frame/2/her2/bli_her2_int.c b/frame/2/her2/bli_her2_int.c index 11078d44e..dbe058e6e 100644 --- a/frame/2/her2/bli_her2_int.c +++ b/frame/2/her2/bli_her2_int.c @@ -36,13 +36,14 @@ #define FUNCPTR_T her2_fp -typedef void (*FUNCPTR_T)( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +typedef void (*FUNCPTR_T)( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx, + her2_t* cntl ); static FUNCPTR_T vars[4][3] = { @@ -53,13 +54,14 @@ static FUNCPTR_T vars[4][3] = { bli_her2_unb_var4, bli_her2_unf_var4, bli_her2_blk_var4 }, }; -void bli_her2_int( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) +void bli_her2_int( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx, + her2_t* cntl ) { varnum_t n; impl_t i; @@ -72,7 +74,10 @@ void bli_her2_int( conj_t conjh, // Check parameters. 
if ( bli_error_checking_is_enabled() ) - bli_her2_int_check( conjh, alpha, x, y, c, cntl ); + { + if ( bli_is_conj( conjh ) ) bli_her2_check( alpha, x, y, c ); + else bli_syr2_check( alpha, x, y, c ); + } // If C, x, or y has a zero dimension, return early. if ( bli_obj_has_zero_dim( *c ) ) return; @@ -123,6 +128,7 @@ void bli_her2_int( conj_t conjh, &x_local, &y_local, &c_local, + cntx, cntl ); } diff --git a/frame/2/her2/bli_her2_int.h b/frame/2/her2/bli_her2_int.h index c03e9b918..cddb728ab 100644 --- a/frame/2/her2/bli_her2_int.h +++ b/frame/2/her2/bli_her2_int.h @@ -32,11 +32,12 @@ */ -void bli_her2_int( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_int( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx, + her2_t* cntl ); diff --git a/frame/2/her2/bli_her2_unb_var1.c b/frame/2/her2/bli_her2_unb_var1.c index 8ed71dd38..50a8f7e5b 100644 --- a/frame/2/her2/bli_her2_unb_var1.c +++ b/frame/2/her2/bli_her2_unb_var1.c @@ -34,129 +34,44 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var1); -#endif -#endif - - -void bli_her2_unb_var1( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. 
- f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_c* c10t; \ - ctype_c* gamma11; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_chi1; \ - ctype_xy alpha1_psi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_x conjx0_chi1; \ - ctype_y conjy1_psi1; \ - ctype_y conjy0_psi1; \ - dim_t i; \ - dim_t n_behind; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ + ctype* two = PASTEMAC(ch,2); \ + ctype* x0; \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype* c10t; \ + ctype* gamma11; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_chi1; \ + ctype alpha1_psi1; \ + ctype alpha0_chi1_psi1; \ + ctype conjx0_chi1; \ + ctype conjy1_psi1; \ + ctype conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and 
column @@ -166,8 +81,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -179,8 +94,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -188,64 +103,67 @@ void PASTEMAC3(chx,chy,chc,varname)( \ the effective conjugation for the vector subproblems. */ \ conj0 = bli_apply_conj( conjh, conjy ); \ conj1 = bli_apply_conj( conjh, conjx ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ - c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ + c10t = c + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx0_chi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conj0, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *psi1, conjy0_psi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. */ \ - PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ \ /* c10t = c10t + alpha * chi1 * y0'; */ \ - PASTEMAC3(chxy,chy,chc,kername)( conj0, \ - n_behind, \ - &alpha0_chi1, \ - y0, incy, \ - c10t, cs_ct ); \ + kfp_av \ + ( \ + conj0, \ + n_behind, \ + &alpha0_chi1, \ + y0, incy, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ - PASTEMAC3(chxy,chx,chc,kername)( conj1, \ - n_behind, \ - &alpha1_psi1, \ - x0, incx, \ - c10t, cs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_behind, \ + &alpha1_psi1, \ + x0, incx, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. 
*/ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( her2_unb_var1, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unb_var1, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unb_var1, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unb_var1 ) diff --git a/frame/2/her2/bli_her2_unb_var2.c b/frame/2/her2/bli_her2_unb_var2.c index 6c6f2d7a5..55b3c980a 100644 --- a/frame/2/her2/bli_her2_unb_var2.c +++ b/frame/2/her2/bli_her2_unb_var2.c @@ -34,135 +34,50 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var2); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var2); -#endif -#endif - - -void bli_her2_unb_var2( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. 
- f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_c* c10t; \ - ctype_c* gamma11; \ - ctype_c* c21; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_psi1; \ - ctype_xy alpha1_psi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_y conjy0_psi1; \ - ctype_y conjy1_psi1; \ - ctype_x conjx0_chi1; \ - dim_t i; \ - dim_t n_behind; \ - dim_t n_ahead; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* two = PASTEMAC(ch,2); \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype* c10t; \ + ctype* gamma11; \ + ctype* c21; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_psi1; \ + ctype alpha1_psi1; \ + ctype alpha0_chi1_psi1; \ + ctype conjy0_psi1; \ + ctype conjy1_psi1; \ + ctype conjx0_chi1; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ conj_t conjh_conjy; \ \ /* Eliminate unused variable warnings. 
*/ \ ( void )conjh_conjy; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -172,8 +87,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -185,8 +100,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -195,66 +110,69 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conj0 = conjx; \ conj1 = bli_apply_conj( conjh, conjx ); \ conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ n_ahead = m - i - 1; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ - c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ - c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ + c10t = c + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ + c21 = c + (i+1)*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ - PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *chi1, conjx0_chi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chy,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ - PASTEMAC3(chxy,chy,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. 
*/ \ - PASTEMAC3(chy,chx,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ \ /* c21 = c21 + alpha * x2 * conj(psi1); */ \ - PASTEMAC3(chxy,chx,chc,kername)( conj0, \ - n_ahead, \ - &alpha0_psi1, \ - x2, incx, \ - c21, rs_ct ); \ + kfp_av \ + ( \ + conj0, \ + n_ahead, \ + &alpha0_psi1, \ + x2, incx, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ - PASTEMAC3(chxy,chx,chc,kername)( conj1, \ - n_behind, \ - &alpha1_psi1, \ - x0, incx, \ - c10t, cs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_behind, \ + &alpha1_psi1, \ + x0, incx, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( her2_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unb_var2 ) diff --git a/frame/2/her2/bli_her2_unb_var3.c b/frame/2/her2/bli_her2_unb_var3.c index 11bf71c06..45701910e 100644 --- a/frame/2/her2/bli_her2_unb_var3.c +++ b/frame/2/her2/bli_her2_unb_var3.c @@ -34,135 +34,50 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var3); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var3); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var3); -#endif -#endif - - -void bli_her2_unb_var3( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of 
alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_c* c10t; \ - ctype_c* gamma11; \ - ctype_c* c21; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_chi1; \ - ctype_xy alpha1_chi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_x conjx0_chi1; \ - ctype_x conjx1_chi1; \ - ctype_y conjy0_psi1; \ - dim_t i; \ - dim_t n_behind; \ - dim_t n_ahead; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ - conj_t conjh_conjx; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* two = PASTEMAC(ch,2); \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype* y2; \ + ctype* c10t; \ + ctype* gamma11; \ + ctype* c21; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_chi1; \ + ctype alpha1_chi1; \ + ctype 
alpha0_chi1_psi1; \ + ctype conjx0_chi1; \ + ctype conjx1_chi1; \ + ctype conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ \ /* Eliminate unused variable warnings. */ \ ( void )conjh_conjx; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -172,8 +87,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -185,8 +100,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -195,66 +110,69 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conj0 = bli_apply_conj( conjh, conjy ); \ conj1 = conjy; \ conjh_conjx = bli_apply_conj( conjh, conjx ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ n_ahead = m - i - 1; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ - c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ - c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ + c10t = c + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ + c21 = c + (i+1)*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ - PASTEMAC2(chy,chy,copycjs)( conjx, *chi1, conjx0_chi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj0, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *psi1, conjy0_psi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chy,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ - PASTEMAC3(chxy,chy,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. 
*/ \ - PASTEMAC3(chy,chx,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ \ /* c10t = c10t + alpha * chi1 * y0'; */ \ - PASTEMAC3(chxy,chy,chc,kername)( conj0, \ - n_behind, \ - &alpha0_chi1, \ - y0, incy, \ - c10t, cs_ct ); \ + kfp_av \ + ( \ + conj0, \ + n_behind, \ + &alpha0_chi1, \ + y0, incy, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ - PASTEMAC3(chxy,chy,chc,kername)( conj1, \ - n_ahead, \ - &alpha1_chi1, \ - y2, incy, \ - c21, rs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_ahead, \ + &alpha1_chi1, \ + y2, incy, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( her2_unb_var3, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unb_var3, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unb_var3, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unb_var3 ) diff --git a/frame/2/her2/bli_her2_unb_var4.c b/frame/2/her2/bli_her2_unb_var4.c index b08ba5800..d3d4e7bb8 100644 --- a/frame/2/her2/bli_her2_unb_var4.c +++ b/frame/2/her2/bli_her2_unb_var4.c @@ -34,135 +34,50 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var4); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var4); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var4); -#endif -#endif - - -void bli_her2_unb_var4( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of 
alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_c* gamma11; \ - ctype_c* c21; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_psi1; \ - ctype_xy alpha1_chi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_y conjy0_psi1; \ - ctype_x conjx1_chi1; \ - ctype_x conjx0_chi1; \ - dim_t i; \ - dim_t n_ahead; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ - conj_t conjh_conjx; \ - conj_t conjh_conjy; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* two = PASTEMAC(ch,2); \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype* y2; \ + ctype* gamma11; \ + ctype* c21; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_psi1; \ + ctype alpha1_chi1; \ + ctype alpha0_chi1_psi1; \ + ctype conjy0_psi1; \ + 
ctype conjx1_chi1; \ + ctype conjx0_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ + conj_t conjh_conjy; \ \ /* Eliminate unused variable warnings. */ \ ( void )conjh_conjx; \ ( void )conjh_conjy; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -172,8 +87,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -185,8 +100,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -196,64 +111,67 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conj1 = conjy; \ conjh_conjx = bli_apply_conj( conjh, conjx ); \ conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_ahead = m - i - 1; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ - c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ + c21 = c + (i+1)*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ - PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *chi1, conjx0_chi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. 
*/ \ - PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ \ /* c21 = c21 + alpha * x2 * conj(psi1); */ \ - PASTEMAC3(chxy,chx,chc,kername)( conj0, \ - n_ahead, \ - &alpha0_psi1, \ - x2, incx, \ - c21, rs_ct ); \ + kfp_av \ + ( \ + conj0, \ + n_ahead, \ + &alpha0_psi1, \ + x2, incx, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ - PASTEMAC3(chxy,chy,chc,kername)( conj1, \ - n_ahead, \ - &alpha1_chi1, \ - y2, incy, \ - c21, rs_ct ); \ + kfp_av \ + ( \ + conj1, \ + n_ahead, \ + &alpha1_chi1, \ + y2, incy, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( her2_unb_var4, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unb_var4, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unb_var4, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unb_var4 ) diff --git a/frame/2/her2/bli_her2_unf_var1.c b/frame/2/her2/bli_her2_unf_var1.c index 5e668503f..32fbd24ef 100644 --- a/frame/2/her2/bli_her2_unf_var1.c +++ b/frame/2/her2/bli_her2_unf_var1.c @@ -34,129 +34,44 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unf_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unf_var1); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unf_var1); -#endif -#endif - - -void bli_her2_unf_var1( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of 
alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_y* y0; \ - ctype_y* psi1; \ - ctype_c* c10t; \ - ctype_c* gamma11; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_chi1; \ - ctype_xy alpha1_psi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_x conjx0_chi1; \ - ctype_y conjy1_psi1; \ - ctype_y conjy0_psi1; \ - dim_t i; \ - dim_t n_behind; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ + ctype* two = PASTEMAC(ch,2); \ + ctype* x0; \ + ctype* chi1; \ + ctype* y0; \ + ctype* psi1; \ + ctype* c10t; \ + ctype* gamma11; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_chi1; \ + ctype alpha1_psi1; \ + 
ctype alpha0_chi1_psi1; \ + ctype conjx0_chi1; \ + ctype conjy1_psi1; \ + ctype conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -166,8 +81,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -179,8 +94,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -188,61 +103,60 @@ void PASTEMAC3(chx,chy,chc,varname)( \ the effective conjugation for the vector subproblems. */ \ conj0 = bli_apply_conj( conjh, conjy ); \ conj1 = bli_apply_conj( conjh, conjx ); \ +\ + PASTECH(ch,axpy2v_ft) kfp_2v; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_2v = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPY2V_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_behind = i; \ - x0 = x_cast + (0 )*incx; \ - chi1 = x_cast + (i )*incx; \ - y0 = y_cast + (0 )*incy; \ - psi1 = y_cast + (i )*incy; \ - c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + x0 = x + (0 )*incx; \ + chi1 = x + (i )*incx; \ + y0 = y + (0 )*incy; \ + psi1 = y + (i )*incy; \ + c10t = c + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. 
*/ \ - PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx0_chi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ - PASTEMAC2(chy,chy,copycjs)( conj0, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *psi1, conjy0_psi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. */ \ - PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ \ /* c10t = c10t + alpha * chi1 * y0'; */ \ /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ - PASTEMAC3(chy,chx,chc,kername)( conj0, \ - conj1, \ - n_behind, \ - &alpha0_chi1, \ - &alpha1_psi1, \ - y0, incy, \ - x0, incx, \ - c10t, cs_ct ); \ + kfp_2v \ + ( \ + conj0, \ + conj1, \ + n_behind, \ + &alpha0_chi1, \ + &alpha1_psi1, \ + y0, incy, \ + x0, incx, \ + c10t, cs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. 
-INSERT_GENTFUNC3U12_BASIC( her2_unf_var1, AXPY2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unf_var1, AXPY2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unf_var1, AXPY2V_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unf_var1 ) diff --git a/frame/2/her2/bli_her2_unf_var4.c b/frame/2/her2/bli_her2_unf_var4.c index 2f0a2c43a..60d4657ab 100644 --- a/frame/2/her2/bli_her2_unf_var4.c +++ b/frame/2/her2/bli_her2_unf_var4.c @@ -34,135 +34,50 @@ #include "blis.h" -#define FUNCPTR_T her2_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - conj_t conjx, - conj_t conjy, - conj_t conjh, - dim_t m, - void* alpha, - void* x, inc_t incx, - void* y, inc_t incy, - void* c, inc_t rs_c, inc_t cs_c - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unf_var4); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unf_var4); -#else -static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unf_var4); -#endif -#endif - - -void bli_her2_unf_var4( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ) -{ - num_t dt_x = bli_obj_datatype( *x ); - num_t dt_y = bli_obj_datatype( *y ); - num_t dt_c = bli_obj_datatype( *c ); - - uplo_t uplo = bli_obj_uplo( *c ); - conj_t conjx = bli_obj_conj_status( *x ); - conj_t conjy = bli_obj_conj_status( *y ); - - dim_t m = bli_obj_length( *c ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - void* buf_y = bli_obj_buffer_at_off( *y ); - inc_t incy = bli_obj_vector_inc( *y ); - - void* buf_c = bli_obj_buffer_at_off( *c ); - inc_t rs_c = bli_obj_row_stride( *c ); - inc_t cs_c = bli_obj_col_stride( *c ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype 
of alpha MUST be the type union of the datatypes of x and y. - dt_alpha = bli_datatype_union( dt_x, dt_y ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_x][dt_y][dt_c]; - - // Invoke the function. - f( uplo, - conjx, - conjy, - conjh, - m, - buf_alpha, - buf_x, incx, - buf_y, incy, - buf_c, rs_c, cs_c ); -} - - -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC3(chx,chy,chc,varname)( \ - uplo_t uplo, \ - conj_t conjx, \ - conj_t conjy, \ - conj_t conjh, \ - dim_t m, \ - void* alpha, \ - void* x, inc_t incx, \ - void* y, inc_t incy, \ - void* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ - ctype_xy* two = PASTEMAC(chxy,2); \ - ctype_xy* alpha_cast = alpha; \ - ctype_x* x_cast = x; \ - ctype_y* y_cast = y; \ - ctype_c* c_cast = c; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_y* psi1; \ - ctype_y* y2; \ - ctype_c* gamma11; \ - ctype_c* c21; \ - ctype_xy alpha0; \ - ctype_xy alpha1; \ - ctype_xy alpha0_psi1; \ - ctype_xy alpha1_chi1; \ - ctype_xy alpha0_chi1_psi1; \ - ctype_y conjy0_psi1; \ - ctype_x conjx1_chi1; \ - ctype_x conjx0_chi1; \ - dim_t i; \ - dim_t n_ahead; \ - inc_t rs_ct, cs_ct; \ - conj_t conj0, conj1; \ - conj_t conjh_conjx; \ - conj_t conjh_conjy; \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype* two = PASTEMAC(ch,2); \ + ctype* chi1; \ + ctype* x2; \ + ctype* psi1; \ + ctype* y2; \ + ctype* gamma11; \ + ctype* c21; \ + ctype alpha0; \ + ctype alpha1; \ + ctype alpha0_psi1; \ + ctype alpha1_chi1; \ + ctype alpha0_chi1_psi1; \ + ctype conjy0_psi1; 
\ + ctype conjx1_chi1; \ + ctype conjx0_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ + conj_t conjh_conjy; \ \ /* Eliminate unused variable warnings. */ \ ( void )conjh_conjx; \ ( void )conjh_conjy; \ -\ - if ( bli_zero_dim1( m ) ) return; \ -\ - if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ \ /* The algorithm will be expressed in terms of the lower triangular case; the upper triangular case is supported by swapping the row and column @@ -172,8 +87,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ rs_ct = rs_c; \ cs_ct = cs_c; \ \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha0 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha1 ); \ } \ else /* if ( bli_is_upper( uplo ) ) */ \ { \ @@ -185,8 +100,8 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conjx = bli_apply_conj( conjh, conjx ); \ conjy = bli_apply_conj( conjh, conjy ); \ \ - PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ - PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + PASTEMAC(ch,copycjs)( conjh, *alpha, alpha0 ); \ + PASTEMAC(ch,copys)( *alpha, alpha1 ); \ } \ \ /* Apply conjh (which carries the conjugation component of the Hermitian @@ -196,61 +111,60 @@ void PASTEMAC3(chx,chy,chc,varname)( \ conj1 = conjy; \ conjh_conjx = bli_apply_conj( conjh, conjx ); \ conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTECH(ch,axpy2v_ft) kfp_2v; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_2v = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPY2V_KER, cntx ); \ \ for ( i = 0; i < m; ++i ) \ { \ n_ahead = m - i - 1; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ - psi1 = y_cast + (i )*incy; \ - y2 = y_cast + (i+1)*incy; \ - gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ - c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ + psi1 = y + (i )*incy; \ + y2 = y + (i+1)*incy; \ + gamma11 = c + (i )*rs_ct + (i )*cs_ct; \ + c21 = c + (i+1)*rs_ct + (i )*cs_ct; \ \ /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ - PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ - PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC(ch,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC(ch,copycjs)( conj0, *chi1, conjx0_chi1 ); \ \ /* Compute scalars for vector subproblems. */ \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ - PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ + PASTEMAC(ch,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ \ /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have already been conjugated, if needed, by conjx and conjy. 
*/ \ - PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ + PASTEMAC(ch,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ \ /* c21 = c21 + alpha * x2 * conj(psi1); */ \ /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ - PASTEMAC3(chx,chy,chc,kername)( conj0, \ - conj1, \ - n_ahead, \ - &alpha0_psi1, \ - &alpha1_chi1, \ - x2, incx, \ - y2, incy, \ - c21, rs_ct ); \ + kfp_2v \ + ( \ + conj0, \ + conj1, \ + n_ahead, \ + &alpha0_psi1, \ + &alpha1_chi1, \ + x2, incx, \ + y2, incy, \ + c21, rs_ct, \ + cntx \ + ); \ \ /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + conj(alpha) * psi1 * conj(chi1); */ \ - PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ + PASTEMAC(ch,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ \ /* For her2, explicitly set the imaginary component of gamma11 to zero. */ \ if ( bli_is_conj( conjh ) ) \ - PASTEMAC(chc,seti0s)( *gamma11 ); \ + PASTEMAC(ch,seti0s)( *gamma11 ); \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC3U12_BASIC( her2_unf_var4, AXPY2V_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( her2_unf_var4, AXPY2V_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( her2_unf_var4, AXPY2V_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( her2_unf_var4 ) diff --git a/frame/2/her2/bli_her2_var.h b/frame/2/her2/bli_her2_var.h new file mode 100644 index 000000000..301b6931e --- /dev/null +++ b/frame/2/her2/bli_her2_var.h @@ -0,0 +1,97 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* alpha_conj, \ + obj_t* x, \ + obj_t* y, \ + obj_t* c, \ + cntx_t* cntx, \ + her2_t* cntl \ + ); + +GENPROT( her2_blk_var1 ) +GENPROT( her2_blk_var2 ) +GENPROT( her2_blk_var3 ) +GENPROT( her2_blk_var4 ) + +GENPROT( her2_unb_var1 ) +GENPROT( her2_unb_var2 ) +GENPROT( her2_unb_var3 ) +GENPROT( her2_unb_var4 ) + +GENPROT( her2_unf_var1 ) +GENPROT( her2_unf_var4 ) + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( her2_unb_var1 ) +INSERT_GENTPROT_BASIC( her2_unb_var2 ) +INSERT_GENTPROT_BASIC( her2_unb_var3 ) +INSERT_GENTPROT_BASIC( her2_unb_var4 ) + +INSERT_GENTPROT_BASIC( her2_unf_var1 ) +INSERT_GENTPROT_BASIC( her2_unf_var4 ) + diff --git a/frame/2/her2/bli_her2_var_oapi.c b/frame/2/her2/bli_her2_var_oapi.c new file mode 100644 index 000000000..6c87496d6 --- /dev/null +++ b/frame/2/her2/bli_her2_var_oapi.c @@ -0,0 +1,97 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + conj_t conjh, \ + obj_t* alpha, \ + obj_t* alpha_conj, \ + obj_t* x, \ + obj_t* y, \ + obj_t* c, \ + cntx_t* cntx, \ + her2_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ +\ + uplo_t uplo = bli_obj_uplo( *c ); \ + conj_t conjx = bli_obj_conj_status( *x ); \ + conj_t conjy = bli_obj_conj_status( *y ); \ +\ + dim_t m = bli_obj_length( *c ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_y = bli_obj_buffer_at_off( *y ); \ + inc_t incy = bli_obj_vector_inc( *y ); \ +\ + void* buf_c = bli_obj_buffer_at_off( *c ); \ + inc_t rs_c = bli_obj_row_stride( *c ); \ + inc_t cs_c = bli_obj_col_stride( *c ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_14 \ + ( \ + dt, \ + opname, \ + uplo, \ + conjx, \ + conjy, \ + conjh, \ + m, \ + buf_alpha, \ + buf_x, incx, \ + buf_y, incy, \ + buf_c, rs_c, cs_c, \ + cntx \ + ); \ +} \ + +GENFRONT( her2_unb_var1 ) +GENFRONT( her2_unb_var2 ) +GENFRONT( her2_unb_var3 ) +GENFRONT( her2_unb_var4 ) + +GENFRONT( her2_unf_var1 ) +GENFRONT( her2_unf_var4 ) + diff --git a/frame/2/her2/bli_her2_blk_var1.h b/frame/2/her2/old/bli_her2_blk_var1.h similarity index 98% rename from frame/2/her2/bli_her2_blk_var1.h rename to frame/2/her2/old/bli_her2_blk_var1.h index 338fa6bf0..9e0e69424 100644 --- a/frame/2/her2/bli_her2_blk_var1.h +++ b/frame/2/her2/old/bli_her2_blk_var1.h @@ -38,5 +38,6 @@ void bli_her2_blk_var1( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ); diff --git a/frame/2/her2/bli_her2_blk_var2.h b/frame/2/her2/old/bli_her2_blk_var2.h similarity index 98% rename from frame/2/her2/bli_her2_blk_var2.h rename to frame/2/her2/old/bli_her2_blk_var2.h index 6769bf54f..85924943f 100644 --- a/frame/2/her2/bli_her2_blk_var2.h +++ b/frame/2/her2/old/bli_her2_blk_var2.h @@ -38,5 +38,6 @@ void bli_her2_blk_var2( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ); diff --git a/frame/2/her2/bli_her2_blk_var3.h b/frame/2/her2/old/bli_her2_blk_var3.h similarity index 98% rename from frame/2/her2/bli_her2_blk_var3.h rename to frame/2/her2/old/bli_her2_blk_var3.h index b8c1227e1..99f96b950 100644 --- a/frame/2/her2/bli_her2_blk_var3.h +++ b/frame/2/her2/old/bli_her2_blk_var3.h @@ -38,5 +38,6 @@ void bli_her2_blk_var3( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ); diff --git a/frame/2/her2/bli_her2_blk_var4.h b/frame/2/her2/old/bli_her2_blk_var4.h similarity index 98% rename from frame/2/her2/bli_her2_blk_var4.h rename to frame/2/her2/old/bli_her2_blk_var4.h index f75659830..41c12453e 100644 --- a/frame/2/her2/bli_her2_blk_var4.h +++ b/frame/2/her2/old/bli_her2_blk_var4.h @@ -38,5 +38,6 @@ void 
bli_her2_blk_var4( conj_t conjh, obj_t* x, obj_t* y, obj_t* c, + cntx_t* cntx, her2_t* cntl ); diff --git a/frame/2/her2/bli_her2_check.c b/frame/2/her2/old/bli_her2_check.c similarity index 86% rename from frame/2/her2/bli_her2_check.c rename to frame/2/her2/old/bli_her2_check.c index ac29fae55..4a0039041 100644 --- a/frame/2/her2/bli_her2_check.c +++ b/frame/2/her2/old/bli_her2_check.c @@ -34,11 +34,11 @@ #include "blis.h" -void bli_her2_basic_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_her2_basic_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ) { err_t e_val; @@ -80,11 +80,11 @@ void bli_her2_basic_check( conj_t conjh, bli_check_error_code( e_val ); } -void bli_her2_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_her2_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ) { err_t e_val; @@ -98,11 +98,12 @@ void bli_her2_check( conj_t conjh, bli_check_error_code( e_val ); } -void bli_her2_int_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c, +void bli_her2_int_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx, her2_t* cntl ) { err_t e_val; diff --git a/frame/2/her2/bli_her2_check.h b/frame/2/her2/old/bli_her2_check.h similarity index 74% rename from frame/2/her2/bli_her2_check.h rename to frame/2/her2/old/bli_her2_check.h index 973351f1d..a0a7add3b 100644 --- a/frame/2/her2/bli_her2_check.h +++ b/frame/2/her2/old/bli_her2_check.h @@ -32,21 +32,22 @@ */ -void bli_her2_basic_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); +void bli_her2_basic_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ); -void bli_her2_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); +void bli_her2_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ); -void bli_her2_int_check( conj_t conjh, - obj_t* alpha, - obj_t* x, - 
obj_t* y, - obj_t* c, +void bli_her2_int_check( conj_t conjh, + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx, her2_t* cntl ); diff --git a/frame/2/her2/old/bli_her2_cntx.c b/frame/2/her2/old/bli_her2_cntx.c new file mode 100644 index 000000000..19ac371ec --- /dev/null +++ b/frame/2/her2/old/bli_her2_cntx.c @@ -0,0 +1,60 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +void bli_her2_cntx_init( cntx_t* cntx ) +{ + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with kernels for the current architecture. + bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx ); + + bli_gks_cntx_set_l1f_ker( BLIS_AXPY2V_KER, cntx ); + + // Set the register and cache blocksizes and multiples, as well + // as the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 2, + BLIS_N2, BLIS_N2, + BLIS_M2, BLIS_M2, + cntx ); +} + +void bli_her2_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + diff --git a/frame/1f/axpyf/bli_axpyf_fusefac.h b/frame/2/her2/old/bli_her2_cntx.h similarity index 94% rename from frame/1f/axpyf/bli_axpyf_fusefac.h rename to frame/2/her2/old/bli_her2_cntx.h index a1a2e9021..c64ff5bbe 100644 --- a/frame/1f/axpyf/bli_axpyf_fusefac.h +++ b/frame/2/her2/old/bli_her2_cntx.h @@ -32,8 +32,6 @@ */ -// -// Prototype object-based fusing factor query routine. -// -dim_t bli_axpyf_fusefac( num_t dt ); +void bli_her2_cntx_init( void ); +void bli_her2_cntx_finalize( void ); diff --git a/frame/2/her2/old/bli_her2_unb_var1.c b/frame/2/her2/old/bli_her2_unb_var1.c new file mode 100644 index 000000000..25f93e3f5 --- /dev/null +++ b/frame/2/her2/old/bli_her2_unb_var1.c @@ -0,0 +1,253 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var1); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var1); +#endif +#endif + + +void bli_her2_unb_var1( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_c* c10t; \ + ctype_c* gamma11; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_chi1; \ + ctype_xy alpha1_psi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_x conjx0_chi1; \ + ctype_y conjy1_psi1; \ + ctype_y conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = bli_apply_conj( conjh, conjy ); \ + conj1 = bli_apply_conj( conjh, conjx ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ + c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conj0, *psi1, conjy0_psi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. 
*/ \ + PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ +\ + /* c10t = c10t + alpha * chi1 * y0'; */ \ + PASTEMAC3(chxy,chy,chc,kername)( conj0, \ + n_behind, \ + &alpha0_chi1, \ + y0, incy, \ + c10t, cs_ct ); \ +\ + /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ + PASTEMAC3(chxy,chx,chc,kername)( conj1, \ + n_behind, \ + &alpha1_psi1, \ + x0, incx, \ + c10t, cs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unb_var1, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unb_var1, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unb_var1, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unb_var1.h b/frame/2/her2/old/bli_her2_unb_var1.h similarity index 90% rename from frame/2/her2/bli_her2_unb_var1.h rename to frame/2/her2/old/bli_her2_unb_var1.h index 835bc66df..f2ab28611 100644 --- a/frame/2/her2/bli_her2_unb_var1.h +++ b/frame/2/her2/old/bli_her2_unb_var1.h @@ -33,13 +33,13 @@ */ -void bli_her2_unb_var1( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unb_var1( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/her2/old/bli_her2_unb_var2.c b/frame/2/her2/old/bli_her2_unb_var2.c new file mode 100644 index 000000000..4c25158be --- /dev/null +++ b/frame/2/her2/old/bli_her2_unb_var2.c @@ -0,0 +1,262 @@ +/* + + BLIS + An object-based 
framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var2); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var2); +#endif +#endif + + +void bli_her2_unb_var2( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_c* c10t; \ + ctype_c* gamma11; \ + ctype_c* c21; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_psi1; \ + ctype_xy alpha1_psi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_y conjy0_psi1; \ + ctype_y conjy1_psi1; \ + ctype_x conjx0_chi1; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjy; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conjh_conjy; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = conjx; \ + conj1 = bli_apply_conj( conjh, conjx ); \ + conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + n_ahead = m - i - 1; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ + c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chy,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC3(chxy,chy,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. 
*/ \ + PASTEMAC3(chy,chx,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ +\ + /* c21 = c21 + alpha * x2 * conj(psi1); */ \ + PASTEMAC3(chxy,chx,chc,kername)( conj0, \ + n_ahead, \ + &alpha0_psi1, \ + x2, incx, \ + c21, rs_ct ); \ +\ + /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ + PASTEMAC3(chxy,chx,chc,kername)( conj1, \ + n_behind, \ + &alpha1_psi1, \ + x0, incx, \ + c10t, cs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unb_var2, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unb_var2, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unb_var2, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unb_var2.h b/frame/2/her2/old/bli_her2_unb_var2.h similarity index 90% rename from frame/2/her2/bli_her2_unb_var2.h rename to frame/2/her2/old/bli_her2_unb_var2.h index 268362048..94bc5c5fb 100644 --- a/frame/2/her2/bli_her2_unb_var2.h +++ b/frame/2/her2/old/bli_her2_unb_var2.h @@ -33,13 +33,13 @@ */ -void bli_her2_unb_var2( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unb_var2( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/her2/old/bli_her2_unb_var3.c b/frame/2/her2/old/bli_her2_unb_var3.c new file mode 100644 index 000000000..8ffc9e79a --- /dev/null +++ b/frame/2/her2/old/bli_her2_unb_var3.c @@ -0,0 +1,262 @@ +/* + + BLIS + An object-based 
framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var3); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var3); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var3); +#endif +#endif + + +void bli_her2_unb_var3( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_c* c10t; \ + ctype_c* gamma11; \ + ctype_c* c21; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_chi1; \ + ctype_xy alpha1_chi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_x conjx0_chi1; \ + ctype_x conjx1_chi1; \ + ctype_y conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conjh_conjx; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = bli_apply_conj( conjh, conjy ); \ + conj1 = conjy; \ + conjh_conjx = bli_apply_conj( conjh, conjx ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + n_ahead = m - i - 1; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ + c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chy,chy,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj0, *psi1, conjy0_psi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chy,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC3(chxy,chy,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. 
*/ \ + PASTEMAC3(chy,chx,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ +\ + /* c10t = c10t + alpha * chi1 * y0'; */ \ + PASTEMAC3(chxy,chy,chc,kername)( conj0, \ + n_behind, \ + &alpha0_chi1, \ + y0, incy, \ + c10t, cs_ct ); \ +\ + /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ + PASTEMAC3(chxy,chy,chc,kername)( conj1, \ + n_ahead, \ + &alpha1_chi1, \ + y2, incy, \ + c21, rs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unb_var3, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unb_var3, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unb_var3, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unb_var3.h b/frame/2/her2/old/bli_her2_unb_var3.h similarity index 90% rename from frame/2/her2/bli_her2_unb_var3.h rename to frame/2/her2/old/bli_her2_unb_var3.h index 60dc00d20..1ba11f230 100644 --- a/frame/2/her2/bli_her2_unb_var3.h +++ b/frame/2/her2/old/bli_her2_unb_var3.h @@ -33,13 +33,13 @@ */ -void bli_her2_unb_var3( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unb_var3( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/her2/old/bli_her2_unb_var4.c b/frame/2/her2/old/bli_her2_unb_var4.c new file mode 100644 index 000000000..1b7ad41ef --- /dev/null +++ b/frame/2/her2/old/bli_her2_unb_var4.c @@ -0,0 +1,261 @@ +/* + + BLIS + An object-based 
framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unb_var4); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unb_var4); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unb_var4); +#endif +#endif + + +void bli_her2_unb_var4( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_c* gamma11; \ + ctype_c* c21; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_psi1; \ + ctype_xy alpha1_chi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_y conjy0_psi1; \ + ctype_x conjx1_chi1; \ + ctype_x conjx0_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ + conj_t conjh_conjy; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conjh_conjx; \ + ( void )conjh_conjy; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = conjx; \ + conj1 = conjy; \ + conjh_conjx = bli_apply_conj( conjh, conjx ); \ + conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_ahead = m - i - 1; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. 
*/ \ + PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ +\ + /* c21 = c21 + alpha * x2 * conj(psi1); */ \ + PASTEMAC3(chxy,chx,chc,kername)( conj0, \ + n_ahead, \ + &alpha0_psi1, \ + x2, incx, \ + c21, rs_ct ); \ +\ + /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ + PASTEMAC3(chxy,chy,chc,kername)( conj1, \ + n_ahead, \ + &alpha1_chi1, \ + y2, incy, \ + c21, rs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unb_var4, AXPYV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unb_var4, AXPYV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unb_var4, AXPYV_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unb_var4.h b/frame/2/her2/old/bli_her2_unb_var4.h similarity index 90% rename from frame/2/her2/bli_her2_unb_var4.h rename to frame/2/her2/old/bli_her2_unb_var4.h index 16dcb7c2e..8ee19401a 100644 --- a/frame/2/her2/bli_her2_unb_var4.h +++ b/frame/2/her2/old/bli_her2_unb_var4.h @@ -33,13 +33,13 @@ */ -void bli_her2_unb_var4( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unb_var4( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/her2/old/bli_her2_unf_var1.c b/frame/2/her2/old/bli_her2_unf_var1.c new file mode 100644 index 000000000..06571e0a6 --- /dev/null +++ b/frame/2/her2/old/bli_her2_unf_var1.c @@ -0,0 +1,250 @@ +/* + + BLIS + An object-based 
framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unf_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unf_var1); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unf_var1); +#endif +#endif + + +void bli_her2_unf_var1( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_y* y0; \ + ctype_y* psi1; \ + ctype_c* c10t; \ + ctype_c* gamma11; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_chi1; \ + ctype_xy alpha1_psi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_x conjx0_chi1; \ + ctype_y conjy1_psi1; \ + ctype_y conjy0_psi1; \ + dim_t i; \ + dim_t n_behind; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = bli_apply_conj( conjh, conjy ); \ + conj1 = bli_apply_conj( conjh, conjx ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_behind = i; \ + x0 = x_cast + (0 )*incx; \ + chi1 = x_cast + (i )*incx; \ + y0 = y_cast + (0 )*incy; \ + psi1 = y_cast + (i )*incy; \ + c10t = c_cast + (i )*rs_ct + (0 )*cs_ct; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chx,chx,copycjs)( conjx, *chi1, conjx0_chi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conjy, *psi1, conjy1_psi1 ); \ + PASTEMAC2(chy,chy,copycjs)( conj0, *psi1, conjy0_psi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjx0_chi1, alpha0_chi1 ); \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjy1_psi1, alpha1_psi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. */ \ + PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_chi1, conjy0_psi1, alpha0_chi1_psi1 ); \ +\ + /* c10t = c10t + alpha * chi1 * y0'; */ \ + /* c10t = c10t + conj(alpha) * psi1 * x0'; */ \ + PASTEMAC3(chy,chx,chc,kername)( conj0, \ + conj1, \ + n_behind, \ + &alpha0_chi1, \ + &alpha1_psi1, \ + y0, incy, \ + x0, incx, \ + c10t, cs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. 
*/ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unf_var1, AXPY2V_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unf_var1, AXPY2V_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unf_var1, AXPY2V_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unf_var1.h b/frame/2/her2/old/bli_her2_unf_var1.h similarity index 90% rename from frame/2/her2/bli_her2_unf_var1.h rename to frame/2/her2/old/bli_her2_unf_var1.h index dedac9c78..0f3ad8d82 100644 --- a/frame/2/her2/bli_her2_unf_var1.h +++ b/frame/2/her2/old/bli_her2_unf_var1.h @@ -33,13 +33,13 @@ */ -void bli_her2_unf_var1( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unf_var1( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/her2/old/bli_her2_unf_var4.c b/frame/2/her2/old/bli_her2_unf_var4.c new file mode 100644 index 000000000..a1732ba59 --- /dev/null +++ b/frame/2/her2/old/bli_her2_unf_var4.c @@ -0,0 +1,258 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T her2_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + conj_t conjx, + conj_t conjy, + conj_t conjh, + dim_t m, + void* alpha, + void* x, inc_t incx, + void* y, inc_t incy, + void* c, inc_t rs_c, inc_t cs_c + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY3_ALL(ftypes,her2_unf_var4); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY3_EXT(ftypes,her2_unf_var4); +#else +static FUNCPTR_T GENARRAY3_MIN(ftypes,her2_unf_var4); +#endif +#endif + + +void bli_her2_unf_var4( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ) +{ + num_t dt_x = bli_obj_datatype( *x ); + num_t dt_y = bli_obj_datatype( *y ); + num_t dt_c = bli_obj_datatype( *c ); + + uplo_t uplo = bli_obj_uplo( *c ); + conj_t conjx = bli_obj_conj_status( *x ); + conj_t conjy = bli_obj_conj_status( *y ); + + dim_t m = bli_obj_length( *c ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + void* buf_y = bli_obj_buffer_at_off( *y ); + inc_t incy = bli_obj_vector_inc( *y ); + + void* buf_c = bli_obj_buffer_at_off( *c ); + inc_t rs_c = bli_obj_row_stride( *c ); + inc_t cs_c = bli_obj_col_stride( *c ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of the datatypes of x and y. + dt_alpha = bli_datatype_union( dt_x, dt_y ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_x][dt_y][dt_c]; + + // Invoke the function. 
+ f( uplo, + conjx, + conjy, + conjh, + m, + buf_alpha, + buf_x, incx, + buf_y, incy, + buf_c, rs_c, cs_c ); +} + + +#undef GENTFUNC3U12 +#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, varname, kername ) \ +\ +void PASTEMAC3(chx,chy,chc,varname)( \ + uplo_t uplo, \ + conj_t conjx, \ + conj_t conjy, \ + conj_t conjh, \ + dim_t m, \ + void* alpha, \ + void* x, inc_t incx, \ + void* y, inc_t incy, \ + void* c, inc_t rs_c, inc_t cs_c \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_xy* two = PASTEMAC(chxy,2); \ + ctype_xy* alpha_cast = alpha; \ + ctype_x* x_cast = x; \ + ctype_y* y_cast = y; \ + ctype_c* c_cast = c; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_y* psi1; \ + ctype_y* y2; \ + ctype_c* gamma11; \ + ctype_c* c21; \ + ctype_xy alpha0; \ + ctype_xy alpha1; \ + ctype_xy alpha0_psi1; \ + ctype_xy alpha1_chi1; \ + ctype_xy alpha0_chi1_psi1; \ + ctype_y conjy0_psi1; \ + ctype_x conjx1_chi1; \ + ctype_x conjx0_chi1; \ + dim_t i; \ + dim_t n_ahead; \ + inc_t rs_ct, cs_ct; \ + conj_t conj0, conj1; \ + conj_t conjh_conjx; \ + conj_t conjh_conjy; \ +\ + /* Eliminate unused variable warnings. */ \ + ( void )conjh_conjx; \ + ( void )conjh_conjy; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( PASTEMAC(chxy,eq0)( *alpha_cast ) ) return; \ +\ + /* The algorithm will be expressed in terms of the lower triangular case; + the upper triangular case is supported by swapping the row and column + strides of A and toggling some conj parameters. */ \ + if ( bli_is_lower( uplo ) ) \ + { \ + rs_ct = rs_c; \ + cs_ct = cs_c; \ +\ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha1 ); \ + } \ + else /* if ( bli_is_upper( uplo ) ) */ \ + { \ + rs_ct = cs_c; \ + cs_ct = rs_c; \ +\ + /* Toggle conjugation of conjx/conjy, but only if we are being invoked + as her2; for syr2, conjx/conjy are unchanged. 
*/ \ + conjx = bli_apply_conj( conjh, conjx ); \ + conjy = bli_apply_conj( conjh, conjy ); \ +\ + PASTEMAC2(chxy,chxy,copycjs)( conjh, *alpha_cast, alpha0 ); \ + PASTEMAC2(chxy,chxy,copys)( *alpha_cast, alpha1 ); \ + } \ +\ + /* Apply conjh (which carries the conjugation component of the Hermitian + transpose, if applicable) to conjx and/or conjy as needed to arrive at + the effective conjugation for the vector subproblems. */ \ + conj0 = conjx; \ + conj1 = conjy; \ + conjh_conjx = bli_apply_conj( conjh, conjx ); \ + conjh_conjy = bli_apply_conj( conjh, conjy ); \ +\ + for ( i = 0; i < m; ++i ) \ + { \ + n_ahead = m - i - 1; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ + psi1 = y_cast + (i )*incy; \ + y2 = y_cast + (i+1)*incy; \ + gamma11 = c_cast + (i )*rs_ct + (i )*cs_ct; \ + c21 = c_cast + (i+1)*rs_ct + (i )*cs_ct; \ +\ + /* Apply conjx and/or conjy to chi1 and/or psi1. */ \ + PASTEMAC2(chy,chy,copycjs)( conjh_conjy, *psi1, conjy0_psi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conjh_conjx, *chi1, conjx1_chi1 ); \ + PASTEMAC2(chx,chx,copycjs)( conj0, *chi1, conjx0_chi1 ); \ +\ + /* Compute scalars for vector subproblems. */ \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha0, conjy0_psi1, alpha0_psi1 ); \ + PASTEMAC3(chxy,chx,chxy,scal2s)( alpha1, conjx1_chi1, alpha1_chi1 ); \ +\ + /* Compute alpha * chi1 * conj(psi1) after both chi1 and psi1 have + already been conjugated, if needed, by conjx and conjy. 
*/ \ + PASTEMAC3(chy,chxy,chxy,scal2s)( alpha0_psi1, conjx0_chi1, alpha0_chi1_psi1 ); \ +\ + /* c21 = c21 + alpha * x2 * conj(psi1); */ \ + /* c21 = c21 + conj(alpha) * y2 * conj(chi1); */ \ + PASTEMAC3(chx,chy,chc,kername)( conj0, \ + conj1, \ + n_ahead, \ + &alpha0_psi1, \ + &alpha1_chi1, \ + x2, incx, \ + y2, incy, \ + c21, rs_ct ); \ +\ + /* gamma11 = gamma11 + alpha * chi1 * conj(psi1) \ + + conj(alpha) * psi1 * conj(chi1); */ \ + PASTEMAC3(chxy,chxy,chc,axpys)( *two, alpha0_chi1_psi1, *gamma11 ); \ +\ + /* For her2, explicitly set the imaginary component of gamma11 to + zero. */ \ + if ( bli_is_conj( conjh ) ) \ + PASTEMAC(chc,seti0s)( *gamma11 ); \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC3U12_BASIC( her2_unf_var4, AXPY2V_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC3U12_MIX_D( her2_unf_var4, AXPY2V_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC3U12_MIX_P( her2_unf_var4, AXPY2V_KERNEL ) +#endif + diff --git a/frame/2/her2/bli_her2_unf_var4.h b/frame/2/her2/old/bli_her2_unf_var4.h similarity index 90% rename from frame/2/her2/bli_her2_unf_var4.h rename to frame/2/her2/old/bli_her2_unf_var4.h index 3da89d62c..080489bd2 100644 --- a/frame/2/her2/bli_her2_unf_var4.h +++ b/frame/2/her2/old/bli_her2_unf_var4.h @@ -33,13 +33,13 @@ */ -void bli_her2_unf_var4( conj_t conjh, - obj_t* alpha, - obj_t* alpha_conj, - obj_t* x, - obj_t* y, - obj_t* c, - her2_t* cntl ); +void bli_her2_unf_var4( conj_t conjh, + obj_t* alpha, + obj_t* alpha_conj, + obj_t* x, + obj_t* y, + obj_t* c, + her2_t* cntl ); #undef GENTPROT3 diff --git a/frame/2/symv/bli_symv.h b/frame/2/symv/bli_symv.h index 2600c5929..5195a4c50 100644 --- a/frame/2/symv/bli_symv.h +++ b/frame/2/symv/bli_symv.h @@ -32,62 +32,5 @@ */ -#include "bli_symv_check.h" - - -void bli_symv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ); - - -// -// 
Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ); - -INSERT_GENTPROT_BASIC( symv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( symv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( symv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( symv ) -#endif +#include "bli_symv_front.h" diff --git a/frame/2/symv/bli_symv.c b/frame/2/symv/bli_symv_front.c similarity index 78% rename from frame/2/symv/bli_symv.c rename to frame/2/symv/bli_symv_front.c index 9411996db..badb19d9a 100644 --- a/frame/2/symv/bli_symv.c +++ b/frame/2/symv/bli_symv_front.c @@ -39,11 +39,15 @@ extern hemv_t* hemv_cntl_bs_ke_lcol_urow; extern hemv_t* hemv_cntl_ge_lrow_ucol; extern hemv_t* hemv_cntl_ge_lcol_urow; -void bli_symv( obj_t* alpha, - obj_t* a, - obj_t* x, - obj_t* beta, - obj_t* y ) +void bli_symv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ) { hemv_t* hemv_cntl; num_t dt_targ_a; @@ -148,6 +152,7 @@ void bli_symv( obj_t* alpha, x, &beta_local, y, + cntx, hemv_cntl ); } @@ -156,19 +161,21 @@ void bli_symv( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx, \ - ctype* beta, \ - ctype* y, inc_t incy \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -197,39 +204,9 @@ void PASTEMAC(ch,opname)( \ &ao, \ &xo, \ &betao, \ - &yo ); \ + &yo, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( symv, symv ) +INSERT_GENTFUNC_BASIC0( symv_front ) - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, opname, varname ) \ -\ -void PASTEMAC3(cha,chx,chy,opname)( \ - uplo_t uploa, \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx, \ - ctype_y* beta, \ - ctype_y* y, inc_t incy \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( symv, symv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( symv, symv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( symv, symv ) -#endif diff --git a/frame/2/symv/bli_symv_front.h b/frame/2/symv/bli_symv_front.h new file mode 100644 index 000000000..a5fd5cbb5 --- /dev/null +++ b/frame/2/symv/bli_symv_front.h @@ -0,0 +1,64 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + + +void bli_symv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + obj_t* beta, + obj_t* y, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + ctype* beta, \ + ctype* y, inc_t incy, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( symv_front ) + diff --git a/frame/2/symv/bli_symv_check.c b/frame/2/symv/old/bli_symv_check.c similarity index 100% rename from frame/2/symv/bli_symv_check.c rename to frame/2/symv/old/bli_symv_check.c diff --git a/frame/2/symv/bli_symv_check.h b/frame/2/symv/old/bli_symv_check.h similarity index 100% rename from frame/2/symv/bli_symv_check.h rename to frame/2/symv/old/bli_symv_check.h diff --git a/frame/2/syr/bli_syr.h b/frame/2/syr/bli_syr.h index be2d386d8..25a5e0a63 100644 --- a/frame/2/syr/bli_syr.h +++ b/frame/2/syr/bli_syr.h @@ -32,54 +32,5 @@ */ -#include "bli_syr_check.h" - - -void bli_syr( obj_t* alpha, - obj_t* x, - obj_t* c ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syr ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_c, chx, chc, opname ) \ -\ -void PASTEMAC2(chx,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_x* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT2_BASIC( syr ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( syr ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( syr ) -#endif +#include "bli_syr_front.h" diff --git a/frame/2/syr/bli_syr.c b/frame/2/syr/bli_syr_front.c similarity index 80% rename from frame/2/syr/bli_syr.c rename to frame/2/syr/bli_syr_front.c index 0242dfaca..9be765dce 100644 --- a/frame/2/syr/bli_syr.c +++ b/frame/2/syr/bli_syr_front.c @@ -39,9 +39,13 @@ extern her_t* her_cntl_bs_ke_lcol_urow; extern her_t* her_cntl_ge_lrow_ucol; extern her_t* her_cntl_ge_lcol_urow; -void bli_syr( obj_t* alpha, - obj_t* x, - obj_t* c ) +void bli_syr_front + ( + obj_t* alpha, + obj_t* x, + obj_t* c, + cntx_t* cntx + ) { her_t* her_cntl; num_t dt_targ_x; @@ -117,7 +121,6 @@ void bli_syr( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast scalar and the // chosen control tree. Set conjh to BLIS_NO_CONJUGATE to invoke the // symmetric (and not Hermitian) algorithms. @@ -125,6 +128,7 @@ void bli_syr( obj_t* alpha, &alpha_local, x, c, + cntx, her_cntl ); } @@ -133,16 +137,18 @@ void bli_syr( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -164,36 +170,9 @@ void PASTEMAC(ch,opname)( \ \ PASTEMAC0(opname)( &alphao, \ &xo, \ - &co ); \ + &co, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( syr, syr ) +INSERT_GENTFUNC_BASIC0( syr_front ) - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2 -#define GENTFUNC2( ctype_x, ctype_c, chx, chc, opname, varname ) \ -\ -void PASTEMAC2(chx,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - dim_t m, \ - ctype_x* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC2_BASIC( syr, syr ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2_MIX_D( syr, syr ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2_MIX_P( syr, syr ) -#endif diff --git a/frame/2/syr/bli_syr_front.h b/frame/2/syr/bli_syr_front.h new file mode 100644 index 000000000..d36f4c3a4 --- /dev/null +++ b/frame/2/syr/bli_syr_front.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +void bli_syr_front + ( + obj_t* alpha, + obj_t* x, + obj_t* c, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syr_front ) + diff --git a/frame/2/syr/bli_syr_check.c b/frame/2/syr/old/bli_syr_check.c similarity index 100% rename from frame/2/syr/bli_syr_check.c rename to frame/2/syr/old/bli_syr_check.c diff --git a/frame/2/syr/bli_syr_check.h b/frame/2/syr/old/bli_syr_check.h similarity index 100% rename from frame/2/syr/bli_syr_check.h rename to frame/2/syr/old/bli_syr_check.h diff --git a/frame/2/syr2/bli_syr2.h b/frame/2/syr2/bli_syr2.h index 9115af8fb..39d45c6c5 100644 --- a/frame/2/syr2/bli_syr2.h +++ b/frame/2/syr2/bli_syr2.h @@ -32,59 +32,5 @@ */ -#include "bli_syr2_check.h" - - -void bli_syr2( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syr2 ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, opname ) \ -\ -void PASTEMAC3(chx,chy,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT3U12_BASIC( syr2 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT3U12_MIX_D( syr2 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT3U12_MIX_P( syr2 ) -#endif +#include "bli_syr2_front.h" diff --git a/frame/2/syr2/bli_syr2.c b/frame/2/syr2/bli_syr2_front.c similarity index 77% rename from frame/2/syr2/bli_syr2.c rename to frame/2/syr2/bli_syr2_front.c index 56e161f24..25c0c3cbf 100644 --- a/frame/2/syr2/bli_syr2.c +++ b/frame/2/syr2/bli_syr2_front.c @@ -39,10 +39,14 @@ extern her2_t* her2_cntl_bs_ke_lcol_urow; extern her2_t* her2_cntl_ge_lrow_ucol; extern her2_t* her2_cntl_ge_lcol_urow; -void bli_syr2( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_syr2_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx + ) { her2_t* her2_cntl; num_t dt_targ_x; @@ -123,7 +127,6 @@ void bli_syr2( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast scalar and the // chosen control tree. Set conjh to BLIS_NO_CONJUGATE to invoke the // symmetric (and not Hermitian) algorithms. @@ -133,6 +136,7 @@ void bli_syr2( obj_t* alpha, x, y, c, + cntx, her2_cntl ); } @@ -141,18 +145,20 @@ void bli_syr2( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype* alpha, \ - ctype* x, inc_t incx, \ - ctype* y, inc_t incy, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -179,39 +185,9 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( &alphao, \ &xo, \ &yo, \ - &co ); \ + &co, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( syr2, syr2 ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC3U12 -#define GENTFUNC3U12( ctype_x, ctype_y, ctype_c, ctype_xy, chx, chy, chc, chxy, opname, varname ) \ -\ -void PASTEMAC3(chx,chy,chc,opname)( \ - uplo_t uploc, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_xy* alpha, \ - ctype_x* x, inc_t incx, \ - ctype_y* y, inc_t incy, \ - ctype_c* c, inc_t rs_c, inc_t cs_c \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC3U12_BASIC( syr2, syr2 ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC3U12_MIX_D( syr2, syr2 ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC3U12_MIX_P( syr2, syr2 ) -#endif +INSERT_GENTFUNC_BASIC0( syr2_front ) diff --git a/frame/2/syr2/bli_syr2_front.h b/frame/2/syr2/bli_syr2_front.h new file mode 100644 index 000000000..1d34a7741 --- /dev/null +++ b/frame/2/syr2/bli_syr2_front.h @@ -0,0 +1,61 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +void bli_syr2_front + ( + obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype* alpha, \ + ctype* x, inc_t incx, \ + ctype* y, inc_t incy, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syr2_front ) + diff --git a/frame/2/syr2/bli_syr2_check.c b/frame/2/syr2/old/bli_syr2_check.c similarity index 87% rename from frame/2/syr2/bli_syr2_check.c rename to frame/2/syr2/old/bli_syr2_check.c index 754f6b965..a374ab05d 100644 --- a/frame/2/syr2/bli_syr2_check.c +++ b/frame/2/syr2/old/bli_syr2_check.c @@ -34,19 +34,19 @@ #include "blis.h" -void bli_syr2_basic_check( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_syr2_basic_check( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ) { // The basic properties of syr2 are identical to that of her2. bli_her2_basic_check( BLIS_NO_CONJUGATE, alpha, x, y, c ); } -void bli_syr2_check( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ) +void bli_syr2_check( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ) { err_t e_val; diff --git a/frame/2/syr2/bli_syr2_check.h b/frame/2/syr2/old/bli_syr2_check.h similarity index 84% rename from frame/2/syr2/bli_syr2_check.h rename to frame/2/syr2/old/bli_syr2_check.h index 5690326ba..c4fd59d41 100644 --- a/frame/2/syr2/bli_syr2_check.h +++ b/frame/2/syr2/old/bli_syr2_check.h @@ -32,13 +32,13 @@ */ -void bli_syr2_basic_check( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); +void bli_syr2_basic_check( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ); -void bli_syr2_check( obj_t* alpha, - obj_t* x, - obj_t* y, - obj_t* c ); +void bli_syr2_check( obj_t* alpha, + obj_t* x, + obj_t* y, + obj_t* c ); diff --git a/frame/2/trmv/bli_trmv.h b/frame/2/trmv/bli_trmv.h index 205b457e1..242642a91 100644 --- a/frame/2/trmv/bli_trmv.h +++ 
b/frame/2/trmv/bli_trmv.h @@ -33,68 +33,8 @@ */ #include "bli_trmv_cntl.h" -#include "bli_trmv_check.h" +#include "bli_trmv_front.h" #include "bli_trmv_int.h" -#include "bli_trmv_unb_var1.h" -#include "bli_trmv_unb_var2.h" - -#include "bli_trmv_unf_var1.h" -#include "bli_trmv_unf_var2.h" - -#include "bli_trmv_l_blk_var1.h" -#include "bli_trmv_l_blk_var2.h" -#include "bli_trmv_u_blk_var1.h" -#include "bli_trmv_u_blk_var2.h" - - -void bli_trmv( obj_t* alpha, - obj_t* a, - obj_t* x ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx \ - ); - -INSERT_GENTPROT_BASIC( trmv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTPROT2U -#define GENTPROT2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, opname ) \ -\ -void PASTEMAC2(cha,chx,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx \ - ); - -INSERT_GENTPROT2U_BASIC( trmv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2U_MIX_D( trmv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2U_MIX_P( trmv ) -#endif +#include "bli_trmv_var.h" diff --git a/frame/2/trmv/bli_trmv_cntl.c b/frame/2/trmv/bli_trmv_cntl.c index c1876cb43..c71ca0a95 100644 --- a/frame/2/trmv/bli_trmv_cntl.c +++ b/frame/2/trmv/bli_trmv_cntl.c @@ -43,8 +43,6 @@ extern gemv_t* gemv_cntl_rp_bs_axpy; extern gemv_t* gemv_cntl_cp_bs_dot; extern gemv_t* gemv_cntl_cp_bs_axpy; -extern blksz_t* gemv_mc; - trmv_t* trmv_cntl_bs_ke_nrow_tcol; trmv_t* trmv_cntl_bs_ke_ncol_trow; trmv_t* trmv_cntl_ge_nrow_tcol; @@ -59,17 +57,17 @@ void bli_trmv_cntl_init() = bli_trmv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT1, + 0, 
NULL, NULL, NULL, - NULL, NULL, NULL, - NULL ); + NULL, NULL, NULL ); trmv_cntl_bs_ke_ncol_trow = bli_trmv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT2, + 0, NULL, NULL, NULL, - NULL, NULL, NULL, - NULL ); + NULL, NULL, NULL ); // Create control trees for generally large problems. Here we choose a @@ -78,7 +76,7 @@ void bli_trmv_cntl_init() = bli_trmv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, // use var1 to maximize x1 usage - gemv_mc, + BLIS_M2, packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) gemv_cntl_rp_bs_dot, // gemv_rp needed by var1 @@ -89,7 +87,7 @@ void bli_trmv_cntl_init() = bli_trmv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, // use var1 to maximize x1 usage - gemv_mc, + BLIS_M2, packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) gemv_cntl_rp_bs_axpy, // gemv_rp needed by var1 @@ -109,7 +107,7 @@ void bli_trmv_cntl_finalize() trmv_t* bli_trmv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packm_t* sub_packm_a11, packv_t* sub_packv_x1, gemv_t* sub_gemv_rp, @@ -123,7 +121,7 @@ trmv_t* bli_trmv_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_gemv_rp = sub_gemv_rp; @@ -137,7 +135,7 @@ trmv_t* bli_trmv_cntl_obj_create( impl_t impl_type, void bli_trmv_cntl_obj_init( trmv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packm_t* sub_packm_a11, packv_t* sub_packv_x1, gemv_t* sub_gemv_rp, @@ -147,7 +145,7 @@ void bli_trmv_cntl_obj_init( trmv_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; cntl->sub_gemv_rp = sub_gemv_rp; diff --git a/frame/2/trmv/bli_trmv_cntl.h b/frame/2/trmv/bli_trmv_cntl.h index d69907cc5..3ce17353e 100644 --- a/frame/2/trmv/bli_trmv_cntl.h +++ 
b/frame/2/trmv/bli_trmv_cntl.h @@ -36,7 +36,7 @@ struct trmv_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct packm_s* sub_packm_a11; struct packv_s* sub_packv_x1; struct gemv_s* sub_gemv_rp; @@ -52,7 +52,7 @@ void bli_trmv_cntl_init( void ); void bli_trmv_cntl_finalize( void ); trmv_t* bli_trmv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packm_t* sub_packm_a11, packv_t* sub_packv_x1, gemv_t* sub_gemv_rp, @@ -62,7 +62,7 @@ trmv_t* bli_trmv_cntl_obj_create( impl_t impl_type, void bli_trmv_cntl_obj_init( trmv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, packm_t* sub_packm_a11, packv_t* sub_packv_x1, gemv_t* sub_gemv_rp, diff --git a/frame/2/trmv/bli_trmv.c b/frame/2/trmv/bli_trmv_front.c similarity index 78% rename from frame/2/trmv/bli_trmv.c rename to frame/2/trmv/bli_trmv_front.c index 1009e4616..049f9aa37 100644 --- a/frame/2/trmv/bli_trmv.c +++ b/frame/2/trmv/bli_trmv_front.c @@ -39,9 +39,13 @@ extern trmv_t* trmv_cntl_bs_ke_ncol_trow; extern trmv_t* trmv_cntl_ge_nrow_tcol; extern trmv_t* trmv_cntl_ge_ncol_trow; -void bli_trmv( obj_t* alpha, - obj_t* a, - obj_t* x ) +void bli_trmv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx + ) { trmv_t* trmv_cntl; num_t dt_targ_a; @@ -117,12 +121,12 @@ void bli_trmv( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast of alpha and the // chosen control tree. bli_trmv_int( &alpha_local, a, x, + cntx, trmv_cntl ); } @@ -131,17 +135,19 @@ void bli_trmv( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -164,38 +170,9 @@ void PASTEMAC(ch,opname)( \ \ PASTEMAC0(opname)( &alphao, \ &ao, \ - &xo ); \ + &xo, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( trmv, trmv ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, opname, varname ) \ -\ -void PASTEMAC2(cha,chx,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC2U_BASIC( trmv, trmv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trmv, trmv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trmv, trmv ) -#endif +INSERT_GENTFUNC_BASIC0( trmv_front ) diff --git a/frame/2/trmv/bli_trmv_front.h b/frame/2/trmv/bli_trmv_front.h new file mode 100644 index 000000000..3f9724f8a --- /dev/null +++ b/frame/2/trmv/bli_trmv_front.h @@ -0,0 +1,59 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +void bli_trmv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx + ); + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmv_front ) + diff --git a/frame/2/trmv/bli_trmv_int.c b/frame/2/trmv/bli_trmv_int.c index 510ef7785..cc210a065 100644 --- a/frame/2/trmv/bli_trmv_int.c +++ b/frame/2/trmv/bli_trmv_int.c @@ -39,6 +39,7 @@ typedef void (*FUNCPTR_T)( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); static FUNCPTR_T vars[2][3][3] = @@ -62,6 +63,7 @@ static FUNCPTR_T vars[2][3][3] = void bli_trmv_int( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { varnum_t n; @@ -72,7 +74,7 @@ void bli_trmv_int( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_trmv_int_check( alpha, a, x, cntl ); + bli_trmv_check( alpha, a, x ); // If A or x has a zero dimension, return early. if ( bli_obj_has_zero_dim( *a ) ) return; @@ -123,6 +125,7 @@ void bli_trmv_int( obj_t* alpha, f( alpha, &a_local, x, + cntx, cntl ); } diff --git a/frame/2/trmv/bli_trmv_int.h b/frame/2/trmv/bli_trmv_int.h index 43efc2b6e..ee9bbd733 100644 --- a/frame/2/trmv/bli_trmv_int.h +++ b/frame/2/trmv/bli_trmv_int.h @@ -35,5 +35,6 @@ void bli_trmv_int( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/bli_trmv_l_blk_var1.c b/frame/2/trmv/bli_trmv_l_blk_var1.c index cd7b6a248..f3af591de 100644 --- a/frame/2/trmv/bli_trmv_l_blk_var1.c +++ b/frame/2/trmv/bli_trmv_l_blk_var1.c @@ -37,6 +37,7 @@ void bli_trmv_l_blk_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { obj_t a11, a11_pack; @@ -60,7 +61,7 @@ void bli_trmv_l_blk_var1( obj_t* alpha, { // Determine the current algorithmic blocksize. 
b_alg = bli_determine_blocksize_b( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, x1, and x0. bli_acquire_mpart_br2tl( BLIS_SUBPART11, @@ -74,21 +75,22 @@ void bli_trmv_l_blk_var1( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // x1 = alpha * tril( A11 ) * x1; bli_trmv_int( alpha, &a11_pack, &x1_pack, + cntx, cntl_sub_trmv( cntl ) ); // x1 = x1 + alpha * A10 * x0; @@ -99,11 +101,12 @@ void bli_trmv_l_blk_var1( obj_t* alpha, &x0, &BLIS_ONE, &x1_pack, + cntx, cntl_sub_gemv_rp( cntl ) ); // Copy/unpack x1 (if x1 was packed). bli_unpackv_int( &x1_pack, &x1, - cntl_sub_unpackv_x1( cntl ) ); + cntx, cntl_sub_unpackv_x1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/trmv/bli_trmv_l_blk_var2.c b/frame/2/trmv/bli_trmv_l_blk_var2.c index 1df5b68af..6bcd16ecf 100644 --- a/frame/2/trmv/bli_trmv_l_blk_var2.c +++ b/frame/2/trmv/bli_trmv_l_blk_var2.c @@ -37,6 +37,7 @@ void bli_trmv_l_blk_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { obj_t a11, a11_pack; @@ -60,7 +61,7 @@ void bli_trmv_l_blk_var2( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_b( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A21, x1, and x2. 
bli_acquire_mpart_br2tl( BLIS_SUBPART11, @@ -74,16 +75,16 @@ void bli_trmv_l_blk_var2( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // x2 = x2 + alpha * A21 * x1; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -93,17 +94,19 @@ void bli_trmv_l_blk_var2( obj_t* alpha, &x1_pack, &BLIS_ONE, &x2, + cntx, cntl_sub_gemv_cp( cntl ) ); // x1 = alpha * tril( A11 ) * x1; bli_trmv_int( alpha, &a11_pack, &x1_pack, + cntx, cntl_sub_trmv( cntl ) ); // Copy/unpack x1 (if x1 was packed). bli_unpackv_int( &x1_pack, &x1, - cntl_sub_unpackv_x1( cntl ) ); + cntx, cntl_sub_unpackv_x1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/trmv/bli_trmv_u_blk_var1.c b/frame/2/trmv/bli_trmv_u_blk_var1.c index aa0b324fa..178e7e90c 100644 --- a/frame/2/trmv/bli_trmv_u_blk_var1.c +++ b/frame/2/trmv/bli_trmv_u_blk_var1.c @@ -37,6 +37,7 @@ void bli_trmv_u_blk_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { obj_t a11, a11_pack; @@ -60,7 +61,7 @@ void bli_trmv_u_blk_var1( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A12, x1, and x2. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -74,21 +75,22 @@ void bli_trmv_u_blk_var1( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). 
bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // x1 = alpha * triu( A11 ) * x1; bli_trmv_int( alpha, &a11_pack, &x1_pack, + cntx, cntl_sub_trmv( cntl ) ); // x1 = x1 + alpha * A12 * x2; @@ -99,11 +101,12 @@ void bli_trmv_u_blk_var1( obj_t* alpha, &x2, &BLIS_ONE, &x1_pack, + cntx, cntl_sub_gemv_rp( cntl ) ); // Copy/unpack x1 (if x1 was packed). bli_unpackv_int( &x1_pack, &x1, - cntl_sub_unpackv_x1( cntl ) ); + cntx, cntl_sub_unpackv_x1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/trmv/bli_trmv_u_blk_var2.c b/frame/2/trmv/bli_trmv_u_blk_var2.c index 00e33d8ba..fd691c6ab 100644 --- a/frame/2/trmv/bli_trmv_u_blk_var2.c +++ b/frame/2/trmv/bli_trmv_u_blk_var2.c @@ -37,6 +37,7 @@ void bli_trmv_u_blk_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { obj_t a11, a11_pack; @@ -60,7 +61,7 @@ void bli_trmv_u_blk_var2( obj_t* alpha, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_b( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A21, x1, and x2. bli_acquire_mpart_br2tl( BLIS_SUBPART11, @@ -74,16 +75,16 @@ void bli_trmv_u_blk_var2( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). 
bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // x0 = x0 + alpha * A01 * x1; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -93,17 +94,19 @@ void bli_trmv_u_blk_var2( obj_t* alpha, &x1_pack, &BLIS_ONE, &x0, + cntx, cntl_sub_gemv_cp( cntl ) ); // x1 = alpha * triu( A11 ) * x1; bli_trmv_int( alpha, &a11_pack, &x1_pack, + cntx, cntl_sub_trmv( cntl ) ); // Copy/unpack x1 (if x1 was packed). bli_unpackv_int( &x1_pack, &x1, - cntl_sub_unpackv_x1( cntl ) ); + cntx, cntl_sub_unpackv_x1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/trmv/bli_trmv_unb_var1.c b/frame/2/trmv/bli_trmv_unb_var1.c index 11d3ca757..89df4f86e 100644 --- a/frame/2/trmv/bli_trmv_unb_var1.c +++ b/frame/2/trmv/bli_trmv_unb_var1.c @@ -34,190 +34,121 @@ #include "blis.h" -#define FUNCPTR_T trmv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unb_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unb_var1); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unb_var1); -#endif -#endif - - -void bli_trmv_unb_var1( obj_t* alpha, - obj_t* a, - obj_t* x, - trmv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - - uplo_t uplo = bli_obj_uplo( *a ); - trans_t trans = bli_obj_conjtrans_status( *a ); - diag_t diag = bli_obj_diag( *a ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x]; - - // Invoke the function. 
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a12t; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_ax alpha_alpha11_conj; \ - ctype_ax rho; \ - dim_t iter, i; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a12t; \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype alpha_alpha11_conj; \ + ctype rho; \ + dim_t iter, i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ +\ + PASTECH(ch,dotv_ft) kfp_dv; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_dv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTV_KER, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = iter; \ n_ahead = m - iter - 1; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a12t = a_cast + (i )*rs_at + (i+1)*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a12t = a + (i )*rs_at + (i+1)*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ \ /* chi1 = alpha * alpha11 * chi1; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi1 ); \ \ /* chi1 = chi1 + alpha * a12t * x2; */ \ - PASTEMAC3(cha,chx,chax,dotv)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - a12t, cs_at, \ - x2, incx, \ - &rho ); \ - PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho, *chi1 ); \ + kfp_dv \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + a12t, cs_at, \ + x2, incx, \ + &rho, \ + cntx \ + ); \ + PASTEMAC(ch,axpys)( *alpha, rho, *chi1 ); \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = m - iter - 1; \ n_ahead = i; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + chi1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* chi1 = alpha * alpha11 * chi1; */ \ - 
PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi1 ); \ \ /* chi1 = chi1 + alpha * a10t * x0; */ \ - PASTEMAC3(cha,chx,chax,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - a10t, cs_at, \ - x0, incx, \ - &rho ); \ - PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho, *chi1 ); \ + kfp_dv \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + a10t, cs_at, \ + x0, incx, \ + &rho, \ + cntx \ + ); \ + PASTEMAC(ch,axpys)( *alpha, rho, *chi1 ); \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trmv_unb_var1, DOTV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trmv_unb_var1, DOTV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trmv_unb_var1, DOTV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trmv_unb_var1 ) diff --git a/frame/2/trmv/bli_trmv_unb_var2.c b/frame/2/trmv/bli_trmv_unb_var2.c index 23e625142..06d39b485 100644 --- a/frame/2/trmv/bli_trmv_unb_var2.c +++ b/frame/2/trmv/bli_trmv_unb_var2.c @@ -34,188 +34,119 @@ #include "blis.h" -#define FUNCPTR_T trmv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unb_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unb_var2); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unb_var2); -#endif -#endif - - -void bli_trmv_unb_var2( obj_t* alpha, - obj_t* a, - obj_t* x, - trmv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - - uplo_t uplo = bli_obj_uplo( *a ); - trans_t trans = bli_obj_conjtrans_status( *a ); - diag_t diag = bli_obj_diag( *a ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x]; - - // Invoke the function. 
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_a* a01; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_ax alpha_alpha11_conj; \ - ctype_ax alpha_chi1; \ - dim_t iter, i; \ - dim_t n_behind; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* a01; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype alpha_alpha11_conj; \ + ctype alpha_chi1; \ + dim_t iter, i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. 
*/ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = iter; \ n_behind = i; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a01 = a_cast + (0 )*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a01 = a + (0 )*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* x0 = x0 + alpha * chi1 * a01; */ \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi1, alpha_chi1 ); \ - PASTEMAC3(chax,cha,chx,axpyv)( conja, \ - n_behind, \ - &alpha_chi1, \ - a01, rs_at, \ - x0, incx ); \ + PASTEMAC(ch,scal2s)( *alpha, *chi1, alpha_chi1 ); \ + kfp_av \ + ( \ + conja, \ + n_behind, \ + &alpha_chi1, \ + a01, rs_at, \ + x0, incx, \ + cntx \ + ); \ \ /* chi1 = alpha * alpha11 * chi1; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi1 ); \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = m - iter - 1; \ n_behind = iter; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ \ /* x2 = x2 + alpha * chi1 * a21; */ \ - PASTEMAC3(chax,chx,chax,scal2s)( 
*alpha_cast, *chi1, alpha_chi1 ); \ - PASTEMAC3(chax,cha,chx,kername)( conja, \ - n_behind, \ - &alpha_chi1, \ - a21, rs_at, \ - x2, incx ); \ + PASTEMAC(ch,scal2s)( *alpha, *chi1, alpha_chi1 ); \ + kfp_av \ + ( \ + conja, \ + n_behind, \ + &alpha_chi1, \ + a21, rs_at, \ + x2, incx, \ + cntx \ + ); \ \ /* chi1 = alpha * alpha11 * chi1; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi1 ); \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trmv_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trmv_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trmv_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trmv_unb_var2 ) diff --git a/frame/2/trmv/bli_trmv_unf_var1.c b/frame/2/trmv/bli_trmv_unf_var1.c index f6f74eb5c..fa60afa6f 100644 --- a/frame/2/trmv/bli_trmv_unf_var1.c +++ b/frame/2/trmv/bli_trmv_unf_var1.c @@ -34,147 +34,78 @@ #include "blis.h" -#define FUNCPTR_T trmv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unf_var1); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unf_var1); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unf_var1); -#endif -#endif - - -void bli_trmv_unf_var1( obj_t* alpha, - obj_t* a, - obj_t* x, - trmv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - - uplo_t uplo = bli_obj_uplo( *a ); - trans_t trans = bli_obj_conjtrans_status( *a ); - diag_t diag = bli_obj_diag( *a ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x]; - - // Invoke the function. 
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_x* one = PASTEMAC(chx,1); \ - ctype_a* A10; \ - ctype_a* A11; \ - ctype_a* A12; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a12t; \ - ctype_x* x0; \ - ctype_x* x1; \ - ctype_x* x2; \ - ctype_x* x01; \ - ctype_x* chi11; \ - ctype_x* x21; \ - ctype_ax alpha_alpha11_conj; \ - ctype_ax rho1; \ - dim_t iter, i, k, j, l; \ - dim_t b_fuse, f; \ - dim_t n_ahead, f_ahead; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* A10; \ + ctype* A11; \ + ctype* A12; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a12t; \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* x01; \ + ctype* chi11; \ + ctype* x21; \ + ctype alpha_alpha11_conj; \ + ctype rho1; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_ahead, f_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = 
bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ - /* Query the fusing factor for the dotxf implementation. */ \ - b_fuse = PASTEMAC(chax,dotxf_fusefac); \ + PASTECH(ch,dotxf_ft) kfp_df; \ +\ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_df = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_DF, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ i = iter; \ n_ahead = m - iter - f; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A12 = a_cast + (i )*rs_at + (i+f)*cs_at; \ - x1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+f)*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A12 = a + (i )*rs_at + (i+f)*cs_at; \ + x1 = x + (i )*incx; \ + x2 = x + (i+f)*incx; \ \ /* x1 = alpha * A11 * x1; */ \ for ( k = 0; k < f; ++k ) \ @@ -187,49 +118,53 @@ void PASTEMAC2(cha,chx,varname)( \ x21 = x1 + (l+1)*incx; \ \ /* chi11 = alpha * alpha11 * chi11; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi11 ); \ \ /* chi11 = chi11 + alpha * a12t * x21; */ \ - PASTEMAC(chax,set0s)( rho1 ); \ + PASTEMAC(ch,set0s)( rho1 ); \ if ( bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(cha,chx,chax,dotjs)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + PASTEMAC(ch,dotjs)( 
*(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(cha,chx,chax,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + PASTEMAC(ch,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ } \ - PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho1, *chi11 ); \ + PASTEMAC(ch,axpys)( *alpha, rho1, *chi11 ); \ } \ \ /* x1 = x1 + alpha * A12 * x2; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - f, \ - alpha_cast, \ - A12, cs_at, rs_at, \ - x2, incx, \ - one, \ - x1, incx ); \ + kfp_df \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + alpha, \ + A12, cs_at, rs_at, \ + x2, incx, \ + one, \ + x1, incx, \ + cntx \ + ); \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ i = m - iter - f; \ n_ahead = i; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A10 = a + (i )*rs_at + (0 )*cs_at; \ + x1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* x1 = alpha * A11 * x1; */ \ for ( k = 0; k < f; ++k ) \ @@ -242,49 +177,43 @@ void PASTEMAC2(cha,chx,varname)( \ x01 = x1 + (0 )*incx; \ \ /* chi11 = alpha * alpha11 * chi11; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi11 ); \ \ /* chi11 = chi11 + alpha * a10t * x01; */ \ - PASTEMAC(chax,set0s)( rho1 ); \ + PASTEMAC(ch,set0s)( rho1 ); \ if ( 
bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(cha,chx,chax,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + PASTEMAC(ch,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(cha,chx,chax,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + PASTEMAC(ch,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ } \ - PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho1, *chi11 ); \ + PASTEMAC(ch,axpys)( *alpha, rho1, *chi11 ); \ } \ \ /* x1 = x1 + alpha * A10 * x0; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - f, \ - alpha_cast, \ - A10, cs_at, rs_at, \ - x0, incx, \ - one, \ - x1, incx ); \ + kfp_df \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + alpha, \ + A10, cs_at, rs_at, \ + x0, incx, \ + one, \ + x1, incx, \ + cntx \ + ); \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trmv_unf_var1, DOTXF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trmv_unf_var1, DOTXF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trmv_unf_var1, DOTXF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trmv_unf_var1 ) diff --git a/frame/2/trmv/bli_trmv_unf_var2.c b/frame/2/trmv/bli_trmv_unf_var2.c index 39bd0c4de..42e8587b8 100644 --- a/frame/2/trmv/bli_trmv_unf_var2.c +++ b/frame/2/trmv/bli_trmv_unf_var2.c @@ -34,156 +34,91 @@ #include "blis.h" -#define FUNCPTR_T trmv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unf_var2); -#else -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unf_var2); -#else -static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unf_var2); -#endif -#endif - - -void bli_trmv_unf_var2( obj_t* alpha, - obj_t* a, - obj_t* x, - trmv_t* cntl ) -{ - num_t dt_a = bli_obj_datatype( *a ); - num_t dt_x = bli_obj_datatype( *x ); - - uplo_t uplo = bli_obj_uplo( *a ); - trans_t trans = bli_obj_conjtrans_status( *a ); - diag_t diag = bli_obj_diag( *a ); - - dim_t m = bli_obj_length( *a ); - - void* buf_a = bli_obj_buffer_at_off( *a ); - inc_t rs_a = bli_obj_row_stride( *a ); - inc_t cs_a = bli_obj_col_stride( *a ); - - void* buf_x = bli_obj_buffer_at_off( *x ); - inc_t incx = bli_obj_vector_inc( *x ); - - num_t dt_alpha; - void* buf_alpha; - - FUNCPTR_T f; - - // The datatype of alpha MUST be the type union of a and x. This is to - // prevent any unnecessary loss of information during computation. - dt_alpha = bli_datatype_union( dt_a, dt_x ); - buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); - - // Index into the type combination array to extract the correct - // function pointer. - f = ftypes[dt_a][dt_x]; - - // Invoke the function. 
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_a* A01; \ - ctype_a* A11; \ - ctype_a* A21; \ - ctype_a* a01; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* x1; \ - ctype_x* x2; \ - ctype_x* x01; \ - ctype_x* chi11; \ - ctype_x* x21; \ - ctype_ax alpha_alpha11_conj; \ - ctype_ax alpha_chi11; \ - dim_t iter, i, k, j, l; \ - dim_t b_fuse, f; \ - dim_t n_behind, f_behind; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* A01; \ + ctype* A11; \ + ctype* A21; \ + ctype* a01; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* x01; \ + ctype* chi11; \ + ctype* x21; \ + ctype alpha_alpha11_conj; \ + ctype alpha_chi11; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_behind, f_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = 
bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ - /* Query the fusing factor for the axpyf implementation. */ \ - b_fuse = PASTEMAC(chax,axpyf_fusefac); \ + PASTECH(ch,axpyf_ft) kfp_af; \ +\ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_af = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPYF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_AF, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ i = iter; \ n_behind = i; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A01 = a_cast + (0 )*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A01 = a + (0 )*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* x0 = x0 + alpha * A01 * x1; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - f, \ - alpha_cast, \ - A01, rs_at, cs_at, \ - x1, incx, \ - x0, incx ); \ + kfp_af \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + alpha, \ + A01, rs_at, cs_at, \ + x1, incx, \ + x0, incx, \ + cntx \ + ); \ \ /* x1 = alpha * A11 * x1; */ \ for ( k = 0; k < f; ++k ) \ @@ -196,47 +131,51 @@ void PASTEMAC2(cha,chx,varname)( \ x01 = x1 + (0 )*incx; \ \ /* x01 = x01 + alpha * chi11 * a01; */ \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi11, alpha_chi11 ); \ + PASTEMAC(ch,scal2s)( *alpha, *chi11, alpha_chi11 ); \ if ( bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chx,axpyjs)( alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + PASTEMAC(ch,axpyjs)( alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chx,axpys)( 
alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ } \ \ /* chi11 = alpha * alpha11 * chi11; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi11 ); \ } \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ i = m - iter - f; \ n_behind = iter; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+f)*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A21 = a + (i+f)*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + x2 = x + (i+f)*incx; \ \ /* x2 = x2 + alpha * A21 * x1; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - f, \ - alpha_cast, \ - A21, rs_at, cs_at, \ - x1, incx, \ - x2, incx ); \ + kfp_af \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + alpha, \ + A21, rs_at, cs_at, \ + x1, incx, \ + x2, incx, \ + cntx \ + ); \ \ /* x1 = alpha * A11 * x1; */ \ for ( k = 0; k < f; ++k ) \ @@ -249,37 +188,27 @@ void PASTEMAC2(cha,chx,varname)( \ x21 = x1 + (l+1)*incx; \ \ /* x21 = x21 + alpha * chi11 * a21; */ \ - PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi11, alpha_chi11 ); \ + PASTEMAC(ch,scal2s)( *alpha, *chi11, alpha_chi11 ); \ if ( bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chx,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + PASTEMAC(ch,axpyjs)( 
alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(chax,cha,chx,axpys)( alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + PASTEMAC(ch,axpys)( alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ } \ \ /* chi11 = alpha * alpha11 * chi11; */ \ - PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ - if ( bli_is_nonunit_diag( diag ) ) \ - PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ - PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copys)( *alpha, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diaga ) ) \ + PASTEMAC(ch,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC(ch,scals)( alpha_alpha11_conj, *chi11 ); \ } \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trmv_unf_var2, AXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trmv_unf_var2, AXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trmv_unf_var2, AXPYF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trmv_unf_var2 ) diff --git a/frame/2/trmv/bli_trmv_var.h b/frame/2/trmv/bli_trmv_var.h new file mode 100644 index 000000000..cca3be140 --- /dev/null +++ b/frame/2/trmv/bli_trmv_var.h @@ -0,0 +1,88 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + cntx_t* cntx, \ + trmv_t* cntl \ + ); + +GENPROT( trmv_l_blk_var1 ) +GENPROT( trmv_l_blk_var2 ) +GENPROT( trmv_u_blk_var1 ) +GENPROT( trmv_u_blk_var2 ) + +GENPROT( trmv_unb_var1 ) +GENPROT( trmv_unb_var2 ) + +GENPROT( trmv_unf_var1 ) +GENPROT( trmv_unf_var2 ) + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmv_unb_var1 ) +INSERT_GENTPROT_BASIC( trmv_unb_var2 ) + +INSERT_GENTPROT_BASIC( trmv_unf_var1 ) +INSERT_GENTPROT_BASIC( trmv_unf_var2 ) + diff --git a/frame/2/trmv/bli_trmv_var_oapi.c b/frame/2/trmv/bli_trmv_var_oapi.c new file mode 100644 index 000000000..75926054b --- /dev/null +++ b/frame/2/trmv/bli_trmv_var_oapi.c @@ -0,0 +1,87 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* x, \ + cntx_t* cntx, \ + trmv_t* cntl \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + trans_t transa = bli_obj_conjtrans_status( *a ); \ + diag_t diaga = bli_obj_diag( *a ); \ +\ + dim_t m = bli_obj_length( *a ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_11 \ + ( \ + dt, \ + opname, \ + uploa, \ + transa, \ + diaga, \ + m, \ + buf_alpha, \ + buf_a, rs_a, cs_a, \ + buf_x, incx, \ + cntx \ + ); \ +} \ + +GENFRONT( trmv_unb_var1 ) +GENFRONT( trmv_unb_var2 ) + +GENFRONT( trmv_unf_var1 ) +GENFRONT( trmv_unf_var2 ) + diff --git a/frame/2/trmv/bli_trmv_check.c b/frame/2/trmv/old/bli_trmv_check.c similarity index 98% rename from frame/2/trmv/bli_trmv_check.c rename to frame/2/trmv/old/bli_trmv_check.c index 7e3700345..efbcb0fe0 100644 --- a/frame/2/trmv/bli_trmv_check.c +++ b/frame/2/trmv/old/bli_trmv_check.c @@ -88,6 +88,7 @@ void bli_trmv_check( obj_t* alpha, void bli_trmv_int_check( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ) { err_t e_val; diff --git a/frame/2/trmv/bli_trmv_check.h b/frame/2/trmv/old/bli_trmv_check.h similarity index 98% rename from frame/2/trmv/bli_trmv_check.h rename to frame/2/trmv/old/bli_trmv_check.h index 1cd040b71..1f52e2399 100644 --- a/frame/2/trmv/bli_trmv_check.h +++ b/frame/2/trmv/old/bli_trmv_check.h @@ -43,4 +43,5 @@ void bli_trmv_check( obj_t* alpha, void bli_trmv_int_check( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/bli_trmv_l_blk_var1.h b/frame/2/trmv/old/bli_trmv_l_blk_var1.h similarity index 97% rename from frame/2/trmv/bli_trmv_l_blk_var1.h rename to frame/2/trmv/old/bli_trmv_l_blk_var1.h index cb076d39a..bd3b45992 100644 --- a/frame/2/trmv/bli_trmv_l_blk_var1.h +++ b/frame/2/trmv/old/bli_trmv_l_blk_var1.h @@ -35,5 +35,6 @@ void bli_trmv_l_blk_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/bli_trmv_l_blk_var2.h b/frame/2/trmv/old/bli_trmv_l_blk_var2.h similarity index 97% rename from frame/2/trmv/bli_trmv_l_blk_var2.h rename to frame/2/trmv/old/bli_trmv_l_blk_var2.h index 6dd9a4727..32574e914 100644 --- a/frame/2/trmv/bli_trmv_l_blk_var2.h +++ b/frame/2/trmv/old/bli_trmv_l_blk_var2.h @@ -35,5 +35,6 @@ void bli_trmv_l_blk_var2( obj_t* alpha, 
obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/bli_trmv_u_blk_var1.h b/frame/2/trmv/old/bli_trmv_u_blk_var1.h similarity index 97% rename from frame/2/trmv/bli_trmv_u_blk_var1.h rename to frame/2/trmv/old/bli_trmv_u_blk_var1.h index dff3bd1a6..e05f44e57 100644 --- a/frame/2/trmv/bli_trmv_u_blk_var1.h +++ b/frame/2/trmv/old/bli_trmv_u_blk_var1.h @@ -35,5 +35,6 @@ void bli_trmv_u_blk_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/bli_trmv_u_blk_var2.h b/frame/2/trmv/old/bli_trmv_u_blk_var2.h similarity index 97% rename from frame/2/trmv/bli_trmv_u_blk_var2.h rename to frame/2/trmv/old/bli_trmv_u_blk_var2.h index d1b7e6f6a..25575a086 100644 --- a/frame/2/trmv/bli_trmv_u_blk_var2.h +++ b/frame/2/trmv/old/bli_trmv_u_blk_var2.h @@ -35,5 +35,6 @@ void bli_trmv_u_blk_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); diff --git a/frame/2/trmv/old/bli_trmv_unb_var1.c b/frame/2/trmv/old/bli_trmv_unb_var1.c new file mode 100644 index 000000000..ae33a42e5 --- /dev/null +++ b/frame/2/trmv/old/bli_trmv_unb_var1.c @@ -0,0 +1,226 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trmv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unb_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unb_var1); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unb_var1); +#endif +#endif + + +void bli_trmv_unb_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trmv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a12t; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_ax alpha_alpha11_conj; \ + ctype_ax rho; \ + dim_t iter, i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = iter; \ + n_ahead = m - iter - 1; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a12t = a_cast + (i )*rs_at + (i+1)*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ +\ + /* chi1 = alpha * alpha11 * chi1; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ +\ + /* chi1 = chi1 + alpha * a12t * x2; */ \ + PASTEMAC3(cha,chx,chax,dotv)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + a12t, cs_at, \ + x2, incx, \ + &rho ); \ + PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho, *chi1 ); \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = m - iter - 1; \ + n_ahead = i; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* chi1 = alpha * alpha11 * chi1; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ +\ + /* chi1 = chi1 + alpha * a10t * x0; */ \ + PASTEMAC3(cha,chx,chax,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + a10t, cs_at, \ + x0, incx, \ + &rho ); \ + PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho, *chi1 ); \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trmv_unb_var1, DOTV_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2U_MIX_D( trmv_unb_var1, DOTV_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2U_MIX_P( trmv_unb_var1, DOTV_KERNEL ) +#endif + diff --git a/frame/2/trmv/bli_trmv_unb_var1.h b/frame/2/trmv/old/bli_trmv_unb_var1.h similarity index 98% rename from frame/2/trmv/bli_trmv_unb_var1.h rename to frame/2/trmv/old/bli_trmv_unb_var1.h index 21e0a925f..eb51b2163 100644 --- a/frame/2/trmv/bli_trmv_unb_var1.h +++ b/frame/2/trmv/old/bli_trmv_unb_var1.h @@ -36,6 +36,7 @@ void bli_trmv_unb_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); #undef GENTPROT2 diff --git a/frame/2/trmv/old/bli_trmv_unb_var2.c b/frame/2/trmv/old/bli_trmv_unb_var2.c new file mode 100644 index 000000000..014944358 --- /dev/null +++ b/frame/2/trmv/old/bli_trmv_unb_var2.c @@ -0,0 +1,224 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trmv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unb_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unb_var2); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unb_var2); +#endif +#endif + + +void bli_trmv_unb_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trmv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_a* a01; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_ax alpha_alpha11_conj; \ + ctype_ax alpha_chi1; \ + dim_t iter, i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = iter; \ + n_behind = i; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a01 = a_cast + (0 )*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* x0 = x0 + alpha * chi1 * a01; */ \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi1, alpha_chi1 ); \ + PASTEMAC3(chax,cha,chx,axpyv)( conja, \ + n_behind, \ + &alpha_chi1, \ + a01, rs_at, \ + x0, incx ); \ +\ + /* chi1 = alpha * alpha11 * chi1; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = m - iter - 1; \ + n_behind = iter; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ +\ + /* x2 = x2 + alpha * chi1 * a21; */ \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi1, alpha_chi1 ); \ + PASTEMAC3(chax,cha,chx,kername)( conja, \ + n_behind, \ + &alpha_chi1, \ + a21, rs_at, \ + x2, incx ); \ +\ + /* chi1 = alpha * alpha11 * chi1; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi1 ); \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trmv_unb_var2, AXPYV_KERNEL )
+
+#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
+INSERT_GENTFUNC2U_MIX_D( trmv_unb_var2, AXPYV_KERNEL )
+#endif
+
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
+INSERT_GENTFUNC2U_MIX_P( trmv_unb_var2, AXPYV_KERNEL )
+#endif
+
diff --git a/frame/2/trmv/bli_trmv_unb_var2.h b/frame/2/trmv/old/bli_trmv_unb_var2.h
similarity index 98%
rename from frame/2/trmv/bli_trmv_unb_var2.h
rename to frame/2/trmv/old/bli_trmv_unb_var2.h
index 905a6c4ea..76500190f 100644
--- a/frame/2/trmv/bli_trmv_unb_var2.h
+++ b/frame/2/trmv/old/bli_trmv_unb_var2.h
@@ -36,6 +36,7 @@
 void bli_trmv_unb_var2( obj_t* alpha,
                         obj_t* a,
                         obj_t* x,
+                        cntx_t* cntx,
                         trmv_t* cntl );
 
 #undef GENTPROT2
diff --git a/frame/2/trmv/old/bli_trmv_unf_var1.c b/frame/2/trmv/old/bli_trmv_unf_var1.c
new file mode 100644
index 000000000..ae69a2888
--- /dev/null
+++ b/frame/2/trmv/old/bli_trmv_unf_var1.c
@@ -0,0 +1,293 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#include "blis.h"
+
+#define FUNCPTR_T trmv_fp
+
+typedef void (*FUNCPTR_T)(
+                           uplo_t  uplo,
+                           trans_t trans,
+                           diag_t  diag,
+                           dim_t   m,
+                           void*   alpha,
+                           void*   a, inc_t rs_a, inc_t cs_a,
+                           void*   x, inc_t incx
+                         );
+
+// If some mixed datatype functions will not be compiled, we initialize
+// the corresponding elements of the function array to NULL.
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unf_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unf_var1); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unf_var1); +#endif +#endif + + +void bli_trmv_unf_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trmv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_x* one = PASTEMAC(chx,1); \ + ctype_a* A10; \ + ctype_a* A11; \ + ctype_a* A12; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a12t; \ + ctype_x* x0; \ + ctype_x* x1; \ + ctype_x* x2; \ + ctype_x* x01; \ + ctype_x* chi11; \ + ctype_x* x21; \ + ctype_ax alpha_alpha11_conj; \ + ctype_ax rho1; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_ahead, f_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* Query the fusing factor for the dotxf implementation. */ \ + b_fuse = PASTEMAC(chax,dotxf_fusefac); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ + i = iter; \ + n_ahead = m - iter - f; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A12 = a_cast + (i )*rs_at + (i+f)*cs_at; \ + x1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+f)*incx; \ +\ + /* x1 = alpha * A11 * x1; */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = k; \ + f_ahead = f - l - 1; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a12t = A11 + (l )*rs_at + (l+1)*cs_at; \ + chi11 = x1 + (l )*incx; \ + x21 = x1 + (l+1)*incx; \ +\ + /* chi11 = alpha * alpha11 * chi11; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ +\ + /* chi11 = chi11 + alpha * a12t * x21; */ \ + PASTEMAC(chax,set0s)( rho1 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(cha,chx,chax,dotjs)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(cha,chx,chax,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + } \ + PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho1, *chi11 ); \ + } \ +\ + /* x1 = x1 + alpha * A12 * x2; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + alpha_cast, \ + A12, cs_at, rs_at, \ + x2, incx, \ + one, \ + x1, incx ); \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ + i = m - iter - f; \ + n_ahead = i; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* x1 = alpha * A11 * x1; */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = f - k - 1; \ + f_ahead = l; \ 
+ alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a10t = A11 + (l )*rs_at + (0 )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x01 = x1 + (0 )*incx; \ +\ + /* chi11 = alpha * alpha11 * chi11; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ +\ + /* chi11 = chi11 + alpha * a10t * x01; */ \ + PASTEMAC(chax,set0s)( rho1 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(cha,chx,chax,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(cha,chx,chax,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + } \ + PASTEMAC3(chax,chax,chx,axpys)( *alpha_cast, rho1, *chi11 ); \ + } \ +\ + /* x1 = x1 + alpha * A10 * x0; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + alpha_cast, \ + A10, cs_at, rs_at, \ + x0, incx, \ + one, \ + x1, incx ); \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trmv_unf_var1, DOTXF_KERNEL )
+
+#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
+INSERT_GENTFUNC2U_MIX_D( trmv_unf_var1, DOTXF_KERNEL )
+#endif
+
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
+INSERT_GENTFUNC2U_MIX_P( trmv_unf_var1, DOTXF_KERNEL )
+#endif
+
diff --git a/frame/2/trmv/bli_trmv_unf_var1.h b/frame/2/trmv/old/bli_trmv_unf_var1.h
similarity index 98%
rename from frame/2/trmv/bli_trmv_unf_var1.h
rename to frame/2/trmv/old/bli_trmv_unf_var1.h
index c7187d3b8..02417c677 100644
--- a/frame/2/trmv/bli_trmv_unf_var1.h
+++ b/frame/2/trmv/old/bli_trmv_unf_var1.h
@@ -36,6 +36,7 @@
 void bli_trmv_unf_var1( obj_t* alpha,
                         obj_t* a,
                         obj_t* x,
+                        cntx_t* cntx,
                         trmv_t* cntl );
 
 #undef GENTPROT2
diff --git a/frame/2/trmv/old/bli_trmv_unf_var2.c b/frame/2/trmv/old/bli_trmv_unf_var2.c
new file mode 100644
index 000000000..2909d4bc0
--- /dev/null
+++ b/frame/2/trmv/old/bli_trmv_unf_var2.c
@@ -0,0 +1,288 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#include "blis.h"
+
+#define FUNCPTR_T trmv_fp
+
+typedef void (*FUNCPTR_T)(
+                           uplo_t  uplo,
+                           trans_t trans,
+                           diag_t  diag,
+                           dim_t   m,
+                           void*   alpha,
+                           void*   a, inc_t rs_a, inc_t cs_a,
+                           void*   x, inc_t incx
+                         );
+
+// If some mixed datatype functions will not be compiled, we initialize
+// the corresponding elements of the function array to NULL.
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trmv_unf_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trmv_unf_var2); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trmv_unf_var2); +#endif +#endif + + +void bli_trmv_unf_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trmv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_a* A01; \ + ctype_a* A11; \ + ctype_a* A21; \ + ctype_a* a01; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* x1; \ + ctype_x* x2; \ + ctype_x* x01; \ + ctype_x* chi11; \ + ctype_x* x21; \ + ctype_ax alpha_alpha11_conj; \ + ctype_ax alpha_chi11; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_behind, f_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* Query the fusing factor for the axpyf implementation. */ \ + b_fuse = PASTEMAC(chax,axpyf_fusefac); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ + i = iter; \ + n_behind = i; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A01 = a_cast + (0 )*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* x0 = x0 + alpha * A01 * x1; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + alpha_cast, \ + A01, rs_at, cs_at, \ + x1, incx, \ + x0, incx ); \ +\ + /* x1 = alpha * A11 * x1; */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = k; \ + f_behind = l; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a01 = A11 + (0 )*rs_at + (l )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x01 = x1 + (0 )*incx; \ +\ + /* x01 = x01 + alpha * chi11 * a01; */ \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi11, alpha_chi11 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chx,axpyjs)( alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chx,axpys)( alpha_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + } \ +\ + /* chi11 = alpha * alpha11 * chi11; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + } \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ + i = m - iter - f; \ + n_behind = iter; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+f)*incx; \ +\ + /* x2 = x2 + alpha * A21 * x1; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + alpha_cast, \ 
+ A21, rs_at, cs_at, \ + x1, incx, \ + x2, incx ); \ +\ + /* x1 = alpha * A11 * x1; */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = f - k - 1; \ + f_behind = k; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a21 = A11 + (l+1)*rs_at + (l )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x21 = x1 + (l+1)*incx; \ +\ + /* x21 = x21 + alpha * chi11 * a21; */ \ + PASTEMAC3(chax,chx,chax,scal2s)( *alpha_cast, *chi11, alpha_chi11 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chx,axpyjs)( alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(chax,cha,chx,axpys)( alpha_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + } \ +\ + /* chi11 = alpha * alpha11 * chi11; */ \ + PASTEMAC2(chax,chax,copys)( *alpha_cast, alpha_alpha11_conj ); \ + if ( bli_is_nonunit_diag( diag ) ) \ + PASTEMAC2(cha,chax,scalcjs)( conja, *alpha11, alpha_alpha11_conj ); \ + PASTEMAC2(chax,chx,scals)( alpha_alpha11_conj, *chi11 ); \ + } \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trmv_unf_var2, AXPYF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2U_MIX_D( trmv_unf_var2, AXPYF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2U_MIX_P( trmv_unf_var2, AXPYF_KERNEL ) +#endif + diff --git a/frame/2/trmv/bli_trmv_unf_var2.h b/frame/2/trmv/old/bli_trmv_unf_var2.h similarity index 98% rename from frame/2/trmv/bli_trmv_unf_var2.h rename to frame/2/trmv/old/bli_trmv_unf_var2.h index 1c5c45951..078558a91 100644 --- a/frame/2/trmv/bli_trmv_unf_var2.h +++ b/frame/2/trmv/old/bli_trmv_unf_var2.h @@ -36,6 +36,7 @@ void bli_trmv_unf_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trmv_t* cntl ); #undef GENTPROT2 diff --git a/frame/2/trsv/bli_trsv.h b/frame/2/trsv/bli_trsv.h index 1ad21ec63..7b51ed69a 100644 --- a/frame/2/trsv/bli_trsv.h +++ b/frame/2/trsv/bli_trsv.h @@ -33,68 +33,8 @@ */ #include "bli_trsv_cntl.h" -#include "bli_trsv_check.h" +#include "bli_trsv_front.h" #include "bli_trsv_int.h" -#include "bli_trsv_unb_var1.h" -#include "bli_trsv_unb_var2.h" - -#include "bli_trsv_unf_var1.h" -#include "bli_trsv_unf_var2.h" - -#include "bli_trsv_l_blk_var1.h" -#include "bli_trsv_l_blk_var2.h" -#include "bli_trsv_u_blk_var1.h" -#include "bli_trsv_u_blk_var2.h" - - -void bli_trsv( obj_t* alpha, - obj_t* a, - obj_t* x ); - - -// -// Prototype BLAS-like interfaces with homogeneous-typed operands. -// -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx \ - ); - -INSERT_GENTPROT_BASIC( trsv ) - - -// -// Prototype BLAS-like interfaces with heterogeneous-typed operands. 
-// -#undef GENTPROT2U -#define GENTPROT2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, opname ) \ -\ -void PASTEMAC2(cha,chx,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx \ - ); - -INSERT_GENTPROT2U_BASIC( trsv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2U_MIX_D( trsv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2U_MIX_P( trsv ) -#endif +#include "bli_trsv_var.h" diff --git a/frame/2/trsv/bli_trsv_cntl.c b/frame/2/trsv/bli_trsv_cntl.c index 72da76243..9a3b20b1c 100644 --- a/frame/2/trsv/bli_trsv_cntl.c +++ b/frame/2/trsv/bli_trsv_cntl.c @@ -44,8 +44,6 @@ extern gemv_t* gemv_cntl_rp_bs_axpy; extern gemv_t* gemv_cntl_cp_bs_dot; extern gemv_t* gemv_cntl_cp_bs_axpy; -extern blksz_t* gemv_mc; - trsv_t* trsv_cntl_bs_ke_nrow_tcol; trsv_t* trsv_cntl_bs_ke_ncol_trow; trsv_t* trsv_cntl_ge_nrow_tcol; @@ -60,16 +58,18 @@ void bli_trsv_cntl_init() = bli_trsv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT1, + 0, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL ); + NULL ); trsv_cntl_bs_ke_ncol_trow = bli_trsv_cntl_obj_create( BLIS_UNB_FUSED, BLIS_VARIANT2, + 0, NULL, NULL, NULL, NULL, NULL, NULL, - NULL, NULL ); + NULL ); // Create control trees for generally large problems. Here we choose a // variant that prioritizes keeping a subvector of x in cache. 
@@ -77,7 +77,7 @@ void bli_trsv_cntl_init() = bli_trsv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, // use var1 to maximize x1 usage - gemv_mc, + BLIS_M2, scalv_cntl, // scale x up-front packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) @@ -89,7 +89,7 @@ void bli_trsv_cntl_init() = bli_trsv_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, // use var1 to maximize x1 usage - gemv_mc, + BLIS_M2, scalv_cntl, // scale x up-front packm_cntl, // pack A11 (if needed) packv_cntl, // pack x1 (if needed) @@ -110,7 +110,7 @@ void bli_trsv_cntl_finalize() trsv_t* bli_trsv_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -125,7 +125,7 @@ trsv_t* bli_trsv_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; @@ -140,7 +140,7 @@ trsv_t* bli_trsv_cntl_obj_create( impl_t impl_type, void bli_trsv_cntl_obj_init( trsv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -151,7 +151,7 @@ void bli_trsv_cntl_obj_init( trsv_t* cntl, { cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; + cntl->bszid = bszid; cntl->sub_scalv = sub_scalv; cntl->sub_packm_a11 = sub_packm_a11; cntl->sub_packv_x1 = sub_packv_x1; diff --git a/frame/2/trsv/bli_trsv_cntl.h b/frame/2/trsv/bli_trsv_cntl.h index 45e305376..6c2c43893 100644 --- a/frame/2/trsv/bli_trsv_cntl.h +++ b/frame/2/trsv/bli_trsv_cntl.h @@ -36,7 +36,7 @@ struct trsv_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; + bszid_t bszid; struct scalv_s* sub_scalv; struct packm_s* sub_packm_a11; struct packv_s* sub_packv_x1; @@ -53,7 +53,7 @@ void bli_trsv_cntl_init( void ); void bli_trsv_cntl_finalize( void ); trsv_t* bli_trsv_cntl_obj_create( 
impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, @@ -64,7 +64,7 @@ trsv_t* bli_trsv_cntl_obj_create( impl_t impl_type, void bli_trsv_cntl_obj_init( trsv_t* cntl, impl_t impl_type, varnum_t var_num, - blksz_t* b, + bszid_t bszid, scalv_t* sub_scalv, packm_t* sub_packm_a11, packv_t* sub_packv_x1, diff --git a/frame/2/trsv/bli_trsv.c b/frame/2/trsv/bli_trsv_front.c similarity index 77% rename from frame/2/trsv/bli_trsv.c rename to frame/2/trsv/bli_trsv_front.c index ac393926e..4b9d5558b 100644 --- a/frame/2/trsv/bli_trsv.c +++ b/frame/2/trsv/bli_trsv_front.c @@ -39,9 +39,13 @@ extern trsv_t* trsv_cntl_bs_ke_ncol_trow; extern trsv_t* trsv_cntl_ge_nrow_tcol; extern trsv_t* trsv_cntl_ge_ncol_trow; -void bli_trsv( obj_t* alpha, - obj_t* a, - obj_t* x ) +void bli_trsv_front + ( + obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx + ) { trsv_t* trsv_cntl; num_t dt_targ_a; @@ -113,12 +117,12 @@ void bli_trsv( obj_t* alpha, } } - // Invoke the internal back-end with the copy-cast of alpha and the // chosen control tree. bli_trsv_int( &alpha_local, a, x, + cntx, trsv_cntl ); } @@ -127,17 +131,19 @@ void bli_trsv( obj_t* alpha, // Define BLAS-like interfaces with homogeneous-typed operands. 
// #undef GENTFUNC -#define GENTFUNC( ctype, ch, opname, varname ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* x, inc_t incx \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -160,38 +166,9 @@ void PASTEMAC(ch,opname)( \ \ PASTEMAC0(opname)( &alphao, \ &ao, \ - &xo ); \ + &xo, \ + cntx ); \ } -INSERT_GENTFUNC_BASIC( trsv, trsv ) - - -// -// Define BLAS-like interfaces with heterogeneous-typed operands. -// -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, opname, varname ) \ -\ -void PASTEMAC2(cha,chx,opname)( \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - ctype_ax* alpha, \ - ctype_a* a, inc_t rs_a, inc_t cs_a, \ - ctype_x* x, inc_t incx \ - ) \ -{ \ - bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ -} - -INSERT_GENTFUNC2U_BASIC( trsv, trsv ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trsv, trsv ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trsv, trsv ) -#endif +INSERT_GENTFUNC_BASIC0( trsv_front ) diff --git a/frame/2/trsv/bli_trsv_front.h b/frame/2/trsv/bli_trsv_front.h new file mode 100644 index 000000000..dddc55898 --- /dev/null +++ b/frame/2/trsv/bli_trsv_front.h @@ -0,0 +1,58 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+void bli_trsv_front
+     (
+       obj_t*  alpha,
+       obj_t*  a,
+       obj_t*  x,
+       cntx_t* cntx
+     );
+
+#undef  GENTPROT
+#define GENTPROT( ctype, ch, opname ) \
+\
+void PASTEMAC(ch,opname) \
+     ( \
+       uplo_t  uploa, \
+       trans_t transa, \
+       diag_t  diaga, \
+       dim_t   m, \
+       ctype*  alpha, \
+       ctype*  a, inc_t rs_a, inc_t cs_a, \
+       ctype*  x, inc_t incx, \
+       cntx_t* cntx  \
+     );
+
+INSERT_GENTPROT_BASIC( trsv_front )
diff --git a/frame/2/trsv/bli_trsv_int.c b/frame/2/trsv/bli_trsv_int.c
index 0f1454fc8..f715b50d7 100644
--- a/frame/2/trsv/bli_trsv_int.c
+++ b/frame/2/trsv/bli_trsv_int.c
@@ -39,6 +39,7 @@
 typedef void (*FUNCPTR_T)( obj_t* alpha,
                            obj_t* a,
                            obj_t* x,
+                           cntx_t* cntx,
                            trsv_t* cntl );
 
 static FUNCPTR_T vars[2][3][3] =
@@ -62,6 +63,7 @@ static FUNCPTR_T vars[2][3][3] =
 void bli_trsv_int( obj_t* alpha,
                    obj_t* a,
                    obj_t* x,
+                   cntx_t* cntx,
                    trsv_t* cntl )
 {
 	varnum_t n;
@@ -72,7 +74,7 @@ void bli_trsv_int( obj_t* alpha,
 
 	// Check parameters.
 	if ( bli_error_checking_is_enabled() )
-		bli_trsv_int_check( alpha, a, x, cntl );
+		bli_trsv_check( alpha, a, x );
 
 	// If A or x has a zero dimension, return early.
if ( bli_obj_has_zero_dim( *a ) ) return; @@ -123,6 +125,7 @@ void bli_trsv_int( obj_t* alpha, f( alpha, &a_local, x, + cntx, cntl ); } diff --git a/frame/2/trsv/bli_trsv_int.h b/frame/2/trsv/bli_trsv_int.h index 267fddce2..14f695e22 100644 --- a/frame/2/trsv/bli_trsv_int.h +++ b/frame/2/trsv/bli_trsv_int.h @@ -35,5 +35,6 @@ void bli_trsv_int( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trsv_t* cntl ); diff --git a/frame/2/trsv/bli_trsv_l_blk_var1.c b/frame/2/trsv/bli_trsv_l_blk_var1.c index 557cab5d8..7c4551d7f 100644 --- a/frame/2/trsv/bli_trsv_l_blk_var1.c +++ b/frame/2/trsv/bli_trsv_l_blk_var1.c @@ -37,6 +37,7 @@ void bli_trsv_l_blk_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trsv_t* cntl ) { obj_t a11, a11_pack; @@ -58,14 +59,14 @@ void bli_trsv_l_blk_var1( obj_t* alpha, // x = alpha * x; bli_scalv_int( alpha, x, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A10, x1, and x0. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -79,16 +80,16 @@ void bli_trsv_l_blk_var1( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). 
bli_packm_int( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ), + cntx, cntl_sub_packm_a11( cntl ), &BLIS_PACKM_SINGLE_THREADED ); bli_packv_int( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // x1 = x1 - A10 * x0; bli_gemv_int( BLIS_NO_TRANSPOSE, @@ -98,17 +99,19 @@ void bli_trsv_l_blk_var1( obj_t* alpha, &x0, &BLIS_ONE, &x1_pack, + cntx, cntl_sub_gemv_rp( cntl ) ); // x1 = x1 / tril( A11 ); bli_trsv_int( &BLIS_ONE, &a11_pack, &x1_pack, + cntx, cntl_sub_trsv( cntl ) ); // Copy/unpack x1 (if x1 was packed). bli_unpackv_int( &x1_pack, &x1, - cntl_sub_unpackv_x1( cntl ) ); + cntx, cntl_sub_unpackv_x1( cntl ) ); } // If any packing buffers were acquired within packm, release them back diff --git a/frame/2/trsv/bli_trsv_l_blk_var2.c b/frame/2/trsv/bli_trsv_l_blk_var2.c index 3ee427e0b..835801e00 100644 --- a/frame/2/trsv/bli_trsv_l_blk_var2.c +++ b/frame/2/trsv/bli_trsv_l_blk_var2.c @@ -37,6 +37,7 @@ void bli_trsv_l_blk_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trsv_t* cntl ) { obj_t a11, a11_pack; @@ -58,14 +59,14 @@ void bli_trsv_l_blk_var2( obj_t* alpha, // x = alpha * x; bli_scalv_int( alpha, x, - cntl_sub_scalv( cntl ) ); + cntx, cntl_sub_scalv( cntl ) ); // Partition diagonally. for ( ij = 0; ij < mn; ij += b_alg ) { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( ij, mn, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A11, A21, x1, and x2. bli_acquire_mpart_tl2br( BLIS_SUBPART11, @@ -79,21 +80,22 @@ void bli_trsv_l_blk_var2( obj_t* alpha, // Initialize objects for packing A11 and x1 (if needed). bli_packm_init( &a11, &a11_pack, - cntl_sub_packm_a11( cntl ) ); + cntx, cntl_sub_packm_a11( cntl ) ); bli_packv_init( &x1, &x1_pack, - cntl_sub_packv_x1( cntl ) ); + cntx, cntl_sub_packv_x1( cntl ) ); // Copy/pack A11, x1 (if needed). 
         bli_packm_int( &a11, &a11_pack,
-                       cntl_sub_packm_a11( cntl ),
+                       cntx, cntl_sub_packm_a11( cntl ),
                        &BLIS_PACKM_SINGLE_THREADED );
         bli_packv_int( &x1, &x1_pack,
-                       cntl_sub_packv_x1( cntl ) );
+                       cntx, cntl_sub_packv_x1( cntl ) );

         // x1 = x1 / tril( A11 );
         bli_trsv_int( &BLIS_ONE,
                       &a11_pack,
                       &x1_pack,
+                      cntx,
                       cntl_sub_trsv( cntl ) );

         // x2 = x2 - A21 * x1;
@@ -104,11 +106,12 @@ void bli_trsv_l_blk_var2( obj_t* alpha,
                       &x1_pack,
                       &BLIS_ONE,
                       &x2,
+                      cntx,
                       cntl_sub_gemv_cp( cntl ) );

         // Copy/unpack x1 (if x1 was packed).
         bli_unpackv_int( &x1_pack, &x1,
-                         cntl_sub_unpackv_x1( cntl ) );
+                         cntx, cntl_sub_unpackv_x1( cntl ) );
     }

     // If any packing buffers were acquired within packm, release them back
diff --git a/frame/2/trsv/bli_trsv_u_blk_var1.c b/frame/2/trsv/bli_trsv_u_blk_var1.c
index b8b5a3f3b..62ae66823 100644
--- a/frame/2/trsv/bli_trsv_u_blk_var1.c
+++ b/frame/2/trsv/bli_trsv_u_blk_var1.c
@@ -37,6 +37,7 @@
 void bli_trsv_u_blk_var1( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl )
 {
     obj_t a11, a11_pack;
@@ -58,14 +59,14 @@ void bli_trsv_u_blk_var1( obj_t* alpha,
     // x = alpha * x;
     bli_scalv_int( alpha,
                    x,
-                   cntl_sub_scalv( cntl ) );
+                   cntx, cntl_sub_scalv( cntl ) );

     // Partition diagonally.
     for ( ij = 0; ij < mn; ij += b_alg )
     {
         // Determine the current algorithmic blocksize.
         b_alg = bli_determine_blocksize_b( ij, mn, a,
-                                           cntl_blocksize( cntl ) );
+                                           cntl_bszid( cntl ), cntx );

         // Acquire partitions for A11, A12, x1, and x2.
         bli_acquire_mpart_br2tl( BLIS_SUBPART11,
@@ -79,16 +80,16 @@ void bli_trsv_u_blk_var1( obj_t* alpha,

         // Initialize objects for packing A11 and x1 (if needed).
         bli_packm_init( &a11, &a11_pack,
-                        cntl_sub_packm_a11( cntl ) );
+                        cntx, cntl_sub_packm_a11( cntl ) );
         bli_packv_init( &x1, &x1_pack,
-                        cntl_sub_packv_x1( cntl ) );
+                        cntx, cntl_sub_packv_x1( cntl ) );

         // Copy/pack A11, x1 (if needed).
         bli_packm_int( &a11, &a11_pack,
-                       cntl_sub_packm_a11( cntl ),
+                       cntx, cntl_sub_packm_a11( cntl ),
                        &BLIS_PACKM_SINGLE_THREADED );
         bli_packv_int( &x1, &x1_pack,
-                       cntl_sub_packv_x1( cntl ) );
+                       cntx, cntl_sub_packv_x1( cntl ) );

         // x1 = x1 - A12 * x2;
         bli_gemv_int( BLIS_NO_TRANSPOSE,
@@ -98,17 +99,19 @@ void bli_trsv_u_blk_var1( obj_t* alpha,
                       &x2,
                       &BLIS_ONE,
                       &x1_pack,
+                      cntx,
                       cntl_sub_gemv_rp( cntl ) );

         // x1 = x1 / tril( A11 );
         bli_trsv_int( &BLIS_ONE,
                       &a11_pack,
                       &x1_pack,
+                      cntx,
                       cntl_sub_trsv( cntl ) );

         // Copy/unpack x1 (if x1 was packed).
         bli_unpackv_int( &x1_pack, &x1,
-                         cntl_sub_unpackv_x1( cntl ) );
+                         cntx, cntl_sub_unpackv_x1( cntl ) );
     }

     // If any packing buffers were acquired within packm, release them back
diff --git a/frame/2/trsv/bli_trsv_u_blk_var2.c b/frame/2/trsv/bli_trsv_u_blk_var2.c
index 1020bf5bf..ef872f706 100644
--- a/frame/2/trsv/bli_trsv_u_blk_var2.c
+++ b/frame/2/trsv/bli_trsv_u_blk_var2.c
@@ -37,6 +37,7 @@
 void bli_trsv_u_blk_var2( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl )
 {
     obj_t a11, a11_pack;
@@ -58,14 +59,14 @@ void bli_trsv_u_blk_var2( obj_t* alpha,
     // x = alpha * x;
     bli_scalv_int( alpha,
                    x,
-                   cntl_sub_scalv( cntl ) );
+                   cntx, cntl_sub_scalv( cntl ) );

     // Partition diagonally.
     for ( ij = 0; ij < mn; ij += b_alg )
     {
         // Determine the current algorithmic blocksize.
         b_alg = bli_determine_blocksize_b( ij, mn, a,
-                                           cntl_blocksize( cntl ) );
+                                           cntl_bszid( cntl ), cntx );

         // Acquire partitions for A11, A01, x1, and x0.
         bli_acquire_mpart_br2tl( BLIS_SUBPART11,
@@ -79,21 +80,22 @@ void bli_trsv_u_blk_var2( obj_t* alpha,

         // Initialize objects for packing A11 and x1 (if needed).
         bli_packm_init( &a11, &a11_pack,
-                        cntl_sub_packm_a11( cntl ) );
+                        cntx, cntl_sub_packm_a11( cntl ) );
         bli_packv_init( &x1, &x1_pack,
-                        cntl_sub_packv_x1( cntl ) );
+                        cntx, cntl_sub_packv_x1( cntl ) );

         // Copy/pack A11, x1 (if needed).
         bli_packm_int( &a11, &a11_pack,
-                       cntl_sub_packm_a11( cntl ),
+                       cntx, cntl_sub_packm_a11( cntl ),
                        &BLIS_PACKM_SINGLE_THREADED );
         bli_packv_int( &x1, &x1_pack,
-                       cntl_sub_packv_x1( cntl ) );
+                       cntx, cntl_sub_packv_x1( cntl ) );

         // x1 = x1 / tril( A11 );
         bli_trsv_int( &BLIS_ONE,
                       &a11_pack,
                       &x1_pack,
+                      cntx,
                       cntl_sub_trsv( cntl ) );

         // x0 = x0 - A01 * x1;
@@ -104,11 +106,12 @@ void bli_trsv_u_blk_var2( obj_t* alpha,
                       &x1_pack,
                       &BLIS_ONE,
                       &x0,
+                      cntx,
                       cntl_sub_gemv_cp( cntl ) );

         // Copy/unpack x1 (if x1 was packed).
         bli_unpackv_int( &x1_pack, &x1,
-                         cntl_sub_unpackv_x1( cntl ) );
+                         cntx, cntl_sub_unpackv_x1( cntl ) );
     }

     // If any packing buffers were acquired within packm, release them back
diff --git a/frame/2/trsv/bli_trsv_unb_var1.c b/frame/2/trsv/bli_trsv_unb_var1.c
index 67fbce519..4ff99bcb4 100644
--- a/frame/2/trsv/bli_trsv_unb_var1.c
+++ b/frame/2/trsv/bli_trsv_unb_var1.c
@@ -34,198 +34,133 @@

 #include "blis.h"

-#define FUNCPTR_T trsv_fp
-
-typedef void (*FUNCPTR_T)(
-                           uplo_t uplo,
-                           trans_t trans,
-                           diag_t diag,
-                           dim_t m,
-                           void* alpha,
-                           void* a, inc_t rs_a, inc_t cs_a,
-                           void* x, inc_t incx
-                         );
-
-// If some mixed datatype functions will not be compiled, we initialize
-// the corresponding elements of the function array to NULL.
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
-static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unb_var1);
-#else
-#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
-static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unb_var1);
-#else
-static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unb_var1);
-#endif
-#endif
-
-
-void bli_trsv_unb_var1( obj_t* alpha,
-                        obj_t* a,
-                        obj_t* x,
-                        trsv_t* cntl )
-{
-    num_t dt_a = bli_obj_datatype( *a );
-    num_t dt_x = bli_obj_datatype( *x );
-
-    uplo_t uplo = bli_obj_uplo( *a );
-    trans_t trans = bli_obj_conjtrans_status( *a );
-    diag_t diag = bli_obj_diag( *a );
-
-    dim_t m = bli_obj_length( *a );
-
-    void* buf_a = bli_obj_buffer_at_off( *a );
-    inc_t rs_a = bli_obj_row_stride( *a );
-    inc_t cs_a = bli_obj_col_stride( *a );
-
-    void* buf_x = bli_obj_buffer_at_off( *x );
-    inc_t incx = bli_obj_vector_inc( *x );
-
-    num_t dt_alpha;
-    void* buf_alpha;
-
-    FUNCPTR_T f;
-
-    // The datatype of alpha MUST be the type union of a and x. This is to
-    // prevent any unnecessary loss of information during computation.
-    dt_alpha = bli_datatype_union( dt_a, dt_x );
-    buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha );
-
-    // Index into the type combination array to extract the correct
-    // function pointer.
-    f = ftypes[dt_a][dt_x];
-
-    // Invoke the function.
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a12t; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_ax alpha11_conj; \ - ctype_ax rho; \ - dim_t iter, i; \ - dim_t n_behind; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a12t; \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype alpha11_conj; \ + ctype rho; \ + dim_t iter, i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ /* x = alpha * x; */ \ - PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - alpha_cast, \ - x, incx ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + alpha, \ + x, incx, \ + cntx \ 
+ ); \ +\ + PASTECH(ch,dotv_ft) kfp_tv; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_tv = bli_cntx_get_l1v_ker_dt( dt, BLIS_DOTV_KER, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = m - iter - 1; \ n_behind = iter; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a12t = a_cast + (i )*rs_at + (i+1)*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a12t = a + (i )*rs_at + (i+1)*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ \ /* chi1 = chi1 - a12t * x2; */ \ - PASTEMAC3(cha,chx,chax,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - a12t, cs_at, \ - x2, incx, \ - &rho ); \ - PASTEMAC2(chax,chx,subs)( rho, *chi1 ); \ + kfp_tv \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + a12t, cs_at, \ + x2, incx, \ + &rho, \ + cntx \ + ); \ + PASTEMAC(ch,subs)( rho, *chi1 ); \ \ /* chi1 = chi1 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi1 ); \ } \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = iter; \ n_behind = i; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a10t = a + (i )*rs_at + (0 )*cs_at; \ + chi1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* chi1 = chi1 - a10t * x0; */ \ - PASTEMAC3(cha,chx,chax,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - a10t, 
cs_at, \ - x0, incx, \ - &rho ); \ - PASTEMAC2(chax,chx,subs)( rho, *chi1 ); \ + kfp_tv \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + a10t, cs_at, \ + x0, incx, \ + &rho, \ + cntx \ + ); \ + PASTEMAC(ch,subs)( rho, *chi1 ); \ \ /* chi1 = chi1 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi1 ); \ } \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trsv_unb_var1, DOTV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trsv_unb_var1, DOTV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trsv_unb_var1, DOTV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trsv_unb_var1 ) diff --git a/frame/2/trsv/bli_trsv_unb_var2.c b/frame/2/trsv/bli_trsv_unb_var2.c index 1e3558eec..1f5b341c8 100644 --- a/frame/2/trsv/bli_trsv_unb_var2.c +++ b/frame/2/trsv/bli_trsv_unb_var2.c @@ -34,196 +34,131 @@ #include "blis.h" -#define FUNCPTR_T trsv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
-static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unb_var2);
-#else
-#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
-static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unb_var2);
-#else
-static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unb_var2);
-#endif
-#endif
-
-
-void bli_trsv_unb_var2( obj_t* alpha,
-                        obj_t* a,
-                        obj_t* x,
-                        trsv_t* cntl )
-{
-    num_t dt_a = bli_obj_datatype( *a );
-    num_t dt_x = bli_obj_datatype( *x );
-
-    uplo_t uplo = bli_obj_uplo( *a );
-    trans_t trans = bli_obj_conjtrans_status( *a );
-    diag_t diag = bli_obj_diag( *a );
-
-    dim_t m = bli_obj_length( *a );
-
-    void* buf_a = bli_obj_buffer_at_off( *a );
-    inc_t rs_a = bli_obj_row_stride( *a );
-    inc_t cs_a = bli_obj_col_stride( *a );
-
-    void* buf_x = bli_obj_buffer_at_off( *x );
-    inc_t incx = bli_obj_vector_inc( *x );
-
-    num_t dt_alpha;
-    void* buf_alpha;
-
-    FUNCPTR_T f;
-
-    // The datatype of alpha MUST be the type union of a and x. This is to
-    // prevent any unnecessary loss of information during computation.
-    dt_alpha = bli_datatype_union( dt_a, dt_x );
-    buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha );
-
-    // Index into the type combination array to extract the correct
-    // function pointer.
-    f = ftypes[dt_a][dt_x];
-
-    // Invoke the function.
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_a* a01; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_ax alpha11_conj; \ - ctype_ax minus_chi1; \ - dim_t iter, i; \ - dim_t n_ahead; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* a01; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype alpha11_conj; \ + ctype minus_chi1; \ + dim_t iter, i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ /* x = alpha * x; */ \ - PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - alpha_cast, \ - x, incx ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + alpha, \ + x, incx, \ + 
cntx \ + ); \ +\ + PASTECH(ch,axpyv_ft) kfp_av; \ +\ + /* Query the context for the kernel function pointer. */ \ + kfp_av = bli_cntx_get_l1v_ker_dt( dt, BLIS_AXPYV_KER, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = m - iter - 1; \ n_ahead = i; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a01 = a_cast + (0 )*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a01 = a + (0 )*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* chi1 = chi1 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi1 ); \ } \ \ /* x0 = x0 - chi1 * a01; */ \ - PASTEMAC2(chx,chax,neg2s)( *chi1, minus_chi1 ); \ - PASTEMAC3(chax,cha,chx,kername)( conja, \ - n_ahead, \ - &minus_chi1, \ - a01, rs_at, \ - x0, incx ); \ + PASTEMAC(ch,neg2s)( *chi1, minus_chi1 ); \ + kfp_av \ + ( \ + conja, \ + n_ahead, \ + &minus_chi1, \ + a01, rs_at, \ + x0, incx, \ + cntx \ + ); \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; ++iter ) \ { \ i = iter; \ n_ahead = m - iter - 1; \ - alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ - a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ - chi1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+1)*incx; \ + alpha11 = a + (i )*rs_at + (i )*cs_at; \ + a21 = a + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x + (i )*incx; \ + x2 = x + (i+1)*incx; \ \ /* chi1 = chi1 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - 
PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi1 ); \ } \ \ /* x2 = x2 - chi1 * a21; */ \ - PASTEMAC2(chx,chax,neg2s)( *chi1, minus_chi1 ); \ - PASTEMAC3(chax,cha,chx,kername)( conja, \ - n_ahead, \ - &minus_chi1, \ - a21, rs_at, \ - x2, incx ); \ + PASTEMAC(ch,neg2s)( *chi1, minus_chi1 ); \ + kfp_av \ + ( \ + conja, \ + n_ahead, \ + &minus_chi1, \ + a21, rs_at, \ + x2, incx, \ + cntx \ + ); \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trsv_unb_var2, AXPYV_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trsv_unb_var2, AXPYV_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trsv_unb_var2, AXPYV_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trsv_unb_var2 ) diff --git a/frame/2/trsv/bli_trsv_unf_var1.c b/frame/2/trsv/bli_trsv_unf_var1.c index a9560b2af..08130315c 100644 --- a/frame/2/trsv/bli_trsv_unf_var1.c +++ b/frame/2/trsv/bli_trsv_unf_var1.c @@ -34,165 +34,104 @@ #include "blis.h" -#define FUNCPTR_T trsv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
-static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unf_var1);
-#else
-#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
-static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unf_var1);
-#else
-static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unf_var1);
-#endif
-#endif
-
-
-void bli_trsv_unf_var1( obj_t* alpha,
-                        obj_t* a,
-                        obj_t* x,
-                        trsv_t* cntl )
-{
-    num_t dt_a = bli_obj_datatype( *a );
-    num_t dt_x = bli_obj_datatype( *x );
-
-    uplo_t uplo = bli_obj_uplo( *a );
-    trans_t trans = bli_obj_conjtrans_status( *a );
-    diag_t diag = bli_obj_diag( *a );
-
-    dim_t m = bli_obj_length( *a );
-
-    void* buf_a = bli_obj_buffer_at_off( *a );
-    inc_t rs_a = bli_obj_row_stride( *a );
-    inc_t cs_a = bli_obj_col_stride( *a );
-
-    void* buf_x = bli_obj_buffer_at_off( *x );
-    inc_t incx = bli_obj_vector_inc( *x );
-
-    num_t dt_alpha;
-    void* buf_alpha;
-
-    FUNCPTR_T f;
-
-    // The datatype of alpha MUST be the type union of a and x. This is to
-    // prevent any unnecessary loss of information during computation.
-    dt_alpha = bli_datatype_union( dt_a, dt_x );
-    buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha );
-
-    // Index into the type combination array to extract the correct
-    // function pointer.
-    f = ftypes[dt_a][dt_x];
-
-    // Invoke the function.
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_x* one = PASTEMAC(chx,1); \ - ctype_ax* minus_one = PASTEMAC(chax,m1); \ - ctype_a* A10; \ - ctype_a* A11; \ - ctype_a* A12; \ - ctype_a* a10t; \ - ctype_a* alpha11; \ - ctype_a* a12t; \ - ctype_x* x0; \ - ctype_x* x1; \ - ctype_x* x2; \ - ctype_x* x01; \ - ctype_x* chi11; \ - ctype_x* x21; \ - ctype_ax alpha11_conj; \ - ctype_ax rho1; \ - dim_t iter, i, k, j, l; \ - dim_t b_fuse, f; \ - dim_t n_behind, f_behind; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* minus_one = PASTEMAC(ch,m1); \ + ctype* A10; \ + ctype* A11; \ + ctype* A12; \ + ctype* a10t; \ + ctype* alpha11; \ + ctype* a12t; \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* x01; \ + ctype* chi11; \ + ctype* x21; \ + ctype alpha11_conj; \ + ctype rho1; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_behind, f_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + /* x = alpha * x; */ \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + alpha, \ + x, incx, \ + cntx \ + ); \ +\ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = 
cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ - /* Query the fusing factor for the dotxf implementation. */ \ - b_fuse = PASTEMAC(chax,dotxf_fusefac); \ + PASTECH(ch,dotxf_ft) kfp_df; \ \ - /* x = alpha * x; */ \ - PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - alpha_cast, \ - x, incx ); \ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_df = bli_cntx_get_l1f_ker_dt( dt, BLIS_DOTXF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_DF, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ i = m - iter - f; \ n_behind = iter; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A12 = a_cast + (i )*rs_at + (i+f)*cs_at; \ - x1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+f)*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A12 = a + (i )*rs_at + (i+f)*cs_at; \ + x1 = x + (i )*incx; \ + x2 = x + (i+f)*incx; \ \ /* x1 = x1 - A12 * x2; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - f, \ - minus_one, \ - A12, cs_at, rs_at, \ - x2, incx, \ - one, \ - x1, incx ); \ + kfp_df \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + minus_one, \ + A12, cs_at, rs_at, \ + x2, incx, \ + one, \ + x1, incx, \ + cntx \ + ); \ \ /* x1 = x1 / triu( A11 ); */ \ for ( k = 0; k < f; ++k ) \ @@ -205,50 +144,54 @@ void PASTEMAC2(cha,chx,varname)( \ x21 = x1 + (l+1)*incx; \ \ /* chi11 = chi11 - a12t * x21; */ \ - PASTEMAC(chax,set0s)( rho1 ); \ + PASTEMAC(ch,set0s)( rho1 ); \ if 
( bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(cha,chx,chax,dotjs)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + PASTEMAC(ch,dotjs)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(cha,chx,chax,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + PASTEMAC(ch,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ } \ - PASTEMAC2(chax,chx,subs)( rho1, *chi11 ); \ + PASTEMAC(ch,subs)( rho1, *chi11 ); \ \ /* chi11 = chi11 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi11 ); \ } \ } \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ i = iter; \ n_behind = i; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A10 = a + (i )*rs_at + (0 )*cs_at; \ + x1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* x1 = x1 - A10 * x0; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_behind, \ - f, \ - minus_one, \ - A10, cs_at, rs_at, \ - x0, incx, \ - one, \ - x1, incx ); \ + kfp_df \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + minus_one, \ + A10, cs_at, rs_at, \ + x0, incx, \ + one, \ + x1, incx, \ + cntx \ + ); \ \ /* x1 = x1 / tril( A11 ); */ \ for ( k = 0; k < f; ++k ) \ @@ -261,39 +204,29 @@ void PASTEMAC2(cha,chx,varname)( \ x01 = x1 + (0 )*incx; \ \ /* chi11 = chi11 - a10t * x01; */ \ - PASTEMAC(chax,set0s)( rho1 ); \ + PASTEMAC(ch,set0s)( rho1 ); \ if ( bli_is_conj( conja ) ) 
\ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(cha,chx,chax,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + PASTEMAC(ch,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ } \ else \ { \ for ( j = 0; j < f_behind; ++j ) \ - PASTEMAC3(cha,chx,chax,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + PASTEMAC(ch,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ } \ - PASTEMAC2(chax,chx,subs)( rho1, *chi11 ); \ + PASTEMAC(ch,subs)( rho1, *chi11 ); \ \ /* chi11 = chi11 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi11 ); \ } \ } \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trsv_unf_var1, DOTXF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trsv_unf_var1, DOTXF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trsv_unf_var1, DOTXF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trsv_unf_var1 ) diff --git a/frame/2/trsv/bli_trsv_unf_var2.c b/frame/2/trsv/bli_trsv_unf_var2.c index b2eb07745..cf45e47cc 100644 --- a/frame/2/trsv/bli_trsv_unf_var2.c +++ b/frame/2/trsv/bli_trsv_unf_var2.c @@ -34,153 +34,88 @@ #include "blis.h" -#define FUNCPTR_T trsv_fp - -typedef void (*FUNCPTR_T)( - uplo_t uplo, - trans_t trans, - diag_t diag, - dim_t m, - void* alpha, - void* a, inc_t rs_a, inc_t cs_a, - void* x, inc_t incx - ); - -// If some mixed datatype functions will not be compiled, we initialize -// the corresponding elements of the function array to NULL. 
-#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
-static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unf_var2);
-#else
-#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
-static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unf_var2);
-#else
-static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unf_var2);
-#endif
-#endif
-
-
-void bli_trsv_unf_var2( obj_t* alpha,
-                        obj_t* a,
-                        obj_t* x,
-                        trsv_t* cntl )
-{
-    num_t dt_a = bli_obj_datatype( *a );
-    num_t dt_x = bli_obj_datatype( *x );
-
-    uplo_t uplo = bli_obj_uplo( *a );
-    trans_t trans = bli_obj_conjtrans_status( *a );
-    diag_t diag = bli_obj_diag( *a );
-
-    dim_t m = bli_obj_length( *a );
-
-    void* buf_a = bli_obj_buffer_at_off( *a );
-    inc_t rs_a = bli_obj_row_stride( *a );
-    inc_t cs_a = bli_obj_col_stride( *a );
-
-    void* buf_x = bli_obj_buffer_at_off( *x );
-    inc_t incx = bli_obj_vector_inc( *x );
-
-    num_t dt_alpha;
-    void* buf_alpha;
-
-    FUNCPTR_T f;
-
-    // The datatype of alpha MUST be the type union of a and x. This is to
-    // prevent any unnecessary loss of information during computation.
-    dt_alpha = bli_datatype_union( dt_a, dt_x );
-    buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha );
-
-    // Index into the type combination array to extract the correct
-    // function pointer.
-    f = ftypes[dt_a][dt_x];
-
-    // Invoke the function.
- f( uplo, - trans, - diag, - m, - buf_alpha, - buf_a, rs_a, cs_a, - buf_x, incx ); -} - - -#undef GENTFUNC2U -#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC2(cha,chx,varname)( \ - uplo_t uplo, \ - trans_t trans, \ - diag_t diag, \ - dim_t m, \ - void* alpha, \ - void* a, inc_t rs_a, inc_t cs_a, \ - void* x, inc_t incx \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ { \ - ctype_ax* alpha_cast = alpha; \ - ctype_a* a_cast = a; \ - ctype_x* x_cast = x; \ - ctype_ax* minus_one = PASTEMAC(chax,m1); \ - ctype_a* A01; \ - ctype_a* A11; \ - ctype_a* A21; \ - ctype_a* a01; \ - ctype_a* alpha11; \ - ctype_a* a21; \ - ctype_x* x0; \ - ctype_x* x1; \ - ctype_x* x2; \ - ctype_x* x01; \ - ctype_x* chi11; \ - ctype_x* x21; \ - ctype_ax alpha11_conj; \ - ctype_ax minus_chi11; \ - dim_t iter, i, k, j, l; \ - dim_t b_fuse, f; \ - dim_t n_ahead, f_ahead; \ - inc_t rs_at, cs_at; \ - uplo_t uplo_trans; \ - conj_t conja; \ + const num_t dt = PASTEMAC(ch,type); \ \ - if ( bli_zero_dim1( m ) ) return; \ + ctype* minus_one = PASTEMAC(ch,m1); \ + ctype* A01; \ + ctype* A11; \ + ctype* A21; \ + ctype* a01; \ + ctype* alpha11; \ + ctype* a21; \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* x01; \ + ctype* chi11; \ + ctype* x21; \ + ctype alpha11_conj; \ + ctype minus_chi11; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_ahead, f_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uploa_trans; \ + conj_t conja; \ \ - if ( bli_does_notrans( trans ) ) \ + /* x = alpha * x; */ \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + m, \ + alpha, \ + x, incx, \ + cntx \ + ); \ +\ + if ( bli_does_notrans( transa ) ) \ { \ rs_at = rs_a; \ cs_at = cs_a; \ - uplo_trans = uplo; \ + uploa_trans = uploa; \ } \ - 
else /* if ( bli_does_trans( trans ) ) */ \ + else /* if ( bli_does_trans( transa ) ) */ \ { \ rs_at = cs_a; \ cs_at = rs_a; \ - uplo_trans = bli_uplo_toggled( uplo ); \ + uploa_trans = bli_uplo_toggled( uploa ); \ } \ \ - conja = bli_extract_conj( trans ); \ + conja = bli_extract_conj( transa ); \ \ - /* Query the fusing factor for the axpyf implementation. */ \ - b_fuse = PASTEMAC(chax,axpyf_fusefac); \ + PASTECH(ch,axpyf_ft) kfp_af; \ \ - /* x = alpha * x; */ \ - PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ - m, \ - alpha_cast, \ - x, incx ); \ + /* Query the context for the kernel function pointer and fusing factor. */ \ + kfp_af = bli_cntx_get_l1f_ker_dt( dt, BLIS_AXPYF_KER, cntx ); \ + b_fuse = bli_cntx_get_blksz_def_dt( dt, BLIS_AF, cntx ); \ \ /* We reduce all of the possible cases down to just lower/upper. */ \ - if ( bli_is_upper( uplo_trans ) ) \ + if ( bli_is_upper( uploa_trans ) ) \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ i = m - iter - f; \ n_ahead = i; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A01 = a_cast + (0 )*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x0 = x_cast + (0 )*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A01 = a + (0 )*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + x0 = x + (0 )*incx; \ \ /* x1 = x1 / triu( A11 ); */ \ for ( k = 0; k < f; ++k ) \ @@ -193,48 +128,52 @@ void PASTEMAC2(cha,chx,varname)( \ x01 = x1 + (0 )*incx; \ \ /* chi11 = chi11 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi11 ); \ } \ \ /* x01 = x01 - chi11 * a01; */ \ - PASTEMAC2(chx,chax,neg2s)( *chi11, minus_chi11 ); \ + PASTEMAC(ch,neg2s)( *chi11, minus_chi11 ); \ if ( bli_is_conj( conja ) ) \ { \ for ( j 
= 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chx,axpyjs)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + PASTEMAC(ch,axpyjs)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chx,axpys)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + PASTEMAC(ch,axpys)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ } \ } \ \ /* x0 = x0 - A01 * x1; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - f, \ - minus_one, \ - A01, rs_at, cs_at, \ - x1, incx, \ - x0, incx ); \ + kfp_af \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + minus_one, \ + A01, rs_at, cs_at, \ + x1, incx, \ + x0, incx, \ + cntx \ + ); \ } \ } \ - else /* if ( bli_is_lower( uplo_trans ) ) */ \ + else /* if ( bli_is_lower( uploa_trans ) ) */ \ { \ for ( iter = 0; iter < m; iter += f ) \ { \ f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ i = iter; \ n_ahead = m - iter - f; \ - A11 = a_cast + (i )*rs_at + (i )*cs_at; \ - A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ - x1 = x_cast + (i )*incx; \ - x2 = x_cast + (i+f)*incx; \ + A11 = a + (i )*rs_at + (i )*cs_at; \ + A21 = a + (i+f)*rs_at + (i )*cs_at; \ + x1 = x + (i )*incx; \ + x2 = x + (i+f)*incx; \ \ /* x1 = x1 / tril( A11 ); */ \ for ( k = 0; k < f; ++k ) \ @@ -247,48 +186,42 @@ void PASTEMAC2(cha,chx,varname)( \ x21 = x1 + (l+1)*incx; \ \ /* chi11 = chi11 / alpha11; */ \ - if ( bli_is_nonunit_diag( diag ) ) \ + if ( bli_is_nonunit_diag( diaga ) ) \ { \ - PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ - PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + PASTEMAC(ch,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC(ch,invscals)( alpha11_conj, *chi11 ); \ } \ \ /* x21 = x21 - chi11 * a21; */ \ - PASTEMAC2(chx,chax,neg2s)( *chi11, minus_chi11 ); \ + PASTEMAC(ch,neg2s)( *chi11, minus_chi11 ); \ if ( bli_is_conj( conja ) ) \ { \ for ( j = 0; j < f_ahead; ++j ) \ - 
PASTEMAC3(chax,cha,chx,axpyjs)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + PASTEMAC(ch,axpyjs)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ } \ else \ { \ for ( j = 0; j < f_ahead; ++j ) \ - PASTEMAC3(chax,cha,chx,axpys)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + PASTEMAC(ch,axpys)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ } \ } \ \ /* x2 = x2 - A21 * x1; */ \ - PASTEMAC3(cha,chx,chx,kername)( conja, \ - BLIS_NO_CONJUGATE, \ - n_ahead, \ - f, \ - minus_one, \ - A21, rs_at, cs_at, \ - x1, incx, \ - x2, incx ); \ + kfp_af \ + ( \ + conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + minus_one, \ + A21, rs_at, cs_at, \ + x1, incx, \ + x2, incx, \ + cntx \ + ); \ } \ } \ } -// Define the basic set of functions unconditionally, and then also some -// mixed datatype functions if requested. -INSERT_GENTFUNC2U_BASIC( trsv_unf_var2, AXPYF_KERNEL ) - -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTFUNC2U_MIX_D( trsv_unf_var2, AXPYF_KERNEL ) -#endif - -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTFUNC2U_MIX_P( trsv_unf_var2, AXPYF_KERNEL ) -#endif +INSERT_GENTFUNC_BASIC0( trsv_unf_var2 ) diff --git a/frame/2/trsv/bli_trsv_var.h b/frame/2/trsv/bli_trsv_var.h new file mode 100644 index 000000000..bc66f49ff --- /dev/null +++ b/frame/2/trsv/bli_trsv_var.h @@ -0,0 +1,88 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+
+//
+// Prototype object-based interfaces.
+//
+
+#undef GENPROT
+#define GENPROT( opname ) \
+\
+void PASTEMAC0(opname) \
+     ( \
+       obj_t* alpha, \
+       obj_t* a, \
+       obj_t* x, \
+       cntx_t* cntx, \
+       trsv_t* cntl \
+     );
+
+GENPROT( trsv_l_blk_var1 )
+GENPROT( trsv_l_blk_var2 )
+GENPROT( trsv_u_blk_var1 )
+GENPROT( trsv_u_blk_var2 )
+
+GENPROT( trsv_unb_var1 )
+GENPROT( trsv_unb_var2 )
+
+GENPROT( trsv_unf_var1 )
+GENPROT( trsv_unf_var2 )
+
+
+//
+// Prototype BLAS-like interfaces with typed operands.
+//
+
+#undef GENTPROT
+#define GENTPROT( ctype, ch, varname ) \
+\
+void PASTEMAC(ch,varname) \
+     ( \
+       uplo_t uploa, \
+       trans_t transa, \
+       diag_t diaga, \
+       dim_t m, \
+       ctype* alpha, \
+       ctype* a, inc_t rs_a, inc_t cs_a, \
+       ctype* x, inc_t incx, \
+       cntx_t* cntx \
+     );
+
+INSERT_GENTPROT_BASIC( trsv_unb_var1 )
+INSERT_GENTPROT_BASIC( trsv_unb_var2 )
+
+INSERT_GENTPROT_BASIC( trsv_unf_var1 )
+INSERT_GENTPROT_BASIC( trsv_unf_var2 )
+
diff --git a/frame/2/trsv/bli_trsv_var_oapi.c b/frame/2/trsv/bli_trsv_var_oapi.c
new file mode 100644
index 000000000..f38a5123f
--- /dev/null
+++ b/frame/2/trsv/bli_trsv_var_oapi.c
@@ -0,0 +1,87 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+
+   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+#include "blis.h"
+
+#undef GENFRONT
+#define GENFRONT( opname ) \
+\
+void PASTEMAC0(opname) \
+     ( \
+       obj_t* alpha, \
+       obj_t* a, \
+       obj_t* x, \
+       cntx_t* cntx, \
+       trsv_t* cntl \
+     ) \
+{ \
+    num_t dt = bli_obj_datatype( *a ); \
+\
+    uplo_t uploa = bli_obj_uplo( *a ); \
+    trans_t transa = bli_obj_conjtrans_status( *a ); \
+    diag_t diaga = bli_obj_diag( *a ); \
+\
+    dim_t m = bli_obj_length( *a ); \
+\
+    void* buf_a = bli_obj_buffer_at_off( *a ); \
+    inc_t rs_a = bli_obj_row_stride( *a ); \
+    inc_t cs_a = bli_obj_col_stride( *a ); \
+\
+    void* buf_x = bli_obj_buffer_at_off( *x ); \
+    inc_t incx = bli_obj_vector_inc( *x ); \
+\
+    void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \
+\
+    /* Invoke the void pointer-based function for the given datatype. */ \
+    bli_call_ft_11 \
+    ( \
+      dt, \
+      opname, \
+      uploa, \
+      transa, \
+      diaga, \
+      m, \
+      buf_alpha, \
+      buf_a, rs_a, cs_a, \
+      buf_x, incx, \
+      cntx \
+    ); \
+} \
+
+GENFRONT( trsv_unb_var1 )
+GENFRONT( trsv_unb_var2 )
+
+GENFRONT( trsv_unf_var1 )
+GENFRONT( trsv_unf_var2 )
+
diff --git a/frame/2/trsv/bli_trsv_check.c b/frame/2/trsv/old/bli_trsv_check.c
similarity index 98%
rename from frame/2/trsv/bli_trsv_check.c
rename to frame/2/trsv/old/bli_trsv_check.c
index 9f8c1e360..cfa2b95b7 100644
--- a/frame/2/trsv/bli_trsv_check.c
+++ b/frame/2/trsv/old/bli_trsv_check.c
@@ -88,6 +88,7 @@ void bli_trsv_check( obj_t* alpha,
 void bli_trsv_int_check( obj_t* alpha,
                          obj_t* a,
                          obj_t* x,
+                         cntx_t* cntx,
                          trsv_t* cntl )
 {
     err_t e_val;
diff --git a/frame/2/trsv/bli_trsv_check.h b/frame/2/trsv/old/bli_trsv_check.h
similarity index 98%
rename from frame/2/trsv/bli_trsv_check.h
rename to frame/2/trsv/old/bli_trsv_check.h
index 5be364ddf..f4f6528b6 100644
--- a/frame/2/trsv/bli_trsv_check.h
+++ b/frame/2/trsv/old/bli_trsv_check.h
@@ -43,4 +43,5 @@ void bli_trsv_check( obj_t* alpha,
 void bli_trsv_int_check( obj_t* alpha,
                          obj_t* a,
                          obj_t* x,
+                         cntx_t* cntx,
                          trsv_t* cntl );
diff --git a/frame/2/trsv/bli_trsv_l_blk_var1.h b/frame/2/trsv/old/bli_trsv_l_blk_var1.h
similarity index 97%
rename from frame/2/trsv/bli_trsv_l_blk_var1.h
rename to frame/2/trsv/old/bli_trsv_l_blk_var1.h
index 906ceb8ad..6db83ee56 100644
--- a/frame/2/trsv/bli_trsv_l_blk_var1.h
+++ b/frame/2/trsv/old/bli_trsv_l_blk_var1.h
@@ -35,5 +35,6 @@
 void bli_trsv_l_blk_var1( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl );

diff --git a/frame/2/trsv/bli_trsv_l_blk_var2.h b/frame/2/trsv/old/bli_trsv_l_blk_var2.h
similarity index 97%
rename from frame/2/trsv/bli_trsv_l_blk_var2.h
rename to frame/2/trsv/old/bli_trsv_l_blk_var2.h
index f994d05c9..a0fba7e0e 100644
--- a/frame/2/trsv/bli_trsv_l_blk_var2.h
+++ b/frame/2/trsv/old/bli_trsv_l_blk_var2.h
@@ -35,5 +35,6 @@
 void bli_trsv_l_blk_var2( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl );

diff --git a/frame/2/trsv/bli_trsv_u_blk_var1.h b/frame/2/trsv/old/bli_trsv_u_blk_var1.h
similarity index 97%
rename from frame/2/trsv/bli_trsv_u_blk_var1.h
rename to frame/2/trsv/old/bli_trsv_u_blk_var1.h
index 69b8de7bf..48da1a7ee 100644
--- a/frame/2/trsv/bli_trsv_u_blk_var1.h
+++ b/frame/2/trsv/old/bli_trsv_u_blk_var1.h
@@ -35,5 +35,6 @@
 void bli_trsv_u_blk_var1( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl );

diff --git a/frame/2/trsv/bli_trsv_u_blk_var2.h b/frame/2/trsv/old/bli_trsv_u_blk_var2.h
similarity index 97%
rename from frame/2/trsv/bli_trsv_u_blk_var2.h
rename to frame/2/trsv/old/bli_trsv_u_blk_var2.h
index 2f02db5a6..9f493405f 100644
--- a/frame/2/trsv/bli_trsv_u_blk_var2.h
+++ b/frame/2/trsv/old/bli_trsv_u_blk_var2.h
@@ -35,5 +35,6 @@
 void bli_trsv_u_blk_var2( obj_t* alpha,
                           obj_t* a,
                           obj_t* x,
+                          cntx_t* cntx,
                           trsv_t* cntl );

diff --git a/frame/2/trsv/old/bli_trsv_unb_var1.c b/frame/2/trsv/old/bli_trsv_unb_var1.c
new file mode 100644
index 000000000..411881c52
--- /dev/null
+++ b/frame/2/trsv/old/bli_trsv_unb_var1.c
@@ -0,0 +1,234 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trsv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unb_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unb_var1); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unb_var1); +#endif +#endif + + +void bli_trsv_unb_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trsv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a12t; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_ax alpha11_conj; \ + ctype_ax rho; \ + dim_t iter, i; \ + dim_t n_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* x = alpha * x; */ \ + PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ + m, \ + alpha_cast, \ + x, incx ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = m - iter - 1; \ + n_behind = iter; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a12t = a_cast + (i )*rs_at + (i+1)*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ +\ + /* chi1 = chi1 - a12t * x2; */ \ + PASTEMAC3(cha,chx,chax,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + a12t, cs_at, \ + x2, incx, \ + &rho ); \ + PASTEMAC2(chax,chx,subs)( rho, *chi1 ); \ +\ + /* chi1 = chi1 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + } \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = iter; \ + n_behind = i; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a10t = a_cast + (i )*rs_at + (0 )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* chi1 = chi1 - a10t * x0; */ \ + PASTEMAC3(cha,chx,chax,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + a10t, cs_at, \ + x0, incx, \ + &rho ); \ + PASTEMAC2(chax,chx,subs)( rho, *chi1 ); \ +\ + /* chi1 = chi1 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + } \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trsv_unb_var1, DOTV_KERNEL )
+
+#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
+INSERT_GENTFUNC2U_MIX_D( trsv_unb_var1, DOTV_KERNEL )
+#endif
+
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
+INSERT_GENTFUNC2U_MIX_P( trsv_unb_var1, DOTV_KERNEL )
+#endif
+
diff --git a/frame/2/trsv/bli_trsv_unb_var1.h b/frame/2/trsv/old/bli_trsv_unb_var1.h
similarity index 98%
rename from frame/2/trsv/bli_trsv_unb_var1.h
rename to frame/2/trsv/old/bli_trsv_unb_var1.h
index ba939c2c0..68fbda468 100644
--- a/frame/2/trsv/bli_trsv_unb_var1.h
+++ b/frame/2/trsv/old/bli_trsv_unb_var1.h
@@ -36,6 +36,7 @@
 void bli_trsv_unb_var1( obj_t* alpha,
                         obj_t* a,
                         obj_t* x,
+                        cntx_t* cntx,
                         trsv_t* cntl );

 #undef GENTPROT2
diff --git a/frame/2/trsv/old/bli_trsv_unb_var2.c b/frame/2/trsv/old/bli_trsv_unb_var2.c
new file mode 100644
index 000000000..be1154124
--- /dev/null
+++ b/frame/2/trsv/old/bli_trsv_unb_var2.c
@@ -0,0 +1,232 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trsv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unb_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unb_var2); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unb_var2); +#endif +#endif + + +void bli_trsv_unb_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trsv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_a* a01; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* chi1; \ + ctype_x* x2; \ + ctype_ax alpha11_conj; \ + ctype_ax minus_chi1; \ + dim_t iter, i; \ + dim_t n_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* x = alpha * x; */ \ + PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ + m, \ + alpha_cast, \ + x, incx ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = m - iter - 1; \ + n_ahead = i; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a01 = a_cast + (0 )*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* chi1 = chi1 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + } \ +\ + /* x0 = x0 - chi1 * a01; */ \ + PASTEMAC2(chx,chax,neg2s)( *chi1, minus_chi1 ); \ + PASTEMAC3(chax,cha,chx,kername)( conja, \ + n_ahead, \ + &minus_chi1, \ + a01, rs_at, \ + x0, incx ); \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = iter; \ + n_ahead = m - iter - 1; \ + alpha11 = a_cast + (i )*rs_at + (i )*cs_at; \ + a21 = a_cast + (i+1)*rs_at + (i )*cs_at; \ + chi1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+1)*incx; \ +\ + /* chi1 = chi1 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi1 ); \ + } \ +\ + /* x2 = x2 - chi1 * a21; */ \ + PASTEMAC2(chx,chax,neg2s)( *chi1, minus_chi1 ); \ + PASTEMAC3(chax,cha,chx,kername)( conja, \ + n_ahead, \ + &minus_chi1, \ + a21, rs_at, \ + x2, incx ); \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. 
+INSERT_GENTFUNC2U_BASIC( trsv_unb_var2, AXPYV_KERNEL )
+
+#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
+INSERT_GENTFUNC2U_MIX_D( trsv_unb_var2, AXPYV_KERNEL )
+#endif
+
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
+INSERT_GENTFUNC2U_MIX_P( trsv_unb_var2, AXPYV_KERNEL )
+#endif
+
diff --git a/frame/2/trsv/bli_trsv_unb_var2.h b/frame/2/trsv/old/bli_trsv_unb_var2.h
similarity index 98%
rename from frame/2/trsv/bli_trsv_unb_var2.h
rename to frame/2/trsv/old/bli_trsv_unb_var2.h
index e9054bc92..bc870d936 100644
--- a/frame/2/trsv/bli_trsv_unb_var2.h
+++ b/frame/2/trsv/old/bli_trsv_unb_var2.h
@@ -36,6 +36,7 @@
 void bli_trsv_unb_var2( obj_t* alpha,
                         obj_t* a,
                         obj_t* x,
+                        cntx_t* cntx,
                         trsv_t* cntl );

 #undef GENTPROT2
diff --git a/frame/2/trsv/old/bli_trsv_unf_var1.c b/frame/2/trsv/old/bli_trsv_unf_var1.c
new file mode 100644
index 000000000..fcb76327e
--- /dev/null
+++ b/frame/2/trsv/old/bli_trsv_unf_var1.c
@@ -0,0 +1,302 @@
+/*
+
+   BLIS
+   An object-based framework for developing high-performance BLAS-like
+   libraries.
+
+   Copyright (C) 2014, The University of Texas at Austin
+
+   Redistribution and use in source and binary forms, with or without
+   modification, are permitted provided that the following conditions are
+   met:
+    - Redistributions of source code must retain the above copyright
+      notice, this list of conditions and the following disclaimer.
+    - Redistributions in binary form must reproduce the above copyright
+      notice, this list of conditions and the following disclaimer in the
+      documentation and/or other materials provided with the distribution.
+    - Neither the name of The University of Texas at Austin nor the names
+      of its contributors may be used to endorse or promote products
+      derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trsv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unf_var1); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unf_var1); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unf_var1); +#endif +#endif + + +void bli_trsv_unf_var1( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trsv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_x* one = PASTEMAC(chx,1); \ + ctype_ax* minus_one = PASTEMAC(chax,m1); \ + ctype_a* A10; \ + ctype_a* A11; \ + ctype_a* A12; \ + ctype_a* a10t; \ + ctype_a* alpha11; \ + ctype_a* a12t; \ + ctype_x* x0; \ + ctype_x* x1; \ + ctype_x* x2; \ + ctype_x* x01; \ + ctype_x* chi11; \ + ctype_x* x21; \ + ctype_ax alpha11_conj; \ + ctype_ax rho1; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_behind, f_behind; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* Query the fusing factor for the dotxf implementation. */ \ + b_fuse = PASTEMAC(chax,dotxf_fusefac); \ +\ + /* x = alpha * x; */ \ + PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ + m, \ + alpha_cast, \ + x, incx ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ + i = m - iter - f; \ + n_behind = iter; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A12 = a_cast + (i )*rs_at + (i+f)*cs_at; \ + x1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+f)*incx; \ +\ + /* x1 = x1 - A12 * x2; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + minus_one, \ + A12, cs_at, rs_at, \ + x2, incx, \ + one, \ + x1, incx ); \ +\ + /* x1 = x1 / triu( A11 ); */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = f - k - 1; \ + f_behind = k; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a12t = A11 + (l )*rs_at + (l+1)*cs_at; \ + chi11 = x1 + (l )*incx; \ + x21 = x1 + (l+1)*incx; \ +\ + /* chi11 = chi11 - a12t * x21; */ \ + PASTEMAC(chax,set0s)( rho1 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(cha,chx,chax,dotjs)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + } \ + else \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(cha,chx,chax,dots)( *(a12t + j*cs_at), *(x21 + j*incx), rho1 ); \ + } \ + PASTEMAC2(chax,chx,subs)( rho1, *chi11 ); \ +\ + /* chi11 = chi11 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + } \ + } \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ + i = iter; \ + n_behind = i; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A10 = a_cast + (i )*rs_at + (0 )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* x1 = x1 - A10 * x0; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_behind, \ + f, \ + minus_one, \ + A10, cs_at, rs_at, \ + x0, incx, \ + one, \ + x1, incx ); \ +\ + /* x1 = x1 / 
tril( A11 ); */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = k; \ + f_behind = l; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a10t = A11 + (l )*rs_at + (0 )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x01 = x1 + (0 )*incx; \ +\ + /* chi11 = chi11 - a10t * x01; */ \ + PASTEMAC(chax,set0s)( rho1 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(cha,chx,chax,dotjs)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + } \ + else \ + { \ + for ( j = 0; j < f_behind; ++j ) \ + PASTEMAC3(cha,chx,chax,dots)( *(a10t + j*cs_at), *(x01 + j*incx), rho1 ); \ + } \ + PASTEMAC2(chax,chx,subs)( rho1, *chi11 ); \ +\ + /* chi11 = chi11 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + } \ + } \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC2U_BASIC( trsv_unf_var1, DOTXF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2U_MIX_D( trsv_unf_var1, DOTXF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2U_MIX_P( trsv_unf_var1, DOTXF_KERNEL ) +#endif + diff --git a/frame/2/trsv/bli_trsv_unf_var1.h b/frame/2/trsv/old/bli_trsv_unf_var1.h similarity index 98% rename from frame/2/trsv/bli_trsv_unf_var1.h rename to frame/2/trsv/old/bli_trsv_unf_var1.h index f9d9db6e9..a5b755bd8 100644 --- a/frame/2/trsv/bli_trsv_unf_var1.h +++ b/frame/2/trsv/old/bli_trsv_unf_var1.h @@ -36,6 +36,7 @@ void bli_trsv_unf_var1( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trsv_t* cntl ); #undef GENTPROT2 diff --git a/frame/2/trsv/old/bli_trsv_unf_var2.c b/frame/2/trsv/old/bli_trsv_unf_var2.c new file mode 100644 index 000000000..6f33f290d --- /dev/null +++ b/frame/2/trsv/old/bli_trsv_unf_var2.c @@ -0,0 +1,297 @@ +/* + + BLIS + An object-based framework for developing high-performance 
BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#define FUNCPTR_T trsv_fp + +typedef void (*FUNCPTR_T)( + uplo_t uplo, + trans_t trans, + diag_t diag, + dim_t m, + void* alpha, + void* a, inc_t rs_a, inc_t cs_a, + void* x, inc_t incx + ); + +// If some mixed datatype functions will not be compiled, we initialize +// the corresponding elements of the function array to NULL. 
+#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +static FUNCPTR_T GENARRAY2_ALL(ftypes,trsv_unf_var2); +#else +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +static FUNCPTR_T GENARRAY2_EXT(ftypes,trsv_unf_var2); +#else +static FUNCPTR_T GENARRAY2_MIN(ftypes,trsv_unf_var2); +#endif +#endif + + +void bli_trsv_unf_var2( obj_t* alpha, + obj_t* a, + obj_t* x, + cntx_t* cntx, + trsv_t* cntl ) +{ + num_t dt_a = bli_obj_datatype( *a ); + num_t dt_x = bli_obj_datatype( *x ); + + uplo_t uplo = bli_obj_uplo( *a ); + trans_t trans = bli_obj_conjtrans_status( *a ); + diag_t diag = bli_obj_diag( *a ); + + dim_t m = bli_obj_length( *a ); + + void* buf_a = bli_obj_buffer_at_off( *a ); + inc_t rs_a = bli_obj_row_stride( *a ); + inc_t cs_a = bli_obj_col_stride( *a ); + + void* buf_x = bli_obj_buffer_at_off( *x ); + inc_t incx = bli_obj_vector_inc( *x ); + + num_t dt_alpha; + void* buf_alpha; + + FUNCPTR_T f; + + // The datatype of alpha MUST be the type union of a and x. This is to + // prevent any unnecessary loss of information during computation. + dt_alpha = bli_datatype_union( dt_a, dt_x ); + buf_alpha = bli_obj_buffer_for_1x1( dt_alpha, *alpha ); + + // Index into the type combination array to extract the correct + // function pointer. + f = ftypes[dt_a][dt_x]; + + // Invoke the function. 
+ f( uplo, + trans, + diag, + m, + buf_alpha, + buf_a, rs_a, cs_a, + buf_x, incx ); +} + + +#undef GENTFUNC2U +#define GENTFUNC2U( ctype_a, ctype_x, ctype_ax, cha, chx, chax, varname, kername ) \ +\ +void PASTEMAC2(cha,chx,varname)( \ + uplo_t uplo, \ + trans_t trans, \ + diag_t diag, \ + dim_t m, \ + void* alpha, \ + void* a, inc_t rs_a, inc_t cs_a, \ + void* x, inc_t incx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + ctype_ax* alpha_cast = alpha; \ + ctype_a* a_cast = a; \ + ctype_x* x_cast = x; \ + ctype_ax* minus_one = PASTEMAC(chax,m1); \ + ctype_a* A01; \ + ctype_a* A11; \ + ctype_a* A21; \ + ctype_a* a01; \ + ctype_a* alpha11; \ + ctype_a* a21; \ + ctype_x* x0; \ + ctype_x* x1; \ + ctype_x* x2; \ + ctype_x* x01; \ + ctype_x* chi11; \ + ctype_x* x21; \ + ctype_ax alpha11_conj; \ + ctype_ax minus_chi11; \ + dim_t iter, i, k, j, l; \ + dim_t b_fuse, f; \ + dim_t n_ahead, f_ahead; \ + inc_t rs_at, cs_at; \ + uplo_t uplo_trans; \ + conj_t conja; \ +\ + if ( bli_zero_dim1( m ) ) return; \ +\ + if ( bli_does_notrans( trans ) ) \ + { \ + rs_at = rs_a; \ + cs_at = cs_a; \ + uplo_trans = uplo; \ + } \ + else /* if ( bli_does_trans( trans ) ) */ \ + { \ + rs_at = cs_a; \ + cs_at = rs_a; \ + uplo_trans = bli_uplo_toggled( uplo ); \ + } \ +\ + conja = bli_extract_conj( trans ); \ +\ + /* Query the fusing factor for the axpyf implementation. */ \ + b_fuse = PASTEMAC(chax,axpyf_fusefac); \ +\ + /* x = alpha * x; */ \ + PASTEMAC2(chax,chx,scalv)( BLIS_NO_CONJUGATE, \ + m, \ + alpha_cast, \ + x, incx ); \ +\ + /* We reduce all of the possible cases down to just lower/upper. 
*/ \ + if ( bli_is_upper( uplo_trans ) ) \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_b( iter, m, b_fuse ); \ + i = m - iter - f; \ + n_ahead = i; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A01 = a_cast + (0 )*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x0 = x_cast + (0 )*incx; \ +\ + /* x1 = x1 / triu( A11 ); */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = f - k - 1; \ + f_ahead = l; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a01 = A11 + (0 )*rs_at + (l )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x01 = x1 + (0 )*incx; \ +\ + /* chi11 = chi11 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + } \ +\ + /* x01 = x01 - chi11 * a01; */ \ + PASTEMAC2(chx,chax,neg2s)( *chi11, minus_chi11 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chx,axpyjs)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chx,axpys)( minus_chi11, *(a01 + j*rs_at), *(x01 + j*incx) ); \ + } \ + } \ +\ + /* x0 = x0 - A01 * x1; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + minus_one, \ + A01, rs_at, cs_at, \ + x1, incx, \ + x0, incx ); \ + } \ + } \ + else /* if ( bli_is_lower( uplo_trans ) ) */ \ + { \ + for ( iter = 0; iter < m; iter += f ) \ + { \ + f = bli_determine_blocksize_dim_f( iter, m, b_fuse ); \ + i = iter; \ + n_ahead = m - iter - f; \ + A11 = a_cast + (i )*rs_at + (i )*cs_at; \ + A21 = a_cast + (i+f)*rs_at + (i )*cs_at; \ + x1 = x_cast + (i )*incx; \ + x2 = x_cast + (i+f)*incx; \ +\ + /* x1 = x1 / tril( A11 ); */ \ + for ( k = 0; k < f; ++k ) \ + { \ + l = k; \ + f_ahead = f - k - 1; \ + alpha11 = A11 + (l )*rs_at + (l )*cs_at; \ + a21 = A11 + (l+1)*rs_at + (l )*cs_at; \ + chi11 = x1 + (l )*incx; \ + x21 = x1 + 
(l+1)*incx; \ +\ + /* chi11 = chi11 / alpha11; */ \ + if ( bli_is_nonunit_diag( diag ) ) \ + { \ + PASTEMAC2(cha,chax,copycjs)( conja, *alpha11, alpha11_conj ); \ + PASTEMAC2(chax,chx,invscals)( alpha11_conj, *chi11 ); \ + } \ +\ + /* x21 = x21 - chi11 * a21; */ \ + PASTEMAC2(chx,chax,neg2s)( *chi11, minus_chi11 ); \ + if ( bli_is_conj( conja ) ) \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chx,axpyjs)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + } \ + else \ + { \ + for ( j = 0; j < f_ahead; ++j ) \ + PASTEMAC3(chax,cha,chx,axpys)( minus_chi11, *(a21 + j*rs_at), *(x21 + j*incx) ); \ + } \ + } \ +\ + /* x2 = x2 - A21 * x1; */ \ + PASTEMAC3(cha,chx,chx,kername)( conja, \ + BLIS_NO_CONJUGATE, \ + n_ahead, \ + f, \ + minus_one, \ + A21, rs_at, cs_at, \ + x1, incx, \ + x2, incx ); \ + } \ + } \ +} + +// Define the basic set of functions unconditionally, and then also some +// mixed datatype functions if requested. +INSERT_GENTFUNC2U_BASIC( trsv_unf_var2, AXPYF_KERNEL ) + +#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT +INSERT_GENTFUNC2U_MIX_D( trsv_unf_var2, AXPYF_KERNEL ) +#endif + +#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT +INSERT_GENTFUNC2U_MIX_P( trsv_unf_var2, AXPYF_KERNEL ) +#endif + diff --git a/frame/2/trsv/bli_trsv_unf_var2.h b/frame/2/trsv/old/bli_trsv_unf_var2.h similarity index 98% rename from frame/2/trsv/bli_trsv_unf_var2.h rename to frame/2/trsv/old/bli_trsv_unf_var2.h index 8abde247f..df92dce31 100644 --- a/frame/2/trsv/bli_trsv_unf_var2.h +++ b/frame/2/trsv/old/bli_trsv_unf_var2.h @@ -36,6 +36,7 @@ void bli_trsv_unf_var2( obj_t* alpha, obj_t* a, obj_t* x, + cntx_t* cntx, trsv_t* cntl ); #undef GENTPROT2 diff --git a/frame/1/addv/bli_addv_kernel.h b/frame/3/bli_l3.h similarity index 70% rename from frame/1/addv/bli_addv_kernel.h rename to frame/3/bli_l3.h index 33aedafd4..9c933584d 100644 --- a/frame/1/addv/bli_addv_kernel.h +++ b/frame/3/bli_l3.h @@ -32,31 +32,37 @@ */ -void bli_addv_kernel( obj_t* x, - obj_t* y ); +#include 
"bli_l3_cntx.h" +#include "bli_l3_check.h" +#include "bli_l3_ft.h" +#include "bli_l3_oft.h" -// -// Prototype the void pointer kernel wrappers. -// +#include "bli_l3_blocksize.h" +#include "bli_l3_prune.h" -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ -\ -void PASTEMAC2(chx,chy,varname)( \ - conj_t conjx, \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_l3_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_l3_oapi.h" -INSERT_GENTPROT2_BASIC( addv_kernel_void ) +#include "bli_l3_tapi.h" -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( addv_kernel_void ) -#endif +#include "bli_l3_ukr_oapi.h" +#include "bli_l3_ukr_tapi.h" -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( addv_kernel_void ) -#endif +// Prototype reference micro-kernels. +#include "bli_l3_ukr_ref.h" +// Operation-specific headers +#include "bli_gemm.h" +#include "bli_hemm.h" +#include "bli_herk.h" +#include "bli_her2k.h" +#include "bli_symm.h" +#include "bli_syrk.h" +#include "bli_syr2k.h" +#include "bli_trmm.h" +#include "bli_trmm3.h" +#include "bli_trsm.h" diff --git a/frame/3/bli_l3_blocksize.c b/frame/3/bli_l3_blocksize.c new file mode 100644 index 000000000..97556dedd --- /dev/null +++ b/frame/3/bli_l3_blocksize.c @@ -0,0 +1,495 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +#undef GENFRONT +#define GENFRONT( opname, chdir ) \ +\ +dim_t PASTEMAC0(opname) \ + ( \ + dim_t i, \ + dim_t dim, \ + obj_t* a, \ + obj_t* b, \ + bszid_t bszid, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt; \ + blksz_t* bsize; \ + dim_t mnr; \ + dim_t b_alg, b_max; \ + dim_t b_use; \ + \ + /* bli_*_determine_kc_f(): + + We assume that this function is being called from an algorithm that + is moving "forward" (ie: top to bottom, left to right, top-left + to bottom-right). */ \ +\ + /* bli_*_determine_kc_b(): + + We assume that this function is being called from an algorithm that + is moving "backward" (ie: bottom to top, right to left, bottom-right + to top-left). 
*/ \ +\ + /* Extract the execution datatype and use it to query the corresponding + blocksize and blocksize maximum values from the blksz_t object. */ \ + dt = bli_obj_execution_datatype( *a ); \ + bsize = bli_cntx_get_blksz( bszid, cntx ); \ + b_alg = bli_blksz_get_def( dt, bsize ); \ + b_max = bli_blksz_get_max( dt, bsize ); \ +\ + /* Nudge the default and maximum kc blocksizes up to the nearest + multiple of MR if A is Hermitian or symmetric, or NR if B is + Hermitian or symmetric. If neither case applies, then we leave + the blocksizes unchanged. */ \ + if ( bli_obj_root_is_herm_or_symm( *a ) ) \ + { \ + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + b_alg = bli_align_dim_to_mult( b_alg, mnr ); \ + b_max = bli_align_dim_to_mult( b_max, mnr ); \ + } \ + else if ( bli_obj_root_is_herm_or_symm( *b ) ) \ + { \ + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); \ + b_alg = bli_align_dim_to_mult( b_alg, mnr ); \ + b_max = bli_align_dim_to_mult( b_max, mnr ); \ + } \ +\ + b_use = PASTEMAC2(determine_blocksize_,chdir,_sub)( i, dim, b_alg, b_max ); \ +\ + return b_use; \ +} + +GENFRONT( gemm_determine_kc_f, f ) +GENFRONT( gemm_determine_kc_b, b ) + +// ----------------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, chdir ) \ +\ +dim_t PASTEMAC0(opname) \ + ( \ + dim_t i, \ + dim_t dim, \ + obj_t* a, \ + obj_t* b, \ + bszid_t bszid, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt; \ + blksz_t* bsize; \ + dim_t mnr; \ + dim_t b_alg, b_max; \ + dim_t b_use; \ + \ + /* bli_*_determine_kc_f(): + + We assume that this function is being called from an algorithm that + is moving "forward" (ie: top to bottom, left to right, top-left + to bottom-right). */ \ +\ + /* bli_*_determine_kc_b(): + + We assume that this function is being called from an algorithm that + is moving "backward" (ie: bottom to top, right to left, bottom-right + to top-left). 
*/ \ +\ + /* Extract the execution datatype and use it to query the corresponding + blocksize and blocksize maximum values from the blksz_t object. */ \ + dt = bli_obj_execution_datatype( *a ); \ + bsize = bli_cntx_get_blksz( bszid, cntx ); \ + b_alg = bli_blksz_get_def( dt, bsize ); \ + b_max = bli_blksz_get_max( dt, bsize ); \ +\ + /* Nudge the default and maximum kc blocksizes up to the nearest + multiple of MR if the triangular matrix is on the left, or NR + if the triangular matrix is on the right. */ \ + if ( bli_obj_root_is_triangular( *a ) ) \ + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + else \ + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); \ +\ + b_alg = bli_align_dim_to_mult( b_alg, mnr ); \ + b_max = bli_align_dim_to_mult( b_max, mnr ); \ +\ + b_use = PASTEMAC2(determine_blocksize_,chdir,_sub)( i, dim, b_alg, b_max ); \ +\ + return b_use; \ +} + +GENFRONT( trmm_determine_kc_f, f ) +GENFRONT( trmm_determine_kc_b, b ) + +// ----------------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, chdir ) \ +\ +dim_t PASTEMAC0(opname) \ + ( \ + dim_t i, \ + dim_t dim, \ + obj_t* a, \ + obj_t* b, \ + bszid_t bszid, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt; \ + blksz_t* bsize; \ + dim_t mnr; \ + dim_t b_alg, b_max; \ + dim_t b_use; \ + \ + /* bli_*_determine_kc_f(): + + We assume that this function is being called from an algorithm that + is moving "forward" (ie: top to bottom, left to right, top-left + to bottom-right). */ \ +\ + /* bli_*_determine_kc_b(): + + We assume that this function is being called from an algorithm that + is moving "backward" (ie: bottom to top, right to left, bottom-right + to top-left). */ \ +\ + /* Extract the execution datatype and use it to query the corresponding + blocksize and blocksize maximum values from the blksz_t object.
*/ \ + dt = bli_obj_execution_datatype( *a ); \ + bsize = bli_cntx_get_blksz( bszid, cntx ); \ + b_alg = bli_blksz_get_def( dt, bsize ); \ + b_max = bli_blksz_get_max( dt, bsize ); \ +\ + /* Nudge the default and maximum kc blocksizes up to the nearest + multiple of MR. We always use MR (rather than sometimes using NR) + because even when the triangle is on the right, packing of that + matrix uses MR, since only left-side trsm micro-kernels are + supported. */ \ + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + b_alg = bli_align_dim_to_mult( b_alg, mnr ); \ + b_max = bli_align_dim_to_mult( b_max, mnr ); \ +\ + b_use = PASTEMAC2(determine_blocksize_,chdir,_sub)( i, dim, b_alg, b_max ); \ +\ + return b_use; \ +} + +GENFRONT( trsm_determine_kc_f, f ) +GENFRONT( trsm_determine_kc_b, b ) + + + + + + + + + + +#if 0 +dim_t bli_gemm_determine_kc_f + ( + dim_t i, + dim_t dim, + obj_t* a, + obj_t* b, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "forward" (ie: top to bottom, left to right, top-left + // to bottom-right). + + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR if A is Hermitian or symmetric, or NR if B is + // Hermitian or symmetric. If neither case applies, then we leave + // the blocksizes unchanged. 
+ if ( bli_obj_root_is_herm_or_symm( *a ) ) + { + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + } + else if ( bli_obj_root_is_herm_or_symm( *b ) ) + { + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + } + + b_use = bli_determine_blocksize_f_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +dim_t bli_gemm_determine_kc_b + ( + dim_t i, + dim_t dim, + obj_t* a, + obj_t* b, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "backward" (ie: bottom to top, right to left, bottom-right + // to top-left). + + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR if A is Hermitian or symmetric, or NR if B is + // Hermitian or symmetric. If neither case applies, then we leave + // the blocksizes unchanged. 
+ if ( bli_obj_root_is_herm_or_symm( *a ) ) + { + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + } + else if ( bli_obj_root_is_herm_or_symm( *b ) ) + { + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + } + + b_use = bli_determine_blocksize_b_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +// ----------------------------------------------------------------------------- + +dim_t bli_trmm_determine_kc_f + ( + dim_t i, + dim_t dim, + obj_t* a, + obj_t* b, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "forward" (ie: top to bottom, left to right, top-left + // to bottom-right). + + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR if the triangular matrix is on the left, or NR + // if the triangular matrix is on the right.
+ if ( bli_obj_root_is_triangular( *a ) ) + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + else + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + + b_use = bli_determine_blocksize_f_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +dim_t bli_trmm_determine_kc_b + ( + dim_t i, + dim_t dim, + obj_t* a, + obj_t* b, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "backward" (ie: bottom to top, right to left, bottom-right + // to top-left). + + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR if the triangular matrix is on the left, or NR + // if the triangular matrix is on the right.
+ if ( bli_obj_root_is_triangular( *a ) ) + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + else + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + + b_use = bli_determine_blocksize_b_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +// ----------------------------------------------------------------------------- + +dim_t bli_trsm_determine_kc_f + ( + dim_t i, + dim_t dim, + obj_t* obj, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "forward" (ie: top to bottom, left to right, top-left + // to bottom-right). + + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR. We always use MR (rather than sometimes using NR) + // because even when the triangle is on the right, packing of that + // matrix uses MR, since only left-side trsm micro-kernels are + // supported. + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + + b_use = bli_determine_blocksize_f_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +dim_t bli_trsm_determine_kc_b + ( + dim_t i, + dim_t dim, + obj_t* obj, + bszid_t bszid, + cntx_t* cntx + ) +{ + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; + + // We assume that this function is being called from an algorithm that + // is moving "backward" (ie: bottom to top, right to left, bottom-right + // to top-left).
+ + // Extract the execution datatype and use it to query the corresponding + // blocksize and blocksize maximum values from the blksz_t object. + dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); + b_alg = bli_blksz_get_def( dt, bsize ); + b_max = bli_blksz_get_max( dt, bsize ); + + // Nudge the default and maximum kc blocksizes up to the nearest + // multiple of MR. We always use MR (rather than sometimes using NR) + // because even when the triangle is on the right, packing of that + // matrix uses MR, since only left-side trsm micro-kernels are + // supported. + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); + b_max = bli_align_dim_to_mult( b_max, mnr ); + + b_use = bli_determine_blocksize_b_sub( i, dim, b_alg, b_max ); + + return b_use; +} + +#endif diff --git a/frame/3/bli_l3_blocksize.h b/frame/3/bli_l3_blocksize.h new file mode 100644 index 000000000..01e10c3fe --- /dev/null +++ b/frame/3/bli_l3_blocksize.h @@ -0,0 +1,57 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission.
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +dim_t PASTEMAC0(opname) \ + ( \ + dim_t i, \ + dim_t dim, \ + obj_t* a, \ + obj_t* b, \ + bszid_t bszid, \ + cntx_t* cntx \ + ); + +GENPROT( gemm_determine_kc_f ) +GENPROT( gemm_determine_kc_b ) + +GENPROT( trmm_determine_kc_f ) +GENPROT( trmm_determine_kc_b ) + +GENPROT( trsm_determine_kc_f ) +GENPROT( trsm_determine_kc_b ) + diff --git a/frame/3/bli_l3_check.c b/frame/3/bli_l3_check.c new file mode 100644 index 000000000..48249a9b3 --- /dev/null +++ b/frame/3/bli_l3_check.c @@ -0,0 +1,508 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +void bli_gemm_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + //err_t e_val; + + // Check basic properties of the operation. + + bli_gemm_basic_check( alpha, a, b, beta, c, cntx ); + + // Check object structure. + + // NOTE: Can't perform these checks as long as bli_gemm_check() is called + // from bli_gemm_int(), which is in the execution path for structured + // level-3 operations such as hemm. + + //e_val = bli_check_general_object( a ); + //bli_check_error_code( e_val ); + + //e_val = bli_check_general_object( b ); + //bli_check_error_code( e_val ); +} + +void bli_hemm_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform checks common to hemm/symm. + + bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + + // Check object structure. 
+ + e_val = bli_check_hermitian_object( a ); + bli_check_error_code( e_val ); +} + +void bli_herk_check + ( + obj_t* alpha, + obj_t* a, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + obj_t ah; + + // Alias A to A^H so we can perform dimension checks. + bli_obj_alias_with_trans( BLIS_CONJ_TRANSPOSE, *a, ah ); + + // Check basic properties of the operation. + + bli_herk_basic_check( alpha, a, &ah, beta, c, cntx ); + + // Check for real-valued alpha and beta. + + e_val = bli_check_real_valued_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_real_valued_object( beta ); + bli_check_error_code( e_val ); + + // Check matrix structure. + + e_val = bli_check_hermitian_object( c ); + bli_check_error_code( e_val ); +} + +void bli_her2k_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + obj_t ah, bh; + + // Alias A and B to A^H and B^H so we can perform dimension checks. + bli_obj_alias_with_trans( BLIS_CONJ_TRANSPOSE, *a, ah ); + bli_obj_alias_with_trans( BLIS_CONJ_TRANSPOSE, *b, bh ); + + // Check basic properties of the operation. + + bli_her2k_basic_check( alpha, a, &bh, b, &ah, beta, c, cntx ); + + // Check for real-valued beta. + + e_val = bli_check_real_valued_object( beta ); + bli_check_error_code( e_val ); + + // Check matrix structure. + + e_val = bli_check_hermitian_object( c ); + bli_check_error_code( e_val ); +} + +void bli_symm_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Check basic properties of the operation. + + bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + + // Check object structure. + + e_val = bli_check_symmetric_object( a ); + bli_check_error_code( e_val ); +} + +void bli_syrk_check + ( + obj_t* alpha, + obj_t* a, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + obj_t at; + + // Alias A to A^T so we can perform dimension checks. 
+ bli_obj_alias_with_trans( BLIS_TRANSPOSE, *a, at ); + + // Check basic properties of the operation. + + bli_herk_basic_check( alpha, a, &at, beta, c, cntx ); + + // Check matrix structure. + + e_val = bli_check_symmetric_object( c ); + bli_check_error_code( e_val ); +} + +void bli_syr2k_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + obj_t at, bt; + + // Alias A and B to A^T and B^T so we can perform dimension checks. + bli_obj_alias_with_trans( BLIS_TRANSPOSE, *a, at ); + bli_obj_alias_with_trans( BLIS_TRANSPOSE, *b, bt ); + + // Check basic properties of the operation. + + bli_her2k_basic_check( alpha, a, &bt, b, &at, beta, c, cntx ); + + // Check matrix structure. + + e_val = bli_check_symmetric_object( c ); + bli_check_error_code( e_val ); +} + +#if 0 +void bli_trmm_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform checks common to hemm/symm. + + bli_hemm_basic_check( side, alpha, a, b, &BLIS_ZERO, b, cntx ); + + // Check object structure. + + e_val = bli_check_triangular_object( a ); + bli_check_error_code( e_val ); +} +#endif + +void bli_trmm_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform checks common to hemm/symm. + + bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + + // Check object structure. + + e_val = bli_check_triangular_object( a ); + bli_check_error_code( e_val ); +} + +void bli_trsm_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform checks common to hemm/symm. + + bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx ); + + // Check object structure. 
+ + e_val = bli_check_triangular_object( a ); + bli_check_error_code( e_val ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform standard checks. + + bli_l3_basic_check( alpha, a, b, beta, c, cntx ); + + // Check object dimensions. + + e_val = bli_check_level3_dims( a, b, c ); + bli_check_error_code( e_val ); +} + +void bli_hemm_basic_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform standard checks. + + bli_l3_basic_check( alpha, a, b, beta, c, cntx ); + + // Check object dimensions. + + if ( bli_is_left( side ) ) + { + e_val = bli_check_level3_dims( a, b, c ); + bli_check_error_code( e_val ); + } + else // if ( bli_is_right( side ) ) + { + e_val = bli_check_level3_dims( b, a, c ); + bli_check_error_code( e_val ); + } + + // Check matrix squareness. + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); +} + +void bli_herk_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* ah, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform standard checks. + + bli_l3_basic_check( alpha, a, ah, beta, c, cntx ); + + // Check object dimensions. + + e_val = bli_check_level3_dims( a, ah, c ); + bli_check_error_code( e_val ); + + // Check matrix squareness. + + e_val = bli_check_square_object( c ); + bli_check_error_code( e_val ); + + // Check matrix structure. + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( ah ); + bli_check_error_code( e_val ); +} + +void bli_her2k_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* bh, + obj_t* b, + obj_t* ah, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Perform standard checks. 
+ + bli_l3_basic_check( alpha, a, bh, beta, c, cntx ); + bli_l3_basic_check( alpha, b, ah, beta, c, cntx ); + + // Check object dimensions. + + e_val = bli_check_level3_dims( a, bh, c ); + bli_check_error_code( e_val ); + + e_val = bli_check_level3_dims( b, ah, c ); + bli_check_error_code( e_val ); + + // Check matrix squareness. + + e_val = bli_check_square_object( c ); + bli_check_error_code( e_val ); + + // Check matrix structure. + + e_val = bli_check_general_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( bh ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_general_object( ah ); + bli_check_error_code( e_val ); +} + +void bli_l3_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_floating_object( c ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_scalar_object( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_matrix_object( c ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( alpha ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( b ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( beta ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( c ); + bli_check_error_code( e_val ); + + // Check for sufficiently sized stack buffers + + e_val = bli_check_sufficient_stack_buf_size( bli_obj_datatype( *a ), cntx ); + bli_check_error_code( e_val ); +} + diff --git a/frame/ind/query/bli_ind_query.h b/frame/3/bli_l3_check.h similarity index 53% rename from frame/ind/query/bli_ind_query.h rename to frame/3/bli_l3_check.h index 6b1437c94..5af4d776c 100644 --- a/frame/ind/query/bli_ind_query.h +++ b/frame/3/bli_l3_check.h @@ -32,82 +32,116 @@ */ -#ifndef BLIS_IND_QUERY_H -#define BLIS_IND_QUERY_H - -typedef enum -{ - BLIS_3MH = 0, - BLIS_3M3, - BLIS_3M2, - BLIS_3M1, - BLIS_4MH, - BLIS_4M1B, - BLIS_4M1A, - BLIS_NAT, -} ind_t; - -#define BLIS_NUM_IND_METHODS (BLIS_NAT+1) - -// ----------------------------------------------------------------------------- +// +// Prototype object-based check functions. 
+// #undef GENPROT #define GENPROT( opname ) \ \ -bool_t PASTEMAC(opname,ind_has_avail)( num_t dt ); \ -void* PASTEMAC(opname,ind_get_avail)( num_t dt ); +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ); GENPROT( gemm ) -GENPROT( hemm ) -GENPROT( herk ) GENPROT( her2k ) -GENPROT( symm ) -GENPROT( syrk ) GENPROT( syr2k ) -GENPROT( trmm3 ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ); + +GENPROT( hemm ) +GENPROT( symm ) GENPROT( trmm ) GENPROT( trsm ) + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ); + +GENPROT( herk ) +GENPROT( syrk ) + + // ----------------------------------------------------------------------------- -bool_t bli_ind_oper_is_impl( opid_t oper, ind_t method ); +void bli_gemm_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ); -//bool_t bli_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ); +void bli_hemm_basic_check + ( + side_t side, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ); -bool_t bli_ind_oper_has_avail( opid_t oper, num_t dt ); -void* bli_ind_oper_get_avail( opid_t oper, num_t dt ); +void bli_herk_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* ah, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ); -ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ); - -char* bli_ind_oper_get_avail_impl_string( opid_t oper, num_t dt ); - -void bli_ind_init( void ); -void bli_ind_finalize( void ); -bool_t bli_ind_is_initialized( void ); - -void bli_ind_enable( ind_t method ); -void bli_ind_disable( ind_t method ); -void bli_ind_disable_all( void ); - -void bli_ind_enable_dt( ind_t method, num_t 
dt ); -void bli_ind_disable_dt( ind_t method, num_t dt ); -void bli_ind_disable_all_dt( num_t dt ); - -void bli_ind_set_enable_dt( ind_t method, num_t dt, bool_t status ); - -void bli_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ); -void bli_ind_oper_set_enable_all( opid_t oper, num_t dt, bool_t status ); - -void bli_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool_t status ); -bool_t bli_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt ); - -void* bli_ind_oper_get_func( opid_t oper, ind_t method ); - -char* bli_ind_get_impl_string( ind_t method ); - -num_t bli_ind_map_cdt_to_index( num_t dt ); - - -#endif +void bli_her2k_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* bh, + obj_t* b, + obj_t* ah, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ); +void bli_l3_basic_check + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + cntx_t* cntx + ); diff --git a/frame/3/bli_l3_cntx.c b/frame/3/bli_l3_cntx.c new file mode 100644 index 000000000..5a51dc3d6 --- /dev/null +++ b/frame/3/bli_l3_cntx.c @@ -0,0 +1,121 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define context initialization functions. +// + +void bli_gemm_cntx_init( cntx_t* cntx ) +{ + bli_cntx_obj_create( cntx ); + + //bli_cntx_obj_clear( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), given the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 6, + BLIS_NC, BLIS_NR, + BLIS_KC, BLIS_KR, + BLIS_MC, BLIS_MR, + BLIS_NR, BLIS_NR, + BLIS_MR, BLIS_MR, + BLIS_KR, BLIS_KR, + cntx ); + + // Set the pack_t schemas for native execution. 
+ bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS, + BLIS_PACKED_COL_PANELS, + cntx ); +} + +void bli_gemm_cntx_finalize( cntx_t* cntx ) +{ + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_trsm_cntx_init( cntx_t* cntx ) +{ + bli_cntx_obj_create( cntx ); + + //bli_cntx_obj_clear( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the current architecture's native + // level-3 trsm micro-kernels. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_U_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_U_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), given the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 6, + BLIS_NC, BLIS_NR, + BLIS_KC, BLIS_KR, + BLIS_MC, BLIS_MR, + BLIS_NR, BLIS_NR, + BLIS_MR, BLIS_MR, + BLIS_KR, BLIS_KR, + cntx ); + + // Set the pack_t schemas for native execution. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS, + BLIS_PACKED_COL_PANELS, + cntx ); +} + +void bli_trsm_cntx_finalize( cntx_t* cntx ) +{ + bli_cntx_obj_free( cntx ); +} + diff --git a/frame/1f/axpyf/bli_axpyf_fusefac.c b/frame/3/bli_l3_cntx.h similarity index 87% rename from frame/1f/axpyf/bli_axpyf_fusefac.c rename to frame/3/bli_l3_cntx.h index e6c00bc8a..21b756656 100644 --- a/frame/1f/axpyf/bli_axpyf_fusefac.c +++ b/frame/3/bli_l3_cntx.h @@ -32,16 +32,16 @@ */ -#include "blis.h" // -// Define object-based fusing factor query routine. +// Prototype context initialization functions. 
// -static dim_t GENARRAY(factors,axpyf_fusefac); - -dim_t bli_axpyf_fusefac( num_t dt ) -{ - return factors[ dt ]; -} +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_cntx_init)( cntx_t* cntx ); \ +void PASTEMAC(opname,_cntx_finalize)( cntx_t* cntx ); +GENPROT( gemm ) +GENPROT( trsm ) diff --git a/frame/3/bli_l3_ft.h b/frame/3/bli_l3_ft.h new file mode 100644 index 000000000..5ca8c4475 --- /dev/null +++ b/frame/3/bli_l3_ft.h @@ -0,0 +1,106 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#ifndef BLIS_L3_FT_H +#define BLIS_L3_FT_H + + +// +// -- Level-3 micro-kernel types ----------------------------------------------- +// + +// gemm + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTDEF( gemm_ukr ) + + +// trsm_[lu] + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTDEF( trsm_ukr ) + + +// gemmtrsm_[lu] + +#undef GENTDEF +#define GENTDEF( ctype, ch, opname, tsuf ) \ +\ +typedef void (*PASTECH2(ch,opname,tsuf)) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a1x, \ + ctype* restrict a11, \ + ctype* restrict bx1, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTDEF( gemmtrsm_ukr ) + + + + + + +#endif + diff --git a/frame/3/bli_l3_oapi.c b/frame/3/bli_l3_oapi.c new file mode 100644 index 000000000..dacc08df8 --- /dev/null +++ b/frame/3/bli_l3_oapi.c @@ -0,0 +1,169 @@ +/* + + BLIS + 
An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. 
+// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + /* Invoke the operation's "ind" function--its induced method front-end. + This function will call native execution for real domain problems. + For complex problems, it calls the highest priority induced method + that is available (ie: implemented and enabled), and if none are + enabled, it calls native execution. */ \ + PASTEMAC(opname,ind) \ + ( \ + alpha, \ + a, \ + b, \ + beta, \ + c, \ + cntx \ + ); \ +} + +GENFRONT( gemm ) +GENFRONT( her2k ) +GENFRONT( syr2k ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + PASTEMAC(opname,ind) \ + ( \ + side, \ + alpha, \ + a, \ + b, \ + beta, \ + c, \ + cntx \ + ); \ +} + +GENFRONT( hemm ) +GENFRONT( symm ) +GENFRONT( trmm3 ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + PASTEMAC(opname,ind) \ + ( \ + alpha, \ + a, \ + beta, \ + c, \ + cntx \ + ); \ +} + +GENFRONT( herk ) +GENFRONT( syrk ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + PASTEMAC(opname,ind) \ + ( \ + side, \ + alpha, \ + a, \ + b, \ + cntx \ + ); \ +} + +GENFRONT( trmm ) +GENFRONT( trsm ) + + +#endif + diff --git a/frame/3/bli_l3_oapi.h b/frame/3/bli_l3_oapi.h new file mode 100644 index 000000000..a029f7718 --- /dev/null +++ b/frame/3/bli_l3_oapi.h @@ -0,0 +1,107 @@ +/* + + BLIS + An 
object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( gemm ) +GENPROT( her2k ) +GENPROT( syr2k ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( hemm ) +GENPROT( symm ) +GENPROT( trmm3 ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( herk ) +GENPROT( syrk ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( trmm ) +GENPROT( trsm ) + diff --git a/frame/3/bli_l3_oapi_wc.c b/frame/3/bli_l3_oapi_wc.c new file mode 100644 index 000000000..83c8cce22 --- /dev/null +++ b/frame/3/bli_l3_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l3_oapi.c" + diff --git a/frame/3/bli_l3_oapi_woc.c b/frame/3/bli_l3_oapi_woc.c new file mode 100644 index 000000000..97c51f74d --- /dev/null +++ b/frame/3/bli_l3_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_l3_oapi.c" + diff --git a/frame/3/bli_l3_oft.h b/frame/3/bli_l3_oft.h new file mode 100644 index 000000000..2b278acb7 --- /dev/null +++ b/frame/3/bli_l3_oft.h @@ -0,0 +1,122 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#ifndef BLIS_L3_OFT_H +#define BLIS_L3_OFT_H + + +// +// -- Level-3 object function types -------------------------------------------- +// + +// gemm, her2k, syr2k + +#undef GENTDEF +#define GENTDEF( opname ) \ +\ +typedef void (*PASTECH(opname,_oft)) \ +( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ +); + +GENTDEF( gemm ) +GENTDEF( her2k ) +GENTDEF( syr2k ) + + +// hemm, symm, trmm3 + +#undef GENTDEF +#define GENTDEF( opname ) \ +\ +typedef void (*PASTECH(opname,_oft)) \ +( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ +); + +GENTDEF( hemm ) +GENTDEF( symm ) +GENTDEF( trmm3 ) + + +// herk, syrk + +#undef GENTDEF +#define GENTDEF( opname ) \ +\ +typedef void (*PASTECH(opname,_oft)) \ +( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ +); + +GENTDEF( herk ) +GENTDEF( syrk ) + + +// trmm, trsm + +#undef GENTDEF +#define GENTDEF( opname ) \ +\ +typedef void (*PASTECH(opname,_oft)) \ +( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ +); + +GENTDEF( trmm ) +GENTDEF( trsm ) + + + +#endif + diff --git a/frame/3/bli_l3_prune.c b/frame/3/bli_l3_prune.c new file mode 100644 index 000000000..a8c853c56 --- /dev/null +++ b/frame/3/bli_l3_prune.c @@ -0,0 +1,127 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_prune_unref_mparts_m) \ + ( \ + obj_t* a, \ + obj_t* ah, \ + obj_t* c \ + ) \ +{ \ + /* Prune any unreferenced part from the subpartition of C (that would + be encountered from partitioning in the m dimension) and adjust the + subpartition of A accordingly. */ \ + bli_prune_unref_mparts( c, BLIS_M, a, BLIS_M ); \ +} \ +void PASTEMAC(opname,_prune_unref_mparts_n) \ + ( \ + obj_t* a, \ + obj_t* ah, \ + obj_t* c \ + ) \ +{ \ + /* Prune any unreferenced part from the subpartition of C (that would + be encountered from partitioning in the n dimension) and adjust the + subpartition of Ah accordingly. */ \ + bli_prune_unref_mparts( c, BLIS_N, ah, BLIS_N ); \ +} \ +void PASTEMAC(opname,_prune_unref_mparts_k) \ + ( \ + obj_t* a, \ + obj_t* ah, \ + obj_t* c \ + ) \ +{ \ + /* As long as A and Ah are general in structure, no pruning should be + performed for the k dimension. 
*/ \ +} + +GENFRONT( herk ) + +// ----------------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_prune_unref_mparts_m) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c \ + ) \ +{ \ + /* Prune any unreferenced part from the subpartition of A (that would + be encountered from partitioning in the m dimension) and adjust the + subpartition of C accordingly. */ \ + bli_prune_unref_mparts( a, BLIS_M, c, BLIS_M ); \ +} \ +void PASTEMAC(opname,_prune_unref_mparts_n) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c \ + ) \ +{ \ + /* Prune any unreferenced part from the subpartition of B (that would + be encountered from partitioning in the n dimension) and adjust the + subpartition of C accordingly. */ \ + bli_prune_unref_mparts( b, BLIS_N, c, BLIS_N ); \ +} \ +void PASTEMAC(opname,_prune_unref_mparts_k) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c \ + ) \ +{ \ + /* Prune any unreferenced part from the subpartition of A (that would + be encountered from partitioning in the k dimension) and adjust the + subpartition of B accordingly. */ \ + bli_prune_unref_mparts( a, BLIS_N, b, BLIS_M ); \ +\ + /* Prune any unreferenced part from the subpartition of B (that would + be encountered from partitioning in the k dimension) and adjust the + subpartition of A accordingly. */ \ + bli_prune_unref_mparts( b, BLIS_M, a, BLIS_N ); \ +} + +GENFRONT( trmm ) +GENFRONT( trsm ) + + diff --git a/frame/1/invertv/bli_invertv_kernel.h b/frame/3/bli_l3_prune.h similarity index 82% rename from frame/1/invertv/bli_invertv_kernel.h rename to frame/3/bli_l3_prune.h index 6de80922f..b4870407d 100644 --- a/frame/1/invertv/bli_invertv_kernel.h +++ b/frame/3/bli_l3_prune.h @@ -32,20 +32,26 @@ */ -void bli_invertv_kernel( obj_t* x ); - -// -// Prototype the void pointer kernel wrappers. 
-// - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ +#undef GENPROT +#define GENPROT( opname, dim ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t n, \ - void* x, inc_t incx \ - ); +void PASTEMAC2(opname,_prune_unref_mparts_,dim) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c \ + ); -INSERT_GENTPROT_BASIC( invertv_kernel_void ) +GENPROT( herk, m ) +GENPROT( herk, n ) +GENPROT( herk, k ) + +GENPROT( trmm, m ) +GENPROT( trmm, n ) +GENPROT( trmm, k ) + +GENPROT( trsm, m ) +GENPROT( trsm, n ) +GENPROT( trsm, k ) diff --git a/frame/3/bli_l3_tapi.c b/frame/3/bli_l3_tapi.c new file mode 100644 index 000000000..fa4c0ca79 --- /dev/null +++ b/frame/3/bli_l3_tapi.c @@ -0,0 +1,469 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include the definition of EX_SUF for the context-aware object API. +#include "bli_oapi_w_cntx.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo, betao, co; \ +\ + dim_t m_a, n_a; \ + dim_t m_b, n_b; \ +\ + bli_set_dims_with_trans( transa, m, k, m_a, n_a ); \ + bli_set_dims_with_trans( transb, k, n, m_b, n_b ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \ + bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_conjtrans( transa, ao ); \ + bli_obj_set_conjtrans( transb, bo ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + &alphao, \ + &ao, \ + &bo, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( gemm ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, struca ) \ +\ +void 
PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo, betao, co; \ +\ + dim_t mn_a; \ + dim_t m_b, n_b; \ +\ + bli_set_dim_with_side( side, m, n, mn_a ); \ + bli_set_dims_with_trans( transb, m, n, m_b, n_b ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \ + bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploa, ao ); \ + bli_obj_set_conj( conja, ao ); \ + bli_obj_set_conjtrans( transb, bo ); \ +\ + bli_obj_set_struc( struca, ao ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + side, \ + &alphao, \ + &ao, \ + &bo, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC( hemm, BLIS_HERMITIAN ) +INSERT_GENTFUNC_BASIC( symm, BLIS_SYMMETRIC ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype_r* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt_r = PASTEMAC(chr,type); \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, betao, co; \ +\ + dim_t m_a, n_a; \ +\ + bli_set_dims_with_trans( transa, m, k, m_a, n_a ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt_r, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt_r, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao 
); \ + bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploc, co ); \ + bli_obj_set_conjtrans( transa, ao ); \ +\ + bli_obj_set_struc( BLIS_HERMITIAN, co ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + &alphao, \ + &ao, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNCR_BASIC0( herk ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt_r = PASTEMAC(chr,type); \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo, betao, co; \ +\ + dim_t m_a, n_a; \ + dim_t m_b, n_b; \ +\ + bli_set_dims_with_trans( transa, m, k, m_a, n_a ); \ + bli_set_dims_with_trans( transb, m, k, m_b, n_b ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt_r, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \ + bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploc, co ); \ + bli_obj_set_conjtrans( transa, ao ); \ + bli_obj_set_conjtrans( transb, bo ); \ +\ + bli_obj_set_struc( BLIS_HERMITIAN, co ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + &alphao, \ + &ao, \ + &bo, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNCR_BASIC0( her2k ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = 
PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, betao, co; \ +\ + dim_t m_a, n_a; \ +\ + bli_set_dims_with_trans( transa, m, k, m_a, n_a ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploc, co ); \ + bli_obj_set_conjtrans( transa, ao ); \ +\ + bli_obj_set_struc( BLIS_SYMMETRIC, co ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + &alphao, \ + &ao, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( syrk ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo, betao, co; \ +\ + dim_t m_a, n_a; \ + dim_t m_b, n_b; \ +\ + bli_set_dims_with_trans( transa, m, k, m_a, n_a ); \ + bli_set_dims_with_trans( transb, m, k, m_b, n_b ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \ + bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploc, co ); \ + bli_obj_set_conjtrans( transa, ao ); \ + bli_obj_set_conjtrans( transb, bo ); \ +\ + bli_obj_set_struc( BLIS_SYMMETRIC, co ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + &alphao, \ + &ao, \ + &bo, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( syr2k ) + + +#undef GENTFUNC +#define GENTFUNC( 
ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo, betao, co; \ +\ + dim_t mn_a; \ + dim_t m_b, n_b; \ +\ + bli_set_dim_with_side( side, m, n, mn_a ); \ + bli_set_dims_with_trans( transb, m, n, m_b, n_b ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ + bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \ +\ + bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \ + bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \ +\ + bli_obj_set_uplo( uploa, ao ); \ + bli_obj_set_diag( diaga, ao ); \ + bli_obj_set_conjtrans( transa, ao ); \ + bli_obj_set_conjtrans( transb, bo ); \ +\ + bli_obj_set_struc( BLIS_TRIANGULAR, ao ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + side, \ + &alphao, \ + &ao, \ + &bo, \ + &betao, \ + &co, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( trmm3 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + obj_t alphao, ao, bo; \ +\ + dim_t mn_a; \ +\ + bli_set_dim_with_side( side, m, n, mn_a ); \ +\ + bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \ +\ + bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \ + bli_obj_create_with_attached_buffer( dt, m, n, b, rs_b, cs_b, &bo ); \ +\ + 
bli_obj_set_uplo( uploa, ao ); \ + bli_obj_set_diag( diaga, ao ); \ + bli_obj_set_conjtrans( transa, ao ); \ +\ + bli_obj_set_struc( BLIS_TRIANGULAR, ao ); \ +\ + PASTEMAC(opname,EX_SUF) \ + ( \ + side, \ + &alphao, \ + &ao, \ + &bo, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( trmm ) +INSERT_GENTFUNC_BASIC0( trsm ) + diff --git a/frame/3/bli_l3_tapi.h b/frame/3/bli_l3_tapi.h new file mode 100644 index 000000000..05a346063 --- /dev/null +++ b/frame/3/bli_l3_tapi.h @@ -0,0 +1,206 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( gemm ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( hemm ) +INSERT_GENTPROT_BASIC( symm ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype_r* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( herk ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, 
inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( her2k ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syrk ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syr2k ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmm3 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmm ) +INSERT_GENTPROT_BASIC( trsm ) + diff --git a/frame/3/bli_l3_ukr.h b/frame/3/bli_l3_ukr.h new file mode 100644 index 000000000..9af83b781 --- /dev/null +++ b/frame/3/bli_l3_ukr.h @@ -0,0 +1,91 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// +// Define template prototypes for level-3 micro-kernels. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTPROT_BASIC( gemm_ukr_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a1x, \ + ctype* restrict a11, \ + ctype* restrict bx1, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTPROT_BASIC( gemmtrsm_l_ukr_name ) +INSERT_GENTPROT_BASIC( gemmtrsm_u_ukr_name ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); + +INSERT_GENTPROT_BASIC( trsm_l_ukr_name ) +INSERT_GENTPROT_BASIC( trsm_u_ukr_name ) + diff --git a/frame/3/bli_l3_ukr_oapi.c b/frame/3/bli_l3_ukr_oapi.c new file mode 100644 index 000000000..f3a4a6dcd --- /dev/null +++ b/frame/3/bli_l3_ukr_oapi.c @@ -0,0 +1,220 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ +\ + dim_t k = bli_obj_width( *a ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + void* buf_b = bli_obj_buffer_at_off( *b ); \ + void* buf_c = bli_obj_buffer_at_off( *c ); \ + inc_t rs_c = bli_obj_row_stride( *c ); \ + inc_t cs_c = bli_obj_col_stride( *c ); \ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ + void* buf_beta = bli_obj_buffer_for_1x1( dt, *beta ); \ +\ + auxinfo_t data; \ +\ + /* Fill the auxinfo_t struct in case the micro-kernel uses it. */ \ + bli_auxinfo_set_next_a( buf_a, data ); \ + bli_auxinfo_set_next_b( buf_b, data ); \ + bli_auxinfo_set_is_a( 1, data ); \ + bli_auxinfo_set_is_b( 1, data ); \ +\ + /* Invoke the void pointer-based function for the given datatype. 
*/ \ + bli_call_ft_10 \ + ( \ + dt, \ + opname, \ + k, \ + buf_alpha, \ + buf_a, \ + buf_b, \ + buf_beta, \ + buf_c, rs_c, cs_c, \ + &data, \ + cntx \ + ); \ +} \ + +GENFRONT( gemm_ukernel ) + + +#undef GENFRONT +#define GENFRONT( opname, opnamel, opnameu ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ +\ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + void* buf_b = bli_obj_buffer_at_off( *b ); \ + void* buf_c = bli_obj_buffer_at_off( *c ); \ + inc_t rs_c = bli_obj_row_stride( *c ); \ + inc_t cs_c = bli_obj_col_stride( *c ); \ +\ + auxinfo_t data; \ +\ + /* Fill the auxinfo_t struct in case the micro-kernel uses it. */ \ + bli_auxinfo_set_next_a( buf_a, data ); \ + bli_auxinfo_set_next_b( buf_b, data ); \ + bli_auxinfo_set_is_a( 1, data ); \ + bli_auxinfo_set_is_b( 1, data ); \ +\ + /* Invoke the void pointer-based function for the given datatype. */ \ + if ( bli_obj_is_lower( *a ) ) \ + { \ + bli_call_ft_7 \ + ( \ + dt, \ + opnamel, \ + buf_a, \ + buf_b, \ + buf_c, rs_c, cs_c, \ + &data, \ + cntx \ + ); \ + } \ + else /* if ( bli_obj_is_upper( *a ) ) */ \ + { \ + bli_call_ft_7 \ + ( \ + dt, \ + opnameu, \ + buf_a, \ + buf_b, \ + buf_c, rs_c, cs_c, \ + &data, \ + cntx \ + ); \ + } \ +} \ + +GENFRONT( trsm_ukernel, trsm_l_ukernel, trsm_u_ukernel ) + + +#undef GENFRONT +#define GENFRONT( opname, opnamel, opnameu ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a1x, \ + obj_t* a11, \ + obj_t* bx1, \ + obj_t* b11, \ + obj_t* c11, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c11 ); \ +\ + dim_t k = bli_obj_width( *a1x ); \ + void* buf_a1x = bli_obj_buffer_at_off( *a1x ); \ + void* buf_a11 = bli_obj_buffer_at_off( *a11 ); \ + void* buf_bx1 = bli_obj_buffer_at_off( *bx1 ); \ + void* buf_b11 = bli_obj_buffer_at_off( *b11 ); \ + void* buf_c11 = bli_obj_buffer_at_off( *c11 ); \ + inc_t rs_c = bli_obj_row_stride( *c11 ); \ + inc_t cs_c = 
bli_obj_col_stride( *c11 ); \ + void* buf_alpha = bli_obj_buffer_for_1x1( dt, *alpha ); \ +\ + auxinfo_t data; \ +\ + /* Fill the auxinfo_t struct in case the micro-kernel uses it. */ \ + if ( bli_obj_is_lower( *a11 ) ) \ + { bli_auxinfo_set_next_a( buf_a1x, data ); } \ + else /* if ( bli_obj_is_upper( *a11 ) ) */ \ + { bli_auxinfo_set_next_a( buf_a11, data ); } \ + bli_auxinfo_set_next_b( buf_bx1, data ); \ +\ + /* Invoke the void pointer-based function for the given datatype. */ \ + if ( bli_obj_is_lower( *a11 ) ) \ + { \ + bli_call_ft_11 \ + ( \ + dt, \ + opnamel, \ + k, \ + buf_alpha, \ + buf_a1x, \ + buf_a11, \ + buf_bx1, \ + buf_b11, \ + buf_c11, rs_c, cs_c, \ + &data, \ + cntx \ + ); \ + } \ + else /* if ( bli_obj_is_upper( *a11 ) ) */ \ + { \ + bli_call_ft_11 \ + ( \ + dt, \ + opnameu, \ + k, \ + buf_alpha, \ + buf_a1x, \ + buf_a11, \ + buf_bx1, \ + buf_b11, \ + buf_c11, rs_c, cs_c, \ + &data, \ + cntx \ + ); \ + } \ +} \ + +GENFRONT( gemmtrsm_ukernel, gemmtrsm_l_ukernel, gemmtrsm_u_ukernel ) + diff --git a/frame/1/scalv/bli_scalv_kernel.h b/frame/3/bli_l3_ukr_oapi.h similarity index 69% rename from frame/1/scalv/bli_scalv_kernel.h rename to frame/3/bli_l3_ukr_oapi.h index e5db40102..3647e954a 100644 --- a/frame/1/scalv/bli_scalv_kernel.h +++ b/frame/3/bli_l3_ukr_oapi.h @@ -32,31 +32,54 @@ */ -void bli_scalv_kernel( obj_t* beta, - obj_t* x ); - // -// Prototype the void pointer kernel wrappers. +// Prototype object-based interfaces. 
// -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, varname ) \ +#undef GENPROT +#define GENPROT( opname ) \ \ -void PASTEMAC2(chb,chx,varname)( \ - conj_t conjbeta, \ - dim_t n, \ - void* beta, \ - void* x, inc_t incx \ - ); +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ); -INSERT_GENTPROT2_BASIC( scalv_kernel_void ) +GENPROT( gemm_ukernel ) -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( scalv_kernel_void ) -#endif -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( scalv_kernel_void ) -#endif +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c, \ + cntx_t* cntx \ + ); + +GENPROT( trsm_ukernel ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* alpha, \ + obj_t* a1x, \ + obj_t* a11, \ + obj_t* bx1, \ + obj_t* b11, \ + obj_t* c11, \ + cntx_t* cntx \ + ); + +GENPROT( gemmtrsm_ukernel ) diff --git a/frame/3/bli_l3_ukr_tapi.c b/frame/3/bli_l3_ukr_tapi.c new file mode 100644 index 000000000..7a03f1981 --- /dev/null +++ b/frame/3/bli_l3_ukr_tapi.c @@ -0,0 +1,144 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, tname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + /* Query the context for the function address of the current + datatype's micro-kernel. */ \ + PASTECH2(ch,tname,_ft) f = bli_cntx_get_l3_ukr_dt( dt, kerid, cntx ); \ +\ + /* Invoke the typed function for the given datatype. 
*/ \ + f( \ + k, \ + alpha, \ + a, \ + b, \ + beta, \ + c, rs_c, cs_c, \ + data, \ + cntx \ + ); \ +} \ + +INSERT_GENTFUNC_BASIC2( gemm_ukernel, gemm_ukr, BLIS_GEMM_UKR ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, tname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + /* Query the context for the function address of the current + datatype's micro-kernel. */ \ + PASTECH2(ch,tname,_ft) f = bli_cntx_get_l3_ukr_dt( dt, kerid, cntx ); \ +\ + /* Invoke the typed function for the given datatype. */ \ + f( \ + a, \ + b, \ + c, rs_c, cs_c, \ + data, \ + cntx \ + ); \ +} \ + +INSERT_GENTFUNC_BASIC2( trsm_l_ukernel, trsm_ukr, BLIS_TRSM_L_UKR ) +INSERT_GENTFUNC_BASIC2( trsm_u_ukernel, trsm_ukr, BLIS_TRSM_U_UKR ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, tname, kerid ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a1x, \ + ctype* restrict a11, \ + ctype* restrict bx1, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + /* Query the context for the function address of the current + datatype's micro-kernel. */ \ + PASTECH2(ch,tname,_ft) f = bli_cntx_get_l3_ukr_dt( dt, kerid, cntx ); \ +\ + /* Invoke the typed function for the given datatype. 
*/ \ + f( \ + k, \ + alpha, \ + a1x, \ + a11, \ + bx1, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ +} \ + +INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ukernel, gemmtrsm_ukr, BLIS_GEMMTRSM_L_UKR ) +INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ukernel, gemmtrsm_ukr, BLIS_GEMMTRSM_U_UKR ) + diff --git a/frame/3/bli_l3_ukr_tapi.h b/frame/3/bli_l3_ukr_tapi.h new file mode 100644 index 000000000..2eb4e21c1 --- /dev/null +++ b/frame/3/bli_l3_ukr_tapi.h @@ -0,0 +1,56 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Generate prototypes for level-3 micro-kernel wrappers. +// + +#undef gemm_ukr_name +#define gemm_ukr_name gemm_ukernel + +#undef gemmtrsm_l_ukr_name +#define gemmtrsm_l_ukr_name gemmtrsm_l_ukernel +#undef gemmtrsm_u_ukr_name +#define gemmtrsm_u_ukr_name gemmtrsm_u_ukernel + +#undef trsm_l_ukr_name +#define trsm_l_ukr_name trsm_l_ukernel +#undef trsm_u_ukr_name +#define trsm_u_ukr_name trsm_u_ukernel + +// Include the level-3 micro-kernel API template. + +#include "bli_l3_ukr.h" + diff --git a/frame/3/gemm/bli_gemm.h b/frame/3/gemm/bli_gemm.h index 2ffb96049..46f42dae6 100644 --- a/frame/3/gemm/bli_gemm.h +++ b/frame/3/gemm/bli_gemm.h @@ -33,52 +33,8 @@ */ #include "bli_gemm_cntl.h" -#include "bli_gemm_blocksize.h" -#include "bli_gemm_check.h" #include "bli_gemm_front.h" #include "bli_gemm_int.h" -#include "bli_gemm_ukernel.h" - -#include "bli_gemm_blk_var1f.h" -#include "bli_gemm_blk_var2f.h" -#include "bli_gemm_blk_var3f.h" - -#include "bli_gemm_ker_var2.h" - -// Headers for induced algorithms: -#include "bli_gemm_blk_var4f.h" // 3m3 -#include "bli_gemm_ker_var3.h" // 4m1b -#include "bli_gemm_ker_var4.h" // 3m2 - -#include "bli_gemm_ukr_ref.h" - - -// -// Prototype object-based interface. 
-// -void bli_gemm( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( gemm ) +#include "bli_gemm_var.h" diff --git a/frame/3/gemm/bli_gemm_blk_var1f.c b/frame/3/gemm/bli_gemm_blk_var1f.c index f3dfa284b..319614a14 100644 --- a/frame/3/gemm/bli_gemm_blk_var1f.c +++ b/frame/3/gemm/bli_gemm_blk_var1f.c @@ -37,6 +37,7 @@ void bli_gemm_blk_var1f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -56,13 +57,13 @@ void bli_gemm_blk_var1f( obj_t* a, // Initialize object for packing B. bli_obj_init_pack( &b_pack_s ); bli_packm_init( b, &b_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); // Scale C by beta (if instructed). // Since scalm doesn't support multithreading yet, must be done by chief thread (ew) bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } b_pack = thread_obroadcast( thread, &b_pack_s ); @@ -76,12 +77,12 @@ void bli_gemm_blk_var1f( obj_t* a, // Pack B (if instructed). bli_packm_int( b, b_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), gemm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_t2b( thread, a, - bli_blksz_get_mult_for_obj( a, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the m dimension. @@ -92,7 +93,7 @@ void bli_gemm_blk_var1f( obj_t* a, // This causes the right blocksize to be used if c and a are // complex and b is real. b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. 
bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -103,20 +104,20 @@ void bli_gemm_blk_var1f( obj_t* a, // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), gemm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); // Perform gemm subproblem. @@ -125,6 +126,7 @@ void bli_gemm_blk_var1f( obj_t* a, b_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread ) ); @@ -133,7 +135,7 @@ void bli_gemm_blk_var1f( obj_t* a, // Unpack C1 (if C1 was packed). // Currently must be done by 1 thread bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/gemm/bli_gemm_blk_var2f.c b/frame/3/gemm/bli_gemm_blk_var2f.c index 4b3d9da73..302b9bf9d 100644 --- a/frame/3/gemm/bli_gemm_blk_var2f.c +++ b/frame/3/gemm/bli_gemm_blk_var2f.c @@ -37,6 +37,7 @@ void bli_gemm_blk_var2f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -56,12 +57,12 @@ void bli_gemm_blk_var2f( obj_t* a, // Initialize object for packing A bli_obj_init_pack( &a_pack_s ); bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -75,12 +76,12 @@ void bli_gemm_blk_var2f( obj_t* a, // Pack A (if instructed). 
bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), gemm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_l2r( thread, b, - bli_blksz_get_mult_for_obj( b, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the n dimension. @@ -91,7 +92,7 @@ void bli_gemm_blk_var2f( obj_t* a, // This causes the right blocksize to be used if c and a are // complex and b is real. b_alg = bli_determine_blocksize_f( i, my_end, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for B1 and C1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -102,20 +103,20 @@ void bli_gemm_blk_var2f( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), gemm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); // Perform gemm subproblem. @@ -124,6 +125,7 @@ void bli_gemm_blk_var2f( obj_t* a, b1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread ) ); @@ -132,7 +134,7 @@ void bli_gemm_blk_var2f( obj_t* a, // Unpack C1 (if C1 was packed). 
// Currently must be done by 1 thread bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/gemm/bli_gemm_blk_var3f.c b/frame/3/gemm/bli_gemm_blk_var3f.c index febf38cde..f57f3ea13 100644 --- a/frame/3/gemm/bli_gemm_blk_var3f.c +++ b/frame/3/gemm/bli_gemm_blk_var3f.c @@ -37,6 +37,7 @@ void bli_gemm_blk_var3f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -56,12 +57,12 @@ void bli_gemm_blk_var3f( obj_t* a, // Initialize object for packing C bli_obj_init_pack( &c_pack_s ); bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -75,7 +76,7 @@ void bli_gemm_blk_var3f( obj_t* a, // Pack C (if instructed). bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), gemm_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. @@ -89,7 +90,7 @@ void bli_gemm_blk_var3f( obj_t* a, // the kc blocksize so that we can implement the "nudging" of kc // to be a multiple of mr or nr, as needed. b_alg = bli_gemm_determine_kc_f( i, k_trans, a, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and B1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -100,20 +101,20 @@ void bli_gemm_blk_var3f( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). 
bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), gemm_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), gemm_thread_sub_ipackm( thread ) ); // Perform gemm subproblem. @@ -122,6 +123,7 @@ void bli_gemm_blk_var3f( obj_t* a, b1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread) ); @@ -140,7 +142,7 @@ void bli_gemm_blk_var3f( obj_t* a, // Unpack C (if C was packed). bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), gemm_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/gemm/bli_gemm_cntl.c b/frame/3/gemm/bli_gemm_cntl.c index bf3e60b61..09b128354 100644 --- a/frame/3/gemm/bli_gemm_cntl.c +++ b/frame/3/gemm/bli_gemm_cntl.c @@ -36,16 +36,6 @@ extern scalm_t* scalm_cntl; -blksz_t* gemm_mc; -blksz_t* gemm_nc; -blksz_t* gemm_kc; -blksz_t* gemm_mr; -blksz_t* gemm_nr; -blksz_t* gemm_kr; - -func_t* gemm_ukrs; -func_t* gemm_ref_ukrs; - packm_t* gemm_packa_cntl; packm_t* gemm_packb_cntl; @@ -58,88 +48,13 @@ gemm_t* gemm_cntl; void bli_gemm_cntl_init() { - // Create blocksize objects for each dimension. 
- gemm_mc - = - bli_blksz_obj_create( BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D, - BLIS_DEFAULT_MC_C, BLIS_MAXIMUM_MC_C, - BLIS_DEFAULT_MC_Z, BLIS_MAXIMUM_MC_Z ); - gemm_nc - = - bli_blksz_obj_create( BLIS_DEFAULT_NC_S, BLIS_MAXIMUM_NC_S, - BLIS_DEFAULT_NC_D, BLIS_MAXIMUM_NC_D, - BLIS_DEFAULT_NC_C, BLIS_MAXIMUM_NC_C, - BLIS_DEFAULT_NC_Z, BLIS_MAXIMUM_NC_Z ); - gemm_kc - = - bli_blksz_obj_create( BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D, - BLIS_DEFAULT_KC_C, BLIS_MAXIMUM_KC_C, - BLIS_DEFAULT_KC_Z, BLIS_MAXIMUM_KC_Z ); - gemm_mr - = - bli_blksz_obj_create( BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D, - BLIS_DEFAULT_MR_C, BLIS_PACKDIM_MR_C, - BLIS_DEFAULT_MR_Z, BLIS_PACKDIM_MR_Z ); - gemm_nr - = - bli_blksz_obj_create( BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D, - BLIS_DEFAULT_NR_C, BLIS_PACKDIM_NR_C, - BLIS_DEFAULT_NR_Z, BLIS_PACKDIM_NR_Z ); - gemm_kr - = - bli_blksz_obj_create( BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D, - BLIS_DEFAULT_KR_C, BLIS_PACKDIM_KR_C, - BLIS_DEFAULT_KR_Z, BLIS_PACKDIM_KR_Z ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm_mr, gemm_mc ); - bli_blksz_obj_attach_mult_to( gemm_nr, gemm_nc ); - bli_blksz_obj_attach_mult_to( gemm_kr, gemm_kc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. 
- bli_blksz_obj_attach_mr_nr_to( gemm_mr, gemm_nr, gemm_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm_mr, gemm_nr, gemm_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm_mr, gemm_nr, gemm_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. - gemm_ukrs - = - bli_func_obj_create( BLIS_SGEMM_UKERNEL, BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_DGEMM_UKERNEL, BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_CGEMM_UKERNEL, BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM_UKERNEL, BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create function pointer object for reference micro-kernels. - gemm_ref_ukrs - = - bli_func_obj_create( BLIS_SGEMM_UKERNEL_REF, FALSE, - BLIS_DGEMM_UKERNEL_REF, FALSE, - BLIS_CGEMM_UKERNEL_REF, FALSE, - BLIS_ZGEMM_UKERNEL_REF, FALSE ); - - // Create control tree objects for packm operations. gemm_packa_cntl = bli_packm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_mr, - gemm_kr, + BLIS_MR, + BLIS_KR, FALSE, // do NOT invert diagonal FALSE, // reverse iteration if upper? FALSE, // reverse iteration if lower? @@ -150,8 +65,8 @@ void bli_gemm_cntl_init() = bli_packm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_kr, - gemm_nr, + BLIS_KR, + BLIS_NR, FALSE, // do NOT invert diagonal FALSE, // reverse iteration if upper? FALSE, // reverse iteration if lower? 
@@ -168,8 +83,7 @@ void bli_gemm_cntl_init() = bli_gemm_cntl_obj_create( BLIS_UNB_OPT, BLIS_VARIANT2, - NULL, - gemm_ukrs, + 0, // bszid_t not used by macro-kernel NULL, NULL, NULL, NULL, NULL, NULL ); @@ -179,8 +93,7 @@ void bli_gemm_cntl_init() = bli_gemm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_mc, - NULL, + BLIS_MC, NULL, gemm_packa_cntl, gemm_packb_cntl, @@ -194,8 +107,7 @@ void bli_gemm_cntl_init() = bli_gemm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT3, - gemm_kc, - NULL, + BLIS_KC, NULL, NULL, NULL, @@ -209,8 +121,7 @@ void bli_gemm_cntl_init() = bli_gemm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemm_nc, - NULL, + BLIS_NC, NULL, NULL, NULL, @@ -224,16 +135,6 @@ void bli_gemm_cntl_init() void bli_gemm_cntl_finalize() { - bli_blksz_obj_free( gemm_mc ); - bli_blksz_obj_free( gemm_nc ); - bli_blksz_obj_free( gemm_kc ); - bli_blksz_obj_free( gemm_mr ); - bli_blksz_obj_free( gemm_nr ); - bli_blksz_obj_free( gemm_kr ); - - bli_func_obj_free( gemm_ukrs ); - bli_func_obj_free( gemm_ref_ukrs ); - bli_cntl_obj_free( gemm_packa_cntl ); bli_cntl_obj_free( gemm_packb_cntl ); @@ -245,8 +146,7 @@ void bli_gemm_cntl_finalize() gemm_t* bli_gemm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, - func_t* gemm_ukrs_, + bszid_t bszid, scalm_t* sub_scalm, packm_t* sub_packm_a, packm_t* sub_packm_b, @@ -260,8 +160,7 @@ gemm_t* bli_gemm_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; - cntl->gemm_ukrs = gemm_ukrs_; // avoid name conflict with global symbol + cntl->bszid = bszid; cntl->sub_scalm = sub_scalm; cntl->sub_packm_a = sub_packm_a; cntl->sub_packm_b = sub_packm_b; @@ -272,25 +171,3 @@ gemm_t* bli_gemm_cntl_obj_create( impl_t impl_type, return cntl; } -func_t* bli_gemm_cntl_ukrs( gemm_t* cntl ) -{ - dim_t max_depth = 10; - dim_t i; - - for ( i = 0; ; ++i ) - { - // If the gemm sub-tree is NULL, we are at the leaf. 
- if ( cntl_sub_gemm( cntl ) == NULL ) break; - - // If the above branch was not taken, we can assume the gemm - // sub-tree is valid. Here, we step down into that sub-tree. - cntl = cntl_sub_gemm( cntl ); - - // Safeguard against infinite loops due to bad control tree - // configuration. - if ( i == max_depth ) bli_abort(); - } - - return cntl_gemm_ukrs( cntl ); -} - diff --git a/frame/3/gemm/bli_gemm_cntl.h b/frame/3/gemm/bli_gemm_cntl.h index ed72ed2df..fa620a344 100644 --- a/frame/3/gemm/bli_gemm_cntl.h +++ b/frame/3/gemm/bli_gemm_cntl.h @@ -36,8 +36,7 @@ struct gemm_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; - func_t* gemm_ukrs; + bszid_t bszid; struct scalm_s* sub_scalm; struct packm_s* sub_packm_a; struct packm_s* sub_packm_b; @@ -48,19 +47,16 @@ struct gemm_s typedef struct gemm_s gemm_t; #define cntl_sub_gemm( cntl ) cntl->sub_gemm -#define cntl_gemm_ukrs( cntl ) cntl->gemm_ukrs void bli_gemm_cntl_init( void ); void bli_gemm_cntl_finalize( void ); gemm_t* bli_gemm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, - func_t* gemm_ukrs, + bszid_t bszid, scalm_t* sub_scalm, packm_t* sub_pack_a, packm_t* sub_pack_b, packm_t* sub_pack_c, gemm_t* sub_gemm, unpackm_t* sub_unpack_c ); -func_t* bli_gemm_cntl_ukrs( gemm_t* cntl ); diff --git a/frame/3/gemm/bli_gemm_front.c b/frame/3/gemm/bli_gemm_front.c index 78220aff3..2c4537805 100644 --- a/frame/3/gemm/bli_gemm_front.c +++ b/frame/3/gemm/bli_gemm_front.c @@ -39,6 +39,7 @@ void bli_gemm_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -47,7 +48,7 @@ void bli_gemm_front( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_gemm_check( alpha, a, b, beta, c ); + bli_gemm_check( alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta and return. 
if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -56,6 +57,10 @@ void bli_gemm_front( obj_t* alpha, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -65,14 +70,7 @@ void bli_gemm_front( obj_t* alpha, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. - if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_obj_swap( a_local, b_local ); @@ -86,12 +84,13 @@ void bli_gemm_front( obj_t* alpha, // Invoke the internal back-end. 
bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_gemm_int, + (l3_int_t) bli_gemm_int, alpha, &a_local, &b_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/gemm/bli_gemm_front.h b/frame/3/gemm/bli_gemm_front.h index 9801ec3d8..0176eef37 100644 --- a/frame/3/gemm/bli_gemm_front.h +++ b/frame/3/gemm/bli_gemm_front.h @@ -37,5 +37,6 @@ void bli_gemm_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/gemm/bli_gemm_int.c b/frame/3/gemm/bli_gemm_int.c index d70a302e0..70f523992 100644 --- a/frame/3/gemm/bli_gemm_int.c +++ b/frame/3/gemm/bli_gemm_int.c @@ -39,6 +39,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); @@ -47,8 +48,8 @@ static FUNCPTR_T vars[6][3] = // unblocked optimized unblocked blocked { NULL, NULL, bli_gemm_blk_var1f }, { NULL, bli_gemm_ker_var2, bli_gemm_blk_var2f }, - { NULL, bli_gemm_ker_var3, bli_gemm_blk_var3f }, - { NULL, bli_gemm_ker_var4, bli_gemm_blk_var4f }, + { NULL, NULL, bli_gemm_blk_var3f }, + { NULL, NULL, NULL, }, { NULL, NULL, NULL }, { NULL, NULL, NULL } }; @@ -58,6 +59,7 @@ void bli_gemm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -67,10 +69,11 @@ void bli_gemm_int( obj_t* alpha, varnum_t n; impl_t i; FUNCPTR_T f; + ind_t im; // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_gemm_int_check( alpha, a, b, beta, c, cntl ); + bli_gemm_basic_check( alpha, a, b, beta, c, cntx ); // If C has a zero dimension, return early. if ( bli_obj_has_zero_dim( *c ) ) return; @@ -137,10 +140,21 @@ void bli_gemm_int( obj_t* alpha, // Index into the variant array to extract the correct function pointer. f = vars[n][i]; + // Somewhat hackish support for 3m3, 3m2, and 4m1b method implementations. 
+ im = bli_cntx_get_ind_method( cntx ); + + if ( im != BLIS_NAT ) + { + if ( im == BLIS_3M3 && f == bli_gemm_blk_var1f ) f = bli_gemm_blk_var4f; + else if ( im == BLIS_3M2 && f == bli_gemm_ker_var2 ) f = bli_gemm_ker_var4; + else if ( im == BLIS_4M1B && f == bli_gemm_ker_var2 ) f = bli_gemm_ker_var3; + } + // Invoke the variant. f( &a_local, &b_local, &c_local, + cntx, cntl, thread ); } diff --git a/frame/3/gemm/bli_gemm_int.h b/frame/3/gemm/bli_gemm_int.h index ab071ac27..4d7b41f85 100644 --- a/frame/3/gemm/bli_gemm_int.h +++ b/frame/3/gemm/bli_gemm_int.h @@ -37,6 +37,7 @@ void bli_gemm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/gemm/bli_gemm_ker_var2.c b/frame/3/gemm/bli_gemm_ker_var2.c index edec1990b..4fbe2a8bb 100644 --- a/frame/3/gemm/bli_gemm_ker_var2.c +++ b/frame/3/gemm/bli_gemm_ker_var2.c @@ -49,7 +49,7 @@ typedef void (*FUNCPTR_T)( dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, gemm_thrinfo_t* thread ); @@ -59,6 +59,7 @@ static FUNCPTR_T GENARRAY(ftypes,gemm_ker_var2); void bli_gemm_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -95,10 +96,6 @@ void bli_gemm_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -113,12 +110,6 @@ void bli_gemm_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( schema_a, schema_b, @@ -132,46 +123,51 @@ void bli_gemm_ker_var2( obj_t* a, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, inc_t is_a, \ - dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, inc_t is_b, \ - dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - gemm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + gemm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ /*const dim_t PACKMR = cs_a;*/ \ /*const dim_t PACKNR = rs_b;*/ \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict a_cast = a; \ @@ -290,24 +286,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale the bottom edge of C and add the result from above. */ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -318,9 +322,10 @@ void PASTEMAC(ch,varname)( \ } \ } \ \ +/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var2: c", MR, NR, c11, rs_c, cs_c, "%4.1f", "" );*/ \ /*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var2: b1", k, NR, b1, NR, 1, "%4.1f", "" ); \ PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var2: a1", MR, k, a1, 1, MR, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( gemm_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( gemm_ker_var2 ) diff --git a/frame/3/gemm/bli_gemm_var.h b/frame/3/gemm/bli_gemm_var.h new file mode 100644 index 000000000..bfcb5072b --- /dev/null +++ b/frame/3/gemm/bli_gemm_var.h @@ -0,0 +1,95 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. 
+// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c, \ + cntx_t* cntx, \ + gemm_t* cntl, \ + gemm_thrinfo_t* thread \ + ); + +GENPROT( gemm_blk_var1f ) +GENPROT( gemm_blk_var2f ) +GENPROT( gemm_blk_var3f ) + +GENPROT( gemm_ker_var2 ) + +// Headers for induced algorithms: +GENPROT( gemm_blk_var4f ) // 3m3 +GENPROT( gemm_ker_var3 ) // 4m1b +GENPROT( gemm_ker_var4 ) // 3m2 + + +// +// Prototype BLAS-like interfaces with void pointer operands. +// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + gemm_thrinfo_t* thread \ + ); + +INSERT_GENTPROT_BASIC( gemm_ker_var2 ) + +// Headers for induced algorithms: +INSERT_GENTPROT_BASIC( gemm_ker_var3 ) // 4m1b +INSERT_GENTPROT_BASIC( gemm_ker_var4 ) // 3m2 + diff --git a/frame/3/gemm/ind/bli_gemm_blk_var4f.c b/frame/3/gemm/ind/bli_gemm_blk_var4f.c index 0102675be..317207f9e 100644 --- a/frame/3/gemm/ind/bli_gemm_blk_var4f.c +++ b/frame/3/gemm/ind/bli_gemm_blk_var4f.c @@ -37,17 +37,10 @@ void bli_gemm_blk_var4f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { - extern packm_t* gemm3mh_packa_cntl_ro; - extern packm_t* gemm3mh_packa_cntl_io; - extern packm_t* gemm3mh_packa_cntl_rpi; - - packm_t* packa_cntl_ro = gemm3mh_packa_cntl_ro; - packm_t* packa_cntl_io = gemm3mh_packa_cntl_io; - packm_t* packa_cntl_rpi = gemm3mh_packa_cntl_rpi; - //The s is for "lives on the stack" obj_t b_pack_s; obj_t a1_pack_s, c1_pack_s; @@ -60,17 +53,23 @@ void bli_gemm_blk_var4f( obj_t* a, dim_t i; dim_t b_alg; + // Make a copy of the context for each stage. 
+ cntx_t cntx_ro = *cntx; + cntx_t cntx_io = *cntx; + cntx_t cntx_rpi = *cntx; + if( thread_am_ochief( thread ) ) { // Initialize object for packing B. bli_obj_init_pack( &b_pack_s ); bli_packm_init( b, &b_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); // Scale C by beta (if instructed). - // Since scalm doesn't support multithreading yet, must be done by chief thread (ew) + // Since scalm doesn't support multithreading yet, must be done by + // chief thread (ew) bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } b_pack = thread_obroadcast( thread, &b_pack_s ); @@ -84,12 +83,12 @@ void bli_gemm_blk_var4f( obj_t* a, // Pack B (if instructed). bli_packm_int( b, b_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), gemm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_t2b( thread, a, - bli_blksz_get_mult_for_obj( a, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the m dimension. @@ -100,7 +99,7 @@ void bli_gemm_blk_var4f( obj_t* a, // This causes the right blocksize to be used if c and a are // complex and b is real. b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -109,83 +108,95 @@ void bli_gemm_blk_var4f( obj_t* a, i, b_alg, c, &c1 ); + // Initialize the context for the real-only stage. + bli_gemm3m3_cntx_stage( 0, &cntx_ro ); // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - packa_cntl_ro ); + &cntx_ro, cntl_sub_packm_a( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + &cntx_ro, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). 
bli_packm_int( &a1, a1_pack, - packa_cntl_ro, + &cntx_ro, cntl_sub_packm_a( cntl ), gemm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + &cntx_ro, cntl_sub_packm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); - // Perform gemm subproblem. + // Perform gemm subproblem (real-only). bli_gemm_int( &BLIS_ONE, a1_pack, b_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread ) ); thread_ibarrier( thread ); + // Only apply beta within the first of three subproblems. if ( thread_am_ichief( thread ) ) bli_obj_scalar_reset( c1_pack ); + // Initialize the context for the imag-only stage. + bli_gemm3m3_cntx_stage( 1, &cntx_io ); + // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - packa_cntl_io ); + &cntx_io, cntl_sub_packm_a( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - packa_cntl_io, + &cntx_io, cntl_sub_packm_a( cntl ), gemm_thread_sub_ipackm( thread ) ); - // Perform gemm subproblem. + // Perform gemm subproblem (imag-only). bli_gemm_int( &BLIS_ONE, a1_pack, b_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread ) ); thread_ibarrier( thread ); + // Initialize the context for the real+imag stage. + bli_gemm3m3_cntx_stage( 2, &cntx_rpi ); + // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - packa_cntl_rpi ); + &cntx_rpi, cntl_sub_packm_a( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - packa_cntl_rpi, + &cntx_rpi, cntl_sub_packm_a( cntl ), gemm_thread_sub_ipackm( thread ) ); - // Perform gemm subproblem. + // Perform gemm subproblem (real+imag). 
bli_gemm_int( &BLIS_ONE, a1_pack, b_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), gemm_thread_sub_gemm( thread ) ); @@ -195,7 +206,7 @@ void bli_gemm_blk_var4f( obj_t* a, // Unpack C1 (if C1 was packed). // Currently must be done by 1 thread bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), gemm_thread_sub_ipackm( thread ) ); } @@ -206,8 +217,9 @@ void bli_gemm_blk_var4f( obj_t* a, bli_packm_release( b_pack, cntl_sub_packm_b( cntl ) ); if( thread_am_ichief( thread ) ){ // It doesn't matter which packm cntl node we pass in, as long - // as it is valid, packm_release() will release the mem_t entry. - bli_packm_release( a1_pack, packa_cntl_ro ); + // as it is valid, packm_release() will release the mem_t entry + // stored in a1_pack. + bli_packm_release( a1_pack, cntl_sub_packm_a( cntl ) ); bli_packm_release( c1_pack, cntl_sub_packm_c( cntl ) ); } } diff --git a/frame/3/gemm/ind/bli_gemm_blk_var4f.h b/frame/3/gemm/ind/bli_gemm_blk_var4f.h index 68a4b5c91..123620dc8 100644 --- a/frame/3/gemm/ind/bli_gemm_blk_var4f.h +++ b/frame/3/gemm/ind/bli_gemm_blk_var4f.h @@ -35,6 +35,7 @@ void bli_gemm_blk_var4f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/gemm/ind/bli_gemm_ker_var3.c b/frame/3/gemm/ind/bli_gemm_ker_var3.c index dc6634f0e..880cd5a37 100644 --- a/frame/3/gemm/ind/bli_gemm_ker_var3.c +++ b/frame/3/gemm/ind/bli_gemm_ker_var3.c @@ -49,7 +49,7 @@ typedef void (*FUNCPTR_T)( dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, gemm_thrinfo_t* thread ); @@ -59,6 +59,7 @@ static FUNCPTR_T GENARRAY(ftypes,gemm_ker_var3); void bli_gemm_ker_var3( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -95,10 +96,6 @@ void bli_gemm_ker_var3( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. 
bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -113,12 +110,6 @@ void bli_gemm_ker_var3( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. f( schema_a, schema_b, @@ -132,46 +123,51 @@ void bli_gemm_ker_var3( obj_t* a, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, inc_t is_a, \ - dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, inc_t is_b, \ - dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - gemm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + gemm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. 
*/ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ /*const dim_t PACKMR = cs_a;*/ \ /*const dim_t PACKNR = rs_b;*/ \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict one = PASTEMAC(ch,1); \ @@ -308,25 +304,35 @@ void PASTEMAC(ch,varname)( \ /* Handle interior and edge cases separately. */ \ if ( m_cur == MR && n_cur == NR ) \ { \ +/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3 (4m1b): c before", 8, 6, c11, rs_c, cs_c, "%4.1f", "" );*/ \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - beta_use, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + beta_use, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ +/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3 (4m1b): c after", 8, 6, c11, rs_c, cs_c, "%4.1f", "" );*/ \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale the bottom edge of C and add the result from above. 
*/ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -337,10 +343,11 @@ void PASTEMAC(ch,varname)( \ } \ } \ } \ +/*printf( "gemm_ker_var3 (4m1b): returning\n" );*/ \ \ /*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3: b1", k, NR, b1, NR, 1, "%4.1f", "" ); \ PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3: a1", MR, k, a1, 1, MR, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( gemm_ker_var3, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( gemm_ker_var3 ) diff --git a/frame/3/gemm/ind/bli_gemm_ker_var3.h b/frame/3/gemm/ind/bli_gemm_ker_var3.h index 9aa4e739e..c3703bf2c 100644 --- a/frame/3/gemm/ind/bli_gemm_ker_var3.h +++ b/frame/3/gemm/ind/bli_gemm_ker_var3.h @@ -39,6 +39,7 @@ void bli_gemm_ker_var3( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); @@ -62,6 +63,7 @@ void PASTEMAC(ch,varname)( \ dim_t pd_b, inc_t ps_b, \ void* beta, \ void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ void* gemm_ukr, \ gemm_thrinfo_t* thread \ ); diff --git a/frame/3/gemm/ind/bli_gemm_ker_var4.c b/frame/3/gemm/ind/bli_gemm_ker_var4.c index 7a18967b1..6ed8557e8 100644 --- a/frame/3/gemm/ind/bli_gemm_ker_var4.c +++ b/frame/3/gemm/ind/bli_gemm_ker_var4.c @@ -49,7 +49,7 @@ typedef void (*FUNCPTR_T)( dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, gemm_thrinfo_t* thread ); @@ -59,6 +59,7 @@ static FUNCPTR_T GENARRAY(ftypes,gemm_ker_var4); void bli_gemm_ker_var4( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -95,10 +96,6 @@ void bli_gemm_ker_var4( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -113,12 +110,6 @@ void bli_gemm_ker_var4( obj_t* a, // function pointer. 
f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. f( schema_a, schema_b, @@ -132,46 +123,51 @@ void bli_gemm_ker_var4( obj_t* a, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, inc_t is_a, \ - dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, inc_t is_b, \ - dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - gemm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + gemm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ /*const dim_t PACKMR = cs_a;*/ \ /*const dim_t PACKNR = rs_b;*/ \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. 
*/ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict one = PASTEMAC(ch,1); \ @@ -318,24 +314,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - beta_use, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + beta_use, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale the bottom edge of C and add the result from above. 
*/ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -351,5 +355,5 @@ void PASTEMAC(ch,varname)( \ PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var4: a1", MR, k, a1, 1, MR, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( gemm_ker_var4, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( gemm_ker_var4 ) diff --git a/frame/3/gemm/ind/bli_gemm_ker_var4.h b/frame/3/gemm/ind/bli_gemm_ker_var4.h index 65822fd92..cdd9c026a 100644 --- a/frame/3/gemm/ind/bli_gemm_ker_var4.h +++ b/frame/3/gemm/ind/bli_gemm_ker_var4.h @@ -39,6 +39,7 @@ void bli_gemm_ker_var4( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); @@ -62,6 +63,7 @@ void PASTEMAC(ch,varname)( \ dim_t pd_b, inc_t ps_b, \ void* beta, \ void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ void* gemm_ukr, \ gemm_thrinfo_t* thread \ ); diff --git a/frame/3/gemm/bli_gemm_blk_var1f.h b/frame/3/gemm/old/bli_gemm_blk_var1f.h similarity index 97% rename from frame/3/gemm/bli_gemm_blk_var1f.h rename to frame/3/gemm/old/bli_gemm_blk_var1f.h index efdaa5df5..e16122e6b 100644 --- a/frame/3/gemm/bli_gemm_blk_var1f.h +++ b/frame/3/gemm/old/bli_gemm_blk_var1f.h @@ -35,6 +35,7 @@ void bli_gemm_blk_var1f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/gemm/bli_gemm_blk_var2f.h b/frame/3/gemm/old/bli_gemm_blk_var2f.h similarity index 97% rename from frame/3/gemm/bli_gemm_blk_var2f.h rename to frame/3/gemm/old/bli_gemm_blk_var2f.h index 3718d8623..ee43ab76b 100644 --- a/frame/3/gemm/bli_gemm_blk_var2f.h +++ b/frame/3/gemm/old/bli_gemm_blk_var2f.h @@ -35,6 +35,7 @@ void bli_gemm_blk_var2f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/gemm/bli_gemm_blk_var3f.h b/frame/3/gemm/old/bli_gemm_blk_var3f.h similarity index 97% rename from frame/3/gemm/bli_gemm_blk_var3f.h rename to frame/3/gemm/old/bli_gemm_blk_var3f.h index 0eb2c9fba..7e98bf259 100644 --- a/frame/3/gemm/bli_gemm_blk_var3f.h +++ 
b/frame/3/gemm/old/bli_gemm_blk_var3f.h @@ -35,6 +35,7 @@ void bli_gemm_blk_var3f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/gemm/old/bli_gemm_cntx.c b/frame/3/gemm/old/bli_gemm_cntx.c new file mode 100644 index 000000000..f4b90bf3b --- /dev/null +++ b/frame/3/gemm/old/bli_gemm_cntx.c @@ -0,0 +1,69 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +void bli_gemm_cntx_init( cntx_t* cntx ) +{ + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), given the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 6, + BLIS_NC, BLIS_NR, + BLIS_KC, BLIS_KR, + BLIS_MC, BLIS_MR, + BLIS_NR, BLIS_NR, + BLIS_MR, BLIS_MR, + BLIS_KR, BLIS_KR, + cntx ); + + // Set the pack_t schemas for native execution. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS, + BLIS_PACKED_COL_PANELS, + cntx ); +} + +void bli_gemm_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + diff --git a/frame/3/gemm/old/bli_gemm_cntx.h b/frame/3/gemm/old/bli_gemm_cntx.h new file mode 100644 index 000000000..6daf0a6d2 --- /dev/null +++ b/frame/3/gemm/old/bli_gemm_cntx.h @@ -0,0 +1,37 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +void bli_gemm_cntx_init( void ); +void bli_gemm_cntx_finalize( void ); + diff --git a/frame/3/gemm/bli_gemm_ker_var2.h b/frame/3/gemm/old/bli_gemm_ker_var2.h similarity index 97% rename from frame/3/gemm/bli_gemm_ker_var2.h rename to frame/3/gemm/old/bli_gemm_ker_var2.h index 4d750889e..d125eebac 100644 --- a/frame/3/gemm/bli_gemm_ker_var2.h +++ b/frame/3/gemm/old/bli_gemm_ker_var2.h @@ -39,6 +39,7 @@ void bli_gemm_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); @@ -62,6 +63,7 @@ void PASTEMAC(ch,varname)( \ dim_t pd_b, inc_t ps_b, \ void* beta, \ void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ void* gemm_ukr, \ gemm_thrinfo_t* thread \ ); diff --git a/frame/3/gemm/other/bli_gemm_cntl_exp.c b/frame/3/gemm/other/bli_gemm_cntl_exp.c deleted file mode 100644 index 369ef6b17..000000000 --- a/frame/3/gemm/other/bli_gemm_cntl_exp.c +++ /dev/null @@ -1,123 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -extern blksz_t* gemm_mc; -extern blksz_t* gemm_nc; -extern blksz_t* gemm_kc; -extern blksz_t* gemm_mr; -extern blksz_t* gemm_nr; -extern blksz_t* gemm_kr; - -extern func_t* gemm_ukrs; - -extern packm_t* gemm_packa_cntl; -extern packm_t* gemm_packb_cntl; - -gemm_t* gemm_cntl5; - -gemm_t* gemm_cntl_bp_ke5; -gemm_t* gemm_cntl_pm_bp; -gemm_t* gemm_cntl_mm_pm; -gemm_t* gemm_cntl_vl_mm5; - - -void bli_gemm_cntl_init_exp() -{ - - // - // Create a control tree for packing A, and streaming B and C. 
- // - - gemm_cntl_bp_ke5 - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT5, - NULL, - gemm_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - gemm_cntl_pm_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm_kc, - NULL, - NULL, - gemm_packa_cntl, - NULL, - NULL, - gemm_cntl_bp_ke5, - NULL ); - - gemm_cntl_mm_pm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm_mc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm_cntl_pm_bp, - NULL ); - - gemm_cntl_vl_mm5 - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm_cntl_mm_pm, - NULL ); - - gemm_cntl5 = gemm_cntl_vl_mm5; -} - -void bli_gemm_cntl_finalize_exp() -{ - bli_cntl_obj_free( gemm_cntl_bp_ke5 ); - bli_cntl_obj_free( gemm_cntl_pm_bp ); - bli_cntl_obj_free( gemm_cntl_mm_pm ); - bli_cntl_obj_free( gemm_cntl_vl_mm5 ); -} - diff --git a/frame/3/gemm/other/bli_gemm_ker_var5.c b/frame/3/gemm/other/bli_gemm_ker_var5.c index 6a06482d9..3d500bf99 100644 --- a/frame/3/gemm/other/bli_gemm_ker_var5.c +++ b/frame/3/gemm/other/bli_gemm_ker_var5.c @@ -54,6 +54,7 @@ static FUNCPTR_T GENARRAY(ftypes,gemm_ker_var5); void bli_gemm_ker_var5( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ) { @@ -103,11 +104,11 @@ void bli_gemm_ker_var5( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing + // Extract from the context the func_t object containing // the gemm micro-kernel function addresses, and then query the // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); + gemm_ukrs = bli_cntx_get_l3_ukr( BLIS_GEMM_UKR, cntx ); + gemm_ukr = bli_func_get_dt( dt_exec, gemm_ukrs ); // Invoke the function. 
f( m, diff --git a/frame/3/gemm/other/bli_gemm_ker_var5.h b/frame/3/gemm/other/bli_gemm_ker_var5.h index 9b7a2835b..f0bb50503 100644 --- a/frame/3/gemm/other/bli_gemm_ker_var5.h +++ b/frame/3/gemm/other/bli_gemm_ker_var5.h @@ -39,6 +39,7 @@ void bli_gemm_ker_var5( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, gemm_thrinfo_t* thread ); diff --git a/frame/3/hemm/bli_hemm.h b/frame/3/hemm/bli_hemm.h index 0335c6a86..03a50a221 100644 --- a/frame/3/hemm/bli_hemm.h +++ b/frame/3/hemm/bli_hemm.h @@ -32,37 +32,5 @@ */ -#include "bli_hemm_check.h" #include "bli_hemm_front.h" - -// -// Prototype object-based interface. -// -void bli_hemm( side_t side, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( hemm ) - diff --git a/frame/3/hemm/bli_hemm_front.c b/frame/3/hemm/bli_hemm_front.c index 4d941967d..f9b529807 100644 --- a/frame/3/hemm/bli_hemm_front.c +++ b/frame/3/hemm/bli_hemm_front.c @@ -40,6 +40,7 @@ void bli_hemm_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -48,7 +49,7 @@ void bli_hemm_front( side_t side, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_hemm_check( side, alpha, a, b, beta, c ); + bli_hemm_check( side, alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -57,6 +58,10 @@ void bli_hemm_front( side_t side, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C in case we need to apply transformations. 
bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -66,14 +71,7 @@ void bli_hemm_front( side_t side, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. - if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_toggle_side( side ); bli_obj_toggle_conj( a_local ); @@ -93,12 +91,13 @@ void bli_hemm_front( side_t side, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_gemm_int, + (l3_int_t) bli_gemm_int, alpha, &a_local, &b_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/hemm/bli_hemm_front.h b/frame/3/hemm/bli_hemm_front.h index ec7afad2d..840b24791 100644 --- a/frame/3/hemm/bli_hemm_front.h +++ b/frame/3/hemm/bli_hemm_front.h @@ -38,5 +38,6 @@ void bli_hemm_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/her2k/bli_her2k.h b/frame/3/her2k/bli_her2k.h index 7b2b7174b..336bc67ed 100644 --- a/frame/3/her2k/bli_her2k.h +++ b/frame/3/her2k/bli_her2k.h @@ -32,35 +32,5 @@ */ -#include "bli_her2k_check.h" #include "bli_her2k_front.h" - -// -// Prototype object-based interface. 
-// -void bli_her2k( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROTR -#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROTR_BASIC( her2k ) - diff --git a/frame/3/her2k/bli_her2k_front.c b/frame/3/her2k/bli_her2k_front.c index 25bc59eb7..8007c8e8b 100644 --- a/frame/3/her2k/bli_her2k_front.c +++ b/frame/3/her2k/bli_her2k_front.c @@ -39,6 +39,7 @@ void bli_her2k_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t alpha_conj; @@ -50,7 +51,7 @@ void bli_her2k_front( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_her2k_check( alpha, a, b, beta, c ); + bli_her2k_check( alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta, zero the imaginary components of // the diagonal elements, and return. @@ -61,6 +62,10 @@ void bli_her2k_front( obj_t* alpha, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -86,14 +91,7 @@ void bli_her2k_front( obj_t* alpha, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. 
- if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_obj_swap( a_local, bh_local ); bli_obj_swap( b_local, ah_local ); @@ -125,22 +123,24 @@ void bli_her2k_front( obj_t* alpha, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, alpha, &a_local, &bh_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, &alpha_conj, &b_local, &ah_local, &BLIS_ONE, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/her2k/bli_her2k_front.h b/frame/3/her2k/bli_her2k_front.h index cb744b2f1..8a699c4c4 100644 --- a/frame/3/her2k/bli_her2k_front.h +++ b/frame/3/her2k/bli_her2k_front.h @@ -37,5 +37,6 @@ void bli_her2k_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/herk/bli_herk.h b/frame/3/herk/bli_herk.h index 1129d3ff6..290b8bda3 100644 --- a/frame/3/herk/bli_herk.h +++ b/frame/3/herk/bli_herk.h @@ -32,42 +32,8 @@ */ -#include "bli_herk_check.h" #include "bli_herk_front.h" #include "bli_herk_int.h" -#include "bli_herk_prune.h" -#include "bli_herk_blk_var1f.h" +#include "bli_herk_var.h" -#include "bli_herk_blk_var2f.h" - -#include "bli_herk_blk_var3f.h" - -#include "bli_herk_l_ker_var2.h" -#include "bli_herk_u_ker_var2.h" - - -// -// Prototype object-based interface. 
-// -void bli_herk( obj_t* alpha, - obj_t* a, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROTR -#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype_r* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROTR_BASIC( herk ) diff --git a/frame/3/herk/bli_herk_blk_var1f.c b/frame/3/herk/bli_herk_blk_var1f.c index a733d29b7..d622acbe6 100644 --- a/frame/3/herk/bli_herk_blk_var1f.c +++ b/frame/3/herk/bli_herk_blk_var1f.c @@ -37,6 +37,7 @@ void bli_herk_blk_var1f( obj_t* a, obj_t* ah, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -58,13 +59,13 @@ void bli_herk_blk_var1f( obj_t* a, // Initialize object for packing A'. bli_obj_init_pack( &ah_pack_s ); bli_packm_init( ah, &ah_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); // Scale C by beta (if instructed). // Since scalm doesn't support multithreading yet, must be done by chief thread (ew) bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } ah_pack = thread_obroadcast( thread, &ah_pack_s ); @@ -78,12 +79,12 @@ void bli_herk_blk_var1f( obj_t* a, // Pack A' (if instructed). bli_packm_int( ah, ah_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), herk_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_weighted_t2b( thread, c, - bli_blksz_get_mult_for_obj( a, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the m dimension. @@ -91,7 +92,7 @@ void bli_herk_blk_var1f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. 
bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -102,20 +103,20 @@ void bli_herk_blk_var1f( obj_t* a, // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), herk_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), herk_thread_sub_ipackm( thread ) ); // Perform herk subproblem. @@ -124,6 +125,7 @@ void bli_herk_blk_var1f( obj_t* a, ah_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), herk_thread_sub_herk( thread ) ); @@ -131,7 +133,7 @@ void bli_herk_blk_var1f( obj_t* a, // Unpack C1 (if C1 was packed). bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), herk_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/herk/bli_herk_blk_var2f.c b/frame/3/herk/bli_herk_blk_var2f.c index b4a0ea924..8332d0d19 100644 --- a/frame/3/herk/bli_herk_blk_var2f.c +++ b/frame/3/herk/bli_herk_blk_var2f.c @@ -37,6 +37,7 @@ void bli_herk_blk_var2f( obj_t* a, obj_t* ah, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -58,12 +59,12 @@ void bli_herk_blk_var2f( obj_t* a, // Initialize object for packing A bli_obj_init_pack( &a_pack_s ); bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -77,12 +78,12 @@ void bli_herk_blk_var2f( obj_t* a, // Pack A (if instructed). 
bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), herk_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_weighted_l2r( thread, c, - bli_blksz_get_mult_for_obj( a, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the n dimension. @@ -90,7 +91,7 @@ void bli_herk_blk_var2f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1' and C1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -101,20 +102,20 @@ void bli_herk_blk_var2f( obj_t* a, // Initialize objects for packing A1' and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &ah1, ah1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ) ; // Pack A1' (if instructed). bli_packm_int( &ah1, ah1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), herk_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), herk_thread_sub_ipackm( thread ) ) ; // Perform herk subproblem. @@ -123,6 +124,7 @@ void bli_herk_blk_var2f( obj_t* a, ah1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), herk_thread_sub_herk( thread ) ); @@ -130,7 +132,7 @@ void bli_herk_blk_var2f( obj_t* a, // Unpack C1 (if C1 was packed). 
bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), herk_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/herk/bli_herk_blk_var3f.c b/frame/3/herk/bli_herk_blk_var3f.c index fd20c14cd..0f695b477 100644 --- a/frame/3/herk/bli_herk_blk_var3f.c +++ b/frame/3/herk/bli_herk_blk_var3f.c @@ -37,6 +37,7 @@ void bli_herk_blk_var3f( obj_t* a, obj_t* ah, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -59,12 +60,12 @@ void bli_herk_blk_var3f( obj_t* a, // Initialize object for packing C. bli_obj_init_pack( &c_pack_s ); bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -78,7 +79,7 @@ void bli_herk_blk_var3f( obj_t* a, // Pack C (if instructed). bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), herk_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. @@ -89,7 +90,7 @@ void bli_herk_blk_var3f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, k_trans, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and A1'. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -100,20 +101,20 @@ void bli_herk_blk_var3f( obj_t* a, // Initialize objects for packing A1 and A1'. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &ah1, ah1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), herk_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). 
bli_packm_int( &ah1, ah1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), herk_thread_sub_ipackm( thread ) ); // Perform herk subproblem. @@ -122,6 +123,7 @@ void bli_herk_blk_var3f( obj_t* a, ah1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_gemm( cntl ), herk_thread_sub_herk( thread ) ); @@ -140,7 +142,7 @@ void bli_herk_blk_var3f( obj_t* a, // Unpack C (if C was packed). bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), herk_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/herk/bli_herk_front.c b/frame/3/herk/bli_herk_front.c index 1129f0690..ea7e0b188 100644 --- a/frame/3/herk/bli_herk_front.c +++ b/frame/3/herk/bli_herk_front.c @@ -38,6 +38,7 @@ void bli_herk_front( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -46,7 +47,7 @@ void bli_herk_front( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_herk_check( alpha, a, beta, c ); + bli_herk_check( alpha, a, beta, c, cntx ); // If alpha is zero, scale by beta, zero the imaginary components of // the diagonal elements, and return. @@ -57,6 +58,10 @@ void bli_herk_front( obj_t* alpha, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *c, c_local ); @@ -71,14 +76,7 @@ void bli_herk_front( obj_t* alpha, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. 
- if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_obj_toggle_conj( a_local ); bli_obj_toggle_conj( ah_local ); @@ -91,12 +89,13 @@ void bli_herk_front( obj_t* alpha, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, alpha, &a_local, &ah_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/herk/bli_herk_front.h b/frame/3/herk/bli_herk_front.h index 259905c84..c778399d0 100644 --- a/frame/3/herk/bli_herk_front.h +++ b/frame/3/herk/bli_herk_front.h @@ -36,5 +36,6 @@ void bli_herk_front( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/herk/bli_herk_int.c b/frame/3/herk/bli_herk_int.c index ea6752881..cc1507208 100644 --- a/frame/3/herk/bli_herk_int.c +++ b/frame/3/herk/bli_herk_int.c @@ -39,6 +39,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* ah, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ); @@ -67,6 +68,7 @@ void bli_herk_int( obj_t* alpha, obj_t* ah, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -80,7 +82,7 @@ void bli_herk_int( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_herk_int_check( alpha, a, ah, beta, c, cntl ); + bli_gemm_basic_check( alpha, a, ah, beta, c, cntx ); // If C has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *c ) ) return; @@ -142,6 +144,7 @@ void bli_herk_int( obj_t* alpha, f( &a_local, &ah_local, &c_local, + cntx, cntl, thread ); } diff --git a/frame/3/herk/bli_herk_int.h b/frame/3/herk/bli_herk_int.h index c9155a55e..da35d40a1 100644 --- a/frame/3/herk/bli_herk_int.h +++ b/frame/3/herk/bli_herk_int.h @@ -37,6 +37,7 @@ void bli_herk_int( obj_t* alpha, obj_t* ah, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ); diff --git a/frame/3/herk/bli_herk_l_ker_var2.c b/frame/3/herk/bli_herk_l_ker_var2.c index 4c167410f..d988682a1 100644 --- a/frame/3/herk/bli_herk_l_ker_var2.c +++ b/frame/3/herk/bli_herk_l_ker_var2.c @@ -50,7 +50,7 @@ typedef void (*FUNCPTR_T)( dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, herk_thrinfo_t* thread ); @@ -60,6 +60,7 @@ static FUNCPTR_T GENARRAY(ftypes,herk_l_ker_var2); void bli_herk_l_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -98,10 +99,6 @@ void bli_herk_l_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -116,12 +113,6 @@ void bli_herk_l_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffc, schema_a, @@ -136,47 +127,52 @@ void bli_herk_l_ker_var2( obj_t* a, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffc, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, inc_t is_a, \ - dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, inc_t is_b, \ - dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - herk_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffc, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + herk_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ /*const dim_t PACKMR = cs_a;*/ \ /*const dim_t PACKNR = rs_b;*/ \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict a_cast = a; \ @@ -333,13 +329,17 @@ void PASTEMAC(ch,varname)( \ if ( bli_intersects_diag_n( diagoffc_ij, m_cur, n_cur ) ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale C and add the result to only the stored part. */ \ PASTEMAC(ch,xpbys_mxn_l)( diagoffc_ij, \ @@ -354,24 +354,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale the edge of C and add the result. 
*/ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -384,5 +392,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( herk_l_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( herk_l_ker_var2 ) diff --git a/frame/3/herk/bli_herk_u_ker_var2.c b/frame/3/herk/bli_herk_u_ker_var2.c index fbb7d00af..16f9bb866 100644 --- a/frame/3/herk/bli_herk_u_ker_var2.c +++ b/frame/3/herk/bli_herk_u_ker_var2.c @@ -50,7 +50,7 @@ typedef void (*FUNCPTR_T)( dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, herk_thrinfo_t* thread ); @@ -60,6 +60,7 @@ static FUNCPTR_T GENARRAY(ftypes,herk_u_ker_var2); void bli_herk_u_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, herk_thrinfo_t* thread ) { @@ -98,10 +99,6 @@ void bli_herk_u_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -116,12 +113,6 @@ void bli_herk_u_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffc, schema_a, @@ -136,47 +127,52 @@ void bli_herk_u_ker_var2( obj_t* a, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffc, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, inc_t is_a, \ - dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, inc_t is_b, \ - dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - herk_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffc, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, inc_t is_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, inc_t is_b, \ + dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + herk_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ /*const dim_t PACKMR = cs_a;*/ \ /*const dim_t PACKNR = rs_b;*/ \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict a_cast = a; \ @@ -333,13 +329,17 @@ void PASTEMAC(ch,varname)( \ if ( bli_intersects_diag_n( diagoffc_ij, m_cur, n_cur ) ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale C and add the result to only the stored part. */ \ PASTEMAC(ch,xpbys_mxn_u)( diagoffc_ij, \ @@ -354,24 +354,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Scale the edge of C and add the result. */ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -384,5 +392,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC( herk_u_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( herk_u_ker_var2 ) diff --git a/frame/3/herk/bli_herk_var.h b/frame/3/herk/bli_herk_var.h new file mode 100644 index 000000000..34a0e5517 --- /dev/null +++ b/frame/3/herk/bli_herk_var.h @@ -0,0 +1,89 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+
+ Copyright (C) 2014, The University of Texas at Austin
+
+ Redistribution and use in source and binary forms, with or without
+ modification, are permitted provided that the following conditions are
+ met:
+ - Redistributions of source code must retain the above copyright
+ notice, this list of conditions and the following disclaimer.
+ - Redistributions in binary form must reproduce the above copyright
+ notice, this list of conditions and the following disclaimer in the
+ documentation and/or other materials provided with the distribution.
+ - Neither the name of The University of Texas at Austin nor the names
+ of its contributors may be used to endorse or promote products
+ derived from this software without specific prior written permission.
+
+ THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
+ "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
+ LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
+ A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
+ HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+ SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+ LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+ DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+ THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+ (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+ OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+
+//
+// Prototype object-based interfaces.
+//
+
+#undef GENPROT
+#define GENPROT( opname ) \
+\
+void PASTEMAC0(opname) \
+ ( \
+ obj_t* a, \
+ obj_t* ah, \
+ obj_t* c, \
+ cntx_t* cntx, \
+ gemm_t* cntl, \
+ herk_thrinfo_t* thread \
+ );
+
+GENPROT( herk_blk_var1f )
+GENPROT( herk_blk_var2f )
+GENPROT( herk_blk_var3f )
+
+GENPROT( herk_l_ker_var2 )
+GENPROT( herk_u_ker_var2 )
+
+
+//
+// Prototype BLAS-like interfaces with void pointer operands.
+//
+
+#undef GENTPROT
+#define GENTPROT( ctype, ch, varname ) \
+\
+void PASTEMAC(ch,varname) \
+ ( \
+ doff_t diagoffc, \
+ pack_t schema_a, \
+ pack_t schema_b, \
+ dim_t m, \
+ dim_t n, \
+ dim_t k, \
+ void* alpha, \
+ void* a, inc_t cs_a, inc_t is_a, \
+ dim_t pd_a, inc_t ps_a, \
+ void* b, inc_t rs_b, inc_t is_b, \
+ dim_t pd_b, inc_t ps_b, \
+ void* beta, \
+ void* c, inc_t rs_c, inc_t cs_c, \
+ cntx_t* cntx, \
+ herk_thrinfo_t* thread \
+ );
+
+INSERT_GENTPROT_BASIC( herk_l_ker_var2 )
+INSERT_GENTPROT_BASIC( herk_u_ker_var2 )
+
diff --git a/frame/3/herk/bli_herk_blk_var1f.h b/frame/3/herk/old/bli_herk_blk_var1f.h
similarity index 97%
rename from frame/3/herk/bli_herk_blk_var1f.h
rename to frame/3/herk/old/bli_herk_blk_var1f.h
index 4c6b3a112..bd1d8a95f 100644
--- a/frame/3/herk/bli_herk_blk_var1f.h
+++ b/frame/3/herk/old/bli_herk_blk_var1f.h
@@ -35,6 +35,7 @@ void bli_herk_blk_var1f( obj_t* a,
 obj_t* ah,
 obj_t* c,
+ cntx_t* cntx,
 gemm_t* cntl,
 herk_thrinfo_t* thread );
diff --git a/frame/3/herk/bli_herk_blk_var2f.h b/frame/3/herk/old/bli_herk_blk_var2f.h
similarity index 97%
rename from frame/3/herk/bli_herk_blk_var2f.h
rename to frame/3/herk/old/bli_herk_blk_var2f.h
index 044bd7041..f436a0082 100644
--- a/frame/3/herk/bli_herk_blk_var2f.h
+++ b/frame/3/herk/old/bli_herk_blk_var2f.h
@@ -35,6 +35,7 @@ void bli_herk_blk_var2f( obj_t* a,
 obj_t* ah,
 obj_t* c,
+ cntx_t* cntx,
 gemm_t* cntl,
 herk_thrinfo_t* thread );
diff --git a/frame/3/herk/bli_herk_blk_var3f.h b/frame/3/herk/old/bli_herk_blk_var3f.h
similarity index 97%
rename from frame/3/herk/bli_herk_blk_var3f.h
rename to frame/3/herk/old/bli_herk_blk_var3f.h
index 7dd945699..800a44b8d 100644
--- a/frame/3/herk/bli_herk_blk_var3f.h
+++ b/frame/3/herk/old/bli_herk_blk_var3f.h
@@ -35,6 +35,7 @@ void bli_herk_blk_var3f( obj_t* a,
 obj_t* ah,
 obj_t* c,
+ cntx_t* cntx,
 gemm_t* cntl,
 herk_thrinfo_t* thread );
diff --git a/frame/3/herk/bli_herk_l_ker_var2.h b/frame/3/herk/old/bli_herk_l_ker_var2.h
similarity index 97%
rename from frame/3/herk/bli_herk_l_ker_var2.h
rename to frame/3/herk/old/bli_herk_l_ker_var2.h
index a03b79bf2..09656596d 100644
--- a/frame/3/herk/bli_herk_l_ker_var2.h
+++ b/frame/3/herk/old/bli_herk_l_ker_var2.h
@@ -39,6 +39,7 @@ void bli_herk_l_ker_var2( obj_t* a,
 obj_t* b,
 obj_t* c,
+ cntx_t* cntx,
 gemm_t* cntl,
 herk_thrinfo_t* thread );
@@ -63,6 +64,7 @@ void PASTEMAC(ch,varname)( \
 dim_t pd_b, inc_t ps_b, \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
+ cntx_t* cntx, \
 void* gemm_ukr, \
 herk_thrinfo_t* thread \
 );
diff --git a/frame/3/herk/bli_herk_u_ker_var2.h b/frame/3/herk/old/bli_herk_u_ker_var2.h
similarity index 97%
rename from frame/3/herk/bli_herk_u_ker_var2.h
rename to frame/3/herk/old/bli_herk_u_ker_var2.h
index 05feeba18..0701db148 100644
--- a/frame/3/herk/bli_herk_u_ker_var2.h
+++ b/frame/3/herk/old/bli_herk_u_ker_var2.h
@@ -39,6 +39,7 @@ void bli_herk_u_ker_var2( obj_t* a,
 obj_t* b,
 obj_t* c,
+ cntx_t* cntx,
 gemm_t* cntl,
 herk_thrinfo_t* thread );
@@ -63,6 +64,7 @@ void PASTEMAC(ch,varname)( \
 dim_t pd_b, inc_t ps_b, \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
+ cntx_t* cntx, \
 void* gemm_ukr, \
 herk_thrinfo_t* thread \
 );
diff --git a/frame/3/gemm/bli_gemm.c b/frame/3/old/bli_gemm.c
similarity index 90%
rename from frame/3/gemm/bli_gemm.c
rename to frame/3/old/bli_gemm.c
index 15454a391..c612cf865 100644
--- a/frame/3/gemm/bli_gemm.c
+++ b/frame/3/old/bli_gemm.c
@@ -34,27 +34,17 @@ #include "blis.h"
-extern gemm_t* gemm_cntl;
-
 void bli_gemm( obj_t* alpha,
 obj_t* a,
 obj_t* b,
 obj_t* beta,
 obj_t* c )
{ - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_gemmind_has_avail( dt ) ) - { - gemm_fp_t func = bli_gemmind_get_avail( dt ); - - return func( alpha, a, b, beta, c ); - } - - bli_gemm_front( alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_gemmind( alpha, a, b, beta, c ); } diff --git a/frame/3/gemm/bli_gemm_blocksize.c b/frame/3/old/bli_gemm_blocksize.c similarity index 85% rename from frame/3/gemm/bli_gemm_blocksize.c rename to frame/3/old/bli_gemm_blocksize.c index fdf877fba..606e68ef3 100644 --- a/frame/3/gemm/bli_gemm_blocksize.c +++ b/frame/3/old/bli_gemm_blocksize.c @@ -38,12 +38,14 @@ dim_t bli_gemm_determine_kc_f( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mnr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "forward" (ie: top to bottom, left to right, top-left @@ -52,6 +54,7 @@ dim_t bli_gemm_determine_kc_f( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); @@ -61,13 +64,13 @@ dim_t bli_gemm_determine_kc_f( dim_t i, // the blocksizes unchanged. 
if ( bli_obj_root_is_herm_or_symm( *a ) ) { - mnr = bli_blksz_get_mr( dt, bsize ); + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); } else if ( bli_obj_root_is_herm_or_symm( *b ) ) { - mnr = bli_blksz_get_nr( dt, bsize ); + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); } @@ -82,12 +85,14 @@ dim_t bli_gemm_determine_kc_b( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mnr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "backward" (ie: bottom to top, right to left, bottom-right @@ -96,6 +101,7 @@ dim_t bli_gemm_determine_kc_b( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); @@ -105,13 +111,13 @@ dim_t bli_gemm_determine_kc_b( dim_t i, // the blocksizes unchanged. 
if ( bli_obj_root_is_herm_or_symm( *a ) ) { - mnr = bli_blksz_get_mr( dt, bsize ); + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); } else if ( bli_obj_root_is_herm_or_symm( *b ) ) { - mnr = bli_blksz_get_nr( dt, bsize ); + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); } diff --git a/frame/3/gemm/bli_gemm_blocksize.h b/frame/3/old/bli_gemm_blocksize.h similarity index 91% rename from frame/3/gemm/bli_gemm_blocksize.h rename to frame/3/old/bli_gemm_blocksize.h index 62a25d753..4762c123b 100644 --- a/frame/3/gemm/bli_gemm_blocksize.h +++ b/frame/3/old/bli_gemm_blocksize.h @@ -36,9 +36,11 @@ dim_t bli_gemm_determine_kc_f( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); dim_t bli_gemm_determine_kc_b( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); diff --git a/frame/3/gemm/bli_gemm_check.c b/frame/3/old/bli_gemm_check.c similarity index 99% rename from frame/3/gemm/bli_gemm_check.c rename to frame/3/old/bli_gemm_check.c index 558b3fee7..8b052bc5d 100644 --- a/frame/3/gemm/bli_gemm_check.c +++ b/frame/3/old/bli_gemm_check.c @@ -115,6 +115,7 @@ void bli_gemm_int_check( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { err_t e_val; diff --git a/frame/3/gemm/bli_gemm_check.h b/frame/3/old/bli_gemm_check.h similarity index 98% rename from frame/3/gemm/bli_gemm_check.h rename to frame/3/old/bli_gemm_check.h index 7d38288d7..0469e3c09 100644 --- a/frame/3/gemm/bli_gemm_check.h +++ b/frame/3/old/bli_gemm_check.h @@ -49,5 +49,6 @@ void bli_gemm_int_check( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/gemm/bli_gemm_ukernel.c b/frame/3/old/bli_gemm_ukernel.c similarity index 96% rename from 
frame/3/gemm/bli_gemm_ukernel.c rename to frame/3/old/bli_gemm_ukernel.c index 8b80cdfe6..681eba569 100644 --- a/frame/3/gemm/bli_gemm_ukernel.c +++ b/frame/3/old/bli_gemm_ukernel.c @@ -44,18 +44,18 @@ typedef void (*FUNCPTR_T)( void* beta, void* c, inc_t rs_c, inc_t cs_c, auxinfo_t* data, + cntx_t* cntx, void* ukr ); static FUNCPTR_T GENARRAY(ftypes,gemm_ukernel_void); -extern func_t* gemm_ukrs; - void bli_gemm_ukernel( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, - obj_t* c ) + obj_t* c, + cntx_t* cntx ) { num_t dt = bli_obj_datatype( *c ); @@ -88,6 +88,7 @@ void bli_gemm_ukernel( obj_t* alpha, bli_auxinfo_set_is_b( 1, data ); // Query the function address from the micro-kernel func_t object. + gemm_ukrs = bli_cntx_get_l3_ukr( BLIS_GEMM_UKR, cntx ); gemm_ukr = bli_func_obj_query( dt, gemm_ukrs ); // Index into the type combination array to extract the correct diff --git a/frame/3/gemm/bli_gemm_ukernel.h b/frame/3/old/bli_gemm_ukernel.h similarity index 95% rename from frame/3/gemm/bli_gemm_ukernel.h rename to frame/3/old/bli_gemm_ukernel.h index 43a6b0bcd..6eafa6187 100644 --- a/frame/3/gemm/bli_gemm_ukernel.h +++ b/frame/3/old/bli_gemm_ukernel.h @@ -36,7 +36,8 @@ void bli_gemm_ukernel( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, - obj_t* c ); + obj_t* c, + cntx_t* cntx ); // @@ -54,6 +55,7 @@ void PASTEMAC(ch,varname)( \ void* beta, \ void* c, inc_t rs_c, inc_t cs_c, \ auxinfo_t* data, \ + cntx_t* cntx, \ void* ukr \ ); diff --git a/frame/3/gemm/ukernels/bli_gemm_ukr_ref.h b/frame/3/old/bli_gemm_ukr_ref.h similarity index 100% rename from frame/3/gemm/ukernels/bli_gemm_ukr_ref.h rename to frame/3/old/bli_gemm_ukr_ref.h diff --git a/frame/3/trsm/ukernels/bli_gemmtrsm_l_ukr_ref.c b/frame/3/old/bli_gemmtrsm_l_ukr_ref.c similarity index 100% rename from frame/3/trsm/ukernels/bli_gemmtrsm_l_ukr_ref.c rename to frame/3/old/bli_gemmtrsm_l_ukr_ref.c diff --git a/frame/3/trsm/ukernels/bli_gemmtrsm_l_ukr_ref.h b/frame/3/old/bli_gemmtrsm_l_ukr_ref.h similarity index 
100% rename from frame/3/trsm/ukernels/bli_gemmtrsm_l_ukr_ref.h rename to frame/3/old/bli_gemmtrsm_l_ukr_ref.h diff --git a/frame/3/trsm/ukernels/bli_gemmtrsm_u_ukr_ref.c b/frame/3/old/bli_gemmtrsm_u_ukr_ref.c similarity index 100% rename from frame/3/trsm/ukernels/bli_gemmtrsm_u_ukr_ref.c rename to frame/3/old/bli_gemmtrsm_u_ukr_ref.c diff --git a/frame/3/trsm/ukernels/bli_gemmtrsm_u_ukr_ref.h b/frame/3/old/bli_gemmtrsm_u_ukr_ref.h similarity index 100% rename from frame/3/trsm/ukernels/bli_gemmtrsm_u_ukr_ref.h rename to frame/3/old/bli_gemmtrsm_u_ukr_ref.h diff --git a/frame/3/trsm/bli_gemmtrsm_ukernel.c b/frame/3/old/bli_gemmtrsm_ukernel.c similarity index 91% rename from frame/3/trsm/bli_gemmtrsm_ukernel.c rename to frame/3/old/bli_gemmtrsm_ukernel.c index 4c1791023..7d9f28d53 100644 --- a/frame/3/trsm/bli_gemmtrsm_ukernel.c +++ b/frame/3/old/bli_gemmtrsm_ukernel.c @@ -46,21 +46,20 @@ typedef void (*FUNCPTR_T)( void* b11, void* c11, inc_t rs_c, inc_t cs_c, auxinfo_t* data, + cntx_t* cntx, void* ukr ); static FUNCPTR_T GENARRAY(ftypes_l,gemmtrsm_l_ukernel_void); static FUNCPTR_T GENARRAY(ftypes_u,gemmtrsm_u_ukernel_void); -extern func_t* gemmtrsm_l_ukrs; -extern func_t* gemmtrsm_u_ukrs; - void bli_gemmtrsm_ukernel( obj_t* alpha, obj_t* a1x, obj_t* a11, obj_t* bx1, obj_t* b11, - obj_t* c11 ) + obj_t* c11, + cntx_t* cntx ) { dim_t k = bli_obj_width( *a1x ); @@ -84,6 +83,7 @@ void bli_gemmtrsm_ukernel( obj_t* alpha, FUNCPTR_T f; + func_t* gemmtrsm_ukrs; void* gemmtrsm_ukr; @@ -96,9 +96,15 @@ void bli_gemmtrsm_ukernel( obj_t* alpha, // Query the function address from the micro-kernel func_t object. 
if ( bli_obj_is_lower( *a11 ) ) - gemmtrsm_ukr = bli_func_obj_query( dt, gemmtrsm_l_ukrs ); + { + gemmtrsm_ukrs = bli_cntx_get_l3_ukr( BLIS_GEMMTRSM_L_UKR, cntx ); + gemmtrsm_ukr = bli_func_obj_query( dt, gemmtrsm_ukrs ); + } else - gemmtrsm_ukr = bli_func_obj_query( dt, gemmtrsm_u_ukrs ); + { + gemmtrsm_ukrs = bli_cntx_get_l3_ukr( BLIS_GEMMTRSM_U_UKR, cntx ); + gemmtrsm_ukr = bli_func_obj_query( dt, gemmtrsm_ukrs ); + } // Index into the type combination array to extract the correct // function pointer. diff --git a/frame/3/trsm/bli_gemmtrsm_ukernel.h b/frame/3/old/bli_gemmtrsm_ukernel.h similarity index 95% rename from frame/3/trsm/bli_gemmtrsm_ukernel.h rename to frame/3/old/bli_gemmtrsm_ukernel.h index 22a8856b8..074c25c84 100644 --- a/frame/3/trsm/bli_gemmtrsm_ukernel.h +++ b/frame/3/old/bli_gemmtrsm_ukernel.h @@ -37,7 +37,8 @@ void bli_gemmtrsm_ukernel( obj_t* alpha, obj_t* a11, obj_t* bx1, obj_t* b11, - obj_t* c11 ); + obj_t* c11, + cntx_t* cntx ); // @@ -56,6 +57,7 @@ void PASTEMAC(ch,varname)( \ void* b11, \ void* c11, inc_t rs_c, inc_t cs_c, \ auxinfo_t* data, \ + cntx_t* cntx, \ void* ukr \ ); diff --git a/frame/3/hemm/bli_hemm.c b/frame/3/old/bli_hemm.c similarity index 90% rename from frame/3/hemm/bli_hemm.c rename to frame/3/old/bli_hemm.c index 64cac73bd..119c85043 100644 --- a/frame/3/hemm/bli_hemm.c +++ b/frame/3/old/bli_hemm.c @@ -34,8 +34,6 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_hemm( side_t side, obj_t* alpha, obj_t* a, @@ -43,19 +41,11 @@ void bli_hemm( side_t side, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. 
- if ( bli_hemmind_has_avail( dt ) ) - { - hemm_fp_t func = bli_hemmind_get_avail( dt ); - - return func( side, alpha, a, b, beta, c ); - } - - bli_hemm_front( side, alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_hemmind( side, alpha, a, b, beta, c ); } diff --git a/frame/3/hemm/bli_hemm_check.c b/frame/3/old/bli_hemm_check.c similarity index 99% rename from frame/3/hemm/bli_hemm_check.c rename to frame/3/old/bli_hemm_check.c index 044ca359e..1479c7496 100644 --- a/frame/3/hemm/bli_hemm_check.c +++ b/frame/3/old/bli_hemm_check.c @@ -149,6 +149,7 @@ void bli_hemm_int_check( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { err_t e_val; diff --git a/frame/3/hemm/bli_hemm_check.h b/frame/3/old/bli_hemm_check.h similarity index 98% rename from frame/3/hemm/bli_hemm_check.h rename to frame/3/old/bli_hemm_check.h index 94798bb7f..45e21d3df 100644 --- a/frame/3/hemm/bli_hemm_check.h +++ b/frame/3/old/bli_hemm_check.h @@ -52,5 +52,6 @@ void bli_hemm_int_check( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/her2k/bli_her2k.c b/frame/3/old/bli_her2k.c similarity index 90% rename from frame/3/her2k/bli_her2k.c rename to frame/3/old/bli_her2k.c index a8eb53e96..5abce4385 100644 --- a/frame/3/her2k/bli_her2k.c +++ b/frame/3/old/bli_her2k.c @@ -34,27 +34,17 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_her2k( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. 
- if ( bli_her2kind_has_avail( dt ) ) - { - her2k_fp_t func = bli_her2kind_get_avail( dt ); - - return func( alpha, a, b, beta, c ); - } - - bli_her2k_front( alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_her2kind( alpha, a, b, beta, c ); } diff --git a/frame/3/her2k/bli_her2k_check.c b/frame/3/old/bli_her2k_check.c similarity index 100% rename from frame/3/her2k/bli_her2k_check.c rename to frame/3/old/bli_her2k_check.c diff --git a/frame/3/her2k/bli_her2k_check.h b/frame/3/old/bli_her2k_check.h similarity index 100% rename from frame/3/her2k/bli_her2k_check.h rename to frame/3/old/bli_her2k_check.h diff --git a/frame/3/herk/bli_herk.c b/frame/3/old/bli_herk.c similarity index 89% rename from frame/3/herk/bli_herk.c rename to frame/3/old/bli_herk.c index 8b2cee906..4a93c2e12 100644 --- a/frame/3/herk/bli_herk.c +++ b/frame/3/old/bli_herk.c @@ -34,26 +34,16 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_herk( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_herkind_has_avail( dt ) ) - { - herk_fp_t func = bli_herkind_get_avail( dt ); - - return func( alpha, a, beta, c ); - } - - bli_herk_front( alpha, a, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. 
+ bli_herkind( alpha, a, beta, c ); } diff --git a/frame/3/herk/bli_herk_check.c b/frame/3/old/bli_herk_check.c similarity index 99% rename from frame/3/herk/bli_herk_check.c rename to frame/3/old/bli_herk_check.c index 967e8f25f..0ebc7dbcd 100644 --- a/frame/3/herk/bli_herk_check.c +++ b/frame/3/old/bli_herk_check.c @@ -127,6 +127,7 @@ void bli_herk_int_check( obj_t* alpha, obj_t* ah, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { err_t e_val; diff --git a/frame/3/herk/bli_herk_check.h b/frame/3/old/bli_herk_check.h similarity index 98% rename from frame/3/herk/bli_herk_check.h rename to frame/3/old/bli_herk_check.h index d91fa7946..ceb215ea5 100644 --- a/frame/3/herk/bli_herk_check.h +++ b/frame/3/old/bli_herk_check.h @@ -48,5 +48,6 @@ void bli_herk_int_check( obj_t* alpha, obj_t* ah, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/herk/bli_herk_prune.c b/frame/3/old/bli_herk_prune.c similarity index 100% rename from frame/3/herk/bli_herk_prune.c rename to frame/3/old/bli_herk_prune.c diff --git a/frame/3/herk/bli_herk_prune.h b/frame/3/old/bli_herk_prune.h similarity index 100% rename from frame/3/herk/bli_herk_prune.h rename to frame/3/old/bli_herk_prune.h diff --git a/frame/3/symm/bli_symm.c b/frame/3/old/bli_symm.c similarity index 90% rename from frame/3/symm/bli_symm.c rename to frame/3/old/bli_symm.c index ad81db253..33fc56dc6 100644 --- a/frame/3/symm/bli_symm.c +++ b/frame/3/old/bli_symm.c @@ -34,8 +34,6 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_symm( side_t side, obj_t* alpha, obj_t* a, @@ -43,19 +41,11 @@ void bli_symm( side_t side, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. 
- if ( bli_symmind_has_avail( dt ) ) - { - symm_fp_t func = bli_symmind_get_avail( dt ); - - return func( side, alpha, a, b, beta, c ); - } - - bli_symm_front( side, alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_symmind( side, alpha, a, b, beta, c ); } diff --git a/frame/3/symm/bli_symm_check.c b/frame/3/old/bli_symm_check.c similarity index 100% rename from frame/3/symm/bli_symm_check.c rename to frame/3/old/bli_symm_check.c diff --git a/frame/3/symm/bli_symm_check.h b/frame/3/old/bli_symm_check.h similarity index 100% rename from frame/3/symm/bli_symm_check.h rename to frame/3/old/bli_symm_check.h diff --git a/frame/3/syr2k/bli_syr2k.c b/frame/3/old/bli_syr2k.c similarity index 90% rename from frame/3/syr2k/bli_syr2k.c rename to frame/3/old/bli_syr2k.c index e0f867a14..8f8469926 100644 --- a/frame/3/syr2k/bli_syr2k.c +++ b/frame/3/old/bli_syr2k.c @@ -34,27 +34,17 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_syr2k( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_syr2kind_has_avail( dt ) ) - { - syr2k_fp_t func = bli_syr2kind_get_avail( dt ); - - return func( alpha, a, b, beta, c ); - } - - bli_syr2k_front( alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. 
+ bli_syr2kind( alpha, a, b, beta, c ); } diff --git a/frame/3/syr2k/bli_syr2k_check.c b/frame/3/old/bli_syr2k_check.c similarity index 100% rename from frame/3/syr2k/bli_syr2k_check.c rename to frame/3/old/bli_syr2k_check.c diff --git a/frame/3/syr2k/bli_syr2k_check.h b/frame/3/old/bli_syr2k_check.h similarity index 100% rename from frame/3/syr2k/bli_syr2k_check.h rename to frame/3/old/bli_syr2k_check.h diff --git a/frame/3/syrk/bli_syrk.c b/frame/3/old/bli_syrk.c similarity index 89% rename from frame/3/syrk/bli_syrk.c rename to frame/3/old/bli_syrk.c index 44b9e6f8f..8e4b72315 100644 --- a/frame/3/syrk/bli_syrk.c +++ b/frame/3/old/bli_syrk.c @@ -34,26 +34,16 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_syrk( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_syrkind_has_avail( dt ) ) - { - syrk_fp_t func = bli_syrkind_get_avail( dt ); - - return func( alpha, a, beta, c ); - } - - bli_syrk_front( alpha, a, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. 
+ bli_syrkind( alpha, a, beta, c ); } diff --git a/frame/3/syrk/bli_syrk_check.c b/frame/3/old/bli_syrk_check.c similarity index 100% rename from frame/3/syrk/bli_syrk_check.c rename to frame/3/old/bli_syrk_check.c diff --git a/frame/3/syrk/bli_syrk_check.h b/frame/3/old/bli_syrk_check.h similarity index 100% rename from frame/3/syrk/bli_syrk_check.h rename to frame/3/old/bli_syrk_check.h diff --git a/frame/3/trmm/bli_trmm.c b/frame/3/old/bli_trmm.c similarity index 89% rename from frame/3/trmm/bli_trmm.c rename to frame/3/old/bli_trmm.c index a80106db8..28dd60889 100644 --- a/frame/3/trmm/bli_trmm.c +++ b/frame/3/old/bli_trmm.c @@ -34,26 +34,16 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_trmm( side_t side, obj_t* alpha, obj_t* a, obj_t* b ) { - num_t dt = bli_obj_datatype( *b ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_trmmind_has_avail( dt ) ) - { - trmm_fp_t func = bli_trmmind_get_avail( dt ); - - return func( side, alpha, a, b ); - } - - bli_trmm_front( side, alpha, a, b, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_trmmind( side, alpha, a, b ); } diff --git a/frame/3/trmm3/bli_trmm3.c b/frame/3/old/bli_trmm3.c similarity index 90% rename from frame/3/trmm3/bli_trmm3.c rename to frame/3/old/bli_trmm3.c index 7ae861f03..57011692c 100644 --- a/frame/3/trmm3/bli_trmm3.c +++ b/frame/3/old/bli_trmm3.c @@ -34,8 +34,6 @@ #include "blis.h" -extern gemm_t* gemm_cntl; - void bli_trmm3( side_t side, obj_t* alpha, obj_t* a, @@ -43,19 +41,11 @@ void bli_trmm3( side_t side, obj_t* beta, obj_t* c ) { - num_t dt = bli_obj_datatype( *c ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. 
- if ( bli_trmm3ind_has_avail( dt ) ) - { - trmm3_fp_t func = bli_trmm3ind_get_avail( dt ); - - return func( side, alpha, a, b, beta, c ); - } - - bli_trmm3_front( side, alpha, a, b, beta, c, - gemm_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. + bli_trmm3ind( side, alpha, a, b, beta, c ); } diff --git a/frame/3/trmm3/bli_trmm3_check.c b/frame/3/old/bli_trmm3_check.c similarity index 100% rename from frame/3/trmm3/bli_trmm3_check.c rename to frame/3/old/bli_trmm3_check.c diff --git a/frame/3/trmm3/bli_trmm3_check.h b/frame/3/old/bli_trmm3_check.h similarity index 100% rename from frame/3/trmm3/bli_trmm3_check.h rename to frame/3/old/bli_trmm3_check.h diff --git a/frame/3/trmm/bli_trmm_blocksize.c b/frame/3/old/bli_trmm_blocksize.c similarity index 82% rename from frame/3/trmm/bli_trmm_blocksize.c rename to frame/3/old/bli_trmm_blocksize.c index 5249621b5..dfe07bddf 100644 --- a/frame/3/trmm/bli_trmm_blocksize.c +++ b/frame/3/old/bli_trmm_blocksize.c @@ -38,12 +38,14 @@ dim_t bli_trmm_determine_kc_f( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mnr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "forward" (ie: top to bottom, left to right, top-left @@ -52,14 +54,18 @@ dim_t bli_trmm_determine_kc_f( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. 
dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); // Nudge the default and maximum kc blocksizes up to the nearest // multiple of MR if the triangular matrix is on the left, or NR // if the triangular matrix is one the right. - if ( bli_obj_root_is_triangular( *a ) ) mnr = bli_blksz_get_mr( dt, bsize ); - else mnr = bli_blksz_get_nr( dt, bsize ); + if ( bli_obj_root_is_triangular( *a ) ) + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + else + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); @@ -73,12 +79,14 @@ dim_t bli_trmm_determine_kc_b( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mnr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mnr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "backward" (ie: bottom to top, right to left, bottom-right @@ -87,14 +95,18 @@ dim_t bli_trmm_determine_kc_b( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *a ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); // Nudge the default and maximum kc blocksizes up to the nearest // multiple of MR if the triangular matrix is on the left, or NR // if the triangular matrix is one the right. 
- if ( bli_obj_root_is_triangular( *a ) ) mnr = bli_blksz_get_mr( dt, bsize ); - else mnr = bli_blksz_get_nr( dt, bsize ); + if ( bli_obj_root_is_triangular( *a ) ) + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + else + mnr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + b_alg = bli_align_dim_to_mult( b_alg, mnr ); b_max = bli_align_dim_to_mult( b_max, mnr ); diff --git a/frame/3/trmm/bli_trmm_blocksize.h b/frame/3/old/bli_trmm_blocksize.h similarity index 91% rename from frame/3/trmm/bli_trmm_blocksize.h rename to frame/3/old/bli_trmm_blocksize.h index cb13200d2..886b4d954 100644 --- a/frame/3/trmm/bli_trmm_blocksize.h +++ b/frame/3/old/bli_trmm_blocksize.h @@ -36,9 +36,11 @@ dim_t bli_trmm_determine_kc_f( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); dim_t bli_trmm_determine_kc_b( dim_t i, dim_t dim, obj_t* a, obj_t* b, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); diff --git a/frame/3/trmm/bli_trmm_check.c b/frame/3/old/bli_trmm_check.c similarity index 99% rename from frame/3/trmm/bli_trmm_check.c rename to frame/3/old/bli_trmm_check.c index 67fcf4ee4..39366829e 100644 --- a/frame/3/trmm/bli_trmm_check.c +++ b/frame/3/old/bli_trmm_check.c @@ -116,6 +116,7 @@ void bli_trmm_int_check( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { err_t e_val; diff --git a/frame/3/trmm/bli_trmm_check.h b/frame/3/old/bli_trmm_check.h similarity index 98% rename from frame/3/trmm/bli_trmm_check.h rename to frame/3/old/bli_trmm_check.h index 426b01003..97e4714eb 100644 --- a/frame/3/trmm/bli_trmm_check.h +++ b/frame/3/old/bli_trmm_check.h @@ -49,5 +49,6 @@ void bli_trmm_int_check( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/trmm/bli_trmm_prune.c b/frame/3/old/bli_trmm_prune.c similarity index 100% rename from frame/3/trmm/bli_trmm_prune.c rename to frame/3/old/bli_trmm_prune.c diff --git a/frame/3/trmm/bli_trmm_prune.h 
b/frame/3/old/bli_trmm_prune.h similarity index 100% rename from frame/3/trmm/bli_trmm_prune.h rename to frame/3/old/bli_trmm_prune.h diff --git a/frame/3/trsm/bli_trsm.c b/frame/3/old/bli_trsm.c similarity index 88% rename from frame/3/trsm/bli_trsm.c rename to frame/3/old/bli_trsm.c index 019687f93..58a21c398 100644 --- a/frame/3/trsm/bli_trsm.c +++ b/frame/3/old/bli_trsm.c @@ -34,28 +34,16 @@ #include "blis.h" -extern trsm_t* trsm_l_cntl; -extern trsm_t* trsm_r_cntl; - void bli_trsm( side_t side, obj_t* alpha, obj_t* a, obj_t* b ) { - num_t dt = bli_obj_datatype( *b ); - - // If an induced method is available (ie: implemented and enabled), - // call it instead. - if ( bli_trsmind_has_avail( dt ) ) - { - trsm_fp_t func = bli_trsmind_get_avail( dt ); - - return func( side, alpha, a, b ); - } - - bli_trsm_front( side, alpha, a, b, - trsm_l_cntl, - trsm_r_cntl ); + // The operation's "ind" function--its induced method front-end--will + // call native execution for real domain problems. For complex problems, + // it calls the highest priority induced method that is available (ie: + // implemented and enabled), and otherwise calls native execution. 
+ bli_trsmind( side, alpha, a, b ); } diff --git a/frame/3/trsm/bli_trsm_blocksize.c b/frame/3/old/bli_trsm_blocksize.c similarity index 86% rename from frame/3/trsm/bli_trsm_blocksize.c rename to frame/3/old/bli_trsm_blocksize.c index fa6f29696..71401bbab 100644 --- a/frame/3/trsm/bli_trsm_blocksize.c +++ b/frame/3/old/bli_trsm_blocksize.c @@ -37,12 +37,14 @@ dim_t bli_trsm_determine_kc_f( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "forward" (ie: top to bottom, left to right, top-left @@ -51,6 +53,7 @@ dim_t bli_trsm_determine_kc_f( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); @@ -59,7 +62,7 @@ dim_t bli_trsm_determine_kc_f( dim_t i, // because even when the triangle is on the right, packing of that // matrix uses MR, since only left-side trsm micro-kernels are // supported. 
- mr = bli_blksz_get_mr( dt, bsize ); + mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mr ); b_max = bli_align_dim_to_mult( b_max, mr ); @@ -72,12 +75,14 @@ dim_t bli_trsm_determine_kc_f( dim_t i, dim_t bli_trsm_determine_kc_b( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t mr; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t mr; + dim_t b_alg, b_max; + dim_t b_use; // We assume that this function is being called from an algorithm that // is moving "backward" (ie: bottom to top, right to left, bottom-right @@ -86,6 +91,7 @@ dim_t bli_trsm_determine_kc_b( dim_t i, // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); @@ -94,7 +100,7 @@ dim_t bli_trsm_determine_kc_b( dim_t i, // because even when the triangle is on the right, packing of that // matrix uses MR, since only left-side trsm micro-kernels are // supported. 
- mr = bli_blksz_get_mr( dt, bsize ); + mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); b_alg = bli_align_dim_to_mult( b_alg, mr ); b_max = bli_align_dim_to_mult( b_max, mr ); diff --git a/frame/3/trsm/bli_trsm_blocksize.h b/frame/3/old/bli_trsm_blocksize.h similarity index 91% rename from frame/3/trsm/bli_trsm_blocksize.h rename to frame/3/old/bli_trsm_blocksize.h index ac16fcce1..411e78a11 100644 --- a/frame/3/trsm/bli_trsm_blocksize.h +++ b/frame/3/old/bli_trsm_blocksize.h @@ -35,8 +35,10 @@ dim_t bli_trsm_determine_kc_f( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); dim_t bli_trsm_determine_kc_b( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); diff --git a/frame/3/trsm/bli_trsm_check.c b/frame/3/old/bli_trsm_check.c similarity index 100% rename from frame/3/trsm/bli_trsm_check.c rename to frame/3/old/bli_trsm_check.c diff --git a/frame/3/trsm/bli_trsm_check.h b/frame/3/old/bli_trsm_check.h similarity index 100% rename from frame/3/trsm/bli_trsm_check.h rename to frame/3/old/bli_trsm_check.h diff --git a/frame/3/trsm/ukernels/bli_trsm_l_ukr_ref.c b/frame/3/old/bli_trsm_l_ukr_ref.c similarity index 100% rename from frame/3/trsm/ukernels/bli_trsm_l_ukr_ref.c rename to frame/3/old/bli_trsm_l_ukr_ref.c diff --git a/frame/3/trsm/ukernels/bli_trsm_l_ukr_ref.h b/frame/3/old/bli_trsm_l_ukr_ref.h similarity index 100% rename from frame/3/trsm/ukernels/bli_trsm_l_ukr_ref.h rename to frame/3/old/bli_trsm_l_ukr_ref.h diff --git a/frame/3/trsm/bli_trsm_prune.c b/frame/3/old/bli_trsm_prune.c similarity index 100% rename from frame/3/trsm/bli_trsm_prune.c rename to frame/3/old/bli_trsm_prune.c diff --git a/frame/3/trsm/bli_trsm_prune.h b/frame/3/old/bli_trsm_prune.h similarity index 100% rename from frame/3/trsm/bli_trsm_prune.h rename to frame/3/old/bli_trsm_prune.h diff --git a/frame/3/trsm/ukernels/bli_trsm_u_ukr_ref.c b/frame/3/old/bli_trsm_u_ukr_ref.c similarity index 100% rename from 
frame/3/trsm/ukernels/bli_trsm_u_ukr_ref.c rename to frame/3/old/bli_trsm_u_ukr_ref.c diff --git a/frame/3/trsm/ukernels/bli_trsm_u_ukr_ref.h b/frame/3/old/bli_trsm_u_ukr_ref.h similarity index 100% rename from frame/3/trsm/ukernels/bli_trsm_u_ukr_ref.h rename to frame/3/old/bli_trsm_u_ukr_ref.h diff --git a/frame/3/trsm/bli_trsm_ukernel.c b/frame/3/old/bli_trsm_ukernel.c similarity index 90% rename from frame/3/trsm/bli_trsm_ukernel.c rename to frame/3/old/bli_trsm_ukernel.c index e83b200d2..a505744e4 100644 --- a/frame/3/trsm/bli_trsm_ukernel.c +++ b/frame/3/old/bli_trsm_ukernel.c @@ -42,18 +42,17 @@ typedef void (*FUNCPTR_T)( void* b, void* c, inc_t rs_c, inc_t cs_c, auxinfo_t* data, + cntx_t* cntx, void* ukr ); static FUNCPTR_T GENARRAY(ftypes_l,trsm_l_ukernel_void); static FUNCPTR_T GENARRAY(ftypes_u,trsm_u_ukernel_void); -extern func_t* trsm_l_ukrs; -extern func_t* trsm_u_ukrs; - void bli_trsm_ukernel( obj_t* a, obj_t* b, - obj_t* c ) + obj_t* c, + cntx_t* cntx ) { num_t dt = bli_obj_datatype( *c ); @@ -69,6 +68,7 @@ void bli_trsm_ukernel( obj_t* a, FUNCPTR_T f; + func_t* trsm_ukrs; void* trsm_ukr; @@ -80,8 +80,16 @@ void bli_trsm_ukernel( obj_t* a, bli_auxinfo_set_is_b( 1, data ); // Query the function address from the micro-kernel func_t object. - if ( bli_obj_is_lower( *a ) ) trsm_ukr = bli_func_obj_query( dt, trsm_l_ukrs ); - else trsm_ukr = bli_func_obj_query( dt, trsm_u_ukrs ); + if ( bli_obj_is_lower( *a ) ) + { + trsm_ukrs = bli_cntx_get_l3_ukr( BLIS_TRSM_L_UKR, cntx ); + trsm_ukr = bli_func_obj_query( dt, trsm_ukrs ); + } + else + { + trsm_ukrs = bli_cntx_get_l3_ukr( BLIS_TRSM_U_UKR, cntx ); + trsm_ukr = bli_func_obj_query( dt, trsm_ukrs ); + } // Index into the type combination array to extract the correct // function pointer. 
diff --git a/frame/3/trsm/bli_trsm_ukernel.h b/frame/3/old/bli_trsm_ukernel.h similarity index 95% rename from frame/3/trsm/bli_trsm_ukernel.h rename to frame/3/old/bli_trsm_ukernel.h index 0a420b1a5..15f9cafd0 100644 --- a/frame/3/trsm/bli_trsm_ukernel.h +++ b/frame/3/old/bli_trsm_ukernel.h @@ -34,7 +34,8 @@ void bli_trsm_ukernel( obj_t* a, obj_t* b, - obj_t* c ); + obj_t* c, + cntx_t* cntx ); // @@ -49,6 +50,7 @@ void PASTEMAC(ch,varname)( \ void* b, \ void* c, inc_t rs_c, inc_t cs_c, \ auxinfo_t* data, \ + cntx_t* cntx, \ void* ukr \ ); diff --git a/frame/3/symm/bli_symm.h b/frame/3/symm/bli_symm.h index efb5fece1..59e91e52c 100644 --- a/frame/3/symm/bli_symm.h +++ b/frame/3/symm/bli_symm.h @@ -32,37 +32,5 @@ */ -#include "bli_symm_check.h" #include "bli_symm_front.h" - -// -// Prototype object-based interface. -// -void bli_symm( side_t side, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( symm ) - diff --git a/frame/3/symm/bli_symm_front.c b/frame/3/symm/bli_symm_front.c index 5583d7e0f..af4d63054 100644 --- a/frame/3/symm/bli_symm_front.c +++ b/frame/3/symm/bli_symm_front.c @@ -40,6 +40,7 @@ void bli_symm_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -48,7 +49,7 @@ void bli_symm_front( side_t side, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_symm_check( side, alpha, a, b, beta, c ); + bli_symm_check( side, alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta and return. 
if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -57,6 +58,10 @@ void bli_symm_front( side_t side, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -66,14 +71,7 @@ void bli_symm_front( side_t side, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. - if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_toggle_side( side ); bli_obj_induce_trans( b_local ); @@ -92,12 +90,13 @@ void bli_symm_front( side_t side, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_gemm_int, + (l3_int_t) bli_gemm_int, alpha, &a_local, &b_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/symm/bli_symm_front.h b/frame/3/symm/bli_symm_front.h index f05edea45..1fb9ec019 100644 --- a/frame/3/symm/bli_symm_front.h +++ b/frame/3/symm/bli_symm_front.h @@ -38,5 +38,6 @@ void bli_symm_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/syr2k/bli_syr2k.h b/frame/3/syr2k/bli_syr2k.h index e55f5f067..f40cb8ed5 100644 --- a/frame/3/syr2k/bli_syr2k.h +++ b/frame/3/syr2k/bli_syr2k.h @@ -32,35 +32,5 @@ */ -#include "bli_syr2k_check.h" #include "bli_syr2k_front.h" - -// -// Prototype object-based interface. 
-// -void bli_syr2k( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syr2k ) - diff --git a/frame/3/syr2k/bli_syr2k_front.c b/frame/3/syr2k/bli_syr2k_front.c index ca7a2a8fe..17708b3ab 100644 --- a/frame/3/syr2k/bli_syr2k_front.c +++ b/frame/3/syr2k/bli_syr2k_front.c @@ -39,6 +39,7 @@ void bli_syr2k_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t c_local; @@ -49,7 +50,7 @@ void bli_syr2k_front( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_syr2k_check( alpha, a, b, beta, c ); + bli_syr2k_check( alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -58,6 +59,10 @@ void bli_syr2k_front( obj_t* alpha, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -75,14 +80,7 @@ void bli_syr2k_front( obj_t* alpha, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. 
- if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_obj_induce_trans( c_local ); } @@ -105,22 +103,24 @@ void bli_syr2k_front( obj_t* alpha, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, alpha, &a_local, &bt_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, alpha, &b_local, &at_local, &BLIS_ONE, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/syr2k/bli_syr2k_front.h b/frame/3/syr2k/bli_syr2k_front.h index a3c934905..674dfe5ce 100644 --- a/frame/3/syr2k/bli_syr2k_front.h +++ b/frame/3/syr2k/bli_syr2k_front.h @@ -37,5 +37,6 @@ void bli_syr2k_front( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/syrk/bli_syrk.h b/frame/3/syrk/bli_syrk.h index 08192ec80..a31ec23cd 100644 --- a/frame/3/syrk/bli_syrk.h +++ b/frame/3/syrk/bli_syrk.h @@ -32,32 +32,5 @@ */ -#include "bli_syrk_check.h" #include "bli_syrk_front.h" - -// -// Prototype object-based interface. 
-// -void bli_syrk( obj_t* alpha, - obj_t* a, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syrk ) - diff --git a/frame/3/syrk/bli_syrk_front.c b/frame/3/syrk/bli_syrk_front.c index 46529cde7..17e28b1f1 100644 --- a/frame/3/syrk/bli_syrk_front.c +++ b/frame/3/syrk/bli_syrk_front.c @@ -38,6 +38,7 @@ void bli_syrk_front( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -46,7 +47,7 @@ void bli_syrk_front( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_syrk_check( alpha, a, beta, c ); + bli_syrk_check( alpha, a, beta, c, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -55,6 +56,10 @@ void bli_syrk_front( obj_t* alpha, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A and C in case we need to apply transformations. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *c, c_local ); @@ -68,14 +73,7 @@ void bli_syrk_front( obj_t* alpha, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. 
- if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_obj_induce_trans( c_local ); } @@ -85,12 +83,13 @@ void bli_syrk_front( obj_t* alpha, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_herk_int, + (l3_int_t) bli_herk_int, alpha, &a_local, &at_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/syrk/bli_syrk_front.h b/frame/3/syrk/bli_syrk_front.h index 817d77f1a..c7ab2a7b7 100644 --- a/frame/3/syrk/bli_syrk_front.h +++ b/frame/3/syrk/bli_syrk_front.h @@ -36,5 +36,6 @@ void bli_syrk_front( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/trmm/bli_trmm.h b/frame/3/trmm/bli_trmm.h index 09f407a9f..056fedb50 100644 --- a/frame/3/trmm/bli_trmm.h +++ b/frame/3/trmm/bli_trmm.h @@ -32,49 +32,8 @@ */ -#include "bli_trmm_blocksize.h" -#include "bli_trmm_check.h" #include "bli_trmm_front.h" #include "bli_trmm_int.h" -#include "bli_trmm_prune.h" -#include "bli_trmm_blk_var1f.h" - -#include "bli_trmm_blk_var2f.h" -#include "bli_trmm_blk_var2b.h" - -#include "bli_trmm_blk_var3f.h" -#include "bli_trmm_blk_var3b.h" - -#include "bli_trmm_ll_ker_var2.h" -#include "bli_trmm_lu_ker_var2.h" -#include "bli_trmm_rl_ker_var2.h" -#include "bli_trmm_ru_ker_var2.h" - - -// -// Prototype object-based interface. 
-// -void bli_trmm( side_t side, - obj_t* alpha, - obj_t* a, - obj_t* b ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ); - -INSERT_GENTPROT_BASIC( trmm ) +#include "bli_trmm_var.h" diff --git a/frame/3/trmm/bli_trmm_blk_var1f.c b/frame/3/trmm/bli_trmm_blk_var1f.c index d0127cbe8..73a7fef41 100644 --- a/frame/3/trmm/bli_trmm_blk_var1f.c +++ b/frame/3/trmm/bli_trmm_blk_var1f.c @@ -37,6 +37,7 @@ void bli_trmm_blk_var1f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -58,13 +59,13 @@ void bli_trmm_blk_var1f( obj_t* a, // Initialize object for packing B. bli_obj_init_pack( &b_pack_s ); bli_packm_init( b, &b_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); // Scale C by beta (if instructed). // Since scalm doesn't support multithreading yet, must be done by chief thread (ew) bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } b_pack = thread_obroadcast( thread, &b_pack_s ); @@ -78,7 +79,7 @@ void bli_trmm_blk_var1f( obj_t* a, // Pack B (if instructed). bli_packm_int( b, b_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trmm_thread_sub_opackm( thread ) ); // Set the default length of and offset to the non-zero part of A. @@ -96,7 +97,7 @@ void bli_trmm_blk_var1f( obj_t* a, dim_t my_start, my_end; bli_get_range_weighted_t2b( thread, a, - bli_blksz_get_mult_for_obj( a, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the m dimension. @@ -104,7 +105,7 @@ void bli_trmm_blk_var1f( obj_t* a, { // Determine the current algorithmic blocksize. 
b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -115,20 +116,20 @@ void bli_trmm_blk_var1f( obj_t* a, // Initialize objects for packing A1 and C1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trmm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); // Perform trmm subproblem. @@ -137,13 +138,14 @@ void bli_trmm_blk_var1f( obj_t* a, b_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), trmm_thread_sub_trmm( thread ) ); thread_ibarrier( thread ); // Unpack C1 (if C1 was packed). bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/trmm/bli_trmm_blk_var2b.c b/frame/3/trmm/bli_trmm_blk_var2b.c index fe3752ffa..01e623ddc 100644 --- a/frame/3/trmm/bli_trmm_blk_var2b.c +++ b/frame/3/trmm/bli_trmm_blk_var2b.c @@ -37,6 +37,7 @@ void bli_trmm_blk_var2b( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -58,12 +59,12 @@ void bli_trmm_blk_var2b( obj_t* a, // Initialize object for packing A bli_obj_init_pack( &a_pack_s ); bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). 
bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -77,12 +78,12 @@ void bli_trmm_blk_var2b( obj_t* a, // Pack A (if instructed). bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trmm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_weighted_r2l( thread, b, - bli_blksz_get_mult_for_obj( b, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the n dimension. @@ -90,7 +91,7 @@ void bli_trmm_blk_var2b( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_b( i, my_end, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for B1 and C1. bli_acquire_mpart_r2l( BLIS_SUBPART1, @@ -101,20 +102,20 @@ void bli_trmm_blk_var2b( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trmm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); // Perform trmm subproblem. @@ -123,13 +124,14 @@ void bli_trmm_blk_var2b( obj_t* a, b1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), trmm_thread_sub_trmm( thread ) ); thread_ibarrier( thread ); // Unpack C1 (if C1 was packed). 
bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/trmm/bli_trmm_blk_var2f.c b/frame/3/trmm/bli_trmm_blk_var2f.c index 66ecb3840..10a2c7637 100644 --- a/frame/3/trmm/bli_trmm_blk_var2f.c +++ b/frame/3/trmm/bli_trmm_blk_var2f.c @@ -37,6 +37,7 @@ void bli_trmm_blk_var2f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -58,12 +59,12 @@ void bli_trmm_blk_var2f( obj_t* a, // Initialize object for packing A bli_obj_init_pack( &a_pack_s ); bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -77,12 +78,12 @@ void bli_trmm_blk_var2f( obj_t* a, // Pack A (if instructed). bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trmm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; bli_get_range_weighted_l2r( thread, b, - bli_blksz_get_mult_for_obj( b, cntl_blocksize( cntl ) ), + bli_cntx_get_bmult( cntl_bszid( cntl ), cntx ), &my_start, &my_end ); // Partition along the n dimension. @@ -90,7 +91,7 @@ void bli_trmm_blk_var2f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, my_end, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for B1 and C1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -101,20 +102,20 @@ void bli_trmm_blk_var2f( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack B1 (if instructed). 
bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trmm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); // Perform trmm subproblem. @@ -123,13 +124,14 @@ void bli_trmm_blk_var2f( obj_t* a, b1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_gemm( cntl ), trmm_thread_sub_trmm( thread ) ); thread_ibarrier( thread ); // Unpack C1 (if C1 was packed). bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trmm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/trmm/bli_trmm_blk_var3b.c b/frame/3/trmm/bli_trmm_blk_var3b.c index e81073b93..b808e8313 100644 --- a/frame/3/trmm/bli_trmm_blk_var3b.c +++ b/frame/3/trmm/bli_trmm_blk_var3b.c @@ -37,6 +37,7 @@ void bli_trmm_blk_var3b( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -59,12 +60,12 @@ void bli_trmm_blk_var3b( obj_t* a, // Initialize object for packing C bli_obj_init_pack( &c_pack_s ); bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -78,7 +79,7 @@ void bli_trmm_blk_var3b( obj_t* a, // Pack C (if instructed). bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trmm_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. @@ -92,7 +93,7 @@ void bli_trmm_blk_var3b( obj_t* a, // blocksize so that we can implement the "nudging" of kc to be // a multiple of mr or nr, as needed. b_alg = bli_trmm_determine_kc_b( i, k_trans, a, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and B1. 
bli_acquire_mpart_r2l( BLIS_SUBPART1, @@ -103,20 +104,20 @@ void bli_trmm_blk_var3b( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trmm_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trmm_thread_sub_ipackm( thread ) ); // Perform trmm subproblem. @@ -125,6 +126,7 @@ void bli_trmm_blk_var3b( obj_t* a, b1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_gemm( cntl ), trmm_thread_sub_trmm( thread ) ); thread_ibarrier( thread ); @@ -134,7 +136,7 @@ void bli_trmm_blk_var3b( obj_t* a, // Unpack C (if C was packed). bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trmm_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/trmm/bli_trmm_blk_var3f.c b/frame/3/trmm/bli_trmm_blk_var3f.c index 2b1dfa286..c2f465b47 100644 --- a/frame/3/trmm/bli_trmm_blk_var3f.c +++ b/frame/3/trmm/bli_trmm_blk_var3f.c @@ -37,6 +37,7 @@ void bli_trmm_blk_var3f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -59,12 +60,12 @@ void bli_trmm_blk_var3f( obj_t* a, // Initialize object for packing C bli_obj_init_pack( &c_pack_s ); bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -78,7 +79,7 @@ void bli_trmm_blk_var3f( obj_t* a, // Pack C (if instructed). 
bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trmm_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. @@ -92,7 +93,7 @@ void bli_trmm_blk_var3f( obj_t* a, // blocksize so that we can implement the "nudging" of kc to be // a multiple of mr or nr, as needed. b_alg = bli_trmm_determine_kc_f( i, k_trans, a, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and B1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -103,20 +104,20 @@ void bli_trmm_blk_var3f( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trmm_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trmm_thread_sub_ipackm( thread ) ); // Perform trmm subproblem. @@ -125,6 +126,7 @@ void bli_trmm_blk_var3f( obj_t* a, b1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_gemm( cntl ), trmm_thread_sub_trmm( thread ) ); thread_ibarrier( thread ); @@ -134,7 +136,7 @@ void bli_trmm_blk_var3f( obj_t* a, // Unpack C (if C was packed). 
bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trmm_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/trmm/bli_trmm_front.c b/frame/3/trmm/bli_trmm_front.c index 70bf338f2..58d9cfd34 100644 --- a/frame/3/trmm/bli_trmm_front.c +++ b/frame/3/trmm/bli_trmm_front.c @@ -38,6 +38,7 @@ void bli_trmm_front( side_t side, obj_t* alpha, obj_t* a, obj_t* b, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -46,7 +47,7 @@ void bli_trmm_front( side_t side, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_trmm_check( side, alpha, a, b ); + bli_trmm_check( side, alpha, a, b, &BLIS_ZERO, b, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -55,6 +56,10 @@ void bli_trmm_front( side_t side, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A and B so we can tweak the objects if necessary. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -99,14 +104,7 @@ void bli_trmm_front( side_t side, // NOTE: We disable the optimization for 1x1 matrices since the concept // of row- vs. column storage breaks down. if ( !bli_obj_is_1x1( c_local ) ) - if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_toggle_side( side ); bli_obj_induce_trans( a_local ); @@ -136,12 +134,13 @@ void bli_trmm_front( side_t side, // Invoke the internal back-end. 
bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_trmm_int, + (l3_int_t) bli_trmm_int, alpha, &a_local, &b_local, &BLIS_ZERO, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/trmm/bli_trmm_front.h b/frame/3/trmm/bli_trmm_front.h index e06198c17..a05284336 100644 --- a/frame/3/trmm/bli_trmm_front.h +++ b/frame/3/trmm/bli_trmm_front.h @@ -36,5 +36,6 @@ void bli_trmm_front( side_t side, obj_t* alpha, obj_t* a, obj_t* b, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/trmm/bli_trmm_int.c b/frame/3/trmm/bli_trmm_int.c index 4878aefea..af5b6f291 100644 --- a/frame/3/trmm/bli_trmm_int.c +++ b/frame/3/trmm/bli_trmm_int.c @@ -39,6 +39,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ); @@ -89,6 +90,7 @@ void bli_trmm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -102,7 +104,7 @@ void bli_trmm_int( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_trmm_int_check( alpha, a, b, beta, c, cntl ); + bli_gemm_basic_check( alpha, a, b, beta, c, cntx ); // If C has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *c ) ) return; @@ -177,6 +179,7 @@ void bli_trmm_int( obj_t* alpha, f( &a_local, &b_local, &c_local, + cntx, cntl, thread ); } diff --git a/frame/3/trmm/bli_trmm_int.h b/frame/3/trmm/bli_trmm_int.h index b4595fd38..d6df033bf 100644 --- a/frame/3/trmm/bli_trmm_int.h +++ b/frame/3/trmm/bli_trmm_int.h @@ -37,5 +37,6 @@ void bli_trmm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ); diff --git a/frame/3/trmm/bli_trmm_ll_ker_var2.c b/frame/3/trmm/bli_trmm_ll_ker_var2.c index a50b05dc4..3e807ac81 100644 --- a/frame/3/trmm/bli_trmm_ll_ker_var2.c +++ b/frame/3/trmm/bli_trmm_ll_ker_var2.c @@ -48,7 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, trmm_thrinfo_t* thread ); @@ -58,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trmm_ll_ker_var2); void bli_trmm_ll_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -94,10 +95,6 @@ void bli_trmm_ll_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -112,12 +109,6 @@ void bli_trmm_ll_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffa, schema_a, @@ -130,45 +121,50 @@ void bli_trmm_ll_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffa, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - trmm_thrinfo_t* jr_thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffa, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trmm_thrinfo_t* jr_thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ const dim_t PACKMR = cs_a; \ const dim_t PACKNR = rs_b; \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict one = PASTEMAC(ch,1); \ ctype* restrict zero = PASTEMAC(ch,0); \ @@ -388,13 +384,17 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_a1011, \ - alpha_cast, \ - a1, \ - b1_i, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k_a1011, \ + alpha_cast, \ + a1, \ + b1_i, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ @@ -404,13 +404,17 @@ void PASTEMAC(ch,varname)( \ ct, rs_ct, cs_ct ); \ \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_a1011, \ - alpha_cast, \ - a1, \ - b1_i, \ - beta_cast, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k_a1011, \ + alpha_cast, \ + a1, \ + b1_i, \ + beta_cast, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -450,24 +454,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - one, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + one, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,adds_mxn)( m_cur, n_cur, \ @@ -490,5 +502,5 @@ void PASTEMAC(ch,varname)( \ /*PASTEMAC(ch,fprintm)( stdout, "trmm_ll_ker_var2: b1", k_a1011, NR, b1_i, NR, 1, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( trmm_ll_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trmm_ll_ker_var2 ) diff --git a/frame/3/trmm/bli_trmm_lu_ker_var2.c b/frame/3/trmm/bli_trmm_lu_ker_var2.c index 35a2cefe2..6ef455873 100644 --- a/frame/3/trmm/bli_trmm_lu_ker_var2.c +++ b/frame/3/trmm/bli_trmm_lu_ker_var2.c @@ -48,7 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, trmm_thrinfo_t* thread ); @@ -58,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trmm_lu_ker_var2); void bli_trmm_lu_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -94,10 +95,6 @@ void bli_trmm_lu_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -112,12 +109,6 @@ void bli_trmm_lu_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffa, schema_a, @@ -130,45 +121,50 @@ void bli_trmm_lu_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffa, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - trmm_thrinfo_t* jr_thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffa, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trmm_thrinfo_t* jr_thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ const dim_t PACKMR = cs_a; \ const dim_t PACKNR = rs_b; \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict one = PASTEMAC(ch,1); \ ctype* restrict zero = PASTEMAC(ch,0); \ @@ -395,13 +391,17 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_a1112, \ - alpha_cast, \ - a1, \ - b1_i, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k_a1112, \ + alpha_cast, \ + a1, \ + b1_i, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ @@ -411,13 +411,17 @@ void PASTEMAC(ch,varname)( \ ct, rs_ct, cs_ct ); \ \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_a1112, \ - alpha_cast, \ - a1, \ - b1_i, \ - beta_cast, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k_a1112, \ + alpha_cast, \ + a1, \ + b1_i, \ + beta_cast, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -457,24 +461,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - one, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + one, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,adds_mxn)( m_cur, n_cur, \ @@ -498,5 +510,5 @@ void PASTEMAC(ch,varname)( \ /*PASTEMAC(ch,fprintm)( stdout, "trmm_lu_ker_var2: b1", k_a1112, NR, b1_i, NR, 1, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( trmm_lu_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trmm_lu_ker_var2 ) diff --git a/frame/3/trmm/bli_trmm_rl_ker_var2.c b/frame/3/trmm/bli_trmm_rl_ker_var2.c index 941f7a7f2..4b43ebc36 100644 --- a/frame/3/trmm/bli_trmm_rl_ker_var2.c +++ b/frame/3/trmm/bli_trmm_rl_ker_var2.c @@ -48,7 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, trmm_thrinfo_t* thread ); @@ -58,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trmm_rl_ker_var2); void bli_trmm_rl_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -94,10 +95,6 @@ void bli_trmm_rl_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -112,12 +109,6 @@ void bli_trmm_rl_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffb, schema_a, @@ -130,45 +121,50 @@ void bli_trmm_rl_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffb, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - trmm_thrinfo_t* jr_thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffb, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trmm_thrinfo_t* jr_thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ const dim_t PACKMR = cs_a; \ const dim_t PACKNR = rs_b; \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict one = PASTEMAC(ch,1); \ ctype* restrict zero = PASTEMAC(ch,0); \ @@ -395,13 +391,17 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_b1121, \ - alpha_cast, \ - a1_i, \ - b1, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k_b1121, \ + alpha_cast, \ + a1_i, \ + b1, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ @@ -411,13 +411,17 @@ void PASTEMAC(ch,varname)( \ ct, rs_ct, cs_ct ); \ \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_b1121, \ - alpha_cast, \ - a1_i, \ - b1, \ - beta_cast, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k_b1121, \ + alpha_cast, \ + a1_i, \ + b1, \ + beta_cast, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -469,24 +473,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - one, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + one, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,adds_mxn)( m_cur, n_cur, \ @@ -510,5 +522,5 @@ void PASTEMAC(ch,varname)( \ /*PASTEMAC(ch,fprintm)( stdout, "trmm_rl_ker_var2: b1", k_b1121, NR, b1_i, NR, 1, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( trmm_rl_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trmm_rl_ker_var2 ) diff --git a/frame/3/trmm/bli_trmm_ru_ker_var2.c b/frame/3/trmm/bli_trmm_ru_ker_var2.c index 6d7127f6f..c6a7e8243 100644 --- a/frame/3/trmm/bli_trmm_ru_ker_var2.c +++ b/frame/3/trmm/bli_trmm_ru_ker_var2.c @@ -48,7 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* beta, void* c, inc_t rs_c, inc_t cs_c, - void* gemm_ukr, + cntx_t* cntx, trmm_thrinfo_t* thread ); @@ -58,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trmm_ru_ker_var2); void bli_trmm_ru_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, gemm_t* cntl, trmm_thrinfo_t* thread ) { @@ -94,10 +95,6 @@ void bli_trmm_ru_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemm_ukrs; - void* gemm_ukr; - - // Detach and multiply the scalars attached to A and B. bli_obj_scalar_detach( a, &scalar_a ); bli_obj_scalar_detach( b, &scalar_b ); @@ -112,12 +109,6 @@ void bli_trmm_ru_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t object containing - // the gemm micro-kernel function addresses, and then query the - // function address corresponding to the current datatype. - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffb, schema_a, @@ -130,45 +121,50 @@ void bli_trmm_ru_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_beta, buf_c, rs_c, cs_c, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, ukrtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffb, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* beta, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemm_ukr, \ - trmm_thrinfo_t* jr_thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffb, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* beta, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trmm_thrinfo_t* jr_thread \ + ) \ { \ - /* Cast the micro-kernel address to its function pointer type. */ \ - PASTECH(ch,ukrtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ const dim_t MR = pd_a; \ const dim_t NR = pd_b; \ const dim_t PACKMR = cs_a; \ const dim_t PACKNR = rs_b; \ +\ + /* Query the context for the micro-kernel address and cast it to its + function pointer type. */ \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. 
*/ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict one = PASTEMAC(ch,1); \ ctype* restrict zero = PASTEMAC(ch,0); \ @@ -395,13 +391,17 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_b0111, \ - alpha_cast, \ - a1_i, \ - b1, \ - beta_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k_b0111, \ + alpha_cast, \ + a1_i, \ + b1, \ + beta_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ @@ -411,13 +411,17 @@ void PASTEMAC(ch,varname)( \ ct, rs_ct, cs_ct ); \ \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k_b0111, \ - alpha_cast, \ - a1_i, \ - b1, \ - beta_cast, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k_b0111, \ + alpha_cast, \ + a1_i, \ + b1, \ + beta_cast, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -469,24 +473,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - one, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + one, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - alpha_cast, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + alpha_cast, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,adds_mxn)( m_cur, n_cur, \ @@ -510,5 +522,5 @@ void PASTEMAC(ch,varname)( \ /*PASTEMAC(ch,fprintm)( stdout, "trmm_ru_ker_var2: b1", k_b0111, NR, b1_i, NR, 1, "%4.1f", "" );*/ \ } -INSERT_GENTFUNC_BASIC( trmm_ru_ker_var2, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trmm_ru_ker_var2 ) diff --git a/frame/3/trmm/bli_trmm_var.h b/frame/3/trmm/bli_trmm_var.h new file mode 100644 index 000000000..8a176e57b --- /dev/null +++ b/frame/3/trmm/bli_trmm_var.h @@ -0,0 +1,96 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT
+   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
+   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
+   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
+   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
+   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
+   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
+   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
+
+*/
+
+
+//
+// Prototype object-based interfaces.
+//
+
+#undef GENPROT
+#define GENPROT( opname ) \
+\
+void PASTEMAC0(opname) \
+     ( \
+       obj_t* a, \
+       obj_t* b, \
+       obj_t* c, \
+       cntx_t* cntx, \
+       gemm_t* cntl, \
+       trmm_thrinfo_t* thread \
+     );
+
+GENPROT( trmm_blk_var1f )
+//GENPROT( trmm_blk_var1b ) // variant doesn't exist b/c it's not needed
+GENPROT( trmm_blk_var2f )
+GENPROT( trmm_blk_var2b )
+GENPROT( trmm_blk_var3f )
+GENPROT( trmm_blk_var3b )
+
+GENPROT( trmm_ll_ker_var2 )
+GENPROT( trmm_lu_ker_var2 )
+GENPROT( trmm_rl_ker_var2 )
+GENPROT( trmm_ru_ker_var2 )
+
+
+//
+// Prototype BLAS-like interfaces with void pointer operands.
+//
+
+#undef GENTPROT
+#define GENTPROT( ctype, ch, varname ) \
+\
+void PASTEMAC(ch,varname) \
+     ( \
+       doff_t diagoff, \
+       pack_t schema_a, \
+       pack_t schema_b, \
+       dim_t m, \
+       dim_t n, \
+       dim_t k, \
+       void* alpha, \
+       void* a, inc_t cs_a, \
+       dim_t pd_a, inc_t ps_a, \
+       void* b, inc_t rs_b, \
+       dim_t pd_b, inc_t ps_b, \
+       void* beta, \
+       void* c, inc_t rs_c, inc_t cs_c, \
+       cntx_t* cntx, \
+       trmm_thrinfo_t* thread \
+     );
+
+INSERT_GENTPROT_BASIC( trmm_ll_ker_var2 )
+INSERT_GENTPROT_BASIC( trmm_lu_ker_var2 )
+INSERT_GENTPROT_BASIC( trmm_rl_ker_var2 )
+INSERT_GENTPROT_BASIC( trmm_ru_ker_var2 )
+
diff --git a/frame/3/trmm/bli_trmm_blk_var1f.h b/frame/3/trmm/old/bli_trmm_blk_var1f.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_blk_var1f.h
rename to frame/3/trmm/old/bli_trmm_blk_var1f.h
index ccf3118a8..81854ff59 100644
--- a/frame/3/trmm/bli_trmm_blk_var1f.h
+++ b/frame/3/trmm/old/bli_trmm_blk_var1f.h
@@ -35,6 +35,7 @@
 void bli_trmm_blk_var1f( obj_t* a,
                          obj_t* b,
                          obj_t* c,
+                         cntx_t* cntx,
                          gemm_t* cntl,
                          trmm_thrinfo_t* thread );
 
diff --git a/frame/3/trmm/bli_trmm_blk_var2b.h b/frame/3/trmm/old/bli_trmm_blk_var2b.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_blk_var2b.h
rename to frame/3/trmm/old/bli_trmm_blk_var2b.h
index dda36c6f8..81bd13064 100644
--- a/frame/3/trmm/bli_trmm_blk_var2b.h
+++ b/frame/3/trmm/old/bli_trmm_blk_var2b.h
@@ -35,6 +35,7 @@
 void bli_trmm_blk_var2b( obj_t* a,
                          obj_t* b,
                          obj_t* c,
+                         cntx_t* cntx,
                          gemm_t* cntl,
                          trmm_thrinfo_t* thread );
 
diff --git a/frame/3/trmm/bli_trmm_blk_var2f.h b/frame/3/trmm/old/bli_trmm_blk_var2f.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_blk_var2f.h
rename to frame/3/trmm/old/bli_trmm_blk_var2f.h
index 4c53ebcac..412e75ae3 100644
--- a/frame/3/trmm/bli_trmm_blk_var2f.h
+++ b/frame/3/trmm/old/bli_trmm_blk_var2f.h
@@ -35,6 +35,7 @@
 void bli_trmm_blk_var2f( obj_t* a,
                          obj_t* b,
                          obj_t* c,
+                         cntx_t* cntx,
                          gemm_t* cntl,
                          trmm_thrinfo_t* thread );
 
diff --git a/frame/3/trmm/bli_trmm_blk_var3b.h b/frame/3/trmm/old/bli_trmm_blk_var3b.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_blk_var3b.h
rename to frame/3/trmm/old/bli_trmm_blk_var3b.h
index 81629c7e3..1ec704e2e 100644
--- a/frame/3/trmm/bli_trmm_blk_var3b.h
+++ b/frame/3/trmm/old/bli_trmm_blk_var3b.h
@@ -35,6 +35,7 @@
 void bli_trmm_blk_var3b( obj_t* a,
                          obj_t* b,
                          obj_t* c,
+                         cntx_t* cntx,
                          gemm_t* cntl,
                          trmm_thrinfo_t* thread );
 
diff --git a/frame/3/trmm/bli_trmm_blk_var3f.h b/frame/3/trmm/old/bli_trmm_blk_var3f.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_blk_var3f.h
rename to frame/3/trmm/old/bli_trmm_blk_var3f.h
index 51342567b..16276ef01 100644
--- a/frame/3/trmm/bli_trmm_blk_var3f.h
+++ b/frame/3/trmm/old/bli_trmm_blk_var3f.h
@@ -35,6 +35,7 @@
 void bli_trmm_blk_var3f( obj_t* a,
                          obj_t* b,
                          obj_t* c,
+                         cntx_t* cntx,
                          gemm_t* cntl,
                          trmm_thrinfo_t* thread );
 
diff --git a/frame/3/trmm/bli_trmm_ll_ker_var2.h b/frame/3/trmm/old/bli_trmm_ll_ker_var2.h
similarity index 95%
rename from frame/3/trmm/bli_trmm_ll_ker_var2.h
rename to frame/3/trmm/old/bli_trmm_ll_ker_var2.h
index 228cea65c..8550e5b1d 100644
--- a/frame/3/trmm/bli_trmm_ll_ker_var2.h
+++ b/frame/3/trmm/old/bli_trmm_ll_ker_var2.h
@@ -39,6 +39,7 @@
 void bli_trmm_ll_ker_var2( obj_t* a,
                            obj_t* b,
                            obj_t* c,
+                           cntx_t* cntx,
                            gemm_t* cntl,
                            trmm_thrinfo_t* thread );
 
@@ -61,7 +62,8 @@ void PASTEMAC(ch,varname)( \
                            void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \
                            void* beta, \
                            void* c, inc_t rs_c, inc_t cs_c, \
-                           void* gemm_ukr, \
+                           cntx_t* cntx, \
+                           void* gemm_ukr, \
                            trmm_thrinfo_t* thread \
                          );
 
diff --git a/frame/3/trmm/bli_trmm_lu_ker_var2.h b/frame/3/trmm/old/bli_trmm_lu_ker_var2.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_lu_ker_var2.h
rename to frame/3/trmm/old/bli_trmm_lu_ker_var2.h
index 6f750b077..03cf11997 100644
--- a/frame/3/trmm/bli_trmm_lu_ker_var2.h
+++ b/frame/3/trmm/old/bli_trmm_lu_ker_var2.h
@@ -39,6 +39,7 @@
 void bli_trmm_lu_ker_var2( obj_t* a,
                            obj_t* b,
                            obj_t* c,
+                           cntx_t* cntx,
                            gemm_t* cntl,
                            trmm_thrinfo_t* thread );
 
@@ -61,6 +62,7 @@ void PASTEMAC(ch,varname)( \
                            void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \
                            void* beta, \
                            void* c, inc_t rs_c, inc_t cs_c, \
+                           cntx_t* cntx, \
                            void* gemm_ukr, \
                            trmm_thrinfo_t* thread \
                          );
diff --git a/frame/3/trmm/bli_trmm_rl_ker_var2.h b/frame/3/trmm/old/bli_trmm_rl_ker_var2.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_rl_ker_var2.h
rename to frame/3/trmm/old/bli_trmm_rl_ker_var2.h
index a10123eba..2202a33e5 100644
--- a/frame/3/trmm/bli_trmm_rl_ker_var2.h
+++ b/frame/3/trmm/old/bli_trmm_rl_ker_var2.h
@@ -39,6 +39,7 @@
 void bli_trmm_rl_ker_var2( obj_t* a,
                            obj_t* b,
                            obj_t* c,
+                           cntx_t* cntx,
                            gemm_t* cntl,
                            trmm_thrinfo_t* thread );
 
@@ -61,6 +62,7 @@ void PASTEMAC(ch,varname)( \
                            void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \
                            void* beta, \
                            void* c, inc_t rs_c, inc_t cs_c, \
+                           cntx_t* cntx, \
                            void* gemm_ukr, \
                            trmm_thrinfo_t* thread \
                          );
diff --git a/frame/3/trmm/bli_trmm_ru_ker_var2.h b/frame/3/trmm/old/bli_trmm_ru_ker_var2.h
similarity index 97%
rename from frame/3/trmm/bli_trmm_ru_ker_var2.h
rename to frame/3/trmm/old/bli_trmm_ru_ker_var2.h
index 7bdcd796d..4c30dcb94 100644
--- a/frame/3/trmm/bli_trmm_ru_ker_var2.h
+++ b/frame/3/trmm/old/bli_trmm_ru_ker_var2.h
@@ -39,6 +39,7 @@
 void bli_trmm_ru_ker_var2( obj_t* a,
                            obj_t* b,
                            obj_t* c,
+                           cntx_t* cntx,
                            gemm_t* cntl,
                            trmm_thrinfo_t* thread );
 
@@ -61,6 +62,7 @@ void PASTEMAC(ch,varname)( \
                            void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \
                            void* beta, \
                            void* c, inc_t rs_c, inc_t cs_c, \
+                           cntx_t* cntx, \
                            void* gemm_ukr, \
                            trmm_thrinfo_t* thread \
                          );
diff --git a/frame/3/trmm/other/bli_trmm_ll_blk_var1.c b/frame/3/trmm/other/bli_trmm_ll_blk_var1.c
deleted file mode 100644
index 7c68ca917..000000000
--- a/frame/3/trmm/other/bli_trmm_ll_blk_var1.c
+++ /dev/null
@@ -1,131 +0,0 @@
-/*
-
-   BLIS
-   An object-based framework for developing high-performance BLAS-like
-   libraries.
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -void bli_trmm_ll_blk_var1( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1, c1_pack; - - dim_t i; - dim_t b_alg; - dim_t m_trans; - dim_t offB; - - // Initialize all pack objects that are passed into packm_init(). 
- bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - bli_obj_init_pack( &c1_pack ); - - // Query dimension in partitioning direction. - m_trans = bli_obj_length_after_trans( *a ); - - // If A is [lower] triangular, use the diagonal offset of A to skip over - // the zero region. - if ( bli_obj_is_triangular( *a ) ) - offB = bli_abs( bli_obj_diag_offset_after_trans( *a ) ); - else // if ( bli_obj_is_general( *a ) - offB = 0; - - // Scale C by beta (if instructed). - bli_scalm_int( beta, - c, - cntl_sub_scalm( cntl ) ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Pack B and scale by alpha (if instructed). - bli_packm_int( alpha, - b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Partition along the m dimension. - for ( i = offB; i < m_trans; i += b_alg ) - { - // Determine the current algorithmic blocksize. - b_alg = bli_determine_blocksize_f( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, b_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, b_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, - &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Perform trmm subproblem. - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1_pack, - cntl_sub_gemm( cntl ) ); - - // Unpack C1 (if C1 was packed). - bli_unpackm_int( &c1_pack, &c1, - cntl_sub_unpackm_c( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. 
- bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); - bli_obj_release_pack( &c1_pack ); -} - diff --git a/frame/3/trmm/other/bli_trmm_ll_blk_var1.h b/frame/3/trmm/other/bli_trmm_ll_blk_var1.h deleted file mode 100644 index ff6928122..000000000 --- a/frame/3/trmm/other/bli_trmm_ll_blk_var1.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -void bli_trmm_ll_blk_var1( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ); - diff --git a/frame/3/trmm/other/bli_trmm_ll_blk_var4.c b/frame/3/trmm/other/bli_trmm_ll_blk_var4.c deleted file mode 100644 index 3b69f3824..000000000 --- a/frame/3/trmm/other/bli_trmm_ll_blk_var4.c +++ /dev/null @@ -1,195 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - -void bli_trmm_ll_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1, c1_pack; - - dim_t i; - dim_t bm_alg; - dim_t m_trans; - dim_t offB; - - // Initialize all pack objects that are passed into packm_init(). - bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - bli_obj_init_pack( &c1_pack ); - - // Query dimension in partitioning direction. - m_trans = bli_obj_length_after_trans( *a ); - - // Use the diagonal offset of A to skip over the zero region. - offB = bli_abs( bli_obj_diag_offset_after_trans( *a ) ); - - // Scale C by beta (if instructed). - bli_scalm_int( beta, - c, - cntl_sub_scalm( cntl ) ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Fuse the first iteration with incremental packing and computation. - { - obj_t b_inc, b_pack_inc; - obj_t c1_pack_inc; - - dim_t j; - dim_t bn_inc; - dim_t n_trans; - - // Query dimension in partitioning direction. - n_trans = bli_obj_width( b_pack ); - - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_f( offB, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - offB, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - offB, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, &c1, &c1_pack, cntl_sub_packm_c( cntl ) ); - - // Partition along the n dimension. - for ( j = 0; j < n_trans; j += bn_inc ) - { - // Determine the current incremental packing blocksize. 
- bn_inc = bli_determine_blocksize_f( j, n_trans, b, - cntl_blocksize_aux( cntl ) ); - - // Acquire partitions. - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, b, &b_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &b_pack, &b_pack_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &c1_pack, &c1_pack_inc ); - - // Pack B1 and scale by alpha (if instructed). - bli_packm_int( alpha, &b_inc, &b_pack_inc, cntl_sub_packm_b( cntl ) ); - - // Perform trmm subproblem. - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack_inc, - beta, - &c1_pack_inc, - cntl_sub_gemm( cntl ) ); - } - - // Unpack C1 (if C1 was packed). - bli_unpackm_int( &c1_pack, &c1, cntl_sub_unpackm_c( cntl ) ); - } - - - // Partition along the remaining portion of the m dimension. - for ( i = offB + bm_alg; i < m_trans; i += bm_alg ) - { - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_f( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, - &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Perform trmm subproblem. - if ( bli_obj_intersects_diag( a1_pack ) ) - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1_pack, - cntl_sub_gemm( cntl ) ); - else - bli_gemm_int( alpha, - &a1_pack, - &b_pack, - &BLIS_ONE, - &c1_pack, - cntl_sub_gemm( cntl ) ); - - // Unpack C1 (if C1 was packed). 
- bli_unpackm_int( &c1_pack, &c1, - cntl_sub_unpackm_c( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. - bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); - bli_obj_release_pack( &c1_pack ); -} - diff --git a/frame/3/trmm/other/bli_trmm_ll_blk_var4.h b/frame/3/trmm/other/bli_trmm_ll_blk_var4.h deleted file mode 100644 index b9b9f8e12..000000000 --- a/frame/3/trmm/other/bli_trmm_ll_blk_var4.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_trmm_ll_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ); - diff --git a/frame/3/trmm/other/bli_trmm_lu_blk_var1.c b/frame/3/trmm/other/bli_trmm_lu_blk_var1.c deleted file mode 100644 index 0aab4b414..000000000 --- a/frame/3/trmm/other/bli_trmm_lu_blk_var1.c +++ /dev/null @@ -1,128 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -void bli_trmm_lu_blk_var1( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1, c1_pack; - - dim_t i; - dim_t b_alg; - dim_t mT_trans; - - // Initialize all pack objects that are passed into packm_init(). - bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - bli_obj_init_pack( &c1_pack ); - - // If A is [upper] triangular, use the diagonal offset of A to determine - // the length of the non-zero region. - if ( bli_obj_is_triangular( *a ) ) - mT_trans = bli_abs( bli_obj_diag_offset_after_trans( *a ) ) + - bli_obj_width_after_trans( *a ); - else // if ( bli_obj_is_general( *a ) - mT_trans = bli_obj_length_after_trans( *a ); - - // Scale C by beta (if instructed). - bli_scalm_int( beta, - c, - cntl_sub_scalm( cntl ) ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Pack B and scale by alpha (if instructed). - bli_packm_int( alpha, - b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Partition along the m dimension. - for ( i = 0; i < mT_trans; i += b_alg ) - { - // Determine the current algorithmic blocksize. - b_alg = bli_determine_blocksize_f( i, mT_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. 
- bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, b_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, b_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, - &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Perform trmm subproblem. - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1_pack, - cntl_sub_gemm( cntl ) ); - - // Unpack C1 (if C1 was packed). - bli_unpackm_int( &c1_pack, &c1, - cntl_sub_unpackm_c( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. - bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); - bli_obj_release_pack( &c1_pack ); -} - diff --git a/frame/3/trmm/other/bli_trmm_lu_blk_var1.h b/frame/3/trmm/other/bli_trmm_lu_blk_var1.h deleted file mode 100644 index 12d53e8f1..000000000 --- a/frame/3/trmm/other/bli_trmm_lu_blk_var1.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_trmm_lu_blk_var1( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ); - diff --git a/frame/3/trmm/other/bli_trmm_lu_blk_var4.c b/frame/3/trmm/other/bli_trmm_lu_blk_var4.c deleted file mode 100644 index 2d54f3d44..000000000 --- a/frame/3/trmm/other/bli_trmm_lu_blk_var4.c +++ /dev/null @@ -1,193 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -void bli_trmm_lu_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1, c1_pack; - - dim_t i; - dim_t bm_alg; - dim_t mT_trans; - - // Initialize all pack objects that are passed into packm_init(). - bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - bli_obj_init_pack( &c1_pack ); - - // Query dimension in partitioning direction. Use the diagonal offset - // to stop short of the zero region. - mT_trans = bli_abs( bli_obj_diag_offset_after_trans( *a ) ) + - bli_obj_width_after_trans( *a ); - - // Scale C by beta (if instructed). - bli_scalm_int( beta, - c, - cntl_sub_scalm( cntl ) ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Fuse the first iteration with incremental packing and computation. 
- { - obj_t b_inc, b_pack_inc; - obj_t c1_pack_inc; - - dim_t j; - dim_t bn_inc; - dim_t n_trans; - - // Query dimension in partitioning direction. - n_trans = bli_obj_width( b_pack ); - - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_f( 0, mT_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - 0, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - 0, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, &c1, &c1_pack, cntl_sub_packm_c( cntl ) ); - - // Partition along the n dimension. - for ( j = 0; j < n_trans; j += bn_inc ) - { - // Determine the current incremental packing blocksize. - bn_inc = bli_determine_blocksize_f( j, n_trans, b, - cntl_blocksize_aux( cntl ) ); - - // Acquire partitions. - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, b, &b_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &b_pack, &b_pack_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &c1_pack, &c1_pack_inc ); - - // Pack B1 and scale by alpha (if instructed). - bli_packm_int( alpha, &b_inc, &b_pack_inc, cntl_sub_packm_b( cntl ) ); - - // Perform trmm subproblem. - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack_inc, - beta, - &c1_pack_inc, - cntl_sub_gemm( cntl ) ); - } - - // Unpack C1 (if C1 was packed). - bli_unpackm_int( &c1_pack, &c1, cntl_sub_unpackm_c( cntl ) ); - } - - - // Partition along the remaining portion of the m dimension. - for ( i = bm_alg; i < mT_trans; i += bm_alg ) - { - // Determine the current algorithmic blocksize. 
- bm_alg = bli_determine_blocksize_f( i, mT_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - bli_packm_init( &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack C1 and scale by beta (if instructed). - bli_packm_int( beta, - &c1, &c1_pack, - cntl_sub_packm_c( cntl ) ); - - // Perform trmm subproblem. - if ( bli_obj_intersects_diag( a1_pack ) ) - bli_trmm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1_pack, - cntl_sub_gemm( cntl ) ); - else - bli_gemm_int( alpha, - &a1_pack, - &b_pack, - &BLIS_ONE, - &c1_pack, - cntl_sub_gemm( cntl ) ); - - // Unpack C1 (if C1 was packed). - bli_unpackm_int( &c1_pack, &c1, - cntl_sub_unpackm_c( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. - bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); - bli_obj_release_pack( &c1_pack ); -} - diff --git a/frame/3/trmm/other/bli_trmm_lu_blk_var4.h b/frame/3/trmm/other/bli_trmm_lu_blk_var4.h deleted file mode 100644 index 0d4e38005..000000000 --- a/frame/3/trmm/other/bli_trmm_lu_blk_var4.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_trmm_lu_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trmm_t* cntl ); - diff --git a/frame/3/trmm3/bli_trmm3.h b/frame/3/trmm3/bli_trmm3.h index 2b10ecadf..6092bf899 100644 --- a/frame/3/trmm3/bli_trmm3.h +++ b/frame/3/trmm3/bli_trmm3.h @@ -32,38 +32,5 @@ */ -#include "bli_trmm3_check.h" #include "bli_trmm3_front.h" - -// -// Prototype object-based interface. 
-// -void bli_trmm3( side_t side, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( trmm3 ) - diff --git a/frame/3/trmm3/bli_trmm3_front.c b/frame/3/trmm3/bli_trmm3_front.c index 0a90f6830..230a95388 100644 --- a/frame/3/trmm3/bli_trmm3_front.c +++ b/frame/3/trmm3/bli_trmm3_front.c @@ -40,6 +40,7 @@ void bli_trmm3_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ) { obj_t a_local; @@ -48,7 +49,7 @@ void bli_trmm3_front( side_t side, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_trmm3_check( side, alpha, a, b, beta, c ); + bli_trmm_check( side, alpha, a, b, beta, c, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -57,6 +58,10 @@ void bli_trmm3_front( side_t side, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A, B, and C so we can tweak the objects if necessary. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -98,14 +103,7 @@ void bli_trmm3_front( side_t side, // contiguous columns, or if C is stored by columns and the micro-kernel // prefers contiguous rows, transpose the entire operation to allow the // micro-kernel to access elements of C in its preferred manner. 
- if ( - ( bli_obj_is_row_stored( c_local ) && - bli_func_prefers_contig_cols( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) || - ( bli_obj_is_col_stored( c_local ) && - bli_func_prefers_contig_rows( bli_obj_datatype( c_local ), - bli_gemm_cntl_ukrs( cntl ) ) ) - ) + if ( bli_cntx_l3_nat_ukr_dislikes_storage_of( &c_local, BLIS_GEMM_UKR, cntx ) ) { bli_toggle_side( side ); bli_obj_induce_trans( a_local ); @@ -136,12 +134,13 @@ void bli_trmm3_front( side_t side, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_trmm_int, + (l3_int_t) bli_trmm_int, alpha, &a_local, &b_local, beta, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/trmm3/bli_trmm3_front.h b/frame/3/trmm3/bli_trmm3_front.h index f6ebbf27d..052d83249 100644 --- a/frame/3/trmm3/bli_trmm3_front.h +++ b/frame/3/trmm3/bli_trmm3_front.h @@ -38,4 +38,5 @@ void bli_trmm3_front( side_t side, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, gemm_t* cntl ); diff --git a/frame/3/trsm/bli_trsm.h b/frame/3/trsm/bli_trsm.h index 6245d7286..cee62e347 100644 --- a/frame/3/trsm/bli_trsm.h +++ b/frame/3/trsm/bli_trsm.h @@ -33,58 +33,8 @@ */ #include "bli_trsm_cntl.h" -#include "bli_trsm_blocksize.h" -#include "bli_trsm_check.h" #include "bli_trsm_front.h" #include "bli_trsm_int.h" -#include "bli_trsm_prune.h" -#include "bli_gemmtrsm_ukernel.h" -#include "bli_trsm_ukernel.h" +#include "bli_trsm_var.h" -#include "bli_trsm_blk_var1f.h" -#include "bli_trsm_blk_var1b.h" - -#include "bli_trsm_blk_var2f.h" -#include "bli_trsm_blk_var2b.h" - -#include "bli_trsm_blk_var3f.h" -#include "bli_trsm_blk_var3b.h" - -#include "bli_trsm_ll_ker_var2.h" -#include "bli_trsm_lu_ker_var2.h" -#include "bli_trsm_rl_ker_var2.h" -#include "bli_trsm_ru_ker_var2.h" - -#include "bli_gemmtrsm_l_ukr_ref.h" -#include "bli_gemmtrsm_u_ukr_ref.h" - -#include "bli_trsm_l_ukr_ref.h" -#include "bli_trsm_u_ukr_ref.h" - - -// -// Prototype object-based interface. 
-// -void bli_trsm( side_t side, - obj_t* alpha, - obj_t* a, - obj_t* b ); - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ); - -INSERT_GENTPROT_BASIC( trsm ) diff --git a/frame/3/trsm/bli_trsm_blk_var1b.c b/frame/3/trsm/bli_trsm_blk_var1b.c index 29aad41df..e098bb374 100644 --- a/frame/3/trsm/bli_trsm_blk_var1b.c +++ b/frame/3/trsm/bli_trsm_blk_var1b.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var1b( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -57,7 +58,7 @@ void bli_trsm_blk_var1b( obj_t* a, if( thread_am_ochief( thread ) ) { bli_obj_init_pack( &b_pack_s ); bli_packm_init( b, &b_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } b_pack = thread_obroadcast( thread, &b_pack_s ); @@ -69,15 +70,14 @@ void bli_trsm_blk_var1b( obj_t* a, // Pack B1 (if instructed). bli_packm_int( b, b_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; - num_t dt = bli_obj_execution_datatype( *a ); - dim_t bf = ( bli_obj_root_is_triangular( *a ) ? - bli_info_get_default_mr( BLIS_TRSM, dt ) : - bli_info_get_default_nr( BLIS_TRSM, dt ) ); - bli_get_range_b2t( thread, a, bf, + bli_get_range_b2t( thread, a, + ( bli_obj_root_is_triangular( *a ) ? + bli_cntx_get_bmult( BLIS_MR, cntx ) : + bli_cntx_get_bmult( BLIS_NR, cntx ) ), &my_start, &my_end ); // Partition along the remaining portion of the m dimension. @@ -85,7 +85,7 @@ void bli_trsm_blk_var1b( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_b( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. 
bli_acquire_mpart_b2t( BLIS_SUBPART1, @@ -96,13 +96,13 @@ void bli_trsm_blk_var1b( obj_t* a, // Initialize object for packing A1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -111,6 +111,7 @@ void bli_trsm_blk_var1b( obj_t* a, b_pack, &BLIS_ONE, &c1, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); thread_ibarrier( thread ); diff --git a/frame/3/trsm/bli_trsm_blk_var1f.c b/frame/3/trsm/bli_trsm_blk_var1f.c index b4a90f463..75624c26f 100644 --- a/frame/3/trsm/bli_trsm_blk_var1f.c +++ b/frame/3/trsm/bli_trsm_blk_var1f.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var1f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -57,7 +58,7 @@ void bli_trsm_blk_var1f( obj_t* a, if( thread_am_ochief( thread ) ) { bli_obj_init_pack( &b_pack_s ); bli_packm_init( b, &b_pack_s, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } b_pack = thread_obroadcast( thread, &b_pack_s ); @@ -69,15 +70,14 @@ void bli_trsm_blk_var1f( obj_t* a, // Pack B1 (if instructed). bli_packm_int( b, b_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; - num_t dt = bli_obj_execution_datatype( *a ); - dim_t bf = ( bli_obj_root_is_triangular( *a ) ? - bli_info_get_default_mr( BLIS_TRSM, dt ) : - bli_info_get_default_nr( BLIS_TRSM, dt ) ); - bli_get_range_t2b( thread, a, bf, + bli_get_range_b2t( thread, a, + ( bli_obj_root_is_triangular( *a ) ? + bli_cntx_get_bmult( BLIS_MR, cntx ) : + bli_cntx_get_bmult( BLIS_NR, cntx ) ), &my_start, &my_end ); // Partition along the remaining portion of the m dimension. 
@@ -85,7 +85,7 @@ void bli_trsm_blk_var1f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, my_end, a, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and C1. bli_acquire_mpart_t2b( BLIS_SUBPART1, @@ -96,13 +96,13 @@ void bli_trsm_blk_var1f( obj_t* a, // Initialize object for packing A1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -111,6 +111,7 @@ void bli_trsm_blk_var1f( obj_t* a, b_pack, &BLIS_ONE, &c1, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); thread_ibarrier( thread ); diff --git a/frame/3/trsm/bli_trsm_blk_var2b.c b/frame/3/trsm/bli_trsm_blk_var2b.c index 667652827..6c68ec7ba 100644 --- a/frame/3/trsm/bli_trsm_blk_var2b.c +++ b/frame/3/trsm/bli_trsm_blk_var2b.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var2b( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -60,12 +61,12 @@ void bli_trsm_blk_var2b( obj_t* a, // Initialize object for packing A. bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -79,15 +80,14 @@ void bli_trsm_blk_var2b( obj_t* a, // Pack A (if instructed). bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; - num_t dt = bli_obj_execution_datatype( *a ); - dim_t bf = ( bli_obj_root_is_triangular( *b ) ? 
- bli_info_get_default_mr( BLIS_TRSM, dt ) : - bli_info_get_default_nr( BLIS_TRSM, dt ) ); - bli_get_range_r2l( thread, b, bf, + bli_get_range_b2t( thread, a, + ( bli_obj_root_is_triangular( *a ) ? + bli_cntx_get_bmult( BLIS_MR, cntx ) : + bli_cntx_get_bmult( BLIS_NR, cntx ) ), &my_start, &my_end ); // Partition along the n dimension. @@ -95,7 +95,7 @@ void bli_trsm_blk_var2b( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_b( i, my_end, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for B1 and C1. bli_acquire_mpart_r2l( BLIS_SUBPART1, @@ -106,20 +106,20 @@ void bli_trsm_blk_var2b( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -128,13 +128,14 @@ void bli_trsm_blk_var2b( obj_t* a, b1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); thread_ibarrier( thread ); // Unpack C1 (if C1 was packed). 
bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trsm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/trsm/bli_trsm_blk_var2f.c b/frame/3/trsm/bli_trsm_blk_var2f.c index 955d11349..569a521b4 100644 --- a/frame/3/trsm/bli_trsm_blk_var2f.c +++ b/frame/3/trsm/bli_trsm_blk_var2f.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var2f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -60,12 +61,12 @@ void bli_trsm_blk_var2f( obj_t* a, // Initialize object for packing A. bli_packm_init( a, &a_pack_s, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } a_pack = thread_obroadcast( thread, &a_pack_s ); @@ -79,15 +80,14 @@ void bli_trsm_blk_var2f( obj_t* a, // Pack A (if instructed). bli_packm_int( a, a_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_opackm( thread ) ); dim_t my_start, my_end; - num_t dt = bli_obj_execution_datatype( *a ); - dim_t bf = ( bli_obj_root_is_triangular( *b ) ? - bli_info_get_default_mr( BLIS_TRSM, dt ) : - bli_info_get_default_nr( BLIS_TRSM, dt ) ); - bli_get_range_l2r( thread, b, bf, + bli_get_range_b2t( thread, a, + ( bli_obj_root_is_triangular( *a ) ? + bli_cntx_get_bmult( BLIS_MR, cntx ) : + bli_cntx_get_bmult( BLIS_NR, cntx ) ), &my_start, &my_end ); // Partition along the n dimension. @@ -95,7 +95,7 @@ void bli_trsm_blk_var2f( obj_t* a, { // Determine the current algorithmic blocksize. b_alg = bli_determine_blocksize_f( i, my_end, b, - cntl_blocksize( cntl ) ); + cntl_bszid( cntl ), cntx ); // Acquire partitions for B1 and C1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -106,20 +106,20 @@ void bli_trsm_blk_var2f( obj_t* a, // Initialize objects for packing A1 and B1. 
if( thread_am_ichief( thread ) ) { bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); bli_packm_init( &c1, c1_pack, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); } thread_ibarrier( thread ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_ipackm( thread ) ); // Pack C1 (if instructed). bli_packm_int( &c1, c1_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -128,13 +128,14 @@ void bli_trsm_blk_var2f( obj_t* a, b1_pack, &BLIS_ONE, c1_pack, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); thread_ibarrier( thread ); // Unpack C1 (if C1 was packed). bli_unpackm_int( c1_pack, &c1, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trsm_thread_sub_ipackm( thread ) ); } diff --git a/frame/3/trsm/bli_trsm_blk_var3b.c b/frame/3/trsm/bli_trsm_blk_var3b.c index 411178487..86c08e00e 100644 --- a/frame/3/trsm/bli_trsm_blk_var3b.c +++ b/frame/3/trsm/bli_trsm_blk_var3b.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var3b( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -61,12 +62,12 @@ void bli_trsm_blk_var3b( obj_t* a, // Initialize object for packing C. bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -79,7 +80,7 @@ void bli_trsm_blk_var3b( obj_t* a, // Pack C (if instructed). bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trsm_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. 
@@ -92,8 +93,8 @@ void bli_trsm_blk_var3b( obj_t* a, // NOTE: We call a trsm-specific function to determine the kc // blocksize so that we can implement the "nudging" of kc to be // a multiple of mr, as needed. - b_alg = bli_trsm_determine_kc_b( i, k_trans, b, - cntl_blocksize( cntl ) ); + b_alg = bli_trsm_determine_kc_b( i, k_trans, a, b, + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and B1. bli_acquire_mpart_r2l( BLIS_SUBPART1, @@ -104,20 +105,20 @@ void bli_trsm_blk_var3b( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -126,6 +127,7 @@ void bli_trsm_blk_var3b( obj_t* a, b1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); @@ -144,7 +146,7 @@ void bli_trsm_blk_var3b( obj_t* a, // Unpack C (if C was packed). bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trsm_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/trsm/bli_trsm_blk_var3f.c b/frame/3/trsm/bli_trsm_blk_var3f.c index ce84201a2..cb3f2aa44 100644 --- a/frame/3/trsm/bli_trsm_blk_var3f.c +++ b/frame/3/trsm/bli_trsm_blk_var3f.c @@ -37,6 +37,7 @@ void bli_trsm_blk_var3f( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -61,12 +62,12 @@ void bli_trsm_blk_var3f( obj_t* a, // Initialize object for packing C. 
bli_packm_init( c, &c_pack_s, - cntl_sub_packm_c( cntl ) ); + cntx, cntl_sub_packm_c( cntl ) ); // Scale C by beta (if instructed). bli_scalm_int( &BLIS_ONE, c, - cntl_sub_scalm( cntl ) ); + cntx, cntl_sub_scalm( cntl ) ); } c_pack = thread_obroadcast( thread, &c_pack_s ); @@ -79,7 +80,7 @@ void bli_trsm_blk_var3f( obj_t* a, // Pack C (if instructed). bli_packm_int( c, c_pack, - cntl_sub_packm_c( cntl ), + cntx, cntl_sub_packm_c( cntl ), trsm_thread_sub_opackm( thread ) ); // Query dimension in partitioning direction. @@ -92,8 +93,8 @@ void bli_trsm_blk_var3f( obj_t* a, // NOTE: We call a trsm-specific function to determine the kc // blocksize so that we can implement the "nudging" of kc to be // a multiple of mr, as needed. - b_alg = bli_trsm_determine_kc_f( i, k_trans, b, - cntl_blocksize( cntl ) ); + b_alg = bli_trsm_determine_kc_f( i, k_trans, a, b, + cntl_bszid( cntl ), cntx ); // Acquire partitions for A1 and B1. bli_acquire_mpart_l2r( BLIS_SUBPART1, @@ -104,20 +105,20 @@ void bli_trsm_blk_var3f( obj_t* a, // Initialize objects for packing A1 and B1. if( thread_am_ichief( thread ) ) { bli_packm_init( &a1, a1_pack, - cntl_sub_packm_a( cntl ) ); + cntx, cntl_sub_packm_a( cntl ) ); bli_packm_init( &b1, b1_pack, - cntl_sub_packm_b( cntl ) ); + cntx, cntl_sub_packm_b( cntl ) ); } thread_ibarrier( thread ); // Pack A1 (if instructed). bli_packm_int( &a1, a1_pack, - cntl_sub_packm_a( cntl ), + cntx, cntl_sub_packm_a( cntl ), trsm_thread_sub_ipackm( thread ) ); // Pack B1 (if instructed). bli_packm_int( &b1, b1_pack, - cntl_sub_packm_b( cntl ), + cntx, cntl_sub_packm_b( cntl ), trsm_thread_sub_ipackm( thread ) ); // Perform trsm subproblem. @@ -126,6 +127,7 @@ void bli_trsm_blk_var3f( obj_t* a, b1_pack, &BLIS_ONE, c_pack, + cntx, cntl_sub_trsm( cntl ), trsm_thread_sub_trsm( thread ) ); @@ -144,7 +146,7 @@ void bli_trsm_blk_var3f( obj_t* a, // Unpack C (if C was packed). 
bli_unpackm_int( c_pack, c, - cntl_sub_unpackm_c( cntl ), + cntx, cntl_sub_unpackm_c( cntl ), trsm_thread_sub_opackm( thread ) ); // If any packing buffers were acquired within packm, release them back diff --git a/frame/3/trsm/bli_trsm_cntl.c b/frame/3/trsm/bli_trsm_cntl.c index 05560aa40..765b06889 100644 --- a/frame/3/trsm/bli_trsm_cntl.c +++ b/frame/3/trsm/bli_trsm_cntl.c @@ -36,27 +36,8 @@ extern scalm_t* scalm_cntl; -extern blksz_t* gemm_mc; -extern blksz_t* gemm_nc; -extern blksz_t* gemm_kc; -extern blksz_t* gemm_mr; -extern blksz_t* gemm_nr; -extern blksz_t* gemm_kr; - -extern func_t* gemm_ukrs; - extern gemm_t* gemm_cntl_bp_ke; -func_t* gemmtrsm_l_ukrs; -func_t* gemmtrsm_u_ukrs; -func_t* trsm_l_ukrs; -func_t* trsm_u_ukrs; - -func_t* gemmtrsm_l_ref_ukrs; -func_t* gemmtrsm_u_ref_ukrs; -func_t* trsm_l_ref_ukrs; -func_t* trsm_u_ref_ukrs; - packm_t* trsm_l_packa_cntl; packm_t* trsm_l_packb_cntl; @@ -80,60 +61,6 @@ trsm_t* trsm_r_cntl; void bli_trsm_cntl_init() { - // Create function pointer objects for each datatype-specific - // micro-kernel (for gemmtrsm and trsm). - gemmtrsm_l_ukrs - = - bli_func_obj_create( BLIS_SGEMMTRSM_L_UKERNEL, FALSE, - BLIS_DGEMMTRSM_L_UKERNEL, FALSE, - BLIS_CGEMMTRSM_L_UKERNEL, FALSE, - BLIS_ZGEMMTRSM_L_UKERNEL, FALSE ); - gemmtrsm_u_ukrs - = - bli_func_obj_create( BLIS_SGEMMTRSM_U_UKERNEL, FALSE, - BLIS_DGEMMTRSM_U_UKERNEL, FALSE, - BLIS_CGEMMTRSM_U_UKERNEL, FALSE, - BLIS_ZGEMMTRSM_U_UKERNEL, FALSE ); - trsm_l_ukrs - = - bli_func_obj_create( BLIS_STRSM_L_UKERNEL, FALSE, - BLIS_DTRSM_L_UKERNEL, FALSE, - BLIS_CTRSM_L_UKERNEL, FALSE, - BLIS_ZTRSM_L_UKERNEL, FALSE ); - trsm_u_ukrs - = - bli_func_obj_create( BLIS_STRSM_U_UKERNEL, FALSE, - BLIS_DTRSM_U_UKERNEL, FALSE, - BLIS_CTRSM_U_UKERNEL, FALSE, - BLIS_ZTRSM_U_UKERNEL, FALSE ); - - // Create function pointer objects for reference micro-kernels. 
- gemmtrsm_l_ref_ukrs - = - bli_func_obj_create( BLIS_SGEMMTRSM_L_UKERNEL_REF, FALSE, - BLIS_DGEMMTRSM_L_UKERNEL_REF, FALSE, - BLIS_CGEMMTRSM_L_UKERNEL_REF, FALSE, - BLIS_ZGEMMTRSM_L_UKERNEL_REF, FALSE ); - gemmtrsm_u_ref_ukrs - = - bli_func_obj_create( BLIS_SGEMMTRSM_U_UKERNEL_REF, FALSE, - BLIS_DGEMMTRSM_U_UKERNEL_REF, FALSE, - BLIS_CGEMMTRSM_U_UKERNEL_REF, FALSE, - BLIS_ZGEMMTRSM_U_UKERNEL_REF, FALSE ); - trsm_l_ref_ukrs - = - bli_func_obj_create( BLIS_STRSM_L_UKERNEL_REF, FALSE, - BLIS_DTRSM_L_UKERNEL_REF, FALSE, - BLIS_CTRSM_L_UKERNEL_REF, FALSE, - BLIS_ZTRSM_L_UKERNEL_REF, FALSE ); - trsm_u_ref_ukrs - = - bli_func_obj_create( BLIS_STRSM_U_UKERNEL_REF, FALSE, - BLIS_DTRSM_U_UKERNEL_REF, FALSE, - BLIS_CTRSM_U_UKERNEL_REF, FALSE, - BLIS_ZTRSM_U_UKERNEL_REF, FALSE ); - - // Create control tree objects for packm operations (left side). trsm_l_packa_cntl = @@ -141,8 +68,8 @@ void bli_trsm_cntl_init() BLIS_VARIANT1, // IMPORTANT: n dim multiple must be mr to // support right and bottom-right edge cases - gemm_mr, - gemm_mr, + BLIS_MR, + BLIS_MR, TRUE, // invert diagonal TRUE, // reverse iteration if upper? FALSE, // reverse iteration if lower? @@ -155,8 +82,8 @@ void bli_trsm_cntl_init() BLIS_VARIANT1, // IMPORTANT: m dim multiple must be mr since // B_pack is updated (ie: serves as C) in trsm - gemm_mr, - gemm_nr, + BLIS_MR, + BLIS_NR, FALSE, // do NOT invert diagonal FALSE, // reverse iteration if upper? FALSE, // reverse iteration if lower? @@ -168,8 +95,8 @@ void bli_trsm_cntl_init() = bli_packm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_nr, - gemm_mr, + BLIS_NR, + BLIS_MR, FALSE, // do NOT invert diagonal FALSE, // reverse iteration if upper? FALSE, // reverse iteration if lower? @@ -180,8 +107,8 @@ void bli_trsm_cntl_init() = bli_packm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, // pack panels of B compactly - gemm_mr, - gemm_mr, + BLIS_MR, + BLIS_MR, TRUE, // invert diagonal FALSE, // reverse iteration if upper? 
TRUE, // reverse iteration if lower? @@ -194,10 +121,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_UNB_OPT, BLIS_VARIANT2, - NULL, - gemm_ukrs, - gemmtrsm_l_ukrs, - gemmtrsm_u_ukrs, + 0, // bszid_t not used by macro-kernel NULL, NULL, NULL, NULL, NULL, NULL, NULL ); @@ -207,8 +131,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_mc, - NULL, NULL, NULL, + BLIS_MC, NULL, trsm_l_packa_cntl, trsm_l_packb_cntl, @@ -223,8 +146,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT3, - gemm_kc, - NULL, NULL, NULL, + BLIS_KC, NULL, NULL, NULL, @@ -239,8 +161,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemm_nc, - NULL, NULL, NULL, + BLIS_NC, NULL, NULL, NULL, @@ -255,8 +176,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT1, - gemm_mc, - NULL, NULL, NULL, + BLIS_MC, NULL, trsm_r_packa_cntl, trsm_r_packb_cntl, @@ -271,8 +191,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT3, - gemm_kc, - NULL, NULL, NULL, + BLIS_KC, NULL, NULL, NULL, @@ -287,8 +206,7 @@ void bli_trsm_cntl_init() = bli_trsm_cntl_obj_create( BLIS_BLOCKED, BLIS_VARIANT2, - gemm_nc, - NULL, NULL, NULL, + BLIS_NC, NULL, NULL, NULL, @@ -304,16 +222,6 @@ void bli_trsm_cntl_init() void bli_trsm_cntl_finalize() { - bli_func_obj_free( gemmtrsm_l_ukrs ); - bli_func_obj_free( gemmtrsm_u_ukrs ); - bli_func_obj_free( trsm_l_ukrs ); - bli_func_obj_free( trsm_u_ukrs ); - - bli_func_obj_free( gemmtrsm_l_ref_ukrs ); - bli_func_obj_free( gemmtrsm_u_ref_ukrs ); - bli_func_obj_free( trsm_l_ref_ukrs ); - bli_func_obj_free( trsm_u_ref_ukrs ); - bli_cntl_obj_free( trsm_l_packa_cntl ); bli_cntl_obj_free( trsm_l_packb_cntl ); bli_cntl_obj_free( trsm_r_packa_cntl ); @@ -331,10 +239,7 @@ void bli_trsm_cntl_finalize() trsm_t* bli_trsm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, - func_t* gemm_ukrs_, - 
func_t* gemmtrsm_l_ukrs_, - func_t* gemmtrsm_u_ukrs_, + bszid_t bszid, scalm_t* sub_scalm, packm_t* sub_packm_a, packm_t* sub_packm_b, @@ -349,10 +254,7 @@ trsm_t* bli_trsm_cntl_obj_create( impl_t impl_type, cntl->impl_type = impl_type; cntl->var_num = var_num; - cntl->b = b; - cntl->gemm_ukrs = gemm_ukrs_; - cntl->gemmtrsm_l_ukrs = gemmtrsm_l_ukrs_; - cntl->gemmtrsm_u_ukrs = gemmtrsm_u_ukrs_; + cntl->bszid = bszid; cntl->sub_scalm = sub_scalm; cntl->sub_packm_a = sub_packm_a; cntl->sub_packm_b = sub_packm_b; diff --git a/frame/3/trsm/bli_trsm_cntl.h b/frame/3/trsm/bli_trsm_cntl.h index 3d13910d7..c974cb1e5 100644 --- a/frame/3/trsm/bli_trsm_cntl.h +++ b/frame/3/trsm/bli_trsm_cntl.h @@ -36,10 +36,7 @@ struct trsm_s { impl_t impl_type; varnum_t var_num; - blksz_t* b; - func_t* gemm_ukrs; - func_t* gemmtrsm_l_ukrs; - func_t* gemmtrsm_u_ukrs; + bszid_t bszid; struct scalm_s* sub_scalm; struct packm_s* sub_packm_a; struct packm_s* sub_packm_b; @@ -51,17 +48,12 @@ struct trsm_s typedef struct trsm_s trsm_t; #define cntl_sub_trsm( cntl ) cntl->sub_trsm -#define cntl_gemmtrsm_l_ukrs( cntl ) cntl->gemmtrsm_l_ukrs -#define cntl_gemmtrsm_u_ukrs( cntl ) cntl->gemmtrsm_u_ukrs void bli_trsm_cntl_init( void ); void bli_trsm_cntl_finalize( void ); trsm_t* bli_trsm_cntl_obj_create( impl_t impl_type, varnum_t var_num, - blksz_t* b, - func_t* gemm_ukrs, - func_t* gemmtrsm_l_ukrs, - func_t* gemmtrsm_u_ukrs, + bszid_t bszid, scalm_t* sub_scalm, packm_t* sub_pack_a, packm_t* sub_pack_b, diff --git a/frame/3/trsm/bli_trsm_front.c b/frame/3/trsm/bli_trsm_front.c index 554dc36c6..cc3de1b7f 100644 --- a/frame/3/trsm/bli_trsm_front.c +++ b/frame/3/trsm/bli_trsm_front.c @@ -38,6 +38,7 @@ void bli_trsm_front( side_t side, obj_t* alpha, obj_t* a, obj_t* b, + cntx_t* cntx, trsm_t* l_cntl, trsm_t* r_cntl ) { @@ -48,7 +49,7 @@ void bli_trsm_front( side_t side, // Check parameters. 
if ( bli_error_checking_is_enabled() ) - bli_trsm_check( side, alpha, a, b ); + bli_trsm_check( side, alpha, a, b, &BLIS_ZERO, b, cntx ); // If alpha is zero, scale by beta and return. if ( bli_obj_equals( alpha, &BLIS_ZERO ) ) @@ -57,6 +58,10 @@ void bli_trsm_front( side_t side, return; } + // Reinitialize the memory allocator to accommodate the blocksizes + // in the current context. + bli_mem_reinit( cntx ); + // Alias A and B so we can tweak the objects if necessary. bli_obj_alias_to( *a, a_local ); bli_obj_alias_to( *b, b_local ); @@ -119,12 +124,13 @@ void bli_trsm_front( side_t side, // Invoke the internal back-end. bli_level3_thread_decorator( n_threads, - (level3_int_t) bli_trsm_int, + (l3_int_t) bli_trsm_int, alpha, &a_local, &b_local, alpha, &c_local, + (void*) cntx, (void*) cntl, (void**) infos ); diff --git a/frame/3/trsm/bli_trsm_front.h b/frame/3/trsm/bli_trsm_front.h index f3342427f..6ee063797 100644 --- a/frame/3/trsm/bli_trsm_front.h +++ b/frame/3/trsm/bli_trsm_front.h @@ -36,6 +36,7 @@ void bli_trsm_front( side_t side, obj_t* alpha, obj_t* a, obj_t* b, + cntx_t* cntx, trsm_t* l_cntl, trsm_t* r_cntl ); diff --git a/frame/3/trsm/bli_trsm_int.c b/frame/3/trsm/bli_trsm_int.c index 7abdfa462..1fd9e35d8 100644 --- a/frame/3/trsm/bli_trsm_int.c +++ b/frame/3/trsm/bli_trsm_int.c @@ -39,6 +39,7 @@ typedef void (*FUNCPTR_T)( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ); @@ -89,6 +90,7 @@ void bli_trsm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -102,7 +104,7 @@ void bli_trsm_int( obj_t* alpha, // Check parameters. if ( bli_error_checking_is_enabled() ) - bli_trsm_int_check( alpha, a, b, beta, c, cntl ); + bli_gemm_basic_check( alpha, a, b, beta, c, cntx ); // If C has a zero dimension, return early. 
if ( bli_obj_has_zero_dim( *c ) ) return; @@ -185,6 +187,7 @@ void bli_trsm_int( obj_t* alpha, f( &a_local, &b_local, &c_local, + cntx, cntl, thread ); } diff --git a/frame/3/trsm/bli_trsm_int.h b/frame/3/trsm/bli_trsm_int.h index 21fcde4cc..916d9a4e4 100644 --- a/frame/3/trsm/bli_trsm_int.h +++ b/frame/3/trsm/bli_trsm_int.h @@ -37,5 +37,6 @@ void bli_trsm_int( obj_t* alpha, obj_t* b, obj_t* beta, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ); diff --git a/frame/3/trsm/bli_trsm_ll_ker_var2.c b/frame/3/trsm/bli_trsm_ll_ker_var2.c index 31384e1fa..1ca4599d7 100644 --- a/frame/3/trsm/bli_trsm_ll_ker_var2.c +++ b/frame/3/trsm/bli_trsm_ll_ker_var2.c @@ -48,8 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* alpha2, void* c, inc_t rs_c, inc_t cs_c, - void* gemmtrsm_ukr, - void* gemm_ukr, + cntx_t* cntx, trsm_thrinfo_t* thread ); @@ -59,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trsm_ll_ker_var2); void bli_trsm_ll_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -92,12 +92,6 @@ void bli_trsm_ll_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemmtrsm_ukrs; - func_t* gemm_ukrs; - void* gemmtrsm_ukr; - void* gemm_ukr; - - // Grab the address of the internal scalar buffer for the scalar // attached to B. This will be the alpha scalar used in the gemmtrsm // subproblems (ie: the scalar that would be applied to the packed @@ -117,14 +111,6 @@ void bli_trsm_ll_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t objects containing - // the gemmtrsm and gemm micro-kernel function addresses, and then - // query the function addresses corresponding to the current datatype. - gemmtrsm_ukrs = cntl_gemmtrsm_l_ukrs( cntl ); - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemmtrsm_ukr = bli_func_obj_query( dt_exec, gemmtrsm_ukrs ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffa, schema_a, @@ -137,48 +123,51 @@ void bli_trsm_ll_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_alpha2, buf_c, rs_c, cs_c, - gemmtrsm_ukr, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmtrsmtype, gemmtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffa, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha1, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* alpha2, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemmtrsm_ukr, \ - void* gemm_ukr, \ - trsm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffa, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha1, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* alpha2, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trsm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernels' addresses to their function pointer types. */ \ - PASTECH(ch,gemmtrsmtype) gemmtrsm_ukr_cast = gemmtrsm_ukr; \ - PASTECH(ch,gemmtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ - const dim_t MR = pd_a; \ - const dim_t NR = pd_b; \ - const dim_t PACKMR = cs_a; \ - const dim_t PACKNR = rs_b; \ + const dim_t MR = pd_a; \ + const dim_t NR = pd_b; \ + const dim_t PACKMR = cs_a; \ + const dim_t PACKNR = rs_b; \ +\ + /* Cast the micro-kernel address to its function pointer type. 
*/ \ + PASTECH(ch,gemmtrsm_ukr_ft) \ + gemmtrsm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMMTRSM_L_UKR, cntx ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict minus_one = PASTEMAC(ch,m1); \ @@ -425,26 +414,34 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_a10, \ - alpha1_cast, \ - a10, \ - a11, \ - b01, \ - b11, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_a10, \ + alpha1_cast, \ + a10, \ + a11, \ + b01, \ + b11, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_a10, \ - alpha1_cast, \ - a10, \ - a11, \ - b01, \ - b11, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_a10, \ + alpha1_cast, \ + a10, \ + a11, \ + b01, \ + b11, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the bottom edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -482,24 +479,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - a1, \ - b1, \ - alpha2_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a1, \ + b1, \ + alpha2_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -570,5 +575,5 @@ PASTEMAC(ch,fprintm)( stdout, "trsm_ll_ker_var2: b1 (ndiag)", k, NR, bp, NR, 1, */ \ } -INSERT_GENTFUNC_BASIC2( trsm_ll_ker_var2, gemmtrsm_ukr_t, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trsm_ll_ker_var2 ) diff --git a/frame/3/trsm/bli_trsm_lu_ker_var2.c b/frame/3/trsm/bli_trsm_lu_ker_var2.c index 0e6129134..6edd92ee2 100644 --- a/frame/3/trsm/bli_trsm_lu_ker_var2.c +++ b/frame/3/trsm/bli_trsm_lu_ker_var2.c @@ -48,8 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* alpha2, void* c, inc_t rs_c, inc_t cs_c, - void* gemmtrsm_ukr, - void* gemm_ukr, + cntx_t* cntx, trsm_thrinfo_t* thread ); @@ -59,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trsm_lu_ker_var2); void bli_trsm_lu_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -92,12 +92,6 @@ void bli_trsm_lu_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemmtrsm_ukrs; - func_t* gemm_ukrs; - void* gemmtrsm_ukr; - void* gemm_ukr; - - // Grab the address of the internal scalar buffer for the scalar // attached to B. This will be the alpha scalar used in the gemmtrsm // subproblems (ie: the scalar that would be applied to the packed @@ -117,14 +111,6 @@ void bli_trsm_lu_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t objects containing - // the gemmtrsm and gemm micro-kernel function addresses, and then - // query the function addresses corresponding to the current datatype. - gemmtrsm_ukrs = cntl_gemmtrsm_u_ukrs( cntl ); - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemmtrsm_ukr = bli_func_obj_query( dt_exec, gemmtrsm_ukrs ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffa, schema_a, @@ -137,48 +123,51 @@ void bli_trsm_lu_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_alpha2, buf_c, rs_c, cs_c, - gemmtrsm_ukr, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmtrsmtype, gemmtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffa, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha1, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* alpha2, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemmtrsm_ukr, \ - void* gemm_ukr, \ - trsm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffa, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha1, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* alpha2, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trsm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernels' addresses to their function pointer types. */ \ - PASTECH(ch,gemmtrsmtype) gemmtrsm_ukr_cast = gemmtrsm_ukr; \ - PASTECH(ch,gemmtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxmr) * \ - PASTEMAC(ch,maxnr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxmr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ - const dim_t MR = pd_a; \ - const dim_t NR = pd_b; \ - const dim_t PACKMR = cs_a; \ - const dim_t PACKNR = rs_b; \ + const dim_t MR = pd_a; \ + const dim_t NR = pd_b; \ + const dim_t PACKMR = cs_a; \ + const dim_t PACKNR = rs_b; \ +\ + /* Cast the micro-kernel address to its function pointer type. 
*/ \ + PASTECH(ch,gemmtrsm_ukr_ft) \ + gemmtrsm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMMTRSM_U_UKR, cntx ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict minus_one = PASTEMAC(ch,m1); \ @@ -435,26 +424,34 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_a12, \ - alpha1_cast, \ - a12, \ - a11, \ - b21, \ - b11, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_a12, \ + alpha1_cast, \ + a12, \ + a11, \ + b21, \ + b11, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_a12, \ - alpha1_cast, \ - a12, \ - a11, \ - b21, \ - b11, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_a12, \ + alpha1_cast, \ + a12, \ + a11, \ + b21, \ + b11, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the bottom edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -492,24 +489,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - a1, \ - b1, \ - alpha2_cast, \ - c11, rs_c, cs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a1, \ + b1, \ + alpha2_cast, \ + c11, rs_c, cs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - a1, \ - b1, \ - zero, \ - ct, rs_ct, cs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a1, \ + b1, \ + zero, \ + ct, rs_ct, cs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. 
*/ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -551,5 +556,5 @@ PASTEMAC(ch,fprintm)( stdout, "trsm_lu_ker_var2: ct after (diag)", m_cur, n_cur, */ \ } -INSERT_GENTFUNC_BASIC2( trsm_lu_ker_var2, gemmtrsm_ukr_t, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trsm_lu_ker_var2 ) diff --git a/frame/3/trsm/bli_trsm_rl_ker_var2.c b/frame/3/trsm/bli_trsm_rl_ker_var2.c index 7a8e97490..401afaad5 100644 --- a/frame/3/trsm/bli_trsm_rl_ker_var2.c +++ b/frame/3/trsm/bli_trsm_rl_ker_var2.c @@ -48,8 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* alpha2, void* c, inc_t rs_c, inc_t cs_c, - void* gemmtrsm_ukr, - void* gemm_ukr, + cntx_t* cntx, trsm_thrinfo_t* thread ); @@ -59,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trsm_rl_ker_var2); void bli_trsm_rl_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -92,12 +92,6 @@ void bli_trsm_rl_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemmtrsm_ukrs; - func_t* gemm_ukrs; - void* gemmtrsm_ukr; - void* gemm_ukr; - - // Grab the address of the internal scalar buffer for the scalar // attached to A. This will be the alpha scalar used in the gemmtrsm // subproblems (ie: the scalar that would be applied to the packed @@ -117,14 +111,6 @@ void bli_trsm_rl_ker_var2( obj_t* a, // function pointer. f = ftypes[dt_exec]; - // Extract from the control tree node the func_t objects containing - // the gemmtrsm and gemm micro-kernel function addresses, and then - // query the function addresses corresponding to the current datatype. - gemmtrsm_ukrs = cntl_gemmtrsm_u_ukrs( cntl ); - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemmtrsm_ukr = bli_func_obj_query( dt_exec, gemmtrsm_ukrs ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. 
f( diagoffb, schema_a, @@ -137,48 +123,56 @@ void bli_trsm_rl_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_alpha2, buf_c, rs_c, cs_c, - gemmtrsm_ukr, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmtrsmtype, gemmtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffb, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha1, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* alpha2, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemmtrsm_ukr, \ - void* gemm_ukr, \ - trsm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffb, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha1, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* alpha2, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trsm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernels' addresses to their function pointer types. */ \ - PASTECH(ch,gemmtrsmtype) gemmtrsm_ukr_cast = gemmtrsm_ukr; \ - PASTECH(ch,gemmtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. */ \ - ctype ct[ PASTEMAC(ch,maxnr) * \ - PASTEMAC(ch,maxmr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxnr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ - const dim_t MR = pd_a; \ - const dim_t NR = pd_b; \ - const dim_t PACKMR = cs_a; \ - const dim_t PACKNR = rs_b; \ + const dim_t MR = pd_a; \ + const dim_t NR = pd_b; \ + const dim_t PACKMR = cs_a; \ + const dim_t PACKNR = rs_b; \ +\ + /* Cast the micro-kernel address to its function pointer type. 
*/ \ + /* NOTE: We use the upper-triangular gemmtrsm ukernel because, while + the current macro-kernel targets the "rl" case (right-side/lower- + triangular), it becomes upper-triangular after the kernel operation + is transposed so that all kernel instances are of the "left" + variety (since those are the only trsm ukernels that exist). */ \ + PASTECH(ch,gemmtrsm_ukr_ft) \ + gemmtrsm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMMTRSM_U_UKR, cntx ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict minus_one = PASTEMAC(ch,m1); \ @@ -453,26 +447,34 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_b21, \ - alpha1_cast, \ - b21, \ - b11, \ - a12, \ - a11, \ - c11, cs_c, rs_c, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_b21, \ + alpha1_cast, \ + b21, \ + b11, \ + a12, \ + a11, \ + c11, cs_c, rs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_b21, \ - alpha1_cast, \ - b21, \ - b11, \ - a12, \ - a11, \ - ct, cs_ct, rs_ct, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_b21, \ + alpha1_cast, \ + b21, \ + b11, \ + a12, \ + a11, \ + ct, cs_ct, rs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the bottom edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -525,24 +527,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. 
*/ \ - gemm_ukr_cast( k, \ - minus_one, \ - b1, \ - a1, \ - alpha2_cast, \ - c11, cs_c, rs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + b1, \ + a1, \ + alpha2_cast, \ + c11, cs_c, rs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - b1, \ - a1, \ - zero, \ - ct, cs_ct, rs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + b1, \ + a1, \ + zero, \ + ct, cs_ct, rs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. */ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -563,5 +573,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC2( trsm_rl_ker_var2, gemmtrsm_ukr_t, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trsm_rl_ker_var2 ) diff --git a/frame/3/trsm/bli_trsm_ru_ker_var2.c b/frame/3/trsm/bli_trsm_ru_ker_var2.c index bd66d654e..a4c823583 100644 --- a/frame/3/trsm/bli_trsm_ru_ker_var2.c +++ b/frame/3/trsm/bli_trsm_ru_ker_var2.c @@ -48,8 +48,7 @@ typedef void (*FUNCPTR_T)( void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, void* alpha2, void* c, inc_t rs_c, inc_t cs_c, - void* gemmtrsm_ukr, - void* gemm_ukr, + cntx_t* cntx, trsm_thrinfo_t* thread ); @@ -59,6 +58,7 @@ static FUNCPTR_T GENARRAY(ftypes,trsm_ru_ker_var2); void bli_trsm_ru_ker_var2( obj_t* a, obj_t* b, obj_t* c, + cntx_t* cntx, trsm_t* cntl, trsm_thrinfo_t* thread ) { @@ -92,12 +92,6 @@ void bli_trsm_ru_ker_var2( obj_t* a, FUNCPTR_T f; - func_t* gemmtrsm_ukrs; - func_t* gemm_ukrs; - void* gemmtrsm_ukr; - void* gemm_ukr; - - // Grab the address of the internal scalar buffer for the scalar // attached to A. This will be the alpha scalar used in the gemmtrsm // subproblems (ie: the scalar that would be applied to the packed @@ -117,14 +111,6 @@ void bli_trsm_ru_ker_var2( obj_t* a, // function pointer. 
f = ftypes[dt_exec]; - // Extract from the control tree node the func_t objects containing - // the gemmtrsm and gemm micro-kernel function addresses, and then - // query the function addresses corresponding to the current datatype. - gemmtrsm_ukrs = cntl_gemmtrsm_l_ukrs( cntl ); - gemm_ukrs = cntl_gemm_ukrs( cntl ); - gemmtrsm_ukr = bli_func_obj_query( dt_exec, gemmtrsm_ukrs ); - gemm_ukr = bli_func_obj_query( dt_exec, gemm_ukrs ); - // Invoke the function. f( diagoffb, schema_a, @@ -137,48 +123,56 @@ void bli_trsm_ru_ker_var2( obj_t* a, buf_b, rs_b, pd_b, ps_b, buf_alpha2, buf_c, rs_c, cs_c, - gemmtrsm_ukr, - gemm_ukr, + cntx, thread ); } #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmtrsmtype, gemmtype ) \ +#define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - doff_t diagoffb, \ - pack_t schema_a, \ - pack_t schema_b, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - void* alpha1, \ - void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ - void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ - void* alpha2, \ - void* c, inc_t rs_c, inc_t cs_c, \ - void* gemmtrsm_ukr, \ - void* gemm_ukr, \ - trsm_thrinfo_t* thread \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffb, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha1, \ + void* a, inc_t cs_a, dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, dim_t pd_b, inc_t ps_b, \ + void* alpha2, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trsm_thrinfo_t* thread \ + ) \ { \ - /* Cast the micro-kernels' addresses to their function pointer types. */ \ - PASTECH(ch,gemmtrsmtype) gemmtrsm_ukr_cast = gemmtrsm_ukr; \ - PASTECH(ch,gemmtype) gemm_ukr_cast = gemm_ukr; \ -\ - /* Temporary C buffer for edge cases. 
*/ \ - ctype ct[ PASTEMAC(ch,maxnr) * \ - PASTEMAC(ch,maxmr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ct = 1; \ - const inc_t cs_ct = PASTEMAC(ch,maxnr); \ + const num_t dt = PASTEMAC(ch,type); \ \ /* Alias some constants to simpler names. */ \ - const dim_t MR = pd_a; \ - const dim_t NR = pd_b; \ - const dim_t PACKMR = cs_a; \ - const dim_t PACKNR = rs_b; \ + const dim_t MR = pd_a; \ + const dim_t NR = pd_b; \ + const dim_t PACKMR = cs_a; \ + const dim_t PACKNR = rs_b; \ +\ + /* Cast the micro-kernel address to its function pointer type. */ \ + /* NOTE: We use the lower-triangular gemmtrsm ukernel because, while + the current macro-kernel targets the "ru" case (right-side/upper- + triangular), it becomes lower-triangular after the kernel operation + is transposed so that all kernel instances are of the "left" + variety (since those are the only trsm ukernels that exist). */ \ + PASTECH(ch,gemmtrsm_ukr_ft) \ + gemmtrsm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMMTRSM_L_UKR, cntx ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \ +\ + /* Temporary C buffer for edge cases. */ \ + ctype ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ct = 1; \ + const inc_t cs_ct = MR; \ \ ctype* restrict zero = PASTEMAC(ch,0); \ ctype* restrict minus_one = PASTEMAC(ch,m1); \ @@ -446,26 +440,34 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the fused gemm/trsm micro-kernel. */ \ - gemmtrsm_ukr_cast( k_b01, \ - alpha1_cast, \ - b01, \ - b11, \ - a10, \ - a11, \ - c11, cs_c, rs_c, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_b01, \ + alpha1_cast, \ + b01, \ + b11, \ + a10, \ + a11, \ + c11, cs_c, rs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the fused gemm/trsm micro-kernel. 
*/ \ - gemmtrsm_ukr_cast( k_b01, \ - alpha1_cast, \ - b01, \ - b11, \ - a10, \ - a11, \ - ct, cs_ct, rs_ct, \ - &aux ); \ + gemmtrsm_ukr \ + ( \ + k_b01, \ + alpha1_cast, \ + b01, \ + b11, \ + a10, \ + a11, \ + ct, cs_ct, rs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Copy the result to the bottom edge of C. */ \ PASTEMAC(ch,copys_mxn)( m_cur, n_cur, \ @@ -518,24 +520,32 @@ void PASTEMAC(ch,varname)( \ if ( m_cur == MR && n_cur == NR ) \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - b1, \ - a1, \ - alpha2_cast, \ - c11, cs_c, rs_c, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + b1, \ + a1, \ + alpha2_cast, \ + c11, cs_c, rs_c, \ + &aux, \ + cntx \ + ); \ } \ else \ { \ /* Invoke the gemm micro-kernel. */ \ - gemm_ukr_cast( k, \ - minus_one, \ - b1, \ - a1, \ - zero, \ - ct, cs_ct, rs_ct, \ - &aux ); \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + b1, \ + a1, \ + zero, \ + ct, cs_ct, rs_ct, \ + &aux, \ + cntx \ + ); \ \ /* Add the result to the edge of C. */ \ PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \ @@ -556,5 +566,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNC_BASIC2( trsm_ru_ker_var2, gemmtrsm_ukr_t, gemm_ukr_t ) +INSERT_GENTFUNC_BASIC0( trsm_ru_ker_var2 ) diff --git a/frame/3/trsm/bli_trsm_var.h b/frame/3/trsm/bli_trsm_var.h new file mode 100644 index 000000000..b33f52f7c --- /dev/null +++ b/frame/3/trsm/bli_trsm_var.h @@ -0,0 +1,96 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC0(opname) \ + ( \ + obj_t* a, \ + obj_t* b, \ + obj_t* c, \ + cntx_t* cntx, \ + trsm_t* cntl, \ + trsm_thrinfo_t* thread \ + ); + +GENPROT( trsm_blk_var1f ) +GENPROT( trsm_blk_var1b ) +GENPROT( trsm_blk_var2f ) +GENPROT( trsm_blk_var2b ) +GENPROT( trsm_blk_var3f ) +GENPROT( trsm_blk_var3b ) + +GENPROT( trsm_ll_ker_var2 ) +GENPROT( trsm_lu_ker_var2 ) +GENPROT( trsm_rl_ker_var2 ) +GENPROT( trsm_ru_ker_var2 ) + + +// +// Prototype BLAS-like interfaces with void pointer operands. 
+// + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoff, \ + pack_t schema_a, \ + pack_t schema_b, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + void* alpha1, \ + void* a, inc_t cs_a, \ + dim_t pd_a, inc_t ps_a, \ + void* b, inc_t rs_b, \ + dim_t pd_b, inc_t ps_b, \ + void* alpha2, \ + void* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx, \ + trsm_thrinfo_t* thread \ + ); + +INSERT_GENTPROT_BASIC( trsm_ll_ker_var2 ) +INSERT_GENTPROT_BASIC( trsm_lu_ker_var2 ) +INSERT_GENTPROT_BASIC( trsm_rl_ker_var2 ) +INSERT_GENTPROT_BASIC( trsm_ru_ker_var2 ) + diff --git a/frame/3/trsm/bli_trsm_blk_var1b.h b/frame/3/trsm/old/bli_trsm_blk_var1b.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var1b.h rename to frame/3/trsm/old/bli_trsm_blk_var1b.h diff --git a/frame/3/trsm/bli_trsm_blk_var1f.h b/frame/3/trsm/old/bli_trsm_blk_var1f.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var1f.h rename to frame/3/trsm/old/bli_trsm_blk_var1f.h diff --git a/frame/3/trsm/bli_trsm_blk_var2b.h b/frame/3/trsm/old/bli_trsm_blk_var2b.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var2b.h rename to frame/3/trsm/old/bli_trsm_blk_var2b.h diff --git a/frame/3/trsm/bli_trsm_blk_var2f.h b/frame/3/trsm/old/bli_trsm_blk_var2f.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var2f.h rename to frame/3/trsm/old/bli_trsm_blk_var2f.h diff --git a/frame/3/trsm/bli_trsm_blk_var3b.h b/frame/3/trsm/old/bli_trsm_blk_var3b.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var3b.h rename to frame/3/trsm/old/bli_trsm_blk_var3b.h diff --git a/frame/3/trsm/bli_trsm_blk_var3f.h b/frame/3/trsm/old/bli_trsm_blk_var3f.h similarity index 100% rename from frame/3/trsm/bli_trsm_blk_var3f.h rename to frame/3/trsm/old/bli_trsm_blk_var3f.h diff --git a/frame/3/trsm/old/bli_trsm_cntx.c b/frame/3/trsm/old/bli_trsm_cntx.c new file mode 100644 index 000000000..186c146df --- /dev/null +++ 
b/frame/3/trsm/old/bli_trsm_cntx.c @@ -0,0 +1,76 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +void bli_trsm_cntx_init( cntx_t* cntx ) +{ + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. 
+ bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the current architecture's native + // level-3 trsm micro-kernels. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_U_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_U_UKR, cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), given the execution method. + bli_gks_cntx_set_blkszs( BLIS_NAT, 6, + BLIS_NC, BLIS_NR, + BLIS_KC, BLIS_KR, + BLIS_MC, BLIS_MR, + BLIS_NR, BLIS_NR, + BLIS_MR, BLIS_MR, + BLIS_KR, BLIS_KR, + cntx ); + + // Set the pack_t schemas for native execution. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS, + BLIS_PACKED_COL_PANELS, + cntx ); +} + +void bli_trsm_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + diff --git a/frame/3/trsm/old/bli_trsm_cntx.h b/frame/3/trsm/old/bli_trsm_cntx.h new file mode 100644 index 000000000..0bdc9e7a8 --- /dev/null +++ b/frame/3/trsm/old/bli_trsm_cntx.h @@ -0,0 +1,37 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +void bli_trsm_cntx_init( void ); +void bli_trsm_cntx_finalize( void ); + diff --git a/frame/3/trsm/bli_trsm_ll_ker_var2.h b/frame/3/trsm/old/bli_trsm_ll_ker_var2.h similarity index 98% rename from frame/3/trsm/bli_trsm_ll_ker_var2.h rename to frame/3/trsm/old/bli_trsm_ll_ker_var2.h index eea32f0a3..ef68211d7 100644 --- a/frame/3/trsm/bli_trsm_ll_ker_var2.h +++ b/frame/3/trsm/old/bli_trsm_ll_ker_var2.h @@ -62,6 +62,7 @@ void PASTEMAC(ch,varname)( \ void* alpha2, \ void* c, inc_t rs_c, inc_t cs_c, \ void* gemmtrsm_ukr, \ + cntx_t* cntx, \ void* gemm_ukr, \ trsm_thrinfo_t* thread \ ); diff --git a/frame/3/trsm/bli_trsm_lu_ker_var2.h b/frame/3/trsm/old/bli_trsm_lu_ker_var2.h similarity index 98% rename from frame/3/trsm/bli_trsm_lu_ker_var2.h rename to frame/3/trsm/old/bli_trsm_lu_ker_var2.h index 38a61a36f..6845a9cc6 100644 --- a/frame/3/trsm/bli_trsm_lu_ker_var2.h +++ b/frame/3/trsm/old/bli_trsm_lu_ker_var2.h @@ -62,6 +62,7 @@ void PASTEMAC(ch,varname)( \ void* alpha2, \ void* c, inc_t 
rs_c, inc_t cs_c, \ void* gemmtrsm_ukr, \ + cntx_t* cntx, \ void* gemm_ukr, \ trsm_thrinfo_t* thread \ ); diff --git a/frame/3/trsm/bli_trsm_rl_ker_var2.h b/frame/3/trsm/old/bli_trsm_rl_ker_var2.h similarity index 96% rename from frame/3/trsm/bli_trsm_rl_ker_var2.h rename to frame/3/trsm/old/bli_trsm_rl_ker_var2.h index a1a86d5af..0a1c004f8 100644 --- a/frame/3/trsm/bli_trsm_rl_ker_var2.h +++ b/frame/3/trsm/old/bli_trsm_rl_ker_var2.h @@ -62,7 +62,8 @@ void PASTEMAC(ch,varname)( \ void* alpha2, \ void* c, inc_t rs_c, inc_t cs_c, \ void* gemmtrsm_ukr, \ - void* gemm_ukr, \ + cntx_t* cntx, \ + void* gemm_ukr, \ trsm_thrinfo_t* thread \ ); diff --git a/frame/3/trsm/bli_trsm_ru_ker_var2.h b/frame/3/trsm/old/bli_trsm_ru_ker_var2.h similarity index 96% rename from frame/3/trsm/bli_trsm_ru_ker_var2.h rename to frame/3/trsm/old/bli_trsm_ru_ker_var2.h index 37c46afe0..e27331403 100644 --- a/frame/3/trsm/bli_trsm_ru_ker_var2.h +++ b/frame/3/trsm/old/bli_trsm_ru_ker_var2.h @@ -62,7 +62,8 @@ void PASTEMAC(ch,varname)( \ void* alpha2, \ void* c, inc_t rs_c, inc_t cs_c, \ void* gemmtrsm_ukr, \ - void* gemm_ukr, \ + cntx_t* cntx, \ + void* gemm_ukr, \ trsm_thrinfo_t* thread \ ); diff --git a/frame/3/trsm/other/bli_trsm_l_blk_var4.c b/frame/3/trsm/other/bli_trsm_l_blk_var4.c deleted file mode 100644 index b71dd0ef3..000000000 --- a/frame/3/trsm/other/bli_trsm_l_blk_var4.c +++ /dev/null @@ -1,174 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -void bli_trsm_l_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trsm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1; - - dim_t i; - dim_t bm_alg; - dim_t m_trans; - dim_t offB; - - // Initialize all pack objects that are passed into packm_init(). - bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - - // Query dimension in partitioning direction. - m_trans = bli_obj_length_after_trans( *a ); - - // Use the diagonal offset of A to skip over the zero region. - offB = bli_abs( bli_obj_diag_offset_after_trans( *a ) ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Fuse the first iteration with incremental packing and computation. 
- { - obj_t b_inc, b_pack_inc; - obj_t c1_inc; - - dim_t j; - dim_t bn_inc; - dim_t n_trans; - - // Query dimension in partitioning direction. - n_trans = bli_obj_width( b_pack ); - - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_f( offB, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_t2b( BLIS_SUBPART1, - offB, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - offB, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Partition along the n dimension. - for ( j = 0; j < n_trans; j += bn_inc ) - { - // Determine the current incremental packing blocksize. - bn_inc = bli_determine_blocksize_f( j, n_trans, b, - cntl_blocksize_aux( cntl ) ); - - // Acquire partitions. - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, b, &b_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &b_pack, &b_pack_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &c1, &c1_inc ); - - // Pack B1 and scale by alpha (if instructed). - bli_packm_int( alpha, &b_inc, &b_pack_inc, cntl_sub_packm_b( cntl ) ); - - // Perform trsm subproblem. - bli_trsm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack_inc, - beta, - &c1_inc, - cntl_sub_trsm( cntl ) ); - } - - // Unpack B to the corresponding region of C. (Note that B and C1 are - // conformal since A1 is square.) - //bli_unpackm_int( &b_pack, &c1, - // cntl_sub_unpackm_c( cntl ) ); - } - - // Partition along the remaining portion of the m dimension. - for ( i = offB + bm_alg; i < m_trans; i += bm_alg ) - { - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_f( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. 
- bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - bli_acquire_mpart_t2b( BLIS_SUBPART1, - i, bm_alg, c, &c1 ); - - // Initialize object for packing A1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Perform trsm subproblem. - if ( bli_obj_intersects_diag( a1_pack ) ) - bli_trsm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1, - cntl_sub_trsm( cntl ) ); - else - bli_gemm_int( &BLIS_MINUS_ONE, - &a1_pack, - &b_pack, - &BLIS_ONE, - &c1, - cntl_sub_gemm( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. - bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); -} - diff --git a/frame/3/trsm/other/bli_trsm_l_blk_var4.h b/frame/3/trsm/other/bli_trsm_l_blk_var4.h deleted file mode 100644 index 93675bb6b..000000000 --- a/frame/3/trsm/other/bli_trsm_l_blk_var4.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_trsm_l_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trsm_t* cntl ); - diff --git a/frame/3/trsm/other/bli_trsm_u_blk_var4.c b/frame/3/trsm/other/bli_trsm_u_blk_var4.c deleted file mode 100644 index 200936695..000000000 --- a/frame/3/trsm/other/bli_trsm_u_blk_var4.c +++ /dev/null @@ -1,178 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -void bli_trsm_u_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trsm_t* cntl ) -{ - obj_t a1, a1_pack; - obj_t b_pack; - obj_t c1; - - dim_t i; - dim_t bm_alg; - dim_t m_trans; - - // Initialize all pack objects that are passed into packm_init(). - bli_obj_init_pack( &a1_pack ); - bli_obj_init_pack( &b_pack ); - - // Query dimension in partitioning direction. - m_trans = bli_obj_length_after_trans( *a ); - - // Initialize object for packing B. - bli_packm_init( b, &b_pack, - cntl_sub_packm_b( cntl ) ); - - // Find the offset to the first non-zero block of A. - for ( i = 0; i < m_trans; i += bm_alg ) - { - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_b( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_b2t( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - - if ( !bli_obj_is_zeros( a1 ) ) break; - } - - // Fuse the first iteration with incremental packing and computation. - { - obj_t b_inc, b_pack_inc; - obj_t c1_inc; - - dim_t j; - dim_t bn_inc; - dim_t n_trans; - - // Query dimension in partitioning direction. 
- n_trans = bli_obj_width( b_pack ); - - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_b( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_b2t( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - bli_acquire_mpart_b2t( BLIS_SUBPART1, - i, bm_alg, c, &c1 ); - - // Initialize objects for packing A1 and C1. - bli_packm_init( &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). - bli_packm_int( alpha, &a1, &a1_pack, cntl_sub_packm_a( cntl ) ); - - // Partition along the n dimension. - for ( j = 0; j < n_trans; j += bn_inc ) - { - // Determine the current incremental packing blocksize. - bn_inc = bli_determine_blocksize_f( j, n_trans, b, - cntl_blocksize_aux( cntl ) ); - - // Acquire partitions. - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, b, &b_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &b_pack, &b_pack_inc ); - bli_acquire_mpart_l2r( BLIS_SUBPART1, - j, bn_inc, &c1, &c1_inc ); - - // Pack B1 and scale by alpha (if instructed). - bli_packm_int( alpha, &b_inc, &b_pack_inc, cntl_sub_packm_b( cntl ) ); - - // Perform trsm subproblem. - bli_trsm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack_inc, - beta, - &c1_inc, - cntl_sub_trsm( cntl ) ); - } - } - - // Partition along the remaining portion of the m dimension. - for ( i = i + bm_alg; i < m_trans; i += bm_alg ) - { - // Determine the current algorithmic blocksize. - bm_alg = bli_determine_blocksize_b( i, m_trans, a, - cntl_blocksize( cntl ) ); - - // Acquire partitions for A1 and C1. - bli_acquire_mpart_b2t( BLIS_SUBPART1, - i, bm_alg, a, &a1 ); - bli_acquire_mpart_b2t( BLIS_SUBPART1, - i, bm_alg, c, &c1 ); - - // Initialize object for packing A1. - bli_packm_init( &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - // Pack A1 and scale by alpha (if instructed). 
- bli_packm_int( alpha, - &a1, &a1_pack, - cntl_sub_packm_a( cntl ) ); - - if ( bli_obj_intersects_diag( a1_pack ) ) - bli_trsm_int( BLIS_LEFT, - alpha, - &a1_pack, - &b_pack, - beta, - &c1, - cntl_sub_trsm( cntl ) ); - else - bli_gemm_int( &BLIS_MINUS_ONE, - &a1_pack, - &b_pack, - &BLIS_ONE, - &c1, - cntl_sub_gemm( cntl ) ); - } - - // If any packing buffers were acquired within packm, release them back - // to the memory manager. - bli_obj_release_pack( &a1_pack ); - bli_obj_release_pack( &b_pack ); -} - diff --git a/frame/3/trsm/other/bli_trsm_u_blk_var4.h b/frame/3/trsm/other/bli_trsm_u_blk_var4.h deleted file mode 100644 index 45f84c2e7..000000000 --- a/frame/3/trsm/other/bli_trsm_u_blk_var4.h +++ /dev/null @@ -1,41 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -void bli_trsm_u_blk_var4( obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - trsm_t* cntl ); - diff --git a/frame/3/gemm/ukernels/bli_gemm_ukr_ref.c b/frame/3/ukernels/bli_gemm_ukr_ref.c similarity index 74% rename from frame/3/gemm/ukernels/bli_gemm_ukr_ref.c rename to frame/3/ukernels/bli_gemm_ukr_ref.c index 7e6806fe0..6486478f7 100644 --- a/frame/3/gemm/ukernels/bli_gemm_ukr_ref.c +++ b/frame/3/ukernels/bli_gemm_ukr_ref.c @@ -34,34 +34,44 @@ #include "blis.h" - #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(ch,mr); \ - const dim_t n = PASTEMAC(ch,nr); \ + const num_t dt = PASTEMAC(ch,type); \ \ - const inc_t cs_a = PASTEMAC(ch,packmr); \ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); \ \ - const inc_t rs_b = PASTEMAC(ch,packnr); \ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); \ + const inc_t packnr = 
bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); \ \ - const inc_t rs_ab = 1; \ - const inc_t cs_ab = PASTEMAC(ch,mr); \ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + const inc_t cs_a = packmr; \ +\ + const inc_t rs_b = packnr; \ +\ + ctype ab[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ab = 1; \ + const inc_t cs_ab = mr; \ \ dim_t l, j, i; \ \ - ctype ab[ PASTEMAC(ch,mr) * \ - PASTEMAC(ch,nr) ]; \ ctype ai; \ ctype bj; \ \ diff --git a/frame/3/ukernels/bli_gemmtrsm_ukr_ref.c b/frame/3/ukernels/bli_gemmtrsm_ukr_ref.c new file mode 100644 index 000000000..3f7a94d14 --- /dev/null +++ b/frame/3/ukernels/bli_gemmtrsm_ukr_ref.c @@ -0,0 +1,102 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname, gemmkerid, trsmkerid ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a1x, \ + ctype* restrict a11, \ + ctype* restrict bx1, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); \ +\ + const inc_t rs_b = packnr; \ + const inc_t cs_b = 1; \ +\ + ctype* minus_one = PASTEMAC(ch,m1); \ +\ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, gemmkerid, cntx ); \ + PASTECH(ch,trsm_ukr_ft) \ + trsm_ukr = bli_cntx_get_l3_ukr_dt( dt, trsmkerid, cntx ); \ +\ + /* lower: b11 = alpha * b11 - a10 * b01; */ \ + /* upper: b11 = alpha * b11 - a12 * b21; */ \ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a1x, \ + bx1, \ + alpha, \ + b11, rs_b, cs_b, \ + data, \ + cntx \ + ); \ +\ + /* b11 = inv(a11) * b11; + c11 = b11; */ \ + trsm_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ +\ +/* +PASTEMAC(d,fprintm)( stdout, "gemmtrsm_ukr: b0111p_r after", k+3, 8, \ + ( double* )b01, 2*PASTEMAC(ch,packnr), 2, "%4.1f", "" ); \ +PASTEMAC(d,fprintm)( stdout, "gemmtrsm_ukr: b0111p_i after", k+3, 8, \ + ( double* )b01 + 1, 2*PASTEMAC(ch,packnr), 2, "%4.1f", "" ); \ +*/ \ +} + +INSERT_GENTFUNC_BASIC2( 
gemmtrsm_l_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_L_UKR ) +INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_U_UKR ) + diff --git a/frame/3/ukernels/bli_l3_ukr_ref.h b/frame/3/ukernels/bli_l3_ukr_ref.h new file mode 100644 index 000000000..688caa928 --- /dev/null +++ b/frame/3/ukernels/bli_l3_ukr_ref.h @@ -0,0 +1,53 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +// Redefine level-3 micro-kernel API names to induce prototypes. + +#undef gemm_ukr_name +#define gemm_ukr_name gemm_ukr_ref + +#undef gemmtrsm_l_ukr_name +#define gemmtrsm_l_ukr_name gemmtrsm_l_ukr_ref +#undef gemmtrsm_u_ukr_name +#define gemmtrsm_u_ukr_name gemmtrsm_u_ukr_ref + +#undef trsm_l_ukr_name +#define trsm_l_ukr_name trsm_l_ukr_ref +#undef trsm_u_ukr_name +#define trsm_u_ukr_name trsm_u_ukr_ref + +// Include the micro-kernel API template. + +#include "bli_l3_ukr.h" + diff --git a/frame/3/ukernels/bli_trsm_ukr_ref.c b/frame/3/ukernels/bli_trsm_ukr_ref.c new file mode 100644 index 000000000..a49299664 --- /dev/null +++ b/frame/3/ukernels/bli_trsm_ukr_ref.c @@ -0,0 +1,199 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); \ +\ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + const inc_t rs_a = 1; \ + const inc_t cs_a = packmr; \ +\ + const inc_t rs_b = packnr; \ + const inc_t cs_b = 1; \ +\ + dim_t iter, i, j, l; \ + dim_t n_behind; \ +\ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = iter; \ + n_behind = i; \ +\ + ctype* restrict alpha11 = a + (i )*rs_a + (i )*cs_a; \ + ctype* restrict a10t = a + (i )*rs_a + (0 )*cs_a; \ + ctype* restrict B0 = b + (0 )*rs_b + (0 )*cs_b; \ + ctype* restrict b1 = b + (i )*rs_b + (0 )*cs_b; \ +\ + /* b1 = b1 - a10t * B0; */ \ + /* b1 = b1 / alpha11; */ \ + for ( j = 0; j < n; ++j ) \ + { \ + ctype* restrict b01 = B0 + (0 )*rs_b + (j )*cs_b; \ + ctype* restrict beta11 = b1 + (0 )*rs_b + (j )*cs_b; \ + ctype* restrict gamma11 = c + (i )*rs_c + (j )*cs_c; \ + ctype beta11c = *beta11; \ + ctype rho11; \ +\ + /* beta11 = beta11 
- a10t * b01; */ \ + PASTEMAC(ch,set0s)( rho11 ); \ + for ( l = 0; l < n_behind; ++l ) \ + { \ + ctype* restrict alpha10 = a10t + (l )*cs_a; \ + ctype* restrict beta01 = b01 + (l )*rs_b; \ +\ + PASTEMAC(ch,axpys)( *alpha10, *beta01, rho11 ); \ + } \ + PASTEMAC(ch,subs)( rho11, beta11c ); \ +\ + /* beta11 = beta11 / alpha11; */ \ + /* NOTE: The INVERSE of alpha11 (1.0/alpha11) is stored instead + of alpha11, so we can multiply rather than divide. We store + the inverse of alpha11 intentionally to avoid expensive + division instructions within the micro-kernel. */ \ + PASTEMAC(ch,scals)( *alpha11, beta11c ); \ +\ + /* Output final result to matrix c. */ \ + PASTEMAC(ch,copys)( beta11c, *gamma11 ); \ +\ + /* Store the local value back to b11. */ \ + PASTEMAC(ch,copys)( beta11c, *beta11 ); \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( trsm_l_ukr_ref ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ +{ \ + const num_t dt = PASTEMAC(ch,type); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); \ +\ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + const inc_t rs_a = 1; \ + const inc_t cs_a = packmr; \ +\ + const inc_t rs_b = packnr; \ + const inc_t cs_b = 1; \ +\ + dim_t iter, i, j, l; \ + dim_t n_behind; \ +\ + for ( iter = 0; iter < m; ++iter ) \ + { \ + i = m - iter - 1; \ + n_behind = iter; \ +\ + ctype* restrict alpha11 = a + (i )*rs_a + (i )*cs_a; \ + ctype* restrict a12t = a + (i )*rs_a + (i+1)*cs_a; \ + ctype* restrict b1 = b + (i )*rs_b + (0 )*cs_b; \ + ctype* restrict B2 = b + (i+1)*rs_b + (0 )*cs_b; \ +\ + /* b1 = b1 - 
a12t * B2; */ \ + /* b1 = b1 / alpha11; */ \ + for ( j = 0; j < n; ++j ) \ + { \ + ctype* restrict beta11 = b1 + (0 )*rs_b + (j )*cs_b; \ + ctype* restrict b21 = B2 + (0 )*rs_b + (j )*cs_b; \ + ctype* restrict gamma11 = c + (i )*rs_c + (j )*cs_c; \ + ctype beta11c = *beta11; \ + ctype rho11; \ +\ + /* beta11 = beta11 - a12t * b21; */ \ + PASTEMAC(ch,set0s)( rho11 ); \ + for ( l = 0; l < n_behind; ++l ) \ + { \ + ctype* restrict alpha12 = a12t + (l )*cs_a; \ + ctype* restrict beta21 = b21 + (l )*rs_b; \ +\ + PASTEMAC(ch,axpys)( *alpha12, *beta21, rho11 ); \ + } \ + PASTEMAC(ch,subs)( rho11, beta11c ); \ +\ + /* beta11 = beta11 / alpha11; */ \ + /* NOTE: The INVERSE of alpha11 (1.0/alpha11) is stored instead + of alpha11, so we can multiply rather than divide. We store + the inverse of alpha11 intentionally to avoid expensive + division instructions within the micro-kernel. */ \ + PASTEMAC(ch,scals)( *alpha11, beta11c ); \ +\ + /* Output final result to matrix c. */ \ + PASTEMAC(ch,copys)( beta11c, *gamma11 ); \ +\ + /* Store the local value back to b11. */ \ + PASTEMAC(ch,copys)( beta11c, *beta11 ); \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( trsm_u_ukr_ref ) + diff --git a/frame/base/bli_blocksize.c b/frame/base/bli_blksz.c similarity index 65% rename from frame/base/bli_blocksize.c rename to frame/base/bli_blksz.c index 4e923a0f0..388ee11a5 100644 --- a/frame/base/bli_blocksize.c +++ b/frame/base/bli_blksz.c @@ -67,26 +67,6 @@ void bli_blksz_obj_init( blksz_t* b, b->e[BLIS_DOUBLE] = be_d; b->e[BLIS_SCOMPLEX] = be_c; b->e[BLIS_DCOMPLEX] = be_z; - - // By default, set the blocksize multiple, mr, and nr fields - // to NULL. 
- b->mult = NULL; - b->mr = NULL; - b->nr = NULL; -} - -void bli_blksz_obj_attach_mult_to( blksz_t* br, - blksz_t* bc ) -{ - bc->mult = br; -} - -void bli_blksz_obj_attach_mr_nr_to( blksz_t* bmr, - blksz_t* bnr, - blksz_t* bc ) -{ - bc->mr = bmr; - bc->nr = bnr; } void bli_blksz_obj_free( blksz_t* b ) @@ -96,121 +76,33 @@ void bli_blksz_obj_free( blksz_t* b ) // ----------------------------------------------------------------------------- -void bli_blksz_set_def( dim_t val, - num_t dt, - blksz_t* b ) +void bli_blksz_reduce_dt_to( num_t dt_bm, blksz_t* bmult, + num_t dt_bs, blksz_t* blksz ) { - b->v[ dt ] = val; -} + dim_t blksz_def = bli_blksz_get_def( dt_bs, blksz ); + dim_t blksz_max = bli_blksz_get_max( dt_bs, blksz ); -void bli_blksz_set_max( dim_t val, - num_t dt, - blksz_t* b ) -{ - b->e[ dt ] = val; -} + dim_t bmult_val = bli_blksz_get_def( dt_bm, bmult ); -void bli_blksz_set_def_max( dim_t def, - dim_t max, - num_t dt, - blksz_t* b ) -{ - bli_blksz_set_def( def, dt, b ); - bli_blksz_set_max( max, dt, b ); -} + // If the blocksize multiple is zero, we do nothing. + if ( bmult_val == 0 ) return; -// ----------------------------------------------------------------------------- + // Round the default and maximum blocksize values down to their + // respective nearest multiples of bmult_val. (Notice that we + // ignore the "max" entry in the bmult object since that would + // correspond to the packing dimension, which plays no role + // as a blocksize multiple.) + blksz_def = ( blksz_def / bmult_val ) * bmult_val; + blksz_max = ( blksz_max / bmult_val ) * bmult_val; -void bli_blksz_reduce_to_mult( blksz_t* b ) -{ - num_t dt; + // Make sure the new blocksize values are at least the blocksize + // multiple. + if ( blksz_def == 0 ) blksz_def = bmult_val; + if ( blksz_max == 0 ) blksz_max = bmult_val; - // If there is no blocksize multiple currently attached, we - // do nothing. 
- if ( bli_blksz_mult( b ) == NULL ) return; - - for ( dt = BLIS_DT_LO; dt <= BLIS_DT_HI; ++dt ) - { - dim_t b_def = bli_blksz_get_def( dt, b ); - dim_t b_max = bli_blksz_get_max( dt, b ); - dim_t b_mult = bli_blksz_get_mult( dt, b ); - - // If the blocksize multiple is zero, we skip this datatype. - if ( b_mult == 0 ) continue; - - // Round default and maximum blocksize values down to nearest - // multiple of b_mult. - b_def = ( b_def / b_mult ) * b_mult; - b_max = ( b_max / b_mult ) * b_mult; - - // Make sure the blocksizes are at least b_mult. - if ( b_def == 0 ) b_def = b_mult; - if ( b_max == 0 ) b_max = b_mult; - - // Store the new blocksizes back to the object. - bli_blksz_set_def_max( b_def, b_max, dt, b ); - } -} - -// ----------------------------------------------------------------------------- - -dim_t bli_blksz_get_def( num_t dt, blksz_t* b ) -{ - return b->v[ dt ]; -} - -dim_t bli_blksz_get_max( num_t dt, blksz_t* b ) -{ - return b->e[ dt ]; -} - -dim_t bli_blksz_get_def_for_obj( obj_t* obj, blksz_t* b ) -{ - return bli_blksz_get_def( bli_obj_datatype( *obj ), b ); -} - -dim_t bli_blksz_get_max_for_obj( obj_t* obj, blksz_t* b ) -{ - return bli_blksz_get_max( bli_obj_datatype( *obj ), b ); -} - -// ----------------------------------------------------------------------------- - -blksz_t* bli_blksz_mult( blksz_t* b ) -{ - return b->mult; -} - -dim_t bli_blksz_get_mult( num_t dt, blksz_t* b ) -{ - return bli_blksz_get_def( dt, bli_blksz_mult( b ) ); -} - -dim_t bli_blksz_get_mult_for_obj( obj_t* obj, blksz_t* b ) -{ - return bli_blksz_get_mult( bli_obj_datatype( *obj ), b ); -} - -// ----------------------------------------------------------------------------- - -blksz_t* bli_blksz_mr( blksz_t* b ) -{ - return b->mr; -} - -blksz_t* bli_blksz_nr( blksz_t* b ) -{ - return b->nr; -} - -dim_t bli_blksz_get_mr( num_t dt, blksz_t* b ) -{ - return bli_blksz_get_def( dt, bli_blksz_mr( b ) ); -} - -dim_t bli_blksz_get_nr( num_t dt, blksz_t* b ) -{ - return 
bli_blksz_get_def( dt, bli_blksz_nr( b ) ); + // Store the new blocksizes back to the object. + bli_blksz_set_def( blksz_def, dt_bs, blksz ); + bli_blksz_set_max( blksz_max, dt_bs, blksz ); } // ----------------------------------------------------------------------------- @@ -218,15 +110,18 @@ dim_t bli_blksz_get_nr( num_t dt, blksz_t* b ) dim_t bli_determine_blocksize_f( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t b_alg, b_max; + dim_t b_use; // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); @@ -269,15 +164,18 @@ dim_t bli_determine_blocksize_f_sub( dim_t i, dim_t bli_determine_blocksize_b( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ) + bszid_t bszid, + cntx_t* cntx ) { - num_t dt; - dim_t b_alg, b_max; - dim_t b_use; + num_t dt; + blksz_t* bsize; + dim_t b_alg, b_max; + dim_t b_use; // Extract the execution datatype and use it to query the corresponding // blocksize and blocksize maximum values from the blksz_t object. 
dt = bli_obj_execution_datatype( *obj ); + bsize = bli_cntx_get_blksz( bszid, cntx ); b_alg = bli_blksz_get_def( dt, bsize ); b_max = bli_blksz_get_max( dt, bsize ); diff --git a/frame/base/bli_blocksize.h b/frame/base/bli_blksz.h similarity index 62% rename from frame/base/bli_blocksize.h rename to frame/base/bli_blksz.h index 1cb989518..11a8cb650 100644 --- a/frame/base/bli_blocksize.h +++ b/frame/base/bli_blksz.h @@ -32,6 +32,70 @@ */ +// ----------------------------------------------------------------------------- + +// blksz_t query + +#define bli_blksz_get_def( dt, b ) \ +\ + ( (b)->v[ dt ] ) + +#define bli_blksz_get_max( dt, b ) \ +\ + ( (b)->e[ dt ] ) + +#define bli_blksz_get_def_max( def, max, dt, b ) \ +{ \ + *(def) = bli_blksz_get_def( dt, b ); \ + *(max) = bli_blksz_get_max( dt, b ); \ +} + +#define bli_blksz_get_def_for_obj( obj, b ) \ +\ + bli_blksz_get_def( bli_obj_datatype( *(obj) ), b ) + +#define bli_blksz_get_max_for_obj( obj, b ) \ +\ + bli_blksz_get_max( bli_obj_datatype( *(obj) ), b ) + + +// blksz_t modification + +#define bli_blksz_set_def( val, dt, b ) \ +{ \ + (b)->v[ dt ] = val; \ +} + +#define bli_blksz_set_max( val, dt, b ) \ +{ \ + (b)->e[ dt ] = val; \ +} + +#define bli_blksz_set_def_max( def, max, dt, b ) \ +{ \ + bli_blksz_set_def( def, dt, b ); \ + bli_blksz_set_max( max, dt, b ); \ +} + +#define bli_blksz_copy( b_src, b_dst ) \ +{ \ + *(b_dst) = *(b_src); \ +} + +#define bli_blksz_copy_dt( dt_src, b_src, \ + dt_dst, b_dst ) \ +{ \ + (b_dst)->v[ dt_dst ] = (b_src)->v[ dt_src ]; \ + (b_dst)->e[ dt_dst ] = (b_src)->e[ dt_src ]; \ +} + +#define bli_blksz_scale_dt_by( num, den, dt, b ) \ +{ \ + (b)->v[ dt ] = ( (b)->v[ dt ] * num ) / den; \ + (b)->e[ dt ] = ( (b)->e[ dt ] * num ) / den; \ +} + +// ----------------------------------------------------------------------------- blksz_t* bli_blksz_obj_create( dim_t b_s, dim_t be_s, dim_t b_d, dim_t be_d, @@ -44,52 +108,20 @@ void bli_blksz_obj_init( blksz_t* b, dim_t b_c, dim_t be_c, dim_t 
b_z, dim_t be_z ); -void bli_blksz_obj_attach_mult_to( blksz_t* br, - blksz_t* bc ); - -void bli_blksz_obj_attach_mr_nr_to( blksz_t* bmr, - blksz_t* bnr, - blksz_t* bc ); - void bli_blksz_obj_free( blksz_t* b ); -void bli_blksz_set_def( dim_t val, - num_t dt, - blksz_t* b ); +// ----------------------------------------------------------------------------- -void bli_blksz_set_max( dim_t val, - num_t dt, - blksz_t* b ); - -void bli_blksz_set_def_max( dim_t def, - dim_t max, - num_t dt, - blksz_t* b ); - -void bli_blksz_reduce_to_mult( blksz_t* b ); - -dim_t bli_blksz_get_def( num_t dt, blksz_t* b ); -dim_t bli_blksz_get_max( num_t dt, blksz_t* b ); - -dim_t bli_blksz_get_def_for_obj( obj_t* obj, blksz_t* b ); -dim_t bli_blksz_get_max_for_obj( obj_t* obj, blksz_t* b ); - -blksz_t* bli_blksz_mult( blksz_t* b ); -dim_t bli_blksz_get_mult( num_t dt, blksz_t* b ); -dim_t bli_blksz_get_mult_for_obj( obj_t* obj, blksz_t* b ); - -blksz_t* bli_blksz_mr( blksz_t* b ); -blksz_t* bli_blksz_nr( blksz_t* b ); - -dim_t bli_blksz_get_mr( num_t dt, blksz_t* b ); -dim_t bli_blksz_get_nr( num_t dt, blksz_t* b ); +void bli_blksz_reduce_dt_to( num_t dt_bm, blksz_t* bmult, + num_t dt_bs, blksz_t* blksz ); // ----------------------------------------------------------------------------- dim_t bli_determine_blocksize_f( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); dim_t bli_determine_blocksize_f_sub( dim_t i, dim_t dim, dim_t b_alg, @@ -98,7 +130,8 @@ dim_t bli_determine_blocksize_f_sub( dim_t i, dim_t bli_determine_blocksize_b( dim_t i, dim_t dim, obj_t* obj, - blksz_t* bsize ); + bszid_t bszid, + cntx_t* cntx ); dim_t bli_determine_blocksize_b_sub( dim_t i, dim_t dim, dim_t b_alg, diff --git a/frame/base/bli_check.c b/frame/base/bli_check.c index d333a4065..f6b5ae47f 100644 --- a/frame/base/bli_check.c +++ b/frame/base/bli_check.c @@ -617,6 +617,18 @@ err_t bli_check_triangular_object( obj_t* a ) return e_val; } +err_t bli_check_object_struc( 
obj_t* a, struc_t struc ) +{ + err_t e_val = BLIS_SUCCESS; + + if ( bli_is_general( struc ) ) e_val = bli_check_general_object( a ); + else if ( bli_is_hermitian( struc ) ) e_val = bli_check_hermitian_object( a ); + else if ( bli_is_symmetric( struc ) ) e_val = bli_check_symmetric_object( a ); + else if ( bli_is_triangular( struc ) ) e_val = bli_check_triangular_object( a ); + + return e_val; +} + // -- Storage-related checks --------------------------------------------------- err_t bli_check_upper_or_lower_object( obj_t* a ) @@ -765,7 +777,27 @@ err_t bli_check_if_exhausted_pool( pool_t* pool ) return e_val; } -// -- Memory allocator checks -------------------------------------------------- +err_t bli_check_sufficient_stack_buf_size( num_t dt, cntx_t* cntx ) +{ + err_t e_val = BLIS_SUCCESS; + + dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx ); + dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx ); + siz_t dt_size = bli_datatype_size( dt ); + + // NOTE: For induced methods, we use the size of the complex datatypes + // (rather than the size of the native micro-kernels' datatype) because + // the macro-kernel needs this larger micro-tile footprint, even if the + // virtual micro-kernel implementation will only ever be writing to half + // of it (real or imaginary part) at a time. 
+ + if ( mr * nr * dt_size > BLIS_STACK_BUF_MAX_SIZE ) + e_val = BLIS_INSUFFICIENT_STACK_BUF_SIZE; + + return e_val; +} + +// -- Object-related errors ---------------------------------------------------- err_t bli_check_object_alias_of( obj_t* a, obj_t* b ) { diff --git a/frame/base/bli_check.h b/frame/base/bli_check.h index 721575ae9..e614861a8 100644 --- a/frame/base/bli_check.h +++ b/frame/base/bli_check.h @@ -81,6 +81,7 @@ err_t bli_check_general_object( obj_t* a ); err_t bli_check_hermitian_object( obj_t* a ); err_t bli_check_symmetric_object( obj_t* a ); err_t bli_check_triangular_object( obj_t* a ); +err_t bli_check_object_struc( obj_t* a, struc_t struc ); err_t bli_check_upper_or_lower_object( obj_t* a ); @@ -98,6 +99,7 @@ err_t bli_check_object_buffer( obj_t* a ); err_t bli_check_valid_packbuf( packbuf_t buf_type ); err_t bli_check_requested_block_size_for_pool( siz_t req_size, pool_t* pool ); err_t bli_check_if_exhausted_pool( pool_t* pool ); +err_t bli_check_sufficient_stack_buf_size( num_t dt, cntx_t* cntx ); err_t bli_check_object_alias_of( obj_t* a, obj_t* b ); diff --git a/frame/base/bli_cntx.c b/frame/base/bli_cntx.c new file mode 100644 index 000000000..53af75ec6 --- /dev/null +++ b/frame/base/bli_cntx.c @@ -0,0 +1,868 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +#if 0 +// +// NOTE: Since these functions currently do nothing, they are defined +// as empty macros in bli_cntx. +// +void bli_cntx_obj_create( cntx_t* cntx ) +{ + // Since cntx_t objects contain statically-allocated arrays, + // we don't need to do anything in order to create the cntx_t + // instance. +} + +void bli_cntx_obj_free( cntx_t* cntx ) +{ + // Just as we don't need to do anything in order to create a + // cntx_t instance, we don't need to do anything to destroy + // one. 
+} +#endif + +void bli_cntx_obj_clear( cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + bszid_t* bmults = bli_cntx_bmults_buf( cntx ); + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + mbool_t* l3_nat_ukrs_prefs = bli_cntx_l3_nat_ukrs_prefs_buf( cntx ); + func_t* l1f_kers = bli_cntx_l1f_kers_buf( cntx ); + func_t* l1v_kers = bli_cntx_l1v_kers_buf( cntx ); + func_t* packm_ukrs = bli_cntx_packm_ukrs_buf( cntx ); + + dim_t i; + + // Initialize all of the elements of every array to a sane initial + // value. (Strictly speaking, there is no "null" value for typedef'ed + // enums such as bszid_t, so we cheat a little by using 0.) + + func_t null_func = { { NULL, NULL, NULL, NULL } }; + blksz_t null_blksz = { { 0, 0, 0, 0, } }; + mbool_t null_mbool = { { FALSE, FALSE, FALSE, FALSE } }; + bszid_t null_bszid = 0; + + for ( i = 0; i < BLIS_NUM_BLKSZS; ++i ) + { + blkszs[ i ] = null_blksz; + } + for ( i = 0; i < BLIS_NUM_BLKSZS; ++i ) + { + bmults[ i ] = null_bszid; + } + for ( i = 0; i < BLIS_NUM_LEVEL3_UKRS; ++i ) + { + l3_vir_ukrs[ i ] = null_func; + l3_nat_ukrs[ i ] = null_func; + l3_nat_ukrs_prefs[ i ] = null_mbool; + } + for ( i = 0; i < BLIS_NUM_LEVEL1F_KERS; ++i ) + { + l1f_kers[ i ] = null_func; + } + for ( i = 0; i < BLIS_NUM_LEVEL1V_KERS; ++i ) + { + l1v_kers[ i ] = null_func; + } + { + packm_ukrs[ 0 ] = null_func; + } + + // NOTE: It doesn't make sense to initialize method or schema fields + // at this time; the method field would normally be set by _set_blkszs() + // and the schema fields are set by _set_pack_schema_[abc](). 
+} + +void bli_cntx_init( cntx_t* cntx ) +{ + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMMTRSM_U_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr( BLIS_TRSM_U_UKR, cntx ); + + bli_gks_cntx_set_blkszs( BLIS_NAT, 6, + BLIS_NC, BLIS_NR, + BLIS_KC, BLIS_KR, + BLIS_MC, BLIS_MR, + BLIS_NR, BLIS_NR, + BLIS_MR, BLIS_MR, + BLIS_KR, BLIS_KR, + cntx ); + + bli_gks_cntx_set_l1f_ker( BLIS_AXPY2V_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTAXPYV_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_AXPYF_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTXF_KER, cntx ); + bli_gks_cntx_set_l1f_ker( BLIS_DOTXAXPYF_KER, cntx ); + + bli_gks_cntx_set_blkszs( BLIS_NAT, 3, + BLIS_AF, BLIS_AF, + BLIS_DF, BLIS_DF, + BLIS_XF, BLIS_XF, + cntx ); + + bli_gks_cntx_set_l1v_ker( BLIS_ADDV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_AXPYV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_COPYV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_DOTV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_DOTXV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_INVERTV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCALV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SCAL2V_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SETV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SUBV_KER, cntx ); + bli_gks_cntx_set_l1v_ker( BLIS_SWAPV_KER, cntx ); +} + +// ----------------------------------------------------------------------------- + +blksz_t* bli_cntx_get_blksz( bszid_t bs_id, + cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + blksz_t* blksz = &blkszs[ bs_id ]; + + // Return the address of the blksz_t identified by bs_id. 
+ return blksz; +} + +dim_t bli_cntx_get_blksz_def_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + blksz_t* blksz = &blkszs[ bs_id ]; + + // Return the default blocksize value for the datatype given. + return bli_blksz_get_def( dt, blksz ); +} + +dim_t bli_cntx_get_blksz_max_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + blksz_t* blksz = &blkszs[ bs_id ]; + + // Return the maximum blocksize value for the datatype given. + return bli_blksz_get_max( dt, blksz ); +} + +blksz_t* bli_cntx_get_bmult( bszid_t bs_id, + cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + bszid_t* bmults = bli_cntx_bmults_buf( cntx ); + bszid_t bm_id = bmults[ bs_id ]; + blksz_t* bmult = &blkszs[ bm_id ]; + + // Return the address of the blksz_t identified by the multiple for + // the blocksize corresponding to bs_id. + return bmult; +} + +dim_t bli_cntx_get_bmult_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ) +{ + blksz_t* bmult = bli_cntx_get_bmult( bs_id, cntx ); + + return bli_blksz_get_def( dt, bmult ); +#if 0 + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + bszid_t* bmults = bli_cntx_bmults_buf( cntx ); + bszid_t bm_id = bmults[ bs_id ]; + + // A little hack to ensure we don't try to access a blocksize object + // using an uninitialized/garbage value in the bmults array (which + // may exist because that blocksize in the context was never set). 
+ if ( bm_id < BLIS_BSZID_LO || BLIS_BSZID_HI < bm_id ) return 0; + + blksz_t* bmult = &blkszs[ bm_id ]; + + return bli_blksz_get_def( dt, bmult ); +#endif +} + +func_t* bli_cntx_get_l3_ukr( l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + func_t* l3_ukrs; + func_t* l3_ukr; + + // If the context was set up for non-native (ie: induced) execution, + // the virtual ukernel func_t's will contain the appropriate function + // pointers. Otherwise, we use the native ukernel func_t's. + if ( bli_cntx_method( cntx ) != BLIS_NAT ) l3_ukrs = l3_vir_ukrs; + else l3_ukrs = l3_nat_ukrs; + + // Index into the func_t array chosen above using the ukr_id. + l3_ukr = &l3_ukrs[ ukr_id ]; + + // Return the address of the func_t identified by ukr_id. + return l3_ukr; +} + +void* bli_cntx_get_l3_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + func_t* l3_ukrs; + func_t* l3_ukr; + + // If the context was set up for non-native (ie: induced) execution, + // the virtual ukernel func_t's will contain the appropriate function + // pointers. Otherwise, we use the native ukernel func_t's. + if ( bli_cntx_method( cntx ) != BLIS_NAT ) l3_ukrs = l3_vir_ukrs; + else l3_ukrs = l3_nat_ukrs; + + // Index into the func_t array chosen above using the ukr_id. + l3_ukr = &l3_ukrs[ ukr_id ]; + + return bli_func_get_dt( dt, l3_ukr ); +} + +func_t* bli_cntx_get_l3_vir_ukr( l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* l3_vir_ukr = &l3_vir_ukrs[ ukr_id ]; + + // Return the address of the virtual level-3 micro-kernel func_t + // identified by ukr_id. 
+ return l3_vir_ukr; +} + +void* bli_cntx_get_l3_vir_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* l3_vir_ukr = &l3_vir_ukrs[ ukr_id ]; + + // Return the virtual level-3 micro-kernel address for datatype dt, + // as stored in the func_t identified by ukr_id. + return bli_func_get_dt( dt, l3_vir_ukr ); +} + +func_t* bli_cntx_get_l3_nat_ukr( l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + func_t* l3_nat_ukr = &l3_nat_ukrs[ ukr_id ]; + + // Return the address of the native level-3 micro-kernel func_t + // identified by ukr_id. + return l3_nat_ukr; +} + +void* bli_cntx_get_l3_nat_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + func_t* l3_nat_ukr = &l3_nat_ukrs[ ukr_id ]; + + // Return the native level-3 micro-kernel address for datatype dt, + // as stored in the func_t identified by ukr_id. + return bli_func_get_dt( dt, l3_nat_ukr ); +} + +func_t* bli_cntx_get_l1f_ker( l1fkr_t ker_id, + cntx_t* cntx ) +{ + func_t* l1f_kers = bli_cntx_l1f_kers_buf( cntx ); + func_t* l1f_ker = &l1f_kers[ ker_id ]; + + // Return the address of the level-1f kernel func_t identified by + // ker_id. + return l1f_ker; +} + +void* bli_cntx_get_l1f_ker_dt( num_t dt, + l1fkr_t ker_id, + cntx_t* cntx ) +{ + func_t* l1f_kers = bli_cntx_l1f_kers_buf( cntx ); + func_t* l1f_ker = &l1f_kers[ ker_id ]; + + return bli_func_get_dt( dt, l1f_ker ); +} + +func_t* bli_cntx_get_l1v_ker( l1vkr_t ker_id, + cntx_t* cntx ) +{ + func_t* l1v_kers = bli_cntx_l1v_kers_buf( cntx ); + func_t* l1v_ker = &l1v_kers[ ker_id ]; + + // Return the address of the level-1v kernel func_t identified by + // ker_id. 
+ return l1v_ker; +} + +void* bli_cntx_get_l1v_ker_dt( num_t dt, + l1vkr_t ker_id, + cntx_t* cntx ) +{ + func_t* l1v_kers = bli_cntx_l1v_kers_buf( cntx ); + func_t* l1v_ker = &l1v_kers[ ker_id ]; + + return bli_func_get_dt( dt, l1v_ker ); +} + +mbool_t* bli_cntx_get_l3_nat_ukr_prefs( l3ukr_t ukr_id, + cntx_t* cntx ) +{ + mbool_t* l3_nat_ukrs_prefs = bli_cntx_l3_nat_ukrs_prefs_buf( cntx ); + mbool_t* l3_nat_ukrs_pref = &l3_nat_ukrs_prefs[ ukr_id ]; + + // Return the address of the native ukernel preference mbool_t identified by ukr_id. + return l3_nat_ukrs_pref; +} + +func_t* bli_cntx_get_packm_ukr( cntx_t* cntx ) +{ + func_t* packm_ukrs = bli_cntx_packm_ukrs( cntx ); + + // Return the address of the func_t that contains the packm ukernels. + return packm_ukrs; +} + +ind_t bli_cntx_get_ind_method( cntx_t* cntx ) +{ + return bli_cntx_method( cntx ); +} + +pack_t bli_cntx_get_pack_schema_a( cntx_t* cntx ) +{ + return bli_cntx_schema_a( cntx ); +} + +pack_t bli_cntx_get_pack_schema_b( cntx_t* cntx ) +{ + return bli_cntx_schema_b( cntx ); +} + +// ----------------------------------------------------------------------------- + +#if 1 +// +// NOTE: This function is disabled because: +// - we currently do not have any need to set a context directly with +// blksz_t objects +// - it may be broken; it needs to be synced up with the corresponding +// function in bli_gks.c. +// +void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... ) +{ + /* Example prototypes: + + void + bli_cntx_set_blkszs( + + ind_t method = BLIS_NAT, + dim_t n_bs, + bszid_t bs0_id, blksz_t* blksz0, bszid_t bm0_id, + bszid_t bs1_id, blksz_t* blksz1, bszid_t bm1_id, + bszid_t bs2_id, blksz_t* blksz2, bszid_t bm2_id, + ... + cntx_t* cntx ); + + void + bli_cntx_set_blkszs( + + ind_t method != BLIS_NAT, + dim_t n_bs, + bszid_t bs0_id, blksz_t* blksz0, bszid_t bm0_id, dim_t scalr0, + bszid_t bs1_id, blksz_t* blksz1, bszid_t bm1_id, dim_t scalr1, + bszid_t bs2_id, blksz_t* blksz2, bszid_t bm2_id, dim_t scalr2, + ... 
+ cntx_t* cntx ); + */ + va_list args; + dim_t i; + + bszid_t* bszids; + blksz_t** blkszs; + bszid_t* bmults; + dim_t* scalrs; + + cntx_t* cntx; + + blksz_t* cntx_blkszs; + bszid_t* cntx_bmults; + + + // Allocate some temporary local arrays. + bszids = bli_malloc( n_bs * sizeof( bszid_t ) ); + blkszs = bli_malloc( n_bs * sizeof( blksz_t* ) ); + bmults = bli_malloc( n_bs * sizeof( bszid_t ) ); + scalrs = bli_malloc( n_bs * sizeof( dim_t ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_bs ); + + // Handle native and induced method cases separately. + if ( method == BLIS_NAT ) + { + // Process n_bs tuples. + for ( i = 0; i < n_bs; ++i ) + { + // Here, we query the variable argument list for: + // - the bszid_t of the blocksize we're about to process, + // - the address of the blksz_t object, and + // - the bszid_t of the multiple we need to associate with + // the blksz_t object. + const bszid_t bs_id = va_arg( args, bszid_t ); + blksz_t* blksz = va_arg( args, blksz_t* ); + const bszid_t bm_id = va_arg( args, bszid_t ); + + // Store the values in our temporary arrays. + bszids[ i ] = bs_id; + blkszs[ i ] = blksz; + bmults[ i ] = bm_id; + } + } + else // if induced method execution was indicated + { + // Process n_bs tuples. + for ( i = 0; i < n_bs; ++i ) + { + // Here, we query the variable argument list for: + // - the bszid_t of the blocksize we're about to process, + // - the address of the blksz_t object, and + // - the bszid_t of the multiple we need to associate with + // the blksz_t object. + // - the scalar we wish to apply to the real blocksizes to + // come up with the induced complex blocksizes. + const bszid_t bs_id = va_arg( args, bszid_t ); + blksz_t* blksz = va_arg( args, blksz_t* ); + const bszid_t bm_id = va_arg( args, bszid_t ); + const dim_t scalr = va_arg( args, dim_t ); + + // Store the values in our temporary arrays. 
+ bszids[ i ] = bs_id; + blkszs[ i ] = blksz; + bmults[ i ] = bm_id; + scalrs[ i ] = scalr; + } + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shut down variable argument environment and clean up stack. + va_end( args ); + + // -- End variable argument section -- + + // Save the execution type into the context. + bli_cntx_set_method( method, cntx ); + + // Query the context for the addresses of: + // - the blocksize object array + // - the blocksize multiple array + cntx_blkszs = bli_cntx_blkszs_buf( cntx ); + cntx_bmults = bli_cntx_bmults_buf( cntx ); + + // Now that we have the context address, we want to copy the values + // from the temporary buffers into the corresponding buffers in the + // context. Notice that the blksz_t* pointers were saved, rather than + // the objects themselves, but we copy the contents of the objects + // when copying into the context. + + // Handle native and induced method cases separately. + if ( method == BLIS_NAT ) + { + // Process each blocksize id tuple provided. + for ( i = 0; i < n_bs; ++i ) + { + // Read the current blocksize id, blksz_t* pointer, and + // blocksize multiple id. + const bszid_t bs_id = bszids[ i ]; + const bszid_t bm_id = bmults[ i ]; + + blksz_t* blksz = blkszs[ i ]; + + blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ]; + + // Copy the blksz_t object contents into the appropriate + // location within the context's blksz_t array. (The blocksize + // multiple id is copied separately below.) + //cntx_blkszs[ bs_id ] = *blksz; + bli_blksz_copy( blksz, cntx_blksz ); + + // Copy the blocksize multiple id into the context. + cntx_bmults[ bs_id ] = bm_id; + } + } + else + { + // Process each blocksize id tuple provided. + for ( i = 0; i < n_bs; ++i ) + { + // Read the current blocksize id, blksz_t pointer, blocksize + // multiple id, and blocksize scalar. 
+ const bszid_t bs_id = bszids[ i ]; + const bszid_t bm_id = bmults[ i ]; + const dim_t scalr = scalrs[ i ]; + + blksz_t* blksz = blkszs[ i ]; + blksz_t* bmult = blkszs[ i ]; + + blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ]; + + // Copy the real domain values of the source blksz_t object into + // the context, duplicating into the complex domain fields. + bli_blksz_copy_dt( BLIS_FLOAT, blksz, BLIS_FLOAT, cntx_blksz ); + bli_blksz_copy_dt( BLIS_DOUBLE, blksz, BLIS_DOUBLE, cntx_blksz ); + bli_blksz_copy_dt( BLIS_FLOAT, blksz, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_copy_dt( BLIS_DOUBLE, blksz, BLIS_DCOMPLEX, cntx_blksz ); + + // The next steps apply only to cache blocksizes, and not register + // blocksizes (ie: they only apply to blocksizes for which the + // blocksize multiple id is different than the blocksize id) and + // only when the scalar provided is non-unit. + if ( bs_id != bm_id && scalr != 1 ) + { + // Scale the complex domain values in the blocksize object. + bli_blksz_scale_dt_by( 1, scalr, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_scale_dt_by( 1, scalr, BLIS_DCOMPLEX, cntx_blksz ); + + // Finally, round the newly-scaled blocksizes down to their + // respective multiples. + bli_blksz_reduce_dt_to( BLIS_FLOAT, bmult, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_reduce_dt_to( BLIS_DOUBLE, bmult, BLIS_DCOMPLEX, cntx_blksz ); + } + + // Copy the blocksize multiple id into the context. + cntx_bmults[ bs_id ] = bm_id; + } + } + + // Free the temporary local arrays. + bli_free( blkszs ); + bli_free( bszids ); + bli_free( bmults ); + bli_free( scalrs ); +} +#endif + +// ----------------------------------------------------------------------------- + +void bli_cntx_set_blksz( bszid_t bs_id, + blksz_t* blksz, + bszid_t mult_id, + cntx_t* cntx ) +{ + blksz_t* blkszs = bli_cntx_blkszs_buf( cntx ); + bszid_t* bmults = bli_cntx_bmults_buf( cntx ); + + // Copy the blocksize object into the specified location within + // the context's blocksize array. 
+ blkszs[ bs_id ] = *blksz; + + // Assign the blocksize multiple id to the corresponding location + // in the context's blocksize multiple array. + bmults[ bs_id ] = mult_id; +} + +void bli_cntx_set_l3_vir_ukr( l3ukr_t ukr_id, + func_t* func, + cntx_t* cntx ) +{ + func_t* l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + + // Copy the function object into the specified location within + // the context's virtual level-3 ukernel array. + l3_vir_ukrs[ ukr_id ] = *func; +} + +void bli_cntx_set_l3_nat_ukr( l3ukr_t ukr_id, + func_t* func, + cntx_t* cntx ) +{ + func_t* l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + + // Copy the function object into the specified location within + // the context's native level-3 ukernel array. + l3_nat_ukrs[ ukr_id ] = *func; +} + +void bli_cntx_set_l3_nat_ukr_prefs( l3ukr_t ukr_id, + mbool_t* prefs, + cntx_t* cntx ) +{ + mbool_t* l3_nat_ukrs_prefs = bli_cntx_l3_nat_ukrs_prefs_buf( cntx ); + + // Copy the mbool_t into the specified location within + // the context's native level-3 ukernel preference array. + l3_nat_ukrs_prefs[ ukr_id ] = *prefs; +} + +void bli_cntx_set_l1f_ker( l1fkr_t ker_id, + func_t* func, + cntx_t* cntx ) +{ + func_t* l1f_kers = bli_cntx_l1f_kers_buf( cntx ); + + // Copy the function object into the specified location within + // the context's level-1f kernel array. + l1f_kers[ ker_id ] = *func; +} + +void bli_cntx_set_l1v_ker( l1vkr_t ker_id, + func_t* func, + cntx_t* cntx ) +{ + func_t* l1v_kers = bli_cntx_l1v_kers_buf( cntx ); + + // Copy the function object into the specified location within + // the context's level-1v kernel array. + l1v_kers[ ker_id ] = *func; +} + +void bli_cntx_set_packm_ukr( func_t* func, + cntx_t* cntx ) +{ + func_t* packm_ukrs = bli_cntx_packm_ukrs( cntx ); + + // Copy the function object into the context's packm ukernel object. 
+ *packm_ukrs = *func; +} + +void bli_cntx_set_ind_method( ind_t method, + cntx_t* cntx ) +{ + bli_cntx_set_method( method, cntx ); +} + +void bli_cntx_set_pack_schema_ab( pack_t schema_a, + pack_t schema_b, + cntx_t* cntx ) +{ + bli_cntx_set_schema_a( schema_a, cntx ); + bli_cntx_set_schema_b( schema_b, cntx ); +} + +void bli_cntx_set_pack_schema_a( pack_t schema_a, + cntx_t* cntx ) +{ + bli_cntx_set_schema_a( schema_a, cntx ); +} + +void bli_cntx_set_pack_schema_b( pack_t schema_b, + cntx_t* cntx ) +{ + bli_cntx_set_schema_b( schema_b, cntx ); +} + +void bli_cntx_set_pack_schema_c( pack_t schema_c, + cntx_t* cntx ) +{ + bli_cntx_set_schema_c( schema_c, cntx ); +} + +// ----------------------------------------------------------------------------- + +bool_t bli_cntx_l3_nat_ukr_prefers_rows_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + mbool_t* ukrs_prefs = bli_cntx_get_l3_nat_ukr_prefs( ukr_id, cntx ); + bool_t ukr_prefs = bli_mbool_get_dt( dt, ukrs_prefs ); + + // A ukernel preference of TRUE means the ukernel prefers row + // storage. + return ukr_prefs == TRUE; +} + +bool_t bli_cntx_l3_nat_ukr_prefers_cols_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + mbool_t* ukrs_prefs = bli_cntx_get_l3_nat_ukr_prefs( ukr_id, cntx ); + bool_t ukr_prefs = bli_mbool_get_dt( dt, ukrs_prefs ); + + // A ukernel preference of FALSE means the ukernel prefers column + // storage. 
+ return ukr_prefs == FALSE; +} + +bool_t bli_cntx_l3_nat_ukr_prefers_storage_of( obj_t* obj, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + return !bli_cntx_l3_nat_ukr_dislikes_storage_of( obj, ukr_id, cntx ); +} + +bool_t bli_cntx_l3_nat_ukr_dislikes_storage_of( obj_t* obj, + l3ukr_t ukr_id, + cntx_t* cntx ) +{ + const num_t dt = bli_obj_datatype( *obj ); + const bool_t ukr_prefers_rows + = bli_cntx_l3_nat_ukr_prefers_rows_dt( dt, ukr_id, cntx ); + const bool_t ukr_prefers_cols + = bli_cntx_l3_nat_ukr_prefers_cols_dt( dt, ukr_id, cntx ); + bool_t r_val = FALSE; + + if ( bli_obj_is_row_stored( *obj ) && ukr_prefers_cols ) r_val = TRUE; + else if ( bli_obj_is_col_stored( *obj ) && ukr_prefers_rows ) r_val = TRUE; + + return r_val; +} + +// ----------------------------------------------------------------------------- + +void bli_cntx_print( cntx_t* cntx ) +{ + dim_t i; + + // Print the values stored in the blksz_t objects. + printf( " s d c z\n" ); +#if 0 + //for ( i = 0; i < BLIS_NUM_BLKSZS; ++i ) + for ( i = 0; i < 6; ++i ) + { + printf( "blksz/mult %2lu: %13lu/%2lu %13lu/%2lu %13lu/%2lu %13lu/%2lu\n", + i, + bli_cntx_get_blksz_def_dt( BLIS_FLOAT, i, cntx ), + bli_cntx_get_bmult_dt ( BLIS_FLOAT, i, cntx ), + bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, i, cntx ), + bli_cntx_get_bmult_dt ( BLIS_DOUBLE, i, cntx ), + bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, i, cntx ), + bli_cntx_get_bmult_dt ( BLIS_SCOMPLEX, i, cntx ), + bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, i, cntx ), + bli_cntx_get_bmult_dt ( BLIS_DCOMPLEX, i, cntx ) + ); + } +#endif + + + for ( i = 0; i < BLIS_NUM_LEVEL3_UKRS; ++i ) + { + func_t* ukr = bli_cntx_get_l3_vir_ukr( i, cntx ); + + printf( "l3 vir ukr %2lu: %16p %16p %16p %16p\n", + i, + bli_func_get_dt( BLIS_FLOAT, ukr ), + bli_func_get_dt( BLIS_DOUBLE, ukr ), + bli_func_get_dt( BLIS_SCOMPLEX, ukr ), + bli_func_get_dt( BLIS_DCOMPLEX, ukr ) + ); + } + + for ( i = 0; i < BLIS_NUM_LEVEL3_UKRS; ++i ) + { + func_t* ukr = bli_cntx_get_l3_nat_ukr( i, cntx ); + + 
printf( "l3 nat ukr %2lu: %16p %16p %16p %16p\n", + i, + bli_func_get_dt( BLIS_FLOAT, ukr ), + bli_func_get_dt( BLIS_DOUBLE, ukr ), + bli_func_get_dt( BLIS_SCOMPLEX, ukr ), + bli_func_get_dt( BLIS_DCOMPLEX, ukr ) + ); + } + + for ( i = 0; i < BLIS_NUM_LEVEL1F_KERS; ++i ) + { + func_t* ker = bli_cntx_get_l1f_ker( i, cntx ); + + printf( "l1f ker %2lu: %16p %16p %16p %16p\n", + i, + bli_func_get_dt( BLIS_FLOAT, ker ), + bli_func_get_dt( BLIS_DOUBLE, ker ), + bli_func_get_dt( BLIS_SCOMPLEX, ker ), + bli_func_get_dt( BLIS_DCOMPLEX, ker ) + ); + } + + for ( i = 0; i < BLIS_NUM_LEVEL1V_KERS; ++i ) + { + func_t* ker = bli_cntx_get_l1v_ker( i, cntx ); + + printf( "l1v ker %2lu: %16p %16p %16p %16p\n", + i, + bli_func_get_dt( BLIS_FLOAT, ker ), + bli_func_get_dt( BLIS_DOUBLE, ker ), + bli_func_get_dt( BLIS_SCOMPLEX, ker ), + bli_func_get_dt( BLIS_DCOMPLEX, ker ) + ); + } + + { + func_t* ukr = bli_cntx_get_packm_ukr( cntx ); + + printf( "packm ker : %16p %16p %16p %16p\n", + bli_func_get_dt( BLIS_FLOAT, ukr ), + bli_func_get_dt( BLIS_DOUBLE, ukr ), + bli_func_get_dt( BLIS_SCOMPLEX, ukr ), + bli_func_get_dt( BLIS_DCOMPLEX, ukr ) + ); + } + + { + ind_t method = bli_cntx_get_ind_method( cntx ); + + printf( "ind method : %lu\n", ( guint_t )method ); + } +} + + + + + + + + + + + + + + + + diff --git a/frame/base/bli_cntx.h b/frame/base/bli_cntx.h new file mode 100644 index 000000000..67a0fcd96 --- /dev/null +++ b/frame/base/bli_cntx.h @@ -0,0 +1,347 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#ifndef BLIS_CNTX_H +#define BLIS_CNTX_H + +// Context object type (defined in bli_type_defs.h) + +/* +typedef struct cntx_s +{ + blksz_t* blkszs; + bszid_t* bmults; + + func_t* l3_vir_ukrs; + func_t* l3_nat_ukrs; + mbool_t* l3_nat_ukrs_prefs; + + func_t* l1f_kers; + func_t* l1v_kers; + + func_t packm_ukrs; + + ind_t method; + pack_t schema_a; + pack_t schema_b; + pack_t schema_c; + +} cntx_t; +*/ + +// ----------------------------------------------------------------------------- + +// cntx_t query (fields only) + +#define bli_cntx_blkszs_buf( cntx ) \ +\ + ( cntx->blkszs ) + +#define bli_cntx_bmults_buf( cntx ) \ +\ + ( cntx->bmults ) + +#define bli_cntx_l3_vir_ukrs_buf( cntx ) \ +\ + ( cntx->l3_vir_ukrs ) + +#define bli_cntx_l3_nat_ukrs_buf( cntx ) \ +\ + ( cntx->l3_nat_ukrs ) + +#define bli_cntx_l3_nat_ukrs_prefs_buf( cntx ) \ +\ + ( cntx->l3_nat_ukrs_prefs ) + +#define bli_cntx_l1f_kers_buf( cntx ) \ +\ + ( cntx->l1f_kers ) + +#define bli_cntx_l1v_kers_buf( cntx ) \ +\ + ( cntx->l1v_kers ) + +#define bli_cntx_packm_ukrs_buf( cntx ) \ +\ + (&(cntx->packm_ukrs) ) + +#define bli_cntx_packm_ukrs( cntx ) \ +\ + (&(cntx->packm_ukrs) ) + +#define bli_cntx_method( cntx ) \ +\ + ( cntx->method ) + +#define bli_cntx_schema_a( cntx ) \ +\ + ( cntx->schema_a ) + +#define bli_cntx_schema_b( cntx ) \ +\ + ( cntx->schema_b ) + +#define bli_cntx_schema_c( cntx ) \ +\ + ( cntx->schema_c ) + +// cntx_t modification (fields only) + +#define bli_cntx_set_blkszs_buf( _blkszs, cntx_p ) \ +{ \ + (cntx_p)->blkszs = _blkszs; \ +} + +#define bli_cntx_set_bmults_buf( _bmults, cntx_p ) \ +{ \ + (cntx_p)->bmults = _bmults; \ +} + +#define bli_cntx_set_l3_vir_ukrs_buf( _l3_vir_ukrs, cntx_p ) \ +{ \ + (cntx_p)->l3_vir_ukrs = _l3_vir_ukrs; \ +} + +#define bli_cntx_set_l3_nat_ukrs_buf( _l3_nat_ukrs, cntx_p ) \ +{ \ + (cntx_p)->l3_nat_ukrs = _l3_nat_ukrs; \ +} + +#define bli_cntx_set_l3_nat_ukrs_prefs_buf( _l3_nat_ukrs_prefs, cntx_p ) \ +{ \ + (cntx_p)->l3_nat_ukrs_prefs = 
_l3_nat_ukrs_prefs; \ +} + +#define bli_cntx_set_l1f_kers_buf( _l1f_kers, cntx_p ) \ +{ \ + (cntx_p)->l1f_kers = _l1f_kers; \ +} + +#define bli_cntx_set_l1v_kers_buf( _l1v_kers, cntx_p ) \ +{ \ + (cntx_p)->l1v_kers = _l1v_kers; \ +} + +#define bli_cntx_set_packm_ukrs( _packm_ukrs, cntx_p ) \ +{ \ + (cntx_p)->packm_ukrs = _packm_ukrs; \ +} + +#define bli_cntx_set_method( _method, cntx_p ) \ +{ \ + (cntx_p)->method = _method; \ +} + +#define bli_cntx_set_schema_a( _schema_a, cntx_p ) \ +{ \ + (cntx_p)->schema_a = _schema_a; \ +} + +#define bli_cntx_set_schema_b( _schema_b, cntx_p ) \ +{ \ + (cntx_p)->schema_b = _schema_b; \ +} + +#define bli_cntx_set_schema_c( _schema_c, cntx_p ) \ +{ \ + (cntx_p)->schema_c = _schema_c; \ +} + +// ----------------------------------------------------------------------------- + +// create/free + +//void bli_cntx_obj_create( cntx_t* cntx ); +//void bli_cntx_obj_copy( cntx_t* src, +// cntx_t* dst ); +//void bli_cntx_obj_free( cntx_t* cntx ); +void bli_cntx_obj_clear( cntx_t* cntx ); +void bli_cntx_init( cntx_t* cntx ); + +// get functions + +blksz_t* bli_cntx_get_blksz( bszid_t bs_id, + cntx_t* cntx ); +dim_t bli_cntx_get_blksz_def_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ); +dim_t bli_cntx_get_blksz_max_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ); +blksz_t* bli_cntx_get_bmult( bszid_t bs_id, + cntx_t* cntx ); +dim_t bli_cntx_get_bmult_dt( num_t dt, + bszid_t bs_id, + cntx_t* cntx ); +func_t* bli_cntx_get_l3_ukr( l3ukr_t ukr_id, + cntx_t* cntx ); +void* bli_cntx_get_l3_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ); +func_t* bli_cntx_get_l3_vir_ukr( l3ukr_t ukr_id, + cntx_t* cntx ); +void* bli_cntx_get_l3_vir_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ); +func_t* bli_cntx_get_l3_nat_ukr( l3ukr_t ukr_id, + cntx_t* cntx ); +void* bli_cntx_get_l3_nat_ukr_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ); +mbool_t* bli_cntx_get_l3_nat_ukr_prefs( l3ukr_t ukr_id, + cntx_t* cntx ); +func_t* bli_cntx_get_l1f_ker( l1fkr_t 
ker_id, + cntx_t* cntx ); +void* bli_cntx_get_l1f_ker_dt( num_t dt, + l1fkr_t ker_id, + cntx_t* cntx ); +func_t* bli_cntx_get_l1v_ker( l1vkr_t ker_id, + cntx_t* cntx ); +void* bli_cntx_get_l1v_ker_dt( num_t dt, + l1vkr_t ker_id, + cntx_t* cntx ); +func_t* bli_cntx_get_packm_ukr( cntx_t* cntx ); +ind_t bli_cntx_get_ind_method( cntx_t* cntx ); +pack_t bli_cntx_get_pack_schema_a( cntx_t* cntx ); +pack_t bli_cntx_get_pack_schema_b( cntx_t* cntx ); +pack_t bli_cntx_get_pack_schema_c( cntx_t* cntx ); + +// set functions + +void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... ); + +void bli_cntx_set_blksz( bszid_t bs_id, + blksz_t* blksz, + bszid_t mult_id, + cntx_t* cntx ); +void bli_cntx_set_l3_vir_ukr( l3ukr_t ukr_id, + func_t* func, + cntx_t* cntx ); +void bli_cntx_set_l3_nat_ukr( l3ukr_t ukr_id, + func_t* func, + cntx_t* cntx ); +void bli_cntx_set_l1f_ker( l1fkr_t ker_id, + func_t* func, + cntx_t* cntx ); +void bli_cntx_set_l1v_ker( l1vkr_t ker_id, + func_t* func, + cntx_t* cntx ); +void bli_cntx_set_packm_ukr( func_t* func, + cntx_t* cntx ); +void bli_cntx_set_ind_method( ind_t method, + cntx_t* cntx ); +void bli_cntx_set_pack_schema_ab( pack_t schema_a, + pack_t schema_b, + cntx_t* cntx ); +void bli_cntx_set_pack_schema_a( pack_t schema_a, + cntx_t* cntx ); +void bli_cntx_set_pack_schema_b( pack_t schema_b, + cntx_t* cntx ); +void bli_cntx_set_pack_schema_c( pack_t schema_c, + cntx_t* cntx ); + +// other query functions + +bool_t bli_cntx_l3_nat_ukr_prefers_rows_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ); +bool_t bli_cntx_l3_nat_ukr_prefers_cols_dt( num_t dt, + l3ukr_t ukr_id, + cntx_t* cntx ); +bool_t bli_cntx_l3_nat_ukr_prefers_storage_of( obj_t* obj, + l3ukr_t ukr_id, + cntx_t* cntx ); +bool_t bli_cntx_l3_nat_ukr_dislikes_storage_of( obj_t* obj, + l3ukr_t ukr_id, + cntx_t* cntx ); + +// print function + +void bli_cntx_print( cntx_t* cntx ); + +// ----------------------------------------------------------------------------- + +// Preprocess out these 
calls entirely, since they are currently just empty +// functions that do nothing. +//#define bli_cntx_obj_create( cntx ) { bli_cntx_obj_clear( cntx ); } +//#define bli_cntx_obj_free( cntx ) { bli_cntx_obj_clear( cntx ); } +#define bli_cntx_obj_create( cntx ) { ; } +#define bli_cntx_obj_free( cntx ) { ; } + +#define bli_cntx_init_local_if( opname, cntx, cntx_p ) \ +\ + cntx_t _cntx_l; \ +\ + if ( bli_is_null( cntx ) ) \ + { \ + PASTEMAC(opname,_cntx_init)( &_cntx_l ); \ + cntx_p = &_cntx_l; \ + } \ + else \ + { \ + cntx_p = cntx; \ + } + +#define bli_cntx_finalize_local_if( opname, cntx ) \ +\ + if ( bli_is_null( cntx ) ) \ + { \ + PASTEMAC(opname,_cntx_finalize)( &_cntx_l ); \ + } + + +#define bli_cntx_init_local_if2( opname, suf, cntx, cntx_p ) \ +\ + cntx_t _cntx_l; \ +\ + if ( bli_is_null( cntx ) ) \ + { \ + PASTEMAC2(opname,suf,_cntx_init)( &_cntx_l ); \ + cntx_p = &_cntx_l; \ + } \ + else \ + { \ + cntx_p = cntx; \ + } + +#define bli_cntx_finalize_local_if2( opname, suf, cntx ) \ +\ + if ( bli_is_null( cntx ) ) \ + { \ + PASTEMAC2(opname,suf,_cntx_finalize)( &_cntx_l ); \ + } + + +#endif + diff --git a/frame/base/bli_error.c b/frame/base/bli_error.c index 18c84ab65..e58d9d02d 100644 --- a/frame/base/bli_error.c +++ b/frame/base/bli_error.c @@ -168,6 +168,8 @@ void bli_error_init_msgs( void ) "Attempted to allocate contiguous memory block that is too big for implementation." ); sprintf( bli_error_string_for_code(BLIS_EXHAUSTED_CONTIG_MEMORY_POOL), "Attempted to allocate more memory from contiguous pool than is available." ); + sprintf( bli_error_string_for_code(BLIS_INSUFFICIENT_STACK_BUF_SIZE), + "Configured maximum stack buffer size is insufficient for register blocksizes currently in use." ); sprintf( bli_error_string_for_code(BLIS_EXPECTED_OBJECT_ALIAS), "Expected object to be alias." 
); diff --git a/frame/base/bli_func.c b/frame/base/bli_func.c index f75596af7..8ee41d4a4 100644 --- a/frame/base/bli_func.c +++ b/frame/base/bli_func.c @@ -35,85 +35,65 @@ #include "blis.h" -func_t* bli_func_obj_create( void* ptr_s, bool_t pref_s, - void* ptr_d, bool_t pref_d, - void* ptr_c, bool_t pref_c, - void* ptr_z, bool_t pref_z ) +func_t* bli_func_obj_create( void* ptr_s, + void* ptr_d, + void* ptr_c, + void* ptr_z ) { func_t* f; f = ( func_t* ) bli_malloc( sizeof(func_t) ); bli_func_obj_init( f, - ptr_s, pref_s, - ptr_d, pref_d, - ptr_c, pref_c, - ptr_z, pref_z ); + ptr_s, + ptr_d, + ptr_c, + ptr_z ); return f; } - void bli_func_obj_init( func_t* f, - void* ptr_s, bool_t pref_s, - void* ptr_d, bool_t pref_d, - void* ptr_c, bool_t pref_c, - void* ptr_z, bool_t pref_z ) + void* ptr_s, + void* ptr_d, + void* ptr_c, + void* ptr_z ) { f->ptr[BLIS_BITVAL_FLOAT_TYPE] = ptr_s; f->ptr[BLIS_BITVAL_DOUBLE_TYPE] = ptr_d; f->ptr[BLIS_BITVAL_SCOMPLEX_TYPE] = ptr_c; f->ptr[BLIS_BITVAL_DCOMPLEX_TYPE] = ptr_z; - - f->prefers_contig_rows[BLIS_BITVAL_FLOAT_TYPE] = pref_s; - f->prefers_contig_rows[BLIS_BITVAL_DOUBLE_TYPE] = pref_d; - f->prefers_contig_rows[BLIS_BITVAL_SCOMPLEX_TYPE] = pref_c; - f->prefers_contig_rows[BLIS_BITVAL_DCOMPLEX_TYPE] = pref_z; } - void bli_func_obj_free( func_t* f ) { bli_free( f ); } +// ----------------------------------------------------------------------------- -void* bli_func_obj_query( num_t dt, - func_t* f ) +bool_t bli_func_is_null_dt( num_t dt, + func_t* f ) { - return f->ptr[ dt ]; + return ( f->ptr[ dt ] == NULL ); } -bool_t bli_func_prefers_contig_rows( num_t dt, - func_t* f ) +bool_t bli_func_is_null( func_t* f ) { - return f->prefers_contig_rows[ dt ]; -} + bool_t r_val = TRUE; + num_t dt; -bool_t bli_func_prefers_contig_cols( num_t dt, - func_t* f ) -{ - return !(f->prefers_contig_rows[ dt ]); -} - -bool_t bli_func_pref_is_sat_by( obj_t* a, - func_t* f ) -{ - num_t dt = bli_obj_datatype( *a ); - bool_t r_val = FALSE; - - if ( ( 
bli_obj_is_row_stored( *a ) && - bli_func_prefers_contig_rows( dt, f ) ) || - ( bli_obj_is_col_stored( *a ) && - bli_func_prefers_contig_cols( dt, f ) ) ) - r_val = TRUE; + // Iterate over all floating-point datatypes. If any is non-null, + // return FALSE. Otherwise, if they are all null, return TRUE. + for ( dt = BLIS_DT_LO; dt <= BLIS_DT_HI; ++dt ) + { + if ( f->ptr[ dt ] != NULL ) + { + r_val = FALSE; + break; + } + } return r_val; } -bool_t bli_func_pref_is_unsat_by( obj_t* a, - func_t* f ) -{ - return !bli_func_pref_is_sat_by( a, f ); -} - diff --git a/frame/base/bli_func.h b/frame/base/bli_func.h index 441f7df82..56b221be9 100644 --- a/frame/base/bli_func.h +++ b/frame/base/bli_func.h @@ -32,33 +32,39 @@ */ +// ----------------------------------------------------------------------------- -func_t* bli_func_obj_create( void* ptr_s, bool_t pref_s, - void* ptr_d, bool_t pref_d, - void* ptr_c, bool_t pref_c, - void* ptr_z, bool_t pref_z ); +// func_t query + +#define bli_func_get_dt( dt, f ) \ +\ + ( (f)->ptr[ dt ] ) + +// func_t modification + +#define bli_func_set_dt( fp, dt, f ) \ +{ \ + (f)->ptr[ dt ] = fp; \ +} + +// ----------------------------------------------------------------------------- + +func_t* bli_func_obj_create( void* ptr_s, + void* ptr_d, + void* ptr_c, + void* ptr_z ); void bli_func_obj_init( func_t* f, - void* ptr_s, bool_t pref_s, - void* ptr_d, bool_t pref_d, - void* ptr_c, bool_t pref_c, - void* ptr_z, bool_t pref_z ); + void* ptr_s, + void* ptr_d, + void* ptr_c, + void* ptr_z ); void bli_func_obj_free( func_t* f ); +// ----------------------------------------------------------------------------- -void* bli_func_obj_query( num_t dt, - func_t* f ); - -bool_t bli_func_prefers_contig_rows( num_t dt, - func_t* f ); - -bool_t bli_func_prefers_contig_cols( num_t dt, - func_t* f ); - -bool_t bli_func_pref_is_sat_by( obj_t* a, - func_t* f ); - -bool_t bli_func_pref_is_unsat_by( obj_t* a, - func_t* f ); +bool_t bli_func_is_null_dt( num_t dt, + 
func_t* f ); +bool_t bli_func_is_null( func_t* f ); diff --git a/frame/base/bli_gks.c b/frame/base/bli_gks.c new file mode 100644 index 000000000..1368d8846 --- /dev/null +++ b/frame/base/bli_gks.c @@ -0,0 +1,1001 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +// +// -- blksz_t structure -------------------------------------------------------- +// + +static blksz_t bli_gks_blkszs[BLIS_NUM_BLKSZS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* kr */ { { BLIS_DEFAULT_KR_S, BLIS_DEFAULT_KR_C, BLIS_DEFAULT_KR_D, BLIS_DEFAULT_KR_Z, }, + { BLIS_PACKDIM_KR_S, BLIS_PACKDIM_KR_C, BLIS_PACKDIM_KR_D, BLIS_PACKDIM_KR_Z, } + }, +/* mr */ { { BLIS_DEFAULT_MR_S, BLIS_DEFAULT_MR_C, BLIS_DEFAULT_MR_D, BLIS_DEFAULT_MR_Z, }, + { BLIS_PACKDIM_MR_S, BLIS_PACKDIM_MR_C, BLIS_PACKDIM_MR_D, BLIS_PACKDIM_MR_Z, } + }, +/* nr */ { { BLIS_DEFAULT_NR_S, BLIS_DEFAULT_NR_C, BLIS_DEFAULT_NR_D, BLIS_DEFAULT_NR_Z, }, + { BLIS_PACKDIM_NR_S, BLIS_PACKDIM_NR_C, BLIS_PACKDIM_NR_D, BLIS_PACKDIM_NR_Z, } + }, +/* mc */ { { BLIS_DEFAULT_MC_S, BLIS_DEFAULT_MC_C, BLIS_DEFAULT_MC_D, BLIS_DEFAULT_MC_Z, }, + { BLIS_MAXIMUM_MC_S, BLIS_MAXIMUM_MC_C, BLIS_MAXIMUM_MC_D, BLIS_MAXIMUM_MC_Z, } + }, +/* kc */ { { BLIS_DEFAULT_KC_S, BLIS_DEFAULT_KC_C, BLIS_DEFAULT_KC_D, BLIS_DEFAULT_KC_Z, }, + { BLIS_MAXIMUM_KC_S, BLIS_MAXIMUM_KC_C, BLIS_MAXIMUM_KC_D, BLIS_MAXIMUM_KC_Z, } + }, +/* nc */ { { BLIS_DEFAULT_NC_S, BLIS_DEFAULT_NC_C, BLIS_DEFAULT_NC_D, BLIS_DEFAULT_NC_Z, }, + { BLIS_MAXIMUM_NC_S, BLIS_MAXIMUM_NC_C, BLIS_MAXIMUM_NC_D, BLIS_MAXIMUM_NC_Z, } + }, +/* m2 */ { { BLIS_DEFAULT_M2_S, BLIS_DEFAULT_M2_C, BLIS_DEFAULT_M2_D, BLIS_DEFAULT_M2_Z, }, + { BLIS_DEFAULT_M2_S, BLIS_DEFAULT_M2_C, BLIS_DEFAULT_M2_D, BLIS_DEFAULT_M2_Z, } + }, +/* n2 */ { { BLIS_DEFAULT_N2_S, BLIS_DEFAULT_N2_C, BLIS_DEFAULT_N2_D, BLIS_DEFAULT_N2_Z, }, + { BLIS_DEFAULT_N2_S, BLIS_DEFAULT_N2_C, BLIS_DEFAULT_N2_D, BLIS_DEFAULT_N2_Z, } + }, +/* 1f */ { { BLIS_DEFAULT_1F_S, BLIS_DEFAULT_1F_C, BLIS_DEFAULT_1F_D, BLIS_DEFAULT_1F_Z, }, + { BLIS_DEFAULT_1F_S, BLIS_DEFAULT_1F_C, BLIS_DEFAULT_1F_D, BLIS_DEFAULT_1F_Z, } + }, +/* af */ { { BLIS_DEFAULT_AF_S, BLIS_DEFAULT_AF_C, BLIS_DEFAULT_AF_D, BLIS_DEFAULT_AF_Z, }, + { BLIS_DEFAULT_AF_S, BLIS_DEFAULT_AF_C, 
BLIS_DEFAULT_AF_D, BLIS_DEFAULT_AF_Z, } + }, +/* df */ { { BLIS_DEFAULT_DF_S, BLIS_DEFAULT_DF_C, BLIS_DEFAULT_DF_D, BLIS_DEFAULT_DF_Z, }, + { BLIS_DEFAULT_DF_S, BLIS_DEFAULT_DF_C, BLIS_DEFAULT_DF_D, BLIS_DEFAULT_DF_Z, } + }, +/* xf */ { { BLIS_DEFAULT_XF_S, BLIS_DEFAULT_XF_C, BLIS_DEFAULT_XF_D, BLIS_DEFAULT_XF_Z, }, + { BLIS_DEFAULT_XF_S, BLIS_DEFAULT_XF_C, BLIS_DEFAULT_XF_D, BLIS_DEFAULT_XF_Z, } + }, +/* vf */ { { BLIS_DEFAULT_VF_S, BLIS_DEFAULT_VF_C, BLIS_DEFAULT_VF_D, BLIS_DEFAULT_VF_Z, }, + { BLIS_DEFAULT_VF_S, BLIS_DEFAULT_VF_C, BLIS_DEFAULT_VF_D, BLIS_DEFAULT_VF_Z, } + }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_blksz( bszid_t bs_id, + blksz_t* blksz ) +{ + *blksz = bli_gks_blkszs[ bs_id ]; +} + +void bli_gks_cntx_set_blkszs( ind_t method, dim_t n_bs, ... ) +{ + /* Example prototypes: + + void + bli_gks_cntx_set_blkszs( + + ind_t method = BLIS_NAT, + dim_t n_bs, + bszid_t bs0_id, bszid_t bm0_id, + bszid_t bs1_id, bszid_t bm1_id, + bszid_t bs2_id, bszid_t bm2_id, + ... + cntx_t* cntx ); + + void + bli_gks_cntx_set_blkszs( + + ind_t method != BLIS_NAT, + dim_t n_bs, + bszid_t bs0_id, bszid_t bm0_id, dim_t scalr0, + bszid_t bs1_id, bszid_t bm1_id, dim_t scalr1, + bszid_t bs2_id, bszid_t bm2_id, dim_t scalr2, + ... + cntx_t* cntx ); + */ + va_list args; + dim_t i; + + bszid_t* bszids; + bszid_t* bmults; + double* scalrs; + + cntx_t* cntx; + + blksz_t* cntx_blkszs; + bszid_t* cntx_bmults; + + bszid_t bs_id; + bszid_t bm_id; + double scalr; + + // Allocate some temporary local arrays. + bszids = bli_malloc( n_bs * sizeof( bszid_t ) ); + bmults = bli_malloc( n_bs * sizeof( bszid_t ) ); + scalrs = bli_malloc( n_bs * sizeof( double ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_bs ); + + // Handle native and induced method cases separately. + if ( method == BLIS_NAT ) + { + // Process n_bs tuples. 
+ for ( i = 0; i < n_bs; ++i ) + { + // Here, we query the variable argument list for: + // - the bszid_t of the blocksize we're about to process, + // - the bszid_t of the multiple we need to associate with + // the blksz_t object. + bs_id = va_arg( args, bszid_t ); + bm_id = va_arg( args, bszid_t ); + + // Store the values in our temporary arrays. + bszids[ i ] = bs_id; + bmults[ i ] = bm_id; + } + } + else // if induced method execution was indicated + { + // Process n_bs tuples. + for ( i = 0; i < n_bs; ++i ) + { + // Here, we query the variable argument list for: + // - the bszid_t of the blocksize we're about to process, + // - the bszid_t of the multiple we need to associate with + // the blksz_t object. + // - the scalar we wish to apply to the real blocksizes to + // come up with the induced complex blocksizes. + bs_id = va_arg( args, bszid_t ); + bm_id = va_arg( args, bszid_t ); + scalr = va_arg( args, double ); + + // Store the values in our temporary arrays. + bszids[ i ] = bs_id; + bmults[ i ] = bm_id; + scalrs[ i ] = scalr; + } + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shutdown variable argument environment and clean up stack. + va_end( args ); + + // -- End variable argument section -- + + // Save the execution type into the context. + bli_cntx_set_method( method, cntx ); + + // Query the context for the addresses of: + // - the blocksize object array + // - the blocksize multiple array + cntx_blkszs = bli_cntx_blkszs_buf( cntx ); + cntx_bmults = bli_cntx_bmults_buf( cntx ); + + // Now that we have the context address, we want to copy the values + // from the temporary buffers into the corresponding buffers in the + // context. + + // Handle native and induced method cases separately. + if ( method == BLIS_NAT ) + { + // Process each blocksize id tuple provided. + for ( i = 0; i < n_bs; ++i ) + { + // Read the current blocksize id, blocksize multiple id. 
+ bszid_t bs_id = bszids[ i ]; + bszid_t bm_id = bmults[ i ]; + + blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ]; + + // Query the blocksizes (blksz_t) associated with bs_id and save + // them directly into the appropriate location in the context's + // blksz_t array. + bli_gks_get_blksz( bs_id, cntx_blksz ); + + // Copy the blocksize multiple id into the context. + cntx_bmults[ bs_id ] = bm_id; + } + } + else + { + // Process each blocksize id tuple provided. + for ( i = 0; i < n_bs; ++i ) + { + // Read the current blocksize id, blocksize multiple id, + // and blocksize scalar. + bszid_t bs_id = bszids[ i ]; + bszid_t bm_id = bmults[ i ]; + double scalr = scalrs[ i ]; + + blksz_t blksz; + blksz_t bmult; + + blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ]; + + // Query the blocksizes (blksz_t) associated with bs_id and bm_id + // and use them to populate a pair of local blksz_t objects. + bli_gks_get_blksz( bs_id, &blksz ); + bli_gks_get_blksz( bm_id, &bmult ); + + // Copy the real domain values of the source blksz_t object into + // the context, duplicating into the complex domain fields. + bli_blksz_copy_dt( BLIS_FLOAT, &blksz, BLIS_FLOAT, cntx_blksz ); + bli_blksz_copy_dt( BLIS_DOUBLE, &blksz, BLIS_DOUBLE, cntx_blksz ); + bli_blksz_copy_dt( BLIS_FLOAT, &blksz, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_copy_dt( BLIS_DOUBLE, &blksz, BLIS_DCOMPLEX, cntx_blksz ); + + // The next steps apply only to cache blocksizes, and not register + // blocksizes (ie: they only apply to blocksizes for which the + // blocksize multiple id is different than the blocksize id) and + // only when the scalar provided is non-unit. + if ( bs_id != bm_id && scalr != 1.0 ) + { + // Scale the complex domain values in the blocksize object. + bli_blksz_scale_dt_by( 1, (dim_t)scalr, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_scale_dt_by( 1, (dim_t)scalr, BLIS_DCOMPLEX, cntx_blksz ); + + // Finally, round the newly-scaled blocksizes down to their + // respective multiples. 
+ bli_blksz_reduce_dt_to( BLIS_FLOAT, &bmult, BLIS_SCOMPLEX, cntx_blksz ); + bli_blksz_reduce_dt_to( BLIS_DOUBLE, &bmult, BLIS_DCOMPLEX, cntx_blksz ); + } + + // Copy the blocksize multiple id into the context. + cntx_bmults[ bs_id ] = bm_id; + } + } + + // Free the temporary local arrays. + bli_free( bszids ); + bli_free( bmults ); + bli_free( scalrs ); +} + + +// +// -- level-3 micro-kernel structure ------------------------------------------- +// + +static func_t bli_gks_l3_ind_ukrs[BLIS_NUM_IND_METHODS] + [BLIS_NUM_LEVEL3_UKRS] = +{ + /* s(0) c(1) d(2) z(3) */ +/* 3mh */ { +/* gemm */ { { NULL, BLIS_CGEMM3MH_UKERNEL, NULL, BLIS_ZGEMM3MH_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* gemmtrsm_u */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_u */ { { NULL, NULL, NULL, NULL, } }, + }, +/* 3m3 */ { +/* gemm */ { { NULL, BLIS_CGEMM3M3_UKERNEL, NULL, BLIS_ZGEMM3M3_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* gemmtrsm_u */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_u */ { { NULL, NULL, NULL, NULL, } }, + }, +/* 3m2 */ { +/* gemm */ { { NULL, BLIS_CGEMM3M2_UKERNEL, NULL, BLIS_ZGEMM3M2_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* gemmtrsm_u */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_u */ { { NULL, NULL, NULL, NULL, } }, + }, +/* 3m1 */ { +/* gemm */ { { NULL, BLIS_CGEMM3M1_UKERNEL, NULL, BLIS_ZGEMM3M1_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, BLIS_CGEMMTRSM3M1_L_UKERNEL, NULL, BLIS_ZGEMMTRSM3M1_L_UKERNEL, } }, +/* gemmtrsm_u */ { { NULL, BLIS_CGEMMTRSM3M1_U_UKERNEL, NULL, BLIS_ZGEMMTRSM3M1_U_UKERNEL, } }, +/* trsm_l */ { { NULL, BLIS_CTRSM3M1_L_UKERNEL, NULL, BLIS_ZTRSM3M1_L_UKERNEL, } }, +/* trsm_u */ { { NULL, BLIS_CTRSM3M1_U_UKERNEL, NULL, BLIS_ZTRSM3M1_U_UKERNEL, } }, + }, +/* 4mh */ { +/* gemm */ { { NULL, BLIS_CGEMM4MH_UKERNEL, NULL, BLIS_ZGEMM4MH_UKERNEL, } 
}, +/* gemmtrsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* gemmtrsm_u */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_u */ { { NULL, NULL, NULL, NULL, } }, + }, +/* 4m1b */ { +/* gemm */ { { NULL, BLIS_CGEMM4MB_UKERNEL, NULL, BLIS_ZGEMM4MB_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* gemmtrsm_u */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_l */ { { NULL, NULL, NULL, NULL, } }, +/* trsm_u */ { { NULL, NULL, NULL, NULL, } }, + }, +/* 4m1a */ { +/* gemm */ { { NULL, BLIS_CGEMM4M1_UKERNEL, NULL, BLIS_ZGEMM4M1_UKERNEL, } }, +/* gemmtrsm_l */ { { NULL, BLIS_CGEMMTRSM4M1_L_UKERNEL, NULL, BLIS_ZGEMMTRSM4M1_L_UKERNEL, } }, +/* gemmtrsm_u */ { { NULL, BLIS_CGEMMTRSM4M1_U_UKERNEL, NULL, BLIS_ZGEMMTRSM4M1_U_UKERNEL, } }, +/* trsm_l */ { { NULL, BLIS_CTRSM4M1_L_UKERNEL, NULL, BLIS_ZTRSM4M1_L_UKERNEL, } }, +/* trsm_u */ { { NULL, BLIS_CTRSM4M1_U_UKERNEL, NULL, BLIS_ZTRSM4M1_U_UKERNEL, } }, + }, +/* nat */ { +/* gemm */ { { BLIS_SGEMM_UKERNEL, BLIS_CGEMM_UKERNEL, + BLIS_DGEMM_UKERNEL, BLIS_ZGEMM_UKERNEL, } }, +/* gemmtrsm_l */ { { BLIS_SGEMMTRSM_L_UKERNEL, BLIS_CGEMMTRSM_L_UKERNEL, + BLIS_DGEMMTRSM_L_UKERNEL, BLIS_ZGEMMTRSM_L_UKERNEL, } }, +/* gemmtrsm_u */ { { BLIS_SGEMMTRSM_U_UKERNEL, BLIS_CGEMMTRSM_U_UKERNEL, + BLIS_DGEMMTRSM_U_UKERNEL, BLIS_ZGEMMTRSM_U_UKERNEL, } }, +/* trsm_l */ { { BLIS_STRSM_L_UKERNEL, BLIS_CTRSM_L_UKERNEL, + BLIS_DTRSM_L_UKERNEL, BLIS_ZTRSM_L_UKERNEL, } }, +/* trsm_u */ { { BLIS_STRSM_U_UKERNEL, BLIS_CTRSM_U_UKERNEL, + BLIS_DTRSM_U_UKERNEL, BLIS_ZTRSM_U_UKERNEL, } }, + }, +}; + +static func_t bli_gks_l3_ref_ukrs[BLIS_NUM_LEVEL3_UKRS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* gemm */ { { BLIS_SGEMM_UKERNEL_REF, BLIS_CGEMM_UKERNEL_REF, + BLIS_DGEMM_UKERNEL_REF, BLIS_ZGEMM_UKERNEL_REF, } }, +/* gemmtrsm_l */ { { BLIS_SGEMMTRSM_L_UKERNEL_REF, BLIS_CGEMMTRSM_L_UKERNEL_REF, + BLIS_DGEMMTRSM_L_UKERNEL_REF, BLIS_ZGEMMTRSM_L_UKERNEL_REF, } }, +/* gemmtrsm_u */ { { 
BLIS_SGEMMTRSM_U_UKERNEL_REF, BLIS_CGEMMTRSM_U_UKERNEL_REF, + BLIS_DGEMMTRSM_U_UKERNEL_REF, BLIS_ZGEMMTRSM_U_UKERNEL_REF, } }, +/* trsm_l */ { { BLIS_STRSM_L_UKERNEL_REF, BLIS_CTRSM_L_UKERNEL_REF, + BLIS_DTRSM_L_UKERNEL_REF, BLIS_ZTRSM_L_UKERNEL_REF, } }, +/* trsm_u */ { { BLIS_STRSM_U_UKERNEL_REF, BLIS_CTRSM_U_UKERNEL_REF, + BLIS_DTRSM_U_UKERNEL_REF, BLIS_ZTRSM_U_UKERNEL_REF, } }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l3_nat_ukr( l3ukr_t ukr, + func_t* func ) +{ + *func = bli_gks_l3_ind_ukrs[ BLIS_NAT ][ ukr ]; +} + +void bli_gks_get_l3_vir_ukr( ind_t method, + l3ukr_t ukr, + func_t* func ) +{ + *func = bli_gks_l3_ind_ukrs[ method ][ ukr ]; +} + +void bli_gks_get_l3_ref_ukr( l3ukr_t ukr, + func_t* func ) +{ + *func = bli_gks_l3_ref_ukrs[ ukr ]; +} + +void bli_gks_cntx_set_l3_nat_ukr( l3ukr_t ukr, + cntx_t* cntx ) +{ + func_t* cntx_l3_nat_ukrs = bli_cntx_l3_nat_ukrs_buf( cntx ); + func_t* cntx_l3_nat_ukr = &cntx_l3_nat_ukrs[ ukr ]; + + bli_gks_get_l3_nat_ukr( ukr, cntx_l3_nat_ukr ); +} + +void bli_gks_cntx_set_l3_nat_ukrs( dim_t n_uk, ... ) +{ + /* Example prototype: + + void + bli_gks_cntx_set_l3_nat_ukrs( dim_t n_uk, + l3ukr_t ukr0_id, + l3ukr_t ukr1_id, + l3ukr_t ukr2_id, + ... + cntx_t* cntx ); + */ + + va_list args; + dim_t i; + l3ukr_t* l3_ukrs; + cntx_t* cntx; + + // Allocate some temporary local arrays. + l3_ukrs = bli_malloc( n_uk * sizeof( l3ukr_t ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_uk ); + + // Process n_uk kernel ids. + for ( i = 0; i < n_uk; ++i ) + { + // Here, we query the variable argument list for the kernel id. + const l3ukr_t uk_id = va_arg( args, l3ukr_t ); + + // Store the value in our temporary array. + l3_ukrs[ i ] = uk_id; + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shutdown variable argument environment and clean up stack. 
+ va_end( args ); + + // -- End variable argument section -- + + // Process each kernel id provided. + for ( i = 0; i < n_uk; ++i ) + { + // Read the current kernel id. + const l3ukr_t uk_id = l3_ukrs[ i ]; + + // Query the func_t associated with uk_id and save it directly into + // the context. + bli_gks_cntx_set_l3_nat_ukr( uk_id, cntx ); + } + + // Free the temporary local array. + bli_free( l3_ukrs ); +} + +void bli_gks_cntx_set_l3_vir_ukr( ind_t method, + l3ukr_t ukr, + cntx_t* cntx ) +{ + func_t* cntx_l3_vir_ukrs = bli_cntx_l3_vir_ukrs_buf( cntx ); + func_t* cntx_l3_vir_ukr = &cntx_l3_vir_ukrs[ ukr ]; + + bli_gks_get_l3_vir_ukr( method, ukr, cntx_l3_vir_ukr ); +} + +void bli_gks_cntx_set_l3_vir_ukrs( ind_t method, dim_t n_uk, ... ) +{ + /* Example prototype: + + void + bli_gks_cntx_set_l3_vir_ukrs( ind_t method, + dim_t n_uk, + l3ukr_t ukr0_id, + l3ukr_t ukr1_id, + l3ukr_t ukr2_id, + ... + cntx_t* cntx ); + */ + + va_list args; + dim_t i; + l3ukr_t* l3_ukrs; + cntx_t* cntx; + + // Allocate some temporary local arrays. + l3_ukrs = bli_malloc( n_uk * sizeof( l3ukr_t ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_uk ); + + // Process n_uk kernel ids. + for ( i = 0; i < n_uk; ++i ) + { + // Here, we query the variable argument list for the kernel id. + const l3ukr_t uk_id = va_arg( args, l3ukr_t ); + + // Store the value in our temporary array. + l3_ukrs[ i ] = uk_id; + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shutdown variable argument environment and clean up stack. + va_end( args ); + + // -- End variable argument section -- + + // Process each kernel id provided. + for ( i = 0; i < n_uk; ++i ) + { + // Read the current kernel id. + const l3ukr_t uk_id = l3_ukrs[ i ]; + + // Query the func_t associated with uk_id and save it directly into + // the context. 
+ bli_gks_cntx_set_l3_vir_ukr( method, uk_id, cntx ); + } + + // Free the temporary local array. + bli_free( l3_ukrs ); +} + + +// +// -- level-3 micro-kernel preferences ----------------------------------------- +// + +static mbool_t bli_gks_l3_ukrs_prefs[BLIS_NUM_LEVEL3_UKRS] = +{ +/* gemm */ { { BLIS_SGEMM_UKERNEL_PREFERS_CONTIG_ROWS, + BLIS_CGEMM_UKERNEL_PREFERS_CONTIG_ROWS, + BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS, + BLIS_ZGEMM_UKERNEL_PREFERS_CONTIG_ROWS, } }, +/* gemmtrsm_l */ { { FALSE, FALSE, FALSE, FALSE, } }, +/* gemmtrsm_u */ { { FALSE, FALSE, FALSE, FALSE, } }, +/* trsm_l */ { { FALSE, FALSE, FALSE, FALSE, } }, +/* trsm_u */ { { FALSE, FALSE, FALSE, FALSE, } }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l3_nat_ukr_prefs( l3ukr_t ukr, + mbool_t* mbool ) +{ + *mbool = bli_gks_l3_ukrs_prefs[ ukr ]; +} + +void bli_gks_cntx_set_l3_nat_ukr_prefs( l3ukr_t ukr, + cntx_t* cntx ) +{ + mbool_t* cntx_l3_nat_ukr_prefs = bli_cntx_l3_nat_ukrs_prefs_buf( cntx ); + mbool_t* cntx_l3_nat_ukr_pref = &cntx_l3_nat_ukr_prefs[ ukr ]; + + bli_gks_get_l3_nat_ukr_prefs( ukr, cntx_l3_nat_ukr_pref ); +} + + +#if 0 +// +// -- packm structure-aware kernel structure ----------------------------------- +// + +static func_t bli_gks_packm_struc_kers[BLIS_NUM_PACK_SCHEMA_TYPES] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +// row/col vectors + { NULL, NULL, + NULL, NULL, }, +// row/col panels + { bli_spackm_struc_cxk, bli_cpackm_struc_cxk, + bli_dpackm_struc_cxk, bli_zpackm_struc_cxk, }, +// row/col panels: 4m interleaved + { NULL, bli_cpackm_struc_cxk_4mi, + NULL, bli_zpackm_struc_cxk_4mi, }, +// row/col panels: 4m separated (NOT IMPLEMENTED) + { NULL, NULL, + NULL, NULL, }, +// row/col panels: 3m interleaved + { NULL, bli_cpackm_struc_cxk_3mis, + NULL, bli_zpackm_struc_cxk_3mis, }, +// row/col panels: 3m separated + { NULL, bli_cpackm_struc_cxk_3mis, + NULL, bli_zpackm_struc_cxk_3mis, }, +// row/col panels: 
real only + { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, }, +// row/col panels: imaginary only + { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, }, +// row/col panels: real+imaginary only + { NULL, bli_cpackm_struc_cxk_rih, + NULL, bli_zpackm_struc_cxk_rih, }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_packm_struc_ker( pack_t schema, + func_t* func ) +{ + const dim_t i = bli_pack_schema_index( schema ); + + *func = bli_gks_packm_struc_kers[ i ]; +} + +void bli_gks_cntx_set_packm_struc_ker( pack_t schema, + cntx_t* cntx ) +{ + func_t* cntx_packm_ukr = bli_cntx_packm_ukrs( cntx ); + + bli_gks_get_packm_struc_ker( schema, cntx_packm_ukr ); +} +#endif + + +// +// -- level-1f kernel structure ------------------------------------------------ +// + +static func_t bli_gks_l1f_kers[BLIS_NUM_LEVEL1F_KERS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* axpy2v */ { { BLIS_SAXPY2V_KERNEL, BLIS_CAXPY2V_KERNEL, + BLIS_DAXPY2V_KERNEL, BLIS_ZAXPY2V_KERNEL, } + }, +/* dotaxpyv */ { { BLIS_SDOTAXPYV_KERNEL, BLIS_CDOTAXPYV_KERNEL, + BLIS_DDOTAXPYV_KERNEL, BLIS_ZDOTAXPYV_KERNEL, } + }, +/* axpyf */ { { BLIS_SAXPYF_KERNEL, BLIS_CAXPYF_KERNEL, + BLIS_DAXPYF_KERNEL, BLIS_ZAXPYF_KERNEL, } + }, +/* dotxf */ { { BLIS_SDOTXF_KERNEL, BLIS_CDOTXF_KERNEL, + BLIS_DDOTXF_KERNEL, BLIS_ZDOTXF_KERNEL, } + }, +/* dotxaxpyf */ { { BLIS_SDOTXAXPYF_KERNEL, BLIS_CDOTXAXPYF_KERNEL, + BLIS_DDOTXAXPYF_KERNEL, BLIS_ZDOTXAXPYF_KERNEL, } + }, +}; + +static func_t bli_gks_l1f_ref_kers[BLIS_NUM_LEVEL1F_KERS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* axpy2v */ { { BLIS_SAXPY2V_KERNEL_REF, BLIS_CAXPY2V_KERNEL_REF, + BLIS_DAXPY2V_KERNEL_REF, BLIS_ZAXPY2V_KERNEL_REF, } + }, +/* dotaxpyv */ { { BLIS_SDOTAXPYV_KERNEL_REF, BLIS_CDOTAXPYV_KERNEL_REF, + BLIS_DDOTAXPYV_KERNEL_REF, BLIS_ZDOTAXPYV_KERNEL_REF, } + }, +/* axpyf */ { { BLIS_SAXPYF_KERNEL_REF, BLIS_CAXPYF_KERNEL_REF, 
+ BLIS_DAXPYF_KERNEL_REF, BLIS_ZAXPYF_KERNEL_REF, } + }, +/* dotxf */ { { BLIS_SDOTXF_KERNEL_REF, BLIS_CDOTXF_KERNEL_REF, + BLIS_DDOTXF_KERNEL_REF, BLIS_ZDOTXF_KERNEL_REF, } + }, +/* dotxaxpyf */ { { BLIS_SDOTXAXPYF_KERNEL_REF, BLIS_CDOTXAXPYF_KERNEL_REF, + BLIS_DDOTXAXPYF_KERNEL_REF, BLIS_ZDOTXAXPYF_KERNEL_REF, } + }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l1f_ker( l1fkr_t ker, + func_t* func ) +{ + *func = bli_gks_l1f_kers[ ker ]; +} + +void bli_gks_get_l1f_ref_ker( l1fkr_t ker, + func_t* func ) +{ + *func = bli_gks_l1f_ref_kers[ ker ]; +} + +void bli_gks_cntx_set_l1f_ker( l1fkr_t ker, + cntx_t* cntx ) +{ + func_t* cntx_l1f_kers = bli_cntx_l1f_kers_buf( cntx ); + func_t* cntx_l1f_ker = &cntx_l1f_kers[ ker ]; + + bli_gks_get_l1f_ker( ker, cntx_l1f_ker ); +} + +void bli_gks_cntx_set_l1f_kers( dim_t n_kr, ... ) +{ + /* Example prototype: + + void + bli_gks_cntx_set_l1f_kers( dim_t n_kr, + l1fkr_t ker0_id, + l1fkr_t ker1_id, + l1fkr_t ker2_id, + ... + cntx_t* cntx ); + */ + + va_list args; + dim_t i; + l1fkr_t* l1f_kers; + cntx_t* cntx; + + // Allocate some temporary local arrays. + l1f_kers = bli_malloc( n_kr * sizeof( l1fkr_t ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_kr ); + + // Process n_kr kernel ids. + for ( i = 0; i < n_kr; ++i ) + { + // Here, we query the variable argument list for the kernel id. + const l1fkr_t kr_id = va_arg( args, l1fkr_t ); + + // Store the value in our temporary array. + l1f_kers[ i ] = kr_id; + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shutdown variable argument environment and clean up stack. + va_end( args ); + + // -- End variable argument section -- + + // Process each kernel id provided. + for ( i = 0; i < n_kr; ++i ) + { + // Read the current kernel id. 
+ const l1fkr_t kr_id = l1f_kers[ i ]; + + // Query the func_t associated with kr_id and save it directly into + // the context. + bli_gks_cntx_set_l1f_ker( kr_id, cntx ); + } + + // Free the temporary local array. + bli_free( l1f_kers ); +} + + +// +// -- level-1v kernel structure ------------------------------------------------ +// + +static func_t bli_gks_l1v_kers[BLIS_NUM_LEVEL1V_KERS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* addv */ { { BLIS_SADDV_KERNEL, BLIS_CADDV_KERNEL, + BLIS_DADDV_KERNEL, BLIS_ZADDV_KERNEL, } + }, +/* axpyv */ { { BLIS_SAXPYV_KERNEL, BLIS_CAXPYV_KERNEL, + BLIS_DAXPYV_KERNEL, BLIS_ZAXPYV_KERNEL, } + }, +/* copyv */ { { BLIS_SCOPYV_KERNEL, BLIS_CCOPYV_KERNEL, + BLIS_DCOPYV_KERNEL, BLIS_ZCOPYV_KERNEL, } + }, +/* dotv */ { { BLIS_SDOTV_KERNEL, BLIS_CDOTV_KERNEL, + BLIS_DDOTV_KERNEL, BLIS_ZDOTV_KERNEL, } + }, +/* dotxv */ { { BLIS_SDOTXV_KERNEL, BLIS_CDOTXV_KERNEL, + BLIS_DDOTXV_KERNEL, BLIS_ZDOTXV_KERNEL, } + }, +/* invertv */ { { BLIS_SINVERTV_KERNEL, BLIS_CINVERTV_KERNEL, + BLIS_DINVERTV_KERNEL, BLIS_ZINVERTV_KERNEL, } + }, +/* scalv */ { { BLIS_SSCALV_KERNEL, BLIS_CSCALV_KERNEL, + BLIS_DSCALV_KERNEL, BLIS_ZSCALV_KERNEL, } + }, +/* scal2v */ { { BLIS_SSCAL2V_KERNEL, BLIS_CSCAL2V_KERNEL, + BLIS_DSCAL2V_KERNEL, BLIS_ZSCAL2V_KERNEL, } + }, +/* setv */ { { BLIS_SSETV_KERNEL, BLIS_CSETV_KERNEL, + BLIS_DSETV_KERNEL, BLIS_ZSETV_KERNEL, } + }, +/* subv */ { { BLIS_SSUBV_KERNEL, BLIS_CSUBV_KERNEL, + BLIS_DSUBV_KERNEL, BLIS_ZSUBV_KERNEL, } + }, +/* swapv */ { { BLIS_SSWAPV_KERNEL, BLIS_CSWAPV_KERNEL, + BLIS_DSWAPV_KERNEL, BLIS_ZSWAPV_KERNEL, } + }, +}; + +static func_t bli_gks_l1v_ref_kers[BLIS_NUM_LEVEL1V_KERS] = +{ + /* float (0) scomplex (1) double (2) dcomplex (3) */ +/* addv */ { { BLIS_SADDV_KERNEL_REF, BLIS_CADDV_KERNEL_REF, + BLIS_DADDV_KERNEL_REF, BLIS_ZADDV_KERNEL_REF, } + }, +/* axpyv */ { { BLIS_SAXPYV_KERNEL_REF, BLIS_CAXPYV_KERNEL_REF, + BLIS_DAXPYV_KERNEL_REF, BLIS_ZAXPYV_KERNEL_REF, } + }, +/* copyv */ { { 
BLIS_SCOPYV_KERNEL_REF, BLIS_CCOPYV_KERNEL_REF, + BLIS_DCOPYV_KERNEL_REF, BLIS_ZCOPYV_KERNEL_REF, } + }, +/* dotv */ { { BLIS_SDOTV_KERNEL_REF, BLIS_CDOTV_KERNEL_REF, + BLIS_DDOTV_KERNEL_REF, BLIS_ZDOTV_KERNEL_REF, } + }, +/* dotxv */ { { BLIS_SDOTXV_KERNEL_REF, BLIS_CDOTXV_KERNEL_REF, + BLIS_DDOTXV_KERNEL_REF, BLIS_ZDOTXV_KERNEL_REF, } + }, +/* invertv */ { { BLIS_SINVERTV_KERNEL_REF, BLIS_CINVERTV_KERNEL_REF, + BLIS_DINVERTV_KERNEL_REF, BLIS_ZINVERTV_KERNEL_REF, } + }, +/* scalv */ { { BLIS_SSCALV_KERNEL_REF, BLIS_CSCALV_KERNEL_REF, + BLIS_DSCALV_KERNEL_REF, BLIS_ZSCALV_KERNEL_REF, } + }, +/* scal2v */ { { BLIS_SSCAL2V_KERNEL_REF, BLIS_CSCAL2V_KERNEL_REF, + BLIS_DSCAL2V_KERNEL_REF, BLIS_ZSCAL2V_KERNEL_REF, } + }, +/* setv */ { { BLIS_SSETV_KERNEL_REF, BLIS_CSETV_KERNEL_REF, + BLIS_DSETV_KERNEL_REF, BLIS_ZSETV_KERNEL_REF, } + }, +/* subv */ { { BLIS_SSUBV_KERNEL_REF, BLIS_CSUBV_KERNEL_REF, + BLIS_DSUBV_KERNEL_REF, BLIS_ZSUBV_KERNEL_REF, } + }, +/* swapv */ { { BLIS_SSWAPV_KERNEL_REF, BLIS_CSWAPV_KERNEL_REF, + BLIS_DSWAPV_KERNEL_REF, BLIS_ZSWAPV_KERNEL_REF, } + }, +}; + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l1v_ker( l1vkr_t ker, + func_t* func ) +{ + *func = bli_gks_l1v_kers[ ker ]; +} + +void bli_gks_get_l1v_ref_ker( l1vkr_t ker, + func_t* func ) +{ + *func = bli_gks_l1v_ref_kers[ ker ]; +} + +void bli_gks_cntx_set_l1v_ker( l1vkr_t ker, + cntx_t* cntx ) +{ + func_t* cntx_l1v_kers = bli_cntx_l1v_kers_buf( cntx ); + func_t* cntx_l1v_ker = &cntx_l1v_kers[ ker ]; + + bli_gks_get_l1v_ker( ker, cntx_l1v_ker ); +} + + +void bli_gks_cntx_set_l1v_kers( dim_t n_kr, ... ) +{ + /* Example prototype: + + void + bli_gks_cntx_set_l1v_kers( dim_t n_kr, + l1vkr_t ker0_id, + l1vkr_t ker1_id, + l1vkr_t ker2_id, + ... + cntx_t* cntx ); + */ + + va_list args; + dim_t i; + l1vkr_t* l1v_kers; + cntx_t* cntx; + + // Allocate some temporary local arrays. 
+ l1v_kers = bli_malloc( n_kr * sizeof( l1vkr_t ) ); + + // -- Begin variable argument section -- + + // Initialize variable argument environment. + va_start( args, n_kr ); + + // Process n_kr kernel ids. + for ( i = 0; i < n_kr; ++i ) + { + // Here, we query the variable argument list for the kernel id. + const l1vkr_t kr_id = va_arg( args, l1vkr_t ); + + // Store the value in our temporary array. + l1v_kers[ i ] = kr_id; + } + + // The last argument should be the context pointer. + cntx = va_arg( args, cntx_t* ); + + // Shutdown variable argument environment and clean up stack. + va_end( args ); + + // -- End variable argument section -- + + // Process each kernel id provided. + for ( i = 0; i < n_kr; ++i ) + { + // Read the current kernel id. + const l1vkr_t kr_id = l1v_kers[ i ]; + + // Query the func_t associated with kr_id and save it directly into + // the context. + bli_gks_cntx_set_l1v_ker( kr_id, cntx ); + } + + // Free the temporary local array. + bli_free( l1v_kers ); +} + + +// +// -- level-3 micro-kernel implementation strings ------------------------------ +// + +static char* bli_gks_l3_ukr_impl_str[BLIS_NUM_UKR_IMPL_TYPES] = +{ + "refrnce", + "virtual", + "optimzd", + "notappl", +}; + +// ----------------------------------------------------------------------------- + +char* bli_gks_l3_ukr_impl_string( l3ukr_t ukr, ind_t method, num_t dt ) +{ + func_t p; + kimpl_t ki; + + // Query the func_t for the given ukr type and method. + bli_gks_get_l3_vir_ukr( method, ukr, &p ); + + // Check whether the ukrs func_t is NULL for the given ukr type and + // datatype. If the queried ukr func_t is NULL, return the string + // for not applicable. Otherwise, query the ukernel implementation + // type using the method provided and return the associated string. 
+ if ( bli_func_is_null_dt( dt, &p ) ) + ki = BLIS_NOTAPPLIC_UKERNEL; + else + ki = bli_gks_l3_ukr_impl_type( ukr, method, dt ); + + return bli_gks_l3_ukr_impl_str[ ki ]; +} + +#if 0 +char* bli_gks_l3_ukr_avail_impl_string( l3ukr_t ukr, num_t dt ) +{ + opid_t oper; + ind_t method; + kimpl_t ki; + + // We need to decide which operation we will use to query the + // current available induced method. If the ukr type given is + // BLIS_GEMM_UKR, we use gemm. Otherwise, we use trsm (since + // the four other defined ukr types are trsm-related). + if ( ukr == BLIS_GEMM_UKR ) oper = BLIS_GEMM; + else oper = BLIS_TRSM; + + // Query the current available induced method using the + // chosen operation id type. + method = bli_l3_ind_oper_find_avail( oper, dt ); + + // Query the ukernel implementation type using the current + // available method. + ki = bli_gks_l3_ukr_impl_type( ukr, method, dt ); + + return bli_gks_l3_ukr_impl_str[ ki ]; +} +#endif + +kimpl_t bli_gks_l3_ukr_impl_type( l3ukr_t ukr, ind_t method, num_t dt ) +{ + // If the given induced method is not native, the ukernel + // must be virtual. + if ( method != BLIS_NAT ) return BLIS_VIRTUAL_UKERNEL; + else + { + // If the given induced method is native, then the + // ukernel might be reference or optimized. To + // determine which, we compare the datatype-specific + // function pointer within the ukrs object queried + // for the given method to the typed function + // pointer within the known reference ukrs + // object. 
+ + func_t funcs; + func_t ref_funcs; + void* p; + void* ref_p; + + bli_gks_get_l3_vir_ukr( method, ukr, &funcs ); + bli_gks_get_l3_ref_ukr( ukr, &ref_funcs ); + + p = bli_func_get_dt( dt, &funcs ); + ref_p = bli_func_get_dt( dt, &ref_funcs ); + + if ( p == ref_p ) return BLIS_REFERENCE_UKERNEL; + else return BLIS_OPTIMIZED_UKERNEL; + } +} + diff --git a/frame/base/bli_gks.h b/frame/base/bli_gks.h new file mode 100644 index 000000000..a889497be --- /dev/null +++ b/frame/base/bli_gks.h @@ -0,0 +1,101 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#ifndef BLIS_GKS_H +#define BLIS_GKS_H + + +// ----------------------------------------------------------------------------- + +void bli_gks_get_blksz( bszid_t bs_id, + blksz_t* blksz ); + +void bli_gks_cntx_set_blkszs( ind_t method, dim_t n_bs, ... ); + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l3_nat_ukr( l3ukr_t ukr, + func_t* func ); +void bli_gks_get_l3_vir_ukr( ind_t method, + l3ukr_t ukr, + func_t* func ); +void bli_gks_get_l3_ref_ukr( l3ukr_t ukr, + func_t* func ); +void bli_gks_cntx_set_l3_nat_ukr( l3ukr_t ukr, + cntx_t* cntx ); +void bli_gks_cntx_set_l3_vir_ukr( ind_t method, + l3ukr_t ukr, + cntx_t* cntx ); + +void bli_gks_cntx_set_l3_nat_ukrs( dim_t n_uk, ... ); +void bli_gks_cntx_set_l3_vir_ukrs( ind_t method, dim_t n_uk, ... ); + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l3_nat_ukr_prefs( l3ukr_t ukr, + mbool_t* mbool ); +void bli_gks_cntx_set_l3_nat_ukr_prefs( l3ukr_t ukr, + cntx_t* cntx ); + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l1f_ker( l1fkr_t ker, + func_t* func ); +void bli_gks_get_l1f_ref_ker( l1fkr_t ker, + func_t* func ); +void bli_gks_cntx_set_l1f_ker( l1fkr_t ker, + cntx_t* cntx ); + +void bli_gks_cntx_set_l1f_kers( dim_t n_kr, ... 
); + +// ----------------------------------------------------------------------------- + +void bli_gks_get_l1v_ker( l1vkr_t ker, + func_t* func ); +void bli_gks_get_l1v_ref_ker( l1vkr_t ker, + func_t* func ); +void bli_gks_cntx_set_l1v_ker( l1vkr_t ker, + cntx_t* cntx ); + +void bli_gks_cntx_set_l1v_kers( dim_t n_kr, ... ); + +// ----------------------------------------------------------------------------- + +char* bli_gks_l3_ukr_impl_string( l3ukr_t ukr, ind_t method, num_t dt ); +kimpl_t bli_gks_l3_ukr_impl_type( l3ukr_t ukr, ind_t method, num_t dt ); + +// ----------------------------------------------------------------------------- + +#endif + diff --git a/frame/base/bli_info.c b/frame/base/bli_info.c index 941a92c3b..fede4f823 100644 --- a/frame/base/bli_info.c +++ b/frame/base/bli_info.c @@ -52,13 +52,16 @@ char* bli_info_get_int_type_size_str( void ) { return bli_int_type_size_s -// -- bli_config.h ------------------------------------------------------------- +// -- General configuration-related -------------------------------------------- gint_t bli_info_get_int_type_size( void ) { return BLIS_INT_TYPE_SIZE; } gint_t bli_info_get_num_fp_types( void ) { return BLIS_NUM_FP_TYPES; } gint_t bli_info_get_max_type_size( void ) { return BLIS_MAX_TYPE_SIZE; } -gint_t bli_info_get_simd_align_size( void ) { return BLIS_SIMD_ALIGN_SIZE; } gint_t bli_info_get_page_size( void ) { return BLIS_PAGE_SIZE; } +gint_t bli_info_get_simd_num_registers( void ) { return BLIS_SIMD_NUM_REGISTERS; } +gint_t bli_info_get_simd_size( void ) { return BLIS_SIMD_SIZE; } +gint_t bli_info_get_simd_align_size( void ) { return BLIS_SIMD_ALIGN_SIZE; } +gint_t bli_info_get_stack_buf_max_size( void ) { return BLIS_STACK_BUF_MAX_SIZE; } gint_t bli_info_get_stack_buf_align_size( void ) { return BLIS_STACK_BUF_ALIGN_SIZE; } gint_t bli_info_get_heap_addr_align_size( void ) { return BLIS_HEAP_ADDR_ALIGN_SIZE; } gint_t bli_info_get_heap_stride_align_size( void ) { return BLIS_HEAP_STRIDE_ALIGN_SIZE; } 
@@ -91,151 +94,25 @@ gint_t bli_info_get_blas2blis_int_type_size( void ) { return BLIS_BLAS2BLIS_INT_ -// -- bli_kernel.h ------------------------------------------------------------- - -// -- Default cache blocksizes -- - -gint_t bli_info_get_default_mc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_MC, oper, dt ); } -gint_t bli_info_get_default_nc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_NC, oper, dt ); } -gint_t bli_info_get_default_kc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_KC, oper, dt ); } - -// -- Maximum cache blocksizes -- - -gint_t bli_info_get_maximum_mc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_max_dt( BLIS_MC, oper, dt ); } -gint_t bli_info_get_maximum_nc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_max_dt( BLIS_NC, oper, dt ); } -gint_t bli_info_get_maximum_kc( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_max_dt( BLIS_KC, oper, dt ); } - -// -- Default register blocksizes -- - -gint_t bli_info_get_default_mr( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_MR, oper, dt ); } -gint_t bli_info_get_default_nr( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_NR, oper, dt ); } -gint_t bli_info_get_default_kr( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_dt( BLIS_KR, oper, dt ); } - -// -- Packing register blocksizes -- - -gint_t bli_info_get_packdim_mr( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_max_dt( BLIS_MR, oper, dt ); } -gint_t bli_info_get_packdim_nr( opid_t oper, num_t dt ) { return bli_bsv_get_avail_blksz_max_dt( BLIS_NR, oper, dt ); } - - -// -- Level-2 cache blocksizes -- - -extern blksz_t* gemv_mc; -extern blksz_t* gemv_nc; - -// m dimension default blocksizes - -gint_t bli_info_get_default_l2_mc( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_default_l2_mc_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_default_l2_mc_d(); - else if ( 
bli_is_scomplex( dt ) ) return bli_info_get_default_l2_mc_c(); - else if ( bli_is_dcomplex( dt ) ) return bli_info_get_default_l2_mc_z(); - else return 0; -} -gint_t bli_info_get_default_l2_mc_s( void ) { bli_init(); return bli_blksz_get_def( BLIS_FLOAT, gemv_mc ); } -gint_t bli_info_get_default_l2_mc_d( void ) { bli_init(); return bli_blksz_get_def( BLIS_DOUBLE, gemv_mc ); } -gint_t bli_info_get_default_l2_mc_c( void ) { bli_init(); return bli_blksz_get_def( BLIS_SCOMPLEX, gemv_mc ); } -gint_t bli_info_get_default_l2_mc_z( void ) { bli_init(); return bli_blksz_get_def( BLIS_DCOMPLEX, gemv_mc ); } - - -// n dimension default blocksizes - -gint_t bli_info_get_default_l2_nc( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_default_l2_nc_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_default_l2_nc_d(); - else if ( bli_is_scomplex( dt ) ) return bli_info_get_default_l2_nc_c(); - else if ( bli_is_dcomplex( dt ) ) return bli_info_get_default_l2_nc_z(); - else return 0; -} -gint_t bli_info_get_default_l2_nc_s( void ) { bli_init(); return bli_blksz_get_def( BLIS_FLOAT, gemv_nc ); } -gint_t bli_info_get_default_l2_nc_d( void ) { bli_init(); return bli_blksz_get_def( BLIS_DOUBLE, gemv_nc ); } -gint_t bli_info_get_default_l2_nc_c( void ) { bli_init(); return bli_blksz_get_def( BLIS_SCOMPLEX, gemv_nc ); } -gint_t bli_info_get_default_l2_nc_z( void ) { bli_init(); return bli_blksz_get_def( BLIS_DCOMPLEX, gemv_nc ); } - - -// -- Level-1f fusing factors -- - -// default - -gint_t bli_info_get_default_l1f_fuse_fac( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_default_l1f_fuse_fac_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_default_l1f_fuse_fac_d(); - else if ( bli_is_scomplex( dt ) ) return bli_info_get_default_l1f_fuse_fac_c(); - else if ( bli_is_dcomplex( dt ) ) return bli_info_get_default_l1f_fuse_fac_z(); - else return 0; -} -gint_t bli_info_get_default_l1f_fuse_fac_s( void ) { return BLIS_L1F_FUSE_FAC_S; } -gint_t 
bli_info_get_default_l1f_fuse_fac_d( void ) { return BLIS_L1F_FUSE_FAC_D; } -gint_t bli_info_get_default_l1f_fuse_fac_c( void ) { return BLIS_L1F_FUSE_FAC_C; } -gint_t bli_info_get_default_l1f_fuse_fac_z( void ) { return BLIS_L1F_FUSE_FAC_Z; } - - -// axpyf - -gint_t bli_info_get_axpyf_fuse_fac( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_axpyf_fuse_fac_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_axpyf_fuse_fac_d(); - else if ( bli_is_scomplex( dt ) ) return bli_info_get_axpyf_fuse_fac_c(); - else if ( bli_is_dcomplex( dt ) ) return bli_info_get_axpyf_fuse_fac_z(); - else return 0; -} -gint_t bli_info_get_axpyf_fuse_fac_s( void ) { return BLIS_AXPYF_FUSE_FAC_S; } -gint_t bli_info_get_axpyf_fuse_fac_d( void ) { return BLIS_AXPYF_FUSE_FAC_D; } -gint_t bli_info_get_axpyf_fuse_fac_c( void ) { return BLIS_AXPYF_FUSE_FAC_C; } -gint_t bli_info_get_axpyf_fuse_fac_z( void ) { return BLIS_AXPYF_FUSE_FAC_Z; } - - -// dotxf - -gint_t bli_info_get_dotxf_fuse_fac( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_dotxf_fuse_fac_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_dotxf_fuse_fac_d(); - else if ( bli_is_scomplex( dt ) ) return bli_info_get_dotxf_fuse_fac_c(); - else if ( bli_is_dcomplex( dt ) ) return bli_info_get_dotxf_fuse_fac_z(); - else return 0; -} -gint_t bli_info_get_dotxf_fuse_fac_s( void ) { return BLIS_DOTXF_FUSE_FAC_S; } -gint_t bli_info_get_dotxf_fuse_fac_d( void ) { return BLIS_DOTXF_FUSE_FAC_D; } -gint_t bli_info_get_dotxf_fuse_fac_c( void ) { return BLIS_DOTXF_FUSE_FAC_C; } -gint_t bli_info_get_dotxf_fuse_fac_z( void ) { return BLIS_DOTXF_FUSE_FAC_Z; } - - -// dotxaxpyf - -gint_t bli_info_get_dotxaxpyf_fuse_fac( num_t dt ) -{ - if ( bli_is_float ( dt ) ) return bli_info_get_dotxaxpyf_fuse_fac_s(); - else if ( bli_is_double ( dt ) ) return bli_info_get_dotxaxpyf_fuse_fac_d(); - else if ( bli_is_scomplex( dt ) ) return bli_info_get_dotxaxpyf_fuse_fac_c(); - else if ( bli_is_dcomplex( dt ) ) return 
bli_info_get_dotxaxpyf_fuse_fac_z(); - else return 0; -} -gint_t bli_info_get_dotxaxpyf_fuse_fac_s( void ) { return BLIS_DOTXAXPYF_FUSE_FAC_S; } -gint_t bli_info_get_dotxaxpyf_fuse_fac_d( void ) { return BLIS_DOTXAXPYF_FUSE_FAC_D; } -gint_t bli_info_get_dotxaxpyf_fuse_fac_c( void ) { return BLIS_DOTXAXPYF_FUSE_FAC_C; } -gint_t bli_info_get_dotxaxpyf_fuse_fac_z( void ) { return BLIS_DOTXAXPYF_FUSE_FAC_Z; } +// -- Kernel implementation-related -------------------------------------------- // -- Level-3 kernel definitions -- char* bli_info_get_gemm_ukr_impl_string( ind_t method, num_t dt ) - { return bli_ukr_impl_string( BLIS_GEMM_UKR, method, dt ); } + { return bli_gks_l3_ukr_impl_string( BLIS_GEMM_UKR, method, dt ); } char* bli_info_get_gemmtrsm_l_ukr_impl_string( ind_t method, num_t dt ) - { return bli_ukr_impl_string( BLIS_GEMMTRSM_L_UKR, method, dt ); } + { return bli_gks_l3_ukr_impl_string( BLIS_GEMMTRSM_L_UKR, method, dt ); } char* bli_info_get_gemmtrsm_u_ukr_impl_string( ind_t method, num_t dt ) - { return bli_ukr_impl_string( BLIS_GEMMTRSM_U_UKR, method, dt ); } + { return bli_gks_l3_ukr_impl_string( BLIS_GEMMTRSM_U_UKR, method, dt ); } char* bli_info_get_trsm_l_ukr_impl_string( ind_t method, num_t dt ) - { return bli_ukr_impl_string( BLIS_TRSM_L_UKR, method, dt ); } + { return bli_gks_l3_ukr_impl_string( BLIS_TRSM_L_UKR, method, dt ); } char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt ) - { return bli_ukr_impl_string( BLIS_TRSM_U_UKR, method, dt ); } + { return bli_gks_l3_ukr_impl_string( BLIS_TRSM_U_UKR, method, dt ); } -// -- bli_mem_pool_macro_defs.h ------------------------------------------------ +// -- Memory pool-related ------------------------------------------------------ gint_t bli_info_get_mk_pool_size( void ) { return bli_mem_pool_size( BLIS_BUFFER_FOR_A_BLOCK ); } gint_t bli_info_get_kn_pool_size( void ) { return bli_mem_pool_size( BLIS_BUFFER_FOR_B_PANEL ); } diff --git a/frame/base/bli_info.h b/frame/base/bli_info.h index 
7d6d80d0a..fd3e8175d 100644 --- a/frame/base/bli_info.h +++ b/frame/base/bli_info.h @@ -39,13 +39,16 @@ char* bli_info_get_version_str( void ); char* bli_info_get_int_type_size_str( void ); -// -- bli_config.h ------------------------------------------------------------- +// -- General configuration-related -------------------------------------------- gint_t bli_info_get_int_type_size( void ); gint_t bli_info_get_num_fp_types( void ); gint_t bli_info_get_max_type_size( void ); -gint_t bli_info_get_simd_align_size( void ); gint_t bli_info_get_page_size( void ); +gint_t bli_info_get_simd_num_registers( void ); +gint_t bli_info_get_simd_size( void ); +gint_t bli_info_get_simd_align_size( void ); +gint_t bli_info_get_stack_buf_max_size( void ); gint_t bli_info_get_stack_buf_align_size( void ); gint_t bli_info_get_heap_addr_align_size( void ); gint_t bli_info_get_heap_stride_align_size( void ); @@ -56,70 +59,7 @@ gint_t bli_info_get_enable_cblas( void ); gint_t bli_info_get_blas2blis_int_type_size( void ); -// -- bli_kernel.h ------------------------------------------------------------- - -// -- Default cache blocksizes -- - -gint_t bli_info_get_default_mc( opid_t oper, num_t dt ); -gint_t bli_info_get_default_kc( opid_t oper, num_t dt ); -gint_t bli_info_get_default_nc( opid_t oper, num_t dt ); - -// -- Maximum cache blocksizes -- - -gint_t bli_info_get_maximum_mc( opid_t oper, num_t dt ); -gint_t bli_info_get_maximum_kc( opid_t oper, num_t dt ); -gint_t bli_info_get_maximum_nc( opid_t oper, num_t dt ); - -// -- Default register blocksizes -- - -gint_t bli_info_get_default_mr( opid_t oper, num_t dt ); -gint_t bli_info_get_default_kr( opid_t oper, num_t dt ); -gint_t bli_info_get_default_nr( opid_t oper, num_t dt ); - -// -- Packing register blocksizes -- - -gint_t bli_info_get_packdim_mr( opid_t oper, num_t dt ); -gint_t bli_info_get_packdim_nr( opid_t oper, num_t dt ); - - -// -- Level-2 cache blocksizes -- - -gint_t bli_info_get_default_l2_mc_s( void ); -gint_t 
bli_info_get_default_l2_mc_d( void ); -gint_t bli_info_get_default_l2_mc_c( void ); -gint_t bli_info_get_default_l2_mc_z( void ); - -gint_t bli_info_get_default_l2_nc_s( void ); -gint_t bli_info_get_default_l2_nc_d( void ); -gint_t bli_info_get_default_l2_nc_c( void ); -gint_t bli_info_get_default_l2_nc_z( void ); - - -// -- Level-1f fusing factors -- - -gint_t bli_info_get_default_l1f_fuse_fac( num_t dt ); -gint_t bli_info_get_default_l1f_fuse_fac_s( void ); -gint_t bli_info_get_default_l1f_fuse_fac_d( void ); -gint_t bli_info_get_default_l1f_fuse_fac_c( void ); -gint_t bli_info_get_default_l1f_fuse_fac_z( void ); - -gint_t bli_info_get_axpyf_fuse_fac( num_t dt ); -gint_t bli_info_get_axpyf_fuse_fac_s( void ); -gint_t bli_info_get_axpyf_fuse_fac_d( void ); -gint_t bli_info_get_axpyf_fuse_fac_c( void ); -gint_t bli_info_get_axpyf_fuse_fac_z( void ); - -gint_t bli_info_get_dotxf_fuse_fac( num_t dt ); -gint_t bli_info_get_dotxf_fuse_fac_s( void ); -gint_t bli_info_get_dotxf_fuse_fac_d( void ); -gint_t bli_info_get_dotxf_fuse_fac_c( void ); -gint_t bli_info_get_dotxf_fuse_fac_z( void ); - -gint_t bli_info_get_dotxaxpyf_fuse_fac( num_t dt ); -gint_t bli_info_get_dotxaxpyf_fuse_fac_s( void ); -gint_t bli_info_get_dotxaxpyf_fuse_fac_d( void ); -gint_t bli_info_get_dotxaxpyf_fuse_fac_c( void ); -gint_t bli_info_get_dotxaxpyf_fuse_fac_z( void ); +// -- Kernel implementation-related -------------------------------------------- // -- Level-3 kernel definitions -- @@ -131,7 +71,7 @@ char* bli_info_get_trsm_l_ukr_impl_string( ind_t method, num_t dt ); char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt ); -// -- bli_mem_pool_macro_defs.h ------------------------------------------------ +// -- Memory pool-related ------------------------------------------------------ gint_t bli_info_get_mk_pool_size( void ); gint_t bli_info_get_kn_pool_size( void ); diff --git a/frame/base/bli_mbool.c b/frame/base/bli_mbool.c new file mode 100644 index 000000000..9bea1cd2e --- 
/dev/null +++ b/frame/base/bli_mbool.c @@ -0,0 +1,72 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + + +mbool_t* bli_mbool_obj_create( bool_t b_s, + bool_t b_d, + bool_t b_c, + bool_t b_z ) +{ + mbool_t* b; + + b = ( mbool_t* ) bli_malloc( sizeof(mbool_t) ); + + bli_mbool_obj_init( b, + b_s, + b_d, + b_c, + b_z ); + + return b; +} + +void bli_mbool_obj_init( mbool_t* b, + bool_t b_s, + bool_t b_d, + bool_t b_c, + bool_t b_z ) +{ + bli_mbool_set_dt( b_s, BLIS_FLOAT, b ); + bli_mbool_set_dt( b_d, BLIS_DOUBLE, b ); + bli_mbool_set_dt( b_c, BLIS_SCOMPLEX, b ); + bli_mbool_set_dt( b_z, BLIS_DCOMPLEX, b ); +} + +void bli_mbool_obj_free( mbool_t* b ) +{ + bli_free( b ); +} + diff --git a/frame/ind/query/bli_ukr_query.h b/frame/base/bli_mbool.h similarity index 72% rename from frame/ind/query/bli_ukr_query.h rename to frame/base/bli_mbool.h index f10016a0c..5d5f47828 100644 --- a/frame/ind/query/bli_ukr_query.h +++ b/frame/base/bli_mbool.h @@ -32,40 +32,33 @@ */ -#ifndef BLIS_UKR_QUERY_H -#define BLIS_UKR_QUERY_H +// ----------------------------------------------------------------------------- +// mbool_t query -typedef enum -{ - BLIS_GEMM_UKR = 0, - BLIS_GEMMTRSM_L_UKR, - BLIS_GEMMTRSM_U_UKR, - BLIS_TRSM_L_UKR, - BLIS_TRSM_U_UKR, -} l3ukr_t; +#define bli_mbool_get_dt( dt, mb ) \ +\ + ( (mb)->v[ dt ] ) -#define BLIS_NUM_LEVEL3_UKRS 5 +// mbool_t modification -typedef enum -{ - BLIS_REFERENCE_UKERNEL = 0, - BLIS_VIRTUAL_UKERNEL, - BLIS_OPTIMIZED_UKERNEL, - BLIS_NOTAPPLIC_UKERNEL, -} kimpl_t; - -#define BLIS_NUM_UKR_IMPL_TYPES 4 +#define bli_mbool_set_dt( val, dt, mb ) \ +{ \ + (mb)->v[ dt ] = val; \ +} // ----------------------------------------------------------------------------- -char* bli_ukr_impl_string( l3ukr_t ukr, ind_t method, num_t dt ); -char* bli_ukr_avail_impl_string( l3ukr_t ukr, num_t dt ); -kimpl_t bli_ukr_impl_type( l3ukr_t ukr, ind_t method, num_t dt ); +mbool_t* bli_mbool_obj_create( bool_t b_s, + bool_t b_d, + bool_t b_c, + bool_t b_z ); -func_t* bli_ukr_get_funcs( l3ukr_t ukr, ind_t method ); -func_t* bli_ukr_get_ref_funcs( 
l3ukr_t ukr ); +void bli_mbool_obj_init( mbool_t* b, + bool_t b_s, + bool_t b_d, + bool_t b_c, + bool_t b_z ); - -#endif +void bli_mbool_obj_free( mbool_t* b ); diff --git a/frame/base/bli_mem.c b/frame/base/bli_mem.c index e70fcd372..a1991304e 100644 --- a/frame/base/bli_mem.c +++ b/frame/base/bli_mem.c @@ -274,10 +274,16 @@ static bool_t bli_mem_is_init = FALSE; void bli_mem_init( void ) { + cntx_t cntx; + // If the initialization flag is TRUE, we know the API is already // initialized, so we can return early. if ( bli_mem_is_init == TRUE ) return; + // Create and initialize a context for gemm so we have something + // to pass into bli_mem_init_pools(). + bli_gemm_cntx_init( &cntx ); + #ifdef BLIS_ENABLE_OPENMP _Pragma( "omp critical (mem)" ) #endif @@ -295,7 +301,7 @@ void bli_mem_init( void ) if ( bli_mem_is_init == FALSE ) { // Initialize the memory pools. - bli_mem_init_pools(); + bli_mem_init_pools( &cntx ); // After initialization, mark the API as initialized. bli_mem_is_init = TRUE; @@ -306,11 +312,13 @@ void bli_mem_init( void ) #ifdef BLIS_ENABLE_PTHREADS pthread_mutex_unlock( &mem_manager_mutex ); #endif + + // Finalize the temporary gemm context. + bli_gemm_cntx_finalize( &cntx ); } -void bli_mem_reinit( void ) +void bli_mem_reinit( cntx_t* cntx ) { - #ifdef BLIS_ENABLE_OPENMP _Pragma( "omp critical (mem)" ) #endif @@ -325,7 +333,7 @@ void bli_mem_reinit( void ) if ( bli_mem_is_init == FALSE ) { // Initialize the memory pools. - bli_mem_init_pools(); + bli_mem_init_pools( cntx ); // After initialization, mark the API as initialized. bli_mem_is_init = TRUE; @@ -333,7 +341,7 @@ void bli_mem_reinit( void ) else { // Reinitialize the memory pools. 
- bli_mem_reinit_pools(); + bli_mem_reinit_pools( cntx ); } } // END CRITICAL SECTION @@ -386,114 +394,134 @@ bool_t bli_mem_is_initialized( void ) // ----------------------------------------------------------------------------- -void bli_mem_init_pools( void ) +void bli_mem_init_pools( cntx_t* cntx ) { // Map each of the packbuf_t values to an index starting at zero. - dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); - dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); - dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); + const dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); + const dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); + const dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); - siz_t align_size = BLIS_POOL_ADDR_ALIGN_SIZE; + const siz_t align_size = BLIS_POOL_ADDR_ALIGN_SIZE; - siz_t block_size_a = 0; - siz_t block_size_b = 0; - siz_t block_size_c = 0; + // Alias the pool addresses to convenient identifiers. + pool_t* pool_a = &pools[ index_a ]; + pool_t* pool_b = &pools[ index_b ]; + pool_t* pool_c = &pools[ index_c ]; // Start with empty pools. - dim_t num_blocks_a = 0; - dim_t num_blocks_b = 0; - dim_t num_blocks_c = 0; + const dim_t num_blocks_a = 0; + const dim_t num_blocks_b = 0; + const dim_t num_blocks_c = 0; + + siz_t block_size_a = 0; + siz_t block_size_b = 0; + siz_t block_size_c = 0; // Determine the block size for each memory pool. bli_mem_compute_pool_block_sizes( &block_size_a, &block_size_b, - &block_size_c ); + &block_size_c, + cntx ); // Initialize the memory pools for A, B, and C. 
- bli_pool_init( num_blocks_a, block_size_a, align_size, &pools[ index_a ] ); - bli_pool_init( num_blocks_b, block_size_b, align_size, &pools[ index_b ] ); - bli_pool_init( num_blocks_c, block_size_c, align_size, &pools[ index_c ] ); + bli_pool_init( num_blocks_a, block_size_a, align_size, pool_a ); + bli_pool_init( num_blocks_b, block_size_b, align_size, pool_b ); + bli_pool_init( num_blocks_c, block_size_c, align_size, pool_c ); } -void bli_mem_reinit_pools( void ) +void bli_mem_reinit_pools( cntx_t* cntx ) { // Map each of the packbuf_t values to an index starting at zero. - dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); - dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); - dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); + const dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); + const dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); + const dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); - siz_t align_size = BLIS_POOL_ADDR_ALIGN_SIZE; + const siz_t align_size = BLIS_POOL_ADDR_ALIGN_SIZE; - siz_t block_size_a = 0; - siz_t block_size_b = 0; - siz_t block_size_c = 0; + // Alias the pool addresses to convenient identifiers. + pool_t* pool_a = &pools[ index_a ]; + pool_t* pool_b = &pools[ index_b ]; + pool_t* pool_c = &pools[ index_c ]; // Query the number of blocks currently allocated in each pool. - dim_t num_blocks_a = bli_pool_num_blocks( &pools[ index_a ] ); - dim_t num_blocks_b = bli_pool_num_blocks( &pools[ index_b ] ); - dim_t num_blocks_c = bli_pool_num_blocks( &pools[ index_c ] ); + const dim_t num_blocks_a = bli_pool_num_blocks( pool_a ); + const dim_t num_blocks_b = bli_pool_num_blocks( pool_b ); + const dim_t num_blocks_c = bli_pool_num_blocks( pool_c ); - // Determine the block size for each memory pool. 
- bli_mem_compute_pool_block_sizes( &block_size_a, - &block_size_b, - &block_size_c ); + siz_t block_size_a_new = 0; + siz_t block_size_b_new = 0; + siz_t block_size_c_new = 0; - // Reinitialize the memory pools for A, B, and C with the same number - // of blocks as before, except with new block sizes. - bli_pool_reinit( num_blocks_a, block_size_a, align_size, &pools[ index_a ] ); - bli_pool_reinit( num_blocks_b, block_size_b, align_size, &pools[ index_b ] ); - bli_pool_reinit( num_blocks_c, block_size_c, align_size, &pools[ index_c ] ); + // Determine the context-implied block size needed for each pool. + bli_mem_compute_pool_block_sizes( &block_size_a_new, + &block_size_b_new, + &block_size_c_new, + cntx ); + + // Reinitialize the pool, but only if one of the parameters has + // changed in such a way that reinitialization would be required. + // In this case, the align_size is constant, as is num_blocks, so + // what this actually boils down to is that reinitialization of a + // pool occurs only if the block size for that pool has increased. + bli_pool_reinit_if( num_blocks_a, block_size_a_new, align_size, pool_a ); + bli_pool_reinit_if( num_blocks_b, block_size_b_new, align_size, pool_b ); + bli_pool_reinit_if( num_blocks_c, block_size_c_new, align_size, pool_c ); } void bli_mem_finalize_pools( void ) { // Map each of the packbuf_t values to an index starting at zero. - dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); - dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); - dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); + dim_t index_a = bli_packbuf_index( BLIS_BUFFER_FOR_A_BLOCK ); + dim_t index_b = bli_packbuf_index( BLIS_BUFFER_FOR_B_PANEL ); + dim_t index_c = bli_packbuf_index( BLIS_BUFFER_FOR_C_PANEL ); + + // Alias the pool addresses to convenient identifiers. + pool_t* pool_a = &pools[ index_a ]; + pool_t* pool_b = &pools[ index_b ]; + pool_t* pool_c = &pools[ index_c ]; // Finalize the memory pools for A, B, and C. 
- bli_pool_finalize( &pools[ index_a ] ); - bli_pool_finalize( &pools[ index_b ] ); - bli_pool_finalize( &pools[ index_c ] ); + bli_pool_finalize( pool_a ); + bli_pool_finalize( pool_b ); + bli_pool_finalize( pool_c ); } // ----------------------------------------------------------------------------- -void bli_mem_compute_pool_block_sizes( siz_t* bs_a, - siz_t* bs_b, - siz_t* bs_c ) +void bli_mem_compute_pool_block_sizes( siz_t* bs_a, + siz_t* bs_b, + siz_t* bs_c, + cntx_t* cntx ) { + const ind_t im = bli_cntx_get_ind_method( cntx ); + siz_t bs_cand_a = 0; siz_t bs_cand_b = 0; siz_t bs_cand_c = 0; - ind_t im; num_t dt; - // Compute pool block sizes for datatype and each implemented - // method and find the maximum size for each pool. This is done - // so that new pools do not need to be allocated if the user - // switches datatypes or methods later on. - for ( im = 0; im < BLIS_NUM_IND_METHODS; ++im ) + // Compute pool block sizes for each datatype and find the maximum + // size for each pool. This is done so that new pools do not need + // to be allocated if the user switches datatypes. + for ( dt = BLIS_DT_LO; dt <= BLIS_DT_HI; ++dt ) { - for ( dt = BLIS_DT_LO; dt <= BLIS_DT_HI; ++dt ) - { - // Avoid considering induced methods for real datatypes. - if ( bli_is_complex( dt ) || im == BLIS_NAT ) - { - siz_t bs_dt_a, bs_dt_b, bs_dt_c; + siz_t bs_dt_a; + siz_t bs_dt_b; + siz_t bs_dt_c; - bli_mem_compute_pool_block_sizes_dt( dt, im, - &bs_dt_a, - &bs_dt_b, - &bs_dt_c ); + // Avoid considering induced methods for real datatypes. + if ( bli_is_real( dt ) && im != BLIS_NAT ) continue; - bs_cand_a = bli_max( bs_dt_a, bs_cand_a ); - bs_cand_b = bli_max( bs_dt_b, bs_cand_b ); - bs_cand_c = bli_max( bs_dt_c, bs_cand_c ); - } - } + bli_mem_compute_pool_block_sizes_dt( dt, + &bs_dt_a, + &bs_dt_b, + &bs_dt_c, + cntx ); + + bs_cand_a = bli_max( bs_dt_a, bs_cand_a ); + bs_cand_b = bli_max( bs_dt_b, bs_cand_b ); + bs_cand_c = bli_max( bs_dt_c, bs_cand_c ); } // Save the results. 
@@ -504,11 +532,11 @@ void bli_mem_compute_pool_block_sizes( siz_t* bs_a, // ----------------------------------------------------------------------------- -void bli_mem_compute_pool_block_sizes_dt( num_t dt, - ind_t method, - siz_t* bs_a, - siz_t* bs_b, - siz_t* bs_c ) +void bli_mem_compute_pool_block_sizes_dt( num_t dt, + siz_t* bs_a, + siz_t* bs_b, + siz_t* bs_c, + cntx_t* cntx ) { siz_t size_dt = bli_datatype_size( dt ); @@ -527,7 +555,8 @@ void bli_mem_compute_pool_block_sizes_dt( num_t dt, dim_t kc_max_dt; dim_t nc_max_dt; - dim_t packmr_dt, packnr_dt; + dim_t packmr_dt; + dim_t packnr_dt; dim_t max_packmnr_dt; dim_t scale_num_dt; @@ -543,8 +572,8 @@ void bli_mem_compute_pool_block_sizes_dt( num_t dt, // Query the mr and nr blksz_t objects for the given method of // execution. - mr = bli_bsv_get_blksz( BLIS_MR, method ); - nr = bli_bsv_get_blksz( BLIS_NR, method ); + mr = bli_cntx_get_blksz( BLIS_MR, cntx ); + nr = bli_cntx_get_blksz( BLIS_NR, cntx ); // Extract the mr and nr values specific to the current datatype. mr_dt = bli_blksz_get_def( dt, mr ); @@ -558,9 +587,9 @@ void bli_mem_compute_pool_block_sizes_dt( num_t dt, // // Query the mc, kc, and nc blksz_t objects for native execution. - mc = bli_bsv_get_blksz( BLIS_MC, method ); - kc = bli_bsv_get_blksz( BLIS_KC, method ); - nc = bli_bsv_get_blksz( BLIS_NC, method ); + mc = bli_cntx_get_blksz( BLIS_MC, cntx ); + kc = bli_cntx_get_blksz( BLIS_KC, cntx ); + nc = bli_cntx_get_blksz( BLIS_NC, cntx ); // Extract the maximum mc, kc, and nc values specific to the current // datatype. 
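Reviewer sketch (not part of the patch): the pool-sizing policy in the bli_mem.c changes above sizes each pool for the largest per-datatype requirement, then reinitializes a pool only when the newly required block size exceeds the current one. The standalone C below illustrates that policy under simplifying assumptions. Every name in it (toy_pool_t, toy_max_block_size, toy_pool_init, toy_pool_reinit_if, toy_pool_finalize) is hypothetical and is not a BLIS API, and the pool holds a single block rather than an array of blocks.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-ins for illustration only; none of these are BLIS APIs. */
enum { TOY_NUM_DT = 4 }; /* one slot per datatype: s, d, c, z */

typedef struct
{
    size_t block_size;  /* bytes in the currently allocated block */
    void*  block;       /* a single block, to keep the sketch short */
    int    num_reinits; /* number of times the block was reallocated */
} toy_pool_t;

/* Return the largest per-datatype requirement, so that one pool can serve
   every datatype without being reallocated when the datatype changes. */
size_t toy_max_block_size( const size_t bs_dt[ TOY_NUM_DT ] )
{
    size_t bs_max = 0;

    for ( int dt = 0; dt < TOY_NUM_DT; ++dt )
    {
        if ( bs_dt[ dt ] > bs_max ) bs_max = bs_dt[ dt ];
    }

    return bs_max;
}

void toy_pool_init( toy_pool_t* pool, size_t block_size )
{
    assert( pool != NULL );

    pool->block_size  = block_size;
    pool->block       = malloc( block_size ? block_size : 1 );
    pool->num_reinits = 0;
}

/* Reallocate the block only if the requested size exceeds the current size.
   Returns 1 if a reinitialization happened and 0 otherwise. */
int toy_pool_reinit_if( toy_pool_t* pool, size_t block_size_new )
{
    assert( pool != NULL );

    if ( block_size_new <= pool->block_size ) return 0;

    free( pool->block );
    pool->block        = malloc( block_size_new );
    pool->block_size   = block_size_new;
    pool->num_reinits += 1;

    return 1;
}

void toy_pool_finalize( toy_pool_t* pool )
{
    assert( pool != NULL );

    free( pool->block );
    pool->block      = NULL;
    pool->block_size = 0;
}
```

In this sketch a request that is equal to or smaller than the current block size leaves the pool untouched, so a caller holding the old block sees no change. Only a strictly larger request triggers reallocation.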
diff --git a/frame/base/bli_mem.h b/frame/base/bli_mem.h index f80e73aa6..8d6d71501 100644 --- a/frame/base/bli_mem.h +++ b/frame/base/bli_mem.h @@ -33,7 +33,7 @@ */ void bli_mem_init( void ); -void bli_mem_reinit( void ); +void bli_mem_reinit( cntx_t* cntx ); void bli_mem_finalize( void ); bool_t bli_mem_is_initialized( void ); @@ -52,16 +52,17 @@ siz_t bli_mem_pool_size( packbuf_t buf_type ); // ----------------------------------------------------------------------------- -void bli_mem_init_pools( void ); -void bli_mem_reinit_pools( void ); +void bli_mem_init_pools( cntx_t* cntx ); +void bli_mem_reinit_pools( cntx_t* cntx ); void bli_mem_finalize_pools( void ); -void bli_mem_compute_pool_block_sizes( siz_t* bs_a, - siz_t* bs_b, - siz_t* bs_c ); -void bli_mem_compute_pool_block_sizes_dt( num_t dt, - ind_t method, - siz_t* bs_a, - siz_t* bs_b, - siz_t* bs_c ); +void bli_mem_compute_pool_block_sizes( siz_t* bs_a, + siz_t* bs_b, + siz_t* bs_c, + cntx_t* cntx ); +void bli_mem_compute_pool_block_sizes_dt( num_t dt, + siz_t* bs_a, + siz_t* bs_b, + siz_t* bs_c, + cntx_t* cntx ); diff --git a/frame/base/bli_pool.c b/frame/base/bli_pool.c index bae3e3556..23090ca39 100644 --- a/frame/base/bli_pool.c +++ b/frame/base/bli_pool.c @@ -34,7 +34,7 @@ #include "blis.h" -void bli_pool_init( dim_t num_blocks_init, +void bli_pool_init( dim_t num_blocks, siz_t block_size, siz_t align_size, pool_t* pool ) @@ -43,18 +43,18 @@ void bli_pool_init( dim_t num_blocks_init, dim_t i; // Allocate the block_ptrs array. - block_ptrs = bli_malloc( num_blocks_init * sizeof( pblk_t ) ); + block_ptrs = bli_malloc( num_blocks * sizeof( pblk_t ) ); // Allocate and initialize each entry in the block_ptrs array. - for ( i = 0; i < num_blocks_init; ++i ) + for ( i = 0; i < num_blocks; ++i ) { bli_pool_alloc_block( block_size, align_size, &(block_ptrs[i]) ); } // Initialize the pool_t structure. 
bli_pool_set_block_ptrs( block_ptrs, pool ); - bli_pool_set_block_ptrs_len( num_blocks_init, pool ); - bli_pool_set_num_blocks( num_blocks_init, pool ); + bli_pool_set_block_ptrs_len( num_blocks, pool ); + bli_pool_set_num_blocks( num_blocks, pool ); bli_pool_set_top_index( 0, pool ); bli_pool_set_block_size( block_size, pool ); bli_pool_set_align_size( align_size, pool ); @@ -116,6 +116,38 @@ void bli_pool_reinit( dim_t num_blocks_new, bli_pool_init( num_blocks_new, block_size_new, align_size_new, pool ); } +void bli_pool_reinit_if( dim_t num_blocks_new, + siz_t block_size_new, + siz_t align_size_new, + pool_t* pool ) +{ + const dim_t num_blocks = bli_pool_num_blocks( pool ); + const siz_t block_size = bli_pool_block_size( pool ); + const siz_t align_size = bli_pool_align_size( pool ); + + // Reinitialize the pool, but only if one or more of the new pool + // parameters would require it. Otherwise, if only the number + // of blocks has increased, we can skip a full reinit and just + // grow the pool. + if ( block_size_new > block_size || + align_size_new != align_size ) + { + // Reinitialize the pool with the new parameters, in particular, + // the new block size.
+ bli_pool_reinit( num_blocks_new, + block_size_new, + align_size_new, + pool ); + } + else if ( num_blocks_new > num_blocks ) + { + const dim_t num_blocks_add = num_blocks_new - + num_blocks; + + bli_pool_grow( num_blocks_add, pool ); + } +} + void bli_pool_checkout_block( pblk_t* block, pool_t* pool ) { pblk_t* block_ptrs; diff --git a/frame/base/bli_pool.h b/frame/base/bli_pool.h index aba3c2f51..aec0b94ac 100644 --- a/frame/base/bli_pool.h +++ b/frame/base/bli_pool.h @@ -158,15 +158,19 @@ typedef struct // ----------------------------------------------------------------------------- -void bli_pool_init( dim_t num_blocks_init, +void bli_pool_init( dim_t num_blocks, siz_t block_size, siz_t align_size, pool_t* pool ); void bli_pool_finalize( pool_t* pool ); void bli_pool_reinit( dim_t num_blocks_new, - siz_t block_size, - siz_t align_size, + siz_t block_size_new, + siz_t align_size_new, pool_t* pool ); +void bli_pool_reinit_if( dim_t num_blocks_new, + siz_t block_size_new, + siz_t align_size_new, + pool_t* pool ); void bli_pool_checkout_block( pblk_t* block, pool_t* pool ); void bli_pool_checkin_block( pblk_t* block, pool_t* pool ); diff --git a/frame/base/bli_threading.c b/frame/base/bli_threading.c index 168679ef0..fbb457eec 100644 --- a/frame/base/bli_threading.c +++ b/frame/base/bli_threading.c @@ -75,17 +75,21 @@ void bli_barrier( thread_comm_t* communicator, dim_t t_id ) return; } -void bli_level3_thread_decorator( dim_t n_threads, - level3_int_t func, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - void* cntl, - void** thread ) +void bli_level3_thread_decorator + ( + dim_t n_threads, + l3_int_t func, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + void* cntx, + void* cntl, + void** thread + ) { - func( alpha, a, b, beta, c, cntl, thread[0] ); + func( alpha, a, b, beta, c, cntx, cntl, thread[0] ); } @@ -291,10 +295,11 @@ void bli_get_range( void* thr, dim_t n, dim_t bf, bool_t handle_edge_low, dim_t* } } -siz_t 
bli_get_range_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_l2r( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { - dim_t m = bli_obj_length_after_trans( *a ); - dim_t n = bli_obj_width_after_trans( *a ); + dim_t m = bli_obj_length_after_trans( *a ); + dim_t n = bli_obj_width_after_trans( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); bli_get_range( thr, n, bf, FALSE, start, end ); @@ -302,10 +307,11 @@ siz_t bli_get_range_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end return m * ( *end - *start ); } -siz_t bli_get_range_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_r2l( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { - dim_t m = bli_obj_length_after_trans( *a ); - dim_t n = bli_obj_width_after_trans( *a ); + dim_t m = bli_obj_length_after_trans( *a ); + dim_t n = bli_obj_width_after_trans( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); bli_get_range( thr, n, bf, TRUE, start, end ); @@ -313,10 +319,11 @@ siz_t bli_get_range_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end return m * ( *end - *start ); } -siz_t bli_get_range_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_t2b( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { - dim_t m = bli_obj_length_after_trans( *a ); - dim_t n = bli_obj_width_after_trans( *a ); + dim_t m = bli_obj_length_after_trans( *a ); + dim_t n = bli_obj_width_after_trans( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); bli_get_range( thr, m, bf, FALSE, start, end ); @@ -324,10 +331,11 @@ siz_t bli_get_range_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end return n * ( *end - *start ); } -siz_t bli_get_range_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_b2t( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { - dim_t m = bli_obj_length_after_trans( *a ); - 
dim_t n = bli_obj_width_after_trans( *a ); + dim_t m = bli_obj_length_after_trans( *a ); + dim_t n = bli_obj_width_after_trans( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); bli_get_range( thr, m, bf, TRUE, start, end ); @@ -665,7 +673,7 @@ siz_t bli_get_range_weighted( void* thr, return area; } -siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { siz_t area; @@ -680,6 +688,7 @@ siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, d uplo_t uplo = bli_obj_uplo( *a ); dim_t m = bli_obj_length( *a ); dim_t n = bli_obj_width( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); // Support implicit transposition. if ( bli_obj_has_trans( *a ) ) @@ -692,14 +701,14 @@ siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, d } else // if dense or zeros { - area = bli_get_range_l2r( thr, a, bf, + area = bli_get_range_l2r( thr, a, bmult, start, end ); } return area; } -siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { siz_t area; @@ -714,6 +723,7 @@ siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, d uplo_t uplo = bli_obj_uplo( *a ); dim_t m = bli_obj_length( *a ); dim_t n = bli_obj_width( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); // Support implicit transposition. 
if ( bli_obj_has_trans( *a ) ) @@ -728,14 +738,14 @@ siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, d } else // if dense or zeros { - area = bli_get_range_r2l( thr, a, bf, + area = bli_get_range_r2l( thr, a, bmult, start, end ); } return area; } -siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { siz_t area; @@ -750,6 +760,7 @@ siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, d uplo_t uplo = bli_obj_uplo( *a ); dim_t m = bli_obj_length( *a ); dim_t n = bli_obj_width( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); // Support implicit transposition. if ( bli_obj_has_trans( *a ) ) @@ -764,14 +775,14 @@ siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, d } else // if dense or zeros { - area = bli_get_range_t2b( thr, a, bf, + area = bli_get_range_t2b( thr, a, bmult, start, end ); } return area; } -siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ) +siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ) { siz_t area; @@ -786,6 +797,7 @@ siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, d uplo_t uplo = bli_obj_uplo( *a ); dim_t m = bli_obj_length( *a ); dim_t n = bli_obj_width( *a ); + dim_t bf = bli_blksz_get_def_for_obj( a, bmult ); // Support implicit transposition. 
if ( bli_obj_has_trans( *a ) ) @@ -802,7 +814,7 @@ siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, d } else // if dense or zeros { - area = bli_get_range_b2t( thr, a, bf, + area = bli_get_range_b2t( thr, a, bmult, start, end ); } diff --git a/frame/base/bli_threading.h b/frame/base/bli_threading.h index 5fdff80fd..60759b8b6 100644 --- a/frame/base/bli_threading.h +++ b/frame/base/bli_threading.h @@ -142,10 +142,10 @@ typedef struct thrinfo_s thrinfo_t; void bli_get_range( void* thr, dim_t n, dim_t bf, bool_t handle_edge_low, dim_t* start, dim_t* end ); -siz_t bli_get_range_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); +siz_t bli_get_range_l2r( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_r2l( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_t2b( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_b2t( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); dim_t bli_get_range_width_l( doff_t diagoff_j, dim_t m, @@ -167,10 +167,10 @@ siz_t bli_get_range_weighted( void* thr, dim_t* j_start_thr, dim_t* j_end_thr ); -siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); -siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, dim_t bf, dim_t* start, dim_t* end ); +siz_t bli_get_range_weighted_l2r( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_weighted_r2l( void* thr, obj_t* a, blksz_t* 
bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_weighted_t2b( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); +siz_t bli_get_range_weighted_b2t( void* thr, obj_t* a, blksz_t* bmult, dim_t* start, dim_t* end ); thrinfo_t* bli_create_thread_info( thread_comm_t* ocomm, dim_t ocomm_id, thread_comm_t* icomm, dim_t icomm_id, @@ -193,16 +193,31 @@ dim_t bli_read_nway_from_env( char* env ); #include "bli_trmm_threading.h" #include "bli_trsm_threading.h" -typedef void (*level3_int_t) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, void* cntl, void* thread ); -void bli_level3_thread_decorator( dim_t num_threads, - level3_int_t func, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - void* cntl, - void** thread ); +typedef void (*l3_int_t) + ( + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + void* cntx, + void* cntl, + void* thread + ); + +void bli_level3_thread_decorator + ( + dim_t num_threads, + l3_int_t func, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + void* cntx, + void* cntl, + void** thread + ); dim_t bli_gcd( dim_t x, dim_t y ); diff --git a/frame/base/bli_threading_omp.c b/frame/base/bli_threading_omp.c index 567f2eb5f..8cd714da1 100644 --- a/frame/base/bli_threading_omp.c +++ b/frame/base/bli_threading_omp.c @@ -51,27 +51,32 @@ void bli_free_communicator( thread_comm_t* communicator ) bli_free( communicator ); } -void bli_level3_thread_decorator( dim_t n_threads, - level3_int_t func, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - void* cntl, - void** thread ) +void bli_level3_thread_decorator + ( + dim_t n_threads, + l3_int_t func, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + void* cntx, + void* cntl, + void** thread + ) { _Pragma( "omp parallel num_threads(n_threads)" ) { dim_t omp_id = omp_get_thread_num(); func( alpha, - a, - b, - beta, - c, - cntl, - thread[omp_id] ); + a, + b, + beta, + c, + cntx, + cntl, + 
thread[omp_id] ); } } diff --git a/frame/base/bli_threading_pthreads.c b/frame/base/bli_threading_pthreads.c index 0aad78c2a..d76756191 100644 --- a/frame/base/bli_threading_pthreads.c +++ b/frame/base/bli_threading_pthreads.c @@ -46,6 +46,7 @@ typedef struct thread_data obj_t* b; obj_t* beta; obj_t* c; + void* cntx; void* cntl; void* thread; } thread_data_t; @@ -54,20 +55,34 @@ void* thread_decorator_helper( void* data_void ) { thread_data_t* data = data_void; - data->func( data->alpha, data->a, data->b, data->beta, data->c, data->cntl, data->thread ); + data->func + ( + data->alpha, + data->a, + data->b, + data->beta, + data->c, + data->cntx, + data->cntl, + data->thread + ); return NULL; } -void bli_level3_thread_decorator( dim_t n_threads, - level3_int_t func, - obj_t* alpha, - obj_t* a, - obj_t* b, - obj_t* beta, - obj_t* c, - void* cntl, - void** thread ) +void bli_level3_thread_decorator + ( + dim_t n_threads, + l3_int_t func, + obj_t* alpha, + obj_t* a, + obj_t* b, + obj_t* beta, + obj_t* c, + void* cntx, + void* cntl, + void** thread + ) { pthread_t* pthreads = (pthread_t*) bli_malloc(sizeof(pthread_t) * n_threads); //Saying "datas" is kind of like saying "all y'all" @@ -83,8 +98,10 @@ void bli_level3_thread_decorator( dim_t n_threads, datas[i].b = b; datas[i].beta = beta; datas[i].c = c; + datas[i].cntx = cntx; datas[i].cntl = cntl; datas[i].thread = thread[i]; + pthread_create( &pthreads[i], NULL, &thread_decorator_helper, &datas[i] ); } diff --git a/frame/cntl/bli_cntl.h b/frame/cntl/bli_cntl.h index 375cbb204..ec7956d2c 100644 --- a/frame/cntl/bli_cntl.h +++ b/frame/cntl/bli_cntl.h @@ -64,7 +64,7 @@ void bli_cntl_obj_free( void* cntl ); #define cntl_impl_type( cntl ) cntl->impl_type #define cntl_var_num( cntl ) cntl->var_num -#define cntl_blocksize( cntl ) cntl->b +#define cntl_bszid( cntl ) cntl->bszid diff --git a/frame/cntl/bli_cntl_init.c b/frame/cntl/bli_cntl_init.c index 54187d909..b7c53ec65 100644 --- a/frame/cntl/bli_cntl_init.c +++ 
b/frame/cntl/bli_cntl_init.c @@ -64,9 +64,6 @@ void bli_cntl_init( void ) bli_gemm_cntl_init(); bli_trsm_cntl_init(); - // Level-3 induced - bli_ind_cntl_init(); - // Mark API as initialized. bli_cntl_is_init = TRUE; } @@ -96,9 +93,6 @@ void bli_cntl_finalize( void ) bli_gemm_cntl_finalize(); bli_trsm_cntl_finalize(); - // Level-3 induced - bli_ind_cntl_finalize(); - // Mark API as uninitialized. bli_cntl_is_init = FALSE; } diff --git a/frame/compat/bla_amax.c b/frame/compat/bla_amax.c index 9e32e2177..47f12de83 100644 --- a/frame/compat/bla_amax.c +++ b/frame/compat/bla_amax.c @@ -41,10 +41,11 @@ #undef GENTFUNC #define GENTFUNC( ftype_x, chx, blasname, blisname ) \ \ -f77_int PASTEF772(i,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ) \ +f77_int PASTEF772(i,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ) \ { \ dim_t n0; \ ftype_x* x0; \ @@ -70,9 +71,13 @@ f77_int PASTEF772(i,chx,blasname)( \ bli_convert_blas_incv( n0, x, *incx, x0, incx0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC(chx,blisname)( n0, \ - x0, incx0, \ - &bli_index ); \ + PASTEMAC(chx,blisname) \ + ( \ + n0, \ + x0, incx0, \ + &bli_index, \ + NULL \ + ); \ \ /* Convert zero-based BLIS (C) index to one-based BLAS (Fortran) index. 
*/ \ diff --git a/frame/compat/bla_amax.h b/frame/compat/bla_amax.h index 0c2faf959..220a3fb48 100644 --- a/frame/compat/bla_amax.h +++ b/frame/compat/bla_amax.h @@ -39,10 +39,11 @@ #undef GENTPROT #define GENTPROT( ftype_x, chx, blasname ) \ \ -f77_int PASTEF772(i,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ); \ +f77_int PASTEF772(i,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( amax ) diff --git a/frame/compat/bla_asum.c b/frame/compat/bla_asum.c index 33e6ad31b..8b4291296 100644 --- a/frame/compat/bla_asum.c +++ b/frame/compat/bla_asum.c @@ -41,10 +41,11 @@ #undef GENTFUNCR2 #define GENTFUNCR2( ftype_x, ftype_r, chx, chr, blasname, blisname ) \ \ -ftype_r PASTEF772(chr,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ) \ +ftype_r PASTEF772(chr,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ) \ { \ dim_t n0; \ ftype_x* x0; \ @@ -63,9 +64,13 @@ ftype_r PASTEF772(chr,chx,blasname)( \ bli_convert_blas_incv( n0, x, *incx, x0, incx0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC(chx,blisname)( n0, \ - x0, incx0, \ - &asum ); \ + PASTEMAC(chx,blisname) \ + ( \ + n0, \ + x0, incx0, \ + &asum, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_asum.h b/frame/compat/bla_asum.h index ef3c9f3e7..406665913 100644 --- a/frame/compat/bla_asum.h +++ b/frame/compat/bla_asum.h @@ -39,10 +39,11 @@ #undef GENTPROTR2 #define GENTPROTR2( ftype_x, ftype_r, chx, chr, blasname ) \ \ -ftype_r PASTEF772(chr,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ); \ +ftype_r PASTEF772(chr,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTR2_BLAS( asum ) diff --git a/frame/compat/bla_axpy.c b/frame/compat/bla_axpy.c index a4f75d026..53c9e4832 100644 --- a/frame/compat/bla_axpy.c +++ b/frame/compat/bla_axpy.c @@ -41,12 +41,13 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ) \ { \ dim_t n0; \ ftype* x0; \ @@ -67,17 +68,21 @@ void PASTEF77(ch,blasname)( \ bli_convert_blas_incv( n0, y, *incy, y0, incy0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC3(ch,ch,ch,blisname)( BLIS_NO_CONJUGATE, \ - n0, \ - alpha, \ - x0, incx0, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + BLIS_NO_CONJUGATE, \ + n0, \ + alpha, \ + x0, incx0, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ } #ifdef BLIS_ENABLE_BLAS2BLIS -INSERT_GENTFUNC_BLAS( axpy, AXPYV_KERNEL ) +INSERT_GENTFUNC_BLAS( axpy, axpyv ) #endif diff --git a/frame/compat/bla_axpy.h b/frame/compat/bla_axpy.h index 12b5bafa7..a457f294d 100644 --- a/frame/compat/bla_axpy.h +++ b/frame/compat/bla_axpy.h @@ -39,12 +39,13 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( axpy ) diff --git a/frame/compat/bla_copy.c b/frame/compat/bla_copy.c index 79140a2ce..26a32d3c4 100644 --- a/frame/compat/bla_copy.c +++ b/frame/compat/bla_copy.c @@ -41,11 +41,12 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ) \ { \ dim_t n0; \ ftype* x0; \ @@ -66,16 +67,20 @@ void PASTEF77(ch,blasname)( \ bli_convert_blas_incv( n0, y, *incy, y0, incy0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC2(ch,ch,blisname)( BLIS_NO_CONJUGATE, \ - n0, \ - x0, incx0, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + BLIS_NO_CONJUGATE, \ + n0, \ + x0, incx0, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ } #ifdef BLIS_ENABLE_BLAS2BLIS -INSERT_GENTFUNC_BLAS( copy, COPYV_KERNEL ) +INSERT_GENTFUNC_BLAS( copy, copyv ) #endif diff --git a/frame/compat/bla_copy.h b/frame/compat/bla_copy.h index 46060b76c..a2f9c4c4f 100644 --- a/frame/compat/bla_copy.h +++ b/frame/compat/bla_copy.h @@ -39,11 +39,12 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( copy ) diff --git a/frame/compat/bla_dot.c b/frame/compat/bla_dot.c index a9df2114e..949177fc7 100644 --- a/frame/compat/bla_dot.c +++ b/frame/compat/bla_dot.c @@ -41,11 +41,12 @@ #undef GENTFUNCDOT #define GENTFUNCDOT( ftype, chxy, chc, blis_conjx, blasname, blisname ) \ \ -ftype PASTEF772(chxy,blasname,chc)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ) \ +ftype PASTEF772(chxy,blasname,chc) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ) \ { \ dim_t n0; \ ftype* x0; \ @@ -67,12 +68,16 @@ ftype PASTEF772(chxy,blasname,chc)( \ bli_convert_blas_incv( n0, y, *incy, y0, incy0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC3(chxy,chxy,chxy,blisname)( blis_conjx, \ - BLIS_NO_CONJUGATE, \ - n0, \ - x0, incx0, \ - y0, incy0, \ - &rho ); \ + PASTEMAC(chxy,blisname) \ + ( \ + blis_conjx, \ + BLIS_NO_CONJUGATE, \ + n0, \ + x0, incx0, \ + y0, incy0, \ + &rho, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ @@ -81,17 +86,19 @@ ftype PASTEF772(chxy,blasname,chc)( \ } #ifdef BLIS_ENABLE_BLAS2BLIS -INSERT_GENTFUNCDOT_BLAS( dot, DOTV_KERNEL ) +INSERT_GENTFUNCDOT_BLAS( dot, dotv ) // -- "Black sheep" dot product function definitions -- // Input vectors stored in single precision, computed in double precision, // with result returned in single precision. -float PASTEF77(sd,sdot)( f77_int* n, - float* x, f77_int* incx, - float* y, f77_int* incy - ) +float PASTEF77(sd,sdot) + ( + f77_int* n, + float* x, f77_int* incx, + float* y, f77_int* incy + ) { return ( float )PASTEF77(d,sdot)( n, x, incx, @@ -100,10 +107,12 @@ float PASTEF77(sd,sdot)( f77_int* n, // Input vectors stored in single precision, computed in double precision, // with result returned in double precision. -double PASTEF77(d,sdot)( f77_int* n, - float* x, f77_int* incx, - float* y, f77_int* incy - ) +double PASTEF77(d,sdot) + ( + f77_int* n, + float* x, f77_int* incx, + float* y, f77_int* incy + ) { dim_t n0; float* x0; diff --git a/frame/compat/bla_dot.h b/frame/compat/bla_dot.h index 1188fbeca..168b739d2 100644 --- a/frame/compat/bla_dot.h +++ b/frame/compat/bla_dot.h @@ -39,11 +39,12 @@ #undef GENTPROTDOT #define GENTPROTDOT( ftype, chxy, chc, blasname ) \ \ -ftype PASTEF772(chxy,blasname,chc)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ); +ftype PASTEF772(chxy,blasname,chc) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTDOT_BLAS( dot ) @@ -51,13 +52,17 @@ INSERT_GENTPROTDOT_BLAS( dot ) // -- "Black sheep" dot product function prototypes -- -float PASTEF77(sd,sdot)( f77_int* n, - float* x, f77_int* incx, - float* y, f77_int* incy - ); +float PASTEF77(sd,sdot) + ( + f77_int* n, + float* x, f77_int* incx, + float* y, f77_int* incy + ); -double PASTEF77(d,sdot)( f77_int* n, - float* x, f77_int* incx, - float* y, f77_int* incy - ); +double PASTEF77(d,sdot) + 
( + f77_int* n, + float* x, f77_int* incx, + float* y, f77_int* incy + ); #endif diff --git a/frame/compat/bla_gemm.c b/frame/compat/bla_gemm.c index c50a58f3c..0cb0bb552 100644 --- a/frame/compat/bla_gemm.c +++ b/frame/compat/bla_gemm.c @@ -41,18 +41,19 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* transa, \ - f77_char* transb, \ - f77_int* m, \ - f77_int* n, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* transa, \ + f77_char* transb, \ + f77_int* m, \ + f77_int* n, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ trans_t blis_transa; \ trans_t blis_transb; \ @@ -66,16 +67,19 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - transa, \ - transb, \ - m, \ - n, \ - k, \ - lda, \ - ldb, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + transa, \ + transb, \ + m, \ + n, \ + k, \ + lda, \ + ldb, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_trans( *transa, &blis_transa ); \ @@ -95,16 +99,20 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_transa, \ - blis_transb, \ - m0, \ - n0, \ - k0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_transa, \ + blis_transb, \ + m0, \ + n0, \ + k0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_gemm.h b/frame/compat/bla_gemm.h index 0ad1018c4..3e8993878 100644 --- a/frame/compat/bla_gemm.h +++ b/frame/compat/bla_gemm.h @@ -39,18 +39,19 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* transa, \ - f77_char* transb, \ - f77_int* m, \ - f77_int* n, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* transa, \ + f77_char* transb, \ + f77_int* m, \ + f77_int* n, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( gemm ) diff --git a/frame/compat/bla_gemv.c b/frame/compat/bla_gemv.c index a77d23dc2..ea5076aa1 100644 --- a/frame/compat/bla_gemv.c +++ b/frame/compat/bla_gemv.c @@ -41,16 +41,17 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* transa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* transa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ) \ { \ trans_t blis_transa; \ dim_t m0, n0; \ @@ -66,14 +67,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - transa, \ - m, \ - n, \ - lda, \ - incx, \ - incy ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + transa, \ + m, \ + n, \ + lda, \ + incx, \ + incy \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. 
*/ \ bli_param_map_netlib_to_blis_trans( *transa, &blis_transa ); \ @@ -114,15 +118,19 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_transa, \ - BLIS_NO_CONJUGATE, \ - m0, \ - n0, \ - alpha, \ - a, rs_a, cs_a, \ - x0, incx0, \ - beta, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_transa, \ + BLIS_NO_CONJUGATE, \ + m0, \ + n0, \ + alpha, \ + a, rs_a, cs_a, \ + x0, incx0, \ + beta, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_gemv.h b/frame/compat/bla_gemv.h index d1b391b67..6710aa223 100644 --- a/frame/compat/bla_gemv.h +++ b/frame/compat/bla_gemv.h @@ -39,16 +39,17 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* transa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* transa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( gemv ) diff --git a/frame/compat/bla_ger.c b/frame/compat/bla_ger.c index a575841ad..12eefbac0 100644 --- a/frame/compat/bla_ger.c +++ b/frame/compat/bla_ger.c @@ -41,14 +41,15 @@ #undef GENTFUNCDOT #define GENTFUNCDOT( ftype, chxy, chc, blis_conjy, blasname, blisname ) \ \ -void PASTEF772(chxy,blasname,chc)( \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ) \ +void PASTEF772(chxy,blasname,chc) \ + ( \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ) \ { \ dim_t m0, n0; \ ftype* x0; \ @@ -62,13 +63,16 @@ void 
PASTEF772(chxy,blasname,chc)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - m, \ - n, \ - incx, \ - incy, \ - lda ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + m, \ + n, \ + incx, \ + incy, \ + lda \ + ); \ \ /* Convert/typecast negative values of m and n to zero. */ \ bli_convert_blas_dim1( *m, m0 ); \ @@ -84,14 +88,18 @@ void PASTEF772(chxy,blasname,chc)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(chxy,blisname)( BLIS_NO_CONJUGATE, \ - blis_conjy, \ - m0, \ - n0, \ - alpha, \ - x0, incx0, \ - y0, incy0, \ - a, rs_a, cs_a ); \ + PASTEMAC(chxy,blisname) \ + ( \ + BLIS_NO_CONJUGATE, \ + blis_conjy, \ + m0, \ + n0, \ + alpha, \ + x0, incx0, \ + y0, incy0, \ + a, rs_a, cs_a, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_ger.h b/frame/compat/bla_ger.h index 515414c31..d1f2dc2f8 100644 --- a/frame/compat/bla_ger.h +++ b/frame/compat/bla_ger.h @@ -39,14 +39,15 @@ #undef GENTPROTDOT #define GENTPROTDOT( ftype, chxy, chc, blasname ) \ \ -void PASTEF772(chxy,blasname,chc)( \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ); +void PASTEF772(chxy,blasname,chc) \ + ( \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTDOT_BLAS( ger ) diff --git a/frame/compat/bla_hemm.c b/frame/compat/bla_hemm.c index ff2730887..7bcc686fe 100644 --- a/frame/compat/bla_hemm.c +++ b/frame/compat/bla_hemm.c @@ -41,17 +41,18 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, 
f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ side_t blis_side; \ uplo_t blis_uploa; \ @@ -65,15 +66,18 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - side, \ - uploa, \ - m, \ - n, \ - lda, \ - ldb, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + side, \ + uploa, \ + m, \ + n, \ + lda, \ + ldb, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_side( *side, &blis_side ); \ @@ -92,17 +96,21 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_side, \ - blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_TRANSPOSE, \ - m0, \ - n0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_side, \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_TRANSPOSE, \ + m0, \ + n0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_hemm.h b/frame/compat/bla_hemm.h index 75f5200dd..348371f72 100644 --- a/frame/compat/bla_hemm.h +++ b/frame/compat/bla_hemm.h @@ -39,17 +39,18 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( hemm ) diff --git a/frame/compat/bla_hemv.c b/frame/compat/bla_hemv.c index 978c85de3..9b3e8cc1a 100644 --- a/frame/compat/bla_hemv.c +++ b/frame/compat/bla_hemv.c @@ -41,15 +41,16 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -64,13 +65,16 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - lda, \ - incx, \ - incy ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + lda, \ + incx, \ + incy \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. 
*/ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -88,15 +92,19 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - a, rs_a, cs_a, \ - x0, incx0, \ - beta, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + a, rs_a, cs_a, \ + x0, incx0, \ + beta, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_hemv.h b/frame/compat/bla_hemv.h index 733f19572..5fd030974 100644 --- a/frame/compat/bla_hemv.h +++ b/frame/compat/bla_hemv.h @@ -39,15 +39,16 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( hemv ) diff --git a/frame/compat/bla_her.c b/frame/compat/bla_her.c index 9efe297ed..45d255db7 100644 --- a/frame/compat/bla_her.c +++ b/frame/compat/bla_her.c @@ -41,13 +41,14 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype_r* alpha, \ - ftype* x, f77_int* incx, \ - ftype* a, f77_int* lda \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype_r* alpha, \ + ftype* x, f77_int* incx, \ + ftype* a, f77_int* lda \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -60,12 +61,15 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ 
/* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - incx, \ - lda ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + incx, \ + lda \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -82,12 +86,16 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - x0, incx0, \ - a, rs_a, cs_a ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + x0, incx0, \ + a, rs_a, cs_a, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_her.h b/frame/compat/bla_her.h index 65fdbae94..cbae5b0c7 100644 --- a/frame/compat/bla_her.h +++ b/frame/compat/bla_her.h @@ -39,13 +39,14 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype_r* alpha, \ - ftype* x, f77_int* incx, \ - ftype* a, f77_int* lda \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype_r* alpha, \ + ftype* x, f77_int* incx, \ + ftype* a, f77_int* lda \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( her ) diff --git a/frame/compat/bla_her2.c b/frame/compat/bla_her2.c index 130b5f33c..e998da715 100644 --- a/frame/compat/bla_her2.c +++ b/frame/compat/bla_her2.c @@ -41,14 +41,15 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* 
incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -63,13 +64,16 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - incx, \ - incy, \ - lda ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + incx, \ + incy, \ + lda \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -87,14 +91,18 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - x0, incx0, \ - y0, incy0, \ - a, rs_a, cs_a ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + x0, incx0, \ + y0, incy0, \ + a, rs_a, cs_a, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_her2.h b/frame/compat/bla_her2.h index 049137fc7..f4db4f711 100644 --- a/frame/compat/bla_her2.h +++ b/frame/compat/bla_her2.h @@ -39,14 +39,15 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( her2 ) diff --git a/frame/compat/bla_her2k.c b/frame/compat/bla_her2k.c index 8c246715c..b58c7b93b 100644 --- a/frame/compat/bla_her2k.c +++ b/frame/compat/bla_her2k.c @@ -41,17 +41,18 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype_r* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype_r* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ uplo_t blis_uploc; \ trans_t blis_transa; \ @@ -65,15 +66,18 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploc, \ - transa, \ - m, \ - k, \ - lda, \ - ldb, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploc, \ + transa, \ + m, \ + k, \ + lda, \ + ldb, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. 
*/ \ bli_param_map_netlib_to_blis_uplo( *uploc, &blis_uploc ); \ @@ -109,16 +113,20 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploc, \ - blis_transa, \ - blis_transa, \ - m0, \ - k0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploc, \ + blis_transa, \ + blis_transa, \ + m0, \ + k0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_her2k.h b/frame/compat/bla_her2k.h index d166ae9d6..055c0fcf2 100644 --- a/frame/compat/bla_her2k.h +++ b/frame/compat/bla_her2k.h @@ -39,17 +39,18 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype_r* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype_r* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( her2k ) diff --git a/frame/compat/bla_herk.c b/frame/compat/bla_herk.c index 948f28107..17b0bedcd 100644 --- a/frame/compat/bla_herk.c +++ b/frame/compat/bla_herk.c @@ -41,16 +41,17 @@ #undef GENTFUNCCO #define GENTFUNCCO( ftype, ftype_r, ch, chr, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype_r* alpha, \ - ftype* a, f77_int* lda, \ - ftype_r* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype_r* 
alpha, \ + ftype* a, f77_int* lda, \ + ftype_r* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ uplo_t blis_uploc; \ trans_t blis_transa; \ @@ -63,14 +64,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploc, \ - transa, \ - m, \ - k, \ - lda, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploc, \ + transa, \ + m, \ + k, \ + lda, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploc, &blis_uploc ); \ @@ -104,14 +108,18 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploc, \ - blis_transa, \ - m0, \ - k0, \ - alpha, \ - a, rs_a, cs_a, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploc, \ + blis_transa, \ + m0, \ + k0, \ + alpha, \ + a, rs_a, cs_a, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_herk.h b/frame/compat/bla_herk.h index d5db7850e..dfffd0b08 100644 --- a/frame/compat/bla_herk.h +++ b/frame/compat/bla_herk.h @@ -39,16 +39,17 @@ #undef GENTPROTCO #define GENTPROTCO( ftype, ftype_r, ch, chr, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype_r* alpha, \ - ftype* a, f77_int* lda, \ - ftype_r* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype_r* alpha, \ + ftype* a, f77_int* lda, \ + ftype_r* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTCO_BLAS( herk ) diff --git a/frame/compat/bla_nrm2.c b/frame/compat/bla_nrm2.c index 0021b1a68..a575a4088 100644 --- a/frame/compat/bla_nrm2.c +++ b/frame/compat/bla_nrm2.c @@ -41,10 +41,11 @@ #undef GENTFUNCR2 #define GENTFUNCR2( ftype_x, ftype_r, chx, chr, blasname, blisname ) \ \ -ftype_r PASTEF772(chr,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ) \ +ftype_r PASTEF772(chr,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ) \ { \ dim_t n0; \ ftype_x* x0; \ @@ -63,9 +64,13 @@ ftype_r PASTEF772(chr,chx,blasname)( \ bli_convert_blas_incv( n0, x, *incx, x0, incx0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC(chx,blisname)( n0, \ - x0, incx0, \ - &norm ); \ + PASTEMAC(chx,blisname) \ + ( \ + n0, \ + x0, incx0, \ + &norm, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_nrm2.h b/frame/compat/bla_nrm2.h index a2d043df7..dadbc5fc4 100644 --- a/frame/compat/bla_nrm2.h +++ b/frame/compat/bla_nrm2.h @@ -39,10 +39,11 @@ #undef GENTPROTR2 #define GENTPROTR2( ftype_x, ftype_r, chx, chr, blasname ) \ \ -ftype_r PASTEF772(chr,chx,blasname)( \ - f77_int* n, \ - ftype_x* x, f77_int* incx \ - ); \ +ftype_r PASTEF772(chr,chx,blasname) \ + ( \ + f77_int* n, \ + ftype_x* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTR2_BLAS( nrm2 ) diff --git a/frame/compat/bla_scal.c b/frame/compat/bla_scal.c index b108e9692..9258a073c 100644 --- a/frame/compat/bla_scal.c +++ b/frame/compat/bla_scal.c @@ -39,13 +39,14 @@ // Define BLAS-to-BLIS interfaces. // #undef GENTFUNCSCAL -#define GENTFUNCSCAL( ftype_a, ftype_x, cha, chx, blasname, blisname ) \ +#define GENTFUNCSCAL( ftype_x, ftype_a, chx, cha, blasname, blisname ) \ \ -void PASTEF772(chx,cha,blasname)( \ - f77_int* n, \ - ftype_a* alpha, \ - ftype_x* x, f77_int* incx \ - ) \ +void PASTEF772(chx,cha,blasname) \ + ( \ + f77_int* n, \ + ftype_a* alpha, \ + ftype_x* x, f77_int* incx \ + ) \ { \ dim_t n0; \ ftype_x* x0; \ @@ -63,25 +64,27 @@ void PASTEF772(chx,cha,blasname)( \ use positive increments instead. */ \ bli_convert_blas_incv( n0, x, *incx, x0, incx0 ); \ \ - /* NOTE: We do not natively implement BLAS's csscal/zdscal in BLIS - UNLESS mixed domain functionality is enabled at configure-time. - However, we don't want to assume that BLIS was configured that - way, so we will just always sub-optimally implement those cases + /* NOTE: We do not natively implement BLAS's csscal/zdscal in BLIS. + that is, we just always sub-optimally implement those cases by casting alpha to ctype_x (potentially the complex domain) and using the homogeneous datatype instance according to that type. */ \ PASTEMAC2(cha,chx,cast)( alpha, alpha_cast ); \ \ /* Call BLIS interface. 
*/ \ - PASTEMAC2(chx,chx,blisname)( BLIS_NO_CONJUGATE, \ - n0, \ - &alpha_cast, \ - x0, incx0 ); \ + PASTEMAC(chx,blisname) \ + ( \ + BLIS_NO_CONJUGATE, \ + n0, \ + &alpha_cast, \ + x0, incx0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ } #ifdef BLIS_ENABLE_BLAS2BLIS -INSERT_GENTFUNCSCAL_BLAS( scal, SCALV_KERNEL ) +INSERT_GENTFUNCSCAL_BLAS( scal, scalv ) #endif diff --git a/frame/compat/bla_scal.h b/frame/compat/bla_scal.h index 325494e04..7a15364d6 100644 --- a/frame/compat/bla_scal.h +++ b/frame/compat/bla_scal.h @@ -39,11 +39,12 @@ #undef GENTPROTSCAL #define GENTPROTSCAL( ftype_a, ftype_x, cha, chx, blasname ) \ \ -void PASTEF772(chx,cha,blasname)( \ - f77_int* n, \ - ftype_a* alpha, \ - ftype_x* x, f77_int* incx \ - ); +void PASTEF772(chx,cha,blasname) \ + ( \ + f77_int* n, \ + ftype_a* alpha, \ + ftype_x* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTSCAL_BLAS( scal ) diff --git a/frame/compat/bla_swap.c b/frame/compat/bla_swap.c index af350f2c6..cf22603a9 100644 --- a/frame/compat/bla_swap.c +++ b/frame/compat/bla_swap.c @@ -41,11 +41,12 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ) \ { \ dim_t n0; \ ftype* x0; \ @@ -66,15 +67,19 @@ void PASTEF77(ch,blasname)( \ bli_convert_blas_incv( n0, y, *incy, y0, incy0 ); \ \ /* Call BLIS interface. */ \ - PASTEMAC2(ch,ch,blisname)( n0, \ - x0, incx0, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + n0, \ + x0, incx0, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ } #ifdef BLIS_ENABLE_BLAS2BLIS -INSERT_GENTFUNC_BLAS( swap, SWAPV_KERNEL ) +INSERT_GENTFUNC_BLAS( swap, swapv ) #endif diff --git a/frame/compat/bla_swap.h b/frame/compat/bla_swap.h index 9186987ae..53ec754bb 100644 --- a/frame/compat/bla_swap.h +++ b/frame/compat/bla_swap.h @@ -39,11 +39,12 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_int* n, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_int* n, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( swap ) diff --git a/frame/compat/bla_symm.c b/frame/compat/bla_symm.c index 3fd5f81d7..3322faad3 100644 --- a/frame/compat/bla_symm.c +++ b/frame/compat/bla_symm.c @@ -41,17 +41,18 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ side_t blis_side; \ uplo_t blis_uploa; \ @@ -65,15 +66,18 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - side, \ - uploa, \ - m, \ - n, \ - lda, \ - ldb, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + side, \ + uploa, \ + m, \ + n, \ + lda, \ + ldb, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. 
*/ \ bli_param_map_netlib_to_blis_side( *side, &blis_side ); \ @@ -92,17 +96,21 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_side, \ - blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_TRANSPOSE, \ - m0, \ - n0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_side, \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_TRANSPOSE, \ + m0, \ + n0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_symm.h b/frame/compat/bla_symm.h index d5f523cf1..492bbcdd1 100644 --- a/frame/compat/bla_symm.h +++ b/frame/compat/bla_symm.h @@ -39,17 +39,18 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( symm ) diff --git a/frame/compat/bla_symv.c b/frame/compat/bla_symv.c index 38b61658f..5ed847721 100644 --- a/frame/compat/bla_symv.c +++ b/frame/compat/bla_symv.c @@ -41,15 +41,16 @@ #undef GENTFUNCRO #define GENTFUNCRO( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, 
f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -64,13 +65,16 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - lda, \ - incx, \ - incy ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + lda, \ + incx, \ + incy \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -88,15 +92,19 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - a, rs_a, cs_a, \ - x0, incx0, \ - beta, \ - y0, incy0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + a, rs_a, cs_a, \ + x0, incx0, \ + beta, \ + y0, incy0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_symv.h b/frame/compat/bla_symv.h index 7aa3c1071..cb9fffcfd 100644 --- a/frame/compat/bla_symv.h +++ b/frame/compat/bla_symv.h @@ -39,15 +39,16 @@ #undef GENTPROTRO #define GENTPROTRO( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx, \ - ftype* beta, \ - ftype* y, f77_int* incy \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx, \ + ftype* beta, \ + ftype* y, f77_int* incy \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTRO_BLAS( symv ) diff --git a/frame/compat/bla_syr.c b/frame/compat/bla_syr.c index 676d8d319..112b25c8d 100644 --- a/frame/compat/bla_syr.c +++ b/frame/compat/bla_syr.c @@ -41,13 +41,14 @@ #undef GENTFUNCRO #define GENTFUNCRO( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* a, f77_int* lda \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* a, f77_int* lda \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -60,12 +61,15 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - incx, \ - lda ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + incx, \ + lda \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -82,12 +86,16 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. 
*/ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - x0, incx0, \ - a, rs_a, cs_a ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + x0, incx0, \ + a, rs_a, cs_a, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_syr.h b/frame/compat/bla_syr.h index 39e097927..c1260bd2f 100644 --- a/frame/compat/bla_syr.h +++ b/frame/compat/bla_syr.h @@ -39,13 +39,14 @@ #undef GENTPROTRO #define GENTPROTRO( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* a, f77_int* lda \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* a, f77_int* lda \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTRO_BLAS( syr ) diff --git a/frame/compat/bla_syr2.c b/frame/compat/bla_syr2.c index b4e76dd2f..0a4d5a4b5 100644 --- a/frame/compat/bla_syr2.c +++ b/frame/compat/bla_syr2.c @@ -41,14 +41,15 @@ #undef GENTFUNCRO #define GENTFUNCRO( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ) \ { \ uplo_t blis_uploa; \ dim_t m0; \ @@ -63,13 +64,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. 
*/ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - m, \ - incx, \ - incy, \ - lda ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + m, \ + incx, \ + incy, \ + lda \ + ); \ +\ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -87,14 +92,18 @@ void PASTEF77(ch,blasname)( \ cs_a = *lda; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - BLIS_NO_CONJUGATE, \ - BLIS_NO_CONJUGATE, \ - m0, \ - alpha, \ - x0, incx0, \ - y0, incy0, \ - a, rs_a, cs_a ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + BLIS_NO_CONJUGATE, \ + BLIS_NO_CONJUGATE, \ + m0, \ + alpha, \ + x0, incx0, \ + y0, incy0, \ + a, rs_a, cs_a, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_syr2.h b/frame/compat/bla_syr2.h index 845d45858..34ec60d81 100644 --- a/frame/compat/bla_syr2.h +++ b/frame/compat/bla_syr2.h @@ -39,14 +39,15 @@ #undef GENTPROTRO #define GENTPROTRO( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_int* m, \ - ftype* alpha, \ - ftype* x, f77_int* incx, \ - ftype* y, f77_int* incy, \ - ftype* a, f77_int* lda \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_int* m, \ + ftype* alpha, \ + ftype* x, f77_int* incx, \ + ftype* y, f77_int* incy, \ + ftype* a, f77_int* lda \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROTRO_BLAS( syr2 ) diff --git a/frame/compat/bla_syr2k.c b/frame/compat/bla_syr2k.c index 854637301..637d2ac3e 100644 --- a/frame/compat/bla_syr2k.c +++ b/frame/compat/bla_syr2k.c @@ -41,20 +41,21 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* trans, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - 
ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ uplo_t blis_uploc; \ - trans_t blis_trans; \ + trans_t blis_transa; \ dim_t m0, k0; \ inc_t rs_a, cs_a; \ inc_t rs_b, cs_b; \ @@ -65,27 +66,30 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploc, \ - trans, \ - m, \ - k, \ - lda, \ - ldb, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploc, \ + transa, \ + m, \ + k, \ + lda, \ + ldb, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploc, &blis_uploc ); \ - bli_param_map_netlib_to_blis_trans( *trans, &blis_trans ); \ + bli_param_map_netlib_to_blis_trans( *transa, &blis_transa ); \ \ /* The real domain ssyr2k and dsyr2k in netlib BLAS treat a trans value of 'C' (conjugate-transpose) as 'T' (transpose only). So, we have to go out of our way a little to support this behavior. */ \ if ( bli_is_real( PASTEMAC(ch,type) ) && \ - bli_is_conjtrans( blis_trans ) ) \ + bli_is_conjtrans( blis_transa ) ) \ { \ - blis_trans = BLIS_TRANSPOSE; \ + blis_transa = BLIS_TRANSPOSE; \ } \ \ /* Convert/typecast negative values of m and k to zero. */ \ @@ -101,16 +105,20 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. 
*/ \ - PASTEMAC(ch,blisname)( blis_uploc, \ - blis_trans, \ - blis_trans, \ - m0, \ - k0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploc, \ + blis_transa, \ + blis_transa, \ + m0, \ + k0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_syr2k.h b/frame/compat/bla_syr2k.h index 70d68e654..ac4bbaf47 100644 --- a/frame/compat/bla_syr2k.h +++ b/frame/compat/bla_syr2k.h @@ -39,17 +39,18 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( syr2k ) diff --git a/frame/compat/bla_syrk.c b/frame/compat/bla_syrk.c index 0cb926153..d48012b3f 100644 --- a/frame/compat/bla_syrk.c +++ b/frame/compat/bla_syrk.c @@ -41,16 +41,17 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ) \ { \ uplo_t blis_uploc; \ trans_t blis_transa; \ @@ -63,14 +64,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result 
); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploc, \ - transa, \ - m, \ - k, \ - lda, \ - ldc ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploc, \ + transa, \ + m, \ + k, \ + lda, \ + ldc \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploc, &blis_uploc ); \ @@ -96,14 +100,18 @@ void PASTEF77(ch,blasname)( \ cs_c = *ldc; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploc, \ - blis_transa, \ - m0, \ - k0, \ - alpha, \ - a, rs_a, cs_a, \ - beta, \ - c, rs_c, cs_c ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploc, \ + blis_transa, \ + m0, \ + k0, \ + alpha, \ + a, rs_a, cs_a, \ + beta, \ + c, rs_c, cs_c, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_syrk.h b/frame/compat/bla_syrk.h index e7b152454..2eed8ddba 100644 --- a/frame/compat/bla_syrk.h +++ b/frame/compat/bla_syrk.h @@ -39,16 +39,17 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploc, \ - f77_char* transa, \ - f77_int* m, \ - f77_int* k, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* beta, \ - ftype* c, f77_int* ldc \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploc, \ + f77_char* transa, \ + f77_int* m, \ + f77_int* k, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* beta, \ + ftype* c, f77_int* ldc \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( syrk ) diff --git a/frame/compat/bla_trmm.c b/frame/compat/bla_trmm.c index 8bbb2322a..c591f5b26 100644 --- a/frame/compat/bla_trmm.c +++ b/frame/compat/bla_trmm.c @@ -41,17 +41,18 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - f77_int* n, 
\ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb \ + ) \ { \ side_t blis_side; \ uplo_t blis_uploa; \ @@ -66,16 +67,19 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - side, \ - uploa, \ - transa, \ - diaga, \ - m, \ - n, \ - lda, \ - ldb ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + side, \ + uploa, \ + transa, \ + diaga, \ + m, \ + n, \ + lda, \ + ldb \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_side( *side, &blis_side ); \ @@ -94,15 +98,19 @@ void PASTEF77(ch,blasname)( \ cs_b = *ldb; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_side, \ - blis_uploa, \ - blis_transa, \ - blis_diaga, \ - m0, \ - n0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_side, \ + blis_uploa, \ + blis_transa, \ + blis_diaga, \ + m0, \ + n0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_trmm.h b/frame/compat/bla_trmm.h index aa8e1375b..8ff642a94 100644 --- a/frame/compat/bla_trmm.h +++ b/frame/compat/bla_trmm.h @@ -39,17 +39,18 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( trmm ) diff --git a/frame/compat/bla_trmv.c b/frame/compat/bla_trmv.c index 239bf9781..f47a677ff 100644 --- a/frame/compat/bla_trmv.c +++ b/frame/compat/bla_trmv.c @@ -41,14 +41,15 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx \ + ) \ { \ uplo_t blis_uploa; \ trans_t blis_transa; \ @@ -64,14 +65,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - transa, \ - diaga, \ - m, \ - lda, \ - incx ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + transa, \ + diaga, \ + m, \ + lda, \ + incx \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. 
*/ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -93,13 +97,17 @@ void PASTEF77(ch,blasname)( \ one_p = PASTEMAC(ch,1); \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - blis_transa, \ - blis_diaga, \ - m0, \ - one_p, \ - a, rs_a, cs_a, \ - x0, incx0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + blis_transa, \ + blis_diaga, \ + m0, \ + one_p, \ + a, rs_a, cs_a, \ + x0, incx0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_trmv.h b/frame/compat/bla_trmv.h index 16611faf4..fb3ccca09 100644 --- a/frame/compat/bla_trmv.h +++ b/frame/compat/bla_trmv.h @@ -39,14 +39,15 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( trmv ) diff --git a/frame/compat/bla_trsm.c b/frame/compat/bla_trsm.c index 5e2c8a2c1..e028c4b94 100644 --- a/frame/compat/bla_trsm.c +++ b/frame/compat/bla_trsm.c @@ -41,17 +41,18 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb \ + ) \ { \ side_t blis_side; \ uplo_t blis_uploa; \ @@ -66,16 +67,19 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( 
&init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - side, \ - uploa, \ - transa, \ - diaga, \ - m, \ - n, \ - lda, \ - ldb ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + side, \ + uploa, \ + transa, \ + diaga, \ + m, \ + n, \ + lda, \ + ldb \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_side( *side, &blis_side ); \ @@ -94,15 +98,19 @@ void PASTEF77(ch,blasname)( \ cs_b = *ldb; \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_side, \ - blis_uploa, \ - blis_transa, \ - blis_diaga, \ - m0, \ - n0, \ - alpha, \ - a, rs_a, cs_a, \ - b, rs_b, cs_b ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_side, \ + blis_uploa, \ + blis_transa, \ + blis_diaga, \ + m0, \ + n0, \ + alpha, \ + a, rs_a, cs_a, \ + b, rs_b, cs_b, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). */ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_trsm.h b/frame/compat/bla_trsm.h index 27625422c..8ef12fe9c 100644 --- a/frame/compat/bla_trsm.h +++ b/frame/compat/bla_trsm.h @@ -39,17 +39,18 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* side, \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - f77_int* n, \ - ftype* alpha, \ - ftype* a, f77_int* lda, \ - ftype* b, f77_int* ldb \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* side, \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + f77_int* n, \ + ftype* alpha, \ + ftype* a, f77_int* lda, \ + ftype* b, f77_int* ldb \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( trsm ) diff --git a/frame/compat/bla_trsv.c b/frame/compat/bla_trsv.c index b46406c55..bff4016de 100644 --- a/frame/compat/bla_trsv.c +++ b/frame/compat/bla_trsv.c @@ -41,14 +41,15 @@ #undef GENTFUNC #define GENTFUNC( ftype, ch, blasname, blisname ) \ \ 
-void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx \ - ) \ +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx \ + ) \ { \ uplo_t blis_uploa; \ trans_t blis_transa; \ @@ -64,14 +65,17 @@ void PASTEF77(ch,blasname)( \ bli_init_auto( &init_result ); \ \ /* Perform BLAS parameter checking. */ \ - PASTEBLACHK(blasname)( MKSTR(ch), \ - MKSTR(blasname), \ - uploa, \ - transa, \ - diaga, \ - m, \ - lda, \ - incx ); \ + PASTEBLACHK(blasname) \ + ( \ + MKSTR(ch), \ + MKSTR(blasname), \ + uploa, \ + transa, \ + diaga, \ + m, \ + lda, \ + incx \ + ); \ \ /* Map BLAS chars to their corresponding BLIS enumerated type value. */ \ bli_param_map_netlib_to_blis_uplo( *uploa, &blis_uploa ); \ @@ -93,13 +97,17 @@ void PASTEF77(ch,blasname)( \ one_p = PASTEMAC(ch,1); \ \ /* Call BLIS interface. */ \ - PASTEMAC(ch,blisname)( blis_uploa, \ - blis_transa, \ - blis_diaga, \ - m0, \ - one_p, \ - a, rs_a, cs_a, \ - x0, incx0 ); \ + PASTEMAC(ch,blisname) \ + ( \ + blis_uploa, \ + blis_transa, \ + blis_diaga, \ + m0, \ + one_p, \ + a, rs_a, cs_a, \ + x0, incx0, \ + NULL \ + ); \ \ /* Finalize BLIS (if it was initialized above). 
*/ \ bli_finalize_auto( init_result ); \ diff --git a/frame/compat/bla_trsv.h b/frame/compat/bla_trsv.h index d82feecbe..2292ad021 100644 --- a/frame/compat/bla_trsv.h +++ b/frame/compat/bla_trsv.h @@ -39,14 +39,15 @@ #undef GENTPROT #define GENTPROT( ftype, ch, blasname ) \ \ -void PASTEF77(ch,blasname)( \ - f77_char* uploa, \ - f77_char* transa, \ - f77_char* diaga, \ - f77_int* m, \ - ftype* a, f77_int* lda, \ - ftype* x, f77_int* incx \ - ); +void PASTEF77(ch,blasname) \ + ( \ + f77_char* uploa, \ + f77_char* transa, \ + f77_char* diaga, \ + f77_int* m, \ + ftype* a, f77_int* lda, \ + ftype* x, f77_int* incx \ + ); #ifdef BLIS_ENABLE_BLAS2BLIS INSERT_GENTPROT_BLAS( trsv ) diff --git a/frame/compat/check/bla_gemm_check.c b/frame/compat/check/bla_gemm_check.c index b0d524397..aa454e9e5 100644 --- a/frame/compat/check/bla_gemm_check.c +++ b/frame/compat/check/bla_gemm_check.c @@ -36,16 +36,19 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_gemm_check( char* dt_str, - char* op_str, - f77_char* transa, - f77_char* transb, - f77_int* m, - f77_int* n, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ) +void bla_gemm_check + ( + char* dt_str, + char* op_str, + f77_char* transa, + f77_char* transb, + f77_int* m, + f77_int* n, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ) { f77_int info = 0; f77_int nota, notb; @@ -53,42 +56,59 @@ void bla_gemm_check( char* dt_str, f77_int ta, tb; f77_int nrowa, nrowb; - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); - notb = PASTEF770(lsame)( transb, "N", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); - conjb = PASTEF770(lsame)( transb, "C", (ftnlen)1, (ftnlen)1 ); - ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 ); - tb = PASTEF770(lsame)( transb, "T", (ftnlen)1, (ftnlen)1 ); + nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + notb = PASTEF770(lsame)( transb, "N", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( transa, "C", 
(ftnlen)1, (ftnlen)1 + ); + conjb = PASTEF770(lsame)( transb, "C", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 + ); + tb = PASTEF770(lsame)( transb, "T", (ftnlen)1, (ftnlen)1 + ); if ( nota ) { nrowa = *m; } else { nrowa = *k; } if ( notb ) { nrowb = *k; } else { nrowb = *n; } - if ( !nota && !conja && !ta ) + if ( !nota && !conja && !ta + ) info = 1; - else if ( !notb && !conjb && !tb ) + else if ( !notb && !conjb && !tb + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *n < 0 ) + else if ( *n < 0 + ) info = 4; - else if ( *k < 0 ) + else if ( *k < 0 + ) info = 5; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 8; - else if ( *ldb < bli_max( 1, nrowb ) ) + else if ( *ldb < bli_max( 1, nrowb ) + ) info = 10; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 13; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_gemm_check.h b/frame/compat/check/bla_gemm_check.h index 4282d947a..338bc36a7 100644 --- a/frame/compat/check/bla_gemm_check.h +++ b/frame/compat/check/bla_gemm_check.h @@ -34,15 +34,18 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_gemm_check( char* dt_str, - char* op_str, - f77_char* transa, - f77_char* transb, - f77_int* m, - f77_int* n, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ); +void bla_gemm_check + ( + char* dt_str, + char* op_str, + f77_char* transa, + f77_char* transb, + f77_int* m, + f77_int* n, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_gemv_check.c b/frame/compat/check/bla_gemv_check.c index eec338dc8..bd135ac52 100644 --- 
a/frame/compat/check/bla_gemv_check.c +++ b/frame/compat/check/bla_gemv_check.c @@ -36,42 +36,57 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_gemv_check( char* dt_str, - char* op_str, - f77_char* transa, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* incx, - f77_int* incy ) +void bla_gemv_check + ( + char* dt_str, + char* op_str, + f77_char* transa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* incx, + f77_int* incy + ) { f77_int info = 0; f77_int nota, ta, conja; - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); - ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); + nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 + ); - if ( !nota && !ta && !conja ) + if ( !nota && !ta && !conja + ) info = 1; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 2; - else if ( *n < 0 ) + else if ( *n < 0 + ) info = 3; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 6; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 8; - else if ( *incy == 0 ) + else if ( *incy == 0 + ) info = 11; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_gemv_check.h b/frame/compat/check/bla_gemv_check.h index a640d6a18..452d900cc 100644 --- a/frame/compat/check/bla_gemv_check.h +++ b/frame/compat/check/bla_gemv_check.h @@ -34,13 +34,16 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_gemv_check( char* dt_str, - char* op_str, - f77_char* transa, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* incx, - f77_int* incy ); +void bla_gemv_check + ( + 
char* dt_str, + char* op_str, + f77_char* transa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* incx, + f77_int* incy + ); #endif diff --git a/frame/compat/check/bla_ger_check.c b/frame/compat/check/bla_ger_check.c index 31838d01d..7f8190c80 100644 --- a/frame/compat/check/bla_ger_check.c +++ b/frame/compat/check/bla_ger_check.c @@ -36,34 +36,45 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_ger_check( char* dt_str, - char* op_str, - f77_int* m, - f77_int* n, - f77_int* incx, - f77_int* incy, - f77_int* lda ) +void bla_ger_check + ( + char* dt_str, + char* op_str, + f77_int* m, + f77_int* n, + f77_int* incx, + f77_int* incy, + f77_int* lda + ) { f77_int info = 0; - if ( *m < 0 ) + if ( *m < 0 + ) info = 1; - else if ( *n < 0 ) + else if ( *n < 0 + ) info = 2; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 5; - else if ( *incy == 0 ) + else if ( *incy == 0 + ) info = 7; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 9; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_ger_check.h b/frame/compat/check/bla_ger_check.h index 3e1a27f81..a2733b570 100644 --- a/frame/compat/check/bla_ger_check.h +++ b/frame/compat/check/bla_ger_check.h @@ -34,12 +34,15 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_ger_check( char* dt_str, - char* op_str, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* incx, - f77_int* incy ); +void bla_ger_check + ( + char* dt_str, + char* op_str, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* incx, + f77_int* incy + ); #endif diff --git a/frame/compat/check/bla_hemm_check.c b/frame/compat/check/bla_hemm_check.c index f0b1238e0..9dd8b6500 100644 --- a/frame/compat/check/bla_hemm_check.c +++ 
b/frame/compat/check/bla_hemm_check.c @@ -36,51 +36,68 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_hemm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ) +void bla_hemm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ) { f77_int info = 0; f77_int left, right; f77_int lower, upper; f77_int nrowa; - left = PASTEF770(lsame)( sidea, "L", (ftnlen)1, (ftnlen)1 ); - right = PASTEF770(lsame)( sidea, "R", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); + left = PASTEF770(lsame)( sidea, "L", (ftnlen)1, (ftnlen)1 + ); + right = PASTEF770(lsame)( sidea, "R", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); if ( left ) { nrowa = *m; } else { nrowa = *n; } - if ( !left && !right ) + if ( !left && !right + ) info = 1; - else if ( !lower && !upper ) + else if ( !lower && !upper + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *n < 0 ) + else if ( *n < 0 + ) info = 4; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 7; - else if ( *ldb < bli_max( 1, *m ) ) + else if ( *ldb < bli_max( 1, *m ) + ) info = 9; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 12; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_hemm_check.h b/frame/compat/check/bla_hemm_check.h index cf7308678..0a5323e4c 
100644 --- a/frame/compat/check/bla_hemm_check.h +++ b/frame/compat/check/bla_hemm_check.h @@ -34,14 +34,17 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_hemm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ); +void bla_hemm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_hemv_check.c b/frame/compat/check/bla_hemv_check.c index 3ca58ef49..ad6209890 100644 --- a/frame/compat/check/bla_hemv_check.c +++ b/frame/compat/check/bla_hemv_check.c @@ -36,38 +36,51 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_hemv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* lda, - f77_int* incx, - f77_int* incy ) +void bla_hemv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* lda, + f77_int* incx, + f77_int* incy + ) { f77_int info = 0; f77_int lower, upper; - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 2; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 5; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 7; - else if ( *incy == 0 ) + else if ( *incy == 0 + ) info = 10; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git 
a/frame/compat/check/bla_hemv_check.h b/frame/compat/check/bla_hemv_check.h index 9bddec287..742abd8cb 100644 --- a/frame/compat/check/bla_hemv_check.h +++ b/frame/compat/check/bla_hemv_check.h @@ -34,12 +34,15 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_hemv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* lda, - f77_int* incx, - f77_int* incy ); +void bla_hemv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* lda, + f77_int* incx, + f77_int* incy + ); #endif diff --git a/frame/compat/check/bla_her2_check.c b/frame/compat/check/bla_her2_check.c index 43396e19b..7b989fbe0 100644 --- a/frame/compat/check/bla_her2_check.c +++ b/frame/compat/check/bla_her2_check.c @@ -36,38 +36,51 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her2_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_int* m, - f77_int* incx, - f77_int* incy, - f77_int* lda ) +void bla_her2_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_int* m, + f77_int* incx, + f77_int* incy, + f77_int* lda + ) { f77_int info = 0; f77_int lower, upper; - lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 ); + lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 + ); - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 2; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 5; - else if ( *incy == 0 ) + else if ( *incy == 0 + ) info = 7; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 9; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff 
--git a/frame/compat/check/bla_her2_check.h b/frame/compat/check/bla_her2_check.h index 617c1a37f..684080768 100644 --- a/frame/compat/check/bla_her2_check.h +++ b/frame/compat/check/bla_her2_check.h @@ -34,12 +34,15 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her2_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* incx, - f77_int* incy, - f77_int* lda ); +void bla_her2_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* incx, + f77_int* incy, + f77_int* lda + ); #endif diff --git a/frame/compat/check/bla_her2k_check.c b/frame/compat/check/bla_her2k_check.c index 492bfc9fa..151c18308 100644 --- a/frame/compat/check/bla_her2k_check.c +++ b/frame/compat/check/bla_her2k_check.c @@ -36,51 +36,68 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her2k_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* trans, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ) +void bla_her2k_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* trans, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ) { f77_int info = 0; f77_int nota, conja; f77_int lower, upper; f77_int nrowa; - nota = PASTEF770(lsame)( trans, "N", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( trans, "C", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); + nota = PASTEF770(lsame)( trans, "N", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( trans, "C", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); if ( nota ) { nrowa = *m; } else { nrowa = *k; } - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( !nota && !conja ) + else if ( !nota && !conja + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *k < 0 ) + 
else if ( *k < 0 + ) info = 4; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 7; - else if ( *ldb < bli_max( 1, nrowa ) ) + else if ( *ldb < bli_max( 1, nrowa ) + ) info = 9; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 12; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_her2k_check.h b/frame/compat/check/bla_her2k_check.h index c67372778..9b5657481 100644 --- a/frame/compat/check/bla_her2k_check.h +++ b/frame/compat/check/bla_her2k_check.h @@ -34,14 +34,17 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her2k_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* transa, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ); +void bla_her2k_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* transa, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_her_check.c b/frame/compat/check/bla_her_check.c index 0db28fca2..9ebde741c 100644 --- a/frame/compat/check/bla_her_check.c +++ b/frame/compat/check/bla_her_check.c @@ -36,35 +36,47 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_int* m, - f77_int* incx, - f77_int* lda ) +void bla_her_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_int* m, + f77_int* incx, + f77_int* lda + ) { f77_int info = 0; f77_int lower, upper; - lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 ); + lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, 
(ftnlen)1 + ); - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 2; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 5; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 7; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_her_check.h b/frame/compat/check/bla_her_check.h index d4b69abee..18a9fbb89 100644 --- a/frame/compat/check/bla_her_check.h +++ b/frame/compat/check/bla_her_check.h @@ -34,11 +34,14 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_her_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* incx, - f77_int* lda ); +void bla_her_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* incx, + f77_int* lda + ); #endif diff --git a/frame/compat/check/bla_herk_check.c b/frame/compat/check/bla_herk_check.c index dd221bfc7..65195cd88 100644 --- a/frame/compat/check/bla_herk_check.c +++ b/frame/compat/check/bla_herk_check.c @@ -36,48 +36,64 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_herk_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_char* transa, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldc ) +void bla_herk_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_char* transa, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldc + ) { f77_int info = 0; f77_int nota, conja; f77_int lower, upper; f77_int nrowa; - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 ); + nota = 
PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 + ); if ( nota ) { nrowa = *m; } else { nrowa = *k; } - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( !nota && !conja ) + else if ( !nota && !conja + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *k < 0 ) + else if ( *k < 0 + ) info = 4; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 7; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 10; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_herk_check.h b/frame/compat/check/bla_herk_check.h index 95f91a114..7abc9cceb 100644 --- a/frame/compat/check/bla_herk_check.h +++ b/frame/compat/check/bla_herk_check.h @@ -34,13 +34,16 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_herk_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_char* transa, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldc ); +void bla_herk_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_char* transa, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_symm_check.c b/frame/compat/check/bla_symm_check.c index e95646a24..e62540c39 100644 --- a/frame/compat/check/bla_symm_check.c +++ b/frame/compat/check/bla_symm_check.c @@ -36,17 +36,22 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_symm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_int* m, - f77_int* n, - f77_int* lda, - 
f77_int* ldb, - f77_int* ldc ) +void bla_symm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ) { - bla_hemm_check( dt_str, + bla_hemm_check + ( + dt_str, op_str, sidea, uploa, @@ -54,7 +59,8 @@ void bla_symm_check( char* dt_str, n, lda, ldb, - ldc ); + ldc + ); } #endif diff --git a/frame/compat/check/bla_symm_check.h b/frame/compat/check/bla_symm_check.h index e3d22264f..2bf18d085 100644 --- a/frame/compat/check/bla_symm_check.h +++ b/frame/compat/check/bla_symm_check.h @@ -34,14 +34,17 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_symm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ); +void bla_symm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_symv_check.c b/frame/compat/check/bla_symv_check.c index d9540dfbc..355a16766 100644 --- a/frame/compat/check/bla_symv_check.c +++ b/frame/compat/check/bla_symv_check.c @@ -36,21 +36,27 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_symv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* lda, - f77_int* incx, - f77_int* incy ) +void bla_symv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* lda, + f77_int* incx, + f77_int* incy + ) { - bla_hemv_check( dt_str, + bla_hemv_check + ( + dt_str, op_str, uploa, m, lda, incx, - incy ); + incy + ); } #endif diff --git a/frame/compat/check/bla_symv_check.h b/frame/compat/check/bla_symv_check.h index 06d4b92ab..8149cdf54 100644 --- a/frame/compat/check/bla_symv_check.h +++ b/frame/compat/check/bla_symv_check.h @@ -34,12 +34,15 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_symv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* lda, - 
f77_int* incx, - f77_int* incy ); +void bla_symv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* lda, + f77_int* incx, + f77_int* incy + ); #endif diff --git a/frame/compat/check/bla_syr2_check.c b/frame/compat/check/bla_syr2_check.c index 45ecaa799..82ffcc4e7 100644 --- a/frame/compat/check/bla_syr2_check.c +++ b/frame/compat/check/bla_syr2_check.c @@ -36,21 +36,27 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr2_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_int* m, - f77_int* incx, - f77_int* incy, - f77_int* lda ) +void bla_syr2_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_int* m, + f77_int* incx, + f77_int* incy, + f77_int* lda + ) { - bla_her2_check( dt_str, + bla_her2_check + ( + dt_str, op_str, uploc, m, incx, incy, - lda ); + lda + ); } #endif diff --git a/frame/compat/check/bla_syr2_check.h b/frame/compat/check/bla_syr2_check.h index ab70439ee..782a32149 100644 --- a/frame/compat/check/bla_syr2_check.h +++ b/frame/compat/check/bla_syr2_check.h @@ -34,12 +34,15 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr2_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* incx, - f77_int* incy, - f77_int* lda ); +void bla_syr2_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* incx, + f77_int* incy, + f77_int* lda + ); #endif diff --git a/frame/compat/check/bla_syr2k_check.c b/frame/compat/check/bla_syr2k_check.c index 9182854dc..78386fbab 100644 --- a/frame/compat/check/bla_syr2k_check.c +++ b/frame/compat/check/bla_syr2k_check.c @@ -36,52 +36,70 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr2k_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* trans, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc ) +void bla_syr2k_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* trans, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ) { f77_int info = 0; 
f77_int nota, ta, cta; f77_int lower, upper; f77_int nrowa; - nota = PASTEF770(lsame)( trans, "N", (ftnlen)1, (ftnlen)1 ); - ta = PASTEF770(lsame)( trans, "T", (ftnlen)1, (ftnlen)1 ); - cta = PASTEF770(lsame)( trans, "C", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); + nota = PASTEF770(lsame)( trans, "N", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( trans, "T", (ftnlen)1, (ftnlen)1 + ); + cta = PASTEF770(lsame)( trans, "C", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); if ( nota ) { nrowa = *m; } else { nrowa = *k; } - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( !nota && !ta && !cta ) + else if ( !nota && !ta && !cta + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *k < 0 ) + else if ( *k < 0 + ) info = 4; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 7; - else if ( *ldb < bli_max( 1, nrowa ) ) + else if ( *ldb < bli_max( 1, nrowa ) + ) info = 9; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 12; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_syr2k_check.h b/frame/compat/check/bla_syr2k_check.h index e97fff663..d9a3cd0f3 100644 --- a/frame/compat/check/bla_syr2k_check.h +++ b/frame/compat/check/bla_syr2k_check.h @@ -34,14 +34,17 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr2k_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* trans, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldb, - f77_int* ldc 
); +void bla_syr2k_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* trans, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldb, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_syr_check.c b/frame/compat/check/bla_syr_check.c index bcd48ff19..12647837c 100644 --- a/frame/compat/check/bla_syr_check.c +++ b/frame/compat/check/bla_syr_check.c @@ -36,19 +36,25 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_int* m, - f77_int* incx, - f77_int* lda ) +void bla_syr_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_int* m, + f77_int* incx, + f77_int* lda + ) { - bla_her_check( dt_str, + bla_her_check + ( + dt_str, op_str, uploc, m, incx, - lda ); + lda + ); } #endif diff --git a/frame/compat/check/bla_syr_check.h b/frame/compat/check/bla_syr_check.h index b52f80ea7..93d368d11 100644 --- a/frame/compat/check/bla_syr_check.h +++ b/frame/compat/check/bla_syr_check.h @@ -34,11 +34,14 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syr_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_int* m, - f77_int* incx, - f77_int* lda ); +void bla_syr_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_int* m, + f77_int* incx, + f77_int* lda + ); #endif diff --git a/frame/compat/check/bla_syrk_check.c b/frame/compat/check/bla_syrk_check.c index 19f8f7e2e..cbc9eb59e 100644 --- a/frame/compat/check/bla_syrk_check.c +++ b/frame/compat/check/bla_syrk_check.c @@ -36,49 +36,66 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syrk_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_char* transa, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldc ) +void bla_syrk_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_char* transa, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldc + ) { f77_int info = 0; f77_int nota, ta, cta; f77_int lower, upper; f77_int nrowa; - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); 
- ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 ); - cta = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 ); + nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 + ); + cta = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploc, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploc, "U", (ftnlen)1, (ftnlen)1 + ); if ( nota ) { nrowa = *m; } else { nrowa = *k; } - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( !nota && !ta && !cta ) + else if ( !nota && !ta && !cta + ) info = 2; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 3; - else if ( *k < 0 ) + else if ( *k < 0 + ) info = 4; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 7; - else if ( *ldc < bli_max( 1, *m ) ) + else if ( *ldc < bli_max( 1, *m ) + ) info = 10; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_syrk_check.h b/frame/compat/check/bla_syrk_check.h index f01614079..7ed796e15 100644 --- a/frame/compat/check/bla_syrk_check.h +++ b/frame/compat/check/bla_syrk_check.h @@ -34,13 +34,16 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_syrk_check( char* dt_str, - char* op_str, - f77_char* uploc, - f77_char* transa, - f77_int* m, - f77_int* k, - f77_int* lda, - f77_int* ldc ); +void bla_syrk_check + ( + char* dt_str, + char* op_str, + f77_char* uploc, + f77_char* transa, + f77_int* m, + f77_int* k, + f77_int* lda, + f77_int* ldc + ); #endif diff --git a/frame/compat/check/bla_trmm_check.c 
b/frame/compat/check/bla_trmm_check.c index 7e8924be7..e5f73f051 100644 --- a/frame/compat/check/bla_trmm_check.c +++ b/frame/compat/check/bla_trmm_check.c @@ -36,16 +36,19 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trmm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb ) +void bla_trmm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb + ) { f77_int info = 0; f77_int left, right; @@ -54,43 +57,63 @@ void bla_trmm_check( char* dt_str, f77_int unita, nonua; f77_int nrowa; - left = PASTEF770(lsame)( sidea, "L", (ftnlen)1, (ftnlen)1 ); - right = PASTEF770(lsame)( sidea, "R", (ftnlen)1, (ftnlen)1 ); - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); - ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); - unita = PASTEF770(lsame)( diaga, "U", (ftnlen)1, (ftnlen)1 ); - nonua = PASTEF770(lsame)( diaga, "N", (ftnlen)1, (ftnlen)1 ); + left = PASTEF770(lsame)( sidea, "L", (ftnlen)1, (ftnlen)1 + ); + right = PASTEF770(lsame)( sidea, "R", (ftnlen)1, (ftnlen)1 + ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); + nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 + ); + unita = PASTEF770(lsame)( diaga, "U", (ftnlen)1, (ftnlen)1 + ); + nonua = PASTEF770(lsame)( diaga, "N", (ftnlen)1, (ftnlen)1 + ); if ( left ) { nrowa = *m; } else { nrowa = *n; } - if ( !left && !right ) + if ( !left && !right + ) info = 
1; - else if ( !lower && !upper ) + else if ( !lower && !upper + ) info = 2; - else if ( !nota && !ta && !conja ) + else if ( !nota && !ta && !conja + ) info = 3; - else if ( !unita && !nonua ) + else if ( !unita && !nonua + ) info = 4; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 5; - else if ( *n < 0 ) + else if ( *n < 0 + ) info = 6; - else if ( *lda < bli_max( 1, nrowa ) ) + else if ( *lda < bli_max( 1, nrowa ) + ) info = 9; - else if ( *ldb < bli_max( 1, *m ) ) + else if ( *ldb < bli_max( 1, *m ) + ) info = 11; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_trmm_check.h b/frame/compat/check/bla_trmm_check.h index e6460a820..f004c2534 100644 --- a/frame/compat/check/bla_trmm_check.h +++ b/frame/compat/check/bla_trmm_check.h @@ -34,15 +34,18 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trmm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb ); +void bla_trmm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb + ); #endif diff --git a/frame/compat/check/bla_trmv_check.c b/frame/compat/check/bla_trmv_check.c index 9f8b6dc8a..04ea061ef 100644 --- a/frame/compat/check/bla_trmv_check.c +++ b/frame/compat/check/bla_trmv_check.c @@ -36,48 +36,67 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trmv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* lda, - f77_int* incx ) +void bla_trmv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* transa, + f77_char* 
diaga, + f77_int* m, + f77_int* lda, + f77_int* incx + ) { f77_int info = 0; f77_int lower, upper; f77_int nota, ta, conja; f77_int unita, nonua; - lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 ); - upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 ); - nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 ); - ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 ); - conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 ); - unita = PASTEF770(lsame)( diaga, "U", (ftnlen)1, (ftnlen)1 ); - nonua = PASTEF770(lsame)( diaga, "N", (ftnlen)1, (ftnlen)1 ); + lower = PASTEF770(lsame)( uploa, "L", (ftnlen)1, (ftnlen)1 + ); + upper = PASTEF770(lsame)( uploa, "U", (ftnlen)1, (ftnlen)1 + ); + nota = PASTEF770(lsame)( transa, "N", (ftnlen)1, (ftnlen)1 + ); + ta = PASTEF770(lsame)( transa, "T", (ftnlen)1, (ftnlen)1 + ); + conja = PASTEF770(lsame)( transa, "C", (ftnlen)1, (ftnlen)1 + ); + unita = PASTEF770(lsame)( diaga, "U", (ftnlen)1, (ftnlen)1 + ); + nonua = PASTEF770(lsame)( diaga, "N", (ftnlen)1, (ftnlen)1 + ); - if ( !lower && !upper ) + if ( !lower && !upper + ) info = 1; - else if ( !nota && !ta && !conja ) + else if ( !nota && !ta && !conja + ) info = 2; - else if ( !unita && !nonua ) + else if ( !unita && !nonua + ) info = 3; - else if ( *m < 0 ) + else if ( *m < 0 + ) info = 4; - else if ( *lda < bli_max( 1, *m ) ) + else if ( *lda < bli_max( 1, *m ) + ) info = 6; - else if ( *incx == 0 ) + else if ( *incx == 0 + ) info = 8; - if ( info != 0 ) + if ( info != 0 + ) { char func_str[ BLIS_MAX_BLAS_FUNC_STR_LENGTH ]; - sprintf( func_str, "%s%-5s", dt_str, op_str ); + sprintf( func_str, "%s%-5s", dt_str, op_str + ); - PASTEF770(xerbla)( func_str, &info, (ftnlen)6 ); + PASTEF770(xerbla)( func_str, &info, (ftnlen)6 + ); } } diff --git a/frame/compat/check/bla_trmv_check.h b/frame/compat/check/bla_trmv_check.h index e1cd113ce..81d2be9d5 100644 --- a/frame/compat/check/bla_trmv_check.h +++ b/frame/compat/check/bla_trmv_check.h @@ -34,13 
+34,16 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trmv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* lda, - f77_int* incx ); +void bla_trmv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* lda, + f77_int* incx + ); #endif diff --git a/frame/compat/check/bla_trsm_check.c b/frame/compat/check/bla_trsm_check.c index 4ce1f17fe..f3e878c4e 100644 --- a/frame/compat/check/bla_trsm_check.c +++ b/frame/compat/check/bla_trsm_check.c @@ -36,18 +36,23 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trsm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb ) +void bla_trsm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb + ) { - bla_trmm_check( dt_str, + bla_trmm_check + ( + dt_str, op_str, sidea, uploa, @@ -56,7 +61,8 @@ void bla_trsm_check( char* dt_str, m, n, lda, - ldb ); + ldb + ); } #endif diff --git a/frame/compat/check/bla_trsm_check.h b/frame/compat/check/bla_trsm_check.h index 7aec9dee3..dd45cce05 100644 --- a/frame/compat/check/bla_trsm_check.h +++ b/frame/compat/check/bla_trsm_check.h @@ -34,15 +34,18 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trsm_check( char* dt_str, - char* op_str, - f77_char* sidea, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* n, - f77_int* lda, - f77_int* ldb ); +void bla_trsm_check + ( + char* dt_str, + char* op_str, + f77_char* sidea, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* n, + f77_int* lda, + f77_int* ldb + ); #endif diff --git a/frame/compat/check/bla_trsv_check.c b/frame/compat/check/bla_trsv_check.c index b456689d8..913ac502d 100644 --- a/frame/compat/check/bla_trsv_check.c 
+++ b/frame/compat/check/bla_trsv_check.c @@ -36,23 +36,29 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trsv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* lda, - f77_int* incx ) +void bla_trsv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* lda, + f77_int* incx + ) { - bla_trmv_check( dt_str, + bla_trmv_check + ( + dt_str, op_str, uploa, transa, diaga, m, lda, - incx ); + incx + ); } #endif diff --git a/frame/compat/check/bla_trsv_check.h b/frame/compat/check/bla_trsv_check.h index 9ba97be6b..42cd00689 100644 --- a/frame/compat/check/bla_trsv_check.h +++ b/frame/compat/check/bla_trsv_check.h @@ -34,13 +34,16 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -void bla_trsv_check( char* dt_str, - char* op_str, - f77_char* uploa, - f77_char* transa, - f77_char* diaga, - f77_int* m, - f77_int* lda, - f77_int* incx ); +void bla_trsv_check + ( + char* dt_str, + char* op_str, + f77_char* uploa, + f77_char* transa, + f77_char* diaga, + f77_int* m, + f77_int* lda, + f77_int* incx + ); #endif diff --git a/frame/compat/f2c/bla_gbmv.h b/frame/compat/f2c/bla_gbmv.h index ca420385c..ec2bffe17 100644 --- a/frame/compat/f2c/bla_gbmv.h +++ b/frame/compat/f2c/bla_gbmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,gbmv)(bla_character *trans, bla_integer *m, bla_integer *n, bla_integer *kl, bla_integer *ku, bla_scomplex *alpha, bla_scomplex *a, bla_integer *lda, bla_scomplex *x, bla_integer *incx, bla_scomplex *beta, bla_scomplex *y, bla_integer *incy); diff --git a/frame/compat/f2c/bla_hbmv.h b/frame/compat/f2c/bla_hbmv.h index 64d89e6cb..406e7d1a9 100644 --- a/frame/compat/f2c/bla_hbmv.h +++ b/frame/compat/f2c/bla_hbmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,hbmv)(bla_character *uplo, bla_integer *n, bla_integer *k, bla_scomplex * alpha, bla_scomplex *a, bla_integer 
*lda, bla_scomplex *x, bla_integer *incx, bla_scomplex *beta, bla_scomplex *y, bla_integer *incy); diff --git a/frame/compat/f2c/bla_hpmv.h b/frame/compat/f2c/bla_hpmv.h index 1c8716ab6..0878c8e4c 100644 --- a/frame/compat/f2c/bla_hpmv.h +++ b/frame/compat/f2c/bla_hpmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,hpmv)(bla_character *uplo, bla_integer *n, bla_scomplex *alpha, bla_scomplex * ap, bla_scomplex *x, bla_integer *incx, bla_scomplex *beta, bla_scomplex *y, bla_integer *incy); diff --git a/frame/compat/f2c/bla_hpr.h b/frame/compat/f2c/bla_hpr.h index cc6cb6695..036538f21 100644 --- a/frame/compat/f2c/bla_hpr.h +++ b/frame/compat/f2c/bla_hpr.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,hpr)(bla_character *uplo, bla_integer *n, bla_real *alpha, bla_scomplex *x, bla_integer *incx, bla_scomplex *ap); diff --git a/frame/compat/f2c/bla_hpr2.h b/frame/compat/f2c/bla_hpr2.h index 5bef2c4c1..0b1e254b7 100644 --- a/frame/compat/f2c/bla_hpr2.h +++ b/frame/compat/f2c/bla_hpr2.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,hpr2)(bla_character *uplo, bla_integer *n, bla_scomplex *alpha, bla_scomplex *x, bla_integer *incx, bla_scomplex *y, bla_integer *incy, bla_scomplex *ap); diff --git a/frame/compat/f2c/bla_lsame.h b/frame/compat/f2c/bla_lsame.h index 9afc4f566..0b5eb175e 100644 --- a/frame/compat/f2c/bla_lsame.h +++ b/frame/compat/f2c/bla_lsame.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS bla_logical PASTEF770(lsame)(bla_character *ca, bla_character *cb, ftnlen ca_len, ftnlen cb_len); diff --git a/frame/compat/f2c/bla_rot.h b/frame/compat/f2c/bla_rot.h index 3f2b378ea..f55b5492b 100644 --- a/frame/compat/f2c/bla_rot.h +++ b/frame/compat/f2c/bla_rot.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(s,rot)(bla_integer *n, bla_real *sx, bla_integer *incx, bla_real *sy, bla_integer 
*incy, bla_real *c__, bla_real *s); diff --git a/frame/compat/f2c/bla_rotg.h b/frame/compat/f2c/bla_rotg.h index 70781ea30..a41fe26ab 100644 --- a/frame/compat/f2c/bla_rotg.h +++ b/frame/compat/f2c/bla_rotg.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(s,rotg)(bla_real *sa, bla_real *sb, bla_real *c__, bla_real *s); diff --git a/frame/compat/f2c/bla_rotm.h b/frame/compat/f2c/bla_rotm.h index 6f09c1caa..28cf8ec25 100644 --- a/frame/compat/f2c/bla_rotm.h +++ b/frame/compat/f2c/bla_rotm.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(s,rotm)(bla_integer *n, bla_real *sx, bla_integer *incx, bla_real *sy, bla_integer *incy, bla_real *sparam); diff --git a/frame/compat/f2c/bla_rotmg.h b/frame/compat/f2c/bla_rotmg.h index 228f5231c..e6cccf8b7 100644 --- a/frame/compat/f2c/bla_rotmg.h +++ b/frame/compat/f2c/bla_rotmg.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(s,rotmg)(bla_real *sd1, bla_real *sd2, bla_real *sx1, bla_real *sy1, bla_real *sparam); diff --git a/frame/compat/f2c/bla_sbmv.h b/frame/compat/f2c/bla_sbmv.h index c79be1734..16f0dbb37 100644 --- a/frame/compat/f2c/bla_sbmv.h +++ b/frame/compat/f2c/bla_sbmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(d,sbmv)(bla_character *uplo, bla_integer *n, bla_integer *k, bla_double *alpha, bla_double *a, bla_integer *lda, bla_double *x, bla_integer *incx, bla_double *beta, bla_double *y, bla_integer *incy); diff --git a/frame/compat/f2c/bla_spmv.h b/frame/compat/f2c/bla_spmv.h index 13c09bb48..d58349345 100644 --- a/frame/compat/f2c/bla_spmv.h +++ b/frame/compat/f2c/bla_spmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(d,spmv)(bla_character *uplo, bla_integer *n, bla_double *alpha, bla_double *ap, bla_double *x, bla_integer *incx, bla_double *beta, bla_double *y, bla_integer *incy); diff --git a/frame/compat/f2c/bla_spr.h 
b/frame/compat/f2c/bla_spr.h index 3fbcedc49..2e9d4523a 100644 --- a/frame/compat/f2c/bla_spr.h +++ b/frame/compat/f2c/bla_spr.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(d,spr)(bla_character *uplo, bla_integer *n, bla_double *alpha, bla_double *x, bla_integer *incx, bla_double *ap); diff --git a/frame/compat/f2c/bla_spr2.h b/frame/compat/f2c/bla_spr2.h index 17f8d325d..50f18d928 100644 --- a/frame/compat/f2c/bla_spr2.h +++ b/frame/compat/f2c/bla_spr2.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(d,spr2)(bla_character *uplo, bla_integer *n, bla_double *alpha, bla_double *x, bla_integer *incx, bla_double *y, bla_integer *incy, bla_double *ap); diff --git a/frame/compat/f2c/bla_tbmv.h b/frame/compat/f2c/bla_tbmv.h index 16eecae83..e73cdaa3b 100644 --- a/frame/compat/f2c/bla_tbmv.h +++ b/frame/compat/f2c/bla_tbmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,tbmv)(bla_character *uplo, bla_character *trans, bla_character *diag, bla_integer *n, bla_integer *k, bla_scomplex *a, bla_integer *lda, bla_scomplex *x, bla_integer *incx); diff --git a/frame/compat/f2c/bla_tbsv.h b/frame/compat/f2c/bla_tbsv.h index e94adf830..8fc6b2772 100644 --- a/frame/compat/f2c/bla_tbsv.h +++ b/frame/compat/f2c/bla_tbsv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,tbsv)(bla_character *uplo, bla_character *trans, bla_character *diag, bla_integer *n, bla_integer *k, bla_scomplex *a, bla_integer *lda, bla_scomplex *x, bla_integer *incx); diff --git a/frame/compat/f2c/bla_tpmv.h b/frame/compat/f2c/bla_tpmv.h index 8731a5d83..68c841dd0 100644 --- a/frame/compat/f2c/bla_tpmv.h +++ b/frame/compat/f2c/bla_tpmv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,tpmv)(bla_character *uplo, bla_character *trans, bla_character *diag, bla_integer *n, bla_scomplex *ap, bla_scomplex *x, bla_integer *incx); 
diff --git a/frame/compat/f2c/bla_tpsv.h b/frame/compat/f2c/bla_tpsv.h index fc7676a4d..1dcbeebfe 100644 --- a/frame/compat/f2c/bla_tpsv.h +++ b/frame/compat/f2c/bla_tpsv.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF77(c,tpsv)(bla_character *uplo, bla_character *trans, bla_character *diag, bla_integer *n, bla_scomplex *ap, bla_scomplex *x, bla_integer *incx); diff --git a/frame/compat/f2c/bla_xerbla.h b/frame/compat/f2c/bla_xerbla.h index d35195ed5..597f467b8 100644 --- a/frame/compat/f2c/bla_xerbla.h +++ b/frame/compat/f2c/bla_xerbla.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS int PASTEF770(xerbla)(bla_character *srname, bla_integer *info, ftnlen srname_len); diff --git a/frame/compat/f2c/util/bla_c_abs.h b/frame/compat/f2c/util/bla_c_abs.h index be3ee036d..5531e2cea 100644 --- a/frame/compat/f2c/util/bla_c_abs.h +++ b/frame/compat/f2c/util/bla_c_abs.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_c_abs(bla_scomplex *z); diff --git a/frame/compat/f2c/util/bla_c_div.h b/frame/compat/f2c/util/bla_c_div.h index fc152f034..fdfdb177a 100644 --- a/frame/compat/f2c/util/bla_c_div.h +++ b/frame/compat/f2c/util/bla_c_div.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS void bla_c_div(bla_scomplex *cp, bla_scomplex *ap, bla_scomplex *bp); diff --git a/frame/compat/f2c/util/bla_d_abs.h b/frame/compat/f2c/util/bla_d_abs.h index c0d72ffc1..6166f8c83 100644 --- a/frame/compat/f2c/util/bla_d_abs.h +++ b/frame/compat/f2c/util/bla_d_abs.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_d_abs(bla_double *x); diff --git a/frame/compat/f2c/util/bla_d_cnjg.h b/frame/compat/f2c/util/bla_d_cnjg.h index 4f713817c..1e8b9129b 100644 --- a/frame/compat/f2c/util/bla_d_cnjg.h +++ b/frame/compat/f2c/util/bla_d_cnjg.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS void bla_d_cnjg(bla_dcomplex *dest, bla_dcomplex 
*src); diff --git a/frame/compat/f2c/util/bla_d_imag.h b/frame/compat/f2c/util/bla_d_imag.h index 59b3c6615..5e2fa0c14 100644 --- a/frame/compat/f2c/util/bla_d_imag.h +++ b/frame/compat/f2c/util/bla_d_imag.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_d_imag(bla_dcomplex *z); diff --git a/frame/compat/f2c/util/bla_d_sign.h b/frame/compat/f2c/util/bla_d_sign.h index 5606aa4b5..74f4b015f 100644 --- a/frame/compat/f2c/util/bla_d_sign.h +++ b/frame/compat/f2c/util/bla_d_sign.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_d_sign(bla_double *a, bla_double *b); diff --git a/frame/compat/f2c/util/bla_f__cabs.c b/frame/compat/f2c/util/bla_f__cabs.c index 96a313db2..c5e6b5cdb 100644 --- a/frame/compat/f2c/util/bla_f__cabs.c +++ b/frame/compat/f2c/util/bla_f__cabs.c @@ -36,26 +36,26 @@ #ifdef BLIS_ENABLE_BLAS2BLIS -double bla_f__cabs(double bla_real, double imag) +double bla_f__cabs(double real_val, double imag_val) { double temp; - if(bla_real < 0) - bla_real = -bla_real; - if(imag < 0) - imag = -imag; - if(imag > bla_real) + if(real_val < 0) + real_val = -real_val; + if(imag_val < 0) + imag_val = -imag_val; + if(imag_val > real_val) { - temp = bla_real; - bla_real = imag; - imag = temp; + temp = real_val; + real_val = imag_val; + imag_val = temp; } - if((bla_real+imag) == bla_real) - return(bla_real); + if((real_val+imag_val) == real_val) + return(real_val); - temp = imag/bla_real; - temp = bla_real*sqrt(1.0 + temp*temp); + temp = imag_val/real_val; + temp = real_val*sqrt(1.0 + temp*temp); return(temp); } diff --git a/frame/compat/f2c/util/bla_f__cabs.h b/frame/compat/f2c/util/bla_f__cabs.h index 022e3c619..56c7faec2 100644 --- a/frame/compat/f2c/util/bla_f__cabs.h +++ b/frame/compat/f2c/util/bla_f__cabs.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_f__cabs(double bla_real, double imag); diff --git a/frame/compat/f2c/util/bla_r_abs.h 
b/frame/compat/f2c/util/bla_r_abs.h index b80fbb09b..5f9f7416c 100644 --- a/frame/compat/f2c/util/bla_r_abs.h +++ b/frame/compat/f2c/util/bla_r_abs.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_r_abs(bla_real *x); diff --git a/frame/compat/f2c/util/bla_r_cnjg.h b/frame/compat/f2c/util/bla_r_cnjg.h index c6a786402..9da9c8f87 100644 --- a/frame/compat/f2c/util/bla_r_cnjg.h +++ b/frame/compat/f2c/util/bla_r_cnjg.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS void bla_r_cnjg(bla_scomplex *dest, bla_scomplex *src); diff --git a/frame/compat/f2c/util/bla_r_imag.h b/frame/compat/f2c/util/bla_r_imag.h index 7492d5e3f..ecb5d0178 100644 --- a/frame/compat/f2c/util/bla_r_imag.h +++ b/frame/compat/f2c/util/bla_r_imag.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS bla_real bla_r_imag(bla_scomplex *z); diff --git a/frame/compat/f2c/util/bla_r_sign.h b/frame/compat/f2c/util/bla_r_sign.h index 7053ffcda..595ad4b4d 100644 --- a/frame/compat/f2c/util/bla_r_sign.h +++ b/frame/compat/f2c/util/bla_r_sign.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_r_sign(bla_real *a, bla_real *b); diff --git a/frame/compat/f2c/util/bla_z_abs.h b/frame/compat/f2c/util/bla_z_abs.h index 739b9c5ef..ba6236f40 100644 --- a/frame/compat/f2c/util/bla_z_abs.h +++ b/frame/compat/f2c/util/bla_z_abs.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS double bla_z_abs(bla_dcomplex *z); diff --git a/frame/compat/f2c/util/bla_z_div.h b/frame/compat/f2c/util/bla_z_div.h index e367098a6..daca277e3 100644 --- a/frame/compat/f2c/util/bla_z_div.h +++ b/frame/compat/f2c/util/bla_z_div.h @@ -32,8 +32,6 @@ */ -#include "blis.h" - #ifdef BLIS_ENABLE_BLAS2BLIS void bla_z_div(bla_dcomplex *cp, bla_dcomplex *ap, bla_dcomplex *bp); diff --git a/frame/include/bli_config_macro_defs.h b/frame/include/bli_config_macro_defs.h index 3f33c7e66..3b4ace0c8 100644 --- 
a/frame/include/bli_config_macro_defs.h +++ b/frame/include/bli_config_macro_defs.h @@ -87,10 +87,32 @@ #define BLIS_PAGE_SIZE 4096 #endif +// Number of named SIMD vector registers available for use. +#ifndef BLIS_SIMD_NUM_REGISTERS +#define BLIS_SIMD_NUM_REGISTERS 16 +#endif + +// Size (in bytes) of each SIMD vector. +#ifndef BLIS_SIMD_SIZE +#define BLIS_SIMD_SIZE 32 +#endif + // Alignment size (in bytes) needed by the instruction set for aligned // SIMD/vector instructions. #ifndef BLIS_SIMD_ALIGN_SIZE -#define BLIS_SIMD_ALIGN_SIZE 32 +#define BLIS_SIMD_ALIGN_SIZE BLIS_SIMD_SIZE +#endif + +// The maximum size in bytes of local stack buffers within macro-kernel +// functions. These buffers are usually used to store a temporary copy +// of a single microtile. The reason we multiply by 2 is to handle induced +// methods, where we use real domain register blocksizes in units of +// complex elements. Specifically, the macro-kernels will need this larger +// micro-tile footprint, even though the virtual micro-kernels will only +// ever be writing to half (real or imaginary part) at a time. 
+#ifndef BLIS_STACK_BUF_MAX_SIZE +#define BLIS_STACK_BUF_MAX_SIZE ( BLIS_SIMD_NUM_REGISTERS * \ + BLIS_SIMD_SIZE * 2 ) #endif // Alignment size used to align local stack buffers within macro-kernel diff --git a/frame/include/bli_genarray_macro_defs.h b/frame/include/bli_genarray_macro_defs.h index f3ea0b7c3..4305ce989 100644 --- a/frame/include/bli_genarray_macro_defs.h +++ b/frame/include/bli_genarray_macro_defs.h @@ -38,8 +38,36 @@ // -- Macros to generate function arrays --------------------------------------- -// -- One-operand macro -- +// -- "Smart" one-operand macro -- +#define GENARRAY_VFP(ftname,opname) \ +\ +PASTECH(ftname,_vft) \ +PASTECH(opname,_vfp)[BLIS_NUM_FP_TYPES] = \ +{ \ + PASTEMAC(s,opname), \ + PASTEMAC(c,opname), \ + PASTEMAC(d,opname), \ + PASTEMAC(z,opname) \ +} + +// -- "Smart" two-operand macro -- + +/* +#define GENARRAY2_VFP(arrayname,op) \ +\ +arrayname[BLIS_NUM_FP_TYPES][BLIS_NUM_FP_TYPES] = \ +{ \ + { PASTEMAC2(s,s,op), PASTEMAC2(s,c,op), PASTEMAC2(s,d,op), PASTEMAC2(s,z,op) }, \ + { PASTEMAC2(c,s,op), PASTEMAC2(c,c,op), PASTEMAC2(c,d,op), PASTEMAC2(c,z,op) }, \ + { PASTEMAC2(d,s,op), PASTEMAC2(d,c,op), PASTEMAC2(d,d,op), PASTEMAC2(d,z,op) }, \ + { PASTEMAC2(z,s,op), PASTEMAC2(z,c,op), PASTEMAC2(z,d,op), PASTEMAC2(z,z,op) } \ +} +*/ + + + +// -- One-operand macro -- #define GENARRAY(arrayname,op) \ \ diff --git a/frame/include/bli_gentdef_macro_defs.h b/frame/include/bli_gentdef_macro_defs.h new file mode 100644 index 000000000..116a5c1e3 --- /dev/null +++ b/frame/include/bli_gentdef_macro_defs.h @@ -0,0 +1,76 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. 
+ + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#ifndef BLIS_GENTDEF_MACRO_DEFS_H +#define BLIS_GENTDEF_MACRO_DEFS_H + +// +// -- MACROS TO INSERT TYPEDEF-GENERATING MACROS ------------------------------- +// + + +// -- function typedef macro (both typed and void) -- + +#define INSERT_GENTDEF( opname ) \ +\ +GENTDEF( float, s, opname, _ft ) \ +GENTDEF( double, d, opname, _ft ) \ +GENTDEF( scomplex, c, opname, _ft ) \ +GENTDEF( dcomplex, z, opname, _ft ) \ +\ +GENTDEF( void, s, opname, _vft ) \ +GENTDEF( void, d, opname, _vft ) \ +GENTDEF( void, c, opname, _vft ) \ +GENTDEF( void, z, opname, _vft ) \ +\ +GENTDEF( void, , opname, _vft ) + +// -- function typedef macro (both typed and void) with real projection -- + +#define INSERT_GENTDEFR( opname ) \ +\ +GENTDEFR( float, float, s, s, opname, _ft ) \ +GENTDEFR( double, double, d, d, opname, _ft ) \ +GENTDEFR( scomplex, float, c, s, opname, _ft ) \ +GENTDEFR( dcomplex, double, z, d, opname, _ft ) \ +\ +GENTDEFR( void, void, s, s, opname, _vft ) \ +GENTDEFR( void, void, d, d, opname, _vft ) \ +GENTDEFR( void, void, c, s, opname, _vft ) \ +GENTDEFR( void, void, z, d, opname, _vft ) \ +\ +GENTDEFR( void, void, , , opname, _vft ) + + +#endif diff --git a/frame/include/bli_gentfunc_macro_defs.h b/frame/include/bli_gentfunc_macro_defs.h index 8b98674a4..5fac6b977 100644 --- a/frame/include/bli_gentfunc_macro_defs.h +++ b/frame/include/bli_gentfunc_macro_defs.h @@ -114,12 +114,12 @@ GENTFUNCR2( dcomplex, double, z, d, blasname, blisname ) #define INSERT_GENTFUNCSCAL_BLAS( blasname, blisname ) \ \ -GENTFUNCSCAL( float, float, , s, blasname, blisname ) \ -GENTFUNCSCAL( double, double, , d, blasname, blisname ) \ -GENTFUNCSCAL( scomplex, scomplex, , c, blasname, blisname ) \ -GENTFUNCSCAL( dcomplex, dcomplex, , z, blasname, blisname ) \ -GENTFUNCSCAL( float, scomplex, s, c, blasname, blisname ) \ -GENTFUNCSCAL( double, dcomplex, d, z, blasname, blisname ) +GENTFUNCSCAL( float, float, s, , blasname, blisname ) \ +GENTFUNCSCAL( double, double, d, , blasname, blisname 
) \ +GENTFUNCSCAL( scomplex, scomplex, c, , blasname, blisname ) \ +GENTFUNCSCAL( dcomplex, dcomplex, z, , blasname, blisname ) \ +GENTFUNCSCAL( scomplex, float, c, s, blasname, blisname ) \ +GENTFUNCSCAL( dcomplex, double, z, d, blasname, blisname ) @@ -156,6 +156,24 @@ GENTFUNC( double, d, tfuncname, varname1, varname2 ) \ GENTFUNC( scomplex, c, tfuncname, varname1, varname2 ) \ GENTFUNC( dcomplex, z, tfuncname, varname1, varname2 ) +// -- (three auxiliary arguments) -- + +#define INSERT_GENTFUNC_BASIC3( tfuncname, varname1, varname2, varname3 ) \ +\ +GENTFUNC( float, s, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNC( double, d, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNC( scomplex, c, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNC( dcomplex, z, tfuncname, varname1, varname2, varname3 ) + +// -- (four auxiliary arguments) -- + +#define INSERT_GENTFUNC_BASIC4( tfuncname, varname1, varname2, varname3, varname4 ) \ +\ +GENTFUNC( float, s, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNC( double, d, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNC( scomplex, c, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNC( dcomplex, z, tfuncname, varname1, varname2, varname3, varname4 ) + // -- Basic one-operand with real projection -- @@ -178,6 +196,33 @@ GENTFUNCR( double, double, d, d, tfuncname, varname ) \ GENTFUNCR( scomplex, float, c, s, tfuncname, varname ) \ GENTFUNCR( dcomplex, double, z, d, tfuncname, varname ) +// -- (two auxiliary arguments) -- + +#define INSERT_GENTFUNCR_BASIC2( tfuncname, varname1, varname2 ) \ +\ +GENTFUNCR( float, float, s, s, tfuncname, varname1, varname2 ) \ +GENTFUNCR( double, double, d, d, tfuncname, varname1, varname2 ) \ +GENTFUNCR( scomplex, float, c, s, tfuncname, varname1, varname2 ) \ +GENTFUNCR( dcomplex, double, z, d, tfuncname, varname1, varname2 ) + +// -- (three auxiliary arguments) -- + +#define INSERT_GENTFUNCR_BASIC3( tfuncname, varname1, varname2, varname3 ) \ 
+\ +GENTFUNCR( float, float, s, s, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNCR( double, double, d, d, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNCR( scomplex, float, c, s, tfuncname, varname1, varname2, varname3 ) \ +GENTFUNCR( dcomplex, double, z, d, tfuncname, varname1, varname2, varname3 ) + +// -- (four auxiliary arguments) -- + +#define INSERT_GENTFUNCR_BASIC4( tfuncname, varname1, varname2, varname3, varname4 ) \ +\ +GENTFUNCR( float, float, s, s, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNCR( double, double, d, d, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNCR( scomplex, float, c, s, tfuncname, varname1, varname2, varname3, varname4 ) \ +GENTFUNCR( dcomplex, double, z, d, tfuncname, varname1, varname2, varname3, varname4 ) + // -- Basic one-operand macro with complex domain only and real projection -- @@ -207,6 +252,16 @@ GENTFUNCCO( dcomplex, double, z, d, tfuncname, varname1, varname2 ) // -- Basic one-operand macro with integer instance -- +// -- (no auxiliary arguments) -- + +#define INSERT_GENTFUNC_BASIC0_I( tfuncname ) \ +\ +GENTFUNC( float, s, tfuncname ) \ +GENTFUNC( double, d, tfuncname ) \ +GENTFUNC( scomplex, c, tfuncname ) \ +GENTFUNC( dcomplex, z, tfuncname ) \ +GENTFUNC( gint_t, i, tfuncname ) + // -- (one auxiliary argument) -- #define INSERT_GENTFUNC_BASIC_I( tfuncname, varname ) \ @@ -221,6 +276,15 @@ GENTFUNC( gint_t, i, tfuncname, varname ) // -- Basic one-operand with integer projection -- +// -- (no auxiliary arguments) -- + +#define INSERT_GENTFUNCI_BASIC0( tfuncname ) \ +\ +GENTFUNCI( float, gint_t, s, i, tfuncname ) \ +GENTFUNCI( double, gint_t, d, i, tfuncname ) \ +GENTFUNCI( scomplex, gint_t, c, i, tfuncname ) \ +GENTFUNCI( dcomplex, gint_t, z, i, tfuncname ) + // -- (one auxiliary argument) -- #define INSERT_GENTFUNCI_BASIC( tfuncname, varname ) \ diff --git a/frame/include/bli_kernel_macro_defs.h b/frame/include/bli_kernel_macro_defs.h index 5622a098c..9f3643a90 100644 --- 
a/frame/include/bli_kernel_macro_defs.h +++ b/frame/include/bli_kernel_macro_defs.h @@ -519,10 +519,6 @@ // axpy2v kernels -//#ifndef AXPY2V_KERNEL -//#define AXPY2V_KERNEL AXPY2V_KERNEL_REF -//#endif - #ifndef BLIS_SAXPY2V_KERNEL #define BLIS_SAXPY2V_KERNEL BLIS_SAXPY2V_KERNEL_REF #endif @@ -541,10 +537,6 @@ // dotaxpyv kernels -//#ifndef DOTAXPYV_KERNEL -//#define DOTAXPYV_KERNEL DOTAXPYV_KERNEL_REF -//#endif - #ifndef BLIS_SDOTAXPYV_KERNEL #define BLIS_SDOTAXPYV_KERNEL BLIS_SDOTAXPYV_KERNEL_REF #endif @@ -563,10 +555,6 @@ // axpyf kernels -//#ifndef AXPYF_KERNEL -//#define AXPYF_KERNEL AXPYF_KERNEL_REF -//#endif - #ifndef BLIS_SAXPYF_KERNEL #define BLIS_SAXPYF_KERNEL BLIS_SAXPYF_KERNEL_REF #endif @@ -585,10 +573,6 @@ // dotxf kernels -//#ifndef DOTXF_KERNEL -//#define DOTXF_KERNEL DOTXF_KERNEL_REF -//#endif - #ifndef BLIS_SDOTXF_KERNEL #define BLIS_SDOTXF_KERNEL BLIS_SDOTXF_KERNEL_REF #endif @@ -607,10 +591,6 @@ // dotxaxpyf kernels -//#ifndef DOTXAXPYF_KERNEL -//#define DOTXAXPYF_KERNEL DOTXAXPYF_KERNEL_REF -//#endif - #ifndef BLIS_SDOTXAXPYF_KERNEL #define BLIS_SDOTXAXPYF_KERNEL BLIS_SDOTXAXPYF_KERNEL_REF #endif @@ -633,10 +613,6 @@ // addv kernels -//#ifndef ADDV_KERNEL -//#define ADDV_KERNEL ADDV_KERNEL_REF -//#endif - #ifndef BLIS_SADDV_KERNEL #define BLIS_SADDV_KERNEL BLIS_SADDV_KERNEL_REF #endif @@ -655,10 +631,6 @@ // axpyv kernels -//#ifndef AXPYV_KERNEL -//#define AXPYV_KERNEL AXPYV_KERNEL_REF -//#endif - #ifndef BLIS_SAXPYV_KERNEL #define BLIS_SAXPYV_KERNEL BLIS_SAXPYV_KERNEL_REF #endif @@ -677,10 +649,6 @@ // copyv kernels -//#ifndef COPYV_KERNEL -//#define COPYV_KERNEL COPYV_KERNEL_REF -//#endif - #ifndef BLIS_SCOPYV_KERNEL #define BLIS_SCOPYV_KERNEL BLIS_SCOPYV_KERNEL_REF #endif @@ -699,10 +667,6 @@ // dotv kernels -//#ifndef DOTV_KERNEL -//#define DOTV_KERNEL DOTV_KERNEL_REF -//#endif - #ifndef BLIS_SDOTV_KERNEL #define BLIS_SDOTV_KERNEL BLIS_SDOTV_KERNEL_REF #endif @@ -721,10 +685,6 @@ // dotxv kernels -//#ifndef DOTXV_KERNEL -//#define 
DOTXV_KERNEL DOTXV_KERNEL_REF -//#endif - #ifndef BLIS_SDOTXV_KERNEL #define BLIS_SDOTXV_KERNEL BLIS_SDOTXV_KERNEL_REF #endif @@ -743,10 +703,6 @@ // invertv kernels -//#ifndef INVERTV_KERNEL -//#define INVERTV_KERNEL INVERTV_KERNEL_REF -//#endif - #ifndef BLIS_SINVERTV_KERNEL #define BLIS_SINVERTV_KERNEL BLIS_SINVERTV_KERNEL_REF #endif @@ -765,10 +721,6 @@ // scal2v kernels -//#ifndef SCAL2V_KERNEL -//#define SCAL2V_KERNEL SCAL2V_KERNEL_REF -//#endif - #ifndef BLIS_SSCAL2V_KERNEL #define BLIS_SSCAL2V_KERNEL BLIS_SSCAL2V_KERNEL_REF #endif @@ -787,10 +739,6 @@ // scalv kernels -//#ifndef SCALV_KERNEL -//#define SCALV_KERNEL SCALV_KERNEL_REF -//#endif - #ifndef BLIS_SSCALV_KERNEL #define BLIS_SSCALV_KERNEL BLIS_SSCALV_KERNEL_REF #endif @@ -809,10 +757,6 @@ // setv kernels -//#ifndef SETV_KERNEL -//#define SETV_KERNEL SETV_KERNEL_REF -//#endif - #ifndef BLIS_SSETV_KERNEL #define BLIS_SSETV_KERNEL BLIS_SSETV_KERNEL_REF #endif @@ -831,10 +775,6 @@ // subv kernels -//#ifndef SUBV_KERNEL -//#define SUBV_KERNEL SUBV_KERNEL_REF -//#endif - #ifndef BLIS_SSUBV_KERNEL #define BLIS_SSUBV_KERNEL BLIS_SSUBV_KERNEL_REF #endif @@ -853,10 +793,6 @@ // swapv kernels -//#ifndef SWAPV_KERNEL -//#define SWAPV_KERNEL SWAPV_KERNEL_REF -//#endif - #ifndef BLIS_SSWAPV_KERNEL #define BLIS_SSWAPV_KERNEL BLIS_SSWAPV_KERNEL_REF #endif @@ -1106,42 +1042,42 @@ // NOTE: These values determine high-level cache blocking for level-2 // operations ONLY. So, if gemv is performed with a 2000x2000 matrix A and -// MC = NC = 1000, then a total of four unblocked (or unblocked fused) +// M2 = N2 = 1000, then a total of four unblocked (or unblocked fused) // gemv subproblems are called. The blocked algorithms are only useful in // that they provide the opportunity for packing vectors. (Matrices can also // be packed here, but this tends to be much too expensive in practice to // actually employ.) 
-#ifndef BLIS_DEFAULT_L2_MC_S -#define BLIS_DEFAULT_L2_MC_S 1000 +#ifndef BLIS_DEFAULT_M2_S +#define BLIS_DEFAULT_M2_S 1000 #endif -#ifndef BLIS_DEFAULT_L2_NC_S -#define BLIS_DEFAULT_L2_NC_S 1000 +#ifndef BLIS_DEFAULT_N2_S +#define BLIS_DEFAULT_N2_S 1000 #endif -#ifndef BLIS_DEFAULT_L2_MC_D -#define BLIS_DEFAULT_L2_MC_D 1000 +#ifndef BLIS_DEFAULT_M2_D +#define BLIS_DEFAULT_M2_D 1000 #endif -#ifndef BLIS_DEFAULT_L2_NC_D -#define BLIS_DEFAULT_L2_NC_D 1000 +#ifndef BLIS_DEFAULT_N2_D +#define BLIS_DEFAULT_N2_D 1000 #endif -#ifndef BLIS_DEFAULT_L2_MC_C -#define BLIS_DEFAULT_L2_MC_C 1000 +#ifndef BLIS_DEFAULT_M2_C +#define BLIS_DEFAULT_M2_C 1000 #endif -#ifndef BLIS_DEFAULT_L2_NC_C -#define BLIS_DEFAULT_L2_NC_C 1000 +#ifndef BLIS_DEFAULT_N2_C +#define BLIS_DEFAULT_N2_C 1000 #endif -#ifndef BLIS_DEFAULT_L2_MC_Z -#define BLIS_DEFAULT_L2_MC_Z 1000 +#ifndef BLIS_DEFAULT_M2_Z +#define BLIS_DEFAULT_M2_Z 1000 #endif -#ifndef BLIS_DEFAULT_L2_NC_Z -#define BLIS_DEFAULT_L2_NC_Z 1000 +#ifndef BLIS_DEFAULT_N2_Z +#define BLIS_DEFAULT_N2_Z 1000 #endif // @@ -1150,74 +1086,74 @@ // Global level-1f fusing factors. 
-#ifndef BLIS_L1F_FUSE_FAC_S -#define BLIS_L1F_FUSE_FAC_S 8 +#ifndef BLIS_DEFAULT_1F_S +#define BLIS_DEFAULT_1F_S 8 #endif -#ifndef BLIS_L1F_FUSE_FAC_D -#define BLIS_L1F_FUSE_FAC_D 4 +#ifndef BLIS_DEFAULT_1F_D +#define BLIS_DEFAULT_1F_D 4 #endif -#ifndef BLIS_L1F_FUSE_FAC_C -#define BLIS_L1F_FUSE_FAC_C 4 +#ifndef BLIS_DEFAULT_1F_C +#define BLIS_DEFAULT_1F_C 4 #endif -#ifndef BLIS_L1F_FUSE_FAC_Z -#define BLIS_L1F_FUSE_FAC_Z 2 +#ifndef BLIS_DEFAULT_1F_Z +#define BLIS_DEFAULT_1F_Z 2 #endif // axpyf -#ifndef BLIS_AXPYF_FUSE_FAC_S -#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S +#ifndef BLIS_DEFAULT_AF_S +#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S #endif -#ifndef BLIS_AXPYF_FUSE_FAC_D -#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D +#ifndef BLIS_DEFAULT_AF_D +#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D #endif -#ifndef BLIS_AXPYF_FUSE_FAC_C -#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C +#ifndef BLIS_DEFAULT_AF_C +#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C #endif -#ifndef BLIS_AXPYF_FUSE_FAC_Z -#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z +#ifndef BLIS_DEFAULT_AF_Z +#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z #endif // dotxf -#ifndef BLIS_DOTXF_FUSE_FAC_S -#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S +#ifndef BLIS_DEFAULT_DF_S +#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S #endif -#ifndef BLIS_DOTXF_FUSE_FAC_D -#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D +#ifndef BLIS_DEFAULT_DF_D +#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D #endif -#ifndef BLIS_DOTXF_FUSE_FAC_C -#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C +#ifndef BLIS_DEFAULT_DF_C +#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C #endif -#ifndef BLIS_DOTXF_FUSE_FAC_Z -#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z +#ifndef BLIS_DEFAULT_DF_Z +#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z #endif // dotxaxpyf -#ifndef BLIS_DOTXAXPYF_FUSE_FAC_S -#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S +#ifndef BLIS_DEFAULT_XF_S +#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S #endif -#ifndef 
BLIS_DOTXAXPYF_FUSE_FAC_D -#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D +#ifndef BLIS_DEFAULT_XF_D +#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D #endif -#ifndef BLIS_DOTXAXPYF_FUSE_FAC_C -#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C +#ifndef BLIS_DEFAULT_XF_C +#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C #endif -#ifndef BLIS_DOTXAXPYF_FUSE_FAC_Z -#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z +#ifndef BLIS_DEFAULT_XF_Z +#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z #endif // @@ -1228,20 +1164,20 @@ // non-contiguous vectors. Similar to that of KR, they can // typically be set to 1. -#ifndef BLIS_DEFAULT_VR_S -#define BLIS_DEFAULT_VR_S 1 +#ifndef BLIS_DEFAULT_VF_S +#define BLIS_DEFAULT_VF_S 1 #endif -#ifndef BLIS_DEFAULT_VR_D -#define BLIS_DEFAULT_VR_D 1 +#ifndef BLIS_DEFAULT_VF_D +#define BLIS_DEFAULT_VF_D 1 #endif -#ifndef BLIS_DEFAULT_VR_C -#define BLIS_DEFAULT_VR_C 1 +#ifndef BLIS_DEFAULT_VF_C +#define BLIS_DEFAULT_VF_C 1 #endif -#ifndef BLIS_DEFAULT_VR_Z -#define BLIS_DEFAULT_VR_Z 1 +#ifndef BLIS_DEFAULT_VF_Z +#define BLIS_DEFAULT_VF_Z 1 #endif @@ -1313,98 +1249,4 @@ #endif -// -- Abbreiviated kernel blocksize macros ------------------------------------- - -// Here, we shorten the blocksizes defined in bli_kernel.h so that they can -// derived via the PASTEMAC macro. 
- -// Default (minimum) cache blocksizes - -#define bli_smc BLIS_DEFAULT_MC_S -#define bli_skc BLIS_DEFAULT_KC_S -#define bli_snc BLIS_DEFAULT_NC_S - -#define bli_dmc BLIS_DEFAULT_MC_D -#define bli_dkc BLIS_DEFAULT_KC_D -#define bli_dnc BLIS_DEFAULT_NC_D - -#define bli_cmc BLIS_DEFAULT_MC_C -#define bli_ckc BLIS_DEFAULT_KC_C -#define bli_cnc BLIS_DEFAULT_NC_C - -#define bli_zmc BLIS_DEFAULT_MC_Z -#define bli_zkc BLIS_DEFAULT_KC_Z -#define bli_znc BLIS_DEFAULT_NC_Z - -// Register blocksizes - -#define bli_smr BLIS_DEFAULT_MR_S -#define bli_skr BLIS_DEFAULT_KR_S -#define bli_snr BLIS_DEFAULT_NR_S - -#define bli_dmr BLIS_DEFAULT_MR_D -#define bli_dkr BLIS_DEFAULT_KR_D -#define bli_dnr BLIS_DEFAULT_NR_D - -#define bli_cmr BLIS_DEFAULT_MR_C -#define bli_ckr BLIS_DEFAULT_KR_C -#define bli_cnr BLIS_DEFAULT_NR_C - -#define bli_zmr BLIS_DEFAULT_MR_Z -#define bli_zkr BLIS_DEFAULT_KR_Z -#define bli_znr BLIS_DEFAULT_NR_Z - -// Extended (maximum) cache blocksizes - -#define bli_smaxmc BLIS_MAXIMUM_MC_S -#define bli_smaxkc BLIS_MAXIMUM_KC_S -#define bli_smaxnc BLIS_MAXIMUM_NC_S - -#define bli_dmaxmc BLIS_MAXIMUM_MC_D -#define bli_dmaxkc BLIS_MAXIMUM_KC_D -#define bli_dmaxnc BLIS_MAXIMUM_NC_D - -#define bli_cmaxmc BLIS_MAXIMUM_MC_C -#define bli_cmaxkc BLIS_MAXIMUM_KC_C -#define bli_cmaxnc BLIS_MAXIMUM_NC_C - -#define bli_zmaxmc BLIS_MAXIMUM_MC_Z -#define bli_zmaxkc BLIS_MAXIMUM_KC_Z -#define bli_zmaxnc BLIS_MAXIMUM_NC_Z - -// Extended (packing) register blocksizes - -#define bli_spackmr BLIS_PACKDIM_MR_S -#define bli_spackkr BLIS_PACKDIM_KR_S -#define bli_spacknr BLIS_PACKDIM_NR_S - -#define bli_dpackmr BLIS_PACKDIM_MR_D -#define bli_dpackkr BLIS_PACKDIM_KR_D -#define bli_dpacknr BLIS_PACKDIM_NR_D - -#define bli_cpackmr BLIS_PACKDIM_MR_C -#define bli_cpackkr BLIS_PACKDIM_KR_C -#define bli_cpacknr BLIS_PACKDIM_NR_C - -#define bli_zpackmr BLIS_PACKDIM_MR_Z -#define bli_zpackkr BLIS_PACKDIM_KR_Z -#define bli_zpacknr BLIS_PACKDIM_NR_Z - -// Level-1f fusing factors - -#define 
bli_saxpyf_fusefac BLIS_AXPYF_FUSE_FAC_S -#define bli_daxpyf_fusefac BLIS_AXPYF_FUSE_FAC_D -#define bli_caxpyf_fusefac BLIS_AXPYF_FUSE_FAC_C -#define bli_zaxpyf_fusefac BLIS_AXPYF_FUSE_FAC_Z - -#define bli_sdotxf_fusefac BLIS_DOTXF_FUSE_FAC_S -#define bli_ddotxf_fusefac BLIS_DOTXF_FUSE_FAC_D -#define bli_cdotxf_fusefac BLIS_DOTXF_FUSE_FAC_C -#define bli_zdotxf_fusefac BLIS_DOTXF_FUSE_FAC_Z - -#define bli_sdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_S -#define bli_ddotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_D -#define bli_cdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_C -#define bli_zdotxaxpyf_fusefac BLIS_DOTXAXPYF_FUSE_FAC_Z - #endif diff --git a/frame/include/bli_kernel_pre_macro_defs.h b/frame/include/bli_kernel_pre_macro_defs.h index d18f822ac..703f8c54f 100644 --- a/frame/include/bli_kernel_pre_macro_defs.h +++ b/frame/include/bli_kernel_pre_macro_defs.h @@ -82,129 +82,129 @@ // packm_2xk kernels -#define BLIS_SPACKM_2XK_KERNEL_REF bli_spackm_ref_2xk -#define BLIS_DPACKM_2XK_KERNEL_REF bli_dpackm_ref_2xk -#define BLIS_CPACKM_2XK_KERNEL_REF bli_cpackm_ref_2xk -#define BLIS_ZPACKM_2XK_KERNEL_REF bli_zpackm_ref_2xk +#define BLIS_SPACKM_2XK_KERNEL_REF bli_spackm_2xk_ref +#define BLIS_DPACKM_2XK_KERNEL_REF bli_dpackm_2xk_ref +#define BLIS_CPACKM_2XK_KERNEL_REF bli_cpackm_2xk_ref +#define BLIS_ZPACKM_2XK_KERNEL_REF bli_zpackm_2xk_ref // packm_3xk kernels -#define BLIS_SPACKM_3XK_KERNEL_REF bli_spackm_ref_3xk -#define BLIS_DPACKM_3XK_KERNEL_REF bli_dpackm_ref_3xk -#define BLIS_CPACKM_3XK_KERNEL_REF bli_cpackm_ref_3xk -#define BLIS_ZPACKM_3XK_KERNEL_REF bli_zpackm_ref_3xk +#define BLIS_SPACKM_3XK_KERNEL_REF bli_spackm_3xk_ref +#define BLIS_DPACKM_3XK_KERNEL_REF bli_dpackm_3xk_ref +#define BLIS_CPACKM_3XK_KERNEL_REF bli_cpackm_3xk_ref +#define BLIS_ZPACKM_3XK_KERNEL_REF bli_zpackm_3xk_ref // packm_4xk kernels -#define BLIS_SPACKM_4XK_KERNEL_REF bli_spackm_ref_4xk -#define BLIS_DPACKM_4XK_KERNEL_REF bli_dpackm_ref_4xk -#define BLIS_CPACKM_4XK_KERNEL_REF bli_cpackm_ref_4xk 
-#define BLIS_ZPACKM_4XK_KERNEL_REF bli_zpackm_ref_4xk +#define BLIS_SPACKM_4XK_KERNEL_REF bli_spackm_4xk_ref +#define BLIS_DPACKM_4XK_KERNEL_REF bli_dpackm_4xk_ref +#define BLIS_CPACKM_4XK_KERNEL_REF bli_cpackm_4xk_ref +#define BLIS_ZPACKM_4XK_KERNEL_REF bli_zpackm_4xk_ref // packm_6xk kernels -#define BLIS_SPACKM_6XK_KERNEL_REF bli_spackm_ref_6xk -#define BLIS_DPACKM_6XK_KERNEL_REF bli_dpackm_ref_6xk -#define BLIS_CPACKM_6XK_KERNEL_REF bli_cpackm_ref_6xk -#define BLIS_ZPACKM_6XK_KERNEL_REF bli_zpackm_ref_6xk +#define BLIS_SPACKM_6XK_KERNEL_REF bli_spackm_6xk_ref +#define BLIS_DPACKM_6XK_KERNEL_REF bli_dpackm_6xk_ref +#define BLIS_CPACKM_6XK_KERNEL_REF bli_cpackm_6xk_ref +#define BLIS_ZPACKM_6XK_KERNEL_REF bli_zpackm_6xk_ref // packm_8xk kernels -#define BLIS_SPACKM_8XK_KERNEL_REF bli_spackm_ref_8xk -#define BLIS_DPACKM_8XK_KERNEL_REF bli_dpackm_ref_8xk -#define BLIS_CPACKM_8XK_KERNEL_REF bli_cpackm_ref_8xk -#define BLIS_ZPACKM_8XK_KERNEL_REF bli_zpackm_ref_8xk +#define BLIS_SPACKM_8XK_KERNEL_REF bli_spackm_8xk_ref +#define BLIS_DPACKM_8XK_KERNEL_REF bli_dpackm_8xk_ref +#define BLIS_CPACKM_8XK_KERNEL_REF bli_cpackm_8xk_ref +#define BLIS_ZPACKM_8XK_KERNEL_REF bli_zpackm_8xk_ref // packm_10xk kernels -#define BLIS_SPACKM_10XK_KERNEL_REF bli_spackm_ref_10xk -#define BLIS_DPACKM_10XK_KERNEL_REF bli_dpackm_ref_10xk -#define BLIS_CPACKM_10XK_KERNEL_REF bli_cpackm_ref_10xk -#define BLIS_ZPACKM_10XK_KERNEL_REF bli_zpackm_ref_10xk +#define BLIS_SPACKM_10XK_KERNEL_REF bli_spackm_10xk_ref +#define BLIS_DPACKM_10XK_KERNEL_REF bli_dpackm_10xk_ref +#define BLIS_CPACKM_10XK_KERNEL_REF bli_cpackm_10xk_ref +#define BLIS_ZPACKM_10XK_KERNEL_REF bli_zpackm_10xk_ref // packm_12xk kernels -#define BLIS_SPACKM_12XK_KERNEL_REF bli_spackm_ref_12xk -#define BLIS_DPACKM_12XK_KERNEL_REF bli_dpackm_ref_12xk -#define BLIS_CPACKM_12XK_KERNEL_REF bli_cpackm_ref_12xk -#define BLIS_ZPACKM_12XK_KERNEL_REF bli_zpackm_ref_12xk +#define BLIS_SPACKM_12XK_KERNEL_REF bli_spackm_12xk_ref +#define 
BLIS_DPACKM_12XK_KERNEL_REF bli_dpackm_12xk_ref +#define BLIS_CPACKM_12XK_KERNEL_REF bli_cpackm_12xk_ref +#define BLIS_ZPACKM_12XK_KERNEL_REF bli_zpackm_12xk_ref // packm_14xk kernels -#define BLIS_SPACKM_14XK_KERNEL_REF bli_spackm_ref_14xk -#define BLIS_DPACKM_14XK_KERNEL_REF bli_dpackm_ref_14xk -#define BLIS_CPACKM_14XK_KERNEL_REF bli_cpackm_ref_14xk -#define BLIS_ZPACKM_14XK_KERNEL_REF bli_zpackm_ref_14xk +#define BLIS_SPACKM_14XK_KERNEL_REF bli_spackm_14xk_ref +#define BLIS_DPACKM_14XK_KERNEL_REF bli_dpackm_14xk_ref +#define BLIS_CPACKM_14XK_KERNEL_REF bli_cpackm_14xk_ref +#define BLIS_ZPACKM_14XK_KERNEL_REF bli_zpackm_14xk_ref // packm_16xk kernels -#define BLIS_SPACKM_16XK_KERNEL_REF bli_spackm_ref_16xk -#define BLIS_DPACKM_16XK_KERNEL_REF bli_dpackm_ref_16xk -#define BLIS_CPACKM_16XK_KERNEL_REF bli_cpackm_ref_16xk -#define BLIS_ZPACKM_16XK_KERNEL_REF bli_zpackm_ref_16xk +#define BLIS_SPACKM_16XK_KERNEL_REF bli_spackm_16xk_ref +#define BLIS_DPACKM_16XK_KERNEL_REF bli_dpackm_16xk_ref +#define BLIS_CPACKM_16XK_KERNEL_REF bli_cpackm_16xk_ref +#define BLIS_ZPACKM_16XK_KERNEL_REF bli_zpackm_16xk_ref // packm_30xk kernels -#define BLIS_SPACKM_30XK_KERNEL_REF bli_spackm_ref_30xk -#define BLIS_DPACKM_30XK_KERNEL_REF bli_dpackm_ref_30xk -#define BLIS_CPACKM_30XK_KERNEL_REF bli_cpackm_ref_30xk -#define BLIS_ZPACKM_30XK_KERNEL_REF bli_zpackm_ref_30xk +#define BLIS_SPACKM_30XK_KERNEL_REF bli_spackm_30xk_ref +#define BLIS_DPACKM_30XK_KERNEL_REF bli_dpackm_30xk_ref +#define BLIS_CPACKM_30XK_KERNEL_REF bli_cpackm_30xk_ref +#define BLIS_ZPACKM_30XK_KERNEL_REF bli_zpackm_30xk_ref // unpack_2xk kernels -#define BLIS_SUNPACKM_2XK_KERNEL_REF bli_sunpackm_ref_2xk -#define BLIS_DUNPACKM_2XK_KERNEL_REF bli_dunpackm_ref_2xk -#define BLIS_CUNPACKM_2XK_KERNEL_REF bli_cunpackm_ref_2xk -#define BLIS_ZUNPACKM_2XK_KERNEL_REF bli_zunpackm_ref_2xk +#define BLIS_SUNPACKM_2XK_KERNEL_REF bli_sunpackm_2xk_ref +#define BLIS_DUNPACKM_2XK_KERNEL_REF bli_dunpackm_2xk_ref +#define 
BLIS_CUNPACKM_2XK_KERNEL_REF bli_cunpackm_2xk_ref +#define BLIS_ZUNPACKM_2XK_KERNEL_REF bli_zunpackm_2xk_ref // unpack_4xk kernels -#define BLIS_SUNPACKM_4XK_KERNEL_REF bli_sunpackm_ref_4xk -#define BLIS_DUNPACKM_4XK_KERNEL_REF bli_dunpackm_ref_4xk -#define BLIS_CUNPACKM_4XK_KERNEL_REF bli_cunpackm_ref_4xk -#define BLIS_ZUNPACKM_4XK_KERNEL_REF bli_zunpackm_ref_4xk +#define BLIS_SUNPACKM_4XK_KERNEL_REF bli_sunpackm_4xk_ref +#define BLIS_DUNPACKM_4XK_KERNEL_REF bli_dunpackm_4xk_ref +#define BLIS_CUNPACKM_4XK_KERNEL_REF bli_cunpackm_4xk_ref +#define BLIS_ZUNPACKM_4XK_KERNEL_REF bli_zunpackm_4xk_ref // unpack_6xk kernels -#define BLIS_SUNPACKM_6XK_KERNEL_REF bli_sunpackm_ref_6xk -#define BLIS_DUNPACKM_6XK_KERNEL_REF bli_dunpackm_ref_6xk -#define BLIS_CUNPACKM_6XK_KERNEL_REF bli_cunpackm_ref_6xk -#define BLIS_ZUNPACKM_6XK_KERNEL_REF bli_zunpackm_ref_6xk +#define BLIS_SUNPACKM_6XK_KERNEL_REF bli_sunpackm_6xk_ref +#define BLIS_DUNPACKM_6XK_KERNEL_REF bli_dunpackm_6xk_ref +#define BLIS_CUNPACKM_6XK_KERNEL_REF bli_cunpackm_6xk_ref +#define BLIS_ZUNPACKM_6XK_KERNEL_REF bli_zunpackm_6xk_ref // unpack_8xk kernels -#define BLIS_SUNPACKM_8XK_KERNEL_REF bli_sunpackm_ref_8xk -#define BLIS_DUNPACKM_8XK_KERNEL_REF bli_dunpackm_ref_8xk -#define BLIS_CUNPACKM_8XK_KERNEL_REF bli_cunpackm_ref_8xk -#define BLIS_ZUNPACKM_8XK_KERNEL_REF bli_zunpackm_ref_8xk +#define BLIS_SUNPACKM_8XK_KERNEL_REF bli_sunpackm_8xk_ref +#define BLIS_DUNPACKM_8XK_KERNEL_REF bli_dunpackm_8xk_ref +#define BLIS_CUNPACKM_8XK_KERNEL_REF bli_cunpackm_8xk_ref +#define BLIS_ZUNPACKM_8XK_KERNEL_REF bli_zunpackm_8xk_ref // unpack_10xk kernels -#define BLIS_SUNPACKM_10XK_KERNEL_REF bli_sunpackm_ref_10xk -#define BLIS_DUNPACKM_10XK_KERNEL_REF bli_dunpackm_ref_10xk -#define BLIS_CUNPACKM_10XK_KERNEL_REF bli_cunpackm_ref_10xk -#define BLIS_ZUNPACKM_10XK_KERNEL_REF bli_zunpackm_ref_10xk +#define BLIS_SUNPACKM_10XK_KERNEL_REF bli_sunpackm_10xk_ref +#define BLIS_DUNPACKM_10XK_KERNEL_REF bli_dunpackm_10xk_ref +#define 
BLIS_CUNPACKM_10XK_KERNEL_REF bli_cunpackm_10xk_ref +#define BLIS_ZUNPACKM_10XK_KERNEL_REF bli_zunpackm_10xk_ref // unpack_12xk kernels -#define BLIS_SUNPACKM_12XK_KERNEL_REF bli_sunpackm_ref_12xk -#define BLIS_DUNPACKM_12XK_KERNEL_REF bli_dunpackm_ref_12xk -#define BLIS_CUNPACKM_12XK_KERNEL_REF bli_cunpackm_ref_12xk -#define BLIS_ZUNPACKM_12XK_KERNEL_REF bli_zunpackm_ref_12xk +#define BLIS_SUNPACKM_12XK_KERNEL_REF bli_sunpackm_12xk_ref +#define BLIS_DUNPACKM_12XK_KERNEL_REF bli_dunpackm_12xk_ref +#define BLIS_CUNPACKM_12XK_KERNEL_REF bli_cunpackm_12xk_ref +#define BLIS_ZUNPACKM_12XK_KERNEL_REF bli_zunpackm_12xk_ref // unpack_14xk kernels -#define BLIS_SUNPACKM_14XK_KERNEL_REF bli_sunpackm_ref_14xk -#define BLIS_DUNPACKM_14XK_KERNEL_REF bli_dunpackm_ref_14xk -#define BLIS_CUNPACKM_14XK_KERNEL_REF bli_cunpackm_ref_14xk -#define BLIS_ZUNPACKM_14XK_KERNEL_REF bli_zunpackm_ref_14xk +#define BLIS_SUNPACKM_14XK_KERNEL_REF bli_sunpackm_14xk_ref +#define BLIS_DUNPACKM_14XK_KERNEL_REF bli_dunpackm_14xk_ref +#define BLIS_CUNPACKM_14XK_KERNEL_REF bli_cunpackm_14xk_ref +#define BLIS_ZUNPACKM_14XK_KERNEL_REF bli_zunpackm_14xk_ref // unpack_16xk kernels -#define BLIS_SUNPACKM_16XK_KERNEL_REF bli_sunpackm_ref_16xk -#define BLIS_DUNPACKM_16XK_KERNEL_REF bli_dunpackm_ref_16xk -#define BLIS_CUNPACKM_16XK_KERNEL_REF bli_cunpackm_ref_16xk -#define BLIS_ZUNPACKM_16XK_KERNEL_REF bli_zunpackm_ref_16xk +#define BLIS_SUNPACKM_16XK_KERNEL_REF bli_sunpackm_16xk_ref +#define BLIS_DUNPACKM_16XK_KERNEL_REF bli_dunpackm_16xk_ref +#define BLIS_CUNPACKM_16XK_KERNEL_REF bli_cunpackm_16xk_ref +#define BLIS_ZUNPACKM_16XK_KERNEL_REF bli_zunpackm_16xk_ref // // Level-1f @@ -212,42 +212,42 @@ // axpy2v kernels -#define BLIS_SAXPY2V_KERNEL_REF bli_sssaxpy2v_ref -#define BLIS_DAXPY2V_KERNEL_REF bli_dddaxpy2v_ref -#define BLIS_CAXPY2V_KERNEL_REF bli_cccaxpy2v_ref -#define BLIS_ZAXPY2V_KERNEL_REF bli_zzzaxpy2v_ref +#define BLIS_SAXPY2V_KERNEL_REF bli_saxpy2v_ref +#define BLIS_DAXPY2V_KERNEL_REF 
bli_daxpy2v_ref +#define BLIS_CAXPY2V_KERNEL_REF bli_caxpy2v_ref +#define BLIS_ZAXPY2V_KERNEL_REF bli_zaxpy2v_ref // dotaxpyv kernels -#define BLIS_SDOTAXPYV_KERNEL_REF bli_sssdotaxpyv_ref -#define BLIS_DDOTAXPYV_KERNEL_REF bli_ddddotaxpyv_ref -#define BLIS_CDOTAXPYV_KERNEL_REF bli_cccdotaxpyv_ref -#define BLIS_ZDOTAXPYV_KERNEL_REF bli_zzzdotaxpyv_ref +#define BLIS_SDOTAXPYV_KERNEL_REF bli_sdotaxpyv_ref +#define BLIS_DDOTAXPYV_KERNEL_REF bli_ddotaxpyv_ref +#define BLIS_CDOTAXPYV_KERNEL_REF bli_cdotaxpyv_ref +#define BLIS_ZDOTAXPYV_KERNEL_REF bli_zdotaxpyv_ref // axpyf kernels -#define BLIS_SAXPYF_KERNEL_REF bli_sssaxpyf_ref -#define BLIS_DAXPYF_KERNEL_REF bli_dddaxpyf_ref -#define BLIS_CAXPYF_KERNEL_REF bli_cccaxpyf_ref -#define BLIS_ZAXPYF_KERNEL_REF bli_zzzaxpyf_ref +#define BLIS_SAXPYF_KERNEL_REF bli_saxpyf_ref +#define BLIS_DAXPYF_KERNEL_REF bli_daxpyf_ref +#define BLIS_CAXPYF_KERNEL_REF bli_caxpyf_ref +#define BLIS_ZAXPYF_KERNEL_REF bli_zaxpyf_ref // dotxf kernels -#define BLIS_SDOTXF_KERNEL_REF bli_sssdotxf_ref -#define BLIS_DDOTXF_KERNEL_REF bli_ddddotxf_ref -#define BLIS_CDOTXF_KERNEL_REF bli_cccdotxf_ref -#define BLIS_ZDOTXF_KERNEL_REF bli_zzzdotxf_ref +#define BLIS_SDOTXF_KERNEL_REF bli_sdotxf_ref +#define BLIS_DDOTXF_KERNEL_REF bli_ddotxf_ref +#define BLIS_CDOTXF_KERNEL_REF bli_cdotxf_ref +#define BLIS_ZDOTXF_KERNEL_REF bli_zdotxf_ref // dotxaxpyf kernels -//#define BLIS_SDOTXAXPYF_KERNEL_REF bli_sssdotxaxpyf_ref_var1 -//#define BLIS_DDOTXAXPYF_KERNEL_REF bli_ddddotxaxpyf_ref_var1 -//#define BLIS_CDOTXAXPYF_KERNEL_REF bli_cccdotxaxpyf_ref_var1 -//#define BLIS_ZDOTXAXPYF_KERNEL_REF bli_zzzdotxaxpyf_ref_var1 -#define BLIS_SDOTXAXPYF_KERNEL_REF bli_sssdotxaxpyf_ref_var2 -#define BLIS_DDOTXAXPYF_KERNEL_REF bli_ddddotxaxpyf_ref_var2 -#define BLIS_CDOTXAXPYF_KERNEL_REF bli_cccdotxaxpyf_ref_var2 -#define BLIS_ZDOTXAXPYF_KERNEL_REF bli_zzzdotxaxpyf_ref_var2 +//#define BLIS_SDOTXAXPYF_KERNEL_REF bli_sdotxaxpyf_ref_var1 +//#define BLIS_DDOTXAXPYF_KERNEL_REF 
bli_ddotxaxpyf_ref_var1 +//#define BLIS_CDOTXAXPYF_KERNEL_REF bli_cdotxaxpyf_ref_var1 +//#define BLIS_ZDOTXAXPYF_KERNEL_REF bli_zdotxaxpyf_ref_var1 +#define BLIS_SDOTXAXPYF_KERNEL_REF bli_sdotxaxpyf_ref_var2 +#define BLIS_DDOTXAXPYF_KERNEL_REF bli_ddotxaxpyf_ref_var2 +#define BLIS_CDOTXAXPYF_KERNEL_REF bli_cdotxaxpyf_ref_var2 +#define BLIS_ZDOTXAXPYF_KERNEL_REF bli_zdotxaxpyf_ref_var2 // // Level-1v @@ -255,38 +255,38 @@ // addv kernels -#define BLIS_SADDV_KERNEL_REF bli_ssaddv_ref -#define BLIS_DADDV_KERNEL_REF bli_ddaddv_ref -#define BLIS_CADDV_KERNEL_REF bli_ccaddv_ref -#define BLIS_ZADDV_KERNEL_REF bli_zzaddv_ref +#define BLIS_SADDV_KERNEL_REF bli_saddv_ref +#define BLIS_DADDV_KERNEL_REF bli_daddv_ref +#define BLIS_CADDV_KERNEL_REF bli_caddv_ref +#define BLIS_ZADDV_KERNEL_REF bli_zaddv_ref // axpyv kernels -#define BLIS_SAXPYV_KERNEL_REF bli_sssaxpyv_ref -#define BLIS_DAXPYV_KERNEL_REF bli_dddaxpyv_ref -#define BLIS_CAXPYV_KERNEL_REF bli_cccaxpyv_ref -#define BLIS_ZAXPYV_KERNEL_REF bli_zzzaxpyv_ref +#define BLIS_SAXPYV_KERNEL_REF bli_saxpyv_ref +#define BLIS_DAXPYV_KERNEL_REF bli_daxpyv_ref +#define BLIS_CAXPYV_KERNEL_REF bli_caxpyv_ref +#define BLIS_ZAXPYV_KERNEL_REF bli_zaxpyv_ref // copyv kernels -#define BLIS_SCOPYV_KERNEL_REF bli_sscopyv_ref -#define BLIS_DCOPYV_KERNEL_REF bli_ddcopyv_ref -#define BLIS_CCOPYV_KERNEL_REF bli_cccopyv_ref -#define BLIS_ZCOPYV_KERNEL_REF bli_zzcopyv_ref +#define BLIS_SCOPYV_KERNEL_REF bli_scopyv_ref +#define BLIS_DCOPYV_KERNEL_REF bli_dcopyv_ref +#define BLIS_CCOPYV_KERNEL_REF bli_ccopyv_ref +#define BLIS_ZCOPYV_KERNEL_REF bli_zcopyv_ref // dotv kernels -#define BLIS_SDOTV_KERNEL_REF bli_sssdotv_ref -#define BLIS_DDOTV_KERNEL_REF bli_ddddotv_ref -#define BLIS_CDOTV_KERNEL_REF bli_cccdotv_ref -#define BLIS_ZDOTV_KERNEL_REF bli_zzzdotv_ref +#define BLIS_SDOTV_KERNEL_REF bli_sdotv_ref +#define BLIS_DDOTV_KERNEL_REF bli_ddotv_ref +#define BLIS_CDOTV_KERNEL_REF bli_cdotv_ref +#define BLIS_ZDOTV_KERNEL_REF bli_zdotv_ref // dotxv 
kernels -#define BLIS_SDOTXV_KERNEL_REF bli_sssdotxv_ref -#define BLIS_DDOTXV_KERNEL_REF bli_ddddotxv_ref -#define BLIS_CDOTXV_KERNEL_REF bli_cccdotxv_ref -#define BLIS_ZDOTXV_KERNEL_REF bli_zzzdotxv_ref +#define BLIS_SDOTXV_KERNEL_REF bli_sdotxv_ref +#define BLIS_DDOTXV_KERNEL_REF bli_ddotxv_ref +#define BLIS_CDOTXV_KERNEL_REF bli_cdotxv_ref +#define BLIS_ZDOTXV_KERNEL_REF bli_zdotxv_ref // invertv kernels @@ -297,38 +297,38 @@ // scal2v kernels -#define BLIS_SSCAL2V_KERNEL_REF bli_sssscal2v_ref -#define BLIS_DSCAL2V_KERNEL_REF bli_dddscal2v_ref -#define BLIS_CSCAL2V_KERNEL_REF bli_cccscal2v_ref -#define BLIS_ZSCAL2V_KERNEL_REF bli_zzzscal2v_ref +#define BLIS_SSCAL2V_KERNEL_REF bli_sscal2v_ref +#define BLIS_DSCAL2V_KERNEL_REF bli_dscal2v_ref +#define BLIS_CSCAL2V_KERNEL_REF bli_cscal2v_ref +#define BLIS_ZSCAL2V_KERNEL_REF bli_zscal2v_ref // scalv kernels -#define BLIS_SSCALV_KERNEL_REF bli_ssscalv_ref -#define BLIS_DSCALV_KERNEL_REF bli_ddscalv_ref -#define BLIS_CSCALV_KERNEL_REF bli_ccscalv_ref -#define BLIS_ZSCALV_KERNEL_REF bli_zzscalv_ref +#define BLIS_SSCALV_KERNEL_REF bli_sscalv_ref +#define BLIS_DSCALV_KERNEL_REF bli_dscalv_ref +#define BLIS_CSCALV_KERNEL_REF bli_cscalv_ref +#define BLIS_ZSCALV_KERNEL_REF bli_zscalv_ref // setv kernels -#define BLIS_SSETV_KERNEL_REF bli_sssetv_ref -#define BLIS_DSETV_KERNEL_REF bli_ddsetv_ref -#define BLIS_CSETV_KERNEL_REF bli_ccsetv_ref -#define BLIS_ZSETV_KERNEL_REF bli_zzsetv_ref +#define BLIS_SSETV_KERNEL_REF bli_ssetv_ref +#define BLIS_DSETV_KERNEL_REF bli_dsetv_ref +#define BLIS_CSETV_KERNEL_REF bli_csetv_ref +#define BLIS_ZSETV_KERNEL_REF bli_zsetv_ref // subv kernels -#define BLIS_SSUBV_KERNEL_REF bli_sssubv_ref -#define BLIS_DSUBV_KERNEL_REF bli_ddsubv_ref -#define BLIS_CSUBV_KERNEL_REF bli_ccsubv_ref -#define BLIS_ZSUBV_KERNEL_REF bli_zzsubv_ref +#define BLIS_SSUBV_KERNEL_REF bli_ssubv_ref +#define BLIS_DSUBV_KERNEL_REF bli_dsubv_ref +#define BLIS_CSUBV_KERNEL_REF bli_csubv_ref +#define BLIS_ZSUBV_KERNEL_REF 
bli_zsubv_ref // swapv kernels -#define BLIS_SSWAPV_KERNEL_REF bli_ssswapv_ref -#define BLIS_DSWAPV_KERNEL_REF bli_ddswapv_ref -#define BLIS_CSWAPV_KERNEL_REF bli_ccswapv_ref -#define BLIS_ZSWAPV_KERNEL_REF bli_zzswapv_ref +#define BLIS_SSWAPV_KERNEL_REF bli_sswapv_ref +#define BLIS_DSWAPV_KERNEL_REF bli_dswapv_ref +#define BLIS_CSWAPV_KERNEL_REF bli_cswapv_ref +#define BLIS_ZSWAPV_KERNEL_REF bli_zswapv_ref diff --git a/frame/include/bli_kernel_prototypes.h b/frame/include/bli_kernel_prototypes.h index 19f6378f4..6a61f484d 100644 --- a/frame/include/bli_kernel_prototypes.h +++ b/frame/include/bli_kernel_prototypes.h @@ -35,494 +35,130 @@ #ifndef BLIS_KERNEL_PROTOTYPES_H #define BLIS_KERNEL_PROTOTYPES_H - -// -- Define PASTEMAC-friendly kernel function name macros --------------------- +// Generate prototypes for level-3 micro-kernels. // // Level-3 // -// gemm micro-kernels +#define bli_sgemm_ukr_name BLIS_SGEMM_UKERNEL +#define bli_dgemm_ukr_name BLIS_DGEMM_UKERNEL +#define bli_cgemm_ukr_name BLIS_CGEMM_UKERNEL +#define bli_zgemm_ukr_name BLIS_ZGEMM_UKERNEL -#define bli_sGEMM_UKERNEL BLIS_SGEMM_UKERNEL -#define bli_dGEMM_UKERNEL BLIS_DGEMM_UKERNEL -#define bli_cGEMM_UKERNEL BLIS_CGEMM_UKERNEL -#define bli_zGEMM_UKERNEL BLIS_ZGEMM_UKERNEL +#define bli_sgemmtrsm_l_ukr_name BLIS_SGEMMTRSM_L_UKERNEL +#define bli_dgemmtrsm_l_ukr_name BLIS_DGEMMTRSM_L_UKERNEL +#define bli_cgemmtrsm_l_ukr_name BLIS_CGEMMTRSM_L_UKERNEL +#define bli_zgemmtrsm_l_ukr_name BLIS_ZGEMMTRSM_L_UKERNEL -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); +#define bli_sgemmtrsm_u_ukr_name BLIS_SGEMMTRSM_U_UKERNEL +#define bli_dgemmtrsm_u_ukr_name BLIS_DGEMMTRSM_U_UKERNEL +#define bli_cgemmtrsm_u_ukr_name BLIS_CGEMMTRSM_U_UKERNEL +#define bli_zgemmtrsm_u_ukr_name 
BLIS_ZGEMMTRSM_U_UKERNEL -INSERT_GENTPROT_BASIC( GEMM_UKERNEL ) +#define bli_strsm_l_ukr_name BLIS_STRSM_L_UKERNEL +#define bli_dtrsm_l_ukr_name BLIS_DTRSM_L_UKERNEL +#define bli_ctrsm_l_ukr_name BLIS_CTRSM_L_UKERNEL +#define bli_ztrsm_l_ukr_name BLIS_ZTRSM_L_UKERNEL -// gemmtrsm_l micro-kernels - -#define bli_sGEMMTRSM_L_UKERNEL BLIS_SGEMMTRSM_L_UKERNEL -#define bli_dGEMMTRSM_L_UKERNEL BLIS_DGEMMTRSM_L_UKERNEL -#define bli_cGEMMTRSM_L_UKERNEL BLIS_CGEMMTRSM_L_UKERNEL -#define bli_zGEMMTRSM_L_UKERNEL BLIS_ZGEMMTRSM_L_UKERNEL - -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a10, \ - ctype* restrict a11, \ - ctype* restrict b01, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( GEMMTRSM_L_UKERNEL ) - -// gemmtrsm_u micro-kernels - -#define bli_sGEMMTRSM_U_UKERNEL BLIS_SGEMMTRSM_U_UKERNEL -#define bli_dGEMMTRSM_U_UKERNEL BLIS_DGEMMTRSM_U_UKERNEL -#define bli_cGEMMTRSM_U_UKERNEL BLIS_CGEMMTRSM_U_UKERNEL -#define bli_zGEMMTRSM_U_UKERNEL BLIS_ZGEMMTRSM_U_UKERNEL - -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a12, \ - ctype* restrict a11, \ - ctype* restrict b21, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( GEMMTRSM_U_UKERNEL ) - -// trsm_l micro-kernels - -#define bli_sTRSM_L_UKERNEL BLIS_STRSM_L_UKERNEL -#define bli_dTRSM_L_UKERNEL BLIS_DTRSM_L_UKERNEL -#define bli_cTRSM_L_UKERNEL BLIS_CTRSM_L_UKERNEL -#define bli_zTRSM_L_UKERNEL BLIS_ZTRSM_L_UKERNEL - -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - ctype* restrict a11, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( 
TRSM_L_UKERNEL ) - -// trsm_u micro-kernels - -#define bli_sTRSM_U_UKERNEL BLIS_STRSM_U_UKERNEL -#define bli_dTRSM_U_UKERNEL BLIS_DTRSM_U_UKERNEL -#define bli_cTRSM_U_UKERNEL BLIS_CTRSM_U_UKERNEL -#define bli_zTRSM_U_UKERNEL BLIS_ZTRSM_U_UKERNEL - -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - ctype* restrict a11, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( TRSM_U_UKERNEL ) - - -// -// Level-1m -// - -// NOTE: We don't need any PASTEMAC-friendly aliases to packm kernel -// macros because they are used directly in the initialization of the -// function pointer array, rather than via a templatizing wrapper macro. +#define bli_strsm_u_ukr_name BLIS_STRSM_U_UKERNEL +#define bli_dtrsm_u_ukr_name BLIS_DTRSM_U_UKERNEL +#define bli_ctrsm_u_ukr_name BLIS_CTRSM_U_UKERNEL +#define bli_ztrsm_u_ukr_name BLIS_ZTRSM_U_UKERNEL +#include "bli_l3_ukr.h" // // Level-1f // -// axpy2v kernels +#define bli_saxpy2v_ker_name BLIS_SAXPY2V_KERNEL +#define bli_daxpy2v_ker_name BLIS_DAXPY2V_KERNEL +#define bli_caxpy2v_ker_name BLIS_CAXPY2V_KERNEL +#define bli_zaxpy2v_ker_name BLIS_ZAXPY2V_KERNEL -#define bli_sssAXPY2V_KERNEL BLIS_SAXPY2V_KERNEL -#define bli_dddAXPY2V_KERNEL BLIS_DAXPY2V_KERNEL -#define bli_cccAXPY2V_KERNEL BLIS_CAXPY2V_KERNEL -#define bli_zzzAXPY2V_KERNEL BLIS_ZAXPY2V_KERNEL +#define bli_sdotaxpyv_ker_name BLIS_SDOTAXPYV_KERNEL +#define bli_ddotaxpyv_ker_name BLIS_DDOTAXPYV_KERNEL +#define bli_cdotaxpyv_ker_name BLIS_CDOTAXPYV_KERNEL +#define bli_zdotaxpyv_ker_name BLIS_ZDOTAXPYV_KERNEL -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, kername ) \ -\ -void PASTEMAC3(chx,chy,chz,kername) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* restrict alpha1, \ - ctype_xy* restrict alpha2, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_z* 
restrict z, inc_t incz \ - ); +#define bli_sdotxf_ker_name BLIS_SDOTXF_KERNEL +#define bli_ddotxf_ker_name BLIS_DDOTXF_KERNEL +#define bli_cdotxf_ker_name BLIS_CDOTXF_KERNEL +#define bli_zdotxf_ker_name BLIS_ZDOTXF_KERNEL -INSERT_GENTPROT3U12_BASIC( AXPY2V_KERNEL ) +#define bli_saxpyf_ker_name BLIS_SAXPYF_KERNEL +#define bli_daxpyf_ker_name BLIS_DAXPYF_KERNEL +#define bli_caxpyf_ker_name BLIS_CAXPYF_KERNEL +#define bli_zaxpyf_ker_name BLIS_ZAXPYF_KERNEL -// dotaxpyv kernels - -#define bli_sssDOTAXPYV_KERNEL BLIS_SDOTAXPYV_KERNEL -#define bli_dddDOTAXPYV_KERNEL BLIS_DDOTAXPYV_KERNEL -#define bli_cccDOTAXPYV_KERNEL BLIS_CDOTAXPYV_KERNEL -#define bli_zzzDOTAXPYV_KERNEL BLIS_ZDOTAXPYV_KERNEL - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, kername ) \ -\ -void PASTEMAC3(chx,chy,chz,kername) \ - ( \ - conj_t conjxt, \ - conj_t conjx, \ - conj_t conjy, \ - dim_t m, \ - ctype_x* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_xy* restrict rho, \ - ctype_z* restrict z, inc_t incz \ - ); - -INSERT_GENTPROT3U12_BASIC( DOTAXPYV_KERNEL ) - -// axpyf kernels - -#define bli_sssAXPYF_KERNEL BLIS_SAXPYF_KERNEL -#define bli_dddAXPYF_KERNEL BLIS_DAXPYF_KERNEL -#define bli_cccAXPYF_KERNEL BLIS_CAXPYF_KERNEL -#define bli_zzzAXPYF_KERNEL BLIS_ZAXPYF_KERNEL - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, kername ) \ -\ -void PASTEMAC3(cha,chx,chy,kername) \ - ( \ - conj_t conja, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( AXPYF_KERNEL ) - -// dotxf kernels - -#define bli_sssDOTXF_KERNEL BLIS_SDOTXF_KERNEL -#define bli_dddDOTXF_KERNEL BLIS_DDOTXF_KERNEL -#define bli_cccDOTXF_KERNEL BLIS_CDOTXF_KERNEL -#define bli_zzzDOTXF_KERNEL 
BLIS_ZDOTXF_KERNEL - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, kername ) \ -\ -void PASTEMAC3(cha,chx,chy,kername) \ - ( \ - conj_t conjat, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ax* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict beta, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT3U12_BASIC( DOTXF_KERNEL ) - -// dotxaxpyf kernels - -#define bli_sssDOTXAXPYF_KERNEL BLIS_SDOTXAXPYF_KERNEL -#define bli_dddDOTXAXPYF_KERNEL BLIS_DDOTXAXPYF_KERNEL -#define bli_cccDOTXAXPYF_KERNEL BLIS_CDOTXAXPYF_KERNEL -#define bli_zzzDOTXAXPYF_KERNEL BLIS_ZDOTXAXPYF_KERNEL - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, kername ) \ -\ -void PASTEMAC3(cha,chb,chc,kername) \ - ( \ - conj_t conjat, \ - conj_t conja, \ - conj_t conjw, \ - conj_t conjx, \ - dim_t m, \ - dim_t b_n, \ - ctype_ab* restrict alpha, \ - ctype_a* restrict a, inc_t inca, inc_t lda, \ - ctype_b* restrict w, inc_t incw, \ - ctype_b* restrict x, inc_t incx, \ - ctype_c* restrict beta, \ - ctype_c* restrict y, inc_t incy, \ - ctype_c* restrict z, inc_t incz \ - ); - -INSERT_GENTPROT3U12_BASIC( DOTXAXPYF_KERNEL ) +#define bli_sdotxaxpyf_ker_name BLIS_SDOTXAXPYF_KERNEL +#define bli_ddotxaxpyf_ker_name BLIS_DDOTXAXPYF_KERNEL +#define bli_cdotxaxpyf_ker_name BLIS_CDOTXAXPYF_KERNEL +#define bli_zdotxaxpyf_ker_name BLIS_ZDOTXAXPYF_KERNEL +#include "bli_l1f_ker.h" // // Level-1v // -// addv kernels +#define bli_saddv_ker_name BLIS_SADDV_KERNEL +#define bli_daddv_ker_name BLIS_DADDV_KERNEL +#define bli_caddv_ker_name BLIS_CADDV_KERNEL +#define bli_zaddv_ker_name BLIS_ZADDV_KERNEL -#define bli_ssADDV_KERNEL BLIS_SADDV_KERNEL -#define bli_ddADDV_KERNEL BLIS_DADDV_KERNEL -#define bli_ccADDV_KERNEL BLIS_CADDV_KERNEL -#define bli_zzADDV_KERNEL BLIS_ZADDV_KERNEL +#define bli_saxpyv_ker_name BLIS_SAXPYV_KERNEL 
+#define bli_daxpyv_ker_name BLIS_DAXPYV_KERNEL +#define bli_caxpyv_ker_name BLIS_CAXPYV_KERNEL +#define bli_zaxpyv_ker_name BLIS_ZAXPYV_KERNEL -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ -\ -void PASTEMAC2(chx,chy,kername) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); +#define bli_scopyv_ker_name BLIS_SCOPYV_KERNEL +#define bli_dcopyv_ker_name BLIS_DCOPYV_KERNEL +#define bli_ccopyv_ker_name BLIS_CCOPYV_KERNEL +#define bli_zcopyv_ker_name BLIS_ZCOPYV_KERNEL -INSERT_GENTPROT2_BASIC( ADDV_KERNEL ) +#define bli_sdotv_ker_name BLIS_SDOTV_KERNEL +#define bli_ddotv_ker_name BLIS_DDOTV_KERNEL +#define bli_cdotv_ker_name BLIS_CDOTV_KERNEL +#define bli_zdotv_ker_name BLIS_ZDOTV_KERNEL -// axpyv kernels +#define bli_sdotxv_ker_name BLIS_SDOTXV_KERNEL +#define bli_ddotxv_ker_name BLIS_DDOTXV_KERNEL +#define bli_cdotxv_ker_name BLIS_CDOTXV_KERNEL +#define bli_zdotxv_ker_name BLIS_ZDOTXV_KERNEL -#define bli_sssAXPYV_KERNEL BLIS_SAXPYV_KERNEL -#define bli_dddAXPYV_KERNEL BLIS_DAXPYV_KERNEL -#define bli_cccAXPYV_KERNEL BLIS_CAXPYV_KERNEL -#define bli_zzzAXPYV_KERNEL BLIS_ZAXPYV_KERNEL +#define bli_sinvertv_ker_name BLIS_SINVERTV_KERNEL +#define bli_dinvertv_ker_name BLIS_DINVERTV_KERNEL +#define bli_cinvertv_ker_name BLIS_CINVERTV_KERNEL +#define bli_zinvertv_ker_name BLIS_ZINVERTV_KERNEL -#undef GENTPROT3 -#define GENTPROT3( ctype_a, ctype_x, ctype_y, cha, chx, chy, kername ) \ -\ -void PASTEMAC3(cha,chx,chy,kername) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_a* restrict alpha, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); +#define bli_sscalv_ker_name BLIS_SSCALV_KERNEL +#define bli_dscalv_ker_name BLIS_DSCALV_KERNEL +#define bli_cscalv_ker_name BLIS_CSCALV_KERNEL +#define bli_zscalv_ker_name BLIS_ZSCALV_KERNEL -INSERT_GENTPROT3_BASIC( AXPYV_KERNEL ) +#define bli_sscal2v_ker_name BLIS_SSCAL2V_KERNEL +#define bli_dscal2v_ker_name 
BLIS_DSCAL2V_KERNEL +#define bli_cscal2v_ker_name BLIS_CSCAL2V_KERNEL +#define bli_zscal2v_ker_name BLIS_ZSCAL2V_KERNEL -// copyv kernels +#define bli_ssetv_ker_name BLIS_SSETV_KERNEL +#define bli_dsetv_ker_name BLIS_DSETV_KERNEL +#define bli_csetv_ker_name BLIS_CSETV_KERNEL +#define bli_zsetv_ker_name BLIS_ZSETV_KERNEL -#define bli_ssCOPYV_KERNEL BLIS_SCOPYV_KERNEL -#define bli_ddCOPYV_KERNEL BLIS_DCOPYV_KERNEL -#define bli_ccCOPYV_KERNEL BLIS_CCOPYV_KERNEL -#define bli_zzCOPYV_KERNEL BLIS_ZCOPYV_KERNEL +#define bli_ssubv_ker_name BLIS_SSUBV_KERNEL +#define bli_dsubv_ker_name BLIS_DSUBV_KERNEL +#define bli_csubv_ker_name BLIS_CSUBV_KERNEL +#define bli_zsubv_ker_name BLIS_ZSUBV_KERNEL -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ -\ -void PASTEMAC2(chx,chy,kername) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( COPYV_KERNEL ) - -// dotv kernels - -#define bli_sssDOTV_KERNEL BLIS_SDOTV_KERNEL -#define bli_dddDOTV_KERNEL BLIS_DDOTV_KERNEL -#define bli_cccDOTV_KERNEL BLIS_CDOTV_KERNEL -#define bli_zzzDOTV_KERNEL BLIS_ZDOTV_KERNEL - -#undef GENTPROT3 -#define GENTPROT3( ctype_x, ctype_y, ctype_r, chx, chy, chr, kername ) \ -\ -void PASTEMAC3(chx,chy,chr,kername) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict rho \ - ); - -INSERT_GENTPROT3_BASIC( DOTV_KERNEL ) - -// dotxv kernels - -#define bli_sssDOTXV_KERNEL BLIS_SDOTXV_KERNEL -#define bli_dddDOTXV_KERNEL BLIS_DDOTXV_KERNEL -#define bli_cccDOTXV_KERNEL BLIS_CDOTXV_KERNEL -#define bli_zzzDOTXV_KERNEL BLIS_ZDOTXV_KERNEL - -#undef GENTPROT3U12 -#define GENTPROT3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, kername ) \ -\ -void PASTEMAC3(chx,chy,chr,kername) \ - ( \ - conj_t conjx, \ - conj_t conjy, \ - dim_t n, \ - ctype_xy* restrict alpha, \ - ctype_x* restrict x, inc_t 
incx, \ - ctype_y* restrict y, inc_t incy, \ - ctype_r* restrict beta, \ - ctype_r* restrict rho \ - ); - -INSERT_GENTPROT3U12_BASIC( DOTXV_KERNEL ) - -// invertv kernels - -#define bli_sINVERTV_KERNEL BLIS_SINVERTV_KERNEL -#define bli_dINVERTV_KERNEL BLIS_DINVERTV_KERNEL -#define bli_cINVERTV_KERNEL BLIS_CINVERTV_KERNEL -#define bli_zINVERTV_KERNEL BLIS_ZINVERTV_KERNEL - -#undef GENTPROT -#define GENTPROT( ctype, ch, kername ) \ -\ -void PASTEMAC(ch,kername) \ - ( \ - dim_t n, \ - ctype* restrict x, inc_t incx \ - ); - -INSERT_GENTPROT_BASIC( INVERTV_KERNEL ) - -// scal2v kernels - -#define bli_sssSCAL2V_KERNEL BLIS_SSCAL2V_KERNEL -#define bli_dddSCAL2V_KERNEL BLIS_DSCAL2V_KERNEL -#define bli_cccSCAL2V_KERNEL BLIS_CSCAL2V_KERNEL -#define bli_zzzSCAL2V_KERNEL BLIS_ZSCAL2V_KERNEL - -#undef GENTPROT3 -#define GENTPROT3( ctype_b, ctype_x, ctype_y, chb, chx, chy, kername ) \ -\ -void PASTEMAC3(chb,chx,chy,kername) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT3_BASIC( SCAL2V_KERNEL ) - -// scalv kernels - -#define bli_ssSCALV_KERNEL BLIS_SSCALV_KERNEL -#define bli_ddSCALV_KERNEL BLIS_DSCALV_KERNEL -#define bli_ccSCALV_KERNEL BLIS_CSCALV_KERNEL -#define bli_zzSCALV_KERNEL BLIS_ZSCALV_KERNEL - -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, kername ) \ -\ -void PASTEMAC2(chb,chx,kername) \ - ( \ - conj_t conjbeta, \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ - ); - -INSERT_GENTPROT2_BASIC( SCALV_KERNEL ) - -// setv kernels - -#define bli_ssSETV_KERNEL BLIS_SSETV_KERNEL -#define bli_ddSETV_KERNEL BLIS_DSETV_KERNEL -#define bli_ccSETV_KERNEL BLIS_CSETV_KERNEL -#define bli_zzSETV_KERNEL BLIS_ZSETV_KERNEL - -#undef GENTPROT2 -#define GENTPROT2( ctype_b, ctype_x, chb, chx, kername ) \ -\ -void PASTEMAC2(chb,chx,kername) \ - ( \ - dim_t n, \ - ctype_b* restrict beta, \ - ctype_x* restrict x, inc_t incx \ - ); - 
-INSERT_GENTPROT2_BASIC( SETV_KERNEL ) - -// subv kernels - -#define bli_ssSUBV_KERNEL BLIS_SSUBV_KERNEL -#define bli_ddSUBV_KERNEL BLIS_DSUBV_KERNEL -#define bli_ccSUBV_KERNEL BLIS_CSUBV_KERNEL -#define bli_zzSUBV_KERNEL BLIS_ZSUBV_KERNEL - -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ -\ -void PASTEMAC2(chx,chy,kername) \ - ( \ - conj_t conjx, \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( SUBV_KERNEL ) - -// swapv kernels - -#define bli_ssSWAPV_KERNEL BLIS_SSWAPV_KERNEL -#define bli_ddSWAPV_KERNEL BLIS_DSWAPV_KERNEL -#define bli_ccSWAPV_KERNEL BLIS_CSWAPV_KERNEL -#define bli_zzSWAPV_KERNEL BLIS_ZSWAPV_KERNEL - -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ -\ -void PASTEMAC2(chx,chy,kername) \ - ( \ - dim_t n, \ - ctype_x* restrict x, inc_t incx, \ - ctype_y* restrict y, inc_t incy \ - ); - -INSERT_GENTPROT2_BASIC( SWAPV_KERNEL ) +#define bli_sswapv_ker_name BLIS_SSWAPV_KERNEL +#define bli_dswapv_ker_name BLIS_DSWAPV_KERNEL +#define bli_cswapv_ker_name BLIS_CSWAPV_KERNEL +#define bli_zswapv_ker_name BLIS_ZSWAPV_KERNEL +#include "bli_l1v_ker.h" #endif diff --git a/frame/include/bli_level3_type_defs.h b/frame/include/bli_level3_type_defs.h deleted file mode 100644 index 62446b836..000000000 --- a/frame/include/bli_level3_type_defs.h +++ /dev/null @@ -1,119 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#ifndef BLIS_LEVEL3_TYPE_DEFS_H -#define BLIS_LEVEL3_TYPE_DEFS_H - - -// -// -- BLIS level-3 operation types --------------------------------------------- -// - -// Here we generate typedef statements that generate function pointer types -// for all defined level-3 operations. 
- - - -// -- gemm -- - -#undef GENTDEF -#define GENTDEF( opname ) \ -\ -typedef void \ -(*PASTECH(opname,_fp_t))( \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* b, \ - obj_t* beta, \ - obj_t* c \ - ); -GENTDEF( gemm ) -GENTDEF( her2k ) -GENTDEF( syr2k ) - - -// -- hemm/symm/trmm3 -- - -#undef GENTDEF -#define GENTDEF( opname ) \ -\ -typedef void \ -(*PASTECH(opname,_fp_t))( \ - side_t side, \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* b, \ - obj_t* beta, \ - obj_t* c \ - ); -GENTDEF( hemm ) -GENTDEF( symm ) -GENTDEF( trmm3 ) - - -// -- herk/syrk -- - -#undef GENTDEF -#define GENTDEF( opname ) \ -\ -typedef void \ -(*PASTECH(opname,_fp_t))( \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* beta, \ - obj_t* c \ - ); -GENTDEF( herk ) -GENTDEF( syrk ) - - -// -- trmm/trsm -- - -#undef GENTDEF -#define GENTDEF( opname ) \ -\ -typedef void \ -(*PASTECH(opname,_fp_t))( \ - side_t side, \ - obj_t* alpha, \ - obj_t* a, \ - obj_t* b \ - ); -GENTDEF( trmm ) -GENTDEF( trsm ) - - - -#endif - diff --git a/frame/include/bli_macro_defs.h b/frame/include/bli_macro_defs.h index 797c5d611..01cf44e79 100644 --- a/frame/include/bli_macro_defs.h +++ b/frame/include/bli_macro_defs.h @@ -116,6 +116,7 @@ // -- Include other groups of macros #include "bli_genarray_macro_defs.h" +#include "bli_gentdef_macro_defs.h" #include "bli_gentfunc_macro_defs.h" #include "bli_gentprot_macro_defs.h" diff --git a/frame/ind/cntl/bli_gemmind_cntl.h b/frame/include/bli_oapi_w_cntx.h similarity index 73% rename from frame/ind/cntl/bli_gemmind_cntl.h rename to frame/include/bli_oapi_w_cntx.h index 8e7b43592..a187d4913 100644 --- a/frame/ind/cntl/bli_gemmind_cntl.h +++ b/frame/include/bli_oapi_w_cntx.h @@ -32,24 +32,22 @@ */ -void bli_gemm3mh_cntl_init( void ); -void bli_gemm3mh_cntl_finalize( void ); -void bli_gemm3m3_cntl_init( void ); -void bli_gemm3m3_cntl_finalize( void ); +// This file defines macros used to allow the _oapi.c files to +// produce object APIs that contain context parameters. 
-void bli_gemm3m2_cntl_init( void ); -void bli_gemm3m2_cntl_finalize( void ); +// Define the macro to add a suffix to the object function names. +// We use "ex" for "expert". +#undef EX_SUF +#define EX_SUF _ex -void bli_gemm3m1_cntl_init( void ); -void bli_gemm3m1_cntl_finalize( void ); +// Define the macro to add cntx_t* arguments to function signatures +// and prototypes. +#undef BLIS_OAPI_CNTX_PARAM +#define BLIS_OAPI_CNTX_PARAM ,cntx_t* cntx -void bli_gemm4mh_cntl_init( void ); -void bli_gemm4mh_cntl_finalize( void ); - -void bli_gemm4mb_cntl_init( void ); -void bli_gemm4mb_cntl_finalize( void ); - -void bli_gemm4m1_cntl_init( void ); -void bli_gemm4m1_cntl_finalize( void ); +// Define the macro to omit the cntx_t declaration block, since it is +// not needed when cntx_t's are passed in through the API. +#undef BLIS_OAPI_CNTX_DECL +#define BLIS_OAPI_CNTX_DECL diff --git a/frame/include/bli_oapi_wo_cntx.h b/frame/include/bli_oapi_wo_cntx.h new file mode 100644 index 000000000..e4158b0f6 --- /dev/null +++ b/frame/include/bli_oapi_wo_cntx.h @@ -0,0 +1,51 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// This file defines macros used to allow the _oapi.c files to +// produce object APIs that omit context parameters. + +// Define the macro to remove the function name suffix. +#undef EX_SUF +#define EX_SUF + +// Define the macro to omit cntx_t* arguments from function signatures +// and prototypes. +#undef BLIS_OAPI_CNTX_PARAM +#define BLIS_OAPI_CNTX_PARAM + +// Define the macro to declare a local cntx_t pointer that is initialized +// to NULL. 
+#undef BLIS_OAPI_CNTX_DECL +#define BLIS_OAPI_CNTX_DECL cntx_t* cntx = NULL; + diff --git a/frame/include/bli_obj_macro_defs.h b/frame/include/bli_obj_macro_defs.h index 5f126c987..30c72e735 100644 --- a/frame/include/bli_obj_macro_defs.h +++ b/frame/include/bli_obj_macro_defs.h @@ -213,36 +213,6 @@ \ ( ( (obj).info & BLIS_PACK_PANEL_BIT ) ) -#define bli_obj_is_4mi_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_4MI ) - -#define bli_obj_is_3mi_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_3MI ) - -#define bli_obj_is_3ms_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_3MS ) - -#define bli_obj_is_ro_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_RO ) - -#define bli_obj_is_io_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_IO ) - -#define bli_obj_is_rpi_packed( obj ) \ -\ - ( ( (obj).info & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_RPI ) - -#define bli_obj_is_rih_packed( obj ) \ -\ - ( bli_obj_is_ro_packed( obj ) || \ - bli_obj_is_io_packed( obj ) || \ - bli_obj_is_rpi_packed( obj ) ) - #define bli_obj_pack_buffer_type( obj ) \ \ ( (obj).info & BLIS_PACK_BUFFER_BITS ) @@ -981,9 +951,9 @@ bli_obj_width_stored( obj ) #define bli_obj_init_pack( obj_p ) \ { \ - mem_t* pack_mem = bli_obj_pack_mem( *obj_p ); \ + mem_t* pack_mem_ = bli_obj_pack_mem( *obj_p ); \ \ - bli_mem_set_buffer( NULL, pack_mem ); \ + bli_mem_set_buffer( NULL, pack_mem_ ); \ } @@ -991,10 +961,10 @@ bli_obj_width_stored( obj ) #define bli_obj_release_pack( obj_p ) \ { \ - mem_t* pack_mem = bli_obj_pack_mem( *(obj_p) ); \ + mem_t* pack_mem_ = bli_obj_pack_mem( *(obj_p) ); \ \ - if ( bli_mem_is_alloc( pack_mem ) ) \ - bli_mem_release( pack_mem ); \ + if ( bli_mem_is_alloc( pack_mem_ ) ) \ + bli_mem_release( pack_mem_ ); \ } @@ -1033,8 +1003,8 @@ bli_obj_width_stored( obj ) #define bli_obj_swap( a, b ) \ { \ - obj_t t; \ - t = b; b = a; a = t; \ + obj_t t_; \ + t_ 
= b; b = a; a = t_; \ } @@ -1042,8 +1012,8 @@ bli_obj_width_stored( obj ) #define bli_obj_swap_pointers( a, b ) \ { \ - obj_t* t; \ - t = b; b = a; a = t; \ + obj_t* t_; \ + t_ = b; b = a; a = t_; \ } @@ -1053,18 +1023,18 @@ bli_obj_width_stored( obj ) #define bli_obj_induce_trans( obj ) \ { \ { \ - dim_t m = bli_obj_length( obj ); \ - dim_t n = bli_obj_width( obj ); \ - inc_t rs = bli_obj_row_stride( obj ); \ - inc_t cs = bli_obj_col_stride( obj ); \ - dim_t offm = bli_obj_row_off( obj ); \ - dim_t offn = bli_obj_col_off( obj ); \ - doff_t diag_off = bli_obj_diag_offset( obj ); \ + dim_t m_ = bli_obj_length( obj ); \ + dim_t n_ = bli_obj_width( obj ); \ + inc_t rs_ = bli_obj_row_stride( obj ); \ + inc_t cs_ = bli_obj_col_stride( obj ); \ + dim_t offm_ = bli_obj_row_off( obj ); \ + dim_t offn_ = bli_obj_col_off( obj ); \ + doff_t diag_off_ = bli_obj_diag_offset( obj ); \ \ - bli_obj_set_dims( n, m, obj ); \ - bli_obj_set_strides( cs, rs, obj ); \ - bli_obj_set_offs( offn, offm, obj ); \ - bli_obj_set_diag_offset( -diag_off, obj ); \ + bli_obj_set_dims( n_, m_, obj ); \ + bli_obj_set_strides( cs_, rs_, obj ); \ + bli_obj_set_offs( offn_, offm_, obj ); \ + bli_obj_set_diag_offset( -diag_off_, obj ); \ \ if ( bli_obj_is_upper_or_lower( obj ) ) \ bli_obj_toggle_uplo( obj ); \ @@ -1087,15 +1057,15 @@ bli_obj_width_stored( obj ) #define bli_obj_reflect_about_diag( obj ) \ { \ { \ - dim_t m = bli_obj_length( obj ); \ - dim_t n = bli_obj_width( obj ); \ - dim_t offm = bli_obj_row_off( obj ); \ - dim_t offn = bli_obj_col_off( obj ); \ - doff_t diag_off = bli_obj_diag_offset( obj ); \ + dim_t m_ = bli_obj_length( obj ); \ + dim_t n_ = bli_obj_width( obj ); \ + dim_t offm_ = bli_obj_row_off( obj ); \ + dim_t offn_ = bli_obj_col_off( obj ); \ + doff_t diag_off_ = bli_obj_diag_offset( obj ); \ \ - bli_obj_set_dims( n, m, obj ); \ - bli_obj_set_offs( offn, offm, obj ); \ - bli_obj_set_diag_offset( -diag_off, obj ); \ + bli_obj_set_dims( n_, m_, obj ); \ + bli_obj_set_offs( 
offn_, offm_, obj ); \ + bli_obj_set_diag_offset( -diag_off_, obj ); \ \ bli_obj_toggle_trans( obj ); \ } \ diff --git a/frame/include/bli_param_macro_defs.h b/frame/include/bli_param_macro_defs.h index 137fa4184..54cba702e 100644 --- a/frame/include/bli_param_macro_defs.h +++ b/frame/include/bli_param_macro_defs.h @@ -538,10 +538,10 @@ #define bli_reverse_index_direction( start, end, n ) \ { \ - dim_t start2 = n - start; \ - dim_t end2 = n - end; \ - start = end2; \ - end = start2; \ + dim_t start2_ = n - start; \ + dim_t end2_ = n - end; \ + start = end2_; \ + end = start2_; \ } #define bli_reflect_about_diag( diagoff, uplo, m, n ) \ @@ -662,6 +662,11 @@ \ ( ( schema & BLIS_PACK_FORMAT_BITS ) != 0 ) +#define bli_pack_schema_index( schema ) \ +\ + ( ( (schema) & BLIS_PACK_FORMAT_BITS ) >> BLIS_PACK_FORMAT_SHIFT ) + + // pointer-related @@ -675,6 +680,14 @@ ); \ } +#define bli_is_null( p ) \ +\ + ( p == NULL ) + +#define bli_is_nonnull( p ) \ +\ + ( p != NULL ) + // return datatype for char @@ -766,69 +779,69 @@ } \ else \ { \ - doff_t diagoffa_use = diagoffa; \ - doff_t diagoff_eff; \ - dim_t n_iter_max; \ + doff_t diagoffa_use_ = diagoffa; \ + doff_t diagoff_eff_; \ + dim_t n_iter_max_; \ \ if ( bli_is_unit_diag( diaga ) ) \ - bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use ); \ + bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use_ ); \ \ /* If matrix A is entirely "stored", that is, if either: - A is upper-stored and entirely above the diagonal, or - A is lower-stored and entirely below the diagonal then we mark the storage as dense. 
*/ \ - if ( bli_is_stored_subpart( diagoffa_use, BLIS_NO_TRANSPOSE, uploa, m, n ) ) \ + if ( bli_is_stored_subpart( diagoffa_use_, BLIS_NO_TRANSPOSE, uploa, m, n ) ) \ uploa = BLIS_DENSE; \ \ - n_iter_max = n; \ + n_iter_max_ = n; \ n_elem_max = m; \ inca = rs_a; \ lda = cs_a; \ uplo_eff = uploa; \ - diagoff_eff = diagoffa_use; \ + diagoff_eff_ = diagoffa_use_; \ \ - if ( bli_is_row_tilted( n_elem_max, n_iter_max, inca, lda ) ) \ + if ( bli_is_row_tilted( n_elem_max, n_iter_max_, inca, lda ) ) \ { \ - bli_swap_dims( n_iter_max, n_elem_max ); \ + bli_swap_dims( n_iter_max_, n_elem_max ); \ bli_swap_incs( inca, lda ); \ bli_toggle_uplo( uplo_eff ); \ - bli_negate_diag_offset( diagoff_eff ); \ + bli_negate_diag_offset( diagoff_eff_ ); \ } \ \ if ( bli_is_dense( uplo_eff ) ) \ { \ - n_iter = n_iter_max; \ + n_iter = n_iter_max_; \ } \ else if ( bli_is_upper( uplo_eff ) ) \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ ij0 = 0; \ - n_shift = -diagoff_eff; \ + n_shift = -diagoff_eff_; \ n_elem_max = bli_min( n_elem_max, n_shift + bli_min( m, n ) ); \ - n_iter = n_iter_max; \ + n_iter = n_iter_max_; \ } \ else \ { \ - ij0 = diagoff_eff; \ + ij0 = diagoff_eff_; \ n_shift = 0; \ - n_iter = n_iter_max - diagoff_eff; \ + n_iter = n_iter_max_ - diagoff_eff_; \ } \ } \ else /* if ( bli_is_lower( uplo_eff ) ) */ \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ - ij0 = -diagoff_eff; \ + ij0 = -diagoff_eff_; \ n_shift = 0; \ - n_elem_max = n_elem_max + diagoff_eff; \ + n_elem_max = n_elem_max + diagoff_eff_; \ n_iter = bli_min( n_elem_max, bli_min( m, n ) ); \ } \ else \ { \ ij0 = 0; \ - n_shift = diagoff_eff; \ - n_iter = bli_min( n_iter_max, n_shift + bli_min( m, n ) ); \ + n_shift = diagoff_eff_; \ + n_iter = bli_min( n_iter_max_, n_shift + bli_min( m, n ) ); \ } \ } \ } \ @@ -859,61 +872,61 @@ } \ else \ { \ - doff_t diagoffa_use = diagoffa; \ - doff_t diagoff_eff; \ - dim_t n_iter_max; \ + doff_t diagoffa_use_ = diagoffa; \ + doff_t 
diagoff_eff_; \ + dim_t n_iter_max_; \ \ if ( bli_is_unit_diag( diaga ) ) \ - bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use ); \ + bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use_ ); \ \ /* If matrix A is entirely "stored", that is, if either: - A is upper-stored and entirely above the diagonal, or - A is lower-stored and entirely below the diagonal then we mark the storage as dense. */ \ - if ( bli_is_stored_subpart( diagoffa_use, BLIS_NO_TRANSPOSE, uploa, m, n ) ) \ + if ( bli_is_stored_subpart( diagoffa_use_, BLIS_NO_TRANSPOSE, uploa, m, n ) ) \ uploa = BLIS_DENSE; \ \ - n_iter_max = n; \ + n_iter_max_ = n; \ n_elem_max = m; \ inca = rs_a; \ lda = cs_a; \ uplo_eff = uploa; \ - diagoff_eff = diagoffa_use; \ + diagoff_eff_ = diagoffa_use_; \ \ if ( bli_is_dense( uplo_eff ) ) \ { \ - n_iter = n_iter_max; \ + n_iter = n_iter_max_; \ } \ else if ( bli_is_upper( uplo_eff ) ) \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ ij0 = 0; \ - n_shift = -diagoff_eff; \ + n_shift = -diagoff_eff_; \ n_elem_max = bli_min( n_elem_max, n_shift + bli_min( m, n ) ); \ - n_iter = n_iter_max; \ + n_iter = n_iter_max_; \ } \ else \ { \ - ij0 = diagoff_eff; \ + ij0 = diagoff_eff_; \ n_shift = 0; \ - n_iter = n_iter_max - diagoff_eff; \ + n_iter = n_iter_max_ - diagoff_eff_; \ } \ } \ else /* if ( bli_is_lower( uplo_eff ) ) */ \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ - ij0 = -diagoff_eff; \ + ij0 = -diagoff_eff_; \ n_shift = 0; \ - n_elem_max = n_elem_max + diagoff_eff; \ + n_elem_max = n_elem_max + diagoff_eff_; \ n_iter = bli_min( n_elem_max, bli_min( m, n ) ); \ } \ else \ { \ ij0 = 0; \ - n_shift = diagoff_eff; \ - n_iter = bli_min( n_iter_max, n_shift + bli_min( m, n ) ); \ + n_shift = diagoff_eff_; \ + n_iter = bli_min( n_iter_max_, n_shift + bli_min( m, n ) ); \ } \ } \ } \ @@ -944,84 +957,84 @@ } \ else \ { \ - doff_t diagoffa_use = diagoffa; \ - doff_t diagoff_eff; \ - dim_t n_iter_max; \ + doff_t diagoffa_use_ = 
diagoffa; \ + doff_t diagoff_eff_; \ + dim_t n_iter_max_; \ \ if ( bli_is_unit_diag( diaga ) ) \ - bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use ); \ + bli_shift_diag_offset_to_shrink_uplo( uploa, diagoffa_use_ ); \ \ /* If matrix A is entirely "stored", that is, if either: - A is upper-stored and entirely above the diagonal, or - A is lower-stored and entirely below the diagonal then we mark the storage as dense. */ \ - if ( bli_is_stored_subpart( diagoffa_use, transa, uploa, m, n ) ) \ + if ( bli_is_stored_subpart( diagoffa_use_, transa, uploa, m, n ) ) \ uploa = BLIS_DENSE; \ \ - n_iter_max = n; \ + n_iter_max_ = n; \ n_elem_max = m; \ inca = rs_a; \ lda = cs_a; \ incb = rs_b; \ ldb = cs_b; \ uplo_eff = uploa; \ - diagoff_eff = diagoffa_use; \ + diagoff_eff_ = diagoffa_use_; \ \ if ( bli_does_trans( transa ) ) \ { \ bli_swap_incs( inca, lda ); \ bli_toggle_uplo( uplo_eff ); \ - bli_negate_diag_offset( diagoff_eff ); \ + bli_negate_diag_offset( diagoff_eff_ ); \ } \ \ - if ( bli_is_row_tilted( n_elem_max, n_iter_max, incb, ldb ) && \ - bli_is_row_tilted( n_elem_max, n_iter_max, inca, lda ) ) \ + if ( bli_is_row_tilted( n_elem_max, n_iter_max_, incb, ldb ) && \ + bli_is_row_tilted( n_elem_max, n_iter_max_, inca, lda ) ) \ { \ - bli_swap_dims( n_iter_max, n_elem_max ); \ + bli_swap_dims( n_iter_max_, n_elem_max ); \ bli_swap_incs( inca, lda ); \ bli_swap_incs( incb, ldb ); \ bli_toggle_uplo( uplo_eff ); \ - bli_negate_diag_offset( diagoff_eff ); \ + bli_negate_diag_offset( diagoff_eff_ ); \ } \ \ if ( bli_is_dense( uplo_eff ) ) \ { \ - n_iter = n_iter_max; \ + n_iter = n_iter_max_; \ } \ else if ( bli_is_upper( uplo_eff ) ) \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ -/*printf( "uplo_eff = upper, diagoff_eff < 0\n" );*/ \ +/*printf( "uplo_eff = upper, diagoff_eff_ < 0\n" );*/ \ ij0 = 0; \ - n_shift = -diagoff_eff; \ + n_shift = -diagoff_eff_; \ n_elem_max = bli_min( n_elem_max, n_shift + bli_min( m, n ) ); \ - n_iter = n_iter_max; \ 
+ n_iter = n_iter_max_; \ } \ else \ { \ -/*printf( "uplo_eff = upper, diagoff_eff >= 0\n" );*/ \ - ij0 = diagoff_eff; \ +/*printf( "uplo_eff = upper, diagoff_eff_ >= 0\n" );*/ \ + ij0 = diagoff_eff_; \ n_shift = 0; \ - n_iter = n_iter_max - diagoff_eff; \ + n_iter = n_iter_max_ - diagoff_eff_; \ } \ } \ else /* if ( bli_is_lower( uplo_eff ) ) */ \ { \ - if ( diagoff_eff < 0 ) \ + if ( diagoff_eff_ < 0 ) \ { \ -/*printf( "uplo_eff = lower, diagoff_eff < 0\n" );*/ \ - ij0 = -diagoff_eff; \ +/*printf( "uplo_eff = lower, diagoff_eff_ < 0\n" );*/ \ + ij0 = -diagoff_eff_; \ n_shift = 0; \ - n_elem_max = n_elem_max + diagoff_eff; \ + n_elem_max = n_elem_max + diagoff_eff_; \ n_iter = bli_min( n_elem_max, bli_min( m, n ) ); \ } \ else \ { \ -/*printf( "uplo_eff = lower, diagoff_eff >= 0\n" );*/ \ +/*printf( "uplo_eff = lower, diagoff_eff_ >= 0\n" );*/ \ ij0 = 0; \ - n_shift = diagoff_eff; \ - n_iter = bli_min( n_iter_max, n_shift + bli_min( m, n ) ); \ + n_shift = diagoff_eff_; \ + n_iter = bli_min( n_iter_max_, n_shift + bli_min( m, n ) ); \ } \ } \ } \ @@ -1055,24 +1068,134 @@ m, n, rs_x, cs_x, rs_y, cs_y, \ offx, offy, n_elem, incx, incy ) \ { \ - doff_t diagoffy = bli_diag_offset_with_trans( transx, diagoffx ); \ + doff_t diagoffy_ = bli_diag_offset_with_trans( transx, diagoffx ); \ \ if ( diagoffx < 0 ) offx = -diagoffx * rs_x; \ else offx = diagoffx * cs_x; \ \ - if ( diagoffy < 0 ) \ + if ( diagoffy_ < 0 ) \ { \ - n_elem = bli_min( m - ( dim_t )(-diagoffy), n ); \ - offy = -diagoffy * rs_y; \ + n_elem = bli_min( m - ( dim_t )(-diagoffy_), n ); \ + offy = -diagoffy_ * rs_y; \ } \ else \ { \ - n_elem = bli_min( n - ( dim_t )( diagoffy), m ); \ - offy = diagoffy * cs_y; \ + n_elem = bli_min( n - ( dim_t )( diagoffy_), m ); \ + offy = diagoffy_ * cs_y; \ } \ \ incx = rs_x + cs_x; \ incy = rs_y + cs_y; \ } +// -- Function caller/chooser macros -- + +#define bli_call_ft_2( dt, fname, o0, o1 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1); \ + else if ( 
bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1); \ +} +#define bli_call_ft_3( dt, fname, o0, o1, o2 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2); \ +} +#define bli_call_ft_4( dt, fname, o0, o1, o2, o3 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3); \ +} +#define bli_call_ft_5( dt, fname, o0, o1, o2, o3, o4 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4); \ +} +#define bli_call_ft_6( dt, fname, o0, o1, o2, o3, o4, o5 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5); \ +} +#define bli_call_ft_7( dt, fname, o0, o1, o2, o3, o4, o5, o6 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6); \ +} +#define bli_call_ft_8( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7 ) \ +{ \ + 
if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7); \ +} +#define bli_call_ft_9( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8); \ +} +#define bli_call_ft_10( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9); \ +} +#define bli_call_ft_11( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10); \ +} +#define bli_call_ft_12( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11); \ + else if ( 
bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11); \ +} +#define bli_call_ft_13( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12); \ +} +#define bli_call_ft_14( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13); \ +} +#define bli_call_ft_15( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13, o14 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14); \ + else if ( bli_is_double( dt ) ) PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14); \ +} +#define bli_call_ft_20( dt, fname, o0, o1, o2, o3, o4, o5, o6, o7, o8, o9, o10, o11, o12, o13, o14, o15, o16, o17, o18, o19 ) \ +{ \ + if ( bli_is_float( dt ) ) PASTEMAC(s,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14,o15,o16,o17,o18,o19); \ + else if ( bli_is_double( dt ) ) 
PASTEMAC(d,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14,o15,o16,o17,o18,o19); \ + else if ( bli_is_scomplex( dt ) ) PASTEMAC(c,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14,o15,o16,o17,o18,o19); \ + else if ( bli_is_dcomplex( dt ) ) PASTEMAC(z,fname)(o0,o1,o2,o3,o4,o5,o6,o7,o8,o9,o10,o11,o12,o13,o14,o15,o16,o17,o18,o19); \ +} + + + #endif diff --git a/frame/include/bli_system.h b/frame/include/bli_system.h index 88038d201..f00e077d8 100644 --- a/frame/include/bli_system.h +++ b/frame/include/bli_system.h @@ -39,6 +39,7 @@ #include #include #include +#include #if defined(_WIN32) || defined(__CYGWIN__) #define BLIS_OS_WINDOWS 1 diff --git a/frame/include/bli_type_defs.h b/frame/include/bli_type_defs.h index 95b7c6c0e..2efaedf9e 100644 --- a/frame/include/bli_type_defs.h +++ b/frame/include/bli_type_defs.h @@ -214,6 +214,8 @@ typedef dcomplex f77_dcomplex; - 1 0001 11: packed by 4m interleaved column panels - 1 0010 10: packed by 3m interleaved row panels - 1 0010 11: packed by 3m interleaved column panels + - 1 0011 10: packed by 4m separated row panels (not used) + - 1 0011 11: packed by 4m separated column panels (not used) - 1 0100 10: packed by 3m separated row panels - 1 0100 11: packed by 3m separated column panels - 1 0101 10: packed real-only row panels @@ -322,11 +324,12 @@ typedef dcomplex f77_dcomplex; #define BLIS_BITVAL_NOT_PACKED 0x0 #define BLIS_BITVAL_4MI ( 0x1 << BLIS_PACK_FORMAT_SHIFT ) #define BLIS_BITVAL_3MI ( 0x2 << BLIS_PACK_FORMAT_SHIFT ) +#define BLIS_BITVAL_4MS ( 0x3 << BLIS_PACK_FORMAT_SHIFT ) #define BLIS_BITVAL_3MS ( 0x4 << BLIS_PACK_FORMAT_SHIFT ) #define BLIS_BITVAL_RO ( 0x5 << BLIS_PACK_FORMAT_SHIFT ) #define BLIS_BITVAL_IO ( 0x6 << BLIS_PACK_FORMAT_SHIFT ) #define BLIS_BITVAL_RPI ( 0x7 << BLIS_PACK_FORMAT_SHIFT ) -#define BLIS_BITVAL_PACKED_UNSPEC BLIS_PACK_BIT +#define BLIS_BITVAL_PACKED_UNSPEC ( BLIS_PACK_BIT ) #define BLIS_BITVAL_PACKED_ROWS ( BLIS_PACK_BIT ) #define BLIS_BITVAL_PACKED_COLUMNS ( BLIS_PACK_BIT 
| BLIS_PACK_RC_BIT ) #define BLIS_BITVAL_PACKED_ROW_PANELS ( BLIS_PACK_BIT | BLIS_PACK_PANEL_BIT ) @@ -335,6 +338,8 @@ typedef dcomplex f77_dcomplex; #define BLIS_BITVAL_PACKED_COL_PANELS_4MI ( BLIS_PACK_BIT | BLIS_BITVAL_4MI | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT ) #define BLIS_BITVAL_PACKED_ROW_PANELS_3MI ( BLIS_PACK_BIT | BLIS_BITVAL_3MI | BLIS_PACK_PANEL_BIT ) #define BLIS_BITVAL_PACKED_COL_PANELS_3MI ( BLIS_PACK_BIT | BLIS_BITVAL_3MI | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT ) +#define BLIS_BITVAL_PACKED_ROW_PANELS_4MS ( BLIS_PACK_BIT | BLIS_BITVAL_4MS | BLIS_PACK_PANEL_BIT ) +#define BLIS_BITVAL_PACKED_COL_PANELS_4MS ( BLIS_PACK_BIT | BLIS_BITVAL_4MS | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT ) #define BLIS_BITVAL_PACKED_ROW_PANELS_3MS ( BLIS_PACK_BIT | BLIS_BITVAL_3MS | BLIS_PACK_PANEL_BIT ) #define BLIS_BITVAL_PACKED_COL_PANELS_3MS ( BLIS_PACK_BIT | BLIS_BITVAL_3MS | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT ) #define BLIS_BITVAL_PACKED_ROW_PANELS_RO ( BLIS_PACK_BIT | BLIS_BITVAL_RO | BLIS_PACK_PANEL_BIT ) @@ -454,6 +459,8 @@ typedef enum BLIS_PACKED_COL_PANELS_4MI = BLIS_BITVAL_PACKED_COL_PANELS_4MI, BLIS_PACKED_ROW_PANELS_3MI = BLIS_BITVAL_PACKED_ROW_PANELS_3MI, BLIS_PACKED_COL_PANELS_3MI = BLIS_BITVAL_PACKED_COL_PANELS_3MI, + BLIS_PACKED_ROW_PANELS_4MS = BLIS_BITVAL_PACKED_ROW_PANELS_4MS, + BLIS_PACKED_COL_PANELS_4MS = BLIS_BITVAL_PACKED_COL_PANELS_4MS, BLIS_PACKED_ROW_PANELS_3MS = BLIS_BITVAL_PACKED_ROW_PANELS_3MS, BLIS_PACKED_COL_PANELS_3MS = BLIS_BITVAL_PACKED_COL_PANELS_3MS, BLIS_PACKED_ROW_PANELS_RO = BLIS_BITVAL_PACKED_ROW_PANELS_RO, @@ -464,6 +471,12 @@ typedef enum BLIS_PACKED_COL_PANELS_RPI = BLIS_BITVAL_PACKED_COL_PANELS_RPI, } pack_t; +// We combine row and column packing into one "type", and we start +// with BLIS_PACKED_ROW_PANELS, _COLUMN_PANELS. We also count the +// schema pair for "4ms" (4m separated), because its bit value has +// been reserved, even though we don't use it. 
+#define BLIS_NUM_PACK_SCHEMA_TYPES 8 + // -- Pack order type -- @@ -524,25 +537,15 @@ typedef struct mem_s siz_t size; } mem_t; -// -- Memory block type -- - - // -- Blocksize object type -- typedef struct blksz_s { // Primary blocksize values. - dim_t v[BLIS_NUM_FP_TYPES]; + dim_t v[BLIS_NUM_FP_TYPES]; - // Blocksize Extensions. - dim_t e[BLIS_NUM_FP_TYPES]; - - // Pointer to blocksize multiple object. - struct blksz_s* mult; - - // Pointer to mr and nr objects (if applicable). - struct blksz_s* mr; - struct blksz_s* nr; + // Blocksize extensions. + dim_t e[BLIS_NUM_FP_TYPES]; } blksz_t; @@ -553,10 +556,16 @@ typedef struct func_s // Kernel function address. void* ptr[BLIS_NUM_FP_TYPES]; - // Kernel row/column storage preference. - bool_t prefers_contig_rows[BLIS_NUM_FP_TYPES]; } func_t; +// -- Multi-boolean object type -- + +typedef struct mbool_s +{ + bool_t v[BLIS_NUM_FP_TYPES]; + +} mbool_t; + // -- Auxiliary kernel info type -- // Note: This struct is used by macro-kernels to package together extra @@ -755,6 +764,73 @@ typedef enum #define BLIS_MACH_PARAM_FIRST BLIS_MACH_EPS #define BLIS_MACH_PARAM_LAST BLIS_MACH_EPS2 +// -- Induced method types -- + +typedef enum +{ + BLIS_3MH = 0, + BLIS_3M3, + BLIS_3M2, + BLIS_3M1, + BLIS_4MH, + BLIS_4M1B, + BLIS_4M1A, + BLIS_NAT, +} ind_t; + +#define BLIS_NUM_IND_METHODS (BLIS_NAT+1) + +// -- Kernel ID types -- + +typedef enum +{ + BLIS_ADDV_KER = 0, + BLIS_AXPYV_KER, + BLIS_COPYV_KER, + BLIS_DOTV_KER, + BLIS_DOTXV_KER, + BLIS_INVERTV_KER, + BLIS_SCALV_KER, + BLIS_SCAL2V_KER, + BLIS_SETV_KER, + BLIS_SUBV_KER, + BLIS_SWAPV_KER, +} l1vkr_t; + +#define BLIS_NUM_LEVEL1V_KERS 11 + +typedef enum +{ + BLIS_AXPY2V_KER = 0, + BLIS_DOTAXPYV_KER, + BLIS_AXPYF_KER, + BLIS_DOTXF_KER, + BLIS_DOTXAXPYF_KER, +} l1fkr_t; + +#define BLIS_NUM_LEVEL1F_KERS 5 + +typedef enum +{ + BLIS_GEMM_UKR = 0, + BLIS_GEMMTRSM_L_UKR, + BLIS_GEMMTRSM_U_UKR, + BLIS_TRSM_L_UKR, + BLIS_TRSM_U_UKR, +} l3ukr_t; + +#define BLIS_NUM_LEVEL3_UKRS 5 + +typedef enum 
+{ + BLIS_REFERENCE_UKERNEL = 0, + BLIS_VIRTUAL_UKERNEL, + BLIS_OPTIMIZED_UKERNEL, + BLIS_NOTAPPLIC_UKERNEL, +} kimpl_t; + +#define BLIS_NUM_UKR_IMPL_TYPES 4 + // -- Operation ID type -- @@ -789,6 +865,52 @@ typedef enum #define BLIS_NUM_LEVEL3_OPS 10 +// -- Blocksize ID type -- + +typedef enum +{ + BLIS_KR = 0, + BLIS_MR, + BLIS_NR, + BLIS_MC, + BLIS_KC, + BLIS_NC, + BLIS_M2, // level-2 blocksize in m dimension + BLIS_N2, // level-2 blocksize in n dimension + BLIS_1F, // level-1f global fusing factor + BLIS_AF, // level-1f axpyf fusing factor + BLIS_DF, // level-1f dotxf fusing factor + BLIS_XF, // level-1f dotxaxpyf fusing factor + BLIS_VF, // level-1v vector fusing factor +} bszid_t; + +#define BLIS_NUM_BLKSZS 13 + + +// -- Context type -- + +typedef struct cntx_s +{ + blksz_t blkszs[ BLIS_NUM_BLKSZS ]; + bszid_t bmults[ BLIS_NUM_BLKSZS ]; + + func_t l3_vir_ukrs[ BLIS_NUM_LEVEL3_UKRS ]; + func_t l3_nat_ukrs[ BLIS_NUM_LEVEL3_UKRS ]; + mbool_t l3_nat_ukrs_prefs[ BLIS_NUM_LEVEL3_UKRS ]; + + func_t l1f_kers[ BLIS_NUM_LEVEL1F_KERS ]; + func_t l1v_kers[ BLIS_NUM_LEVEL1V_KERS ]; + + func_t packm_ukrs; + + ind_t method; + pack_t schema_a; + pack_t schema_b; + pack_t schema_c; + +} cntx_t; + + // -- Error types -- typedef enum @@ -875,6 +997,7 @@ typedef enum BLIS_INVALID_PACKBUF = (-120), BLIS_REQUESTED_CONTIG_BLOCK_TOO_BIG = (-121), BLIS_EXHAUSTED_CONTIG_MEMORY_POOL = (-122), + BLIS_INSUFFICIENT_STACK_BUF_SIZE = (-123), // Object-related errors BLIS_EXPECTED_OBJECT_ALIAS = (-130), diff --git a/frame/include/blis.h b/frame/include/blis.h index 057c734bf..9bfedd71a 100644 --- a/frame/include/blis.h +++ b/frame/include/blis.h @@ -67,8 +67,6 @@ extern "C" { #include "bli_type_defs.h" #include "bli_macro_defs.h" -#include "bli_level3_type_defs.h" - // -- Threading definitions -- @@ -83,7 +81,7 @@ extern "C" { // -- BLIS kernel definitions -- #include "bli_kernel.h" -#include "bli_kernel_type_defs.h" +//#include "bli_kernel_type_defs.h" #include "bli_kernel_pre_macro_defs.h" 
#include "bli_kernel_ind_pre_macro_defs.h" @@ -91,15 +89,7 @@ extern "C" { #include "bli_kernel_macro_defs.h" #include "bli_kernel_ind_macro_defs.h" -#include "bli_kernel_post_macro_defs.h" - #include "bli_kernel_prototypes.h" -#include "bli_kernel_ind_prototypes.h" - - -// -- BLIS memory pool definitions -- - -//#include "bli_mem_pool_macro_defs.h" // -- Base operation prototypes -- @@ -109,14 +99,17 @@ extern "C" { #include "bli_malloc.h" #include "bli_obj.h" #include "bli_obj_scalar.h" +#include "bli_cntx.h" +#include "bli_gks.h" #include "bli_ind.h" #include "bli_pool.h" #include "bli_mem.h" #include "bli_part.h" #include "bli_prune.h" #include "bli_query.h" -#include "bli_blocksize.h" +#include "bli_blksz.h" #include "bli_func.h" +#include "bli_mbool.h" #include "bli_param_map.h" #include "bli_clock.h" #include "bli_check.h" @@ -131,124 +124,42 @@ extern "C" { // -- Level-0 operations -- -#include "bli_absqsc.h" -#include "bli_addsc.h" -#include "bli_copysc.h" -#include "bli_divsc.h" -#include "bli_getsc.h" -#include "bli_mulsc.h" -#include "bli_normfsc.h" -#include "bli_setsc.h" -#include "bli_sqrtsc.h" -#include "bli_subsc.h" -#include "bli_zipsc.h" -#include "bli_unzipsc.h" +#include "bli_l0.h" -// -- Level-1 operations -- +// -- Level-1v operations -- -// one vector operand -#include "bli_invertv.h" -#include "bli_scalv.h" -#include "bli_setv.h" -// two vector operands -#include "bli_addv.h" -#include "bli_axpyv.h" -#include "bli_copyv.h" -#include "bli_dotv.h" -#include "bli_dotxv.h" -#include "bli_scal2v.h" -#include "bli_subv.h" -#include "bli_swapv.h" -#include "bli_packv.h" -#include "bli_unpackv.h" +#include "bli_l1v.h" // -- Level-1d operations -- -// one diagonal operand -#include "bli_invertd.h" -#include "bli_scald.h" -#include "bli_setd.h" -#include "bli_setid.h" -// two diagonal operands -#include "bli_addd.h" -#include "bli_axpyd.h" -#include "bli_copyd.h" -#include "bli_scal2d.h" -#include "bli_subd.h" +#include "bli_l1d.h" // -- Level-1f 
operations -- -#include "bli_axpy2v.h" -#include "bli_axpyf.h" -#include "bli_dotxf.h" -#include "bli_dotaxpyv.h" -#include "bli_dotxaxpyf.h" +#include "bli_l1f.h" // -- Level-1m operations -- -// one matrix operand -#include "bli_scalm.h" -#include "bli_setm.h" -// two matrix operands -#include "bli_addm.h" -#include "bli_axpym.h" -#include "bli_copym.h" -#include "bli_scal2m.h" -#include "bli_subm.h" -#include "bli_packm.h" -#include "bli_unpackm.h" +#include "bli_l1m.h" // -- Level-2 operations -- -#include "bli_gemv.h" -#include "bli_ger.h" -#include "bli_hemv.h" -#include "bli_her.h" -#include "bli_her2.h" -#include "bli_symv.h" -#include "bli_syr.h" -#include "bli_syr2.h" -#include "bli_trmv.h" -#include "bli_trsv.h" +#include "bli_l2.h" // -- Level-3 operations -- -#include "bli_gemm.h" -#include "bli_hemm.h" -#include "bli_herk.h" -#include "bli_her2k.h" -#include "bli_symm.h" -#include "bli_syrk.h" -#include "bli_syr2k.h" -#include "bli_trmm.h" -#include "bli_trmm3.h" -#include "bli_trsm.h" +#include "bli_l3.h" // -- Utility operations -- -#include "bli_amaxv.h" -#include "bli_asumv.h" -#include "bli_mkherm.h" -#include "bli_mksymm.h" -#include "bli_mktrim.h" -#include "bli_norm1v.h" -#include "bli_norm1m.h" -#include "bli_normfv.h" -#include "bli_normfm.h" -#include "bli_normiv.h" -#include "bli_normim.h" -#include "bli_printv.h" -#include "bli_printm.h" -#include "bli_randv.h" -#include "bli_randm.h" -#include "bli_sumsqv.h" +#include "bli_util.h" // -- CBLAS compatibility layer -- diff --git a/frame/include/level0/bli_adds_mxn.h b/frame/include/level0/bli_adds_mxn.h index 7eae7916f..33a8332b5 100644 --- a/frame/include/level0/bli_adds_mxn.h +++ b/frame/include/level0/bli_adds_mxn.h @@ -43,42 +43,42 @@ #define bli_ssadds_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_ssadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; 
++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_ssadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_ddadds_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_ddadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_ddadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_ccadds_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_ccadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_ccadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_zzadds_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_zzadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_zzadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } diff --git a/frame/include/level0/bli_adds_mxn_uplo.h b/frame/include/level0/bli_adds_mxn_uplo.h index 9ba599d58..0c5f2b345 100644 --- a/frame/include/level0/bli_adds_mxn_uplo.h +++ b/frame/include/level0/bli_adds_mxn_uplo.h @@ -39,16 +39,16 @@ #define bli_ssadds_mxn_u( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_ssadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ssadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -56,16 
+56,16 @@ #define bli_ddadds_mxn_u( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_ddadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ddadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -73,16 +73,16 @@ #define bli_ccadds_mxn_u( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_ccadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ccadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -90,16 +90,16 @@ #define bli_zzadds_mxn_u( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_zzadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_zzadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -109,16 +109,16 @@ #define bli_ssadds_mxn_l( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_ssadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ssadds( *(x + 
_i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -126,16 +126,16 @@ #define bli_ddadds_mxn_l( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_ddadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ddadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -143,16 +143,16 @@ #define bli_ccadds_mxn_l( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_ccadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ccadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -160,16 +160,16 @@ #define bli_zzadds_mxn_l( diagoff, m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ + for ( _j = 0; _j < n; ++_j ) \ { \ - for ( i = 0; i < m; ++i ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_zzadds( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_zzadds( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ diff --git a/frame/include/level0/bli_copys_mxn.h b/frame/include/level0/bli_copys_mxn.h index f2473ae4f..daa440b4a 100644 --- a/frame/include/level0/bli_copys_mxn.h +++ b/frame/include/level0/bli_copys_mxn.h @@ -43,42 +43,42 @@ #define bli_sscopys_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i 
= 0; i < m; ++i ) \ - bli_sscopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_sscopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_ddcopys_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_ddcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_ddcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_cccopys_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_cccopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_cccopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_zzcopys_mxn( m, n, x, rs_x, cs_x, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_zzcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_zzcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } diff --git a/frame/include/level0/bli_set0s_mxn.h b/frame/include/level0/bli_set0s_mxn.h index a4a286630..77f9deff6 100644 --- a/frame/include/level0/bli_set0s_mxn.h +++ b/frame/include/level0/bli_set0s_mxn.h @@ -43,38 +43,38 @@ #define bli_sset0s_mxn( m, n, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_sset0s( *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_sset0s( *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_dset0s_mxn( m, n, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 
0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_dset0s( *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_dset0s( *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_cset0s_mxn( m, n, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_cset0s( *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_cset0s( *(y + _i*rs_y + _j*cs_y) ); \ } #define bli_zset0s_mxn( m, n, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_zset0s( *(y + i*rs_y + j*cs_y) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_zset0s( *(y + _i*rs_y + _j*cs_y) ); \ } diff --git a/frame/include/level0/bli_xpbys_mxn.h b/frame/include/level0/bli_xpbys_mxn.h index 9bcf137a5..f9f2e881d 100644 --- a/frame/include/level0/bli_xpbys_mxn.h +++ b/frame/include/level0/bli_xpbys_mxn.h @@ -44,8 +44,6 @@ #define bli_sssxpbys_mxn( m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ -\ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). */ \ if ( bli_seq0( *beta ) ) \ { \ @@ -55,18 +53,18 @@ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_sssxpbys( *(x + i*rs_x + j*cs_x), \ + dim_t _i, _j; \ +\ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_sssxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } #define bli_dddxpbys_mxn( m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ -\ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). 
*/ \ if ( bli_deq0( *beta ) ) \ { \ @@ -76,18 +74,18 @@ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_dddxpbys( *(x + i*rs_x + j*cs_x), \ + dim_t _i, _j; \ +\ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_dddxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } #define bli_cccxpbys_mxn( m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ -\ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). */ \ if ( bli_ceq0( *beta ) ) \ { \ @@ -97,18 +95,18 @@ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_cccxpbys( *(x + i*rs_x + j*cs_x), \ + dim_t _i, _j; \ +\ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_cccxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } #define bli_zzzxpbys_mxn( m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ -\ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). */ \ if ( bli_zeq0( *beta ) ) \ { \ @@ -118,11 +116,13 @@ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_zzzxpbys( *(x + i*rs_x + j*cs_x), \ + dim_t _i, _j; \ +\ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_zzzxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } diff --git a/frame/include/level0/bli_xpbys_mxn_uplo.h b/frame/include/level0/bli_xpbys_mxn_uplo.h index 9bd5954f0..c1fd82d5b 100644 --- a/frame/include/level0/bli_xpbys_mxn_uplo.h +++ b/frame/include/level0/bli_xpbys_mxn_uplo.h @@ -39,112 +39,112 @@ #define bli_sssxpbys_mxn_u( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). 
*/ \ if ( bli_seq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_sscopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_sscopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_sssxpbys( *(x + i*rs_x + j*cs_x), \ + bli_sssxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_dddxpbys_mxn_u( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). */ \ if ( bli_deq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_ddcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ddcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_dddxpbys( *(x + i*rs_x + j*cs_x), \ + bli_dddxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_cccxpbys_mxn_u( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or 
NaNs). */ \ if ( bli_ceq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_cccopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_cccopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_cccxpbys( *(x + i*rs_x + j*cs_x), \ + bli_cccxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_zzzxpbys_mxn_u( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). 
*/ \ if ( bli_zeq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_zzcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_zzcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ - bli_zzzxpbys( *(x + i*rs_x + j*cs_x), \ + bli_zzzxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } @@ -153,112 +153,112 @@ #define bli_sssxpbys_mxn_l( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). 
*/ \ if ( bli_seq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_sscopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_sscopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_sssxpbys( *(x + i*rs_x + j*cs_x), \ + bli_sssxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_dddxpbys_mxn_l( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). */ \ if ( bli_deq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_ddcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_ddcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_dddxpbys( *(x + i*rs_x + j*cs_x), \ + bli_dddxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_cccxpbys_mxn_l( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or 
NaNs). */ \ if ( bli_ceq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_cccopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_cccopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_cccxpbys( *(x + i*rs_x + j*cs_x), \ + bli_cccxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } #define bli_zzzxpbys_mxn_l( diagoff, m, n, x, rs_x, cs_x, beta, y, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* If beta is zero, overwrite y with x (in case y has infs or NaNs). 
*/ \ if ( bli_zeq0( *beta ) ) \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_zzcopys( *(x + i*rs_x + j*cs_x), \ - *(y + i*rs_y + j*cs_y) ); \ + bli_zzcopys( *(x + _i*rs_x + _j*cs_x), \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ else \ { \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ - bli_zzzxpbys( *(x + i*rs_x + j*cs_x), \ + bli_zzzxpbys( *(x + _i*rs_x + _j*cs_x), \ *(beta), \ - *(y + i*rs_y + j*cs_y) ); \ + *(y + _i*rs_y + _j*cs_y) ); \ } \ } \ } diff --git a/frame/include/level0/old/bli_set0ris_mxn.h b/frame/include/level0/old/bli_set0ris_mxn.h index 64e17bfc2..a86153371 100644 --- a/frame/include/level0/old/bli_set0ris_mxn.h +++ b/frame/include/level0/old/bli_set0ris_mxn.h @@ -39,42 +39,42 @@ #define bli_sset0ris_mxn( m, n, ar, ai, rs_a, cs_a ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_sset0ris( *(ar + i*rs_a + j*cs_a), \ - *(ai + i*rs_a + j*cs_a) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_sset0ris( *(ar + _i*rs_a + _j*cs_a), \ + *(ai + _i*rs_a + _j*cs_a) ); \ } #define bli_dset0ris_mxn( m, n, ar, ai, rs_a, cs_a ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_dset0ris( *(ar + i*rs_a + j*cs_a), \ - *(ai + i*rs_a + j*cs_a) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_dset0ris( *(ar + _i*rs_a + _j*cs_a), \ + *(ai + _i*rs_a + _j*cs_a) ); \ } #define bli_cset0ris_mxn( m, n, ar, ai, rs_a, cs_a ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_cset0ris( *(ar + i*rs_a + 
j*cs_a), \ - *(ai + i*rs_a + j*cs_a) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_cset0ris( *(ar + _i*rs_a + _j*cs_a), \ + *(ai + _i*rs_a + _j*cs_a) ); \ } #define bli_zset0ris_mxn( m, n, ar, ai, rs_a, cs_a ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ - bli_zset0ris( *(ar + i*rs_a + j*cs_a), \ - *(ai + i*rs_a + j*cs_a) ); \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ + bli_zset0ris( *(ar + _i*rs_a + _j*cs_a), \ + *(ai + _i*rs_a + _j*cs_a) ); \ } diff --git a/frame/include/level0/ri/bli_absq2ris.h b/frame/include/level0/ri/bli_absq2ris.h index 7560880a1..5332c5145 100644 --- a/frame/include/level0/ri/bli_absq2ris.h +++ b/frame/include/level0/ri/bli_absq2ris.h @@ -49,13 +49,13 @@ #define bli_cabsq2ris( ar, ai, br, bi ) \ { \ - (br) = (ar) * (ar) + (ai) + (ai); \ + (br) = (ar) * (ar) + (ai) * (ai); \ (bi) = 0.0F; \ } #define bli_zabsq2ris( ar, ai, br, bi ) \ { \ - (br) = (ar) * (ar) + (ai) + (ai); \ + (br) = (ar) * (ar) + (ai) * (ai); \ (bi) = 0.0; \ } diff --git a/frame/include/level0/ri/bli_scalris_mxn_uplo.h b/frame/include/level0/ri/bli_scalris_mxn_uplo.h index 40d36ba8b..a2396dd49 100644 --- a/frame/include/level0/ri/bli_scalris_mxn_uplo.h +++ b/frame/include/level0/ri/bli_scalris_mxn_uplo.h @@ -39,34 +39,34 @@ #define bli_cscalris_mxn_u( diagoff, m, n, ar, ai, xr, xi, rs_x, cs_x ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ bli_cscalris( *(ar), \ *(ai), \ - *((xr) + i*rs_x + j*cs_x), \ - *((xi) + i*rs_x + j*cs_x) ); \ + *((xr) + _i*rs_x + _j*cs_x), \ + *((xi) + _i*rs_x + _j*cs_x) ); \ } \ } \ } #define bli_zscalris_mxn_u( diagoff, m, n, ar, ai, xr, xi, rs_x, cs_x ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) 
\ - for ( i = 0; i < m; ++i ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i >= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i >= diagoff ) \ { \ bli_zscalris( *(ar), \ *(ai), \ - *((xr) + i*rs_x + j*cs_x), \ - *((xi) + i*rs_x + j*cs_x) ); \ + *((xr) + _i*rs_x + _j*cs_x), \ + *((xi) + _i*rs_x + _j*cs_x) ); \ } \ } \ } @@ -75,34 +75,34 @@ #define bli_cscalris_mxn_l( diagoff, m, n, ar, ai, xr, xi, rs_x, cs_x ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ bli_cscalris( *(ar), \ *(ai), \ - *((xr) + i*rs_x + j*cs_x), \ - *((xi) + i*rs_x + j*cs_x) ); \ + *((xr) + _i*rs_x + _j*cs_x), \ + *((xi) + _i*rs_x + _j*cs_x) ); \ } \ } \ } #define bli_zscalris_mxn_l( diagoff, m, n, ar, ai, xr, xi, rs_x, cs_x ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ - for ( j = 0; j < n; ++j ) \ - for ( i = 0; i < m; ++i ) \ + for ( _j = 0; _j < n; ++_j ) \ + for ( _i = 0; _i < m; ++_i ) \ { \ - if ( (doff_t)j - (doff_t)i <= diagoff ) \ + if ( (doff_t)_j - (doff_t)_i <= diagoff ) \ { \ bli_zscalris( *(ar), \ *(ai), \ - *((xr) + i*rs_x + j*cs_x), \ - *((xi) + i*rs_x + j*cs_x) ); \ + *((xr) + _i*rs_x + _j*cs_x), \ + *((xi) + _i*rs_x + _j*cs_x) ); \ } \ } \ } diff --git a/frame/include/level0/rih/bli_scal2rihs_mxn_diag.h b/frame/include/level0/rih/bli_scal2rihs_mxn_diag.h index 39f270820..f7d019458 100644 --- a/frame/include/level0/rih/bli_scal2rihs_mxn_diag.h +++ b/frame/include/level0/rih/bli_scal2rihs_mxn_diag.h @@ -40,34 +40,34 @@ #define bli_cscscal2rihs_mxn_diag( schema, m, n, a, x, rs_x, cs_x, y_r, rs_y, cs_y ) \ { \ dim_t min_m_n = bli_min( m, n ); \ - dim_t i; \ + dim_t _i; \ \ /* Handle ro, io, and rpi separately. 
*/ \ if ( bli_is_ro_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_scscal2ros( *(x + i*rs_x + i*cs_x), \ + bli_scscal2ros( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else if ( bli_is_io_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_scscal2ios( *(x + i*rs_x + i*cs_x), \ + bli_scscal2ios( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else /* if ( bli_is_rpi_packed( schema ) ) */ \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_scscal2rpis( *(x + i*rs_x + i*cs_x), \ + bli_scscal2rpis( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ } @@ -75,34 +75,34 @@ #define bli_zdzscal2rihs_mxn_diag( schema, m, n, a, x, rs_x, cs_x, y_r, rs_y, cs_y ) \ { \ dim_t min_m_n = bli_min( m, n ); \ - dim_t i; \ + dim_t _i; \ \ /* Handle ro, io, and rpi separately. 
*/ \ if ( bli_is_ro_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_dzscal2ros( *(x + i*rs_x + i*cs_x), \ + bli_dzscal2ros( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else if ( bli_is_io_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_dzscal2ios( *(x + i*rs_x + i*cs_x), \ + bli_dzscal2ios( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else /* if ( bli_is_rpi_packed( schema ) ) */ \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ - bli_dzscal2rpis( *(x + i*rs_x + i*cs_x), \ + bli_dzscal2rpis( *(x + _i*rs_x + _i*cs_x), \ *(a), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ } diff --git a/frame/include/level0/rih/bli_scal2rihs_mxn_uplo.h b/frame/include/level0/rih/bli_scal2rihs_mxn_uplo.h index 38423dfcb..fd4937aef 100644 --- a/frame/include/level0/rih/bli_scal2rihs_mxn_uplo.h +++ b/frame/include/level0/rih/bli_scal2rihs_mxn_uplo.h @@ -39,7 +39,7 @@ #define bli_cscal2rihs_mxn_uplo( schema, uplo, conjx, m, a, x, rs_x, cs_x, y_r, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* Handle ro, io, and rpi separately. 
*/ \ if ( bli_is_ro_packed( schema ) ) \ @@ -48,22 +48,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2jros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2ros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -71,22 +71,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2jros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2ros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -97,22 +97,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2jios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2ios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - 
*(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -120,22 +120,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2jios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2ios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -146,22 +146,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2jrpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_cscal2rpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -169,22 +169,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2jrpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( 
_i = 0; _i < _j + 1; ++_i ) \ { \ bli_cscal2rpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -193,7 +193,7 @@ #define bli_zscal2rihs_mxn_uplo( schema, uplo, conjx, m, a, x, rs_x, cs_x, y_r, rs_y, cs_y ) \ { \ - dim_t i, j; \ + dim_t _i, _j; \ \ /* Handle ro, io, and rpi separately. */ \ if ( bli_is_ro_packed( schema ) ) \ @@ -202,22 +202,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_zscal2jros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_zscal2ros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -225,22 +225,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2jros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2ros( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -251,22 +251,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ 
{ \ bli_zscal2jios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_zscal2ios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -274,22 +274,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2jios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2ios( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -300,22 +300,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_zscal2jrpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = j; i < m; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = _j; _i < m; ++_i ) \ { \ bli_zscal2rpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ @@ -323,22 +323,22 @@ { \ if ( bli_is_conj( conjx ) ) \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; 
++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2jrpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ else /* if ( bli_is_noconj( conjx ) ) */ \ { \ - for ( j = 0; j < m; ++j ) \ - for ( i = 0; i < j + 1; ++i ) \ + for ( _j = 0; _j < m; ++_j ) \ + for ( _i = 0; _i < _j + 1; ++_i ) \ { \ bli_zscal2rpis( *(a), \ - *(x + i*rs_x + j*cs_x), \ - *(y_r + i*rs_y + j*cs_y) ); \ + *(x + _i*rs_x + _j*cs_x), \ + *(y_r + _i*rs_y + _j*cs_y) ); \ } \ } \ } \ diff --git a/frame/include/level0/rih/bli_setrihs_mxn_diag.h b/frame/include/level0/rih/bli_setrihs_mxn_diag.h index 3fe2a8215..ca09f9793 100644 --- a/frame/include/level0/rih/bli_setrihs_mxn_diag.h +++ b/frame/include/level0/rih/bli_setrihs_mxn_diag.h @@ -42,32 +42,32 @@ const float a_r = bli_zreal( *a ); \ const float a_i = bli_zimag( *a ); \ dim_t min_m_n = bli_min( m, n ); \ - dim_t i; \ + dim_t _i; \ \ /* Handle ro, io, and rpi separately. */ \ if ( bli_is_ro_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_scopys( (a_r), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else if ( bli_is_io_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_scopys( (a_i), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else /* if ( bli_is_rpi_packed( schema ) ) */ \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_sadd3s( (a_r), \ (a_i), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ } @@ -77,32 +77,32 @@ const double a_r = bli_zreal( *a ); \ const double a_i = bli_zimag( *a ); \ dim_t min_m_n = bli_min( m, n ); \ - dim_t i; \ + dim_t _i; \ \ /* Handle ro, io, and rpi separately. 
*/ \ if ( bli_is_ro_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_dcopys( (a_r), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else if ( bli_is_io_packed( schema ) ) \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_dcopys( (a_i), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ else /* if ( bli_is_rpi_packed( schema ) ) */ \ { \ - for ( i = 0; i < min_m_n; ++i ) \ + for ( _i = 0; _i < min_m_n; ++_i ) \ { \ bli_dadd3s( (a_r), \ (a_i), \ - *(y_r + i*rs_y + i*cs_y) ); \ + *(y_r + _i*rs_y + _i*cs_y) ); \ } \ } \ } diff --git a/frame/include/bli_kernel_post_macro_defs.h b/frame/include/old/bli_kernel_post_macro_defs.h similarity index 100% rename from frame/include/bli_kernel_post_macro_defs.h rename to frame/include/old/bli_kernel_post_macro_defs.h diff --git a/frame/include/old/bli_kernel_prototypes.h b/frame/include/old/bli_kernel_prototypes.h new file mode 100644 index 000000000..333b2c578 --- /dev/null +++ b/frame/include/old/bli_kernel_prototypes.h @@ -0,0 +1,529 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#ifndef BLIS_KERNEL_PROTOTYPES_H +#define BLIS_KERNEL_PROTOTYPES_H + + +// -- Define PASTEMAC-friendly kernel function name macros --------------------- + +// +// Level-3 +// + +// gemm micro-kernels + +#define bli_sGEMM_UKERNEL BLIS_SGEMM_UKERNEL +#define bli_dGEMM_UKERNEL BLIS_DGEMM_UKERNEL +#define bli_cGEMM_UKERNEL BLIS_CGEMM_UKERNEL +#define bli_zGEMM_UKERNEL BLIS_ZGEMM_UKERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + dim_t k, \ + ctype* alpha, \ + ctype* a, \ + ctype* b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* data \ + ); + +INSERT_GENTPROT_BASIC( GEMM_UKERNEL ) + +// gemmtrsm_l micro-kernels + +#define bli_sGEMMTRSM_L_UKERNEL BLIS_SGEMMTRSM_L_UKERNEL +#define bli_dGEMMTRSM_L_UKERNEL BLIS_DGEMMTRSM_L_UKERNEL +#define bli_cGEMMTRSM_L_UKERNEL BLIS_CGEMMTRSM_L_UKERNEL +#define bli_zGEMMTRSM_L_UKERNEL BLIS_ZGEMMTRSM_L_UKERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + dim_t k, \ + ctype* alpha, \ + ctype* a10, \ + ctype* a11, \ + ctype* b01, \ + ctype* b11, \ + ctype* c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* data \ + ); + 
+INSERT_GENTPROT_BASIC( GEMMTRSM_L_UKERNEL ) + +// gemmtrsm_u micro-kernels + +#define bli_sGEMMTRSM_U_UKERNEL BLIS_SGEMMTRSM_U_UKERNEL +#define bli_dGEMMTRSM_U_UKERNEL BLIS_DGEMMTRSM_U_UKERNEL +#define bli_cGEMMTRSM_U_UKERNEL BLIS_CGEMMTRSM_U_UKERNEL +#define bli_zGEMMTRSM_U_UKERNEL BLIS_ZGEMMTRSM_U_UKERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + dim_t k, \ + ctype* alpha, \ + ctype* a12, \ + ctype* a11, \ + ctype* b21, \ + ctype* b11, \ + ctype* c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* data \ + ); + +INSERT_GENTPROT_BASIC( GEMMTRSM_U_UKERNEL ) + +// trsm_l micro-kernels + +#define bli_sTRSM_L_UKERNEL BLIS_STRSM_L_UKERNEL +#define bli_dTRSM_L_UKERNEL BLIS_DTRSM_L_UKERNEL +#define bli_cTRSM_L_UKERNEL BLIS_CTRSM_L_UKERNEL +#define bli_zTRSM_L_UKERNEL BLIS_ZTRSM_L_UKERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + ctype* a11, \ + ctype* b11, \ + ctype* c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* data \ + ); + +INSERT_GENTPROT_BASIC( TRSM_L_UKERNEL ) + +// trsm_u micro-kernels + +#define bli_sTRSM_U_UKERNEL BLIS_STRSM_U_UKERNEL +#define bli_dTRSM_U_UKERNEL BLIS_DTRSM_U_UKERNEL +#define bli_cTRSM_U_UKERNEL BLIS_CTRSM_U_UKERNEL +#define bli_zTRSM_U_UKERNEL BLIS_ZTRSM_U_UKERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + ctype* a11, \ + ctype* b11, \ + ctype* c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* data \ + ); + +INSERT_GENTPROT_BASIC( TRSM_U_UKERNEL ) + + +// +// Level-1m +// + +// NOTE: We don't need any PASTEMAC-friendly aliases to packm kernel +// macros because they are used directly in the initialization of the +// function pointer array, rather than via a templatizing wrapper macro. 
+ + +// +// Level-1f +// + +// axpy2v kernels + +#define bli_sssAXPY2V_KERNEL BLIS_SAXPY2V_KERNEL +#define bli_dddAXPY2V_KERNEL BLIS_DAXPY2V_KERNEL +#define bli_cccAXPY2V_KERNEL BLIS_CAXPY2V_KERNEL +#define bli_zzzAXPY2V_KERNEL BLIS_ZAXPY2V_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, kername ) \ +\ +void PASTEMAC3(chx,chy,chz,kername) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype_xy* alpha1, \ + ctype_xy* alpha2, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_z* z, inc_t incz \ + ); + +INSERT_GENTPROT3U12_BASIC( AXPY2V_KERNEL ) + +// dotaxpyv kernels + +#define bli_sssDOTAXPYV_KERNEL BLIS_SDOTAXPYV_KERNEL +#define bli_dddDOTAXPYV_KERNEL BLIS_DDOTAXPYV_KERNEL +#define bli_cccDOTAXPYV_KERNEL BLIS_CDOTAXPYV_KERNEL +#define bli_zzzDOTAXPYV_KERNEL BLIS_ZDOTAXPYV_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_x, ctype_y, ctype_z, ctype_xy, chx, chy, chz, chxy, kername ) \ +\ +void PASTEMAC3(chx,chy,chz,kername) \ + ( \ + conj_t conjxt, \ + conj_t conjx, \ + conj_t conjy, \ + dim_t m, \ + ctype_x* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_xy* rho, \ + ctype_z* z, inc_t incz \ + ); + +INSERT_GENTPROT3U12_BASIC( DOTAXPYV_KERNEL ) + +// axpyf kernels + +#define bli_sssAXPYF_KERNEL BLIS_SAXPYF_KERNEL +#define bli_dddAXPYF_KERNEL BLIS_DAXPYF_KERNEL +#define bli_cccAXPYF_KERNEL BLIS_CAXPYF_KERNEL +#define bli_zzzAXPYF_KERNEL BLIS_ZAXPYF_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, kername ) \ +\ +void PASTEMAC3(cha,chx,chy,kername) \ + ( \ + conj_t conja, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype_ax* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT3U12_BASIC( AXPYF_KERNEL ) + +// dotxf kernels + +#define bli_sssDOTXF_KERNEL BLIS_SDOTXF_KERNEL +#define bli_dddDOTXF_KERNEL 
BLIS_DDOTXF_KERNEL +#define bli_cccDOTXF_KERNEL BLIS_CDOTXF_KERNEL +#define bli_zzzDOTXF_KERNEL BLIS_ZDOTXF_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_a, ctype_x, ctype_y, ctype_ax, cha, chx, chy, chax, kername ) \ +\ +void PASTEMAC3(cha,chx,chy,kername) \ + ( \ + conj_t conjat, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype_ax* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_x* x, inc_t incx, \ + ctype_y* beta, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT3U12_BASIC( DOTXF_KERNEL ) + +// dotxaxpyf kernels + +#define bli_sssDOTXAXPYF_KERNEL BLIS_SDOTXAXPYF_KERNEL +#define bli_dddDOTXAXPYF_KERNEL BLIS_DDOTXAXPYF_KERNEL +#define bli_cccDOTXAXPYF_KERNEL BLIS_CDOTXAXPYF_KERNEL +#define bli_zzzDOTXAXPYF_KERNEL BLIS_ZDOTXAXPYF_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_a, ctype_b, ctype_c, ctype_ab, cha, chb, chc, chab, kername ) \ +\ +void PASTEMAC3(cha,chb,chc,kername) \ + ( \ + conj_t conjat, \ + conj_t conja, \ + conj_t conjw, \ + conj_t conjx, \ + dim_t m, \ + dim_t b_n, \ + ctype_ab* alpha, \ + ctype_a* a, inc_t inca, inc_t lda, \ + ctype_b* w, inc_t incw, \ + ctype_b* x, inc_t incx, \ + ctype_c* beta, \ + ctype_c* y, inc_t incy, \ + ctype_c* z, inc_t incz \ + ); + +INSERT_GENTPROT3U12_BASIC( DOTXAXPYF_KERNEL ) + + +// +// Level-1v +// + +// addv kernels + +#define bli_ssADDV_KERNEL BLIS_SADDV_KERNEL +#define bli_ddADDV_KERNEL BLIS_DADDV_KERNEL +#define bli_ccADDV_KERNEL BLIS_CADDV_KERNEL +#define bli_zzADDV_KERNEL BLIS_ZADDV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ +\ +void PASTEMAC2(chx,chy,kername) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT2_BASIC( ADDV_KERNEL ) + +// axpyv kernels + +#define bli_sssAXPYV_KERNEL BLIS_SAXPYV_KERNEL +#define bli_dddAXPYV_KERNEL BLIS_DAXPYV_KERNEL +#define bli_cccAXPYV_KERNEL BLIS_CAXPYV_KERNEL +#define bli_zzzAXPYV_KERNEL BLIS_ZAXPYV_KERNEL + +#undef 
GENTPROT3 +#define GENTPROT3( ctype_a, ctype_x, ctype_y, cha, chx, chy, kername ) \ +\ +void PASTEMAC3(cha,chx,chy,kername) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype_a* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT3_BASIC( AXPYV_KERNEL ) + +// copyv kernels + +#define bli_ssCOPYV_KERNEL BLIS_SCOPYV_KERNEL +#define bli_ddCOPYV_KERNEL BLIS_DCOPYV_KERNEL +#define bli_ccCOPYV_KERNEL BLIS_CCOPYV_KERNEL +#define bli_zzCOPYV_KERNEL BLIS_ZCOPYV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ +\ +void PASTEMAC2(chx,chy,kername) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT2_BASIC( COPYV_KERNEL ) + +// dotv kernels + +#define bli_sssDOTV_KERNEL BLIS_SDOTV_KERNEL +#define bli_dddDOTV_KERNEL BLIS_DDOTV_KERNEL +#define bli_cccDOTV_KERNEL BLIS_CDOTV_KERNEL +#define bli_zzzDOTV_KERNEL BLIS_ZDOTV_KERNEL + +#undef GENTPROT3 +#define GENTPROT3( ctype_x, ctype_y, ctype_r, chx, chy, chr, kername ) \ +\ +void PASTEMAC3(chx,chy,chr,kername) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_r* rho \ + ); + +INSERT_GENTPROT3_BASIC( DOTV_KERNEL ) + +// dotxv kernels + +#define bli_sssDOTXV_KERNEL BLIS_SDOTXV_KERNEL +#define bli_dddDOTXV_KERNEL BLIS_DDOTXV_KERNEL +#define bli_cccDOTXV_KERNEL BLIS_CDOTXV_KERNEL +#define bli_zzzDOTXV_KERNEL BLIS_ZDOTXV_KERNEL + +#undef GENTPROT3U12 +#define GENTPROT3U12( ctype_x, ctype_y, ctype_r, ctype_xy, chx, chy, chr, chxy, kername ) \ +\ +void PASTEMAC3(chx,chy,chr,kername) \ + ( \ + conj_t conjx, \ + conj_t conjy, \ + dim_t n, \ + ctype_xy* alpha, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy, \ + ctype_r* beta, \ + ctype_r* rho \ + ); + +INSERT_GENTPROT3U12_BASIC( DOTXV_KERNEL ) + +// invertv kernels + +#define bli_sINVERTV_KERNEL BLIS_SINVERTV_KERNEL +#define bli_dINVERTV_KERNEL BLIS_DINVERTV_KERNEL +#define 
bli_cINVERTV_KERNEL BLIS_CINVERTV_KERNEL +#define bli_zINVERTV_KERNEL BLIS_ZINVERTV_KERNEL + +#undef GENTPROT +#define GENTPROT( ctype, ch, kername ) \ +\ +void PASTEMAC(ch,kername) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx \ + ); + +INSERT_GENTPROT_BASIC( INVERTV_KERNEL ) + +// scal2v kernels + +#define bli_sssSCAL2V_KERNEL BLIS_SSCAL2V_KERNEL +#define bli_dddSCAL2V_KERNEL BLIS_DSCAL2V_KERNEL +#define bli_cccSCAL2V_KERNEL BLIS_CSCAL2V_KERNEL +#define bli_zzzSCAL2V_KERNEL BLIS_ZSCAL2V_KERNEL + +#undef GENTPROT3 +#define GENTPROT3( ctype_b, ctype_x, ctype_y, chb, chx, chy, kername ) \ +\ +void PASTEMAC3(chb,chx,chy,kername) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype_b* beta, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT3_BASIC( SCAL2V_KERNEL ) + +// scalv kernels + +#define bli_ssSCALV_KERNEL BLIS_SSCALV_KERNEL +#define bli_ddSCALV_KERNEL BLIS_DSCALV_KERNEL +#define bli_ccSCALV_KERNEL BLIS_CSCALV_KERNEL +#define bli_zzSCALV_KERNEL BLIS_ZSCALV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_b, ctype_x, chb, chx, kername ) \ +\ +void PASTEMAC2(chb,chx,kername) \ + ( \ + conj_t conjbeta, \ + dim_t n, \ + ctype_b* beta, \ + ctype_x* x, inc_t incx \ + ); + +INSERT_GENTPROT2_BASIC( SCALV_KERNEL ) + +// setv kernels + +#define bli_ssSETV_KERNEL BLIS_SSETV_KERNEL +#define bli_ddSETV_KERNEL BLIS_DSETV_KERNEL +#define bli_ccSETV_KERNEL BLIS_CSETV_KERNEL +#define bli_zzSETV_KERNEL BLIS_ZSETV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_b, ctype_x, chb, chx, kername ) \ +\ +void PASTEMAC2(chb,chx,kername) \ + ( \ + dim_t n, \ + ctype_b* beta, \ + ctype_x* x, inc_t incx \ + ); + +INSERT_GENTPROT2_BASIC( SETV_KERNEL ) + +// subv kernels + +#define bli_ssSUBV_KERNEL BLIS_SSUBV_KERNEL +#define bli_ddSUBV_KERNEL BLIS_DSUBV_KERNEL +#define bli_ccSUBV_KERNEL BLIS_CSUBV_KERNEL +#define bli_zzSUBV_KERNEL BLIS_ZSUBV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ +\ +void 
PASTEMAC2(chx,chy,kername) \ + ( \ + conj_t conjx, \ + dim_t n, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT2_BASIC( SUBV_KERNEL ) + +// swapv kernels + +#define bli_ssSWAPV_KERNEL BLIS_SSWAPV_KERNEL +#define bli_ddSWAPV_KERNEL BLIS_DSWAPV_KERNEL +#define bli_ccSWAPV_KERNEL BLIS_CSWAPV_KERNEL +#define bli_zzSWAPV_KERNEL BLIS_ZSWAPV_KERNEL + +#undef GENTPROT2 +#define GENTPROT2( ctype_x, ctype_y, chx, chy, kername ) \ +\ +void PASTEMAC2(chx,chy,kername) \ + ( \ + dim_t n, \ + ctype_x* x, inc_t incx, \ + ctype_y* y, inc_t incy \ + ); + +INSERT_GENTPROT2_BASIC( SWAPV_KERNEL ) + + + +#endif + diff --git a/frame/include/bli_kernel_type_defs.h b/frame/include/old/bli_kernel_type_defs.h similarity index 95% rename from frame/include/bli_kernel_type_defs.h rename to frame/include/old/bli_kernel_type_defs.h index 99bca369f..e0190fe1b 100644 --- a/frame/include/bli_kernel_type_defs.h +++ b/frame/include/old/bli_kernel_type_defs.h @@ -41,12 +41,14 @@ // // Here we generate typedef statements that generate custom types for -// micro-kernel function pointers. Note that we use the function +// kernel function pointers. Note that we use the function // prototype-generating macro since it takes the same arguments we need // to define our types. 
+// -- Level-3 kernels -- -// -- gemm micro-kernel -- +/* +// gemm #undef GENTPROT #define GENTPROT( ctype, ch, tname ) \ @@ -65,7 +67,7 @@ typedef void \ INSERT_GENTPROT_BASIC( gemm_ukr_t ) -// -- trsm_l/u micro-kernels -- +// trsm_l/u #undef GENTPROT #define GENTPROT( ctype, ch, tname ) \ @@ -81,7 +83,7 @@ typedef void \ INSERT_GENTPROT_BASIC( trsm_ukr_t ) -// -- gemmtrsm_l/u micro-kernel -- +// gemmtrsm_l/u #undef GENTPROT #define GENTPROT( ctype, ch, tname ) \ @@ -100,8 +102,9 @@ typedef void \ INSERT_GENTPROT_BASIC( gemmtrsm_ukr_t ) +// -- packm kernels -- -// -- packm_struc_cxk kernel -- +// packm_struc_cxk #undef GENTPROT #define GENTPROT( ctype, ch, tname ) \ @@ -126,6 +129,7 @@ typedef void \ ); INSERT_GENTPROT_BASIC( packm_ker_t ) +*/ diff --git a/frame/ind/bli_ind.c b/frame/ind/bli_ind.c new file mode 100644 index 000000000..e715b2aad --- /dev/null +++ b/frame/ind/bli_ind.c @@ -0,0 +1,240 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +static bool_t bli_ind_is_init = FALSE; + +static char* bli_ind_impl_str[BLIS_NUM_IND_METHODS] = +{ +/* 3mh */ "3mh", +/* 3m3 */ "3m3", +/* 3m2 */ "3m2", +/* 3m1 */ "3m1", +/* 4mh */ "4mh", +/* 4m1b */ "4m1b", +/* 4m1a */ "4m1a", +/* nat */ "native", +}; + +// ----------------------------------------------------------------------------- + +void bli_ind_init( void ) +{ + // If the API is already initialized, return early. + if ( bli_ind_is_initialized() ) return; + +#ifdef BLIS_ENABLE_INDUCED_SCOMPLEX + bli_ind_enable_dt( BLIS_4M1A, BLIS_SCOMPLEX ); +#endif +#ifdef BLIS_ENABLE_INDUCED_DCOMPLEX + bli_ind_enable_dt( BLIS_4M1A, BLIS_DCOMPLEX ); +#endif + + // Mark API as initialized. + bli_ind_is_init = TRUE; +} + +void bli_ind_finalize( void ) +{ + // Mark API as uninitialized. 
+ bli_ind_is_init = FALSE; +} + +bool_t bli_ind_is_initialized( void ) +{ + return bli_ind_is_init; +} + +// ----------------------------------------------------------------------------- + +void bli_ind_enable( ind_t method ) +{ + bli_ind_enable_dt( method, BLIS_SCOMPLEX ); + bli_ind_enable_dt( method, BLIS_DCOMPLEX ); +} + +void bli_ind_disable( ind_t method ) +{ + bli_ind_disable_dt( method, BLIS_SCOMPLEX ); + bli_ind_disable_dt( method, BLIS_DCOMPLEX ); +} + +void bli_ind_disable_all( void ) +{ + bli_ind_disable_all_dt( BLIS_SCOMPLEX ); + bli_ind_disable_all_dt( BLIS_DCOMPLEX ); +} + +// ----------------------------------------------------------------------------- + +void bli_ind_enable_dt( ind_t method, num_t dt ) +{ + if ( !bli_is_complex( dt ) ) return; + + bli_l3_ind_set_enable_dt( method, dt, TRUE ); +} + +void bli_ind_disable_dt( ind_t method, num_t dt ) +{ + if ( !bli_is_complex( dt ) ) return; + + bli_l3_ind_set_enable_dt( method, dt, FALSE ); +} + +void bli_ind_disable_all_dt( num_t dt ) +{ + ind_t im; + + for ( im = 0; im < BLIS_NUM_IND_METHODS; ++im ) + { + // Never disable native execution. + if ( im != BLIS_NAT ) + bli_ind_disable_dt( im, dt ); + } +} + +// ----------------------------------------------------------------------------- + +void bli_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ) +{ + if ( !bli_is_complex( dt ) ) return; + + if ( bli_opid_is_level3( oper ) ) + { + bli_l3_ind_oper_enable_only( oper, method, dt ); + } + else + { + // Other operations are not implemented, so requests to enable + // them for any given induced method are currently no-ops. + ; + } +} + +// ----------------------------------------------------------------------------- + +bool_t bli_ind_oper_is_impl( opid_t oper, ind_t method ) +{ + bool_t is_impl = FALSE; + + if ( bli_opid_is_level3( oper ) ) + { + // Look up whether its func_t pointer in the table is NULL. 
+ is_impl = ( bli_l3_ind_oper_get_func( oper, method ) != NULL ); + } + else + { + // All other operations should be reported as not implemented, + // unless the requested check was for BLIS_NAT, in which case + // all operations are implemented. + if ( method == BLIS_NAT ) is_impl = TRUE; + else is_impl = FALSE; + } + + return is_impl; +} + +bool_t bli_ind_oper_has_avail( opid_t oper, num_t dt ) +{ + ind_t method = bli_ind_oper_find_avail( oper, dt ); + + if ( method == BLIS_NAT ) return FALSE; + else return TRUE; +} + +void* bli_ind_oper_get_avail( opid_t oper, num_t dt ) +{ + ind_t method = bli_ind_oper_find_avail( oper, dt ); + void* func_p; + + if ( bli_opid_is_level3( oper ) ) + { + func_p = bli_l3_ind_oper_get_func( oper, method ); + } + else + { + // Currently, any operation that is not level-3 does not + // have induced method implementations. (This should actually + // assign the pointer to be the native front-end, but for + // now there are no calls to bli_ind_oper_get_avail() in the + // context of level-2 operations.) + func_p = NULL; + } + + return func_p; +} + +ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ) +{ + ind_t method; + + if ( bli_opid_is_level3( oper ) ) + { + method = bli_l3_ind_oper_find_avail( oper, dt ); + } + else + { + // Currently, any operation that is not level-3 is guaranteed + // to be native. + method = BLIS_NAT; + } + + return method; +} + +char* bli_ind_oper_get_avail_impl_string( opid_t oper, num_t dt ) +{ + ind_t method = bli_ind_oper_find_avail( oper, dt ); + + return bli_ind_get_impl_string( method ); +} + +// ----------------------------------------------------------------------------- + +char* bli_ind_get_impl_string( ind_t method ) +{ + return bli_ind_impl_str[ method ]; +} + +num_t bli_ind_map_cdt_to_index( num_t dt ) +{ + // A non-complex datatype should never be passed in. + if ( !bli_is_complex( dt ) ) bli_abort(); + + // Map the complex datatype to a zero-based index. 
+ if ( bli_is_scomplex( dt ) ) return 0; + else /* if ( bli_is_dcomplex( dt ) ) */ return 1; +} + diff --git a/frame/ind/bli_ind.h b/frame/ind/bli_ind.h index a283ef689..b34941d91 100644 --- a/frame/ind/bli_ind.h +++ b/frame/ind/bli_ind.h @@ -32,25 +32,51 @@ */ +#ifndef BLIS_IND_H +#define BLIS_IND_H -// cntl -#include "bli_ind_cntl_init.h" -#include "bli_gemmind_cntl.h" -#include "bli_trsmind_cntl.h" +// level-3 induced method management +#include "bli_l3_ind.h" -// object API -#include "bli_oapi_ind.h" +// level-3 object APIs +#include "bli_l3_ind_oapi.h" -// typed API -#include "bli_tapi_ind.h" +// level-3 typed APIs +#include "bli_l3_ind_tapi.h" -// query -#include "bli_ind_query.h" -#include "bli_ukr_query.h" -#include "bli_bsv_query.h" +// level-3 cntx initialization +#include "bli_gemmind_cntx.h" +#include "bli_trsmind_cntx.h" -// ukernels +// level-3 ukernels #include "bli_gemmind_ukr_ref.h" #include "bli_gemmtrsmind_x_ukr_ref.h" #include "bli_trsmind_x_ukr_ref.h" + +void bli_ind_init( void ); +void bli_ind_finalize( void ); +bool_t bli_ind_is_initialized( void ); + +void bli_ind_enable( ind_t method ); +void bli_ind_disable( ind_t method ); +void bli_ind_disable_all( void ); + +void bli_ind_enable_dt( ind_t method, num_t dt ); +void bli_ind_disable_dt( ind_t method, num_t dt ); +void bli_ind_disable_all_dt( num_t dt ); + +void bli_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ); + +bool_t bli_ind_oper_is_impl( opid_t oper, ind_t method ); +bool_t bli_ind_oper_has_avail( opid_t oper, num_t dt ); +void* bli_ind_oper_get_avail( opid_t oper, num_t dt ); +ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ); +char* bli_ind_oper_get_avail_impl_string( opid_t oper, num_t dt ); + +char* bli_ind_get_impl_string( ind_t method ); +num_t bli_ind_map_cdt_to_index( num_t dt ); + + +#endif + diff --git a/frame/ind/query/bli_ind_query.c b/frame/ind/bli_l3_ind.c similarity index 59% rename from frame/ind/query/bli_ind_query.c rename to frame/ind/bli_l3_ind.c 
index d07abc7aa..e2d1a0f86 100644 --- a/frame/ind/query/bli_ind_query.c +++ b/frame/ind/bli_l3_ind.c @@ -34,7 +34,7 @@ #include "blis.h" -static void* bli_ind_oper_fp[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] = +static void* bli_l3_ind_oper_fp[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] = { /* gemm hemm herk her2k symm syrk, syr2k trmm3 trmm trsm */ /* 3mh */ { bli_gemm3mh, bli_hemm3mh, bli_herk3mh, bli_her2k3mh, bli_symm3mh, @@ -51,14 +51,14 @@ static void* bli_ind_oper_fp[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] = NULL, NULL, NULL, NULL, NULL }, /* 4m1 */ { bli_gemm4m1, bli_hemm4m1, bli_herk4m1, bli_her2k4m1, bli_symm4m1, bli_syrk4m1, bli_syr2k4m1, bli_trmm34m1, bli_trmm4m1, bli_trsm4m1 }, -/* nat */ { bli_gemm, bli_hemm, bli_herk, bli_her2k, bli_symm, - bli_syrk, bli_syr2k, bli_trmm3, bli_trmm, bli_trsm }, +/* nat */ { bli_gemmnat, bli_hemmnat, bli_herknat, bli_her2knat, bli_symmnat, + bli_syrknat, bli_syr2knat, bli_trmm3nat, bli_trmmnat, bli_trsmnat }, }; // // NOTE: "2" is used instead of BLIS_NUM_FP_TYPES/2. 
// -static bool_t bli_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] = +static bool_t bli_l3_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] = { /* gemm hemm herk her2k symm syrk, syr2k trmm3 trmm trsm */ /* c z */ @@ -80,20 +80,6 @@ static bool_t bli_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] = {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE} }, }; -static char* bli_ind_impl_str[BLIS_NUM_IND_METHODS] = -{ -/* 3mh */ "3mh", -/* 3m2 */ "3m3", -/* 3m2 */ "3m2", -/* 3m1 */ "3m1", -/* 4mh */ "4mh", -/* 4m1b */ "4m1b", -/* 4m1a */ "4m1a", -/* nat */ "native", -}; - - - // ----------------------------------------------------------------------------- #undef GENFUNC @@ -121,24 +107,8 @@ GENFUNC( trsm, BLIS_TRSM ) // ----------------------------------------------------------------------------- -bool_t bli_ind_oper_is_impl( opid_t oper, ind_t method ) -{ - // If the operation is not level-3, the method is not implemented, - // UNLESS it is native, in which case it is always implemented. - if ( !bli_opid_is_level3( oper ) ) - { - if ( method == BLIS_NAT ) return TRUE; - else return FALSE; - } - - // If the operation is level-3, look up whether its func_t is NULL. - return ( bli_ind_oper_get_func( oper, method ) != NULL ); -} - -// ----------------------------------------------------------------------------- - #if 0 -bool_t bli_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ) +bool_t bli_l3_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ) { void* func; bool_t stat; @@ -146,8 +116,8 @@ bool_t bli_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ) // If the datatype is real, it is never available. 
if ( !bli_is_complex( dt ) ) return FALSE; - func = bli_ind_oper_get_func( oper, method ); - stat = bli_ind_oper_get_enable( oper, method, dt ); + func = bli_l3_ind_oper_get_func( oper, method ); + stat = bli_l3_ind_oper_get_enable( oper, method, dt ); return ( func != NULL && stat == TRUE ); } @@ -155,24 +125,7 @@ bool_t bli_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ) // ----------------------------------------------------------------------------- -bool_t bli_ind_oper_has_avail( opid_t oper, num_t dt ) -{ - ind_t method = bli_ind_oper_find_avail( oper, dt ); - - if ( method == BLIS_NAT ) return FALSE; - else return TRUE; -} - -void* bli_ind_oper_get_avail( opid_t oper, num_t dt ) -{ - ind_t method = bli_ind_oper_find_avail( oper, dt ); - - return bli_ind_oper_get_func( oper, method ); -} - -// ----------------------------------------------------------------------------- - -ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ) +ind_t bli_l3_ind_oper_find_avail( opid_t oper, num_t dt ) { ind_t im; @@ -187,8 +140,8 @@ ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ) // current operation and datatype. for ( im = 0; im < BLIS_NUM_IND_METHODS; ++im ) { - void* func = bli_ind_oper_get_func( oper, im ); - bool_t stat = bli_ind_oper_get_enable( oper, im, dt ); + void* func = bli_l3_ind_oper_get_func( oper, im ); + bool_t stat = bli_l3_ind_oper_get_enable( oper, im, dt ); if ( func != NULL && stat == TRUE ) return im; @@ -202,95 +155,7 @@ ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt ) // ----------------------------------------------------------------------------- -char* bli_ind_oper_get_avail_impl_string( opid_t oper, num_t dt ) -{ - ind_t method = bli_ind_oper_find_avail( oper, dt ); - - return bli_ind_get_impl_string( method ); -} - -// ----------------------------------------------------------------------------- - -static bool_t bli_ind_is_init = FALSE; - -void bli_ind_init( void ) -{ - // If the API is already initialized, return early. 
- if ( bli_ind_is_initialized() ) return; - -#ifdef BLIS_ENABLE_INDUCED_SCOMPLEX - bli_ind_enable_dt( BLIS_4M1A, BLIS_SCOMPLEX ); -#endif -#ifdef BLIS_ENABLE_INDUCED_DCOMPLEX - bli_ind_enable_dt( BLIS_4M1A, BLIS_DCOMPLEX ); -#endif - - // Mark API as initialized. - bli_ind_is_init = TRUE; -} - -void bli_ind_finalize( void ) -{ - // Mark API as uninitialized. - bli_ind_is_init = FALSE; -} - -bool_t bli_ind_is_initialized( void ) -{ - return bli_ind_is_init; -} - -// ----------------------------------------------------------------------------- - -void bli_ind_enable( ind_t method ) -{ - bli_ind_enable_dt( method, BLIS_SCOMPLEX ); - bli_ind_enable_dt( method, BLIS_DCOMPLEX ); -} - -void bli_ind_disable( ind_t method ) -{ - bli_ind_disable_dt( method, BLIS_SCOMPLEX ); - bli_ind_disable_dt( method, BLIS_DCOMPLEX ); -} - -void bli_ind_disable_all( void ) -{ - bli_ind_disable_all_dt( BLIS_SCOMPLEX ); - bli_ind_disable_all_dt( BLIS_DCOMPLEX ); -} - -// ----------------------------------------------------------------------------- - -void bli_ind_enable_dt( ind_t method, num_t dt ) -{ - if ( !bli_is_complex( dt ) ) return; - - bli_ind_set_enable_dt( method, dt, TRUE ); -} - -void bli_ind_disable_dt( ind_t method, num_t dt ) -{ - if ( !bli_is_complex( dt ) ) return; - - bli_ind_set_enable_dt( method, dt, FALSE ); -} - -void bli_ind_disable_all_dt( num_t dt ) -{ - ind_t im; - - for ( im = 0; im < BLIS_NUM_IND_METHODS; ++im ) - { - // Never disable native execution. - if ( im != BLIS_NAT ) - bli_ind_disable_dt( im, dt ); - } -} - -// ----------------------------------------------------------------------------- - -void bli_ind_set_enable_dt( ind_t method, num_t dt, bool_t status ) +void bli_l3_ind_set_enable_dt( ind_t method, num_t dt, bool_t status ) { opid_t iop; @@ -299,22 +164,22 @@ void bli_ind_set_enable_dt( ind_t method, num_t dt, bool_t status ) // Iterate over all level-3 operation ids. 
for ( iop = 0; iop < BLIS_NUM_LEVEL3_OPS; ++iop ) { - bli_ind_oper_set_enable( iop, method, dt, status ); + bli_l3_ind_oper_set_enable( iop, method, dt, status ); } } // ----------------------------------------------------------------------------- -void bli_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ) +void bli_l3_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ) { if ( !bli_is_complex( dt ) ) return; if ( !bli_opid_is_level3( oper ) ) return; - bli_ind_oper_set_enable_all( oper, dt, FALSE ); - bli_ind_oper_set_enable( oper, method, dt, TRUE ); + bli_l3_ind_oper_set_enable_all( oper, dt, FALSE ); + bli_l3_ind_oper_set_enable( oper, method, dt, TRUE ); } -void bli_ind_oper_set_enable_all( opid_t oper, num_t dt, bool_t status ) +void bli_l3_ind_oper_set_enable_all( opid_t oper, num_t dt, bool_t status ) { ind_t im; @@ -325,13 +190,13 @@ void bli_ind_oper_set_enable_all( opid_t oper, num_t dt, bool_t status ) { // Native execution should always stay enabled. if ( im != BLIS_NAT ) - bli_ind_oper_set_enable( oper, im, dt, status ); + bli_l3_ind_oper_set_enable( oper, im, dt, status ); } } // ----------------------------------------------------------------------------- -void bli_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool_t status ) +void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool_t status ) { num_t idt; @@ -343,39 +208,20 @@ void bli_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool_t status idt = bli_ind_map_cdt_to_index( dt ); - bli_ind_oper_st[ method ][ oper ][ idt ] = status; + bli_l3_ind_oper_st[ method ][ oper ][ idt ] = status; } -bool_t bli_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt ) +bool_t bli_l3_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt ) { num_t idt = bli_ind_map_cdt_to_index( dt ); - return bli_ind_oper_st[ method ][ oper ][ idt ]; + return bli_l3_ind_oper_st[ method ][ oper ][ idt ]; } // 
----------------------------------------------------------------------------- -void* bli_ind_oper_get_func( opid_t oper, ind_t method ) +void* bli_l3_ind_oper_get_func( opid_t oper, ind_t method ) { - return bli_ind_oper_fp[ method ][ oper ]; -} - -// ----------------------------------------------------------------------------- - -char* bli_ind_get_impl_string( ind_t method ) -{ - return bli_ind_impl_str[ method ]; -} - -// ----------------------------------------------------------------------------- - -num_t bli_ind_map_cdt_to_index( num_t dt ) -{ - // A non-complex datatype should never be passed in. - if ( !bli_is_complex( dt ) ) bli_abort(); - - // Map the complex datatype to a zero-based index. - if ( bli_is_scomplex( dt ) ) return 0; - else /* if ( bli_is_dcomplex( dt ) ) */ return 1; + return bli_l3_ind_oper_fp[ method ][ oper ]; } diff --git a/frame/ind/bli_l3_ind.h b/frame/ind/bli_l3_ind.h new file mode 100644 index 000000000..278d1b10e --- /dev/null +++ b/frame/ind/bli_l3_ind.h @@ -0,0 +1,75 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#ifndef BLIS_L3_IND_H +#define BLIS_L3_IND_H + +// ----------------------------------------------------------------------------- + +#undef GENPROT +#define GENPROT( opname ) \ +\ +bool_t PASTEMAC(opname,ind_has_avail)( num_t dt ); \ +void* PASTEMAC(opname,ind_get_avail)( num_t dt ); + +GENPROT( gemm ) +GENPROT( hemm ) +GENPROT( herk ) +GENPROT( her2k ) +GENPROT( symm ) +GENPROT( syrk ) +GENPROT( syr2k ) +GENPROT( trmm3 ) +GENPROT( trmm ) +GENPROT( trsm ) + +// ----------------------------------------------------------------------------- + +//bool_t bli_l3_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt ); + +ind_t bli_l3_ind_oper_find_avail( opid_t oper, num_t dt ); + +void bli_l3_ind_set_enable_dt( ind_t method, num_t dt, bool_t status ); + +void bli_l3_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt ); +void bli_l3_ind_oper_set_enable_all( opid_t oper, num_t dt, bool_t status ); + +void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool_t status ); +bool_t bli_l3_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt ); + +void* bli_l3_ind_oper_get_func( opid_t oper, ind_t method ); + + +#endif + diff --git a/frame/ind/cntl/bli_gemm3m1_cntl.c 
b/frame/ind/cntl/bli_gemm3m1_cntl.c deleted file mode 100644 index 909aedfd5..000000000 --- a/frame/ind/cntl/bli_gemm3m1_cntl.c +++ /dev/null @@ -1,247 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm3m1_mc; -blksz_t* gemm3m1_nc; -blksz_t* gemm3m1_kc; -blksz_t* gemm3m1_mr; -blksz_t* gemm3m1_nr; -blksz_t* gemm3m1_kr; - -func_t* gemm3m1_ukrs; - -packm_t* gemm3m1_packa_cntl; -packm_t* gemm3m1_packb_cntl; - -gemm_t* gemm3m1_cntl_bp_ke; -gemm_t* gemm3m1_cntl_op_bp; -gemm_t* gemm3m1_cntl_mm_op; -gemm_t* gemm3m1_cntl_vl_mm; - -gemm_t* gemm3m1_cntl; - - -void bli_gemm3m1_cntl_init() -{ - // Create blocksize objects for each dimension. - // NOTE: the complex blocksizes for 3m1 are generally equal to their - // corresponding real domain counterparts. However, we want to promote - // similar cache footprints for the micro-panels of A and B (when - // compared to executing in the real domain), and since the complex - // micro-panels are three times as "fat" (due to storing real, imaginary - // and real+imaginary parts), we reduce KC by a factor of 2 to - // compensate. Ideally, we would reduce by a factor of 3, but that - // could get messy vis-a-vis keeping KC a multiple of the register - // blocksizes. 
- gemm3m1_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D ); - gemm3m1_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S, BLIS_MAXIMUM_NC_S, - BLIS_DEFAULT_NC_D, BLIS_MAXIMUM_NC_D ); - gemm3m1_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S/3, BLIS_MAXIMUM_KC_S/3, - BLIS_DEFAULT_KC_D/3, BLIS_MAXIMUM_KC_D/3 ); - gemm3m1_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm3m1_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm3m1_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm3m1_mr, gemm3m1_mc ); - bli_blksz_obj_attach_mult_to( gemm3m1_nr, gemm3m1_nc ); - bli_blksz_obj_attach_mult_to( gemm3m1_kr, gemm3m1_kc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. - bli_blksz_obj_attach_mr_nr_to( gemm3m1_mr, gemm3m1_nr, gemm3m1_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m1_mr, gemm3m1_nr, gemm3m1_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m1_mr, gemm3m1_nr, gemm3m1_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. 
- gemm3m1_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM3M1_UKERNEL, BLIS_CGEMM3M1_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM3M1_UKERNEL, BLIS_ZGEMM3M1_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations. - gemm3m1_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_mr, - gemm3m1_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_3MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm3m1_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_kr, - gemm3m1_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_3MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // - // Create a control tree for packing A and B, and streaming C. - // - - // Create control tree object for lowest-level block-panel kernel. - gemm3m1_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm3m1_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem. - gemm3m1_cntl_op_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_mc, - NULL, - NULL, - gemm3m1_packa_cntl, - gemm3m1_packb_cntl, - NULL, - gemm3m1_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. - gemm3m1_cntl_mm_op - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3m1_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m1_cntl_op_bp, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. 
- gemm3m1_cntl_vl_mm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3m1_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m1_cntl_mm_op, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. - gemm3m1_cntl = gemm3m1_cntl_vl_mm; - -} - -void bli_gemm3m1_cntl_finalize() -{ - bli_blksz_obj_free( gemm3m1_mc ); - bli_blksz_obj_free( gemm3m1_nc ); - bli_blksz_obj_free( gemm3m1_kc ); - bli_blksz_obj_free( gemm3m1_mr ); - bli_blksz_obj_free( gemm3m1_nr ); - bli_blksz_obj_free( gemm3m1_kr ); - - bli_func_obj_free( gemm3m1_ukrs ); - - bli_cntl_obj_free( gemm3m1_packa_cntl ); - bli_cntl_obj_free( gemm3m1_packb_cntl ); - - bli_cntl_obj_free( gemm3m1_cntl_bp_ke ); - bli_cntl_obj_free( gemm3m1_cntl_op_bp ); - bli_cntl_obj_free( gemm3m1_cntl_mm_op ); - bli_cntl_obj_free( gemm3m1_cntl_vl_mm ); - -} - diff --git a/frame/ind/cntl/bli_gemm3m2_cntl.c b/frame/ind/cntl/bli_gemm3m2_cntl.c deleted file mode 100644 index 7e362473c..000000000 --- a/frame/ind/cntl/bli_gemm3m2_cntl.c +++ /dev/null @@ -1,255 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm3m2_mc; -blksz_t* gemm3m2_nc; -blksz_t* gemm3m2_kc; -blksz_t* gemm3m2_mr; -blksz_t* gemm3m2_nr; -blksz_t* gemm3m2_kr; - -func_t* gemm3m2_ukrs; - -packm_t* gemm3m2_packa_cntl; -packm_t* gemm3m2_packb_cntl; - -gemm_t* gemm3m2_cntl_bp_ke; -gemm_t* gemm3m2_cntl_op_bp; -gemm_t* gemm3m2_cntl_mm_op; -gemm_t* gemm3m2_cntl_vl_mm; - -gemm_t* gemm3m2_cntl; - - -void bli_gemm3m2_cntl_init() -{ - // Create blocksize objects for each dimension. - // NOTE: the complex blocksizes for 3m2 are generally equal to their - // corresponding real domain counterparts. However, we want to promote - // similar cache footprints for the micro-panels of A and B (when - // compared to executing in the real domain), and since the complex - // micro-panels are three times as "fat" (due to storing real, imaginary - // and real+imaginary parts), we reduce KC by a factor of 2 to - // compensate. Ideally, we would reduce by a factor of 3, but that - // could get messy vis-a-vis keeping KC a multiple of the register - // blocksizes. 
- gemm3m2_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S/3, BLIS_MAXIMUM_MC_S/3, - BLIS_DEFAULT_MC_D/3, BLIS_MAXIMUM_MC_D/3 ); - gemm3m2_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S/3, BLIS_MAXIMUM_NC_S/3, - BLIS_DEFAULT_NC_D/3, BLIS_MAXIMUM_NC_D/3 ); - gemm3m2_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D ); - gemm3m2_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm3m2_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm3m2_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm3m2_mr, gemm3m2_mc ); - bli_blksz_obj_attach_mult_to( gemm3m2_nr, gemm3m2_nc ); - bli_blksz_obj_attach_mult_to( gemm3m2_kr, gemm3m2_kc ); - - - // The cache blocksizes that were scaled above need to be rounded down - // to their respective nearest register blocksize multiples. Note that - // this can only happen after the appropriate register blocksize is - // actually attached as a multiple. - bli_blksz_reduce_to_mult( gemm3m2_mc ); - bli_blksz_reduce_to_mult( gemm3m2_nc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. 
- bli_blksz_obj_attach_mr_nr_to( gemm3m2_mr, gemm3m2_nr, gemm3m2_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m2_mr, gemm3m2_nr, gemm3m2_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m2_mr, gemm3m2_nr, gemm3m2_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. - gemm3m2_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM3M2_UKERNEL, BLIS_CGEMM3M2_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM3M2_UKERNEL, BLIS_ZGEMM3M2_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations. - gemm3m2_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m2_mr, - gemm3m2_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_3MS, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm3m2_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m2_kr, - gemm3m2_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_3MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // - // Create a control tree for packing A and B, and streaming C. - // - - // Create control tree object for lowest-level block-panel kernel. - gemm3m2_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT4, - NULL, - gemm3m2_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem. - gemm3m2_cntl_op_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m2_mc, - NULL, - NULL, - gemm3m2_packa_cntl, - gemm3m2_packb_cntl, - NULL, - gemm3m2_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. 
- gemm3m2_cntl_mm_op - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3m2_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m2_cntl_op_bp, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. - gemm3m2_cntl_vl_mm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3m2_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m2_cntl_mm_op, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. - gemm3m2_cntl = gemm3m2_cntl_vl_mm; - -} - -void bli_gemm3m2_cntl_finalize() -{ - bli_blksz_obj_free( gemm3m2_mc ); - bli_blksz_obj_free( gemm3m2_nc ); - bli_blksz_obj_free( gemm3m2_kc ); - bli_blksz_obj_free( gemm3m2_mr ); - bli_blksz_obj_free( gemm3m2_nr ); - bli_blksz_obj_free( gemm3m2_kr ); - - bli_func_obj_free( gemm3m2_ukrs ); - - bli_cntl_obj_free( gemm3m2_packa_cntl ); - bli_cntl_obj_free( gemm3m2_packb_cntl ); - - bli_cntl_obj_free( gemm3m2_cntl_bp_ke ); - bli_cntl_obj_free( gemm3m2_cntl_op_bp ); - bli_cntl_obj_free( gemm3m2_cntl_mm_op ); - bli_cntl_obj_free( gemm3m2_cntl_vl_mm ); - -} - diff --git a/frame/ind/cntl/bli_gemm3m3_cntl.c b/frame/ind/cntl/bli_gemm3m3_cntl.c deleted file mode 100644 index e444067fd..000000000 --- a/frame/ind/cntl/bli_gemm3m3_cntl.c +++ /dev/null @@ -1,240 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm3m3_mc; -blksz_t* gemm3m3_nc; -blksz_t* gemm3m3_kc; -blksz_t* gemm3m3_mr; -blksz_t* gemm3m3_nr; -blksz_t* gemm3m3_kr; - -func_t* gemm3m3_ukrs; - -packm_t* gemm3m3_packb_cntl; - -gemm_t* gemm3m3_cntl_bp_ke; -gemm_t* gemm3m3_cntl_op_bp; -gemm_t* gemm3m3_cntl_mm_op; -gemm_t* gemm3m3_cntl_vl_mm; - -gemm_t* gemm3m3_cntl; - - -void bli_gemm3m3_cntl_init() -{ - // Create blocksize objects for each dimension. - // NOTE: the complex blocksizes for 3m3 are generally equal to their - // corresponding real domain counterparts. However, we want to promote - // similar cache footprints for the micro-panels of A and B (when - // compared to executing in the real domain), and since the complex - // micro-panels are three times as "fat" (due to storing real, imaginary - // and real+imaginary parts), we reduce KC by a factor of 2 to - // compensate. 
Ideally, we would reduce by a factor of 3, but that - // could get messy vis-a-vis keeping KC a multiple of the register - // blocksizes. - gemm3m3_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D ); - gemm3m3_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S/3, BLIS_MAXIMUM_NC_S/3, - BLIS_DEFAULT_NC_D/3, BLIS_MAXIMUM_NC_D/3 ); - gemm3m3_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D ); - gemm3m3_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm3m3_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm3m3_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm3m3_mr, gemm3m3_mc ); - bli_blksz_obj_attach_mult_to( gemm3m3_nr, gemm3m3_nc ); - bli_blksz_obj_attach_mult_to( gemm3m3_kr, gemm3m3_kc ); - - - // The cache blocksizes that were scaled above need to be rounded down - // to their respective nearest register blocksize multiples. Note that - // this can only happen after the appropriate register blocksize is - // actually attached as a multiple. - bli_blksz_reduce_to_mult( gemm3m3_nc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. 
- bli_blksz_obj_attach_mr_nr_to( gemm3m3_mr, gemm3m3_nr, gemm3m3_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m3_mr, gemm3m3_nr, gemm3m3_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm3m3_mr, gemm3m3_nr, gemm3m3_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. - gemm3m3_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM3M3_UKERNEL, BLIS_CGEMM3M3_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM3M3_UKERNEL, BLIS_ZGEMM3M3_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations. - gemm3m3_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m3_kr, - gemm3m3_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_3MS, - BLIS_BUFFER_FOR_B_PANEL ); - - - // - // Create a control tree for packing A and B, and streaming C. - // - - // Create control tree object for lowest-level block-panel kernel. - gemm3m3_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm3m3_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem. - gemm3m3_cntl_op_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT4, - gemm3m3_mc, - NULL, - NULL, - NULL, // packm cntl nodes accessed directly from blk_var4 - gemm3m3_packb_cntl, - NULL, - gemm3m3_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. - gemm3m3_cntl_mm_op - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3m3_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m3_cntl_op_bp, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. 
- gemm3m3_cntl_vl_mm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3m3_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3m3_cntl_mm_op, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. - gemm3m3_cntl = gemm3m3_cntl_vl_mm; - -} - -void bli_gemm3m3_cntl_finalize() -{ - bli_blksz_obj_free( gemm3m3_mc ); - bli_blksz_obj_free( gemm3m3_nc ); - bli_blksz_obj_free( gemm3m3_kc ); - bli_blksz_obj_free( gemm3m3_mr ); - bli_blksz_obj_free( gemm3m3_nr ); - bli_blksz_obj_free( gemm3m3_kr ); - - bli_func_obj_free( gemm3m3_ukrs ); - - bli_cntl_obj_free( gemm3m3_packb_cntl ); - - bli_cntl_obj_free( gemm3m3_cntl_bp_ke ); - bli_cntl_obj_free( gemm3m3_cntl_op_bp ); - bli_cntl_obj_free( gemm3m3_cntl_mm_op ); - bli_cntl_obj_free( gemm3m3_cntl_vl_mm ); - -} - diff --git a/frame/ind/cntl/bli_gemm3mh_cntl.c b/frame/ind/cntl/bli_gemm3mh_cntl.c deleted file mode 100644 index 8f108fa16..000000000 --- a/frame/ind/cntl/bli_gemm3mh_cntl.c +++ /dev/null @@ -1,412 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm3mh_mc; -blksz_t* gemm3mh_nc; -blksz_t* gemm3mh_kc; -blksz_t* gemm3mh_mr; -blksz_t* gemm3mh_nr; -blksz_t* gemm3mh_kr; - -func_t* gemm3mh_ukrs; - -packm_t* gemm3mh_packa_cntl_ro; -packm_t* gemm3mh_packb_cntl_ro; -packm_t* gemm3mh_packa_cntl_io; -packm_t* gemm3mh_packb_cntl_io; -packm_t* gemm3mh_packa_cntl_rpi; -packm_t* gemm3mh_packb_cntl_rpi; - -gemm_t* gemm3mh_cntl_bp_ke; -gemm_t* gemm3mh_cntl_op_bp_ro; -gemm_t* gemm3mh_cntl_mm_op_ro; -gemm_t* gemm3mh_cntl_vl_mm_ro; -gemm_t* gemm3mh_cntl_op_bp_io; -gemm_t* gemm3mh_cntl_mm_op_io; -gemm_t* gemm3mh_cntl_vl_mm_io; -gemm_t* gemm3mh_cntl_op_bp_rpi; -gemm_t* gemm3mh_cntl_mm_op_rpi; -gemm_t* gemm3mh_cntl_vl_mm_rpi; - -gemm_t* gemm3mh_cntl_ro; -gemm_t* gemm3mh_cntl_io; -gemm_t* gemm3mh_cntl_rpi; - - -void bli_gemm3mh_cntl_init() -{ - // Create blocksize objects for each dimension. - // NOTE: the complex blocksizes for 3mh are equal to their - // corresponding real domain counterparts. 
- gemm3mh_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D ); - gemm3mh_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S, BLIS_MAXIMUM_NC_S, - BLIS_DEFAULT_NC_D, BLIS_MAXIMUM_NC_D ); - gemm3mh_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D ); - gemm3mh_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm3mh_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm3mh_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm3mh_mr, gemm3mh_mc ); - bli_blksz_obj_attach_mult_to( gemm3mh_nr, gemm3mh_nc ); - bli_blksz_obj_attach_mult_to( gemm3mh_kr, gemm3mh_kc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. - bli_blksz_obj_attach_mr_nr_to( gemm3mh_mr, gemm3mh_nr, gemm3mh_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm3mh_mr, gemm3mh_nr, gemm3mh_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm3mh_mr, gemm3mh_nr, gemm3mh_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. 
- gemm3mh_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM3MH_UKERNEL, BLIS_CGEMM3MH_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM3MH_UKERNEL, BLIS_ZGEMM3MH_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations (real only). - gemm3mh_packa_cntl_ro - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mr, - gemm3mh_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_RO, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm3mh_packb_cntl_ro - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_kr, - gemm3mh_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_RO, - BLIS_BUFFER_FOR_B_PANEL ); - - // Create control tree objects for packm operations (imag only). - gemm3mh_packa_cntl_io - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mr, - gemm3mh_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_IO, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm3mh_packb_cntl_io - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_kr, - gemm3mh_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_IO, - BLIS_BUFFER_FOR_B_PANEL ); - - // Create control tree objects for packm operations (real+imag). - gemm3mh_packa_cntl_rpi - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mr, - gemm3mh_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? 
- BLIS_PACKED_ROW_PANELS_RPI, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm3mh_packb_cntl_rpi - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_kr, - gemm3mh_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_RPI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // Create control tree object for lowest-level block-panel kernel. - gemm3mh_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm3mh_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // - // Create control tree for A.real * B.real. - // - - // Create control tree object for outer panel (to block-panel) - // problem. (real x real) - gemm3mh_cntl_op_bp_ro - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mc, - NULL, - NULL, - gemm3mh_packa_cntl_ro, - gemm3mh_packb_cntl_ro, - NULL, - gemm3mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (real x real) - gemm3mh_cntl_mm_op_ro - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_op_bp_ro, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (real x real) - gemm3mh_cntl_vl_mm_ro - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_mm_op_ro, - NULL ); - - // - // Create control tree for A.imag * B.imag. - // - - // Create control tree object for outer panel (to block-panel) - // problem. (imag x imag) - gemm3mh_cntl_op_bp_io - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mc, - NULL, - NULL, - gemm3mh_packa_cntl_io, - gemm3mh_packb_cntl_io, - NULL, - gemm3mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. 
(imag x imag) - gemm3mh_cntl_mm_op_io - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_op_bp_io, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (imag x imag) - gemm3mh_cntl_vl_mm_io - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_mm_op_io, - NULL ); - - // - // Create control tree for (A.real + A.imag) * (B.real + B.imag). - // - - // Create control tree object for outer panel (to block-panel) - // problem. (real+imag x real+imag) - gemm3mh_cntl_op_bp_rpi - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3mh_mc, - NULL, - NULL, - gemm3mh_packa_cntl_rpi, - gemm3mh_packb_cntl_rpi, - NULL, - gemm3mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (real+imag x real+imag) - gemm3mh_cntl_mm_op_rpi - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_op_bp_rpi, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (real+imag x real+imag) - gemm3mh_cntl_vl_mm_rpi - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm3mh_cntl_mm_op_rpi, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. 
- gemm3mh_cntl_ro = gemm3mh_cntl_vl_mm_ro; - gemm3mh_cntl_io = gemm3mh_cntl_vl_mm_io; - gemm3mh_cntl_rpi = gemm3mh_cntl_vl_mm_rpi; - -} - -void bli_gemm3mh_cntl_finalize() -{ - bli_blksz_obj_free( gemm3mh_mc ); - bli_blksz_obj_free( gemm3mh_nc ); - bli_blksz_obj_free( gemm3mh_kc ); - bli_blksz_obj_free( gemm3mh_mr ); - bli_blksz_obj_free( gemm3mh_nr ); - bli_blksz_obj_free( gemm3mh_kr ); - - bli_func_obj_free( gemm3mh_ukrs ); - - bli_cntl_obj_free( gemm3mh_packa_cntl_ro ); - bli_cntl_obj_free( gemm3mh_packb_cntl_ro ); - bli_cntl_obj_free( gemm3mh_packa_cntl_io ); - bli_cntl_obj_free( gemm3mh_packb_cntl_io ); - bli_cntl_obj_free( gemm3mh_packa_cntl_rpi ); - bli_cntl_obj_free( gemm3mh_packb_cntl_rpi ); - - bli_cntl_obj_free( gemm3mh_cntl_bp_ke ); - bli_cntl_obj_free( gemm3mh_cntl_op_bp_ro ); - bli_cntl_obj_free( gemm3mh_cntl_mm_op_ro ); - bli_cntl_obj_free( gemm3mh_cntl_vl_mm_ro ); - bli_cntl_obj_free( gemm3mh_cntl_op_bp_io ); - bli_cntl_obj_free( gemm3mh_cntl_mm_op_io ); - bli_cntl_obj_free( gemm3mh_cntl_vl_mm_io ); - bli_cntl_obj_free( gemm3mh_cntl_op_bp_rpi ); - bli_cntl_obj_free( gemm3mh_cntl_mm_op_rpi ); - bli_cntl_obj_free( gemm3mh_cntl_vl_mm_rpi ); - -} - diff --git a/frame/ind/cntl/bli_gemm4m1_cntl.c b/frame/ind/cntl/bli_gemm4m1_cntl.c deleted file mode 100644 index 3fb517d52..000000000 --- a/frame/ind/cntl/bli_gemm4m1_cntl.c +++ /dev/null @@ -1,243 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm4m1_mc; -blksz_t* gemm4m1_nc; -blksz_t* gemm4m1_kc; -blksz_t* gemm4m1_mr; -blksz_t* gemm4m1_nr; -blksz_t* gemm4m1_kr; - -func_t* gemm4m1_ukrs; - -packm_t* gemm4m1_packa_cntl; -packm_t* gemm4m1_packb_cntl; - -gemm_t* gemm4m1_cntl_bp_ke; -gemm_t* gemm4m1_cntl_op_bp; -gemm_t* gemm4m1_cntl_mm_op; -gemm_t* gemm4m1_cntl_vl_mm; - -gemm_t* gemm4m1_cntl; - - -void bli_gemm4m1_cntl_init() -{ - // Create blocksize objects for each dimension. - // NOTE: the complex blocksizes for 4m1 are generally equal to their - // corresponding real domain counterparts. 
However, we want to promote - // similar cache footprints for the micro-panels of A and B (when - // compared to executing in the real domain), and since the complex - // micro-panels are twice as "fat" (due to storing real and imaginary - // parts), we reduce KC by a factor of 2 to compensate. - gemm4m1_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D ); - gemm4m1_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S, BLIS_MAXIMUM_NC_S, - BLIS_DEFAULT_NC_D, BLIS_MAXIMUM_NC_D ); - gemm4m1_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S/2, BLIS_MAXIMUM_KC_S/2, - BLIS_DEFAULT_KC_D/2, BLIS_MAXIMUM_KC_D/2 ); - gemm4m1_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm4m1_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm4m1_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm4m1_mr, gemm4m1_mc ); - bli_blksz_obj_attach_mult_to( gemm4m1_nr, gemm4m1_nc ); - bli_blksz_obj_attach_mult_to( gemm4m1_kr, gemm4m1_kc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. 
- bli_blksz_obj_attach_mr_nr_to( gemm4m1_mr, gemm4m1_nr, gemm4m1_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm4m1_mr, gemm4m1_nr, gemm4m1_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm4m1_mr, gemm4m1_nr, gemm4m1_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. - gemm4m1_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM4M1_UKERNEL, BLIS_CGEMM4M1_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM4M1_UKERNEL, BLIS_ZGEMM4M1_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations. - gemm4m1_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_mr, - gemm4m1_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_4MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm4m1_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_kr, - gemm4m1_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_4MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // - // Create a control tree for packing A and B, and streaming C. - // - - // Create control tree object for lowest-level block-panel kernel. - gemm4m1_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm4m1_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem. - gemm4m1_cntl_op_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_mc, - NULL, - NULL, - gemm4m1_packa_cntl, - gemm4m1_packb_cntl, - NULL, - gemm4m1_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. 
- gemm4m1_cntl_mm_op - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4m1_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4m1_cntl_op_bp, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. - gemm4m1_cntl_vl_mm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4m1_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4m1_cntl_mm_op, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. - gemm4m1_cntl = gemm4m1_cntl_vl_mm; - -} - -void bli_gemm4m1_cntl_finalize() -{ - bli_blksz_obj_free( gemm4m1_mc ); - bli_blksz_obj_free( gemm4m1_nc ); - bli_blksz_obj_free( gemm4m1_kc ); - bli_blksz_obj_free( gemm4m1_mr ); - bli_blksz_obj_free( gemm4m1_nr ); - bli_blksz_obj_free( gemm4m1_kr ); - - bli_func_obj_free( gemm4m1_ukrs ); - - bli_cntl_obj_free( gemm4m1_packa_cntl ); - bli_cntl_obj_free( gemm4m1_packb_cntl ); - - bli_cntl_obj_free( gemm4m1_cntl_bp_ke ); - bli_cntl_obj_free( gemm4m1_cntl_op_bp ); - bli_cntl_obj_free( gemm4m1_cntl_mm_op ); - bli_cntl_obj_free( gemm4m1_cntl_vl_mm ); -} - diff --git a/frame/ind/cntl/bli_gemm4mb_cntl.c b/frame/ind/cntl/bli_gemm4mb_cntl.c deleted file mode 100644 index c59ed3a98..000000000 --- a/frame/ind/cntl/bli_gemm4mb_cntl.c +++ /dev/null @@ -1,245 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm4mb_mc; -blksz_t* gemm4mb_nc; -blksz_t* gemm4mb_kc; -blksz_t* gemm4mb_mr; -blksz_t* gemm4mb_nr; -blksz_t* gemm4mb_kr; - -func_t* gemm4mb_ukrs; - -packm_t* gemm4mb_packa_cntl; -packm_t* gemm4mb_packb_cntl; - -gemm_t* gemm4mb_cntl_bp_ke; -gemm_t* gemm4mb_cntl_op_bp; -gemm_t* gemm4mb_cntl_mm_op; -gemm_t* gemm4mb_cntl_vl_mm; - -gemm_t* gemm4mb_cntl; - - -void bli_gemm4mb_cntl_init() -{ - // Create blocksize objects for each dimension. 
- gemm4mb_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S/2, BLIS_MAXIMUM_MC_S/2, - BLIS_DEFAULT_MC_D/2, BLIS_MAXIMUM_MC_D/2 ); - gemm4mb_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S/2, BLIS_MAXIMUM_NC_S/2, - BLIS_DEFAULT_NC_D/2, BLIS_MAXIMUM_NC_D/2 ); - gemm4mb_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D ); - gemm4mb_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm4mb_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm4mb_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm4mb_mr, gemm4mb_mc ); - bli_blksz_obj_attach_mult_to( gemm4mb_nr, gemm4mb_nc ); - bli_blksz_obj_attach_mult_to( gemm4mb_kr, gemm4mb_kc ); - - - // The cache blocksizes that were scaled above need to be rounded down - // to their respective nearest register blocksize multiples. Note that - // this can only happen after the appropriate register blocksize is - // actually attached as a multiple. - bli_blksz_reduce_to_mult( gemm4mb_mc ); - bli_blksz_reduce_to_mult( gemm4mb_nc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. 
- bli_blksz_obj_attach_mr_nr_to( gemm4mb_mr, gemm4mb_nr, gemm4mb_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm4mb_mr, gemm4mb_nr, gemm4mb_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm4mb_mr, gemm4mb_nr, gemm4mb_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. - gemm4mb_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM4MB_UKERNEL, BLIS_CGEMM4MB_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM4MB_UKERNEL, BLIS_ZGEMM4MB_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations. - gemm4mb_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mb_mr, - gemm4mb_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_4MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm4mb_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mb_kr, - gemm4mb_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_4MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // - // Create a control tree for packing A and B, and streaming C. - // - - // Create control tree object for lowest-level block-panel kernel. - gemm4mb_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT3, - NULL, - gemm4mb_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem. - gemm4mb_cntl_op_bp - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mb_mc, - NULL, - NULL, - gemm4mb_packa_cntl, - gemm4mb_packb_cntl, - NULL, - gemm4mb_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. 
- gemm4mb_cntl_mm_op - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4mb_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mb_cntl_op_bp, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. - gemm4mb_cntl_vl_mm - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4mb_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mb_cntl_mm_op, - NULL ); - - // Alias the "master" gemm control tree to a shorter name. - gemm4mb_cntl = gemm4mb_cntl_vl_mm; - -} - -void bli_gemm4mb_cntl_finalize() -{ - bli_blksz_obj_free( gemm4mb_mc ); - bli_blksz_obj_free( gemm4mb_nc ); - bli_blksz_obj_free( gemm4mb_kc ); - bli_blksz_obj_free( gemm4mb_mr ); - bli_blksz_obj_free( gemm4mb_nr ); - bli_blksz_obj_free( gemm4mb_kr ); - - bli_func_obj_free( gemm4mb_ukrs ); - - bli_cntl_obj_free( gemm4mb_packa_cntl ); - bli_cntl_obj_free( gemm4mb_packb_cntl ); - - bli_cntl_obj_free( gemm4mb_cntl_bp_ke ); - bli_cntl_obj_free( gemm4mb_cntl_op_bp ); - bli_cntl_obj_free( gemm4mb_cntl_mm_op ); - bli_cntl_obj_free( gemm4mb_cntl_vl_mm ); -} - diff --git a/frame/ind/cntl/bli_gemm4mh_cntl.c b/frame/ind/cntl/bli_gemm4mh_cntl.c deleted file mode 100644 index 2deb4ee09..000000000 --- a/frame/ind/cntl/bli_gemm4mh_cntl.c +++ /dev/null @@ -1,441 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -blksz_t* gemm4mh_mc; -blksz_t* gemm4mh_nc; -blksz_t* gemm4mh_kc; -blksz_t* gemm4mh_mr; -blksz_t* gemm4mh_nr; -blksz_t* gemm4mh_kr; - -func_t* gemm4mh_ukrs; - -packm_t* gemm4mh_packa_cntl_ro; -packm_t* gemm4mh_packb_cntl_ro; -packm_t* gemm4mh_packa_cntl_io; -packm_t* gemm4mh_packb_cntl_io; - -gemm_t* gemm4mh_cntl_bp_ke; -gemm_t* gemm4mh_cntl_op_bp_rr; -gemm_t* gemm4mh_cntl_mm_op_rr; -gemm_t* gemm4mh_cntl_vl_mm_rr; -gemm_t* gemm4mh_cntl_op_bp_ri; -gemm_t* gemm4mh_cntl_mm_op_ri; -gemm_t* gemm4mh_cntl_vl_mm_ri; -gemm_t* gemm4mh_cntl_op_bp_ir; -gemm_t* gemm4mh_cntl_mm_op_ir; -gemm_t* gemm4mh_cntl_vl_mm_ir; -gemm_t* gemm4mh_cntl_op_bp_ii; -gemm_t* gemm4mh_cntl_mm_op_ii; -gemm_t* gemm4mh_cntl_vl_mm_ii; - -gemm_t* gemm4mh_cntl_rr; -gemm_t* gemm4mh_cntl_ri; -gemm_t* gemm4mh_cntl_ir; -gemm_t* gemm4mh_cntl_ii; - - -void bli_gemm4mh_cntl_init() -{ - // Create blocksize objects for each dimension. 
- // NOTE: the complex blocksizes for 4mh are equal to their - // corresponding real domain counterparts. - gemm4mh_mc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MC_S, BLIS_MAXIMUM_MC_S, - BLIS_DEFAULT_MC_D, BLIS_MAXIMUM_MC_D ); - gemm4mh_nc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NC_S, BLIS_MAXIMUM_NC_S, - BLIS_DEFAULT_NC_D, BLIS_MAXIMUM_NC_D ); - gemm4mh_kc - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KC_S, BLIS_MAXIMUM_KC_S, - BLIS_DEFAULT_KC_D, BLIS_MAXIMUM_KC_D ); - gemm4mh_mr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_MR_S, BLIS_PACKDIM_MR_S, - BLIS_DEFAULT_MR_D, BLIS_PACKDIM_MR_D ); - gemm4mh_nr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_NR_S, BLIS_PACKDIM_NR_S, - BLIS_DEFAULT_NR_D, BLIS_PACKDIM_NR_D ); - gemm4mh_kr - = - bli_blksz_obj_create( 0, 0, - 0, 0, - BLIS_DEFAULT_KR_S, BLIS_PACKDIM_KR_S, - BLIS_DEFAULT_KR_D, BLIS_PACKDIM_KR_D ); - - - // Attach the register blksz_t objects as blocksize multiples to the cache - // blksz_t objects. - bli_blksz_obj_attach_mult_to( gemm4mh_mr, gemm4mh_mc ); - bli_blksz_obj_attach_mult_to( gemm4mh_nr, gemm4mh_nc ); - bli_blksz_obj_attach_mult_to( gemm4mh_kr, gemm4mh_kc ); - - - // Attach the mr and nr blksz_t objects to each cache blksz_t object. - // The primary example of why this is needed relates to nudging kc. - // In hemm, symm, trmm, or trmm3, we need to know both mr and nr, - // since the multiple we target in nudging depends on whether the - // structured matrix is on the left or the right. - bli_blksz_obj_attach_mr_nr_to( gemm4mh_mr, gemm4mh_nr, gemm4mh_mc ); - bli_blksz_obj_attach_mr_nr_to( gemm4mh_mr, gemm4mh_nr, gemm4mh_nc ); - bli_blksz_obj_attach_mr_nr_to( gemm4mh_mr, gemm4mh_nr, gemm4mh_kc ); - - - // Create function pointer object for each datatype-specific gemm - // micro-kernel. 
- gemm4mh_ukrs - = - bli_func_obj_create( - NULL, FALSE, - NULL, FALSE, - BLIS_CGEMM4MH_UKERNEL, BLIS_CGEMM4MH_UKERNEL_PREFERS_CONTIG_ROWS, - BLIS_ZGEMM4MH_UKERNEL, BLIS_ZGEMM4MH_UKERNEL_PREFERS_CONTIG_ROWS ); - - - // Create control tree objects for packm operations (real only). - gemm4mh_packa_cntl_ro - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mr, - gemm4mh_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_RO, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm4mh_packb_cntl_ro - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_kr, - gemm4mh_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_RO, - BLIS_BUFFER_FOR_B_PANEL ); - - // Create control tree objects for packm operations (imag only). - gemm4mh_packa_cntl_io - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mr, - gemm4mh_kr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_IO, - BLIS_BUFFER_FOR_A_BLOCK ); - - gemm4mh_packb_cntl_io - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_kr, - gemm4mh_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_IO, - BLIS_BUFFER_FOR_B_PANEL ); - - - // Create control tree object for lowest-level block-panel kernel. - gemm4mh_cntl_bp_ke - = - bli_gemm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm4mh_ukrs, - NULL, NULL, NULL, - NULL, NULL, NULL ); - - // - // Create control tree for A.real * B.real. - // - - // Create control tree object for outer panel (to block-panel) - // problem. 
(real x real) - gemm4mh_cntl_op_bp_rr - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mc, - NULL, - NULL, - gemm4mh_packa_cntl_ro, - gemm4mh_packb_cntl_ro, - NULL, - gemm4mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (real x real) - gemm4mh_cntl_mm_op_rr - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_op_bp_rr, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (real x real) - gemm4mh_cntl_vl_mm_rr - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_mm_op_rr, - NULL ); - - // - // Create control tree for A.real * B.imag. - // - - // Create control tree object for outer panel (to block-panel) - // problem. (real x imag) - gemm4mh_cntl_op_bp_ri - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mc, - NULL, - NULL, - gemm4mh_packa_cntl_ro, - gemm4mh_packb_cntl_io, - NULL, - gemm4mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (real x imag) - gemm4mh_cntl_mm_op_ri - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_op_bp_ri, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (real x imag) - gemm4mh_cntl_vl_mm_ri - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_mm_op_ri, - NULL ); - - // - // Create control tree for A.imag * B.real. - // - - // Create control tree object for outer panel (to block-panel) - // problem. 
(imag x real) - gemm4mh_cntl_op_bp_ir - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mc, - NULL, - NULL, - gemm4mh_packa_cntl_io, - gemm4mh_packb_cntl_ro, - NULL, - gemm4mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (imag x real) - gemm4mh_cntl_mm_op_ir - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_op_bp_ir, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (imag x real) - gemm4mh_cntl_vl_mm_ir - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_mm_op_ir, - NULL ); - - // - // Create control tree for A.imag * B.imag. - // - - // Create control tree object for outer panel (to block-panel) - // problem. (imag x imag) - gemm4mh_cntl_op_bp_ii - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4mh_mc, - NULL, - NULL, - gemm4mh_packa_cntl_io, - gemm4mh_packb_cntl_io, - NULL, - gemm4mh_cntl_bp_ke, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates. (imag x imag) - gemm4mh_cntl_mm_op_ii - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4mh_kc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_op_bp_ii, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems. (imag x imag) - gemm4mh_cntl_vl_mm_ii - = - bli_gemm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4mh_nc, - NULL, - NULL, - NULL, - NULL, - NULL, - gemm4mh_cntl_mm_op_ii, - NULL ); - - - // Alias the "master" gemm control tree to a shorter name. 
- gemm4mh_cntl_rr = gemm4mh_cntl_vl_mm_rr; - gemm4mh_cntl_ri = gemm4mh_cntl_vl_mm_ri; - gemm4mh_cntl_ir = gemm4mh_cntl_vl_mm_ir; - gemm4mh_cntl_ii = gemm4mh_cntl_vl_mm_ii; - -} - -void bli_gemm4mh_cntl_finalize() -{ - bli_blksz_obj_free( gemm4mh_mc ); - bli_blksz_obj_free( gemm4mh_nc ); - bli_blksz_obj_free( gemm4mh_kc ); - bli_blksz_obj_free( gemm4mh_mr ); - bli_blksz_obj_free( gemm4mh_nr ); - bli_blksz_obj_free( gemm4mh_kr ); - - bli_func_obj_free( gemm4mh_ukrs ); - - bli_cntl_obj_free( gemm4mh_packa_cntl_ro ); - bli_cntl_obj_free( gemm4mh_packb_cntl_ro ); - bli_cntl_obj_free( gemm4mh_packa_cntl_io ); - bli_cntl_obj_free( gemm4mh_packb_cntl_io ); - - bli_cntl_obj_free( gemm4mh_cntl_bp_ke ); - bli_cntl_obj_free( gemm4mh_cntl_op_bp_rr ); - bli_cntl_obj_free( gemm4mh_cntl_mm_op_rr ); - bli_cntl_obj_free( gemm4mh_cntl_vl_mm_rr ); - bli_cntl_obj_free( gemm4mh_cntl_op_bp_ri ); - bli_cntl_obj_free( gemm4mh_cntl_mm_op_ri ); - bli_cntl_obj_free( gemm4mh_cntl_vl_mm_ri ); - bli_cntl_obj_free( gemm4mh_cntl_op_bp_ir ); - bli_cntl_obj_free( gemm4mh_cntl_mm_op_ir ); - bli_cntl_obj_free( gemm4mh_cntl_vl_mm_ir ); - bli_cntl_obj_free( gemm4mh_cntl_op_bp_ii ); - bli_cntl_obj_free( gemm4mh_cntl_mm_op_ii ); - bli_cntl_obj_free( gemm4mh_cntl_vl_mm_ii ); - -} - diff --git a/frame/ind/cntl/bli_trsm3m1_cntl.c b/frame/ind/cntl/bli_trsm3m1_cntl.c deleted file mode 100644 index 5c88cd688..000000000 --- a/frame/ind/cntl/bli_trsm3m1_cntl.c +++ /dev/null @@ -1,300 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. 
- - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -extern blksz_t* gemm3m1_mc; -extern blksz_t* gemm3m1_nc; -extern blksz_t* gemm3m1_kc; -extern blksz_t* gemm3m1_mr; -extern blksz_t* gemm3m1_nr; -extern blksz_t* gemm3m1_kr; - -extern func_t* gemm3m1_ukrs; - -func_t* gemmtrsm3m1_l_ukrs; -func_t* gemmtrsm3m1_u_ukrs; - -func_t* trsm3m1_l_ukrs; -func_t* trsm3m1_u_ukrs; - -packm_t* trsm3m1_l_packa_cntl; -packm_t* trsm3m1_l_packb_cntl; - -packm_t* trsm3m1_r_packa_cntl; -packm_t* trsm3m1_r_packb_cntl; - -trsm_t* trsm3m1_cntl_bp_ke; - -trsm_t* trsm3m1_l_cntl_op_bp; -trsm_t* trsm3m1_l_cntl_mm_op; -trsm_t* trsm3m1_l_cntl_vl_mm; - -trsm_t* trsm3m1_r_cntl_op_bp; -trsm_t* trsm3m1_r_cntl_mm_op; -trsm_t* trsm3m1_r_cntl_vl_mm; - -trsm_t* trsm3m1_l_cntl; -trsm_t* trsm3m1_r_cntl; - - -void bli_trsm3m1_cntl_init() -{ - - // Create function pointer objects for each datatype-specific - // gemmtrsm3m1_l and gemmtrsm3m1_u micro-kernel. - gemmtrsm3m1_l_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CGEMMTRSM3M1_L_UKERNEL, FALSE, - BLIS_ZGEMMTRSM3M1_L_UKERNEL, FALSE ); - - gemmtrsm3m1_u_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CGEMMTRSM3M1_U_UKERNEL, FALSE, - BLIS_ZGEMMTRSM3M1_U_UKERNEL, FALSE ); - - - // Create function pointer objects for each datatype-specific - // trsm3m1_l and trsm3m1_u micro-kernel. - trsm3m1_l_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CTRSM3M1_L_UKERNEL, FALSE, - BLIS_ZTRSM3M1_L_UKERNEL, FALSE ); - - trsm3m1_u_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CTRSM3M1_U_UKERNEL, FALSE, - BLIS_ZTRSM3M1_U_UKERNEL, FALSE ); - - - // Create control tree objects for packm operations (left side). - trsm3m1_l_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - // IMPORTANT: n dim multiple must be mr to - // support right and bottom-right edge cases - gemm3m1_mr, - gemm3m1_mr, - TRUE, // invert diagonal - TRUE, // reverse iteration if upper? 
- FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_3MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - trsm3m1_l_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - // IMPORTANT: m dim multiple must be mr since - // B_pack is updated (ie: serves as C) in trsm - gemm3m1_mr, - gemm3m1_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_3MI, - BLIS_BUFFER_FOR_B_PANEL ); - - // Create control tree objects for packm operations (right side). - trsm3m1_r_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_nr, - gemm3m1_mr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_3MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - trsm3m1_r_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_mr, - gemm3m1_mr, - TRUE, // invert diagonal - FALSE, // reverse iteration if upper? - TRUE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_3MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // Create control tree object for lowest-level block-panel kernel. - trsm3m1_cntl_bp_ke - = - bli_trsm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm3m1_ukrs, - gemmtrsm3m1_l_ukrs, - gemmtrsm3m1_u_ukrs, - NULL, NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem (left side). - trsm3m1_l_cntl_op_bp - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_mc, - NULL, NULL, NULL, - NULL, - trsm3m1_l_packa_cntl, - trsm3m1_l_packb_cntl, - NULL, - trsm3m1_cntl_bp_ke, - NULL, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates (left side). 
- trsm3m1_l_cntl_mm_op - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3m1_kc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm3m1_l_cntl_op_bp, - NULL, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems (left side). - trsm3m1_l_cntl_vl_mm - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3m1_nc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm3m1_l_cntl_mm_op, - NULL, - NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem (right side). - trsm3m1_r_cntl_op_bp - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm3m1_mc, - NULL, NULL, NULL, - NULL, - trsm3m1_r_packa_cntl, - trsm3m1_r_packb_cntl, - NULL, - trsm3m1_cntl_bp_ke, - NULL, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates (right side). - trsm3m1_r_cntl_mm_op - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm3m1_kc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm3m1_r_cntl_op_bp, - NULL, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems (right side). - trsm3m1_r_cntl_vl_mm - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm3m1_nc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm3m1_r_cntl_mm_op, - NULL, - NULL ); - - // Alias the "master" trsm control trees to shorter names. 
- trsm3m1_l_cntl = trsm3m1_l_cntl_vl_mm; - trsm3m1_r_cntl = trsm3m1_r_cntl_vl_mm; -} - -void bli_trsm3m1_cntl_finalize() -{ - bli_func_obj_free( gemmtrsm3m1_l_ukrs ); - bli_func_obj_free( gemmtrsm3m1_u_ukrs ); - bli_func_obj_free( trsm3m1_l_ukrs ); - bli_func_obj_free( trsm3m1_u_ukrs ); - - bli_cntl_obj_free( trsm3m1_l_packa_cntl ); - bli_cntl_obj_free( trsm3m1_l_packb_cntl ); - bli_cntl_obj_free( trsm3m1_r_packa_cntl ); - bli_cntl_obj_free( trsm3m1_r_packb_cntl ); - - bli_cntl_obj_free( trsm3m1_cntl_bp_ke ); - - bli_cntl_obj_free( trsm3m1_l_cntl_op_bp ); - bli_cntl_obj_free( trsm3m1_l_cntl_mm_op ); - bli_cntl_obj_free( trsm3m1_l_cntl_vl_mm ); - bli_cntl_obj_free( trsm3m1_r_cntl_op_bp ); - bli_cntl_obj_free( trsm3m1_r_cntl_mm_op ); - bli_cntl_obj_free( trsm3m1_r_cntl_vl_mm ); -} - diff --git a/frame/ind/cntl/bli_trsm4m1_cntl.c b/frame/ind/cntl/bli_trsm4m1_cntl.c deleted file mode 100644 index 54883c42f..000000000 --- a/frame/ind/cntl/bli_trsm4m1_cntl.c +++ /dev/null @@ -1,300 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - -extern scalm_t* scalm_cntl; - -extern blksz_t* gemm4m1_mc; -extern blksz_t* gemm4m1_nc; -extern blksz_t* gemm4m1_kc; -extern blksz_t* gemm4m1_mr; -extern blksz_t* gemm4m1_nr; -extern blksz_t* gemm4m1_kr; - -extern func_t* gemm4m1_ukrs; - -func_t* gemmtrsm4m1_l_ukrs; -func_t* gemmtrsm4m1_u_ukrs; - -func_t* trsm4m1_l_ukrs; -func_t* trsm4m1_u_ukrs; - -packm_t* trsm4m1_l_packa_cntl; -packm_t* trsm4m1_l_packb_cntl; - -packm_t* trsm4m1_r_packa_cntl; -packm_t* trsm4m1_r_packb_cntl; - -trsm_t* trsm4m1_cntl_bp_ke; - -trsm_t* trsm4m1_l_cntl_op_bp; -trsm_t* trsm4m1_l_cntl_mm_op; -trsm_t* trsm4m1_l_cntl_vl_mm; - -trsm_t* trsm4m1_r_cntl_op_bp; -trsm_t* trsm4m1_r_cntl_mm_op; -trsm_t* trsm4m1_r_cntl_vl_mm; - -trsm_t* trsm4m1_l_cntl; -trsm_t* trsm4m1_r_cntl; - - -void bli_trsm4m1_cntl_init() -{ - - // Create function pointer objects for each datatype-specific - // gemmtrsm4m1_l and gemmtrsm4m1_u micro-kernel. 
- gemmtrsm4m1_l_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CGEMMTRSM4M1_L_UKERNEL, FALSE, - BLIS_ZGEMMTRSM4M1_L_UKERNEL, FALSE ); - - gemmtrsm4m1_u_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CGEMMTRSM4M1_U_UKERNEL, FALSE, - BLIS_ZGEMMTRSM4M1_U_UKERNEL, FALSE ); - - - // Create function pointer objects for each datatype-specific - // trsm4m1_l and trsm4m1_u micro-kernel. - trsm4m1_l_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CTRSM4M1_L_UKERNEL, FALSE, - BLIS_ZTRSM4M1_L_UKERNEL, FALSE ); - - trsm4m1_u_ukrs - = - bli_func_obj_create( NULL, FALSE, - NULL, FALSE, - BLIS_CTRSM4M1_U_UKERNEL, FALSE, - BLIS_ZTRSM4M1_U_UKERNEL, FALSE ); - - - // Create control tree objects for packm operations (left side). - trsm4m1_l_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - // IMPORTANT: n dim multiple must be mr to - // support right and bottom-right edge cases - gemm4m1_mr, - gemm4m1_mr, - TRUE, // invert diagonal - TRUE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_ROW_PANELS_4MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - trsm4m1_l_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - // IMPORTANT: m dim multiple must be mr since - // B_pack is updated (ie: serves as C) in trsm - gemm4m1_mr, - gemm4m1_nr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_4MI, - BLIS_BUFFER_FOR_B_PANEL ); - - // Create control tree objects for packm operations (right side). - trsm4m1_r_packa_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_nr, - gemm4m1_mr, - FALSE, // do NOT invert diagonal - FALSE, // reverse iteration if upper? - FALSE, // reverse iteration if lower? 
- BLIS_PACKED_ROW_PANELS_4MI, - BLIS_BUFFER_FOR_A_BLOCK ); - - trsm4m1_r_packb_cntl - = - bli_packm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_mr, - gemm4m1_mr, - TRUE, // invert diagonal - FALSE, // reverse iteration if upper? - TRUE, // reverse iteration if lower? - BLIS_PACKED_COL_PANELS_4MI, - BLIS_BUFFER_FOR_B_PANEL ); - - - // Create control tree object for lowest-level block-panel kernel. - trsm4m1_cntl_bp_ke - = - bli_trsm_cntl_obj_create( BLIS_UNB_OPT, - BLIS_VARIANT2, - NULL, - gemm4m1_ukrs, - gemmtrsm4m1_l_ukrs, - gemmtrsm4m1_u_ukrs, - NULL, NULL, NULL, NULL, - NULL, NULL, NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem (left side). - trsm4m1_l_cntl_op_bp - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_mc, - NULL, NULL, NULL, - NULL, - trsm4m1_l_packa_cntl, - trsm4m1_l_packb_cntl, - NULL, - trsm4m1_cntl_bp_ke, - NULL, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates (left side). - trsm4m1_l_cntl_mm_op - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4m1_kc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm4m1_l_cntl_op_bp, - NULL, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems (left side). - trsm4m1_l_cntl_vl_mm - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4m1_nc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm4m1_l_cntl_mm_op, - NULL, - NULL ); - - // Create control tree object for outer panel (to block-panel) - // problem (right side). - trsm4m1_r_cntl_op_bp - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT1, - gemm4m1_mc, - NULL, NULL, NULL, - NULL, - trsm4m1_r_packa_cntl, - trsm4m1_r_packb_cntl, - NULL, - trsm4m1_cntl_bp_ke, - NULL, - NULL ); - - // Create control tree object for general problem via multiple - // rank-k (outer panel) updates (right side). 
- trsm4m1_r_cntl_mm_op - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT3, - gemm4m1_kc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm4m1_r_cntl_op_bp, - NULL, - NULL ); - - // Create control tree object for very large problem via multiple - // general problems (right side). - trsm4m1_r_cntl_vl_mm - = - bli_trsm_cntl_obj_create( BLIS_BLOCKED, - BLIS_VARIANT2, - gemm4m1_nc, - NULL, NULL, NULL, - NULL, - NULL, - NULL, - NULL, - trsm4m1_r_cntl_mm_op, - NULL, - NULL ); - - // Alias the "master" trsm control trees to shorter names. - trsm4m1_l_cntl = trsm4m1_l_cntl_vl_mm; - trsm4m1_r_cntl = trsm4m1_r_cntl_vl_mm; -} - -void bli_trsm4m1_cntl_finalize() -{ - bli_func_obj_free( gemmtrsm4m1_l_ukrs ); - bli_func_obj_free( gemmtrsm4m1_u_ukrs ); - bli_func_obj_free( trsm4m1_l_ukrs ); - bli_func_obj_free( trsm4m1_u_ukrs ); - - bli_cntl_obj_free( trsm4m1_l_packa_cntl ); - bli_cntl_obj_free( trsm4m1_l_packb_cntl ); - bli_cntl_obj_free( trsm4m1_r_packa_cntl ); - bli_cntl_obj_free( trsm4m1_r_packb_cntl ); - - bli_cntl_obj_free( trsm4m1_cntl_bp_ke ); - - bli_cntl_obj_free( trsm4m1_l_cntl_op_bp ); - bli_cntl_obj_free( trsm4m1_l_cntl_mm_op ); - bli_cntl_obj_free( trsm4m1_l_cntl_vl_mm ); - bli_cntl_obj_free( trsm4m1_r_cntl_op_bp ); - bli_cntl_obj_free( trsm4m1_r_cntl_mm_op ); - bli_cntl_obj_free( trsm4m1_r_cntl_vl_mm ); -} - diff --git a/frame/ind/cntx/bli_gemmind_cntx.c b/frame/ind/cntx/bli_gemmind_cntx.c new file mode 100644 index 000000000..b8d777f86 --- /dev/null +++ b/frame/ind/cntx/bli_gemmind_cntx.c @@ -0,0 +1,507 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +typedef void (*cntx_ft)( cntx_t* cntx ); + +static void* bli_gemmind_cntx_fp[BLIS_NUM_IND_METHODS][2] = +{ + /* _cntx_init _cntx_finalize */ +/* 3mh */ { bli_gemm3mh_cntx_init, bli_gemm3mh_cntx_finalize }, +/* 3m3 */ { bli_gemm3m3_cntx_init, bli_gemm3m3_cntx_finalize }, +/* 3m2 */ { bli_gemm3m2_cntx_init, bli_gemm3m2_cntx_finalize }, +/* 3m1 */ { bli_gemm3m1_cntx_init, bli_gemm3m1_cntx_finalize }, +/* 4mh */ { bli_gemm4mh_cntx_init, bli_gemm4mh_cntx_finalize }, +/* 4mb */ { bli_gemm4mb_cntx_init, bli_gemm4mb_cntx_finalize }, +/* 4m1 */ { bli_gemm4m1_cntx_init, bli_gemm4m1_cntx_finalize }, +/* nat */ { bli_gemmnat_cntx_init, bli_gemmnat_cntx_finalize } +}; + +#define BLIS_CNTX_INIT_INDEX 0 +#define BLIS_CNTX_FINALIZE_INDEX 1 + +// ----------------------------------------------------------------------------- + +// Use a datatype to find the highest priority available (i.e., implemented +// and enabled) induced method, and then execute the context initialization/ +// finalization function associated with that induced method. + +void bli_gemmind_cntx_init_avail( num_t dt, cntx_t* cntx ) +{ + ind_t method = bli_ind_oper_find_avail( BLIS_GEMM, dt ); + + bli_gemmind_cntx_init( method, cntx ); +} + +void bli_gemmind_cntx_finalize_avail( num_t dt, cntx_t* cntx ) +{ + ind_t method = bli_ind_oper_find_avail( BLIS_GEMM, dt ); + + bli_gemmind_cntx_finalize( method, cntx ); +} + +// ----------------------------------------------------------------------------- + +// Execute the context initialization/finalization function associated +// with a given induced method. 
+ +void bli_gemmind_cntx_init( ind_t method, cntx_t* cntx ) +{ + cntx_ft func = bli_gemmind_cntx_init_get_func( method ); + + func( cntx ); +} + +void bli_gemmind_cntx_finalize( ind_t method, cntx_t* cntx ) +{ + cntx_ft func = bli_gemmind_cntx_finalize_get_func( method ); + + func( cntx ); +} + +// ----------------------------------------------------------------------------- + +void* bli_gemmind_cntx_init_get_func( ind_t method ) +{ + return bli_gemmind_cntx_fp[ method ][ BLIS_CNTX_INIT_INDEX ]; +} + +void* bli_gemmind_cntx_finalize_get_func( ind_t method ) +{ + return bli_gemmind_cntx_fp[ method ][ BLIS_CNTX_FINALIZE_INDEX ]; +} + +// ----------------------------------------------------------------------------- + +void bli_gemm3m1_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_3M1; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 3.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. 
+ bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_3MI, + BLIS_PACKED_COL_PANELS_3MI, + cntx ); +} + +void bli_gemm3m1_cntx_stage( dim_t stage, cntx_t* cntx ) +{ +} + +void bli_gemm3m1_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm3m2_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_3M2; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 3.0, + BLIS_KC, BLIS_KR, 1.0, + BLIS_MC, BLIS_MR, 3.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_3MS, + BLIS_PACKED_COL_PANELS_3MI, + cntx ); +} + +void bli_gemm3m2_cntx_stage( dim_t stage, cntx_t* cntx ) +{ +} + +void bli_gemm3m2_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm3m3_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_3M3; + + // Perform basic setup on the context. 
+ bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 3.0, + BLIS_KC, BLIS_KR, 1.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( 0, // not yet needed; varies with _stage() + BLIS_PACKED_COL_PANELS_3MS, + cntx ); +} + +void bli_gemm3m3_cntx_stage( dim_t stage, cntx_t* cntx ) +{ + // Set the pack_t schemas as a function of the stage of execution. + if ( stage == 0 ) + { + bli_cntx_set_pack_schema_a( BLIS_PACKED_ROW_PANELS_RO, cntx ); + } + else if ( stage == 1 ) + { + bli_cntx_set_pack_schema_a( BLIS_PACKED_ROW_PANELS_IO, cntx ); + } + else // if ( stage == 2 ) + { + bli_cntx_set_pack_schema_a( BLIS_PACKED_ROW_PANELS_RPI, cntx ); + } +} + +void bli_gemm3m3_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm3mh_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_3MH; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. 
+ bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 1.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( 0, // not yet needed; varies with _stage() + 0, // not yet needed; varies with _stage() + cntx ); +} + +void bli_gemm3mh_cntx_stage( dim_t stage, cntx_t* cntx ) +{ + // Set the pack_t schemas as a function of the stage of execution. + if ( stage == 0 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_RO, + BLIS_PACKED_COL_PANELS_RO, cntx ); + } + else if ( stage == 1 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_IO, + BLIS_PACKED_COL_PANELS_IO, cntx ); + } + else // if ( stage == 2 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_RPI, + BLIS_PACKED_COL_PANELS_RPI, cntx ); + } +} + +void bli_gemm3mh_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm4m1_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_4M1A; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. 
+ bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 2.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_4MI, + BLIS_PACKED_COL_PANELS_4MI, + cntx ); +} + +void bli_gemm4m1_cntx_stage( dim_t stage, cntx_t* cntx ) +{ +} + +void bli_gemm4m1_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm4mb_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_4M1B; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. 
+ bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 2.0, + BLIS_KC, BLIS_KR, 1.0, + BLIS_MC, BLIS_MR, 2.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_4MI, + BLIS_PACKED_COL_PANELS_4MI, + cntx ); +} + +void bli_gemm4mb_cntx_stage( dim_t stage, cntx_t* cntx ) +{ +} + +void bli_gemm4mb_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemm4mh_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_4MH; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernel associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 1.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( 0, // not yet needed; varies with _stage() + 0, // not yet needed; varies with _stage() + cntx ); +} + +void bli_gemm4mh_cntx_stage( dim_t stage, cntx_t* cntx ) +{ + // Set the pack_t schemas as a function of the stage of execution. 
+ if ( stage == 0 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_RO, + BLIS_PACKED_COL_PANELS_RO, cntx ); + } + else if ( stage == 1 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_IO, + BLIS_PACKED_COL_PANELS_IO, cntx ); + } + else if ( stage == 2 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_RO, + BLIS_PACKED_COL_PANELS_IO, cntx ); + } + else // if ( stage == 3 ) + { + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_IO, + BLIS_PACKED_COL_PANELS_RO, cntx ); + } +} + +void bli_gemm4mh_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_gemmnat_cntx_init( cntx_t* cntx ) +{ + bli_gemm_cntx_init( cntx ); +} + +void bli_gemmnat_cntx_stage( dim_t stage, cntx_t* cntx ) +{ +} + +void bli_gemmnat_cntx_finalize( cntx_t* cntx ) +{ + bli_gemm_cntx_finalize( cntx ); +} + diff --git a/frame/ind/cntx/bli_gemmind_cntx.h b/frame/ind/cntx/bli_gemmind_cntx.h new file mode 100644 index 000000000..c70da7b36 --- /dev/null +++ b/frame/ind/cntx/bli_gemmind_cntx.h @@ -0,0 +1,100 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#if 0 +// +// Generate prototypes for _cntx_init(), _cntx_stage(), and _cntx_finalize() +// for each induced method (including native execution) based on gemm. 
+// + +#undef GENPROT +#define GENPROT( opname, imeth ) \ +\ +void PASTEMAC2(opname,imeth,_cntx_init)( void ); \ +void PASTEMAC2(opname,imeth,_cntx_stage)( dim_t stage, cntx_t* cntx ); \ +void PASTEMAC2(opname,imeth,_cntx_finalize)( void ); + +GENPROT( gemm, nat ) +GENPROT( gemm, 3mh ) +GENPROT( gemm, 3m3 ) +GENPROT( gemm, 3m2 ) +GENPROT( gemm, 3m1 ) +GENPROT( gemm, 4mh ) +GENPROT( gemm, 4mb ) +GENPROT( gemm, 4m1 ) +#endif + +void bli_gemmnat_cntx_init( cntx_t* cntx ); +void bli_gemmnat_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemmnat_cntx_finalize( cntx_t* cntx ); + +void bli_gemm3mh_cntx_init( cntx_t* cntx ); +void bli_gemm3mh_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm3mh_cntx_finalize( cntx_t* cntx ); + +void bli_gemm3m3_cntx_init( cntx_t* cntx ); +void bli_gemm3m3_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm3m3_cntx_finalize( cntx_t* cntx ); + +void bli_gemm3m2_cntx_init( cntx_t* cntx ); +void bli_gemm3m2_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm3m2_cntx_finalize( cntx_t* cntx ); + +void bli_gemm3m1_cntx_init( cntx_t* cntx ); +void bli_gemm3m1_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm3m1_cntx_finalize( cntx_t* cntx ); + +void bli_gemm4mh_cntx_init( cntx_t* cntx ); +void bli_gemm4mh_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm4mh_cntx_finalize( cntx_t* cntx ); + +void bli_gemm4mb_cntx_init( cntx_t* cntx ); +void bli_gemm4mb_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm4mb_cntx_finalize( cntx_t* cntx ); + +void bli_gemm4m1_cntx_init( cntx_t* cntx ); +void bli_gemm4m1_cntx_stage( dim_t stage, cntx_t* cntx ); +void bli_gemm4m1_cntx_finalize( cntx_t* cntx ); + +// ----------------------------------------------------------------------------- + +void bli_gemmind_cntx_init_avail( num_t dt, cntx_t* cntx ); +void bli_gemmind_cntx_finalize_avail( num_t dt, cntx_t* cntx ); + +void bli_gemmind_cntx_init( ind_t method, cntx_t* cntx ); +void bli_gemmind_cntx_finalize( ind_t method, cntx_t* cntx 
); + +void* bli_gemmind_cntx_init_get_func( ind_t method ); +void* bli_gemmind_cntx_finalize_get_func( ind_t method ); + diff --git a/frame/ind/cntx/bli_trsmind_cntx.c b/frame/ind/cntx/bli_trsmind_cntx.c new file mode 100644 index 000000000..c1e8057ce --- /dev/null +++ b/frame/ind/cntx/bli_trsmind_cntx.c @@ -0,0 +1,144 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +// ----------------------------------------------------------------------------- + +void bli_trsm3m1_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_3M1; + + // Perform basic setup on the context. + bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernels associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMMTRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMMTRSM_U_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_TRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_TRSM_U_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 3.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_3MI, + BLIS_PACKED_COL_PANELS_3MI, + cntx ); +} + +void bli_trsm3m1_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. + bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_trsm4m1_cntx_init( cntx_t* cntx ) +{ + const ind_t method = BLIS_4M1A; + + // Perform basic setup on the context. 
+ bli_cntx_obj_create( cntx ); + + // Initialize the context with the current architecture's native + // level-3 gemm micro-kernel, and its output preferences. + bli_gks_cntx_set_l3_nat_ukr( BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_nat_ukr_prefs( BLIS_GEMM_UKR, cntx ); + + // Initialize the context with the virtual micro-kernels associated with + // the current induced method. + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMM_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMMTRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_GEMMTRSM_U_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_TRSM_L_UKR, cntx ); + bli_gks_cntx_set_l3_vir_ukr( method, BLIS_TRSM_U_UKR, cntx ); + + // Initialize the context with packm-related kernels. + bli_packm_cntx_init( cntx ); + + // Initialize the context with the current architecture's register + // and cache blocksizes (and multiples), and the induced method. + bli_gks_cntx_set_blkszs( method, 6, + BLIS_NC, BLIS_NR, 1.0, + BLIS_KC, BLIS_KR, 2.0, + BLIS_MC, BLIS_MR, 1.0, + BLIS_NR, BLIS_NR, 1.0, + BLIS_MR, BLIS_MR, 1.0, + BLIS_KR, BLIS_KR, 1.0, + cntx ); + + // Set the pack_t schemas for the current induced method. + bli_cntx_set_pack_schema_ab( BLIS_PACKED_ROW_PANELS_4MI, + BLIS_PACKED_COL_PANELS_4MI, + cntx ); +} + +void bli_trsm4m1_cntx_finalize( cntx_t* cntx ) +{ + // Free the context and all memory allocated to it. 
+ bli_cntx_obj_free( cntx ); +} + +// ----------------------------------------------------------------------------- + +void bli_trsmnat_cntx_init( cntx_t* cntx ) +{ + bli_trsm_cntx_init( cntx ); +} + +void bli_trsmnat_cntx_finalize( cntx_t* cntx ) +{ + bli_trsm_cntx_finalize( cntx ); +} + diff --git a/frame/1/swapv/bli_swapv_kernel.h b/frame/ind/cntx/bli_trsmind_cntx.h similarity index 72% rename from frame/1/swapv/bli_swapv_kernel.h rename to frame/ind/cntx/bli_trsmind_cntx.h index d2c446564..3d3c883f9 100644 --- a/frame/1/swapv/bli_swapv_kernel.h +++ b/frame/ind/cntx/bli_trsmind_cntx.h @@ -32,29 +32,29 @@ */ -void bli_swapv_kernel( obj_t* x, - obj_t* y ); - - +/* // -// Prototype the void pointer kernel wrappers. +// Generate prototypes for _cntx_init() and _cntx_finalize() +// for each induced method (including native execution) based on trsm. // -#undef GENTPROT2 -#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \ +#undef GENPROT +#define GENPROT( opname, imeth ) \ \ -void PASTEMAC2(chx,chy,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* y, inc_t incy \ - ); +void PASTEMAC2(opname,imeth,_cntx_init)( void ); \ +void PASTEMAC2(opname,imeth,_cntx_finalize)( void ); -INSERT_GENTPROT2_BASIC( swapv_kernel_void ) +GENPROT( trsm, nat ) +GENPROT( trsm, 3m1 ) +GENPROT( trsm, 4m1 ) +*/ -#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT -INSERT_GENTPROT2_MIX_D( swapv_kernel_void ) -#endif +void bli_trsmnat_cntx_init( cntx_t* cntx ); +void bli_trsmnat_cntx_finalize( cntx_t* cntx ); + +void bli_trsm4m1_cntx_init( cntx_t* cntx ); +void bli_trsm4m1_cntx_finalize( cntx_t* cntx ); + +void bli_trsm3m1_cntx_init( cntx_t* cntx ); +void bli_trsm3m1_cntx_finalize( cntx_t* cntx ); -#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT -INSERT_GENTPROT2_MIX_P( swapv_kernel_void ) -#endif diff --git a/frame/ind/include/bli_packm_ind_pre_macro_defs.h b/frame/ind/include/bli_packm_ind_pre_macro_defs.h index 360c78faa..ee5070e49 100644 --- 
a/frame/ind/include/bli_packm_ind_pre_macro_defs.h +++ b/frame/ind/include/bli_packm_ind_pre_macro_defs.h @@ -38,142 +38,142 @@ // packm_2xk_3mis kernels -#define BLIS_CPACKM_2XK_3MIS_KERNEL_REF bli_cpackm_ref_2xk_3mis -#define BLIS_ZPACKM_2XK_3MIS_KERNEL_REF bli_zpackm_ref_2xk_3mis +#define BLIS_CPACKM_2XK_3MIS_KERNEL_REF bli_cpackm_2xk_3mis_ref +#define BLIS_ZPACKM_2XK_3MIS_KERNEL_REF bli_zpackm_2xk_3mis_ref // packm_4xk_3mis kernels -#define BLIS_CPACKM_4XK_3MIS_KERNEL_REF bli_cpackm_ref_4xk_3mis -#define BLIS_ZPACKM_4XK_3MIS_KERNEL_REF bli_zpackm_ref_4xk_3mis +#define BLIS_CPACKM_4XK_3MIS_KERNEL_REF bli_cpackm_4xk_3mis_ref +#define BLIS_ZPACKM_4XK_3MIS_KERNEL_REF bli_zpackm_4xk_3mis_ref // packm_6xk_3mis kernels -#define BLIS_CPACKM_6XK_3MIS_KERNEL_REF bli_cpackm_ref_6xk_3mis -#define BLIS_ZPACKM_6XK_3MIS_KERNEL_REF bli_zpackm_ref_6xk_3mis +#define BLIS_CPACKM_6XK_3MIS_KERNEL_REF bli_cpackm_6xk_3mis_ref +#define BLIS_ZPACKM_6XK_3MIS_KERNEL_REF bli_zpackm_6xk_3mis_ref // packm_8xk_3mis kernels -#define BLIS_CPACKM_8XK_3MIS_KERNEL_REF bli_cpackm_ref_8xk_3mis -#define BLIS_ZPACKM_8XK_3MIS_KERNEL_REF bli_zpackm_ref_8xk_3mis +#define BLIS_CPACKM_8XK_3MIS_KERNEL_REF bli_cpackm_8xk_3mis_ref +#define BLIS_ZPACKM_8XK_3MIS_KERNEL_REF bli_zpackm_8xk_3mis_ref // packm_10xk_3mis kernels -#define BLIS_CPACKM_10XK_3MIS_KERNEL_REF bli_cpackm_ref_10xk_3mis -#define BLIS_ZPACKM_10XK_3MIS_KERNEL_REF bli_zpackm_ref_10xk_3mis +#define BLIS_CPACKM_10XK_3MIS_KERNEL_REF bli_cpackm_10xk_3mis_ref +#define BLIS_ZPACKM_10XK_3MIS_KERNEL_REF bli_zpackm_10xk_3mis_ref // packm_12xk_3mis kernels -#define BLIS_CPACKM_12XK_3MIS_KERNEL_REF bli_cpackm_ref_12xk_3mis -#define BLIS_ZPACKM_12XK_3MIS_KERNEL_REF bli_zpackm_ref_12xk_3mis +#define BLIS_CPACKM_12XK_3MIS_KERNEL_REF bli_cpackm_12xk_3mis_ref +#define BLIS_ZPACKM_12XK_3MIS_KERNEL_REF bli_zpackm_12xk_3mis_ref // packm_14xk_3mis kernels -#define BLIS_CPACKM_14XK_3MIS_KERNEL_REF bli_cpackm_ref_14xk_3mis -#define BLIS_ZPACKM_14XK_3MIS_KERNEL_REF 
bli_zpackm_ref_14xk_3mis +#define BLIS_CPACKM_14XK_3MIS_KERNEL_REF bli_cpackm_14xk_3mis_ref +#define BLIS_ZPACKM_14XK_3MIS_KERNEL_REF bli_zpackm_14xk_3mis_ref // packm_16xk_3mis kernels -#define BLIS_CPACKM_16XK_3MIS_KERNEL_REF bli_cpackm_ref_16xk_3mis -#define BLIS_ZPACKM_16XK_3MIS_KERNEL_REF bli_zpackm_ref_16xk_3mis +#define BLIS_CPACKM_16XK_3MIS_KERNEL_REF bli_cpackm_16xk_3mis_ref +#define BLIS_ZPACKM_16XK_3MIS_KERNEL_REF bli_zpackm_16xk_3mis_ref // packm_30xk_3mis kernels -#define BLIS_CPACKM_30XK_3MIS_KERNEL_REF bli_cpackm_ref_30xk_3mis -#define BLIS_ZPACKM_30XK_3MIS_KERNEL_REF bli_zpackm_ref_30xk_3mis +#define BLIS_CPACKM_30XK_3MIS_KERNEL_REF bli_cpackm_30xk_3mis_ref +#define BLIS_ZPACKM_30XK_3MIS_KERNEL_REF bli_zpackm_30xk_3mis_ref // packm_2xk_4mi kernels -#define BLIS_CPACKM_2XK_4MI_KERNEL_REF bli_cpackm_ref_2xk_4mi -#define BLIS_ZPACKM_2XK_4MI_KERNEL_REF bli_zpackm_ref_2xk_4mi +#define BLIS_CPACKM_2XK_4MI_KERNEL_REF bli_cpackm_2xk_4mi_ref +#define BLIS_ZPACKM_2XK_4MI_KERNEL_REF bli_zpackm_2xk_4mi_ref // packm_4xk_4mi kernels -#define BLIS_CPACKM_4XK_4MI_KERNEL_REF bli_cpackm_ref_4xk_4mi -#define BLIS_ZPACKM_4XK_4MI_KERNEL_REF bli_zpackm_ref_4xk_4mi +#define BLIS_CPACKM_4XK_4MI_KERNEL_REF bli_cpackm_4xk_4mi_ref +#define BLIS_ZPACKM_4XK_4MI_KERNEL_REF bli_zpackm_4xk_4mi_ref // packm_6xk_4mi kernels -#define BLIS_CPACKM_6XK_4MI_KERNEL_REF bli_cpackm_ref_6xk_4mi -#define BLIS_ZPACKM_6XK_4MI_KERNEL_REF bli_zpackm_ref_6xk_4mi +#define BLIS_CPACKM_6XK_4MI_KERNEL_REF bli_cpackm_6xk_4mi_ref +#define BLIS_ZPACKM_6XK_4MI_KERNEL_REF bli_zpackm_6xk_4mi_ref // packm_8xk_4mi kernels -#define BLIS_CPACKM_8XK_4MI_KERNEL_REF bli_cpackm_ref_8xk_4mi -#define BLIS_ZPACKM_8XK_4MI_KERNEL_REF bli_zpackm_ref_8xk_4mi +#define BLIS_CPACKM_8XK_4MI_KERNEL_REF bli_cpackm_8xk_4mi_ref +#define BLIS_ZPACKM_8XK_4MI_KERNEL_REF bli_zpackm_8xk_4mi_ref // packm_10xk_4mi kernels -#define BLIS_CPACKM_10XK_4MI_KERNEL_REF bli_cpackm_ref_10xk_4mi -#define BLIS_ZPACKM_10XK_4MI_KERNEL_REF 
bli_zpackm_ref_10xk_4mi +#define BLIS_CPACKM_10XK_4MI_KERNEL_REF bli_cpackm_10xk_4mi_ref +#define BLIS_ZPACKM_10XK_4MI_KERNEL_REF bli_zpackm_10xk_4mi_ref // packm_12xk_4mi kernels -#define BLIS_CPACKM_12XK_4MI_KERNEL_REF bli_cpackm_ref_12xk_4mi -#define BLIS_ZPACKM_12XK_4MI_KERNEL_REF bli_zpackm_ref_12xk_4mi +#define BLIS_CPACKM_12XK_4MI_KERNEL_REF bli_cpackm_12xk_4mi_ref +#define BLIS_ZPACKM_12XK_4MI_KERNEL_REF bli_zpackm_12xk_4mi_ref // packm_14xk_4mi kernels -#define BLIS_CPACKM_14XK_4MI_KERNEL_REF bli_cpackm_ref_14xk_4mi -#define BLIS_ZPACKM_14XK_4MI_KERNEL_REF bli_zpackm_ref_14xk_4mi +#define BLIS_CPACKM_14XK_4MI_KERNEL_REF bli_cpackm_14xk_4mi_ref +#define BLIS_ZPACKM_14XK_4MI_KERNEL_REF bli_zpackm_14xk_4mi_ref // packm_16xk_4mi kernels -#define BLIS_CPACKM_16XK_4MI_KERNEL_REF bli_cpackm_ref_16xk_4mi -#define BLIS_ZPACKM_16XK_4MI_KERNEL_REF bli_zpackm_ref_16xk_4mi +#define BLIS_CPACKM_16XK_4MI_KERNEL_REF bli_cpackm_16xk_4mi_ref +#define BLIS_ZPACKM_16XK_4MI_KERNEL_REF bli_zpackm_16xk_4mi_ref // packm_30xk_4mi kernels -#define BLIS_CPACKM_30XK_4MI_KERNEL_REF bli_cpackm_ref_30xk_4mi -#define BLIS_ZPACKM_30XK_4MI_KERNEL_REF bli_zpackm_ref_30xk_4mi +#define BLIS_CPACKM_30XK_4MI_KERNEL_REF bli_cpackm_30xk_4mi_ref +#define BLIS_ZPACKM_30XK_4MI_KERNEL_REF bli_zpackm_30xk_4mi_ref // packm_2xk_rih kernels -#define BLIS_CPACKM_2XK_RIH_KERNEL_REF bli_cpackm_ref_2xk_rih -#define BLIS_ZPACKM_2XK_RIH_KERNEL_REF bli_zpackm_ref_2xk_rih +#define BLIS_CPACKM_2XK_RIH_KERNEL_REF bli_cpackm_2xk_rih_ref +#define BLIS_ZPACKM_2XK_RIH_KERNEL_REF bli_zpackm_2xk_rih_ref // packm_4xk_rih kernels -#define BLIS_CPACKM_4XK_RIH_KERNEL_REF bli_cpackm_ref_4xk_rih -#define BLIS_ZPACKM_4XK_RIH_KERNEL_REF bli_zpackm_ref_4xk_rih +#define BLIS_CPACKM_4XK_RIH_KERNEL_REF bli_cpackm_4xk_rih_ref +#define BLIS_ZPACKM_4XK_RIH_KERNEL_REF bli_zpackm_4xk_rih_ref // packm_6xk_rih kernels -#define BLIS_CPACKM_6XK_RIH_KERNEL_REF bli_cpackm_ref_6xk_rih -#define BLIS_ZPACKM_6XK_RIH_KERNEL_REF 
bli_zpackm_ref_6xk_rih +#define BLIS_CPACKM_6XK_RIH_KERNEL_REF bli_cpackm_6xk_rih_ref +#define BLIS_ZPACKM_6XK_RIH_KERNEL_REF bli_zpackm_6xk_rih_ref // packm_8xk_rih kernels -#define BLIS_CPACKM_8XK_RIH_KERNEL_REF bli_cpackm_ref_8xk_rih -#define BLIS_ZPACKM_8XK_RIH_KERNEL_REF bli_zpackm_ref_8xk_rih +#define BLIS_CPACKM_8XK_RIH_KERNEL_REF bli_cpackm_8xk_rih_ref +#define BLIS_ZPACKM_8XK_RIH_KERNEL_REF bli_zpackm_8xk_rih_ref // packm_10xk_rih kernels -#define BLIS_CPACKM_10XK_RIH_KERNEL_REF bli_cpackm_ref_10xk_rih -#define BLIS_ZPACKM_10XK_RIH_KERNEL_REF bli_zpackm_ref_10xk_rih +#define BLIS_CPACKM_10XK_RIH_KERNEL_REF bli_cpackm_10xk_rih_ref +#define BLIS_ZPACKM_10XK_RIH_KERNEL_REF bli_zpackm_10xk_rih_ref // packm_12xk_rih kernels -#define BLIS_CPACKM_12XK_RIH_KERNEL_REF bli_cpackm_ref_12xk_rih -#define BLIS_ZPACKM_12XK_RIH_KERNEL_REF bli_zpackm_ref_12xk_rih +#define BLIS_CPACKM_12XK_RIH_KERNEL_REF bli_cpackm_12xk_rih_ref +#define BLIS_ZPACKM_12XK_RIH_KERNEL_REF bli_zpackm_12xk_rih_ref // packm_14xk_rih kernels -#define BLIS_CPACKM_14XK_RIH_KERNEL_REF bli_cpackm_ref_14xk_rih -#define BLIS_ZPACKM_14XK_RIH_KERNEL_REF bli_zpackm_ref_14xk_rih +#define BLIS_CPACKM_14XK_RIH_KERNEL_REF bli_cpackm_14xk_rih_ref +#define BLIS_ZPACKM_14XK_RIH_KERNEL_REF bli_zpackm_14xk_rih_ref // packm_16xk_rih kernels -#define BLIS_CPACKM_16XK_RIH_KERNEL_REF bli_cpackm_ref_16xk_rih -#define BLIS_ZPACKM_16XK_RIH_KERNEL_REF bli_zpackm_ref_16xk_rih +#define BLIS_CPACKM_16XK_RIH_KERNEL_REF bli_cpackm_16xk_rih_ref +#define BLIS_ZPACKM_16XK_RIH_KERNEL_REF bli_zpackm_16xk_rih_ref // packm_30xk_rih kernels -#define BLIS_CPACKM_30XK_RIH_KERNEL_REF bli_cpackm_ref_30xk_rih -#define BLIS_ZPACKM_30XK_RIH_KERNEL_REF bli_zpackm_ref_30xk_rih +#define BLIS_CPACKM_30XK_RIH_KERNEL_REF bli_cpackm_30xk_rih_ref +#define BLIS_ZPACKM_30XK_RIH_KERNEL_REF bli_zpackm_30xk_rih_ref diff --git a/frame/ind/include/bli_kernel_ind_prototypes.h b/frame/ind/include/old/bli_kernel_ind_prototypes.h similarity index 100% rename 
from frame/ind/include/bli_kernel_ind_prototypes.h rename to frame/ind/include/old/bli_kernel_ind_prototypes.h diff --git a/frame/ind/oapi/bli_l3_3m4m_oapi.c b/frame/ind/oapi/bli_l3_3m4m_oapi.c new file mode 100644 index 000000000..04f2259d2 --- /dev/null +++ b/frame/ind/oapi/bli_l3_3m4m_oapi.c @@ -0,0 +1,382 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
+ +*/ + +#include "blis.h" + +// Bring control trees into scope. +extern gemm_t* gemm_cntl; +extern trsm_t* trsm_l_cntl; +extern trsm_t* trsm_r_cntl; + + +// -- gemm/her2k/syr2k --------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + dim_t i; \ +\ + obj_t* beta_use = beta; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx ); \ + return; \ + } \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Some induced methods execute in multiple "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, cntx_p ); \ +\ + /* For multi-stage methods, use BLIS_ONE as beta after the first + stage. */ \ + if ( i > 0 ) beta_use = &BLIS_ONE; \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( alpha, a, b, beta_use, c, cntx_p, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +// gemm +GENFRONT( gemm, gemm, 3mh, 3 ) +GENFRONT( gemm, gemm, 3m3, 1 ) +GENFRONT( gemm, gemm, 3m2, 1 ) +GENFRONT( gemm, gemm, 3m1, 1 ) +GENFRONT( gemm, gemm, 4mh, 4 ) +GENFRONT( gemm, gemm, 4mb, 1 ) +GENFRONT( gemm, gemm, 4m1, 1 ) + +// her2k +GENFRONT( her2k, gemm, 3mh, 3 ) +//GENFRONT( her2k, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( her2k, gemm, 3m2, 1 ) // Not implemented. 
+GENFRONT( her2k, gemm, 3m1, 1 ) +GENFRONT( her2k, gemm, 4mh, 4 ) +//GENFRONT( her2k, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( her2k, gemm, 4m1, 1 ) + +// syr2k +GENFRONT( syr2k, gemm, 3mh, 3 ) +//GENFRONT( syr2k, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( syr2k, gemm, 3m2, 1 ) // Not implemented. +GENFRONT( syr2k, gemm, 3m1, 1 ) +GENFRONT( syr2k, gemm, 4mh, 4 ) +//GENFRONT( syr2k, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( syr2k, gemm, 4m1, 1 ) + + +// -- hemm/symm/trmm3 ---------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + dim_t i; \ +\ + obj_t* beta_use = beta; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx ); \ + return; \ + } \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Some induced methods execute in multiple "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, cntx_p ); \ +\ + /* For multi-stage methods, use BLIS_ONE as beta after the first + stage. */ \ + if ( i > 0 ) beta_use = &BLIS_ONE; \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta_use, c, cntx_p, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +// hemm +GENFRONT( hemm, gemm, 3mh, 3 ) +//GENFRONT( hemm, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( hemm, gemm, 3m2, 1 ) // Not implemented. 
+GENFRONT( hemm, gemm, 3m1, 1 ) +GENFRONT( hemm, gemm, 4mh, 4 ) +//GENFRONT( hemm, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( hemm, gemm, 4m1, 1 ) + +// symm +GENFRONT( symm, gemm, 3mh, 3 ) +//GENFRONT( symm, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( symm, gemm, 3m2, 1 ) // Not implemented. +GENFRONT( symm, gemm, 3m1, 1 ) +GENFRONT( symm, gemm, 4mh, 4 ) +//GENFRONT( symm, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( symm, gemm, 4m1, 1 ) + +// trmm3 +GENFRONT( trmm3, gemm, 3mh, 3 ) +//GENFRONT( trmm3, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( trmm3, gemm, 3m2, 1 ) // Not implemented. +GENFRONT( trmm3, gemm, 3m1, 1 ) +GENFRONT( trmm3, gemm, 4mh, 4 ) +//GENFRONT( trmm3, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( trmm3, gemm, 4m1, 1 ) + + +// -- herk/syrk ---------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + dim_t i; \ +\ + obj_t* beta_use = beta; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( alpha, a, beta, c, cntx ); \ + return; \ + } \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Some induced methods execute in multiple "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, cntx_p ); \ +\ + /* For multi-stage methods, use BLIS_ONE as beta after the first + stage. */ \ + if ( i > 0 ) beta_use = &BLIS_ONE; \ +\ + /* Invoke the operation's front end with the appropriate control + tree. 
*/ \ + PASTEMAC(opname,_front)( alpha, a, beta_use, c, cntx_p, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +// herk +GENFRONT( herk, gemm, 3mh, 3 ) +//GENFRONT( herk, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( herk, gemm, 3m2, 1 ) // Not implemented. +GENFRONT( herk, gemm, 3m1, 1 ) +GENFRONT( herk, gemm, 4mh, 4 ) +//GENFRONT( herk, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( herk, gemm, 4m1, 1 ) + +// syrk +GENFRONT( syrk, gemm, 3mh, 3 ) +//GENFRONT( syrk, gemm, 3m3, 1 ) // Not implemented. +//GENFRONT( syrk, gemm, 3m2, 1 ) // Not implemented. +GENFRONT( syrk, gemm, 3m1, 1 ) +GENFRONT( syrk, gemm, 4mh, 4 ) +//GENFRONT( syrk, gemm, 4mb, 1 ) // Not implemented. +GENFRONT( syrk, gemm, 4m1, 1 ) + + +// -- trmm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *b ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b, cntx ); \ + return; \ + } \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Some induced methods execute in multiple "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, cntx_p, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the local context if it was initialized here. 
*/ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +// trmm +//GENFRONT( trmm, gemm, 3mh, 3 ) // Unimplementable. +//GENFRONT( trmm, gemm, 3m3, 1 ) // Unimplementable. +//GENFRONT( trmm, gemm, 3m2, 1 ) // Unimplementable. +GENFRONT( trmm, gemm, 3m1, 1 ) +//GENFRONT( trmm, gemm, 4mh, 4 ) // Unimplementable. +//GENFRONT( trmm, gemm, 4mb, 1 ) // Unimplementable. +GENFRONT( trmm, gemm, 4m1, 1 ) + + +// -- trsm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *b ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b, cntx ); \ + return; \ + } \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + { \ + /* NOTE: trsm cannot be implemented via any induced method that + needs to execute in stages (e.g. 3mh, 4mh). */ \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, cntx_p, \ + PASTECH(cname,_l_cntl), \ + PASTECH(cname,_r_cntl) ); \ + } \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +// trsm +//GENFRONT( trsm, trsm, 3mh, 3 ) // Unimplementable. +//GENFRONT( trsm, trsm, 3m3, 1 ) // Unimplementable. +//GENFRONT( trsm, trsm, 3m2, 1 ) // Unimplementable. +GENFRONT( trsm, trsm, 3m1, 1 ) +//GENFRONT( trsm, trsm, 4mh, 4 ) // Unimplementable. +//GENFRONT( trsm, trsm, 4mb, 1 ) // Unimplementable.
+GENFRONT( trsm, trsm, 4m1, 1 ) + + +// +// ----------------------------------------------------------------------------- +// ----------------------------------------------------------------------------- +// ----------------------------------------------------------------------------- +// + diff --git a/frame/ind/oapi/bli_l3_ind_oapi.c b/frame/ind/oapi/bli_l3_ind_oapi.c new file mode 100644 index 000000000..348d31e51 --- /dev/null +++ b/frame/ind/oapi/bli_l3_ind_oapi.c @@ -0,0 +1,137 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + + +// -- gemm/her2k/syr2k --------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ + PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \ +\ + func( alpha, a, b, beta, c, cntx ); \ +} + +GENFRONT( gemm, ind ) +GENFRONT( her2k, ind ) +GENFRONT( syr2k, ind ) + + +// -- hemm/symm/trmm3 ---------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ + PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \ +\ + func( side, alpha, a, b, beta, c, cntx ); \ +} + +GENFRONT( hemm, ind ) +GENFRONT( symm, ind ) +GENFRONT( trmm3, ind ) + + +// -- herk/syrk ---------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *c ); \ + PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \ +\ + 
func( alpha, a, beta, c, cntx ); \ +} + +GENFRONT( herk, ind ) +GENFRONT( syrk, ind ) + + +// -- trmm/trsm ---------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ + ) \ +{ \ + num_t dt = bli_obj_datatype( *b ); \ + PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \ +\ + func( side, alpha, a, b, cntx ); \ +} + +GENFRONT( trmm, ind ) +GENFRONT( trsm, ind ) + diff --git a/frame/ind/oapi/bli_oapi_ind.h b/frame/ind/oapi/bli_l3_ind_oapi.h similarity index 77% rename from frame/ind/oapi/bli_oapi_ind.h rename to frame/ind/oapi/bli_l3_ind_oapi.h index 46875c176..62fa794fa 100644 --- a/frame/ind/oapi/bli_oapi_ind.h +++ b/frame/ind/oapi/bli_l3_ind_oapi.h @@ -40,17 +40,19 @@ #undef GENPROT #define GENPROT( imeth ) \ \ -void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(herk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ); \ -void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ); \ -void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(trmm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b ); \ -void PASTEMAC(trsm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b ); +void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, 
obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(herk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(trmm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, cntx_t* cntx ); \ +void PASTEMAC(trsm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, cntx_t* cntx ); +GENPROT( nat ) +GENPROT( ind ) GENPROT( 3m1 ) GENPROT( 4m1 ) @@ -62,14 +64,14 @@ GENPROT( 4m1 ) #undef GENPROT_NO2OP #define GENPROT_NO2OP( imeth ) \ \ -void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(herk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ); \ -void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c ); \ -void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \ -void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); +void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(herk,imeth) ( 
obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); \ +void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx ); GENPROT_NO2OP( 3mh ) GENPROT_NO2OP( 3m3 ) diff --git a/frame/ind/oapi/bli_l3_nat_oapi.c b/frame/ind/oapi/bli_l3_nat_oapi.c new file mode 100644 index 000000000..9038067c5 --- /dev/null +++ b/frame/ind/oapi/bli_l3_nat_oapi.c @@ -0,0 +1,225 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Bring control trees into scope. +extern gemm_t* gemm_cntl; +extern trsm_t* trsm_l_cntl; +extern trsm_t* trsm_r_cntl; + +// NOTE: The function definitions in this file can be consolidated with the +// definitions for the other induced methods. The only advantage of keeping +// them separate is that it allows us to avoid the very small loop overhead +// of executing one iteration of a for loop, plus the overhead of calling a +// function that does nothing (i.e., the _cntx_stage() function). + +// -- gemm/her2k/syr2k --------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front) \ + ( \ + alpha, a, b, beta, c, cntx_p, \ + PASTECH(cname,_cntl) \ + ); \ +\ + /* Finalize the local context if it was initialized here.
*/ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +GENFRONT( gemm, gemm, nat ) +GENFRONT( her2k, gemm, nat ) +GENFRONT( syr2k, gemm, nat ) + + +// -- hemm/symm/trmm3 ---------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front) \ + ( \ + side, alpha, a, b, beta, c, cntx_p, \ + PASTECH(cname,_cntl) \ + ); \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +GENFRONT( hemm, gemm, nat ) +GENFRONT( symm, gemm, nat ) +GENFRONT( trmm3, gemm, nat ) + + +// -- herk/syrk ---------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front) \ + ( \ + alpha, a, beta, c, cntx_p, \ + PASTECH(cname,_cntl) \ + ); \ +\ + /* Finalize the local context if it was initialized here. 
*/ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +GENFRONT( herk, gemm, nat ) +GENFRONT( syrk, gemm, nat ) + + +// -- trmm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front) \ + ( \ + side, alpha, a, b, cntx_p, \ + PASTECH(cname,_cntl) \ + ); \ +\ + /* Finalize the local context if it was initialized here. */ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +GENFRONT( trmm, gemm, nat ) + + +// -- trsm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth) \ + ( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p; \ +\ + /* Initialize a local context if the one provided is NULL. */ \ + bli_cntx_init_local_if2( cname, imeth, cntx, cntx_p ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front) \ + ( \ + side, alpha, a, b, cntx_p, \ + PASTECH(cname,_l_cntl), \ + PASTECH(cname,_r_cntl) \ + ); \ +\ + /* Finalize the local context if it was initialized here. 
*/ \ + bli_cntx_finalize_local_if2( cname, imeth, cntx ); \ +} + +GENFRONT( trsm, trsm, nat ) + diff --git a/frame/ind/oapi/bli_oapi_3m1.c b/frame/ind/oapi/old/bli_oapi_3m1.c similarity index 92% rename from frame/ind/oapi/bli_oapi_3m1.c rename to frame/ind/oapi/old/bli_oapi_3m1.c index 6f33ea345..b5c2080f2 100644 --- a/frame/ind/oapi/bli_oapi_3m1.c +++ b/frame/ind/oapi/old/bli_oapi_3m1.c @@ -52,13 +52,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -80,13 +82,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -108,13 +112,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -136,13 +142,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -163,13 +171,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* b \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *b ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, \ + PASTEMAC(opname,_front)( 
side, alpha, a, b, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -189,13 +199,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* b \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *b ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &cntx, \ PASTECH2(cname,imeth,_l_cntl), \ PASTECH2(cname,imeth,_r_cntl) ); \ } \ diff --git a/frame/ind/oapi/bli_oapi_3m2.c b/frame/ind/oapi/old/bli_oapi_3m2.c similarity index 93% rename from frame/ind/oapi/bli_oapi_3m2.c rename to frame/ind/oapi/old/bli_oapi_3m2.c index 42947ca34..31bf4189c 100644 --- a/frame/ind/oapi/bli_oapi_3m2.c +++ b/frame/ind/oapi/old/bli_oapi_3m2.c @@ -50,13 +50,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -78,13 +80,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -106,13 +110,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -134,13 +140,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( 
alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } diff --git a/frame/ind/oapi/bli_oapi_3m3.c b/frame/ind/oapi/old/bli_oapi_3m3.c similarity index 93% rename from frame/ind/oapi/bli_oapi_3m3.c rename to frame/ind/oapi/old/bli_oapi_3m3.c index 4b4fe6653..dc4df2ce0 100644 --- a/frame/ind/oapi/bli_oapi_3m3.c +++ b/frame/ind/oapi/old/bli_oapi_3m3.c @@ -50,13 +50,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -78,13 +80,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -106,13 +110,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -134,13 +140,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } diff --git a/frame/ind/oapi/bli_oapi_3mh.c b/frame/ind/oapi/old/bli_oapi_3mh.c similarity index 86% rename from frame/ind/oapi/bli_oapi_3mh.c rename to frame/ind/oapi/old/bli_oapi_3mh.c index 4c51174a5..a36ba0c86 100644 --- a/frame/ind/oapi/bli_oapi_3mh.c +++ 
b/frame/ind/oapi/old/bli_oapi_3mh.c @@ -52,17 +52,19 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ro) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_io) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rpi) ); \ } \ } @@ -84,17 +86,19 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ro) ); \ - PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_io) ); \ - PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rpi) ); \ } \ } @@ -116,17 +120,19 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ro) ); \ - PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_io) ); \ - PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rpi) ); \ } \ } 
@@ -148,17 +154,19 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ro) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_io) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rpi) ); \ } \ } diff --git a/frame/ind/oapi/old/bli_oapi_4m1.c b/frame/ind/oapi/old/bli_oapi_4m1.c new file mode 100644 index 000000000..793fec413 --- /dev/null +++ b/frame/ind/oapi/old/bli_oapi_4m1.c @@ -0,0 +1,280 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Bring control trees into scope. +extern gemm_t* gemm_cntl; +extern trsm_t* trsm_l_cntl; +extern trsm_t* trsm_r_cntl; + + +// -- gemm/her2k/syr2k --------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth)( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + ) \ +{ \ + cntx_t cntx; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( alpha, a, b, beta, c ); \ + return; \ + } \ +\ + /* Initialize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Some induced methods (e.g. 3mh and 4mh) execute in multiple + "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the context. 
*/ \ + PASTEMAC2(cname,imeth,_cntx_finalize)( &cntx ); \ +} + +GENFRONT( gemm, gemm, 4m1, 1 ) +GENFRONT( her2k, gemm, 4m1, 1 ) +GENFRONT( syr2k, gemm, 4m1, 1 ) + + +// -- hemm/symm/trmm3 ---------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth)( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + ) \ +{ \ + cntx_t cntx; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b, beta, c ); \ + return; \ + } \ +\ + /* Initialize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Some induced methods (e.g. 3mh and 4mh) execute in multiple + "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_finalize)( &cntx ); \ +} + +GENFRONT( hemm, gemm, 4m1, 1 ) +GENFRONT( symm, gemm, 4m1, 1 ) +GENFRONT( trmm3, gemm, 4m1, 1 ) + + +// -- herk/syrk ---------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth)( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* beta, \ + obj_t* c \ + ) \ +{ \ + cntx_t cntx; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC(opname,nat)( alpha, a, beta, c ); \ + return; \ + } \ +\ + /* Initialize the context. 
*/ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Some induced methods (e.g. 3mh and 4mh) execute in multiple + "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_finalize)( &cntx ); \ +} + +GENFRONT( herk, gemm, 4m1, 1 ) +GENFRONT( syrk, gemm, 4m1, 1 ) + + +// -- trmm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth)( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b \ + ) \ +{ \ + cntx_t cntx; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *b ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b ); \ + return; \ + } \ +\ + /* Initialize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Some induced methods (e.g. 3mh and 4mh) execute in multiple + "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, &cntx, \ + PASTECH(cname,_cntl) ); \ + } \ +\ + /* Finalize the context. 
*/ \ + PASTEMAC2(cname,imeth,_cntx_finalize)( &cntx ); \ +} + +GENFRONT( trmm, gemm, 4m1, 1 ) + + +// -- trsm --------------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth, nstage ) \ +\ +void PASTEMAC(opname,imeth)( \ + side_t side, \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b \ + ) \ +{ \ + cntx_t cntx; \ + dim_t i; \ +\ + /* If the objects are in the real domain, execute the native + implementation. */ \ + if ( bli_obj_is_real( *b ) ) \ + { \ + PASTEMAC(opname,nat)( side, alpha, a, b ); \ + return; \ + } \ +\ + /* Initialize the context. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Some induced methods (e.g. 3mh and 4mh) execute in multiple + "stages". */ \ + for ( i = 0; i < nstage; ++i ) \ + { \ + /* Prepare the context for the ith stage of computation. */ \ + PASTEMAC2(cname,imeth,_cntx_stage)( i, &cntx ); \ +\ + /* Invoke the operation's front end with the appropriate control + tree. */ \ + PASTEMAC(opname,_front)( side, alpha, a, b, &cntx, \ + PASTECH(cname,_l_cntl), \ + PASTECH(cname,_r_cntl) ); \ + } \ +\ + /* Finalize the context. 
*/ \ + PASTEMAC2(cname,imeth,_cntx_finalize)( &cntx ); \ +} + +GENFRONT( trsm, trsm, 4m1, 1 ) + diff --git a/frame/ind/oapi/bli_oapi_4mb.c b/frame/ind/oapi/old/bli_oapi_4mb.c similarity index 93% rename from frame/ind/oapi/bli_oapi_4mb.c rename to frame/ind/oapi/old/bli_oapi_4mb.c index 37c3f49eb..52ea80abc 100644 --- a/frame/ind/oapi/bli_oapi_4mb.c +++ b/frame/ind/oapi/old/bli_oapi_4mb.c @@ -52,13 +52,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -80,13 +82,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -108,13 +112,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -136,13 +142,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } @@ -163,13 +171,15 @@ void PASTEMAC(opname,imeth)( \ obj_t* b \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *b ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, \ + PASTEMAC(opname,_front)( 
side, alpha, a, b, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } diff --git a/frame/ind/oapi/bli_oapi_4mh.c b/frame/ind/oapi/old/bli_oapi_4mh.c similarity index 83% rename from frame/ind/oapi/bli_oapi_4mh.c rename to frame/ind/oapi/old/bli_oapi_4mh.c index 5332e6bc8..2355a7c72 100644 --- a/frame/ind/oapi/bli_oapi_4mh.c +++ b/frame/ind/oapi/old/bli_oapi_4mh.c @@ -53,19 +53,21 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rr) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ii) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ri) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ir) ); \ } \ } @@ -87,19 +89,21 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rr) ); \ - PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ii) ); \ - PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ri) ); \ - PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &BLIS_ONE, c, &cntx, \ 
PASTECH2(cname,imeth,_cntl_ir) ); \ } \ } @@ -121,19 +125,21 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rr) ); \ - PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ii) ); \ - PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ri) ); \ - PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ir) ); \ } \ } @@ -155,19 +161,21 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl_rr) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ii) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ri) ); \ - PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, \ + PASTEMAC(opname,_front)( alpha, a, b, &BLIS_ONE, c, &cntx, \ PASTECH2(cname,imeth,_cntl_ir) ); \ } \ } diff --git a/frame/ind/oapi/bli_oapi_4m1.c b/frame/ind/oapi/old/bli_oapi_nat.c.old similarity index 74% rename from frame/ind/oapi/bli_oapi_4m1.c rename to frame/ind/oapi/old/bli_oapi_nat.c.old index 03b779565..c75692fb9 100644 --- a/frame/ind/oapi/bli_oapi_4m1.c +++ b/frame/ind/oapi/old/bli_oapi_nat.c.old @@ -34,12 +34,12 @@ #include "blis.h" -extern gemm_t* gemm4m1_cntl; 
-extern trsm_t* trsm4m1_l_cntl; -extern trsm_t* trsm4m1_r_cntl; +extern gemm_t* gemm_cntl; +extern trsm_t* trsm_l_cntl; +extern trsm_t* trsm_r_cntl; -// -- gemm/her2k/syr2k --------------------------------------------------------- +// -- gemm --------------------------------------------------------------------- #undef GENFRONT #define GENFRONT( opname, cname, imeth ) \ @@ -52,20 +52,21 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ - if ( bli_obj_is_real( *c ) ) \ - { \ - PASTEMAC0(opname)( alpha, a, b, beta, c ); \ - } \ - else \ - { \ - PASTEMAC(opname,_front)( alpha, a, b, beta, c, \ - PASTECH2(cname,imeth,_cntl) ); \ - } \ + cntx_t cntx; \ +\ + /* Create and initialize a context for native execution. */ \ + bli_gemm_cntx_init( &cntx ); \ +\ + /* Invoke the operation's front-end with the context and appropriate + control tree. */ \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ + PASTECH(cname,_cntl) ); \ +\ + /* Free the context. */ \ + bli_gemm_cntx_finalize( &cntx ); \ } -GENFRONT( gemm, gemm, 4m1 ) -GENFRONT( her2k, gemm, 4m1 ) -GENFRONT( syr2k, gemm, 4m1 ) +GENFRONT( gemm, gemm, nat ) // -- hemm/symm/trmm3 ---------------------------------------------------------- @@ -82,20 +83,22 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b, beta, c ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, \ + PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } -GENFRONT( hemm, gemm, 4m1 ) -GENFRONT( symm, gemm, 4m1 ) -GENFRONT( trmm3, gemm, 4m1 ) +GENFRONT( hemm, gemm, nat ) +GENFRONT( symm, gemm, nat ) +GENFRONT( trmm3, gemm, nat ) // -- herk/syrk ---------------------------------------------------------------- @@ -110,19 +113,51 @@ void PASTEMAC(opname,imeth)( \ obj_t* c \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *c ) ) \ { \ PASTEMAC0(opname)( alpha, a, beta, c ); \ } \ else \ { \ 
- PASTEMAC(opname,_front)( alpha, a, beta, c, \ + PASTEMAC(opname,_front)( alpha, a, beta, c, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } -GENFRONT( herk, gemm, 4m1 ) -GENFRONT( syrk, gemm, 4m1 ) +GENFRONT( herk, gemm, nat ) +GENFRONT( syrk, gemm, nat ) + + +// -- her2k/syr2k -------------------------------------------------------------- + +#undef GENFRONT +#define GENFRONT( opname, cname, imeth ) \ +\ +void PASTEMAC(opname,imeth)( \ + obj_t* alpha, \ + obj_t* a, \ + obj_t* b, \ + obj_t* beta, \ + obj_t* c \ + ) \ +{ \ + cntx_t cntx; \ +\ + if ( bli_obj_is_real( *c ) ) \ + { \ + PASTEMAC0(opname)( alpha, a, b, beta, c ); \ + } \ + else \ + { \ + PASTEMAC(opname,_front)( alpha, a, b, beta, c, &cntx, \ + PASTECH2(cname,imeth,_cntl) ); \ + } \ +} + +GENFRONT( her2k, gemm, nat ) +GENFRONT( syr2k, gemm, nat ) // -- trmm --------------------------------------------------------------------- @@ -137,18 +172,20 @@ void PASTEMAC(opname,imeth)( \ obj_t* b \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *b ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &cntx, \ PASTECH2(cname,imeth,_cntl) ); \ } \ } -GENFRONT( trmm, gemm, 4m1 ) +GENFRONT( trmm, gemm, nat ) // -- trsm --------------------------------------------------------------------- @@ -163,17 +200,19 @@ void PASTEMAC(opname,imeth)( \ obj_t* b \ ) \ { \ + cntx_t cntx; \ +\ if ( bli_obj_is_real( *b ) ) \ { \ PASTEMAC0(opname)( side, alpha, a, b ); \ } \ else \ { \ - PASTEMAC(opname,_front)( side, alpha, a, b, \ + PASTEMAC(opname,_front)( side, alpha, a, b, &cntx, \ PASTECH2(cname,imeth,_l_cntl), \ PASTECH2(cname,imeth,_r_cntl) ); \ } \ } -GENFRONT( trsm, trsm, 4m1 ) +GENFRONT( trsm, trsm, nat ) diff --git a/frame/ind/query/bli_bsv_query.c b/frame/ind/query/bli_bsv_query.c deleted file mode 100644 index 73f4a7738..000000000 --- a/frame/ind/query/bli_bsv_query.c +++ /dev/null @@ -1,178 +0,0 @@ -/* - - BLIS 
- An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - -// 3mh blocksizes -extern blksz_t* gemm3mh_mc; -extern blksz_t* gemm3mh_nc; -extern blksz_t* gemm3mh_kc; -extern blksz_t* gemm3mh_mr; -extern blksz_t* gemm3mh_nr; -extern blksz_t* gemm3mh_kr; - -// 3m3 blocksizes -extern blksz_t* gemm3m3_mc; -extern blksz_t* gemm3m3_nc; -extern blksz_t* gemm3m3_kc; -extern blksz_t* gemm3m3_mr; -extern blksz_t* gemm3m3_nr; -extern blksz_t* gemm3m3_kr; - -// 3m2 blocksizes -extern blksz_t* gemm3m2_mc; -extern blksz_t* gemm3m2_nc; -extern blksz_t* gemm3m2_kc; -extern blksz_t* gemm3m2_mr; -extern blksz_t* gemm3m2_nr; -extern blksz_t* gemm3m2_kr; - -// 3m1 blocksizes -extern blksz_t* gemm3m1_mc; -extern blksz_t* gemm3m1_nc; -extern blksz_t* gemm3m1_kc; -extern blksz_t* gemm3m1_mr; -extern blksz_t* gemm3m1_nr; -extern blksz_t* gemm3m1_kr; - -// 4mh blocksizes -extern blksz_t* gemm4mh_mc; -extern blksz_t* gemm4mh_nc; -extern blksz_t* gemm4mh_kc; -extern blksz_t* gemm4mh_mr; -extern blksz_t* gemm4mh_nr; -extern blksz_t* gemm4mh_kr; - -// 4m1b blocksizes -extern blksz_t* gemm4mb_mc; -extern blksz_t* gemm4mb_nc; -extern blksz_t* gemm4mb_kc; -extern blksz_t* gemm4mb_mr; -extern blksz_t* gemm4mb_nr; -extern blksz_t* gemm4mb_kr; - -// 4m1a blocksizes -extern blksz_t* gemm4m1_mc; -extern blksz_t* gemm4m1_nc; -extern blksz_t* gemm4m1_kc; -extern blksz_t* gemm4m1_mr; -extern blksz_t* gemm4m1_nr; -extern blksz_t* gemm4m1_kr; - -// Native blocksizes -extern blksz_t* gemm_mc; -extern blksz_t* gemm_nc; -extern blksz_t* gemm_kc; -extern blksz_t* gemm_mr; -extern blksz_t* gemm_nr; -extern blksz_t* gemm_kr; - -// -// NOTE: We have to use the address of the blksz_t*, since the value -// will not yet be set at compile-time (since they are allocated at -// runtime). 
-// -static blksz_t** bli_bsizes[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_BLKSZS] = -{ - /* mc/mr nc/nr kc/kr */ -/* 3mh */ { &gemm3mh_mc, &gemm3mh_nc, &gemm3mh_kc, - &gemm3mh_mr, &gemm3mh_nr, &gemm3mh_kr }, -/* 3m3 */ { &gemm3m3_mc, &gemm3m3_nc, &gemm3m3_kc, - &gemm3m3_mr, &gemm3m3_nr, &gemm3m3_kr }, -/* 3m2 */ { &gemm3m2_mc, &gemm3m2_nc, &gemm3m2_kc, - &gemm3m2_mr, &gemm3m2_nr, &gemm3m2_kr }, -/* 3m1 */ { &gemm3m1_mc, &gemm3m1_nc, &gemm3m1_kc, - &gemm3m1_mr, &gemm3m1_nr, &gemm3m1_kr }, -/* 4mh */ { &gemm4mh_mc, &gemm4mh_nc, &gemm4mh_kc, - &gemm4mh_mr, &gemm4mh_nr, &gemm4mh_kr }, -/* 4mb */ { &gemm4mb_mc, &gemm4mb_nc, &gemm4mb_kc, - &gemm4mb_mr, &gemm4mb_nr, &gemm4mb_kr }, -/* 4m1 */ { &gemm4m1_mc, &gemm4m1_nc, &gemm4m1_kc, - &gemm4m1_mr, &gemm4m1_nr, &gemm4m1_kr }, -/* nat */ { &gemm_mc, &gemm_nc, &gemm_kc, - &gemm_mr, &gemm_nr, &gemm_kr }, -}; - -// ----------------------------------------------------------------------------- - -dim_t bli_bsv_get_avail_blksz_dt( bszid_t bsv, opid_t oper, num_t dt ) -{ - // Query the blksz_t object corresponding to the requested - // blocksize id type and datatype (for the current available - // induced method of the given operation). - blksz_t* b = bli_bsv_get_avail_blksz( bsv, oper, dt ); - - // Return the default blocksize associated with the given datatype. - return bli_blksz_get_def( dt, b ); -} - -// ----------------------------------------------------------------------------- - -dim_t bli_bsv_get_avail_blksz_max_dt( bszid_t bsv, opid_t oper, num_t dt ) -{ - // Query the blksz_t object corresponding to the requested - // blocksize id type and datatype (for the current available - // induced method of the given operation). - blksz_t* b = bli_bsv_get_avail_blksz( bsv, oper, dt ); - - // Return the maximum blocksize associated with the given datatype. 
- return bli_blksz_get_max( dt, b ); -} - -// ----------------------------------------------------------------------------- - -blksz_t* bli_bsv_get_avail_blksz( bszid_t bsv, opid_t oper, num_t dt ) -{ - // Query the current available induced method for the operation - // and datatype given. - ind_t method = bli_ind_oper_find_avail( oper, dt ); - - // Return a pointer to the blksz_t object corresponding to the - // blocksize id type for the current available induced method. - return bli_bsv_get_blksz( bsv, method ); -} - -// ----------------------------------------------------------------------------- - -blksz_t* bli_bsv_get_blksz( bszid_t bsv, ind_t method ) -{ - // Initialize the cntl API, if it isn't already initialized. This is - // needed because we have to ensure that the blksz_t objects have - // been created, otherwise this function could return a NULL (or - // garbage) address. - bli_cntl_init(); - - return *(bli_bsizes[ method ][ bsv ]); -} - diff --git a/frame/ind/query/bli_ukr_query.c b/frame/ind/query/bli_ukr_query.c deleted file mode 100644 index 61d07d6fc..000000000 --- a/frame/ind/query/bli_ukr_query.c +++ /dev/null @@ -1,228 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - -#include "blis.h" - - -// 3mh micro-kernels -extern func_t* gemm3mh_ukrs; - -// 3m3 micro-kernels -extern func_t* gemm3m3_ukrs; - -// 3m2 micro-kernels -extern func_t* gemm3m2_ukrs; - -// 3m1 micro-kernels -extern func_t* gemm3m1_ukrs; -extern func_t* gemmtrsm3m1_l_ukrs; -extern func_t* gemmtrsm3m1_u_ukrs; -extern func_t* trsm3m1_l_ukrs; -extern func_t* trsm3m1_u_ukrs; - -// 4mh micro-kernels -extern func_t* gemm4mh_ukrs; - -// 4m1b micro-kernels -extern func_t* gemm4mb_ukrs; - -// 4m1a micro-kernels -extern func_t* gemm4m1_ukrs; -extern func_t* gemmtrsm4m1_l_ukrs; -extern func_t* gemmtrsm4m1_u_ukrs; -extern func_t* trsm4m1_l_ukrs; -extern func_t* trsm4m1_u_ukrs; - -// Native micro-kernels -extern func_t* gemm_ukrs; -extern func_t* gemmtrsm_l_ukrs; -extern func_t* gemmtrsm_u_ukrs; -extern func_t* trsm_l_ukrs; -extern func_t* trsm_u_ukrs; - -// Reference micro-kernels -extern func_t* gemm_ref_ukrs; -extern func_t* gemmtrsm_l_ref_ukrs; -extern func_t* gemmtrsm_u_ref_ukrs; -extern func_t* 
trsm_l_ref_ukrs; -extern func_t* trsm_u_ref_ukrs; - -// -// NOTE: We have to use the address of the func_t*, since the value -// will not yet be set at compile-time (since they are allocated at -// runtime). -// -static func_t** bli_ukrs[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_UKRS] = -{ - /* gemm gemmtrsm_l gemmtrsm_u trsm_l trsm_u */ -/* 3mh */ { &gemm3mh_ukrs, NULL, NULL, - NULL, NULL, }, -/* 3m3 */ { &gemm3m3_ukrs, NULL, NULL, - NULL, NULL, }, -/* 3m2 */ { &gemm3m2_ukrs, NULL, NULL, - NULL, NULL, }, -/* 3m1 */ { &gemm3m1_ukrs, &gemmtrsm3m1_l_ukrs, &gemmtrsm3m1_u_ukrs, - &trsm3m1_l_ukrs, &trsm3m1_u_ukrs, }, -/* 4mh */ { &gemm4mh_ukrs, NULL, NULL, - NULL, NULL, }, -/* 4mb */ { &gemm4mb_ukrs, NULL, NULL, - NULL, NULL, }, -/* 4m1 */ { &gemm4m1_ukrs, &gemmtrsm4m1_l_ukrs, &gemmtrsm4m1_u_ukrs, - &trsm4m1_l_ukrs, &trsm4m1_u_ukrs, }, -/* nat */ { &gemm_ukrs, &gemmtrsm_l_ukrs, &gemmtrsm_u_ukrs, - &trsm_l_ukrs, &trsm_u_ukrs, }, -}; - -static func_t** bli_ref_ukrs[BLIS_NUM_LEVEL3_UKRS] = -{ - &gemm_ref_ukrs, - &gemmtrsm_l_ref_ukrs, - &gemmtrsm_u_ref_ukrs, - &trsm_l_ref_ukrs, - &trsm_u_ref_ukrs, -}; - -static char* bli_ukr_impl_str[BLIS_NUM_UKR_IMPL_TYPES] = -{ - "refrnce", - "virtual", - "optimzd", - "notappl", -}; - -// ----------------------------------------------------------------------------- - -char* bli_ukr_impl_string( l3ukr_t ukr, ind_t method, num_t dt ) -{ - func_t* p; - kimpl_t ki; - -//printf( "ukr method dt = %u %u %u\n", ukr, method, dt ); - // Look up the ukr func_t for the given ukr type and method. - p = bli_ukr_get_funcs( ukr, method ); - - // Check whether the ukrs func_t is NULL for the given ukr type. - // If the queried ukr func_t is NULL, return the string for not - // applicable. Otherwise, query the ukernel implementation type - // using the method provided and return the associated string. 
- if ( p == NULL ) ki = BLIS_NOTAPPLIC_UKERNEL; - else ki = bli_ukr_impl_type( ukr, method, dt ); - - return bli_ukr_impl_str[ ki ]; -} - -// ----------------------------------------------------------------------------- - -char* bli_ukr_avail_impl_string( l3ukr_t ukr, num_t dt ) -{ - opid_t oper; - ind_t method; - kimpl_t ki; - - // We need to decide which operation we will use to query the - // current available induced method. If the ukr type given is - // BLIS_GEMM_UKR, we use gemm. Otherwise, we use trsm (since - // the four other defined ukr types are trsm-related). - if ( ukr == BLIS_GEMM_UKR ) oper = BLIS_GEMM; - else oper = BLIS_TRSM; - - // Query the current available induced method using the - // chosen operation id type. - method = bli_ind_oper_find_avail( oper, dt ); - - // Query the ukernel implementation type using the current - // available method. - ki = bli_ukr_impl_type( ukr, method, dt ); - - return bli_ukr_impl_str[ ki ]; -} - -// ----------------------------------------------------------------------------- - -kimpl_t bli_ukr_impl_type( l3ukr_t ukr, ind_t method, num_t dt ) -{ - // If the current available induced method is not native, it - // must be virtual. - if ( method != BLIS_NAT ) return BLIS_VIRTUAL_UKERNEL; - else - { - // If the current available induced method for the gemm - // operation is native, then it might be reference or - // optimized. To determine which, we compare the - // datatype-specific function pointer within the ukrs - // object corresponding to the current available induced - // method to the typed function pointer within the known - // reference ukrs object. 
- func_t* funcs = bli_ukr_get_funcs( ukr, method ); - void* p = bli_func_obj_query( dt, funcs ); - func_t* ref_funcs = bli_ukr_get_ref_funcs( ukr ); - void* ref_p = bli_func_obj_query( dt, ref_funcs ); - - if ( p == ref_p ) return BLIS_REFERENCE_UKERNEL; - else return BLIS_OPTIMIZED_UKERNEL; - } -} - -// ----------------------------------------------------------------------------- - -func_t* bli_ukr_get_funcs( l3ukr_t ukr, ind_t method ) -{ - func_t** p = bli_ukrs[ method ][ ukr ]; - - // Initialize the cntl API, if it isn't already initialized. This is - // needed because we have to ensure that the ukr func_t objects have - // been created (and thus contain valid function pointers). - bli_cntl_init(); - - // Avoid dereferencing NULL pointers. (A NULL pointer indicates that - // there is no kernel for the requested kernel type and method.) - if ( p == NULL ) return NULL; - else return *p; -} - -func_t* bli_ukr_get_ref_funcs( l3ukr_t ukr ) -{ - func_t** p = bli_ref_ukrs[ ukr ]; - - // Initialize the cntl API, if it isn't already initialized. This is - // needed because we have to ensure that the ukr func_t objects have - // been created (and thus contain valid function pointers). - bli_cntl_init(); - - // Avoid dereferencing NULL pointers. (A NULL pointer indicates that - // there is no reference kernel for the requested kernel type.) 
- if ( p == NULL ) return NULL; - else return *p; -} - diff --git a/frame/ind/tapi/bli_tapi_ind.c b/frame/ind/tapi/bli_l3_ind_tapi.c similarity index 69% rename from frame/ind/tapi/bli_tapi_ind.c rename to frame/ind/tapi/bli_l3_ind_tapi.c index e36618d06..1c4ba3ba9 100644 --- a/frame/ind/tapi/bli_tapi_ind.c +++ b/frame/ind/tapi/bli_l3_ind_tapi.c @@ -40,18 +40,20 @@ #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -77,7 +79,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( gemm3mh ) @@ -94,19 +97,21 @@ INSERT_GENTFUNC_BASIC0( gemm4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -136,7 +141,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); 
\ } INSERT_GENTFUNC_BASIC0( hemm3mh ) @@ -150,16 +156,18 @@ INSERT_GENTFUNC_BASIC0( hemm4m1 ) #undef GENTFUNCR #define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype_r* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype_r* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt_r = PASTEMAC(chr,type); \ const num_t dt = PASTEMAC(ch,type); \ @@ -184,7 +192,8 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( &alphao, \ &ao, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNCR_BASIC0( herk3mh ) @@ -198,18 +207,20 @@ INSERT_GENTFUNCR_BASIC0( herk4m1 ) #undef GENTFUNCR #define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt_r = PASTEMAC(chr,type); \ const num_t dt = PASTEMAC(ch,type); \ @@ -239,7 +250,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNCR_BASIC0( her2k3mh ) @@ -253,19 +265,21 @@ INSERT_GENTFUNCR_BASIC0( her2k4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ 
- dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -295,7 +309,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( symm3mh ) @@ -309,16 +324,18 @@ INSERT_GENTFUNC_BASIC0( symm4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -342,7 +359,8 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( &alphao, \ &ao, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( syrk3mh ) @@ -356,18 +374,20 @@ INSERT_GENTFUNC_BASIC0( syrk4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, 
inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -396,7 +416,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( syr2k3mh ) @@ -410,20 +431,22 @@ INSERT_GENTFUNC_BASIC0( syr2k4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -454,7 +477,8 @@ void PASTEMAC(ch,opname)( \ &ao, \ &bo, \ &betao, \ - &co ); \ + &co, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( trmm33mh ) @@ -468,17 +492,19 @@ INSERT_GENTFUNC_BASIC0( trmm34m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -502,7 +528,8 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( side, \ &alphao, \ &ao, \ - &bo ); \ + &bo, \ + cntx ); \ } 
INSERT_GENTFUNC_BASIC0( trmm3m1 ) @@ -514,17 +541,19 @@ INSERT_GENTFUNC_BASIC0( trmm4m1 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ) \ { \ const num_t dt = PASTEMAC(ch,type); \ \ @@ -548,7 +577,8 @@ void PASTEMAC(ch,opname)( \ PASTEMAC0(opname)( side, \ &alphao, \ &ao, \ - &bo ); \ + &bo, \ + cntx ); \ } INSERT_GENTFUNC_BASIC0( trsm3m1 ) diff --git a/frame/ind/tapi/bli_l3_ind_tapi.h b/frame/ind/tapi/bli_l3_ind_tapi.h new file mode 100644 index 000000000..029166c6c --- /dev/null +++ b/frame/ind/tapi/bli_l3_ind_tapi.h @@ -0,0 +1,271 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( gemm3mh ) +INSERT_GENTPROT_BASIC( gemm3m3 ) +INSERT_GENTPROT_BASIC( gemm3m2 ) +INSERT_GENTPROT_BASIC( gemm3m1 ) +INSERT_GENTPROT_BASIC( gemm4mh ) +INSERT_GENTPROT_BASIC( gemm4mb ) +INSERT_GENTPROT_BASIC( gemm4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( hemm3mh ) +INSERT_GENTPROT_BASIC( hemm3m1 ) +INSERT_GENTPROT_BASIC( hemm4mh ) +INSERT_GENTPROT_BASIC( hemm4m1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + 
uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( her2k3mh ) +INSERT_GENTPROTR_BASIC( her2k3m1 ) +INSERT_GENTPROTR_BASIC( her2k4mh ) +INSERT_GENTPROTR_BASIC( her2k4m1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + dim_t m, \ + dim_t k, \ + ctype_r* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype_r* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( herk3mh ) +INSERT_GENTPROTR_BASIC( herk3m1 ) +INSERT_GENTPROTR_BASIC( herk4mh ) +INSERT_GENTPROTR_BASIC( herk4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + conj_t conja, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( symm3mh ) +INSERT_GENTPROT_BASIC( symm3m1 ) +INSERT_GENTPROT_BASIC( symm4mh ) +INSERT_GENTPROT_BASIC( symm4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ + trans_t transb, \ + dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syr2k3mh ) +INSERT_GENTPROT_BASIC( syr2k3m1 ) +INSERT_GENTPROT_BASIC( syr2k4mh ) +INSERT_GENTPROT_BASIC( syr2k4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploc, \ + trans_t transa, \ 
+ dim_t m, \ + dim_t k, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( syrk3mh ) +INSERT_GENTPROT_BASIC( syrk3m1 ) +INSERT_GENTPROT_BASIC( syrk4mh ) +INSERT_GENTPROT_BASIC( syrk4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + trans_t transb, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + ctype* beta, \ + ctype* c, inc_t rs_c, inc_t cs_c, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmm33mh ) +INSERT_GENTPROT_BASIC( trmm33m1 ) +INSERT_GENTPROT_BASIC( trmm34mh ) +INSERT_GENTPROT_BASIC( trmm34m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trmm3m1 ) +INSERT_GENTPROT_BASIC( trmm4m1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + side_t side, \ + uplo_t uploa, \ + trans_t transa, \ + diag_t diaga, \ + dim_t m, \ + dim_t n, \ + ctype* alpha, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + ctype* b, inc_t rs_b, inc_t cs_b, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( trsm3m1 ) +INSERT_GENTPROT_BASIC( trsm4m1 ) + diff --git a/frame/ind/tapi/bli_tapi_ind.h b/frame/ind/tapi/bli_tapi_ind.h deleted file mode 100644 index d7bac868a..000000000 --- a/frame/ind/tapi/bli_tapi_ind.h +++ /dev/null @@ -1,251 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( gemm3mh ) -INSERT_GENTPROT_BASIC( gemm3m3 ) -INSERT_GENTPROT_BASIC( gemm3m2 ) -INSERT_GENTPROT_BASIC( gemm3m1 ) -INSERT_GENTPROT_BASIC( gemm4mh ) -INSERT_GENTPROT_BASIC( gemm4mb ) -INSERT_GENTPROT_BASIC( gemm4m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( hemm3mh ) -INSERT_GENTPROT_BASIC( hemm3m1 ) -INSERT_GENTPROT_BASIC( hemm4mh ) -INSERT_GENTPROT_BASIC( hemm4m1 ) - - -#undef GENTPROTR -#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROTR_BASIC( her2k3mh ) -INSERT_GENTPROTR_BASIC( her2k3m1 ) -INSERT_GENTPROTR_BASIC( her2k4mh ) -INSERT_GENTPROTR_BASIC( her2k4m1 ) - - -#undef GENTPROTR -#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype_r* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype_r* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROTR_BASIC( herk3mh ) -INSERT_GENTPROTR_BASIC( herk3m1 ) -INSERT_GENTPROTR_BASIC( herk4mh ) -INSERT_GENTPROTR_BASIC( herk4m1 ) - - -#undef 
GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - conj_t conja, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( symm3mh ) -INSERT_GENTPROT_BASIC( symm3m1 ) -INSERT_GENTPROT_BASIC( symm4mh ) -INSERT_GENTPROT_BASIC( symm4m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - trans_t transb, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syr2k3mh ) -INSERT_GENTPROT_BASIC( syr2k3m1 ) -INSERT_GENTPROT_BASIC( syr2k4mh ) -INSERT_GENTPROT_BASIC( syr2k4m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - uplo_t uploc, \ - trans_t transa, \ - dim_t m, \ - dim_t k, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( syrk3mh ) -INSERT_GENTPROT_BASIC( syrk3m1 ) -INSERT_GENTPROT_BASIC( syrk4mh ) -INSERT_GENTPROT_BASIC( syrk4m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - trans_t transb, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( trmm33mh ) -INSERT_GENTPROT_BASIC( trmm33m1 ) -INSERT_GENTPROT_BASIC( trmm34mh ) -INSERT_GENTPROT_BASIC( trmm34m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - 
trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ); - -INSERT_GENTPROT_BASIC( trmm3m1 ) -INSERT_GENTPROT_BASIC( trmm4m1 ) - - -#undef GENTPROT -#define GENTPROT( ctype, ch, opname ) \ -\ -void PASTEMAC(ch,opname)( \ - side_t side, \ - uplo_t uploa, \ - trans_t transa, \ - diag_t diaga, \ - dim_t m, \ - dim_t n, \ - ctype* alpha, \ - ctype* a, inc_t rs_a, inc_t cs_a, \ - ctype* b, inc_t rs_b, inc_t cs_b \ - ); - -INSERT_GENTPROT_BASIC( trsm3m1 ) -INSERT_GENTPROT_BASIC( trsm4m1 ) - diff --git a/frame/ind/ukernels/gemm/bli_gemm3m1_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm3m1_ukr_ref.c index 373031bc8..d5873b070 100644 --- a/frame/ind/ukernels/gemm/bli_gemm3m1_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm3m1_ukr_ref.c @@ -35,32 +35,42 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ab_r[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - ctype_r ab_i[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - ctype_r ab_rpi[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx 
); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ab_r[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ab_i[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ab_rpi[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ab; \ inc_t cs_ab; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ @@ -127,41 +137,54 @@ void PASTEMAC(ch,varname)( \ c_r += + a_r * b_r - a_i * b_i; c_i += (a_r + a_i)(b_r + b_i) - a_r * b_r - a_i * b_i; - NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. 
*/ \ \ \ bli_auxinfo_set_next_ab( a_i, b_i, *data ); \ \ /* ab_r = alpha_r * a_r * b_r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_r, \ - b_r, \ - zero_r, \ - ab_r, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_r, \ + b_r, \ + zero_r, \ + ab_r, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_rpi, b_rpi, *data ); \ \ /* ab_i = alpha_r * a_i * b_i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_i, \ - b_i, \ - zero_r, \ - ab_i, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_i, \ + b_i, \ + zero_r, \ + ab_i, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* ct_i = alpha_r * a_ri * b_ri; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_rpi, \ - b_rpi, \ - zero_r, \ - ab_rpi, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_rpi, \ + b_rpi, \ + zero_r, \ + ab_rpi, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ \ /* How we accumulate the intermediate matrix products stored in ab_r, @@ -309,5 +332,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC( gemm3m1_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm3m1_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm3m2_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm3m2_ukr_ref.c index f0fe5072e..57c2488f0 100644 --- a/frame/ind/ukernels/gemm/bli_gemm3m2_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm3m2_ukr_ref.c @@ -35,26 +35,36 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + 
ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ ctype_r* restrict a_cast = ( ctype_r* )a; \ \ @@ -118,7 +128,8 @@ void PASTEMAC(ch,varname)( \ c_r += + a_r * b_r - a_i * b_i; c_i += (a_r + a_i)(b_r + b_i) - a_r * b_r - a_i * b_i; - NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. 
*/ \ \ \ /* Compute the offset to the real, imaginary, or summed micro-panel @@ -132,13 +143,17 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict b_cur = b_cast + off_b; \ \ /* ct = alpha_r * a * b; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_cur, \ - b_cur, \ - zero_r, \ - ct, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_cur, \ + b_cur, \ + zero_r, \ + ct, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ } \ \ \ @@ -291,5 +306,5 @@ void PASTEMAC(ch,varname)( \ PASTEMAC(chr,fprintm)( stdout, "gemm3m2_ukr: a1", m, k, a_cast, 1, m, "%4.1f", "" );*/ \ } -INSERT_GENTFUNCCO_BASIC( gemm3m2_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm3m2_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm3m3_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm3m3_ukr_ref.c index 3a9a603ba..c999107fd 100644 --- a/frame/ind/ukernels/gemm/bli_gemm3m3_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm3m3_ukr_ref.c @@ -35,26 +35,36 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( 
dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ ctype_r* restrict a_cast = ( ctype_r* )a; \ \ @@ -120,7 +130,8 @@ void PASTEMAC(ch,varname)( \ c_r += + a_r * b_r - a_i * b_i; c_i += (a_r + a_i)(b_r + b_i) - a_r * b_r - a_i * b_i; - NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. */ \ \ \ /* Compute the offset to the real, imaginary, or summed micro-panel @@ -136,13 +147,17 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict b_cur = b_cast + off_b; \ \ /* ct = alpha_r * a * b; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_cur, \ - b_cur, \ - zero_r, \ - ct, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_cur, \ + b_cur, \ + zero_r, \ + ct, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ } \ \ \ @@ -295,5 +310,5 @@ void PASTEMAC(ch,varname)( \ PASTEMAC(chr,fprintm)( stdout, "gemm3m3_ukr: a1", m, k, a_cast, 1, m, "%4.1f", "" );*/ \ } -INSERT_GENTFUNCCO_BASIC( gemm3m3_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm3m3_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm3mh_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm3mh_ukr_ref.c index 40b94249c..7688b4dff 100644 --- a/frame/ind/ukernels/gemm/bli_gemm3mh_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm3mh_ukr_ref.c @@ -35,26 +35,36 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void 
PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ ctype_r* restrict a_cast = ( ctype_r* )a; \ \ @@ -113,18 +123,25 @@ void PASTEMAC(ch,varname)( \ c_r += + a_r * b_r - a_i * b_i; c_i += (a_r + a_i)(b_r + b_i) - a_r * b_r - a_i * b_i; - NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. 
*/ \ \ \ /* ct = alpha_r * a * b; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_cast, \ - b_cast, \ - zero_r, \ - ct, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_cast, \ + b_cast, \ + zero_r, \ + ct, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ +/* +PASTEMAC(chr,fprintm)( stdout, "gemm3mh_ukr: ct", 4, 4, ct, rs_ct, cs_ct, "%4.1f", "" );*/ \ \ /* How we accumulate the intermediate matrix product stored in ct depends on (a) the schemas of A and B (they are always the same), @@ -270,10 +287,12 @@ void PASTEMAC(ch,varname)( \ } \ } \ \ +/*PASTEMAC(ch,fprintm)( stdout, "gemm3mh_ukr: c", 4, 4, c, rs_c, cs_c, "%4.1f", "" ); \ +*/ \ \ /*PASTEMAC(chr,fprintm)( stdout, "gemm3mh_ukr: b1", k, n, b_cast, n, 1, "%4.1f", "" ); \ PASTEMAC(chr,fprintm)( stdout, "gemm3mh_ukr: a1", m, k, a_cast, 1, m, "%4.1f", "" );*/ \ } -INSERT_GENTFUNCCO_BASIC( gemm3mh_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm3mh_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm4m1_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm4m1_ukr_ref.c index b90ff4f79..7cee976ce 100644 --- a/frame/ind/ukernels/gemm/bli_gemm4m1_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm4m1_ukr_ref.c @@ -35,29 +35,39 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct_r[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - 
ctype_r ct_i[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct_r[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ct_i[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ @@ -137,52 +147,69 @@ PASTEMAC(chr,fprintm)( stdout, "gemm4m1_ukr: bp_i", k, n, \ c_r += a_r * b_r - a_i * b_i; c_i += a_r * b_i + a_i * b_r; - NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. 
*/ \ \ \ bli_auxinfo_set_next_ab( a_r, b_i, *data ); \ \ /* ct_r = alpha_r * a_r * b_r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_r, \ - b_r, \ - zero_r, \ - ct_r, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_r, \ + b_r, \ + zero_r, \ + ct_r, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_i, b_r, *data ); \ \ /* ct_i = alpha_r * a_r * b_i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_r, \ - b_i, \ - zero_r, \ - ct_i, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_r, \ + b_i, \ + zero_r, \ + ct_i, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_i, b_i, *data ); \ \ /* ct_i += alpha_r * a_i * b_r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_i, \ - b_r, \ - one_r, \ - ct_i, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_i, \ + b_r, \ + one_r, \ + ct_i, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* ct_r += -alpha_r * a_i * b_i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - &m_alpha_r, \ - a_i, \ - b_i, \ - one_r, \ - ct_r, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + &m_alpha_r, \ + a_i, \ + b_i, \ + one_r, \ + ct_r, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ \ /* How we accumulate the intermediate matrix product stored in ct_r @@ -260,5 +287,5 @@ PASTEMAC(chr,fprintm)( stdout, "gemm4m1_ukr: bp_i", k, n, \ } \ } -INSERT_GENTFUNCCO_BASIC( gemm4m1_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm4m1_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm4mb_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm4mb_ukr_ref.c index d0e48ac41..5e94ba909 100644 --- a/frame/ind/ukernels/gemm/bli_gemm4mb_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm4mb_ukr_ref.c @@ -35,29 +35,39 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void 
PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct_r[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - ctype_r ct_i[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct_r[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ct_i[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ @@ -123,63 +133,81 @@ void PASTEMAC(ch,varname)( \ if ( bli_is_ro_packed( schema_b ) ) \ { \ /* The following gemm micro-kernel calls implement the first half of - the 4mb method: + the 4mb method (which uses b_r): c = beta * c; c_r += a_r * b_r; c_i += a_i * b_r; - NOTE: Scaling by alpha_r is not shown. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. 
*/ \ \ bli_auxinfo_set_next_ab( a_i, b_r, *data ); \ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_r, \ - b_r, \ - zero_r, \ - ct_r, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_r, \ + b_r, \ + zero_r, \ + ct_r, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_i, \ - b_r, \ - zero_r, \ - ct_i, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_i, \ + b_r, \ + zero_r, \ + ct_i, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ } \ else /* if ( bli_is_io_packed( schema_b ) ) */ \ { \ /* The following gemm micro-kernel calls implement the second half of - the 4mb method: + the 4mb method (which uses b_i): c_r += -a_i * b_i; c_i += a_r * b_i; - NOTE: Scaling by alpha_r is not shown. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. */ \ \ bli_auxinfo_set_next_ab( a_i, b_i, *data ); \ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_r, \ - b_i, \ - zero_r, \ - ct_i, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_r, \ + b_i, \ + zero_r, \ + ct_i, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ - PASTEMAC(chr,gemmukr)( k, \ - &m_alpha_r, \ - a_i, \ - b_i, \ - zero_r, \ - ct_r, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + &m_alpha_r, \ + a_i, \ + b_i, \ + zero_r, \ + ct_r, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ } \ \ \ @@ -304,7 +332,14 @@ void PASTEMAC(ch,varname)( \ } \ } \ } \ +\ +/*PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: b1_r", k, n, b_r, n, 1, "%4.1f", "" ); \ +PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: b1_i", k, n, b_i, n, 1, "%4.1f", "" );*/ \ +/*PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: a1_r", m, k, a_r, 1, m, "%4.1f", "" ); \ +PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: a1_i", m, k, a_i, 1, m, "%4.1f", "" );*/ \ +/*PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: ct_r", 8, 6, ct_r, rs_ct, cs_ct, 
"%4.1f", "" ); \ +PASTEMAC(chr,fprintm)( stdout, "gemm4mb_ukr: ct_i", 8, 6, ct_i, rs_ct, cs_ct, "%4.1f", "" );*/ \ } -INSERT_GENTFUNCCO_BASIC( gemm4mb_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm4mb_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemm4mh_ukr_ref.c b/frame/ind/ukernels/gemm/bli_gemm4mh_ukr_ref.c index a1d993ddd..6423ac8b2 100644 --- a/frame/ind/ukernels/gemm/bli_gemm4mh_ukr_ref.c +++ b/frame/ind/ukernels/gemm/bli_gemm4mh_ukr_ref.c @@ -35,26 +35,36 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ct[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ct[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ inc_t rs_ct; \ inc_t cs_ct; \ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ \ ctype_r* restrict a_cast = ( ctype_r* )a; \ \ @@ -114,17 +124,22 @@ void PASTEMAC(ch,varname)( \ c_r += a_r * b_r - a_i * b_i; c_i += a_r * b_i + a_i * b_r; - 
NOTE: Scaling by alpha_r is not shown for space reasons. */ \ + NOTE: Scaling by alpha_r is not shown above, but is implemented + below. */ \ \ \ /* ct = alpha_r * a * b; */ \ - PASTEMAC(chr,gemmukr)( k, \ - alpha_r, \ - a_cast, \ - b_cast, \ - zero_r, \ - ct, rs_ct, cs_ct, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + alpha_r, \ + a_cast, \ + b_cast, \ + zero_r, \ + ct, rs_ct, cs_ct, \ + data, \ + cntx \ + ); \ \ \ /* How we accumulate the intermediate matrix product stored in ct @@ -268,5 +283,5 @@ void PASTEMAC(ch,varname)( \ } \ } -INSERT_GENTFUNCCO_BASIC( gemm4mh_ukr_ref, GEMM_UKERNEL ) +INSERT_GENTFUNCCO_BASIC( gemm4mh_ukr_ref, BLIS_GEMM_UKR ) diff --git a/frame/ind/ukernels/gemm/bli_gemmind_ukr_ref.h b/frame/ind/ukernels/gemm/bli_gemmind_ukr_ref.h index a0c3326c4..d7d5a258f 100644 --- a/frame/ind/ukernels/gemm/bli_gemmind_ukr_ref.h +++ b/frame/ind/ukernels/gemm/bli_gemmind_ukr_ref.h @@ -36,15 +36,17 @@ #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); INSERT_GENTPROTCO_BASIC( gemm3mh_ukr_ref ) INSERT_GENTPROTCO_BASIC( gemm3m3_ukr_ref ) diff --git a/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_l_ukr_ref.c b/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_l_ukr_ref.c index 5d5b93d1e..5fc8e012c 100644 --- a/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_l_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_l_ukr_ref.c @@ -35,31 +35,46 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr, trsmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, 
varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a10, \ - ctype* restrict a11, \ - ctype* restrict b01, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a10, \ + ctype* restrict a11, \ + ctype* restrict b01, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ab_r[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - ctype_r ab_i[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const num_t dt = PASTEMAC(ch,type); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + PASTECH(ch,trsm_ukr_ft) \ + ctrsm_vir_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, trsmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ +\ + ctype_r ab_r[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ab_i[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ const inc_t rs_ab = 1; \ - const inc_t cs_ab = PASTEMAC(chr,mr); \ -\ -\ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const inc_t cs_ab = mr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ @@ -67,8 +82,6 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict a10_r = ( ctype_r* )a10; \ ctype_r* 
restrict a10_i = ( ctype_r* )a10 + is_a; \ ctype_r* restrict a10_ri = ( ctype_r* )a10 + 2*is_a; \ -\ - ctype_r* restrict a11_r = ( ctype_r* )a11; \ \ ctype_r* restrict b01_r = ( ctype_r* )b01; \ ctype_r* restrict b01_i = ( ctype_r* )b01 + is_b; \ @@ -78,7 +91,7 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict b11_i = ( ctype_r* )b11 + is_b; \ ctype_r* restrict b11_ri = ( ctype_r* )b11 + 2*is_b; \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ ctype_r* restrict one_r = PASTEMAC(chr,1); \ @@ -118,35 +131,47 @@ void PASTEMAC(ch,varname)( \ bli_auxinfo_set_next_ab( a10_i, b01_i, *data ); \ \ /* ab.r = a10.r * b01.r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a10_r, \ - b01_r, \ - zero_r, \ - ab_r, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a10_r, \ + b01_r, \ + zero_r, \ + ab_r, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a10_ri, b01_ri, *data ); \ \ /* ab.i = a10.i * b01.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a10_i, \ - b01_i, \ - zero_r, \ - ab_i, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a10_i, \ + b01_i, \ + zero_r, \ + ab_i, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* b11.i = alpha.r * b11.i - a10.ri * b01.ri; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a10_ri, \ - b01_ri, \ - &alpha_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a10_ri, \ + b01_ri, \ + &alpha_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ \ /* b11.r = alpha.r * b11.r - ab.r; @@ -183,10 +208,14 @@ void PASTEMAC(ch,varname)( \ \ /* b11 = inv(a11) * b11; c11 = b11; */ \ - PASTEMAC(ch,trsmukr)( a11_r, \ - b11_r, \ - c11, rs_c, cs_c, \ - data ); \ + ctrsm_vir_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ \ \ /* @@ -207,5 +236,5 @@ PASTEMAC(chr,fprintm)( stdout, "gemmtrsm3m1_l_ukr: b11_i", m, n, \ */ \ } 
-INSERT_GENTFUNCCO_BASIC2( gemmtrsm3m1_l_ukr_ref, GEMM_UKERNEL, TRSM3M1_L_UKERNEL ) +INSERT_GENTFUNCCO_BASIC2( gemmtrsm3m1_l_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_L_UKR ) diff --git a/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_u_ukr_ref.c b/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_u_ukr_ref.c index 59d0fa352..9d82ba8c9 100644 --- a/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_u_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_gemmtrsm3m1_u_ukr_ref.c @@ -35,51 +35,64 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr, trsmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a12, \ - ctype* restrict a11, \ - ctype* restrict b21, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a12, \ + ctype* restrict a11, \ + ctype* restrict b21, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype_r ab_r[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - ctype_r ab_i[ PASTEMAC(chr,mr) * \ - PASTEMAC(chr,nr) ] \ - __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ - const inc_t rs_ab = 1; \ - const inc_t cs_ab = PASTEMAC(chr,mr); \ + const num_t dt = PASTEMAC(ch,type); \ + const num_t dt_r = PASTEMAC(chr,type); \ \ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + PASTECH(ch,trsm_ukr_ft) \ + ctrsm_vir_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, trsmkerid, cntx ); \ \ - const inc_t is_a = bli_auxinfo_is_a( data ); \ - const inc_t is_b = bli_auxinfo_is_b( data ); \ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, 
BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ \ - ctype_r* restrict a11_r = ( ctype_r* )a11; \ + const dim_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ \ - ctype_r* restrict a12_r = ( ctype_r* )a12; \ - ctype_r* restrict a12_i = ( ctype_r* )a12 + is_a; \ - ctype_r* restrict a12_ri = ( ctype_r* )a12 + 2*is_a; \ + const dim_t m = mr; \ + const dim_t n = nr; \ \ - ctype_r* restrict b11_r = ( ctype_r* )b11; \ - ctype_r* restrict b11_i = ( ctype_r* )b11 + is_b; \ - ctype_r* restrict b11_ri = ( ctype_r* )b11 + 2*is_b; \ + ctype_r ab_r[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + ctype_r ab_i[ BLIS_STACK_BUF_MAX_SIZE \ + / sizeof( ctype_r ) ] \ + __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \ + const inc_t rs_ab = 1; \ + const inc_t cs_ab = mr; \ \ - ctype_r* restrict b21_r = ( ctype_r* )b21; \ - ctype_r* restrict b21_i = ( ctype_r* )b21 + is_b; \ - ctype_r* restrict b21_ri = ( ctype_r* )b21 + 2*is_b; \ + const inc_t is_a = bli_auxinfo_is_a( data ); \ + const inc_t is_b = bli_auxinfo_is_b( data ); \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ - const inc_t cs_b = 1; \ + ctype_r* restrict a12_r = ( ctype_r* )a12; \ + ctype_r* restrict a12_i = ( ctype_r* )a12 + is_a; \ + ctype_r* restrict a12_ri = ( ctype_r* )a12 + 2*is_a; \ +\ + ctype_r* restrict b11_r = ( ctype_r* )b11; \ + ctype_r* restrict b11_i = ( ctype_r* )b11 + is_b; \ + ctype_r* restrict b11_ri = ( ctype_r* )b11 + 2*is_b; \ +\ + ctype_r* restrict b21_r = ( ctype_r* )b21; \ + ctype_r* restrict b21_i = ( ctype_r* )b21 + is_b; \ + ctype_r* restrict b21_ri = ( ctype_r* )b21 + 2*is_b; \ +\ + const inc_t rs_b = packnr; \ + const inc_t cs_b = 1; \ \ ctype_r* restrict one_r = PASTEMAC(chr,1); \ ctype_r* restrict zero_r = PASTEMAC(chr,0); \ @@ -118,35 +131,47 @@ void PASTEMAC(ch,varname)( \ bli_auxinfo_set_next_ab( a12_i, b21_i, *data ); \ \ /* ab.r = a12.r * b21.r; */ \ - 
PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a12_r, \ - b21_r, \ - zero_r, \ - ab_r, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a12_r, \ + b21_r, \ + zero_r, \ + ab_r, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a12_ri, b21_ri, *data ); \ \ /* ab.i = a12.i * b21.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a12_i, \ - b21_i, \ - zero_r, \ - ab_i, rs_ab, cs_ab, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a12_i, \ + b21_i, \ + zero_r, \ + ab_i, rs_ab, cs_ab, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* b11.i = alpha.r * b11.i - a12.ri * b21.ri; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a12_ri, \ - b21_ri, \ - &alpha_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a12_ri, \ + b21_ri, \ + &alpha_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ \ /* b11.r = alpha.r * b11.r - ab.r; @@ -183,11 +208,15 @@ void PASTEMAC(ch,varname)( \ \ /* b11 = inv(a11) * b11; c11 = b11; */ \ - PASTEMAC(ch,trsmukr)( a11_r, \ - b11_r, \ - c11, rs_c, cs_c, \ - data ); \ + ctrsm_vir_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ } -INSERT_GENTFUNCCO_BASIC2( gemmtrsm3m1_u_ukr_ref, GEMM_UKERNEL, TRSM3M1_U_UKERNEL ) +INSERT_GENTFUNCCO_BASIC2( gemmtrsm3m1_u_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_U_UKR ) diff --git a/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_l_ukr_ref.c b/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_l_ukr_ref.c index d4d885c43..c979d5cbf 100644 --- a/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_l_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_l_ukr_ref.c @@ -35,29 +35,43 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr, trsmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a10, \ - ctype* restrict a11, \ - ctype* restrict 
b01, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a10, \ + ctype* restrict a11, \ + ctype* restrict b01, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt = PASTEMAC(ch,type); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + PASTECH(ch,trsm_ukr_ft) \ + ctrsm_vir_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, trsmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ \ ctype_r* restrict a10_r = ( ctype_r* )a10; \ ctype_r* restrict a10_i = ( ctype_r* )a10 + is_a; \ -\ - ctype_r* restrict a11_r = ( ctype_r* )a11; \ \ ctype_r* restrict b01_r = ( ctype_r* )b01; \ ctype_r* restrict b01_i = ( ctype_r* )b01 + is_b; \ @@ -65,7 +79,7 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict b11_r = ( ctype_r* )b11; \ ctype_r* restrict b11_i = ( ctype_r* )b11 + is_b; \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ ctype_r* restrict one_r = PASTEMAC(chr,1); \ @@ -115,46 +129,62 @@ PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_l_ukr: b0111p_i", k+m, n, \ bli_auxinfo_set_next_ab( a10_r, b01_i, *data ); \ \ /* b11.r = alpha.r * b11.r - a10.r * b01.r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a10_r, \ - b01_r, \ - &alpha_r, \ - b11_r, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + 
( \ + k, \ + minus_one_r, \ + a10_r, \ + b01_r, \ + &alpha_r, \ + b11_r, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a10_i, b01_r, *data ); \ \ /* b11.i = alpha.r * b11.i - a10.r * b01.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a10_r, \ - b01_i, \ - &alpha_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a10_r, \ + b01_i, \ + &alpha_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a10_i, b01_i, *data ); \ \ /* b11.i = 1.0 * b11.i - a10.i * b01.r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a10_i, \ - b01_r, \ - one_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a10_i, \ + b01_r, \ + one_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* b11.r = 1.0 * b11.r + a10.i * b01.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a10_i, \ - b01_i, \ - one_r, \ - b11_r, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a10_i, \ + b01_i, \ + one_r, \ + b11_r, rs_b, cs_b, \ + data, \ + cntx \ + ); \ /* PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_l_ukr: b0111p_r post-gemm", k+m, n, \ b01_r, PASTEMAC(chr,packnr), 1, "%4.1f", "" ); \ @@ -164,10 +194,14 @@ PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_l_ukr: b0111p_i post-gemm", k+m, n, \ /* b11 = inv(a11) * b11; c11 = b11; */ \ - PASTEMAC(ch,trsmukr)( a11_r, \ - b11_r, \ - c11, rs_c, cs_c, \ - data ); \ + ctrsm_vir_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ \ /* PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_l_ukr: b0111p_r after", k+m, n, \ @@ -177,5 +211,5 @@ PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_l_ukr: b0111p_i after", k+m, n, \ */ \ } -INSERT_GENTFUNCCO_BASIC2( gemmtrsm4m1_l_ukr_ref, GEMM_UKERNEL, TRSM4M1_L_UKERNEL ) +INSERT_GENTFUNCCO_BASIC2( gemmtrsm4m1_l_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_L_UKR ) diff --git a/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_u_ukr_ref.c 
b/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_u_ukr_ref.c index 1df6451d6..9d1d1927e 100644 --- a/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_u_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_gemmtrsm4m1_u_ukr_ref.c @@ -35,26 +35,40 @@ #include "blis.h" #undef GENTFUNCCO -#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmukr, trsmukr ) \ +#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a12, \ - ctype* restrict a11, \ - ctype* restrict b21, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a12, \ + ctype* restrict a11, \ + ctype* restrict b21, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt = PASTEMAC(ch,type); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + PASTECH(chr,gemm_ukr_ft) \ + rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, gemmkerid, cntx ); \ +\ + PASTECH(ch,trsm_ukr_ft) \ + ctrsm_vir_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, trsmkerid, cntx ); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ -\ - ctype_r* restrict a11_r = ( ctype_r* )a11; \ \ ctype_r* restrict a12_r = ( ctype_r* )a12; \ ctype_r* restrict a12_i = ( ctype_r* )a12 + is_a; \ @@ -65,7 +79,7 @@ void PASTEMAC(ch,varname)( \ ctype_r* restrict b21_r = ( ctype_r* )b21; \ ctype_r* restrict b21_i = ( ctype_r* )b21 + is_b; \ \ - const inc_t rs_b 
= PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ ctype_r* restrict one_r = PASTEMAC(chr,1); \ @@ -115,55 +129,75 @@ PASTEMAC(chr,fprintm)( stdout, "gemmtrsm4m1_ukr: b1121p_i", k+m, n, \ bli_auxinfo_set_next_ab( a12_r, b21_i, *data ); \ \ /* b11.r = alpha.r * b11.r - a12.r * b21.r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a12_r, \ - b21_r, \ - &alpha_r, \ - b11_r, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a12_r, \ + b21_r, \ + &alpha_r, \ + b11_r, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a12_i, b21_r, *data ); \ \ /* b11.i = alpha.r * b11.i - a12.r * b21.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a12_r, \ - b21_i, \ - &alpha_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a12_r, \ + b21_i, \ + &alpha_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a12_i, b21_i, *data ); \ \ /* b11.i = 1.0 * b11.i - a12.i * b21.r; */ \ - PASTEMAC(chr,gemmukr)( k, \ - minus_one_r, \ - a12_i, \ - b21_r, \ - one_r, \ - b11_i, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + minus_one_r, \ + a12_i, \ + b21_r, \ + one_r, \ + b11_i, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ bli_auxinfo_set_next_ab( a_next, b_next, *data ); \ \ /* b11.r = 1.0 * b11.r + a12.i * b21.i; */ \ - PASTEMAC(chr,gemmukr)( k, \ - one_r, \ - a12_i, \ - b21_i, \ - one_r, \ - b11_r, rs_b, cs_b, \ - data ); \ + rgemm_ukr \ + ( \ + k, \ + one_r, \ + a12_i, \ + b21_i, \ + one_r, \ + b11_r, rs_b, cs_b, \ + data, \ + cntx \ + ); \ \ \ /* b11 = inv(a11) * b11; c11 = b11; */ \ - PASTEMAC(ch,trsmukr)( a11_r, \ - b11_r, \ - c11, rs_c, cs_c, \ - data ); \ + ctrsm_vir_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ } -INSERT_GENTFUNCCO_BASIC2( gemmtrsm4m1_u_ukr_ref, GEMM_UKERNEL, TRSM4M1_U_UKERNEL ) +INSERT_GENTFUNCCO_BASIC2( gemmtrsm4m1_u_ukr_ref, BLIS_GEMM_UKR, BLIS_TRSM_U_UKR ) diff --git 
a/frame/ind/ukernels/trsm/bli_gemmtrsmind_x_ukr_ref.h b/frame/ind/ukernels/trsm/bli_gemmtrsmind_x_ukr_ref.h index d49ac5e01..7ec51ad8d 100644 --- a/frame/ind/ukernels/trsm/bli_gemmtrsmind_x_ukr_ref.h +++ b/frame/ind/ukernels/trsm/bli_gemmtrsmind_x_ukr_ref.h @@ -36,16 +36,18 @@ #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a1x, \ - ctype* restrict a11, \ - ctype* restrict bx1, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a1x, \ + ctype* restrict a11, \ + ctype* restrict bx1, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); INSERT_GENTPROTCO_BASIC( gemmtrsm4m1_l_ukr_ref ) INSERT_GENTPROTCO_BASIC( gemmtrsm4m1_u_ukr_ref ) diff --git a/frame/ind/ukernels/trsm/bli_trsm3m1_l_ukr_ref.c b/frame/ind/ukernels/trsm/bli_trsm3m1_l_ukr_ref.c index 705d4aee1..62fff68e0 100644 --- a/frame/ind/ukernels/trsm/bli_trsm3m1_l_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_trsm3m1_l_ukr_ref.c @@ -38,31 +38,41 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype_r* restrict ar, \ - ctype_r* restrict br, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt_r = PASTEMAC(chr,type); \ \ - const inc_t is_a = bli_auxinfo_is_a( data ); \ - const inc_t is_b = bli_auxinfo_is_b( data ); \ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = 
bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ \ - ctype_r* restrict a_r = ( ctype_r* )ar; \ - ctype_r* restrict a_i = ( ctype_r* )ar + is_a; \ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ \ - ctype_r* restrict b_r = ( ctype_r* )br; \ - ctype_r* restrict b_i = ( ctype_r* )br + is_b; \ - ctype_r* restrict b_ri = ( ctype_r* )br + 2*is_b; \ + const dim_t m = mr; \ + const dim_t n = nr; \ \ - const inc_t rs_a = 1; \ - const inc_t cs_a = PASTEMAC(chr,packmr); \ + const inc_t is_a = bli_auxinfo_is_a( data ); \ + const inc_t is_b = bli_auxinfo_is_b( data ); \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ - const inc_t cs_b = 1; \ + ctype_r* restrict a_r = ( ctype_r* )a; \ + ctype_r* restrict a_i = ( ctype_r* )a + is_a; \ +\ + ctype_r* restrict b_r = ( ctype_r* )b; \ + ctype_r* restrict b_i = ( ctype_r* )b + is_b; \ + ctype_r* restrict b_ri = ( ctype_r* )b + 2*is_b; \ +\ + const inc_t rs_a = 1; \ + const inc_t cs_a = packmr; \ +\ + const inc_t rs_b = packnr; \ + const inc_t cs_b = 1; \ \ dim_t iter, i, j, l; \ dim_t n_behind; \ diff --git a/frame/ind/ukernels/trsm/bli_trsm3m1_u_ukr_ref.c b/frame/ind/ukernels/trsm/bli_trsm3m1_u_ukr_ref.c index 567abe198..af916ed33 100644 --- a/frame/ind/ukernels/trsm/bli_trsm3m1_u_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_trsm3m1_u_ukr_ref.c @@ -38,30 +38,40 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype_r* restrict ar, \ - ctype_r* restrict br, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + const dim_t mr = 
bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ \ - ctype_r* restrict a_r = ( ctype_r* )ar; \ - ctype_r* restrict a_i = ( ctype_r* )ar + is_a; \ + ctype_r* restrict a_r = ( ctype_r* )a; \ + ctype_r* restrict a_i = ( ctype_r* )a + is_a; \ \ - ctype_r* restrict b_r = ( ctype_r* )br; \ - ctype_r* restrict b_i = ( ctype_r* )br + is_b; \ - ctype_r* restrict b_ri = ( ctype_r* )br + 2*is_b; \ + ctype_r* restrict b_r = ( ctype_r* )b; \ + ctype_r* restrict b_i = ( ctype_r* )b + is_b; \ + ctype_r* restrict b_ri = ( ctype_r* )b + 2*is_b; \ \ const inc_t rs_a = 1; \ - const inc_t cs_a = PASTEMAC(chr,packmr); \ + const inc_t cs_a = packmr; \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ dim_t iter, i, j, l; \ diff --git a/frame/ind/ukernels/trsm/bli_trsm4m1_l_ukr_ref.c b/frame/ind/ukernels/trsm/bli_trsm4m1_l_ukr_ref.c index e11faabe6..06274d95c 100644 --- a/frame/ind/ukernels/trsm/bli_trsm4m1_l_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_trsm4m1_l_ukr_ref.c @@ -38,29 +38,39 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype_r* restrict ar, \ - ctype_r* restrict br, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + const dim_t mr = 
bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + const inc_t packmr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ \ - ctype_r* restrict a_r = ( ctype_r* )ar; \ - ctype_r* restrict a_i = ( ctype_r* )ar + is_a; \ + ctype_r* restrict a_r = ( ctype_r* )a; \ + ctype_r* restrict a_i = ( ctype_r* )a + is_a; \ \ - ctype_r* restrict b_r = ( ctype_r* )br; \ - ctype_r* restrict b_i = ( ctype_r* )br + is_b; \ + ctype_r* restrict b_r = ( ctype_r* )b; \ + ctype_r* restrict b_i = ( ctype_r* )b + is_b; \ \ const inc_t rs_a = 1; \ - const inc_t cs_a = PASTEMAC(chr,packmr); \ + const inc_t cs_a = packmr; \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ dim_t iter, i, j, l; \ diff --git a/frame/ind/ukernels/trsm/bli_trsm4m1_u_ukr_ref.c b/frame/ind/ukernels/trsm/bli_trsm4m1_u_ukr_ref.c index f932ba5ea..5711dc8ce 100644 --- a/frame/ind/ukernels/trsm/bli_trsm4m1_u_ukr_ref.c +++ b/frame/ind/ukernels/trsm/bli_trsm4m1_u_ukr_ref.c @@ -38,29 +38,39 @@ #undef GENTFUNCCO #define GENTFUNCCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype_r* restrict ar, \ - ctype_r* restrict br, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - const dim_t m = PASTEMAC(chr,mr); \ - const dim_t n = PASTEMAC(chr,nr); \ + const num_t dt_r = PASTEMAC(chr,type); \ +\ + const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \ + const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \ +\ + 
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_MR, cntx ); \ + const inc_t packnr = bli_cntx_get_blksz_max_dt( dt_r, BLIS_NR, cntx ); \ +\ + const dim_t m = mr; \ + const dim_t n = nr; \ \ const inc_t is_a = bli_auxinfo_is_a( data ); \ const inc_t is_b = bli_auxinfo_is_b( data ); \ \ - ctype_r* restrict a_r = ( ctype_r* )ar; \ - ctype_r* restrict a_i = ( ctype_r* )ar + is_a; \ + ctype_r* restrict a_r = ( ctype_r* )a; \ + ctype_r* restrict a_i = ( ctype_r* )a + is_a; \ \ - ctype_r* restrict b_r = ( ctype_r* )br; \ - ctype_r* restrict b_i = ( ctype_r* )br + is_b; \ + ctype_r* restrict b_r = ( ctype_r* )b; \ + ctype_r* restrict b_i = ( ctype_r* )b + is_b; \ \ const inc_t rs_a = 1; \ - const inc_t cs_a = PASTEMAC(chr,packmr); \ + const inc_t cs_a = packmr; \ \ - const inc_t rs_b = PASTEMAC(chr,packnr); \ + const inc_t rs_b = packnr; \ const inc_t cs_b = 1; \ \ dim_t iter, i, j, l; \ diff --git a/frame/ind/ukernels/trsm/bli_trsmind_x_ukr_ref.h b/frame/ind/ukernels/trsm/bli_trsmind_x_ukr_ref.h index 1335f0961..abad11caf 100644 --- a/frame/ind/ukernels/trsm/bli_trsmind_x_ukr_ref.h +++ b/frame/ind/ukernels/trsm/bli_trsmind_x_ukr_ref.h @@ -36,12 +36,14 @@ #undef GENTPROTCO #define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype_r* restrict ar, \ - ctype_r* restrict br, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); +void PASTEMAC(ch,varname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ); INSERT_GENTPROTCO_BASIC( trsm4m1_l_ukr_ref ) INSERT_GENTPROTCO_BASIC( trsm4m1_u_ukr_ref ) diff --git a/frame/ind/cntl/bli_trsmind_cntl.h b/frame/util/bli_util.h similarity index 86% rename from frame/ind/cntl/bli_trsmind_cntl.h rename to frame/util/bli_util.h index 7f4535c61..0b16b1b9f 100644 --- a/frame/ind/cntl/bli_trsmind_cntl.h +++ b/frame/util/bli_util.h @@ -32,9 +32,14 @@ */ -void 
bli_trsm4m1_cntl_init( void ); -void bli_trsm4m1_cntl_finalize( void ); +#include "bli_util_check.h" -void bli_trsm3m1_cntl_init( void ); -void bli_trsm3m1_cntl_finalize( void ); +// Prototype object APIs with and without contexts. +#include "bli_oapi_w_cntx.h" +#include "bli_util_oapi.h" +#include "bli_oapi_wo_cntx.h" +#include "bli_util_oapi.h" + +#include "bli_util_tapi.h" +#include "bli_util_unb_var1.h" diff --git a/frame/util/bli_util_check.c b/frame/util/bli_util_check.c new file mode 100644 index 000000000..8fb1cca02 --- /dev/null +++ b/frame/util/bli_util_check.c @@ -0,0 +1,443 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define object-based check functions. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* index \ + ) \ +{ \ + bli_utilv_xi_check( x, index ); \ +} + +GENFRONT( amaxv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* asum \ + ) \ +{ \ + bli_utilv_xa_check( x, asum ); \ +} + +GENFRONT( asumv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ) \ +{ \ + bli_utilm_mkhst_check( x ); \ +} + +GENFRONT( mkherm ) +GENFRONT( mksymm ) +GENFRONT( mktrim ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* norm \ + ) \ +{ \ + bli_utilv_norm_check( x, norm ); \ +} + +GENFRONT( norm1v ) +GENFRONT( normfv ) +GENFRONT( normiv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* norm \ + ) \ +{ \ + bli_utilm_norm_check( x, norm ); \ +} + +GENFRONT( norm1m ) +GENFRONT( normfm ) +GENFRONT( normim ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + FILE* file, \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + ) \ +{ \ + bli_utilm_fprint_check( file, s1, x, format, s2 ); \ +} + +GENFRONT( fprintv ) +GENFRONT( fprintm ) + + +#undef GENFRONT +#define GENFRONT( 
opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ) \ +{ \ + bli_utilm_rand_check( x ); \ +} + +GENFRONT( randv ) +GENFRONT( randm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* scale, \ + obj_t* sumsq \ + ) \ +{ \ + bli_utilv_sumsqv_check( x, scale, sumsq ); \ +} + +GENFRONT( sumsqv ) + + +// ----------------------------------------------------------------------------- + +void bli_utilv_xi_check + ( + obj_t* x, + obj_t* index + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_integer_object( index ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( index ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( index ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( index ); + bli_check_error_code( e_val ); +} + +void bli_utilv_xa_check + ( + obj_t* x, + obj_t* asum + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( asum ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( asum ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( asum ); + bli_check_error_code( e_val ); +} + +void bli_utilm_mkhst_check + ( + obj_t* a + ) +{ + err_t e_val; + + // Check object datatypes. 
+ + e_val = bli_check_floating_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( a ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_matrix_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_square_object( a ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_diag_offset_equals( a, 0 ); + bli_check_error_code( e_val ); + + // Check matrix storage. + + e_val = bli_check_upper_or_lower_object( a ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( a ); + bli_check_error_code( e_val ); +} + +void bli_utilv_norm_check + ( + obj_t* x, + obj_t* norm + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( norm ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( norm ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( norm ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( norm ); + bli_check_error_code( e_val ); +} + + +void bli_utilm_norm_check + ( + obj_t* x, + obj_t* norm + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_noninteger_object( norm ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( norm ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_matrix_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( norm ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( norm ); + bli_check_error_code( e_val ); +} + +void bli_utilm_fprint_check + ( + FILE* file, + char* s1, + obj_t* x, + char* format, + char* s2 + ) +{ + err_t e_val; + + // Check argument pointers. + + e_val = bli_check_null_pointer( file ); + bli_check_error_code( e_val ); + + e_val = bli_check_null_pointer( s1 ); + bli_check_error_code( e_val ); + + e_val = bli_check_null_pointer( s2 ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + +void bli_utilm_rand_check + ( + obj_t* x + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_noninteger_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( x ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). + + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); +} + +void bli_utilv_sumsqv_check + ( + obj_t* x, + obj_t* scale, + obj_t* sumsq + ) +{ + err_t e_val; + + // Check object datatypes. + + e_val = bli_check_floating_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( scale ); + bli_check_error_code( e_val ); + + e_val = bli_check_nonconstant_object( sumsq ); + bli_check_error_code( e_val ); + + // Check object dimensions. + + e_val = bli_check_vector_object( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( scale ); + bli_check_error_code( e_val ); + + e_val = bli_check_scalar_object( sumsq ); + bli_check_error_code( e_val ); + + // Check object buffers (for non-NULLness). 
+ + e_val = bli_check_object_buffer( x ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( scale ); + bli_check_error_code( e_val ); + + e_val = bli_check_object_buffer( sumsq ); + bli_check_error_code( e_val ); +} + diff --git a/frame/util/bli_util_check.h b/frame/util/bli_util_check.h new file mode 100644 index 000000000..b48a3160a --- /dev/null +++ b/frame/util/bli_util_check.h @@ -0,0 +1,197 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based check functions. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* index \ + ); + +GENPROT( amaxv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* asum \ + ); + +GENPROT( asumv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ); + +GENPROT( mkherm ) +GENPROT( mksymm ) +GENPROT( mktrim ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* norm \ + ); + +GENPROT( norm1v ) +GENPROT( normfv ) +GENPROT( normiv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* norm \ + ); + +GENPROT( norm1m ) +GENPROT( normfm ) +GENPROT( normim ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + FILE* file, \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + ); + +GENPROT( fprintv ) +GENPROT( fprintm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x \ + ); + +GENPROT( randv ) +GENPROT( randm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,_check) \ + ( \ + obj_t* x, \ + obj_t* scale, \ + obj_t* sumsq \ + ); + +GENPROT( sumsqv ) + + +// 
----------------------------------------------------------------------------- + +void bli_utilv_xi_check + ( + obj_t* x, + obj_t* index + ); + +void bli_utilv_xa_check + ( + obj_t* x, + obj_t* asum + ); + +void bli_utilm_mkhst_check + ( + obj_t* a + ); + +void bli_utilv_norm_check + ( + obj_t* x, + obj_t* norm + ); + +void bli_utilm_norm_check + ( + obj_t* x, + obj_t* norm + ); + +void bli_utilm_fprint_check + ( + FILE* file, + char* s1, + obj_t* x, + char* format, + char* s2 + ); + +void bli_utilm_rand_check + ( + obj_t* x + ); + +void bli_utilv_sumsqv_check + ( + obj_t* x, + obj_t* scale, + obj_t* sumsq + ); + diff --git a/frame/util/bli_util_oapi.c b/frame/util/bli_util_oapi.c new file mode 100644 index 000000000..d0c4ff1ad --- /dev/null +++ b/frame/util/bli_util_oapi.c @@ -0,0 +1,512 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +// Guard the function definitions so that they are only compiled when +// #included from files that define the object API macros. +#ifdef BLIS_ENABLE_OAPI + +// +// Define object-based interfaces. +// + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* index \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_index = bli_obj_buffer_at_off( *index ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, index ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_5 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, incx, \ + buf_index, \ + cntx \ + ); \ +} + +GENFRONT( amaxv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* asum \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + void* buf_asum = bli_obj_buffer_at_off( *asum ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, asum ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_5 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, incx, \ + buf_asum, \ + cntx \ + ); \ +} + +GENFRONT( asumv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *a ); \ +\ + uplo_t uploa = bli_obj_uplo( *a ); \ + dim_t m = bli_obj_length( *a ); \ + void* buf_a = bli_obj_buffer_at_off( *a ); \ + inc_t rs_a = bli_obj_row_stride( *a ); \ + inc_t cs_a = bli_obj_col_stride( *a ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( a ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_6 \ + ( \ + dt, \ + opname, \ + uploa, \ + m, \ + buf_a, rs_a, cs_a, \ + cntx \ + ); \ +} + +GENFRONT( mkherm ) +GENFRONT( mksymm ) +GENFRONT( mktrim ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* norm \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_norm = bli_obj_buffer_at_off( *norm ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, norm ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_5 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, incx, \ + buf_norm, \ + cntx \ + ); \ +} + +GENFRONT( norm1v ) +GENFRONT( normfv ) +GENFRONT( normiv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* norm \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + diag_t diagx = bli_obj_diag( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ + void* buf_norm = bli_obj_buffer_at_off( *norm ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, norm ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_10 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + buf_norm, \ + cntx \ + ); \ +} + +GENFRONT( norm1m ) +GENFRONT( normfm ) +GENFRONT( normim ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + FILE* file, \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + /* Suppress compiler warning about unused variables. */ \ + ( void )cntx; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( file, s1, x, format, s2 ); \ +\ + /* Handle constants up front. */ \ + if ( dt == BLIS_CONSTANT ) \ + { \ + bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \ + } \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_7 \ + ( \ + dt, \ + opname, \ + file, \ + s1, \ + n, \ + buf_x, incx, \ + format, \ + s2 \ + ); \ +} + +GENFRONT( fprintv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + FILE* file, \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + /* Suppress compiler warning about unused variables. */ \ + ( void )cntx; \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( file, s1, x, format, s2 ); \ +\ + /* Handle constants up front. */ \ + if ( dt == BLIS_CONSTANT ) \ + { \ + float* sp = bli_obj_buffer_for_const( BLIS_FLOAT, *x ); \ + double* dp = bli_obj_buffer_for_const( BLIS_DOUBLE, *x ); \ + scomplex* cp = bli_obj_buffer_for_const( BLIS_SCOMPLEX, *x ); \ + dcomplex* zp = bli_obj_buffer_for_const( BLIS_DCOMPLEX, *x ); \ + gint_t* ip = bli_obj_buffer_for_const( BLIS_INT, *x ); \ +\ + fprintf( file, "%s\n", s1 ); \ + fprintf( file, " float: %9.2e\n", bli_sreal( *sp ) ); \ + fprintf( file, " double: %9.2e\n", bli_dreal( *dp ) ); \ + fprintf( file, " scomplex: %9.2e + %9.2e\n", bli_creal( *cp ), \ + bli_cimag( *cp ) ); \ + fprintf( file, " dcomplex: %9.2e + %9.2e\n", bli_zreal( *zp ), \ + bli_zimag( *zp ) ); \ + fprintf( file, " int: %ld\n", *ip ); \ + fprintf( file, "\n" ); \ + return; \ + } \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_9 \ + ( \ + dt, \ + opname, \ + file, \ + s1, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + format, \ + s2 \ + ); \ +} + +GENFRONT( fprintm ) + + +#undef GENFRONT +#define GENFRONT( opname, varname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + /* Suppress compiler warning about unused variables. */ \ + ( void )cntx; \ +\ + /* Invoke the typed function. */ \ + PASTEMAC0(varname) \ + ( \ + stdout, \ + s1, \ + x, \ + format, \ + s2 \ + ); \ +} + +GENFRONT( printv, fprintv ) +GENFRONT( printm, fprintm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_4 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, incx, \ + cntx \ + ); \ +} + +GENFRONT( randv ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + doff_t diagoffx = bli_obj_diag_offset( *x ); \ + uplo_t uplox = bli_obj_uplo( *x ); \ + dim_t m = bli_obj_length( *x ); \ + dim_t n = bli_obj_width( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t rs_x = bli_obj_row_stride( *x ); \ + inc_t cs_x = bli_obj_col_stride( *x ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x ); \ +\ + /* Invoke the typed function. 
*/ \ + bli_call_ft_8 \ + ( \ + dt, \ + opname, \ + diagoffx, \ + uplox, \ + m, \ + n, \ + buf_x, rs_x, cs_x, \ + cntx \ + ); \ +} + +GENFRONT( randm ) + + +#undef GENFRONT +#define GENFRONT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* scale, \ + obj_t* sumsq \ + BLIS_OAPI_CNTX_PARAM \ + ) \ +{ \ + BLIS_OAPI_CNTX_DECL \ +\ + num_t dt = bli_obj_datatype( *x ); \ +\ + dim_t n = bli_obj_vector_dim( *x ); \ + void* buf_x = bli_obj_buffer_at_off( *x ); \ + inc_t incx = bli_obj_vector_inc( *x ); \ + void* buf_scale = bli_obj_buffer_at_off( *scale ); \ + void* buf_sumsq = bli_obj_buffer_at_off( *sumsq ); \ +\ + if ( bli_error_checking_is_enabled() ) \ + PASTEMAC(opname,_check)( x, scale, sumsq ); \ +\ + /* Invoke the typed function. */ \ + bli_call_ft_6 \ + ( \ + dt, \ + opname, \ + n, \ + buf_x, incx, \ + buf_scale, \ + buf_sumsq, \ + cntx \ + ); \ +} + +GENFRONT( sumsqv ) + + + +#endif + diff --git a/frame/util/bli_util_oapi.h b/frame/util/bli_util_oapi.h new file mode 100644 index 000000000..7f7e7048c --- /dev/null +++ b/frame/util/bli_util_oapi.h @@ -0,0 +1,179 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype object-based interfaces. +// + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* index \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( amaxv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* asum \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( asumv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* a \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( mkherm ) +GENPROT( mksymm ) +GENPROT( mktrim ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* norm \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( norm1v ) +GENPROT( normfv ) +GENPROT( normiv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* norm \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( norm1m ) +GENPROT( normfm ) +GENPROT( normim ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + FILE* file, \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( 
fprintv ) +GENPROT( fprintm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + char* s1, \ + obj_t* x, \ + char* format, \ + char* s2 \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( printv ) +GENPROT( printm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( randv ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( randm ) + + +#undef GENPROT +#define GENPROT( opname ) \ +\ +void PASTEMAC(opname,EX_SUF) \ + ( \ + obj_t* x, \ + obj_t* scale, \ + obj_t* sumsq \ + BLIS_OAPI_CNTX_PARAM \ + ); + +GENPROT( sumsqv ) + diff --git a/frame/util/bli_util_oapi_wc.c b/frame/util/bli_util_oapi_wc.c new file mode 100644 index 000000000..264180b94 --- /dev/null +++ b/frame/util/bli_util_oapi_wc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. 
+ + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-aware. +#include "bli_oapi_w_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_util_oapi.c" + diff --git a/frame/util/bli_util_oapi_woc.c b/frame/util/bli_util_oapi_woc.c new file mode 100644 index 000000000..ac0f52906 --- /dev/null +++ b/frame/util/bli_util_oapi_woc.c @@ -0,0 +1,46 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. 
+ - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// Include cpp macros that instantiate the API definition templates as +// context-less. +#include "bli_oapi_wo_cntx.h" + +// Define the macro protecting the object API definitions. +#define BLIS_ENABLE_OAPI + +// Include the object API definitions here. +#include "bli_util_oapi.c" + diff --git a/frame/util/bli_util_tapi.c b/frame/util/bli_util_tapi.c new file mode 100644 index 000000000..9aa177a7c --- /dev/null +++ b/frame/util/bli_util_tapi.c @@ -0,0 +1,423 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. 
+ - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNCI +#define GENTFUNCI( ctype, ctype_i, ch, chi, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_i* index, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If the vector length is zero, set the index to zero and return + early. This directly emulates the behavior of netlib LAPACK's + i?amax() routines. */ \ + if ( bli_zero_dim1( n ) ) \ + { \ + ctype_i* zero_i = PASTEMAC(chi,0); \ +\ + PASTEMAC(chi,copys)( *zero_i, *index ); \ + return; \ + } \ +\ + /* Initialize a local context if the given context is NULL. 
*/ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + n, \ + x, incx, \ + index, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNCI_BASIC0( amaxv ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* asum, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If the vector length is zero, set the absolute sum return value to + zero and return early. */ \ + if ( bli_zero_dim1( n ) ) \ + { \ + PASTEMAC(chr,set0s)( *asum ); \ + return; \ + } \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + n, \ + x, incx, \ + asum, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNCR_BASIC0( asumv ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If either dimension is zero, return early. */ \ + if ( bli_zero_dim2( m, m ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. 
*/ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + uploa, \ + m, \ + a, rs_a, cs_a, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNC_BASIC0( mkherm ) +INSERT_GENTFUNC_BASIC0( mksymm ) +INSERT_GENTFUNC_BASIC0( mktrim ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If the vector length is zero, set the norm to zero and return + early. */ \ + if ( bli_zero_dim1( n ) ) \ + { \ + PASTEMAC(chr,set0s)( *norm ); \ + return; \ + } \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + n, \ + x, incx, \ + norm, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNCR_BASIC0( norm1v ) +INSERT_GENTFUNCR_BASIC0( normfv ) +INSERT_GENTFUNCR_BASIC0( normiv ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If either dimension is zero, set the norm to zero and return + early. */ \ + if ( bli_zero_dim2( m, n ) ) \ + { \ + PASTEMAC(chr,set0s)( *norm ); \ + return; \ + } \ +\ + /* Initialize a local context if the given context is NULL. 
*/ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + x, rs_x, cs_x, \ + norm, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNCR_BASIC0( norm1m ) +INSERT_GENTFUNCR_BASIC0( normfm ) +INSERT_GENTFUNCR_BASIC0( normim ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, varname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ) \ +{ \ + PASTEMAC(ch,varname) \ + ( \ + stdout, \ + s1, \ + n, \ + x, incx, \ + format, \ + s2 \ + ); \ +} + +INSERT_GENTFUNC_BASIC_I( printv, fprintv ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname, varname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ) \ +{ \ + PASTEMAC(ch,varname) \ + ( \ + stdout, \ + s1, \ + m, \ + n, \ + x, rs_x, cs_x, \ + format, \ + s2 \ + ); \ +} + +INSERT_GENTFUNC_BASIC_I( printm, fprintm ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If the vector length is zero, return early. */ \ + if ( bli_zero_dim1( n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + n, \ + x, incx, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNC_BASIC0( randv ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If either dimension is zero, return early. */ \ + if ( bli_zero_dim2( m, n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + diagoffx, \ + uplox, \ + m, \ + n, \ + x, rs_x, cs_x, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. */ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNC_BASIC0( randm ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* scale, \ + ctype_r* sumsq, \ + cntx_t* cntx \ + ) \ +{ \ + cntx_t* cntx_p = cntx; \ +\ + /* If x is zero length, return with scale and sumsq unchanged. */ \ + if ( bli_zero_dim1( n ) ) return; \ +\ + /* Initialize a local context if the given context is NULL. */ \ + /*bli_cntx_init_local_if( opname, cntx, cntx_p );*/ \ +\ + /* Invoke the helper variant, which loops over the appropriate kernel + to implement the current operation. */ \ + PASTEMAC2(ch,opname,_unb_var1) \ + ( \ + n, \ + x, incx, \ + scale, \ + sumsq, \ + cntx_p \ + ); \ +\ + /* Finalize the context if it was initialized locally. 
*/ \ + /*bli_cntx_finalize_local_if( opname, cntx );*/ \ +} + +INSERT_GENTFUNCR_BASIC0( sumsqv ) + + diff --git a/frame/util/bli_util_tapi.h b/frame/util/bli_util_tapi.h new file mode 100644 index 000000000..92fec2cb4 --- /dev/null +++ b/frame/util/bli_util_tapi.h @@ -0,0 +1,194 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. 
+// + +#undef GENTPROTI +#define GENTPROTI( ctype, ctype_i, ch, chi, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_i* index, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTI_BASIC( amaxv ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* asum, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( asumv ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( mkherm ) +INSERT_GENTPROT_BASIC( mksymm ) +INSERT_GENTPROT_BASIC( mktrim ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( norm1v ) +INSERT_GENTPROTR_BASIC( normfv ) +INSERT_GENTPROTR_BASIC( normiv ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* norm, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( norm1m ) +INSERT_GENTPROTR_BASIC( normfm ) +INSERT_GENTPROTR_BASIC( normim ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ); + +INSERT_GENTPROT_BASIC_I( printv ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ); + +INSERT_GENTPROT_BASIC_I( printm ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ 
+ ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( randv ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + doff_t diagoffx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( randm ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* scale, \ + ctype_r* sumsq, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( sumsqv ) + + diff --git a/frame/util/bli_util_unb_var1.c b/frame/util/bli_util_unb_var1.c new file mode 100644 index 000000000..6625d6fd0 --- /dev/null +++ b/frame/util/bli_util_unb_var1.c @@ -0,0 +1,1091 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + +#include "blis.h" + +// +// Define BLAS-like interfaces with typed operands. +// + +#undef GENTFUNCRI +#define GENTFUNCRI( ctype, ctype_r, ctype_i, ch, chr, chi, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_i* abmax_i, \ + cntx_t* cntx \ + ) \ +{ \ + ctype_r* minus_one = PASTEMAC(chr,m1); \ + ctype_i* zero_i = PASTEMAC(chi,0); \ +\ + ctype* chi1; \ + ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r abs_chi1; \ + ctype_r abs_chi1_max; \ + ctype_i i_max; \ + dim_t i; \ +\ + /* Initialize the index of the maximum absolute value to zero. */ \ + PASTEMAC(chi,copys)( *zero_i, i_max ); \ +\ + /* Initialize the maximum absolute value search candidate with + -1, which is guaranteed to be less than all values we will + compute. */ \ + PASTEMAC(chr,copys)( *minus_one, abs_chi1_max ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + chi1 = x + (i )*incx; \ +\ + /* Get the real and imaginary components of chi1. */ \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ +\ + /* Replace chi1_r and chi1_i with their absolute values. */ \ + PASTEMAC(chr,abval2s)( chi1_r, chi1_r ); \ + PASTEMAC(chr,abval2s)( chi1_i, chi1_i ); \ +\ + /* Add the real and imaginary absolute values together. 
*/ \ + PASTEMAC(chr,set0s)( abs_chi1 ); \ + PASTEMAC(chr,adds)( chi1_r, abs_chi1 ); \ + PASTEMAC(chr,adds)( chi1_i, abs_chi1 ); \ +\ + /* If the absolute value of the current element exceeds that of + the previous largest, save it and its index. If NaN is + encountered, then treat it the same as if it were a valid + value that was smaller than any previously seen. This + behavior mimics that of LAPACK's ?lange(). */ \ + if ( abs_chi1_max < abs_chi1 || bli_isnan( abs_chi1 ) ) \ + { \ + PASTEMAC(chr,copys)( abs_chi1, abs_chi1_max ); \ + PASTEMAC(chi,copys)( i, i_max ); \ + } \ + } \ +\ + /* Store final index to output variable. */ \ + PASTEMAC(chi,copys)( i_max, *abmax_i ); \ +} + +INSERT_GENTFUNCRI_BASIC0( amaxv_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* asum, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r absum; \ + dim_t i; \ +\ + /* Initialize the absolute sum accumulator to zero. */ \ + PASTEMAC(chr,set0s)( absum ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + chi1 = x + (i )*incx; \ +\ + /* Get the real and imaginary components of chi1. */ \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ +\ + /* Replace chi1_r and chi1_i with their absolute values. */ \ + chi1_r = bli_fabs( chi1_r ); \ + chi1_i = bli_fabs( chi1_i ); \ +\ + /* Accumulate the real and imaginary components into absum. */ \ + PASTEMAC(chr,adds)( chi1_r, absum ); \ + PASTEMAC(chr,adds)( chi1_i, absum ); \ + } \ +\ + /* Store the final value of absum to the output variable. 
*/ \ + PASTEMAC(chr,copys)( absum, *asum ); \ +} + +INSERT_GENTFUNCR_BASIC0( asumv_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + ctype_r* zeror = PASTEMAC(chr,0); \ + doff_t diagoffa; \ +\ + /* If the dimension is zero, return early. */ \ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* In order to avoid the main diagonal, we must nudge the diagonal either + up or down by one, depending on which triangle is currently stored. */ \ + if ( bli_is_upper( uploa ) ) diagoffa = 1; \ + else /*if ( bli_is_lower( uploa ) )*/ diagoffa = -1; \ +\ + /* We will be reflecting the stored region over the diagonal into the + unstored region, so a transposition is necessary. Furthermore, since + we are creating a Hermitian matrix, we must also conjugate. */ \ + PASTEMAC(ch,copym) \ + ( \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + BLIS_CONJ_TRANSPOSE, \ + m, \ + m, \ + a, rs_a, cs_a, \ + a, rs_a, cs_a, \ + cntx \ + ); \ +\ + /* Set the imaginary parts of the diagonal elements to zero. */ \ + PASTEMAC(ch,setid) \ + ( \ + 0, \ + m, \ + m, \ + zeror, \ + a, rs_a, cs_a, \ + cntx \ + ); \ +} + +INSERT_GENTFUNCR_BASIC0( mkherm_unb_var1 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + doff_t diagoffa; \ +\ + /* If the dimension is zero, return early. */ \ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* In order to avoid the main diagonal, we must nudge the diagonal either + up or down by one, depending on which triangle is currently stored. 
*/ \ + if ( bli_is_upper( uploa ) ) diagoffa = 1; \ + else /*if ( bli_is_lower( uploa ) )*/ diagoffa = -1; \ +\ + /* We will be reflecting the stored region over the diagonal into the + unstored region, so a transposition is necessary. */ \ + PASTEMAC(ch,copym) \ + ( \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + BLIS_TRANSPOSE, \ + m, \ + m, \ + a, rs_a, cs_a, \ + a, rs_a, cs_a, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( mksymm_unb_var1 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* zero = PASTEMAC(ch,0); \ + doff_t diagoffa; \ +\ + /* If the dimension is zero, return early. */ \ + if ( bli_zero_dim1( m ) ) return; \ +\ + /* Toggle uplo so that it refers to the unstored triangle. */ \ + bli_toggle_uplo( uploa ); \ +\ + /* In order to avoid the main diagonal, we must nudge the diagonal either + up or down by one, depending on which triangle is to be zeroed. */ \ + if ( bli_is_upper( uploa ) ) diagoffa = 1; \ + else /*if ( bli_is_lower( uploa ) )*/ diagoffa = -1; \ +\ + /* Set the unstored triangle to zero. */ \ + PASTEMAC(ch,setm) \ + ( \ + BLIS_NO_CONJUGATE, \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + m, \ + m, \ + zero, \ + a, rs_a, cs_a, \ + cntx \ + ); \ +} + +INSERT_GENTFUNC_BASIC0( mktrim_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype_r abs_chi1; \ + ctype_r absum; \ + dim_t i; \ +\ + /* Initialize the absolute sum accumulator to zero. */ \ + PASTEMAC(chr,set0s)( absum ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + chi1 = x + (i )*incx; \ +\ + /* Compute the absolute value (or complex magnitude) of chi1. 
*/ \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abs_chi1 ); \ +\ + /* Accumulate the absolute value of chi1 into absum. */ \ + PASTEMAC(chr,adds)( abs_chi1, absum ); \ + } \ +\ + /* Store final value of absum to the output variable. */ \ + PASTEMAC(chr,copys)( absum, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC0( norm1v_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + ctype_r* zero = PASTEMAC(chr,0); \ + ctype_r* one = PASTEMAC(chr,1); \ + ctype_r scale; \ + ctype_r sumsq; \ + ctype_r sqrt_sumsq; \ +\ + /* Initialize scale and sumsq to begin the summation. */ \ + PASTEMAC(chr,copys)( *zero, scale ); \ + PASTEMAC(chr,copys)( *one, sumsq ); \ +\ + /* Compute the sum of the squares of the vector. */ \ + PASTEMAC(ch,kername) \ + ( \ + n, \ + x, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ +\ + /* Compute: norm = scale * sqrt( sumsq ) */ \ + PASTEMAC(chr,sqrt2s)( sumsq, sqrt_sumsq ); \ + PASTEMAC(chr,scals)( scale, sqrt_sumsq ); \ +\ + /* Store the final value to the output variable. */ \ + PASTEMAC(chr,copys)( sqrt_sumsq, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC( normfv_unb_var1, sumsqv_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + ctype_r abs_chi1; \ + ctype_r abs_chi1_max; \ + dim_t i; \ +\ + /* Initialize the maximum absolute value to zero. */ \ + PASTEMAC(chr,set0s)( abs_chi1_max ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + chi1 = x + (i )*incx; \ +\ + /* Compute the absolute value (or complex magnitude) of chi1. */ \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abs_chi1 ); \ +\ + /* If the absolute value of the current element exceeds that of + the previous largest, save it and its index. 
If NaN is + encountered, then treat it the same as if it were a valid + value that was smaller than any previously seen. This + behavior mimics that of LAPACK's ?lange(). */ \ + if ( abs_chi1_max < abs_chi1 || bli_isnan( abs_chi1 ) ) \ + { \ + PASTEMAC(chr,copys)( abs_chi1, abs_chi1_max ); \ + } \ + } \ +\ + /* Store the final value to the output variable. */ \ + PASTEMAC(chr,copys)( abs_chi1_max, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC0( normiv_unb_var1 ) + + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* one = PASTEMAC(ch,1); \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype_r absum_max; \ + ctype_r absum_j; \ + ctype_r abval_chi1; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Initialize the maximum absolute column sum to zero. */ \ + PASTEMAC(chr,set0s)( absum_max ); \ +\ + /* If either dimension is zero, return with absum_max equal to zero. */ \ + if ( bli_zero_dim2( m, n ) ) \ + { \ + PASTEMAC(chr,copys)( absum_max, *norm ); \ + return; \ + } \ +\ + /* Set various loop parameters. */ \ + bli_set_dims_incs_uplo_1m_noswap( diagoffx, BLIS_NONUNIT_DIAG, \ + uplox, m, n, rs_x, cs_x, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, \ + ij0, n_shift ); \ +\ + /* If the matrix is zeros, return with absum_max equal to zero. */ \ + if ( bli_is_zeros( uplox_eff ) ) \ + { \ + PASTEMAC(chr,copys)( absum_max, *norm ); \ + return; \ + } \ +\ +\ + /* Handle dense and upper/lower storage cases separately. */ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x0 = x + (j )*ldx + (0 )*incx; \ +\ + /* Compute the norm of the current column. 
*/ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem, \ + x0, incx, \ + &absum_j, \ + cntx \ + ); \ +\ + /* If absum_j is greater than the previous maximum value, + then save it. */ \ + if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ + { \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ + } \ + } \ + } \ + else \ + { \ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x0 = x + (ij0+j )*ldx + (0 )*incx; \ + chi1 = x + (ij0+j )*ldx + (n_elem-1)*incx; \ +\ + /* Compute the norm of the super-diagonal elements. */ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x0, incx, \ + &absum_j, \ + cntx \ + ); \ +\ + if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ +\ + /* Handle the diagonal element separately in case it's + unit. */ \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abval_chi1 ); \ + PASTEMAC(chr,adds)( abval_chi1, absum_j ); \ +\ + /* If absum_j is greater than the previous maximum value, + then save it. */ \ + if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ + { \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ + } \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + chi1 = x + (j )*ldx + (ij0+i )*incx; \ + x2 = x + (j )*ldx + (ij0+i+1)*incx; \ +\ + /* Compute the norm of the sub-diagonal elements. */ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x2, incx, \ + &absum_j, \ + cntx \ + ); \ +\ + if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ +\ + /* Handle the diagonal element separately in case it's + unit. */ \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abval_chi1 ); \ + PASTEMAC(chr,adds)( abval_chi1, absum_j ); \ +\ + /* If absum_j is greater than the previous maximum value, + then save it. 
*/ \ + if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ + { \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ + } \ + } \ + } \ + } \ +\ + /* Store final value of absum_max to the output variable. */ \ + PASTEMAC(chr,copys)( absum_max, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC( norm1m_unb_var1, norm1v_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* one = PASTEMAC(ch,1); \ + ctype_r* one_r = PASTEMAC(chr,1); \ + ctype_r* zero_r = PASTEMAC(chr,0); \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype_r scale; \ + ctype_r sumsq; \ + ctype_r sqrt_sumsq; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Return a norm of zero if either dimension is zero. */ \ + if ( bli_zero_dim2( m, n ) ) \ + { \ + PASTEMAC(chr,set0s)( *norm ); \ + return; \ + } \ +\ + /* Set various loop parameters. Here, we pretend that diagx is equal to + BLIS_NONUNIT_DIAG because we handle the unit diagonal case manually. */ \ + bli_set_dims_incs_uplo_1m( diagoffx, BLIS_NONUNIT_DIAG, \ + uplox, m, n, rs_x, cs_x, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, \ + ij0, n_shift ); \ +\ + /* Check the effective uplo; if it's zeros, then our norm is zero. */ \ + if ( bli_is_zeros( uplox_eff ) ) \ + { \ + PASTEMAC(chr,set0s)( *norm ); \ + return; \ + } \ +\ + /* Initialize scale and sumsq to begin the summation. */ \ + PASTEMAC(chr,copys)( *zero_r, scale ); \ + PASTEMAC(chr,copys)( *one_r, sumsq ); \ +\ + /* Handle dense and upper/lower storage cases separately. 
*/ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x0 = x + (j )*ldx + (0 )*incx; \ +\ + /* Compute the norm of the current column. */ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem, \ + x0, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ + } \ + } \ + else \ + { \ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x0 = x + (ij0+j )*ldx + (0 )*incx; \ + chi1 = x + (ij0+j )*ldx + (n_elem-1)*incx; \ +\ + /* Sum the squares of the super-diagonal elements. */ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x0, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ +\ + if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ +\ + /* Handle the diagonal element separately in case it's + unit. */ \ + PASTEMAC(ch,kername) \ + ( \ + 1, \ + chi1, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + chi1 = x + (j )*ldx + (ij0+i )*incx; \ + x2 = x + (j )*ldx + (ij0+i+1)*incx; \ +\ + /* Sum the squares of the sub-diagonal elements. */ \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x2, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ +\ + if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ +\ + /* Handle the diagonal element separately in case it's + unit. */ \ + PASTEMAC(ch,kername) \ + ( \ + 1, \ + chi1, incx, \ + &scale, \ + &sumsq, \ + cntx \ + ); \ + } \ + } \ + } \ +\ + /* Compute: norm = scale * sqrt( sumsq ) */ \ + PASTEMAC(chr,sqrt2s)( sumsq, sqrt_sumsq ); \ + PASTEMAC(chr,scals)( scale, sqrt_sumsq ); \ +\ + /* Store the final value to the output variable. 
*/ \ + PASTEMAC(chr,copys)( sqrt_sumsq, *norm ); \ +} + +INSERT_GENTFUNCR_BASIC( normfm_unb_var1, sumsqv_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* norm, \ + cntx_t* cntx \ + ) \ +{ \ + /* Induce a transposition so that rows become columns. */ \ + bli_swap_dims( m, n ); \ + bli_swap_incs( rs_x, cs_x ); \ + bli_toggle_uplo( uplox ); \ + bli_negate_diag_offset( diagoffx ); \ +\ + /* Now we can simply compute the 1-norm of this transposed matrix, + which will be equivalent to the infinity-norm of the original + matrix. */ \ + PASTEMAC(ch,kername) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + x, rs_x, cs_x, \ + norm, \ + cntx \ + ); \ +} + +INSERT_GENTFUNCR_BASIC( normim_unb_var1, norm1m_unb_var1 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t n, \ + ctype* x, inc_t incx, \ + char* format, \ + char* s2 \ + ) \ +{ \ + dim_t i; \ + ctype* chi1; \ + char default_spec[32] = PASTEMAC(ch,formatspec)(); \ +\ + if ( format == NULL ) format = default_spec; \ +\ + chi1 = x; \ +\ + fprintf( file, "%s\n", s1 ); \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,fprints)( file, format, *chi1 ); \ + fprintf( file, "\n" ); \ +\ + chi1 += incx; \ + } \ +\ + fprintf( file, "\n" ); \ + fprintf( file, "%s\n", s2 ); \ +} + +INSERT_GENTFUNC_BASIC0_I( fprintv ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ) \ +{ \ + dim_t i, j; \ + ctype* chi1; \ + char default_spec[32] = PASTEMAC(ch,formatspec)(); \ +\ + if ( format == NULL ) format = default_spec; \ +\ + fprintf( file, "%s\n", s1 ); 
\ +\ + for ( i = 0; i < m; ++i ) \ + { \ + for ( j = 0; j < n; ++j ) \ + { \ + chi1 = (( ctype* ) x) + i*rs_x + j*cs_x; \ +\ + PASTEMAC(ch,fprints)( file, format, *chi1 ); \ + fprintf( file, " " ); \ + } \ +\ + fprintf( file, "\n" ); \ + } \ +\ + fprintf( file, "%s\n", s2 ); \ + fflush( file ); \ +} + +INSERT_GENTFUNC_BASIC0_I( fprintm ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* chi1; \ + dim_t i; \ +\ + chi1 = x; \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + PASTEMAC(ch,rands)( *chi1 ); \ +\ + chi1 += incx; \ + } \ +} + +INSERT_GENTFUNC_BASIC0( randv_unb_var1 ) + + +#undef GENTFUNC +#define GENTFUNC( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ) \ +{ \ + ctype* one = PASTEMAC(ch,1); \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* chi1; \ + ctype beta; \ + ctype omega; \ + double max_m_n; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* Set various loop parameters. Here, we pretend that diagx is equal to + BLIS_NONUNIT_DIAG because we handle the unit diagonal case manually. */ \ + bli_set_dims_incs_uplo_1m( diagoffx, BLIS_NONUNIT_DIAG, \ + uplox, m, n, rs_x, cs_x, \ + uplox_eff, n_elem_max, n_iter, incx, ldx, \ + ij0, n_shift ); \ +\ + if ( bli_is_zeros( uplox_eff ) ) return; \ +\ + /* Handle dense and upper/lower storage cases separately. 
*/ \ + if ( bli_is_dense( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = n_elem_max; \ +\ + x1 = x + (j )*ldx + (0 )*incx; \ +\ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx, \ + cntx \ + ); \ + } \ + } \ + else \ + { \ + max_m_n = bli_max( m, n ); \ +\ + PASTEMAC2(d,ch,sets)( max_m_n, 0.0, omega ); \ + PASTEMAC(ch,copys)( *one, beta ); \ + PASTEMAC(ch,invscals)( omega, beta ); \ +\ + if ( bli_is_upper( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + n_elem = bli_min( n_shift + j + 1, n_elem_max ); \ +\ + x1 = x + (ij0+j )*ldx + (0 )*incx; \ + x0 = x1; \ + chi1 = x1 + (n_elem-1)*incx; \ +\ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx, \ + cntx \ + ); \ +\ + /* We want positive diagonal elements between 1 and 2. */ \ + PASTEMAC(ch,abval2s)( *chi1, *chi1 ); \ + PASTEMAC(ch,adds)( *one, *chi1 ); \ +\ + /* Scale the super-diagonal elements by 1/max(m,n). */ \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem - 1, \ + &beta, \ + x0, incx, \ + cntx \ + ); \ + } \ + } \ + else if ( bli_is_lower( uplox_eff ) ) \ + { \ + for ( j = 0; j < n_iter; ++j ) \ + { \ + i = bli_max( 0, ( doff_t )j - ( doff_t )n_shift ); \ + n_elem = n_elem_max - i; \ +\ + x1 = x + (j )*ldx + (ij0+i )*incx; \ + x2 = x1 + incx; \ + chi1 = x1; \ +\ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx, \ + cntx \ + ); \ +\ + /* We want positive diagonal elements between 1 and 2. */ \ + PASTEMAC(ch,abval2s)( *chi1, *chi1 ); \ + PASTEMAC(ch,adds)( *one, *chi1 ); \ +\ + /* Scale the sub-diagonal elements by 1/max(m,n). 
*/ \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem - 1, \ + &beta, \ + x2, incx, \ + cntx \ + ); \ + } \ + } \ + } \ +} + +INSERT_GENTFUNC_BASIC0( randm_unb_var1 ) + + +#undef GENTFUNCR +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* scale, \ + ctype_r* sumsq, \ + cntx_t* cntx \ + ) \ +{ \ + const ctype_r zero_r = *PASTEMAC(chr,0); \ + const ctype_r one_r = *PASTEMAC(chr,1); \ +\ + ctype* chi1; \ + ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r scale_r; \ + ctype_r sumsq_r; \ + ctype_r abs_chi1_r; \ + dim_t i; \ +\ + /* NOTE: This function attempts to mimic the algorithm for computing + the Frobenius norm in netlib LAPACK's ?lassq(). */ \ +\ + /* Copy scale and sumsq to local variables. */ \ + PASTEMAC(chr,copys)( *scale, scale_r ); \ + PASTEMAC(chr,copys)( *sumsq, sumsq_r ); \ +\ + chi1 = x; \ +\ + for ( i = 0; i < n; ++i ) \ + { \ + /* Get the real and imaginary components of chi1. */ \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ +\ + abs_chi1_r = bli_fabs( chi1_r ); \ +\ + /* Accumulate real component into sumsq, adjusting scale if + needed. */ \ + if ( abs_chi1_r > zero_r || bli_isnan( abs_chi1_r) ) \ + { \ + if ( scale_r < abs_chi1_r ) \ + { \ + sumsq_r = one_r + \ + sumsq_r * ( scale_r / abs_chi1_r ) * \ + ( scale_r / abs_chi1_r ); \ +\ + PASTEMAC(chr,copys)( abs_chi1_r, scale_r ); \ + } \ + else \ + { \ + sumsq_r = sumsq_r + ( abs_chi1_r / scale_r ) * \ + ( abs_chi1_r / scale_r ); \ + } \ + } \ +\ + abs_chi1_r = bli_fabs( chi1_i ); \ +\ + /* Accumulate imaginary component into sumsq, adjusting scale if + needed. 
*/ \ + if ( abs_chi1_r > zero_r || bli_isnan( abs_chi1_r) ) \ + { \ + if ( scale_r < abs_chi1_r ) \ + { \ + sumsq_r = one_r + \ + sumsq_r * ( scale_r / abs_chi1_r ) * \ + ( scale_r / abs_chi1_r ); \ +\ + PASTEMAC(chr,copys)( abs_chi1_r, scale_r ); \ + } \ + else \ + { \ + sumsq_r = sumsq_r + ( abs_chi1_r / scale_r ) * \ + ( abs_chi1_r / scale_r ); \ + } \ + } \ +\ + chi1 += incx; \ + } \ +\ + /* Store final values of scale and sumsq to output variables. */ \ + PASTEMAC(chr,copys)( scale_r, *scale ); \ + PASTEMAC(chr,copys)( sumsq_r, *sumsq ); \ +} + +INSERT_GENTFUNCR_BASIC0( sumsqv_unb_var1 ) + diff --git a/frame/util/bli_util_unb_var1.h b/frame/util/bli_util_unb_var1.h new file mode 100644 index 000000000..f486072bf --- /dev/null +++ b/frame/util/bli_util_unb_var1.h @@ -0,0 +1,195 @@ +/* + + BLIS + An object-based framework for developing high-performance BLAS-like + libraries. + + Copyright (C) 2014, The University of Texas at Austin + + Redistribution and use in source and binary forms, with or without + modification, are permitted provided that the following conditions are + met: + - Redistributions of source code must retain the above copyright + notice, this list of conditions and the following disclaimer. + - Redistributions in binary form must reproduce the above copyright + notice, this list of conditions and the following disclaimer in the + documentation and/or other materials provided with the distribution. + - Neither the name of The University of Texas at Austin nor the names + of its contributors may be used to endorse or promote products + derived from this software without specific prior written permission. + + THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS + "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT + LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR + A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT + HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, + SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT + LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, + DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY + THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT + (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE + OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. + +*/ + + +// +// Prototype BLAS-like interfaces with typed operands. +// + +#undef GENTPROTRI +#define GENTPROTRI( ctype, ctype_r, ctype_i, ch, chr, chi, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_i* abmax_i, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTRI_BASIC( amaxv_unb_var1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* asum, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( asumv_unb_var1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + ctype* a, inc_t rs_a, inc_t cs_a, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( mkherm_unb_var1 ) +INSERT_GENTPROT_BASIC( mksymm_unb_var1 ) +INSERT_GENTPROT_BASIC( mktrim_unb_var1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* norm, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( norm1v_unb_var1 ) +INSERT_GENTPROTR_BASIC( normfv_unb_var1 ) +INSERT_GENTPROTR_BASIC( normiv_unb_var1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + ctype_r* 
norm, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( norm1m_unb_var1 ) +INSERT_GENTPROTR_BASIC( normfm_unb_var1 ) +INSERT_GENTPROTR_BASIC( normim_unb_var1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t n, \ + ctype* x, inc_t incx, \ + char* format, \ + char* s2 \ + ); + +INSERT_GENTPROT_BASIC_I( fprintv ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, opname ) \ +\ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ); + +INSERT_GENTPROT_BASIC_I( fprintm ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( randv_unb_var1 ) + + +#undef GENTPROT +#define GENTPROT( ctype, ch, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + doff_t diagoffx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + ctype* x, inc_t rs_x, inc_t cs_x, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROT_BASIC( randm_unb_var1 ) + + +#undef GENTPROTR +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ +\ +void PASTEMAC(ch,varname) \ + ( \ + dim_t n, \ + ctype* x, inc_t incx, \ + ctype_r* scale, \ + ctype_r* sumsq, \ + cntx_t* cntx \ + ); + +INSERT_GENTPROTR_BASIC( sumsqv_unb_var1 ) + diff --git a/frame/util/amaxv/bli_amaxv.c b/frame/util/old/amaxv/bli_amaxv.c similarity index 100% rename from frame/util/amaxv/bli_amaxv.c rename to frame/util/old/amaxv/bli_amaxv.c diff --git a/frame/util/amaxv/bli_amaxv.h b/frame/util/old/amaxv/bli_amaxv.h similarity index 100% rename from frame/util/amaxv/bli_amaxv.h rename to frame/util/old/amaxv/bli_amaxv.h diff --git a/frame/util/amaxv/bli_amaxv_check.c b/frame/util/old/amaxv/bli_amaxv_check.c similarity index 100% rename from frame/util/amaxv/bli_amaxv_check.c rename to frame/util/old/amaxv/bli_amaxv_check.c diff --git 
a/frame/util/amaxv/bli_amaxv_check.h b/frame/util/old/amaxv/bli_amaxv_check.h similarity index 100% rename from frame/util/amaxv/bli_amaxv_check.h rename to frame/util/old/amaxv/bli_amaxv_check.h diff --git a/frame/util/amaxv/bli_amaxv_unb_var1.c b/frame/util/old/amaxv/bli_amaxv_unb_var1.c similarity index 75% rename from frame/util/amaxv/bli_amaxv_unb_var1.c rename to frame/util/old/amaxv/bli_amaxv_unb_var1.c index 41f672eb8..4515f0fc9 100644 --- a/frame/util/amaxv/bli_amaxv_unb_var1.c +++ b/frame/util/old/amaxv/bli_amaxv_unb_var1.c @@ -71,57 +71,57 @@ void bli_amaxv_unb_var1( obj_t* x, #undef GENTFUNCRI -#define GENTFUNCRI( ctype_x, ctype_xr, ctype_i, chx, chxr, chi, varname ) \ +#define GENTFUNCRI( ctype, ctype_r, ctype_i, ch, chr, chi, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* abmax_i \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* abmax_i \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_i* abmax_i_cast = abmax_i; \ - ctype_xr* minus_one = PASTEMAC(chxr,m1); \ - ctype_i* zero_i = PASTEMAC(chi,0); \ + ctype* x_cast = x; \ + ctype_i* abmax_i_cast = abmax_i; \ + ctype_r* minus_one = PASTEMAC(chr,m1); \ + ctype_i* zero_i = PASTEMAC(chi,0); \ \ - ctype_x* chi1; \ - ctype_xr chi1_r; \ - ctype_xr chi1_i; \ - ctype_xr abs_chi1; \ - ctype_xr abs_chi1_max; \ - ctype_i i_max; \ - dim_t i; \ + ctype* chi1; \ + ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r abs_chi1; \ + ctype_r abs_chi1_max; \ + ctype_i i_max; \ + dim_t i; \ \ /* If the vector is empty, return an index of zero. */ \ if ( bli_zero_dim1( n ) ) \ { \ - PASTEMAC2(chi,chi,copys)( *zero_i, *abmax_i_cast ); \ + PASTEMAC(chi,copys)( *zero_i, *abmax_i_cast ); \ return; \ } \ \ /* Initialize the index of the maximum absolute value to zero. 
*/ \ - PASTEMAC2(chi,chi,copys)( *zero_i, i_max ); \ + PASTEMAC(chi,copys)( *zero_i, i_max ); \ \ /* Initialize the maximum absolute value search candidate with -1, which is guaranteed to be less than all values we will compute. */ \ - PASTEMAC2(chxr,chxr,copys)( *minus_one, abs_chi1_max ); \ + PASTEMAC(chr,copys)( *minus_one, abs_chi1_max ); \ \ for ( i = 0; i < n; ++i ) \ { \ chi1 = x_cast + (i )*incx; \ \ /* Get the real and imaginary components of chi1. */ \ - PASTEMAC2(chx,chxr,gets)( *chi1, chi1_r, chi1_i ); \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ \ /* Replace chi1_r and chi1_i with their absolute values. */ \ - PASTEMAC2(chxr,chxr,abval2s)( chi1_r, chi1_r ); \ - PASTEMAC2(chxr,chxr,abval2s)( chi1_i, chi1_i ); \ + PASTEMAC(chr,abval2s)( chi1_r, chi1_r ); \ + PASTEMAC(chr,abval2s)( chi1_i, chi1_i ); \ \ /* Add the real and imaginary absolute values together. */ \ - PASTEMAC(chxr,set0s)( abs_chi1 ); \ - PASTEMAC2(chxr,chxr,adds)( chi1_r, abs_chi1 ); \ - PASTEMAC2(chxr,chxr,adds)( chi1_i, abs_chi1 ); \ + PASTEMAC(chr,set0s)( abs_chi1 ); \ + PASTEMAC(chr,adds)( chi1_r, abs_chi1 ); \ + PASTEMAC(chr,adds)( chi1_i, abs_chi1 ); \ \ /* If the absolute value of the current element exceeds that of the previous largest, save it and its index. If NaN is @@ -130,13 +130,13 @@ void PASTEMAC(chx,varname)( \ behavior mimics that of LAPACK's ?lange(). */ \ if ( abs_chi1_max < abs_chi1 || bli_isnan( abs_chi1 ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( abs_chi1, abs_chi1_max ); \ - PASTEMAC2(chi,chi,copys)( i, i_max ); \ + PASTEMAC(chr,copys)( abs_chi1, abs_chi1_max ); \ + PASTEMAC(chi,copys)( i, i_max ); \ } \ } \ \ /* Store final index to output variable. 
*/ \ - PASTEMAC2(chi,chi,copys)( i_max, *abmax_i_cast ); \ + PASTEMAC(chi,copys)( i_max, *abmax_i_cast ); \ } INSERT_GENTFUNCRI_BASIC0( amaxv_unb_var1 ) diff --git a/frame/util/amaxv/bli_amaxv_unb_var1.h b/frame/util/old/amaxv/bli_amaxv_unb_var1.h similarity index 87% rename from frame/util/amaxv/bli_amaxv_unb_var1.h rename to frame/util/old/amaxv/bli_amaxv_unb_var1.h index 12cc7a6d5..d975731f7 100644 --- a/frame/util/amaxv/bli_amaxv_unb_var1.h +++ b/frame/util/old/amaxv/bli_amaxv_unb_var1.h @@ -37,13 +37,13 @@ void bli_amaxv_unb_var1( obj_t* x, #undef GENTPROTRI -#define GENTPROTRI( ctype_x, ctype_xr, ctype_i, chx, chxr, chi, varname ) \ +#define GENTPROTRI( ctype, ctype_r, ctype_i, ch, chr, chi, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* abmax_i \ - ); +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* abmax_i \ + ); INSERT_GENTPROTRI_BASIC( amaxv_unb_var1 ) diff --git a/frame/util/asumv/bli_asumv.c b/frame/util/old/asumv/bli_asumv.c similarity index 100% rename from frame/util/asumv/bli_asumv.c rename to frame/util/old/asumv/bli_asumv.c diff --git a/frame/util/asumv/bli_asumv.h b/frame/util/old/asumv/bli_asumv.h similarity index 100% rename from frame/util/asumv/bli_asumv.h rename to frame/util/old/asumv/bli_asumv.h diff --git a/frame/util/asumv/bli_asumv_check.c b/frame/util/old/asumv/bli_asumv_check.c similarity index 100% rename from frame/util/asumv/bli_asumv_check.c rename to frame/util/old/asumv/bli_asumv_check.c diff --git a/frame/util/asumv/bli_asumv_check.h b/frame/util/old/asumv/bli_asumv_check.h similarity index 100% rename from frame/util/asumv/bli_asumv_check.h rename to frame/util/old/asumv/bli_asumv_check.h diff --git a/frame/util/asumv/bli_asumv_unb_var1.c b/frame/util/old/asumv/bli_asumv_unb_var1.c similarity index 82% rename from frame/util/asumv/bli_asumv_unb_var1.c rename to frame/util/old/asumv/bli_asumv_unb_var1.c index 5845a8720..fba424af2 100644 --- 
a/frame/util/asumv/bli_asumv_unb_var1.c +++ b/frame/util/old/asumv/bli_asumv_unb_var1.c @@ -71,43 +71,43 @@ void bli_asumv_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* asum \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* asum \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* asum_cast = asum; \ - ctype_x* chi1; \ - ctype_xr chi1_r; \ - ctype_xr chi1_i; \ - ctype_xr absum; \ - dim_t i; \ + ctype* x_cast = x; \ + ctype_r* asum_cast = asum; \ + ctype* chi1; \ + ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r absum; \ + dim_t i; \ \ /* Initialize the absolute sum accumulator to zero. */ \ - PASTEMAC(chxr,set0s)( absum ); \ + PASTEMAC(chr,set0s)( absum ); \ \ for ( i = 0; i < n; ++i ) \ { \ chi1 = x_cast + (i )*incx; \ \ /* Get the real and imaginary components of chi1. */ \ - PASTEMAC2(chx,chxr,gets)( *chi1, chi1_r, chi1_i ); \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ \ /* Replace chi1_r and chi1_i with their absolute values. */ \ chi1_r = bli_fabs( chi1_r ); \ chi1_i = bli_fabs( chi1_i ); \ \ /* Accumulate the real and imaginary components into absum. */ \ - PASTEMAC2(chxr,chxr,adds)( chi1_r, absum ); \ - PASTEMAC2(chxr,chxr,adds)( chi1_i, absum ); \ + PASTEMAC(chr,adds)( chi1_r, absum ); \ + PASTEMAC(chr,adds)( chi1_i, absum ); \ } \ \ /* Store the final value of absum to the output variable. 
*/ \ - PASTEMAC2(chxr,chxr,copys)( absum, *asum_cast ); \ + PASTEMAC(chr,copys)( absum, *asum_cast ); \ } INSERT_GENTFUNCR_BASIC0( asumv_unb_var1 ) diff --git a/frame/util/asumv/bli_asumv_unb_var1.h b/frame/util/old/asumv/bli_asumv_unb_var1.h similarity index 100% rename from frame/util/asumv/bli_asumv_unb_var1.h rename to frame/util/old/asumv/bli_asumv_unb_var1.h diff --git a/frame/util/mkherm/bli_mkherm.c b/frame/util/old/mkherm/bli_mkherm.c similarity index 100% rename from frame/util/mkherm/bli_mkherm.c rename to frame/util/old/mkherm/bli_mkherm.c diff --git a/frame/util/mkherm/bli_mkherm.h b/frame/util/old/mkherm/bli_mkherm.h similarity index 100% rename from frame/util/mkherm/bli_mkherm.h rename to frame/util/old/mkherm/bli_mkherm.h diff --git a/frame/util/mkherm/bli_mkherm_check.c b/frame/util/old/mkherm/bli_mkherm_check.c similarity index 100% rename from frame/util/mkherm/bli_mkherm_check.c rename to frame/util/old/mkherm/bli_mkherm_check.c diff --git a/frame/util/mkherm/bli_mkherm_check.h b/frame/util/old/mkherm/bli_mkherm_check.h similarity index 100% rename from frame/util/mkherm/bli_mkherm_check.h rename to frame/util/old/mkherm/bli_mkherm_check.h diff --git a/frame/util/mkherm/bli_mkherm_unb_var1.c b/frame/util/old/mkherm/bli_mkherm_unb_var1.c similarity index 85% rename from frame/util/mkherm/bli_mkherm_unb_var1.c rename to frame/util/old/mkherm/bli_mkherm_unb_var1.c index b069c4f0f..2eff3897e 100644 --- a/frame/util/mkherm/bli_mkherm_unb_var1.c +++ b/frame/util/old/mkherm/bli_mkherm_unb_var1.c @@ -76,11 +76,12 @@ void bli_mkherm_unb_var1( obj_t* a ) void PASTEMAC(ch,varname)( \ uplo_t uploa, \ dim_t m, \ - void* a, inc_t rs_a, inc_t cs_a \ + void* a, inc_t rs_a, inc_t cs_a \ ) \ { \ - ctype_r* zeror = PASTEMAC(chr,0); \ - ctype* a_cast = a; \ + cntx_t* cntx = NULL; \ + ctype_r* zeror = PASTEMAC(chr,0); \ + ctype* a_cast = a; \ ctype* alpha11; \ doff_t diagoffa; \ \ @@ -98,21 +99,29 @@ void PASTEMAC(ch,varname)( \ /* We will be reflecting the stored 
region over the diagonal into the unstored region, so a transposition is necessary. Furthermore, since we are creating a Hermitian matrix, we must also conjugate. */ \ - PASTEMAC2(ch,ch,copym)( diagoffa, \ - BLIS_NONUNIT_DIAG, \ - uploa, \ - BLIS_CONJ_TRANSPOSE, \ - m, \ - m, \ - a_cast, rs_a, cs_a, \ - a_cast, rs_a, cs_a ); \ + PASTEMAC(ch,copym) \ + ( \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + BLIS_CONJ_TRANSPOSE, \ + m, \ + m, \ + a_cast, rs_a, cs_a, \ + a_cast, rs_a, cs_a, \ + cntx \ + ); \ \ /* Set the imaginary parts of the diagonal elements to zero. */ \ - PASTEMAC(ch,setid)( 0, \ - m, \ - m, \ - zeror, \ - a_cast, rs_a, cs_a ); \ + PASTEMAC(ch,setid) \ + ( \ + 0, \ + m, \ + m, \ + zeror, \ + a_cast, rs_a, cs_a, \ + cntx \ + ); \ } diff --git a/frame/util/mkherm/bli_mkherm_unb_var1.h b/frame/util/old/mkherm/bli_mkherm_unb_var1.h similarity index 100% rename from frame/util/mkherm/bli_mkherm_unb_var1.h rename to frame/util/old/mkherm/bli_mkherm_unb_var1.h diff --git a/frame/util/mksymm/bli_mksymm.c b/frame/util/old/mksymm/bli_mksymm.c similarity index 100% rename from frame/util/mksymm/bli_mksymm.c rename to frame/util/old/mksymm/bli_mksymm.c diff --git a/frame/util/mksymm/bli_mksymm.h b/frame/util/old/mksymm/bli_mksymm.h similarity index 100% rename from frame/util/mksymm/bli_mksymm.h rename to frame/util/old/mksymm/bli_mksymm.h diff --git a/frame/util/mksymm/bli_mksymm_check.c b/frame/util/old/mksymm/bli_mksymm_check.c similarity index 100% rename from frame/util/mksymm/bli_mksymm_check.c rename to frame/util/old/mksymm/bli_mksymm_check.c diff --git a/frame/util/mksymm/bli_mksymm_check.h b/frame/util/old/mksymm/bli_mksymm_check.h similarity index 100% rename from frame/util/mksymm/bli_mksymm_check.h rename to frame/util/old/mksymm/bli_mksymm_check.h diff --git a/frame/util/mksymm/bli_mksymm_unb_var1.c b/frame/util/old/mksymm/bli_mksymm_unb_var1.c similarity index 90% rename from frame/util/mksymm/bli_mksymm_unb_var1.c rename to 
frame/util/old/mksymm/bli_mksymm_unb_var1.c index a7a96694c..40f7d4a44 100644 --- a/frame/util/mksymm/bli_mksymm_unb_var1.c +++ b/frame/util/old/mksymm/bli_mksymm_unb_var1.c @@ -79,7 +79,8 @@ void PASTEMAC(ch,varname)( \ void* a, inc_t rs_a, inc_t cs_a \ ) \ { \ - ctype* a_cast = a; \ + cntx_t* cntx = NULL; \ + ctype* a_cast = a; \ doff_t diagoffa; \ \ /* If the dimension is zero, return early. */ \ @@ -92,14 +93,18 @@ void PASTEMAC(ch,varname)( \ \ /* We will be reflecting the stored region over the diagonal into the unstored region, so a transposition is necessary. */ \ - PASTEMAC2(ch,ch,copym)( diagoffa, \ - BLIS_NONUNIT_DIAG, \ - uploa, \ - BLIS_TRANSPOSE, \ - m, \ - m, \ - a_cast, rs_a, cs_a, \ - a_cast, rs_a, cs_a ); \ + PASTEMAC(ch,copym) \ + ( \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + BLIS_TRANSPOSE, \ + m, \ + m, \ + a_cast, rs_a, cs_a, \ + a_cast, rs_a, cs_a, \ + cntx \ + ); \ } diff --git a/frame/util/mksymm/bli_mksymm_unb_var1.h b/frame/util/old/mksymm/bli_mksymm_unb_var1.h similarity index 100% rename from frame/util/mksymm/bli_mksymm_unb_var1.h rename to frame/util/old/mksymm/bli_mksymm_unb_var1.h diff --git a/frame/util/mktrim/bli_mktrim.c b/frame/util/old/mktrim/bli_mktrim.c similarity index 100% rename from frame/util/mktrim/bli_mktrim.c rename to frame/util/old/mktrim/bli_mktrim.c diff --git a/frame/util/mktrim/bli_mktrim.h b/frame/util/old/mktrim/bli_mktrim.h similarity index 100% rename from frame/util/mktrim/bli_mktrim.h rename to frame/util/old/mktrim/bli_mktrim.h diff --git a/frame/util/mktrim/bli_mktrim_check.c b/frame/util/old/mktrim/bli_mktrim_check.c similarity index 100% rename from frame/util/mktrim/bli_mktrim_check.c rename to frame/util/old/mktrim/bli_mktrim_check.c diff --git a/frame/util/mktrim/bli_mktrim_check.h b/frame/util/old/mktrim/bli_mktrim_check.h similarity index 100% rename from frame/util/mktrim/bli_mktrim_check.h rename to frame/util/old/mktrim/bli_mktrim_check.h diff --git 
a/frame/util/mktrim/bli_mktrim_unb_var1.c b/frame/util/old/mktrim/bli_mktrim_unb_var1.c similarity index 85% rename from frame/util/mktrim/bli_mktrim_unb_var1.c rename to frame/util/old/mktrim/bli_mktrim_unb_var1.c index ad39ce5cf..50e536316 100644 --- a/frame/util/mktrim/bli_mktrim_unb_var1.c +++ b/frame/util/old/mktrim/bli_mktrim_unb_var1.c @@ -73,14 +73,16 @@ void bli_mktrim_unb_var1( obj_t* a ) #undef GENTFUNC #define GENTFUNC( ctype, ch, varname ) \ \ -void PASTEMAC(ch,varname)( \ - uplo_t uploa, \ - dim_t m, \ - void* a, inc_t rs_a, inc_t cs_a \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + uplo_t uploa, \ + dim_t m, \ + void* a, inc_t rs_a, inc_t cs_a \ + ) \ { \ - ctype* a_cast = a; \ - ctype* zero = PASTEMAC(ch,0); \ + cntx_t* cntx = NULL; \ + ctype* a_cast = a; \ + ctype* zero = PASTEMAC(ch,0); \ doff_t diagoffa; \ \ /* If the dimension is zero, return early. */ \ @@ -95,13 +97,17 @@ void PASTEMAC(ch,varname)( \ else /*if ( bli_is_lower( uploa ) )*/ diagoffa = -1; \ \ /* Set the unstored triangle to zero. 
*/ \ - PASTEMAC2(ch,ch,setm)( diagoffa, \ - BLIS_NONUNIT_DIAG, \ - uploa, \ - m, \ - m, \ - zero, \ - a_cast, rs_a, cs_a ); \ + PASTEMAC(ch,setm) \ + ( \ + diagoffa, \ + BLIS_NONUNIT_DIAG, \ + uploa, \ + m, \ + m, \ + zero, \ + a_cast, rs_a, cs_a, \ + cntx \ + ); \ } diff --git a/frame/util/mktrim/bli_mktrim_unb_var1.h b/frame/util/old/mktrim/bli_mktrim_unb_var1.h similarity index 100% rename from frame/util/mktrim/bli_mktrim_unb_var1.h rename to frame/util/old/mktrim/bli_mktrim_unb_var1.h diff --git a/frame/util/norm1m/bli_norm1m.c b/frame/util/old/norm1m/bli_norm1m.c similarity index 100% rename from frame/util/norm1m/bli_norm1m.c rename to frame/util/old/norm1m/bli_norm1m.c diff --git a/frame/util/norm1m/bli_norm1m.h b/frame/util/old/norm1m/bli_norm1m.h similarity index 100% rename from frame/util/norm1m/bli_norm1m.h rename to frame/util/old/norm1m/bli_norm1m.h diff --git a/frame/util/norm1m/bli_norm1m_check.c b/frame/util/old/norm1m/bli_norm1m_check.c similarity index 100% rename from frame/util/norm1m/bli_norm1m_check.c rename to frame/util/old/norm1m/bli_norm1m_check.c diff --git a/frame/util/norm1m/bli_norm1m_check.h b/frame/util/old/norm1m/bli_norm1m_check.h similarity index 100% rename from frame/util/norm1m/bli_norm1m_check.h rename to frame/util/old/norm1m/bli_norm1m_check.h diff --git a/frame/util/norm1m/bli_norm1m_unb_var1.c b/frame/util/old/norm1m/bli_norm1m_unb_var1.c similarity index 73% rename from frame/util/norm1m/bli_norm1m_unb_var1.c rename to frame/util/old/norm1m/bli_norm1m_unb_var1.c index 976692e9f..03b0731dc 100644 --- a/frame/util/norm1m/bli_norm1m_unb_var1.c +++ b/frame/util/old/norm1m/bli_norm1m_unb_var1.c @@ -85,43 +85,39 @@ void bli_norm1m_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname, kername ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, 
inc_t rs_x, inc_t cs_x, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* norm_cast = norm; \ - ctype_x* one = PASTEMAC(chx,1); \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_xr absum_max; \ - ctype_xr absum_j; \ - ctype_xr abval_chi1; \ - uplo_t uplox_eff; \ - dim_t n_iter; \ - dim_t n_elem, n_elem_max; \ - inc_t ldx, incx; \ - dim_t j, i; \ - dim_t ij0, n_shift; \ + ctype* x_cast = x; \ + ctype_r* norm_cast = norm; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype_r absum_max; \ + ctype_r absum_j; \ + ctype_r abval_chi1; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ +\ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ \ /* Initialize the maximum absolute column sum to zero. */ \ - PASTEMAC(chxr,set0s)( absum_max ); \ -\ - /* If either dimension is zero, return with absum_max equal to zero. */ \ - if ( bli_zero_dim2( m, n ) ) \ - { \ - PASTEMAC2(chxr,chxr,copys)( absum_max, *norm_cast ); \ - return; \ - } \ + PASTEMAC(chr,set0s)( absum_max ); \ \ /* Set various loop parameters. */ \ bli_set_dims_incs_uplo_1m_noswap( diagoffx, BLIS_NONUNIT_DIAG, \ @@ -132,7 +128,7 @@ void PASTEMAC(chx,varname)( \ /* If the matrix is zeros, return with absum_max equal to zero. */ \ if ( bli_is_zeros( uplox_eff ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( absum_max, *norm_cast ); \ + PASTEMAC(chr,copys)( absum_max, *norm_cast ); \ return; \ } \ \ @@ -147,15 +143,18 @@ void PASTEMAC(chx,varname)( \ x0 = x_cast + (j )*ldx + (0 )*incx; \ \ /* Compute the norm of the current column. 
*/ \ - PASTEMAC(chx,kername)( n_elem, \ - x0, incx, \ - &absum_j ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem, \ + x0, incx, \ + &absum_j \ + ); \ \ /* If absum_j is greater than the previous maximum value, then save it. */ \ if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( absum_j, absum_max ); \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ } \ } \ } \ @@ -171,22 +170,25 @@ void PASTEMAC(chx,varname)( \ chi1 = x_cast + (ij0+j )*ldx + (n_elem-1)*incx; \ \ /* Compute the norm of the super-diagonal elements. */ \ - PASTEMAC(chx,kername)( n_elem - 1, \ - x0, incx, \ - &absum_j ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x0, incx, \ + &absum_j \ + ); \ \ if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ \ /* Handle the diagonal element separately in case it's unit. */ \ - PASTEMAC2(chx,chxr,abval2s)( *chi1, abval_chi1 ); \ - PASTEMAC2(chxr,chxr,adds)( abval_chi1, absum_j ); \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abval_chi1 ); \ + PASTEMAC(chr,adds)( abval_chi1, absum_j ); \ \ /* If absum_j is greater than the previous maximum value, then save it. */ \ if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( absum_j, absum_max ); \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ } \ } \ } \ @@ -201,29 +203,32 @@ void PASTEMAC(chx,varname)( \ x2 = x_cast + (j )*ldx + (ij0+i+1)*incx; \ \ /* Compute the norm of the sub-diagonal elements. */ \ - PASTEMAC(chx,kername)( n_elem - 1, \ - x2, incx, \ - &absum_j ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x2, incx, \ + &absum_j \ + ); \ \ if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ \ /* Handle the diagonal element separately in case it's unit. */ \ - PASTEMAC2(chx,chxr,abval2s)( *chi1, abval_chi1 ); \ - PASTEMAC2(chxr,chxr,adds)( abval_chi1, absum_j ); \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abval_chi1 ); \ + PASTEMAC(chr,adds)( abval_chi1, absum_j ); \ \ /* If absum_j is greater than the previous maximum value, then save it. 
*/ \ if ( absum_max < absum_j || bli_isnan( absum_j ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( absum_j, absum_max ); \ + PASTEMAC(chr,copys)( absum_j, absum_max ); \ } \ } \ } \ } \ \ /* Store final value of absum_max to the output variable. */ \ - PASTEMAC2(chxr,chxr,copys)( absum_max, *norm_cast ); \ + PASTEMAC(chr,copys)( absum_max, *norm_cast ); \ } diff --git a/frame/util/norm1m/bli_norm1m_unb_var1.h b/frame/util/old/norm1m/bli_norm1m_unb_var1.h similarity index 80% rename from frame/util/norm1m/bli_norm1m_unb_var1.h rename to frame/util/old/norm1m/bli_norm1m_unb_var1.h index 1ac6e1a58..ffd7d494c 100644 --- a/frame/util/norm1m/bli_norm1m_unb_var1.h +++ b/frame/util/old/norm1m/bli_norm1m_unb_var1.h @@ -37,17 +37,17 @@ void bli_norm1m_unb_var1( obj_t* x, #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( norm1m_unb_var1 ) diff --git a/frame/util/norm1v/bli_norm1v.c b/frame/util/old/norm1v/bli_norm1v.c similarity index 100% rename from frame/util/norm1v/bli_norm1v.c rename to frame/util/old/norm1v/bli_norm1v.c diff --git a/frame/util/norm1v/bli_norm1v.h b/frame/util/old/norm1v/bli_norm1v.h similarity index 100% rename from frame/util/norm1v/bli_norm1v.h rename to frame/util/old/norm1v/bli_norm1v.h diff --git a/frame/util/norm1v/bli_norm1v_check.c b/frame/util/old/norm1v/bli_norm1v_check.c similarity index 100% rename from frame/util/norm1v/bli_norm1v_check.c rename to frame/util/old/norm1v/bli_norm1v_check.c diff --git a/frame/util/norm1v/bli_norm1v_check.h b/frame/util/old/norm1v/bli_norm1v_check.h similarity index 100% 
rename from frame/util/norm1v/bli_norm1v_check.h rename to frame/util/old/norm1v/bli_norm1v_check.h diff --git a/frame/util/norm1v/bli_norm1v_unb_var1.c b/frame/util/old/norm1v/bli_norm1v_unb_var1.c similarity index 81% rename from frame/util/norm1v/bli_norm1v_unb_var1.c rename to frame/util/old/norm1v/bli_norm1v_unb_var1.c index 4c487f32c..efce21280 100644 --- a/frame/util/norm1v/bli_norm1v_unb_var1.c +++ b/frame/util/old/norm1v/bli_norm1v_unb_var1.c @@ -71,37 +71,40 @@ void bli_norm1v_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* norm \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* norm_cast = norm; \ - ctype_x* chi1; \ - ctype_xr abs_chi1; \ - ctype_xr absum; \ - dim_t i; \ + ctype* x_cast = x; \ + ctype_r* norm_cast = norm; \ + ctype* chi1; \ + ctype_r abs_chi1; \ + ctype_r absum; \ + dim_t i; \ +\ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ \ /* Initialize the absolute sum accumulator to zero. */ \ - PASTEMAC(chxr,set0s)( absum ); \ + PASTEMAC(chr,set0s)( absum ); \ \ for ( i = 0; i < n; ++i ) \ { \ chi1 = x_cast + (i )*incx; \ \ /* Compute the absolute value (or complex magnitude) of chi1. */ \ - PASTEMAC2(chx,chxr,abval2s)( *chi1, abs_chi1 ); \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abs_chi1 ); \ \ /* Accumulate the absolute value of chi1 into absum. */ \ - PASTEMAC2(chxr,chxr,adds)( abs_chi1, absum ); \ + PASTEMAC(chr,adds)( abs_chi1, absum ); \ } \ \ /* Store final value of absum to the output variable. 
*/ \ - PASTEMAC2(chxr,chxr,copys)( absum, *norm_cast ); \ + PASTEMAC(chr,copys)( absum, *norm_cast ); \ } INSERT_GENTFUNCR_BASIC0( norm1v_unb_var1 ) diff --git a/frame/util/norm1v/bli_norm1v_unb_var1.h b/frame/util/old/norm1v/bli_norm1v_unb_var1.h similarity index 87% rename from frame/util/norm1v/bli_norm1v_unb_var1.h rename to frame/util/old/norm1v/bli_norm1v_unb_var1.h index fd2fb555f..8020b00f8 100644 --- a/frame/util/norm1v/bli_norm1v_unb_var1.h +++ b/frame/util/old/norm1v/bli_norm1v_unb_var1.h @@ -37,13 +37,13 @@ void bli_norm1v_unb_var1( obj_t* x, #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( norm1v_unb_var1 ) diff --git a/frame/util/normfm/bli_normfm.c b/frame/util/old/normfm/bli_normfm.c similarity index 100% rename from frame/util/normfm/bli_normfm.c rename to frame/util/old/normfm/bli_normfm.c diff --git a/frame/util/normfm/bli_normfm.h b/frame/util/old/normfm/bli_normfm.h similarity index 100% rename from frame/util/normfm/bli_normfm.h rename to frame/util/old/normfm/bli_normfm.h diff --git a/frame/util/normfm/bli_normfm_check.c b/frame/util/old/normfm/bli_normfm_check.c similarity index 100% rename from frame/util/normfm/bli_normfm_check.c rename to frame/util/old/normfm/bli_normfm_check.c diff --git a/frame/util/normfm/bli_normfm_check.h b/frame/util/old/normfm/bli_normfm_check.h similarity index 100% rename from frame/util/normfm/bli_normfm_check.h rename to frame/util/old/normfm/bli_normfm_check.h diff --git a/frame/util/normfm/bli_normfm_unb_var1.c b/frame/util/old/normfm/bli_normfm_unb_var1.c similarity index 70% rename from frame/util/normfm/bli_normfm_unb_var1.c rename to frame/util/old/normfm/bli_normfm_unb_var1.c index 78ffbb1ec..b86f38bc9 100644 --- 
a/frame/util/normfm/bli_normfm_unb_var1.c +++ b/frame/util/old/normfm/bli_normfm_unb_var1.c @@ -85,42 +85,38 @@ void bli_normfm_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname, kername ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* norm_cast = norm; \ - ctype_x* one = PASTEMAC(chx,1); \ - ctype_xr* one_r = PASTEMAC(chxr,1); \ - ctype_xr* zero_r = PASTEMAC(chxr,0); \ - ctype_x* x0; \ - ctype_x* chi1; \ - ctype_x* x2; \ - ctype_xr scale; \ - ctype_xr sumsq; \ - ctype_xr sqrt_sumsq; \ - uplo_t uplox_eff; \ - dim_t n_iter; \ - dim_t n_elem, n_elem_max; \ - inc_t ldx, incx; \ - dim_t j, i; \ - dim_t ij0, n_shift; \ + ctype* x_cast = x; \ + ctype_r* norm_cast = norm; \ + ctype* one = PASTEMAC(chx,1); \ + ctype_r* one_r = PASTEMAC(chxr,1); \ + ctype_r* zero_r = PASTEMAC(chxr,0); \ + ctype* x0; \ + ctype* chi1; \ + ctype* x2; \ + ctype_r scale; \ + ctype_r sumsq; \ + ctype_r sqrt_sumsq; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ \ - /* Return a norm of zero if either dimension is zero. */ \ - if ( bli_zero_dim2( m, n ) ) \ - { \ - PASTEMAC(chxr,set0s)( *norm_cast ); \ - return; \ - } \ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ \ /* Set various loop parameters. Here, we pretend that diagx is equal to BLIS_NONUNIT_DIAG because we handle the unit diagonal case manually. */ \ @@ -132,13 +128,13 @@ void PASTEMAC(chx,varname)( \ /* Check the effective uplo; if it's zeros, then our norm is zero. 
*/ \ if ( bli_is_zeros( uplox_eff ) ) \ { \ - PASTEMAC(chxr,set0s)( *norm_cast ); \ + PASTEMAC(chr,set0s)( *norm_cast ); \ return; \ } \ \ /* Initialize scale and sumsq to begin the summation. */ \ - PASTEMAC2(chxr,chxr,copys)( *zero_r, scale ); \ - PASTEMAC2(chxr,chxr,copys)( *one_r, sumsq ); \ + PASTEMAC(chr,copys)( *zero_r, scale ); \ + PASTEMAC(chr,copys)( *one_r, sumsq ); \ \ /* Handle dense and upper/lower storage cases separately. */ \ if ( bli_is_dense( uplox_eff ) ) \ @@ -150,10 +146,13 @@ void PASTEMAC(chx,varname)( \ x0 = x_cast + (j )*ldx + (0 )*incx; \ \ /* Compute the norm of the current column. */ \ - PASTEMAC(chx,kername)( n_elem, \ - x0, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem, \ + x0, incx, \ + &scale, \ + &sumsq \ + ); \ } \ } \ else \ @@ -168,19 +167,25 @@ void PASTEMAC(chx,varname)( \ chi1 = x_cast + (ij0+j )*ldx + (n_elem-1)*incx; \ \ /* Sum the squares of the super-diagonal elements. */ \ - PASTEMAC(chx,kername)( n_elem - 1, \ - x0, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x0, incx, \ + &scale, \ + &sumsq \ + ); \ \ if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ \ /* Handle the diagonal element separately in case it's unit. */ \ - PASTEMAC(chx,kername)( 1, \ - chi1, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + 1, \ + chi1, incx, \ + &scale, \ + &sumsq \ + ); \ } \ } \ else if ( bli_is_lower( uplox_eff ) ) \ @@ -194,29 +199,35 @@ void PASTEMAC(chx,varname)( \ x2 = x_cast + (j )*ldx + (ij0+i+1)*incx; \ \ /* Sum the squares of the sub-diagonal elements. */ \ - PASTEMAC(chx,kername)( n_elem - 1, \ - x2, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + n_elem - 1, \ + x2, incx, \ + &scale, \ + &sumsq \ + ); \ \ if ( bli_is_unit_diag( diagx ) ) chi1 = one; \ \ /* Handle the diagonal element separately in case it's unit. 
*/ \ - PASTEMAC(chx,kername)( 1, \ - chi1, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + 1, \ + chi1, incx, \ + &scale, \ + &sumsq \ + ); \ } \ } \ } \ \ /* Compute: norm = scale * sqrt( sumsq ) */ \ - PASTEMAC2(chxr,chxr,sqrt2s)( sumsq, sqrt_sumsq ); \ - PASTEMAC2(chxr,chxr,scals)( scale, sqrt_sumsq ); \ + PASTEMAC(chr,sqrt2s)( sumsq, sqrt_sumsq ); \ + PASTEMAC(chr,scals)( scale, sqrt_sumsq ); \ \ /* Store the final value to the output variable. */ \ - PASTEMAC2(chxr,chxr,copys)( sqrt_sumsq, *norm_cast ); \ + PASTEMAC(chr,copys)( sqrt_sumsq, *norm_cast ); \ } INSERT_GENTFUNCR_BASIC( normfm_unb_var1, sumsqv_unb_var1 ) diff --git a/frame/util/normfm/bli_normfm_unb_var1.h b/frame/util/old/normfm/bli_normfm_unb_var1.h similarity index 77% rename from frame/util/normfm/bli_normfm_unb_var1.h rename to frame/util/old/normfm/bli_normfm_unb_var1.h index 2dc97f6c9..bc329bb4d 100644 --- a/frame/util/normfm/bli_normfm_unb_var1.h +++ b/frame/util/old/normfm/bli_normfm_unb_var1.h @@ -32,19 +32,21 @@ */ -void bli_normfm_unb_var1( obj_t* x, obj_t* norm ); +void bli_normfm_unb_var1( obj_t* x, + obj_t* norm ); + #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( normfm_unb_var1 ) diff --git a/frame/util/normfv/bli_normfv.c b/frame/util/old/normfv/bli_normfv.c similarity index 100% rename from frame/util/normfv/bli_normfv.c rename to frame/util/old/normfv/bli_normfv.c diff --git a/frame/util/normfv/bli_normfv.h b/frame/util/old/normfv/bli_normfv.h similarity index 100% rename from frame/util/normfv/bli_normfv.h rename 
to frame/util/old/normfv/bli_normfv.h diff --git a/frame/util/normfv/bli_normfv_check.c b/frame/util/old/normfv/bli_normfv_check.c similarity index 100% rename from frame/util/normfv/bli_normfv_check.c rename to frame/util/old/normfv/bli_normfv_check.c diff --git a/frame/util/normfv/bli_normfv_check.h b/frame/util/old/normfv/bli_normfv_check.h similarity index 100% rename from frame/util/normfv/bli_normfv_check.h rename to frame/util/old/normfv/bli_normfv_check.h diff --git a/frame/util/normfv/bli_normfv_unb_var1.c b/frame/util/old/normfv/bli_normfv_unb_var1.c similarity index 73% rename from frame/util/normfv/bli_normfv_unb_var1.c rename to frame/util/old/normfv/bli_normfv_unb_var1.c index 391d60ddd..8bbaeb52b 100644 --- a/frame/util/normfv/bli_normfv_unb_var1.c +++ b/frame/util/old/normfv/bli_normfv_unb_var1.c @@ -71,45 +71,44 @@ void bli_normfv_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname, kername ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t m, \ - void* x, inc_t incx, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t m, \ + void* x, inc_t incx, \ + void* norm \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* norm_cast = norm; \ - ctype_xr* zero = PASTEMAC(chxr,0); \ - ctype_xr* one = PASTEMAC(chxr,1); \ - ctype_xr scale; \ - ctype_xr sumsq; \ - ctype_xr sqrt_sumsq; \ + ctype* x_cast = x; \ + ctype_r* norm_cast = norm; \ + ctype_r* zero = PASTEMAC(chr,0); \ + ctype_r* one = PASTEMAC(chr,1); \ + ctype_r scale; \ + ctype_r sumsq; \ + ctype_r sqrt_sumsq; \ \ - /* Return a norm of zero if either dimension is zero. */ \ - if ( bli_zero_dim1( m ) ) \ - { \ - PASTEMAC(chxr,set0s)( *norm_cast ); \ - return; \ - } \ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ \ /* Initialize scale and sumsq to begin the summation. 
*/ \ - PASTEMAC2(chxr,chxr,copys)( *zero, scale ); \ - PASTEMAC2(chxr,chxr,copys)( *one, sumsq ); \ + PASTEMAC(chr,copys)( *zero, scale ); \ + PASTEMAC(chr,copys)( *one, sumsq ); \ \ /* Compute the sum of the squares of the vector. */ \ - PASTEMAC(chx,kername)( m, \ - x_cast, incx, \ - &scale, \ - &sumsq ); \ + PASTEMAC(ch,kername) \ + ( \ + m, \ + x_cast, incx, \ + &scale, \ + &sumsq \ + ); \ \ /* Compute: norm = scale * sqrt( sumsq ) */ \ - PASTEMAC2(chxr,chxr,sqrt2s)( sumsq, sqrt_sumsq ); \ - PASTEMAC2(chxr,chxr,scals)( scale, sqrt_sumsq ); \ + PASTEMAC(chr,sqrt2s)( sumsq, sqrt_sumsq ); \ + PASTEMAC(chr,scals)( scale, sqrt_sumsq ); \ \ /* Store the final value to the output variable. */ \ - PASTEMAC2(chxr,chxr,copys)( sqrt_sumsq, *norm_cast ); \ + PASTEMAC(chr,copys)( sqrt_sumsq, *norm_cast ); \ } INSERT_GENTFUNCR_BASIC( normfv_unb_var1, sumsqv_unb_var1 ) diff --git a/frame/util/normfv/bli_normfv_unb_var1.h b/frame/util/old/normfv/bli_normfv_unb_var1.h similarity index 84% rename from frame/util/normfv/bli_normfv_unb_var1.h rename to frame/util/old/normfv/bli_normfv_unb_var1.h index ab20d2d2a..1ab14637e 100644 --- a/frame/util/normfv/bli_normfv_unb_var1.h +++ b/frame/util/old/normfv/bli_normfv_unb_var1.h @@ -32,17 +32,18 @@ */ -void bli_normfv_unb_var1( obj_t* x, obj_t* norm ); +void bli_normfv_unb_var1( obj_t* x, + obj_t* norm ); #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t m, \ - void* x, inc_t incx, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + dim_t m, \ + void* x, inc_t incx, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( normfv_unb_var1 ) diff --git a/frame/util/normim/bli_normim.c b/frame/util/old/normim/bli_normim.c similarity index 100% rename from frame/util/normim/bli_normim.c rename to frame/util/old/normim/bli_normim.c diff --git a/frame/util/normim/bli_normim.h b/frame/util/old/normim/bli_normim.h similarity 
index 100% rename from frame/util/normim/bli_normim.h rename to frame/util/old/normim/bli_normim.h diff --git a/frame/util/normim/bli_normim_check.c b/frame/util/old/normim/bli_normim_check.c similarity index 100% rename from frame/util/normim/bli_normim_check.c rename to frame/util/old/normim/bli_normim_check.c diff --git a/frame/util/normim/bli_normim_check.h b/frame/util/old/normim/bli_normim_check.h similarity index 100% rename from frame/util/normim/bli_normim_check.h rename to frame/util/old/normim/bli_normim_check.h diff --git a/frame/util/normim/bli_normim_unb_var1.c b/frame/util/old/normim/bli_normim_unb_var1.c similarity index 82% rename from frame/util/normim/bli_normim_unb_var1.c rename to frame/util/old/normim/bli_normim_unb_var1.c index 78a320ba8..970ede18a 100644 --- a/frame/util/normim/bli_normim_unb_var1.c +++ b/frame/util/old/normim/bli_normim_unb_var1.c @@ -85,18 +85,21 @@ void bli_normim_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname, kername ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname, kername ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ) \ { \ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ +\ /* Induce a transposition so that rows become columns. */ \ bli_swap_dims( m, n ); \ bli_swap_incs( rs_x, cs_x ); \ @@ -106,13 +109,16 @@ void PASTEMAC(chx,varname)( \ /* Now we can simply compute the 1-norm of this transposed matrix, which will be equivalent to the infinity-norm of the original matrix. 
*/ \ - PASTEMAC(chx,kername)( diagoffx, \ - diagx, \ - uplox, \ - m, \ - n, \ - x, rs_x, cs_x, \ - norm ); \ + PASTEMAC(ch,kername) \ + ( \ + diagoffx, \ + diagx, \ + uplox, \ + m, \ + n, \ + x, rs_x, cs_x, \ + norm \ + ); \ } diff --git a/frame/util/normim/bli_normim_unb_var1.h b/frame/util/old/normim/bli_normim_unb_var1.h similarity index 80% rename from frame/util/normim/bli_normim_unb_var1.h rename to frame/util/old/normim/bli_normim_unb_var1.h index fc8ae6ca8..2225c9288 100644 --- a/frame/util/normim/bli_normim_unb_var1.h +++ b/frame/util/old/normim/bli_normim_unb_var1.h @@ -37,17 +37,17 @@ void bli_normim_unb_var1( obj_t* x, #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - doff_t diagoffx, \ - diag_t diagx, \ - uplo_t uplox, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + doff_t diagoffx, \ + diag_t diagx, \ + uplo_t uplox, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( normim_unb_var1 ) diff --git a/frame/util/normiv/bli_normiv.c b/frame/util/old/normiv/bli_normiv.c similarity index 100% rename from frame/util/normiv/bli_normiv.c rename to frame/util/old/normiv/bli_normiv.c diff --git a/frame/util/normiv/bli_normiv.h b/frame/util/old/normiv/bli_normiv.h similarity index 100% rename from frame/util/normiv/bli_normiv.h rename to frame/util/old/normiv/bli_normiv.h diff --git a/frame/util/normiv/bli_normiv_check.c b/frame/util/old/normiv/bli_normiv_check.c similarity index 100% rename from frame/util/normiv/bli_normiv_check.c rename to frame/util/old/normiv/bli_normiv_check.c diff --git a/frame/util/normiv/bli_normiv_check.h b/frame/util/old/normiv/bli_normiv_check.h similarity index 100% rename from frame/util/normiv/bli_normiv_check.h rename to frame/util/old/normiv/bli_normiv_check.h diff --git 
a/frame/util/normiv/bli_normiv_unb_var1.c b/frame/util/old/normiv/bli_normiv_unb_var1.c similarity index 82% rename from frame/util/normiv/bli_normiv_unb_var1.c rename to frame/util/old/normiv/bli_normiv_unb_var1.c index eaa381a10..9f3f154f4 100644 --- a/frame/util/normiv/bli_normiv_unb_var1.c +++ b/frame/util/old/normiv/bli_normiv_unb_var1.c @@ -71,30 +71,33 @@ void bli_normiv_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* norm \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* norm \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* norm_cast = norm; \ - ctype_x* chi1; \ - ctype_xr abs_chi1; \ - ctype_xr abs_chi1_max; \ - dim_t i; \ + ctype* x_cast = x; \ + ctype_r* norm_cast = norm; \ + ctype* chi1; \ + ctype_r abs_chi1; \ + ctype_r abs_chi1_max; \ + dim_t i; \ +\ + /* NOTE: Early returns due to empty dimensions are handled by the + caller. */ \ \ /* Initialize the maximum absolute value to zero. */ \ - PASTEMAC(chxr,set0s)( abs_chi1_max ); \ + PASTEMAC(chr,set0s)( abs_chi1_max ); \ \ for ( i = 0; i < n; ++i ) \ { \ chi1 = x_cast + (i )*incx; \ \ /* Compute the absolute value (or complex magnitude) of chi1. */ \ - PASTEMAC2(chx,chxr,abval2s)( *chi1, abs_chi1 ); \ + PASTEMAC2(ch,chr,abval2s)( *chi1, abs_chi1 ); \ \ /* If the absolute value of the current element exceeds that of the previous largest, save it and its index. If NaN is @@ -103,12 +106,12 @@ void PASTEMAC(chx,varname)( \ behavior mimics that of LAPACK's ?lange(). */ \ if ( abs_chi1_max < abs_chi1 || bli_isnan( abs_chi1 ) ) \ { \ - PASTEMAC2(chxr,chxr,copys)( abs_chi1, abs_chi1_max ); \ + PASTEMAC(chr,copys)( abs_chi1, abs_chi1_max ); \ } \ } \ \ /* Store the final value to the output variable. 
*/ \ - PASTEMAC2(chxr,chxr,copys)( abs_chi1_max, *norm_cast ); \ + PASTEMAC(chr,copys)( abs_chi1_max, *norm_cast ); \ } INSERT_GENTFUNCR_BASIC0( normiv_unb_var1 ) diff --git a/frame/util/normiv/bli_normiv_unb_var1.h b/frame/util/old/normiv/bli_normiv_unb_var1.h similarity index 87% rename from frame/util/normiv/bli_normiv_unb_var1.h rename to frame/util/old/normiv/bli_normiv_unb_var1.h index d1772cba4..51866a3fe 100644 --- a/frame/util/normiv/bli_normiv_unb_var1.h +++ b/frame/util/old/normiv/bli_normiv_unb_var1.h @@ -37,13 +37,13 @@ void bli_normiv_unb_var1( obj_t* x, #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* norm \ - ); +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* norm \ + ); INSERT_GENTPROTR_BASIC( normiv_unb_var1 ) diff --git a/frame/util/printm/bli_fprintm.c b/frame/util/old/printm/bli_fprintm.c similarity index 92% rename from frame/util/printm/bli_fprintm.c rename to frame/util/old/printm/bli_fprintm.c index 3ef90651b..657a22b35 100644 --- a/frame/util/printm/bli_fprintm.c +++ b/frame/util/old/printm/bli_fprintm.c @@ -102,15 +102,16 @@ void bli_fprintm( FILE* file, char* s1, obj_t* x, char* format, char* s2 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname, varname ) \ \ -void PASTEMAC(ch,opname)( \ - FILE* file, \ - char* s1, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - char* format, \ - char* s2 \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ) \ { \ dim_t i, j; \ ctype* chi1; \ diff --git a/frame/util/printm/bli_fprintm.h b/frame/util/old/printm/bli_fprintm.h similarity index 84% rename from frame/util/printm/bli_fprintm.h rename to frame/util/old/printm/bli_fprintm.h index 7c7d95233..33f0fc2c0 100644 --- 
a/frame/util/printm/bli_fprintm.h +++ b/frame/util/old/printm/bli_fprintm.h @@ -42,15 +42,16 @@ void bli_fprintm( FILE* file, #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - FILE* file, \ - char* s1, \ - dim_t m, \ - dim_t n, \ - void* x, inc_t rs_x, inc_t cs_x, \ - char* format, \ - char* s2 \ - ); +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ); INSERT_GENTPROT_BASIC_I( fprintm ) diff --git a/frame/util/printm/bli_fprintm_check.c b/frame/util/old/printm/bli_fprintm_check.c similarity index 100% rename from frame/util/printm/bli_fprintm_check.c rename to frame/util/old/printm/bli_fprintm_check.c diff --git a/frame/util/printm/bli_fprintm_check.h b/frame/util/old/printm/bli_fprintm_check.h similarity index 100% rename from frame/util/printm/bli_fprintm_check.h rename to frame/util/old/printm/bli_fprintm_check.h diff --git a/frame/util/printm/bli_printm.c b/frame/util/old/printm/bli_printm.c similarity index 78% rename from frame/util/printm/bli_printm.c rename to frame/util/old/printm/bli_printm.c index f81c5786d..f0adb99c3 100644 --- a/frame/util/printm/bli_printm.c +++ b/frame/util/old/printm/bli_printm.c @@ -43,22 +43,26 @@ void bli_printm( char* s1, obj_t* x, char* format, char* s2 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname, varname ) \ \ -void PASTEMAC(ch,opname)( \ - char* s1, \ - dim_t m, \ - dim_t n, \ - ctype* x, inc_t rs_x, inc_t cs_x, \ - char* format, \ - char* s2 \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ) \ { \ - PASTEMAC(ch,varname)( stdout, \ - s1, \ - m, \ - n, \ - x, rs_x, cs_x, \ - format, \ - s2 ); \ + PASTEMAC(ch,varname) \ + ( \ + stdout, \ + s1, \ + m, \ + n, \ + x, rs_x, cs_x, \ + format, \ + s2 \ + ); \ } INSERT_GENTFUNC_BASIC_I( printm, fprintm ) diff --git 
a/frame/util/printm/bli_printm.h b/frame/util/old/printm/bli_printm.h similarity index 86% rename from frame/util/printm/bli_printm.h rename to frame/util/old/printm/bli_printm.h index 542c51d07..1a9c5333e 100644 --- a/frame/util/printm/bli_printm.h +++ b/frame/util/old/printm/bli_printm.h @@ -44,14 +44,15 @@ void bli_printm( char* s1, #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - char* s1, \ - dim_t m, \ - dim_t n, \ - ctype* x, inc_t rs_x, inc_t cs_x, \ - char* format, \ - char* s2 \ - ); +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t m, \ + dim_t n, \ + void* x, inc_t rs_x, inc_t cs_x, \ + char* format, \ + char* s2 \ + ); INSERT_GENTPROT_BASIC_I( printm ) diff --git a/frame/util/printv/bli_fprintv.c b/frame/util/old/printv/bli_fprintv.c similarity index 91% rename from frame/util/printv/bli_fprintv.c rename to frame/util/old/printv/bli_fprintv.c index cc214cccf..3135c4e2d 100644 --- a/frame/util/printv/bli_fprintv.c +++ b/frame/util/old/printv/bli_fprintv.c @@ -79,14 +79,15 @@ void bli_fprintv( FILE* file, char* s1, obj_t* x, char* format, char* s2 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname, varname ) \ \ -void PASTEMAC(ch,opname)( \ - FILE* file, \ - char* s1, \ - dim_t n, \ - void* x, inc_t incx, \ - char* format, \ - char* s2 \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ) \ { \ dim_t i; \ ctype* chi1; \ diff --git a/frame/util/printv/bli_fprintv.h b/frame/util/old/printv/bli_fprintv.h similarity index 86% rename from frame/util/printv/bli_fprintv.h rename to frame/util/old/printv/bli_fprintv.h index 0cceef257..ddd58f717 100644 --- a/frame/util/printv/bli_fprintv.h +++ b/frame/util/old/printv/bli_fprintv.h @@ -42,14 +42,15 @@ void bli_fprintv( FILE* file, #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ -void PASTEMAC(ch,opname)( \ - FILE* file, \ - char* s1, \ - dim_t n, \ - void* x, inc_t 
incx, \ - char* format, \ - char* s2 \ - ); +void PASTEMAC(ch,opname) \ + ( \ + FILE* file, \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ); INSERT_GENTPROT_BASIC_I( fprintv ) diff --git a/frame/util/printv/bli_fprintv_check.c b/frame/util/old/printv/bli_fprintv_check.c similarity index 100% rename from frame/util/printv/bli_fprintv_check.c rename to frame/util/old/printv/bli_fprintv_check.c diff --git a/frame/util/printv/bli_fprintv_check.h b/frame/util/old/printv/bli_fprintv_check.h similarity index 100% rename from frame/util/printv/bli_fprintv_check.h rename to frame/util/old/printv/bli_fprintv_check.h diff --git a/frame/util/printv/bli_printv.c b/frame/util/old/printv/bli_printv.c similarity index 81% rename from frame/util/printv/bli_printv.c rename to frame/util/old/printv/bli_printv.c index 62a85576c..ed67939d7 100644 --- a/frame/util/printv/bli_printv.c +++ b/frame/util/old/printv/bli_printv.c @@ -43,20 +43,24 @@ void bli_printv( char* s1, obj_t* x, char* format, char* s2 ) #undef GENTFUNC #define GENTFUNC( ctype, ch, opname, varname ) \ \ -void PASTEMAC(ch,opname)( \ - char* s1, \ - dim_t n, \ - ctype* x, inc_t incx, \ - char* format, \ - char* s2 \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ) \ { \ - PASTEMAC(ch,varname)( stdout, \ - s1, \ - n, \ - x, incx, \ - format, \ - s2 ); \ + PASTEMAC(ch,varname) \ + ( \ + stdout, \ + s1, \ + n, \ + x, incx, \ + format, \ + s2 \ + ); \ } INSERT_GENTFUNC_BASIC_I( printv, fprintv ) diff --git a/frame/util/printv/bli_printv.h b/frame/util/old/printv/bli_printv.h similarity index 88% rename from frame/util/printv/bli_printv.h rename to frame/util/old/printv/bli_printv.h index 3255366a0..172286e7c 100644 --- a/frame/util/printv/bli_printv.h +++ b/frame/util/old/printv/bli_printv.h @@ -44,13 +44,14 @@ void bli_printv( char* s1, #undef GENTPROT #define GENTPROT( ctype, ch, opname ) \ \ -void 
PASTEMAC(ch,opname)( \ - char* s1, \ - dim_t n, \ - ctype* x, inc_t incx, \ - char* format, \ - char* s2 \ - ); +void PASTEMAC(ch,opname) \ + ( \ + char* s1, \ + dim_t n, \ + void* x, inc_t incx, \ + char* format, \ + char* s2 \ + ); INSERT_GENTPROT_BASIC_I( printv ) diff --git a/frame/util/randm/bli_randm.c b/frame/util/old/randm/bli_randm.c similarity index 100% rename from frame/util/randm/bli_randm.c rename to frame/util/old/randm/bli_randm.c diff --git a/frame/util/randm/bli_randm.h b/frame/util/old/randm/bli_randm.h similarity index 100% rename from frame/util/randm/bli_randm.h rename to frame/util/old/randm/bli_randm.h diff --git a/frame/util/randm/bli_randm_check.c b/frame/util/old/randm/bli_randm_check.c similarity index 100% rename from frame/util/randm/bli_randm_check.c rename to frame/util/old/randm/bli_randm_check.c diff --git a/frame/util/randm/bli_randm_check.h b/frame/util/old/randm/bli_randm_check.h similarity index 100% rename from frame/util/randm/bli_randm_check.h rename to frame/util/old/randm/bli_randm_check.h diff --git a/frame/util/randm/bli_randm_unb_var1.c b/frame/util/old/randm/bli_randm_unb_var1.c similarity index 80% rename from frame/util/randm/bli_randm_unb_var1.c rename to frame/util/old/randm/bli_randm_unb_var1.c index 3576cefaf..4ef651446 100644 --- a/frame/util/randm/bli_randm_unb_var1.c +++ b/frame/util/old/randm/bli_randm_unb_var1.c @@ -87,21 +87,21 @@ void PASTEMAC(ch,varname)( \ void* x, inc_t rs_x, inc_t cs_x \ ) \ { \ - ctype* one = PASTEMAC(ch,1); \ - ctype* x_cast = x; \ - ctype* x0; \ - ctype* x1; \ - ctype* x2; \ - ctype* chi1; \ - ctype beta; \ - ctype omega; \ - double max_m_n; \ - uplo_t uplox_eff; \ - dim_t n_iter; \ - dim_t n_elem, n_elem_max; \ - inc_t ldx, incx; \ - dim_t j, i; \ - dim_t ij0, n_shift; \ + ctype* one = PASTEMAC(ch,1); \ + ctype* x_cast = x; \ + ctype* x0; \ + ctype* x1; \ + ctype* x2; \ + ctype* chi1; \ + ctype beta; \ + ctype omega; \ + double max_m_n; \ + uplo_t uplox_eff; \ + dim_t n_iter; \ + 
dim_t n_elem, n_elem_max; \ + inc_t ldx, incx; \ + dim_t j, i; \ + dim_t ij0, n_shift; \ \ if ( bli_zero_dim2( m, n ) ) return; \ \ @@ -123,8 +123,11 @@ void PASTEMAC(ch,varname)( \ \ x1 = x_cast + (j )*ldx + (0 )*incx; \ \ - PASTEMAC(ch,randv)( n_elem, \ - x1, incx ); \ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx \ + ); \ } \ } \ else \ @@ -132,8 +135,8 @@ void PASTEMAC(ch,varname)( \ max_m_n = bli_max( m, n ); \ \ PASTEMAC2(d,ch,sets)( max_m_n, 0.0, omega ); \ - PASTEMAC2(ch,ch,copys)( *one, beta ); \ - PASTEMAC2(ch,ch,invscals)( omega, beta ); \ + PASTEMAC(ch,copys)( *one, beta ); \ + PASTEMAC(ch,invscals)( omega, beta ); \ \ if ( bli_is_upper( uplox_eff ) ) \ { \ @@ -145,18 +148,24 @@ void PASTEMAC(ch,varname)( \ x0 = x1; \ chi1 = x1 + (n_elem-1)*incx; \ \ - PASTEMAC(ch,randv)( n_elem, \ - x1, incx ); \ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx \ + ); \ \ /* We want positive diagonal elements between 1 and 2. */ \ - PASTEMAC2(ch,ch,abval2s)( *chi1, *chi1 ); \ - PASTEMAC2(ch,ch,adds)( *one, *chi1 ); \ + PASTEMAC(ch,abval2s)( *chi1, *chi1 ); \ + PASTEMAC(ch,adds)( *one, *chi1 ); \ \ /* Scale the super-diagonal elements by 1/max(m,n). */ \ - PASTEMAC2(ch,ch,scalv)( BLIS_NO_CONJUGATE, \ - n_elem - 1, \ - &beta, \ - x0, incx ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem - 1, \ + &beta, \ + x0, incx \ + ); \ } \ } \ else if ( bli_is_lower( uplox_eff ) ) \ @@ -170,18 +179,24 @@ void PASTEMAC(ch,varname)( \ x2 = x1 + incx; \ chi1 = x1; \ \ - PASTEMAC(ch,randv)( n_elem, \ - x1, incx ); \ + PASTEMAC(ch,randv) \ + ( \ + n_elem, \ + x1, incx \ + ); \ \ /* We want positive diagonal elements between 1 and 2. */ \ - PASTEMAC2(ch,ch,abval2s)( *chi1, *chi1 ); \ - PASTEMAC2(ch,ch,adds)( *one, *chi1 ); \ + PASTEMAC(ch,abval2s)( *chi1, *chi1 ); \ + PASTEMAC(ch,adds)( *one, *chi1 ); \ \ /* Scale the sub-diagonal elements by 1/max(m,n). 
*/ \ - PASTEMAC2(ch,ch,scalv)( BLIS_NO_CONJUGATE, \ - n_elem - 1, \ - &beta, \ - x2, incx ); \ + PASTEMAC(ch,scalv) \ + ( \ + BLIS_NO_CONJUGATE, \ + n_elem - 1, \ + &beta, \ + x2, incx \ + ); \ } \ } \ } \ diff --git a/frame/util/randm/bli_randm_unb_var1.h b/frame/util/old/randm/bli_randm_unb_var1.h similarity index 100% rename from frame/util/randm/bli_randm_unb_var1.h rename to frame/util/old/randm/bli_randm_unb_var1.h diff --git a/frame/util/randv/bli_randv.c b/frame/util/old/randv/bli_randv.c similarity index 100% rename from frame/util/randv/bli_randv.c rename to frame/util/old/randv/bli_randv.c diff --git a/frame/util/randv/bli_randv.h b/frame/util/old/randv/bli_randv.h similarity index 100% rename from frame/util/randv/bli_randv.h rename to frame/util/old/randv/bli_randv.h diff --git a/frame/util/randv/bli_randv_check.c b/frame/util/old/randv/bli_randv_check.c similarity index 100% rename from frame/util/randv/bli_randv_check.c rename to frame/util/old/randv/bli_randv_check.c diff --git a/frame/util/randv/bli_randv_check.h b/frame/util/old/randv/bli_randv_check.h similarity index 100% rename from frame/util/randv/bli_randv_check.h rename to frame/util/old/randv/bli_randv_check.h diff --git a/frame/util/randv/bli_randv_unb_var1.c b/frame/util/old/randv/bli_randv_unb_var1.c similarity index 100% rename from frame/util/randv/bli_randv_unb_var1.c rename to frame/util/old/randv/bli_randv_unb_var1.c diff --git a/frame/util/randv/bli_randv_unb_var1.h b/frame/util/old/randv/bli_randv_unb_var1.h similarity index 99% rename from frame/util/randv/bli_randv_unb_var1.h rename to frame/util/old/randv/bli_randv_unb_var1.h index e3f345d28..5d6fb633b 100644 --- a/frame/util/randv/bli_randv_unb_var1.h +++ b/frame/util/old/randv/bli_randv_unb_var1.h @@ -34,6 +34,7 @@ void bli_randv_unb_var1( obj_t* x ); + #undef GENTPROT #define GENTPROT( ctype, ch, varname ) \ \ diff --git a/frame/util/sumsqv/bli_sumsqv.c b/frame/util/old/sumsqv/bli_sumsqv.c similarity index 100% rename from 
frame/util/sumsqv/bli_sumsqv.c rename to frame/util/old/sumsqv/bli_sumsqv.c diff --git a/frame/util/sumsqv/bli_sumsqv.h b/frame/util/old/sumsqv/bli_sumsqv.h similarity index 100% rename from frame/util/sumsqv/bli_sumsqv.h rename to frame/util/old/sumsqv/bli_sumsqv.h diff --git a/frame/util/sumsqv/bli_sumsqv_check.c b/frame/util/old/sumsqv/bli_sumsqv_check.c similarity index 100% rename from frame/util/sumsqv/bli_sumsqv_check.c rename to frame/util/old/sumsqv/bli_sumsqv_check.c diff --git a/frame/util/sumsqv/bli_sumsqv_check.h b/frame/util/old/sumsqv/bli_sumsqv_check.h similarity index 100% rename from frame/util/sumsqv/bli_sumsqv_check.h rename to frame/util/old/sumsqv/bli_sumsqv_check.h diff --git a/frame/util/sumsqv/bli_sumsqv_unb_var1.c b/frame/util/old/sumsqv/bli_sumsqv_unb_var1.c similarity index 81% rename from frame/util/sumsqv/bli_sumsqv_unb_var1.c rename to frame/util/old/sumsqv/bli_sumsqv_unb_var1.c index b3c3708bb..311e88d04 100644 --- a/frame/util/sumsqv/bli_sumsqv_unb_var1.c +++ b/frame/util/old/sumsqv/bli_sumsqv_unb_var1.c @@ -90,29 +90,29 @@ void bli_sumsqv_unb_var1( obj_t* x, #undef GENTFUNCR -#define GENTFUNCR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTFUNCR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* scale, \ - void* sumsq \ - ) \ +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* scale, \ + void* sumsq \ + ) \ { \ - ctype_x* x_cast = x; \ - ctype_xr* scale_cast = scale; \ - ctype_xr* sumsq_cast = sumsq; \ + ctype* x_cast = x; \ + ctype_r* scale_cast = scale; \ + ctype_r* sumsq_cast = sumsq; \ \ - const ctype_xr zero_r = *PASTEMAC(chxr,0); \ - const ctype_xr one_r = *PASTEMAC(chxr,1); \ + const ctype_r zero_r = *PASTEMAC(chr,0); \ + const ctype_r one_r = *PASTEMAC(chr,1); \ \ - ctype_x* chi1; \ - ctype_xr chi1_r; \ - ctype_xr chi1_i; \ - ctype_xr scale_r; \ - ctype_xr sumsq_r; \ - ctype_xr abs_chi1_r; \ - dim_t i; \ + ctype* chi1; \ + 
ctype_r chi1_r; \ + ctype_r chi1_i; \ + ctype_r scale_r; \ + ctype_r sumsq_r; \ + ctype_r abs_chi1_r; \ + dim_t i; \ \ /* NOTE: This function attempts to mimic the algorithm for computing the Frobenius norm in netlib LAPACK's ?lassq(). */ \ @@ -121,15 +121,15 @@ void PASTEMAC(chx,varname)( \ if ( bli_zero_dim1( n ) ) return; \ \ /* Copy scale and sumsq to local variables. */ \ - PASTEMAC2(chxr,chxr,copys)( *scale_cast, scale_r ); \ - PASTEMAC2(chxr,chxr,copys)( *sumsq_cast, sumsq_r ); \ + PASTEMAC(chr,copys)( *scale_cast, scale_r ); \ + PASTEMAC(chr,copys)( *sumsq_cast, sumsq_r ); \ \ chi1 = x_cast; \ \ for ( i = 0; i < n; ++i ) \ { \ /* Get the real and imaginary components of chi1. */ \ - PASTEMAC2(chx,chxr,gets)( *chi1, chi1_r, chi1_i ); \ + PASTEMAC2(ch,chr,gets)( *chi1, chi1_r, chi1_i ); \ \ abs_chi1_r = bli_fabs( chi1_r ); \ \ @@ -143,7 +143,7 @@ void PASTEMAC(chx,varname)( \ sumsq_r * ( scale_r / abs_chi1_r ) * \ ( scale_r / abs_chi1_r ); \ \ - PASTEMAC2(chxr,chxr,copys)( abs_chi1_r, scale_r ); \ + PASTEMAC(chr,copys)( abs_chi1_r, scale_r ); \ } \ else \ { \ @@ -164,7 +164,7 @@ void PASTEMAC(chx,varname)( \ sumsq_r * ( scale_r / abs_chi1_r ) * \ ( scale_r / abs_chi1_r ); \ \ - PASTEMAC2(chxr,chxr,copys)( abs_chi1_r, scale_r ); \ + PASTEMAC(chr,copys)( abs_chi1_r, scale_r ); \ } \ else \ { \ @@ -177,8 +177,8 @@ void PASTEMAC(chx,varname)( \ } \ \ /* Store final values of scale and sumsq to output variables. 
*/ \ - PASTEMAC2(chxr,chxr,copys)( scale_r, *scale_cast ); \ - PASTEMAC2(chxr,chxr,copys)( sumsq_r, *sumsq_cast ); \ + PASTEMAC(chr,copys)( scale_r, *scale_cast ); \ + PASTEMAC(chr,copys)( sumsq_r, *sumsq_cast ); \ } INSERT_GENTFUNCR_BASIC0( sumsqv_unb_var1 ) diff --git a/frame/util/sumsqv/bli_sumsqv_unb_var1.h b/frame/util/old/sumsqv/bli_sumsqv_unb_var1.h similarity index 86% rename from frame/util/sumsqv/bli_sumsqv_unb_var1.h rename to frame/util/old/sumsqv/bli_sumsqv_unb_var1.h index 6c7beb4f4..0257c20fb 100644 --- a/frame/util/sumsqv/bli_sumsqv_unb_var1.h +++ b/frame/util/old/sumsqv/bli_sumsqv_unb_var1.h @@ -38,14 +38,14 @@ void bli_sumsqv_unb_var1( obj_t* x, #undef GENTPROTR -#define GENTPROTR( ctype_x, ctype_xr, chx, chxr, varname ) \ +#define GENTPROTR( ctype, ctype_r, ch, chr, varname ) \ \ -void PASTEMAC(chx,varname)( \ - dim_t n, \ - void* x, inc_t incx, \ - void* scale, \ - void* sumsq \ - ); +void PASTEMAC(ch,varname)( \ + dim_t n, \ + void* x, inc_t incx, \ + void* scale, \ + void* sumsq \ + ); INSERT_GENTPROTR_BASIC( sumsqv_unb_var1 ) diff --git a/kernels/arm/neon/3/bli_gemm_opt_4x4.c b/kernels/arm/3/bli_gemm_opt_4x4.c similarity index 85% rename from kernels/arm/neon/3/bli_gemm_opt_4x4.c rename to kernels/arm/3/bli_gemm_opt_4x4.c index e5ae17281..a709b5022 100644 --- a/kernels/arm/neon/3/bli_gemm_opt_4x4.c +++ b/kernels/arm/3/bli_gemm_opt_4x4.c @@ -35,15 +35,17 @@ #include "blis.h" #include "arm_neon.h" -void bli_sgemm_opt_4x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_opt_4x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -264,15 +266,17 @@ void 
bli_sgemm_opt_4x4( } } -void bli_dgemm_opt_4x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_opt_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -510,43 +514,3 @@ void bli_dgemm_opt_4x4( *c33 += ab33 * *alpha; } -void bli_cgemm_opt_4x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - /* Just call the reference implementation. */ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} - -void bli_zgemm_opt_4x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - /* Just call the reference implementation. 
*/ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} - diff --git a/kernels/armv7a/3/bli_gemm_opt_4x4.c b/kernels/armv7a/3/bli_gemm_opt_4x4.c index 4884eea5d..e93c7adbf 100644 --- a/kernels/armv7a/3/bli_gemm_opt_4x4.c +++ b/kernels/armv7a/3/bli_gemm_opt_4x4.c @@ -34,101 +34,117 @@ #include "blis.h" -extern void bli_sgemm_kernel_4x4(dim_t k, - float* alpha, - float* restrict a, - float* restrict b, - float* beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +extern +void bli_sgemm_kernel_4x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data + ); - -void bli_sgemm_opt_4x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_opt_4x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - - bli_sgemm_kernel_4x4(k, alpha, a, b, beta, c, rs_c, cs_c, data); - + bli_sgemm_kernel_4x4( k, alpha, a, b, beta, c, rs_c, cs_c, data ); } -extern void bli_dgemm_kernel_4x4(dim_t k, - double* alpha, - double* restrict a, - double* restrict b, - double* beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); -void bli_dgemm_opt_4x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +extern +void bli_dgemm_kernel_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data + ); + +void bli_dgemm_opt_4x4 + ( + dim_t k, + double* restrict alpha, + double* 
restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - bli_dgemm_kernel_4x4(k, alpha, a, b, beta, c, rs_c, cs_c, data); + bli_dgemm_kernel_4x4( k, alpha, a, b, beta, c, rs_c, cs_c, data ); } -extern void bli_cgemm_kernel_2x2(dim_t k, - scomplex* alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +extern +void bli_cgemm_kernel_2x2 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data + ); - -void bli_cgemm_opt_4x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_opt_4x4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - - bli_cgemm_kernel_2x2(k, alpha, a, b, beta, c, rs_c, cs_c, data); + bli_cgemm_kernel_2x2( k, alpha, a, b, beta, c, rs_c, cs_c, data ); } -extern void bli_zgemm_kernel_2x2(dim_t k, - dcomplex* alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); -void bli_zgemm_opt_4x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +extern +void bli_zgemm_kernel_2x2 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data + ); + +void bli_zgemm_opt_4x4 
+ ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - - bli_zgemm_kernel_2x2(k, alpha, a, b, beta, c, rs_c, cs_c, data); + bli_zgemm_kernel_2x2( k, alpha, a, b, beta, c, rs_c, cs_c, data ); } diff --git a/kernels/armv8a/neon/3/bli_gemm_opt_4x4.c b/kernels/armv8a/3/bli_gemm_opt_4x4.c similarity index 100% rename from kernels/armv8a/neon/3/bli_gemm_opt_4x4.c rename to kernels/armv8a/3/bli_gemm_opt_4x4.c diff --git a/kernels/bgq/1/bli_axpyv_opt_var1.c b/kernels/bgq/1/bli_axpyv_opt_var1.c index 9cfd5a4eb..b6131e5ee 100644 --- a/kernels/bgq/1/bli_axpyv_opt_var1.c +++ b/kernels/bgq/1/bli_axpyv_opt_var1.c @@ -34,13 +34,15 @@ #include "blis.h" -void bli_daxpyv_opt_var1( - conj_t conjx, - dim_t n, - double* restrict alpha, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy - ) +void bli_daxpyv_opt_var1 + ( + conj_t conjx, + dim_t n, + double* alpha, + double* x, inc_t incx, + double* y, inc_t incy, + cntx_t* cntx + ) { if ( bli_zero_dim1( n ) ) return; diff --git a/kernels/bgq/1/bli_dotv_opt_var1.c b/kernels/bgq/1/bli_dotv_opt_var1.c index edec60096..b54b1176f 100644 --- a/kernels/bgq/1/bli_dotv_opt_var1.c +++ b/kernels/bgq/1/bli_dotv_opt_var1.c @@ -34,14 +34,16 @@ #include "blis.h" -void bli_ddotv_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict rho - ) +void bli_ddotv_opt_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + double* x, inc_t incx, + double* y, inc_t incy, + double* rho, + cntx_t* cntx + ) { bool_t use_ref = FALSE; diff --git a/kernels/bgq/1f/bli_axpyf_opt_var1.c b/kernels/bgq/1f/bli_axpyf_opt_var1.c index 16b8dacbd..2af7d1e2f 100644 --- a/kernels/bgq/1f/bli_axpyf_opt_var1.c +++ b/kernels/bgq/1f/bli_axpyf_opt_var1.c @@ -35,41 +35,18 @@ #include "blis.h" - -void bli_saxpyf_opt_var1( 
- conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - float* restrict alpha, - float* restrict a, inc_t inca, inc_t lda, - float* restrict x, inc_t incx, - float* restrict y, inc_t incy - ) -{ - /* Just call the reference implementation. */ - BLIS_SAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); -} - - - -void bli_daxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy - ) +void bli_daxpyf_opt_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* x, inc_t incx, + double* y, inc_t incy, + cntx_t* cntx + ) { if ( bli_zero_dim2( m, b_n ) ) return; @@ -170,50 +147,3 @@ void bli_daxpyf_opt_var1( } - - -void bli_caxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - scomplex* restrict alpha, - scomplex* restrict a, inc_t inca, inc_t lda, - scomplex* restrict x, inc_t incx, - scomplex* restrict y, inc_t incy - ) -{ - /* Just call the reference implementation. */ - BLIS_CAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); -} - - -void bli_zaxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - dcomplex* restrict alpha, - dcomplex* restrict a, inc_t inca, inc_t lda, - dcomplex* restrict x, inc_t incx, - dcomplex* restrict y, inc_t incy - ) -{ - /* Just call the reference implementation. */ - BLIS_ZAXPYF_KERNEL_REF( conja, - conjx, - m, - b_n, - alpha, - a, inca, lda, - x, incx, - y, incy ); -} - diff --git a/kernels/bgq/3/bli_gemm_8x8.h b/kernels/bgq/3/bli_gemm_8x8.h deleted file mode 100644 index ea96fe876..000000000 --- a/kernels/bgq/3/bli_gemm_8x8.h +++ /dev/null @@ -1,69 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - -#include "blis.h" - - -/* -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* alpha, \ - ctype* a, \ - ctype* b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( gemm_8x8 ) -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* alpha, \ - ctype* a, \ - ctype* b, \ - ctype* beta, \ - ctype* c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ); - -INSERT_GENTPROT_BASIC( gemm_8x8_mt ) diff --git a/kernels/bgq/3/bli_gemm_8x8.c b/kernels/bgq/3/bli_gemm_int_8x8.c similarity index 94% rename from kernels/bgq/3/bli_gemm_8x8.c rename to kernels/bgq/3/bli_gemm_int_8x8.c index 7e7c9b9e7..363155738 100644 --- a/kernels/bgq/3/bli_gemm_8x8.c +++ b/kernels/bgq/3/bli_gemm_int_8x8.c @@ -54,16 +54,17 @@ * we could (maybe) theoretically hit 100% of peak with this instruction mix */ -void bli_dgemm_8x8( - dim_t k, - restrict double* alpha, - restrict double* a, - restrict double* b, - restrict double* beta, - restrict double* c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) - +void bli_dgemm_int_8x8 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //Registers for storing C. 
//4 4x4 subblocks of C, c00, c01, c10, c11 @@ -211,15 +212,17 @@ void printvec(vector4double v) printf("%4.3f\t%4.3f\t%4.3f\t%4.3f\n", a, b, c, d); } -void bli_zgemm_8x8( - dim_t k, - dcomplex* alpha_z, - dcomplex* a_z, - dcomplex* b_z, - dcomplex* beta_z, - dcomplex* c_z, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_int_8x8 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { double * alpha = (double*) alpha_z; double * beta = (double*) beta_z; diff --git a/kernels/c99/3/bli_gemm_ref_4x4.c b/kernels/c99/3/bli_gemm_c99_4x4.c similarity index 92% rename from kernels/c99/3/bli_gemm_ref_4x4.c rename to kernels/c99/3/bli_gemm_c99_4x4.c index 72d74e941..9fa3e30c3 100644 --- a/kernels/c99/3/bli_gemm_ref_4x4.c +++ b/kernels/c99/3/bli_gemm_c99_4x4.c @@ -38,15 +38,17 @@ #undef GENTFUNC #define GENTFUNC( ctype, ch, varname, kername ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict beta, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ ctype a0; \ ctype a1; \ @@ -210,5 +212,5 @@ void PASTEMAC(ch,varname)( \ PASTEMAC(ch,dots)( *alpha, ab33, *c33 ); \ } -INSERT_GENTFUNC_BASIC( gemm_ref_4x4, gemm_ref_4x4 ) +INSERT_GENTFUNC_BASIC( gemm_c99_4x4, gemm_c99_4x4 ) diff --git a/kernels/c99/3/bli_gemmtrsm_u_ref_4x4.c b/kernels/c99/3/bli_gemmtrsm_l_c99_4x4.c similarity index 65% rename from kernels/c99/3/bli_gemmtrsm_u_ref_4x4.c rename to kernels/c99/3/bli_gemmtrsm_l_c99_4x4.c index b4dde2940..f697bceab 100644 --- 
a/kernels/c99/3/bli_gemmtrsm_u_ref_4x4.c +++ b/kernels/c99/3/bli_gemmtrsm_l_c99_4x4.c @@ -36,37 +36,53 @@ #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmukr, trsmukr ) \ +#define GENTFUNC( ctype, ch, varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict aR, \ - ctype* restrict a, \ - ctype* restrict bB, \ - ctype* restrict b, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a10, \ + ctype* restrict a11, \ + ctype* restrict b01, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype* minus_one = PASTEMAC(ch,m1); \ -\ + const num_t dt = PASTEMAC(ch,type); \ const inc_t rs_b = 4; \ const inc_t cs_b = 1; \ \ - PASTEMAC(ch,gemmukr)( k, \ - minus_one, \ - aR, \ - bB, \ - alpha, \ - b, rs_b, cs_b, \ - data ); \ + ctype* minus_one = PASTEMAC(ch,m1); \ \ - PASTEMAC(ch,trsmukr)( a, \ - b, \ - c, rs_c, cs_c, \ - data ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, gemmkerid, cntx ); \ + PASTECH(ch,trsm_ukr_ft) \ + trsm_ukr = bli_cntx_get_l3_ukr_dt( dt, trsmkerid, cntx ); \ +\ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a10, \ + b01, \ + alpha, \ + b11, rs_b, cs_b, \ + data, \ + cntx \ + ); \ +\ + trsm_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ } -INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ref_4x4, GEMM_UKERNEL, TRSM_U_UKERNEL ) +INSERT_GENTFUNC_BASIC2( gemmtrsm_l_c99_4x4, BLIS_GEMM_UKR, BLIS_TRSM_L_UKR ) diff --git a/kernels/c99/3/bli_gemmtrsm_l_ref_4x4.c b/kernels/c99/3/bli_gemmtrsm_u_c99_4x4.c similarity index 65% rename from kernels/c99/3/bli_gemmtrsm_l_ref_4x4.c rename to kernels/c99/3/bli_gemmtrsm_u_c99_4x4.c index 2b2f6377a..f0f0a5ee0 100644 --- a/kernels/c99/3/bli_gemmtrsm_l_ref_4x4.c +++ b/kernels/c99/3/bli_gemmtrsm_u_c99_4x4.c 
@@ -36,37 +36,53 @@ #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, gemmukr, trsmukr ) \ +#define GENTFUNC( ctype, ch, varname, gemmkerid, trsmkerid ) \ \ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict aL, \ - ctype* restrict a, \ - ctype* restrict bT, \ - ctype* restrict b, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,varname) \ + ( \ + dim_t k, \ + ctype* restrict alpha, \ + ctype* restrict a12, \ + ctype* restrict a11, \ + ctype* restrict b21, \ + ctype* restrict b11, \ + ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ - ctype* minus_one = PASTEMAC(ch,m1); \ -\ + const num_t dt = PASTEMAC(ch,type); \ const inc_t rs_b = 4; \ const inc_t cs_b = 1; \ \ - PASTEMAC(ch,gemmukr)( k, \ - minus_one, \ - aL, \ - bT, \ - alpha, \ - b, rs_b, cs_b, \ - data ); \ + ctype* minus_one = PASTEMAC(ch,m1); \ \ - PASTEMAC(ch,trsmukr)( a, \ - b, \ - c, rs_c, cs_c, \ - data ); \ + PASTECH(ch,gemm_ukr_ft) \ + gemm_ukr = bli_cntx_get_l3_ukr_dt( dt, gemmkerid, cntx ); \ + PASTECH(ch,trsm_ukr_ft) \ + trsm_ukr = bli_cntx_get_l3_ukr_dt( dt, trsmkerid, cntx ); \ +\ + gemm_ukr \ + ( \ + k, \ + minus_one, \ + a12, \ + b21, \ + alpha, \ + b11, rs_b, cs_b, \ + data, \ + cntx \ + ); \ +\ + trsm_ukr \ + ( \ + a11, \ + b11, \ + c11, rs_c, cs_c, \ + data, \ + cntx \ + ); \ } -INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ref_4x4, GEMM_UKERNEL, TRSM_L_UKERNEL ) +INSERT_GENTFUNC_BASIC2( gemmtrsm_u_c99_4x4, BLIS_GEMM_UKR, BLIS_TRSM_U_UKR ) diff --git a/kernels/c99/3/bli_trsm_l_ref_4x4.c b/kernels/c99/3/bli_trsm_l_c99_4x4.c similarity index 93% rename from kernels/c99/3/bli_trsm_l_ref_4x4.c rename to kernels/c99/3/bli_trsm_l_c99_4x4.c index 9e270a450..ae706da42 100644 --- a/kernels/c99/3/bli_trsm_l_ref_4x4.c +++ b/kernels/c99/3/bli_trsm_l_c99_4x4.c @@ -36,14 +36,16 @@ #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, kername ) \ +#define GENTFUNC( 
ctype, ch, opname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ const dim_t rs_a = 1; \ const dim_t cs_a = 4; \ @@ -201,5 +203,5 @@ void PASTEMAC(ch,varname)( \ *(c + 3*rs_c + 3*cs_c) = b33; \ } -INSERT_GENTFUNC_BASIC( trsm_l_ref_4x4, trsm_l_ref_4x4 ) +INSERT_GENTFUNC_BASIC0( trsm_l_c99_4x4 ) diff --git a/kernels/c99/3/bli_trsm_u_ref_4x4.c b/kernels/c99/3/bli_trsm_u_c99_4x4.c similarity index 93% rename from kernels/c99/3/bli_trsm_u_ref_4x4.c rename to kernels/c99/3/bli_trsm_u_c99_4x4.c index 97fa1a4a4..7cd901005 100644 --- a/kernels/c99/3/bli_trsm_u_ref_4x4.c +++ b/kernels/c99/3/bli_trsm_u_c99_4x4.c @@ -36,14 +36,16 @@ #undef GENTFUNC -#define GENTFUNC( ctype, ch, varname, kername ) \ +#define GENTFUNC( ctype, ch, opname ) \ \ -void PASTEMAC(ch,varname)( \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - auxinfo_t* data \ - ) \ +void PASTEMAC(ch,opname) \ + ( \ + ctype* restrict a, \ + ctype* restrict b, \ + ctype* restrict c, inc_t rs_c, inc_t cs_c, \ + auxinfo_t* restrict data, \ + cntx_t* restrict cntx \ + ) \ { \ const dim_t rs_a = 1; \ const dim_t cs_a = 4; \ @@ -201,5 +203,5 @@ void PASTEMAC(ch,varname)( \ *(c + 0*rs_c + 3*cs_c) = b03; \ } -INSERT_GENTFUNC_BASIC( trsm_u_ref_4x4, trsm_u_ref_4x4 ) +INSERT_GENTFUNC_BASIC0( trsm_u_c99_4x4 ) diff --git a/kernels/loongson3a/3/bli_gemm_opt_d4x4.c b/kernels/loongson3a/3/bli_gemm_opt_d4x4.c index a7e4a825a..a7834c18c 100644 --- a/kernels/loongson3a/3/bli_gemm_opt_d4x4.c +++ b/kernels/loongson3a/3/bli_gemm_opt_d4x4.c @@ -34,35 +34,18 @@ #include "blis.h" -void bli_sgemm_opt_d4x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict 
beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - /* Just call the reference implementation. */ - BLIS_SGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} -void bli_dgemm_opt_d4x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_opt_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { dim_t k_iter = k / 4; dim_t k_left = k % 4; @@ -536,43 +519,3 @@ void bli_dgemm_opt_d4x4( } -void bli_cgemm_opt_d4x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - /* Just call the reference implementation. */ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} - -void bli_zgemm_opt_d4x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - /* Just call the reference implementation. 
*/ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} - diff --git a/kernels/mic/3/bli_dgemm_opt_30x8.c b/kernels/mic/3/bli_dgemm_opt_30x8.c index 485cdd926..151f56b9a 100644 --- a/kernels/mic/3/bli_dgemm_opt_30x8.c +++ b/kernels/mic/3/bli_dgemm_opt_30x8.c @@ -254,15 +254,17 @@ extern int offsets[16]; //#define MONITORS //#define LOOPMON -void bli_dgemm_opt_30x8( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_30x8 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { double * a_next = bli_auxinfo_next_a( data ); double * b_next = bli_auxinfo_next_b( data ); @@ -574,43 +576,3 @@ void bli_dgemm_opt_30x8( #endif } - - -void bli_cgemm_opt_30x8( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} - - -void bli_zgemm_opt_30x8( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) -{ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); -} diff --git a/kernels/mic/3/bli_sgemm_opt_30x16.c b/kernels/mic/3/bli_sgemm_opt_30x16.c index 90727bafa..f8eb972bc 100644 --- a/kernels/mic/3/bli_sgemm_opt_30x16.c +++ b/kernels/mic/3/bli_sgemm_opt_30x16.c @@ -254,15 +254,17 @@ int offsets[16] __attribute__((aligned(0x1000))) = {0, 1, 2, 3, 4, 5, 6, 7, 8, 9 //#define MONITORS //#define LOOPMON -void bli_sgemm_opt_30x16( - dim_t k, - float* restrict alpha, - float* restrict a, - 
float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_30x16 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { float * a_next = bli_auxinfo_next_a( data ); float * b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/nacl/pnacl/3/bli_gemm_opt.c b/kernels/nacl/pnacl/3/bli_gemm_opt.c index 23459d6ad..b05c2adad 100644 --- a/kernels/nacl/pnacl/3/bli_gemm_opt.c +++ b/kernels/nacl/pnacl/3/bli_gemm_opt.c @@ -61,16 +61,19 @@ return (v4sf) { 0.0f, 0.0f, 0.0f, 0.0f }; } - void bli_sgemm_opt( - dim_t k, - float alpha[restrict static 1], - float a[restrict static 8*k], - float b[restrict static k*4], - float beta[restrict static 1], - float c[restrict static 8*4], - inc_t rs_c, - inc_t cs_c, - auxinfo_t* data) + void bli_sgemm_opt + ( + dim_t k, + float alpha[restrict static 1], + float a[restrict static 8*k], + float b[restrict static k*4], + float beta[restrict static 1], + float c[restrict static 8*4], + inc_t rs_c, + inc_t cs_c, + auxinfo_t* data, + cntx_t* cntx + ) { // Vectors for accummulating column 0, 1, 2, 3 (initialize to 0.0) v4sf abv0t = v4sf_zero(), abv1t = v4sf_zero(), abv2t = v4sf_zero(), abv3t = v4sf_zero(); @@ -201,16 +204,19 @@ } } - void bli_cgemm_opt( - dim_t k, - scomplex alpha[restrict static 1], - scomplex a[restrict static 4*k], - scomplex b[restrict static k*4], - scomplex beta[restrict static 1], - scomplex c[restrict static 4*4], - inc_t rs_c, - inc_t cs_c, - auxinfo_t* data) + void bli_cgemm_opt + ( + dim_t k, + scomplex alpha[restrict static 1], + scomplex a[restrict static 4*k], + scomplex b[restrict static k*4], + scomplex beta[restrict static 1], + scomplex c[restrict static 4*4], + inc_t rs_c, + inc_t cs_c, + auxinfo_t* data, + cntx_t* cntx + ) { // Vectors for accummulating column 0, 1, 2, 3 (initialize to 
0.0) v4sf abv0r = v4sf_zero(), abv1r = v4sf_zero(), abv2r = v4sf_zero(), abv3r = v4sf_zero(); diff --git a/kernels/x86/1m/bli_packm_2xk.c b/kernels/old/x86/1m/bli_packm_2xk.c similarity index 100% rename from kernels/x86/1m/bli_packm_2xk.c rename to kernels/old/x86/1m/bli_packm_2xk.c diff --git a/kernels/x86/1m/bli_packm_2xk.h b/kernels/old/x86/1m/bli_packm_2xk.h similarity index 100% rename from kernels/x86/1m/bli_packm_2xk.h rename to kernels/old/x86/1m/bli_packm_2xk.h diff --git a/kernels/x86/1m/bli_packm_4xk.c b/kernels/old/x86/1m/bli_packm_4xk.c similarity index 100% rename from kernels/x86/1m/bli_packm_4xk.c rename to kernels/old/x86/1m/bli_packm_4xk.c diff --git a/kernels/x86/1m/bli_packm_4xk.h b/kernels/old/x86/1m/bli_packm_4xk.h similarity index 100% rename from kernels/x86/1m/bli_packm_4xk.h rename to kernels/old/x86/1m/bli_packm_4xk.h diff --git a/kernels/x86/3/bli_gemm_opt_d2x4.c b/kernels/old/x86/3/bli_gemm_opt_d2x4.c similarity index 100% rename from kernels/x86/3/bli_gemm_opt_d2x4.c rename to kernels/old/x86/3/bli_gemm_opt_d2x4.c diff --git a/kernels/x86/3/bli_gemm_opt_d4x2.c b/kernels/old/x86/3/bli_gemm_opt_d4x2.c similarity index 100% rename from kernels/x86/3/bli_gemm_opt_d4x2.c rename to kernels/old/x86/3/bli_gemm_opt_d4x2.c diff --git a/kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.c b/kernels/old/x86/3/bli_gemmtrsm_l_opt_d4x2.c similarity index 100% rename from kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.c rename to kernels/old/x86/3/bli_gemmtrsm_l_opt_d4x2.c diff --git a/kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.c b/kernels/old/x86/3/bli_gemmtrsm_u_opt_d4x2.c similarity index 100% rename from kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.c rename to kernels/old/x86/3/bli_gemmtrsm_u_opt_d4x2.c diff --git a/kernels/x86/3/bli_trsm_l_opt_d4x2.c b/kernels/old/x86/3/bli_trsm_l_opt_d4x2.c similarity index 100% rename from kernels/x86/3/bli_trsm_l_opt_d4x2.c rename to kernels/old/x86/3/bli_trsm_l_opt_d4x2.c diff --git a/kernels/power7/3/bli_gemm_opt_8x4.c 
b/kernels/power7/3/bli_gemm_opt_8x4.c index 56e7c82f1..456973023 100644 --- a/kernels/power7/3/bli_gemm_opt_8x4.c +++ b/kernels/power7/3/bli_gemm_opt_8x4.c @@ -48,15 +48,17 @@ * a is mr x k in packed col-maj format (leading dim is mr) * b is k x nr in packed row-maj format (leading dim is nr) */ -void bli_sgemm_opt_8x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_opt_8x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { #if 0 || defined(UTEST) const long MR = BLIS_DEFAULT_MR_S, NR = BLIS_DEFAULT_NR_S; @@ -86,15 +88,17 @@ void bli_sgemm_opt_8x4( * a is mr x k in packed col-maj format (leading dim is mr) * b is k x nr in packed row-maj format (leading dim is nr) */ -void bli_dgemm_opt_8x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_opt_8x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { #if 1 if (rs_c == 1) { @@ -459,15 +463,17 @@ void bli_dgemm_opt_8x4( * a is mr x k in packed col-maj format (leading dim is mr) * b is k x nr in packed row-maj format (leading dim is nr) */ -void bli_cgemm_opt_8x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_opt_8x4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + 
auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { #if 0 || defined(UTEST) const long MR = BLIS_DEFAULT_MR_C, NR = BLIS_DEFAULT_NR_C; @@ -508,15 +514,17 @@ void bli_cgemm_opt_8x4( * a is mr x k in packed col-maj format (leading dim is mr) * b is k x nr in packed row-maj format (leading dim is nr) */ -void bli_zgemm_opt_8x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_opt_8x4 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { #if 0 || defined(UTEST) const long MR = BLIS_DEFAULT_MR_Z, NR = BLIS_DEFAULT_NR_Z; diff --git a/kernels/power7/3/bli_gemm_opt_8x4.h b/kernels/power7/3/bli_gemm_opt_8x4.h index b9fdb3648..6514b2b6f 100644 --- a/kernels/power7/3/bli_gemm_opt_8x4.h +++ b/kernels/power7/3/bli_gemm_opt_8x4.h @@ -41,44 +41,52 @@ #include "blis.h" #endif -void bli_sgemm_opt_8x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +void bli_sgemm_opt_8x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ); -void bli_dgemm_opt_8x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +void bli_dgemm_opt_8x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ); -void bli_cgemm_opt_8x4( - dim_t k, -
scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +void bli_cgemm_opt_8x4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ); -void bli_zgemm_opt_8x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ); +void bli_zgemm_opt_8x4 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ); #endif diff --git a/kernels/x86/3/bli_gemm_opt_d2x4.h b/kernels/x86/3/bli_gemm_opt_d2x4.h deleted file mode 100644 index ffdf8567a..000000000 --- a/kernels/x86/3/bli_gemm_opt_d2x4.h +++ /dev/null @@ -1,49 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. 
- - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( gemm_opt_d2x4 ) - diff --git a/kernels/x86/3/bli_gemm_opt_d4x2.h b/kernels/x86/3/bli_gemm_opt_d4x2.h deleted file mode 100644 index f826e5492..000000000 --- a/kernels/x86/3/bli_gemm_opt_d4x2.h +++ /dev/null @@ -1,51 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a, \ - ctype* restrict b, \ - ctype* restrict beta, \ - ctype* restrict c, inc_t rs_c, inc_t cs_c, \ - ctype* restrict a_next, \ - ctype* restrict b_next \ - ); - -INSERT_GENTPROT_BASIC( gemm_opt_d4x2 ) - diff --git a/kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.h b/kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.h deleted file mode 100644 index 516f1ac18..000000000 --- a/kernels/x86/3/bli_gemmtrsm_l_opt_d4x2.h +++ /dev/null @@ -1,53 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. 
- - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. 
- -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a10, \ - ctype* restrict a11, \ - ctype* restrict bd01, \ - ctype* restrict bd11, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - ctype* restrict a_next, \ - ctype* restrict b_next \ - ); - -INSERT_GENTPROT_BASIC( gemmtrsm_l_opt_d4x2 ) - diff --git a/kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.h b/kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.h deleted file mode 100644 index 820361c4a..000000000 --- a/kernels/x86/3/bli_gemmtrsm_u_opt_d4x2.h +++ /dev/null @@ -1,53 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. - - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. 
IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - dim_t k, \ - ctype* restrict alpha, \ - ctype* restrict a12, \ - ctype* restrict a11, \ - ctype* restrict bd21, \ - ctype* restrict bd11, \ - ctype* restrict b11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c, \ - ctype* restrict a_next, \ - ctype* restrict b_next \ - ); - -INSERT_GENTPROT_BASIC( gemmtrsm_u_opt_d4x2 ) - diff --git a/kernels/x86/3/bli_trsm_l_opt_d4x2.h b/kernels/x86/3/bli_trsm_l_opt_d4x2.h deleted file mode 100644 index 3ff690a44..000000000 --- a/kernels/x86/3/bli_trsm_l_opt_d4x2.h +++ /dev/null @@ -1,47 +0,0 @@ -/* - - BLIS - An object-based framework for developing high-performance BLAS-like - libraries. - - Copyright (C) 2014, The University of Texas at Austin - - Redistribution and use in source and binary forms, with or without - modification, are permitted provided that the following conditions are - met: - - Redistributions of source code must retain the above copyright - notice, this list of conditions and the following disclaimer. - - Redistributions in binary form must reproduce the above copyright - notice, this list of conditions and the following disclaimer in the - documentation and/or other materials provided with the distribution. 
- - Neither the name of The University of Texas at Austin nor the names - of its contributors may be used to endorse or promote products - derived from this software without specific prior written permission. - - THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS - "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT - LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR - A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT - HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, - SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT - LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, - DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY - THEORY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT - (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE - OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. - -*/ - - -#undef GENTPROT -#define GENTPROT( ctype, ch, varname ) \ -\ -void PASTEMAC(ch,varname)( \ - ctype* restrict a11, \ - ctype* restrict b11, \ - ctype* restrict bd11, \ - ctype* restrict c11, inc_t rs_c, inc_t cs_c \ - ); - -INSERT_GENTPROT_BASIC( trsm_l_opt_d4x2 ) - diff --git a/kernels/x86_64/bulldozer/3/bli_gemm_4x6_FMA4.c b/kernels/x86_64/bulldozer/3/bli_gemm_asm_d4x6_fma4.c similarity index 98% rename from kernels/x86_64/bulldozer/3/bli_gemm_4x6_FMA4.c rename to kernels/x86_64/bulldozer/3/bli_gemm_asm_d4x6_fma4.c index e2a020ab1..dcfc41488 100644 --- a/kernels/x86_64/bulldozer/3/bli_gemm_4x6_FMA4.c +++ b/kernels/x86_64/bulldozer/3/bli_gemm_asm_d4x6_fma4.c @@ -85,15 +85,17 @@ "vmovss %%xmm3, (%%rdx,%%r13) \n\t"\ -void bli_sgemm_8x8_FMA4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_8x8_fma4 + ( + dim_t k, + float* restrict alpha, + 
float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { dim_t k_iter = k / 4; dim_t k_left = k % 4; @@ -851,15 +853,17 @@ void bli_sgemm_8x8_FMA4( "vmovaps 16 * 8(%%rbx), %%xmm3 \n\t"\ "addq $24*8, %%rbx \n\t" -void bli_dgemm_4x6_FMA4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_4x6_fma4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { dim_t k_iter = k / 12; dim_t k_left = k % 12; @@ -1057,15 +1061,17 @@ void bli_dgemm_4x6_FMA4( "vfmaddps %%ymm8, %%ymm1, %%ymm5, %%ymm8 \n\t"\ "vperm2f128 $0x3, %%ymm3, %%ymm3, %%ymm5 \n\t"\ -void bli_cgemm_8x4_FMA4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_asm_8x4_fma4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -1862,15 +1868,17 @@ void bli_cgemm_8x4_FMA4( "vaddsubpd %%ymm"j", %%ymm"i", %%ymm"i" \n\t"\ " \n\t" - void bli_zgemm_4x4_FMA4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_4x4_fma4 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t 
rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/x86_64/avx2/3/bli_gemm_asm_d12x4.c b/kernels/x86_64/haswell/3/bli_gemm_asm_d12x4.c similarity index 97% rename from kernels/x86_64/avx2/3/bli_gemm_asm_d12x4.c rename to kernels/x86_64/haswell/3/bli_gemm_asm_d12x4.c index 2f9ec6542..5bc2dd4ba 100644 --- a/kernels/x86_64/avx2/3/bli_gemm_asm_d12x4.c +++ b/kernels/x86_64/haswell/3/bli_gemm_asm_d12x4.c @@ -65,15 +65,17 @@ "vpermilps $0x39, %%xmm2, %%xmm1 \n\t" \ "vmovss %%xmm1, (%%rcx,%%r10 ) \n\t" -void bli_sgemm_asm_24x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_24x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -682,15 +684,17 @@ void bli_sgemm_asm_24x4( "vmovlpd %%xmm1, (%%rcx,%%r13,2) \n\t" \ "vmovhpd %%xmm1, (%%rcx,%%r10 ) \n\t"*/ -void bli_dgemm_asm_12x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_12x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -1277,15 +1281,17 @@ void bli_dgemm_asm_12x4( #if 0 -void bli_cgemm_asm_( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict 
beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_asm_ + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -1297,15 +1303,17 @@ void bli_cgemm_asm_( -void bli_zgemm_asm_( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_ + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/x86_64/avx2/3/bli_gemm_asm_d8x6.c b/kernels/x86_64/haswell/3/bli_gemm_asm_d8x6.c similarity index 97% rename from kernels/x86_64/avx2/3/bli_gemm_asm_d8x6.c rename to kernels/x86_64/haswell/3/bli_gemm_asm_d8x6.c index 47cc7ad68..0a49f8989 100644 --- a/kernels/x86_64/avx2/3/bli_gemm_asm_d8x6.c +++ b/kernels/x86_64/haswell/3/bli_gemm_asm_d8x6.c @@ -65,15 +65,17 @@ "vpermilps $0x39, %%xmm2, %%xmm1 \n\t" \ "vmovss %%xmm1, (%%rcx,%%r10 ) \n\t" -void bli_sgemm_asm_16x6( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_16x6 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -676,15 +678,17 
@@ void bli_sgemm_asm_16x6( "vmovlpd %%xmm1, (%%rcx,%%r13,2) \n\t" \ "vmovhpd %%xmm1, (%%rcx,%%r10 ) \n\t"*/ -void bli_dgemm_asm_8x6( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_8x6 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -1265,15 +1269,17 @@ void bli_dgemm_asm_8x6( #if 0 -void bli_cgemm_asm_( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_asm_ + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -1285,15 +1291,17 @@ void bli_cgemm_asm_( -void bli_zgemm_asm_( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_ + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/x86_64/core2-sse3/1/bli_axpyv_opt_var1.c b/kernels/x86_64/penryn/1/bli_axpyv_int_var1.c similarity index 93% rename from 
kernels/x86_64/core2-sse3/1/bli_axpyv_opt_var1.c rename to kernels/x86_64/penryn/1/bli_axpyv_int_var1.c index f1d3b39fd..f9a923bed 100644 --- a/kernels/x86_64/core2-sse3/1/bli_axpyv_opt_var1.c +++ b/kernels/x86_64/penryn/1/bli_axpyv_int_var1.c @@ -43,13 +43,15 @@ typedef union } v2df_t; -void bli_daxpyv_opt_var1( - conj_t conjx, - dim_t n, - double* restrict alpha, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy - ) +void bli_daxpyv_int_var1 + ( + conj_t conjx, + dim_t n, + double* alpha, + double* x, inc_t incx, + double* y, inc_t incy, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict x_cast = x; diff --git a/kernels/x86_64/core2-sse3/1/bli_dotv_opt_var1.c b/kernels/x86_64/penryn/1/bli_dotv_int_var1.c similarity index 91% rename from kernels/x86_64/core2-sse3/1/bli_dotv_opt_var1.c rename to kernels/x86_64/penryn/1/bli_dotv_int_var1.c index e3d4d05e9..53310273f 100644 --- a/kernels/x86_64/core2-sse3/1/bli_dotv_opt_var1.c +++ b/kernels/x86_64/penryn/1/bli_dotv_int_var1.c @@ -43,14 +43,16 @@ typedef union } v2df_t; -void bli_ddotv_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict rho - ) +void bli_ddotv_int_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + double* x, inc_t incx, + double* y, inc_t incy, + double* rho, + cntx_t* cntx + ) { double* restrict x_cast = x; double* restrict y_cast = y; diff --git a/kernels/x86_64/core2-sse3/1f/bli_axpy2v_opt_var1.c b/kernels/x86_64/penryn/1f/bli_axpy2v_opt_var1.c similarity index 93% rename from kernels/x86_64/core2-sse3/1f/bli_axpy2v_opt_var1.c rename to kernels/x86_64/penryn/1f/bli_axpy2v_opt_var1.c index 650f55dd8..3d88d3f26 100644 --- a/kernels/x86_64/core2-sse3/1f/bli_axpy2v_opt_var1.c +++ b/kernels/x86_64/penryn/1f/bli_axpy2v_opt_var1.c @@ -43,16 +43,18 @@ typedef union } v2df_t; -void bli_daxpy2v_opt_var1( - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict alpha, - 
double* restrict beta, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - double* restrict z, inc_t incz - ) +void bli_daxpy2v_int_var1 + ( + conj_t conjx, + conj_t conjy, + dim_t n, + double* alpha, + double* beta, + double* x, inc_t incx, + double* y, inc_t incy, + double* z, inc_t incz, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict beta_cast = beta; diff --git a/kernels/x86_64/core2-sse3/1f/bli_axpyf_opt_var1.c b/kernels/x86_64/penryn/1f/bli_axpyf_opt_var1.c similarity index 92% rename from kernels/x86_64/core2-sse3/1f/bli_axpyf_opt_var1.c rename to kernels/x86_64/penryn/1f/bli_axpyf_opt_var1.c index e57e29669..41bf5f1c4 100644 --- a/kernels/x86_64/core2-sse3/1f/bli_axpyf_opt_var1.c +++ b/kernels/x86_64/penryn/1f/bli_axpyf_opt_var1.c @@ -42,16 +42,19 @@ typedef union double d[2]; } v2df_t; -void bli_daxpyf_opt_var1( - conj_t conja, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy - ) + +void bli_daxpyf_int_var1 + ( + conj_t conja, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* x, inc_t incx, + double* y, inc_t incy, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict a_cast = a; diff --git a/kernels/x86_64/core2-sse3/1f/bli_dotaxpyv_opt_var1.c b/kernels/x86_64/penryn/1f/bli_dotaxpyv_opt_var1.c similarity index 89% rename from kernels/x86_64/core2-sse3/1f/bli_dotaxpyv_opt_var1.c rename to kernels/x86_64/penryn/1f/bli_dotaxpyv_opt_var1.c index 0cb390dea..f90f1ae83 100644 --- a/kernels/x86_64/core2-sse3/1f/bli_dotaxpyv_opt_var1.c +++ b/kernels/x86_64/penryn/1f/bli_dotaxpyv_opt_var1.c @@ -43,17 +43,19 @@ typedef union } v2df_t; -void bli_ddotaxpyv_opt_var1( - conj_t conjxt, - conj_t conjx, - conj_t conjy, - dim_t n, - double* restrict alpha, - double* restrict x, inc_t incx, - double* restrict y, inc_t incy, - 
double* restrict rho, - double* restrict z, inc_t incz - ) +void bli_ddotaxpyv_int_var1 + ( + conj_t conjxt, + conj_t conjx, + conj_t conjy, + dim_t n, + double* alpha, + double* x, inc_t incx, + double* y, inc_t incy, + double* rho, + double* z, inc_t incz, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict x_cast = x; diff --git a/kernels/x86_64/core2-sse3/1f/bli_dotxaxpyf_opt_var1.c b/kernels/x86_64/penryn/1f/bli_dotxaxpyf_opt_var1.c similarity index 92% rename from kernels/x86_64/core2-sse3/1f/bli_dotxaxpyf_opt_var1.c rename to kernels/x86_64/penryn/1f/bli_dotxaxpyf_opt_var1.c index 2f907a8f7..cec491bc9 100644 --- a/kernels/x86_64/core2-sse3/1f/bli_dotxaxpyf_opt_var1.c +++ b/kernels/x86_64/penryn/1f/bli_dotxaxpyf_opt_var1.c @@ -43,19 +43,23 @@ typedef union } v2df_t; -void bli_ddotxaxpyf_opt_var1( conj_t conjat, - conj_t conja, - conj_t conjw, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict w, inc_t incw, - double* restrict x, inc_t incx, - double* restrict beta, - double* restrict y, inc_t incy, - double* restrict z, inc_t incz ) +void bli_ddotxaxpyf_int_var1 + ( + conj_t conjat, + conj_t conja, + conj_t conjw, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* w, inc_t incw, + double* x, inc_t incx, + double* beta, + double* y, inc_t incy, + double* z, inc_t incz, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict beta_cast = beta; diff --git a/kernels/x86_64/core2-sse3/1f/bli_dotxf_opt_var1.c b/kernels/x86_64/penryn/1f/bli_dotxf_opt_var1.c similarity index 93% rename from kernels/x86_64/core2-sse3/1f/bli_dotxf_opt_var1.c rename to kernels/x86_64/penryn/1f/bli_dotxf_opt_var1.c index 1dab93b5c..d104d0b01 100644 --- a/kernels/x86_64/core2-sse3/1f/bli_dotxf_opt_var1.c +++ b/kernels/x86_64/penryn/1f/bli_dotxf_opt_var1.c @@ -43,17 +43,19 @@ typedef union } v2df_t; -void 
bli_ddotxf_opt_var1( - conj_t conjat, - conj_t conjx, - dim_t m, - dim_t b_n, - double* restrict alpha, - double* restrict a, inc_t inca, inc_t lda, - double* restrict x, inc_t incx, - double* restrict beta, - double* restrict y, inc_t incy - ) +void bli_ddotxf_int_var1 + ( + conj_t conjat, + conj_t conjx, + dim_t m, + dim_t b_n, + double* alpha, + double* a, inc_t inca, inc_t lda, + double* x, inc_t incx, + double* beta, + double* y, inc_t incy, + cntx_t* cntx + ) { double* restrict alpha_cast = alpha; double* restrict beta_cast = beta; diff --git a/kernels/x86_64/core2-sse3/1f/bli_axpyf_opt_var1.c.alt b/kernels/x86_64/penryn/1f/old/bli_axpyf_opt_var1.c.alt similarity index 100% rename from kernels/x86_64/core2-sse3/1f/bli_axpyf_opt_var1.c.alt rename to kernels/x86_64/penryn/1f/old/bli_axpyf_opt_var1.c.alt diff --git a/kernels/x86_64/core2-sse3/1f/bli_dotxf_opt_var1.c.alt b/kernels/x86_64/penryn/1f/old/bli_dotxf_opt_var1.c.alt similarity index 100% rename from kernels/x86_64/core2-sse3/1f/bli_dotxf_opt_var1.c.alt rename to kernels/x86_64/penryn/1f/old/bli_dotxf_opt_var1.c.alt diff --git a/kernels/x86_64/core2-sse3/3/bli_gemm_opt_d4x4.c b/kernels/x86_64/penryn/3/bli_gemm_asm_d4x4.c similarity index 97% rename from kernels/x86_64/core2-sse3/3/bli_gemm_opt_d4x4.c rename to kernels/x86_64/penryn/3/bli_gemm_asm_d4x4.c index b4dfb1ce9..5eb0a2f3c 100644 --- a/kernels/x86_64/core2-sse3/3/bli_gemm_opt_d4x4.c +++ b/kernels/x86_64/penryn/3/bli_gemm_asm_d4x4.c @@ -34,15 +34,17 @@ #include "blis.h" -void bli_sgemm_opt_8x4( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_8x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); void* b_next = 
bli_auxinfo_next_b( data ); @@ -834,15 +836,17 @@ void bli_sgemm_opt_8x4( ); } -void bli_dgemm_opt_4x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -1475,45 +1479,34 @@ void bli_dgemm_opt_4x4( ); } -void bli_cgemm_opt_4x2( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 + +void bli_cgemm_asm_4x2 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); } - - -void bli_zgemm_opt_2x2( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_2x2 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); } +#endif diff --git a/kernels/x86_64/core2-sse3/3/bli_gemmtrsm_l_opt_d4x4.c b/kernels/x86_64/penryn/3/bli_gemmtrsm_l_asm_d4x4.c similarity index 89% rename from kernels/x86_64/core2-sse3/3/bli_gemmtrsm_l_opt_d4x4.c rename to kernels/x86_64/penryn/3/bli_gemmtrsm_l_asm_d4x4.c index 74e5cca70..576f43400 100644 --- a/kernels/x86_64/core2-sse3/3/bli_gemmtrsm_l_opt_d4x4.c +++ b/kernels/x86_64/penryn/3/bli_gemmtrsm_l_asm_d4x4.c @@ -34,38 +34,35 @@ #include "blis.h" -void bli_sgemmtrsm_l_opt_8x4( - dim_t k, - float* restrict alpha, - float* restrict a10, - float* restrict a11, - float* restrict b01, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_sgemmtrsm_l_asm_8x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a10, + float* restrict a11, + float* restrict b01, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_SGEMMTRSM_L_UKERNEL_REF( k, - alpha, - a10, - a11, - b01, - b11, - c11, rs_c, cs_c, - data ); } +#endif -void bli_dgemmtrsm_l_opt_4x4( - dim_t k, - double* restrict alpha, - double* restrict a10, - double* restrict a11, - double* restrict b01, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemmtrsm_l_asm_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a10, + double* restrict a11, + double* restrict b01, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* b_next = bli_auxinfo_next_b( data ); @@ -539,47 +536,35 @@ void bli_dgemmtrsm_l_opt_4x4( } -void bli_cgemmtrsm_l_opt_4x2( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a10, - scomplex* restrict a11, - scomplex* restrict b01, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_cgemmtrsm_l_asm_4x2 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a10, + scomplex* restrict a11, + scomplex* restrict b01, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_CGEMMTRSM_L_UKERNEL_REF( k, - alpha, - a10, - a11, - b01, - b11, - c11, rs_c, cs_c, - data ); } -void bli_zgemmtrsm_l_opt_2x2( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a10, - dcomplex* restrict a11, - dcomplex* restrict b01, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemmtrsm_l_asm_2x2 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a10, + dcomplex* restrict a11, + dcomplex* restrict b01, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_ZGEMMTRSM_L_UKERNEL_REF( k, - alpha, - a10, - a11, - b01, - b11, - c11, rs_c, cs_c, - data ); } +#endif diff --git a/kernels/x86_64/core2-sse3/3/bli_gemmtrsm_u_opt_d4x4.c b/kernels/x86_64/penryn/3/bli_gemmtrsm_u_asm_d4x4.c similarity index 88% rename from kernels/x86_64/core2-sse3/3/bli_gemmtrsm_u_opt_d4x4.c rename to kernels/x86_64/penryn/3/bli_gemmtrsm_u_asm_d4x4.c index f73b6da0a..cf0c5a11d 100644 --- a/kernels/x86_64/core2-sse3/3/bli_gemmtrsm_u_opt_d4x4.c +++ b/kernels/x86_64/penryn/3/bli_gemmtrsm_u_asm_d4x4.c @@ -34,38 +34,35 @@ #include "blis.h" -void bli_sgemmtrsm_u_opt_8x4( - dim_t k, - float* restrict alpha, - float* restrict a12, - float* restrict a11, - float* restrict b21, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_sgemmtrsm_u_asm_8x4 + ( + dim_t k, + float* restrict alpha, + float* restrict a12, + float* restrict a11, + float* restrict b21, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_SGEMMTRSM_U_UKERNEL_REF( k, - alpha, - a12, - a11, - b21, - b11, - c11, rs_c, cs_c, - data ); } +#endif -void bli_dgemmtrsm_u_opt_4x4( - dim_t k, - double* restrict alpha, - double* restrict a12, - double* restrict a11, - double* restrict b21, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemmtrsm_u_asm_4x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a12, + double* restrict a11, + double* restrict b21, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* b_next = bli_auxinfo_next_b( data ); @@ -526,47 +523,35 @@ void bli_dgemmtrsm_u_opt_4x4( } -void bli_cgemmtrsm_u_opt_4x2( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a12, - scomplex* restrict a11, - scomplex* restrict b21, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_cgemmtrsm_u_asm_4x2 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a12, + scomplex* restrict a11, + scomplex* restrict b21, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_CGEMMTRSM_U_UKERNEL_REF( k, - alpha, - a12, - a11, - b21, - b11, - c11, rs_c, cs_c, - data ); } -void bli_zgemmtrsm_u_opt_2x2( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a12, - dcomplex* restrict a11, - dcomplex* restrict b21, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemmtrsm_u_asm_2x2 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a12, + dcomplex* restrict a11, + dcomplex* restrict b21, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_ZGEMMTRSM_U_UKERNEL_REF( k, - alpha, - a12, - a11, - b21, - b11, - c11, rs_c, cs_c, - data ); } +#endif diff --git a/kernels/x86_64/core2-sse3/3/bli_trsm_l_opt_d4x4.c b/kernels/x86_64/penryn/3/bli_trsm_l_asm_d4x4.c similarity index 86% rename from kernels/x86_64/core2-sse3/3/bli_trsm_l_opt_d4x4.c rename to kernels/x86_64/penryn/3/bli_trsm_l_asm_d4x4.c index 16b67541e..193f5457a 100644 --- a/kernels/x86_64/core2-sse3/3/bli_trsm_l_opt_d4x4.c +++ b/kernels/x86_64/penryn/3/bli_trsm_l_asm_d4x4.c @@ -34,26 +34,27 @@ #include "blis.h" -void bli_strsm_l_opt_8x4( - float* restrict a11, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_strsm_l_asm_8x4 + ( + float* restrict a11, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_STRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } +#endif -void bli_dtrsm_l_opt_4x4( - double* restrict a11, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dtrsm_l_asm_4x4 + ( + double* restrict a11, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { __asm__ volatile ( @@ -208,31 +209,26 @@ void bli_dtrsm_l_opt_4x4( } -void bli_ctrsm_l_opt_4x2( - scomplex* restrict a11, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_ctrsm_l_asm_4x2 + ( + scomplex* restrict a11, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_CTRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } -void bli_ztrsm_l_opt_2x2( - dcomplex* restrict a11, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ztrsm_l_asm_2x2 + ( + dcomplex* restrict a11, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_ZTRSM_L_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } - +#endif diff --git a/kernels/x86_64/core2-sse3/3/bli_trsm_u_opt_d4x4.c b/kernels/x86_64/penryn/3/bli_trsm_u_asm_d4x4.c similarity index 86% rename from kernels/x86_64/core2-sse3/3/bli_trsm_u_opt_d4x4.c rename to kernels/x86_64/penryn/3/bli_trsm_u_asm_d4x4.c index 0587241f9..5d8baf099 100644 --- a/kernels/x86_64/core2-sse3/3/bli_trsm_u_opt_d4x4.c +++ b/kernels/x86_64/penryn/3/bli_trsm_u_asm_d4x4.c @@ -34,26 +34,27 @@ #include "blis.h" -void bli_strsm_u_opt_8x4( - float* restrict a11, - float* restrict b11, - float* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_strsm_u_asm_8x4 + ( + float* restrict a11, + float* restrict b11, + float* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_STRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } +#endif -void bli_dtrsm_u_opt_4x4( - double* restrict a11, - double* restrict b11, - double* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dtrsm_u_asm_4x4 + ( + double* restrict a11, + double* restrict b11, + double* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { __asm__ volatile ( @@ -211,31 +212,27 @@ void bli_dtrsm_u_opt_4x4( } -void bli_ctrsm_u_opt_4x2( - scomplex* restrict a11, - scomplex* restrict b11, - scomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +#if 0 +void bli_ctrsm_u_asm_4x2 + ( + scomplex* restrict a11, + scomplex* restrict b11, + scomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. */ - BLIS_CTRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } -void bli_ztrsm_u_opt_2x2( - dcomplex* restrict a11, - dcomplex* restrict b11, - dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_ztrsm_u_asm_2x2 + ( + dcomplex* restrict a11, + dcomplex* restrict b11, + dcomplex* restrict c11, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { - /* Just call the reference implementation. 
*/ - BLIS_ZTRSM_U_UKERNEL_REF( a11, - b11, - c11, rs_c, cs_c, - data ); } +#endif diff --git a/kernels/x86_64/piledriver/3/bli_gemm_new_d8x3.c b/kernels/x86_64/piledriver/3/bli_gemm_asm_d8x3.c similarity index 98% rename from kernels/x86_64/piledriver/3/bli_gemm_new_d8x3.c rename to kernels/x86_64/piledriver/3/bli_gemm_asm_d8x3.c index abbc65ce1..a4e2b9c58 100644 --- a/kernels/x86_64/piledriver/3/bli_gemm_new_d8x3.c +++ b/kernels/x86_64/piledriver/3/bli_gemm_asm_d8x3.c @@ -37,15 +37,17 @@ #include "blis.h" -void bli_sgemm_new_16x3( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_16x3 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -902,15 +904,17 @@ void bli_sgemm_new_16x3( ); } -void bli_dgemm_new_8x3( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_8x3 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -1615,15 +1619,17 @@ void bli_dgemm_new_8x3( ); } -void bli_cgemm_new_4x2( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_asm_4x2 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict 
beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -2157,15 +2163,17 @@ void bli_cgemm_new_4x2( ); } -void bli_zgemm_new_2x2( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_2x2 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/x86_64/avx/3/bli_gemm_asm_d8x4.c b/kernels/x86_64/sandybridge/3/bli_gemm_asm_d8x4.c similarity index 99% rename from kernels/x86_64/avx/3/bli_gemm_asm_d8x4.c rename to kernels/x86_64/sandybridge/3/bli_gemm_asm_d8x4.c index c143a8a5a..5189403b8 100644 --- a/kernels/x86_64/avx/3/bli_gemm_asm_d8x4.c +++ b/kernels/x86_64/sandybridge/3/bli_gemm_asm_d8x4.c @@ -37,15 +37,17 @@ #include "blis.h" -void bli_sgemm_asm_8x8( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_asm_8x8 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); @@ -1035,15 +1037,17 @@ void bli_sgemm_asm_8x8( ); } -void bli_dgemm_asm_8x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_asm_8x4 
+ ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -1720,15 +1724,17 @@ void bli_dgemm_asm_8x4( ); } -void bli_cgemm_asm_8x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_asm_8x4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -2694,15 +2700,17 @@ void bli_cgemm_asm_8x4( -void bli_zgemm_asm_4x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_asm_4x4 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); //void* b_next = bli_auxinfo_next_b( data ); diff --git a/kernels/x86_64/avx/3/bli_gemm_int_d8x4.c b/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c similarity index 89% rename from kernels/x86_64/avx/3/bli_gemm_int_d8x4.c rename to kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c index f5c2885dd..72d5020a9 100644 --- a/kernels/x86_64/avx/3/bli_gemm_int_d8x4.c +++ b/kernels/x86_64/sandybridge/3/bli_gemm_int_d8x4.c @@ -36,38 +36,45 @@ #include - -void bli_sgemm_int_8x8( - dim_t k, - float* restrict alpha, - float* restrict a, - float* restrict 
b, - float* restrict beta, - float* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_sgemm_int_8x8 + ( + dim_t k, + float* restrict alpha, + float* restrict a, + float* restrict b, + float* restrict beta, + float* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_SGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); + BLIS_SGEMM_UKERNEL_REF + ( + k, + alpha, + a, + b, + beta, + c, rs_c, cs_c, + data, + cntx + ); } -void bli_dgemm_int_8x4( - dim_t k, - double* restrict alpha, - double* restrict a, - double* restrict b, - double* restrict beta, - double* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_dgemm_int_8x4 + ( + dim_t k, + double* restrict alpha, + double* restrict a, + double* restrict b, + double* restrict beta, + double* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { //void* a_next = bli_auxinfo_next_a( data ); void* b_next = bli_auxinfo_next_b( data ); @@ -631,45 +638,57 @@ void bli_dgemm_int_8x4( -void bli_cgemm_int_8x4( - dim_t k, - scomplex* restrict alpha, - scomplex* restrict a, - scomplex* restrict b, - scomplex* restrict beta, - scomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_cgemm_int_8x4 + ( + dim_t k, + scomplex* restrict alpha, + scomplex* restrict a, + scomplex* restrict b, + scomplex* restrict beta, + scomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. 
*/ - BLIS_CGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); + BLIS_CGEMM_UKERNEL_REF + ( + k, + alpha, + a, + b, + beta, + c, rs_c, cs_c, + data, + cntx + ); } -void bli_zgemm_int_4x4( - dim_t k, - dcomplex* restrict alpha, - dcomplex* restrict a, - dcomplex* restrict b, - dcomplex* restrict beta, - dcomplex* restrict c, inc_t rs_c, inc_t cs_c, - auxinfo_t* data - ) +void bli_zgemm_int_4x4 + ( + dim_t k, + dcomplex* restrict alpha, + dcomplex* restrict a, + dcomplex* restrict b, + dcomplex* restrict beta, + dcomplex* restrict c, inc_t rs_c, inc_t cs_c, + auxinfo_t* restrict data, + cntx_t* restrict cntx + ) { /* Just call the reference implementation. */ - BLIS_ZGEMM_UKERNEL_REF( k, - alpha, - a, - b, - beta, - c, rs_c, cs_c, - data ); + BLIS_ZGEMM_UKERNEL_REF + ( + k, + alpha, + a, + b, + beta, + c, rs_c, cs_c, + data, + cntx + ); } diff --git a/testsuite/input.operations b/testsuite/input.operations index 34ceb6922..87b1090b0 100644 --- a/testsuite/input.operations +++ b/testsuite/input.operations @@ -81,7 +81,7 @@ # --- Section overrides ---------------------------------------------------- 1 # Utility -1 # Level-1v +1 # Level-1v kernels 1 # Level-1m 1 # Level-1f kernels 1 # Level-2 diff --git a/testsuite/src/test_axpy2v.c b/testsuite/src/test_axpy2v.c index a622a1b6e..dafd36c59 100644 --- a/testsuite/src/test_axpy2v.c +++ b/testsuite/src/test_axpy2v.c @@ -64,7 +64,8 @@ void libblis_test_axpy2v_impl( iface_t iface, obj_t* alpha2, obj_t* x, obj_t* y, - obj_t* z ); + obj_t* z, + cntx_t* cntx ); void libblis_test_axpy2v_check( obj_t* alpha1, obj_t* alpha2, @@ -140,6 +141,10 @@ void libblis_test_axpy2v_experiment( test_params_t* params, obj_t alpha1, alpha2, x, y, z; obj_t z_save; + cntx_t cntx; + + // Initialize a context. + bli_axpy2v_cntx_init( &cntx ); // Map the dimension specifier to an actual dimension. 
m = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); @@ -187,7 +192,9 @@ void libblis_test_axpy2v_experiment( test_params_t* params, time = bli_clock(); - libblis_test_axpy2v_impl( iface, &alpha1, &alpha2, &x, &y, &z ); + libblis_test_axpy2v_impl( iface, + &alpha1, &alpha2, &x, &y, &z, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -207,6 +214,9 @@ void libblis_test_axpy2v_experiment( test_params_t* params, bli_obj_free( &y ); bli_obj_free( &z ); bli_obj_free( &z_save ); + + // Finalize the context. + bli_axpy2v_cntx_finalize( &cntx ); } @@ -216,12 +226,13 @@ void libblis_test_axpy2v_impl( iface_t iface, obj_t* alpha2, obj_t* x, obj_t* y, - obj_t* z ) + obj_t* z, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_FRONT_END: - bli_axpy2v_kernel( alpha1, alpha2, x, y, z ); + bli_axpy2v_ex( alpha1, alpha2, x, y, z, cntx ); break; default: diff --git a/testsuite/src/test_axpyf.c b/testsuite/src/test_axpyf.c index e85defc53..819d222f6 100644 --- a/testsuite/src/test_axpyf.c +++ b/testsuite/src/test_axpyf.c @@ -63,7 +63,8 @@ void libblis_test_axpyf_impl( iface_t iface, obj_t* alpha, obj_t* a, obj_t* x, - obj_t* y ); + obj_t* y, + cntx_t* cntx ); void libblis_test_axpyf_check( obj_t* alpha, obj_t* a, @@ -138,12 +139,16 @@ void libblis_test_axpyf_experiment( test_params_t* params, obj_t alpha, a, x, y; obj_t y_save; + cntx_t cntx; + + // Initialize a context. + bli_axpyf_cntx_init( &cntx ); // Map the dimension specifier to an actual dimension. m = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); // Query the operation's fusing factor for the current datatype. - b_n = bli_axpyf_fusefac( datatype ); + b_n = bli_cntx_get_blksz_def_dt( datatype, BLIS_AF, &cntx ); // Store the fusing factor so that the driver can retrieve the value // later when printing results. 
@@ -190,7 +195,9 @@ void libblis_test_axpyf_experiment( test_params_t* params, time = bli_clock(); - libblis_test_axpyf_impl( iface, &alpha, &a, &x, &y ); + libblis_test_axpyf_impl( iface, + &alpha, &a, &x, &y, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -210,6 +217,9 @@ void libblis_test_axpyf_experiment( test_params_t* params, bli_obj_free( &x ); bli_obj_free( &y ); bli_obj_free( &y_save ); + + // Finalize the context. + bli_axpyf_cntx_finalize( &cntx ); } @@ -218,12 +228,13 @@ void libblis_test_axpyf_impl( iface_t iface, obj_t* alpha, obj_t* a, obj_t* x, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_FRONT_END: - bli_axpyf_kernel( alpha, a, x, y ); + bli_axpyf_ex( alpha, a, x, y, cntx ); break; default: diff --git a/testsuite/src/test_dotaxpyv.c b/testsuite/src/test_dotaxpyv.c index 4fa0fbba9..fd59ee3d5 100644 --- a/testsuite/src/test_dotaxpyv.c +++ b/testsuite/src/test_dotaxpyv.c @@ -65,7 +65,8 @@ void libblis_test_dotaxpyv_impl( iface_t iface, obj_t* x, obj_t* y, obj_t* rho, - obj_t* z ); + obj_t* z, + cntx_t* cntx ); void libblis_test_dotaxpyv_check( obj_t* alpha, obj_t* xt, @@ -143,6 +144,10 @@ void libblis_test_dotaxpyv_experiment( test_params_t* params, obj_t alpha, xt, x, y, rho, z; obj_t z_save; + cntx_t cntx; + + // Initialize a context. + bli_dotaxpyv_cntx_init( &cntx ); // Map the dimension specifier to an actual dimension. m = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); @@ -207,7 +212,9 @@ void libblis_test_dotaxpyv_experiment( test_params_t* params, time = bli_clock(); - libblis_test_dotaxpyv_impl( iface, &alpha, &xt, &x, &y, &rho, &z ); + libblis_test_dotaxpyv_impl( iface, + &alpha, &xt, &x, &y, &rho, &z, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -227,6 +234,9 @@ void libblis_test_dotaxpyv_experiment( test_params_t* params, bli_obj_free( &y ); bli_obj_free( &z ); bli_obj_free( &z_save ); + + // Finalize the context. 
+ bli_dotaxpyv_cntx_finalize( &cntx ); } @@ -237,12 +247,13 @@ void libblis_test_dotaxpyv_impl( iface_t iface, obj_t* x, obj_t* y, obj_t* rho, - obj_t* z ) + obj_t* z, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_FRONT_END: - bli_dotaxpyv_kernel( alpha, xt, x, y, rho, z ); + bli_dotaxpyv_ex( alpha, xt, x, y, rho, z, cntx ); break; default: diff --git a/testsuite/src/test_dotxaxpyf.c b/testsuite/src/test_dotxaxpyf.c index b4361470a..3744344e7 100644 --- a/testsuite/src/test_dotxaxpyf.c +++ b/testsuite/src/test_dotxaxpyf.c @@ -67,7 +67,8 @@ void libblis_test_dotxaxpyf_impl( iface_t iface, obj_t* x, obj_t* beta, obj_t* y, - obj_t* z ); + obj_t* z, + cntx_t* cntx ); void libblis_test_dotxaxpyf_check( obj_t* alpha, obj_t* at, @@ -148,12 +149,16 @@ void libblis_test_dotxaxpyf_experiment( test_params_t* params, obj_t alpha, at, a, w, x, beta, y, z; obj_t y_save, z_save; + cntx_t cntx; + + // Initialize a context. + bli_dotxaxpyf_cntx_init( &cntx ); // Map the dimension specifier to an actual dimension. m = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); // Query the operation's fusing factor for the current datatype. - b_n = bli_dotxaxpyf_fusefac( datatype ); + b_n = bli_cntx_get_blksz_def_dt( datatype, BLIS_XF, &cntx ); // Store the fusing factor so that the driver can retrieve the value // later when printing results. @@ -219,7 +224,9 @@ void libblis_test_dotxaxpyf_experiment( test_params_t* params, time = bli_clock(); - libblis_test_dotxaxpyf_impl( iface, &alpha, &at, &a, &w, &x, &beta, &y, &z ); + libblis_test_dotxaxpyf_impl( iface, + &alpha, &at, &a, &w, &x, &beta, &y, &z, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -243,6 +250,9 @@ void libblis_test_dotxaxpyf_experiment( test_params_t* params, bli_obj_free( &z ); bli_obj_free( &y_save ); bli_obj_free( &z_save ); + + // Finalize the context. 
+ bli_dotxaxpyf_cntx_finalize( &cntx ); } @@ -255,12 +265,13 @@ void libblis_test_dotxaxpyf_impl( iface_t iface, obj_t* x, obj_t* beta, obj_t* y, - obj_t* z ) + obj_t* z, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_FRONT_END: - bli_dotxaxpyf_kernel( alpha, at, a, w, x, beta, y, z ); + bli_dotxaxpyf_ex( alpha, at, a, w, x, beta, y, z, cntx ); break; default: diff --git a/testsuite/src/test_dotxf.c b/testsuite/src/test_dotxf.c index d9a21c463..b917331a4 100644 --- a/testsuite/src/test_dotxf.c +++ b/testsuite/src/test_dotxf.c @@ -64,7 +64,8 @@ void libblis_test_dotxf_impl( iface_t iface, obj_t* a, obj_t* x, obj_t* beta, - obj_t* y ); + obj_t* y, + cntx_t* cntx ); void libblis_test_dotxf_check( obj_t* alpha, obj_t* a, @@ -140,12 +141,16 @@ void libblis_test_dotxf_experiment( test_params_t* params, obj_t alpha, a, x, beta, y; obj_t y_save; + cntx_t cntx; + + // Initialize a context. + bli_dotxf_cntx_init( &cntx ); // Map the dimension specifier to an actual dimension. m = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); // Query the operation's fusing factor for the current datatype. - b_n = bli_dotxf_fusefac( datatype ); + b_n = bli_cntx_get_blksz_def_dt( datatype, BLIS_DF, &cntx ); // Store the fusing factor so that the driver can retrieve the value // later when printing results. @@ -195,7 +200,9 @@ void libblis_test_dotxf_experiment( test_params_t* params, time = bli_clock(); - libblis_test_dotxf_impl( iface, &alpha, &a, &x, &beta, &y ); + libblis_test_dotxf_impl( iface, + &alpha, &a, &x, &beta, &y, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -215,6 +222,9 @@ void libblis_test_dotxf_experiment( test_params_t* params, bli_obj_free( &x ); bli_obj_free( &y ); bli_obj_free( &y_save ); + + // Finalize the context. 
+ bli_dotxf_cntx_finalize( &cntx ); } @@ -224,12 +234,13 @@ void libblis_test_dotxf_impl( iface_t iface, obj_t* a, obj_t* x, obj_t* beta, - obj_t* y ) + obj_t* y, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_FRONT_END: - bli_dotxf_kernel( alpha, a, x, beta, y ); + bli_dotxf_ex( alpha, a, x, beta, y, cntx ); break; default: diff --git a/testsuite/src/test_gemm_ukr.c b/testsuite/src/test_gemm_ukr.c index cc91563f0..1a077d40b 100644 --- a/testsuite/src/test_gemm_ukr.c +++ b/testsuite/src/test_gemm_ukr.c @@ -64,7 +64,8 @@ void libblis_test_gemm_ukr_impl( iface_t iface, obj_t* a, obj_t* b, obj_t* beta, - obj_t* c ); + obj_t* c, + cntx_t* cntx ); void libblis_test_gemm_ukr_check( obj_t* alpha, obj_t* a, @@ -119,10 +120,6 @@ void libblis_test_gemm_ukr( test_params_t* params, test_op_t* op ) } -// Import the register blocksizes used by the micro-kernel(s). -extern blksz_t* gemm_mr; -extern blksz_t* gemm_nr; -extern blksz_t* gemm_kr; void libblis_test_gemm_ukr_experiment( test_params_t* params, test_op_t* op, @@ -150,13 +147,17 @@ void libblis_test_gemm_ukr_experiment( test_params_t* params, obj_t ap, bp; obj_t c_save; + cntx_t cntx; + + // Initialize a context. + bli_gemm_cntx_init( &cntx ); // Map the dimension specifier to actual dimensions. k = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); // Fix m and n to MR and NR, respectively. - m = bli_blksz_get_def( datatype, gemm_mr ); - n = bli_blksz_get_def( datatype, gemm_nr ); + m = bli_cntx_get_blksz_def_dt( datatype, BLIS_MR, &cntx ); + n = bli_cntx_get_blksz_def_dt( datatype, BLIS_NR, &cntx ); // Store the register blocksizes so that the driver can retrieve the // values later when printing results. @@ -207,22 +208,24 @@ void libblis_test_gemm_ukr_experiment( test_params_t* params, bli_obj_init_pack( &bp ); // Create pack objects for a and b. 
- libblis_test_pobj_create( gemm_mr, - gemm_kr, + libblis_test_pobj_create( BLIS_MR, + BLIS_KR, BLIS_NO_INVERT_DIAG, BLIS_PACKED_ROW_PANELS, BLIS_BUFFER_FOR_A_BLOCK, - &a, &ap ); - libblis_test_pobj_create( gemm_kr, - gemm_nr, + &a, &ap, + &cntx ); + libblis_test_pobj_create( BLIS_KR, + BLIS_NR, BLIS_NO_INVERT_DIAG, BLIS_PACKED_COL_PANELS, BLIS_BUFFER_FOR_B_PANEL, - &b, &bp ); + &b, &bp, + &cntx ); // Pack the contents of a and b to ap and bp, respectively. - bli_packm_blk_var1( &a, &ap, &BLIS_PACKM_SINGLE_THREADED ); - bli_packm_blk_var1( &b, &bp, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &a, &ap, &cntx, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &b, &bp, &cntx, &BLIS_PACKM_SINGLE_THREADED ); // Repeat the experiment n_repeats times and record results. @@ -232,7 +235,9 @@ void libblis_test_gemm_ukr_experiment( test_params_t* params, time = bli_clock(); - libblis_test_gemm_ukr_impl( iface, &alpha, &ap, &bp, &beta, &c ); + libblis_test_gemm_ukr_impl( iface, + &alpha, &ap, &bp, &beta, &c, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -256,6 +261,9 @@ void libblis_test_gemm_ukr_experiment( test_params_t* params, bli_obj_free( &b ); bli_obj_free( &c ); bli_obj_free( &c_save ); + + // Finalize the context. 
+ bli_gemm_cntx_finalize( &cntx ); } @@ -265,12 +273,13 @@ void libblis_test_gemm_ukr_impl( iface_t iface, obj_t* a, obj_t* b, obj_t* beta, - obj_t* c ) + obj_t* c, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_UKERNEL: - bli_gemm_ukernel( alpha, a, b, beta, c ); + bli_gemm_ukernel( alpha, a, b, beta, c, cntx ); break; default: diff --git a/testsuite/src/test_gemmtrsm_ukr.c b/testsuite/src/test_gemmtrsm_ukr.c index 7b30a0284..0ca9acb0c 100644 --- a/testsuite/src/test_gemmtrsm_ukr.c +++ b/testsuite/src/test_gemmtrsm_ukr.c @@ -66,7 +66,8 @@ void libblis_test_gemmtrsm_ukr_impl( iface_t iface, obj_t* a11, obj_t* bx1, obj_t* b11, - obj_t* c11 ); + obj_t* c11, + cntx_t* cntx ); void libblis_test_gemmtrsm_ukr_check( side_t side, obj_t* alpha, @@ -168,13 +169,17 @@ void libblis_test_gemmtrsm_ukr_experiment( test_params_t* params, obj_t a1xp, a11p, bx1p, b11p; obj_t c11_save; + cntx_t cntx; + + // Initialize a context. + bli_trsm_cntx_init( &cntx ); // Map the dimension specifier to actual dimensions. k = libblis_test_get_dim_from_prob_size( op->dim_spec[0], p_cur ); // Fix m and n to MR and NR, respectively. - m = bli_blksz_get_def( datatype, gemm_mr ); - n = bli_blksz_get_def( datatype, gemm_nr ); + m = bli_cntx_get_blksz_def_dt( datatype, BLIS_MR, &cntx ); + n = bli_cntx_get_blksz_def_dt( datatype, BLIS_NR, &cntx ); // Store the register blocksizes so that the driver can retrieve the // values later when printing results. @@ -237,24 +242,26 @@ void libblis_test_gemmtrsm_ukr_experiment( test_params_t* params, bli_obj_init_pack( &bp ); // Create pack objects for a and b. 
- libblis_test_pobj_create( gemm_mr, - gemm_mr, + libblis_test_pobj_create( BLIS_MR, + BLIS_MR, BLIS_INVERT_DIAG, BLIS_PACKED_ROW_PANELS, BLIS_BUFFER_FOR_A_BLOCK, - &a, &ap ); - libblis_test_pobj_create( gemm_mr, - gemm_nr, + &a, &ap, + &cntx ); + libblis_test_pobj_create( BLIS_MR, + BLIS_NR, BLIS_NO_INVERT_DIAG, BLIS_PACKED_COL_PANELS, BLIS_BUFFER_FOR_B_PANEL, - &b, &bp ); + &b, &bp, + &cntx ); // Pack the contents of a to ap. - bli_packm_blk_var1( &a, &ap, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &a, &ap, &cntx, &BLIS_PACKM_SINGLE_THREADED ); // Pack the contents of b to bp. - bli_packm_blk_var1( &b, &bp, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &b, &bp, &cntx, &BLIS_PACKM_SINGLE_THREADED ); // Set the uplo field of ap since the default for packed objects is // BLIS_DENSE, and the _make_subparts() routine needs this information @@ -278,12 +285,13 @@ void libblis_test_gemmtrsm_ukr_experiment( test_params_t* params, bli_copym( &c11_save, &c11 ); // Re-pack the contents of b to bp. - bli_packm_blk_var1( &b, &bp, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &b, &bp, &cntx, &BLIS_PACKM_SINGLE_THREADED ); time = bli_clock(); libblis_test_gemmtrsm_ukr_impl( iface, side, &alpha, - &a1xp, &a11p, &bx1p, &b11p, &c11 ); + &a1xp, &a11p, &bx1p, &b11p, &c11, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -308,6 +316,9 @@ void libblis_test_gemmtrsm_ukr_experiment( test_params_t* params, bli_obj_free( &b ); bli_obj_free( &c11 ); bli_obj_free( &c11_save ); + + // Finalize the context. 
+ bli_trsm_cntx_finalize( &cntx ); } @@ -319,12 +330,13 @@ void libblis_test_gemmtrsm_ukr_impl( iface_t iface, obj_t* a11, obj_t* bx1, obj_t* b11, - obj_t* c11 ) + obj_t* c11, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_UKERNEL: - bli_gemmtrsm_ukernel( alpha, a1x, a11, bx1, b11, c11 ); + bli_gemmtrsm_ukernel( alpha, a1x, a11, bx1, b11, c11, cntx ); break; default: diff --git a/testsuite/src/test_gemv.c b/testsuite/src/test_gemv.c index b866a5ed6..cfc231d10 100644 --- a/testsuite/src/test_gemv.c +++ b/testsuite/src/test_gemv.c @@ -174,8 +174,8 @@ void libblis_test_gemv_experiment( test_params_t* params, } else { - bli_setsc( 0.0, 2.0, &alpha ); - bli_setsc( 0.0, -1.0, &beta ); + bli_setsc( 1.0, 2.0, &alpha ); + bli_setsc( 1.0, -1.0, &beta ); } // Initialize diagonal of matrix A. diff --git a/testsuite/src/test_libblis.c b/testsuite/src/test_libblis.c index 55a98bd96..d07eade90 100644 --- a/testsuite/src/test_libblis.c +++ b/testsuite/src/test_libblis.c @@ -570,10 +570,12 @@ void libblis_test_output_section_overrides( FILE* os, test_ops_t* ops ) void libblis_test_output_params_struct( FILE* os, test_params_t* params ) { - int i; + int i; //char int_type_size_str[8]; - gint_t int_type_size; - ind_t ind; + gint_t int_type_size; + ind_t im; + cntx_t cntx_s; + cntx_t* cntx = &cntx_s; // If bli_info_get_int_type_size() returns 32 or 64, the size is forced. // Otherwise, the size is chosen automatically. 
We query the result of @@ -601,14 +603,17 @@ void libblis_test_output_params_struct( FILE* os, test_params_t* params ) libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "integer type size (bits) %d\n", ( int )int_type_size ); libblis_test_fprintf_c( os, "\n" ); + libblis_test_fprintf_c( os, "SIMD number of registers %d\n", ( int )bli_info_get_simd_num_registers() ); + libblis_test_fprintf_c( os, "SIMD size (bytes) %d\n", ( int )bli_info_get_simd_size() ); libblis_test_fprintf_c( os, "SIMD alignment (bytes) %d\n", ( int )bli_info_get_simd_align_size() ); + libblis_test_fprintf_c( os, "Max stack buffer size (bytes) %d\n", ( int )bli_info_get_stack_buf_max_size() ); libblis_test_fprintf_c( os, "Page size (bytes) %d\n", ( int )bli_info_get_page_size() ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "memory alignment (bytes) \n" ); - libblis_test_fprintf_c( os, " stack address (def: simd) %d\n", ( int )bli_info_get_stack_buf_align_size() ); - libblis_test_fprintf_c( os, " obj_t address (def: simd) %d\n", ( int )bli_info_get_heap_addr_align_size() ); - libblis_test_fprintf_c( os, " obj_t stride (def: simd) %d\n", ( int )bli_info_get_heap_stride_align_size() ); - libblis_test_fprintf_c( os, " pool block addr (def: page) %d\n", ( int )bli_info_get_pool_addr_align_size() ); + libblis_test_fprintf_c( os, " stack address %d\n", ( int )bli_info_get_stack_buf_align_size() ); + libblis_test_fprintf_c( os, " obj_t address %d\n", ( int )bli_info_get_heap_addr_align_size() ); + libblis_test_fprintf_c( os, " obj_t stride %d\n", ( int )bli_info_get_heap_stride_align_size() ); + libblis_test_fprintf_c( os, " pool block addr %d\n", ( int )bli_info_get_pool_addr_align_size() ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "BLAS compatibility layer \n" ); libblis_test_fprintf_c( os, " enabled? 
%d\n", ( int )bli_info_get_enable_blas2blis() ); @@ -692,60 +697,63 @@ void libblis_test_output_params_struct( FILE* os, test_params_t* params ) bli_ind_oper_get_avail_impl_string( BLIS_GEMM, BLIS_SCOMPLEX ), bli_ind_oper_get_avail_impl_string( BLIS_GEMM, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, "\n" ); + + bli_gemmnat_cntx_init( cntx ); + libblis_test_fprintf_c( os, "level-3 blocksizes s d c z \n" ); libblis_test_fprintf_c( os, " mc %7d %7d %7d %7d\n", - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_MC, cntx ) ); libblis_test_fprintf_c( os, " kc %7d %7d %7d %7d\n", - ( int )bli_info_get_default_kc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_default_kc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_default_kc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_kc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_KC, cntx ) ); libblis_test_fprintf_c( os, " nc %7d %7d %7d %7d\n", - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_NC, cntx ), + ( int 
)bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_NC, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mc maximum %7d %7d %7d %7d\n", - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_FLOAT, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DOUBLE, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_MC, cntx ) ); libblis_test_fprintf_c( os, " kc maximum %7d %7d %7d %7d\n", - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_FLOAT, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DOUBLE, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_KC, cntx ) ); libblis_test_fprintf_c( os, " nc maximum %7d %7d %7d %7d\n", - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_FLOAT, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DOUBLE, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_NC, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mr %7d %7d %7d %7d\n", - ( int 
)bli_info_get_default_mr( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_default_mr( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_default_mr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_mr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_MR, cntx ) ); libblis_test_fprintf_c( os, " nr %7d %7d %7d %7d\n", - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_NR, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mr packdim %7d %7d %7d %7d\n", - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_FLOAT, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DOUBLE, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_MR, cntx ) ); libblis_test_fprintf_c( os, " nr packdim %7d %7d %7d %7d\n", - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_FLOAT ), - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_DOUBLE ), - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int 
)bli_cntx_get_blksz_max_dt( BLIS_FLOAT, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DOUBLE, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_NR, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "micro-kernel types s d c z\n" ); libblis_test_fprintf_c( os, " gemm %7s %7s %7s %7s\n", @@ -776,116 +784,124 @@ void libblis_test_output_params_struct( FILE* os, test_params_t* params ) libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "\n" ); + bli_gemmnat_cntx_finalize( cntx ); + libblis_test_fprintf_c( os, "--- BLIS induced implementation info ---\n" ); libblis_test_fprintf_c( os, "\n" ); - for ( ind = 0; ind < BLIS_NAT; ++ind ) + for ( im = 0; im < BLIS_NAT; ++im ) { - if ( params->ind_enable[ ind ] == 0 ) continue; + if ( params->ind_enable[ im ] == 0 ) continue; - bli_ind_oper_enable_only( BLIS_GEMM, ind, BLIS_SCOMPLEX ); - bli_ind_oper_enable_only( BLIS_GEMM, ind, BLIS_DCOMPLEX ); + bli_ind_oper_enable_only( BLIS_GEMM, im, BLIS_SCOMPLEX ); + bli_ind_oper_enable_only( BLIS_GEMM, im, BLIS_DCOMPLEX ); libblis_test_fprintf_c( os, " c z \n" ); libblis_test_fprintf_c( os, "complex implementation %7s %7s\n", bli_ind_oper_get_avail_impl_string( BLIS_GEMM, BLIS_SCOMPLEX ), bli_ind_oper_get_avail_impl_string( BLIS_GEMM, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, "\n" ); + + bli_gemmind_cntx_init( im, cntx ); + libblis_test_fprintf_c( os, "level-3 blocksizes c z \n" ); libblis_test_fprintf_c( os, " mc %7d %7d\n", - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_mc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_MC, cntx ) ); libblis_test_fprintf_c( os, " kc %7d %7d\n", - ( int )bli_info_get_default_kc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_kc( BLIS_GEMM, 
BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_KC, cntx ) ); libblis_test_fprintf_c( os, " nc %7d %7d\n", - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_nc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_NC, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mc maximum %7d %7d\n", - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_mc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_MC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_MC, cntx ) ); libblis_test_fprintf_c( os, " kc maximum %7d %7d\n", - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_kc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_KC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_KC, cntx ) ); libblis_test_fprintf_c( os, " nc maximum %7d %7d\n", - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_maximum_nc( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_NC, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_NC, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mr %7d %7d\n", - ( int )bli_info_get_default_mr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_mr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_MR, cntx ) ); libblis_test_fprintf_c( os, " nr %7d %7d\n", - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_default_nr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int
)bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_NR, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, " mr packdim %7d %7d\n", - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_packdim_mr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_MR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_MR, cntx ) ); libblis_test_fprintf_c( os, " nr packdim %7d %7d\n", - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_SCOMPLEX ), - ( int )bli_info_get_packdim_nr( BLIS_GEMM, BLIS_DCOMPLEX ) ); + ( int )bli_cntx_get_blksz_max_dt( BLIS_SCOMPLEX, BLIS_NR, cntx ), + ( int )bli_cntx_get_blksz_max_dt( BLIS_DCOMPLEX, BLIS_NR, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "micro-kernel types c z\n" ); libblis_test_fprintf_c( os, " gemm %7s %7s\n", - bli_info_get_gemm_ukr_impl_string( ind, BLIS_SCOMPLEX ), - bli_info_get_gemm_ukr_impl_string( ind, BLIS_DCOMPLEX ) ); + bli_info_get_gemm_ukr_impl_string( im, BLIS_SCOMPLEX ), + bli_info_get_gemm_ukr_impl_string( im, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, " gemmtrsm_l %7s %7s\n", - bli_info_get_gemmtrsm_l_ukr_impl_string( ind, BLIS_SCOMPLEX ), - bli_info_get_gemmtrsm_l_ukr_impl_string( ind, BLIS_DCOMPLEX ) ); + bli_info_get_gemmtrsm_l_ukr_impl_string( im, BLIS_SCOMPLEX ), + bli_info_get_gemmtrsm_l_ukr_impl_string( im, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, " gemmtrsm_u %7s %7s\n", - bli_info_get_gemmtrsm_u_ukr_impl_string( ind, BLIS_SCOMPLEX ), - bli_info_get_gemmtrsm_u_ukr_impl_string( ind, BLIS_DCOMPLEX ) ); + bli_info_get_gemmtrsm_u_ukr_impl_string( im, BLIS_SCOMPLEX ), + bli_info_get_gemmtrsm_u_ukr_impl_string( im, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, " trsm_l %7s %7s\n", - bli_info_get_trsm_l_ukr_impl_string( ind, BLIS_SCOMPLEX ), - bli_info_get_trsm_l_ukr_impl_string( ind, BLIS_DCOMPLEX ) ); + 
bli_info_get_trsm_l_ukr_impl_string( im, BLIS_SCOMPLEX ), + bli_info_get_trsm_l_ukr_impl_string( im, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, " trsm_u %7s %7s\n", - bli_info_get_trsm_u_ukr_impl_string( ind, BLIS_SCOMPLEX ), - bli_info_get_trsm_u_ukr_impl_string( ind, BLIS_DCOMPLEX ) ); + bli_info_get_trsm_u_ukr_impl_string( im, BLIS_SCOMPLEX ), + bli_info_get_trsm_u_ukr_impl_string( im, BLIS_DCOMPLEX ) ); libblis_test_fprintf_c( os, "\n" ); + + bli_gemmind_cntx_finalize( im, cntx ); } bli_ind_disable_all(); + // We use hemv's context because we know it is initialized with all of the fields + // we will be outputting. + bli_hemv_cntx_init( cntx ); + libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "--- BLIS misc. other info ---\n" ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "level-2 cache blocksizes s d c z \n" ); libblis_test_fprintf_c( os, " m dimension %7d %7d %7d %7d\n", - ( int )bli_info_get_default_l2_mc_s(), - ( int )bli_info_get_default_l2_mc_d(), - ( int )bli_info_get_default_l2_mc_c(), - ( int )bli_info_get_default_l2_mc_z() ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_M2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_M2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_M2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_M2, cntx ) ); libblis_test_fprintf_c( os, " n dimension %7d %7d %7d %7d\n", - ( int )bli_info_get_default_l2_nc_s(), - ( int )bli_info_get_default_l2_nc_d(), - ( int )bli_info_get_default_l2_nc_c(), - ( int )bli_info_get_default_l2_nc_z() ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_N2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_N2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_N2, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_N2, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "level-1f fusing factors s d c z \n" ); - libblis_test_fprintf_c( os, "
default %7d %7d %7d %7d\n", - ( int )bli_info_get_default_l1f_fuse_fac_s(), - ( int )bli_info_get_default_l1f_fuse_fac_d(), - ( int )bli_info_get_default_l1f_fuse_fac_c(), - ( int )bli_info_get_default_l1f_fuse_fac_z() ); libblis_test_fprintf_c( os, " axpyf %7d %7d %7d %7d\n", - ( int )bli_info_get_axpyf_fuse_fac_s(), - ( int )bli_info_get_axpyf_fuse_fac_d(), - ( int )bli_info_get_axpyf_fuse_fac_c(), - ( int )bli_info_get_axpyf_fuse_fac_z() ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_AF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_AF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_AF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_AF, cntx ) ); libblis_test_fprintf_c( os, " dotxf %7d %7d %7d %7d\n", - ( int )bli_info_get_dotxf_fuse_fac_s(), - ( int )bli_info_get_dotxf_fuse_fac_d(), - ( int )bli_info_get_dotxf_fuse_fac_c(), - ( int )bli_info_get_dotxf_fuse_fac_z() ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_DF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_DF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_DF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_DF, cntx ) ); libblis_test_fprintf_c( os, " dotxaxpyf %7d %7d %7d %7d\n", - ( int )bli_info_get_dotxaxpyf_fuse_fac_s(), - ( int )bli_info_get_dotxaxpyf_fuse_fac_d(), - ( int )bli_info_get_dotxaxpyf_fuse_fac_c(), - ( int )bli_info_get_dotxaxpyf_fuse_fac_z() ); + ( int )bli_cntx_get_blksz_def_dt( BLIS_FLOAT, BLIS_XF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DOUBLE, BLIS_XF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_SCOMPLEX, BLIS_XF, cntx ), + ( int )bli_cntx_get_blksz_def_dt( BLIS_DCOMPLEX, BLIS_XF, cntx ) ); libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf( os, "\n" ); + bli_hemv_cntx_finalize( cntx ); + // Output the contents of the param struct. 
libblis_test_fprintf_c( os, "\n" ); libblis_test_fprintf_c( os, "--- BLIS test suite parameters ----------------------------\n" ); @@ -1809,7 +1825,7 @@ void libblis_test_mobj_create( test_params_t* params, num_t dt, trans_t trans, c -void libblis_test_pobj_create( blksz_t* m, blksz_t* n, invdiag_t inv_diag, pack_t pack_schema, packbuf_t pack_buf, obj_t* a, obj_t* p ) +void libblis_test_pobj_create( bszid_t bmult_id_m, bszid_t bmult_id_n, invdiag_t inv_diag, pack_t pack_schema, packbuf_t pack_buf, obj_t* a, obj_t* p, cntx_t* cntx ) { // Start with making p and alias to a. bli_obj_alias_to( *a, *p ); @@ -1820,9 +1836,11 @@ void libblis_test_pobj_create( blksz_t* m, blksz_t* n, invdiag_t inv_diag, pack_ BLIS_PACK_FWD_IF_UPPER, BLIS_PACK_FWD_IF_LOWER, pack_buf, - m, n, + bmult_id_m, + bmult_id_n, a, - p ); + p, + cntx ); } diff --git a/testsuite/src/test_libblis.h b/testsuite/src/test_libblis.h index 8bdbd37be..0725dd400 100644 --- a/testsuite/src/test_libblis.h +++ b/testsuite/src/test_libblis.h @@ -369,7 +369,7 @@ void fill_string_with_n_spaces( char* str, unsigned int n_spaces ); // --- Create object --- void libblis_test_mobj_create( test_params_t* params, num_t dt, trans_t trans, char storage, dim_t m, dim_t n, obj_t* a ); -void libblis_test_pobj_create( blksz_t* m, blksz_t* n, invdiag_t inv_diag, pack_t pack_schema, packbuf_t pack_buf, obj_t* a, obj_t* p ); +void libblis_test_pobj_create( bszid_t bmult_id_m, bszid_t bmult_id_n, invdiag_t inv_diag, pack_t pack_schema, packbuf_t pack_buf, obj_t* a, obj_t* p, cntx_t* cntx ); void libblis_test_vobj_create( test_params_t* params, num_t dt, char storage, dim_t m, obj_t* x ); // --- Global string initialization --- diff --git a/testsuite/src/test_trsm_ukr.c b/testsuite/src/test_trsm_ukr.c index 5ed3e8940..81aedbf57 100644 --- a/testsuite/src/test_trsm_ukr.c +++ b/testsuite/src/test_trsm_ukr.c @@ -63,7 +63,8 @@ void libblis_test_trsm_ukr_impl( iface_t iface, side_t side, obj_t* a, obj_t* b, - obj_t* c ); + obj_t* c, + 
cntx_t* cntx ); void libblis_test_trsm_ukr_check( side_t side, obj_t* a, @@ -151,10 +152,14 @@ void libblis_test_trsm_ukr_experiment( test_params_t* params, obj_t ap, bp; obj_t c_save; + cntx_t cntx; + + // Initialize a context. + bli_trsm_cntx_init( &cntx ); // Fix m and n to MR and NR, respectively. - m = bli_blksz_get_def( datatype, gemm_mr ); - n = bli_blksz_get_def( datatype, gemm_nr ); + m = bli_cntx_get_blksz_def_dt( datatype, BLIS_MR, &cntx ); + n = bli_cntx_get_blksz_def_dt( datatype, BLIS_NR, &cntx ); // Store the register blocksizes so that the driver can retrieve the // values later when printing results. @@ -203,18 +208,20 @@ void libblis_test_trsm_ukr_experiment( test_params_t* params, bli_obj_init_pack( &bp ); // Create pack objects for a and b. - libblis_test_pobj_create( gemm_mr, - gemm_mr, + libblis_test_pobj_create( BLIS_MR, + BLIS_MR, BLIS_INVERT_DIAG, BLIS_PACKED_ROW_PANELS, BLIS_BUFFER_FOR_A_BLOCK, - &a, &ap ); - libblis_test_pobj_create( gemm_mr, - gemm_nr, + &a, &ap, + &cntx ); + libblis_test_pobj_create( BLIS_MR, + BLIS_NR, BLIS_NO_INVERT_DIAG, BLIS_PACKED_COL_PANELS, BLIS_BUFFER_FOR_B_PANEL, - &b, &bp ); + &b, &bp, + &cntx ); // Set the uplo field of ap since the default for packed objects is // BLIS_DENSE, and the _ukernel() wrapper needs this information to @@ -222,20 +229,22 @@ void libblis_test_trsm_ukr_experiment( test_params_t* params, bli_obj_set_uplo( uploa, ap ); // Pack the contents of a to ap. - bli_packm_blk_var1( &a, &ap, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &a, &ap, &cntx, &BLIS_PACKM_SINGLE_THREADED ); // Repeat the experiment n_repeats times and record results. for ( i = 0; i < n_repeats; ++i ) { // Re-pack the contents of b to bp. 
- bli_packm_blk_var1( &b, &bp, &BLIS_PACKM_SINGLE_THREADED ); + bli_packm_blk_var1( &b, &bp, &cntx, &BLIS_PACKM_SINGLE_THREADED ); bli_copym( &c_save, &c ); time = bli_clock(); - libblis_test_trsm_ukr_impl( iface, side, &ap, &bp, &c ); + libblis_test_trsm_ukr_impl( iface, side, + &ap, &bp, &c, + &cntx ); time_min = bli_clock_min_diff( time_min, time ); } @@ -259,6 +268,9 @@ void libblis_test_trsm_ukr_experiment( test_params_t* params, bli_obj_free( &b ); bli_obj_free( &c ); bli_obj_free( &c_save ); + + // Finalize the context. + bli_trsm_cntx_finalize( &cntx ); } @@ -267,12 +279,13 @@ void libblis_test_trsm_ukr_impl( iface_t iface, side_t side, obj_t* a, obj_t* b, - obj_t* c ) + obj_t* c, + cntx_t* cntx ) { switch ( iface ) { case BLIS_TEST_SEQ_UKERNEL: - bli_trsm_ukernel( a, b, c ); + bli_trsm_ukernel( a, b, c, cntx ); break; default: