amd/blis - blis - Public git mirror

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 01:59:59 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	f1a6b7d028	Reorganized code for induced complex methods. Details: - Consolidated most of the code relating to induced complex methods (e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods are now enabled on a per-operation basis. The current "available" (enabled and implemented) implementation can then be queried on an operation basis. Micro-kernel func_t objects as well as blksz_t objects can also be queried in a similar maner. - Redefined several micro-kernel and operation-related functions in bli_info_() API, in accordance with above changes. - Added mr and nr fields to blksz_t object, which point to the mr and nr blksz_t objects for each cache blocksize (and are NULL for register blocksizes). Renamed the sub-blocksize field "sub" to "mult" since it is really expressing a blocksize multiple. - Updated bli__determine_kc_[fb]() for gemm/hemm/symm, trmm, and trsm to correctly query mr and nr (for purposes of nudging kc). - Introduced an enumerated opid_t in bli_type_defs.h that uniquely identifies an operation. For now, only level-3 id values are defined, along with a generic, catch-all BLIS_NOID value. - Reworked testsuite so that all induced methods that are enabled are tested (one at a time) rather than only testing the first available method. - Reformated summary at the beginning of testsuite output so that blocksize and micro-kernel info is shown for each induced method that was requested (as well as native execution). - Reduced the number of columns needed to display non-matlab testsuite output (from approx. 90 to 80).	2015-03-18 15:37:10 -05:00
Field G. Van Zee	c0acca0f51	Clarified comments in testsuite input.operations.	2015-03-03 10:56:22 -06:00
Field G. Van Zee	45692e3ad4	Reverted some accidental changes. Details: - Reverted some changes that were unintentionally included in the previous commit (`9526ce98`). Thanks to Tony Kelman for pointing this out. (Note: a few select changes were not reverted.)	2014-08-07 13:21:15 -05:00
Field G. Van Zee	9526ce9881	Updated copyright headers of emscripten configuration files.	2014-08-06 14:15:34 -05:00
Field G. Van Zee	970b431416	Minor bugfixes to BLAS compatibility layer. Details: - Changed bla_amax.c so that i?amax() routines now correctly return 0 if ( n < 1 \|\| incx <= 0 ). - Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of f2c's abs() macro for float and double cases. - Thanks to Murtaza Ali for suggesting the two fixes above. - Updated label of fnormv to normfv in testsuite/input.operations.	2014-07-10 09:30:00 -05:00
Field G. Van Zee	fde5f1fdec	Added extensive support for configuration defaults. Details: - Standard names for reference kernels (levels-1v, -1f and 3) are now macro constants. Examples: BLIS_SAXPYV_KERNEL_REF BLIS_DDOTXF_KERNEL_REF BLIS_ZGEMM_UKERNEL_REF - Developers no longer have to name all datatype instances of a kernel with a common base name; [sdcz] datatype flavors of each kernel or micro-kernel (level-1v, -1f, or 3) may now be named independently. This means you can now, if you wish, encode the datatype-specific register blocksizes in the name of the micro-kernel functions. - Any datatype instances of any kernel (1v, 1f, or 3) that is left undefined in bli_kernel.h will default to the corresponding reference implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined, it will be defined to be BLIS_DGEMM_UKERNEL_REF. - Developers no longer need to name level-1v/-1f kernels with multiple datatype chars to match the number of types the kernel WOULD take in a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is sufficient, as in bli_daxpyv_opt(). - There is no longer a need to define an obj_t wrapper to go along with your level-1v/-1f kernels. The framework now prvides a _kernel() function which serves as the obj_t wrapper for whatever kernels are specified (or defaulted to) via bli_kernel.h - Developers no longer need to prototype their kernels, and thus no longer need to include any prototyping headers from within bli_kernel.h. The framework now generates kernel prototypes, with the proper type signature, based on the kernel names defined (or defaulted to) via bli_kernel.h. - If the complex datatype x (of [cz]) implementation of the gemm micro- kernel is left undefined by bli_kernel.h, but its same-precision real domain equivalent IS defined, BLIS will use a 4m-based implementation for the datatype x implementations of all level-3 operations, using only the real gemm micro-kernel.	2014-02-25 13:34:56 -06:00
Field G. Van Zee	d37c2cff62	Minor comment and Makefile changes. Details: - Added missing 'check-config' and 'check-make-defs' targets to testsuite/Makefile. - Removed unused 'test' target from top-level Makefile. - Comment changes to testsuite input files.	2013-11-13 10:47:11 -06:00
Field G. Van Zee	68a5910974	Added comments to testsuite/input.operations. Details: - Added extensive comments to the top of testsuite/input.operations, which describe how to edit the file. - Removed input.operations.0 and input.operations.1. - Changed input.general to test all datatypes ("sdcz") by default.	2013-11-07 11:36:11 -06:00
Field G. Van Zee	a091a219bd	Minor fixes to piledriver configuration, ukernel. Details: - Applied a patch from Tyler that fixes minor staleness in the piledriver configuration and gemm micro-kernel. - Very minor changes to test suite input files.	2013-10-14 10:11:29 -05:00
Field G. Van Zee	be4833bd91	Added test suite modules for level-1f, 3 kernels. Details: - Added test modules in test suite for level-1f kernels and level-3 micro-kernels. (Duplication in the micro-kernels, for now, is NOT supported by these test modules.) - Added section override switches to test suite's input.operations file. - Added obj_t APIs for level-1f front-ends and their unblocked variants to facilitate the level-1f test modules. Also added front-end for dupl operation. - Added obj_t-based check routines for level-1f operations, which are called from the new front-ends mentioned above. - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing factors as a function of datatype, which is needed by their respective test modules. - Whitespace changes to bli_kernel.h of all existing configurations.	2013-10-10 14:20:06 -05:00
Field G. Van Zee	73aa1e9f31	Added section overrides to test suite. Details: - Added new lines of input to the test suite's input.operations file, which allows the user to disable entire sections (levels) of tests. Before this change, the user had to manually disable each operation tests's "master switch". (This is why input.operations.0 existed: to allow a more convenient starting point for someone who only wanted to test one or a few operations.)	2013-10-01 17:01:18 -05:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modifed test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather that directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	9013ad6ff2	Switched integer typedefs (again) to C types. Details: - Redefined gint_t and guint_t in terms of the standard C types long int and unsigned long int, respectively. - Changed testsuite default max problem size to 500. - Changed testsuite input.operations to use square problems for level-3 operation tests.	2013-09-04 13:36:07 -05:00
Field G. Van Zee	9ee6e12537	Changed dimension spec for gemm in testsuite. Details: - Encounted a bizarre typecasting bug whereby the test suite was not computing the proper dimension from the problem size and dimension specification when the latter was set to -3. Will investigate. Thanks to Fran for finding this "bug".	2013-09-03 21:53:27 -05:00
Field G. Van Zee	e8be081e68	Generalized matlab and file output in testsuite. Details: - Added a new option in input.general that allows outputting in matlab/octave format so that one can output in matlab format independently from outputting to files. - Adjusted input.operations according to above. - Added input.operations.0 and input.operations.1 with all options disabled and enabled, respectively.	2013-08-28 15:52:34 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	768fcebaa8	Added unified test suite, and many fixes. Details: - Added a highly configurable, unified test suite. - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel header files. Now, instead, DUPB is computed as (NDUP != 1) within each macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into incorrectly when DUPB was set to FALSE but the NDUP was still non-unit. By encoding both pieces of information into one constant in _kernel.h, it seems somewhat less likely others will encounter this bug in the future. - Added level-2 cache blocksizes to _kernel.h for reference configuration, and defined blocksizes in _cntl.c files to these default values. - Changed semantics of her2k and syr2k such that these operations no longer expect the B matrix to already be conjugate-transposed (or just transposed for syr2k). However, these semantics are preserved for the internal mechanics of the implementations, including the internal back-end and all blocked variants. - Inserted checks for real-valued alpha and beta for herk/her2k and herk, respectively. - Relaxed general object structure constraints in _basic_check() for gemv, ger. - Changed her front-end to NOT copy-cast to real projection; instead, this is replaced by selecting either the real part or both parts within the unblocked algorithm implementation, depending on the value of conjh. - Added conjh to all _check routines for her so that the code knows when to verify that alpha has an imaginary component equal to zero (for her, but not syr). - Changed control tree for her to forgo packing. - Added unit diagonal support to fnormm. - Redefined real versions of abval2s macros in terms of fabs(), fabsf(). - Redefined complex versions of sqrt2s macros using the actual "complex square root" formula. - Created new level-0 object-based routines, suffixed with "sc" (for "scalar"). - Defined new level-1v, -1d, and -1m versions of add and sub operations (two-operand add and subtract). - Added new scalar macros: - getris: acquire real and imaginary components. - setris: set real and imaginary components. - addjs: addition with conjugated x. - subjs: subtraction with conjugated x. - Defined new utility operations: - absumv: element-wise sum of absolute values for vector elements. - absumm: element-wise sum of absolute values for matrix elements. - mkherm: convert existing matrix to Hermitian. - mksymm: convert existing matrix to symmetric. - mktrim: convert existing matrix to triangular. - Added various error checking routines. - Added bl2_clock_min_diff(), which is used to more cleanly measure the wall clock time of a code block. - Added general stride support to bl2_obj_alloc_buffer(). - Added bl2_obj_init_scalar(). - Updated parameter mapping in bl2_param_map.c. - Added support for queriable version string. - Fixed a bug in the her2k macro-kernels (which currently are simply implemented in terms of two invocations of herk) whereby beta was being applied to both the first and second rank-k updates, rather than only the first. - Fixed a bug in trmm/trsm whereby transpose and right side cases were not properly implemented due to erroneous assumptions regarding aliasing and root objects. - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong MR x NR block of B was being updated. - Fixed a bug in the inverts macro in the double real case whereby the value was typecast to float before inversion. This affected non-unit cases of dtrsm. - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one constant was being applied incorrectly. - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code now mimics the rank-k strategy of gemm, whereby alpah is applied during the first iteration of variant 3, with BLIS_ONE passed in instead for subsequent iterations. This also required passing alpha into the macro- kernels as well as the fused gemmtrsm micro-kernels. - Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being called for blocks strictly above the diagonal. While this sounds good in theory, this cannot be done because gemm_ker_var2 expects row panels of A to be packed from top to bottom, while for trsm_u, A is actually packed from bottom to top due to the reverse (BR->TL) nature of the algorithm. - Fixed a bug in packm_cxk() whereby panel packings with unit panel dimensions were mishandled due to incorrect arguments to the copyv kernel. Also changed the copyv kernel invocation to scal2v so that these edge cases are properly handled when scaling is requested. - Fixed a bug in packv_int() whereby an uninitialized object is passed in instead of the source object. - Fixed a bug whereby level-2 code could allocate memory dynamically via bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed a potential future bug whereby a mem_t object that is actually no longer "allocated" from the static pool is mistaken for being allocated due to failure to NULLify the buffer when the block was most recently released. - Fixed a bug in bl2_acquire_mpart_*() whreby the uplo field was mistakenly toggled when the requested subpartition needed to be "reflected" due to it residing in an unstored region.	2013-02-11 13:20:44 -06:00

17 Commits