amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 09:39:59 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	7ed415824d	Updated copyright headers (continued). Details: - Inserted "at Austin" into third clause of license declarations. Meant to include this change in previous commit.	2014-07-14 16:14:33 -05:00
Field G. Van Zee	5c2c6c8561	Updated copyright headers to contain "at Austin". Details: - Updated copyright headers to include "at Austin" in the name of the University of Texas. - Updated the copyright years of a few headers to 2014 (from 2011 and 2012).	2014-07-14 16:05:03 -05:00
Field G. Van Zee	94c0df797e	Changed order of zero dim / error checking. Details: - Updated level-2 and level-3 internal back-ends so that the operation's _check() function is called BEFORE any attempt to return early due to the presence of zero dimensions. This ordering makes more sense because (for example) object dimensions should match even if one of them is zero. Previously, a dimension mismatch could result in an early return with no error message. - Updated bli_check_object_buffer() so that NULL buffers result in an error only if the object is dimensionally non-empty (i.e., only if both of the object's dimensions are non-zero). This allows BLIS operations to be performed on dimensionally empty objects (i.e., where at least one dimension is zero). - Updated the error message associated with bli_check_object_buffer() to mention the newly relaxed constraint mentioned above, vis-a-vis non-zero dimensions.	2014-07-14 11:24:36 -05:00
Tyler Smith	f4fdfe8fc5	Merge http://github.com/flame/blis	2014-04-30 11:46:35 -05:00
Field G. Van Zee	262cdabcc8	Changed treatment of NULL object buffers. Details: - Relaxed the constraint in bli_obj_attach_buffer_check(), which required the buffer address being attached to be non-NULL. This is acceptable because the user was already able to create and use objects with NULL buffers (via bli_obj_create_without_buffer(), which initializes the buffer to NULL). - Inserted calls to newly defined function, bli_check_object_buffer(), into nearly all operations' _check() or _int_check() functions. This allows BLIS to abort peacefully if a computational routine is called with an object containing a NULL buffer. By contrast, under such conditions, BLAS would typically fail with a segmentation fault. - Within operation front-ends, moved the calls to _check()/_int_check() so that zero dimensions are checked first (and if found, execution returns with trivial or no computation). This resolves issue #7. Thanks to Jack Poulson for reporting this bug.	2014-04-28 16:48:25 -05:00
Tyler Smith	31bb065ba4	Merge http://github.com/flame/blis	2014-04-23 12:30:19 -05:00
Field G. Van Zee	58671597d3	Minor cleanups to level-2 _cntl.c files. Details: - Changed level-2 _cntl.c files so that the blocksizes for gemv are imported and used, rather than blocksizes being declared locally. - Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as 4m/3m variants). - Removed test/old/test_blis2.c.	2014-04-10 15:35:30 -05:00
Tyler Smith	2e727a025a	Modifying the thread info data structures. This change makes each operation have its own thread info type, allowing finer control of threading in operations that have different types of suboperations.	2014-03-10 15:14:33 -05:00
Tyler Smith	01b125e815	First pass at adding parallelism to BLIS. Added a multithreading infrastructure that should be independent of multithreading implementation in the future. Currently, gemm blocked variants 1f and 2f and packm blocked variant 1 are parallelized.	2014-02-27 11:55:45 -06:00
Field G. Van Zee	d628bf1da1	Consolidated pack_t enums; retired VECTOR value. Details: - Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This makes room in the three pack_t bits of the info field of obj_t so that two values are now unused, and may be used for other future purposes. - Updated sloppy terminology usage in comments in level-2 front-ends. (Replaced "is contiguous" with more accurate "has unit stride".)	2014-01-15 11:40:12 -06:00
Field G. Van Zee	2cb13600f9	Updated year in copyright headers to 2014.	2014-01-03 12:29:13 -06:00
Field G. Van Zee	d289f5d3a9	Whitespace changes to level-2 blocked variants. Details: - Joined some lines in level-2 blocked variants to match formatting used in level-3 blocked variants. - Streamlined implementation of bli_obj_equals() in bli_query.c.	2013-12-05 10:56:13 -06:00
Field G. Van Zee	b444489f10	Added new "attached" scalar representation. Details: - Added infrastructure to support a new scalar representation, whereby every object contains an internal scalar that defaults to 1.0. This facilitates passing scalars around without having to house them in separate objects. These "attached" scalars are stored in the internal atom_t field of the obj_t struct, and are always stored to be the same datatype as the object to which they are attached. Level-3 variants no longer take scalar arguments, however, level-3 internal back-ends still do; this is so that the calling function can perform subproblems such as C := C - alpha * A * B on-the-fly without needing to change either of the scalars attached to A or B. - Removed scalar argument from packm_int(). - Observe and apply attached scalars in scalm_int(), and removed scalar from interface of scalm_unb_var1(). - Renamed the following functions (and corresponding invocations): bli_obj_init_scalar_copy_of() -> bli_obj_scalar_init_detached_copy_of() bli_obj_init_scalar() -> bli_obj_scalar_init_detached() bli_obj_create_scalar_with_attached_buffer() -> bli_obj_create_1x1_with_attached_buffer() bli_obj_scalar_equals() -> bli_obj_equals() - Defined new functions: bli_obj_scalar_detach() bli_obj_scalar_attach() bli_obj_scalar_apply_scalar() bli_obj_scalar_reset() bli_obj_scalar_has_nonzero_imag() bli_obj_scalar_equals() - Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c. - Renamed the following macros: bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1() bli_obj_is_scalar() -> bli_obj_is_1x1() - Defined new macros to set and copy internal scalars between objects: bli_obj_set_internal_scalar() bli_obj_copy_internal_scalar() - In level-3 internal back-ends, added conditional blocks where alpha and beta are checked for non-unit-ness.
Those values for alpha and beta are applied to the scalars attached to aliases of A/B/C, as appropriate, before being passed into the variant specified by the control tree. - In level-3 blocked variants, pass BLIS_ONE into subproblems instead of alpha and/or beta. - In level-3 macro-kernels, changed how scalars are obtained. Now, scalars attached to A and B are multiplied together to obtain alpha, while beta is obtained directly from C. - In level-3 front-ends, removed old function calls meant to provide future support for mixed domain/precision. These can be added back later once that functionality is given proper treatment. Also, removed the creating of copy-casts of alpha and beta since typecasting of scalars is now implicitly handled in the internal back-ends when alpha and beta are applied to the attached scalars.	2013-12-03 16:08:30 -06:00
Field G. Van Zee	9552e6ee82	Removed optional scaling from packm control tree. Details: - Removed does_scale field from packm control tree node and bli_packm_cntl_obj_create() interface. Adjusted all invocations of _cntl_obj_create() accordingly. - Redefined/renamed macros that are used in aliasing so that now, bli_obj_alias_to() does a full alias (shallow copy) while bli_obj_alias_for_packing() does a partial alias that preserves the pack_mem-related fields of the aliasing (destination) object. - Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree will work just fine for bli_trmm3(). - Removed some commented vestiges of the typecasting functionality needed to support heterogeneous datatypes.	2013-11-24 11:40:31 -06:00
Field G. Van Zee	a98f78b715	Changed dim_t and inc_t to be signed integers. Details: - Redefined dim_t and inc_t in terms of gint_t (instead of guint_t). This will facilitate interoperability with Fortran in the future. (Fortran does not support unsigned integers.) - Redefined many instances of stride-related macros so that they return or use the absolute value of the strides, rather than the raw strides which may now be signed. Added new macros bli_is_row_stored_f() and bli_is_col_stored_f(), which assume positive (forward-oriented) strides, and changed the packm_blk_var[23] variants to use these macros instead of the existing bli_is_row_stored(), bli_is_col_stored(). - Added/adjusted typecasting to various functions/macros, including bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-related macros in bli_param_macro_defs.h. - Redefined bli_convert_blas_incv() macro so that the BLAS compatibility layer properly handles situations where vector increments are negative. Thanks to Vladimir Sukharev for pointing out this issue. - Changed type of increment parameters in bli_adjust_strides() from dim_t to inc_t. Likewise in bli_check_matrix_strides(). - Defined bli_check_matrix_object(), which checks for negative strides. - Redefined bli_check_scalar_object() and bli_check_vector_object() so that they also check for negative stride. - Added instances of bli_check_matrix_object() to various operations' _check routines.	2013-11-06 15:32:47 -06:00
Field G. Van Zee	5e54f46ccb	Added template implementations and other tweaks. Details: - Added a 'template' configuration, which contains stub implementations of the level 1, 1f, and 3 kernels with one datatype implemented in C for each, with lots of in-file comments and documentation. - Modified some variable/parameter names for some 1/1f operations. (e.g. renaming vector length parameter from m to n.) - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files to bli_kernel.h. - Modified test suite to print out fusing factors for axpyf, dotxf, and dotxaxpyf, as well as the default fusing factor (which are all equal in the reference and template implementations). - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these reference variants were implemented in terms of front-end routines rather than directly in terms of the kernels. (For example, axpy2v was implemented as two calls to axpyv rather than two calls to AXPYV_KERNEL.) - Changed the interface to dotxf so that it matches that of axpyf, in that A is assumed to be m x b_n in both cases, and for dotxf A is actually used as A^T. - Minor variable naming and comment changes to reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels.	2013-09-30 12:58:18 -05:00
Field G. Van Zee	12dbd2f334	Moved init_safe(), finalize_safe() to BLAS compat. Details: - Moved the bli_init_safe() and bli_finalize_safe() function calls from the BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto- initializers in the BLIS layer wasn't buying us anything because the user could still call the library with uninitialized global scalar constants, for example. Thus, we will just have to live with the constraint that bli_init() MUST be called before calling ANY routine with a bli_ prefix. - Added the missing _init_safe() and finalize_safe() calls to the level-1 BLAS compatibility wrappers.	2013-08-08 14:39:35 -05:00
Field G. Van Zee	2d9c667f3c	Fixed x86_64 kernel bugs and other minor issues. Details: - Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in unaligned subpartitions. We were already going out of our way a bit to handle edge cases in the first iteration for blocked variants, and this was simply the unblocked-fused extension of that idea. - Fixed control tree handling in her/her2/syr/syr2 that was not taking into account how the choice of variant needed to be altered for upper-stored matrices (given that only lower-stored algorithms are explicitly implemented). - Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b() macros to provide inlined versions of bli_determine_blocksize_[fb]() for use by unblocked-fused variants. - Integrated new blocksize_dim macros into gemv/hemv unf variants for consistency with that of the bugfix for trmv/trsv (both of which now use the same macros). - Modified bli_obj_vector_inc() so that 1 is returned if the object is a vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain conditions (e.g. dotv_opt_var1), an invalid increment was returned, which was invalid only because the code was expecting 1 (for purposes of performing contiguous vector loads) but got a value greater than 1 because the column stride of the object (e.g. rho) was inflated for alignment purposes (albeit unnecessarily since there is only one element in the object). - Replaced some old invocations of set0 with set0s. - Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly. - Fixed increment bug in cleanup loop of gemm ukernel for x86_64. - Added safeguard to test modules so that testing a problem with a zero dimension does not result in a failure. - Tweaked handling of zero dimensions in level-2 and level-3 operations' internal back-ends to correctly handle cases where output operand still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).	2013-05-24 16:28:10 -05:00
Field G. Van Zee	9e2b227866	Renamed _set_trans(), _trans_status() macros. Details: - Renamed the following macros: bli_obj_set_trans() -> bli_obj_set_onlytrans() bli_obj_trans_status() -> bli_obj_onlytrans_status() to remove ambiguity as to which bits are read/updated.	2013-05-03 17:24:58 -05:00
Field G. Van Zee	6bfa96f848	Absorbed blocksize extensions into main objects. Details: - Revamped some parts of commit `b6ef84fad1` by adding blocksize extension fields to the blksz_t object rather than have them as separate structs. - Updated all packm interfaces/invocations according to above change. - Generalized bli_determine_blocksize_?() so that edge case optimization happens if and only if cache blocksizes are created with non-zero extensions. - Updated comments in bli_kernel.h files to indicate that the edge case blocksize extension mechanism is now available for use.	2013-04-30 19:35:54 -05:00
Field G. Van Zee	6a538fa7b1	Formatting change to mods in previous commit.	2013-04-15 14:40:31 -05:00
Field G. Van Zee	ea079d3559	Set structure of objects in level-2 BLIS APIs. Details: - Added missing statement to set structure field of local objects in top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for reporting this bug.	2013-04-15 14:31:40 -05:00
Field G. Van Zee	b65cdc57d9	Migrated 'bl2' prefix to 'bli'. Details: - Changed all filename and function prefixes from 'bl2' to 'bli'. - Changed the "blis2.h" header filename to "blis.h" and changed all corresponding #include statements accordingly. - Fixed incorrect association for Fran in CREDITS file.	2013-03-24 20:01:49 -05:00
Field G. Van Zee	551ea4767a	Removed #include "blis2.h" from low-level headers. Details: - Removed #include of "blis2.h" from various lower-level, operation-specific header files throughout the framework. Given that these low-level headers are included within #blis2.h in a very specific order, #include'ing blis2.h within them directly is unnecessary.	2013-03-24 18:00:10 -05:00
Field G. Van Zee	bb612f864e	Updated behavior of bl2_obj_induce_trans() macro. Details: - Changed bl2_obj_induce_trans() so that the transposition bit is no longer updated as part of the macro. All current uses of the macro have been coupled with instances of bl2_obj_set_trans() to clear the bit. - Added Jed to CREDITS file.	2013-03-01 12:55:42 -06:00
Field G. Van Zee	ede75693e5	Implemented blas2blis compatibility layer. Details: - Added the blas2blis compatibility layer, located in frame/compat. This includes virtually all of the BLAS, including banded and packed level-2 operations. - Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional initialization, which stores the "exit status" in an err_t, which is then read by the latter function to determine whether finalization should actually take place. - Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and level-3 BLAS-like wrappers. - Added configuration option to instruct BLIS to remain initialized whenever it automatically initializes itself (via bl2_init_safe()), until/unless the application code explicitly calls bl2_finalize(). - Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type templatization of blas2blis wrappers. - Defined level-0 scalar macro bl2_??swaps(). - Defined level-1v operation bl2_swapv(). - Defined some "Fortran" types to bl2_type_defs.h for use with BLAS wrappers.	2013-02-22 12:11:24 -06:00
Field G. Van Zee	e6ac623a90	Properly implemented beta == 0 semantics. Details: - Changed name of set0 and set0_mxn macros to set0s and set0s_mxn, respectively. - Added code to the following operations that sets the output operand to zero if the corresponding scalar is zero (rather than performing the floating-point multiply, or in the case of setv, copying the value). This will prevent nan's and inf's from creeping into results from uninitialized memory. - axpy - dotxv - scalv - scal2v - setv - gemv - ger - hemv - her - her2 - gemm reference ukernels	2013-02-13 18:44:59 -06:00
Field G. Van Zee	1274e12437	Updated copyright headers from 2012 to 2013.	2013-02-11 14:37:47 -06:00
Field G. Van Zee	768fcebaa8	Added unified test suite, and many fixes. Details: - Added a highly configurable, unified test suite. - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel header files. Now, instead, DUPB is computed as (NDUP != 1) within each macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into incorrectly when DUPB was set to FALSE but the NDUP was still non-unit. By encoding both pieces of information into one constant in _kernel.h, it seems somewhat less likely others will encounter this bug in the future. - Added level-2 cache blocksizes to _kernel.h for reference configuration, and defined blocksizes in _cntl.c files to these default values. - Changed semantics of her2k and syr2k such that these operations no longer expect the B matrix to already be conjugate-transposed (or just transposed for syr2k). However, these semantics are preserved for the internal mechanics of the implementations, including the internal back-end and all blocked variants. - Inserted checks for real-valued alpha and beta for herk/her2k and herk, respectively. - Relaxed general object structure constraints in _basic_check() for gemv, ger. - Changed her front-end to NOT copy-cast to real projection; instead, this is replaced by selecting either the real part or both parts within the unblocked algorithm implementation, depending on the value of conjh. - Added conjh to all _check routines for her so that the code knows when to verify that alpha has an imaginary component equal to zero (for her, but not syr). - Changed control tree for her to forgo packing. - Added unit diagonal support to fnormm. - Redefined real versions of abval2s macros in terms of fabs(), fabsf(). - Redefined complex versions of sqrt2s macros using the actual "complex square root" formula. - Created new level-0 object-based routines, suffixed with "sc" (for "scalar"). - Defined new level-1v, -1d, and -1m versions of add and sub operations (two-operand add and subtract). 
- Added new scalar macros: - getris: acquire real and imaginary components. - setris: set real and imaginary components. - addjs: addition with conjugated x. - subjs: subtraction with conjugated x. - Defined new utility operations: - absumv: element-wise sum of absolute values for vector elements. - absumm: element-wise sum of absolute values for matrix elements. - mkherm: convert existing matrix to Hermitian. - mksymm: convert existing matrix to symmetric. - mktrim: convert existing matrix to triangular. - Added various error checking routines. - Added bl2_clock_min_diff(), which is used to more cleanly measure the wall clock time of a code block. - Added general stride support to bl2_obj_alloc_buffer(). - Added bl2_obj_init_scalar(). - Updated parameter mapping in bl2_param_map.c. - Added support for queriable version string. - Fixed a bug in the her2k macro-kernels (which currently are simply implemented in terms of two invocations of herk) whereby beta was being applied to both the first and second rank-k updates, rather than only the first. - Fixed a bug in trmm/trsm whereby transpose and right side cases were not properly implemented due to erroneous assumptions regarding aliasing and root objects. - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong MR x NR block of B was being updated. - Fixed a bug in the inverts macro in the double real case whereby the value was typecast to float before inversion. This affected non-unit cases of dtrsm. - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one constant was being applied incorrectly. - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code now mimics the rank-k strategy of gemm, whereby alpha is applied during the first iteration of variant 3, with BLIS_ONE passed in instead for subsequent iterations. This also required passing alpha into the macro-kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being called for blocks strictly above the diagonal. While this sounds good in theory, this cannot be done because gemm_ker_var2 expects row panels of A to be packed from top to bottom, while for trsm_u, A is actually packed from bottom to top due to the reverse (BR->TL) nature of the algorithm. - Fixed a bug in packm_cxk() whereby panel packings with unit panel dimensions were mishandled due to incorrect arguments to the copyv kernel. Also changed the copyv kernel invocation to scal2v so that these edge cases are properly handled when scaling is requested. - Fixed a bug in packv_int() whereby an uninitialized object is passed in instead of the source object. - Fixed a bug whereby level-2 code could allocate memory dynamically via bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed a potential future bug whereby a mem_t object that is actually no longer "allocated" from the static pool is mistaken for being allocated due to failure to NULLify the buffer when the block was most recently released. - Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly toggled when the requested subpartition needed to be "reflected" due to it residing in an unstored region.	2013-02-11 13:20:44 -06:00
Field G. Van Zee	00f3498a89	Initial commit.	2012-12-03 12:36:11 -06:00

30 Commits
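
Commit e6ac623a90 in the log above describes setting the output operand to zero when its scalar is zero, rather than multiplying, so that NaN or Inf values in uninitialized memory cannot reach the result. The following is a minimal sketch of that idea in C. The function axpby_sketch is a hypothetical helper written for illustration only; it is not a BLIS routine and does not reproduce the library's kernels. Its precondition loosely echoes the rule from commit 94c0df797e that a NULL buffer is an error only for a non-empty operand.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): y := beta * y + alpha * x.
   When beta == 0, the old contents of y are never read, so NaN or Inf
   left over in uninitialized memory cannot propagate into the result. */
void axpby_sketch(size_t n, double alpha, const double *x,
                  double beta, double *y)
{
    /* NULL buffers are acceptable only when there is nothing to compute. */
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i) {
        if (beta == 0.0)
            y[i] = alpha * x[i];               /* overwrite, do not scale */
        else
            y[i] = beta * y[i] + alpha * x[i]; /* ordinary scaled update */
    }
}
```

With beta equal to zero, a y that initially holds NaN or Inf comes back as alpha * x. A plain multiply would instead yield NaN, because 0 * NaN and 0 * Inf are both NaN under IEEE 754 arithmetic.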