amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	4fe1435f20	Updated dupl implementation to use PACKNR and NR. Details: - Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR explicitly to navigate b1 so that situations where PACKNR > NR are supported. - Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and frame/3/trsm/ukernels to kernels/c99/. - Updated clarksville and flame configurations.	2013-04-22 19:00:43 -05:00
Field G. Van Zee	ca9e435c57	Fixed a bug in reference implementation of dupl. Details: - Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c), which resulted in incorrect duplication. - Updated old test drivers according to recently updated packm control tree creation interface. - Added 'restrict' to x86 gemm microkernel interface.	2013-04-15 09:59:46 -05:00
Field G. Van Zee	d43d1a0a2e	Appended 'f2c_' to abs, min, max macros in f2c.h. Details: - Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they would not conflict with anything defined by the user (or the language). Thanks to Devin Matthews for suggesting this fix. - Updated all instances of the above macros accordingly.	2013-04-11 16:28:17 -05:00
Field G. Van Zee	4afe3bfd82	Renamed/moved object scalar constant macros. Details: - Replaced scalar constant macro definitions in bli_const_defs.h with a single, simpler macro in bli_obj_macro_defs.h. - Updated invocations of old macros accordingly. - Removed bli_const_defs.h.	2013-04-09 17:45:39 -05:00
Field G. Van Zee	6684b73d55	Implemented amax operation and related changes. Details: - Implemented amax operation in BLIS. - Activated BLAS2BLIS routine mapping for new amax BLIS implementation. - Added integer support to [f]printv, [f]printm. - Added integer support to level-0 copys macros. - Updated printing of configuration information in test suite driver. - Comment changes to _config.h files. - Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are used for.	2013-04-02 13:06:20 -05:00
Field G. Van Zee	b65cdc57d9	Migrated 'bl2' prefix to 'bli'. Details: - Changed all filename and function prefixes from 'bl2' to 'bli'. - Changed the "blis2.h" header filename to "blis.h" and changed all corresponding #include statements accordingly. - Fixed incorrect association for Fran in CREDITS file.	2013-03-24 20:01:49 -05:00
Field G. Van Zee	132bffcef7	Removed several 'old' directories and files. Details: - Removed most of the 'old' directories scattered throughout the framework, which includes alternate/half-baked/broken implementations.	2013-03-24 18:49:36 -05:00
Field G. Van Zee	551ea4767a	Removed #include "blis2.h" from low-level headers. Details: - Removed #include of "blis2.h" from various lower-level, operation-specific header files throughout the framework. Given that these low-level headers are included within blis2.h in a very specific order, #include'ing blis2.h within them directly is unnecessary.	2013-03-24 18:00:10 -05:00
Field G. Van Zee	e6ac623a90	Properly implemented beta == 0 semantics. Details: - Changed name of set0 and set0_mxn macros to set0s and set0s_mxn, respectively. - Added code to the following operations that sets the output operand to zero if the corresponding scalar is zero (rather than performing the floating-point multiply, or in the case of setv, copying the value). This will prevent nan's and inf's from creeping into results from uninitialized memory. - axpy - dotxv - scalv - scal2v - setv - gemv - ger - hemv - her - her2 - gemm reference ukernels	2013-02-13 18:44:59 -06:00
Field G. Van Zee	eb139ae256	Replaced bl2_abs() with _fabs() where appropriate.	2013-02-12 12:39:30 -06:00
Field G. Van Zee	474bac30c9	Removed level-0 macros projrs, grabis. Details: - Replaced instances of projrs and grabis macros with newer, more general-purpose getris.	2013-02-12 12:23:48 -06:00
Field G. Van Zee	1274e12437	Updated copyright headers from 2012 to 2013.	2013-02-11 14:37:47 -06:00
Field G. Van Zee	768fcebaa8	Added unified test suite, and many fixes. Details: - Added a highly configurable, unified test suite. - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel header files. Now, instead, DUPB is computed as (NDUP != 1) within each macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into incorrectly when DUPB was set to FALSE but the NDUP was still non-unit. By encoding both pieces of information into one constant in _kernel.h, it seems somewhat less likely others will encounter this bug in the future. - Added level-2 cache blocksizes to _kernel.h for reference configuration, and defined blocksizes in _cntl.c files to these default values. - Changed semantics of her2k and syr2k such that these operations no longer expect the B matrix to already be conjugate-transposed (or just transposed for syr2k). However, these semantics are preserved for the internal mechanics of the implementations, including the internal back-end and all blocked variants. - Inserted checks for real-valued alpha and beta for herk/her2k and herk, respectively. - Relaxed general object structure constraints in _basic_check() for gemv, ger. - Changed her front-end to NOT copy-cast to real projection; instead, this is replaced by selecting either the real part or both parts within the unblocked algorithm implementation, depending on the value of conjh. - Added conjh to all _check routines for her so that the code knows when to verify that alpha has an imaginary component equal to zero (for her, but not syr). - Changed control tree for her to forgo packing. - Added unit diagonal support to fnormm. - Redefined real versions of abval2s macros in terms of fabs(), fabsf(). - Redefined complex versions of sqrt2s macros using the actual "complex square root" formula. - Created new level-0 object-based routines, suffixed with "sc" (for "scalar"). - Defined new level-1v, -1d, and -1m versions of add and sub operations (two-operand add and subtract). 
- Added new scalar macros: - getris: acquire real and imaginary components. - setris: set real and imaginary components. - addjs: addition with conjugated x. - subjs: subtraction with conjugated x. - Defined new utility operations: - absumv: element-wise sum of absolute values for vector elements. - absumm: element-wise sum of absolute values for matrix elements. - mkherm: convert existing matrix to Hermitian. - mksymm: convert existing matrix to symmetric. - mktrim: convert existing matrix to triangular. - Added various error checking routines. - Added bl2_clock_min_diff(), which is used to more cleanly measure the wall clock time of a code block. - Added general stride support to bl2_obj_alloc_buffer(). - Added bl2_obj_init_scalar(). - Updated parameter mapping in bl2_param_map.c. - Added support for queriable version string. - Fixed a bug in the her2k macro-kernels (which currently are simply implemented in terms of two invocations of herk) whereby beta was being applied to both the first and second rank-k updates, rather than only the first. - Fixed a bug in trmm/trsm whereby transpose and right side cases were not properly implemented due to erroneous assumptions regarding aliasing and root objects. - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong MR x NR block of B was being updated. - Fixed a bug in the inverts macro in the double real case whereby the value was typecast to float before inversion. This affected non-unit cases of dtrsm. - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one constant was being applied incorrectly. - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code now mimics the rank-k strategy of gemm, whereby alpha is applied during the first iteration of variant 3, with BLIS_ONE passed in instead for subsequent iterations. This also required passing alpha into the macro-kernels as well as the fused gemmtrsm micro-kernels. 
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being called for blocks strictly above the diagonal. While this sounds good in theory, this cannot be done because gemm_ker_var2 expects row panels of A to be packed from top to bottom, while for trsm_u, A is actually packed from bottom to top due to the reverse (BR->TL) nature of the algorithm. - Fixed a bug in packm_cxk() whereby panel packings with unit panel dimensions were mishandled due to incorrect arguments to the copyv kernel. Also changed the copyv kernel invocation to scal2v so that these edge cases are properly handled when scaling is requested. - Fixed a bug in packv_int() whereby an uninitialized object is passed in instead of the source object. - Fixed a bug whereby level-2 code could allocate memory dynamically via bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed a potential future bug whereby a mem_t object that is actually no longer "allocated" from the static pool is mistaken for being allocated due to failure to NULLify the buffer when the block was most recently released. - Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly toggled when the requested subpartition needed to be "reflected" due to it residing in an unstored region.	2013-02-11 13:20:44 -06:00
Field G. Van Zee	806e74beb4	Defined Frobenius norm operations. Details: - Added level-0 grabis macro operation to grab imaginary component of one variable and copy it to the real component of another variable. - Defined sumsqv operation, which computes the sum of the absolute squares of the elements of a vector. This implementation is modeled after ?lassq in netlib LAPACK. - Defined fnormv and fnormm operations, which compute the Frobenius norm on vectors and matrices, respectively. These operations are treated as one-operand operations where the output norm value is the real projection of the datatype of the input operand. Both operations are implemented in terms of sumsqv.	2012-12-20 17:07:50 -06:00
Field G. Van Zee	2fecc88ca2	Fixed harmless macro bug in level-1m operations. Details: - Fixed some inconsistent usage of n_iter_max and n_iter in the two bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening despite the bug, which is why I had not discovered it until now.	2012-12-20 11:35:14 -06:00
Field G. Van Zee	00f3498a89	Initial commit.	2012-12-03 12:36:11 -06:00

16 Commits
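
The "beta == 0 semantics" commit (e6ac623a90) describes setting the output operand to zero when the scalar is zero, instead of performing the floating-point multiply. A minimal sketch of that idea in C follows. The function name scale_vector and its signature are hypothetical and chosen only for illustration; this is not the BLIS scalv interface or its implementation.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS routine): computes y := beta * y
   for a vector of n doubles stored with stride incy. */
void scale_vector(size_t n, double beta, double *y, size_t incy)
{
    if (beta == 0.0) {
        /* Overwrite instead of multiplying. Under IEEE 754 arithmetic,
           0.0 * NaN and 0.0 * Inf both yield NaN, so a plain multiply
           would let garbage from uninitialized memory survive. */
        for (size_t i = 0; i < n; ++i) {
            y[i * incy] = 0.0;
        }
        return;
    }

    /* General case: ordinary scaling. */
    for (size_t i = 0; i < n; ++i) {
        y[i * incy] *= beta;
    }
}
```

Calling scale_vector(n, 0.0, y, 1) on a buffer that contains NaN or Inf entries leaves every element exactly zero, whereas the unguarded multiply would leave NaN in those positions.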