amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	ede75693e5	Implemented blas2blis compatibility layer. Details: - Added the blas2blis compatibility layer, located in frame/compat. This includes virtually all of the BLAS, including banded and packed level-2 operations. - Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional initialization, which stores the "exit status" in an err_t, which is then read by the latter function to determine whether finalization should actually take place. - Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and level-3 BLAS-like wrappers. - Added configuration option to instruct BLIS to remain initialized whenever it automatically initializes itself (via bl2_init_safe()), until/unless the application code explicitly calls bl2_finalize(). - Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type templatization of blas2blis wrappers. - Defined level-0 scalar macro bl2_??swaps(). - Defined level-1v operation bl2_swapv(). - Defined some "Fortran" types to bl2_type_defs.h for use with BLAS wrappers. 0.0.3	2013-02-22 12:11:24 -06:00
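The conditional initialization described in commit ede75693e5 can be sketched roughly as follows. This is only an illustration of the pattern (record whether this call actually initialized the library, and finalize only in that case); every `sketch_*` name and the error-code values are hypothetical and are not taken from the BLIS sources.

```c
#include <stdbool.h>

/* Hypothetical stand-in for an err_t-style status; the values are illustrative. */
typedef enum {
    SKETCH_SUCCESS = 0,             /* this call performed the initialization */
    SKETCH_ALREADY_INITIALIZED = 1  /* the library was already up; nothing done */
} sketch_err_t;

/* Hypothetical library state. */
bool sketch_is_initialized = false;

void sketch_init(void)     { sketch_is_initialized = true;  }
void sketch_finalize(void) { sketch_is_initialized = false; }

/* Initialize only if needed, and report which case occurred. */
void sketch_init_safe(sketch_err_t *status)
{
    if (sketch_is_initialized) {
        *status = SKETCH_ALREADY_INITIALIZED;
    } else {
        sketch_init();
        *status = SKETCH_SUCCESS;
    }
}

/* Finalize only if the matching sketch_init_safe() did the initialization. */
void sketch_finalize_safe(sketch_err_t status)
{
    if (status == SKETCH_SUCCESS) {
        sketch_finalize();
    }
}

/* A BLAS-like wrapper brackets its work with the two "safe" calls. */
double sketch_wrapper(double x)
{
    sketch_err_t status;
    sketch_init_safe(&status);
    double result = 2.0 * x;  /* placeholder for the real computation */
    sketch_finalize_safe(status);
    return result;
}
```

With this shape, a wrapper called from an application that never initialized the library sets it up and tears it down itself, while a wrapper called after an explicit `sketch_init()` leaves the library initialized on return.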
Field G. Van Zee	995edf43e2	Updated version file. (Forgot to in prev commit).	2013-02-21 14:30:50 -06:00
Field G. Van Zee	e823b08aaf	Fixed some scalar types in BLAS-like Herm APIs. Details: - Some of the scalars of Hermitian operations, such as alpha in her, alpha and beta in herk, and beta in her2k, need to be real. These arguments were typed incorrectly as the complex types. This has been fixed. Note the issue was only present in the BLAS-like APIs for these operations (not the native object-based interfaces).	2013-02-21 12:00:17 -06:00
Field G. Van Zee	5ece050a66	Updated version file. (Forgot to in prev commit).	2013-02-20 15:50:54 -06:00
Field G. Van Zee	f243034b8b	Changed API of packm_init_pack() to use blksz_t. Details: - Changed the interface of packm_init_pack() so that mult_m and mult_n are passed in as type blksz_t* instead of dim_t. - Make similar change for packv_init_pack().	2013-02-20 14:11:36 -06:00
Field G. Van Zee	da0c22f241	Minor changes to lower levels of scalm and setm. Details: - Removed diagx parameter from lower-level interfaces of scalm. - Modified scalm_basic_check() to expect an object with a nonunit diagonal. - Changed setm_unb_var1() so that having an implicit unit diagonal results in only the strictly lower or upper triangle of the matrix being modified.	2013-02-15 09:59:48 -06:00
Field G. Van Zee	2c836adadc	Updated beta == zero semantics of mulsc. Details: - Updated beta == zero semantics of mulsc. Hopefully this is the last operation that needed updating. - Added Devin to CREDITS file.	2013-02-14 10:42:56 -06:00
Field G. Van Zee	722b66c7dc	Removed some calls to setv() in test modules. Details: - Removed calls to setv() in test modules whose sole purpose was to initialize vectors to zero to ensure that nan's and inf's would not taint the computation. Now that beta == zero semantics have been updated to clear the output operand (when beta is zero), rather than multiply against it, these setv() calls are no longer needed.	2013-02-14 10:18:00 -06:00
Field G. Van Zee	e6ac623a90	Properly implemented beta == 0 semantics. Details: - Changed name of set0 and set0_mxn macros to set0s and set0s_mxn, respectively. - Added code to the following operations that sets the output operand to zero if the corresponding scalar is zero (rather than performing the floating-point multiply, or in the case of setv, copying the value). This will prevent nan's and inf's from creeping into results from uninitialized memory. - axpy - dotxv - scalv - scal2v - setv - gemv - ger - hemv - her - her2 - gemm reference ukernels	2013-02-13 18:44:59 -06:00
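The beta == 0 semantics described in commit e6ac623a90 can be illustrated with a small scalv-like routine. This is a minimal sketch assuming a contiguous double-precision vector; the `sketch_*` functions are hypothetical and do not reproduce the BLIS interfaces.

```c
#include <stddef.h>

/* y := beta * y, multiplying unconditionally.
   If y holds NaN or Inf (e.g. uninitialized memory), 0 * y[i] is NaN. */
void sketch_scalv_naive(size_t n, double beta, double *y)
{
    for (size_t i = 0; i < n; ++i) {
        y[i] = beta * y[i];
    }
}

/* y := beta * y, but a zero beta overwrites y instead of multiplying,
   so stale NaN/Inf values cannot leak into the result. */
void sketch_scalv(size_t n, double beta, double *y)
{
    if (beta == 0.0) {
        for (size_t i = 0; i < n; ++i) {
            y[i] = 0.0;
        }
    } else {
        for (size_t i = 0; i < n; ++i) {
            y[i] = beta * y[i];
        }
    }
}
```

Under IEEE 754 arithmetic, calling the naive version with beta equal to zero on a vector containing NaN or Inf leaves NaN in those positions, whereas the second version returns an all-zero vector.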
Field G. Van Zee	aedccbc85d	Fixed stale interface to packm_unb_var1(). Details: - Removed the control tree from the interface to packm_unb_var1(), which I meant to do when it was un-deprecated.	2013-02-13 18:29:53 -06:00
Field G. Van Zee	c23135669f	Un-deprecated packm_unb_var1.c (needed by l2 ops). Details: - Added bl2_packm_unb_var1() back into the mix once I realized that level-2 operations still need this routine for packing matrices. Now, whether level-2 operations should be packing matrices to begin with is another matter. But this fixes the segmentation fault one would have gotten when running bl2_gemv() on a general stride matrix.	2013-02-13 13:21:00 -06:00
Field G. Van Zee	cf49e35f98	Removed cntl tree usage from packm implementation. Details: - Added new fields to obj_t info field: - invert_diag - pack_order_if_upper - pack_order_if_lower These fields allow packm_init() to embed information that begins in the control tree into the object so that the packm implementation does not need to use control trees at all. This is being done to aid Bryan's DxT code generation. - Added macros that operate on above fields. - Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according to above changes. - Made similar (but much simpler) changes to packv. - Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify(). These were part of prototype implementations and are no longer needed.	2013-02-12 18:39:35 -06:00
Field G. Van Zee	eb139ae256	Replaced bl2_abs() with _fabs() where appropriate.	2013-02-12 12:39:30 -06:00
Field G. Van Zee	474bac30c9	Removed level-0 macros projrs, grabis. Details: - Replaced instances of projrs and grabis macros with newer, more general-purpose getris.	2013-02-12 12:23:48 -06:00
Field G. Van Zee	03a260a457	Restored executable permissions to scripts. Details: - Restored executable (0755) permissions to scripts that were touched by the recursive sed script that updated the copyright headers in the previous commit.	2013-02-12 11:45:34 -06:00
Field G. Van Zee	1274e12437	Updated copyright headers from 2012 to 2013.	2013-02-11 14:37:47 -06:00
Field G. Van Zee	3b620cc8e9	CHANGELOG update.	2013-02-11 13:38:07 -06:00
Field G. Van Zee	768fcebaa8	Added unified test suite, and many fixes. Details: - Added a highly configurable, unified test suite. - Removed DUPB configuration constant from bl2_kernel.h and macro-kernel header files. Now, instead, DUPB is computed as (NDUP != 1) within each macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into incorrectly when DUPB was set to FALSE but the NDUP was still non-unit. By encoding both pieces of information into one constant in _kernel.h, it seems somewhat less likely others will encounter this bug in the future. - Added level-2 cache blocksizes to _kernel.h for reference configuration, and defined blocksizes in _cntl.c files to these default values. - Changed semantics of her2k and syr2k such that these operations no longer expect the B matrix to already be conjugate-transposed (or just transposed for syr2k). However, these semantics are preserved for the internal mechanics of the implementations, including the internal back-end and all blocked variants. - Inserted checks for real-valued alpha and beta for herk/her2k and herk, respectively. - Relaxed general object structure constraints in _basic_check() for gemv, ger. - Changed her front-end to NOT copy-cast to real projection; instead, this is replaced by selecting either the real part or both parts within the unblocked algorithm implementation, depending on the value of conjh. - Added conjh to all _check routines for her so that the code knows when to verify that alpha has an imaginary component equal to zero (for her, but not syr). - Changed control tree for her to forgo packing. - Added unit diagonal support to fnormm. - Redefined real versions of abval2s macros in terms of fabs(), fabsf(). - Redefined complex versions of sqrt2s macros using the actual "complex square root" formula. - Created new level-0 object-based routines, suffixed with "sc" (for "scalar"). - Defined new level-1v, -1d, and -1m versions of add and sub operations (two-operand add and subtract). 
- Added new scalar macros: - getris: acquire real and imaginary components. - setris: set real and imaginary components. - addjs: addition with conjugated x. - subjs: subtraction with conjugated x. - Defined new utility operations: - absumv: element-wise sum of absolute values for vector elements. - absumm: element-wise sum of absolute values for matrix elements. - mkherm: convert existing matrix to Hermitian. - mksymm: convert existing matrix to symmetric. - mktrim: convert existing matrix to triangular. - Added various error checking routines. - Added bl2_clock_min_diff(), which is used to more cleanly measure the wall clock time of a code block. - Added general stride support to bl2_obj_alloc_buffer(). - Added bl2_obj_init_scalar(). - Updated parameter mapping in bl2_param_map.c. - Added support for queriable version string. - Fixed a bug in the her2k macro-kernels (which currently are simply implemented in terms of two invocations of herk) whereby beta was being applied to both the first and second rank-k updates, rather than only the first. - Fixed a bug in trmm/trsm whereby transpose and right side cases were not properly implemented due to erroneous assumptions regarding aliasing and root objects. - Fixed a bug in the upper triangular trsm macro-kernel in which the wrong MR x NR block of B was being updated. - Fixed a bug in the inverts macro in the double real case whereby the value was typecast to float before inversion. This affected non-unit cases of dtrsm. - Fixed a bug in the reference kernels for gemmtrsm whereby the minus one constant was being applied incorrectly. - Fixed a bug in the overall treatment of non-unit alpha for trsm. The code now mimics the rank-k strategy of gemm, whereby alpha is applied during the first iteration of variant 3, with BLIS_ONE passed in instead for subsequent iterations. This also required passing alpha into the macro-kernels as well as the fused gemmtrsm micro-kernels. 
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being called for blocks strictly above the diagonal. While this sounds good in theory, this cannot be done because gemm_ker_var2 expects row panels of A to be packed from top to bottom, while for trsm_u, A is actually packed from bottom to top due to the reverse (BR->TL) nature of the algorithm. - Fixed a bug in packm_cxk() whereby panel packings with unit panel dimensions were mishandled due to incorrect arguments to the copyv kernel. Also changed the copyv kernel invocation to scal2v so that these edge cases are properly handled when scaling is requested. - Fixed a bug in packv_int() whereby an uninitialized object is passed in instead of the source object. - Fixed a bug whereby level-2 code could allocate memory dynamically via bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed a potential future bug whereby a mem_t object that is actually no longer "allocated" from the static pool is mistaken for being allocated due to failure to NULLify the buffer when the block was most recently released. - Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly toggled when the requested subpartition needed to be "reflected" due to it residing in an unstored region. 0.0.2	2013-02-11 13:20:44 -06:00
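The her2k beta fix in commit 768fcebaa8 (beta applied only to the first of the two rank-k updates) can be illustrated with a real-valued analogue. The sketch below assumes row-major double-precision storage and updates the full matrix rather than one triangle; the `sketch_*` routines are hypothetical and far simpler than the BLIS macro-kernels.

```c
#include <stddef.h>

/* C := beta * C + alpha * A * B^T, where A and B are n x k and C is n x n,
   all stored row-major. A zero beta overwrites C rather than scaling it. */
void sketch_update(size_t n, size_t k, double alpha,
                   const double *a, const double *b,
                   double beta, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p) {
                acc += a[i * k + p] * b[j * k + p];
            }
            double scaled = (beta == 0.0) ? 0.0 : beta * c[i * n + j];
            c[i * n + j] = scaled + alpha * acc;
        }
    }
}

/* C := beta * C + alpha * A * B^T + alpha * B * A^T, built from two updates.
   beta belongs to the first update only; the second must use 1.0,
   otherwise C (and the first product) would be scaled by beta twice. */
void sketch_syr2k(size_t n, size_t k, double alpha,
                  const double *a, const double *b,
                  double beta, double *c)
{
    sketch_update(n, k, alpha, a, b, beta, c);
    sketch_update(n, k, alpha, b, a, 1.0, c);
}
```

The same idea appears in the trsm fix listed in this commit, where alpha is applied during the first iteration and a unit scalar is passed for subsequent ones.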
Field G. Van Zee	be94fb84c0	Added missing 'd' to fused gemmtrsm function name.	2013-01-04 10:55:21 -06:00
Field G. Van Zee	879a179e1d	Added debug statements to bl2_mm_acquire_m(). Details: - Added printf() statements to bl2_mm_acquire_m() to help debug issues with prematurely exhausted memory pool. - Removed 'd' from kernel names of reference kernels in clarksville configuration's bl2_kernel.h	2013-01-04 10:37:27 -06:00
Field G. Van Zee	806e74beb4	Defined Frobenius norm operations. Details: - Added level-0 grabis macro operation to grab imaginary component of one variable and copy it to the real component of another variable. - Defined sumsqv operation, which computes the sum of the absolute squares of the elements of a vector. This implementation is modeled after ?lassq in netlib LAPACK. - Defined fnormv and fnormm operations, which compute the Frobenius norm on vectors and matrices, respectively. These operations are treated as one-operand operations where the output norm value is the real projection of the datatype of the input operand. Both operations are implemented in terms of sumsqv.	2012-12-20 17:07:50 -06:00
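The sumsqv operation in commit 806e74beb4 is described as modeled after ?lassq. A minimal sketch of that scaled sum-of-squares idea, for a contiguous real double-precision vector, might look like the following. The function name and signature are hypothetical, and the NaN handling found in LAPACK's routine is omitted.

```c
#include <stddef.h>

/* Update (scale, sumsq) so that scale^2 * sumsq equals the previous
   scale^2 * sumsq plus the sum of x[i]^2. Keeping the largest magnitude
   in scale avoids overflow when squaring large elements.
   Start with scale = 0.0 and sumsq = 1.0 for a fresh accumulation. */
void sketch_sumsqv(size_t n, const double *x, double *scale, double *sumsq)
{
    for (size_t i = 0; i < n; ++i) {
        double abs_xi = (x[i] < 0.0) ? -x[i] : x[i];
        if (abs_xi > 0.0) {
            if (*scale < abs_xi) {
                /* New largest magnitude: rescale the running sum. */
                double r = *scale / abs_xi;
                *sumsq = 1.0 + *sumsq * r * r;
                *scale = abs_xi;
            } else {
                double r = abs_xi / *scale;
                *sumsq += r * r;
            }
        }
    }
}
```

A Frobenius norm can then be recovered as scale * sqrt(sumsq). For the vector {3, 4}, the accumulation ends with scale = 4 and sumsq = 1.5625, giving a norm of 5.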
Field G. Van Zee	66e80ce1ae	Added GENT*R macros; tweaked bl2_machval defs. Details: - Added function and prototype macro-generating macros for GENTFUNCR and GENTPROTR, which are one-operand macros with auxiliary real projection types. - Tweaked bl2_machval files to use new macros.	2012-12-20 17:02:55 -06:00
Field G. Van Zee	2fecc88ca2	Fixed harmless macro bug in level-1m operations. Details: - Fixed some inconsistent usage of n_iter_max and n_iter in the two bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening despite the bug, which is why I had not discovered it until now.	2012-12-20 11:35:14 -06:00
Field G. Van Zee	8945db6ec9	Renamed x86,x86_64 kernels to indicate 'd' fusing. Details: - Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape to emphasize that the fusing shape is not for all datatype instances, but rather just for one (that of double-precision real). Other fusing shapes would be proportional to their precision and domain "byte footprints". - Corresponding changes to config/clarksville/bl2_kernel.h.	2012-12-18 15:07:36 -06:00
Field G. Van Zee	6fbbdd4e19	More tweaks to _config.h, _kernel.h; smem tweaks. Details: - Moved kernel-related definitions from bl2_config.h to bl2_kernel.h. - Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This accomplishes the same thing (enabling posix_memalign()) without enabling all of the GNU extensions we don't need. - Defined the size of the static memory pool in terms of MC, KC, and NC, as well as two new constants that determine how many MCxKC blocks and how many KCxNC blocks should be allocated (defined in bl2_config.h). - In the case of static memory pool exhaustion, replaced the generic bl2_abort() with a specific error code call.	2012-12-18 14:34:02 -06:00
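The static pool sizing described in commit 6fbbdd4e19 can be sketched as below. All macro names, blocksize values, and the bump-style allocator are assumptions made for illustration; the actual constants live in bl2_config.h and bl2_kernel.h and are not reproduced here, and alignment handling is left out.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes (in elements) and block counts. */
#define SKETCH_MC 8
#define SKETCH_KC 16
#define SKETCH_NC 32
#define SKETCH_NUM_MC_X_KC_BLOCKS 2
#define SKETCH_NUM_KC_X_NC_BLOCKS 1

/* Pool size in bytes: enough for the requested number of MC x KC and
   KC x NC blocks of doubles. */
#define SKETCH_POOL_SIZE \
    ((SKETCH_NUM_MC_X_KC_BLOCKS * SKETCH_MC * SKETCH_KC + \
      SKETCH_NUM_KC_X_NC_BLOCKS * SKETCH_KC * SKETCH_NC) * sizeof(double))

/* Hypothetical status codes: exhaustion gets its own code instead of
   a generic abort. */
typedef enum {
    SKETCH_OK = 0,
    SKETCH_POOL_EXHAUSTED = -1
} sketch_status_t;

unsigned char sketch_pool[SKETCH_POOL_SIZE];
size_t sketch_pool_used = 0;

/* Hand out the next nbytes of the pool, or report exhaustion. */
sketch_status_t sketch_pool_acquire(size_t nbytes, void **out)
{
    if (nbytes > SKETCH_POOL_SIZE - sketch_pool_used) {
        *out = NULL;
        return SKETCH_POOL_EXHAUSTED;
    }
    *out = sketch_pool + sketch_pool_used;
    sketch_pool_used += nbytes;
    return SKETCH_OK;
}
```

With these illustrative values the pool holds 6144 bytes: two 8 x 16 blocks and one 16 x 32 block of doubles. A further request after those three blocks have been handed out returns the exhaustion code.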
Field G. Van Zee	5d8bdb21c4	Minor reordering of bl2_config.h definitions.	2012-12-17 16:07:36 -06:00
Field G. Van Zee	4a83f67490	Consolidated configuration headers. Details: - Merged contents of bl2_arch.h into bl2_config.h for reference and clarksville configurations. - Updated CREDITS, INSTALL, LICENSE, README files.	2012-12-17 12:35:54 -06:00
Field G. Van Zee	0670c33cc1	Fixed bug in reference gemm ukernels. Details: - Fixed a bug whereby, for the reference gemm ukernels, the matrix product was not correctly accumulated and scaled (by alpha) into the output matrix C. (Thanks to Fran for finding this bug.) - Whitespace changes to reference trsm kernels.	2012-12-14 12:45:26 -06:00
Field G. Van Zee	e2e7cb2fbe	Expanded reference packm/unpackm kernel set to 16. Details: - Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and unpackm. - Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range" kernel size is requested. (Thanks to Tyler for finding this bug.) - Updated bl2_kernel.h to contain new _KERNEL definitions, according to above changes, for 'reference' and 'clarksville' configurations. - Updated CHANGELOG. - Removed "output*.m" from .gitignore.	2012-12-13 18:17:54 -06:00
Field G. Van Zee	17455a8bce	Minor updates towards 0.0.1.	2012-12-10 17:23:32 -06:00
Field G. Van Zee	7ad4ebef38	Tweaks to get BLIS compiling again on clarksville. Details: - Updated header files and make_defs.mk in config/clarksville. - Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone). - Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h. - Shuffled include statements in blis2.h. 0.0.1	2012-12-10 16:18:40 -06:00
Field G. Van Zee	cc58ea8601	Added template fragment.mk; updated .gitignore.	2012-12-10 14:55:12 -06:00
Field G. Van Zee	714c527b0e	Added 'changelog' make target; other tweaks. Details: - Updated CHANGELOG. - Added 'changelog' target to Makefile that runs 'git log --decorate' and overwrites CHANGELOG with the output. - Other trivial changes.	2012-12-07 19:54:04 -06:00
Field G. Van Zee	e4e5404d26	Define static memory pool size in bl2_config.h.	2012-12-07 17:34:53 -06:00
Field G. Van Zee	19bb507d0d	Refined INSTALL text; added 'showconfig' target. Details: - Added 'showconfig' target to Makefile. - Added header files and ./config/<configname>/make_defs.mk as prerequisites to object file rules. - Added config.mk as prerequisite to library install rules. - Edited and added to INSTALL file.	2012-12-07 17:18:00 -06:00
Field G. Van Zee	26cb659dd7	Added auto-detection of version string (via git). Details: - Added build/update-version-file.sh script for auto-detecting "version" string and updating 'version' file accordingly. (If .git directory is not present, then it is assumed this copy of BLIS is a downloaded release, in which case 'version' file is left unchanged.) - Added invocation of update-version-file.sh to configure script.	2012-12-06 15:34:53 -06:00
Field G. Van Zee	b0ecd0ff52	Wrote first draft of INSTALL file.	2012-12-06 14:27:11 -06:00
Field G. Van Zee	bcbe81235a	Updated standalone test Makefile and other fixes. Details: - Major edits to test/Makefile to bring up-to-date wrt new build system; should no longer be broken. - Minor edits to top-level Makefile. - Fixed copy-and-paste bugs in - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c 0.0.0	2012-12-06 12:42:35 -06:00
Field G. Van Zee	2f272b40f4	Added build system and continued reorganization. Details: - Added/renamed packm, unpackm kernels. - Added machine value routines. - Added param_map facility. - Renamed AUTHORS to CREDITS. - Added Makefile; continued to expand upon existing configure script. - #define fuse_fac macros in operation headers if not defined already (by the user in bl2_kernels.h).	2012-12-04 19:22:14 -06:00
Field G. Van Zee	00f3498a89	Initial commit.	2012-12-03 12:36:11 -06:00

40 Commits