mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (HEAD, tag: 0.0.2, origin/master, master)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600

Added unified test suite, and many fixes.

Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the
  macro-kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.

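Illustration for the her2k macro-kernel fix above: when a rank-2k update is
built from two rank-k style updates, beta must scale C only once. The sketch
below uses real (syr2k-like) arithmetic on small row-major arrays to keep the
idea visible. Both rank_k_update_sketch and syr2k_sketch are hypothetical
helpers written for this example; they are not the BLIS macro-kernels.

```c
#include <stddef.h>

/* C := beta * C + alpha * A * B^T, where A and B are m-by-k and C is
   m-by-m, all stored row-major. Hypothetical helper for illustration. */
static void rank_k_update_sketch( size_t m, size_t k, double alpha,
                                  const double* a, const double* b,
                                  double beta, double* c )
{
    for ( size_t i = 0; i < m; ++i )
    {
        for ( size_t j = 0; j < m; ++j )
        {
            double dot = 0.0;
            for ( size_t p = 0; p < k; ++p )
                dot += a[ i*k + p ] * b[ j*k + p ];
            c[ i*m + j ] = beta * c[ i*m + j ] + alpha * dot;
        }
    }
}

/* C := beta * C + alpha * A * B^T + alpha * B * A^T. */
static void syr2k_sketch( size_t m, size_t k, double alpha,
                          const double* a, const double* b,
                          double beta, double* c )
{
    /* The first update applies the caller's beta to C. */
    rank_k_update_sketch( m, k, alpha, a, b, beta, c );

    /* The second update must pass 1.0; reusing beta here would scale
       C (and the first product) a second time. */
    rank_k_update_sketch( m, k, alpha, b, a, 1.0, c );
}
```

With 1-by-1 operands A = 2, B = 3, C = 1, alpha = 1 and beta = 0.5, the sketch
gives 0.5 + 6 + 6 = 12.5. Passing beta to both updates would give 9.25 instead.
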
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600

Added missing 'd' to fused gemmtrsm function name.

commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600

Added debug statements to bl2_mm_acquire_m().

Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h

commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600

Defined Frobenius norm operations.

Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as
  one-operand operations where the output norm value is the real projection of
  the datatype of the input operand. Both operations are implemented in terms
  of sumsqv.

commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600

Added GENT*R macros; tweaked bl2_machval defs.

Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.

commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600

Fixed harmless macro bug in level-1m operations.

Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
  bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
  despite the bug, which is why I had not discovered it until now.

commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600

Renamed x86,x86_64 kernels to indicate 'd' fusing.

Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
  to emphasize that the fusing shape is not for all datatype instances, but
  rather just for one (that of double-precision real). Other fusing shapes
  would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.

commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600

More tweaks to _config.h, _kernel.h; smem tweaks.

Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.

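Illustration for the static memory pool entry above: a minimal sketch of
sizing a pool from the cache blocksizes and two block-count constants. Every
macro name and value below is a hypothetical placeholder chosen for this
example, not a definition from bl2_config.h or bl2_kernel.h, and the real code
may account for alignment padding or datatype sizes differently.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes, in elements. Real values are
   configuration-specific. */
#define SKETCH_MC 256
#define SKETCH_KC 256
#define SKETCH_NC 4096

/* Hypothetical counts of MCxKC and KCxNC blocks to reserve. */
#define SKETCH_NUM_MC_X_KC_BLOCKS 1
#define SKETCH_NUM_KC_X_NC_BLOCKS 1

/* Pool size in bytes for elements of the given size. */
static size_t static_pool_size_sketch( size_t elem_size )
{
    size_t mc_x_kc = ( size_t )SKETCH_MC * SKETCH_KC;
    size_t kc_x_nc = ( size_t )SKETCH_KC * SKETCH_NC;

    return ( SKETCH_NUM_MC_X_KC_BLOCKS * mc_x_kc +
             SKETCH_NUM_KC_X_NC_BLOCKS * kc_x_nc ) * elem_size;
}
```

Expressing the pool size this way means a change to any blocksize or block
count propagates to the pool automatically, rather than requiring a separately
maintained byte count.
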
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600

Minor reordering of bl2_config.h definitions.

commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600

Consolidated configuration headers.

Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.

commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600

Fixed bug in reference gemm ukernels.

Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
  was not correctly accumulated and scaled (by alpha) into the output matrix
  C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.

commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600

Expanded reference packm/unpackm kernel set to 16.

Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
  unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
  kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
  to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.

commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600

Minor updates towards 0.0.1.

commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600

Tweaks to get BLIS compiling again on clarksville.

Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.

commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600

Added template fragment.mk; updated .gitignore.

commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600

Added 'changelog' make target; other tweaks.

Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
  overwrites CHANGELOG with the output.
- Other trivial changes.

commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600

Define static memory pool size in bl2_config.h.

commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600

Refined INSTALL text; added 'showconfig' target.

Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
  to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.

commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600

Added auto-detection of version string (via git).

Details:
- Added build/update-version-file.sh script for auto-detecting "version"
  string and updating 'version' file accordingly. (If .git directory is
  not present, then it is assumed this copy of BLIS is a downloaded
  release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.

commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600

Wrote first draft of INSTALL file.

commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600

Updated standalone test Makefile and other fixes.

Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
  should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
  - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
  - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c

commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600

Added build system and continued reorganization.

Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).

commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600

Initial commit.