blis/CHANGELOG
Field G. Van Zee 75405a2b83 CHANGELOG update.
2013-05-01 15:00:30 -05:00

commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d (HEAD, tag: 0.0.7, origin/master, master)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500
Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.
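The edge case rule described above can be sketched roughly as follows. This is a minimal illustration and not the actual body of bli_determine_blocksize_?(): the function name, the argument names, and the idea of passing the extension as a plain integer are all assumptions made for the example. The point it shows is that a zero extension leaves the usual blocking unchanged, while a non-zero extension lets the final iteration absorb a small remainder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of forward blocksize selection.
   i      : offset of the current block within the dimension
   dim    : total size of the dimension being partitioned
   b_alg  : the regular cache blocksize
   b_ext  : the blocksize extension (0 disables edge case optimization) */
size_t determine_blocksize_f_sketch(size_t i, size_t dim,
                                    size_t b_alg, size_t b_ext)
{
    assert(i < dim && b_alg > 0);

    size_t dim_left = dim - i;       /* what remains to be processed      */
    size_t b_max    = b_alg + b_ext; /* largest block we are willing to use */

    /* If everything that remains fits in one (possibly extended) block,
       take it all; otherwise take a regular block. */
    return (dim_left <= b_max) ? dim_left : b_alg;
}
```

With dim = 10 and b_alg = 4, a zero extension yields blocks of 4, 4, 2, whereas an extension of 2 yields blocks of 4, 6.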
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500
Added option to disable err checking in testsuite.
Details:
- Added a new line to input.general that allows one to specify the error-
checking level to use for each BLIS experiment. The only two levels
supported for now are "no error checking" and "full error checking".
commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500
Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
algorithms that first partition in the n dimension with a blocksize
of NC. Typically this is not an issue since only very large problems
exceed that of NC. But developers often run very large problems, and
so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
bli_param_macro_defs.h.
commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500
Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500
Fixed computation of b_next in L3 macro-kernels.
Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
and trsm, in that the edge cases are captured by the main loop, rather
than trying to have "cleanup" sections that result in four distinct
parts (interior, bottom edge, right edge, bottom-right edge) of the
code.
- Fixed the way b_next was being computed in the non-gemm level-3
macro-kernels (herk, trmm, trsm). The way they are computed now matches
that of gemm.
commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500
Fixed minor bug in computing b_next in gemm.
commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500
Fixed rare edge case bug in herk_l macro-kernel.
Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
chosen to be much larger than NR, then one could encounter edge cases
in the MC dimension that fall entirely below the diagonal, which
the previous implementation of the herk_l macro-kernel was not allowing
for.
commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500
Updated x86 gemmtrsm ukernels to use alpha.
commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500
Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
addresses of the next micro-panels of A and B. By passing these
pointers into the micro-kernel, we allow the micro-kernel author to
prefetch micro-panels of A and B as necessary (though this is
completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
a_next and b_next. Note that ONLY the gemm macro-kernel computes
a_next and b_next with the precise semantics we want. I will go back
and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
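As a rough illustration of the interface change above, here is a sketch of a reference-style gemm micro-kernel that accepts a_next and b_next. It is not the BLIS micro-kernel: the function name, the argument order, the MR and NR values, and the packed layout (column stride MR for A, row stride NR for B) are assumptions for the example, and the prefetch uses the GCC/Clang __builtin_prefetch builtin only where it is available. The two extra pointers are never dereferenced, so a caller may pass NULL and the kernel may ignore them.

```c
#include <assert.h>
#include <stddef.h>

enum { MR = 4, NR = 2 }; /* hypothetical register blocksizes */

/* Computes C := beta * C + alpha * A * B for one MR x NR block of C.
   a      : packed MR x k micro-panel of A (column stride MR)
   b      : packed k x NR micro-panel of B (row stride NR)
   a_next : address of the next micro-panel of A (optional hint)
   b_next : address of the next micro-panel of B (optional hint) */
void dgemm_ukr_sketch(size_t k,
                      const double *restrict alpha,
                      const double *restrict a,
                      const double *restrict b,
                      const double *restrict beta,
                      double *restrict c, size_t rs_c, size_t cs_c,
                      const double *a_next,
                      const double *b_next)
{
    double ab[MR * NR] = { 0.0 }; /* local accumulator, column-major */

    assert(a != NULL && b != NULL && c != NULL);

#if defined(__GNUC__)
    /* Purely a hint; the addresses are never read by this kernel. */
    if (a_next) __builtin_prefetch(a_next);
    if (b_next) __builtin_prefetch(b_next);
#else
    (void)a_next;
    (void)b_next;
#endif

    /* Accumulate k rank-1 updates using explicit loops over MR and NR. */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    /* Scale and write the result back to C. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = (*beta) * (*cij) + (*alpha) * ab[i + j * MR];
        }
}
```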
commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500
Added code for backward edge-case blocking.
Details:
- Edited bli_determine_blocksize_b() to include experimental (and
currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.
commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500
Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly to navigate b1, so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500
Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
by the contiguous memory allocator for all types, given that the values for
double precision real are the ones used to allocate the space. These checks
can easily go awry in certain situations, especially if you are developing for
only one datatype. So for now, they are probably more trouble than they are
worth.
commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500
Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500
Fixed bug in compatibility layer (her2k/syr2k).
Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
Marker for reporting the behavior that led to this bug.
commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500
Changed old level3 test drivers to call front-ends.
Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
front-end object API instead of the internal back-end with the locally
defined control tree.
commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500
Allow packm_init() to reacquire a too-small mem_t.
Details:
- Changed bli_packm_init() to react differently to a situation where a pack
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
than what will be needed to hold the block/panel that now needs to be
packed. Previously, this situation was treated with an abort() since I
assumed something was horribly wrong. I have changed the code so that it now
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
new information. (This change was done at the request of Bryan Marker to
facilitate code generation via DxT.)
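The new behavior can be sketched as below. This is only an illustration of the release-and-reacquire idea: the struct, the function name, and the return codes are hypothetical, and malloc()/free() stand in for the BLIS memory allocator, which this sketch does not attempt to model.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a mem_t entry: a buffer and its size in bytes. */
typedef struct {
    void  *buf;
    size_t size;
} mem_sketch_t;

/* Ensures that mem holds a buffer of at least `needed` bytes.
   Returns 0 if the existing buffer was reused, 1 if a buffer was
   (re)acquired, and -1 if acquisition failed. */
int mem_ensure_sketch(mem_sketch_t *mem, size_t needed)
{
    assert(mem != NULL && needed > 0);

    /* An already-allocated buffer that is large enough is simply reused. */
    if (mem->buf != NULL && mem->size >= needed)
        return 0;

    /* Too small (or never allocated): release and reacquire rather
       than aborting. free(NULL) is a no-op. */
    free(mem->buf);
    mem->buf  = malloc(needed);
    mem->size = (mem->buf != NULL) ? needed : 0;

    return (mem->buf != NULL) ? 1 : -1;
}
```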
commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500
Fixed bug in packing block of A for hemm/symm.
Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
symmetric matrix where the block of A being packed intersects the diagonal,
but some of its micro-panels do not intersect the diagonal and lie completely
in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.
commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500
Activated bli_packm_acquire_mpart_t2b().
Details:
- Removed the overly-paranoid bli_abort() from the end of
bli_packm_acquire_mpart_t2b(), to allow others to experiment with
partitioning through packed blocks of A. Also, and more importantly,
changed an earlier check that was causing an erroneous (but
coincidentally redundant) abort(). Also, updated some of the comments
in bli_packm_part.c.
commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500
Allow creation of "empty" objects.
Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.
commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500
Fixed "root" object bug in bli_her[2]k/syr[2]k.
Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
that manifested as the incorrect triangle being updated. It occurred when
the user would pass in a matrix object that was correctly marked as
symmetric/Hermitian and lower-stored, but whose root object was never marked
as lower (or upper). We now alias and re-assign root status for matrix C
within the front-ends. Note that trmm and trsm were already doing this,
albeit for a slightly different reason (to allow the internal back-end to
choose which algorithm to run--lower or upper--based on the uplo of the root
object for both left and right side cases). Thanks to Bryan Marker for
leading me to this bug.
commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500
Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
and not just a proper floating-point type. (If it is a constant, default to
extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500
Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
call a corresponding packm_part routine instead. However, it was
unintentionally catching matrices that were marked as "packed" by virtue
of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
The macro has now been renamed to bli_obj_is_panel_packed(), and now only
checks for row or column panel packing. (Note that I first attempted to
fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
erroneous behavior that led me to this bug.
commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500
Removed local reference ukernel blocksize macros.
Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
reference microkernel definition and header. Meant to include this in
a recent/previous commit (0020ef7c8271).
commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500
Formatting change to mods in previous commit.
commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500
Set structure of objects in level-2 BLIS APIs.
Details:
- Added missing statement to set structure field of local objects in
top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
reporting this bug.
commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500
Tweak to test suite function string construction.
Details:
- Fixed a minor bug in the way that the test suite would construct function
name strings when the user anchored all parameters in input.operations.
In this case, the test driver would mistake this situation for one where
the operation simply had no parameters to begin with, and thus would not
include the parameter string in the function string that is output for
every result.
commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500
Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500
Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
_kernel.h includes an optimized microkernel header, which uses BLIS types
such as dim_t and inc_t, which would precede the definition of those types
in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
(immediately after that of bli_kernel.h).
commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500
CHANGELOG update.
commit ec16c52f2ecf419c749175ce0a297441c10f1c68 (tag: 0.0.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500
Updated INSTALL file (now redirects to website).
commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500
Removed gemmtrsm-, trsm-specific blocksize macros.
Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
micro-kernel header files.
(Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.
commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500
Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
and renamed existing ones
BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
to better convey what the alignment factor is used for (and what it is
not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
use bli_align_dim_to_size(), which takes a third argument (the desired
alignment).
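For illustration, a function in the spirit of bli_align_dim_to_size() might look like the sketch below. The name, signature, and rounding rule here are assumptions rather than the actual BLIS definition: the sketch rounds the byte footprint of the dimension up to a multiple of the alignment and converts back to elements, so an alignment of 1 leaves the dimension unchanged.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch: returns the smallest dimension >= dim whose
   footprint in bytes (dim * elem_size) is a multiple of align_size. */
size_t align_dim_to_size_sketch(size_t dim, size_t elem_size,
                                size_t align_size)
{
    assert(elem_size > 0 && align_size > 0);
    /* Keep the arithmetic exact: one size must divide the other. */
    assert(align_size % elem_size == 0 || elem_size % align_size == 0);

    size_t bytes   = dim * elem_size;
    size_t aligned = ((bytes + align_size - 1) / align_size) * align_size;

    return aligned / elem_size;
}
```

For example, 5 doubles (40 bytes) aligned to 16 bytes become 6 doubles (48 bytes), while the same call with an alignment of 1 returns 5.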
commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500
Fixed a bug in axpyv/axpym when alpha is unit.
Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
this bug.
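The distinction matters because axpyv computes y := y + alpha * x, so the alpha = 1 shortcut must accumulate into y rather than overwrite it. A minimal sketch of the corrected special-casing, using a hypothetical function name and unit-positive strides only, could look like this:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of y := y + alpha * x with special cases. */
void daxpyv_sketch(size_t n, double alpha,
                   const double *x, size_t incx,
                   double *y, size_t incy)
{
    assert(n == 0 || (x != NULL && y != NULL));

    if (alpha == 0.0)
        return; /* y is unchanged */

    if (alpha == 1.0) {
        /* Simplifies to an add (y := y + x), NOT a copy (y := x). */
        for (size_t i = 0; i < n; ++i)
            y[i * incy] += x[i * incx];
        return;
    }

    /* General case. */
    for (size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
```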
commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500
Moved _POSIX_C_SOURCE def to compiler cmd line.
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
the compiler command line arguments in make_defs.mk (for both configs).
Thanks to Devin Matthews for suggesting this change.
commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500
Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
would not conflict with anything defined by the user (or the language).
Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500
Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
to names that can be constructed via the PASTEMAC macro. These aliased
macro definitions live inside bli_kernel_macro_defs.h, which is now
#included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
(found in macro-kernel header files).
commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500
Updated CREDITS file.
commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500
Reverted testsuite object files' home to 'obj'.
Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick to tracking
empty directories in git.
commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500
Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500
Applied fix from prev commit to gemmtrsm_?_ref_4x4
Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
bli_gemmtrsm_u_ref_4x4.c.
commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500
Fixed a performance bug in trsm.
Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
(bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
reference gemm microkernel was hard-coded, and thus always called, even
when GEMM_UKERNEL was defined to point to an optimized microkernel. This
manifested as artificially low trsm performance for all problem sizes, but
especially for small problem sizes as it only affected blocks of A that
intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
find this bug.
commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500
Generate testsuite objects in 'src'.
Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
rather than 'obj', since (a) the top-level .gitignore dictates that
obj directories are to be ignored, and (b) since git has problems
tracking empty directories. Now, users do not need to create their own
obj directories within their own local clones of BLIS.
commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500
Minor formatting changes.
commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500
Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
value of the pack schema bits in the info field of obj_t, rather than
comparing the obj_t buffer with that of the mem_t entry. This was the cause
of a very low probability bug whereby uninitialized memory caused the macro
to evaluate to TRUE even though the object in question was not packed.
Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500
Updated information in testsuite output header.
Details:
- Added to the information that is echoed at the beginning of the test suite's
output, and also re-labeled some existing information.
commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500
Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
only manifests when BLIS is configured such that MR != NR. The bug involves
incorrectly detecting edge cases, which resulted in some parts of matrix C
potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
8 and 4, respectively, so that I can better stress the framework on a
day-to-day basis. (The fact that they were both equal to 4 for so long is
why I did not stumble upon this bug much sooner.)
commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500
Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
contain explicit loops over MR and NR, thus allowing them to be used
unmodified by developers who want to build a reference library with
custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
single-char macros are also defined.
commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500
Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
used for.
commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500
More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
contiguous memory alignment (rather than the system/malloc alignment).
commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500
Always define memory alignment size cpp constant.
Details:
- Removed guard around #define for memory alignment size constant.
Memory alignment should always be enabled, and so this value should
always be defined.
commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500
Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
BLIS_MEMORY_ALIGNMENT_SIZE.
commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500
Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
subsequent packed panel of A and B begins at an aligned address. (The
first panel is presumably aligned to system alignment because it is
aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
is used to allocate enough space for each block no matter what kind of
register blocking is used (even if register blocksize is unit and every
row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
and KC*NC result in identical footprints for all datatypes.
commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500
CHANGELOG update.
commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (tag: 0.0.5)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500
Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500
Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
which includes alternate/half-baked/broken implementations.
commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500
Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
header files throughout the framework. Given that these low-level headers
are included within blis2.h in a very specific order, #include'ing blis2.h
within them directly is unnecessary.
commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500
Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
This is a temporary hack to allow interoperability with libflame. (Similarly
temporary changes are being made to libflame's type definitions file.)
commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500
Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
(e.g. "prefetch" instructions, which are different than the particular
kind of prefetching/preloading referred to by this constant).
commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500
Removed build/old directory.
commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500
Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500
Added missing conjbeta argument to scald.
commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500
Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
m_packed and n_packed fields to obj_t. These new fields track the same as
the old ones. From an abstraction standpoint, it seemed awkward to store
those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
respectively.
- Updated packm variants to access the packed length and width fields from
their new locations.
commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500
CHANGELOG update.
commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500
Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
allocator instantiates and initializes three separate memory pool objects,
each one associated with a separate array of contiguous memory blocks, each
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
objects use a stack structure internally to track which blocks in the region
have been "checked out" to a thread and which are still available. Critical
regions are now clearly marked and adaptable to parallel environments (e.g.
OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
of packed buffer is being allocated. The enumerated type for this argument
is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
a dimension to the system alignment, since that value has to do with
starting addresses, whereas the values we are dealing with are unitless
dimensions.
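As an illustration of the stack-based pool described in this entry, here is a minimal sketch of one fixed-size block pool. It is not the code in bl2_mem.c: the names pool_sketch_t, pool_sketch_init(), pool_sketch_checkout(), pool_sketch_release(), and the POOL_NUM_BLOCKS constant are all hypothetical, and the critical regions are only marked by comments.

```c
#include <stddef.h>

#define POOL_NUM_BLOCKS 4 /* assumed pool size, for illustration only */

/* Hypothetical pool object: a stack of pointers to available blocks. */
typedef struct {
    void *block_ptrs[POOL_NUM_BLOCKS];
    int   top; /* index of the top available block; -1 when exhausted */
} pool_sketch_t;

/* Carve a contiguous region into uniform blocks and push them all. */
void pool_sketch_init(pool_sketch_t *pool, char *region, size_t block_size)
{
    for (int i = 0; i < POOL_NUM_BLOCKS; ++i)
        pool->block_ptrs[i] = region + (size_t)i * block_size;
    pool->top = POOL_NUM_BLOCKS - 1;
}

/* Pop a block ("check out"). Returns NULL if the pool is exhausted. */
void *pool_sketch_checkout(pool_sketch_t *pool)
{
    /* A threaded build would begin a critical region here. */
    if (pool->top < 0)
        return NULL;
    return pool->block_ptrs[pool->top--];
}

/* Push a previously checked-out block back onto the stack. */
void pool_sketch_release(pool_sketch_t *pool, void *block)
{
    /* A threaded build would begin a critical region here. */
    pool->block_ptrs[++pool->top] = block;
}
```

In this sketch, one such pool would exist per buffer kind (blocks of A, panels of B, panels of C), each with its own block size.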
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500
Perform her2k var1 loops in sequence.
Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
and accumulated in sequence rather than fused into one loop. This is
necessary if BLIS is to be configured to provide only enough contiguous
memory for one panel of B.
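The sequential form described in this entry can be sketched as two back-to-back rank-k updates, with beta applied only by the first. This is a simplified illustration, not the BLIS variant: it uses the real domain (so it resembles syr2k rather than her2k), updates the full matrix instead of one triangle, and the names rankk_sketch() and r2k_in_sequence_sketch() are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helper: C := beta*C + alpha * X * Y^T,
   with X and Y n-by-k and C n-by-n, all stored row-major. */
static void rankk_sketch(size_t n, size_t k, double alpha,
                         const double *x, const double *y,
                         double beta, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j < n; ++j) {
            double dot = 0.0;
            for (size_t p = 0; p < k; ++p)
                dot += x[i * k + p] * y[j * k + p];
            c[i * n + j] = beta * c[i * n + j] + alpha * dot;
        }
    }
}

/* C := beta*C + alpha*A*B^T + alpha*B*A^T, computed as two products
   in sequence so that only one packed panel is needed at a time. */
void r2k_in_sequence_sketch(size_t n, size_t k, double alpha,
                            const double *a, const double *b,
                            double beta, double *c)
{
    rankk_sketch(n, k, alpha, a, b, beta, c); /* beta is applied once */
    rankk_sketch(n, k, alpha, b, a, 1.0, c);  /* accumulate only */
}
```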
commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600
Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
opposed to the currently used) dimensions of the memory region. This
allows packm_init() to be more robust in situations where memory is
already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
in packm_init(), to update the "currently used" dimensions of the mem_t
object if the requested dimensions are smaller than the allocated
dimensions.
commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600
Fixed test suite flop formulas for ops with side.
Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
trmm3, and trsm.
- Comment updates in herk macro-kernels.
commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600
Added "version" to .gitignore.
Details:
- Added "version" to .gitignore file so that the file does not show up when
running 'git status', or accidentally get pulled into the index when
running 'git add' or 'git add --all'.
commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600
Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
when trying to pull new commits.
commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600
Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
updated as part of the macro. All current uses of the macro have been
coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600
Replaced banded/packed BLAS2 stubs with f2c code.
Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
implemented" message. This includes all of the level-2 banded and packed
routines.
- Replaced the aforementioned wrappers with the corresponding netlib
implementations, run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600
Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
configuration directory (bl2_config.h, specifically) given that it can be
expected to be tweaked by some developers.
commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600
Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
includes virtually all of the BLAS, including banded and packed level-2
operations.
- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
initialization, which stores the "exit status" in an err_t, which is then
read by the latter function to determine whether finalization should actually
take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
it automatically initializes itself (via bl2_init_safe()), until/unless the
application code explicitly calls bl2_finalize().
- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types in bl2_type_defs.h for use with BLAS
wrappers.
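The conditional initialization pattern described in this entry might look roughly like the following sketch. It is an assumption-laden illustration rather than the BLIS implementation: the names lib_init_safe(), lib_finalize_safe(), init_status_sketch_t, and the global flag are hypothetical stand-ins for bl2_init_safe(), bl2_finalize_safe(), and err_t.

```c
/* Hypothetical status type standing in for err_t. */
typedef enum {
    INIT_PERFORMED_SKETCH,   /* this call initialized the library */
    INIT_NOT_NEEDED_SKETCH   /* the library was already initialized */
} init_status_sketch_t;

static int lib_is_initialized = 0;

void lib_init(void)     { lib_is_initialized = 1; }
void lib_finalize(void) { lib_is_initialized = 0; }
int  lib_initialized(void) { return lib_is_initialized; }

/* Initialize only if needed, and record whether we did so. */
void lib_init_safe(init_status_sketch_t *status)
{
    if (lib_is_initialized) {
        *status = INIT_NOT_NEEDED_SKETCH;
    } else {
        lib_init();
        *status = INIT_PERFORMED_SKETCH;
    }
}

/* Finalize only if the matching lib_init_safe() did the initialization. */
void lib_finalize_safe(init_status_sketch_t status)
{
    if (status == INIT_PERFORMED_SKETCH)
        lib_finalize();
}
```

A wrapper would call lib_init_safe() on entry and lib_finalize_safe() on exit, so that an application that already initialized the library explicitly is left undisturbed.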
commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600
Updated version file. (Forgot to in prev commit).
commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600
Fixed some scalar types in BLAS-like Herm APIs.
Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
alpha and beta in herk, and beta in her2k, need to be real. These
arguments were typed incorrectly as the complex types. This has been
fixed. Note the issue was only present in the BLAS-like APIs for
these operations (not the native object-based interfaces).
commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600
Updated version file. (Forgot to in prev commit).
commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600
Changed API of packm_init_pack() to use blksz_t.
Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
are passed in as type blksz_t* instead of dim_t.
- Made a similar change for packv_init_pack().
commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600
Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
in only the strictly lower or upper triangle of the matrix being modified.
commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600
Updated beta == zero semantics of mulsc.
Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
operation that needed updating.
- Added Devin to CREDITS file.
commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600
Removed some calls to setv() in test modules.
Details:
- Removed calls to setv() in test modules whose sole purpose was to
initialize vectors to zero to ensure that nan's and inf's would not
taint the computation. Now that beta == zero semantics have been
updated to clear the output operand (when beta is zero), rather than
multiply against it, these setv() calls are no longer needed.
commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600
Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
respectively.
- Added code to the following operations that sets the output operand to
zero if the corresponding scalar is zero (rather than performing the
floating-point multiply, or in the case of setv, copying the value).
This will prevent nan's and inf's from creeping into results from
uninitialized memory.
- axpy
- dotxv
- scalv
- scal2v
- setv
- gemv
- ger
- hemv
- her
- her2
- gemm reference ukernels
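To illustrate why the zero case is handled separately, here is a minimal sketch of a vector scale that overwrites the output when the scalar is zero. The function name scalv_sketch() is hypothetical and the code is not taken from BLIS. Under IEEE arithmetic, 0.0 * NaN and 0.0 * Inf both yield NaN, so multiplying would let garbage from uninitialized memory survive.

```c
#include <stddef.h>

/* Hypothetical helper: y := beta * y for a length-n vector. */
void scalv_sketch(size_t n, double beta, double *y)
{
    if (beta == 0.0) {
        /* Overwrite instead of multiplying, so that any NaN or Inf
           already present in y cannot propagate into the result. */
        for (size_t i = 0; i < n; ++i)
            y[i] = 0.0;
        return;
    }

    for (size_t i = 0; i < n; ++i)
        y[i] *= beta;
}
```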
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600
Fixed stale interface to packm_unb_var1().
Details:
- Removed the control tree from the interface to packm_unb_var1(), which
I meant to do when it was un-deprecated.
commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600
Un-deprecated packm_unb_var1.c (needed by l2 ops).
Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
operations still need this routine for packing matrices. Now, whether
level-2 operations should be packing matrices to begin with is another
matter. But this fixes the segmentation fault one would have gotten when
running bl2_gemv() on a general stride matrix.
commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600
Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
- invert_diag
- pack_order_if_upper
- pack_order_if_lower
These fields allow packm_init() to embed information that begins
in the control tree into the object so that the packm implementation
does not need to use control trees at all. This is being done to aid
Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
These were part of prototype implementations and are no longer needed.
commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600
Replaced bl2_abs() with _fabs() where appropriate.
commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600
Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
more general-purpose getris.
commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600
Restored executable permissions to scripts.
Details:
- Restored executable (0755) permissions to scripts that were touched by
the recursive sed script that updated the copyright headers in the
previous commit.
commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600
Updated copyright headers from 2012 to 2013.
commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600
CHANGELOG update.
commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600
Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.
- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
header files. Now, instead, DUPB is computed as (NDUP != 1) within each
macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
By encoding both pieces of information into one constant in _kernel.h,
it seems somewhat less likely others will encounter this bug in the
future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
and defined blocksizes in _cntl.c files to these default values.
- Changed semantics of her2k and syr2k such that these operations no longer
expect the B matrix to already be conjugate-transposed (or just transposed
for syr2k). However, these semantics are preserved for the internal
mechanics of the implementations, including the internal back-end and all
blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
respectively.
- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
replaced by selecting either the real part or both parts within the unblocked
algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
verify that alpha has an imaginary component equal to zero (for her, but
not syr).
- Changed control tree for her to forgo packing.
- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
(two-operand add and subtract).
- Added new scalar macros:
- getris: acquire real and imaginary components.
- setris: set real and imaginary components.
- addjs: addition with conjugated x.
- subjs: subtraction with conjugated x.
- Defined new utility operations:
- absumv: element-wise sum of absolute values for vector elements.
- absumm: element-wise sum of absolute values for matrix elements.
- mkherm: convert existing matrix to Hermitian.
- mksymm: convert existing matrix to symmetric.
- mktrim: convert existing matrix to triangular.
- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.
- Fixed a bug in the her2k macro-kernels (which currently are simply
implemented in terms of two invocations of herk) whereby beta was being
applied to both the first and second rank-k updates, rather than only
the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
properly implemented due to erroneous assumptions regarding aliasing and
root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
value was typecast to float before inversion. This affected non-unit cases
of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
now mimics the rank-k strategy of gemm, whereby alpha is applied during
the first iteration of variant 3, with BLIS_ONE passed in instead for
subsequent iterations. This also required passing alpha into the macro-
kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
called for blocks strictly above the diagonal. While this sounds good in
theory, this cannot be done because gemm_ker_var2 expects row panels of
A to be packed from top to bottom, while for trsm_u, A is actually packed
from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
dimensions were mishandled due to incorrect arguments to the copyv kernel.
Also changed the copyv kernel invocation to scal2v so that these edge
cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
a potential future bug whereby a mem_t object that is actually no longer
"allocated" from the static pool is mistaken for being allocated due to
failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
toggled when the requested subpartition needed to be "reflected" due to it
residing in an unstored region.
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600
Added missing 'd' to fused gemmtrsm function name.
commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600
Added debug statements to bl2_mm_acquire_m().
Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
configuration's bl2_kernel.h
commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600
Defined Frobenius norm operations.
Details:
- Added level-0 grabis macro operation to grab imaginary component of one
variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
of the elements of a vector. This implementation is modeled after ?lassq
in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
vectors and matrices, respectively. These operations are treated as one-
operand operations where the output norm value is the real projection of
the datatype of the input operand. Both operations are implemented in terms
of sumsqv.
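For readers unfamiliar with ?lassq, the scaled sum-of-squares update it uses can be sketched as follows for the real double-precision case. This is an illustration of the classic technique, not the BLIS sumsqv code, and the name sumsqv_sketch() is hypothetical. The routine maintains scale and sumsq such that scale^2 * sumsq equals the sum of squares seen so far, which avoids overflow and underflow when squaring large or small elements. Callers are assumed to start with scale = 0 and sumsq = 1.

```c
#include <stddef.h>

/* Hypothetical helper: fold the n elements of x into (scale, sumsq),
   preserving the invariant scale^2 * sumsq == sum of squares. */
void sumsqv_sketch(size_t n, const double *x, double *scale, double *sumsq)
{
    for (size_t i = 0; i < n; ++i) {
        double abs_xi = (x[i] < 0.0) ? -x[i] : x[i];

        if (abs_xi == 0.0)
            continue; /* zeros contribute nothing */

        if (*scale < abs_xi) {
            /* New largest magnitude: rescale the running sum to it. */
            double ratio = *scale / abs_xi;
            *sumsq = 1.0 + *sumsq * ratio * ratio;
            *scale = abs_xi;
        } else {
            double ratio = abs_xi / *scale;
            *sumsq += ratio * ratio;
        }
    }
}
```

A Frobenius norm would then be obtained as scale * sqrt(sumsq).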
commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600
Added GENT*R macros; tweaked bl2_machval defs.
Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
GENTPROTR, which are one-operand macros with auxiliary real projection
types.
- Tweaked bl2_machval files to use new macros.
commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600
Fixed harmless macro bug in level-1m operations.
Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
despite the bug, which is why I had not discovered it until now.
commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600
Renamed x86,x86_64 kernels to indicate 'd' fusing.
Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
to emphasize that the fusing shape is not for all datatype instances, but
rather just for one (that of double-precision real). Other fusing shapes
would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.
commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600
More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
accomplishes the same thing (enabling posix_memalign()) without enabling
all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
as well as two new constants that determine how many MCxKC blocks and
how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
bl2_abort() with a specific error code call.
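As a sketch of the feature-test macro change described in this entry: on POSIX systems, defining _POSIX_C_SOURCE to 200112L or later before any system header is included is enough to expose posix_memalign(). The wrapper name aligned_alloc_sketch() is hypothetical, and the #ifndef guard is an assumption added here so the snippet tolerates a build that already defines the macro.

```c
/* Must appear before any system header is included. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdlib.h>
#include <stdint.h>

/* Hypothetical wrapper: returns size bytes aligned to alignment,
   or NULL on failure. alignment must be a power of two and a
   multiple of sizeof(void *). Release the result with free(). */
void *aligned_alloc_sketch(size_t alignment, size_t size)
{
    void *p = NULL;

    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;

    return p;
}
```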
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600
Minor reordering of bl2_config.h definitions.
commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600
Consolidated configuration headers.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.
commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600
Fixed bug in reference gemm ukernels.
Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
was not correctly accumulated and scaled (by alpha) into the output matrix
C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.
commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600
Expanded reference packm/unpackm kernel set to 16.
Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600
Minor updates towards 0.0.1.
commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600
Tweaks to get BLIS compiling again on clarksville.
Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.
commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600
Added template fragment.mk; updated .gitignore.
commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600
Added 'changelog' make target; other tweaks.
Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
overwrites CHANGELOG with the output.
- Other trivial changes.
commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600
Define static memory pool size in bl2_config.h.
commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600
Refined INSTALL text; added 'showconfig' target.
Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.
commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600
Added auto-detection of version string (via git).
Details:
- Added build/update-version-file.sh script for auto-detecting "version"
string and updating 'version' file accordingly. (If .git directory is
not present, then it is assumed this copy of BLIS is a downloaded
release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.
commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600
Wrote first draft of INSTALL file.
commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600
Updated standalone test Makefile and other fixes.
Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
- frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
- frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c
commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600
Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
(by the user in bl2_kernels.h).
commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600
Initial commit.