mirror of
https://github.com/amd/blis.git
synced 2026-05-11 09:39:59 +00:00
1904 lines
79 KiB
Plaintext
commit 0680916fdd532f7a4716b11a2515243b2c08d00f (HEAD, tag: 0.0.9, origin/master, origin/HEAD, master)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 18:04:34 2013 -0500

Added BLAS error checking to compatibility layer.

Details:
- Added frame/compat/check directory, which now houses companion _check()
  routines for each of the BLAS wrappers in frame/compat. These _check()
  routines are called from the compatibility wrappers and mimic the
  error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
  uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
  macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
  int64_t by default (and the prototypes that were present in the files
  used int).
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
  since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.

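The xerbla() change above can be pictured with a small C sketch. The helper names (op_to_upper, format_xerbla_msg) are hypothetical, the message text is only loosely modeled on netlib's XERBLA wording, and the real bla_xerbla.c may differ. The point is simply that the routine name is uppercased (so "dgemm" is reported as "DGEMM") before the message is formatted.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: copy op into buf translated to uppercase,
   truncating if buf is too small. */
static void op_to_upper(const char* op, char* buf, size_t buf_len)
{
    size_t i = 0;
    if (buf_len == 0) return;
    for (; op[i] != '\0' && i + 1 < buf_len; ++i)
        buf[i] = (char)toupper((unsigned char)op[i]);
    buf[i] = '\0';
}

/* Hypothetical xerbla-like formatter: uppercased routine name plus the
   1-based position of the offending argument. A real xerbla() would
   print this message rather than return it. */
static int format_xerbla_msg(const char* op, int info, char* msg, size_t msg_len)
{
    char name[16];
    op_to_upper(op, name, sizeof(name));
    return snprintf(msg, msg_len,
                    " ** On entry to %s parameter number %2d had an illegal value",
                    name, info);
}
```
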
commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 17:53:31 2013 -0500

Added support for C99 complex types/arithmetic.

Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
  complex arithmetic to the scalar-level macros in include/level0. This
  includes a somewhat substantial reorganization and re-layering of much
  of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
  commented-out by default, which optionally enables the use of built-in
  C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
  used (bli_proj_dt_to_real_if_imag_eq0).

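As a rough illustration of how one level0-style scalar macro can be "overloaded" to work on either a struct-based complex type or a built-in C99 complex type, here is a hypothetical sketch. ENABLE_C99_COMPLEX_SKETCH stands in for BLIS_ENABLE_C99_COMPLEX, and the type and macro names are invented for this example rather than taken from bli_type_defs.h or include/level0.

```c
/* Build with -DENABLE_C99_COMPLEX_SKETCH to use built-in C99 complex. */
#ifdef ENABLE_C99_COMPLEX_SKETCH
#include <complex.h>
typedef double _Complex dcomplex_sk;
#define ZREAL(z)        creal(z)
#define ZIMAG(z)        cimag(z)
#define ZSET(re, im, z) ((z) = (re) + (im) * I)
#define ZMULS(a, b, c)  ((c) = (a) * (b))     /* c := a * b */
#else
typedef struct { double real; double imag; } dcomplex_sk;
#define ZREAL(z)        ((z).real)
#define ZIMAG(z)        ((z).imag)
#define ZSET(re, im, z) do { (z).real = (re); (z).imag = (im); } while (0)
/* c := a * b; the temporaries make it safe for c to alias a or b. */
#define ZMULS(a, b, c) \
    do { \
        double zr_ = ZREAL(a) * ZREAL(b) - ZIMAG(a) * ZIMAG(b); \
        double zi_ = ZREAL(a) * ZIMAG(b) + ZIMAG(a) * ZREAL(b); \
        ZSET(zr_, zi_, c); \
    } while (0)
#endif
```

Callers write ZMULS(a, b, c) either way; only the configuration switch decides whether the compiler's complex arithmetic or the explicit real/imaginary formulas are used.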
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 17 12:27:45 2013 -0500

Fixed bugs in trsm, trmm macro-kernels.

Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
  incorrectly being adjusted upward by MR, instead of NR. The rl and ru
  trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
  diagoffb when recomputing k to skip a zero region below where the
  diagonal intersects the right side of the block. The corresponding
  trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
  needed to be placed AFTER the block that recomputes k to skip the zero
  region (if present). The other three trsm macro-kernels, as well as the
  trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
  being updated to skip a zero region to the left of where the diagonal
  of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.

commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 14:53:59 2013 -0500

Added f2c'ed Givens rotation wrappers.

Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
  along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
  rotm, and rotmg operations.

commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:40:12 2013 -0500

Removed copynz defs from bli_kernel.h files.

Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
  configuration. (Meant to include this in previous commit.)

commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:33:30 2013 -0500

Removed copynzv, copynzm and related codes.

Details:
- Removed copynzv and copynzm operation directories. These operations
  implemented a variation of copyv/m that, in the case of real source
  and complex destination operands, leaves the imaginary component
  untouched (rather than setting it to zero). I realize now that the
  special case(s) (e.g. gemm with real A and B but complex C) that I
  thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.

commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 9 17:15:38 2013 -0500

Added handling of restrict, stdint.h for non-C99.

Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
  in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
  manually typedefs the types we need (which, for now, are unconditionally
  int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
  nothing for C++ and non-C99.

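A minimal sketch of the two preprocessor blocks described above, assuming __STDC_VERSION__ is how "C99" is detected (the actual BLIS conditions may differ). The fallback typedefs assume an LP64 platform where long is 64 bits, and copyv_sketch() is a hypothetical function that only shows restrict in use.

```c
/* <stdint.h> for C++ and C99; hand-written typedefs otherwise. */
#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #include <stdint.h>
#else
  /* Pre-C99 fallback; assumes an LP64 platform (long is 64 bits). */
  typedef signed long   int64_t;
  typedef unsigned long uint64_t;
#endif

/* restrict is a C99 keyword, so define it away for C++ and C89. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || \
    __STDC_VERSION__ < 199901L
  #define restrict
#endif

/* Hypothetical example: restrict promises that x and y do not overlap,
   which lets the compiler vectorize the copy more aggressively. */
static void copyv_sketch(int64_t n, const double* restrict x, double* restrict y)
{
    int64_t i;   /* declared up front so the C89 path also compiles */
    for (i = 0; i < n; ++i)
        y[i] = x[i];
}
```
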
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 15:20:34 2013 -0500

Migrated integer usage to stdint.h types.

Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
  inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
  integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Defined f77_int as int64_t if a new configuration variable,
  BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
  These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
  in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
  of char.
- Updated bla_amax() wrappers so that the return type is defined directly
  as f77_int, rather than letting the prototype-generating macro decide
  the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
  so I removed them. Also, changed the body of the wrapper so that a
  gint_t is passed into abmaxv, which is THEN typecast to an f77_int
  before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
  doublecomplex types so that they use .real and .imag instead (now that
  we are using scomplex and dcomplex).

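The following hypothetical sketch shows the f77_int selection and the amax wrapper pattern: the internal routine writes a gint_t index, which is cast to f77_int only when the wrapper returns. BLIS_ENABLE_BLIS2BLAS_INT64, f77_int, and gint_t are names taken from the entry above; amaxv_sketch() and idamax_sketch() are invented for illustration. Returning a 1-based index (and 0 for n < 1 or incx <= 0) follows the netlib i?amax convention; whether the BLIS wrapper applies the offset in exactly this place is an assumption.

```c
#include <math.h>
#include <stdint.h>

#ifdef BLIS_ENABLE_BLIS2BLAS_INT64
typedef int64_t f77_int;
#else
typedef int32_t f77_int;
#endif
typedef int64_t gint_t;   /* general signed integer used internally */

/* Hypothetical internal routine: 0-based index of the first element with
   the largest absolute value. */
static void amaxv_sketch(gint_t n, const double* x, gint_t incx, gint_t* index)
{
    gint_t i, best = 0;
    double best_abs = -1.0;
    for (i = 0; i < n; ++i) {
        double a = fabs(x[i * incx]);
        if (a > best_abs) { best_abs = a; best = i; }
    }
    *index = best;
}

/* Hypothetical BLAS-style wrapper: compute in gint_t, then cast to
   f77_int (and convert to 1-based) only on return. */
static f77_int idamax_sketch(const f77_int* n, const double* x, const f77_int* incx)
{
    gint_t index;
    if (*n < 1 || *incx <= 0) return 0;
    amaxv_sketch((gint_t)*n, x, (gint_t)*incx, &index);
    return (f77_int)(index + 1);
}
```

Keeping the internal index in gint_t means the core routine never depends on which Fortran integer width was configured.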
commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 11:24:18 2013 -0500

Added experimental bli_gemm_ker_var5().

Details:
- Added support for an experimental gemm macro-kernel that incrementally
  packs one micro-panel of B at a time. This is useful for certain
  special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
  do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.

commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 7 13:28:39 2013 -0500

Defined "total" blocksize query functions.

Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
  the default blocksize plus blocksize extension (using the type or the type
  of an object).
- Comment update in bli_packm_cxk.c.

commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 27 13:19:56 2013 -0500

Consolidated lower/upper her[2]k blocked variants.

Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
  renamed the resulting variants, according to the same changes recently
  made to trmm and trsm.
- Implemented support for four new subpartition types:
    BLIS_SUBPART1T
    BLIS_SUBPART1B
    BLIS_SUBPART1L
    BLIS_SUBPART1R
  which correspond to "merged" partitions that include the middle "1"
  partition as well as either the neighboring "0" or "2" partition. This is
  used to clean up code in herk/her2k var2 that attempts to partition away
  the strictly zero region above or below the diagonal of a matrix operand
  that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
  zero region in the panel of C that is passed in. This is now needed given
  that herk/her2k var1 no longer partitions off this zero region before
  calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.

commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 24 17:08:14 2013 -0500

Added row-storage optimizations for trmm, trsm.

Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
  side case is now handled explicitly, rather than induced indirectly by
  transposing and swapping strides on operands. This allows us to walk through
  the output matrix with favorable access patterns no matter how it is stored,
  for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
  lower/upper distinction. Instead, we simply label the variants by which
  dimension is partitioned and whether the variant marches forwards or
  backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
  (as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
  blocksize extensions (if non-zero) were not being used to appropriately size
  the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
  whole multiples of MR AND NR. This is needed for the case of trsm_r where,
  in order to reuse existing left-side gemmtrsm fused micro-kernels, the
  packing of A (left-hand operand) and B (right-hand operand) is done with
  NR and MR, respectively (instead of MR and NR).

commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 13 11:14:21 2013 -0500

Minor generalizing tweaks to trmm blk var1, var2.

commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:40:04 2013 -0500

CHANGELOG update.

commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50 (tag: 0.0.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:02:12 2013 -0500

Use separate CFLAGS for "kernels" directories.

Details:
- Added a new "special" directory type: any source code within directories
  named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
  compiler flags. This allows the developer to specify a separate set of
  flags (e.g. optimization flags) for compiling kernels while maintaining a
  standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
  to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
  according to above changes.

commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 11 12:18:39 2013 -0500

Various level-3 optimizations for row storage.

Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
  packing from a lower or upper-stored symmetric/Hermitian matrix to column
  panels (which are row-stored). Previously one could only pack to row panels
  (which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
  favorable access through row-stored matrices for gemm, hemm, herk, her2k,
  symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
  execution datatypes.

commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 7 11:04:10 2013 -0500

Added beta == 0 optimization to x86_64 ukernel.

Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
  from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
  schemes?" switch is disabled, which would result in redundant tests being
  executed for matrix-only (e.g. level-1m, level-3) operations if multiple
  vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.

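A scalar C sketch of the beta == 0 rule, assuming a column-major m x n product ab has already been computed. update_c_sketch() is hypothetical and is not the x86_64 micro-kernel itself. Skipping the load of C also means stale NaN or Inf values in C cannot leak into the result, since 0 * NaN is NaN under IEEE arithmetic.

```c
/* Hypothetical update C := beta * C + ab for an m x n block of C with
   row stride rs_c and column stride cs_c; ab is column-major with leading
   dimension m. */
static void update_c_sketch(int m, int n, const double* ab, double beta,
                            double* c, int rs_c, int cs_c)
{
    int i, j;
    if (beta == 0.0) {
        /* Overwrite only: C is never loaded or scaled. */
        for (j = 0; j < n; ++j)
            for (i = 0; i < m; ++i)
                c[i * rs_c + j * cs_c] = ab[i + j * m];
    } else {
        for (j = 0; j < n; ++j)
            for (i = 0; i < m; ++i)
                c[i * rs_c + j * cs_c] =
                    beta * c[i * rs_c + j * cs_c] + ab[i + j * m];
    }
}
```
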
commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 6 13:36:06 2013 -0500

Whitespace changes to old test drivers.

Details:
- Replaced tabs with four spaces in places where indentation was already
  in place.

commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 4 14:57:46 2013 -0500

Fixed unaligned handling in axpyf, dotxaxpyf.

Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
  implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
  intrinsic implementation of dotxaxpyf kernel.

commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 16:54:52 2013 -0500

Updated level-1/-1f [vector intrinsic] kernels.

Details:
- Updated level-1/-1f kernels so that non-unit and unaligned cases are
  handled by the reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.

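The fallback behavior described above (non-unit or unaligned operands go to a reference loop instead of aborting) might look like the following. offset_from_alignment() is a hypothetical function analogue of bli_offset_from_alignment(), the 16-byte alignment requirement is an assumption, and the "fast" loop is a plain-C stand-in for the vector intrinsic kernel.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical analogue of bli_offset_from_alignment(): number of bytes
   by which p sits past the previous multiple of align. */
static size_t offset_from_alignment(const void* p, size_t align)
{
    return (size_t)((uintptr_t)p % align);
}

/* Reference loop: works for any positive increments and any alignment. */
static void axpyv_ref(int n, double alpha, const double* x, int incx,
                      double* y, int incy)
{
    int i;
    for (i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}

/* Dispatch sketch: only unit-stride, 16-byte-aligned operands take the
   "fast" path; everything else falls back to the reference loop.
   Returns 1 if the fast path was used, 0 otherwise. */
static int axpyv_dispatch(int n, double alpha, const double* x, int incx,
                          double* y, int incy)
{
    const size_t align = 16;
    int i;
    if (incx != 1 || incy != 1 ||
        offset_from_alignment(x, align) != 0 ||
        offset_from_alignment(y, align) != 0) {
        axpyv_ref(n, alpha, x, incx, y, incy);
        return 0;
    }
    for (i = 0; i < n; ++i)   /* stand-in for the SIMD kernel */
        y[i] += alpha * x[i];
    return 1;
}
```
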
commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 1 08:02:23 2013 -0500

Updated ukernels for x86_64.

Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
  recently).

commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 6 11:05:08 2013 -0500

Replaced axpys usage with subs in trsv.

Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
  sizeof(dcomplex).

commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 24 16:28:10 2013 -0500

Fixed x86_64 kernel bugs and other minor issues.

Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby, under certain
  conditions (e.g. dotv_opt_var1), an unexpected increment was returned:
  the code expected 1 (for purposes of performing contiguous vector loads)
  but got a value greater than 1 because the column stride of the object
  (e.g. rho) was inflated for alignment purposes (albeit unnecessarily,
  since there is only one element in the object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).

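A hypothetical sketch of the adjusted increment query. obj_sketch_t and obj_vector_inc_sketch() are invented, and only the 1 x 1 special case comes from the entry above; the row/column vector cases assume the usual convention that a row vector is traversed with the column stride and a column vector with the row stride.

```c
/* Hypothetical minimal object: dimensions and strides only. */
typedef struct {
    long m, n;     /* length and width */
    long rs, cs;   /* row and column strides */
} obj_sketch_t;

/* Increment for walking a vector object. A 1 x 1 object reports 1 no
   matter how its strides were padded for alignment. */
static long obj_vector_inc_sketch(const obj_sketch_t* x)
{
    if (x->m == 1 && x->n == 1) return 1;   /* single element */
    if (x->m == 1) return x->cs;            /* row vector */
    return x->rs;                           /* column vector */
}
```
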
commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:35:32 2013 -0500

Renamed _trans_status() macro.

Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
  previous commit.

commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:24:58 2013 -0500

Renamed _set_trans(), _trans_status() macros.

Details:
- Renamed the following macros:
    bli_obj_set_trans() -> bli_obj_set_onlytrans()
    bli_obj_trans_status() -> bli_obj_onlytrans_status()
  to remove ambiguity as to which bits are read/updated.

commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:06:30 2013 -0500

Unconditionally check memory pool(s) for errors.

Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
  memory pool is exhausted before checking out and returning a block, even
  if BLIS error checking has been disabled. These errors are useful because
  they likely indicate that BLIS was improperly configured for the code
  being run.

commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:00:30 2013 -0500

CHANGELOG update.

commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d (tag: 0.0.7)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500

Absorbed blocksize extensions into main objects.

Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
  fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
  happens if and only if cache blocksizes are created with non-zero
  extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
  blocksize extension mechanism is now available for use.

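One plausible reading of the blocksize extension, sketched in C for the forward direction only: if the remainder of a dimension fits within the default blocksize plus its extension, it is taken as a single block instead of leaving a small fringe block. blksz_sketch_t, blksz_total(), and determine_blocksize_f_sketch() are hypothetical. With a zero extension the function reduces to min(b_alg, remaining), which matches the "if and only if non-zero extensions" rule above.

```c
/* Hypothetical cache blocksize with an edge-case extension. */
typedef struct {
    long b_alg;   /* default blocksize */
    long b_ext;   /* extension; 0 disables the edge-case optimization */
} blksz_sketch_t;

/* Default blocksize plus extension (cf. the "total" queries). */
static long blksz_total(const blksz_sketch_t* b)
{
    return b->b_alg + b->b_ext;
}

/* Size of the block that starts at offset i of a dimension of length
   dim, when marching forward. */
static long determine_blocksize_f_sketch(long i, long dim, const blksz_sketch_t* b)
{
    long left = dim - i;
    if (left <= blksz_total(b)) return left;   /* absorb the fringe */
    return b->b_alg;
}
```

For example, with b_alg = 128 and b_ext = 32, a dimension of 150 is handled as one block of 150; with b_ext = 0 it becomes blocks of 128 and 22.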
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500

Added option to disable err checking in testsuite.

Details:
- Added a new line to input.general that allows one to specify the error-
  checking level to use for each BLIS experiment. The only two levels
  supported for now are "no error checking" and "full error checking".

commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500

Use cntl trees that block in n dimension.

Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
  algorithms that first partition in the n dimension with a blocksize
  of NC. Typically this is not an issue since only very large problems
  exceed NC. But developers often run very large problems, and
  so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
  bli_param_macro_defs.h.

commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500

Use PASTEMAC in macro-kernels (over MAC2 or MAC3).

Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
  and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
  accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.

commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500

Fixed computation of b_next in L3 macro-kernels.

Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
  and trsm, in that the edge cases are captured by the main loop, rather
  than trying to have "cleanup" sections that result in four distinct
  parts (interior, bottom edge, right edge, bottom-right edge) of the
  code.
- Fixed the way b_next was being computed in the non-gemm level-3
  macro-kernels (herk, trmm, trsm). The way they are computed now matches
  that of gemm.

commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500

Fixed minor bug in computing b_next in gemm.

commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500

Fixed rare edge case bug in herk_l macro-kernel.

Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
  chosen to be much larger than NR, then one could encounter edge cases
  in the MC dimension that fall entirely below the diagonal, which
  the previous implementation of the herk_l macro-kernel was not allowing
  for.

commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500

Updated x86 gemmtrsm ukernels to use alpha.

commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500

Added a_next, b_next arguments to micro-kernels.

Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.

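A minimal reference-style micro-kernel showing where a_next and b_next fit. The function name, argument order, the 2 x 2 register blocksizes, and the packed layouts (A column-major with leading dimension MR, B row-major with leading dimension NR) are assumptions for illustration, not the BLIS micro-kernel API. __builtin_prefetch is a GCC/Clang builtin; on other compilers the hints are simply dropped.

```c
#define MR_SK 2
#define NR_SK 2

/* Hypothetical micro-kernel: C := beta * C + alpha * A * B, where A is a
   packed MR x k micro-panel and B a packed k x NR micro-panel.
   a_next/b_next are only prefetch hints and are never dereferenced. */
static void gemm_ukr_sketch(int k, double alpha,
                            const double* restrict a, const double* restrict b,
                            double beta, double* restrict c, int rs_c, int cs_c,
                            const double* a_next, const double* b_next)
{
    double ab[MR_SK * NR_SK] = { 0.0 };
    int i, j, p;

#if defined(__GNUC__)
    __builtin_prefetch(a_next);   /* warm the cache for the next panels */
    __builtin_prefetch(b_next);
#else
    (void)a_next; (void)b_next;   /* safely ignored */
#endif

    /* Accumulate the MR x NR product as a sum of k rank-1 updates. */
    for (p = 0; p < k; ++p)
        for (j = 0; j < NR_SK; ++j)
            for (i = 0; i < MR_SK; ++i)
                ab[i + j * MR_SK] += a[p * MR_SK + i] * b[p * NR_SK + j];

    for (j = 0; j < NR_SK; ++j)
        for (i = 0; i < MR_SK; ++i) {
            double* cij = &c[i * rs_c + j * cs_c];
            *cij = (beta == 0.0 ? 0.0 : beta * *cij) + alpha * ab[i + j * MR_SK];
        }
}
```
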
commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500

Added code for backward edge-case blocking.

Details:
- Edited bli_determine_blocksize_b() to include experimental (and
  currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.

commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500

Updated dupl implementation to use PACKNR and NR.

Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
  explicitly to navigate b1 so that situations where PACKNR > NR are
  supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
  frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.

commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500

Disabled blocksize checks for memory pools.

Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
  by the contiguous memory allocator for all types, given that the values for
  double precision real are the ones used to allocate the space. These checks
  can easily go awry in certain situations, especially if you are developing for
  only one datatype. So for now, they are probably more trouble than they are
  worth.

commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500

Allow ldim of packed micro-panels != MR, NR.

Details:
- Made substantial changes throughout the framework to decouple the leading
  dimension (row or column stride) used within each packed micro-panel from
  the corresponding register blocksize. It appears advantageous on some
  systems to use, for example, packed micro-panels of A where the column
  stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
  - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
    to use when packing micro-panels of A and B.
  - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
    where appropriate, instead of MR and NR.
  - Added pd field (panel dimension) to obj_t.
  - New interface to bli_packm_cntl_obj_create().
  - Renamed bli_obj_packed_length()/_width() macros to
    bli_obj_padded_length()/_width().
  - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
  - Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use: they will allow a
  larger blocksize to be used for edge cases, which can improve performance at
  the margins.

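To illustrate a packed micro-panel whose column stride PACKMR exceeds MR, here is a hypothetical packing routine: element (i, p) of the panel lives at ap[p * packmr + i], and rows m through packmr - 1 of every packed column are zero-filled. The function and its parameters are invented for this sketch and do not reflect BLIS's packm interfaces.

```c
#include <assert.h>

/* Hypothetical: pack an m x k block of A (column-major, leading dimension
   lda) into one micro-panel with column stride packmr >= mr >= m. */
static void pack_a_panel_sketch(int m, int k, const double* a, int lda,
                                int mr, int packmr, double* ap)
{
    int i, p;
    assert(m <= mr && mr <= packmr);
    for (p = 0; p < k; ++p) {
        for (i = 0; i < m; ++i)              /* copy the live rows */
            ap[p * packmr + i] = a[i + p * lda];
        for (i = m; i < packmr; ++i)         /* zero edge rows and padding */
            ap[p * packmr + i] = 0.0;
    }
}
```

A macro-kernel consuming this panel would then step between columns by packmr rather than mr.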
commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500

Fixed bug in compatibility layer (her2k/syr2k).

Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
  and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
  interface caller requests the [conjugate-]transpose case. Thanks to Bryan
  Marker for reporting the behavior that led to this bug.

commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500

Changed old level3 test drivers to call front-ends.

Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
  front-end object API instead of the internal back-end with the locally
  defined control tree.

commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500

Allow packm_init() to reacquire a too-small mem_t.

Details:
- Changed bli_packm_init() to react differently to a situation where a pack
  obj_t has an already-allocated mem_t entry that has a buffer that is smaller
  than what will be needed to hold the block/panel that now needs to be
  packed. Previously, this situation was treated with an abort() since I
  assumed something was horribly wrong. I have changed the code so that it now
  reacts by releasing the previous mem_t and re-acquiring a new mem_t with the
  new information. (This change was done at the request of Bryan Marker to
  facilitate code generation via DxT.)

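The new reaction to a too-small buffer, reduced to a hypothetical C sketch: mem_sketch_t and mem_ensure_sketch() stand in for the mem_t acquire/release machinery, with malloc()/free() in place of BLIS's memory pools.

```c
#include <stdlib.h>

/* Hypothetical memory descriptor: a buffer plus its capacity in bytes. */
typedef struct {
    void*  buf;
    size_t size;
} mem_sketch_t;

/* Make sure mem holds at least `needed` bytes. A too-small buffer is
   released and a new one acquired, instead of aborting.
   Returns 0 on success, -1 if the new allocation fails. */
static int mem_ensure_sketch(mem_sketch_t* mem, size_t needed)
{
    if (needed == 0 || (mem->buf != NULL && mem->size >= needed))
        return 0;                       /* existing buffer is big enough */
    free(mem->buf);                     /* release (free(NULL) is a no-op) */
    mem->buf  = malloc(needed);         /* re-acquire with the new size */
    mem->size = (mem->buf != NULL) ? needed : 0;
    return (mem->buf != NULL) ? 0 : -1;
}
```
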
commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500

Fixed bug in packing block of A for hemm/symm.

Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
  of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
  symmetric matrix where the block of A being packed intersects the diagonal,
  but some of its micro-panels do not intersect the diagonal and lie completely
  in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.

commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500

Activated bli_packm_acquire_mpart_t2b().

Details:
- Removed the overly-paranoid bli_abort() from the end of
  bli_packm_acquire_mpart_t2b(), to allow others to experiment with
  partitioning through packed blocks of A. Also, and more importantly,
  changed an earlier check that was causing an erroneous (but
  coincidentally redundant) abort(). Also, updated some of the comments
  in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500

Allow creation of "empty" objects.

Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
  modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500

Fixed "root" object bug in bli_her[2]k/syr[2]k.

Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
  that manifested as the incorrect triangle being updated. It occurred when
  the user would pass in a matrix object that was correctly marked as
  symmetric/Hermitian and lower-stored, but whose root object was never marked
  as lower (or upper). We now alias and re-assign root status for matrix C
  within the front-ends. Note that trmm and trsm were already doing this,
  albeit for a slightly different reason (to allow the internal back-end to
  choose which algorithm to run, lower or upper, based on the uplo of the root
  object for both left and right side cases). Thanks to Bryan Marker for
  leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500

Fixed overzealous type-checking in bli_getsc().

Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h.
- Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500

Fixed bug in bli_obj_is_packed() and renamed.

Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.

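The sketch below illustrates the distinction this fix draws, using a hypothetical pack-schema enum. The old predicate treats anything other than "not packed" as packed. The renamed one accepts only row- or column-panel schemas, so an object marked as unspecified-packed (as bli_gemv() does with BLIS_PACKED_UNSPEC) no longer qualifies. The enum values and function names are illustrative assumptions, not BLIS's actual info-field encoding.

```c
#include <stdbool.h>

/* Hypothetical pack schema values. BLIS keeps the schema as bits in the
   obj_t info field; this encoding is an assumption for illustration. */
typedef enum
{
    SKETCH_NOT_PACKED = 0,
    SKETCH_PACKED_UNSPEC,      /* e.g., marked "packed" by a level-2 op */
    SKETCH_PACKED_ROW_PANELS,
    SKETCH_PACKED_COL_PANELS
} sketch_pack_t;

/* Old, overly broad test: any schema other than "not packed" counts. */
static bool sketch_obj_is_packed( sketch_pack_t schema )
{
    return schema != SKETCH_NOT_PACKED;
}

/* Narrower test: only row or column panel packing counts, so objects
   marked SKETCH_PACKED_UNSPEC are no longer sent to packm_part code. */
static bool sketch_obj_is_panel_packed( sketch_pack_t schema )
{
    return schema == SKETCH_PACKED_ROW_PANELS ||
           schema == SKETCH_PACKED_COL_PANELS;
}
```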
commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500

Removed local reference ukernel blocksize macros.

Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
  reference microkernel definition and header. Meant to include this in
  a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500

Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500

Set structure of objects in level-2 BLIS APIs.

Details:
- Added missing statement to set structure field of local objects in
  top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
  reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500

Tweak to test suite function string construction.

Details:
- Fixed a minor bug in the way that the test suite would construct function
  name strings when the user anchored all parameters in input.operations.
  In this case, the test driver would mistake this situation for one where
  the operation simply had no parameters to begin with, and thus would not
  include the parameter string in the function string that is output for
  every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500

Fixed a bug in reference implementation of dupl.

Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500

Modified bli_kernel.h include order in blis.h.

Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  bli_kernel.h includes an optimized microkernel header that uses BLIS types
  such as dim_t and inc_t before those types are defined in
  bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500

CHANGELOG update.

commit ec16c52f2ecf419c749175ce0a297441c10f1c68 (tag: 0.0.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500

Updated INSTALL file (now redirects to website).

commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500

Removed gemmtrsm-, trsm-specific blocksize macros.

Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
  instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
  micro-kernel header files.
  (Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.

commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500

Added/renamed alignment constants to _config.h.

Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).

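Below is a minimal sketch of what a three-argument alignment helper like bli_align_dim_to_size() might compute. It assumes the helper rounds a dimension (in elements) up so that the corresponding byte count is a multiple of the requested alignment. The function name and argument order are hypothetical. The entry above confirms only that the alignment is now passed in explicitly.

```c
#include <stddef.h>

/* Hypothetical stand-in for bli_align_dim_to_size(): round the element
   count 'dim' up so that dim * elem_size is a multiple of align_size.
   Assumes elem_size and align_size are powers of two, so one divides the
   other and the conversion back to elements is exact. */
static size_t sketch_align_dim_to_size( size_t dim, size_t elem_size,
                                        size_t align_size )
{
    size_t bytes = dim * elem_size;

    /* Round the byte count up to the next multiple of align_size. */
    bytes = ( ( bytes + align_size - 1 ) / align_size ) * align_size;

    /* Convert back to a count of elements. */
    return bytes / elem_size;
}
```

With an alignment of 1, the dimension is returned unchanged. This matches using an alignment size of 1 to effectively turn alignment off.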
commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500

Fixed a bug in axpyv/axpym when alpha is unit.

Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
  rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
  this bug.

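The shape of this bug shows up clearly in a simplified real-valued axpyv. This is a hypothetical sketch, not the BLIS kernel. Skipping the multiply when alpha = 1 is a valid shortcut, but the shortcut must still accumulate into y rather than overwrite it.

```c
#include <stddef.h>

/* Minimal real-valued axpyv sketch: y := y + alpha * x (unit stride). */
static void sketch_daxpyv( size_t n, double alpha,
                           const double* x, double* y )
{
    if ( alpha == 0.0 ) return;            /* y is left unchanged */

    if ( alpha == 1.0 )
    {
        /* Shortcut: skip the multiply, but still add.
           The buggy version effectively did y[ i ] = x[ i ] (a copy). */
        for ( size_t i = 0; i < n; ++i )
            y[ i ] += x[ i ];
        return;
    }

    for ( size_t i = 0; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```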
commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500

Moved _POSIX_C_SOURCE def to compiler cmd line.

Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
  and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
  the compiler command line arguments in make_defs.mk (for both configs).
  Thanks to Devin Matthews for suggesting this change.

commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500

Prepended 'f2c_' to abs, min, max macros in f2c.h.

Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.

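For reference, the classic f2c.h helpers are function-like macros, and a prefixed version might look like the following. The exact new names (f2c_abs, f2c_min, and so on) are an assumption based on the commit title. The f2c type doublereal is replaced with plain double so the sketch is self-contained. A macro literally named abs, if defined before <stdlib.h> is included, would rewrite the standard declaration of abs(). The prefix avoids that clash.

```c
#include <stdlib.h>   /* declares abs(); an unprefixed abs macro would clash */

/* Prefixed versions of the classic f2c.h helper macros (names assumed). */
#define f2c_abs(x)    ((x) >= 0 ? (x) : -(x))
#define f2c_min(a,b)  ((a) <= (b) ? (a) : (b))
#define f2c_max(a,b)  ((a) >= (b) ? (a) : (b))
#define f2c_dmin(a,b) (double)f2c_min(a,b)
#define f2c_dmax(a,b) (double)f2c_max(a,b)
```

Like the originals, these macros may evaluate an argument twice, so arguments with side effects (such as i++) should still be avoided.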
commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500

Added new kernel blocksize macro aliases.

Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).

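The aliasing idea can be sketched with the preprocessor's token-pasting operator. Define per-datatype blocksizes, then build their names from a datatype character, so type-templated code can refer to the right blocksize without operation-specific macros. All macro names below are hypothetical stand-ins for the BLIS ones. The two-level paste is the usual idiom that lets arguments expand before ## is applied.

```c
/* Hypothetical per-datatype register blocksizes (illustrative values). */
#define SKETCH_DEFAULT_MR_s 8
#define SKETCH_DEFAULT_MR_d 4

/* Two-level token pasting, similar in spirit to BLIS's PASTEMAC. */
#define SKETCH_PASTE_(a,b) a ## b
#define SKETCH_PASTE(a,b)  SKETCH_PASTE_(a,b)

/* Alias: construct the blocksize name from a datatype character. */
#define SKETCH_MR(ch) SKETCH_PASTE(SKETCH_DEFAULT_MR_, ch)

/* Type-templated code is written once and instantiated per datatype. */
#define SKETCH_GENFUNC(ch) \
static int sketch_mr_ ## ch( void ) { return SKETCH_MR(ch); }

SKETCH_GENFUNC(s)
SKETCH_GENFUNC(d)
```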
commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500

Updated CREDITS file.

commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500

Reverted testsuite object files' home to 'obj'.

Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
  empty directories in git.

commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500

Renamed/moved object scalar constant macros.

Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.

commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500

Applied fix from prev commit to gemmtrsm_?_ref_4x4.

Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
  bli_gemmtrsm_u_ref_4x4.c.

commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500

Fixed a performance bug in trsm.

Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
  (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
  reference gemm microkernel was hard-coded, and thus always called, even
  when GEMM_UKERNEL was defined to point to an optimized microkernel. This
  manifested as artificially low trsm performance for all problem sizes, but
  especially for small problem sizes, since it only affected blocks of A that
  intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
  find this bug.

commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500

Generate testsuite objects in 'src'.

Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
  rather than 'obj', since (a) the top-level .gitignore dictates that
  obj directories are to be ignored, and (b) git has problems
  tracking empty directories. Now, users do not need to create their own
  obj directories within their own local clones of BLIS.

commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500

Minor formatting changes.

commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500

Fixed definition of bli_is_packed_object() macro.

Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.

commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500

Updated information in testsuite output header.

Details:
- Added to the information that is echoed at the beginning of the test suite's
  output, and also re-labeled some existing information.

commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500

Fixed edge case handling bug in herk macrokernels.

Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)

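The class of bug described above can be illustrated by tiling an m x n block of C with MR x NR micro-tiles, ignoring the triangular structure that herk exploits. This is a hypothetical sketch, not the BLIS macro-kernel. The row edge comes from MR and the column edge from NR. Mixing the two up stays hidden only while MR == NR.

```c
#include <stddef.h>

/* Register blocksizes with MR != NR, matching the new reference
   defaults mentioned above (MR = 8, NR = 4). */
enum { SKETCH_MR = 8, SKETCH_NR = 4 };

/* Walk C in MR x NR tiles and return how many elements get visited.
   A real macro-kernel would call a micro-kernel on each tile. */
static size_t sketch_count_tiled_elements( size_t m, size_t n )
{
    size_t m_iter = ( m + SKETCH_MR - 1 ) / SKETCH_MR;
    size_t n_iter = ( n + SKETCH_NR - 1 ) / SKETCH_NR;
    size_t m_left = m % SKETCH_MR;  /* rows in the edge row panel (from MR) */
    size_t n_left = n % SKETCH_NR;  /* cols in the edge col panel (from NR) */
    size_t visited = 0;

    for ( size_t j = 0; j < n_iter; ++j )
    {
        size_t n_cur = ( j == n_iter - 1 && n_left != 0 ) ? n_left : SKETCH_NR;

        for ( size_t i = 0; i < m_iter; ++i )
        {
            size_t m_cur = ( i == m_iter - 1 && m_left != 0 ) ? m_left : SKETCH_MR;

            visited += m_cur * n_cur;
        }
    }
    return visited;
}
```

Every element is visited exactly once, so the count equals m * n. A variant that derived the column edge from MR instead of NR would generally miscount once NR != MR.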
commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500

Added reference microkernels for arbitrary MR, NR.

Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.

commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500

Implemented amax operation and related changes.

Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.

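As a rough sketch of what the amax operation computes, here is a hypothetical real-valued version (not the BLIS implementation) that returns the index of the first element with the largest absolute value. It uses 0-based indexing. The Fortran BLAS idamax returns a 1-based index, so a BLAS-compatible wrapper would need to add 1.

```c
#include <math.h>
#include <stddef.h>

/* Return the 0-based index of the first element of x (stride incx)
   with the largest absolute value. Returns 0 when n == 0. */
static size_t sketch_damax( size_t n, const double* x, size_t incx )
{
    size_t imax = 0;
    double vmax = -1.0;   /* below any fabs() result, so x[0] always wins */

    for ( size_t i = 0; i < n; ++i )
    {
        double v = fabs( x[ i * incx ] );

        /* Strict comparison keeps the first occurrence on ties. */
        if ( v > vmax ) { vmax = v; imax = i; }
    }
    return imax;
}
```

For x = { 1, -7, 3, 7, -2 }, the result is 1: |-7| and |7| tie, and the first occurrence is kept.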
commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500

More memory alignment-related tweaks.

Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).

commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500

Always define memory alignment size cpp constant.

Details:
- Removed guard around #define for memory alignment size constant.
  Memory alignment should always be enabled, and so this value should
  always be defined.

commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500

Renamed memory alignment macro constant.

Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.

commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500

Align packed panel strides with system alignment.

Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to avoid additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.

commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500

CHANGELOG update.

commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (tag: 0.0.5)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500

Migrated 'bl2' prefix to 'bli'.

Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.

commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500

Removed several 'old' directories and files.

Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which include alternate/half-baked/broken implementations.

commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500

Removed #include "blis2.h" from low-level headers.

Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.

commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500

Added cpp guards to conflicting libflame typedefs.

Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similar
  temporary changes are being made to libflame's type definitions file.)

commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500

Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.

Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).

commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500

Removed build/old directory.

commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500

Deprecated 'flame' configuration.

Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.

commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500

Added missing conjbeta argument to scald.

commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500

Relocated packed mem_t dimension fields to obj_t.

Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same
  quantities as the old ones. From an abstraction standpoint, it seemed
  awkward to store those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.

commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500

CHANGELOG update.

commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500

Re-implemented contiguous memory allocator.

Details:
- Completely rewrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.

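The pool-with-a-stack design described above can be sketched as follows. This is a simplified, single-threaded illustration with hypothetical names. One contiguous region is carved into equally sized blocks, and a stack of free block indices records which blocks are available. In a threaded build, the push and pop on that stack are the operations a critical region (for example, an OpenMP critical section) would need to protect.

```c
#include <stddef.h>
#include <stdlib.h>

#define SKETCH_MAX_BLOCKS 16

typedef struct
{
    char*  region;                          /* contiguous backing storage  */
    size_t block_size;                      /* bytes per block (nonzero)   */
    int    free_stack[ SKETCH_MAX_BLOCKS ]; /* indices of available blocks */
    int    top;                             /* number of available blocks  */
} sketch_pool_t;

/* Allocate the region and mark every block as available. */
static int sketch_pool_init( sketch_pool_t* p, int num_blocks, size_t block_size )
{
    if ( num_blocks < 1 || num_blocks > SKETCH_MAX_BLOCKS || block_size == 0 )
        return -1;
    p->region = malloc( ( size_t )num_blocks * block_size );
    if ( p->region == NULL ) return -1;
    p->block_size = block_size;
    for ( int i = 0; i < num_blocks; ++i ) p->free_stack[ i ] = i;
    p->top = num_blocks;
    return 0;
}

/* "Check out" a block by popping a free index; NULL if none remain. */
static void* sketch_pool_checkout( sketch_pool_t* p )
{
    if ( p->top == 0 ) return NULL;
    int idx = p->free_stack[ --p->top ];
    return p->region + ( size_t )idx * p->block_size;
}

/* Return a block to the pool by pushing its index back on the stack. */
static void sketch_pool_checkin( sketch_pool_t* p, void* block )
{
    size_t offset = ( size_t )( ( char* )block - p->region );
    p->free_stack[ p->top++ ] = ( int )( offset / p->block_size );
}

/* Release the backing region. */
static void sketch_pool_finalize( sketch_pool_t* p )
{
    free( p->region );
    p->region = NULL;
    p->top = 0;
}
```

Because every block has the same fixed size, checkout and checkin are constant-time and never fragment the region. The trade-off is that requests larger than block_size cannot be served.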
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500

Perform her2k var1 loops in sequence.

Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
  and accumulated in sequence rather than fused into one loop. This is
  necessary if BLIS is to be configured to provide only enough contiguous
  memory for one panel of B.

commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600

Enhanced tracking of dimensions for mem_t objects.

Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.

commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600

Fixed test suite flop formulas for ops with side.

Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
  trmm3, and trsm.
- Comment updates in herk macro-kernels.

commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600

Added "version" to .gitignore.

Details:
- Added "version" to .gitignore file so that the file does not show up when
  running 'git status', or accidentally get pulled into the index when
  running 'git add' or 'git add --all'.

commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600

Removed version file from version control.

Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.

commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600

Updated behavior of bl2_obj_induce_trans() macro.

Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.

commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600

Replaced banded/packed BLAS2 stubs with f2c code.

Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
  implemented" message. This includes all of the level-2 banded and packed
  routines.
- Replaced the aforementioned with the corresponding netlib implementations
  having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.

commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600

Moved Fortran name-mangling macro to bl2_config.h.

Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.

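Below is a sketch of the kind of name-mangling macro that benefits from living in a configuration header. The macro name and the f77_int type are hypothetical. The trailing-underscore convention shown is common among Unix Fortran compilers such as gfortran, but other compilers use different schemes, which is why a developer may need to tweak it. The example symbol is deliberately not named dscal_, to avoid colliding with a real BLAS.

```c
/* Hypothetical mangling macro: append one trailing underscore. */
#define SKETCH_F77_NAME(name) name ## _

typedef int f77_int;   /* hypothetical stand-in for a Fortran INTEGER */

/* Defines a function whose linker-visible name is sketch_dscal_.
   Fortran passes arguments by reference, hence the pointers.
   Assumes *incx > 0 for simplicity. */
void SKETCH_F77_NAME(sketch_dscal)( const f77_int* n, const double* alpha,
                                    double* x, const f77_int* incx )
{
    for ( f77_int i = 0; i < *n; ++i )
        x[ i * *incx ] *= *alpha;
}
```

Switching to a compiler with a different convention (for example, no underscore) then means editing a single macro rather than every wrapper definition.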
commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600

Implemented blas2blis compatibility layer.

Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types in bl2_type_defs.h for use with BLAS
  wrappers.

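The conditional init/finalize pairing can be sketched as follows. The names are hypothetical, and a plain global flag stands in for BLIS's internal state. The safe initializer reports whether it actually performed the initialization, and the safe finalizer tears down only in that case. As a result, a wrapper never finalizes a library that the application initialized on its own.

```c
#include <stdbool.h>

/* Hypothetical status codes standing in for an err_t. */
typedef enum { SKETCH_SUCCESS = 0, SKETCH_ALREADY_INIT = 1 } sketch_err_t;

static bool sketch_is_init = false;

static void sketch_init( void )     { sketch_is_init = true; }
static void sketch_finalize( void ) { sketch_is_init = false; }

/* Initialize only if needed, and record whether we did it. */
static void sketch_init_safe( sketch_err_t* status )
{
    if ( sketch_is_init ) { *status = SKETCH_ALREADY_INIT; return; }
    sketch_init();
    *status = SKETCH_SUCCESS;
}

/* Only undo an initialization that sketch_init_safe() itself performed. */
static void sketch_finalize_safe( sketch_err_t status )
{
    if ( status == SKETCH_SUCCESS ) sketch_finalize();
}

/* A BLAS-like wrapper brackets its work with the safe pair. */
static void sketch_wrapper( void )
{
    sketch_err_t init_status;
    sketch_init_safe( &init_status );
    /* ... perform the operation ... */
    sketch_finalize_safe( init_status );
}
```

The configuration option to remain initialized after automatic initialization could be modeled by making the safe finalizer a no-op. That variant is not shown.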
commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600

Updated version file. (Forgot to in prev commit).

commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600

Fixed some scalar types in BLAS-like Herm APIs.

Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
  alpha and beta in herk, and beta in her2k, need to be real. These
  arguments were typed incorrectly as the complex types. This has been
  fixed. Note the issue was only present in the BLAS-like APIs for
  these operations (not the native object-based interfaces).

commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600

Updated version file. (Forgot to in prev commit).

commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600

Changed API of packm_init_pack() to use blksz_t.

Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
  are passed in as type blksz_t* instead of dim_t.
- Made a similar change for packv_init_pack().

commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600

Minor changes to lower levels of scalm and setm.

Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.

commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600

Updated beta == zero semantics of mulsc.

Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
  operation that needed updating.
- Added Devin to CREDITS file.

commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600

Removed some calls to setv() in test modules.

Details:
- Removed calls to setv() in test modules whose sole purpose was to
  initialize vectors to zero to ensure that nan's and inf's would not
  taint the computation. Now that beta == zero semantics have been
  updated to clear the output operand (when beta is zero), rather than
  multiply against it, these setv() calls are no longer needed.

commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600

Properly implemented beta == 0 semantics.

Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels

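Overwriting rather than multiplying matters because of IEEE arithmetic: 0 * Inf and 0 * NaN are both NaN, so scaling garbage by zero does not clear it. Below is a minimal sketch of the semantics for a scalv-like operation, using a hypothetical name and real double precision only.

```c
#include <math.h>     /* INFINITY and NAN, handy for exercising the sketch */
#include <stddef.h>

/* y := beta * y, with beta == 0 treated as "set y to zero" so that
   Inf/NaN values already present in y cannot survive. */
static void sketch_dscalv( size_t n, double beta, double* y )
{
    if ( beta == 0.0 )
    {
        for ( size_t i = 0; i < n; ++i ) y[ i ] = 0.0;
        return;
    }

    for ( size_t i = 0; i < n; ++i ) y[ i ] *= beta;
}
```

For example, with y = { INFINITY, NAN, 5.0 } and beta = 0, every element ends up exactly 0.0, whereas multiplying would have left NaN in the first two positions.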
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600

Fixed stale interface to packm_unb_var1().

Details:
- Removed the control tree from the interface to packm_unb_var1(), which
  I meant to do when it was un-deprecated.

commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600

Un-deprecated packm_unb_var1.c (needed by l2 ops).

Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
  operations still need this routine for packing matrices. Now, whether
  level-2 operations should be packing matrices to begin with is another
  matter. But this fixes the segmentation fault one would have gotten when
  running bl2_gemv() on a general stride matrix.

commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600

Removed cntl tree usage from packm implementation.

Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.

commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600

Replaced bl2_abs() with _fabs() where appropriate.

commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600

Removed level-0 macros projrs, grabis.

Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.

commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600

Restored executable permissions to scripts.

Details:
- Restored executable (0755) permissions to scripts that were touched by
  the recursive sed script that updated the copyright headers in the
  previous commit.

commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600

Updated copyright headers from 2012 to 2013.

commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600

CHANGELOG update.

commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600

Added unified test suite, and many fixes.

Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and set blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: sum of the absolute values of the elements of a vector.
  - absumm: sum of the absolute values of the elements of a matrix.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queryable version string.

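One way to read the bl2_clock_min_diff() item above is as follows. You time the same code block several times and keep only the smallest elapsed wall-clock time, which filters out interference from the OS and from cold caches. The sketch below assumes POSIX `clock_gettime()` with `CLOCK_MONOTONIC`. Both function names are hypothetical, and the signature is a guess at the shape of such a helper, not the BLIS one.

```c
/* Must precede every #include so that <time.h> declares clock_gettime(). */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 199309L
#endif
#include <assert.h>
#include <time.h>

/* Hypothetical: current wall-clock time in seconds from a monotonic clock. */
double wall_clock_sketch(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;   /* keep the variable "used" when NDEBUG disables assert */
    return (double)ts.tv_sec + 1.0e-9 * (double)ts.tv_nsec;
}

/* Hypothetical: time elapsed since time_start, or time_min if smaller. */
double clock_min_diff_sketch(double time_min, double time_start)
{
    double elapsed = wall_clock_sketch() - time_start;

    return (elapsed < time_min) ? elapsed : time_min;
}
```

A typical timing loop follows this pattern:

1. Start with a large time_min.
2. Record t0 = wall_clock_sketch() before each repetition.
3. After the repetition, call time_min = clock_min_diff_sketch(time_min, t0).
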
- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the
  macro-kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object was passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.

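The her2k beta fix at the start of the bug-fix list above can be illustrated in the real domain, where her2k reduces to syr2k. There, C := alpha*A*B^T + alpha*B*A^T + beta*C is computed as two rank-k updates. Only the first update may apply beta. The second must use one; otherwise the partial result from the first update, original C included, is scaled by beta again.

This is a hypothetical dense, column-major sketch, not the BLIS macro-kernel, and it updates the full matrix rather than one stored triangle. In the complex Hermitian case the second update would use conj(alpha) and conjugate transposes instead.

```c
#include <assert.h>
#include <stddef.h>

/* C := alpha * X * Y^T + beta * C for an m x m column-major C, where X and
   Y are m x k column-major. beta == 0 clears C instead of scaling it. */
static void rankk_update(size_t m, size_t k, double alpha,
                         const double *X, size_t ldx,
                         const double *Y, size_t ldy,
                         double beta, double *C, size_t ldc)
{
    for (size_t j = 0; j < m; ++j)
        for (size_t i = 0; i < m; ++i) {
            double sum = 0.0;
            for (size_t p = 0; p < k; ++p)
                sum += X[i + p * ldx] * Y[j + p * ldy];
            double cij = (beta == 0.0) ? 0.0 : beta * C[i + j * ldc];
            C[i + j * ldc] = cij + alpha * sum;
        }
}

/* Hypothetical real-domain her2k/syr2k sketch:
   C := alpha * A * B^T + alpha * B * A^T + beta * C. */
void syr2k_sketch(size_t m, size_t k, double alpha,
                  const double *A, size_t lda,
                  const double *B, size_t ldb,
                  double beta, double *C, size_t ldc)
{
    assert(lda >= m && ldb >= m && ldc >= m);

    /* First rank-k update: the only place beta is applied. */
    rankk_update(m, k, alpha, A, lda, B, ldb, beta, C, ldc);

    /* Second rank-k update: beta must be one here. Passing beta again would
       rescale the partial result from the first update, original C included. */
    rankk_update(m, k, alpha, B, ldb, A, lda, 1.0, C, ldc);
}
```

Take A = [1 3; 2 4], B = [5 7; 6 8], C filled with ones, alpha = 1, and beta = 2. The result is [54 70; 70 90]. Applying beta to both updates would instead give the non-symmetric [82 102; 110 136].
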
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600

Added missing 'd' to fused gemmtrsm function name.

commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600

Added debug statements to bl2_mm_acquire_m().

Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with a prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h.

commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600

Defined Frobenius norm operations.

Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as
  one-operand operations whose output norm value is stored in the real
  projection of the input operand's datatype. Both operations are
  implemented in terms of sumsqv.

commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600

Added GENT*R macros; tweaked bl2_machval defs.

Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.

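To make "macro-generating macros with auxiliary real projection types" concrete, here is a hypothetical sketch in the same spirit. An insertion macro expands a caller-defined GENTFUNCR once per datatype, passing the type, its real projection, and their type characters. The macro names, struct types, and `eps` operation are assumptions for illustration, not the actual BLIS definitions. The operation mimics a machval-style query whose result is always real, even for the complex instances. A GENTPROTR counterpart would expand to just the prototypes.

```c
#include <assert.h>
#include <float.h>

/* Hypothetical complex types standing in for the library's own. */
typedef struct { float  real; float  imag; } scomplex_t;
typedef struct { double real; double imag; } dcomplex_t;

/* Machine epsilon, keyed by the real-projection type character. */
#define s_EPS FLT_EPSILON
#define d_EPS DBL_EPSILON

/* Expand the caller's GENTFUNCR once per datatype. Arguments:
   (ctype, ctype_r, ch, chr, opname) = (type, real projection of the type,
   type character, real-projection character, operation name). */
#define INSERT_GENTFUNCR(opname) \
    GENTFUNCR(float,      float,  s, s, opname) \
    GENTFUNCR(double,     double, d, d, opname) \
    GENTFUNCR(scomplex_t, float,  c, s, opname) \
    GENTFUNCR(dcomplex_t, double, z, d, opname)

/* Body template: the return type is ctype_r, so the complex instances
   return a real value. This particular operation does not need ctype. */
#define GENTFUNCR(ctype, ctype_r, ch, chr, opname) \
    ctype_r ch##opname(void) { return (ctype_r)chr##_EPS; }

INSERT_GENTFUNCR(eps)   /* defines seps(), deps(), ceps(), zeps() */

#undef GENTFUNCR
```

Here ceps() returns a float and zeps() returns a double, which is the "real projection" part of the pattern.
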
commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600

Fixed harmless macro bug in level-1m operations.

Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
  bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
  despite the bug, which is why I had not discovered it until now.

commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600

Renamed x86,x86_64 kernels to indicate 'd' fusing.

Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
  to emphasize that the fusing shape is not for all datatype instances, but
  rather just for one (that of double-precision real). Other fusing shapes
  would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.

commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600

More tweaks to _config.h, _kernel.h; smem tweaks.

Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.

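The _POSIX_C_SOURCE change above works because POSIX declares posix_memalign() in <stdlib.h> whenever _POSIX_C_SOURCE is at least 200112L, without requiring the broader _GNU_SOURCE. A minimal sketch follows; the wrapper name is hypothetical. The macro must be defined before the first #include, otherwise the system headers may already have decided what to expose.

```c
/* Must precede every #include so that <stdlib.h> declares posix_memalign(). */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical wrapper: allocate `size` bytes aligned to `align` bytes.
   align must be a power of two and a multiple of sizeof(void *).
   Returns NULL on failure; release the block with free(). */
void *aligned_alloc_sketch(size_t align, size_t size)
{
    void *p = NULL;

    assert(align % sizeof(void *) == 0 && (align & (align - 1)) == 0);
    if (posix_memalign(&p, align, size) != 0)
        return NULL;   /* posix_memalign reports errors via its return value */
    return p;
}
```

Unlike malloc(), posix_memalign() returns an error code instead of setting errno, which is why the sketch checks the return value rather than only the pointer.
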
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600

Minor reordering of bl2_config.h definitions.

commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600

Consolidated configuration headers.

Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.

commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600

Fixed bug in reference gemm ukernels.

Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
  was not correctly accumulated and scaled (by alpha) into the output matrix
  C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.

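For context on the reference gemm micro-kernel fix, here is a minimal sketch of what such a kernel has to do. It assumes the usual packed layout: an MR x k micro-panel of A stored column by column, and a k x NR micro-panel of B stored row by row. The kernel accumulates the full product A*B into a local MR x NR buffer first. It then scales that buffer by alpha exactly once while merging it into C, using the beta == 0 clearing rule. The function name and the MR = NR = 4 blocksizes are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MR 4   /* hypothetical micro-tile height */
#define NR 4   /* hypothetical micro-tile width  */

/* Hypothetical reference-style micro-kernel:
   C := beta * C + alpha * A * B, where
   - a is a packed MR x k micro-panel, stored column by column (a[i + p*MR]),
   - b is a packed k x NR micro-panel, stored row by row (b[j + p*NR]),
   - C is an MR x NR tile with row stride rs_c and column stride cs_c. */
void gemm_ukr_sketch(size_t k, double alpha,
                     const double *a, const double *b,
                     double beta, double *c, ptrdiff_t rs_c, ptrdiff_t cs_c)
{
    double ab[MR * NR] = { 0.0 };   /* local accumulator for A * B */

    assert(k == 0 || (a != NULL && b != NULL));

    /* Accumulate all k rank-1 updates before touching C. */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    /* Scale by alpha once and merge into C (beta == 0 clears C). */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double *cij = &c[(ptrdiff_t)i * rs_c + (ptrdiff_t)j * cs_c];
            double old = (beta == 0.0) ? 0.0 : beta * *cij;
            *cij = old + alpha * ab[i + j * MR];
        }
}
```

Keeping the accumulation separate from the final scale-and-merge step means alpha multiplies the finished product exactly once, independent of k.
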
commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600

Expanded reference packm/unpackm kernel set to 16.

Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
  unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
  kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
  to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.

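The scal2m fallback in bl2_[un]packm_cxk() amounts to a dispatch with a safe default. The code looks up a specialized kernel by panel dimension and, if none exists, uses a general routine instead of failing. The sketch below assumes a function-pointer table indexed by panel dimension, with only a 2xk kernel filled in. Every name here is hypothetical, and packm_generic stands in for scal2m.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical kernel type: pack a fixed-size cdim x k panel of A, with
   row/column strides rs and cs, into p column by column, scaling by kappa. */
typedef void (*packm_ker_ft)(size_t k, double kappa, const double *a,
                             ptrdiff_t rs, ptrdiff_t cs, double *p);

/* Specialized kernel for a panel dimension of 2. */
static void packm_ref_2xk(size_t k, double kappa, const double *a,
                          ptrdiff_t rs, ptrdiff_t cs, double *p)
{
    for (size_t j = 0; j < k; ++j) {
        const double *aj = a + (ptrdiff_t)j * cs;
        p[2 * j + 0] = kappa * aj[0];
        p[2 * j + 1] = kappa * aj[rs];
    }
}

/* General fallback for any panel dimension (stand-in for scal2m). */
static void packm_generic(size_t cdim, size_t k, double kappa,
                          const double *a, ptrdiff_t rs, ptrdiff_t cs,
                          double *p)
{
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < cdim; ++i)
            p[i + j * cdim] = kappa * a[(ptrdiff_t)i * rs + (ptrdiff_t)j * cs];
}

#define PACKM_MAX_DIM 16

/* Kernels indexed by panel dimension; NULL means "no specialized kernel". */
static const packm_ker_ft packm_kers[PACKM_MAX_DIM + 1] = {
    [2] = packm_ref_2xk,    /* other sizes omitted in this sketch */
};

void packm_cxk_sketch(size_t cdim, size_t k, double kappa,
                      const double *a, ptrdiff_t rs, ptrdiff_t cs, double *p)
{
    assert(cdim == 0 || (a != NULL && p != NULL));
    packm_ker_ft f = (cdim <= PACKM_MAX_DIM) ? packm_kers[cdim] : NULL;

    if (f != NULL)
        f(k, kappa, a, rs, cs, p);                    /* specialized path */
    else
        packm_generic(cdim, k, kappa, a, rs, cs, p);  /* silent fallback */
}
```

Panel dimensions with no table entry, including anything above 16, take the generic path and produce the same packed layout.
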
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600

Minor updates towards 0.0.1.

commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600

Tweaks to get BLIS compiling again on clarksville.

Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.

commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600

Added template fragment.mk; updated .gitignore.

commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600

Added 'changelog' make target; other tweaks.

Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
  overwrites CHANGELOG with the output.
- Other trivial changes.

commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600

Define static memory pool size in bl2_config.h.

commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600

Refined INSTALL text; added 'showconfig' target.

Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
  to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.

commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600

Added auto-detection of version string (via git).

Details:
- Added build/update-version-file.sh script for auto-detecting "version"
  string and updating 'version' file accordingly. (If .git directory is
  not present, then it is assumed this copy of BLIS is a downloaded
  release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.

commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600

Wrote first draft of INSTALL file.

commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600

Updated standalone test Makefile and other fixes.

Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
  should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in:
  - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
  - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c

commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600

Added build system and continued reorganization.

Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).

commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600

Initial commit.