mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (HEAD, tag: 0.0.5, origin/master, master)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Mar 24 20:01:49 2013 -0500
|
|
|
|
Migrated 'bl2' prefix to 'bli'.
|
|
|
|
Details:
|
|
- Changed all filename and function prefixes from 'bl2' to 'bli'.
|
|
- Changed the "blis2.h" header filename to "blis.h" and changed all
|
|
corresponding #include statements accordingly.
|
|
- Fixed incorrect association for Fran in CREDITS file.
|
|
|
|
commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Mar 24 18:49:36 2013 -0500
|
|
|
|
Removed several 'old' directories and files.
|
|
|
|
Details:
|
|
- Removed most of the 'old' directories scattered throughout the framework,
|
|
which includes alternate/half-baked/broken implementations.
|
|
|
|
commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sun Mar 24 18:00:10 2013 -0500
|
|
|
|
Removed #include "blis2.h" from low-level headers.
|
|
|
|
Details:
|
|
- Removed #include of "blis2.h" from various lower-level, operation-specific
|
|
header files throughout the framework. Given that these low-level headers
|
|
are included within blis2.h in a very specific order, #include'ing blis2.h
|
|
within them directly is unnecessary.
|
|
|
|
commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 22 17:18:58 2013 -0500
|
|
|
|
Added cpp guards to conflicting libflame typedefs.
|
|
|
|
Details:
|
|
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
|
|
This is a temporary hack to allow interoperability with libflame. (Similarly
|
|
temporary changes are being made to libflame's type definitions file.)
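
A minimal sketch of the kind of cpp guard described above. The guard macro names (EXAMPLE_*_DEFINED) and the exact type definitions are assumptions made for illustration; this log does not give the names BLIS or libflame actually use.

```c
/* Hypothetical guard macros: each typedef is skipped when another header
   (for example, one from libflame) has already defined the guard. */
#ifndef EXAMPLE_DIM_T_DEFINED
#define EXAMPLE_DIM_T_DEFINED
typedef unsigned long dim_t;
#endif

#ifndef EXAMPLE_SCOMPLEX_DEFINED
#define EXAMPLE_SCOMPLEX_DEFINED
typedef struct { float real; float imag; } scomplex;
#endif

#ifndef EXAMPLE_DCOMPLEX_DEFINED
#define EXAMPLE_DCOMPLEX_DEFINED
typedef struct { double real; double imag; } dcomplex;
#endif

/* Returns 1 if both complex types are two tightly packed components. */
int example_complex_layout_ok(void)
{
    return sizeof(scomplex) == 2 * sizeof(float)
        && sizeof(dcomplex) == 2 * sizeof(double);
}
```

For the guards to help, both libraries would have to agree on the guard macro names, which is presumably why matching changes are mentioned for libflame.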
|
|
|
|
commit f469907503fcdc24dff0174c569170e6e756e045
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 22 15:20:15 2013 -0500
|
|
|
|
Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
|
|
|
|
Details:
|
|
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
|
|
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
|
|
(e.g. "prefetch" instructions, which are different than the particular
|
|
kind of prefetching/preloading referred to by this constant).
|
|
|
|
commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 22 15:09:59 2013 -0500
|
|
|
|
Removed build/old directory.
|
|
|
|
commit 718888849c48d99f83eea6b8f83bc1998cffef7e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 22 15:07:01 2013 -0500
|
|
|
|
Deprecated 'flame' configuration.
|
|
|
|
Details:
|
|
- Removed 'flame' configuration, as it was horribly out-of-date.
|
|
- Comment changes to bl2_blocksize.c and bl2_mem.c.
|
|
|
|
commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Mar 19 18:07:40 2013 -0500
|
|
|
|
Added missing conjbeta argument to scald.
|
|
|
|
commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Mar 18 15:37:20 2013 -0500
|
|
|
|
Relocated packed mem_t dimension fields to obj_t.
|
|
|
|
Details:
|
|
- Removed the m and n (and elem_size) fields from the mem_t object, and added
|
|
m_packed and n_packed fields to obj_t. These new fields track the same as
|
|
the old ones. From an abstraction standpoint, it seemed awkward to store
|
|
those dimensions inside the mem_t.
|
|
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
|
|
is passed in, instead of m, n, and elem_size.
|
|
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
|
|
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
|
|
respectively.
|
|
- Updated packm variants to access the packed length and width fields from
|
|
their new locations.
|
|
|
|
commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Mar 18 10:37:03 2013 -0500
|
|
|
|
CHANGELOG update.
|
|
|
|
commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 15 17:12:36 2013 -0500
|
|
|
|
Re-implemented contiguous memory allocator.
|
|
|
|
Details:
|
|
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
|
|
allocator instantiates and initializes three separate memory pool objects,
|
|
each one associated with a separate array of contiguous memory blocks, each
|
|
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
|
|
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
|
|
objects use a stack structure internally to track which blocks in the region
|
|
have been "checked out" to a thread and which are still available. Critical
|
|
regions are now clearly marked and adaptable to parallel environments (e.g.
|
|
OpenMP). Memory pools are set up when bl2_init() is called.
|
|
- Added a new field to the packm control tree node, which indicates what kind
|
|
of packed buffer is being allocated. The enumerated type for this argument
|
|
is defined as packbuf_t in bl2_type_defs.h.
|
|
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
|
|
packbuf_t argument to bl2_packm_cntl_obj_create().
|
|
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
|
|
bl2_mem_macro_defs.h.
|
|
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
|
|
number of blocks of A reserved for the memory allocator.
|
|
- Deprecated bl2_align_dim(). Replaced usage with that of
|
|
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
|
|
a dimension to the system alignment, since that value has to do with
|
|
starting addresses, whereas the values we are dealing with are unitless
|
|
dimensions.
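
The stack-based pool described in this entry can be sketched roughly as follows. This is an illustration only: the type and function names (example_pool_t and friends), the pool capacity, and the block size are hypothetical, and only one pool is shown rather than the three described above.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_NUM_BLOCKS 4      /* hypothetical pool capacity */
#define EXAMPLE_BLOCK_SIZE 1024   /* hypothetical block size in bytes */

/* A pool of fixed-size blocks; block_ptrs[0..top] are still available. */
typedef struct
{
    void* block_ptrs[EXAMPLE_NUM_BLOCKS];
    int   top;   /* index of the top available block, or -1 if exhausted */
} example_pool_t;

/* Backing storage for the blocks (one contiguous array). */
static char example_storage[EXAMPLE_NUM_BLOCKS][EXAMPLE_BLOCK_SIZE];

/* Push every block onto the stack so that all are available. */
void example_pool_init(example_pool_t* pool)
{
    assert(pool != NULL);
    for (int i = 0; i < EXAMPLE_NUM_BLOCKS; ++i)
        pool->block_ptrs[i] = example_storage[i];
    pool->top = EXAMPLE_NUM_BLOCKS - 1;
}

/* Check out a block, or return NULL if the pool is exhausted.
   In a threaded build this body would be a critical region. */
void* example_pool_acquire(example_pool_t* pool)
{
    if (pool->top < 0) return NULL;
    return pool->block_ptrs[pool->top--];
}

/* Return a previously acquired block to the pool (also a critical region). */
void example_pool_release(example_pool_t* pool, void* block)
{
    assert(pool->top < EXAMPLE_NUM_BLOCKS - 1);
    pool->block_ptrs[++pool->top] = block;
}
```

Because acquire and release only touch the top of the stack, each is a constant-time operation, and the critical regions stay short.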
|
|
|
|
commit 1e76cae00cb0a04544aaae1ade878686b238d283
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 15 12:21:42 2013 -0500
|
|
|
|
Perform her2k var1 loops in sequence.
|
|
|
|
Details:
|
|
- Changed variant 1 of her2k so that the two rank-k products are computed
|
|
and accumulated in sequence rather than fused into one loop. This is
|
|
necessary if BLIS is to be configured to provide only enough contiguous
|
|
memory for one panel of B.
|
|
|
|
commit c95c270eba91ae4efc26603beddfd0292caa919b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Mar 7 14:42:15 2013 -0600
|
|
|
|
Enhanced tracking of dimensions for mem_t objects.
|
|
|
|
Details:
|
|
- Added new fields to mem_t struct definition to track the allocated (as
|
|
opposed to the currently used) dimensions of the memory region. This
|
|
allows packm_init() to be more robust in situations where memory is
|
|
already allocated but is more than needed for the current packing job.
|
|
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
|
|
in packm_init(), to update the "currently used" dimensions of the mem_t
|
|
object if the requested dimensions are smaller than the allocated
|
|
dimensions.
|
|
|
|
commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Mar 7 14:00:10 2013 -0600
|
|
|
|
Fixed test suite flop formulas for ops with side.
|
|
|
|
Details:
|
|
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
|
|
trmm3, and trsm.
|
|
- Comment updates in herk macro-kernels.
|
|
|
|
commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Mar 2 12:47:06 2013 -0600
|
|
|
|
Added "version" to .gitignore.
|
|
|
|
Details:
|
|
- Added "version" to .gitignore file so that the file does not show up when
|
|
running 'git status', or accidentally get pulled into the index when
|
|
running 'git add' or 'git add --all'.
|
|
|
|
commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Sat Mar 2 12:43:54 2013 -0600
|
|
|
|
Removed version file from version control.
|
|
|
|
Details:
|
|
- Removed version file from version control to prevent git errors that occur
|
|
when trying to pull new commits.
|
|
|
|
commit bb612f864e9c17dd9805e9446840f02259619469
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Mar 1 12:55:42 2013 -0600
|
|
|
|
Updated behavior of bl2_obj_induce_trans() macro.
|
|
|
|
Details:
|
|
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
|
|
updated as part of the macro. All current uses of the macro have been
|
|
coupled with instances of bl2_obj_set_trans() to clear the bit.
|
|
- Added Jed to CREDITS file.
|
|
|
|
commit f24e29b789e7314764a818ceb3063126936c986f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 22 18:15:41 2013 -0600
|
|
|
|
Replaced banded/packed BLAS2 stubs with f2c code.
|
|
|
|
Details:
|
|
- Retired the blas2blis wrappers that simply called abort with a "not yet
|
|
implemented" message. This includes all of the level-2 banded and packed
|
|
routines.
|
|
- Replaced the aforementioned with the corresponding netlib implementations
|
|
having been run through f2c (with some customization).
|
|
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
|
|
|
|
commit 1454c1a14207766dfed372b8e38b47fa384f5198
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 22 12:38:45 2013 -0600
|
|
|
|
Moved Fortran name-mangling macro to bl2_config.h.
|
|
|
|
Details:
|
|
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
|
|
configuration directory (bl2_config.h, specifically) given that it can be
|
|
expected to be tweaked by some developers.
|
|
|
|
commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 22 12:11:24 2013 -0600
|
|
|
|
Implemented blas2blis compatibility layer.
|
|
|
|
Details:
|
|
- Added the blas2blis compatibility layer, located in frame/compat. This
|
|
includes virtually all of the BLAS, including banded and packed level-2
|
|
operations.
|
|
|
|
- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
|
|
initialization, which stores the "exit status" in an err_t, which is then
|
|
read by the latter function to determine whether finalization should actually
|
|
take place.
|
|
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
|
|
level-3 BLAS-like wrappers.
|
|
- Added configuration option to instruct BLIS to remain initialized whenever
|
|
it automatically initializes itself (via bl2_init_safe()), until/unless the
|
|
application code explicitly calls bl2_finalize().
|
|
|
|
- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
|
|
templatization of blas2blis wrappers.
|
|
- Defined level-0 scalar macro bl2_??swaps().
|
|
- Defined level-1v operation bl2_swapv().
|
|
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
|
|
wrappers.
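
The conditional initialization described for bl2_init_safe() and bl2_finalize_safe() in this entry might look roughly like the sketch below. All names here (example_init_safe(), example_err_t, and so on) are hypothetical stand-ins, and the real functions may differ in signature and behavior.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical status codes standing in for err_t values. */
typedef enum { EXAMPLE_SUCCESS, EXAMPLE_ALREADY_INITIALIZED } example_err_t;

static bool example_is_initialized = false;

void example_init(void)        { example_is_initialized = true;  }
void example_finalize(void)    { example_is_initialized = false; }
bool example_initialized(void) { return example_is_initialized;  }

/* Initialize only if needed, and record whether this call did the work. */
void example_init_safe(example_err_t* status)
{
    assert(status != NULL);
    if (example_is_initialized)
    {
        *status = EXAMPLE_ALREADY_INITIALIZED;
    }
    else
    {
        example_init();
        *status = EXAMPLE_SUCCESS;
    }
}

/* Finalize only if the matching example_init_safe() performed the init. */
void example_finalize_safe(example_err_t status)
{
    if (status == EXAMPLE_SUCCESS) example_finalize();
}
```

A wrapper would call example_init_safe() on entry and example_finalize_safe() on exit, so that a library already initialized by the application is left initialized.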
|
|
|
|
commit 995edf43e21c1868732dbdd7fee14b08730218bd
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 21 14:30:50 2013 -0600
|
|
|
|
Updated version file. (Forgot to in prev commit).
|
|
|
|
commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 21 12:00:17 2013 -0600
|
|
|
|
Fixed some scalar types in BLAS-like Herm APIs.
|
|
|
|
Details:
|
|
- Some of the scalars of Hermitian operations, such as alpha in her,
|
|
alpha and beta in herk, and beta in her2k, need to be real. These
|
|
arguments were typed incorrectly as the complex types. This has been
|
|
fixed. Note the issue was only present in the BLAS-like APIs for
|
|
these operations (not the native object-based interfaces).
|
|
|
|
commit 5ece050a669e74ba4a711d1d4669239d22d45642
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 20 15:50:54 2013 -0600
|
|
|
|
Updated version file. (Forgot to in prev commit).
|
|
|
|
commit f243034b8b430d4684680ea8eddfd246e73fefc0
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 20 14:11:36 2013 -0600
|
|
|
|
Changed API of packm_init_pack() to use blksz_t.
|
|
|
|
Details:
|
|
- Changed the interface of packm_init_pack() so that mult_m and mult_n
|
|
are passed in as type blksz_t* instead of dim_t.
|
|
- Make similar change for packv_init_pack().
|
|
|
|
commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Feb 15 09:59:48 2013 -0600
|
|
|
|
Minor changes to lower levels of scalm and setm.
|
|
|
|
Details:
|
|
- Removed diagx parameter from lower-level interfaces of scalm.
|
|
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
|
|
- Changed setm_unb_var1() so that having an implicit unit diagonal results
|
|
in only the strictly lower or upper triangle of the matrix being modified.
|
|
|
|
commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 14 10:42:56 2013 -0600
|
|
|
|
Updated beta == zero semantics of mulsc.
|
|
|
|
Details:
|
|
- Updated beta == zero semantics of mulsc. Hopefully this is the last
|
|
operation that needed updating.
|
|
- Added Devin to CREDITS file.
|
|
|
|
commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Feb 14 10:18:00 2013 -0600
|
|
|
|
Removed some calls to setv() in test modules.
|
|
|
|
Details:
|
|
- Removed calls to setv() in test modules whose sole purpose was to
|
|
initialize vectors to zero to ensure that nan's and inf's would not
|
|
taint the computation. Now that beta == zero semantics have been
|
|
updated to clear the output operand (when beta is zero), rather than
|
|
multiply against it, these setv() calls are no longer needed.
|
|
|
|
commit e6ac623a902f776c42f85eadbf76996d9770a0db
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 13 18:44:59 2013 -0600
|
|
|
|
Properly implemented beta == 0 semantics.
|
|
|
|
Details:
|
|
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
|
|
respectively.
|
|
- Added code to the following operations that sets the output operand to
|
|
zero if the corresponding scalar is zero (rather than performing the
|
|
floating-point multiply, or in the case of setv, copying the value).
|
|
This will prevent nan's and inf's from creeping into results from
|
|
uninitialized memory.
|
|
- axpy
|
|
- dotxv
|
|
- scalv
|
|
- scal2v
|
|
- setv
|
|
- gemv
|
|
- ger
|
|
- hemv
|
|
- her
|
|
- her2
|
|
- gemm reference ukernels
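
A minimal sketch of the beta == 0 behavior described above, for a real double-precision vector scale. The function example_scalv() is a hypothetical helper and not the BLIS scalv interface.

```c
#include <assert.h>
#include <stddef.h>

/* Scale x by beta in place (hypothetical helper, not the BLIS API). */
void example_scalv(size_t n, double beta, double* x)
{
    assert(x != NULL || n == 0);

    if (beta == 0.0)
    {
        /* Overwrite instead of multiplying: 0.0 * NaN and 0.0 * Inf are
           both NaN, so a multiply would let garbage from uninitialized
           memory survive in the output. */
        for (size_t i = 0; i < n; ++i) x[i] = 0.0;
        return;
    }

    for (size_t i = 0; i < n; ++i) x[i] *= beta;
}
```

With this special case, an output operand does not need to be cleared before it is passed to an operation with a zero scalar.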
|
|
|
|
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 13 18:29:53 2013 -0600
|
|
|
|
Fixed stale interface to packm_unb_var1().
|
|
|
|
Details:
|
|
- Removed the control tree from the interface to packm_unb_var1(), which
|
|
I meant to do when it was un-deprecated.
|
|
|
|
commit c23135669f7a8a545e2e11ef559bf284be8bc65c
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Wed Feb 13 13:21:00 2013 -0600
|
|
|
|
Un-deprecated packm_unb_var1.c (needed by l2 ops).
|
|
|
|
Details:
|
|
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
|
|
operations still need this routine for packing matrices. Now, whether
|
|
level-2 operations should be packing matrices to begin with is another
|
|
matter. But this fixes the segmentation fault one would have gotten when
|
|
running bl2_gemv() on a general stride matrix.
|
|
|
|
commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 12 18:39:35 2013 -0600
|
|
|
|
Removed cntl tree usage from packm implementation.
|
|
|
|
Details:
|
|
- Added new fields to obj_t info field:
|
|
- invert_diag
|
|
- pack_order_if_upper
|
|
- pack_order_if_lower
|
|
These fields allow packm_init() to embed information that begins
|
|
in the control tree into the object so that the packm implementation
|
|
does not need to use control trees at all. This is being done to aid
|
|
Bryan's DxT code generation.
|
|
- Added macros that operate on above fields.
|
|
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
|
|
to above changes.
|
|
- Made similar (but much simpler) changes to packv.
|
|
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
|
|
These were part of prototype implementations and are no longer needed.
|
|
|
|
commit eb139ae256651af7820b93ef982626180195b87f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 12 12:39:30 2013 -0600
|
|
|
|
Replaced bl2_abs() with _fabs() where appropriate.
|
|
|
|
commit 474bac30c99928f9e87315972bcb45c632c0b7ec
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 12 12:23:48 2013 -0600
|
|
|
|
Removed level-0 macros projrs, grabis.
|
|
|
|
Details:
|
|
- Replaced instances of projrs and grabis macros with newer,
|
|
more general-purpose getris.
|
|
|
|
commit 03a260a457c8964e4603a655cee0d40ac17affba
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Feb 12 11:45:34 2013 -0600
|
|
|
|
Restored executable permissions to scripts.
|
|
|
|
Details:
|
|
- Restored executable (0755) permissions to scripts that were touched by
|
|
the recursive sed script that updated the copyright headers in the
|
|
previous commit.
|
|
|
|
commit 1274e1243775e5e705114257a43176f63635227f
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 11 14:37:47 2013 -0600
|
|
|
|
Updated copyright headers from 2012 to 2013.
|
|
|
|
commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 11 13:38:07 2013 -0600
|
|
|
|
CHANGELOG update.
|
|
|
|
commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Feb 11 13:20:44 2013 -0600
|
|
|
|
Added unified test suite, and many fixes.
|
|
|
|
Details:
|
|
- Added a highly configurable, unified test suite.
|
|
|
|
- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
|
|
header files. Now, instead, DUPB is computed as (NDUP != 1) within each
|
|
macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
|
|
incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
|
|
By encoding both pieces of information into one constant in _kernel.h,
|
|
it seems somewhat less likely others will encounter this bug in the
|
|
future.
|
|
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
|
|
and defined blocksizes in _cntl.c files to these default values.
|
|
|
|
- Changed semantics of her2k and syr2k such that these operations no longer
|
|
expect the B matrix to already be conjugate-transposed (or just transposed
|
|
for syr2k). However, these semantics are preserved for the internal
|
|
mechanics of the implementations, including the internal back-end and all
|
|
blocked variants.
|
|
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
|
|
respectively.
|
|
|
|
- Relaxed general object structure constraints in _basic_check() for gemv, ger.
|
|
- Changed her front-end to NOT copy-cast to real projection; instead, this is
|
|
replaced by selecting either the real part or both parts within the unblocked
|
|
algorithm implementation, depending on the value of conjh.
|
|
- Added conjh to all _check routines for her so that the code knows when to
|
|
verify that alpha has an imaginary component equal to zero (for her, but
|
|
not syr).
|
|
- Changed control tree for her to forgo packing.
|
|
|
|
- Added unit diagonal support to fnormm.
|
|
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
|
|
- Redefined complex versions of sqrt2s macros using the actual "complex square
|
|
root" formula.
|
|
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
|
|
- Defined new level-1v, -1d, and -1m versions of add and sub operations
|
|
(two-operand add and subtract).
|
|
- Added new scalar macros:
|
|
- getris: acquire real and imaginary components.
|
|
- setris: set real and imaginary components.
|
|
- addjs: addition with conjugated x.
|
|
- subjs: subtraction with conjugated x.
|
|
- Defined new utility operations:
|
|
- absumv: element-wise sum of absolute values for vector elements.
|
|
- absumm: element-wise sum of absolute values for matrix elements.
|
|
- mkherm: convert existing matrix to Hermitian.
|
|
- mksymm: convert existing matrix to symmetric.
|
|
- mktrim: convert existing matrix to triangular.
|
|
|
|
- Added various error checking routines.
|
|
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
|
|
wall clock time of a code block.
|
|
- Added general stride support to bl2_obj_alloc_buffer().
|
|
- Added bl2_obj_init_scalar().
|
|
- Updated parameter mapping in bl2_param_map.c.
|
|
- Added support for queriable version string.
|
|
|
|
- Fixed a bug in the her2k macro-kernels (which currently are simply
|
|
implemented in terms of two invocations of herk) whereby beta was being
|
|
applied to both the first and second rank-k updates, rather than only
|
|
the first.
|
|
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
|
|
properly implemented due to erroneous assumptions regarding aliasing and
|
|
root objects.
|
|
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
|
|
MR x NR block of B was being updated.
|
|
- Fixed a bug in the inverts macro in the double real case whereby the
|
|
value was typecast to float before inversion. This affected non-unit cases
|
|
of dtrsm.
|
|
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
|
|
constant was being applied incorrectly.
|
|
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
|
|
now mimics the rank-k strategy of gemm, whereby alpha is applied during
|
|
the first iteration of variant 3, with BLIS_ONE passed in instead for
|
|
subsequent iterations. This also required passing alpha into the macro-
|
|
kernels as well as the fused gemmtrsm micro-kernels.
|
|
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
|
|
called for blocks strictly above the diagonal. While this sounds good in
|
|
theory, this cannot be done because gemm_ker_var2 expects row panels of
|
|
A to be packed from top to bottom, while for trsm_u, A is actually packed
|
|
from bottom to top due to the reverse (BR->TL) nature of the algorithm.
|
|
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
|
|
dimensions were mishandled due to incorrect arguments to the copyv kernel.
|
|
Also changed the copyv kernel invocation to scal2v so that these edge
|
|
cases are properly handled when scaling is requested.
|
|
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
|
|
instead of the source object.
|
|
- Fixed a bug whereby level-2 code could allocate memory dynamically via
|
|
bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
|
|
a potential future bug whereby a mem_t object that is actually no longer
|
|
"allocated" from the static pool is mistaken for being allocated due to
|
|
failure to NULLify the buffer when the block was most recently released.
|
|
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
|
|
toggled when the requested subpartition needed to be "reflected" due to it
|
|
residing in an unstored region.
|
|
|
|
commit be94fb84c0351602d7585269f29998e3bf83f899
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 4 10:55:21 2013 -0600
|
|
|
|
Added missing 'd' to fused gemmtrsm function name.
|
|
|
|
commit 879a179e1dee36f0c56765f2ab91a26861019b34
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Jan 4 10:37:27 2013 -0600
|
|
|
|
Added debug statements to bl2_mm_acquire_m().
|
|
|
|
Details:
|
|
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
|
|
with prematurely exhausted memory pool.
|
|
- Removed 'd' from kernel names of reference kernels in clarksville
|
|
configuration's bl2_kernel.h
|
|
|
|
commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 20 17:07:50 2012 -0600
|
|
|
|
Defined Frobenius norm operations.
|
|
|
|
Details:
|
|
- Added level-0 grabis macro operation to grab imaginary component of one
|
|
variable and copy it to the real component of another variable.
|
|
- Defined sumsqv operation, which computes the sum of the absolute squares
|
|
of the elements of a vector. This implementation is modeled after ?lassq
|
|
in netlib LAPACK.
|
|
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
|
|
vectors and matrices, respectively. These operations are treated as one-
|
|
operand operations where the output norm value is the real projection of
|
|
the datatype of the input operand. Both operations are implemented in terms
|
|
of sumsqv.
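
A rough sketch of a scaled sum of squares in the style of ?lassq, restricted to real double-precision vectors. The name example_sumsqv() and its signature are assumptions for illustration; the actual sumsqv implementation also has to handle complex types and strides.

```c
#include <assert.h>
#include <stddef.h>

/* Update (scale, sumsq) so that scale^2 * sumsq equals the previous value
   of scale^2 * sumsq plus the sum of squares of x[0..n-1]. Tracking the
   largest magnitude in scale avoids overflow when squaring large elements.
   Hypothetical helper, not the BLIS sumsqv interface. */
void example_sumsqv(size_t n, const double* x, double* scale, double* sumsq)
{
    assert(scale != NULL && sumsq != NULL);

    for (size_t i = 0; i < n; ++i)
    {
        double abs_xi = x[i] < 0.0 ? -x[i] : x[i];

        if (abs_xi == 0.0) continue;

        if (*scale < abs_xi)
        {
            /* New largest magnitude: rescale the running sum to it. */
            double r = *scale / abs_xi;
            *sumsq = 1.0 + *sumsq * r * r;
            *scale = abs_xi;
        }
        else
        {
            double r = abs_xi / *scale;
            *sumsq += r * r;
        }
    }
}
```

Starting from scale = 0 and sumsq = 1, the Frobenius norm of the vector is then scale * sqrt(sumsq), which is how an operation such as fnormv could be built on top of this routine.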
|
|
|
|
commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 20 17:02:55 2012 -0600
|
|
|
|
Added GENT*R macros; tweaked bl2_machval defs.
|
|
|
|
Details:
|
|
- Added function and prototype macro-generating macros for GENTFUNCR and
|
|
GENTPROTR, which are one-operand macros with auxiliary real projection
|
|
types.
|
|
- Tweaked bl2_machval files to use new macros.
|
|
|
|
commit 2fecc88ca22142020573f168da715e8e9f3dd7de
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 20 11:35:14 2012 -0600
|
|
|
|
Fixed harmless macro bug in level-1m operations.
|
|
|
|
Details:
|
|
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
|
|
bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
|
|
despite the bug, which is why I had not discovered it until now.
|
|
|
|
commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Dec 18 15:07:36 2012 -0600
|
|
|
|
Renamed x86,x86_64 kernels to indicate 'd' fusing.
|
|
|
|
Details:
|
|
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
|
|
to emphasize that the fusing shape is not for all datatype instances, but
|
|
rather just for one (that of double-precision real). Other fusing shapes
|
|
would be proportional to their precision and domain "byte footprints".
|
|
- Corresponding changes to config/clarksville/bl2_kernel.h.
|
|
|
|
commit 6fbbdd4e194d06096ad08c5db61127be338067db
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Dec 18 14:34:02 2012 -0600
|
|
|
|
More tweaks to _config.h, _kernel.h; smem tweaks.
|
|
|
|
Details:
|
|
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
|
|
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
|
|
accomplishes the same thing (enabling posix_memalign()) without enabling
|
|
all of the GNU extensions we don't need.
|
|
- Defined the size of the static memory pool in terms of MC, KC, and NC,
|
|
as well as two new constants that determine how many MCxKC blocks and
|
|
how many KCxNC blocks should be allocated (defined in bl2_config.h).
|
|
- In the case of static memory pool exhaustion, replaced the generic
|
|
bl2_abort() with a specific error code call.
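
A minimal sketch of how defining _POSIX_C_SOURCE exposes posix_memalign(). The value 200112L is an assumption (this log does not state the value used), and example_aligned_alloc() and example_is_aligned() are hypothetical helpers.

```c
/* The feature-test macro must be defined before any system header is
   included. 200112L is an assumed value; it is the POSIX.1-2001 level
   at which posix_memalign() was introduced. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to alignment, which must be a power of two
   and a multiple of sizeof(void*). Returns NULL on failure. */
void* example_aligned_alloc(size_t alignment, size_t size)
{
    void* p = NULL;
    if (posix_memalign(&p, alignment, size) != 0) return NULL;
    return p;
}

/* Returns 1 if p is aligned to the given boundary. */
int example_is_aligned(const void* p, size_t alignment)
{
    assert(alignment != 0);
    return ((uintptr_t)p % alignment) == 0;
}
```

Memory obtained this way is released with the ordinary free().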
|
|
|
|
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 17 16:07:36 2012 -0600
|
|
|
|
Minor reordering of bl2_config.h definitions.
|
|
|
|
commit 4a83f67490136a898f558e273b76a687aed8b893
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 17 12:35:54 2012 -0600
|
|
|
|
Consolidated configuration headers.
|
|
|
|
Details:
|
|
- Merged contents of bl2_arch.h into bl2_config.h for reference and
|
|
clarksville configurations.
|
|
- Updated CREDITS, INSTALL, LICENSE, README files.
|
|
|
|
commit 0670c33cc14612f636ef09ede4133404ae0af6ba
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Dec 14 12:45:26 2012 -0600
|
|
|
|
Fixed bug in reference gemm ukernels.
|
|
|
|
Details:
|
|
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
|
|
was not correctly accumulated and scaled (by alpha) into the output matrix
|
|
C. (Thanks to Fran for finding this bug.)
|
|
- Whitespace changes to reference trsm kernels.
|
|
|
|
commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 13 18:17:54 2012 -0600
|
|
|
|
Expanded reference packm/unpackm kernel set to 16.
|
|
|
|
Details:
|
|
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
|
|
unpackm.
|
|
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
|
|
kernel size is requested. (Thanks to Tyler for finding this bug.)
|
|
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
|
|
to above changes, for 'reference' and 'clarksville' configurations.
|
|
- Updated CHANGELOG.
|
|
- Removed "output*.m" from .gitignore.
|
|
|
|
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 10 17:23:32 2012 -0600
|
|
|
|
Minor updates towards 0.0.1.
|
|
|
|
commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 10 16:18:40 2012 -0600
|
|
|
|
Tweaks to get BLIS compiling again on clarksville.
|
|
|
|
Details:
|
|
- Updated header files and make_defs.mk in config/clarksville.
|
|
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
|
|
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
|
|
- Shuffled include statements in blis2.h.
|
|
|
|
commit cc58ea86010b1f046134d13b546c878389df9af5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 10 14:55:12 2012 -0600
|
|
|
|
Added template fragment.mk; updated .gitignore.
|
|
|
|
commit 714c527b0eb153b7e2040b79349edc8372f743fd
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Dec 7 19:54:04 2012 -0600
|
|
|
|
Added 'changelog' make target; other tweaks.
|
|
|
|
Details:
|
|
- Updated CHANGELOG.
|
|
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
|
|
overwrites CHANGELOG with the output.
|
|
- Other trivial changes.
|
|
|
|
commit e4e5404d26aded4873278e85faf6f14ac32115b5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Dec 7 17:34:53 2012 -0600
|
|
|
|
Define static memory pool size in bl2_config.h.
|
|
|
|
commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Fri Dec 7 17:18:00 2012 -0600
|
|
|
|
Refined INSTALL text; added 'showconfig' target.
|
|
|
|
Details:
|
|
- Added 'showconfig' target to Makefile.
|
|
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
|
|
to object file rules.
|
|
- Added config.mk as prerequisite to library install rules.
|
|
- Edited and added to INSTALL file.
|
|
|
|
commit 26cb659dd79636489db5a051aa60fff80273a7b9
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 6 15:34:53 2012 -0600
|
|
|
|
Added auto-detection of version string (via git).
|
|
|
|
Details:
|
|
- Added build/update-version-file.sh script for auto-detecting "version"
|
|
string and updating 'version' file accordingly. (If .git directory is
|
|
not present, then it is assumed this copy of BLIS is a downloaded
|
|
release, in which case 'version' file is left unchanged.)
|
|
- Added invocation of update-version-file.sh to configure script.
|
|
|
|
commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 6 14:27:11 2012 -0600
|
|
|
|
Wrote first draft of INSTALL file.
|
|
|
|
commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Thu Dec 6 12:42:35 2012 -0600
|
|
|
|
Updated standalone test Makefile and other fixes.
|
|
|
|
Details:
|
|
- Major edits to test/Makefile to bring up-to-date wrt new build system;
|
|
should no longer be broken.
|
|
- Minor edits to top-level Makefile.
|
|
- Fixed copy-and-paste bugs in
|
|
- frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
|
|
- frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c
|
|
|
|
commit 2f272b40f43307909736327f49d17737c7a05d37
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Tue Dec 4 19:22:14 2012 -0600
|
|
|
|
Added build system and continued reorganization.
|
|
|
|
Details:
|
|
- Added/renamed packm, unpackm kernels.
|
|
- Added machine value routines.
|
|
- Added param_map facility.
|
|
- Renamed AUTHORS to CREDITS.
|
|
- Added Makefile; continued to expand upon existing configure script.
|
|
- #define fuse_fac macros in operation headers if not defined already
|
|
(by the user in bl2_kernels.h).
|
|
|
|
commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
|
|
Author: Field G. Van Zee <field@cs.utexas.edu>
|
|
Date: Mon Dec 3 12:36:11 2012 -0600
|
|
|
|
Initial commit.
|