41 Commits

Author SHA1 Message Date
Field G. Van Zee
096895c5d5 Reorganized code, APIs related to multithreading.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
  All code that is not specific to a particular operation is now located in a
  new directory: frame/thread. Code is now organized, roughly, by the
  namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
  thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
  also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
  functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
  We now have the following API namespaces:
  - bli_thrinfo_*(): functions related to thrinfo_t objects
  - bli_thrcomm_*(): functions related to thrcomm_t objects.
  - bli_thread_*(): general-purpose functions, such as initialization,
    finalization, and computing ranges. (For now, some macros, such as
    bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
    bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
    appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
  consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
  #undef was a temporary fix to some macro defaults which were being applied
  in the wrong order, which was recently fixed.
2016-06-06 13:32:04 -05:00
Field G. Van Zee
898614a555 Version file update (0.2.0) 2016-04-11 17:32:09 -05:00
Field G. Van Zee
47caa33485 Version file update (0.1.8) 2015-07-29 13:31:09 -05:00
Field G. Van Zee
267253de8a Version file update (0.1.7) 2015-06-19 12:01:49 -05:00
Field G. Van Zee
38ea5022e4 Version file update (0.1.6) 2014-10-23 11:35:45 -05:00
Field G. Van Zee
bde56d0ecf Version file update (0.1.5) 2014-08-04 16:01:58 -05:00
Field G. Van Zee
a7537071b1 Version file update (0.1.4) 2014-07-27 18:20:12 -05:00
Field G. Van Zee
036cc63491 Version file update (0.1.3) 2014-06-23 13:48:17 -05:00
Field G. Van Zee
09d9a3bf67 Reverting version file to test new version script.
Details:
- Changed version file contents to 0.1.2 so that I can test out a new
  version file bumping script.
2014-06-23 13:43:26 -05:00
Field G. Van Zee
ebb3396598 Added 'version' file. 2014-06-23 11:22:50 -05:00
Field G. Van Zee
e9e0747c2f Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.
2013-03-02 12:43:54 -06:00
Field G. Van Zee
bb612f864e Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
2013-03-01 12:55:42 -06:00
Field G. Van Zee
f24e29b789 Replaced banded/packed BLAS2 stubs with f2c code.
Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
  implemented" message. This includes all of the level-2 banded and packed
  routines.
- Replaced the aforementioned with the corresponding netlib implementations
  having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
2013-02-22 18:15:41 -06:00
Field G. Van Zee
1454c1a142 Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.
2013-02-22 12:38:45 -06:00
Field G. Van Zee
ede75693e5 Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
  wrappers.
2013-02-22 12:11:24 -06:00
Field G. Van Zee
995edf43e2 Updated version file. (Forgot to in prev commit). 2013-02-21 14:30:50 -06:00
Field G. Van Zee
5ece050a66 Updated version file. (Forgot to in prev commit). 2013-02-20 15:50:54 -06:00
Field G. Van Zee
da0c22f241 Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.
2013-02-15 09:59:48 -06:00
Field G. Van Zee
2c836adadc Updated beta == zero semantics of mulsc.
Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
  operation that needed updating.
- Added Devin to CREDITS file.
2013-02-14 10:42:56 -06:00
Field G. Van Zee
722b66c7dc Removed some calls to setv() in test modules.
Details:
- Removed calls to setv() in test modules whose sole purpose was to
  initialize vectors to zero to ensure that nan's and inf's would not
  taint the computation. Now that beta == zero semantics have been
  updated to clear the output operand (when beta is zero), rather than
  multiply against it, these setv() calls are no longer needed.
2013-02-14 10:18:00 -06:00
Field G. Van Zee
e6ac623a90 Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels
2013-02-13 18:44:59 -06:00
Field G. Van Zee
c23135669f Un-deprecated packm_unb_var1.c (needed by l2 ops).
Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
  operations still need this routine for packing matrices. Now, whether
  level-2 operations should be packing matrices to begin with is another
  matter. But this fixes the segmentation fault one would have gotten when
  running bl2_gemv() on a general stride matrix.
2013-02-13 13:21:00 -06:00
Field G. Van Zee
cf49e35f98 Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.
2013-02-12 18:39:35 -06:00
Field G. Van Zee
474bac30c9 Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.
2013-02-12 12:23:48 -06:00
Field G. Van Zee
1274e12437 Updated copyright headers from 2012 to 2013. 2013-02-11 14:37:47 -06:00
Field G. Van Zee
768fcebaa8 Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpah is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the macro-
  kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whreby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.
2013-02-11 13:20:44 -06:00
Field G. Van Zee
879a179e1d Added debug statements to bl2_mm_acquire_m().
Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h
2013-01-04 10:37:27 -06:00
Field G. Van Zee
806e74beb4 Defined Frobenius norm operations.
Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as one-
  operand operations where the output norm value is the real projection of
  the datatype of the input operand. Both operations are implemented in terms
  of sumsqv.
2012-12-20 17:07:50 -06:00
Field G. Van Zee
6fbbdd4e19 More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions form bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.
2012-12-18 14:34:02 -06:00
Field G. Van Zee
5d8bdb21c4 Minor reordering of bl2_config.h definitions. 2012-12-17 16:07:36 -06:00
Field G. Van Zee
4a83f67490 Consolidated configuration headers.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.
2012-12-17 12:35:54 -06:00
Field G. Van Zee
0670c33cc1 Fixed bug in reference gemm ukernels.
Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
  was not correctly accumulated and scaled (by alpha) into the output matrix
  C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.
2012-12-14 12:45:26 -06:00
Field G. Van Zee
e2e7cb2fbe Expanded reference packm/unpackm kernel set to 16.
Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
  unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
  kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
  to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.
2012-12-13 18:17:54 -06:00
Field G. Van Zee
17455a8bce Minor updates towards to 0.0.1. 2012-12-10 17:23:32 -06:00
Field G. Van Zee
7ad4ebef38 Tweaks to get BLIS compiling again on clarksville.
Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.
2012-12-10 16:18:40 -06:00
Field G. Van Zee
cc58ea8601 Added template fragment.mk; updated .gitignore. 2012-12-10 14:55:12 -06:00
Field G. Van Zee
714c527b0e Added 'changelog' make target; other tweaks.
Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
  overwrites CHANGELOG with the output.
- Other trivial changes.
2012-12-07 19:54:04 -06:00
Field G. Van Zee
19bb507d0d Refined INSTALL text; added 'showconfig' target.
Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
  to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.
2012-12-07 17:18:00 -06:00
Field G. Van Zee
26cb659dd7 Added auto-detection of version string (via git).
Details:
- Added build/update-version-file.sh script for auto-detecting "version"
  string and updating 'version' file accordingly. (If .git directory is
  not present, then it is assumed this copy of BLIS is a downloaded
  release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.
2012-12-06 15:34:53 -06:00
Field G. Van Zee
2f272b40f4 Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).
2012-12-04 19:22:14 -06:00
Field G. Van Zee
00f3498a89 Initial commit. 2012-12-03 12:36:11 -06:00