Commit Graph

71 Commits

Author SHA1 Message Date
Field G. Van Zee
a7252e40b5 Generate testsuite objects in 'src'.
Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
  rather than 'obj', since (a) the top-level .gitignore dictates that
  obj directories are to be ignored, and (b) git has problems
  tracking empty directories. Now, users do not need to create their own
  obj directories within their own local clones of BLIS.
2013-04-08 16:08:22 -05:00
Field G. Van Zee
803871c55b Minor formatting changes. 2013-04-08 15:18:42 -05:00
Field G. Van Zee
a571af816d Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
2013-04-08 15:00:13 -05:00
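The fix described in this commit can be illustrated with a minimal sketch: the packed/unpacked decision reads only the pack schema bits of the info field and never looks at buffer addresses. The struct layout, bit positions, and names below (`example_obj_t`, `PACK_SCHEMA_BITS`, and so on) are assumptions made for illustration and are not the actual BLIS definitions.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical bit layout; the real positions and widths may differ. */
#define PACK_SCHEMA_SHIFT 16u
#define PACK_SCHEMA_BITS  (0x7u << PACK_SCHEMA_SHIFT)
#define NOT_PACKED        (0x0u << PACK_SCHEMA_SHIFT)

/* Hypothetical stand-in for obj_t. */
typedef struct {
    uint32_t info;    /* bitfield that holds, among other things, the pack schema */
    void    *buffer;  /* may contain stale or uninitialized contents */
} example_obj_t;

/* An object counts as packed only if its schema bits say so.
   The buffer pointer is deliberately not consulted. */
static bool example_is_packed(const example_obj_t *obj)
{
    return (obj->info & PACK_SCHEMA_BITS) != NOT_PACKED;
}
```

Because the result depends only on bits that are set explicitly, whatever happens to be in the buffer field cannot make the test pass by accident.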
Field G. Van Zee
3be14c32f7 Updated information in testsuite output header.
Details:
- Added to the information that is echoed at the beginning of the test suite's
  output, and also re-labeled some existing information.
2013-04-06 12:54:45 -05:00
Field G. Van Zee
874707c1b1 Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)
2013-04-05 17:19:43 -05:00
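The commit does not say what the exact mistake was, so the following is only a sketch of correct edge-case detection. It assumes the usual arrangement in which the m dimension is tiled by MR and the n dimension by NR. The helper and type names are hypothetical.

```c
#include <stddef.h>

typedef struct {
    size_t n_iter;  /* number of micro-panels, counting a partial one */
    size_t left;    /* size of the trailing partial micro-panel, 0 if none */
} example_edge_t;

/* Count the micro-panels needed to cover `dim` with tiles of `blocksize`.
   Call it with (m, MR) for rows and (n, NR) for columns. Passing the wrong
   blocksize for a dimension goes unnoticed whenever MR == NR. */
static example_edge_t example_count_panels(size_t dim, size_t blocksize)
{
    example_edge_t e;
    e.left   = dim % blocksize;
    e.n_iter = dim / blocksize + (e.left != 0 ? 1 : 0);
    return e;
}
```

With MR = 8 and NR = 4, a dimension of 10 gives two panels in one direction and three in the other, which is the kind of difference that equal blocksizes would hide.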
Field G. Van Zee
7cbda15291 Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.
2013-04-04 15:25:43 -05:00
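A reference gemm micro-kernel with explicit loops over MR and NR might look roughly like the sketch below. The function name, argument order, and packing layout (MR contiguous elements of A and NR contiguous elements of B per step of k) are assumptions for illustration, not the actual BLIS interface.

```c
#include <stddef.h>

/* Register blocksizes; change these and the kernel still works unmodified. */
#define EX_MR 8
#define EX_NR 4

/* Computes c := beta * c + alpha * a * b, where a is an MR x k micro-panel,
   b is a k x NR micro-panel, and c is MR x NR with strides rs_c and cs_c. */
static void example_dgemm_ukr(size_t k,
                              const double *restrict alpha,
                              const double *restrict a,
                              const double *restrict b,
                              const double *restrict beta,
                              double *restrict c, size_t rs_c, size_t cs_c)
{
    double ab[EX_MR * EX_NR] = { 0.0 };

    /* Accumulate the MR x NR product as k rank-1 updates. */
    for (size_t l = 0; l < k; ++l)
        for (size_t j = 0; j < EX_NR; ++j)
            for (size_t i = 0; i < EX_MR; ++i)
                ab[i + j * EX_MR] += a[l * EX_MR + i] * b[l * EX_NR + j];

    /* Scale by alpha and merge into c. When beta is zero, c is overwritten
       rather than multiplied, so stale values in c cannot leak through. */
    for (size_t j = 0; j < EX_NR; ++j)
        for (size_t i = 0; i < EX_MR; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            if (*beta == 0.0) *cij = (*alpha) * ab[i + j * EX_MR];
            else              *cij = (*beta) * (*cij) + (*alpha) * ab[i + j * EX_MR];
        }
}
```

Since the loop bounds are compile-time constants, an option such as -funroll-loops gives the compiler a chance to unroll them fully.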
Field G. Van Zee
6684b73d55 Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.
2013-04-02 13:06:20 -05:00
Field G. Van Zee
fb68087f87 More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).
2013-03-26 15:10:16 -05:00
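As a rough illustration of aligning a dimension (as opposed to an address), the sketch below rounds a dimension up so that its footprint in bytes is a multiple of an alignment size. This is an assumption about what such a helper could do, not the actual body of bli_align_dim_to_cmem(). The function name is hypothetical, and the sketch assumes the alignment size is a multiple of the element size.

```c
#include <stddef.h>

/* Round `dim` up so that dim * elem_size is a multiple of align_size.
   Assumes align_size is itself a multiple of elem_size. */
static size_t example_align_dim(size_t dim, size_t elem_size, size_t align_size)
{
    size_t bytes  = dim * elem_size;
    size_t padded = ((bytes + align_size - 1) / align_size) * align_size;
    return padded / elem_size;
}
```

For doubles and a 16-byte alignment, a dimension of 5 becomes 6 while a dimension of 4 is left unchanged. Keeping the alignment size a parameter mirrors the split in this commit between the contiguous-memory alignment and the one handed to posix_memalign().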
Field G. Van Zee
9682ef61db Always define memory alignment size cpp constant.
Details:
- Removed guard around #define for memory alignment size constant.
  Memory alignment should always be enabled, and so this value should
  always be defined.
2013-03-26 14:14:53 -05:00
Field G. Van Zee
3a787cccaa Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.
2013-03-26 13:59:19 -05:00
Field G. Van Zee
37308f9a50 Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.
2013-03-26 12:43:14 -05:00
Field G. Van Zee
40a0654ada CHANGELOG update. 2013-03-24 20:18:12 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
0.0.5
2013-03-24 20:01:49 -05:00
Field G. Van Zee
132bffcef7 Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which includes alternate/half-baked/broken implementations.
2013-03-24 18:49:36 -05:00
Field G. Van Zee
551ea4767a Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.
2013-03-24 18:00:10 -05:00
Field G. Van Zee
bc7b318ed0 Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similarly
  temporary changes are being made to libflame's type definitions file.)
2013-03-22 17:18:58 -05:00
Field G. Van Zee
f469907503 Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).
2013-03-22 15:20:15 -05:00
Field G. Van Zee
d1023bfbc6 Removed build/old directory. 2013-03-22 15:09:59 -05:00
Field G. Van Zee
718888849c Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
2013-03-22 15:07:01 -05:00
Field G. Van Zee
bba38cf4e9 Added missing conjbeta argument to scald. 2013-03-19 18:07:40 -05:00
Field G. Van Zee
1f82b51d06 Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same values as
  the old ones. From an abstraction standpoint, it seemed awkward to store
  those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.
2013-03-18 15:37:20 -05:00
Field G. Van Zee
36c782857b CHANGELOG update. 2013-03-18 10:37:03 -05:00
Field G. Van Zee
e7d41229d3 Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.
0.0.4
2013-03-15 17:12:36 -05:00
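The pool design described above (fixed-size blocks tracked by an internal stack) can be sketched as follows. This is a simplified single-pool illustration with hypothetical names. The real allocator keeps three pools and marks critical regions, both of which are omitted here.

```c
#include <stddef.h>

#define EX_NUM_BLOCKS 4

/* Hypothetical pool: a stack of pointers to blocks that are still available. */
typedef struct {
    void  *blocks[EX_NUM_BLOCKS];
    size_t top;  /* number of blocks currently available */
} example_pool_t;

/* Carve `region` into EX_NUM_BLOCKS blocks of block_size bytes each. */
static void example_pool_init(example_pool_t *pool, char *region, size_t block_size)
{
    for (size_t i = 0; i < EX_NUM_BLOCKS; ++i)
        pool->blocks[i] = region + i * block_size;
    pool->top = EX_NUM_BLOCKS;
}

/* Check out a block, or return NULL if the pool is exhausted.
   In a threaded build this would need to sit inside a critical region. */
static void *example_pool_acquire(example_pool_t *pool)
{
    if (pool->top == 0) return NULL;
    return pool->blocks[--pool->top];
}

/* Return a block to the pool by pushing it back onto the stack. */
static void example_pool_release(example_pool_t *pool, void *block)
{
    if (pool->top < EX_NUM_BLOCKS)
        pool->blocks[pool->top++] = block;
}
```

Acquire and release are both constant-time, and because every block has the same size, any available block can satisfy any request from that pool.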
Field G. Van Zee
1e76cae00c Perform her2k var1 loops in sequence.
Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
  and accumulated in sequence rather than fused into one loop. This is
  necessary if BLIS is to be configured to provide only enough contiguous
  memory for one panel of B.
2013-03-15 12:21:42 -05:00
Field G. Van Zee
c95c270eba Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.
2013-03-07 14:42:15 -06:00
Field G. Van Zee
e99281a0f4 Fixed test suite flop formulas for ops with side.
Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
  trmm3, and trsm.
- Comment updates in herk macro-kernels.
2013-03-07 14:00:10 -06:00
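The commit does not list the corrected formulas. As a hedged illustration, the sketch below uses the conventional leading-order flop counts for operations with a side parameter, where the m-by-n matrix is multiplied by a square matrix of order m (left) or n (right). The function and enum names are hypothetical.

```c
typedef enum { EX_LEFT, EX_RIGHT } example_side_t;

/* hemm/symm: a full square matrix times an m x n matrix. */
static double example_hemm_flops(example_side_t side, double m, double n)
{
    return side == EX_LEFT ? 2.0 * m * m * n : 2.0 * m * n * n;
}

/* trmm/trsm: the square matrix is triangular, so roughly half the work. */
static double example_trmm_flops(example_side_t side, double m, double n)
{
    return side == EX_LEFT ? m * m * n : m * n * n;
}
```

The point is that the count depends on which dimension the square operand takes, so a formula that ignores the side is only right when m equals n.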
Field G. Van Zee
ef8cbfc44d Added "version" to .gitignore.
Details:
- Added "version" to .gitignore file so that the file does not show up when
  running 'git status', or accidentally get pulled into the index when
  running 'git add' or 'git add --all'.
2013-03-02 12:47:06 -06:00
Field G. Van Zee
e9e0747c2f Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.
2013-03-02 12:43:54 -06:00
Field G. Van Zee
bb612f864e Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
2013-03-01 12:55:42 -06:00
Field G. Van Zee
f24e29b789 Replaced banded/packed BLAS2 stubs with f2c code.
Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
  implemented" message. This includes all of the level-2 banded and packed
  routines.
- Replaced the aforementioned with the corresponding netlib implementations
  having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
2013-02-22 18:15:41 -06:00
Field G. Van Zee
1454c1a142 Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.
2013-02-22 12:38:45 -06:00
Field G. Van Zee
ede75693e5 Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
  wrappers.
0.0.3
2013-02-22 12:11:24 -06:00
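The conditional initialization pattern described for bl2_init_safe() and bl2_finalize_safe() can be sketched as below. All names, the status values, and the argument-passing style are assumptions for illustration. Only the overall idea (record whether this call did the initialization, then finalize only in that case) comes from the commit message.

```c
#include <stdbool.h>

/* Hypothetical configuration switch: when nonzero, the library stays
   initialized after an automatic initialization. */
#ifndef EX_STAY_AUTO_INITIALIZED
#define EX_STAY_AUTO_INITIALIZED 0
#endif

typedef enum { EX_SUCCESS, EX_ALREADY_INITIALIZED } example_err_t;

static bool example_is_initialized = false;

static void example_init(void)     { example_is_initialized = true;  }
static void example_finalize(void) { example_is_initialized = false; }

/* Initialize only if needed, and record whether this call did the work. */
static void example_init_safe(example_err_t *status)
{
    if (example_is_initialized) {
        *status = EX_ALREADY_INITIALIZED;
    } else {
        example_init();
        *status = EX_SUCCESS;
    }
}

/* Finalize only if the matching init_safe call performed the initialization. */
static void example_finalize_safe(example_err_t status)
{
    if (status == EX_SUCCESS && !EX_STAY_AUTO_INITIALIZED)
        example_finalize();
}
```

A BLAS-like wrapper would call the first function on entry and the second on exit, so an application that already initialized the library explicitly keeps it initialized across calls.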
Field G. Van Zee
995edf43e2 Updated version file. (Forgot to in prev commit). 2013-02-21 14:30:50 -06:00
Field G. Van Zee
e823b08aaf Fixed some scalar types in BLAS-like Herm APIs.
Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
  alpha and beta in herk, and beta in her2k, need to be real. These
  arguments were typed incorrectly as the complex types. This has been
  fixed. Note the issue was only present in the BLAS-like APIs for
  these operations (not the native object-based interfaces).
2013-02-21 12:00:17 -06:00
Field G. Van Zee
5ece050a66 Updated version file. (Forgot to in prev commit). 2013-02-20 15:50:54 -06:00
Field G. Van Zee
f243034b8b Changed API of packm_init_pack() to use blksz_t.
Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
  are passed in as type blksz_t* instead of dim_t.
- Made a similar change to packv_init_pack().
2013-02-20 14:11:36 -06:00
Field G. Van Zee
da0c22f241 Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.
2013-02-15 09:59:48 -06:00
Field G. Van Zee
2c836adadc Updated beta == zero semantics of mulsc.
Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
  operation that needed updating.
- Added Devin to CREDITS file.
2013-02-14 10:42:56 -06:00
Field G. Van Zee
722b66c7dc Removed some calls to setv() in test modules.
Details:
- Removed calls to setv() in test modules whose sole purpose was to
  initialize vectors to zero to ensure that nan's and inf's would not
  taint the computation. Now that beta == zero semantics have been
  updated to clear the output operand (when beta is zero), rather than
  multiply against it, these setv() calls are no longer needed.
2013-02-14 10:18:00 -06:00
Field G. Van Zee
e6ac623a90 Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels
2013-02-13 18:44:59 -06:00
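The beta == 0 semantics can be shown with a minimal scalv-style sketch: when the scalar is zero, the output is set to zero outright instead of being multiplied, since 0 * NaN and 0 * Inf are both NaN under IEEE arithmetic. The function name and signature are hypothetical.

```c
#include <stddef.h>

/* x := beta * x for a vector of n elements with stride incx. */
static void example_dscalv(size_t n, double beta, double *x, size_t incx)
{
    if (beta == 0.0) {
        /* Overwrite: anything in x, including NaN or Inf, is discarded. */
        for (size_t i = 0; i < n; ++i) x[i * incx] = 0.0;
    } else {
        for (size_t i = 0; i < n; ++i) x[i * incx] *= beta;
    }
}
```

With this branch in place, callers no longer need to clear an uninitialized output operand before passing a zero scalar.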
Field G. Van Zee
aedccbc85d Fixed stale interface to packm_unb_var1().
Details:
- Removed the control tree from the interface to packm_unb_var1(), which
  I meant to do when it was un-deprecated.
2013-02-13 18:29:53 -06:00
Field G. Van Zee
c23135669f Un-deprecated packm_unb_var1.c (needed by l2 ops).
Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
  operations still need this routine for packing matrices. Now, whether
  level-2 operations should be packing matrices to begin with is another
  matter. But this fixes the segmentation fault one would have gotten when
  running bl2_gemv() on a general stride matrix.
2013-02-13 13:21:00 -06:00
Field G. Van Zee
cf49e35f98 Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.
2013-02-12 18:39:35 -06:00
Field G. Van Zee
eb139ae256 Replaced bl2_abs() with _fabs() where appropriate. 2013-02-12 12:39:30 -06:00
Field G. Van Zee
474bac30c9 Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.
2013-02-12 12:23:48 -06:00
Field G. Van Zee
03a260a457 Restored executable permissions to scripts.
Details:
- Restored executable (0755) permissions to scripts that were touched by
  the recursive sed script that updated the copyright headers in the
  previous commit.
2013-02-12 11:45:34 -06:00
Field G. Van Zee
1274e12437 Updated copyright headers from 2012 to 2013. 2013-02-11 14:37:47 -06:00
Field G. Van Zee
3b620cc8e9 CHANGELOG update. 2013-02-11 13:38:07 -06:00
Field G. Van Zee
768fcebaa8 Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha (herk) and real-valued beta (herk and
  her2k).

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the macro-
  kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.
0.0.2
2013-02-11 13:20:44 -06:00
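The her2k fix listed above (beta applied only to the first of the two rank-k updates) can be illustrated with a small sketch. To avoid complex arithmetic, it swaps in the real symmetric analogue (syr2k-style) and updates the full matrix rather than one triangle. The function names are hypothetical.

```c
#include <stddef.h>

/* c := beta * c + alpha * a * b^T, with c n x n and a, b n x k, all row-major. */
static void example_rankk(size_t n, size_t k, double alpha,
                          const double *a, const double *b,
                          double beta, double *c)
{
    for (size_t i = 0; i < n; ++i)
        for (size_t j = 0; j < n; ++j) {
            double sum = 0.0;
            for (size_t l = 0; l < k; ++l)
                sum += a[i * k + l] * b[j * k + l];
            /* Overwrite when beta is zero rather than multiplying. */
            double scaled = (beta == 0.0) ? 0.0 : beta * c[i * n + j];
            c[i * n + j] = scaled + alpha * sum;
        }
}

/* c := beta * c + alpha * a * b^T + alpha * b * a^T as two rank-k updates.
   beta goes to the first update only; the second must use 1.0, otherwise
   the contribution of the first update would be scaled by beta again. */
static void example_syr2k(size_t n, size_t k, double alpha,
                          const double *a, const double *b,
                          double beta, double *c)
{
    example_rankk(n, k, alpha, a, b, beta, c);
    example_rankk(n, k, alpha, b, a, 1.0, c);
}
```

The same pattern underlies the trsm change in this commit, where a scalar is applied in the first iteration and a unit scalar is passed for the rest.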
Field G. Van Zee
be94fb84c0 Added missing 'd' to fused gemmtrsm function name. 2013-01-04 10:55:21 -06:00