53 Commits

Author SHA1 Message Date
Field G. Van Zee
03f6c35997 Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
  the "special" bit is taken into account so that BLIS_INT is differentiated
  from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
  being used.
2013-07-22 12:54:32 -05:00
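The tightened type tests in 03f6c35997 can be pictured with a small bit-field sketch. All names and bit values below carry an `ex_`/`EX_` prefix and are assumptions made for illustration; they are not copied from bli_type_defs.h or bli_obj_macro_defs.h.

```c
#include <assert.h>

/* Hypothetical datatype encoding: bit 0 = domain (real/complex),
   bit 1 = precision (single/double), bit 2 = "special" (set for
   non-floating-point types such as integers). */
#define EX_DOMAIN_BIT    0x1u
#define EX_PRECISION_BIT 0x2u
#define EX_SPECIAL_BIT   0x4u
#define EX_DT_MASK       (EX_DOMAIN_BIT | EX_PRECISION_BIT | EX_SPECIAL_BIT)

enum {
    EX_FLOAT    = 0x0,
    EX_SCOMPLEX = 0x1,
    EX_DOUBLE   = 0x2,
    EX_DCOMPLEX = 0x3,
    EX_INT      = 0x4   /* same low bits as EX_FLOAT, special bit set */
};

/* Loose test: inspects only the domain bit, so EX_INT passes as "real". */
static int ex_is_real_loose(unsigned dt)
{
    return (dt & EX_DOMAIN_BIT) == 0;
}

/* Tight test: the special bit must also be clear. */
static int ex_is_real(unsigned dt)
{
    assert((dt & ~EX_DT_MASK) == 0);   /* reject values outside the encoding */
    return (dt & (EX_DOMAIN_BIT | EX_SPECIAL_BIT)) == 0;
}

/* Tight test for single-precision real: compare all three bits. */
static int ex_is_float(unsigned dt)
{
    return (dt & EX_DT_MASK) == EX_FLOAT;
}
```

Under this encoding the loose test cannot tell an integer from a float, while the tight tests can.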
Field G. Van Zee
4e80ad28c9 Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
  complex arithmetic to the scalar-level macros in include/level0. This
  includes a somewhat substantial reorganization and re-layering of much
  of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
  commented-out by default, which optionally enables the use of built-in
  C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
  used (bli_proj_dt_to_real_if_imag_eq0).
2013-07-18 17:53:31 -05:00
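As a rough illustration of how 4e80ad28c9 could layer scalar-level macros over two complex representations, the sketch below switches between a struct type and the built-in C99 type. The macro `EX_ENABLE_C99_COMPLEX` and every `ex_` name are hypothetical stand-ins; the real level0 macros are organized differently and cover many more operations.

```c
/* Define EX_ENABLE_C99_COMPLEX (hypothetical name) before this point to
   select the built-in C99 complex type; leave it undefined to use a struct. */
#ifdef EX_ENABLE_C99_COMPLEX
  #include <complex.h>
  typedef double complex ex_dcomplex;
  #define ex_real(x)        creal(x)
  #define ex_imag(x)        cimag(x)
  #define ex_sets(r, i, y)  ((y) = (r) + (i) * I)
  #define ex_mults(a, b, y) ((y) = (a) * (b))
#else
  typedef struct { double real; double imag; } ex_dcomplex;
  #define ex_real(x)        ((x).real)
  #define ex_imag(x)        ((x).imag)
  #define ex_sets(r, i, y)  do { (y).real = (r); (y).imag = (i); } while (0)
  /* Temporaries keep the result correct when y aliases a or b. */
  #define ex_mults(a, b, y) \
      do { \
          double ex_re = (a).real * (b).real - (a).imag * (b).imag; \
          double ex_im = (a).real * (b).imag + (a).imag * (b).real; \
          (y).real = ex_re; (y).imag = ex_im; \
      } while (0)
#endif

/* Code written against the macros compiles unchanged in either mode. */
static ex_dcomplex ex_square(ex_dcomplex x)
{
    ex_dcomplex y;
    ex_mults(x, x, y);
    return y;
}
```

Callers only ever touch `ex_real`, `ex_imag`, `ex_sets` and `ex_mults`, so the choice of representation stays confined to one header.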
Field G. Van Zee
aec12d90f5 Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
  implemented a variation of copyv/m that, in the case of real source
  and complex destination operands, leaves the imaginary component
  untouched (rather than setting it to zero). I realize now that the
  special case(s) (e.g. gemm with real A and B but complex C) that I
  thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
2013-07-10 13:33:30 -05:00
Field G. Van Zee
b0a0a0f274 Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
  in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
  manually typedefs the types we need (which, for now, are unconditionally
  int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
  nothing for C++ and non-C99.
2013-07-09 17:15:38 -05:00
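The two cpp blocks described in b0a0a0f274 might look roughly like the following. This is a sketch only: the fallback typedefs assume a pre-C99 compiler that offers a 64-bit `long long` extension, and `ex_copy` is a hypothetical function added to show `restrict` in use.

```c
/* Use <stdint.h> where the language guarantees it; otherwise typedef the
   two types by hand. */
#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #include <stdint.h>
#else
  /* Assumption: long long exists and is 64 bits wide on this compiler. */
  typedef long long          int64_t;
  typedef unsigned long long uint64_t;
#endif

/* restrict is a keyword only in C99 and later; elsewhere make it vanish. */
#if defined(__cplusplus) || \
    !(defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
  #define restrict
#endif

/* Compiles as C89, C99 or C++ thanks to the blocks above. */
static void ex_copy(int64_t n, const double* restrict x, double* restrict y)
{
    int64_t i;
    for (i = 0; i < n; ++i)
        y[i] = x[i];
}
```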
Field G. Van Zee
4b7e7970f1 Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
  inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
  integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
  BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
  These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
  in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
  of char.
- Updated bla_amax() wrappers so that the return type is defined directly
  as f77_int, rather than letting the prototype-generating macro decide
  the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
  so I removed them. Also, changed the body of the wrapper so that a
  gint_t is passed into abmaxv, which is THEN typecast to an f77_int
  before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
  doublecomplex types so that they use .real and .imag instead (now that
  we are using scomplex and dcomplex).
2013-07-08 15:20:34 -05:00
Field G. Van Zee
3725013985 Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel that incrementally
  packs one micro-panel of B at a time. This is useful for certain
  special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
  do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
2013-07-08 11:24:18 -05:00
Field G. Van Zee
46d3d09d49 Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
  renamed the resulting variants, according to the same changes recently
  made to trmm and trsm.
- Implemented support for four new subpartition types:
    BLIS_SUBPART1T
    BLIS_SUBPART1B
    BLIS_SUBPART1L
    BLIS_SUBPART1R
  which correspond to "merged" partitions that include the middle "1"
  partition as well as either the neighboring "0" or "2" partition. This is
  used to clean up code in herk/her2k var2 that attempts to partition away
  the strictly zero region above or below the diagonal of a matrix operand
  that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
  zero region in the panel of C that is passed in. This is now needed given
  that herk/her2k var1 no longer partitions off this zero region before
  calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
2013-06-27 13:19:56 -05:00
Field G. Van Zee
02002ef6f3 Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
  side case is now handled explicitly, rather than induced indirectly by
  transposing and swapping strides on operands. This allows us to walk through
  the output matrix with favorable access patterns no matter how it is stored,
  for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
  lower/upper distinction. Instead, we simply label the variants by which
  dimension is partitioned and whether the variant marches forwards or
  backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
  (as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
  blocksize extensions (if non-zero) were not being used to appropriately size
  the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
  whole multiples of MR AND NR. This is needed for the case of trsm_r where,
  in order to reuse existing left-side gemmtrsm fused micro-kernels, the
  packing of A (left-hand operand) and B (right-hand operand) is done with
  NR and MR, respectively (instead of MR and NR).
2013-06-24 17:08:14 -05:00
Field G. Van Zee
08475e7c76 Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
  packing from a lower or upper-stored symmetric/Hermitian matrix to column
  panels (which are row-stored). Previously one could only pack to row panels
  (which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
  favorable access through row-stored matrices for gemm, hemm, herk, her2k,
  symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
  execution datatypes.
2013-06-11 12:18:39 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
85a6d1c9a5 Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
  sizeof(dcomplex).
2013-05-29 10:58:24 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00
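The bli_obj_vector_inc() fix in 2d9c667f3c amounts to special-casing 1 x 1 objects. A minimal sketch, using a hypothetical `ex_obj_t` that holds only dimensions and strides rather than the real obj_t:

```c
#include <assert.h>

/* Hypothetical stand-in for an object: dimensions plus row/column strides. */
typedef struct {
    long m, n;    /* length and width */
    long rs, cs;  /* row stride and column stride */
} ex_obj_t;

/* Increment between consecutive elements of a vector-shaped object. */
static long ex_vector_inc(const ex_obj_t* x)
{
    assert(x->m == 1 || x->n == 1);   /* must be a vector */

    /* A 1 x 1 object has no second element, so its strides carry no
       meaning. Report 1 so that callers expecting contiguous access are
       not misled by a column stride inflated for alignment. */
    if (x->m == 1 && x->n == 1)
        return 1;

    /* Row vector: step along columns. Column vector: step along rows. */
    return (x->m == 1) ? x->cs : x->rs;
}
```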
Field G. Van Zee
d57ec42b34 Renamed _trans_status() macro.
Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
  previous commit.
2013-05-03 17:35:32 -05:00
Field G. Van Zee
9e2b227866 Renamed _set_trans(), _trans_status() macros.
Details:
- Renamed the following macros:
    bli_obj_set_trans()    -> bli_obj_set_onlytrans()
    bli_obj_trans_status() -> bli_obj_onlytrans_status()
  to remove ambiguity as to which bits are read/updated.
2013-05-03 17:24:58 -05:00
Field G. Van Zee
6bfa96f848 Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1 by adding blocksize extension
  fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
  happens if and only if cache blocksizes are created with non-zero
  extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
  blocksize extension mechanism is now available for use.
2013-04-30 19:35:54 -05:00
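One plausible reading of the edge-case rule in 6bfa96f848, for the forward direction, is sketched below. The struct `ex_blksz_t` and the function `ex_blocksize_f` are hypothetical; the real blksz_t and bli_determine_blocksize_f() may differ in detail.

```c
#include <assert.h>

/* Hypothetical blocksize object: nominal size plus an edge-case extension. */
typedef struct {
    long b;    /* nominal cache blocksize */
    long ext;  /* extension; 0 disables the edge-case optimization */
} ex_blksz_t;

/* Size of the block that starts at offset i when marching forward
   through a dimension of length dim. */
static long ex_blocksize_f(long i, long dim, const ex_blksz_t* bs)
{
    long left;

    assert(bs->b > 0 && bs->ext >= 0);
    assert(i >= 0 && i < dim);

    left = dim - i;

    /* If everything that remains fits within b + ext, take it all in one
       block instead of leaving a small remainder for a final iteration. */
    return (left <= bs->b + bs->ext) ? left : bs->b;
}
```

With `ext` equal to zero the rule reduces to the usual min(remaining, b), which matches the "if and only if" wording of the commit.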
Field G. Van Zee
096b366ddc Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
  algorithms that first partition in the n dimension with a blocksize
  of NC. Typically this is not an issue since only very large problems
  exceed that of NC. But developers often run very large problems, and
  so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
  bli_param_macro_defs.h.
2013-04-25 16:43:43 -05:00
Field G. Van Zee
b6e24b23cb Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
  and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
  accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
2013-04-25 12:06:12 -05:00
Field G. Van Zee
9d10d7dd9b Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
2013-04-23 16:00:18 -05:00
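To make the a_next/b_next change in 9d10d7dd9b concrete, here is a sketch of a reference-style gemm micro-kernel that accepts the two extra pointers and ignores them. The signature, the fixed 4 x 4 tile and the `ex_`/`EX_` names are assumptions for illustration and are not the framework's actual micro-kernel interface. It requires C99 for `restrict`.

```c
#include <assert.h>

enum { EX_MR = 4, EX_NR = 4 };

/* C := beta*C + alpha*A*B for one EX_MR x EX_NR micro-tile.
   a: packed micro-panel of A, EX_MR x k, stored by columns
   b: packed micro-panel of B, k x EX_NR, stored by rows
   a_next, b_next: addresses of the micro-panels the caller will use next.
   They are hints for prefetching only and may be ignored. */
static void ex_dgemm_ukr(long k,
                         const double* restrict alpha,
                         const double* restrict a,
                         const double* restrict b,
                         const double* restrict beta,
                         double* restrict c, long rs_c, long cs_c,
                         const double* a_next,
                         const double* b_next)
{
    double ab[EX_MR * EX_NR] = { 0.0 };
    long   i, j, l;

    assert(k >= 0);

    /* This reference sketch does no prefetching. */
    (void)a_next;
    (void)b_next;

    /* Accumulate the k rank-1 updates into the local tile. */
    for (l = 0; l < k; ++l)
        for (j = 0; j < EX_NR; ++j)
            for (i = 0; i < EX_MR; ++i)
                ab[i + j * EX_MR] += a[i + l * EX_MR] * b[j + l * EX_NR];

    /* Scale by alpha and merge into C, which is scaled by beta. */
    for (j = 0; j < EX_NR; ++j)
        for (i = 0; i < EX_MR; ++i) {
            double* cij = c + i * rs_c + j * cs_c;
            *cij = (*beta) * (*cij) + (*alpha) * ab[i + j * EX_MR];
        }
}
```

An optimized kernel would replace the `(void)` casts with prefetch instructions on `a_next` and `b_next`, leaving the rest of the interface unchanged.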
Field G. Van Zee
2d6f9e8379 Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
  by the contiguous memory allocator for all types, given that the values for
  double precision real are the ones used to allocate the space. These checks
  can easily go awry in certain situations, especially if you are developing for
  only one datatype. So for now, they are probably more trouble than they are
  worth.
2013-04-21 15:10:34 -05:00
Field G. Van Zee
b6ef84fad1 Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
  dimension (row or column stride) used within each packed micro-panel from
  the corresponding register blocksize. It appears advantageous on some
  systems to use, for example, packed micro-panels of A where the column
  stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
  - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
    to use when packing micro-panels of A and B.
  - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
    where appropriate, instead of MR and NR.
  - Added pd field (panel dimension) to obj_t.
  - New interface to bli_packm_cntl_obj_create().
  - Renamed bli_obj_packed_length()/_width() macros to
    bli_obj_padded_length()/_width().
  - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
  - Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
  blocksize for edge cases, which can improve performance at the margins.
2013-04-21 15:00:24 -05:00
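The decoupling described in b6ef84fad1 can be sketched as a packing loop whose destination column stride (`packmr`) is independent of the register blocksize (`mr`). The function below is hypothetical, and zero-filling the padding rows is an assumption of this sketch, not something the commit states.

```c
#include <assert.h>

/* Pack an m x k slice of a column-major matrix A (leading dimension lda)
   into a micro-panel p whose column stride is packmr >= mr >= m.
   Rows m .. packmr-1 of each packed column are set to zero. */
static void ex_pack_panel(long m, long k, long mr, long packmr,
                          const double* a, long lda, double* p)
{
    long i, l;

    assert(m <= mr && mr <= packmr);

    for (l = 0; l < k; ++l) {
        for (i = 0; i < m; ++i)
            p[i + l * packmr] = a[i + l * lda];
        for (i = m; i < packmr; ++i)
            p[i + l * packmr] = 0.0;
    }
}
```

A macro-kernel reading such a panel would advance by `packmr` rather than `mr` from one packed column to the next.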
Field G. Van Zee
19155a768d Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
2013-04-16 11:24:03 -05:00
Field G. Van Zee
2ee6bbca29 Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.
2013-04-15 19:27:57 -05:00
Field G. Van Zee
26cbd52e36 Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).
2013-04-14 19:05:33 -05:00
Field G. Van Zee
d43d1a0a2e Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
2013-04-11 16:28:17 -05:00
Field G. Van Zee
31b100e7bf Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).
2013-04-11 11:11:52 -05:00
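The aliasing idea in 31b100e7bf relies on token pasting: if a blocksize macro's name begins with the type character, a type-generic macro can construct it. The sketch below uses hypothetical `EX_` names and made-up blocksize values; it is not the contents of bli_kernel_macro_defs.h.

```c
/* Hypothetical per-type register blocksizes, as a kernel configuration
   header might define them. */
#define EX_GEMM_MR_S 8
#define EX_GEMM_MR_D 4

/* Aliases whose names start with the type character. */
#define ex_smr EX_GEMM_MR_S
#define ex_dmr EX_GEMM_MR_D

/* Two-level paste so that arguments are macro-expanded before pasting. */
#define EX_PASTE_(ch, op) ex_ ## ch ## op
#define EX_PASTE(ch, op)  EX_PASTE_(ch, op)

/* Generate one function per type character. Each instance picks up its
   own blocksize through the pasted alias name. */
#define EX_GEN_PANEL_ELEMS(ch) \
static long ex_ ## ch ## panel_elems(long k) \
{ \
    return EX_PASTE(ch, mr) * k; \
}

EX_GEN_PANEL_ELEMS(s)   /* defines ex_spanel_elems() */
EX_GEN_PANEL_ELEMS(d)   /* defines ex_dpanel_elems() */
```

Without the aliases, each generated function would need an operation-specific blocksize macro passed in by hand.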
Field G. Van Zee
4afe3bfd82 Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
2013-04-09 17:45:39 -05:00
Field G. Van Zee
803871c55b Minor formatting changes.
2013-04-08 15:18:42 -05:00
Field G. Van Zee
a571af816d Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
2013-04-08 15:00:13 -05:00
Field G. Van Zee
7cbda15291 Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.
2013-04-04 15:25:43 -05:00
Field G. Van Zee
6684b73d55 Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.
2013-04-02 13:06:20 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
2013-03-24 20:01:49 -05:00
Field G. Van Zee
132bffcef7 Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which includes alternate/half-baked/broken implementations.
2013-03-24 18:49:36 -05:00
Field G. Van Zee
bc7b318ed0 Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similarly
  temporary changes are being made to libflame's type definitions file.)
2013-03-22 17:18:58 -05:00
Field G. Van Zee
1f82b51d06 Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same
  values as the old ones. From an abstraction standpoint, it seemed awkward to
  store those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.
2013-03-18 15:37:20 -05:00
Field G. Van Zee
e7d41229d3 Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.
2013-03-15 17:12:36 -05:00
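The pool design described in e7d41229d3 (fixed-size blocks tracked with a stack) can be sketched as follows. This is a single-pool, single-threaded toy with hypothetical names and sizes. The real allocator manages three pools, and this sketch makes no attempt to align the blocks.

```c
#include <assert.h>
#include <stddef.h>

enum { EX_NUM_BLOCKS = 4, EX_BLOCK_BYTES = 1024 };

typedef struct {
    unsigned char storage[EX_NUM_BLOCKS][EX_BLOCK_BYTES]; /* contiguous region */
    void*         free_stack[EX_NUM_BLOCKS];  /* blocks not checked out */
    int           top;                        /* number of free blocks */
} ex_pool_t;

/* Push every block onto the free stack. */
static void ex_pool_init(ex_pool_t* p)
{
    int i;
    for (i = 0; i < EX_NUM_BLOCKS; ++i)
        p->free_stack[i] = p->storage[i];
    p->top = EX_NUM_BLOCKS;
}

/* Check out one block, or return NULL if the pool is exhausted.
   In a threaded build this body would be a critical region. */
static void* ex_pool_acquire(ex_pool_t* p)
{
    if (p->top == 0)
        return NULL;
    return p->free_stack[--p->top];
}

/* Return a block to the pool. Also a critical region when threaded. */
static void ex_pool_release(ex_pool_t* p, void* block)
{
    assert(p->top < EX_NUM_BLOCKS);   /* guards against releasing too often */
    p->free_stack[p->top++] = block;
}
```

Because acquire and release are each a single push or pop, the critical regions stay short.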
Field G. Van Zee
c95c270eba Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.
2013-03-07 14:42:15 -06:00
Field G. Van Zee
bb612f864e Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
2013-03-01 12:55:42 -06:00
Field G. Van Zee
1454c1a142 Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.
2013-02-22 12:38:45 -06:00
Field G. Van Zee
ede75693e5 Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
  wrappers.
2013-02-22 12:11:24 -06:00
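The conditional initialization scheme in ede75693e5 can be sketched with a status value that remembers whether the wrapper itself performed the initialization. Every name below is hypothetical, including the `ex_stay_initialized` flag that stands in for the configuration option mentioned in the commit.

```c
typedef enum {
    EX_SUCCESS             = 0,  /* this call performed the initialization */
    EX_ALREADY_INITIALIZED = 1   /* the library was initialized beforehand */
} ex_err_t;

static int ex_is_initialized   = 0;
static int ex_stay_initialized = 0;  /* hypothetical configuration option */

static void ex_init(void)     { ex_is_initialized = 1; }
static void ex_finalize(void) { ex_is_initialized = 0; }

/* Initialize only if needed, and record which case occurred. */
static void ex_init_safe(ex_err_t* status)
{
    if (ex_is_initialized) {
        *status = EX_ALREADY_INITIALIZED;
        return;
    }
    ex_init();
    *status = EX_SUCCESS;
}

/* Finalize only if the matching ex_init_safe() did the initialization
   and the library has not been configured to stay initialized. */
static void ex_finalize_safe(ex_err_t status)
{
    if (status == EX_SUCCESS && !ex_stay_initialized)
        ex_finalize();
}

/* Shape of a BLAS-like wrapper that uses the pair. */
static void ex_blas_wrapper(void)
{
    ex_err_t status;
    ex_init_safe(&status);
    /* ... call into the library here ... */
    ex_finalize_safe(status);
}
```

An application that calls `ex_init()` itself keeps the library alive across wrapper calls, while one that never initializes pays for setup and teardown on every call unless the stay-initialized option is set.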
Field G. Van Zee
da0c22f241 Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.
2013-02-15 09:59:48 -06:00
Field G. Van Zee
e6ac623a90 Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels
2013-02-13 18:44:59 -06:00
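The motivation for e6ac623a90 is that 0 * NaN is NaN in IEEE arithmetic, so multiplying uninitialized memory by a zero scalar does not reliably clear it. A minimal sketch for a scalv-like operation, with a hypothetical name and signature:

```c
#include <assert.h>

/* x := beta * x, with beta == 0 treated as "overwrite with zero". */
static void ex_dscalv(long n, double beta, double* x, long incx)
{
    long i;

    assert(n >= 0);

    if (beta == 0.0) {
        /* Assign rather than multiply, so NaN or Inf already present in
           x cannot survive into the result. */
        for (i = 0; i < n; ++i)
            x[i * incx] = 0.0;
    } else {
        for (i = 0; i < n; ++i)
            x[i * incx] *= beta;
    }
}
```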
Field G. Van Zee
cf49e35f98 Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.
2013-02-12 18:39:35 -06:00
Field G. Van Zee
474bac30c9 Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.
2013-02-12 12:23:48 -06:00
Field G. Van Zee
1274e12437 Updated copyright headers from 2012 to 2013.
2013-02-11 14:37:47 -06:00
Field G. Van Zee
768fcebaa8 Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the macro-
  kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.
2013-02-11 13:20:44 -06:00
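Among the fixes in 768fcebaa8, the her2k beta bug is easy to illustrate. The sketch below uses a real-domain analogue (syr2k-like, no conjugation) built from two rank-k updates on small column-major matrices. The function names are hypothetical and the code is a plain reference loop, not the macro-kernel.

```c
#include <assert.h>

/* C := beta*C + alpha * A * B^T, where A and B are n x k and C is n x n,
   all column-major with no padding. */
static void ex_rankk(long n, long k, double alpha,
                     const double* a, const double* b,
                     double beta, double* c)
{
    long i, j, l;

    assert(n >= 0 && k >= 0);

    for (j = 0; j < n; ++j)
        for (i = 0; i < n; ++i) {
            double dot = 0.0;
            for (l = 0; l < k; ++l)
                dot += a[i + l * n] * b[j + l * n];
            c[i + j * n] = beta * c[i + j * n] + alpha * dot;
        }
}

/* C := beta*C + alpha*A*B^T + alpha*B*A^T as two rank-k updates.
   beta belongs to the first update only. The second must use 1.0,
   otherwise the first product is scaled by beta as well. */
static void ex_syr2k(long n, long k, double alpha,
                     const double* a, const double* b,
                     double beta, double* c)
{
    ex_rankk(n, k, alpha, a, b, beta, c);
    ex_rankk(n, k, alpha, b, a, 1.0,  c);
}
```

For the 1 x 1 case with A = 2, B = 3, alpha = 1, beta = 0.5 and C = 4, applying beta once gives 0.5*4 + 6 + 6 = 14, whereas applying it in both updates would give 0.5*(0.5*4 + 6) + 6 = 10.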
Field G. Van Zee
806e74beb4 Defined Frobenius norm operations.
Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as one-
  operand operations where the output norm value is the real projection of
  the datatype of the input operand. Both operations are implemented in terms
  of sumsqv.
2012-12-20 17:07:50 -06:00
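The ?lassq-style accumulation that 806e74beb4 refers to keeps a running scale and a scaled sum of squares, so that squaring large or small elements neither overflows nor underflows. A real-domain sketch with a hypothetical name and signature:

```c
#include <assert.h>

/* Update (scale, sumsq) so that, afterwards,
       scale^2 * sumsq == scale_in^2 * sumsq_in + sum_i x_i^2.
   Callers typically start with scale = 0 and sumsq = 1, and obtain the
   norm as scale * sqrt(sumsq). */
static void ex_dsumsqv(long n, const double* x, long incx,
                       double* scale, double* sumsq)
{
    long i;

    assert(n >= 0 && *scale >= 0.0);

    for (i = 0; i < n; ++i) {
        double v    = x[i * incx];
        double absv = (v < 0.0) ? -v : v;

        if (absv == 0.0)
            continue;

        if (*scale < absv) {
            /* New largest magnitude: rescale the accumulated sum. */
            double r = *scale / absv;
            *sumsq = 1.0 + (*sumsq) * r * r;
            *scale = absv;
        } else {
            double r = absv / *scale;
            *sumsq += r * r;
        }
    }
}
```

For the vector (3, 4) starting from scale = 0 and sumsq = 1, the routine ends with scale = 4 and sumsq = 1.5625, and 4 * sqrt(1.5625) = 5 is the expected Frobenius norm.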
Field G. Van Zee
66e80ce1ae Added GENT*R macros; tweaked bl2_machval defs.
Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.
2012-12-20 17:02:55 -06:00
Field G. Van Zee
2fecc88ca2 Fixed harmless macro bug in level-1m operations.
Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
  bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
  despite the bug, which is why I had not discovered it until now.
2012-12-20 11:35:14 -06:00
Field G. Van Zee
6fbbdd4e19 More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.
2012-12-18 14:34:02 -06:00
Field G. Van Zee
4a83f67490 Consolidated configuration headers.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.
2012-12-17 12:35:54 -06:00