127 Commits

Author SHA1 Message Date
Field G. Van Zee
d1e81ddc84 Minor generalizing tweaks to trmm blk var1, var2. 2013-06-13 11:14:21 -05:00
Field G. Van Zee
0efb7974f1 CHANGELOG update. 2013-06-12 16:40:04 -05:00
Field G. Van Zee
5b641c3bab Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
  named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
  compiler flags. This allows the developer to specify a separate set of
  flags (e.g. optimization flags) for compiling kernels while maintaining a
  standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
  to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
  according to above changes.
0.0.8
2013-06-12 16:02:12 -05:00
Field G. Van Zee
08475e7c76 Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
  packing from a lower or upper-stored symmetric/Hermitian matrix to column
  panels (which are row-stored). Previously one could only pack to row panels
  (which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
  favorable access through row-stored matrices for gemm, hemm, herk, her2k,
  symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
  execution datatypes.
2013-06-11 12:18:39 -05:00
Field G. Van Zee
05a657a6b9 Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
  from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
  schemes?" switch is disabled, which would result in redundant tests being
  executed for matrix-only (e.g. level-1m, level-3) operations if multiple
  vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
2013-06-07 11:04:10 -05:00
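The beta == 0 case described above can be sketched in portable C. This is an illustrative reference loop, not the x86_64 micro-kernel itself; the function name, argument order, and packing layout are assumptions made for the example. When beta is zero, C is overwritten without being read, so stale or NaN contents of C cannot leak into the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference micro-kernel (names and layout are assumptions):
   C := beta*C + alpha*A*B for an m x n block of C.
   a: m x k, column-major with column stride m.
   b: k x n, row-major with row stride n.
   c: column-major with column stride ldc. */
void ref_gemm_ukr(size_t m, size_t n, size_t k,
                  double alpha, const double *a, const double *b,
                  double beta, double *c, size_t ldc)
{
    assert(ldc >= m);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i + p * m] * b[p * n + j];

            if (beta == 0.0)
                c[i + j * ldc] = alpha * ab;   /* C is never loaded */
            else
                c[i + j * ldc] = beta * c[i + j * ldc] + alpha * ab;
        }
    }
}
```

Besides saving memory traffic, the separate branch means an uninitialized output buffer is safe to pass when beta is zero.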
Field G. Van Zee
f1aa6b81cc Whitespace changes to old test drivers.
Details:
- Replaced tabs with four spaces in places where indentation was already
  in place.
2013-06-06 13:36:06 -05:00
Field G. Van Zee
9feb4c23d2 Fixed unaligned handling in axpyf, dotxaxpyf.
Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
  implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
  intrinsic implementation of dotxaxpyf kernel.
2013-06-04 14:57:46 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
0288c827d3 Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
  recently).
2013-06-01 08:02:23 -05:00
Field G. Van Zee
85a6d1c9a5 Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
  sizeof(dcomplex).
2013-05-29 10:58:24 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (i.e., 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00
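The last item above (zero dimensions where the output still needs scaling) is easy to get wrong, so here is a minimal sketch of the intended behavior for gemm. The function name and the contiguous column-major layout are assumptions for illustration, not the BLIS back-end. An early return is only valid when C itself is empty; with k = 0 the product A*B contributes nothing, but C must still be scaled by beta.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical gemm sketch: C := beta*C + alpha*A*B.
   All matrices are contiguous and column-major:
   a is m x k, b is k x n, c is m x n. */
void gemm_sketch(size_t m, size_t n, size_t k,
                 double alpha, const double *a, const double *b,
                 double beta, double *c)
{
    /* C has no elements: nothing to compute or scale. */
    if (m == 0 || n == 0) return;

    /* k == 0: A*B is empty, but C := beta*C must still happen. */
    if (k == 0) {
        for (size_t i = 0; i < m * n; ++i)
            c[i] *= beta;
        return;
    }

    assert(a != NULL && b != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t i = 0; i < m; ++i) {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i + p * m] * b[p + j * k];
            c[i + j * m] = beta * c[i + j * m] + alpha * ab;
        }
    }
}
```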
Field G. Van Zee
d57ec42b34 Renamed _trans_status() macro.
Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
  previous commit.
2013-05-03 17:35:32 -05:00
Field G. Van Zee
9e2b227866 Renamed _set_trans(), _trans_status() macros.
Details:
- Renamed the following macros:
    bli_obj_set_trans()    -> bli_obj_set_onlytrans()
    bli_obj_trans_status() -> bli_obj_onlytrans_status()
  to remove ambiguity as to which bits are read/updated.
2013-05-03 17:24:58 -05:00
Field G. Van Zee
2f8174509e Unconditionally check memory pool(s) for errors.
Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
  memory pool is exhausted before checking out and returning a block, even
  if BLIS error checking has been disabled. These errors are useful because
  they likely indicate that BLIS was improperly configured for the code
  being run.
2013-05-01 15:06:30 -05:00
Field G. Van Zee
75405a2b83 CHANGELOG update. 2013-05-01 15:00:30 -05:00
Field G. Van Zee
6bfa96f848 Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1 by adding blocksize extension
  fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
  happens if and only if cache blocksizes are created with non-zero
  extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
  blocksize extension mechanism is now available for use.
0.0.7
2013-04-30 19:35:54 -05:00
Field G. Van Zee
bc7c8005ce Added option to disable err checking in testsuite.
Details:
- Added a new line to input.general that allows one to specify the error-
  checking level to use for each BLIS experiment. The only two levels
  supported for now are "no error checking" and "full error checking".
2013-04-25 17:16:59 -05:00
Field G. Van Zee
096b366ddc Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
  algorithms that first partition in the n dimension with a blocksize
  of NC. Typically this is not an issue since only very large problems
  exceed NC. But developers often run very large problems, and
  so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
  bli_param_macro_defs.h.
2013-04-25 16:43:43 -05:00
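Partitioning in the n dimension with a blocksize of NC amounts to walking the columns in panels of at most NC, with a possibly narrower panel at the edge. The helper below is a hypothetical illustration of that loop only; it is not how BLIS control trees are expressed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: split the n dimension into column panels of width
   at most nc. Records each panel's starting column and width, and returns
   the number of panels (0 if they do not fit in max_panels or n == 0). */
size_t partition_n(size_t n, size_t nc,
                   size_t *start, size_t *width, size_t max_panels)
{
    assert(nc > 0);
    size_t count = 0;
    for (size_t j = 0; j < n; j += nc) {
        if (count == max_panels) return 0;
        start[count] = j;
        /* The last panel may be narrower than nc. */
        width[count] = (n - j < nc) ? (n - j) : nc;
        ++count;
    }
    return count;
}
```

For n smaller than nc the loop runs once with the full width, which is why the extra level of blocking costs little for typical problem sizes.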
Field G. Van Zee
b6e24b23cb Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
  and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
  accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
2013-04-25 12:06:12 -05:00
Field G. Van Zee
df80acf517 Fixed computation of b_next in L3 macro-kernels.
Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
  and trsm, in that the edge cases are captured by the main loop, rather
  than trying to have "cleanup" sections that result in four distinct
  parts (interior, bottom edge, right edge, bottom-right edge) of the
  code.
- Fixed the way b_next was being computed in the non-gemm level-3
  macro-kernels (herk, trmm, trsm). The way it is computed now matches
  that of gemm.
2013-04-23 19:43:23 -05:00
Field G. Van Zee
3671528cf8 Fixed minor bug in computing b_next in gemm. 2013-04-23 19:12:14 -05:00
Field G. Van Zee
db072a5b4a Fixed rare edge case bug in herk_l macro-kernel.
Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
  chosen to be much larger than NR, then one could encounter edge cases
  in the MC dimension that fall entirely below the diagonal, which
  the previous implementation of the herk_l macro-kernel was not allowing
  for.
2013-04-23 17:49:10 -05:00
Field G. Van Zee
1dab11e37d Updated x86 gemmtrsm ukernels to use alpha. 2013-04-23 17:17:11 -05:00
Field G. Van Zee
9d10d7dd9b Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
2013-04-23 16:00:18 -05:00
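A rough sketch of how a micro-kernel might use the two extra arguments is shown below. The signature, the MR/NR values, and the PREFETCH macro are assumptions for illustration and do not reproduce the actual BLIS micro-kernel interface. The prefetch is only a hint; a kernel that ignores a_next and b_next computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Prefetch hint: uses the GCC/Clang builtin when available, otherwise
   evaluates to nothing. */
#if defined(__GNUC__) || defined(__clang__)
#define PREFETCH(p) __builtin_prefetch((p))
#else
#define PREFETCH(p) ((void)(p))
#endif

enum { MR = 4, NR = 4 };  /* illustrative register blocksizes */

/* Hypothetical micro-kernel: C := C + A*B for one MR x NR tile.
   a: k columns of MR elements; b: k rows of NR elements.
   c: element (i,j) at c[i*rs_c + j*cs_c].
   a_next, b_next: addresses of the micro-panels the caller will use next. */
void ukr_with_next(size_t k,
                   const double *restrict a, const double *restrict b,
                   double *restrict c, size_t rs_c, size_t cs_c,
                   const double *a_next, const double *b_next)
{
    assert(rs_c > 0 && cs_c > 0);

    /* Optional: start pulling the next micro-panels toward the cache. */
    PREFETCH(a_next);
    PREFETCH(b_next);

    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                c[i * rs_c + j * cs_c] += a[p * MR + i] * b[p * NR + j];
}
```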
Field G. Van Zee
f3815dc84d Added code for backward edge-case blocking.
Details:
- Edited bli_determine_blocksize_b() to include experimental (and
  currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.
2013-04-23 11:12:33 -05:00
Field G. Van Zee
4fe1435f20 Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
  explicitly to navigate b1, so that situations where PACKNR > NR are
  supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
  frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
2013-04-22 19:00:43 -05:00
Field G. Van Zee
2d6f9e8379 Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
  by the contiguous memory allocator for all types, given that the values for
  double precision real are the ones used to allocate the space. These checks
  can easily go awry in certain situations, especially if you are developing for
  only one datatype. So for now, they are probably more trouble than they are
  worth.
2013-04-21 15:10:34 -05:00
Field G. Van Zee
b6ef84fad1 Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
  dimension (row or column stride) used within each packed micro-panel from
  the corresponding register blocksize. It appears advantageous on some
  systems to use, for example, packed micro-panels of A where the column
  stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
  - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
    to use when packing micro-panels of A and B.
  - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
    where appropriate, instead of MR and NR.
  - Added pd field (panel dimension) to obj_t.
  - New interface to bli_packm_cntl_obj_create().
  - Renamed bli_obj_packed_length()/_width() macros to
    bli_obj_padded_length()/_width().
  - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
  - Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
  blocksize for edge cases, which can improve performance at the margins.
2013-04-21 15:00:24 -05:00
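To make the decoupling concrete, the sketch below packs one micro-panel of A whose column stride (packmr) may exceed the register blocksize (mr). The function name and the choice to zero-fill the padding rows are assumptions for illustration; the commit does not say what the padding contains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packing routine: copy an m_cur x k block of column-major A
   (column stride lda) into a micro-panel p whose column stride is packmr.
   Rows m_cur..packmr-1 of each packed column are zero-filled (assumption). */
void pack_a_panel(size_t mr, size_t packmr, size_t k,
                  const double *a, size_t lda, size_t m_cur, double *p)
{
    assert(packmr >= mr && m_cur <= mr);
    for (size_t j = 0; j < k; ++j)
        for (size_t i = 0; i < packmr; ++i)
            p[i + j * packmr] = (i < m_cur) ? a[i + j * lda] : 0.0;
}
```

A micro-kernel reading this panel would advance by packmr, not mr, between columns, which is the distinction PACKMR and PACKNR introduce.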
Field G. Van Zee
59fca58dbe Fixed bug in compatibility layer (her2k/syr2k).
Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
  and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
  interface caller requests the [conjugate-]transpose case. Thanks to Bryan
  Marker for reporting the behavior that led to this bug.
2013-04-19 15:26:29 -05:00
Field G. Van Zee
09eacbd1ab Changed old level3 test drivers to call front-ends.
Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
  front-end object API instead of the internal back-end with the locally
  defined control tree.
2013-04-18 19:39:13 -05:00
Field G. Van Zee
83e45de23e Allow packm_init() to reacquire a too-small mem_t.
Details:
- Changed bli_packm_init() to react differently to a situation where a pack
  obj_t has an already-allocated mem_t entry that has a buffer that is smaller
  than what will be needed to hold the block/panel that now needs to be
  packed. Previously, this situation was treated with an abort() since I
  assumed something was horribly wrong. I have changed the code so that it now
  reacts by releasing the previous mem_t and re-acquires a new mem_t with the
  new information. (This change was done at the request of Bryan Marker to
  facilitate code generation via DxT.)
2013-04-18 18:33:03 -05:00
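The release-and-reacquire behavior can be illustrated with a small stand-alone sketch. The struct and function below are hypothetical and use malloc/free in place of the BLIS memory pool; they only show the control flow of keeping a large-enough buffer and replacing a too-small one rather than aborting.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical stand-in for a pack buffer handle. */
typedef struct {
    void  *buf;
    size_t size;
} mem_sketch_t;

/* Ensure mem holds a buffer of at least `needed` bytes.
   Returns 0 on success, -1 if allocation fails. */
int ensure_capacity(mem_sketch_t *mem, size_t needed)
{
    assert(mem != NULL);
    if (needed == 0) return 0;

    /* Existing buffer is large enough: reuse it. */
    if (mem->buf != NULL && mem->size >= needed) return 0;

    /* Too small (or absent): release it and acquire a new one. */
    free(mem->buf);              /* free(NULL) is a no-op */
    mem->buf  = NULL;
    mem->size = 0;

    void *p = malloc(needed);
    if (p == NULL) return -1;
    mem->buf  = p;
    mem->size = needed;
    return 0;
}
```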
Field G. Van Zee
a699043417 Fixed bug in packing block of A for hemm/symm.
Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
  of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
  symmetric matrix where the block of A being packed intersects the diagonal,
  but some of its micro-panels do not intersect the diagonal and lie completely
  in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.
2013-04-18 13:52:47 -05:00
Field G. Van Zee
c92e7590e1 Activated bli_packm_acquire_mpart_t2b().
Details:
- Removed the overly-paranoid bli_abort() from the end of
  bli_packm_acquire_mpart_t2b(), to allow others to experiment with
  partitioning through packed blocks of A. Also, and more importantly,
  changed an earlier check that was causing an erroneous (but
  coincidentally redundant) abort(). Also, updated some of the comments
  in bli_packm_part.c.
2013-04-17 20:53:29 -05:00
Field G. Van Zee
bea579e9f0 Allow creation of "empty" objects.
Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
  modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.
2013-04-16 19:43:14 -05:00
Field G. Van Zee
7904e20f2e Fixed "root" object bug in bli_her[2]k/syr[2]k.
Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
  that manifested as the incorrect triangle being updated. It occurred when
  the user would pass in a matrix object that was correctly marked as
  symmetric/Hermitian and lower-stored, but whose root object was never marked
  as lower (or upper). We now alias and re-assign root status for matrix C
  within the front-ends. Note that trmm and trsm were already doing this,
  albeit for a slightly different reason (to allow the internal back-end to
  choose which algorithm to run--lower or upper--based on the uplo of the root
  object for both left and right side cases). Thanks to Bryan Marker for
  leading me to this bug.
2013-04-16 17:37:16 -05:00
Field G. Van Zee
19155a768d Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
2013-04-16 11:24:03 -05:00
Field G. Van Zee
2ee6bbca29 Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.
2013-04-15 19:27:57 -05:00
Field G. Van Zee
99b99eebe7 Removed local reference ukernel blocksize macros.
Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
  reference microkernel definition and header. Meant to include this in
  a recent/previous commit (0020ef7c82).
2013-04-15 17:54:43 -05:00
Field G. Van Zee
6a538fa7b1 Formatting change to mods in previous commit. 2013-04-15 14:40:31 -05:00
Field G. Van Zee
ea079d3559 Set structure of objects in level-2 BLIS APIs.
Details:
- Added missing statement to set structure field of local objects in
  top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
  reporting this bug.
2013-04-15 14:31:40 -05:00
Field G. Van Zee
d9948c541c Tweak to test suite function string construction.
Details:
- Fixed a minor bug in the way that the test suite would construct function
  name strings when the user anchored all parameters in input.operations.
  In this case, the test driver would mistake this situation for one where
  the operation simply had no parameters to begin with, and thus would not
  include the parameter string in the function string that is output for
  every result.
2013-04-15 10:21:26 -05:00
Field G. Van Zee
ca9e435c57 Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
2013-04-15 09:59:46 -05:00
Field G. Van Zee
26cbd52e36 Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).
2013-04-14 19:05:33 -05:00
Field G. Van Zee
3414a23c38 CHANGELOG update. 2013-04-13 16:53:16 -05:00
Field G. Van Zee
ec16c52f2e Updated INSTALL file (now redirects to website). 0.0.6 2013-04-13 16:41:16 -05:00
Field G. Van Zee
0020ef7c82 Removed gemmtrsm-, trsm-specific blocksize macros.
Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
  instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
  micro-kernel header files.
  (Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.
2013-04-13 15:26:35 -05:00
Field G. Van Zee
1a9f427b85 Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE   (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE   (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE     (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).
2013-04-12 15:25:54 -05:00
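As a rough illustration of aligning a dimension to an explicit alignment size, the sketch below rounds a dimension up so that dim * elem_size is a multiple of the alignment. The function name, argument order, and rounding rule are assumptions; the actual bli_align_dim_to_size() may differ. Note that an alignment of 1 leaves the dimension unchanged, consistent with the idea of disabling alignment by setting the constant to 1.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: smallest dim' >= dim such that dim' * elem_size is
   a multiple of align_size. Requires that one of elem_size and align_size
   divides the other, so the rounded byte count is a whole number of
   elements. */
size_t align_dim_to_size(size_t dim, size_t elem_size, size_t align_size)
{
    assert(elem_size > 0 && align_size > 0);
    assert(align_size % elem_size == 0 || elem_size % align_size == 0);

    size_t bytes   = dim * elem_size;
    size_t rounded = ((bytes + align_size - 1) / align_size) * align_size;
    return rounded / elem_size;
}
```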
Field G. Van Zee
a77d10e87e Fixed a bug in axpyv/axpym when alpha is unit.
Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
  rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
  this bug.
2013-04-12 11:40:55 -05:00
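The distinction behind this fix can be shown with a minimal sketch of y := y + alpha*x. The function name and argument list are illustrative assumptions rather than the BLIS interface. When alpha is 1 the multiply can be skipped, but the operation is still an accumulation into y, not a copy of x.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical axpyv sketch: y := y + alpha * x over n elements with
   strides incx and incy. */
void axpyv_sketch(size_t n, double alpha,
                  const double *x, size_t incx,
                  double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);

    if (alpha == 1.0) {
        /* Unit alpha simplifies to an add (y += x), NOT a copy (y = x). */
        for (size_t i = 0; i < n; ++i)
            y[i * incy] += x[i * incx];
        return;
    }

    for (size_t i = 0; i < n; ++i)
        y[i * incy] += alpha * x[i * incx];
}
```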
Field G. Van Zee
0495bd1d6d Moved _POSIX_C_SOURCE def to compiler cmd line.
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
  and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
  the compiler command line arguments in make_defs.mk (for both configs).
  Thanks to Devin Matthews for suggesting this change.
2013-04-11 16:39:25 -05:00
Field G. Van Zee
d43d1a0a2e Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
2013-04-11 16:28:17 -05:00