92 Commits

Author SHA1 Message Date
Field G. Van Zee
19155a768d Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
2013-04-16 11:24:03 -05:00
Field G. Van Zee
2ee6bbca29 Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.
2013-04-15 19:27:57 -05:00
Field G. Van Zee
99b99eebe7 Removed local reference ukernel blocksize macros.
Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
  reference microkernel definition and header. Meant to include this in
  a recent/previous commit (0020ef7c82).
2013-04-15 17:54:43 -05:00
Field G. Van Zee
6a538fa7b1 Formatting change to mods in previous commit. 2013-04-15 14:40:31 -05:00
Field G. Van Zee
ea079d3559 Set structure of objects in level-2 BLIS APIs.
Details:
- Added missing statement to set structure field of local objects in
  top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
  reporting this bug.
2013-04-15 14:31:40 -05:00
Field G. Van Zee
d9948c541c Tweak to test suite function string construction.
Details:
- Fixed a minor bug in the way that the test suite would construct function
  name strings when the user anchored all parameters in input.operations.
  In this case, the test driver would mistake this situation for one where
  the operation simply had no parameters to begin with, and thus would not
  include the parameter string in the function string that is output for
  every result.
2013-04-15 10:21:26 -05:00
Field G. Van Zee
ca9e435c57 Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
2013-04-15 09:59:46 -05:00
Field G. Van Zee
26cbd52e36 Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).
2013-04-14 19:05:33 -05:00
Field G. Van Zee
3414a23c38 CHANGELOG update. 2013-04-13 16:53:16 -05:00
Field G. Van Zee
ec16c52f2e Updated INSTALL file (now redirects to website). 0.0.6 2013-04-13 16:41:16 -05:00
Field G. Van Zee
0020ef7c82 Removed gemmtrsm-, trsm-specific blocksize macros.
Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
  instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
  micro-kernel header files.
  (Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.
2013-04-13 15:26:35 -05:00
Field G. Van Zee
1a9f427b85 Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE   (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE   (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE     (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).
2013-04-12 15:25:54 -05:00
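As a rough illustration of the dimension-alignment helper mentioned in this commit, the sketch below rounds a dimension up until its byte footprint is a multiple of a requested alignment. The function name `align_dim_to_size_sketch`, its argument order, and its rounding rule are assumptions made for illustration; the actual bli_align_dim_to_size() may behave differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not the BLIS implementation: round `dim` (a count of
   elements, each `elem_size` bytes wide) up so that dim * elem_size is a
   multiple of `align_size` bytes. */
size_t align_dim_to_size_sketch(size_t dim, size_t elem_size, size_t align_size)
{
    assert(elem_size > 0 && align_size > 0);

    /* Grow the dimension one element at a time until the footprint in
       bytes is a multiple of the alignment. The loop terminates because
       any multiple of align_size elements satisfies the condition. */
    while ((dim * elem_size) % align_size != 0)
        dim++;

    return dim;
}
```

Under these assumptions, passing an alignment of 1 leaves the dimension unchanged, which mirrors the way the commit describes disabling alignment by setting the constant to 1.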
Field G. Van Zee
a77d10e87e Fixed a bug in axpyv/axpym when alpha is unit.
Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
  rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
  this bug.
2013-04-12 11:40:55 -05:00
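The distinction behind this fix can be sketched as follows. This is a minimal stand-alone example, not the BLIS code; the function name `axpyv_sketch` and its signature are hypothetical. The point is that a unit alpha lets the scaling be skipped, but the result must still be accumulated into y.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative axpyv: y := y + alpha * x (hypothetical signature). */
void axpyv_sketch(size_t n, double alpha, const double *x, double *y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    if (alpha == 1.0)
    {
        /* Unit alpha simplifies to an add (y := y + x).
           Simplifying to a copy (y := x) would discard y. */
        for (size_t i = 0; i < n; i++)
            y[i] += x[i];
        return;
    }

    /* General case: scale x by alpha and accumulate into y. */
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```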
Field G. Van Zee
0495bd1d6d Moved _POSIX_C_SOURCE def to compiler cmd line.
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
  and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
  the compiler command line arguments in make_defs.mk (for both configs).
  Thanks to Devin Matthews for suggesting this change.
2013-04-11 16:39:25 -05:00
Field G. Van Zee
d43d1a0a2e Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
2013-04-11 16:28:17 -05:00
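A small sketch of why the prefix helps: an unprefixed function-like macro named min would rewrite any user function with the same name, while prefixed macros leave user code alone. The macro bodies below are illustrative and are not copied from bli_f2c.h; `min()` and `f2c_clamp_sketch()` are hypothetical user code.

```c
#include <assert.h>

/* Prefixed macros in the spirit of this commit (bodies are illustrative). */
#define f2c_abs(x)    ((x) >= 0 ? (x) : -(x))
#define f2c_min(a, b) ((a) <= (b) ? (a) : (b))
#define f2c_max(a, b) ((a) >= (b) ? (a) : (b))

/* Hypothetical user code: this definition would be mangled by an
   unprefixed macro min(a,b), but is unaffected by f2c_min(). */
int min(int a, int b)
{
    return a < b ? a : b;
}

/* Clamp v into [lo, hi] using only the prefixed macros. */
int f2c_clamp_sketch(int v, int lo, int hi)
{
    assert(lo <= hi);
    return f2c_max(lo, f2c_min(v, hi));
}
```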
Field G. Van Zee
31b100e7bf Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).
2013-04-11 11:11:52 -05:00
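The aliasing idea can be sketched with plain token pasting. Everything named here is hypothetical: `PASTE2` is a simplified stand-in for PASTEMAC, and the `EXAMPLE_*` constants and `s_mr`/`d_mr` aliases are invented for illustration rather than taken from bli_kernel_macro_defs.h.

```c
#include <assert.h>

/* Hypothetical per-datatype register blocksizes, as a kernel
   configuration header might define them. */
#define EXAMPLE_MR_S 8
#define EXAMPLE_MR_D 4

/* Aliases whose names follow a <type char><suffix> pattern, so that they
   can be constructed by pasting a type character onto a common suffix. */
#define s_mr EXAMPLE_MR_S
#define d_mr EXAMPLE_MR_D

/* Simplified stand-in for PASTEMAC: paste a type character and a name. */
#define PASTE2(ch, name) ch ## name

/* One macro body generates a function per datatype; the pasted name
   (e.g. s_mr) is itself a macro and expands to the blocksize value. */
#define GEN_GET_MR(ch) \
int ch ## _get_mr(void) \
{ \
    int mr = PASTE2(ch, _mr); \
    assert(mr > 0); \
    return mr; \
}

GEN_GET_MR(s)
GEN_GET_MR(d)
```

With names built this way, type-generic macro bodies can refer to the right blocksize without a separate, operation-specific definition per kernel.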
Field G. Van Zee
bd2b24ba65 Updated CREDITS file. 2013-04-11 10:35:39 -05:00
Field G. Van Zee
79328c1541 Reverted testsuite object files' home to 'obj'.
Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
  empty directories in git.
2013-04-11 10:32:14 -05:00
Field G. Van Zee
4afe3bfd82 Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
2013-04-09 17:45:39 -05:00
Field G. Van Zee
357893f5be Applied fix from prev commit to gemmtrsm_?_ref_4x4
Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
  bli_gemmtrsm_u_ref_4x4.c.
2013-04-09 14:48:15 -05:00
Field G. Van Zee
54988e8dca Fixed a performance bug in trsm.
Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
  (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
  reference gemm microkernel was hard-coded, and thus always called, even
  when GEMM_UKERNEL was defined to point to an optimized microkernel. This
  manifested as artificially low trsm performance for all problem sizes, but
  especially for small problem sizes as it only affected blocks of A that
  intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
  find this bug.
2013-04-08 19:08:43 -05:00
Field G. Van Zee
a7252e40b5 Generate testsuite objects 'src'.
Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
  rather than 'obj', since (a) the top-level .gitignore dictates that
  obj directories are to be ignored, and (b) git has problems
  tracking empty directories. Now, users do not need to create their own
  obj directories within their own local clones of BLIS.
2013-04-08 16:08:22 -05:00
Field G. Van Zee
803871c55b Minor formatting changes. 2013-04-08 15:18:42 -05:00
Field G. Van Zee
a571af816d Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
2013-04-08 15:00:13 -05:00
Field G. Van Zee
3be14c32f7 Updated information in testsuite output header.
Details:
- Added to the information that is echoed at the beginning of the test suite's
  output, and also re-labeled some existing information.
2013-04-06 12:54:45 -05:00
Field G. Van Zee
874707c1b1 Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)
2013-04-05 17:19:43 -05:00
Field G. Van Zee
7cbda15291 Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.
2013-04-04 15:25:43 -05:00
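A micro-kernel with explicit loops over MR and NR might look roughly like the sketch below. It assumes a particular packing (A stored as k columns of MR contiguous elements, B as k rows of NR contiguous elements) and a hypothetical signature; it is not the reference micro-kernel added in this commit. The values 8 and 4 are placeholders that can be changed without touching the loops.

```c
#include <assert.h>
#include <stddef.h>

#define MR 8   /* placeholder register blocksize in the m dimension */
#define NR 4   /* placeholder register blocksize in the n dimension */

/* Illustrative micro-kernel: C := beta * C + alpha * A * B, where A is
   MR-by-k, B is k-by-NR, and C is MR-by-NR with row stride rs_c and
   column stride cs_c. */
void gemm_ref_mxn_sketch(size_t k, double alpha,
                         const double *restrict a,
                         const double *restrict b,
                         double beta,
                         double *restrict c, size_t rs_c, size_t cs_c)
{
    double ab[MR * NR] = { 0.0 };   /* accumulator, column-major */

    assert(a != NULL && b != NULL && c != NULL);

    /* Accumulate k rank-1 updates; the loop bounds are MR and NR rather
       than hard-coded constants such as 4. */
    for (size_t l = 0; l < k; l++)
        for (size_t j = 0; j < NR; j++)
            for (size_t i = 0; i < MR; i++)
                ab[i + j * MR] += a[l * MR + i] * b[l * NR + j];

    /* Scale by alpha and merge into C, which is scaled by beta. */
    for (size_t j = 0; j < NR; j++)
        for (size_t i = 0; i < MR; i++)
            c[i * rs_c + j * cs_c] = beta * c[i * rs_c + j * cs_c]
                                   + alpha * ab[i + j * MR];
}
```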
Field G. Van Zee
6684b73d55 Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.
2013-04-02 13:06:20 -05:00
Field G. Van Zee
fb68087f87 More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).
2013-03-26 15:10:16 -05:00
Field G. Van Zee
9682ef61db Always define memory alignment size cpp constant.
Details:
- Removed guard around #define for memory alignment size constant.
  Memory alignment should always be enabled, and so this value should
  always be defined.
2013-03-26 14:14:53 -05:00
Field G. Van Zee
3a787cccaa Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.
2013-03-26 13:59:19 -05:00
Field G. Van Zee
37308f9a50 Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.
2013-03-26 12:43:14 -05:00
Field G. Van Zee
40a0654ada CHANGELOG update. 2013-03-24 20:18:12 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
0.0.5
2013-03-24 20:01:49 -05:00
Field G. Van Zee
132bffcef7 Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which includes alternate/half-baked/broken implementations.
2013-03-24 18:49:36 -05:00
Field G. Van Zee
551ea4767a Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.
2013-03-24 18:00:10 -05:00
Field G. Van Zee
bc7b318ed0 Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similarly
  temporary changes are being made to libflame's type definitions file.)
2013-03-22 17:18:58 -05:00
Field G. Van Zee
f469907503 Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).
2013-03-22 15:20:15 -05:00
Field G. Van Zee
d1023bfbc6 Removed build/old directory. 2013-03-22 15:09:59 -05:00
Field G. Van Zee
718888849c Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
2013-03-22 15:07:01 -05:00
Field G. Van Zee
bba38cf4e9 Added missing conjbeta argument to scald. 2013-03-19 18:07:40 -05:00
Field G. Van Zee
1f82b51d06 Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same as
  the old ones. From an abstraction standpoint, it seemed awkward to store
  those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.
2013-03-18 15:37:20 -05:00
Field G. Van Zee
36c782857b CHANGELOG update. 2013-03-18 10:37:03 -05:00
Field G. Van Zee
e7d41229d3 Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.
0.0.4
2013-03-15 17:12:36 -05:00
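The stack-based bookkeeping described in this commit can be sketched as a single pool of fixed-size blocks. All names and sizes below (`pool_sketch_t`, `POOL_NUM_BLOCKS`, `POOL_BLOCK_SIZE`) are hypothetical, and the sketch omits the three separate pools, alignment, and locking that bl2_mem.c would need.

```c
#include <assert.h>
#include <stddef.h>

#define POOL_NUM_BLOCKS 4      /* hypothetical number of blocks in the pool */
#define POOL_BLOCK_SIZE 1024   /* hypothetical fixed block size in bytes    */

typedef struct
{
    unsigned char storage[POOL_NUM_BLOCKS][POOL_BLOCK_SIZE];
    void         *avail[POOL_NUM_BLOCKS];  /* stack of available blocks   */
    int           top;                     /* index of top entry, or -1   */
} pool_sketch_t;

/* Push every block onto the stack; all blocks start out available. */
void pool_sketch_init(pool_sketch_t *pool)
{
    assert(pool != NULL);
    for (int i = 0; i < POOL_NUM_BLOCKS; i++)
        pool->avail[i] = pool->storage[i];
    pool->top = POOL_NUM_BLOCKS - 1;
}

/* Check out a block by popping it; returns NULL if none are left. */
void *pool_sketch_checkout(pool_sketch_t *pool)
{
    assert(pool != NULL);
    void *block = NULL;
    /* A critical region would begin here in a threaded build. */
    if (pool->top >= 0)
        block = pool->avail[pool->top--];
    /* ... and end here. */
    return block;
}

/* Return a block to the pool by pushing it back onto the stack. */
void pool_sketch_checkin(pool_sketch_t *pool, void *block)
{
    assert(pool != NULL && block != NULL);
    assert(pool->top < POOL_NUM_BLOCKS - 1);
    pool->avail[++pool->top] = block;
}
```

Because checkout and checkin only move the stack top, each operation is constant time, and the two marked spots are the only places a lock would be needed.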
Field G. Van Zee
1e76cae00c Perform her2k var1 loops in sequence.
Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
  and accumulated in sequence rather than fused into one loop. This is
  necessary if BLIS is to be configured to provide only enough contiguous
  memory for one panel of B.
2013-03-15 12:21:42 -05:00
Field G. Van Zee
c95c270eba Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.
2013-03-07 14:42:15 -06:00
Field G. Van Zee
e99281a0f4 Fixed test suite flop formulas for ops with side.
Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
  trmm3, and trsm.
- Comment updates in herk macro-kernels.
2013-03-07 14:00:10 -06:00
Field G. Van Zee
ef8cbfc44d Added "version" to .gitignore.
Details:
- Added "version" to .gitignore file so that the file does not show up when
  running 'git status', or accidentally get pulled into the index when
  running 'git add' or 'git add --all'.
2013-03-02 12:47:06 -06:00
Field G. Van Zee
e9e0747c2f Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.
2013-03-02 12:43:54 -06:00
Field G. Van Zee
bb612f864e Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
2013-03-01 12:55:42 -06:00