Commit Graph

26 Commits

Author SHA1 Message Date
Field G. Van Zee
2ee6bbca29 Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.
2013-04-15 19:27:57 -05:00
Field G. Van Zee
1a9f427b85 Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE   (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE   (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE     (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).
2013-04-12 15:25:54 -05:00
Field G. Van Zee
d43d1a0a2e Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
2013-04-11 16:28:17 -05:00
Field G. Van Zee
4afe3bfd82 Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simplier macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
2013-04-09 17:45:39 -05:00
Field G. Van Zee
6684b73d55 Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to reminder reader what sdsdot()/dsdot() are
  used for.
2013-04-02 13:06:20 -05:00
Field G. Van Zee
fb68087f87 More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).
2013-03-26 15:10:16 -05:00
Field G. Van Zee
3a787cccaa Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.
2013-03-26 13:59:19 -05:00
Field G. Van Zee
37308f9a50 Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.
2013-03-26 12:43:14 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
2013-03-24 20:01:49 -05:00
Field G. Van Zee
551ea4767a Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within #blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.
2013-03-24 18:00:10 -05:00
Field G. Van Zee
f469907503 Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).
2013-03-22 15:20:15 -05:00
Field G. Van Zee
718888849c Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
2013-03-22 15:07:01 -05:00
Field G. Van Zee
1f82b51d06 Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
  m_packed and n_packed fields to obj_t. These new fields track the same as
  the old ones. From an abstraction standpoint, it seemed awkward to store
  those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
  is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
  functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
  respectively.
- Updated packm variants to access the packed length and width fields from
  their new locations.
2013-03-18 15:37:20 -05:00
Field G. Van Zee
e7d41229d3 Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.
2013-03-15 17:12:36 -05:00
Field G. Van Zee
c95c270eba Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.
2013-03-07 14:42:15 -06:00
Field G. Van Zee
ede75693e5 Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
  wrappers.
2013-02-22 12:11:24 -06:00
Field G. Van Zee
da0c22f241 Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.
2013-02-15 09:59:48 -06:00
Field G. Van Zee
1274e12437 Updated copyright headers from 2012 to 2013. 2013-02-11 14:37:47 -06:00
Field G. Van Zee
768fcebaa8 Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpah is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the macro-
  kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whreby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.
2013-02-11 13:20:44 -06:00
Field G. Van Zee
879a179e1d Added debug statements to bl2_mm_acquire_m().
Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h
2013-01-04 10:37:27 -06:00
Field G. Van Zee
66e80ce1ae Added GENT*R macros; tweaked bl2_machval defs.
Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.
2012-12-20 17:02:55 -06:00
Field G. Van Zee
6fbbdd4e19 More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions form bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.
2012-12-18 14:34:02 -06:00
Field G. Van Zee
7ad4ebef38 Tweaks to get BLIS compiling again on clarksville.
Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.
2012-12-10 16:18:40 -06:00
Field G. Van Zee
e4e5404d26 Define static memory pool size in bl2_config.h. 2012-12-07 17:34:53 -06:00
Field G. Van Zee
2f272b40f4 Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).
2012-12-04 19:22:14 -06:00
Field G. Van Zee
00f3498a89 Initial commit. 2012-12-03 12:36:11 -06:00