Commit Graph

74 Commits

Author SHA1 Message Date
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
e3a6c7e776 Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
  the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
  invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
2013-12-19 16:29:31 -06:00
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
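A minimal sketch of the idea behind this commit, assuming a struct with only the two fields named above. The names my_auxinfo_t, my_auxinfo_next_a(), my_ukr(), and so on are hypothetical and chosen for illustration; the real accessor macros are defined in bli_auxinfo_macro_defs.h and may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for the auxinfo_t struct described in the commit. */
typedef struct
{
    const void* a_next; /* address of the next micro-panel of A */
    const void* b_next; /* address of the next micro-panel of B */
} my_auxinfo_t;

/* Hypothetical accessor macros (not the actual BLIS names). */
#define my_auxinfo_set_next_a( aux, p ) ( (aux)->a_next = (p) )
#define my_auxinfo_set_next_b( aux, p ) ( (aux)->b_next = (p) )
#define my_auxinfo_next_a( aux )        ( (aux)->a_next )
#define my_auxinfo_next_b( aux )        ( (aux)->b_next )

/* Toy 1x1 "micro-kernel": returns the dot product of two length-k
   micro-panels. A real kernel could prefetch from the addresses held
   in aux; here they are only read, since using them is optional. */
double my_ukr( size_t k, const double* a, const double* b,
               const my_auxinfo_t* aux )
{
    const void* a_next = my_auxinfo_next_a( aux );
    const void* b_next = my_auxinfo_next_b( aux );
    double      rho    = 0.0;
    size_t      l;

    (void)a_next; /* a prefetch of the next panel of A would go here */
    (void)b_next; /* a prefetch of the next panel of B would go here */

    for ( l = 0; l < k; ++l ) rho += a[ l ] * b[ l ];

    return rho;
}
```

Bundling the pointers in one struct means new auxiliary fields (such as a micro-panel stride) can be added later without changing the micro-kernel signature again.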
Field G. Van Zee
392428dea4 Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
  imaginary components separately, named like the previous set except
  with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
  "whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
2013-12-12 19:01:47 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments; however, level-3 internal back-ends still
  do; this is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
  from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creating of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
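The attached-scalar scheme can be sketched as follows. This is an illustration under stated assumptions, not the BLIS implementation: my_obj_t is a hypothetical real-valued stand-in for obj_t, and the function names are invented for this example.

```c
/* Hypothetical, simplified object: a buffer plus one attached scalar. */
typedef struct
{
    double* buffer; /* matrix data (unused in this sketch) */
    double  scalar; /* attached scalar; defaults to 1.0    */
} my_obj_t;

/* Initialize an object with the default attached scalar of 1.0. */
void my_obj_init( my_obj_t* o, double* buffer )
{
    o->buffer = buffer;
    o->scalar = 1.0;
}

/* Fold a scalar into the one already attached to the object. */
void my_obj_scalar_apply( my_obj_t* o, double s )
{
    o->scalar *= s;
}

/* What a macro-kernel might do: alpha is the product of the scalars
   attached to A and B, while beta is read directly from C. */
void my_get_alpha_beta( const my_obj_t* a, const my_obj_t* b,
                        const my_obj_t* c,
                        double* alpha, double* beta )
{
    *alpha = a->scalar * b->scalar;
    *beta  = c->scalar;
}
```

In this sketch a back-end computing C := C - A * B would apply -1.0 to an alias of A (or B) and leave the caller's objects untouched.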
Field G. Van Zee
9552e6ee82 Removed optional scaling from packm control tree.
Details:
- Removed does_scale field from packm control tree node and
  bli_packm_cntl_obj_create() interface. Adjusted all invocations of
  _cntl_obj_create() accordingly.
- Redefined/renamed macros that are used in aliasing so that now,
  bli_obj_alias_to() does a full alias (shallow copy) while
  bli_obj_alias_for_packing() does a partial alias that preserves the
  pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
  will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
  to support heterogeneous datatypes.
2013-11-24 11:40:31 -06:00
Field G. Van Zee
376bbb59c8 Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
  and all framework code.
- Updated test suite modules according to above changes.
2013-11-08 11:17:34 -06:00
Field G. Van Zee
a98f78b715 Changed dim_t and inc_t to be signed integers.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
  This will facilitate interoperability with Fortran in the future.
  (Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
  or use the absolute value of the strides, rather than the raw strides
  which may now be signed. Added new macros bli_is_row_stored_f() and
  bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
  and changed the packm_blk_var[23] variants to use these macros instead
  of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to various functions/macros, including
  bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
  related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
  layer properly handles situations where vector increments are negative.
  Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
  to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
  that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
  _check routines.
2013-11-06 15:32:47 -06:00
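The negative-increment handling mentioned above can be illustrated with the usual BLAS convention: when incx is negative, logical element 0 of the vector lives at offset (n-1)*|incx| from the pointer the caller passes in. The sketch below assumes that convention; my_convert_blas_incv() and my_logical_elem() are hypothetical helpers, not the actual bli_convert_blas_incv() macro. Note that the arithmetic only works because the increment type is signed.

```c
/* Hypothetical helper: return the address of logical element 0 of a
   BLAS-style vector with n elements and (possibly negative) increment. */
const double* my_convert_blas_incv( long n, const double* x, long incx )
{
    if ( n > 0 && incx < 0 )
        return x + ( n - 1 ) * ( -incx );

    return x;
}

/* Return logical element i of the vector, walking with the signed
   increment from the adjusted base address. */
double my_logical_elem( long n, const double* x, long incx, long i )
{
    const double* x0 = my_convert_blas_incv( n, x, incx );

    return x0[ i * incx ];
}
```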
Field G. Van Zee
d70f2b089d Added scaling to abval2s, sqrt2s macros.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
  and overflow from squaring the real and imaginary components. (This is
  the same technique used to fix recent bugs in invscals/invscaljs and
  inverts.)
2013-11-02 17:19:40 -05:00
Field G. Van Zee
97f89fbcf2 Fixed bug in complex invscals.
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
  imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
  in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
  operator instead of "<".
2013-11-01 10:16:39 -05:00
Field G. Van Zee
2807013a47 Fixed over/under-flow in complex inversion.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
  in an "unsafe" manner, such that very large and very small values were
  unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
  reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
2013-10-24 14:32:20 -05:00
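A sketch of the scaling technique, assuming the common approach of dividing both components by the larger magnitude before forming the denominator. The type my_dcomplex and the function names are hypothetical; the actual bli_?inverts() macros may be organized differently.

```c
/* Hypothetical complex type with .real and .imag fields. */
typedef struct { double real; double imag; } my_dcomplex;

static double my_abs( double v ) { return v < 0.0 ? -v : v; }

/* "Unsafe" inversion: squaring the components can overflow or
   underflow even when 1/z itself is representable. */
void my_zinvert_naive( my_dcomplex* z )
{
    double d  = z->real * z->real + z->imag * z->imag;
    double re =  z->real / d;
    double im = -z->imag / d;

    z->real = re;
    z->imag = im;
}

/* Scaled inversion: normalize by s = max(|real|, |imag|) so that no
   intermediate value is the square of a very large or very small
   component. Assumes z is nonzero. */
void my_zinvert( my_dcomplex* z )
{
    double ar = my_abs( z->real );
    double ai = my_abs( z->imag );
    double s  = ar > ai ? ar : ai;
    double xr = z->real / s;
    double xi = z->imag / s;
    double d  = xr * z->real + xi * z->imag; /* equals |z|^2 / s */

    z->real =  xr / d;
    z->imag = -xi / d;
}
```

For z = 3e200 + 4e200i, the naive denominator is on the order of 1e401, which exceeds the range of an IEEE double, whereas the scaled version only forms values near 1e200.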
Field G. Van Zee
be4833bd91 Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
  micro-kernels. (Duplication in the micro-kernels, for now, is NOT
  supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
  facilitate the level-1f test modules. Also added front-end for dupl
  operation.
- Added obj_t-based check routines for level-1f operations, which are
  called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
  factors as a function of datatype, which is needed by their respective
  test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
2013-10-10 14:20:06 -05:00
Field G. Van Zee
5e54f46ccb Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
  level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
  lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
  renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
  to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
  dotxaxpyf, as well as the default fusing factor (which are all equal
  in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
  reference variants were implemented in terms of front-end routines rather
  than directly in terms of the kernels. (For example, axpy2v was implemented
  as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
  A is assumed to be m x b_n in both cases, and for dotxf A is actually used
  as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
  frame/3/gemm/ukernels and frame/3/trsm/ukernels.
2013-09-30 12:58:18 -05:00
Field G. Van Zee
da77e9614f Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
  a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
  functionality includes computing the pool size for each datatype (using
  that datatype's cache blocksizes) and using the maximum to size the
  actual pool array. This addresses the somewhat common pitfall whereby a
  developer updates cache blocksizes in bli_kernel.h for only one datatype
  (say, single-precision real), while the memory pools are sized using the
  double-precision real values. Then, when the developer attempts to link
  to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
  a message saying the static memory pool was exhausted. Clearly, this
  message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
  meant to check for size consistency among the various cache blocksizes.
  (Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
  reasonable place to put these constants, rather than further crowd up
  bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
  I hadn't used it in months. We can always re-create it later.
2013-09-13 12:00:37 -05:00
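The pool-sizing approach described in the first bullet can be sketched with the preprocessor. All blocksize values and macro names below are hypothetical (they are not the contents of bli_mem_pool_macro_defs.h), and the pool is sized for an MC x KC block only, which is a simplification.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes, in elements, for each datatype.
   Only the single-precision real value has been "tuned" upward. */
#define MY_MC_S 512
#define MY_KC_S 256
#define MY_MC_D 128
#define MY_KC_D 256
#define MY_MC_C 128
#define MY_KC_C 256
#define MY_MC_Z 64
#define MY_KC_Z 256

/* Bytes needed to hold one MC x KC block for each datatype. */
#define MY_POOL_SIZE_S ( MY_MC_S * MY_KC_S * sizeof( float ) )
#define MY_POOL_SIZE_D ( MY_MC_D * MY_KC_D * sizeof( double ) )
#define MY_POOL_SIZE_C ( MY_MC_C * MY_KC_C * 2 * sizeof( float ) )
#define MY_POOL_SIZE_Z ( MY_MC_Z * MY_KC_Z * 2 * sizeof( double ) )

#define MY_MAX( a, b ) ( (a) > (b) ? (a) : (b) )

/* Size the pool with the maximum over all datatypes, rather than
   with the double-precision real value alone. */
#define MY_POOL_SIZE_A \
    MY_MAX( MY_MAX( MY_POOL_SIZE_S, MY_POOL_SIZE_D ), \
            MY_MAX( MY_POOL_SIZE_C, MY_POOL_SIZE_Z ) )

static char my_pool_a[ MY_POOL_SIZE_A ];

/* Report the size of the statically allocated pool, in bytes. */
size_t my_pool_a_size( void )
{
    return sizeof( my_pool_a );
}
```

With these values, a pool sized from the double-precision blocksizes alone would be half of what the single-precision case needs, which is the pitfall the commit describes.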
Field G. Van Zee
7ae4d7a41d Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
  assigned values of 32, 64, or some other value. The former two result in
  defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
  causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
  to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
  with format specifiers not matching the types of their arguments to printf()
  and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
  inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
  bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
  Windows-style CRLF newlines for certain files.
2013-09-10 16:35:12 -05:00
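The selection logic in the first bullet might look roughly like the following. MY_INT_TYPE_SIZE and the my_ type names are hypothetical stand-ins for BLIS_INT_TYPE_SIZE, gint_t, and guint_t; the actual block in the BLIS headers is not shown in the commit message.

```c
#include <stdint.h>

/* Hypothetical configuration knob; 32, 64, or anything else. */
#ifndef MY_INT_TYPE_SIZE
#define MY_INT_TYPE_SIZE 64
#endif

#if   MY_INT_TYPE_SIZE == 32
typedef int32_t           my_gint_t;
typedef uint32_t          my_guint_t;
#elif MY_INT_TYPE_SIZE == 64
typedef int64_t           my_gint_t;
typedef uint64_t          my_guint_t;
#else
/* Any other value falls back to a default C type. */
typedef long int          my_gint_t;
typedef unsigned long int my_guint_t;
#endif

/* Size of the general signed integer type, in bits. */
int my_gint_bits( void )
{
    return (int)( 8 * sizeof( my_gint_t ) );
}
```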
Field G. Van Zee
d141f9eeb6 Added Windows build system.
Details:
- Added a 'windows' directory, which contains a Windows build system
  similar to that of libflame's. Thanks to Martin for getting this up
  and running.
- Spun off system header #includes into bli_system.h, which is included
  in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
2013-09-09 13:09:16 -05:00
Field G. Van Zee
9013ad6ff2 Switched integer typedefs (again) to C types.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
  and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
  operation tests.
2013-09-04 13:36:07 -05:00
Field G. Van Zee
981a60cfa0 Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
  I've changed the default typedef for gint_t and guint_t from int64_t and
  uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
  configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
  to introductory output of the testsuite.
2013-09-04 12:09:11 -05:00
Field G. Van Zee
f8980edf9c Merge branch 'master' of https://code.google.com/p/blis 2013-07-26 11:14:27 -05:00
Field G. Van Zee
67a8b9498d Added missing cpp kernel blocksize constraints.
Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
  constraints on the register blocksizes relative to the cache blocksizes.
  Thanks to Tyler for helping me stumble across this issue.
2013-07-26 11:12:37 -05:00
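Preprocessor guards of this kind can be sketched as below. The commit message does not list the exact constraints, so the whole-multiple rule and the blocksize values here are assumptions made for illustration, and the MY_ macro names are hypothetical.

```c
/* Hypothetical cache (MC, NC) and register (MR, NR) blocksizes. */
#define MY_MC 128
#define MY_NC 4096
#define MY_MR 8
#define MY_NR 4

/* Abort compilation if a cache blocksize is not a whole multiple of
   the corresponding register blocksize (assumed constraint). */
#if ( MY_MC % MY_MR ) != 0
#error "MC must be a whole multiple of MR."
#endif

#if ( MY_NC % MY_NR ) != 0
#error "NC must be a whole multiple of NR."
#endif

/* Number of MR-row micro-panels in one MC-row block of A. */
int my_num_micropanels_m( void )
{
    return MY_MC / MY_MR;
}
```

Checking at compile time means a misconfigured kernel header fails the build rather than producing wrong results or a crash at run time.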
Field G. Van Zee
6e7e452343 Fixed minor warnings and misc issues.
Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
  set-but-not-used variables and addressing some instances of typecasting
  of pointer types to integer types of different sizes.
2013-07-22 14:50:57 -05:00
Field G. Van Zee
03f6c35997 Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
  the "special" bit is taken into account so that BLIS_INT is differentiated
  from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
  being used.
2013-07-22 12:54:32 -05:00
Field G. Van Zee
4e80ad28c9 Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
  complex arithmetic to the scalar-level macros in include/level0. This
  includes a somewhat substantial reorganization and re-layering of much
  of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
  commented-out by default, which optionally enables the use of built-in
  C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
  used (bli_proj_dt_to_real_if_imag_eq0).
2013-07-18 17:53:31 -05:00
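One way to layer scalar-level operations over either representation is sketched below. MY_ENABLE_C99_COMPLEX, my_dcomplex, and the helper names are hypothetical; they mirror the role of BLIS_ENABLE_C99_COMPLEX but are not the actual level0 macros.

```c
/* Leave MY_ENABLE_C99_COMPLEX undefined (the default) to use the
   struct-based representation. */
#ifdef MY_ENABLE_C99_COMPLEX

#include <complex.h>

typedef double complex my_dcomplex;

#define my_real( z ) creal( z )
#define my_imag( z ) cimag( z )

my_dcomplex my_make( double re, double im ) { return re + im * I; }
my_dcomplex my_mul( my_dcomplex a, my_dcomplex b ) { return a * b; }

#else

typedef struct { double real; double imag; } my_dcomplex;

#define my_real( z ) ( (z).real )
#define my_imag( z ) ( (z).imag )

my_dcomplex my_make( double re, double im )
{
    my_dcomplex z;
    z.real = re;
    z.imag = im;
    return z;
}

/* Complex multiplication written out on the components. */
my_dcomplex my_mul( my_dcomplex a, my_dcomplex b )
{
    return my_make( a.real * b.real - a.imag * b.imag,
                    a.real * b.imag + a.imag * b.real );
}

#endif
```

Code written against my_make(), my_mul(), my_real(), and my_imag() compiles unchanged under either setting.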
Field G. Van Zee
aec12d90f5 Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
  implemented a variation of copyv/m that, in the case of real source
  and complex destination operands, leaves the imaginary component
  untouched (rather than setting it to zero). I realize now that the
  special case(s) (e.g. gemm with real A and B but complex C) that I
  thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
2013-07-10 13:33:30 -05:00
Field G. Van Zee
b0a0a0f274 Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
  in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
  manually typedefs the types we need (which, for now, are unconditionally
  int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
  nothing for C++ and non-C99.
2013-07-09 17:15:38 -05:00
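A guard of the kind described in the last bullet could look like this; it is a sketch rather than the exact block in bli_macro_defs.h, and my_axpy() is a hypothetical function used only to show the keyword in a signature.

```c
/* For C++ and pre-C99 C compilers, define restrict as nothing so
   that declarations using it still compile. */
#if defined( __cplusplus ) || !defined( __STDC_VERSION__ ) || \
    ( __STDC_VERSION__ < 199901L )
#define restrict
#endif

/* y := y + alpha * x for contiguous vectors of length n. The
   restrict qualifiers promise the compiler that x and y do not
   overlap; where the macro above is active they simply vanish. */
void my_axpy( int n, double alpha,
              const double* restrict x, double* restrict y )
{
    int i;

    for ( i = 0; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```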
Field G. Van Zee
4b7e7970f1 Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
  inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
  integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
  BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
  These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
  in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
  of char.
- Updated bla_amax() wrappers so that the return type is defined directly
  as f77_int, rather than letting the prototype-generating macro decide
  the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
  so I removed them. Also, changed the body of the wrapper so that a
  gint_t is passed into abmaxv, which is THEN typecast to an f77_int
  before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
  doublecomplex types so that they use .real and .imag instead (now that
  we are using scomplex and dcomplex).
2013-07-08 15:20:34 -05:00
Field G. Van Zee
3725013985 Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel that incrementally
  packs one micro-panel of B at a time. This is useful for certain
  special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
  do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
2013-07-08 11:24:18 -05:00
Field G. Van Zee
46d3d09d49 Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
  renamed the resulting variants, according to the same changes recently
  made to trmm and trsm.
- Implemented support for four new subpartition types:
    BLIS_SUBPART1T
    BLIS_SUBPART1B
    BLIS_SUBPART1L
    BLIS_SUBPART1R
  which correspond to "merged" partitions that include the middle "1"
  partition as well as either the neighboring "0" or "2" partition. This is
  used to clean up code in herk/her2k var2 that attempts to partition away
  the strictly zero region above or below the diagonal of a matrix operand
  that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
  zero region in the panel of C that is passed in. This is now needed given
  that herk/her2k var1 no longer partitions off this zero region before
  calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
2013-06-27 13:19:56 -05:00
Field G. Van Zee
02002ef6f3 Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
  side case is now handled explicitly, rather than induced indirectly by
  transposing and swapping strides on operands. This allows us to walk through
  the output matrix with favorable access patterns no matter how it is stored,
  for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
  lower/upper distinction. Instead, we simply label the variants by which
  dimension is partitioned and whether the variant marches forwards or
  backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
  (as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
  blocksize extensions (if non-zero) were not being used to appropriately size
  the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
  whole multiples of MR AND NR. This is needed for the case of trsm_r where,
  in order to reuse existing left-side gemmtrsm fused micro-kernels, the
  packing of A (left-hand operand) and B (right-hand operand) is done with
  NR and MR, respectively (instead of MR and NR).
2013-06-24 17:08:14 -05:00
Field G. Van Zee
08475e7c76 Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
  packing from a lower or upper-stored symmetric/Hermitian matrix to column
  panels (which are row-stored). Previously one could only pack to row panels
  (which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
  favorable access through row-stored matrices for gemm, hemm, herk, her2k,
  symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
  execution datatypes.
2013-06-11 12:18:39 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
85a6d1c9a5 Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
  sizeof(dcomplex).
2013-05-29 10:58:24 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00
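The alignment issue behind the trmv_l/trsv_u fix can be illustrated with a pair of blocksize helpers. This is a sketch of the general idea (handle the edge case in the first iteration when moving backwards), assuming no blocksize extensions; my_blocksize_f() and my_blocksize_b() are hypothetical and are not the bli_determine_blocksize_dim_[fb]() macros themselves.

```c
/* Forward iteration: take full blocks of b_alg and leave the
   smaller edge-case block for the last iteration. i is the number
   of elements already processed out of dim. */
long my_blocksize_f( long i, long dim, long b_alg )
{
    long left = dim - i;

    return left < b_alg ? left : b_alg;
}

/* Backward iteration: take the edge-case block first, so that every
   later block begins at a multiple of b_alg. i is again the number
   of elements already processed, counted from the bottom/right. */
long my_blocksize_b( long i, long dim, long b_alg )
{
    long left = dim - i;
    long rem  = left % b_alg;

    return rem != 0 ? rem : b_alg;
}
```

For dim = 10 and b_alg = 4, the forward sequence of blocksizes is 4, 4, 2, while the backward sequence is 2, 4, 4, so the backward blocks start at offsets 8, 4, and 0.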
Field G. Van Zee
d57ec42b34 Renamed _trans_status() macro.
Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
  previous commit.
2013-05-03 17:35:32 -05:00
Field G. Van Zee
9e2b227866 Renamed _set_trans(), _trans_status() macros.
Details:
- Renamed the following macros:
    bli_obj_set_trans()    -> bli_obj_set_onlytrans()
    bli_obj_trans_status() -> bli_obj_onlytrans_status()
  to remove ambiguity as to which bits are read/updated.
2013-05-03 17:24:58 -05:00
Field G. Van Zee
6bfa96f848 Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1 by adding blocksize extension
  fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
  happens if and only if cache blocksizes are created with non-zero
  extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
  blocksize extension mechanism is now available for use.
2013-04-30 19:35:54 -05:00
Field G. Van Zee
096b366ddc Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
  algorithms that first partition in the n dimension with a blocksize
  of NC. Typically this is not an issue since only very large problems
  exceed that of NC. But developers often run very large problems, and
  so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
  bli_param_macro_defs.h.
2013-04-25 16:43:43 -05:00
Field G. Van Zee
b6e24b23cb Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
  and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
  accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
2013-04-25 12:06:12 -05:00
Field G. Van Zee
9d10d7dd9b Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
2013-04-23 16:00:18 -05:00
Field G. Van Zee
2d6f9e8379 Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
  by the contiguous memory allocator for all types, given that the values for
  double precision real are the ones used to allocate the space. These checks
  can easily go awry in certain situations, especially if you are developing for
  only one datatype. So for now, they are probably more trouble than they are
  worth.
2013-04-21 15:10:34 -05:00
Field G. Van Zee
b6ef84fad1 Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
  dimension (row or column stride) used within each packed micro-panel from
  the corresponding register blocksize. It appears advantageous on some
  systems to use, for example, packed micro-panels of A where the column
  stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
  - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
    to use when packing micro-panels of A and B.
  - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
    where appropriate, instead of MR and NR.
  - Added pd field (panel dimension) to obj_t.
  - New interface to bli_packm_cntl_obj_create().
  - Renamed bli_obj_packed_length()/_width() macros to
    bli_obj_padded_length()/_width().
  - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
  - Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
  blocksize for edge cases, which can improve performance at the margins.
2013-04-21 15:00:24 -05:00
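The decoupling of the micro-panel stride from MR can be sketched with a small packing routine. The function my_pack_micropanel() and its parameter names are hypothetical, and zero-filling the padding rows is an assumption of this sketch, not something the commit message states.

```c
#include <stddef.h>

/* Pack an mr x k block of a column-major matrix A (leading dimension
   lda) into a micro-panel p whose column stride is packmr >= mr.
   Rows mr..packmr-1 of each packed column are padding and are set
   to zero here. */
void my_pack_micropanel( size_t mr, size_t k, size_t packmr,
                         const double* a, size_t lda,
                         double* p )
{
    size_t i, j;

    for ( j = 0; j < k; ++j )
        for ( i = 0; i < packmr; ++i )
            p[ j * packmr + i ] = ( i < mr ) ? a[ j * lda + i ] : 0.0;
}
```

A micro-kernel reading this panel would advance by packmr, not mr, elements per column of the micro-panel.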
Field G. Van Zee
19155a768d Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
2013-04-16 11:24:03 -05:00
Field G. Van Zee
2ee6bbca29 Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.
2013-04-15 19:27:57 -05:00
Field G. Van Zee
26cbd52e36 Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).
2013-04-14 19:05:33 -05:00
Field G. Van Zee
d43d1a0a2e Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
2013-04-11 16:28:17 -05:00
Field G. Van Zee
31b100e7bf Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).
2013-04-11 11:11:52 -05:00
Field G. Van Zee
4afe3bfd82 Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
2013-04-09 17:45:39 -05:00
Field G. Van Zee
803871c55b Minor formatting changes. 2013-04-08 15:18:42 -05:00
Field G. Van Zee
a571af816d Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
2013-04-08 15:00:13 -05:00
Field G. Van Zee
7cbda15291 Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macro files so that
  single-char macros are also defined.
2013-04-04 15:25:43 -05:00