215 Commits

Author SHA1 Message Date
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
290fa54e00 Store variable panel strides in trmm/trsm auxinfo.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
  and trsm macro-kernels. Whereas before we stored whatever value was
  provided to the macro-kernel implementation via ps_a/ps_b, now we
  store the stride that will advance to the next variable-length
  micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
2013-12-20 14:10:26 -06:00
Field G. Van Zee
e3a6c7e776 Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
  the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
  invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
2013-12-19 16:29:31 -06:00
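The last-iteration test described above can be sketched as follows. This is a minimal illustration, not the BLIS definition: the macro name is_last_iter_sketch(), the helper next_panel_sketch(), and the choice to wrap around to the first micro-panel on the final iteration are all assumptions made for the example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro in the spirit of bli_is_last_iter(): true when i is
   the final index of a loop that runs n_iter times. */
#define is_last_iter_sketch( i, n_iter ) ( (i) == (n_iter) - 1 )

/* Address of the micro-panel that the next iteration will read. Wrapping
   around to the first panel on the last iteration is an assumption. */
static const double* next_panel_sketch( const double* base, ptrdiff_t i,
                                        ptrdiff_t n_iter, ptrdiff_t ps )
{
    assert( 0 <= i && i < n_iter );

    const double* cur = base + i * ps;

    return is_last_iter_sketch( i, n_iter ) ? base : cur + ps;
}
```

Hiding the comparison behind a macro keeps each macro-kernel's a2/b2 computation to a single readable expression.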
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
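The idea of bundling a_next and b_next into one struct can be sketched roughly as below. The struct layout, the type name auxinfo_sketch_t, and the accessor macro names are illustrative assumptions; the actual definitions live in bli_auxinfo_macro_defs.h and may differ.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for auxinfo_t: bundles the hints that used to be
   passed to the micro-kernel as separate a_next/b_next arguments. */
typedef struct
{
    const void* a_next; /* next micro-panel of A to be used */
    const void* b_next; /* next micro-panel of B to be used */
    ptrdiff_t   ps_a;   /* example of extra info: a micro-panel stride */
} auxinfo_sketch_t;

/* Accessor macros (names are assumptions, not the BLIS macros). */
#define aux_set_next_a( aux, p ) ( (aux)->a_next = (p) )
#define aux_set_next_b( aux, p ) ( (aux)->b_next = (p) )
#define aux_next_a( aux )        ( (aux)->a_next )
#define aux_next_b( aux )        ( (aux)->b_next )

/* Toy 1x1 "micro-kernel": one auxinfo pointer replaces two pointer
   arguments. Returns 1 if both prefetch hints were provided. */
static int ukr_sketch( const double* a, const double* b, double* c,
                       const auxinfo_sketch_t* aux )
{
    assert( aux != NULL );

    *c += ( *a ) * ( *b );

    return aux_next_a( aux ) != NULL && aux_next_b( aux ) != NULL;
}
```

With this shape, adding a new piece of auxiliary information means adding a field and an accessor rather than changing every micro-kernel signature again.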
Field G. Van Zee
392428dea4 Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
  imaginary components separately, named like the previous set except
  with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
  "whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
2013-12-12 19:01:47 -06:00
Field G. Van Zee
f60c8adc2f Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
2013-12-10 14:39:56 -06:00
Field G. Van Zee
4ef2015049 Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
  double-precision real).
2013-12-09 18:53:03 -06:00
Field G. Van Zee
5ad2ce7bf5 Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
  for x86_64/core2 were calling the wrong reference code (l instead
  of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
  kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
2013-12-09 18:30:49 -06:00
Field G. Van Zee
d289f5d3a9 Whitespace changes to level-2 blocked variants.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
  in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
2013-12-05 10:56:13 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments; however, level-3 internal back-ends still
  do; this is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
  from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creation of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
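The attached-scalar scheme described above (alpha taken as the product of the scalars attached to A and B, beta taken from C) can be illustrated with a heavily reduced sketch. The type obj_sketch_t and all function names are hypothetical, and the objects are 1x1 purely to keep the example short; the real obj_t carries far more state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, heavily reduced object: a buffer plus an attached scalar. */
typedef struct
{
    double* buffer; /* matrix data (1x1 here) */
    double  scalar; /* attached scalar; defaults to 1.0 */
} obj_sketch_t;

/* New objects start with an attached scalar of 1.0. */
static obj_sketch_t obj_sketch_create( double* buffer )
{
    obj_sketch_t obj = { buffer, 1.0 };
    return obj;
}

/* Fold an extra factor into the attached scalar without touching data. */
static void obj_sketch_apply_scalar( obj_sketch_t* obj, double factor )
{
    assert( obj != NULL );
    obj->scalar *= factor;
}

/* Toy "macro-kernel" for C := beta * C + alpha * A * B, where alpha is
   the product of the scalars attached to A and B and beta comes from C. */
static void gemm_ker_sketch( const obj_sketch_t* a, const obj_sketch_t* b,
                             const obj_sketch_t* c )
{
    double alpha = a->scalar * b->scalar;
    double beta  = c->scalar;

    *c->buffer = beta * ( *c->buffer )
               + alpha * ( *a->buffer ) * ( *b->buffer );
}
```

Because scaling is deferred until the kernel runs, a caller can request a subproblem such as C := C - A * B by adjusting the scalar on an alias, leaving the original operands untouched.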
Field G. Van Zee
992de486d6 Unimplemented kernels now call reference.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
  datatypes call the corresponding reference kernel. Previously, these
  kernel functions called abort() with a "not yet implemented" error
  message.
2013-12-02 13:58:46 -06:00
Field G. Van Zee
fd4ac636d9 Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
  unimplemented kernel functions simply call the corresponding reference
  implementation. (Previously, these unimplemented functions would
  abort() with a "not yet implemented" message.)
2013-12-02 13:50:36 -06:00
Field G. Van Zee
9552e6ee82 Removed optional scaling from packm control tree.
Details:
- Removed does_scale field from packm control tree node and
  bli_packm_cntl_obj_create() interface. Adjusted all invocations of
  _cntl_obj_create() accordingly.
- Redefined/renamed macros that are used in aliasing so that now,
  bli_obj_alias_to() does a full alias (shallow copy) while
  bli_obj_alias_for_packing() does a partial alias that preserves the
  pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
  will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
  to support heterogeneous datatypes.
2013-11-24 11:40:31 -06:00
Field G. Van Zee
e65c476284 Minor updates to packm_blk_var2.c and _blk_var3.c.
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
  instead of setm(), scal2m().
2013-11-19 10:05:35 -06:00
Field G. Van Zee
9e1d0d4bca Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
  These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
  that already existed in kernels/x86_64/core2-sse3/3.
2013-11-18 18:11:07 -06:00
Field G. Van Zee
85e7e02ea3 Merge branch 'master'. Forgot to git-pull. 2013-11-18 12:02:00 -06:00
Field G. Van Zee
67761e224c Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
  bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
  are causing problems for xlc only in those two files and no other
  macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
  kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
2013-11-18 11:57:40 -06:00
Field G. Van Zee
707200541d Syntax error fix in x86_64/core2 gemmtrsm_u ukr. 2013-11-18 11:17:31 -06:00
Field G. Van Zee
bbe2b84a49 Updated Makefile in test, testsuite.
Details:
- Updated Makefiles in test and testsuite directories to use the new
  BLIS header installation directory scheme, which is to compile with
  -I<PREFIX>/include/blis instead of -I<PREFIX>/include.
2013-11-18 11:11:06 -06:00
Field G. Van Zee
9bd7fcfd43 Outer-to-inner 'restrict' fix in macro-kernels.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
  macro-kernels. Previously, all restricted pointers were being declared
  at the outer-most function scope level. While this violates the C99
  standard, very few of the compilers used with BLIS so far have seemed
  to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
  for identifying this bug (and suggesting the fix).
2013-11-18 10:58:09 -06:00
Field G. Van Zee
50549a6a31 Changed header install directory to include/blis.
Details:
- Changed top-level Makefile so that headers are installed to
  $(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
  named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
  library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
2013-11-17 18:31:27 -06:00
Field G. Van Zee
d70733abdd Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
  Thanks to Francisco Igual for contributing these kernels and
  configurations.
2013-11-16 17:34:25 -06:00
Field G. Van Zee
d37c2cff62 Minor comment and Makefile changes.
Details:
- Added missing 'check-config' and 'check-make-defs' targets to
  testsuite/Makefile.
- Removed unused 'test' target from top-level Makefile.
- Comment changes to testsuite input files.
2013-11-13 10:47:11 -06:00
Field G. Van Zee
19885f893a Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
  instead of libflame.
2013-11-11 12:09:21 -06:00
Field G. Van Zee
1a4d698f42 CHANGELOG update (for 0.1.0). 2013-11-11 10:15:40 -06:00
Field G. Van Zee
089048d589 Added object wrappers to 1f test suite modules.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
  only apparent if you were configuring with something other than the
  reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
  working as intended unless the reference configuration was selected, because
  most kernel sets, such as those in the template set, do not have object
  wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
0.1.0
2013-11-09 17:18:00 -06:00
Field G. Van Zee
9ef3752079 Updated template kernels wrt KernelsHowTo wiki.
Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
  located in config/template/kernels/3.
2013-11-08 17:20:47 -06:00
Field G. Van Zee
376bbb59c8 Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
  and all framework code.
- Updated test suite modules according to above changes.
2013-11-08 11:17:34 -06:00
Field G. Van Zee
68a5910974 Added comments to testsuite/input.operations.
Details:
- Added extensive comments to the top of testsuite/input.operations,
  which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
2013-11-07 11:36:11 -06:00
Field G. Van Zee
a98f78b715 Changed dim_t and inc_t to be signed integers.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
  This will facilitate interoperability with Fortran in the future.
  (Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
  or use the absolute value of the strides, rather than the raw strides
  which may now be signed. Added new macros bli_is_row_stored_f() and
  bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
  and changed the packm_blk_var[23] variants to use these macros instead
  of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to various functions/macros, including
  bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
  related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
  layer properly handles situations where vector increments are negative.
  Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
  to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
  that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
  _check routines.
2013-11-06 15:32:47 -06:00
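To illustrate why signed increments matter for the BLAS compatibility layer, here is a minimal sketch of the reference BLAS convention for negative vector increments: the first logical element is the one stored furthest from the base pointer. The type and function names are hypothetical, and this is not the definition of bli_convert_blas_incv().

```c
#include <assert.h>
#include <stddef.h>

/* Signed stand-ins for dim_t and inc_t (names are assumptions). */
typedef ptrdiff_t dim_sketch_t;
typedef ptrdiff_t inc_sketch_t;

/* Offset of the first logical element. With a negative increment the
   vector is traversed backward, so traversal starts at the far end. */
static ptrdiff_t start_offset_sketch( dim_sketch_t n, inc_sketch_t inc )
{
    assert( n >= 0 );

    return ( inc < 0 && n > 0 ) ? ( n - 1 ) * ( -inc ) : 0;
}

/* y := x for strided vectors, honoring negative increments. */
static void copyv_sketch( dim_sketch_t n,
                          const double* x, inc_sketch_t incx,
                          double*       y, inc_sketch_t incy )
{
    const double* x0 = x + start_offset_sketch( n, incx );
    double*       y0 = y + start_offset_sketch( n, incy );

    for ( dim_sketch_t i = 0; i < n; ++i )
        y0[ i * incy ] = x0[ i * incx ];
}
```

With unsigned types the expression i * incx could not represent backward traversal at all, which is one reason a signed inc_t is convenient here.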
Field G. Van Zee
1f8afc3e08 Minor comment update to BLAS compat files. 2013-11-06 10:09:10 -06:00
Field G. Van Zee
1abbf768af Fixed bugs in scalv and setv.
Details:
- Fixed bugs similar to those addressed in cca1e1f51d, whereby
  a segmentation fault may occur if beta is not the same type as
  the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
2013-11-04 15:50:00 -06:00
Field G. Van Zee
f5953259a1 Fixed a bug related to Hermitian matrix diagonals.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
  diagonal elements of Hermitian matrices were already zero. This property
  is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
  to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
2013-11-04 14:43:55 -06:00
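Enforcing a real diagonal at packing time can be sketched as below. This is a simplified illustration under stated assumptions: the names are hypothetical, storage is column-major, and the routine only copies the matrix and zeroes the imaginary parts on the diagonal; it does not reproduce what bli_packm_blk_var2 does beyond that one property.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double real; double imag; } dcomplex_sketch_t;

/* Copy an n x n column-major matrix with leading dimension lda into a
   contiguous n x n buffer, forcing imaginary parts on the diagonal to
   zero so later computation may rely on a real diagonal. */
static void pack_hermitian_sketch( size_t n,
                                   const dcomplex_sketch_t* a, size_t lda,
                                   dcomplex_sketch_t* p )
{
    assert( lda >= n );

    for ( size_t j = 0; j < n; ++j )
    {
        for ( size_t i = 0; i < n; ++i )
        {
            p[ i + j * n ] = a[ i + j * lda ];

            if ( i == j ) p[ i + j * n ].imag = 0.0;
        }
    }
}
```

Doing this on the packed copy leaves the caller's matrix unmodified, even if its diagonal holds stray imaginary values.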
Field G. Van Zee
d70f2b089d Added scaling to abval2s, sqrt2s macros.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
  and overflow from squaring the real and imaginary components. (This is
  the same technique used to fix recent bugs in invscals/invscaljs and
  inverts.)
2013-11-02 17:19:40 -05:00
Field G. Van Zee
c5b1ed9409 Added new dotxaxpyf variant 2.
Details:
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
  kernels. By default, this variant is not used by any other operation.
2013-11-01 10:28:04 -05:00
Field G. Van Zee
97f89fbcf2 Fixed bug in complex invscals.
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
  imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
  in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
  operator instead of "<".
2013-11-01 10:16:39 -05:00
Field G. Van Zee
eda42a21d1 Defined missing symbols in bla_rotg.c
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
  z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
  d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
  these bugs.
2013-10-31 18:00:44 -05:00
Field G. Van Zee
cca1e1f51d Fixed bugs in scalm and setm.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
  beta is not the same type as the matrix operand. Thanks to Vladimir
  Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
  and setm; namely, the alpha scalar is copy-cast to the type of the first
  matrix operand.
- Changed the template and reference configurations' bli_config.h files
  so that the number of memory allocator blocks of A and B are set based
  on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
2013-10-30 14:39:01 -05:00
Field G. Van Zee
2807013a47 Fixed over/under-flow in complex inversion.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
  in an "unsafe" manner, such that very large and very small values were
  unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
  reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
2013-10-24 14:32:20 -05:00
Field G. Van Zee
45a80c625f Fixed parameter checking issue in BLAS syr[2]k.
Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
  for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
  trans parameter of either operation, it is (a) allowed, and (b) treated
  as 'T' (whereas previously it was disallowed). Thanks to Vladimir
  Sukharev for finding and reporting this bug.
2013-10-23 12:15:25 -05:00
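The relaxed parameter check can be pictured with a small sketch. The enum and function names are hypothetical and do not correspond to the BLIS compatibility layer's own code; the sketch only shows 'C' being accepted and mapped to the same value as 'T' for the real-domain operations.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

typedef enum
{
    TRANS_SKETCH_N,
    TRANS_SKETCH_T,
    TRANS_SKETCH_INVALID
} trans_sketch_t;

/* Map the BLAS trans character for real syrk/syr2k. For real matrices a
   conjugate transpose equals a plain transpose, so 'C' is treated as 'T'. */
static trans_sketch_t map_real_syrk_trans_sketch( const char* trans )
{
    assert( trans != NULL );

    switch ( toupper( (unsigned char)*trans ) )
    {
        case 'N': return TRANS_SKETCH_N;
        case 'T': /* fall through */
        case 'C': return TRANS_SKETCH_T;
        default:  return TRANS_SKETCH_INVALID;
    }
}
```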
Field G. Van Zee
a091a219bd Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
  configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
2013-10-14 10:11:29 -05:00
Field G. Van Zee
dacdde27ae Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
  the Sandy Bridge architecture, including a dgemm ukernel coded with
  AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
2013-10-11 11:37:19 -05:00
Field G. Van Zee
03106d650e Fixed minor perf bug in gemm_ker_var2.
Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
  bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
  computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
  Tze Meng for helping me identify this bug.
2013-10-11 10:40:38 -05:00
Field G. Van Zee
b053337387 Added fusing factors, MR/NR to test suite output.
Details:
- Updated the test suite driver (and modules where appropriate) so that
  the level-1f fusing factors are output along with the variable dimension.
  While this is not strictly necessary, since the fusing factors are output
  in the initial parameter summary, it gives the user extra reassurance,
  since the fusing factors appear alongside the variable dimension, which
  together give a complete picture of the problem size. Similar changes were
  made for outputting the register blocksizes when reporting results for the
  micro-kernel test modules.
2013-10-10 18:26:55 -05:00
Field G. Van Zee
be4833bd91 Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
  micro-kernels. (Duplication in the micro-kernels, for now, is NOT
  supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
  facilitate the level-1f test modules. Also added front-end for dupl
  operation.
- Added obj_t-based check routines for level-1f operations, which are
  called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
  factors as a function of datatype, which is needed by their respective
  test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
2013-10-10 14:20:06 -05:00
Field G. Van Zee
680188d46b Cleaned up old test drivers.
Details:
- Minor updates to old test drivers in preparation for our participation
  in ACM TOMS's replicated results initiative.
2013-10-10 13:23:37 -05:00
Field G. Van Zee
3690bdd4f9 More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
  include this in previous commit.
2013-10-10 11:45:33 -05:00
Field G. Van Zee
661d5120cd Fixed outdated fusing factor macros in 1f kernels.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
  macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
  this out.
2013-10-10 11:27:27 -05:00
Field G. Van Zee
73aa1e9f31 Added section overrides to test suite.
Details:
- Added new lines of input to the test suite's input.operations file, which
  allows the user to disable entire sections (levels) of tests. Before this
  change, the user had to manually disable each operation test's "master
  switch". (This is why input.operations.0 existed: to allow a more
  convenient starting point for someone who only wanted to test one or a
  few operations.)
2013-10-01 17:01:18 -05:00
Field G. Van Zee
5e54f46ccb Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
  level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
  lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
  renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
  to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
  dotxaxpyf, as well as the default fusing factor (which are all equal
  in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
  reference variants were implemented in terms of front-end routines rather
  than directly in terms of the kernels. (For example, axpy2v was implemented
  as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
  A is assumed to be m x b_n in both cases, and for dotxf A is actually used
  as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
  frame/3/gemm/ukernels and frame/3/trsm/ukernels.
2013-09-30 12:58:18 -05:00