Commit Graph

240 Commits

Author SHA1 Message Date
Field G. Van Zee
32d8f264ae Refactored packm variants.
Details:
- Revised packm_blk_var2() and _var3() by encapsulating the general,
  hermitian/symmetric, and triangular panel-packing subproblems into
  separate functions: packm_gen_cxk(), packm_herm_cxk(), and
  packm_tri_cxk(), respectively. Also, homogenized the packm code as
  well as the new specialized packm_*_cxk() code to further improve
  readability.
2014-02-09 10:07:37 -06:00
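The split described above can be sketched as a dispatcher plus one small function per panel type. This is a minimal illustration, not BLIS code. It assumes column-major double-precision storage, treats the Hermitian/symmetric case as real symmetric with the lower triangle stored, and treats the triangular case as lower triangular. The names pack_gen_cxk(), pack_symm_cxk(), pack_tri_cxk() and pack_panel() are hypothetical analogues of the packm functions named in the entry, with made-up signatures.

```c
#include <stddef.h>

/* Hypothetical structure tags for this sketch only. */
typedef enum { STRUC_GENERAL, STRUC_SYMMETRIC, STRUC_TRIANGULAR } struc_t;

/* Element (i,j) of a column-major matrix with leading dimension lda. */
static double elem(const double* a, size_t lda, size_t i, size_t j)
{
    return a[i + j * lda];
}

/* General: copy the cdim x kdim panel whose first row is `off`. */
static void pack_gen_cxk(size_t cdim, size_t kdim, size_t off,
                         const double* a, size_t lda, double* p)
{
    for (size_t j = 0; j < kdim; ++j)
        for (size_t i = 0; i < cdim; ++i)
            p[i + j * cdim] = elem(a, lda, off + i, j);
}

/* Symmetric (lower triangle stored): entries above the diagonal are
   read from their mirror image below it. */
static void pack_symm_cxk(size_t cdim, size_t kdim, size_t off,
                          const double* a, size_t lda, double* p)
{
    for (size_t j = 0; j < kdim; ++j)
        for (size_t i = 0; i < cdim; ++i) {
            size_t r = off + i;
            p[i + j * cdim] = (r >= j) ? elem(a, lda, r, j)
                                       : elem(a, lda, j, r);
        }
}

/* Lower triangular: entries above the diagonal are packed as zeros. */
static void pack_tri_cxk(size_t cdim, size_t kdim, size_t off,
                         const double* a, size_t lda, double* p)
{
    for (size_t j = 0; j < kdim; ++j)
        for (size_t i = 0; i < cdim; ++i) {
            size_t r = off + i;
            p[i + j * cdim] = (r >= j) ? elem(a, lda, r, j) : 0.0;
        }
}

/* The blocked variant only chooses which subproblem applies. */
void pack_panel(struc_t s, size_t cdim, size_t kdim, size_t off,
                const double* a, size_t lda, double* p)
{
    switch (s) {
    case STRUC_GENERAL:    pack_gen_cxk(cdim, kdim, off, a, lda, p);  break;
    case STRUC_SYMMETRIC:  pack_symm_cxk(cdim, kdim, off, a, lda, p); break;
    case STRUC_TRIANGULAR: pack_tri_cxk(cdim, kdim, off, a, lda, p);  break;
    }
}
```

Keeping the branching in one place leaves each packing routine as a plain double loop, which is the readability gain the entry describes.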
Field G. Van Zee
6c80670287 Renamed enumerated type in testsuite and modules.
Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
  renamed all corresponding "impl" variables to "iface".
2014-02-07 11:27:15 -06:00
Field G. Van Zee
6c12598b1b Employ simpler INSERT_ macro for ref ukernels.
Details:
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
  argument--the base name of the function--and employed this macro
  in the reference micro-kernel files instead of the _BASIC macro,
  which takes one auxiliary argument. That argument was not being
  used and probably only served to obfuscate.
2014-02-06 18:26:35 -06:00
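INSERT_GENTFUNC_BASIC0 follows a common preprocessor pattern: a function body is written once as a macro parameterized by C type and type character, and an "insert" macro stamps out one definition per datatype from just a base name. A reduced, hypothetical version that covers only the s and d types (the real BLIS macros take additional parameters and also cover complex types):

```c
/* Hypothetical generic body: defines <ch><opname>() for one type. */
#define GENTFUNC(ctype, ch, opname)          \
    ctype ch##opname(ctype x, ctype y)       \
    {                                        \
        return x * y;                        \
    }

/* Takes only the base name and instantiates the float and double
   versions; there is no unused auxiliary argument. */
#define INSERT_GENTFUNC_BASIC0(opname)       \
    GENTFUNC(float,  s, opname)              \
    GENTFUNC(double, d, opname)

/* Defines smul_ref() and dmul_ref(). */
INSERT_GENTFUNC_BASIC0(mul_ref)
```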
Field G. Van Zee
32cae66326 Fixed some instances of sloppy 'restrict' usage.
Details:
- Fixed some technical incorrectness with some usage of the 'restrict'
  keyword in the reference trsm micro-kernels.
- Tweaked testsuite/Makefile so that the test suite is rebuilt if libblis
  was touched.
2014-02-06 18:06:42 -06:00
Field G. Van Zee
7aceef7683 Updated comments in macro-kernels.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
  section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
  NR = 4. It's always good for MR != NR in the reference configuration
  since it may help uncover bugs related to non-square micro-kernels.
2014-02-06 17:31:19 -06:00
Field G. Van Zee
8fd292aa78 Pass panel dimensions into macro-kernels.
Details:
- Modified the interfaces to the datatype-specific macro-kernels so that:
  - pd_a and pd_b are passed in (which contain the panel dimensions of
    packed panels of a and b).
  - rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
- Modified implementations of datatype-specific macro-kernels so pd_a,
  pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
  and PACKNR, respectively.
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
  is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
  header file bli_kernel_post_macro_defs.h.
2014-02-06 14:32:21 -06:00
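The temporary ct matters at matrix edges, where fewer than MR rows or NR columns remain but the micro-kernel still writes a full tile. A minimal sketch of that pattern, assuming a hypothetical micro-kernel stand-in and hypothetical MR, NR, MAXMR and MAXNR constants (here MAXMR = MR and MAXNR = NR, matching the "for now" note in the entry). A real macro-kernel would also apply beta during the copy-out.

```c
#include <stddef.h>

/* Hypothetical register blocksizes; MAXMR/MAXNR size the temporary. */
#define MR    8
#define NR    4
#define MAXMR MR
#define MAXNR NR

/* Stand-in for a micro-kernel: always writes a full MR x NR tile
   (column-major, leading dimension MR). Each entry encodes its
   position as 10*i + j so the copy-out can be checked. */
static void ukernel_stub(double* ct)
{
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            ct[i + j * MR] = (double)(10 * i + j);
}

/* Edge case (m <= MR, n <= NR): let the kernel write into ct, then copy
   only the m x n part that exists into C (leading dimension ldc). */
static void compute_edge_tile(size_t m, size_t n, double* c, size_t ldc)
{
    double ct[MAXMR * MAXNR];
    ukernel_stub(ct);
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            c[i + j * ldc] = ct[i + j * MR];
}
```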
Field G. Van Zee
3404e6657e Deprecated incremental blocksize macro const defs.
Details:
- Removed macro constant definitions related to incremental blocksizes
  from all configurations' bli_kernel.h files. This change is minor and
  is mostly a cleanup related to a previous commit.
2014-02-05 11:19:10 -06:00
Field G. Van Zee
1e9afd39a6 Comment updates (removed vestiges of "bd"). 2014-02-04 20:15:19 -06:00
Field G. Van Zee
5cf58f7c2d Added early returns for "object is zeros" case.
Details:
- Added some logic to packm_init(), pack_int() and gemm_int() so that
  (a) objects marked as BLIS_ZEROS are not packed, and (b) those
  objects are not computed with. This functionality is not currently
  needed by any existing implementations, but may be used in the
  future.
2014-02-04 09:15:19 -06:00
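The early return amounts to a guard at the top of the operation. A minimal sketch with a hypothetical matrix type, in which a boolean flag plays the role of the BLIS_ZEROS marking and a naive triple loop stands in for the packed computation:

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical column-major matrix; is_zeros stands in for the
   BLIS_ZEROS marking. */
typedef struct {
    bool    is_zeros;
    size_t  m, n;
    double* buf;
} mat_sketch_t;

/* C := C + A * B. If either factor is known to be all zeros, the update
   is a no-op, so return before packing or touching any data. */
static void gemm_sketch(const mat_sketch_t* a, const mat_sketch_t* b,
                        mat_sketch_t* c)
{
    if (a->is_zeros || b->is_zeros)
        return;
    for (size_t j = 0; j < c->n; ++j)
        for (size_t p = 0; p < a->n; ++p)
            for (size_t i = 0; i < c->m; ++i)
                c->buf[i + j * c->m] += a->buf[i + p * a->m] * b->buf[p + j * b->m];
}
```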
Field G. Van Zee
6bbd4be769 Added 'f' on some gemm and trmm blocked variants.
Details:
- Added 'f' to some blocked variant files/functions to be consistent with
  the naming convention of other files/functions. Here, the 'f' indicates
  partitioning in the "forward" direction.
2014-02-03 13:15:25 -06:00
Field G. Van Zee
eb13cb2c6b Removed redundant non-gemm blksz_t creation.
Details:
- Removed code that creates duplicate blksz_t objects for herk, trmm,
  and trsm. Instead, the gemm blksz_t objects are accessed via extern
  and used directly. This reduces the amount of code associated with
  each of the three _cntl_init() and _cntl_finalize() functions.
2014-02-03 11:07:01 -06:00
Field G. Van Zee
0a023a7d9e Introduced new level-3 front-end layer.
Details:
- Added new _front() functions for each level-3 operation. This is done
  so that the choosing of the control tree (and *only* the choosing of
  the control tree) happens in what was previously the "front end"
  (e.g. bli_gemm()). That control tree is then passed into the _front()
  function, which then performs up-front tasks such as parameter
  checking.
2014-01-29 14:02:08 -06:00
Field G. Van Zee
251c5d1121 Removed redundant hemm, her2k control trees.
Details:
- Removed code that generated a control tree specifically for hemm and
  symm. Instead, the gemm control tree is now configured so that it
  works for gemm, hemm, or symm.
- Retired most her2k code, as it was not being used. (Currently, her2k is
  implemented as two invocations of herk.) I couldn't think of many
  situations where her2k variants were needed.
- Removed some older her2k code.
2014-01-28 19:40:29 -06:00
Field G. Van Zee
5a36e5bf2f Embed func_t microkernel objects in control trees.
Details:
- Modified all control tree node definitions to include a new field of
  type func_t*, which is similar to a blksz_t except that it contains
  one function pointer (each typed simply as void*) for each datatype.
  We use the func_t* to embed pointers to the micro-kernels to use for
  the leaf-level nodes of each control tree. This change is a natural
  extension of control trees and will allow more flexibility in the
  future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
  from the incoming (previously ignored) control tree node and then pass
  the queried pointer into the datatype-specific macro-kernel code, which
  then casts the pointer to the appropriate type (new typedefs residing
  in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
  kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
  is, determined when the datatype-specific macro-kernel functions are
  instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
  base names if they do not exist already, and then use those to build
  datatype-specific micro-kernel function names. This will allow
  developers extra flexibility if they want to, for example, name each
  of their datatype-specific micro-kernels differently (e.g. double
  real might be named bli_dgemm_opt_4x4() while double complex might be
  named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
  and initializes a func_t object for the corresponding micro-kernels.
  The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
  and then reused via extern wherever possible.
2014-01-27 11:13:00 -06:00
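The func_t idea can be sketched as a struct holding one untyped function pointer per datatype, which a datatype-specific routine casts back to its real signature before calling. Everything here is hypothetical: a dot-product kernel stands in for a micro-kernel, only two datatypes are covered, and the sketch stores a generic function pointer type (void (*)(void)) rather than void*, because ISO C guarantees round-trip conversion between function pointer types but not between function pointers and void*.

```c
#include <stddef.h>

/* Hypothetical datatype indices. */
typedef enum { DT_FLOAT = 0, DT_DOUBLE = 1, DT_COUNT = 2 } dt_sketch_t;

/* Generic function pointer type, used for storage only. */
typedef void (*vfp_t)(void);

/* func_t-like container: one untyped kernel pointer per datatype. */
typedef struct { vfp_t ptr[DT_COUNT]; } func_sketch_t;

/* Typed signature recovered by casting before the call. */
typedef void (*ddot_ft)(size_t n, const double* x, const double* y, double* rho);

static void sdot_ref(size_t n, const float* x, const float* y, float* rho)
{
    float sum = 0.0f;
    for (size_t i = 0; i < n; ++i) sum += x[i] * y[i];
    *rho = sum;
}

static void ddot_ref(size_t n, const double* x, const double* y, double* rho)
{
    double sum = 0.0;
    for (size_t i = 0; i < n; ++i) sum += x[i] * y[i];
    *rho = sum;
}

/* Filled in once, e.g. when a control tree is initialized. */
static const func_sketch_t dot_kernels = { { (vfp_t)sdot_ref, (vfp_t)ddot_ref } };

/* Datatype-specific caller: fetch the pointer for its datatype, cast it
   back to the real signature, and call through it. */
static double ddot_dispatch(const func_sketch_t* f, size_t n,
                            const double* x, const double* y)
{
    ddot_ft kernel = (ddot_ft)f->ptr[DT_DOUBLE];
    double rho;
    kernel(n, x, y, &rho);
    return rho;
}
```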
Field G. Van Zee
6cbd6f1c7f Removed commented mixed domain macro-kernel code.
Details:
- Removed commented-out code from macro-kernels that was supposed to
  facilitate implementing mixed domain (complex times real) matrix
  multiplication. This functionality is probably still possible,
  but I'm getting tired of looking at the code every time I edit
  a macro-kernel. Plus, there are probably ways of doing it at a
  higher level, via control trees.
2014-01-24 10:38:29 -06:00
Field G. Van Zee
29778be111 Removed b_aux field from cntl nodes.
Details:
- Removed b_aux field from all control tree node definitions. This field
  was being used in certain optimizations (incremental blocking) that were
  not actually being employed within BLIS, and are probably not employed
  by others.
- Updated all _cntl_obj_create() function definitions and invocations
  according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
  incremental blocking, but which was never called by BLIS itself.
2014-01-22 16:03:11 -06:00
Field G. Van Zee
06ac727a42 Updated some comments in level-3 front ends. 2014-01-15 16:44:52 -06:00
Field G. Van Zee
d628bf1da1 Consolidated pack_t enums; retired VECTOR value.
Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
  its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
  makes room in the three pack_t bits of the info field of obj_t so that
  two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
  (Replaced "is contiguous" with more accurate "has unit stride".)
2014-01-15 11:40:12 -06:00
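Aliasing one enumerator to another frees an encoding inside a fixed-width bit field. A minimal sketch, assuming a hypothetical layout in which the pack status occupies three bits (16 through 18) of a 32-bit info word; the enumerator names and values are illustrative and do not reproduce BLIS's pack_t encoding.

```c
#include <stdint.h>

/* Hypothetical position of the 3-bit pack field in the info word. */
#define PACK_SHIFT 16u
#define PACK_BITS  (0x7u << PACK_SHIFT)

typedef enum {
    PACKED_NONE       = 0u << PACK_SHIFT,
    PACKED_UNSPEC     = 1u << PACK_SHIFT,
    PACKED_VECTOR     = PACKED_UNSPEC,    /* alias: no value of its own */
    PACKED_ROWS       = 2u << PACK_SHIFT,
    PACKED_COLUMNS    = 3u << PACK_SHIFT,
    PACKED_ROW_PANELS = 4u << PACK_SHIFT,
    PACKED_COL_PANELS = 5u << PACK_SHIFT
    /* encodings 6 and 7 remain free */
} pack_sketch_t;

/* Replace only the pack bits, preserving every other bit of info. */
static uint32_t set_pack(uint32_t info, pack_sketch_t p)
{
    return (info & ~PACK_BITS) | (uint32_t)p;
}

static pack_sketch_t get_pack(uint32_t info)
{
    return (pack_sketch_t)(info & PACK_BITS);
}
```

Because PACKED_VECTOR and PACKED_UNSPEC share one encoding, code that tests for either one sees the same value.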
Field G. Van Zee
ddc8c1c379 Suppress warning in Makefile (UNINSTALL_LIBS).
Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
  would be uninstalled upon executing "make uninstall-old". Before, if the
  Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
  or directory" message was emitted. This message was harmless, but is now
  suppressed in this situation.
2014-01-13 14:55:43 -06:00
Field G. Van Zee
f8f67d7251 Typecast bli_getopt() return value in testsuite.
Details:
- In the test suite driver, inserted an explicit typecast of the return
  value of bli_getopt() prior to parsing. The lack of a typecast caused a
  problem on at least one system whereby a return value of -1 was
  interpreted as a garbage character. Thanks to Francisco Igual for finding
  and submitting this fix.
2014-01-10 09:06:11 -06:00
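The entry does not say exactly how the value was misread. One common way this class of bug arises is storing an int return value such as -1 in a plain char, whose signedness is implementation-defined (plain char is unsigned by default on many ARM and POWER targets). A small illustration, with a hypothetical stand-in for the option parser:

```c
/* Hypothetical stand-in for an option parser that has run out of
   options and reports that by returning -1. */
static int next_option_stub(void)
{
    return -1;
}

/* Fragile: whether c == -1 holds depends on the signedness of plain
   char. Where char is unsigned, c becomes CHAR_MAX and the test fails. */
static int finished_fragile(void)
{
    char c = (char)next_option_stub();
    return c == -1;
}

/* Robust: keep the int return value, compare it against -1 first, and
   convert to char only once a real option character is in hand. */
static int finished_robust(void)
{
    int ret = next_option_stub();
    return ret == -1;
}
```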
Field G. Van Zee
e7f154fe2e Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
  double precision real gemm microkernel in kernels/arm/neon/3.
2014-01-10 08:48:07 -06:00
Field G. Van Zee
89c76a8a51 Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
  a user can build a BLIS library outside of the top-level directory of
  the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
  which will compile, link, and run the testsuite binary. This works even
  if the build directory is externally located, thanks to the test suite
  binary's new -g and -o command-line options. Also, when creating the
  test suite via the top-level Makefile, the linking is against the
  local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
  locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
2014-01-09 12:08:37 -06:00
Field G. Van Zee
12fa82ec12 Implemented bli_getopt().
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
  a custom version of getopt(), which may be used to parse command line
  options passed into a program via argc/argv. I am implementing this
  function myself, as opposed to using the version available via unistd.h,
  for portability reasons, as its only requirement is string.h, which is
  part of the standard C library.
- Modified test suite to allow the user to specify the file name (and/or
  path) to the parameters and operations input files: -g may be used to
  specify the general input file and -o to specify the operations input
  file. If -g, -o, or both are not given, default filenames are assumed,
  and those files are expected to exist in the current directory.
2014-01-08 16:09:26 -06:00
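A getopt()-style parser needs little beyond string.h. The sketch below is hypothetical and is not the bli_getopt() interface: it keeps its state in a struct instead of globals, handles single-letter options with an argument either attached (-ofile) or separate (-o file), stops at the first non-option word or at "--", and does not support grouped flags such as -ab.

```c
#include <string.h>

/* Hypothetical parser state (POSIX getopt() uses globals instead). */
typedef struct {
    int         optind;  /* index of the next argv element to examine */
    const char* optarg;  /* argument of the most recent option, if any */
} opt_state_t;

/* Returns the option character, '?' for an unknown option or a missing
   argument, or -1 when no options remain. A ':' after a letter in
   optstring means that option takes an argument. */
static int getopt_sketch(int argc, char* const argv[], const char* optstring,
                         opt_state_t* st)
{
    if (st->optind >= argc) return -1;

    const char* arg = argv[st->optind];
    if (arg[0] != '-' || arg[1] == '\0') return -1;   /* not an option */
    if (strcmp(arg, "--") == 0) { st->optind++; return -1; }

    int c = (unsigned char)arg[1];
    const char* spec = strchr(optstring, c);
    st->optind++;
    st->optarg = NULL;

    if (spec == NULL || c == ':') return '?';
    if (spec[1] == ':') {
        if (arg[2] != '\0')          st->optarg = arg + 2;            /* -ofile  */
        else if (st->optind < argc)  st->optarg = argv[st->optind++]; /* -o file */
        else                         return '?';
    }
    return c;
}
```

With optstring "g:o:", this accepts the -g and -o forms described in the entry.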
Field G. Van Zee
cafb58e86e Updated template micro-kernels to use auxinfo_t.
Details:
- Updated template micro-kernel implementations (located in
  config/template/kernels) to adhere to the new auxinfo_t interface.
  This change was meant to be included in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
  and the BLAS compatibility layer).
2014-01-06 13:28:36 -06:00
Field G. Van Zee
9ab126b499 Removed error checks in netlib->BLIS param mapping
Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
  If the char value input to these functions was not one of the defined
  values, bli_check_error_code() with the appropriate error code value
  would be called, resulting in an abort(). This was unnecessary and
  redundant since these routines are currently only used within the
  BLAS compatibility layer, and they are only called AFTER parameter
  checking has already been performed on the original BLAS char values.
  If the application tried to override xerbla() to prevent an abort()
  from being called, this error checking would still get in the way.
  Thus, instead of reporting the error situation to the framework (i.e.,
  calling abort()), an arbitrary BLIS parameter value is now chosen and
  the function returns normally. Thanks to Jeff Hammond for finding and
  reporting this issue.
2014-01-06 12:13:26 -06:00
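A minimal sketch of a mapping function that never aborts, using a hypothetical transposition enum rather than BLIS's trans_t. It assumes, as the entry states, that the BLAS layer has already validated the character before the mapping is called.

```c
/* Hypothetical BLIS-side parameter values. */
typedef enum { NO_TRANSPOSE, TRANSPOSE, CONJ_TRANSPOSE } trans_sketch_t;

/* Map a BLAS 'trans' character. Invalid input was already rejected
   upstream, so an unrecognized value maps to an arbitrary default
   instead of triggering an error path that would call abort(). */
static trans_sketch_t map_netlib_trans(char trans)
{
    switch (trans) {
    case 'n': case 'N': return NO_TRANSPOSE;
    case 't': case 'T': return TRANSPOSE;
    case 'c': case 'C': return CONJ_TRANSPOSE;
    default:            return NO_TRANSPOSE;   /* arbitrary; never aborts */
    }
}
```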
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
290fa54e00 Store variable panel strides in trmm/trsm auxinfo.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
  and trsm macro-kernels. Whereas before we stored whatever value was
  provided to the macro-kernel implementation via ps_a/ps_b, now we
  store the stride that will advance to the next variable-length
  micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
2013-12-20 14:10:26 -06:00
Field G. Van Zee
e3a6c7e776 Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
  the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
  invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
2013-12-19 16:29:31 -06:00
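The condition that bli_is_last_iter() encapsulates can be illustrated with panel indices instead of pointers. In this hypothetical sketch, i runs over micro-panels of A in the inner loop and j over micro-panels of B in the outer loop, and the next-panel hints wrap back to the start once the last iteration is reached. The IS_LAST_ITER macro and next_panels() function are illustrative, not BLIS definitions.

```c
#include <stddef.h>

/* Hypothetical macro: true on the final iteration of a loop of n_iter. */
#define IS_LAST_ITER(i, n_iter) ((i) == (n_iter) - 1)

/* For the call at inner iteration i (of m_iter) and outer iteration j
   (of n_iter), compute which micro-panels of A and B the micro-kernel
   should treat as "next" (a2 and b2 in the entry). */
static void next_panels(size_t i, size_t m_iter, size_t j, size_t n_iter,
                        size_t* a2, size_t* b2)
{
    *a2 = i + 1;   /* normally the next micro-panel of A */
    *b2 = j;       /* paired with the current micro-panel of B */
    if (IS_LAST_ITER(i, m_iter)) {
        *a2 = 0;            /* wrap to the first micro-panel of A */
        *b2 = j + 1;        /* and advance to the next micro-panel of B */
        if (IS_LAST_ITER(j, n_iter))
            *b2 = 0;        /* final call of all: wrap B as well */
    }
}
```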
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed the a_next and b_next arguments from micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
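A minimal sketch of the auxinfo_t idea: auxiliary hints travel in one struct passed by pointer, and kernels read them through accessor macros, so new fields can be added later without changing every micro-kernel signature. The type, field, and macro names below are hypothetical and are not the definitions in bli_auxinfo_macro_defs.h.

```c
#include <stddef.h>

/* Hypothetical analogue of auxinfo_t. */
typedef struct {
    const void* a_next;  /* address of the next micro-panel of A */
    const void* b_next;  /* address of the next micro-panel of B */
    size_t      ps_a;    /* room for more hints, e.g. a panel stride */
} auxinfo_sketch_t;

/* Hypothetical accessor macros. */
#define AUX_NEXT_A(d)            ((d)->a_next)
#define AUX_NEXT_B(d)            ((d)->b_next)
#define AUX_SET_NEXT_AB(a, b, d) \
    do { (d)->a_next = (a); (d)->b_next = (b); } while (0)

/* A micro-kernel reads only the hints it uses, for example the address
   to prefetch for its next call. */
static const double* prefetch_target_a(const auxinfo_sketch_t* data)
{
    return (const double*)AUX_NEXT_A(data);
}
```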
Field G. Van Zee
392428dea4 Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
  imaginary components separately, named like the previous set except
  with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
  "whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
2013-12-12 19:01:47 -06:00
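Defining the "whole" macros in terms of the "ri" macros means the complex arithmetic is written only once. A hypothetical illustration with a complex axpy-style update (y += a * x); the type and macro names follow the "s"/"ris" naming pattern from the entry but are not BLIS's definitions.

```c
/* Hypothetical double-precision complex type. */
typedef struct { double real, imag; } dcomplex_sketch;

/* "ri" form: y += a * x with every operand passed as separate real and
   imaginary parts. Temporaries ensure a and x are fully read before
   y is updated. */
#define AXPYRIS(ar, ai, xr, xi, yr, yi)            \
    do {                                           \
        double tr_ = (ar) * (xr) - (ai) * (xi);    \
        double ti_ = (ar) * (xi) + (ai) * (xr);    \
        (yr) += tr_;                               \
        (yi) += ti_;                               \
    } while (0)

/* "Whole" form defined in terms of the "ri" form. */
#define AXPYS(a, x, y) \
    AXPYRIS((a).real, (a).imag, (x).real, (x).imag, (y).real, (y).imag)
```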
Field G. Van Zee
f60c8adc2f Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
2013-12-10 14:39:56 -06:00
Field G. Van Zee
4ef2015049 Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
  double-precision real).
2013-12-09 18:53:03 -06:00
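With BLIS_DEFAULT_KC_D = 384 and up to 25% extension, the largest KC block becomes 384 * 1.25 = 480. A minimal sketch of how such an extension can be applied, assuming the rule "take everything that is left if it fits within the extended maximum, otherwise take the default"; next_block() and count_blocks() are hypothetical helpers.

```c
#include <stddef.h>

/* Size of the next block when kc is the default blocksize and kc_max
   (kc plus the allowed extension) is the most a block may hold: a
   remainder that fits within kc_max is taken whole. */
static size_t next_block(size_t k_left, size_t kc, size_t kc_max)
{
    return (k_left <= kc_max) ? k_left : kc;
}

/* Number of blocks needed to cover a dimension of length k. */
static size_t count_blocks(size_t k, size_t kc, size_t kc_max)
{
    size_t n = 0;
    while (k > 0) {
        k -= next_block(k, kc, kc_max);
        ++n;
    }
    return n;
}
```

With k = 800, for instance, the extended rule yields two blocks (384 and 416) where the unextended rule yields three (384, 384 and 32).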
Field G. Van Zee
5ad2ce7bf5 Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
  for x86_64/core2 were calling the wrong reference code (l instead
  of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
  kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
2013-12-09 18:30:49 -06:00
Field G. Van Zee
d289f5d3a9 Whitespace changes to level-2 blocked variants.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
  in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
2013-12-05 10:56:13 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments; however, level-3 internal back-ends still
  do; this is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observed and applied attached scalars in scalm_int(), and removed scalar
  from the interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creation of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
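How the attached scalars combine can be sketched with 1x1 operands. In this hypothetical model, the internal back-end folds a non-unit alpha or beta into the attached scalars of local copies (standing in for aliases), and the macro-kernel then forms alpha as the product of the scalars attached to A and B and reads beta from C, as the entry describes. The type and function names are illustrative.

```c
/* Hypothetical 1x1 object: a value plus an attached scalar. */
typedef struct {
    double val;
    double scalar;   /* attached scalar; defaults to 1.0 */
} obj_sketch_t;

static obj_sketch_t obj_make(double v)
{
    obj_sketch_t o = { v, 1.0 };
    return o;
}

/* Macro-kernel: alpha is the product of the scalars attached to A and
   B, and beta is the scalar attached to C. */
static void macro_kernel_sketch(const obj_sketch_t* a, const obj_sketch_t* b,
                                obj_sketch_t* c)
{
    double alpha = a->scalar * b->scalar;
    double beta  = c->scalar;
    c->val = beta * c->val + alpha * a->val * b->val;
}

/* Internal back-end: fold non-unit alpha/beta into the attached scalars
   of local copies, so the caller's objects keep their own scalars. */
static void gemm_int_sketch(double alpha, obj_sketch_t a, const obj_sketch_t* b,
                            double beta, obj_sketch_t* c)
{
    obj_sketch_t c_alias = *c;
    if (alpha != 1.0) a.scalar *= alpha;      /* a is already a copy */
    if (beta  != 1.0) c_alias.scalar *= beta;
    macro_kernel_sketch(&a, b, &c_alias);
    c->val = c_alias.val;                     /* write back the result only */
}
```

Passing alpha = -1 here yields the C := C - alpha_A * A * B style of update mentioned in the entry without modifying the scalar attached to the caller's A.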
Field G. Van Zee
992de486d6 Unimplemented kernels now call reference.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
  datatypes call the corresponding reference kernel. Previously, these
  kernel functions called abort() with a "not yet implemented" error
  message.
2013-12-02 13:58:46 -06:00
Field G. Van Zee
fd4ac636d9 Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
  unimplemented kernel functions simply call the corresponding reference
  implementation. (Previously, these unimplemented functions would
  abort() with a "not yet implemented" message.)
2013-12-02 13:50:36 -06:00
Field G. Van Zee
9552e6ee82 Removed optional scaling from packm control tree.
Details:
- Removed does_scale field from packm control tree node and
  bli_packm_cntl_obj_create() interface. Adjusted all invocations of
  _cntl_obj_create() accordingly.
- Redefined/renamed macros that are used in aliasing so that now,
  bli_obj_alias_to() does a full alias (shallow copy) while
  bli_obj_alias_for_packing() does a partial alias that preserves the
  pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
  will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
  to support heterogeneous datatypes.
2013-11-24 11:40:31 -06:00
Field G. Van Zee
e65c476284 Minor updates to packm_blk_var2.c and _blk_var3.c.
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
  instead of setm(), scal2m().
2013-11-19 10:05:35 -06:00
Field G. Van Zee
9e1d0d4bca Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
  These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
  that already existed in kernels/x86_64/core2-sse3/3.
2013-11-18 18:11:07 -06:00
Field G. Van Zee
85e7e02ea3 Merge branch 'master'. Forgot to git-pull. 2013-11-18 12:02:00 -06:00
Field G. Van Zee
67761e224c Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
  bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
  are causing problems for xlc only in those two files and no other
  macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
  kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
2013-11-18 11:57:40 -06:00
Field G. Van Zee
707200541d Syntax error fix in x86_64/core2 gemmtrsm_u ukr. 2013-11-18 11:17:31 -06:00
Field G. Van Zee
bbe2b84a49 Updated Makefile in test, testsuite.
Details:
- Updated Makefiles in test and testsuite directories to use the new
  BLIS header installation directory scheme, which is to compile with
  -I<PREFIX>/include/blis instead of -I<PREFIX>/include.
2013-11-18 11:11:06 -06:00
Field G. Van Zee
9bd7fcfd43 Outer-to-inner 'restrict' fix in macro-kernels.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
  macro-kernels. Previously, all restricted pointers were being declared
  at the outer-most function scope level. While this violates the C99
  standard, very few of the compilers used with BLIS so far have seemed
  to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
  for identifying this bug (and suggesting the fix).
2013-11-18 10:58:09 -06:00
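Declaring restricted pointers where they are initialized puts each derived pointer in an inner block. C99 (6.7.3.1) gives defined behavior to initializing a restricted pointer from one declared in an enclosing block, but not to assigning between restricted pointers declared in the same block. A hypothetical sketch of the corrected placement, using a simple per-panel scaling loop in place of a macro-kernel:

```c
#include <stddef.h>

/* Scales panel j of c (each panel_len long) by alphas[j]. c_cast is
   restrict-qualified at function scope; c1, derived from it, is
   declared restrict only inside the loop body, a nested block. Declaring
   c1 next to c_cast and assigning c1 = c_cast + ... in that same scope
   is the pattern the entry describes fixing. */
static void scale_panels(size_t n_panels, size_t panel_len,
                         const double* alphas, double* c)
{
    double* restrict c_cast = c;
    for (size_t j = 0; j < n_panels; ++j) {
        double* restrict c1 = c_cast + j * panel_len;   /* inner scope */
        for (size_t i = 0; i < panel_len; ++i)
            c1[i] *= alphas[j];
    }
}
```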
Field G. Van Zee
50549a6a31 Changed header install directory to include/blis.
Details:
- Changed top-level Makefile so that headers are installed to
  $(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
  named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
  library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
2013-11-17 18:31:27 -06:00
Field G. Van Zee
d70733abdd Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
  Thanks to Francisco Igual for contributing these kernels and
  configurations.
2013-11-16 17:34:25 -06:00
Field G. Van Zee
d37c2cff62 Minor comment and Makefile changes.
Details:
- Added missing 'check-config' and 'check-make-defs' targets to
  testsuite/Makefile.
- Removed unused 'test' target from top-level Makefile.
- Comment changes to testsuite input files.
2013-11-13 10:47:11 -06:00
Field G. Van Zee
19885f893a Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use the BLIS copyright
  header instead of the libflame one.
2013-11-11 12:09:21 -06:00
Field G. Van Zee
1a4d698f42 CHANGELOG update (for 0.1.0). 2013-11-11 10:15:40 -06:00