Commit Graph

253 Commits

Author SHA1 Message Date
Field G. Van Zee
fc04b5eb69 Merge pull request #3 from figual/master
New ARM armv7a kernels and Assembly file consideration in Makefile
2014-02-21 09:04:13 -06:00
Francisco Igual
d1813c9dee Added new armv7a micro-kernels and configuration files from Werner Saar. 2014-02-21 15:14:31 +01:00
Francisco Igual
0cd098c03a Modified Makefile to consider .S assembly microkernels. 2014-02-21 15:12:30 +01:00
Field G. Van Zee
b29e1c2b27 Merge pull request #2 from tlrmchlsmth/master
Fixes and improvements to xeon phi implementation.
2014-02-14 14:11:54 -06:00
Tyler Smith
bd3c7ecfb5 Removing changes to input.general and input.operations 2014-02-14 14:05:57 -06:00
Tyler Smith
ce06686368 Fixed more Xeon Phi bugs, especially with scattered update 2014-02-14 13:52:18 -06:00
Tyler Smith
31134b5c70 Some fixes, changes, and improvements to the microkernel for the Xeon Phi 2014-02-14 11:19:44 -06:00
Field G. Van Zee
ee60377e46 Shifted some fields in info_t.
Details:
- Shifted the pack order, pack buffer type, and structure type fields
  to make room for an extra bit in the pack type/status field.
2014-02-13 14:03:31 -06:00
Field G. Van Zee
bd3ab1ad4c Minor fixes to trsm consistent with prev on trmm.
Details:
- Removed use of bli_min() and bli_max() that were only being used to
  try to support situations where the diagonal would intersect the
  short end of some micro-panels, which is a situation that is disallowed
  at a higher level by various constraints on the register and cache
  blocksize. This only affected trsm_ll and trsm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
  it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.
2014-02-13 09:29:55 -06:00
Field G. Van Zee
6260b0b5f8 Fixed obscure bug in trmm_ll, trmm_lu.
Details:
- Fixed an obscure bug in left-hand trmm that would only manifest when
  non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
  are used.
- Removed use of bli_min() and bli_max() that were only being used to
  try to support situations where the diagonal would intersect the
  short end of some micro-panels, which is a situation that is disallowed
  at a higher level by various constraints on the register and cache
  blocksize. This only affected trmm_ll and trmm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
  it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.
2014-02-13 09:19:56 -06:00
Field G. Van Zee
16915c1c1e Fixed an obscure bug in packm_cxk().
Details:
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
  from ldp, which is always equal to PACKMR or PACKNR. The problem with
  this is that the pack ukernels were implicitly assuming that the
  panel dimension of the panel being packed was equal to ldp, which
  is not the case when the register blocksize extensions are non-zero
  (i.e., when PACKMR > MR or PACKNR > NR, whichever is applicable). This
  problem has been fixed by passing ldp into the pack ukernels, which
  now walk through the packed micro-panel region by incrementing by this
  value, rather than incrementing by the inherent panel dimension value
  assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
- Also fixed a very minor edge case inefficiency whereby pack ukernels
  smaller than the default were not being used in edge cases, and instead
  those situations were being handled by scal2m. This is related to the
  issue above, because the pack ukernel itself was being chosen based on
  ldp instead of the panel dimension.
2014-02-11 10:54:19 -06:00
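A minimal sketch of the ldp fix described in 16915c1c1e, assuming a column-stored panel: the packing loop advances through the packed buffer by ldp rather than by the panel dimension. The name pack_cxk_sketch and its signature are hypothetical and are not the BLIS packm ukernel interface. Leaving the padding rows untouched is also an assumption of this sketch.

```c
#include <stddef.h>

/* Hypothetical sketch, not the BLIS packm ukernel interface.
   Copies a panel_dim-by-k panel of a (columns lda apart) into p,
   where consecutive packed columns are ldp apart, ldp >= panel_dim.
   Rows panel_dim..ldp-1 of each packed column are left untouched. */
static void pack_cxk_sketch(size_t panel_dim, size_t k,
                            const double *a, size_t lda,
                            double *p, size_t ldp)
{
    for (size_t j = 0; j < k; ++j) {
        /* Step by ldp, not by panel_dim: the two differ whenever a
           register blocksize extension is in use (e.g. PACKMR > MR). */
        for (size_t i = 0; i < panel_dim; ++i)
            p[j * ldp + i] = a[j * lda + i];
    }
}
```

With panel_dim = 2 and ldp = 4, the second packed column starts at p[4], not p[2], which is the case the original ukernels are described as getting wrong.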
Field G. Van Zee
b7da57b282 Updated calls to packm_blk_var2() in testsuite.
Details:
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
  _var1(). Meant to include this in previous commit.
2014-02-11 10:28:23 -06:00
Field G. Van Zee
c255a293e2 Consolidated packm_blk_var2 and var3.
Details:
- Consolidated the functionality previously supported by packm_blk_var2()
  and packm_blk_var3() into a new variant, packm_blk_var1().
- Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
  to accommodate above changes.
- Removed packm_blk_var3() and retired packm_blk_var2() to
  frame/1m/packm/old.
- Updated all level-3 _cntl_init() functions so that the new, more
  versatile packm_blk_var1 is used for all level-3 matrix packing.
2014-02-10 14:31:24 -06:00
Field G. Van Zee
32d8f264ae Refactored packm variants.
Details:
- Revised packm_blk_var2() and _var3() by encapsulating the general,
  hermitian/symmetric, and triangular panel-packing subproblems into
  separate functions: packm_gen_cxk(), packm_herm_cxk(), and
  packm_tri_cxk(), respectively. Also, homogenized the packm code as
  well as the new specialized packm_*_cxk() code to further improve
  readability.
2014-02-09 10:07:37 -06:00
Field G. Van Zee
6c80670287 Renamed enumerated type in testsuite and modules.
Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
  renamed all corresponding "impl" variables to "iface".
2014-02-07 11:27:15 -06:00
Field G. Van Zee
6c12598b1b Employ simpler INSERT_ macro for ref ukernels.
Details:
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
  argument--the base name of the function--and employed this macro
  in the reference micro-kernel files instead of the _BASIC macro,
  which takes one auxiliary argument. That argument was not being
  used and probably just acted to unnecessarily obfuscate.
2014-02-06 18:26:35 -06:00
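The one-argument instantiation macro described in 6c12598b1b can be illustrated with a small sketch. All names here (INSERT_GENTFUNC_BASIC0_SKETCH, GENTFUNC as defined below, sketch_?sumv) are hypothetical and only loosely modeled on the commit text; the real BLIS macros may take different arguments and cover more datatypes.

```c
/* Hypothetical sketch of a one-argument "insert" macro. It instantiates a
   function template once per datatype, taking only the operation's base
   name, with no auxiliary argument. */
#define INSERT_GENTFUNC_BASIC0_SKETCH(opname) \
    GENTFUNC(float,  s, opname) \
    GENTFUNC(double, d, opname)

/* The template: ch is the datatype character pasted into the name. */
#define GENTFUNC(ctype, ch, opname) \
    static ctype sketch_##ch##opname(const ctype *x, int n) \
    { \
        ctype sum = 0; \
        for (int i = 0; i < n; ++i) sum += x[i]; \
        return sum; \
    }

/* Expands to definitions of sketch_ssumv() and sketch_dsumv(). */
INSERT_GENTFUNC_BASIC0_SKETCH(sumv)
```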
Field G. Van Zee
32cae66326 Fixed some instances of sloppy 'restrict' usage.
Details:
- Fixed some technical incorrectness with some usage of the 'restrict'
  keyword in the reference trsm micro-kernels.
- Tweak to testsuite/Makefile that causes rebuild if libblis was
  touched.
2014-02-06 18:06:42 -06:00
Field G. Van Zee
7aceef7683 Updated comments in macro-kernels.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
  section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
  NR = 4. It's always good for MR != NR in the reference configuration
  since it may help uncover bugs related to non-square micro-kernels.
2014-02-06 17:31:19 -06:00
Field G. Van Zee
8fd292aa78 Pass panel dimensions into macro-kernels.
Details:
- Modified the interfaces to the datatype-specific macro-kernels so that:
  - pd_a and pd_b are passed in (which contain the panel dimensions of
    packed panels of a and b).
  - rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
- Modified implementations of datatype-specific macro-kernels so pd_a,
  pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
  and PACKNR, respectively.
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
  is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
  header file bli_kernel_post_macro_defs.h.
2014-02-06 14:32:21 -06:00
Field G. Van Zee
3404e6657e Deprecated incremental blocksize macro const defs.
Details:
- Removed macro constant definitions related to incremental blocksizes
  from all configurations' bli_kernel.h files. This change is minor and
  is mostly a cleanup related to a previous commit.
2014-02-05 11:19:10 -06:00
Field G. Van Zee
1e9afd39a6 Comment updates (removed vestiges of "bd"). 2014-02-04 20:15:19 -06:00
Field G. Van Zee
5cf58f7c2d Added early returns for "object is zeros" case.
Details:
- Added some logic to packm_init(), pack_int() and gemm_int() so that
  (a) objects marked as BLIS_ZEROS are not packed, and (b) those
  objects are not computed with. This functionality is not currently
  needed by any existing implementations, but may be used in the
  future.
2014-02-04 09:15:19 -06:00
Field G. Van Zee
6bbd4be769 Added 'f' on some gemm and trmm blocked variants.
Details:
- Added 'f' to some block variant files/functions to be consistent with
  the naming convention of other files/functions. Here, the f indicates
  partitioning in the "forward" direction.
2014-02-03 13:15:25 -06:00
Field G. Van Zee
eb13cb2c6b Removed redundant non-gemm blksz_t creation.
Details:
- Removed code that creates duplicate blksz_t objects for herk, trmm,
  and trsm. Instead, the gemm blksz_t objects are accessed via extern
  and used directly. This reduces the amount of code associated with
  each of the three _cntl_init() and _cntl_finalize() functions.
2014-02-03 11:07:01 -06:00
Field G. Van Zee
0a023a7d9e Introduced new level-3 front-end layer.
Details:
- Added new _front() functions for each level-3 operation. This is done
  so that the choosing of the control tree (and *only* the choosing of
  the control tree) happens in what was previously the "front end"
  (e.g. bli_gemm()). That control tree is then passed into the _front()
  function, which then performs up-front tasks such as parameter
  checking.
2014-01-29 14:02:08 -06:00
Field G. Van Zee
251c5d1121 Removed redundant hemm, her2k control trees.
Details:
- Removed code that generated a control tree specifically for hemm and
  symm. Instead, the gemm control tree is now configured so that it
  works for gemm, hemm, or symm.
- Retired most her2k code, as it was not being used. (Currently, her2k is
  implemented as two invocations of herk.) I couldn't think of many
  situations where her2k variants were needed.
- Removed some older her2k code.
2014-01-28 19:40:29 -06:00
Field G. Van Zee
5a36e5bf2f Embed func_t microkernel objects in control trees.
Details:
- Modified all control tree node definitions to include a new field of
  type func_t*, which is similar to a blksz_t except that it contains
  one function pointer (each typed simply as void*) for each datatype.
  We use the func_t* to embed pointers to the micro-kernels to use for
  the leaf-level nodes of each control tree. This change is a natural
  extension of control trees and will allow more flexibility in the
  future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
  from the incoming (previously ignored) control tree node and then pass
  the queried pointer into the datatype-specific macro-kernel code, which
  then casts the pointer to the appropriate type (new typedefs residing
  in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
  kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
  is, determined when the datatype-specific macro-kernel functions are
  instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
  base names if they do not exist already, and then use those to build
  datatype-specific micro-kernel function names. This will allow
  developers extra flexibility if they wanted to, for example, name each
  of their datatype-specific micro-kernels differently (e.g. double
  real might be named bli_dgemm_opt_4x4() while double complex might be
  named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
  and initializes a func_t object for the corresponding micro-kernels.
  The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
  and then reused via extern wherever possible.
2014-01-27 11:13:00 -06:00
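The idea in 5a36e5bf2f (one untyped function pointer per datatype, cast back to its real type inside the datatype-specific macro-kernel) might look roughly like the sketch below. Every name is hypothetical. The commit says the pointers are typed as void*; this sketch uses a generic function-pointer type instead, an assumption made so that the casts stay within what ISO C guarantees.

```c
#include <stddef.h>

/* Hypothetical datatype indices and generic function-pointer slot. */
typedef enum { SK_FLOAT = 0, SK_DOUBLE = 1, SK_NUM_DT = 2 } sk_dt_t;
typedef void (*sk_generic_fp_t)(void);

/* A func_t-like object: one function pointer per datatype. */
typedef struct { sk_generic_fp_t ptr[SK_NUM_DT]; } sk_func_t;

/* Datatype-specific micro-kernel types the macro-kernels cast back to. */
typedef void (*sk_sukr_t)(size_t k, const float *a, const float *b, float *c);
typedef void (*sk_dukr_t)(size_t k, const double *a, const double *b, double *c);

/* Stand-in "micro-kernels": c += sum over i of a[i]*b[i]. */
static void sk_sukr_ref(size_t k, const float *a, const float *b, float *c)
{ for (size_t i = 0; i < k; ++i) *c += a[i] * b[i]; }

static void sk_dukr_ref(size_t k, const double *a, const double *b, double *c)
{ for (size_t i = 0; i < k; ++i) *c += a[i] * b[i]; }

/* Double-precision "macro-kernel": queries the pointer from the func_t-like
   object at run time instead of hard-coding the micro-kernel name. */
static void sk_macro_kernel_d(const sk_func_t *f, size_t k,
                              const double *a, const double *b, double *c)
{
    sk_dukr_t ukr = (sk_dukr_t)f->ptr[SK_DOUBLE];
    ukr(k, a, b, c);
}
```

Swapping in a different micro-kernel then only requires storing a different pointer in the object, with no change to the macro-kernel itself.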
Field G. Van Zee
6cbd6f1c7f Removed commented mixed domain macro-kernel code.
Details:
- Removed commented-out code from macro-kernels that was supposed to
  facilitate implementing mixed domain (complex times real) matrix
  multiplication. This functionality is still (probably) possible,
  but I'm getting tired of looking at the code every time I edit
  a macro-kernel. Plus, there are probably ways of doing it at a
  higher level, via control trees.
2014-01-24 10:38:29 -06:00
Field G. Van Zee
29778be111 Removed b_aux field from cntl nodes.
Details:
- Removed b_aux field from all control tree node definitions. This field
  was being used in certain optimizations (incremental blocking) that were
  not actually being employed within BLIS, and are probably not employed
  by others.
- Updated all _cntl_obj_create() function definitions and invocations
  according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
  incremental blocking, but which was never called by BLIS itself.
2014-01-22 16:03:11 -06:00
Field G. Van Zee
06ac727a42 Updated some comments in level-3 front ends. 2014-01-15 16:44:52 -06:00
Field G. Van Zee
d628bf1da1 Consolidated pack_t enums; retired VECTOR value.
Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
  its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
  makes room in the three pack_t bits of the info field of obj_t so that
  two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
  (Replaced "is contiguous" with more accurate "has unit stride".)
2014-01-15 11:40:12 -06:00
Field G. Van Zee
ddc8c1c379 Suppress warning in Makefile (UNINSTALL_LIBS).
Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
  would be uninstalled upon executing "make uninstall-old". Before, if the
  Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
  or directory" message was emitted. This message was harmless, but is now
  suppressed in this situation.
2014-01-13 14:55:43 -06:00
Field G. Van Zee
f8f67d7251 Typecast bli_getopt() return value in testsuite.
Details:
- In the test suite driver, inserted an explicit typecast of the return
  value of bli_getopt() prior to parsing. The lack of typecast caused a
  problem on at least one system whereby a return value of -1 was
  interpreted as a garbage character. Thanks to Francisco Igual for finding
  and submitting this fix.
2014-01-10 09:06:11 -06:00
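The commit f8f67d7251 does not say which system was affected or why. One plausible cause, offered here purely as an assumption, is a platform where plain char is unsigned: a -1 stored in such a variable becomes 255 and no longer compares equal to -1. The helper names below are hypothetical and model that effect with unsigned char so the behavior is the same on every platform.

```c
/* Models storing a getopt-style return value in an unsigned character
   type. Returns 1 if the stored value still compares equal to -1. */
static int sk_end_marker_seen_uchar(int ret)
{
    unsigned char c = (unsigned char)ret;  /* -1 becomes 255 */
    return c == -1;                        /* 255 != -1 after promotion */
}

/* Same check with the value kept as int, as getopt-style APIs expect. */
static int sk_end_marker_seen_int(int ret)
{
    int c = ret;
    return c == -1;
}
```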
Field G. Van Zee
e7f154fe2e Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
  double precision real gemm microkernel in kernels/arm/neon/3.
2014-01-10 08:48:07 -06:00
Field G. Van Zee
89c76a8a51 Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
  a user can build a BLIS library outside of the top-level directory of
  the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
  which will compile, link, and run the testsuite binary. This works even
  if the build directory is externally located, thanks to the test suite
  binary's new -g and -o command-line options. Also, when creating the
  test suite via the top-level Makefile, the linking is against the
  local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
  locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
2014-01-09 12:08:37 -06:00
Field G. Van Zee
12fa82ec12 Implemented bli_getopt().
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
  a custom version of getopt(), which may be used to parse command line
  options passed into a program via argc/argv. I am implementing this
  function myself, as opposed to using the version available via unistd.h,
  for portability reasons, as the only requirement is string.h (which
  is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
  path) to the parameters and operations input files: -g may be used to
  specify the general input file and -o to specify the operations input
  file. If -g or -o or both are not given, default filenames are assumed
  (as well as their existence in the current directory).
2014-01-08 16:09:26 -06:00
Field G. Van Zee
cafb58e86e Updated template micro-kernels to use auxinfo_t.
Details:
- Updated template micro-kernel implementations (located in
  config/template/kernels), to adhere to the new auxinfo_t interface.
  Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
  and the BLAS compatibility layer).
2014-01-06 13:28:36 -06:00
Field G. Van Zee
9ab126b499 Removed error checks in netlib->BLIS param mapping
Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
  If the char value input to these functions was not one of the defined
  values, bli_check_error_code() with the appropriate error code value
  would be called, resulting in an abort(). This was unnecessary and
  redundant since these routines are currently only used within the
  BLAS compatibility layer, and they are only called AFTER parameter
  checking has already been performed on the original BLAS char values.
  If the application tried to override xerbla() to prevent an abort()
  from being called, this error checking would still get in the way.
  Thus, instead of reporting the error situation to the framework (ie:
  calling abort()), an arbitrary BLIS parameter value is now chosen and
  the function returns normally. Thanks to Jeff Hammond for finding and
  reporting this issue.
2014-01-06 12:13:26 -06:00
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
290fa54e00 Store variable panel strides in trmm/trsm auxinfo.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
  and trsm macro-kernels. Whereas before we stored whatever value was
  provided to the macro-kernel implementation via ps_a/ps_b, now we
  store the stride that will advance to the next variable-length
  micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
2013-12-20 14:10:26 -06:00
Field G. Van Zee
e3a6c7e776 Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
  the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
  invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
2013-12-19 16:29:31 -06:00
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
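A rough sketch of the interface change in a0331fb10a: the a_next and b_next arguments move into a struct that is passed by pointer and read through accessor macros. The struct layout, macro names, and kernel signature below are hypothetical; the actual definitions live in BLIS headers such as bli_auxinfo_macro_defs.h and may differ.

```c
#include <stddef.h>

/* Hypothetical auxinfo_t-like struct holding the next micro-panels. */
typedef struct {
    const void *a_next;
    const void *b_next;
} sk_auxinfo_t;

/* Hypothetical accessor macros. */
#define sk_auxinfo_set_next_a(aux, p) ((aux)->a_next = (p))
#define sk_auxinfo_set_next_b(aux, p) ((aux)->b_next = (p))
#define sk_auxinfo_next_a(aux)        ((aux)->a_next)
#define sk_auxinfo_next_b(aux)        ((aux)->b_next)

/* Stand-in micro-kernel: one trailing pointer replaces the two former
   a_next/b_next arguments. A real kernel might prefetch from them. */
static double sk_dot_ukr(size_t k, const double *a, const double *b,
                         const sk_auxinfo_t *aux)
{
    const double *a_next = (const double *)sk_auxinfo_next_a(aux);
    const double *b_next = (const double *)sk_auxinfo_next_b(aux);
    (void)a_next;  /* unused in this sketch */
    (void)b_next;

    double sum = 0.0;
    for (size_t i = 0; i < k; ++i)
        sum += a[i] * b[i];
    return sum;
}
```

Adding a further field (such as a micro-panel stride) to the struct would not change the kernel signature, which appears to be the motivation the commit gives.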
Field G. Van Zee
392428dea4 Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
  imaginary components separately, named like the previous set except
  with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
  "whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
2013-12-12 19:01:47 -06:00
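The layering described in 392428dea4 (a "whole-argument" scalar macro defined in terms of one that takes real and imaginary parts separately) can be sketched as follows. The type and macro names are hypothetical and follow only the suffix convention mentioned in the commit.

```c
/* Hypothetical double-precision complex type. */
typedef struct { double real; double imag; } sk_dcomplex;

/* "ris" form: y := y + a * x, with every operand given as separate
   real and imaginary components. */
#define sk_axpyris(ar, ai, xr, xi, yr, yi) \
    do { \
        double sk_tr = (ar) * (xr) - (ai) * (xi); \
        double sk_ti = (ar) * (xi) + (ai) * (xr); \
        (yr) += sk_tr; \
        (yi) += sk_ti; \
    } while (0)

/* "s" form: takes whole complex values and forwards their components. */
#define sk_axpys(a, x, y) \
    sk_axpyris((a).real, (a).imag, (x).real, (x).imag, (y).real, (y).imag)
```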
Field G. Van Zee
f60c8adc2f Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
2013-12-10 14:39:56 -06:00
Field G. Van Zee
4ef2015049 Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
  double-precision real).
2013-12-09 18:53:03 -06:00
Field G. Van Zee
5ad2ce7bf5 Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
  for x86_64/core2 were calling the wrong reference code (l instead
  of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
  kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
2013-12-09 18:30:49 -06:00
Field G. Van Zee
d289f5d3a9 Whitespace changes to level-2 blocked variants.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
  in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
2013-12-05 10:56:13 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments; however, level-3 internal back-ends still
  do. This is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
  from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creating of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
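The attached-scalar scheme in b444489f10 can be illustrated with a 1x1 toy, assuming real double-precision data only. The struct and function names are hypothetical and greatly simplified relative to obj_t: each object carries a scalar that defaults to 1.0, alpha and beta are applied to aliases, and the "macro-kernel" forms alpha from the scalars of A and B and takes beta from C.

```c
/* Hypothetical object: a 1x1 buffer plus an attached scalar. */
typedef struct {
    double *buf;
    double  scalar;
} sk_obj_t;

/* New objects start with an attached scalar of 1.0. */
static sk_obj_t sk_obj_create(double *buf)
{
    sk_obj_t o;
    o.buf = buf;
    o.scalar = 1.0;
    return o;
}

/* Scales the attached scalar; the underlying buffer is not modified. */
static void sk_obj_scalar_apply(sk_obj_t *o, double s)
{
    o->scalar *= s;
}

/* 1x1 stand-in for a macro-kernel: C := beta * C + alpha * A * B, where
   alpha = scalar(A) * scalar(B) and beta = scalar(C). */
static void sk_gemm_1x1(const sk_obj_t *a, const sk_obj_t *b,
                        const sk_obj_t *c)
{
    double alpha = a->scalar * b->scalar;
    double beta  = c->scalar;
    *c->buf = beta * (*c->buf) + alpha * (*a->buf) * (*b->buf);
}
```

Applying alpha to an alias (a struct copy) rather than to the original object leaves the caller's attached scalar unchanged, which mirrors the aliasing the commit describes for the internal back-ends.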
Field G. Van Zee
992de486d6 Unimplemented kernels now call reference.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
  datatypes call the corresponding reference kernel. Previously, these
  kernel functions called abort() with a "not yet implemented" error
  message.
2013-12-02 13:58:46 -06:00
Field G. Van Zee
fd4ac636d9 Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
  unimplemented kernel functions simply call the corresponding reference
  implementation. (Previously, these unimplemented functions would
  abort() with a "not yet implemented" message.)
2013-12-02 13:50:36 -06:00