Commit Graph

1293 Commits

Author SHA1 Message Date
Devin Matthews
dafca7a0c2 Fix botched memory addressing in Penryn kernel (no effect for GAS output). 2018-06-25 16:20:10 -05:00
Devin Matthews
de493b0f34 Merge pull request #226 from devinamatthews/dev
Finish macroization of assembly ukernels.
2018-06-25 14:26:06 -05:00
Field G. Van Zee
195480beb5 Merge branch 'master' into dev 2018-06-25 13:24:21 -05:00
Field G. Van Zee
3f387ca35e Fixed bugs in configure's select_cc() function.
Details:
- This commit fixes several bugs in configure relating to selecting a C
  compiler. By dumb luck, two of the bugs sort of cancelled each
  other out in most use cases, which manifested as the expected behavior.
  Thanks to Mathieu Poumeyrol for bringing this issue to our attention,
  and to Devin Matthews for suggesting the more portable way of
  capturing both stdout and stderr and suggesting a return code check
  instead of testing stdout/stderr.
- The first bug: As the values of the compiler search list are iterated
  over, only stderr is captured when querying a compiler with --version
  rather than both stdout and stderr.
- The second bug: After each query, a conditional attempted to test
  whether the query resulted in anything being output. That conditional
  erroneously was using "-z" instead of "-n" for non-emptiness. Thus,
  most of the time, stderr was empty (because the --version info was
  being output on stdout), and since it was empty, the -z conditional
  (intended to execute only when a compiler was found to be responsive)
  executed.
- A third bug was also fixed in the way that the merged stdout/stderr
  output was tested for non-emptiness (moving the 'cat' invocation to
  another line and testing the contents of a variable instead).
- The three bugs above have been fixed as part of a partial rewrite of
  the select_cc() function in terms of a return code check, which
  obviated the need to save the output of stdout and stderr.
- The fourth bug involved a misnamed variable in the right-hand side
  of a statement intended to prepend CC to search_list when CC was
  non-empty. This typically did not manifest as a bug since usually CC
  (if it was set) was set to a value that was known to work.
2018-06-25 12:32:03 -05:00
Devin Matthews
a7166feb10 Finish macroization of assembly ukernels. 2018-06-25 12:09:18 -05:00
Field G. Van Zee
f986396c2a Added 'configure --help' text for CFLAGS, LDFLAGS.
Details:
- Added mention of the new support for preset CFLAGS, LDFLAGS to the
  bottom of the text output by './configure --help'.
- Updated usage example to use 'haswell' instead of 'sandybridge'.
2018-06-22 18:12:40 -05:00
Field G. Van Zee
884175d9ff Added configure support for preset CFLAGS, LDFLAGS.
Details:
- Any preexisting values set to the CFLAGS environment variable (or the
  CFLAGS variable if given on the command line) are saved by configure
  for later inclusion (prepending, to be precise) along with the
  compiler flags automatically determined by the BLIS build system.
  (LDFLAGS is treated in a similar manner.) Thanks to Dave Love for
  requesting this feature in issue #223 and Mathieu Poumeyrol for his
  support on this and a previous related issue.
- Comment updates to build/config.mk.in.
- Strip whitespace from return value of various cflags functions in
  common.mk.
2018-06-22 18:08:43 -05:00
Field G. Van Zee
07c3d0a951 Update to CREDITS file. 2018-06-21 12:35:07 -05:00
Devin Matthews
a1ebbbf158 Merge pull request #224 from devinamatthews/asm-macros
Asm macros
2018-06-20 15:37:53 -05:00
Devin Matthews
c81c6f23b9 Fix problem with inc and dec macros. 2018-06-20 15:20:44 -05:00
Devin Matthews
5a63971c82 Merge remote-tracking branch 'upstream/dev' into asm-macros 2018-06-20 14:07:49 -05:00
Devin Matthews
b4d94e54d4 Convert x86 microkernels to assembly macros. 2018-06-20 14:07:24 -05:00
Field G. Van Zee
17928b1c99 Added static funcs bli_dt_domain(), bli_dt_prec().
Details:
- Added definitions of static functions bli_dt_domain()/bli_dt_prec(),
  which extract a dom_t domain or prec_t precision value, respectively,
  from a num_t datatype.
- Changed the return types of bli_obj_domain() and bli_obj_prec() from
  objbits_t to dom_t and prec_t. (Not sure why they were ever set to
  return objbits_t.)
2018-06-19 17:59:03 -05:00
Field G. Van Zee
5f7fbb7115 Static funcs for projecting dt to single/double.
Details:
- Added static functions for projecting a datatype to single precision
  or double precision, both for obj_t's storage datatypes and standalone
  datatypes.
2018-06-19 15:38:55 -05:00
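The datatype helpers described in the two entries above (extracting a domain or precision from a datatype, and projecting a datatype to single or double precision) can be sketched with simple bit masks. This is a minimal illustration only: the type names, enumerator values, and the bit layout (bit 0 for domain, bit 1 for precision) are assumptions made for the example, not the definitions in bli_type_defs.h.

```c
#include <assert.h>

/* Hypothetical bit layout, for illustration only:
   bit 0 = domain, bit 1 = precision. */
enum { DOM_BIT = 0x1, PREC_BIT = 0x2 };

typedef enum { DOM_REAL = 0x0, DOM_COMPLEX = 0x1 } dom_sketch_t;
typedef enum { PREC_SINGLE = 0x0, PREC_DOUBLE = 0x2 } prec_sketch_t;
typedef enum {
    DT_FLOAT = 0, DT_SCOMPLEX = 1, DT_DOUBLE = 2, DT_DCOMPLEX = 3
} num_sketch_t;

/* Extract the domain (real or complex) from a datatype. */
static dom_sketch_t dt_domain(num_sketch_t dt)
{
    assert((unsigned)dt <= 3u);
    return (dom_sketch_t)(dt & DOM_BIT);
}

/* Extract the precision (single or double) from a datatype. */
static prec_sketch_t dt_prec(num_sketch_t dt)
{
    assert((unsigned)dt <= 3u);
    return (prec_sketch_t)(dt & PREC_BIT);
}

/* Project to single precision: clear the precision bit, keep the domain. */
static num_sketch_t dt_proj_to_single(num_sketch_t dt)
{
    return (num_sketch_t)(dt & ~PREC_BIT);
}

/* Project to double precision: set the precision bit, keep the domain. */
static num_sketch_t dt_proj_to_double(num_sketch_t dt)
{
    return (num_sketch_t)(dt | PREC_BIT);
}
```

Returning dedicated domain and precision types, rather than a generic bitfield type, lets the compiler flag accidental mixing of the two kinds of values.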
Field G. Van Zee
d4a22702c7 Set up haswell config for optional col-pref ukrs.
Details:
- Added two presently-disabled cpp blocks in bli_cntx_init_haswell.c to
  easily allow one to switch to a set of column-preferential gemm
  microkernels (in the haswell subconfiguration). The second column-
  preferring block sets the register blocksizes to their appropriate
  values. However, cache blocksizes are left unchanged, and therefore are
  likely suboptimal. This should be addressed later.
2018-06-19 14:54:57 -05:00
Field G. Van Zee
f317c2e31b Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the target
  domain and target precision bits in the obj_t, and also added the
  appropriate support in bli_type_defs.h.
2018-06-19 12:21:23 -05:00
Field G. Van Zee
e88a5b8da8 Implemented castm, castv operations.
Details:
- Implemented castm and castv operations, which behave like copym and
  copyv except where the obj_t operands can be of different datatypes.
  These new operations, however, unlike copym/copyv, do not build upon
  existing level-1v kernels.
- Reorganized projm, projv into a 'proj' subdirectory of frame/base (to
  match the newly added frame/base/cast directory).
- Added new macros to bli_gentfunc_macro_defs.h, _gentprot_macro_defs.h
  that insert GENTFUNC2/GENTPROT2 macros for all non-homogeneous datatype
  combinations. Previously, one had to invoke two additional macros--one
  which mixed domains only and another that included all remaining
  cases--in order to get full type combination coverage.
- Defined a new static function, bli_set_dims_incs_2m(), to aid in the
  setting of various variables in the implementations of bli_??castm().
  This static function joins others like it in bli_param_macro_defs.h.
- Comment update to bli_copysc.h.
2018-06-18 15:56:26 -05:00
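The idea behind a cast operation, a copy whose source and destination may have different datatypes, can be sketched on raw strided vectors. The function names and signatures below are hypothetical and cover only the real single/double cases; the actual castm and castv operate on obj_t operands.

```c
#include <assert.h>
#include <stddef.h>

/* Copy n single-precision elements of x into double-precision y,
   honoring the element strides incx and incy. */
static void castv_s_to_d(size_t n, const float *x, size_t incx,
                         double *y, size_t incy)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = (double)x[i * incx];
}

/* Copy n double-precision elements of x into single-precision y.
   Values are rounded to the nearest representable float. */
static void castv_d_to_s(size_t n, const double *x, size_t incx,
                         float *y, size_t incy)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = (float)x[i * incx];
}
```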
Field G. Van Zee
2000cdff59 Update to CREDITS file. 2018-06-18 14:17:28 -05:00
Field G. Van Zee
ed2c8aed84 Temporarily disabled small matrix handling on zen.
Details:
- Disabled small matrix handling in config/zen/bli_family_zen.h due to
  what appears to be a bug that manifests as failures in the single and
  double precision real level-3 BLAS test drivers (visible via
  out.sblat3 and out.dblat3). Thanks to Robin Christ for reporting this
  issue.
2018-06-18 11:49:34 -05:00
Field G. Van Zee
ed20392c50 Added get/set static funcs for exec dt/dom/prec.
Details:
- Added functions to bli_obj_macro_defs.h to get and set the execution
  domain and execution precision bits in the obj_t.
- Added/rearranged a few functions in bli_obj_macro_defs.h.
- Renamed some macros in bli_type_defs.h: EXECUTION -> EXEC.
2018-06-15 16:31:22 -05:00
Field G. Van Zee
22594e8e9a Updated sandbox/ref99 according to f97a86f.
Details:
- Applied changes to ref99 sandbox analogous to those applied to
  framework code in f97a86f. This involves setting the pack schemas of
  A and B objects temporarily to communicate those desired schemas to
  the control tree creation function in blx_gemm_cntl.c. This allows us
  to (henceforth) query the schemas from the control tree rather than
  the context.
2018-06-14 17:35:23 -05:00
Field G. Van Zee
1b5d0424d2 Prototype column-preferential zen gemm ukernels.
Details:
- Added prototypes to bli_kernels_zen.h for each of the four gemm
  microkernels that prefer outputting to column storage.
2018-06-13 18:41:32 -05:00
Field G. Van Zee
f88c2e7a53 Defined static function bli_blksz_scale_def_max().
Details:
- Added a new static function to bli_blksz.h that scales both the default
  (regular) blocksize as well as the maximum blocksize in the blksz_t
  object. Reminder: maximum blocksizes have different meanings in
  different contexts. For register blocksizes, they refer to the packing
  register blocksizes (PACKMR or PACKNR) while for cache blocksizes, they
  refer to the maximum blocksize to use during the final iteration of a
  loop.
2018-06-13 18:27:46 -05:00
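Scaling both blocksize values together, as described in the entry above, might look like the following sketch. The struct and function here are simplified stand-ins (a single pair of values rather than one pair per datatype) and are not the blksz_t API from bli_blksz.h.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a blocksize object: one default value and
   one maximum value. */
typedef struct {
    long def;  /* default (regular) blocksize */
    long max;  /* maximum blocksize */
} blksz_sketch_t;

/* Scale the default and the maximum blocksize by the ratio num/den,
   using integer division. */
static void blksz_scale_def_max(long num, long den, blksz_sketch_t *b)
{
    assert(b != NULL && den != 0);
    b->def = (b->def * num) / den;
    b->max = (b->max * num) / den;
}
```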
Field G. Van Zee
87db5c048e Changed usage of virtual microkernel slots in cntx.
Details:
- Changed the way virtual microkernels are handled in the context.
  Previously, there were query routines such as bli_cntx_get_l3_ukr_dt()
  which returned the native ukernel for a datatype if the method was
  equal to BLIS_NAT, or the virtual ukernel for that datatype if the
  method was some other value. Going forward, the context native and
  virtual ukernel slots will both be initialized to native ukernel
  function pointers for native execution, and for non-native execution
  the virtual ukernel pointer will be something else. This allows us
  to always query the virtual ukernel slot (from within, say, the
  macrokernel) without needing any logic in the query routine to decide
  which function pointer (native or virtual) to return. (Essentially,
  the logic has been shifted to init-time instead of compute-time.)
  This scheme will also allow generalized virtual ukernels as a way
  to insert extra logic in between the macrokernel and the native
  microkernel.
- Initialize native contexts (in bli_cntx_ref.c) with native ukernel
  function addresses stored to the virtual ukernel slots pursuant to
  the above policy change.
- Renamed all static functions that were native/virtual-ambiguous, such
  as bli_cntx_get_l3_ukr_dt() or bli_cntx_l3_ukr_prefers_cols_dt()
  pursuant to the above policy change. Those routines now use the
  substring "get_l3_vir_ukr" in their name instead of "get_l3_ukr". All
  of these functions were static functions defined in bli_cntx.h, and
  most uses were in level-3 front-ends and macrokernels.
- Deprecated anti_pref bool_t in context, along with related functions
  such as bli_cntx_l3_ukr_eff_dislikes_storage_of(), now that 1m's
  panel-block execution is disabled.
2018-06-12 19:38:37 -05:00
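The shift from compute-time to init-time logic described in the entry above can be sketched as follows. All names here (the context struct, the method enum, the placeholder kernels) are hypothetical; the sketch shows only the pattern of filling the virtual slot once so that later queries need no branch.

```c
#include <assert.h>
#include <stddef.h>

typedef void (*ukr_fn)(void);  /* placeholder microkernel signature */

typedef enum { METHOD_NATIVE, METHOD_INDUCED } method_sketch_t;

typedef struct {
    method_sketch_t method;
    ukr_fn nat_ukr;  /* native microkernel slot */
    ukr_fn vir_ukr;  /* virtual microkernel slot */
} cntx_sketch_t;

static void native_ukr(void)  { /* would compute with the native kernel */ }
static void induced_ukr(void) { /* would wrap the native kernel */ }

/* Old scheme: the query routine branches on the method at every call. */
static ukr_fn get_ukr_old(const cntx_sketch_t *c)
{
    return (c->method == METHOD_NATIVE) ? c->nat_ukr : c->vir_ukr;
}

/* New scheme: decide once at init time. For native execution the virtual
   slot simply holds the native kernel's address. */
static void cntx_init(cntx_sketch_t *c, method_sketch_t m)
{
    assert(c != NULL);
    c->method  = m;
    c->nat_ukr = native_ukr;
    c->vir_ukr = (m == METHOD_NATIVE) ? native_ukr : induced_ukr;
}

/* With the slot filled at init time, the query is unconditional. */
static ukr_fn get_vir_ukr(const cntx_sketch_t *c)
{
    return c->vir_ukr;
}
```

Because callers such as a macrokernel always read the virtual slot, extra logic can later be inserted between the macrokernel and the native microkernel by changing only what is stored at initialization.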
Field G. Van Zee
dbaf440540 Merge branch 'master' into dev 2018-06-11 12:37:04 -05:00
Field G. Van Zee
2610fff0b0 Renamed 1m packm kernels from _1e to _1er.
Details:
- Renamed the reference packm kernels used by 1m. Previously, they used
  a _1e suffix, which was confusing since they packed to both 1e and 1r
  schemas. This was likely an artifact of the time when there were
  separate kernels for each schema before I decided to combine them into
  a single function (per datatype and panel dimension), and the 1e
  functions were the ones to inherit the 1r functionality. The kernels
  have now been renamed to use a _1er suffix.
2018-06-11 12:32:54 -05:00
Field G. Van Zee
712de9b371 Added missing semicolon in 03obj_view.c
Details:
- Thanks to Tony Skjellum for pointing out this typo due to a
  last-minute change to the source prior to committing.
2018-06-09 14:36:30 -05:00
Field G. Van Zee
043d0cd37e Implemented bli_acquire_mpart(), added example code.
Details:
- Implemented bli_acquire_mpart(), a general-purpose submatrix view
  function that will alias an obj_t to be a submatrix "view" of an
  existing obj_t.
- Renumbered examples in examples/oapi and inserted a new example file,
  03obj_view.c, which shows how to use bli_acquire_mpart() to obtain
  submatrix views of existing objects, which can then be used to
  indirectly modify the parent object.
2018-06-09 13:46:49 -05:00
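A submatrix "view" of the kind described in the entry above can be illustrated with a small strided-matrix struct. This is a sketch under assumed names and a simplified representation, not the bli_acquire_mpart() signature or the obj_t layout: the view shares the parent's buffer and strides, so writes through the view modify the parent.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided matrix descriptor. */
typedef struct {
    double *buf;    /* address of element (0,0) of this view */
    size_t  m, n;   /* number of rows and columns */
    size_t  rs, cs; /* row stride and column stride, in elements */
} mat_view_t;

/* Return an m x n view of parent a whose top-left element is a(i,j).
   No data is copied; only the descriptor changes. */
static mat_view_t acquire_view(const mat_view_t *a,
                               size_t i, size_t j, size_t m, size_t n)
{
    assert(a != NULL && i + m <= a->m && j + n <= a->n);
    mat_view_t v = *a;
    v.buf = a->buf + i * a->rs + j * a->cs;
    v.m   = m;
    v.n   = n;
    return v;
}

/* Address of element (i,j) within a view. */
static double *view_at(const mat_view_t *v, size_t i, size_t j)
{
    assert(i < v->m && j < v->n);
    return v->buf + i * v->rs + j * v->cs;
}
```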
Field G. Van Zee
f1908d3976 Fixed broken input.operations.fast.
Details:
- Removed three input lines from input.operations.fast (labeled
  "test sequential micro-kernel") that I intended to remove in bd02c4e.
  These lines prevented 'make check' (and 'make checkblis-fast') from
  completing correctly. Note: This bug was fixed in 3df39b3, but that
  commit has not yet been merged into master, hence this redundant
  commit. Thanks to Robert van de Geijn for reporting this issue.
2018-06-08 14:22:22 -05:00
Field G. Van Zee
262a62e348 Fixed undefined ref in steamroller/excavator configs.
Details:
- Fixed erroneous calls to bli_cntx_init_piledriver_ref() in
  bli_cntx_init_steamroller() and bli_cntx_init_excavator(), which
  should have been to their respectively-named bli_cntx_init_*()
  functions instead. Thanks to qnerd for bringing these bugs to our
  attention.
2018-06-08 12:10:54 -05:00
Field G. Van Zee
22aa44ebec Merge branch 'dev' of github.com:flame/blis into dev 2018-06-07 17:42:59 -05:00
Field G. Van Zee
65fae95074 Implemented bli_setrm, _setim, _setrv, _setiv.
Details:
- Defined new wrappers to setm/setv operations in frame/base/bli_setri.c
  that will target only the real or only the imaginary parts of a
  matrix/vector object.
- Updated bli_obj_real_part() so that the complex-specific portions of
  the function are not executed if the object is real.
- Defined bli_obj_imag_part().
  - Caveat: If bli_obj_imag_part() is called on a real object, it does
    nothing, leaving the destination object untouched. The caller must
    take care to only call the function on complex objects.
- Reordered some of the static functions in bli_obj_macro_defs.h related
  to aliasing.
2018-06-07 17:41:09 -05:00
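One way to picture setting only the real or only the imaginary parts is to treat a complex vector as interleaved real values and build a stride-2 view of either half. The sketch below assumes that interleaved layout and uses hypothetical names throughout; it does not reproduce the obj_t aliasing done by bli_obj_real_part() or bli_obj_imag_part().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical view of real values with a stride. */
typedef struct {
    double *buf;
    size_t  n;    /* number of elements */
    size_t  inc;  /* stride between elements */
} rvec_view_t;

/* z holds n complex values stored as (re, im) pairs.
   The real parts live at even offsets. */
static rvec_view_t real_part(double *z, size_t n)
{
    rvec_view_t v = { z, n, 2 };
    return v;
}

/* The imaginary parts live at odd offsets. */
static rvec_view_t imag_part(double *z, size_t n)
{
    rvec_view_t v = { z + 1, n, 2 };
    return v;
}

/* Set every element of the view to alpha. */
static void setv(double alpha, rvec_view_t v)
{
    assert(v.n == 0 || v.buf != NULL);
    for (size_t i = 0; i < v.n; ++i)
        v.buf[i * v.inc] = alpha;
}
```

For example, `setv(0.0, imag_part(z, n))` clears the imaginary parts of `z` while leaving the real parts untouched.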
Field G. Van Zee
b65d0b841b Fixed bug in bli_dt_proj_to_complex().
Details:
- Fixed a bug identical to the one fixed in 0a4a27e, except this time in
  the bli_obj_param_defs.h header file. It looks like the only consumers
  of this static function were in bli_l0_oapi.c, and so this may not have
  been manifesting (yet).
2018-06-07 14:38:41 -05:00
Field G. Van Zee
55b6abdf74 Enforce consistent datatypes in most object APIs.
Details:
- Added logic to level-1v, -1d, -1f, -1m, -2, and -3 operations' _check()
  functions to ensure that all operands are of the same datatype. There
  are some exceptions that were left out, such as the _check() function
  for the various norm operations since they have a different idea of
  datatype consistency (ie: the norm object must be the real projection
  of the primary input vector/matrix object).
2018-06-07 14:08:12 -05:00
Field G. Van Zee
513138b1a1 Defined/implemented bli_projv().
Details:
- Added an implementation for bli_projv() to go along with the
  implementation of bli_projm() added in 0a4a27e. The only difference
  between the two is that bli_projv() may only be used on vectors,
  whereas bli_projm() is general-purpose.
- Added a _check() function corresponding to bli_projv().
2018-06-07 12:24:47 -05:00
Field G. Van Zee
5f71c1e719 Merge branch 'dev' of github.com:flame/blis into dev 2018-06-06 19:06:14 -05:00
Field G. Van Zee
b5a641e968 Added char-to-dt and dt-to-char mapping functions.
Details:
- Defined additional functions in bli_param_map.c:
    bli_param_map_char_to_blis_dt()
    bli_param_map_blis_to_char_dt()
  which will map a char to its corresponding num_t, or vice versa.
2018-06-06 19:05:37 -05:00
Field G. Van Zee
0a4a27e1a4 Defined/implemented bli_projm().
Details:
- Defined a new operation in frame/base/bli_proj.c, bli_projm(), which
  behaves like bli_copym(), except that operands a and b are allowed to
  contain data of differing domains (e.g. a is real while b is complex,
  or vice versa). The file is named bli_proj.c, rather than bli_projm.c,
  with the intention that a 'v' vector version of the function may be
  added to the same file (at some point in the future).
- Added supporting bli_check_*() functions in bli_check.c to confirm
  consistent precisions between two datatypes/objects, as well as the
  appropriate error message in bli_error.c and a new error code in
  bli_type_defs.h.
- Wrote a bli_projm_check() function to go along with bli_projm().
- Defined static function bli_obj_real_part() in bli_obj_macro_defs.h,
  which will initialize an obj_t alias to the real part of the source
  object.
- Fixed a bug in the static function bli_dt_proj_to_complex(), found
  in bli_param_macro_defs.h. Thankfully, there were no calls to the
  function to produce buggy behavior.
2018-06-06 19:02:29 -05:00
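A copy between operands of differing domains, as described in the entry above, can be sketched on plain arrays. The names are hypothetical, and the choices made here (zeroing the imaginary part when going from real to complex, discarding it when going from complex to real) are assumptions about what a projection would do, not a statement of bli_projm()'s exact behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical double-precision complex element. */
typedef struct { double real, imag; } dcomplex_sketch_t;

/* Real -> complex: copy into the real parts, zero the imaginary parts. */
static void proj_real_to_complex(size_t n, const double *a,
                                 dcomplex_sketch_t *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i) {
        b[i].real = a[i];
        b[i].imag = 0.0;
    }
}

/* Complex -> real: keep the real parts, discard the imaginary parts. */
static void proj_complex_to_real(size_t n, const dcomplex_sketch_t *a,
                                 double *b)
{
    assert(n == 0 || (a != NULL && b != NULL));
    for (size_t i = 0; i < n; ++i)
        b[i] = a[i].real;
}
```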
Field G. Van Zee
3df39b37a0 Fixed recently broken input.operations.fast.
Details:
- Removed "test sequential front-end" lines from microkernel test
  entries of input.operations.fast. This change was meant for inclusion
  in bd02c4e but was missed due to slightly different wording of the
  comment (I used "sed //d" to remove the lines). This fixes the broken
  'make checkblis-fast' (and 'make check') targets.
2018-06-06 15:35:05 -05:00
Field G. Van Zee
3f48c38164 Cosmetic fix to configure output in config.mk.
Details:
- Fixed configure so that MK_ENABLE_MEMKIND is assigned "no" when the
  option is disabled due to libmemkind not being present. This wasn't
  affecting anything since the one use of the variable (in common.mk)
  was formulated as "ifeq ($(MK_ENABLE_MEMKIND),yes)". That is, the
  variable being empty was effectively equivalent to it being set to
  "no".
- Comment updates to build/config.mk.in, common.mk.
2018-06-05 16:52:35 -05:00
Field G. Van Zee
5df201260f Merge branch 'master' into dev 2018-06-05 16:14:19 -05:00
Field G. Van Zee
1b9af85ec9 Updated ref99 call to _cntx_set_thrloop_from_env().
Details:
- Reordered the arguments in the ref99 sandbox's call to
  bli_cntx_set_thrloop_from_env() to be consistent with the updated
  function signature from f97a86f. Thanks to Devangi Parikh for
  reporting this issue.
2018-06-05 16:07:13 -05:00
Tyler Michael Smith
96d2774b4c Make bli_auxinfo_next_b() return b_next, not a_next (#216) 2018-06-05 07:17:39 -05:00
Field G. Van Zee
bd02c4e9f7 Cleanups to testsuite, input.operations format.
Details:
- Removed the line in each operation entry in input.operations titled
  "test sequential front-end" and the corresponding support for the lines
  in the testsuite input parsing code. This line was included in some
  of the earliest versions of the testsuite, back when I intended to
  eventually have separate multithreaded APIs. Specifically, I envisioned
  that multithreaded and sequential testing could be enabled or disabled
  on an operation level. However, BLIS evolved in a different direction
  and still does not have multithreaded-specific APIs (even if it will
  eventually someday). But even if it did have such APIs, I doubt I would
  allow the user to enable/disable them on an operation level. Thus, this
  was a zombie future parameter that was never used and never made sense
  to begin with. The one instance of the front_seq variable, used in the
  various libblis_test_<operation>() functions to guard the call to the
  operation test driver, that remains was commented out instead of
  deleted so that someday it could be easily changed via sed, if desired.
- Various minor cleanups to the testsuite code, including consolidating
  use of DISABLE and DISABLE_ALL and reexpressing certain conditional
  expressions in the libblis_test_<operation>() functions in terms of
  boolean functions.
2018-06-04 13:42:17 -05:00
Field G. Van Zee
2c6d99b99e Fixed names out of alphabetical order in CREDITS. 2018-06-03 18:13:36 -05:00
Field G. Van Zee
7a207e8f2c Disabled indirect blacklisting (issue #214).
Details:
- Return early from function, pass_config_kernel_registries(), that
  implements indirect blacklisting of subconfigurations (during pass 0).
  In short, I realized that indirect blacklisting is not needed in the
  situations I envisioned, and can actually cause problems under certain
  circumstances. Thanks to Tony Skjellum for reporting the issue (#214)
  that led to this commit, and to Devin Matthews for prompting me to
  realize that indirect blacklisting was unnecessary, at least as
  originally envisioned.
2018-06-03 18:04:27 -05:00
Field G. Van Zee
d7fb326820 Fixed syntax artifacts from 4b36e85 in examples.
Details:
- Fixed artifacts of malformed recursive sed expressions used when
  preparing 4b36e85, in which most function-like macros were converted
  to static functions. The syntactically defective code was contained
  entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
  issue.
- Update to CREDITS file.
2018-06-03 13:20:37 -05:00
Field G. Van Zee
ed7dedfd4a Merge branch 'master' into dev 2018-06-02 20:29:53 -05:00
Field G. Van Zee
f97a86f322 Updated setting/querying pack schema (cntx->cntl).
- Query pack schemas in level-3 bli_*_front() functions and store those
  values in the schema bitfields of the corresponding obj_t's when the
  cntx's method is not BLIS_NAT. (When method is BLIS_NAT, the default
  native schemas are stored to the obj_t's.)
- In bli_l3_cntl_create_if(), query the schemas stored to the obj_t's in
  bli_*_front(), clear the schema bitfields, and pass the queried values
  into bli_gemm_cntl_create() and bli_trsm_cntl_create().
- Updated APIs for bli_gemm_cntl_create() and bli_trsm_cntl_create() to
  take schemas for A and B, and use these values to initialize the
  appropriate control tree nodes. (Also cpp-disabled the panel-block cntl
  tree creation variant, bli_gemmpb_cntl_create(), as it has not been
  employed by BLIS in quite some time.)
- Simplified querying of schema in bli_packm_init() thanks to above
  changes.
- Updated openmp and pthreads definitions of bli_l3_thread_decorator()
  so that thread-local aliases of matrix operands are guaranteed, even
  if aliasing is disabled within the internal back-end functions (e.g.
  bli_gemm_int.c). Also added a comment to bli_thrcomm_single.c
  explaining why the extra aliasing is not needed there.
- Change bli_gemm() and level-3 friends so that the operation's ind()
  function is called only if all matrix operands have the same datatype,
  and only if that datatype is complex. The former condition is needed
  in preparation for work related to mixed domain operands, while the
  latter helps with readability, especially for those who don't want to
  venture into frame/ind.
- Reshuffled arguments in bli_cntx_set_thrloop_from_env() to be
  consistent with BLIS calling conventions (modified argument(s) are
  last), and updated all invocations in the level-3 _front() functions.
- Comment updates to bli_cntx_set_thrloop_from_env().
2018-06-02 20:28:20 -05:00
Field G. Van Zee
965db85d29 Updated macro invocations in bli_gemm_ker_var2.c.
Details:
- Updated "get next a/b micropanel" macro invocations in
  bli_gemm_ker_var2.c according to changes in 9588625.
- Comment update in bli_cntx.c.
2018-06-01 12:32:15 -05:00