Commit Graph

5 Commits

Author SHA1 Message Date
Devin Matthews
492a736fab Fix vectorized version of bli_amaxv (#382)
* Fix vectorized version of bli_amaxv

To match Netlib, i?amax should return:
- the lowest index among equal values
- the first NaN if one is encountered

* Fix typos.

* And another one...

* Update ref. amaxv kernel too.

* Re-enabled optimized amaxv kernels.

Details:
- Re-enabled the optimized, intrinsics-based amaxv kernels in the 'zen'
  kernel set for use in haswell, zen, zen2, knl, and skx subconfigs.
  These two kernels (for s and d datatypes) were temporarily disabled in
  e186d71 as part of issue #380. However, the key missing semantic
  properties that prompted the disabling of these kernels--returning the
  index of the *first* rather than of the last element with largest
  absolute value, and returning the index of the first NaN if one is
  encountered--were added as part of #382 thanks to Devin Matthews.
  Thus, now that the kernels are working as expected once more, this
  commit causes these kernels to once again be registered for the
  affected subconfigs, which effectively reverts all code changes
  included in e186d71.
- Whitespace/formatting updates to new macros in bli_amaxv_zen_int.c.
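
For illustration, the two semantic properties described above (lowest index among equal values, first NaN wins) can be sketched as a scalar loop. This is not the BLIS reference kernel or the vectorized zen kernel; the name ref_idamax, the zero-based return value, and the positive-stride restriction are assumptions made for this sketch only.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper, not part of BLIS: returns the zero-based index of the
   element of x (stride incx) with the largest absolute value. */
size_t ref_idamax(size_t n, const double *x, size_t incx)
{
    assert(n > 0 && incx > 0);

    size_t best_i = 0;
    double best = fabs(x[0]);

    /* A NaN in the first position is trivially the first NaN. */
    if (isnan(best)) return 0;

    for (size_t i = 1; i < n; ++i) {
        const double a = fabs(x[i * incx]);

        /* Return the index of the first NaN encountered. */
        if (isnan(a)) return i;

        /* Strict '>' keeps the lowest index when values are equal. */
        if (a > best) {
            best = a;
            best_i = i;
        }
    }
    return best_i;
}
```

With x = { 1.0, -3.0, 3.0, 2.0 } the sketch returns 1, not 2, because the comparison is strict; a vectorized kernel has to reproduce the same tie-breaking when it reduces across lanes.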

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-03-24 17:28:47 -05:00
Field G. Van Zee
e186d7141a Disabled optimized amaxv kernels.
Details:
- Disabled use of optimized amaxv kernels, which use vector intrinsics
  for both 's' and 'd' datatypes. We disable these kernels because the
  current implementations fail to observe a semantic property of the
  BLAS i?amax_() subroutine, which is to return the index of the
  *first* element containing the maximum absolute value (that is, the
  first element if there exist two or more elements that contain the
  same value). With the optimized kernels disabled, the affected
  subconfigurations (haswell, zen, zen2, knl, and skx) will use the
  default reference implementations. Thanks to Mat Cross for reporting
  this issue via #380.
- CREDITS file update.
2020-03-21 18:40:36 -05:00
Field G. Van Zee
fb8bef9982 Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
Details:
- Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
  manifested as failures in single-precision real level-3 operations.
  Also replaced the duplication factor constants with a const-qualified
  variable, dfac, so that this won't happen again.
- Changed NC for single-precision real from 4080 to 8160 so that the
  packed matrix B will have the same byte footprint in both single
  and double real.
2019-11-14 13:05:28 -06:00
Field G. Van Zee
bdc7ee3394 Various fixes to support packing duplication in B.
Details:
- Added cpp macros to trmm and trmm3 front-ends to optionally force
  those operations to be cast so the structured matrix is on the left.
  symm and hemm already had such macros, but these too were renamed so
  that the macros were individual to the operation. We now have four
  such macros:
    #define BLIS_DISABLE_HEMM_RIGHT
    #define BLIS_DISABLE_SYMM_RIGHT
    #define BLIS_DISABLE_TRMM_RIGHT
    #define BLIS_DISABLE_TRMM3_RIGHT
  Also, updated the comments in the symm and hemm front-ends related to
  the first two macro guards, and added corresponding comments to the
  trmm and trmm3 front-ends for the latter two guards. (They all
  functionally do the same thing, just for their specific operations.)
  Thanks to Jeff Hammond for reporting the bugs that led me to this
  change (via #359).
- Updated config/old/haswellbb subconfiguration (used to debug issues
  related to duplicating B during packing) to register: a packing
  kernel for single-precision real; gemmbb ukernels for s, c, and z;
  trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c,
  and z; and to use non-default cache and register blocksizes for s, c,
  and z datatypes. Also declared prototypes for all of the gemmbb,
  trsmbb, and gemmtrsmbb ukernel functions within the
  bli_cntx_init_haswellbb() function. This should, once applied to the
  power9 configuration, fix the remaining issues in #359.
- Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
  duplication factor of 4. This function is defined in the same file as
  bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
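
As a rough picture of what packing "with a duplication factor of 4" means, the sketch below copies a 6 x k panel of single-precision values and writes each element four times in a row. It is not the code of bli_spackm_6xk_bb4_ref(); the function name spack_6xk_dup4, its parameter list, and the output layout are assumptions chosen for illustration. The single const-qualified dfac mirrors the idea of keeping the factor in one place.

```c
#include <assert.h>
#include <stddef.h>

enum { PANEL_DIM = 6 };  /* panel dimension of the 6xk packing sketch */

/* Hypothetical sketch, not the BLIS kernel: element (i, j) of the source is
   read from b[i*incb + j*ldb] and stored dfac times, contiguously, in p.
   p must have room for k * PANEL_DIM * dfac floats. */
void spack_6xk_dup4(size_t k, const float *b, size_t incb, size_t ldb,
                    float *p)
{
    /* One const-qualified duplication factor instead of repeated literals. */
    const size_t dfac = 4;

    assert(b != NULL && p != NULL);

    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < PANEL_DIM; ++i) {
            const float v = b[i * incb + j * ldb];

            /* Write the same value dfac times so that a kernel can use a
               plain vector load instead of a broadcast-style load. */
            for (size_t d = 0; d < dfac; ++d)
                p[(j * PANEL_DIM + i) * dfac + d] = v;
        }
    }
}
```

The packed panel is dfac times larger than the unduplicated one, which is the extra bandwidth cost that this kind of packing trades against the cheaper loads.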
2019-11-11 15:47:17 -06:00
Field G. Van Zee
31c8657f1d Added support for pre-broadcast when packing B.
Details:
- Added support for being able to duplicate (broadcast) elements in
  memory when packing matrix B (i.e., the right-hand operand) in level-3
  operations. This turns out to be advantageous for some architectures that
  can afford the cost of the extra bandwidth and somehow benefit from
  the pre-broadcast elements (and thus being able to avoid using
  broadcast-style load instructions on micro-rows of B in the gemm
  microkernel).
- Support optionally disabling right-side hemm and symm. If this occurs,
  hemm_r is implemented in terms of hemm_l (and symm_r in terms of
  symm_l). This is needed when broadcasting during packing because the
  alternative--supporting the broadcast of B while also allowing matrix
  B to be Hermitian/symmetric--would be an absolute mess.
- Support alignment factors for packed blocks of A, B, and C separately
  (as well as for general-purpose buffers). In addition, we support
  byte offsets from those alignment values (which is different from
  aligning by align+offset bytes to begin with). The default alignment
  values are BLIS_PAGE_SIZE in all four cases, with the offset values
  defaulting to zero.
- Pass pack_t schema into bli_?packm_cxk() so that it can be then passed
  into the packm kernel, where it will be needed by packm kernels that
  perform broadcasts of B, since the idea is that we *only* want to
  broadcast when packing micropanels of B and not A.
- Added definition for variadic bli_cntx_set_l3_vir_ukrs(), which can be
  used to set custom virtual level-3 microkernels in the cntx_t, which
  would typically be done in the bli_cntx_init_*() function defined in
  the subconfiguration of interest.
- Added a "broadcast B" kernel function for use with NP/NR = 12/6,
  defined in ref_kernels/1m/bli_packm_cxk_bb_ref.c.
- Added gemm, gemmtrsm, and trsm "broadcast B" reference kernels
  defined in ref_kernels/3/bb. (These kernels have been tested with
  double real with NP/NR = 12/6.)
- Added #ifndef ... #endif guards around several macro constants defined
  in frame/include/bli_kernel_macro_defs.h.
- Defined a few "broadcast B" static functions in
  frame/include/level0/bb for use by "broadcast B"-style packm reference
  kernels. For now, only the real domain kernels are tested and fully
  defined.
- Output the alignment and offset values for packed blocks of A and B
  in the testsuite's "BLIS configuration info" section.
- Comment updates to various files.
- Bumped so_version to 3.0.0.
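
The distinction drawn above between "aligned address plus a byte offset" and "aligning by align+offset bytes" can be made concrete with a small sketch. The helper names below are hypothetical and are not BLIS functions; the sketch also assumes the alignment is a power of two, as a page size would be.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round addr up to a multiple of align (a power of
   two), then add offset bytes. The result sits exactly offset bytes past
   an aligned boundary. */
uintptr_t align_then_offset(uintptr_t addr, size_t align, size_t offset)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    const uintptr_t mask = (uintptr_t)align - 1;
    const uintptr_t aligned = (addr + mask) & ~mask;
    return aligned + (uintptr_t)offset;
}

/* Hypothetical helper for contrast: round addr up to a multiple of m,
   where m need not be a power of two (e.g. m = align + offset). */
uintptr_t round_up_multiple(uintptr_t addr, size_t m)
{
    assert(m != 0);

    const uintptr_t um = (uintptr_t)m;
    return ((addr + um - 1) / um) * um;
}
```

For addr = 4097, align = 4096, and offset = 64, the first helper yields 8256 (that is, 8192 + 64), while rounding up to a multiple of 4160 yields 4160, so the two notions place the buffer at different addresses.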
2019-09-17 17:42:10 -05:00