Commit Graph

1942 Commits

Author SHA1 Message Date
Devrajegowda, Kiran
b074c5e09c Added a macro MATRIX_INITIALISATION for matrix initialisation in test application
Change-Id: I8e5c9902e603a549218d4e8509a481288792266d
2019-12-01 13:12:02 +05:30
Devrajegowda, Kiran
c4047e491a Merge branch 'amd-blis-nov-mergetest' into amd-staging-rome2.1
Change-Id: I1e04592dd9494faa34555008dd1edbca8a092a44
2019-11-29 23:01:51 +05:30
Dipal M Zambare
e6e66fb1f9 Fixed reentrancy issues with bli_sgemm_small() and bli_dgemm_small().
Replaced the global buffer used for packing with buffers provided by
memory pools. These buffers are checked out at the beginning of each call
and returned to the pool once done.

Please check the comments in the above functions for details.

Change-Id: I76b3560f7efcc621a4455e834fce06f629c38f50
2019-11-27 19:10:16 +05:30
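To illustrate the checkout/return pattern described in e6e66fb1f9, here is a minimal sketch of a per-call buffer pool. The names `pack_pool_t`, `pool_checkout()` and `pool_return()` are hypothetical and are not the BLIS memory pool API; the sketch also leaves out the locking a real pool would need to be safe across threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical fixed-size pool for illustration; this is not the BLIS
   memory pool API, and it omits the locking a real pool would need. */
#define POOL_BLOCKS 4
#define BLOCK_BYTES 1024

typedef struct {
    unsigned char storage[POOL_BLOCKS][BLOCK_BYTES];
    int in_use[POOL_BLOCKS];
} pack_pool_t;

/* Check out a free block; returns NULL when every block is in use. */
void *pool_checkout(pack_pool_t *pool) {
    for (int i = 0; i < POOL_BLOCKS; ++i) {
        if (!pool->in_use[i]) {
            pool->in_use[i] = 1;
            return pool->storage[i];
        }
    }
    return NULL;
}

/* Return a block obtained from pool_checkout() to the pool. */
void pool_return(pack_pool_t *pool, void *block) {
    for (int i = 0; i < POOL_BLOCKS; ++i) {
        if (block == (void *)pool->storage[i]) {
            assert(pool->in_use[i]); /* returning a free block is a bug */
            pool->in_use[i] = 0;
            return;
        }
    }
}
```

Because each call works on its own checked-out block rather than one shared global buffer, two calls in flight at the same time no longer overwrite each other's packed data.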
Dipal M Zambare
37badee648 Updated build infra to use python detected by auto config.
Even though the configure script checks for the availability of a correct
version of python, this information is not passed to the makefiles. This
results in python scripts being invoked without an interpreter. This
normally works fine, as the scripts use the path from the shebang; however,
it doesn't work if the command specified by the shebang is an alias.

This also causes confusion: even though configure has found python, the
build ends with a "python not found" error.

This fix passes the detected python interpreter to the makefiles, which
solves both issues mentioned above.

Change-Id: Ic04da77601ff8ad2a461e9f2f936470109cda22c
2019-11-26 14:57:47 +05:30
Meghana Vankadari
764d6f4643 changed configure script to support AOCC
Change-Id: I86d2f36f42bc6cc7e6b950f4e85087753ce5bc40
2019-11-25 15:17:04 +05:30
Devrajegowda, Kiran
85fa9e4107 resolved merge conflicts when merged with public repo master branch
Change-Id: Iad6ba809680ba5081cc9d7879794ef58cc8f8a40
2019-11-25 14:46:48 +05:30
Field G. Van Zee
bbb21fd0a9 Tweaked SIAM/SC Best Prize language in README.md. 2019-11-21 18:15:16 -06:00
Field G. Van Zee
043366f92d Fixed typo in previous commit (SIAM/SC prize). 2019-11-21 18:13:51 -06:00
Field G. Van Zee
05a4d583e6 Added SIAM/SC prize to "What's New" in README.md. 2019-11-21 18:12:24 -06:00
Field G. Van Zee
881b05ecd4 Fixed blastest failure for 'generic' subconfig.
Details:
- Fixed a subtle and complicated bug that only manifested via the BLAS
  test drivers in the generic subconfiguration, and possibly any other
  subconfiguration that did not register complex-domain gemm ukernels,
  or registered ONLY real-domain ukernels as row-preferential. This is
  a long story, but it boils down to an exception to the "transpose the
  operation to bring storage of C into agreement with ukernel pref"
  optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the
  proper functioning of the 1m method, but only when the imaginary
  component of beta is zero. See the comments in issue #342 for more
  details. Thanks to Dave Love for identifying the commit in which this
  bug was introduced, and other feedback related to this bug.
2019-11-21 16:34:27 -06:00
Kiran Varaganti
27fe3d2df3 Merge "Fixed segmentation fault in trsm_small kernels for cases XAuB, XAltB, XAlB. For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers, which causes a segmentation fault. This is fixed by handling each of the corner cases separately." into amd-staging-rome2.1 2019-11-21 09:19:39 -05:00
prangana
33648bbf31 CPP Test comparison util function fix
Change-Id: I6a9769efcef5f313eb318921275d37353df2b127
2019-11-21 15:57:41 +05:30
Meghana
c63a078a57 Fixed segmentation fault in trsm_small kernels for cases XAuB, XAltB, XAlB
For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers, which causes a segmentation fault.
This is fixed by handling each of the corner cases separately.

Change-Id: I267e69ee095a8ca3e8ce2a3ada5f48bfefcc2219
2019-11-21 12:31:09 +05:30
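The class of bug fixed in c63a078a57 can be sketched with a much simpler routine than trsm. The following is an assumed, illustrative example (the function `scale_blocked()` is hypothetical and is not taken from the trsm_small kernels): a loop that processes four elements per iteration must handle the leftover `n % 4` elements separately, otherwise it reads and writes past the end of the buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Scale x[0..n-1] by alpha. The main loop handles four elements per
   iteration; the tail loop handles the n % 4 leftovers so that no
   element at or beyond x[n] is ever touched. */
void scale_blocked(double *x, size_t n, double alpha) {
    assert(x != NULL || n == 0);
    size_t n_main = n - (n % 4); /* largest multiple of 4 not above n */
    size_t i = 0;
    for (; i < n_main; i += 4) {
        x[i + 0] *= alpha;
        x[i + 1] *= alpha;
        x[i + 2] *= alpha;
        x[i + 3] *= alpha;
    }
    for (; i < n; ++i) { /* corner case: fewer than 4 elements remain */
        x[i] *= alpha;
    }
}
```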
prangana
ba86a38143 Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I49bc3fa15e41fc287e1ca26c357edf144044943f
2019-11-21 10:04:24 +05:30
Meghana
5560f75c0c Modified makefiles for zen and zen2 to pick up compiler flags based on architecture and compiler versions
Change-Id: I443e47c38e0ffd12f4b303f546abd46d02aa31ca
2019-11-21 09:53:44 +05:30
prangana
3d20128aea Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I97a10ab7546d475474b0ff733bafb8248843c352
2019-11-21 00:54:16 +05:30
prangana
d63f9b7d7f checkcpp test rule in Makefile
Change-Id: If01fe55e258e563a96cd8da9ea93d21063b730c2
2019-11-21 00:52:47 +05:30
prangana
49c27040d1 Install CPP Template headers
Change-Id: Ib15dc9bda08d1f3fdc68e31520daee90a287357c
2019-11-20 21:52:22 +05:30
prangana
5f04fdd618 CPP Template test files update
Change-Id: Ia9637556b50b10cb4409e18f369a3e7fc35569fb
2019-11-20 21:32:37 +05:30
Field G. Van Zee
0c7165fb01 Fixed obscure bug in bli_acquire_mpart_[mn]dim().
Details:
- Fixed a bug in bli_acquire_mpart_mdim(), bli_acquire_mpart_ndim(),
  and bli_acquire_mpart_mndim() that allowed the use of a blocksize b
  that is too large given the current row/column index (i.e., the i/j
  argument) and the size of the dimension being partitioned (i.e., the
  m/n argument). This bug only affected backwards partitioning/motion
  through the dimension and was the result of a misplaced conditional
  check-and-redirect to the backwards code path. It should be noted
  that this bug never actually manifested within BLIS (thanks to the
  callers in BLIS making sure to always pass in the "correct"
  blocksize b), but it could have manifested if the functions were
  used by 3rd party callers. Thanks to Minh Quan Ho for reporting the
  bug via issue #363.
2019-11-14 16:48:14 -06:00
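The constraint behind 0c7165fb01 can be sketched as follows. This is not the code from bli_acquire_mpart_mdim(); the helper names are hypothetical, and the sketch assumes that `i` counts the elements already consumed (from the end of the dimension when moving backwards). Under that assumption, the blocksize must be clamped to what remains before the start offset is computed, in both directions.

```c
#include <assert.h>
#include <stddef.h>

/* Clamp the requested blocksize b to the elements left in a dimension
   of size dim once i elements have already been consumed. */
size_t clamp_blocksize(size_t dim, size_t i, size_t b) {
    assert(i <= dim);
    size_t remaining = dim - i;
    return b < remaining ? b : remaining;
}

/* Start offset of the next block when moving backwards. Without the
   clamp, dim - i - b would wrap around whenever b > dim - i. */
size_t backward_offset(size_t dim, size_t i, size_t b) {
    return dim - i - clamp_blocksize(dim, i, b);
}
```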
Field G. Van Zee
fb8bef9982 Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
Details:
- Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
  manifested as failures in single-precision real level-3 operations.
  Also replaced the duplication factor constants with a const-qualified
  variable, dfac, so that this won't happen again.
- Changed NC for single-precision real from 4080 to 8160 so that the
  packed matrix B will have the same byte footprint in both single
  and double real.
2019-11-14 13:05:28 -06:00
Field G. Van Zee
8f399c8940 Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
2019-11-12 15:32:57 -06:00
Devrajegowda, Kiran
b5475f527d Adding context initialisation for SUP kernels in zen2 architecture
Change-Id: I9de533abb039d0dff348728be51554cc53679d10
2019-11-12 13:59:26 +05:30
Field G. Van Zee
bdc7ee3394 Various fixes to support packing duplication in B.
Details:
- Added cpp macros to trmm and trmm3 front-ends to optionally force
  those operations to be cast so the structured matrix is on the left.
  symm and hemm already had such macros, but these too were renamed so
  that the macros were individual to the operation. We now have four
  such macros:
    #define BLIS_DISABLE_HEMM_RIGHT
    #define BLIS_DISABLE_SYMM_RIGHT
    #define BLIS_DISABLE_TRMM_RIGHT
    #define BLIS_DISABLE_TRMM3_RIGHT
  Also, updated the comments in the symm and hemm front-ends related to
  the first two macro guards, and added corresponding comments to the
  trmm and trmm3 front-ends for the latter two guards. (They all
  functionally do the same thing, just for their specific operations.)
  Thanks to Jeff Hammond for reporting the bugs that led me to this
  change (via #359).
- Updated config/old/haswellbb subconfiguration (used to debug issues
  related to duplicating B during packing) to register: a packing
  kernel for single-precision real; gemmbb ukernels for s, c, and z;
  trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c
  and z; and to use non-default cache and register blocksizes for s, c,
  and z datatypes. Also declared prototypes for all of the gemmbb,
  trsmbb, and gemmtrsmbb ukernel functions within the
  bli_cntx_init_haswellbb() function. This should, once applied to the
  power9 configuration, fix the remaining issues in #359.
- Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
  duplication factor of 4. This function is defined in the same file as
  bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
2019-11-11 15:47:17 -06:00
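As a rough picture of what "packing with a duplication factor" means in bdc7ee3394, the sketch below writes each source element `dfac` times in a row. The function `pack_dup()` is hypothetical and much simpler than bli_spackm_6xk_bb4_ref(); it only shows the resulting memory layout, assuming a contiguous source row.

```c
#include <assert.h>
#include <stddef.h>

/* Pack n elements from src into dst, storing each element dfac times
   in consecutive slots. dst must have room for n * dfac elements. */
void pack_dup(const float *src, size_t n, size_t dfac, float *dst) {
    assert(src != NULL && dst != NULL);
    for (size_t j = 0; j < n; ++j) {
        for (size_t d = 0; d < dfac; ++d) {
            dst[j * dfac + d] = src[j];
        }
    }
}
```

With `dfac` equal to 4, the source row `1 2 3` becomes `1 1 1 1 2 2 2 2 3 3 3 3`, so a kernel can load a broadcast element directly from memory.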
Field G. Van Zee
0eb79ca850 Avoid unused variable warning in lread.c (#356).
Details:
- Replaced the line

    f = f;

  with

    ( void )f;

  for the unused variable 'f' in blastest/f2c/lread.c. (Hopefully)
  addresses issue #356, but since we don't use xlc who knows. Thanks
  to Jeff Hammond for reporting this.
2019-11-08 14:48:48 -06:00
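The cast-to-void idiom adopted in 0eb79ca850 is a common way in C to mark a variable or parameter as intentionally unused. A small sketch, using a hypothetical function `sum_ints()` rather than the code in lread.c:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical callback whose signature requires a 'context' argument
   that this particular implementation does not need. */
int sum_ints(const int *values, size_t n, void *context) {
    (void)context; /* evaluated and discarded: marks it as used */
    assert(values != NULL || n == 0);
    int total = 0;
    for (size_t i = 0; i < n; ++i) {
        total += values[i];
    }
    return total;
}
```

Unlike a self-assignment such as `f = f;`, the cast has no side effect at all, which is why compilers that flag self-assignment tend to accept it.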
Jérôme Duval
f377bb4485 Add Haiku to the known OS list (#361) 2019-11-07 16:39:29 -06:00
Field G. Van Zee
e29b1f9706 Fixed failing testsuite gemmtrsm_ukr for power9.
Details:
- Added code that fixes false failures in the gemmtrsm_ukr module of the
  testsuite. The tests were failing because the computation (bli_gemv())
  that performs the numerical check was not able to properly traverse
  the matrix operands bx1 and b11 that are views into the micropanel of
  B, which has duplicated/broadcast elements under the power9 subconfig.
  (For example, a micropanel of B with duplication factor of 2 needs to
  use a column stride of 2; previously, the column stride was being
  interpreted as 1.)
- Defined separate bli_obj_set_row_stride() and bli_obj_set_col_stride()
  static functions in bli_obj_macro_defs.h. (Previously, only the
  function bli_obj_set_strides() was defined. Amazing to think that we
  got this far without these former functions.)
- Updated/expounded upon comments.
2019-11-05 17:15:19 -06:00
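The stride issue described in e29b1f9706 can be reproduced with plain index arithmetic. The sketch below uses a hypothetical accessor `elem_at()` (not a BLIS function) and assumes a row-stored 2x3 micropanel whose elements are each stored twice; reading it with a column stride of 1 returns a duplicate instead of the next logical element.

```c
#include <assert.h>
#include <stddef.h>

/* Return logical element (i, j) of a buffer with row stride rs and
   column stride cs, both measured in elements. */
double elem_at(const double *buf, size_t i, size_t j,
               size_t rs, size_t cs) {
    assert(buf != NULL);
    return buf[i * rs + j * cs];
}
```

For the panel `1 1 2 2 3 3 / 4 4 5 5 6 6` (row stride 6), `elem_at(p, 0, 1, 6, 2)` yields the logical element 2, whereas a column stride of 1 yields the duplicated 1.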
Field G. Van Zee
49177a6b9a Fixed latent testsuite ukr module bugs for power9.
Details:
- Fixed a latent bug in the testsuite ukernel modules (gemm, trsm, and
  gemmtrsm) that only manifested once we began running with parameters
  that mimic those of power9. The problem was rooted in the way those
  modules were creating objects (and thus allocating memory) for the
  micropanel operands to the microkernel being tested. Since power9
  duplicates/broadcasts elements of B in memory, we needed an easy way
  of asking for more than one storage element per logical element in
  the matrix. I incorrectly expressed this as:

    bli_obj_create( datatype, k, n, ldbp, 1, &bp );

  The problem here is that bli_obj_create() is exceedingly efficient
  at calculating the size it passes to malloc() and doesn't allocate a
  full leading dimension's worth of elements for the last column (or
  row, in this example). This would normally not bother anyone since
  you're not supposed to access that memory anyway. But here, my
  attempted "hack" for getting extra elements was insufficient, and
  needed to be changed to:

    bli_obj_create( datatype, k, ldbp, ldbp, 1, &bp );

  That is, the extra elements needed to be baked into the dimensions of
  the matrix object in order to have the intended effect on the number
  of elements actually allocated. Thanks to Jeff Hammond for reporting
  this bug.
- Fixed a typically harmless memory leak in the aforementioned test
  modules (the objects for the packed micropanels were not being freed).
- Updated/expanded a common comment across all three ukr test modules.
2019-11-04 18:09:37 -06:00
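The allocation shortfall described in 49177a6b9a follows from how a tight element count is computed for a strided matrix. The formula below is an assumption used for illustration (the smallest number of elements that covers every valid index), not a copy of what bli_obj_create() does internally; `min_elems()` is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>

/* Smallest number of elements needed to hold an m x n matrix with row
   stride rs and column stride cs: the offset of the last element
   (m-1, n-1) plus one. */
size_t min_elems(size_t m, size_t n, size_t rs, size_t cs) {
    assert(rs > 0 && cs > 0);
    if (m == 0 || n == 0) {
        return 0;
    }
    return (m - 1) * rs + (n - 1) * cs + 1;
}
```

With illustrative values k = 4, n = 6 and ldbp = 12, a k x n object with row stride ldbp needs only 42 elements under this formula, while a k x ldbp object needs the full 48 (k * ldbp). That difference is the space the duplicated elements of the last row would have required.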
Field G. Van Zee
c84391314d Reverted minor temp/wspace changes from b426f9e.
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
  directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
2019-11-04 13:57:12 -06:00
Jeff Hammond
4870260f6b blacklist GCC 5 and older for POWER9 (#360) 2019-11-04 13:55:47 -06:00
Nicholai Tukanov
b426f9e04e POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel 
  assumes that elements of B have been duplicated/broadcast during the
  packing step. The microkernel uses a column orientation for its 
  microtile vector registers and thus implements column storage and 
  general stride IO cases. (A row storage IO case via in-register
  transposition may be added at a future date.) It should be noted that 
  we recommend using this microkernel with gcc and *not* xlc, as issues 
  with the latter cropped up during development, including but not 
  limited to slightly incompatible vector register mnemonics in the GNU 
  extended inline assembly clobber list.
2019-11-01 17:57:03 -05:00
prangana
d21c726003 update version 2.1
Change-Id: I531fe8005f63ad138077320c3f0b03a05a7c7dd2
2019-10-30 15:33:37 +05:30
Field G. Van Zee
58102aeaa2 Merge branch 'amd' 2019-10-28 17:58:31 -05:00
Kiran Varaganti
c3d4464f03 Removed extra 'endif' statement causing build failures for zen configuration - Fixed now
Change-Id: Ia7f164209124ffae5c70e1ff7c3d131cd44b9294
2019-10-24 14:56:04 +05:30
Kiran Varaganti
97a4236c82 Matrices were not initialized when input dimensions were fed through a file; this is now fixed. test_gemm.c now initializes matrices for file-based inputs as well.
Change-Id: I4c3625a51dcbf64c99f56f354dcd898e66035cb1
2019-10-24 13:57:55 +05:30
Field G. Van Zee
52059506b2 Added "How to Download BLIS" section to README.md.
Details:
- Added a new section to the README.md, just prior to the "Getting
  Started" section, titled "How to Download BLIS". This section details
  the user's options for obtaining BLIS and lays out four common ways
  of downloading the library. Thanks to Jeff Diamond for his feedback
  on this topic.
2019-10-23 15:26:42 -05:00
Devrajegowda, Kiran
4158e7fffe Added changes missed while rebasing Field's SUP code
Change-Id: I560b93c42901ca2bbd4c22e833f55ba884a38a50
2019-10-23 10:33:43 +05:30
Field G. Van Zee
e6f0a96cc5 Updated README.md to ack Facebook as funder. 2019-10-14 17:05:39 -05:00
Field G. Van Zee
b9bc222bfc Call bli_syrk_small() before error checking.
Details:
- In bli_syrk_front(), moved the conditional call to bli_syrk_check()
  (if error checking is enabled) and the conditional scaling of C by
  beta (if alpha is zero) so that they occur after, instead of before,
  the call to bli_syrk_small(). This sequencing now matches that of
  bli_gemm_small() in bli_gemm_front() and bli_trsm_small() in
  bli_trsm_front().
2019-10-14 16:38:15 -05:00
Field G. Van Zee
f0959a81db When manual config is blacklisted, output error.
Details:
- Fixed and adjusted the logic in configure so that a more informative
  error message is output when a user runs './configure ... <conf>' and
  <conf> is present in the configuration blacklist. Previously, this
  particular set of conditions would result in the message:

    'user-specified configuration '' is NOT registered!

  That is, the error message mis-identified the targeted configuration
  as the empty string, and (more importantly) mis-identifies the
  problem. Thanks to Tze Meng Low for reporting this issue.
- Fixed a nearby error message somewhat unrelated to the issue above.
  Specifically, the wrong string was being printed when the error
  message was identifying an auto-detected configuration that did not
  appear to be registered.
2019-10-14 15:46:28 -05:00
Field G. Van Zee
6218ac95a5 Merge branch 'master' into amd 2019-10-11 11:53:51 -05:00
Field G. Van Zee
0016d541e6 Changed -march=znver2 to =znver1 for clang on zen2.
Details:
- In config/zen2/make_defs.mk, changed the -march= flag so that
  -march=znver1 is used instead of -march=znver2 when CC_VENDOR is
  clang. (The gcc branch attempts to differentiate between various
  versions, but the equivalent version cutoffs for clang are not
  yet known by us, so we have to use a single flag for all versions
  of clang. Hopefully -march=znver1 is new enough. If not, we'll
  fall back to -march=bdver4 -mno-fma4 -mno-tbm -mno-xop -mno-lwp.)
  This issue was discovered thanks to AppVeyor.
2019-10-11 11:09:44 -05:00
Field G. Van Zee
e94a0530e5 Corrected zen NC that was non-multiple of NR.
Details:
- Updated an incorrectly set cache blocksize NC for single real within
  config/zen/bli_cntx_init_zen.c that was not a multiple of the
  corresponding value of NR. This issue, which was caught by Travis CI,
  was introduced in 29b0e1e.
2019-10-11 10:48:27 -05:00
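The invariant restored in e94a0530e5 (the cache blocksize NC must be a multiple of the register blocksize NR) is easy to check mechanically. The helpers below are hypothetical, and the values in the usage note are illustrative rather than the actual zen blocksizes.

```c
#include <assert.h>
#include <stddef.h>

/* Return 1 if nc is a whole multiple of nr, 0 otherwise. */
int is_multiple_of(size_t nc, size_t nr) {
    assert(nr > 0);
    return nc % nr == 0;
}

/* Round nc down to the nearest multiple of nr. */
size_t round_down_to_multiple(size_t nc, size_t nr) {
    assert(nr > 0);
    return nc - (nc % nr);
}
```

For example, with NR = 16 a candidate NC of 4086 fails the check, and rounding down gives 4080.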
Field G. Van Zee
a2ffac7520 Merge branch 'amd-master' into amd 2019-10-11 10:31:18 -05:00
Field G. Van Zee
29b0e1ef4e Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
  into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
  inadvertently not incremented when the Zen2 subconfiguration was
  added.
- In bli_gemm_front(), added a missing conditional constraint around the
  call to bli_gemm_small() that ensures that the computation precision
  of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
  that existed around the call to bli_syrk_small() into bli_syrk_small()
  to minimize the calling code footprint and also to bring that code
  into stylistic harmony with similar code in bli_gemm_front() and
  bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
  proper accessor static functions (e.g. 'a->dim[0]' becomes
  'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
  bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
  strictly speaking unnecessary, but it serves as a useful visual cue to
  those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
  bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
  version check for availability of -march=znver2, and added appropriate
  support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
  config/zen/amd_config.mk, including: removal of -march=znver1 et al.
  from CKVECFLAGS (since the -march flag is added within make_defs.mk);
  setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
  added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
  set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
2019-10-11 10:24:24 -05:00
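One of the cleanups in 29b0e1ef4e replaces direct field access such as 'a->dim[0]' with accessor functions. The sketch below shows the general pattern on a toy type; `toy_obj_t`, `toy_obj_length()` and `toy_obj_width()` are hypothetical stand-ins and do not reproduce the layout of the real obj_t.

```c
#include <assert.h>
#include <stddef.h>

/* Toy object type; the real obj_t has many more fields. */
typedef struct {
    size_t dim[2]; /* dim[0] = length (rows), dim[1] = width (columns) */
} toy_obj_t;

/* Accessors hide the field layout from calling code. */
size_t toy_obj_length(const toy_obj_t *obj) {
    assert(obj != NULL);
    return obj->dim[0];
}

size_t toy_obj_width(const toy_obj_t *obj) {
    assert(obj != NULL);
    return obj->dim[1];
}
```

Callers that go through the accessors keep working if the struct layout changes later, which direct indexing into `dim` does not guarantee.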
Field G. Van Zee
a617301f93 Updates to docs/CodingConventions.md. 2019-10-08 17:14:05 -05:00
Kiran Devrajegowda
b2479b1a6d Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging-rome2.1
Change-Id: I340e417fde52385deb3ee231e2c219214d4e278d
2019-10-07 11:19:32 +05:30
Field G. Van Zee
171f100691 Merge remote-tracking branch 'loveshack/emacs' 2019-10-04 11:18:23 -05:00
Chithra Sankar
574bdaeb48 Modified cblas.hh not to include cblas.h, as this file is generated after 'make install' of BLIS 2019-10-03 17:04:57 +05:30
Chithra Sankar
a000c617f9 test/Makefile reverted to correct version to retain copyright information 2019-10-03 14:46:52 +05:30