Commit Graph

1965 Commits

Author SHA1 Message Date
Meghana Vankadari
8eb264f78b Change in threshold condition for trsm_small kernels
Change-Id: I396e246b1639d300fcb94bdf7e5fa8bc8c87e994
2019-12-16 18:54:48 +05:30
Nallani Bhaskar
a8af07f68c Added support to handle unsupported storage formats in sgemmsup using normal/small gemm path
Change-Id: I8762059c89e50f60e765a2a2983c5b2bdcdd8bc1
2019-12-13 15:31:28 +05:30
Kiran Devrajegowda
21224e8264 Merge "Revert " Merge Selective Packing code from amd branch flame/blis"" into amd-staging-rome-rel-2.1 2019-12-13 00:45:34 -05:00
Nallani Bhaskar
10a26a7357 Merge "Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back-to-back declaration of a label in asm file" into amd-staging-rome-rel-2.1 2019-12-13 00:25:49 -05:00
Kiran Varaganti
1650bcb623 Revert " Merge Selective Packing code from amd branch flame/blis"
This reverts commit e4a6af33f5.

Reason for revert: <Review not done>

Change-Id: Iae548f949a81a66281023c860c2bcffdfdae21b2
2019-12-13 00:01:35 -05:00
Nallani Bhaskar
dc4e7d1203 Fix for CPUPL-550: AOCC clang compiler error. Resolved: Duplicate back-to-back declaration of a label in asm file
Change-Id: I82c386d5fc00139da74fa031980d65c6a3874bd0
2019-12-12 20:43:47 +05:30
Devrajegowda, Kiran
e4a6af33f5 Merge Selective Packing code from amd branch flame/blis
Change-Id: I6d577f67ec84febe6af3635b10e5c9c77844ccd2
2019-12-12 15:22:21 +05:30
Kiran Varaganti
edc8f04dea Merge "Fix for CPUPL-541: When threading is enabled, the blis-mt library is generated; otherwise, blis.a is generated for sequential builds. However, the blis.h header file differs between the two types of library: BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS is defined only when threading is enabled. The application must include the header file that matches the library it links against." into amd-staging-rome-rel-2.1 2019-12-11 01:26:06 -05:00
Nallani Bhaskar
44edee7404 Added support to handle 7x16, 8x16, and 9x16 efficiently in 6x16n kernel 2019-12-10 16:09:46 +05:30
Kiran Varaganti
82ec21f1c7 Fix for CPUPL-541: When threading is enabled, the blis-mt library is generated; otherwise, blis.a is generated for sequential builds. However, the blis.h header file differs between the two types of library: BLIS_ENABLE_OPENMP or BLIS_ENABLE_PTHREADS is defined only when threading is enabled. The application must include the header file that matches the library it links against.
Change-Id: I82a4ae2bf529eedd83868e059a43749714cbe246
2019-12-10 15:40:27 +05:30
Kiran Varaganti
9b6c04d075 Merge " change in threshold condition for SUP and small kernels" into amd-staging-rome-rel-2.1 2019-12-08 23:42:25 -05:00
Devrajegowda, Kiran
3192914a1c change in threshold condition for SUP and small kernels
Change-Id: I7dbd30b2004c67122a639f081efc36e0f0d69fad
2019-12-09 01:31:58 +05:30
Kiran Varaganti
27d2b5a0db Merge "Made some improvements to trsm_small kernels" into amd-staging-rome-rel-2.1 2019-12-06 05:21:34 -05:00
Meghana
17b3a2639e Made some improvements to trsm_small kernels
Interchanged some loops to favour column-major storage.
Added a check condition to identify the last column and load it using a 'for' loop, to avoid memory accesses outside the buffer.

Change-Id: Id5d2e16c65017a7f4b641d33228d23903efd09ac
2019-12-06 14:48:28 +05:30
Nallani Bhaskar
af94ba29cf Added sup support for sgemm under zen and related framework changes.
Change-Id: Ia7e88b96d3a3617e8d24754f50db081ffe2e9955
2019-12-04 10:56:10 +05:30
Meghana Vankadari
31bfe8985f re-enabling the boundary check condition for bli_dtrsm_small_AlXB. It was disabled by mistake in previous commits.
Change-Id: Ib7d2d0c5e133ff10559ce3dc5f7e624707e43c11
2019-12-03 17:07:37 +05:30
Meghana
cef185250e Fixed Segmentation fault in trsm_small kernels for the case AlXB.
For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers which causes segmentation fault.
This is fixed by handling each of the corner cases separately.

Change-Id: Ia7cfad5d65339a209a7376cc1654382593c933af
2019-12-03 17:05:57 +05:30
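The corner-case fix above follows a common pattern: process the bulk of a dimension in blocks of four, then handle the 0 to 3 leftover elements with a scalar loop so that no load or store reaches past the end of the buffer. The sketch below illustrates only that pattern on a simple column scaling; the function name `scale_column` and the unrolled 4-wide body (standing in for a vector load/store) are illustrative assumptions, not the actual trsm_small code.

```c
#include <stddef.h>

/* Hypothetical illustration: scale n contiguous doubles by alpha.
   The main loop handles full blocks of 4 (standing in for a 4-wide
   vector load/store); the tail loop handles the remaining n % 4
   elements one at a time, so no access goes past x[n - 1]. */
void scale_column(double *x, size_t n, double alpha)
{
    size_t i = 0;

    /* Full blocks of 4: every index i..i+3 is < n here. */
    for (; i + 4 <= n; i += 4) {
        x[i + 0] *= alpha;
        x[i + 1] *= alpha;
        x[i + 2] *= alpha;
        x[i + 3] *= alpha;
    }

    /* Corner case: 0 to 3 leftover elements, one at a time. */
    for (; i < n; ++i) {
        x[i] *= alpha;
    }
}
```

With n = 7, the main loop covers indices 0 to 3 and the tail loop covers 4 to 6; a naive 4-wide loop over the second block would instead touch index 7, which lies outside a 7-element buffer.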
Meghana
fb75044ea2 Removed zen and zen2 configurations from amd64 family
The amd64 family supports all the architectures before zen.
Assigned (BLIS_ARCH_GENERIC+1) to BLIS_NUM_ARCHS so that it does not need to be updated for every new architecture.

Change-Id: I8241e643f6dfd0ebe272e053ca8b6a9c1463d9dc
2019-12-03 16:48:34 +05:30
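Defining the count relative to the last enumerator is a standard C idiom. A minimal sketch, assuming the generic entry is always kept last and the enum starts at zero; the enumerator names below are hypothetical stand-ins, not BLIS's actual list.

```c
/* Illustrative enum (not the real BLIS arch list): architectures are
   listed in order, with the generic entry deliberately kept last. */
typedef enum
{
    DEMO_ARCH_SKX = 0,
    DEMO_ARCH_HASWELL,
    DEMO_ARCH_ZEN,
    DEMO_ARCH_ZEN2,
    DEMO_ARCH_GENERIC   /* must remain the last entry */
} demo_arch_t;

/* Because DEMO_ARCH_GENERIC is last and the enum starts at 0, the
   number of architectures follows automatically: adding a new entry
   before DEMO_ARCH_GENERIC updates the count with no further edits. */
#define DEMO_NUM_ARCHS ( DEMO_ARCH_GENERIC + 1 )
```

Arrays indexed by architecture can then be sized with `DEMO_NUM_ARCHS`; the trade-off is that any entry accidentally placed after the generic one silently falls outside the count.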
Devrajegowda, Kiran
b074c5e09c Added a macro MATRIX_INITIALISATION for matrix initialisation in test application
Change-Id: I8e5c9902e603a549218d4e8509a481288792266d
2019-12-01 13:12:02 +05:30
prangana
d72b509fbb Pass actual enum type to bli_mem_set_buf_type function if C++
Change-Id: I63b4926963c361429b001f7ae93d9b544e9be95b
2019-11-30 17:57:42 +05:30
prangana
13249e83e2 Replace bli_thread_init_rntm with bli_rntm_init_from_global in zen small gemm
Change-Id: I14fb2795b483368580ff3fcf5f537723f3845377
2019-11-30 16:33:10 +05:30
prangana
e0fb039a60 Merge branch 'amd' of https://github.com/flame/blis into amd-blis-nov-mergetest
Change-Id: I59325783883d67bb33e938aea8c34d8e3d6832fb
2019-11-30 12:52:14 +05:30
Field G. Van Zee
efa61a6c8b Added missing bli_l3_sup_thread_decorator() symbol.
Details:
- Defined dummy versions of bli_l3_sup_thread_decorator() for OpenMP
  and pthreads so that those builds don't fail when performing shared
  library linking (especially for Windows DLLs via AppVeyor). For now,
  these dummy implementations of bli_l3_sup_thread_decorator() are
  merely carbon-copies of the implementation provided for single-
  threaded execution (i.e., the one found in bli_l3_sup_decor_single.c).
  Thus, an OpenMP or pthreads build will be able to use the gemmsup
  code (including the new selective packing functionality), as it did
  before 39fa7136, even though it will not actually employ any
  multithreaded parallelism.
2019-11-29 16:17:04 -06:00
Field G. Van Zee
39fa7136f4 Added support for selective packing to gemmsup.
Details:
- Implemented optional packing for A or B (or both) within the sup
  framework (which currently only supports gemm). The request for
  packing either matrix A or matrix B can be made via setting
  environment variables BLIS_PACK_A or BLIS_PACK_B (to any
  non-zero value; if set, zero means "disable packing"). It can also
  be made globally at runtime via bli_pack_set_pack_a() and
  bli_pack_set_pack_b() or with individual rntm_t objects via
  bli_rntm_set_pack_a() and bli_rntm_set_pack_b() if using the expert
  interface of either the BLIS typed or object APIs. (If using the
  BLAS API, environment variables are the only way to communicate the
  packing request.)
- One caveat (for now) with the current implementation of selective
  packing is that any blocksize extension registered in the _cntx_init
  function (such as is currently used by haswell and zen subconfigs)
  will be ignored if the affected matrix is packed. The reason is
  simply that I didn't get around to implementing the necessary logic
  to pack a larger edge-case micropanel, though this is entirely
  possible and should be done in the future.
- Spun off the variant-choosing portion of bli_gemmsup_ref() into
  bli_gemmsup_int(), in bli_l3_sup_int.c.
- Added new files, bli_l3_sup_packm_a.c, bli_l3_sup_packm_b.c, along
  with corresponding headers, in which higher-level packm-related
  functions are defined for use within the sup framework. The actual
  packm variant code resides in bli_l3_sup_packm_var.c.
- Pass the following new parameters into var1n and var2m: packa, packb
  bool_t's, pointer to a rntm_t, pointer to a cntl_t (which is for now
  always NULL), and pointer to a thrinfo_t* (which for now is the address
  of the global single-threaded packm thread control node).
- Added panel strides ps_a and ps_b to the auxinfo_t structure so that
  the millikernel can query the panel stride of the packed matrix and
  step through it accordingly. If the matrix isn't packed, the panel
  stride of interest for the given millikernel will be set to the
  appropriate value so that the mkernel may step through the unpacked
  matrix as it normally would.
- Modified the rv_6x8m and rv_6x8n millikernels to read the appropriate
  panel strides (ps_a and ps_b, respectively) instead of computing them
  on the fly.
- Spun off the environment variable getting and setting functions into
  a new file, bli_env.c (with a corresponding prototype header). These
  functions are now used by the threading infrastructure (e.g.
  BLIS_NUM_THREADS, BLIS_JC_NT, etc.) as well as the selective packing
  infrastructure (e.g. BLIS_PACK_A, BLIS_PACK_B).
- Added a static initializer for mem_t objects, BLIS_MEM_INITIALIZER.
- Added a static initializer for pblk_t objects, BLIS_PBLK_INITIALIZER,
  for use within the definition of BLIS_MEM_INITIALIZER.
- Moved the global_rntm object to bli_rntm.c and extern it where needed.
  This means that the function bli_thread_init_rntm() was renamed to
  bli_rntm_init_from_global() and relocated accordingly.
- Added a new bli_pack.c file, which serves as the home for
  functions that manage the pack_a and pack_b fields of the global
  rntm_t, including from environment variables, just as we have
  functions to manage the threading fields of the global rntm_t in
  bli_thread.c.
- Reorganized naming for files in frame/thread, which mostly involved
  spinning off the bli_l3_thread_decorator() functions into their own
  files. This change makes more sense when considering the further
  addition of bli_l3_sup_thread_decorator() functions (for now limited
  only to the single-threaded form found in the _single.c file).
- Explicitly initialize the reference sup handlers in both
  bli_cntx_init_haswell.c and bli_cntx_init_zen.c so that it's more
  obvious how to customize to a different handler, if desired.
- Removed various snippets of disabled code.
- Various comment updates.
2019-11-29 15:27:07 -06:00
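A minimal sketch of how a flag such as BLIS_PACK_A could be interpreted under the semantics described above: unset means "keep the default", any non-zero integer requests packing, and zero disables it. The helpers `parse_pack_flag` and `env_pack_flag`, and the treatment of non-numeric text as zero, are assumptions for illustration; this is not the code in bli_env.c or bli_pack.c.

```c
#include <stdbool.h>
#include <stdlib.h>

/* Hypothetical parser: an unset (NULL) or empty value yields def;
   otherwise any non-zero integer enables packing and 0 disables it.
   Non-numeric text parses as 0 here, which is an assumption. */
bool parse_pack_flag(const char *s, bool def)
{
    if (s == NULL || *s == '\0')
        return def;
    return strtol(s, NULL, 10) != 0;
}

/* Hypothetical wrapper that reads a variable such as "BLIS_PACK_A"
   from the process environment via getenv(). */
bool env_pack_flag(const char *name, bool def)
{
    return parse_pack_flag(getenv(name), def);
}
```

Splitting the parsing from the getenv() call keeps the interpretation rule testable on its own; under this sketch, running with `BLIS_PACK_A=1` makes `env_pack_flag("BLIS_PACK_A", false)` return true.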
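To illustrate why carrying ps_a in auxinfo_t lets one millikernel loop serve both the packed and unpacked cases, here is a sketch with a hypothetical `demo_aux_t` that holds only a panel stride. It assumes an unpacked, row-stored A with row stride rs_a, whose MR-row panels start MR*rs_a elements apart, and a packed A made of MR x k micropanels (no padding, an assumption) that start MR*k elements apart. The loop itself just adds ps_a and never asks which case it is in.

```c
#include <stddef.h>

#define MR 6

/* Hypothetical stand-in for the ps_a field added to auxinfo_t. */
typedef struct { ptrdiff_t ps_a; } demo_aux_t;

/* Visit the first element of each MR-row panel of A by stepping with
   the panel stride, without knowing whether A was packed. */
double sum_panel_heads(const double *a, size_t n_panels, const demo_aux_t *aux)
{
    double sum = 0.0;
    for (size_t p = 0; p < n_panels; ++p) {
        const double *a_p = a + (ptrdiff_t)p * aux->ps_a;
        sum += a_p[0];
    }
    return sum;
}
```

For a 12 x 3 matrix stored with row stride 5, the caller would set ps_a = MR*5 for the unpacked matrix and ps_a = MR*3 for its packed copy; both calls then read A(0,0) and A(6,0).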
Devrajegowda, Kiran
c4047e491a Merge branch 'amd-blis-nov-mergetest' into amd-staging-rome2.1
Change-Id: I1e04592dd9494faa34555008dd1edbca8a092a44
2019-11-29 23:01:51 +05:30
Dipal M Zambare
e6e66fb1f9 Fixed reentrancy issues with bli_sgemm_small() and bli_dgemm_small().
Replaced the global buffer used for packing with buffers provided by
memory pools. These buffers are checked out at the beginning of each call
and returned to the pool once done.

See the comments in the above functions for details.

Change-Id: I76b3560f7efcc621a4455e834fce06f629c38f50
2019-11-27 19:10:16 +05:30
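The reentrancy fix replaces one shared global packing buffer with a buffer that each call checks out and returns. A minimal sketch of that pattern, assuming a small fixed pool guarded by C11 `atomic_flag`s; `pool_checkout`, `pool_release`, `small_kernel`, and the pool sizes are hypothetical and do not reflect the BLIS memory-pool API.

```c
#include <stdatomic.h>
#include <stddef.h>

#define POOL_SLOTS  4
#define BUF_DOUBLES 1024

/* Hypothetical fixed pool: each slot has a buffer and an in-use flag. */
static double pool_buf[POOL_SLOTS][BUF_DOUBLES];
static atomic_flag pool_used[POOL_SLOTS] = {
    ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT, ATOMIC_FLAG_INIT
};

/* Check out a free buffer; returns its slot index, or -1 if all are busy. */
int pool_checkout(double **buf)
{
    for (int i = 0; i < POOL_SLOTS; ++i) {
        /* test_and_set returns the previous value: false means we claimed it. */
        if (!atomic_flag_test_and_set(&pool_used[i])) {
            *buf = pool_buf[i];
            return i;
        }
    }
    *buf = NULL;
    return -1;
}

/* Return a previously checked-out slot to the pool. */
void pool_release(int slot)
{
    atomic_flag_clear(&pool_used[slot]);
}

/* Each call packs into its own checked-out buffer instead of a shared
   global, so concurrent callers never write into the same space. */
int small_kernel(const double *a, size_t n, double *out)
{
    double *pack;
    int slot;

    if (n > BUF_DOUBLES)
        return -1;
    slot = pool_checkout(&pack);
    if (slot < 0)
        return -1;

    for (size_t i = 0; i < n; ++i)   /* "pack" the input */
        pack[i] = a[i];

    double sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += pack[i];
    *out = sum;

    pool_release(slot);              /* return the buffer to the pool */
    return 0;
}
```

A real pool would typically grow or block when every slot is busy; this sketch simply reports failure to keep the checkout/return discipline in focus.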
Dipal M Zambare
37badee648 Updated build infrastructure to use the Python interpreter detected by configure.
Even though the configure script checks for a suitable version of
Python, this information was not passed to the makefiles. As a result,
Python scripts were invoked without an explicit interpreter. This
normally works because the script's shebang line names the interpreter;
however, it fails if the command named in the shebang is an alias.

It also caused confusion: even though configure had found Python, the
build failed with a "python not found" error.

This fix passes the Python interpreter detected by configure to the
makefiles, which solves both issues mentioned above.

Change-Id: Ic04da77601ff8ad2a461e9f2f936470109cda22c
2019-11-26 14:57:47 +05:30
Meghana Vankadari
764d6f4643 changed configure script to support AOCC
Change-Id: I86d2f36f42bc6cc7e6b950f4e85087753ce5bc40
2019-11-25 15:17:04 +05:30
Devrajegowda, Kiran
85fa9e4107 resolved merge conflicts when merged with public repo master branch
Change-Id: Iad6ba809680ba5081cc9d7879794ef58cc8f8a40
2019-11-25 14:46:48 +05:30
Field G. Van Zee
bbb21fd0a9 Tweaked SIAM/SC Best Prize language in README.md. 2019-11-21 18:15:16 -06:00
Field G. Van Zee
043366f92d Fixed typo in previous commit (SIAM/SC prize). 2019-11-21 18:13:51 -06:00
Field G. Van Zee
05a4d583e6 Added SIAM/SC prize to "What's New" in README.md. 2019-11-21 18:12:24 -06:00
Field G. Van Zee
881b05ecd4 Fixed blastest failure for 'generic' subconfig.
Details:
- Fixed a subtle and complicated bug that only manifested via the BLAS
  test drivers in the generic subconfiguration, and possibly any other
  subconfiguration that did not register complex-domain gemm ukernels,
  or registered ONLY real-domain ukernels as row-preferential. This is
  a long story, but it boils down to an exception to the "transpose the
  operation to bring storage of C into agreement with ukernel pref"
  optimization in bli_hemm_front.c and bli_symm_front.c sabotaging the
  proper functioning of the 1m method, but only when the imaginary
  component of beta is zero. See the comments in issue #342 for more
  details. Thanks to Dave Love for identifying the commit in which this
  bug was introduced, and other feedback related to this bug.
2019-11-21 16:34:27 -06:00
Kiran Varaganti
27fe3d2df3 Merge "Fixed segmentation fault in trsm_small kernels for cases XAuB, XAltB, XAlB. For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers which causes segmentation fault. This is fixed by handling each of the corner cases separately." into amd-staging-rome2.1 2019-11-21 09:19:39 -05:00
prangana
33648bbf31 CPP Test comparison util function fix
Change-Id: I6a9769efcef5f313eb318921275d37353df2b127
2019-11-21 15:57:41 +05:30
Meghana
c63a078a57 Fixed segmentation fault in trsm_small kernels for cases XAuB, XAltB, XAlB
For matrix sizes which are not multiples of 4, trsm_small kernels access memory outside the allocated buffers which causes segmentation fault.
This is fixed by handling each of the corner cases separately.

Change-Id: I267e69ee095a8ca3e8ce2a3ada5f48bfefcc2219
2019-11-21 12:31:09 +05:30
prangana
ba86a38143 Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I49bc3fa15e41fc287e1ca26c357edf144044943f
2019-11-21 10:04:24 +05:30
Meghana
5560f75c0c Modified makefiles for zen and zen2 to pick up compiler flags based on architecture and compiler versions
Change-Id: I443e47c38e0ffd12f4b303f546abd46d02aa31ca
2019-11-21 09:53:44 +05:30
prangana
3d20128aea Merge branch 'amd-staging-rome2.1' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-blis-cpp
Change-Id: I97a10ab7546d475474b0ff733bafb8248843c352
2019-11-21 00:54:16 +05:30
prangana
d63f9b7d7f checkcpp test rule in Makefile
Change-Id: If01fe55e258e563a96cd8da9ea93d21063b730c2
2019-11-21 00:52:47 +05:30
prangana
49c27040d1 Install CPP Template headers
Change-Id: Ib15dc9bda08d1f3fdc68e31520daee90a287357c
2019-11-20 21:52:22 +05:30
prangana
5f04fdd618 CPP Template test files update
Change-Id: Ia9637556b50b10cb4409e18f369a3e7fc35569fb
2019-11-20 21:32:37 +05:30
Field G. Van Zee
0c7165fb01 Fixed obscure bug in bli_acquire_mpart_[mn]dim().
Details:
- Fixed a bug in bli_acquire_mpart_mdim(), bli_acquire_mpart_ndim(),
  and bli_acquire_mpart_mndim() that allowed the use of a blocksize b
  that is too large given the current row/column index (i.e., the i/j
  argument) and the size of the dimension being partitioned (i.e., the
  m/n argument). This bug only affected backwards partitioning/motion
  through the dimension and was the result of a misplaced conditional
  check-and-redirect to the backwards code path. It should be noted
  that this bug was not discovered because it manifested (the callers
  in BLIS always pass in the "correct" blocksize b), but it could have
  manifested if the functions were used by third-party callers. Thanks
  to Minh Quan Ho for reporting the bug via issue #363.
2019-11-14 16:48:14 -06:00
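The invariant at stake is that the blocksize actually used must never exceed what remains of the dimension, in either direction of motion. A minimal sketch of that clamp, assuming i counts elements already consumed and that backwards motion takes blocks from the end of the dimension; `next_block_size` and `block_bounds` are hypothetical helpers, not the BLIS implementation of bli_acquire_mpart_mdim().

```c
/* Hypothetical helper: size of the next block when partitioning a
   dimension of length m after i elements have already been consumed.
   The requested blocksize b is clamped to the remaining extent. */
long next_block_size(long i, long m, long b)
{
    long remaining = m - i;
    return b < remaining ? b : remaining;
}

/* Forward motion: block is [i, i + len).
   Backward motion: block is [m - i - len, m - i).
   The clamp is applied before either path is chosen, so both
   directions see the same (valid) block length. */
void block_bounds(int backward, long i, long m, long b,
                  long *start, long *len)
{
    long bs = next_block_size(i, m, b);
    *len = bs;
    *start = backward ? m - i - bs : i;
}
```

For m = 10, b = 4, and i = 8, only 2 elements remain: the forward block starts at 8 and the backward block starts at 0, each of length 2. Without the clamp, the backward start would be computed as -2.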
Field G. Van Zee
fb8bef9982 Fixed copy-paste bug in bli_spackm_6xk_bb4_ref().
Details:
- Fixed a copy-paste bug in the new bli_spackm_6xk_bb4_ref() that
  manifested as failures in single-precision real level-3 operations.
  Also replaced the duplication factor constants with a const-qualified
  variable, dfac, so that this won't happen again.
- Changed NC for single-precision real from 4080 to 8160 so that the
  packed matrix B will have the same byte footprint in both single
  and double real.
2019-11-14 13:05:28 -06:00
Field G. Van Zee
8f399c8940 Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
2019-11-12 15:32:57 -06:00
Devrajegowda, Kiran
b5475f527d Adding context initialisation for SUP kernels in zen2 architecture
Change-Id: I9de533abb039d0dff348728be51554cc53679d10
2019-11-12 13:59:26 +05:30
Field G. Van Zee
bdc7ee3394 Various fixes to support packing duplication in B.
Details:
- Added cpp macros to trmm and trmm3 front-ends to optionally force
  those operations to be cast so the structured matrix is on the left.
  symm and hemm already had such macros, but these too were renamed so
  that the macros were individual to the operation. We now have four
  such macros:
    #define BLIS_DISABLE_HEMM_RIGHT
    #define BLIS_DISABLE_SYMM_RIGHT
    #define BLIS_DISABLE_TRMM_RIGHT
    #define BLIS_DISABLE_TRMM3_RIGHT
  Also, updated the comments in the symm and hemm front-ends related to
  the first two macro guards, and added corresponding comments to the
  trmm and trmm3 front-ends for the latter two guards. (They all
  functionally do the same thing, just for their specific operations.)
  Thanks to Jeff Hammond for reporting the bugs that led me to this
  change (via #359).
- Updated config/old/haswellbb subconfiguration (used to debug issues
  related to duplicating B during packing) to register: a packing
  kernel for single-precision real; gemmbb ukernels for s, c, and z;
  trsmbb ukernels for s, c, and z; gemmtrsmbb virtual ukernels for s, c,
  and z; and to use non-default cache and register blocksizes for s, c,
  and z datatypes. Also declared prototypes for all of the gemmbb,
  trsmbb, and gemmtrsmbb ukernel functions within the
  bli_cntx_init_haswellbb() function. This should, once applied to the
  power9 configuration, fix the remaining issues in #359.
- Defined bli_spackm_6xk_bb4_ref(), which packs single reals with a
  duplication factor of 4. This function is defined in the same file as
  bli_dpackm_6xk_bb2_ref() (bli_packm_cxk_bb_ref.c).
2019-11-11 15:47:17 -06:00
Field G. Van Zee
0eb79ca850 Avoid unused variable warning in lread.c (#356).
Details:
- Replaced the line

    f = f;

  with

    ( void )f;

  for the unused variable 'f' in blastest/f2c/lread.c. (Hopefully)
  addresses issue #356, but since we don't use xlc who knows. Thanks
  to Jeff Hammond for reporting this.
2019-11-08 14:48:48 -06:00
Jérôme Duval
f377bb4485 Add Haiku to the known OS list (#361) 2019-11-07 16:39:29 -06:00
Field G. Van Zee
e29b1f9706 Fixed failing testsuite gemmtrsm_ukr for power9.
Details:
- Added code that fixes false failures in the gemmtrsm_ukr module of the
  testsuite. The tests were failing because the computation (bli_gemv())
  that performs the numerical check was not able to properly traverse
  the matrix operands bx1 and b11 that are views into the micropanel of
  B, which has duplicated/broadcast elements under the power9 subconfig.
  (For example, a micropanel of B with duplication factor of 2 needs to
  use a column stride of 2; previously, the column stride was being
  interpreted as 1.)
- Defined separate bli_obj_set_row_stride() and bli_obj_set_col_stride()
  static functions in bli_obj_macro_defs.h. (Previously, only the
  function bli_obj_set_strides() was defined. Amazing to think that we
  got this far without these former functions.)
- Updated/expounded upon comments.
2019-11-05 17:15:19 -06:00
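To see why a duplicated micropanel needs a column stride equal to the duplication factor, consider a sketch that assumes the k x nr micropanel is stored row by row with each element repeated dfac times. Under that layout the row stride is nr*dfac and the column stride is dfac; reading with column stride 1 lands on a duplicate of the previous element. `fill_dup_panel` and `get_elem` are hypothetical helpers for illustration only.

```c
#include <stddef.h>

/* Generic strided element access, analogous to viewing a buffer with
   explicit row and column strides. */
double get_elem(const double *p, size_t i, size_t j, size_t rs, size_t cs)
{
    return p[i * rs + j * cs];
}

/* Hypothetical: fill a k x nr micropanel in which element (i, j),
   with value 10*i + j, is stored dfac times consecutively. */
void fill_dup_panel(size_t k, size_t nr, size_t dfac, double *p)
{
    for (size_t i = 0; i < k; ++i)
        for (size_t j = 0; j < nr; ++j)
            for (size_t d = 0; d < dfac; ++d)
                p[(i * nr + j) * dfac + d] = (double)(10 * i + j);
}
```

With k = 2, nr = 3, and dfac = 2, reading element (0, 1) using rs = 6 and cs = 2 returns 1, as expected, whereas cs = 1 returns 0, the duplicated copy of element (0, 0). That misreading is the kind of false failure the commit describes.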