Commit Graph

1684 Commits

Author SHA1 Message Date
Field G. Van Zee
b495ca9b76 Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
  Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
  this since Eigen's headers don't define implementations to the
  standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
  driver files, since nothing specific to Eigen is needed at
  compile-time for those operations.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
728e9666e1 Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
  EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
  library is not necessary.) However, as of Eigen 3.3.7, Eigen only
  parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
  any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
  (bli_does_trans()) was being called to determine whether the object
  for matrix A should be created for a left- or right-side case. This
  was corrected by changing the function to bli_is_left(), as is done
  in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
2019-08-23 14:18:08 +05:30
Isuru Fernando
686aa860f2 Fix clang version detection (#305)
clang -dumpversion gives 4.2.1 for all clang versions as clang was
originally compatible with gcc 4.2.1

Apple clang version and clang version are two different things
and the real clang version cannot be deduced from apple clang version
programatically. Rely on wikipedia to map apple clang to clang version

Also fixes assembly detection with clang

clang 3.8 can't build knl as it doesn't recognize zmm0
2019-08-23 14:18:08 +05:30
Field G. Van Zee
a9270bef26 Allow disabling of BLAS prototypes at compile-time.
Details:
- Modified bli_blas.h so that:
  - By default, if the BLAS layer is enabled at configure-time, BLAS
    prototypes are also enabled within blis.h;
  - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
    blis.h, BLAS prototypes are skipped over entirely so that, for
    example, the application or some other header pulled in by the
    application may prototype the BLAS functions without causing any
    duplication.
- Updated docs/BuildSystem.md to document the feature above, and
  related text.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
a1c8b11b3f Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
366c4b14c0 More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
  and reworded the sentence after about normalization.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
38e2180bd8 CHANGELOG update (0.5.2) 2019-08-23 14:18:08 +05:30
Field G. Van Zee
14bc42f3f7 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
cd81a6a50a Very minor tweaks to Performance.md. 2019-08-23 14:18:08 +05:30
Field G. Van Zee
6385a3ed51 Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
  apparently the result of copy-pasting.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
25db9033d9 Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
c6793be46e Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
b503627a68 Adjusted cache blocksizes for zen subconfig.
Details:
- Adjusted the zen sub-configuration's cache blocksizes for float,
  scomplex, and dcomplex based on the existing values for double.
  (The previous values were taken directly from the haswell subconfig,
  which targets Intel Haswell/Broadwell/Skylake systems.)
2019-08-23 14:18:07 +05:30
Field G. Van Zee
2d1cd32c0f Renamed --enable-export-all to --export-shared=[].
Details:
- Replaced the existing --enable-export-all / --disable-export-all
  configure option with --export-shared=[public|all], with the 'public'
  instance of the latter corresponding to --disable-export-all and the
  'all' instance corresponding to --enable-export-all. Nothing else
  semantically about the option, or its default, has changed.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
3e16033ead Updates to docs/Multithreading.md.
Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
  disabled by default; and (b) even with multithreading enabled, the
  user must specify multithreading at runtime in order to observe
  parallelism. Thanks to M. Zhou for suggesting these clarifications
  in #292.
- Also made explicit that only the environment variable and global
  runtime API methods are available when using the BLAS API. If the
  user wishes to use the local runtime API (specify multithreading on
  a per-call basis), one of the native BLIS APIs must be used.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
9ea223df2d Use -fvisibility=[...] with clang on Linux/BSD/OSX.
Details:
- Modified common.mk to use the -fvisibility=[hidden|default] option
  when compiling with clang on non-Windows platforms (Linux, BSD, OS X,
  etc.). Thanks to Isuru Fernando for pointing out this option works
  with clang on these OSes.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
40b392de3f Annotated additional symbols for export.
Details:
- Added export annotations to additional function prototypes in order to
  accommodate the testsuite.
- Disabled calling bli_amaxv_check() from within the testsuite's
  test_amaxv.c.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
d8f6d17bce Support shared lib export of only public symbols.
Details:
- Introduced a new configure option, --enable-export-all, which will
  cause all shared library symbols to be exported by default, or,
  alternatively, --disable-export-all, which will cause all symbols to
  be hidden by default, with only those symbols that are annotated for
  visibility, via BLIS_EXPORT_BLIS (and BLIS_EXPORT_BLAS for BLAS
  symbols), to be exported. The default for this configure option is
  --disable-export-all. Thanks to Isuru Fernando for consulting on
  this commit.
- Removed BLIS_EXPORT_BLIS annotations from frame/1m/bli_l1m_unb_var1.h,
  which was intended for 5a5f494.
- Relocated BLIS_EXPORT-related cpp logic from bli_config.h.in to
  frame/include/bli_config_macro_defs.h.
- Provided appropriate logic within common.mk to implement variable
  symbol visibility for gcc, clang, and icc (to the extend that each of
  these compilers allow).
- Relocated --help text associated with debug option (-d) to configure
  slightly further down in the list.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
0709ccd187 Removed export macros from all internal prototypes.
Details:
- After merging PR #303, at Isuru's request, I removed the use of
  BLIS_EXPORT_BLIS from all function prototypes *except* those that we
  potentially wish to be exported in shared/dynamic libraries. In other
  words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
  functions that can be considered private or for internal use only.
  This is likely the last big modification along the path towards
  implementing the functionality spelled out in issue #248. Thanks
  again to Isuru Fernando for his initial efforts of sprinkling the
  export macros throughout BLIS, which made removing them where
  necessary relatively painless. Also, I'd like to thank Tony Kelman,
  Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
  participating in the initial discussion in issue #37 that was later
  summarized and restated in issue #248.
- CREDITS file update.
2019-08-23 14:18:07 +05:30
Isuru Fernando
dcb0f50455 Export functions without def file (#303)
* Revert "restore bli_extern_defs exporting for now"

This reverts commit 09fb07c350b2acee17645e8e9e1b8d829c73dca8.

* Remove symbols not intended to be public

* No need of def file anymore

* Fix whitespace

* No need of configure option

* Remove export macro from definitions

* Remove blas export macro from definitions
2019-08-23 14:18:07 +05:30
Field G. Van Zee
da590746b0 Renamed test/3m4m to test/3.
Details:
- Renamed '3m4m' directory to '3', which captures the directory nicely
  since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
  1m) induced methods long ago, hence the name change.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
2ebe4aafe5 More minor updates and edits to test/3m4m.
Details:
- Further updates to matlab scripts, mostly for compatibility with
  GNU Octave.
- More tweaks to runme.sh.
- Updates to runme.m that allow copy-paste into matlab interactive
  session to generate graphs.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
09ed72c5a7 Very minor updates to test/3m4m for ul252.
Details:
- Very minor updates to the newly revamped test/3m4m drivers when used
  on a Xeon Platinum (SkylakeX).
2019-08-23 14:18:07 +05:30
Field G. Van Zee
8beda64ea5 Overhauled test/3m4m Makefile and scripts.
Details:
- Rewrote much of Makefile to generate executables for single- and dual-
  socket multithreading as well as single-threaded. Each of the three
  can also use a different problem size range/increment, as is often
  appropriate when doubling/halving the number of threads.
- Rewrote runme.sh script to flexibly execute as many threading
  parameter scenarios as is given in the input parameter string
  (currently set within the script itself). The string also encodes
  the maximum problem size for each threading scenario, which is used
  to identify the executable to run. Also improved the "progress" output
  of the script to reduce redundant info and improve readability in
  terminals that are not especially wide.
- Minor updates to test_*.c source files.
- Updated matlab scripts according to changes made to the Makefile,
  test drivers, and runme.sh script, and renamed 'plot_all.m' to
  'runme.m'.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
7918a6deca Updates (from ls5) to test/3m4m/runme.sh.
Details:
- Lonestar5-specific updates to runme.sh.
2019-08-23 14:18:07 +05:30
Isuru Fernando
e5fc00a2e7 Add symbol export macro for all functions (#302)
* initial export of blis functions

* Regenerate def file for master

* restore bli_extern_defs exporting for now
2019-08-23 14:18:07 +05:30
Field G. Van Zee
56286b4729 Updated level-3 BLAS to call object API directly.
Details:
- Updated the BLAS compatibility layer for level-3 operations so that
  the corresponding BLIS object API is called directly rather than first
  calling the typed BLIS API. The previous code based on the typed BLIS
  API calls is still available in a deactivated cpp macro branch, which
  may be re-activated by #defining BLIS_BLAS3_CALLS_TAPI. (This does not
  yet correspond to a configure option. If it seems like people might
  want to toggle this behavior more regularly, a configure option can be
  added in the future.)
- Updated the BLIS typed API to statically "pre-initialize" objects via
  new initializor macros. Initialization is then finished via calls to
  static functions bli_obj_init_finish_1x1() and bli_obj_init_finish(),
  which are similar to the previously-called functions,
  bli_obj_create_1x1_with_attached_buffer() and
  bli_obj_create_with_attached_buffer(), respectively. (The BLAS
  compatibility layer updates mentioned above employ this new technique
  as well.)
- Transformed certain routines in bli_param_map.c--specifically, the
  ones that convert netlib-style parameters to BLIS equivalents--into
  static functions, now in bli_param_map.h. (The remaining three classes
  of conversation routines were left unchanged.)
- Added the aforementioned pre-initializor macros to bli_type_defs.h.
- Relocated bli_obj_init_const() and bli_obj_init_constdata() from
  bli_obj_macro_defs.h to bli_type_defs.h.
- Added a few macros to bli_param_macro_defs.h for testing domains for
  real/complexness and precisions for single/double-ness.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
84282bba54 Updates to 3m4m/matlab scripts.
Details:
- Minor updates to matlab graph-generating scripts.
- Added a plot_all.m script that is more of a scratchpad for copying and
  pasting function invocations into matlab to generate plots that are
  presently of interest to us.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
29d5bcb1c8 Changed unsafe-loop to unsafe-math optimizations.
Details:
- Changed -funsafe-loop-optimizations (re-)introduced in 7690855 for
  make_defs.mk files' CRVECFLAGS to -funsafe-math-optimizations (to
  account for a miscommunication in issue #300). Thanks to Dave Love
  for this suggestion and Jeff Hammond for his feedback on the topic.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
9a42c1a323 Restored -funsafe-loop-optimizations to subconfigs.
Details:
- Restored use of -funsafe-loop-optimizations in the definitions of
  CRVECFLAGS (when using gcc), but only for sub-configurations (and
  not configuration families such as amd64, intel64, and x86_64).
  This more or less reverts 5190d05 and 6cf1550.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
e62bdd4df1 Disable TBM, XOP, LWP instructions in AMD configs.
Details:
- Added -mno-tbm -mno-xop -mno-lwp to CKVECFLAGS in bulldozer,
  piledriver, steamroller, and excavator configurations to explicitly
  disable AMD's bulldozer-era TBM, XOP, and LWP instruction sets in an
  attempt to fix the invalid instruction error that has plagued Travis
  CI builds since 6a014a3. Thanks to Devin Matthews for pointing out
  that the offending instruction was part of TBM (issue #300).
- Restored -O3 to piledriver configuration's COPTFLAGS.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
e7b73bf1ed Reverted piledriver COPTFLAGS from -O3 to -O2.
Details:
- Debugging continues; changing COPTFLAGS for piledriver subconfig from
  -O3 to -O2, its original value prior to 6a014a3.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
176e4c6860 Removed -funsafe-loop-optimizations from all configs.
Details:
- Error persists. Removed -funsafe-loop-optimizations from all remaining
  sub-configurations.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
24adee071c Removed -funsafe-loop-optimizations from piledriver.
Details:
- Error persists; continuing debugging from bf0fb78c by removing
  -funsafe-loop-optimizations from piledriver configuration.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
7128d4b94b Removed -funsafe-loop-optimizations from families.
Details:
- Removed -funsafe-loop-optimizations from the configuration families
  affected by 6a014a3, specifically: intel64, amd64, and x86_64.
  This is part of an attempt to debug why the sde, as executed by
  Travis CI, is crashing via the following error:

    TID 0 SDE-ERROR: Executed instruction not valid for specified chip
    (ICELAKE): 0x9172a5: bextr_xop rax, rcx, 0x103
2019-08-23 14:18:07 +05:30
Field G. Van Zee
b7c4f1e305 Standardized optimization flags in make_defs.mk.
Details:
- Per Dave Love's recommendation in issue #300, this commit defines
    COPTFLAGS := -03
  and
    CRVECFLAGS := $(CKVECFLAGS) -funsafe-loop-optimizations
  in the make_defs.mk for all Intel- and AMD-based configurations.
2019-08-23 14:18:07 +05:30
Meghana
fdce1a5648 changed gcc version check condition from 'ifeq' to 'if greater or equal'
Change-Id: Ie4c461867829bcc113210791bbefb9517e52c226
AOCL2.0 2.0
2019-07-24 15:04:41 +05:30
Meghana
c9486e0c4f code to detect version of gcc and set flags accordingly for zen2
Change-Id: I29b0311d0000dee1a2533ee29941acf53f9e9f34
2019-07-24 09:45:17 +05:30
Meghana
dcc0ce12fd Added a global Makefile for AMD architectures in config/zen folder
This Makefile(amd_config.mk) has all the flags that are common to EPYC series

Change-Id: Ic02c60a8293ccdd37f0f292e631acd198e6895de
2019-07-22 17:12:01 +05:30
Meghana Vankadari
b84cee29f4 Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0 2019-07-08 02:03:07 -04:00
kdevraje
1f80858abf This checkin solves the dgemm performance issue jira ticket CPUPL 458, as #else was missed during integration, it was always following else path to get the block sizes
Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717
2019-07-05 16:05:11 +05:30
Meghana
c7dd6e6cd2 Added compiler flags for vanilla clang
Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab
2019-07-04 09:32:51 +05:30
Meghana
2acd49b764 fix for test failures using AOCC 2.0
Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1
2019-07-01 15:44:07 +05:30
kdevraje
cac127182d Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis
with public repo commit id 565fa3853b.

Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42
2019-06-24 14:05:54 +05:30
Kiran Devrajegowda
3a45ecb154 Merge "Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup" into amd-staging-rome2.0 2019-05-31 06:47:02 -04:00
Kiran Varaganti
b69fb0b74a Added back BLIS_ENABLE_ZEN_BLOCK_SIZES macro to zen configuration, this is same as release 1.3. This was added before to improve DGEMM Multithreaded scalability on Naples for when number of threads is greater than 16. By mistake this got deleted in many changes done for 2.0 release, now we are adding this change back., in bli_gemm_front.c - code cleanup
Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc
2019-05-31 15:14:22 +05:30
kdevraje
3f867c96ca When running HPL with pure MPI without DGEMM Threading (Single Threaded BLIS ), making this macro 1 gives best performance.wq
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
2019-05-31 14:31:49 +05:30
kdevraje
13806ba3b0 This check in has changes w.r.t Copyright information, which is changed to (start year) - 2019
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
Meghana
ee123f5358 Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-05-27 15:36:44 +05:30
prangana
9d93a4caa2 update version 2.0 2019-05-24 17:59:13 +05:30