1886 Commits

Author SHA1 Message Date
Field G. Van Zee
b5e9bce4dd Updated -march flags for sandybridge, haswell.
Details:
- Updated the '-march=corei7-avx' flag in the sandybridge subconfig
  to '-march=sandybridge' and the '-march=core-avx2' flag in the
  haswell subconfig to '-march=haswell'. The older flags were used
  by older versions of gcc and should have been updated to the newer
  forms a long time ago. (The older flags were clearly working, even
  though they are no longer documented in the gcc man page.)
2019-07-19 14:42:37 -05:00
Field G. Van Zee
c22b9dba58 More updates to comments in testsuite modules.
Details:
- Updated most comments in testsuite modules that describe how the
  correctness test is performed so that it is clear whether the vector
  (normfv) or matrix (normfm) form of Frobenius norm is used.
2019-07-16 13:14:47 -05:00
Field G. Van Zee
c4cc6fa702 New cntx_t blksz "set" functions + misc tweaks.
Details:
- Defined two new static functions in bli_cntx.h:
    bli_cntx_set_blksz_def_dt()
    bli_cntx_set_blksz_max_dt()
  which developers may find convenient when experimenting with different
  values of cache blocksizes.
- Updated one- and two-socket multithreaded problem size range and
  increment values in test/3/Makefile.
- Changed default to column storage in test/3/test_gemm.c.
- Fixed typo in comment in testsuite/src/test_subm.c.
2019-07-16 13:00:35 -05:00
Meghana Vankadari
b84cee29f4 Merge "Added compiler flags for vanilla clang" into amd-staging-rome2.0 2019-07-08 02:03:07 -04:00
kdevraje
1f80858abf Fixes the dgemm performance issue (Jira ticket CPUPL 458): an #else was missed during integration, so the code always followed the else path to get the block sizes.
Change-Id: I0084b5856c2513ab1066c08c15b5086db6532717
2019-07-05 16:05:11 +05:30
Meghana
c7dd6e6cd2 Added compiler flags for vanilla clang
Change-Id: I13c00b4c0d65bbda4c929848fd48b0ab611952ab
2019-07-04 09:32:51 +05:30
Meghana
2acd49b764 fix for test failures using AOCC 2.0
Change-Id: If44eaccc64bbe96bbbe1d32279b1b5773aba08d1
2019-07-01 15:44:07 +05:30
Field G. Van Zee
ceee2f973e Fixed thrinfo_t printing bug for small problems.
Details:
- Fixed a bug in bli_l3_thrinfo_print_gemm_paths() and
  bli_l3_thrinfo_print_trsm_paths(), defined in bli_l3_thrinfo.c,
  whereby subnodes of the thrinfo_t tree are "dereferenced" near the
  beginning of the functions, which may lead to segfaults in certain
  situations where the thread tree was not fully formed because the
  matrix problem was too small for the level of parallelism specified.
  (That is, too small because some threads were assigned no work due
  to the smallest units in the m and n dimensions being defined by the
  register blocksizes mr and nr.) The fix requires several nested levels
  of if statements, and this is one of those few instances where use of
  goto statements results in (mostly) prettier code, especially in the
  case of _gemm_paths(). And while it wasn't necessary, I ported this
  goto usage to the loop body that prints the thrinfo_t work_id and
  comm_id values for each thread. Thanks to Nicholai Tukanov for helping
  to find this bug.
2019-06-24 17:47:40 -05:00
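The goto-based handling of a partially formed tree described in ceee2f973e can be sketched as follows. This is only an illustration under stated assumptions: node_t and print_path() are hypothetical stand-ins, not the BLIS thrinfo_t type or the actual bli_l3_thrinfo_print_gemm_paths() code. Each level is dereferenced only after its parent has been checked, and the first missing level jumps to a common exit instead of adding another nested if block.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical stand-in for one node of a thread-info tree
   (not the BLIS thrinfo_t). */
typedef struct node_s
{
    int            work_id;
    struct node_s* sub;    /* NULL when the subtree was never formed */
} node_t;

/* Prints the work_id of up to three levels and returns how many levels
   were printed. A missing level jumps straight to the common exit. */
int print_path( const node_t* root )
{
    int           depth = 0;
    const node_t* jc    = root;
    const node_t* ic    = NULL;
    const node_t* jr    = NULL;

    if ( jc == NULL ) goto done;
    printf( "jc work_id: %d\n", jc->work_id );
    depth++;

    ic = jc->sub;
    if ( ic == NULL ) goto done;
    printf( "ic work_id: %d\n", ic->work_id );
    depth++;

    jr = ic->sub;
    if ( jr == NULL ) goto done;
    printf( "jr work_id: %d\n", jr->work_id );
    depth++;

done:
    printf( "levels printed: %d\n", depth );
    return depth;
}
```

Calling print_path() on a tree whose root has no subnode prints one level and returns, rather than dereferencing a NULL pointer.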
kdevraje
cac127182d Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis
with public repo commit id 565fa3853b.

Change-Id: I68b9824b110cf14df248217a24a6191b3df79d42
2019-06-24 14:05:54 +05:30
Field G. Van Zee
c152109e9a Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
  using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
  accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
  containing the mpnpkp results do not need to be preprocessed in order
  to plot half the problem size range (ie: up to 400 instead of the
  800 range of the other shape cases).
- Trivial updates to runme.m.
2019-06-19 13:23:24 -05:00
Field G. Van Zee
4d19c98110 Trivial change to MixedDatatypes.md link text. 2019-06-08 11:02:03 -05:00
Field G. Van Zee
24965beabe Fixed typo in README.md's MixedDatatypes.md link. 2019-06-08 11:00:22 -05:00
Field G. Van Zee
50dc5d9576 Adjust -fopenmp-simd for icc's preferred syntax.
Details:
- Use -qopenmp-simd instead of -fopenmp-simd when compiling with Intel
  icc. Recall that this option is used for SIMD auto-vectorization in
  reference kernels only. Support for the -f option has been completely
  deprecated and removed in newer versions of icc in favor of -q. Thanks
  to Victor Eijkhout for reporting this issue and suggesting the fix.
2019-06-07 13:10:16 -05:00
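For context on 50dc5d9576, the kind of loop that an openmp-simd option affects might look like the sketch below. It assumes the reference kernels mark loops with an OpenMP simd pragma; axpy_ref() is a hypothetical function written for illustration, not a BLIS kernel. Without -fopenmp-simd (gcc) or -qopenmp-simd (icc), the pragma is simply ignored and the loop still computes the same result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference-style kernel: y := y + alpha * x. */
void axpy_ref( size_t n, double alpha, const double* x, double* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    /* Hint to the compiler that the loop may be vectorized. It takes
       effect only when an openmp-simd option is enabled. */
    #pragma omp simd
    for ( size_t i = 0; i < n; ++i )
    {
        y[i] += alpha * x[i];
    }
}
```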
Field G. Van Zee
ad937db950 Added missing #include "bli_family_thunderx2.h".
Details:
- Added a cpp-conditional directive block to bli_arch_config.h that
  #includes "bli_family_thunderx2.h". The code has been missing since
  adf5c17f. However, this never manifested as an error because the file
  is virtually empty and not needed for thunderx2 (or most subconfigs).
  Thanks to Jeff Diamond for helping to spot this.
2019-06-07 11:34:08 -05:00
Field G. Van Zee
ce671917b2 Fixed formatting/typo in docs/PerformanceSmall.md. 2019-06-06 14:17:21 -05:00
Field G. Van Zee
86c33a4eb2 Tweaked language in README.md related to sup/AMD. 2019-06-05 11:43:55 -05:00
Field G. Van Zee
cbaa22e1ca Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.
2019-06-04 16:06:58 -05:00
Field G. Van Zee
763fa39c30 Minor tweaks to test/sup.
Details:
- Changed starting problem and increment from 16 to 4.
- Added 'lll' (square problems) to list of problem size shapes to
  compile and run with.
- Define BLASFEO location and added BLASFEO-related definitions.
2019-06-04 14:46:45 -05:00
Field G. Van Zee
5e1e696003 CHANGELOG update (0.6.0) 2019-06-03 18:37:20 -05:00
Field G. Van Zee
18c876b989 Version file update (0.6.0) 0.6.0 2019-06-03 18:37:19 -05:00
Field G. Van Zee
0f1b3bf49e ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
2019-06-03 18:35:19 -05:00
Field G. Van Zee
27da2e8400 Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
  Epyc sections.
- Added emphasis to certain passages.
2019-06-03 17:14:56 -05:00
Field G. Van Zee
09ba05c6f8 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-06-03 16:53:19 -05:00
Field G. Van Zee
6bf449cc69 Merge branch 'amd' 2019-05-31 17:42:40 -05:00
Field G. Van Zee
a4e8801d08 Increased MT sup threshold for double to 201.
Details:
- Fine-tuned the double-precision real MT threshold (which controls
  whether the sup implementation kicks in for smaller m dimension values)
  from 180 to 201 for haswell and 180 to 256 for zen.
- Updated octave scripts in test/sup/octave to include a seventh column
  to display performance for m = n = k.
2019-05-31 17:30:51 -05:00
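The effect of the thresholds set in a4e8801d08 can be pictured with the sketch below. The values 201 (haswell) and 256 (zen) come from the commit message. Everything else is an assumption made for illustration: the names subconfig_t, mt_threshold_double() and use_sup_for_m() are hypothetical, the comparison is assumed to be strict, and the real framework logic may consult other dimensions as well.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical identifiers for the two subconfigurations named in the
   commit message. */
typedef enum { SUBCONFIG_HASWELL, SUBCONFIG_ZEN } subconfig_t;

/* Double-precision real MT threshold per subconfiguration. */
long mt_threshold_double( subconfig_t c )
{
    return ( c == SUBCONFIG_ZEN ) ? 256 : 201;
}

/* Assumed rule: take the sup path when m is below the threshold. */
bool use_sup_for_m( subconfig_t c, long m )
{
    assert( m >= 0 );
    return m < mt_threshold_double( c );
}
```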
Kiran Devrajegowda
3a45ecb154 Merge "Added back the BLIS_ENABLE_ZEN_BLOCK_SIZES macro to the zen configuration, as in release 1.3. It was originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16, and was deleted by mistake among the many changes made for the 2.0 release. Also, code cleanup in bli_gemm_front.c." into amd-staging-rome2.0 2019-05-31 06:47:02 -04:00
Kiran Varaganti
b69fb0b74a Added back the BLIS_ENABLE_ZEN_BLOCK_SIZES macro to the zen configuration, as in release 1.3. It was originally added to improve multithreaded DGEMM scalability on Naples when the number of threads is greater than 16, and was deleted by mistake among the many changes made for the 2.0 release. Also, code cleanup in bli_gemm_front.c.
Change-Id: I9f5d8225254676a99c6f2b09a0825e545206d0fc
2019-05-31 15:14:22 +05:30
kdevraje
3f867c96ca When running HPL with pure MPI without DGEMM threading (single-threaded BLIS), setting this macro to 1 gives the best performance.
Change-Id: I24fd0bf99216f315e49f1c74c44c3feaffd7078d
2019-05-31 14:31:49 +05:30
Field G. Van Zee
abd8a9fa7d Inadvertently hidden xerbla_() in blastest (#313).
Details:
- Attempted a fix to issue #313, which reports that when building only
  a shared library (ie: static library build is disabled), running the
  BLAS test drivers can fail because those drivers provide their own
  local version of xerbla_() as a clever (albeit still rather hackish)
  way of checking the error codes that result from the individual tests.
  This local xerbla_() function is never found at link-time because the
  BLAS test drivers' Makefile imports BLIS compilation flags via the
  get-user-cflags-for() function, which currently conveys the
  -fvisibility=hidden flag, which hides symbols unless they are
  explicitly annotated for export. The -fvisibility=hidden flag was
  only ever intended for use when building BLIS (not for applications),
  and so the attempted solution here is to omit the symbol export
  flag(s) from get-user-cflags-for() by storing the symbol export
  flag(s) to a new BUILD_SYMFLAGS variable instead of appending it
  to the subconfigurations' CMISCFLAGS variable (which is returned by
  every get-*-cflags-for() function). Thanks to M. Zhou for reporting
  this issue and also to Isuru Fernando for suggesting the fix.
- Renamed BUILD_FLAGS to BUILD_CPPFLAGS to harmonize with the newly
  created BUILD_SYMFLAGS.
- Fixed typo in entry for --export-shared flag in 'configure --help'
  text.
2019-05-28 12:49:44 -05:00
kdevraje
13806ba3b0 Updated copyright information to read (start year) - 2019.
Change-Id: Ide3c8f7172210b8d3538d3c36e88634ab1ba9041
2019-05-27 16:24:43 +05:30
Meghana
ee123f5358 Defined small matrix thresholds for TRSM for various cases for NAPLES and ROME
Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead added threshold conditions in zen kernels both for ROME and NAPLES

Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
2019-05-27 15:36:44 +05:30
prangana
9d93a4caa2 update version 2.0 2019-05-24 17:59:13 +05:30
Field G. Van Zee
755730608d Minor rewording of language around mt env. vars. 2019-05-23 17:34:36 -05:00
Field G. Van Zee
ba31abe73c Added BLIS threading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-05-23 14:59:53 -05:00
Field G. Van Zee
cb788ffc89 Increased MT sup threshold for double to 180.
Details:
- Increased the double-precision real MT threshold (which controls
  whether the sup implementation kicks in for smaller m dimension values)
  from 80 to 180, and this change was made for both haswell and zen
  subconfigurations. This is less about the m dimension in particular
  and more about facilitating a smoother performance transition when
  m = n = k.
2019-05-23 13:00:53 -05:00
Field G. Van Zee
057f5f3d21 Minor build system housekeeping.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
  level Makefiles. This variable is already set within common.mk, and
  so the only time it should be overridden is if the user wants to link
  to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
2019-05-23 12:51:17 -05:00
Meghana
e05171118c Implemented TRSM for small matrices for cases where A is on the right
Added separate kernels for zen and zen2

Change-Id: I6318ddc250cf82516c1aa4732718a35eae0c9134
2019-05-23 16:17:19 +05:30
kdevraje
02920f5c48 make checkblis fails with the matrix dimension check at the beginning, hence reverting it
Change-Id: Ibd2ee8c2d4914598b72003fbfc5845be9c9c1e87
2019-05-23 15:29:59 +05:30
kdevraje
84215022f2 Adding threshold condition to dgemm small matrix kernels, defining the constants in zen2 configuration
Change-Id: I53a58b5d734925a6fcb8d8bea5a02ddb8971fcd5
2019-05-23 14:33:47 +05:30
kdevraje
a3554eb1dc Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis to configure zen2
Change-Id: I97e17bca9716b80b862925f97bb513c07b4b0cae
2019-05-23 11:53:32 +05:30
kdevraje
ea082f8390 adding empty zen2 directory with .gitignore file
Change-Id: Ifa37cf54b2578aa19ad335372b44bca17043fe4b
2019-05-23 10:38:29 +05:30
Kiran Varaganti
b80bd5bcb2 config/zen/bli_cntx_init_zen.c: removed BLIS_ENABLE_ZEN_BLOCK_SIZES macro. We have different configurations for both zen and zen2
config/zen/bli_family_zen.h: deleted macro BLIS_ENABLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME family is 17H but model # is from 0x30H.
test/test_gemm.c - commented out #define FILE_IN_OUT (some compilation error when BLIS is configured as amd64)
Now we can use a single configuration, ./configure amd64, which works for both ROME and Naples

Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
2019-05-22 05:51:22 -04:00
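The family/model rule mentioned in b80bd5bcb2 ("ROME family is 17H but model # is from 0x30H") can be sketched as below. This is a minimal illustration, not the code in frame/base/bli_cpuid.c: is_rome_model() is a hypothetical name, only the lower bound on the model number is taken from the commit message, and any CPU feature checks that real detection would perform are left out.

```c
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical helper: true when the CPUID family is 17h and the model
   number is 0x30 or higher. No upper bound is applied because the
   commit message does not state one. */
bool is_rome_model( uint32_t family, uint32_t model )
{
    if ( family != 0x17 ) return false;

    return model >= 0x30;
}
```

Under this rule, a family 17h part with a model number below 0x30 is not treated as ROME.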
Kiran Varaganti
a042db011d Modified make_defs.mk for zen2 so that it compiles with gcc versions older than 9.0
Change-Id: I8fcac30538ee39534c296932639053b47b9a2d43
2019-05-22 05:51:10 -04:00
Kiran Varaganti
a23f92594c config_registry: New AMD zen2 architecture configuration added.
frame/base/bli_arch.c: #ifdef BLIS_FAMILY_ZEN2 id = BLIS_ARCH_ZEN2; #endif added. zen2 is added in config_name[BLIS_NUM_ARCHS]
  frame/base/bli_cpuid.c : #ifdef BLIS_CONFIG_ZEN2 if ( bli_cpuid_is_zen2( family, model, features ) ) return BLIS_ARCH_ZEN2; #endif, defined new function bool bli_cpuid_is_zen2(...).
  frame/base/bli_cpuid.h : declared bli_cpuid_is_zen2(..).
  frame/base/bli_gks.c : #ifdef BLIS_CONFIG_ZEN2 bli_gks_register_cntx(BLIS_ARCH_ZEN2, bli_cntx_init_zen2, bli_cntx_init_zen2_ref, bli_cntx_init_zen2_ind); #endif
  frame/include/bli_arch_config.h : #ifdef BLIS_CONFIG_ZEN2 CNTX_INIT_PROTS(zen2) #endif #ifdef BLIS_FAMILY_ZEN2 #include "bli_family_zen2.h" #endif
  frame/include/bli_type_defs.h : added BLIS_ARCH_ZEN2 in arch_t enum. BLIS_NUM_ARCHS 20

Change-Id: I2a2d9b7266673e78a4f8543b1bfb5425b0aa7866
2019-05-22 05:28:16 -04:00
kdevraje
17b878b66d adding license same as in ut-austin-amd-branch
Change-Id: I6790768d2bf5d42369d304ef93e34701f95fbaff
2019-05-22 14:02:53 +05:30
kdevraje
df755848b8 Merge branch 'amd-staging-rome2.0' of ssh://git.amd.com:29418/cpulibraries/er/blis into rome2.0
Change-Id: Ie8aad1ab810f0f3c0b90ec67f9dd3dfb8dcc74cc
2019-05-22 13:30:07 +05:30
Nisanth M P
c72ae27ade Re-enabling the small matrix gemm optimization for target zen
Change-Id: I13872784586984634d728cd99a00f71c3f904395
2019-05-22 01:05:13 -04:00
sraut
ab0818af80 Review comments incorporated for small TRSM.
Change-Id: Ia64b7b2c0375cc501c2cb0be8a1af93111808cd9
2019-05-22 00:43:10 -04:00
Jeff Hammond
32392cfc72 add info about CXX in configure (#311) 2019-05-14 14:52:30 -05:00
Field G. Van Zee
fa7e6b182b Define _POSIX_C_SOURCE in bli_system.h.
Details:
- Added
    #ifndef _POSIX_C_SOURCE
    #define _POSIX_C_SOURCE 200809L
    #endif
  to bli_system.h so that an application that uses BLIS (specifically,
  an application that #includes blis.h) does not need to remember to
  #define the macro itself (either on the command line or in the code
  that includes blis.h) in order to activate things like pthreads.
  Thanks to Christos Psarras for reporting this issue and suggesting
  this fix.
- Commented out #include <sys/time.h> in bli_system.h, since I don't
  think this header is used/needed anymore.
- Comment update to function macro for bli_?normiv_unb_var1() in
  frame/util/bli_util_unb_var1.c.
2019-05-01 19:13:00 -05:00
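The guard added in fa7e6b182b can be seen in isolation in the sketch below. The three preprocessor lines are the ones quoted in the commit message. The rest is an assumption for illustration: copy_name() is a hypothetical function, and strdup() is used only as a convenient example of a POSIX.1-2008 interface that a strict ISO C mode may not declare unless the feature-test macro is defined before the first system header.

```c
/* The feature-test macro must be defined before any system header is
   included. The guard avoids redefining it if the build already set it. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200809L
#endif

#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper that relies on a POSIX-only declaration. The
   caller owns the returned buffer and must free() it. */
char* copy_name( const char* s )
{
    assert( s != NULL );
    return strdup( s );
}
```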