16 Commits

Author SHA1 Message Date
Field G. Van Zee
c84391314d Reverted minor temp/wspace changes from b426f9e.
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
  directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
2019-11-04 13:57:12 -06:00
Nicholai Tukanov
b426f9e04e POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel 
  assumes that elements of B have been duplicated/broadcast during the
  packing step. The microkernel uses a column orientation for its 
  microtile vector registers and thus implements column storage and 
  general stride IO cases. (A row storage IO case via in-register
  transposition may be added at a future date.) It should be noted that 
  we recommend using this microkernel with gcc and *not* xlc, as issues 
  with the latter cropped up during development, including but not 
  limited to slightly incompatible vector register mnemonics in the GNU 
  extended inline assembly clobber list.
2019-11-01 17:57:03 -05:00
Field G. Van Zee
29b0e1ef4e Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
  into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
  inadvertently not incremented when the Zen2 subconfiguration was
  added.
- In bli_gemm_front(), added a missing conditional constraint around the
  call to bli_gemm_small() that ensures that the computation precision
  of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
  that existed around the call to bli_syrk_small() into bli_syrk_small()
  to minimize the calling code footprint and also to bring that code
  into stylistic harmony with similar code in bli_gemm_front() and
  bli_trsm_front(). Also, replaced direct access to obj_t fields with
  proper static accessor functions (e.g. 'a->dim[0]' becomes
  'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
  bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
  strictly speaking unnecessary, but it serves as a useful visual cue to
  those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
  bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
  version check for availability of -march=znver2, and added appropriate
  support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
  config/zen/amd_config.mk, including: removal of -march=znver1 et al.
  from CKVECFLAGS (since the -march flag is added within make_defs.mk);
  setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
  added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
  set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
2019-10-11 10:24:24 -05:00
Field G. Van Zee
80e6c10b72 Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
  PerformanceSmall.md that briefly nudges the motivated reader in the
  right direction if he/she wishes to run the same performance
  benchmarks used to produce the graphs shown in those documents.
  Thanks to Dave Love for making this suggestion.
2019-08-29 12:12:08 -05:00
Field G. Van Zee
b02e0aae8c Updated test drivers to iterate backwards.
Details:
- Updated test driver source in test, test/3, test/1m4m, and
  test/mixeddt to iterate through the problem space backwards. This
  can help avoid certain situations where the CPU frequency does not
  immediately throttle up to its maximum. Thanks to Robert van de
  Geijn for recommending this fix (originally made to test/sup drivers
  in 57e422a).
- Applied off-by-one matlab output bugfix from b6017e5 to test drivers
  in test, test/3, test/1m4m, and test/mixeddt directories.
2019-08-27 14:37:46 -05:00
Field G. Van Zee
c4cc6fa702 New cntx_t blksz "set" functions + misc tweaks.
Details:
- Defined two new static functions in bli_cntx.h:
    bli_cntx_set_blksz_def_dt()
    bli_cntx_set_blksz_max_dt()
  which developers may find convenient when experimenting with different
  values of cache blocksizes.
- Updated one- and two-socket multithreaded problem size range and
  increment values in test/3/Makefile.
- Changed default to column storage in test/3/test_gemm.c.
- Fixed typo in comment in testsuite/src/test_subm.c.
2019-07-16 13:00:35 -05:00
Field G. Van Zee
057f5f3d21 Minor build system housekeeping.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
  level Makefiles. This variable is already set within common.mk, and
  so the only time it should be overridden is if the user wants to link
  to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
2019-05-23 12:51:17 -05:00
Field G. Van Zee
74e513eb6a Support row storage in Eigen gemm test/3 driver.
Details:
- Added preprocessor branches to test/3/test_gemm.c to explicitly
  support row-stored matrices. Column-stored matrices are still
  supported (and remain the default for now). (This is mainly work
  left over from the initial integration of Eigen into the test
  drivers, so if we ever want to test Eigen with row-stored matrices,
  the code will be ready to use, even if it is not yet integrated into
  the Makefile in test/3.)
2019-04-17 13:34:44 -05:00
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00
Field G. Van Zee
bfac7e385f Added ability to plot with Eigen in test/3/matlab.
Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
  Eigen performance curves. Whether Eigen is plotted is determined by
  a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
  plot_panel_4x5() function (with Eigen plotting enabled).
2019-03-27 16:04:48 -05:00
Field G. Van Zee
67535317b9 Fixed mislabeled eigen output from test/3 drivers.
Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
  the matlab output variables from Eigen-linked hemm, herk, trmm, and
  trsm driver output as "vendor". (The gemm drivers were already
  correctly outputting matlab variables containing the "eigen" label.)
2019-03-27 13:32:18 -05:00
Field G. Van Zee
5e6b160c8a Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
  Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
  this since Eigen's headers don't define implementations of the
  standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
  driver files, since nothing specific to Eigen is needed at
  compile-time for those operations.
2019-03-26 19:10:59 -05:00
Field G. Van Zee
92fb9c87bf Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
  EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
  library is not necessary.) However, as of Eigen 3.3.7, Eigen only
  parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
  any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
  (bli_does_trans()) was being called to determine whether the object
  for matrix A should be created for a left- or right-side case. This
  was corrected by changing the function to bli_is_left(), as is done
  in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
2019-03-26 15:43:23 -05:00
Field G. Van Zee
288843b06d Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  built by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-03-20 17:52:23 -05:00
Field G. Van Zee
913cf97653 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-03-19 16:15:24 -05:00
Field G. Van Zee
b938c16b0c Renamed test/3m4m to test/3.
Details:
- Renamed '3m4m' directory to '3', which better describes the directory
  since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
  1m) induced methods long ago, hence the name change.
2019-03-07 16:40:39 -06:00