Commit Graph

26 Commits

Author SHA1 Message Date
Field G. Van Zee
f065a8070f Removed support for 3m, 4m induced methods.
Details:
- Removed support for all induced methods except for 1m. This included
  removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
  code that existed only to support those implementations. These
  implementations were rarely used and posed code maintenance challenges
  for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
  and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
  and streamlined the remaining code in that directory. The *ind(),
  *nat(), and *1m() APIs were all removed. (These additional API layers
  no longer made as much sense with only one induced method (1m) being
  supported.) The bli_ind.c file (and header) were moved to frame/base
  and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
  frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
  pointer arithmetic that was previously expressed in terms of the
  bli_ptr_inc_by_frac() static inline function (whose definition was
  also removed).
- Removed the following subdirectories of level-0 macro headers from
  frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
  defined in these directories were used exclusively for 3m and 4m
  method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
  light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
  accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
  variants of the 3m and 4m methods. This leaves two bits unused within
  the pack format portion of the schema bitfield. (See bli_type_defs.h
  for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
  into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
  and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
  operations' _front() functions to the corresponding _ex() function of
  the object API. (This change roughly maintains where the _check()
  functions are called in the call stack but lays the groundwork for
  future changes that may come to the level-3 object APIs.) Minor
  modifications to bli_l3_check.c to allow the check() functions to be
  called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
  induced methods, and updated the standalone test drivers in the 'test'
  directory so reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
  bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
  of the *nat() functions no longer existing.) Also updated the existing
  'power10' and 'gemmlike' sandboxes to come into compliance with the
  new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
  to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
  bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
2021-10-28 16:05:43 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackaton.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Field G. Van Zee
82af05f54c Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
  entry within Performance.md, and also updated the experiment details
  accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
  experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
  under which the Fugaku results were gathered, courtesy of RuQing Xu.
2021-05-25 15:25:08 -05:00
Field G. Van Zee
ba3ba8da83 Minor updates and fixes to test/3/octave scripts.
Details:
- Fixed an issue where the wrong string was being passed in for the
  vendor legend string.
- Changed the graph in which the legends appear.
- Updates to runthese.m.
2021-04-06 18:39:58 -05:00
Field G. Van Zee
159ca6f01a Made test/3/octave scripts robust to missing data.
Details:
- Modified the octave scripts in test/3 so that the script does not
  choke when one or more of the expected OpenBLAS, Eigen, or vendor data
  files is missing. (The BLIS data set, however, must be complete.) When
  a file is missing, that data series is simply not included on that
  particular graph. Also factored out a lot of the redundant logic from
  plot_panel_4x5.m into a separate function in read_data.m.
2021-03-24 15:57:32 -05:00
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
bc4a213a2c Updated matlab (now octave) plot code in test/3.
Details:
- Renamed test/3/matlab to test/3/octave.
- Within test/3, updated and tuned plot_l3_perf.m and plot_panel_4x5.m
  files for use with octave (which is free and doesn't crash on me
  mid-way through my use of subplot).
- Updated runthese.m scratchpad for zen2 invocations.
- Added Nikolay S.'s subplot_tight() function, along with its license.
2020-09-30 15:28:20 -05:00
Field G. Van Zee
c77ddc4181 Added optional numactl usage to test/3/runme.sh. 2020-09-30 20:15:43 +00:00
Field G. Van Zee
4fd8d9fec2 Tweaked zen2 subconfig's MC cache blocksizes.
Details:
- Updated the MC cache blocksizes registered by the 'zen2' subconfig.
- Minor updates to test/3/Makefile and test/3/runme.sh.
2020-09-28 23:39:05 +00:00
Field G. Van Zee
e01dd12558 Fail-safe updates to Makefiles in 'test' dir.
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
  the usual targets without having first built BLIS results in a helpful
  error message. For example, if BLIS is not yet configured, make will
  output:

    Makefile:327: *** Cannot proceed: config.mk not detected! Run
    configure first.  Stop.

  Similarly, if BLIS is configured but not yet built, make will output:

    Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
    make first.  Stop.

  In previous commits, these actions would result in a rather cryptic
  make error such as:

    make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
    needed by 'blis-nat-st'.  Stop.
2020-07-24 15:41:46 -05:00
Field G. Van Zee
c84391314d Reverted minor temp/wspace changes from b426f9e.
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
  directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
2019-11-04 13:57:12 -06:00
Nicholai Tukanov
b426f9e04e POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel 
  assumes that elements of B have been duplicated/broadcast during the
  packing step. The microkernel uses a column orientation for its 
  microtile vector registers and thus implements column storage and 
  general stride IO cases. (A row storage IO case via in-register
  transposition may be added at a future date.) It should be noted that 
  we recommend using this microkernel with gcc and *not* xlc, as issues 
  with the latter cropped up during development, including but not 
  limited to slightly incompatible vector register mnemonics in the GNU 
  extended inline assembly clobber list.
2019-11-01 17:57:03 -05:00
Field G. Van Zee
29b0e1ef4e Code review + tweaks to AMD's AOCL 2.0 PR (#349).
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
  into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
  inadvertantly not incremented when the Zen2 subconfiguration was
  added.
- In bli_gemm_front(), added a missing conditional constraint around the
  call to bli_gemm_small() that ensures that the computation precision
  of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
  that existed around the call to bli_syrk_small() into bli_syrk_small()
  to minimize the calling code footprint and also to bring that code
  into stylistic harmony with similar code in bli_gemm_front() and
  bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
  proper accessor static functions (e.g. 'a->dim[0]' becomes
  'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
  bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
  strictly speaking unnecessary, but it serves as a useful visual cue to
  those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
  bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
  version check for availability of -march=znver2, and added appropriate
  support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
  config/zen/amd_config.mk, including: removal of -march=znver1 et al.
  from CKVECFLAGS (since the -march flag is added within make_defs.mk);
  setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
  added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
  set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
2019-10-11 10:24:24 -05:00
Field G. Van Zee
80e6c10b72 Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
  PerformanceSmall.md that briefly nudges the motivated reader in the
  right direction if he/she wishes to run the same performance
  benchmarks used to produce the graphs shown in those documents.
  Thanks to Dave Love for making this suggestion.
2019-08-29 12:12:08 -05:00
Field G. Van Zee
b02e0aae8c Updated test drivers to iterate backwards.
Details:
- Updated test driver source in test, test/3, test/1m4m, and
  test/mixeddt to iterate through the problem space backwards. This
  can help avoid certain situations where the CPU frequency does not
  immediately throttle up to its maximum. Thanks to Robert van de
  Geijn for recommending this fix (originally made to test/sup drivers
  in 57e422a).
- Applied off-by-one matlab output bugfix from b6017e5 to test drivers
  in test, test/3, test/1m4m, and test/mixeddt directories.
2019-08-27 14:37:46 -05:00
Field G. Van Zee
c4cc6fa702 New cntx_t blksz "set" functions + misc tweaks.
Details:
- Defined two new static functions in bli_cntx.h:
    bli_cntx_set_blksz_def_dt()
    bli_cntx_set_blksz_max_dt()
  which developers may find convenient when experimenting with different
  values of cache blocksizes.
- Updated one- and two-socket multithreaded problem size range and
  increment values in test/3/Makefile.
- Changed default to column storage in test/3/test_gemm.c.
- Fixed typo in comment in testsuite/src/test_subm.c.
2019-07-16 13:00:35 -05:00
Field G. Van Zee
057f5f3d21 Minor build system housekeeping.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
  level Makefiles. This variable is already set within common.mk, and
  so the only time it should be overridden is if the user wants to link
  to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
2019-05-23 12:51:17 -05:00
Field G. Van Zee
74e513eb6a Support row storage in Eigen gemm test/3 driver.
Details:
- Added preprocessor branches to test/3/test_gemm.c to explicitly
  support row-stored matrices. Column-stored matrices are also still
  supported (and is the default for now). (This is mainly residual work
  leftover from initial integration of Eigen into the test drivers, so
  if we ever want to test Eigen with row-stored matrices, the code will
  be ready to use, even if it is not yet integrated into the Makefile
  in test/3.)
2019-04-17 13:34:44 -05:00
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00
Field G. Van Zee
bfac7e385f Added ability to plot with Eigen in test/3/matlab.
Details:
- Updated matlab scripts in test/3/matlab to optionally plot/display
  Eigen performance curves. Whether Eigen is plotted is determined by
  a new boolean function parameter, with_eigen.
- Updated runme.m scratchpad to reflect the latest invocations of the
  plot_panel_4x5() function (with Eigen plotting enabled).
2019-03-27 16:04:48 -05:00
Field G. Van Zee
67535317b9 Fixed mislabeled eigen output from test/3 drivers.
Details:
- Fixed the Makefile in test/3 so that it no longer incorrectly labels
  the matlab output variables from Eigen-linked hemm, herk, trmm, and
  trsm driver output as "vendor". (The gemm drivers were already
  correctly outputing matlab variables containing the "eigen" label.)
2019-03-27 13:32:18 -05:00
Field G. Van Zee
5e6b160c8a Link to Eigen BLAS for non-gemm drivers in test/3.
Details:
- Adjusted test/3/Makefile so that the test drivers are linked against
  Eigen's BLAS library for hemm, herk, trmm, and trsm. We have to do
  this since Eigen's headers don't define implementations to the
  standard BLAS APIs.
- Simplified #included headers in hemm, herk, trmm, and trsm source
  driver files, since nothing specific to Eigen is needed at
  compile-time for those operations.
2019-03-26 19:10:59 -05:00
Field G. Van Zee
92fb9c87bf Add more support for Eigen to drivers in test/3.
Details:
- Use compile-time implementations of Eigen in test_gemm.c via new
  EIGEN cpp macro, defined on command line. (Linking to Eigen's BLAS
  library is not necessary.) However, as of Eigen 3.3.7, Eigen only
  parallelizes the gemm operation and not hemm, herk, trmm, trsm, or
  any other level-3 operation.
- Fixed a bug in trmm and trsm drivers whereby the wrong function
  (bli_does_trans()) was being called to determine whether the object
  for matrix A should be created for a left- or right-side case. This
  was corrected by changing the function to bli_is_left(), as is done
  in the hemm driver.
- Added support for running Eigen test drivers from runme.sh.
2019-03-26 15:43:23 -05:00
Field G. Van Zee
288843b06d Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-03-20 17:52:23 -05:00
Field G. Van Zee
913cf97653 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-03-19 16:15:24 -05:00
Field G. Van Zee
b938c16b0c Renamed test/3m4m to test/3.
Details:
- Renamed '3m4m' directory to '3', which captures the directory nicely
  since it builds test drivers to test level-3 operations.
- These test drivers ceased to be used to test the 3m and 4m (or even
  1m) induced methods long ago, hence the name change.
2019-03-07 16:40:39 -06:00