Commit Graph

216 Commits

Author SHA1 Message Date
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
d98368c32d Another tweak to line thickness of Zen2 graphs. 2020-10-08 19:05:51 -05:00
Field G. Van Zee
1855dfbdaa Tweaked line thickness in Zen2 graphs once more.
Details:
- Decreased (relative to previous commit) line thickness in recent Zen2
  graphs.
2020-10-08 19:01:00 -05:00
Field G. Van Zee
0991611e7e Increased line thickness in recent Zen2 graphs.
Details:
- Increased the width of the lines in the graphs introduced in 74ec6b8.
2020-10-08 18:54:49 -05:00
Field G. Van Zee
8273cbacd7 README.md, docs/FAQ.md updates.
Details:
- Added a frequently asked question to docs/FAQ.md regarding the
  difference between upstream (vanilla) BLIS and AMD BLIS.
- Updated the name of ICES in the README.md to reflect the Oden
  rebranding.
2020-10-07 14:51:33 -05:00
Field G. Van Zee
a178a822ad Added Zen2 links to docs/Performance.md Contents. 2020-09-30 16:00:52 -05:00
Field G. Van Zee
74ec6b8f45 Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on an Epyc 7742
  "Rome" server with AMD's Zen2 microarchitecture. Special thanks
  to Jeff Diamond for facilitating access to the system via the
  Oracle Cloud.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
2020-09-30 15:54:18 -05:00
Devin Matthews
5d653a11a0 Update Multithreading.md
Addresses the issue raised in #426.
2020-08-06 17:58:26 -05:00
Field G. Van Zee
882dcb11bf Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
  documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
2020-08-06 17:28:14 -05:00
Field G. Van Zee
f4894512e5 Very minor updates to previous commit. 2020-08-06 17:20:00 -05:00
Field G. Van Zee
adedb893ae Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
  BLISObjectAPI.md. Previously, only accessor functions were documented.
  Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
  (08level2.c and 09level3.c).
2020-08-06 17:14:01 -05:00
dzambare
267a959af1 Rebased amd-staging-milan-3.0 branch on master
-- Rebased on top of master commit # 6e522e5823
  -- Updated merged code to remove duplicated code added by auto-merging
  -- Updated merged code to rename bool_t type
  -- Updated merged code to rename bli_thread_obarrier
  -- Updated merged code to rename bli_thread_obroadcast

Change-Id: I39879f1ef3b42ecbe5808af3b559d88c36dbbf6c
AMD-Internal: [CPUPL-1067]
2020-08-06 10:09:29 +05:30
Field G. Van Zee
89105695cc Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2020-08-03 11:48:43 +05:30
Field G. Van Zee
ba8fde137c Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2020-08-03 11:48:06 +05:30
Field G. Van Zee
1bc2002706 Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  however, of those operations it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2020-08-03 11:48:05 +05:30
Field G. Van Zee
ec6776a1b5 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2020-08-03 11:47:40 +05:30
Field G. Van Zee
5904e18d40 Updates to docs/Multithreading.md.
Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
  disabled by default; and (b) even with multithreading enabled, the
  user must specify multithreading at runtime in order to observe
  parallelism. Thanks to M. Zhou for suggesting these clarifications
  in #292.
- Also made explicit that only the environment variable and global
  runtime API methods are available when using the BLAS API. If the
  user wishes to use the local runtime API (specify multithreading on
  a per-call basis), one of the native BLIS APIs must be used.
2020-08-03 11:47:18 +05:30
Field G. Van Zee
c04966262b Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-08-03 11:27:13 +05:30
Field G. Van Zee
fd5db714f4 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-08-03 11:27:13 +05:30
Field G. Van Zee
5cbdbe495f Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-08-03 11:23:40 +05:30
Giorgos Margaritis
7db89fe91d Update Multithreading.md 2020-08-03 11:23:40 +05:30
Field G. Van Zee
2d7a43d7ef Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-08-03 11:22:32 +05:30
Isuru Fernando
51c36e8019 Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-08-03 11:22:32 +05:30
Isuru Fernando
c7f9684384 FIx typo in FAQ 2020-08-03 11:22:32 +05:30
Isuru Fernando
454047caa3 Add build instructions for Windows (#404) 2020-08-03 11:22:32 +05:30
Field G. Van Zee
713313562b Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-08-03 11:22:32 +05:30
Field G. Van Zee
6e522e5823 Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-07-30 19:31:37 -05:00
Field G. Van Zee
00e14cb6d8 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-07-29 14:24:34 -05:00
Field G. Van Zee
a6437a5c11 Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-07-20 19:21:07 -05:00
Giorgos Margaritis
171ecc1dc6 Update Multithreading.md 2020-07-20 12:24:06 +03:00
Field G. Van Zee
ceb9b95a96 Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-06-18 17:15:25 -05:00
Isuru Fernando
31af73c11a Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-06-18 13:35:54 -05:00
Isuru Fernando
35e38fb693 FIx typo in FAQ 2020-06-16 09:08:31 -07:00
Isuru Fernando
943a21def0 Add build instructions for Windows (#404) 2020-05-21 14:09:21 -05:00
Field G. Van Zee
fbef422f0d Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-05-21 10:30:41 -05:00
Field G. Van Zee
994a2d8de5 Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
  (and perhaps completely?) due to some substitution commands used at
  the end of configure that include '\n' characters that are not
  properly interpreted by the version of sed included on some versions
  of OS X. This new documentation addresses issue #398.
2020-05-21 11:56:45 +05:30
Yingbo Ma
562b9eeaaf Update KernelsHowTo.md (#395) 2020-05-21 11:56:45 +05:30
Field G. Van Zee
8e3f143439 Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
  bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
  van de Geijn for reporting this omission.

Change-Id: Ifd1e01d5d7f943db4b1d67b467eb57e4a5c44165
2020-05-21 11:56:36 +05:30
Field G. Van Zee
052a3c589f ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.

Change-Id: I6c3e0dbaebcb855dff9420196092da5cb0bcce89
2020-05-21 11:55:20 +05:30
Field G. Van Zee
c7faae9442 Merged test/sup, test/supmt into test/sup.
Details:
- Updated the Makefile, test_gemm.c, and runme.sh in test/sup to be able
  to compile and run both single-threaded and multithreaded experiments.
  This should help with maintenance going forward.
- Created a test/sup/octave_st directory of scripts (based on the
  previous test/sup/octave scripts) as well as a test/sup/octave_mt
  directory (based on the previous test/supmt/octave scripts). The
  octave scripts are slightly different and not easily mergeable, and
  thus for now I'll maintain them separately.
- Preserved the previous test/sup directory as test/sup/old/supst and
  the previous test/supmt directory as test/sup/old/supmt.

Change-Id: Ia230fc65185fd9a34eec714721004aa9e0bd40ed
2020-05-21 11:50:19 +05:30
Field G. Van Zee
d6496d55cc ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.

Change-Id: I2aa6f944ce2584de85ae7b6921ff0193b3b7020a
2020-05-21 11:41:49 +05:30
Dave Love
291ee5f748 Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-05-21 11:40:57 +05:30
Field G. Van Zee
c53b5153be Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
  (and perhaps completely?) due to some substitution commands used at
  the end of configure that include '\n' characters that are not
  properly interpreted by the version of sed included on some versions
  of OS X. This new documentation addresses issue #398.
2020-05-05 12:39:12 -05:00
Yingbo Ma
4d87eb24e8 Update KernelsHowTo.md (#395) 2020-04-27 16:02:47 -05:00
Field G. Van Zee
8bde63ffd7 Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
  bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
  van de Geijn for reporting this omission.
2020-04-18 12:50:12 -05:00
Field G. Van Zee
b04de636c1 ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
2020-04-07 14:37:43 -05:00
Field G. Van Zee
1a284828d1 Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
  or pthreads). Both variants 1n and 2m now have the appropriate
  threading infrastructure, including data partitioning logic, to
  parallelize computation. This support handles all four combinations
  of packing on matrices A and B (neither, A only, B only, or both).
  This implementation tries to be a little smarter when automatic
  threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
  recalculate the factorization in units of micropanels (rather than
  using the raw dimensions) in bli_l3_sup_int.c, when the final
  problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
  or column-stored matrices. (This is used for the rrc and crc storage
  cases.) Previously, copym was used, but that would no longer suffice
  because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
  bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
  instead of from the variant functions. This has the effect of making
  the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
  and inserted usage of these functions within bli_thrinfo_init(), which
  previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
  whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
  tests, as well as appropriate octave/matlab scripts to plot the
  resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
  that specifying any BLIS_*_NT variable, even if it is set to 1, will
  be considered manual specification for the purposes of determining
  whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
AMD-Internal: [CPUPL-713]

Change-Id: I9536648e7befac4d2dc17805e44ef34470961662
2020-03-13 01:09:29 -04:00
Field G. Van Zee
0f9e0399e1 Updated sup performance graphs; added mt results.
Details:
- Reran all existing single-threaded performance experiments comparing
  BLIS sup to other implementations (including the conventional code
  path within BLIS), using the latest versions (where appropriate).
- Added multithreaded results for the three existing hardware types
  showcased in docs/PerformanceSmall.md: Kaby Lake, Haswell, and Epyc
  (Zen1).
- Various minor updates to the text in docs/PerformanceSmall.md.
- Updates to the octave scripts in test/sup/octave, test/supmt/octave.
2020-03-05 17:03:21 -06:00
Field G. Van Zee
c0558fde45 Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
  or pthreads). Both variants 1n and 2m now have the appropriate
  threading infrastructure, including data partitioning logic, to
  parallelize computation. This support handles all four combinations
  of packing on matrices A and B (neither, A only, B only, or both).
  This implementation tries to be a little smarter when automatic
  threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
  recalculate the factorization in units of micropanels (rather than
  using the raw dimensions) in bli_l3_sup_int.c, when the final
  problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
  or column-stored matrices. (This is used for the rrc and crc storage
  cases.) Previously, copym was used, but that would no longer suffice
  because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
  bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
  instead of from the variant functions. This has the effect of making
  the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
  and inserted usage of these functions within bli_thrinfo_init(), which
  previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
  whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
  tests, as well as appropriate octave/matlab scripts to plot the
  resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
  that specifying any BLIS_*_NT variable, even if it is set to 1, will
  be considered manual specification for the purposes of determining
  whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
2020-02-17 14:08:08 -06:00
Field G. Van Zee
5db8e710a2 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2020-01-14 15:59:59 -06:00