26 Commits

Author SHA1 Message Date
RuQing Xu
f76ea905e2 Arm SVE: Update Perf. Graph
Pic. size seems a bit different from upstream.
Generated w/ MATLAB. Open to any change.
2021-10-08 12:13:08 +09:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Field G. Van Zee
82af05f54c Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
  entry within Performance.md, and also updated the experiment details
  accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
  experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
  under which the Fugaku results were gathered, courtesy of RuQing Xu.
2021-05-25 15:25:08 -05:00
Field G. Van Zee
6280757be3 Minor updates to a64fx section of Performance.md. 2021-04-07 13:03:56 -05:00
RuQing Xu
1e6ed823c6 Additional A64fx Comments (#490)
* Performance.md Update A64fx Comments

- Reason for ARMPL's missing data;
- Additional envs / flags for kernel selection;
- Update BLIS SRC commit.

* Include Another Fix in armsve-cfg-vendor

A prototype was missing, so the void* pointer was not returned in full.
2021-04-07 12:59:26 -05:00
Field G. Van Zee
2688f21a5b Added Fujitsu A64fx (512-bit SVE) perf results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on the "Fugaku"
  Fujitsu A64fx supercomputer at the RIKEN Center for Computational
  Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan
  Nassyr for their work in developing and optimizing A64fx support in
  BLIS and RuQing for gathering the performance data that is reflected
  in these new graphs.
2021-04-06 19:02:37 -05:00
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
a178a822ad Added Zen2 links to docs/Performance.md Contents. 2020-09-30 16:00:52 -05:00
Field G. Van Zee
74ec6b8f45 Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on an Epyc 7742
  "Rome" server with AMD's Zen2 microarchitecture. Special thanks
  to Jeff Diamond for facilitating access to the system via the
  Oracle Cloud.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
2020-09-30 15:54:18 -05:00
Field G. Van Zee
80e6c10b72 Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
  PerformanceSmall.md that briefly nudges the motivated reader in the
  right direction if he/she wishes to run the same performance
  benchmarks used to produce the graphs shown in those documents.
  Thanks to Dave Love for making this suggestion.
2019-08-29 12:12:08 -05:00
Field G. Van Zee
2f387e32ef Added Eigen -march=native hack to perf docs.
Details:
- Spell out the hack given to me by Sameer Agarwal in order to get Eigen
  to build with -march=native (which is critically important for Eigen)
  in docs/Performance.md and docs/PerformanceSmall.md.
2019-08-22 14:27:30 -05:00
Field G. Van Zee
e6ac4ebcb6 Added page size, source location to perf docs.
Details:
- Added the page size, as returned via 'getconf -a | grep PAGE_SIZE',
  and the location of the performance drivers to docs/Performance.md
  (test/3) and docs/PerformanceSmall.md (test/sup). Thanks to Dave
  Love for suggesting these additions in #325.
2019-08-20 13:49:47 -05:00
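
As a rough illustration of the page-size query mentioned in the entry above, the same value can be read programmatically. This is a minimal sketch, assuming a POSIX system where Python's os.sysconf exposes the "SC_PAGE_SIZE" name; it is not taken from the BLIS documentation or scripts.

```python
import os

# Ask the OS for the memory page size in bytes. "SC_PAGE_SIZE" is the
# sysconf name assumed here to match the PAGE_SIZE line printed by
# 'getconf -a | grep PAGE_SIZE' on a POSIX system.
page_size = os.sysconf("SC_PAGE_SIZE")

print(f"PAGE_SIZE = {page_size} bytes")
```

The printed value depends on the machine and kernel configuration, so no particular output is implied.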
Field G. Van Zee
cbaa22e1ca Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.
2019-06-04 16:06:58 -05:00
Field G. Van Zee
09ba05c6f8 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-06-03 16:53:19 -05:00
Field G. Van Zee
755730608d Minor rewording of language around mt env. vars. 2019-05-23 17:34:36 -05:00
Field G. Van Zee
ba31abe73c Added BLIS threading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-05-23 14:59:53 -05:00
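
The environment variables named in the entry above set the number of ways of parallelism for individual loops. A minimal sketch of how such a configuration might be assembled before launching a benchmark follows. The helper names and the thread counts are hypothetical, and the sketch assumes that any loop factor left unset defaults to 1, so the total thread count is the product of the three values shown.

```python
import os
from math import prod


def blis_thread_env(jc=1, ic=1, jr=1):
    """Build the per-loop thread settings as environment strings.

    Hypothetical helper: only the three variables named in the commit
    message are set; other loop factors are assumed to default to 1.
    """
    return {
        "BLIS_JC_NT": str(jc),
        "BLIS_IC_NT": str(ic),
        "BLIS_JR_NT": str(jr),
    }


def total_threads(env):
    """Total ways of parallelism: the product of the per-loop factors."""
    return prod(int(value) for value in env.values())


# Illustrative values only; they are not the settings used for any machine.
env = blis_thread_env(jc=2, ic=4, jr=2)

# Merge with the current environment, e.g. to pass as env= to subprocess.run.
child_env = {**os.environ, **env}

print(env, "->", total_threads(env), "threads")
```

Keeping the settings in a dictionary makes it easy to record the exact configuration alongside each set of results.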
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00
Field G. Van Zee
20ea7a1217 Minor text updates (Eigen) to docs/Performance.md.
Details:
- Added/updated a few more details, mostly regarding Eigen.
2019-03-27 18:09:17 -05:00
Field G. Van Zee
2c85e1dd9d Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially;
  however, of those operations, it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2019-03-27 16:29:51 -05:00
Field G. Van Zee
e593221383 Merge branch 'master' into dev 2019-03-26 15:51:45 -05:00
Field G. Van Zee
288843b06d Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  built by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-03-20 17:52:23 -05:00
Field G. Van Zee
153e0be21d More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
  and reworded the following sentence about normalization.
2019-03-19 17:53:18 -05:00
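
To make the GFLOPS definition in the entry above concrete, here is a small worked sketch. It assumes the conventional count of 2*m*n*k floating-point operations for a real-domain gemm; the function name and the sample timing are hypothetical and do not come from the BLIS test drivers.

```python
def gemm_gflops(m, n, k, seconds):
    """Billions of floating-point operations per second for one gemm call.

    Assumes the conventional real-domain count of 2*m*n*k flops
    (one multiply and one add per inner-product term).
    """
    flops = 2.0 * m * n * k
    return flops / seconds / 1e9


# Hypothetical measurement: a 1000 x 1000 x 1000 gemm that took 0.5 seconds.
rate = gemm_gflops(1000, 1000, 1000, 0.5)
print(rate)  # 4.0
```

Other level-3 operations have different flop counts, so the numerator would change accordingly.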
Field G. Van Zee
ab5ad557ea Very minor tweaks to Performance.md. 2019-03-19 16:50:41 -05:00
Field G. Van Zee
03c4a25e1a Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
  apparently the result of copy-pasting.
2019-03-19 16:47:15 -05:00
Field G. Van Zee
fe6dd8b132 Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
2019-03-19 16:30:23 -05:00
Field G. Van Zee
913cf97653 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-03-19 16:15:24 -05:00