Added docs/Performance.md and docs/graphs subdir.

Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
This commit is contained in:
Field G. Van Zee
2019-03-19 16:15:24 -05:00
committed by Devrajegowda, Kiran
parent b503627a68
commit c6793be46e
31 changed files with 346 additions and 7 deletions

View File

@@ -23,7 +23,7 @@
# Introduction
Our paper [Anatomy of High-Performance Many-Threaded Matrix Multiplication](https://github.com/flame/blis#citations), presented at IPDPS'14, identified 5 loops around the microkernel as opportunities for parallelization within level-3 operations such as `gemm`. Within BLIS, we have enabled parallelism for 4 of those loops and have extended it to the rest of the level-3 operations except for `trsm`.
Our paper [Anatomy of High-Performance Many-Threaded Matrix Multiplication](https://github.com/flame/blis#citations), presented at IPDPS'14, identified five loops around the microkernel as opportunities for parallelization within level-3 operations such as `gemm`. Within BLIS, we have enabled parallelism for four of those loops, with the fifth planned for future work. This software architecture extends naturally to all level-3 operations except for `trsm`, where its application is necessarily limited to three of the five loops due to inter-iteration dependencies.
**IMPORTANT**: Multithreading in BLIS is disabled by default. Furthermore, even when multithreading is enabled, BLIS will default to single-threaded execution at runtime. In order to both *allow* and *invoke* parallelism from within BLIS operations, you must both *enable* multithreading at configure-time and *specify* multithreading at runtime.