Added Eigen results to performance graphs.

Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  but of those operations it provides a multithreaded implementation
  only for gemm. Thus, multithreaded results for symm/hemm, syrk/herk,
  trmm, and trsm are omitted. Thanks to Sameer Agarwal for his help
  configuring and using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.

Author: Field G. Van Zee
Date: 2019-03-27 16:29:51 -05:00
parent bfac7e385f
commit 2c85e1dd9d
23 changed files with 22 additions and 0 deletions

CREDITS

@@ -9,6 +9,7 @@ The BLIS framework was primarily authored by
but many others have contributed code and feedback, including
Sameer Agarwal @sandwichmaker (Google)
Murtaza Ali (Texas Instruments)
Sajid Ali @s-sajid-ali (Northwestern University)
Erling Andersen @erling-d-andersen

docs/Performance.md

@@ -194,6 +194,13 @@ size of interest so that we can better assist you.
  * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OPENBLAS_NUM_THREADS=26` (multithreaded, 26 cores)
  * Requested threading via `export OPENBLAS_NUM_THREADS=52` (multithreaded, 52 cores)
* Eigen 3.3.7
  * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to the `CXX_FLAGS` variable (h/t Sameer Agarwal).
  * Configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
  * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OMP_NUM_THREADS=26` (multithreaded, 26 cores)
  * Requested threading via `export OMP_NUM_THREADS=52` (multithreaded, 52 cores)
  * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
* MKL 2019 update 1
  * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export MKL_NUM_THREADS=26` (multithreaded, 26 cores)
@@ -251,6 +258,13 @@ size of interest so that we can better assist you.
  * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OPENBLAS_NUM_THREADS=12` (multithreaded, 12 cores)
  * Requested threading via `export OPENBLAS_NUM_THREADS=24` (multithreaded, 24 cores)
* Eigen 3.3.7
  * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to the `CXX_FLAGS` variable (h/t Sameer Agarwal).
  * Configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
  * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OMP_NUM_THREADS=12` (multithreaded, 12 cores)
  * Requested threading via `export OMP_NUM_THREADS=24` (multithreaded, 24 cores)
  * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
* MKL 2018 update 2
  * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export MKL_NUM_THREADS=12` (multithreaded, 12 cores)
@@ -309,6 +323,13 @@ size of interest so that we can better assist you.
  * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OPENBLAS_NUM_THREADS=32` (multithreaded, 32 cores)
  * Requested threading via `export OPENBLAS_NUM_THREADS=64` (multithreaded, 64 cores)
* Eigen 3.3.7
  * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to the `CXX_FLAGS` variable (h/t Sameer Agarwal).
  * Configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
  * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export OMP_NUM_THREADS=32` (multithreaded, 32 cores)
  * Requested threading via `export OMP_NUM_THREADS=64` (multithreaded, 64 cores)
  * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
* MKL 2019 update 1
  * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
  * Requested threading via `export MKL_NUM_THREADS=32` (multithreaded, 32 cores)
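
The Eigen steps recorded in the hunks above can be collected into a small helper script. What follows is only a minimal sketch, not the script used to produce the graphs: the function names `build_eigen_blas` and `set_eigen_threads` and the source-tree path passed to them are hypothetical. The build commands and the use of `OMP_NUM_THREADS` are taken from the notes above, and the sketch assumes the `-march=native` edit to the top-level `CMakeLists.txt` has already been made by hand.

```shell
#!/bin/sh
# Sketch only: helper functions mirroring the Eigen setup notes.
# Function names and arguments are hypothetical, not part of BLIS or Eigen.

# Build Eigen's BLAS library from a source tree.
# $1: path to an Eigen 3.3.7 source tree whose top-level CMakeLists.txt
#     has already been edited so that -march=native is in the C++ flags.
build_eigen_blas() {
    # Run in a subshell so the caller's working directory is unchanged.
    ( cd "$1" && mkdir -p build && cd build && cmake .. && make blas )
}

# Request a thread count for a subsequent benchmark run.
# $1: number of threads (1 for single-threaded runs).
set_eigen_threads() {
    case "$1" in
        # Reject empty or non-numeric input instead of exporting it.
        ''|*[!0-9]*)
            echo "usage: set_eigen_threads <count>" >&2
            return 1
            ;;
    esac
    # The notes above request Eigen threading through OMP_NUM_THREADS.
    export OMP_NUM_THREADS="$1"
}
```

After sourcing the file, a run on the first machine listed might call `build_eigen_blas /path/to/eigen-3.3.7` once, then `set_eigen_threads 1`, `set_eigen_threads 26`, or `set_eigen_threads 52` before each benchmark. Only the `gemm` results are meaningful at the multithreaded settings, since the other level-3 operations run sequentially in this version of Eigen.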
