diff --git a/CREDITS b/CREDITS
index 73a90cc1b..634aced15 100644
--- a/CREDITS
+++ b/CREDITS
@@ -9,6 +9,7 @@ The BLIS framework was primarily authored by
 
 but many others have contributed code and feedback, including
 
+  Sameer Agarwal @sandwichmaker (Google)
   Murtaza Ali (Texas Instruments)
   Sajid Ali @s-sajid-ali (Northwestern University)
   Erling Andersen @erling-d-andersen
diff --git a/docs/Performance.md b/docs/Performance.md
index 28d3c2244..37d6c0267 100644
--- a/docs/Performance.md
+++ b/docs/Performance.md
@@ -194,6 +194,13 @@ size of interest so that we can better assist you.
     * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export OPENBLAS_NUM_THREADS=26` (multithreaded, 26 cores)
     * Requested threading via `export OPENBLAS_NUM_THREADS=52` (multithreaded, 52 cores)
+  * Eigen 3.3.7
+    * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
+    * configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
+    * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OMP_NUM_THREADS=26` (multithreaded, 26 cores)
+    * Requested threading via `export OMP_NUM_THREADS=52` (multithreaded, 52 cores)
+    * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
   * MKL 2019 update 1
     * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export MKL_NUM_THREADS=26` (multithreaded, 26 cores)
@@ -251,6 +258,13 @@ size of interest so that we can better assist you.
     * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export OPENBLAS_NUM_THREADS=12` (multithreaded, 12 cores)
     * Requested threading via `export OPENBLAS_NUM_THREADS=24` (multithreaded, 24 cores)
+  * Eigen 3.3.7
+    * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
+    * configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
+    * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OMP_NUM_THREADS=12` (multithreaded, 12 cores)
+    * Requested threading via `export OMP_NUM_THREADS=24` (multithreaded, 24 cores)
+    * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
   * MKL 2018 update 2
     * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export MKL_NUM_THREADS=12` (multithreaded, 12 cores)
@@ -309,6 +323,13 @@ size of interest so that we can better assist you.
     * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export OPENBLAS_NUM_THREADS=32` (multithreaded, 32 cores)
     * Requested threading via `export OPENBLAS_NUM_THREADS=64` (multithreaded, 64 cores)
+  * Eigen 3.3.7
+    * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
+    * configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
+    * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OMP_NUM_THREADS=32` (multithreaded, 32 cores)
+    * Requested threading via `export OMP_NUM_THREADS=64` (multithreaded, 64 cores)
+    * **NOTE**: This version of Eigen does not provide multithreaded implementations of `symm`/`hemm`, `syrk`/`herk`, `trmm`, or `trsm`, and therefore those curves are omitted from the multithreaded graphs.
   * MKL 2019 update 1
     * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
     * Requested threading via `export MKL_NUM_THREADS=32` (multithreaded, 32 cores)
diff --git a/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.pdf b/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.pdf
index d60329805..6a8c3a5ef 100644
Binary files a/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.pdf and b/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.pdf differ
diff --git a/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.png b/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.png
index fa7959762..9f9b63f0c 100644
Binary files a/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.png and b/docs/graphs/l3_perf_epyc_jc1ic8jr4_nt32.png differ
diff --git a/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.pdf b/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.pdf
index 2c09c2ae3..05c776381 100644
Binary files a/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.pdf and b/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.pdf differ
diff --git a/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.png b/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.png
index 83eb4e92c..b021e6a91 100644
Binary files a/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.png and b/docs/graphs/l3_perf_epyc_jc2ic8jr4_nt64.png differ
diff --git a/docs/graphs/l3_perf_epyc_nt1.pdf b/docs/graphs/l3_perf_epyc_nt1.pdf
index 12c2742da..8e6c7ad38 100644
Binary files a/docs/graphs/l3_perf_epyc_nt1.pdf and b/docs/graphs/l3_perf_epyc_nt1.pdf differ
diff --git a/docs/graphs/l3_perf_epyc_nt1.png b/docs/graphs/l3_perf_epyc_nt1.png
index 2ac4b238c..d9f6bad99 100644
Binary files a/docs/graphs/l3_perf_epyc_nt1.png and b/docs/graphs/l3_perf_epyc_nt1.png differ
diff --git a/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.pdf b/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.pdf
index a6978f345..cc27d747a 100644
Binary files a/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.pdf and b/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.pdf differ
diff --git a/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.png b/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.png
index 7a4e252d1..0d61d382b 100644
Binary files a/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.png and b/docs/graphs/l3_perf_has_jc2ic3jr2_nt12.png differ
diff --git a/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.pdf b/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.pdf
index 0717574cb..ebaba0841 100644
Binary files a/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.pdf and b/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.pdf differ
diff --git a/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.png b/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.png
index 0804e11c2..af0ae99f2 100644
Binary files a/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.png and b/docs/graphs/l3_perf_has_jc4ic3jr2_nt24.png differ
diff --git a/docs/graphs/l3_perf_has_nt1.pdf b/docs/graphs/l3_perf_has_nt1.pdf
index f2b95fbdc..4ce611e57 100644
Binary files a/docs/graphs/l3_perf_has_nt1.pdf and b/docs/graphs/l3_perf_has_nt1.pdf differ
diff --git a/docs/graphs/l3_perf_has_nt1.png b/docs/graphs/l3_perf_has_nt1.png
index 66bb33207..69d0224ed 100644
Binary files a/docs/graphs/l3_perf_has_nt1.png and b/docs/graphs/l3_perf_has_nt1.png differ
diff --git a/docs/graphs/l3_perf_skx_jc2ic13_nt26.pdf b/docs/graphs/l3_perf_skx_jc2ic13_nt26.pdf
index 5d2f075f7..b1a4774f5 100644
Binary files a/docs/graphs/l3_perf_skx_jc2ic13_nt26.pdf and b/docs/graphs/l3_perf_skx_jc2ic13_nt26.pdf differ
diff --git a/docs/graphs/l3_perf_skx_jc2ic13_nt26.png b/docs/graphs/l3_perf_skx_jc2ic13_nt26.png
index 166764fc7..ad1a6de95 100644
Binary files a/docs/graphs/l3_perf_skx_jc2ic13_nt26.png and b/docs/graphs/l3_perf_skx_jc2ic13_nt26.png differ
diff --git a/docs/graphs/l3_perf_skx_jc4ic13_nt52.pdf b/docs/graphs/l3_perf_skx_jc4ic13_nt52.pdf
index 8a8edbce4..9ec55d689 100644
Binary files a/docs/graphs/l3_perf_skx_jc4ic13_nt52.pdf and b/docs/graphs/l3_perf_skx_jc4ic13_nt52.pdf differ
diff --git a/docs/graphs/l3_perf_skx_jc4ic13_nt52.png b/docs/graphs/l3_perf_skx_jc4ic13_nt52.png
index fed5b22e2..69f098570 100644
Binary files a/docs/graphs/l3_perf_skx_jc4ic13_nt52.png and b/docs/graphs/l3_perf_skx_jc4ic13_nt52.png differ
diff --git a/docs/graphs/l3_perf_skx_nt1.pdf b/docs/graphs/l3_perf_skx_nt1.pdf
index ad8af5327..52e301647 100644
Binary files a/docs/graphs/l3_perf_skx_nt1.pdf and b/docs/graphs/l3_perf_skx_nt1.pdf differ
diff --git a/docs/graphs/l3_perf_skx_nt1.png b/docs/graphs/l3_perf_skx_nt1.png
index 772ca0180..ffe58309c 100644
Binary files a/docs/graphs/l3_perf_skx_nt1.png and b/docs/graphs/l3_perf_skx_nt1.png differ
diff --git a/docs/graphs/l3_perf_tx2_jc4ic7_nt28.png b/docs/graphs/l3_perf_tx2_jc4ic7_nt28.png
index ec691cd3e..b2085567e 100644
Binary files a/docs/graphs/l3_perf_tx2_jc4ic7_nt28.png and b/docs/graphs/l3_perf_tx2_jc4ic7_nt28.png differ
diff --git a/docs/graphs/l3_perf_tx2_jc8ic7_nt56.png b/docs/graphs/l3_perf_tx2_jc8ic7_nt56.png
index 861d80782..4c1a774c8 100644
Binary files a/docs/graphs/l3_perf_tx2_jc8ic7_nt56.png and b/docs/graphs/l3_perf_tx2_jc8ic7_nt56.png differ
diff --git a/docs/graphs/l3_perf_tx2_nt1.png b/docs/graphs/l3_perf_tx2_nt1.png
index a6e5dc413..72bce9745 100644
Binary files a/docs/graphs/l3_perf_tx2_nt1.png and b/docs/graphs/l3_perf_tx2_nt1.png differ