Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to docs/PerformanceSmall.md for both sgemm and dgemm. These results were gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2 microarchitecture. Special thanks to Jeff Diamond for facilitating access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2 and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs. I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each microarchitecture section.
@@ -243,6 +243,7 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
endif()
|
||||
```
|
||||
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
|
||||
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
|
||||
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
|
||||
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
|
||||
* Multithreaded (26 core) execution requested via `export OMP_NUM_THREADS=26`
|
||||
@@ -323,6 +324,7 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
endif()
|
||||
```
|
||||
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
|
||||
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
|
||||
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
|
||||
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
|
||||
* Multithreaded (12 core) execution requested via `export OMP_NUM_THREADS=12`
|
||||
@@ -401,6 +403,7 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
endif()
|
||||
```
|
||||
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
|
||||
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
|
||||
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
|
||||
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
|
||||
* Multithreaded (32 core) execution requested via `export OMP_NUM_THREADS=32`
|
||||
@@ -483,6 +486,7 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
endif()
|
||||
```
|
||||
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
|
||||
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
|
||||
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
|
||||
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
|
||||
* Multithreaded (64 core) execution requested via `export OMP_NUM_THREADS=64`
|
||||
|
||||
@@ -12,9 +12,12 @@
|
||||
* **[Haswell](PerformanceSmall.md#haswell)**
  * **[Experiment details](PerformanceSmall.md#haswell-experiment-details)**
  * **[Results](PerformanceSmall.md#haswell-results)**
* **[Epyc](PerformanceSmall.md#epyc)**
  * **[Experiment details](PerformanceSmall.md#epyc-experiment-details)**
  * **[Results](PerformanceSmall.md#epyc-results)**
* **[Zen](PerformanceSmall.md#zen)**
  * **[Experiment details](PerformanceSmall.md#zen-experiment-details)**
  * **[Results](PerformanceSmall.md#zen-results)**
* **[Zen2](PerformanceSmall.md#zen2)**
  * **[Experiment details](PerformanceSmall.md#zen2-experiment-details)**
  * **[Results](PerformanceSmall.md#zen2-results)**
* **[Feedback](PerformanceSmall.md#feedback)**
|
||||
|
||||
# Introduction
|
||||
@@ -295,9 +298,9 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
|
||||
---
|
||||
|
||||
## Epyc
|
||||
## Zen
|
||||
|
||||
### Epyc experiment details
|
||||
### Zen experiment details
|
||||
|
||||
* Location: Oracle cloud
|
||||
* Processor model: AMD Epyc 7551 (Zen1)
|
||||
@@ -318,7 +321,7 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
* BLIS 90db88e (0.6.1-8)
|
||||
* configured with `./configure --enable-cblas auto` (single-threaded)
|
||||
* configured with `./configure --enable-cblas -t openmp auto` (multithreaded)
|
||||
* sub-configuration exercised: `haswell`
|
||||
* sub-configuration exercised: `zen`
|
||||
* Multithreaded (32 cores) execution requested via `export BLIS_NUM_THREADS=32`
|
||||
* OpenBLAS 0.3.8
|
||||
* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0 USE_LOCKING=1` (single-threaded)
|
||||
@@ -357,25 +360,124 @@ The `runthese.m` file will contain example invocations of the function.
|
||||
* Comments:
|
||||
* libxsmm is highly competitive for very small problems, but quickly gives up once the "large" dimension exceeds about 180-240 (or 64 in the case where all operands are square). Also, libxsmm's `gemm` cannot handle a transposition on matrix A and similarly dispatches the fallback implementation for those cases. libxsmm also does not export CBLAS interfaces, and therefore only appears on the graphs for column-stored matrices.
|
||||
|
||||
### Epyc results
|
||||
### Zen results
|
||||
|
||||
#### pdf
|
||||
|
||||
* [Epyc single-threaded row-stored](graphs/sup/dgemm_rrr_epyc_nt1.pdf)
|
||||
* [Epyc single-threaded column-stored](graphs/sup/dgemm_ccc_epyc_nt1.pdf)
|
||||
* [Epyc multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_epyc_nt32.pdf)
|
||||
* [Epyc multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_epyc_nt32.pdf)
|
||||
* [Zen single-threaded row-stored](graphs/sup/dgemm_rrr_zen_nt1.pdf)
|
||||
* [Zen single-threaded column-stored](graphs/sup/dgemm_ccc_zen_nt1.pdf)
|
||||
* [Zen multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_zen_nt32.pdf)
|
||||
* [Zen multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_zen_nt32.pdf)
|
||||
|
||||
#### png (inline)
|
||||
|
||||
* **Epyc single-threaded row-stored**
|
||||

|
||||
* **Epyc single-threaded column-stored**
|
||||

|
||||
* **Epyc multithreaded (32 cores) row-stored**
|
||||

|
||||
* **Epyc multithreaded (32 cores) column-stored**
|
||||

|
||||
* **Zen single-threaded row-stored**
|
||||

|
||||
* **Zen single-threaded column-stored**
|
||||

|
||||
* **Zen multithreaded (32 cores) row-stored**
|
||||

|
||||
* **Zen multithreaded (32 cores) column-stored**
|
||||

|
||||
|
||||
---
|
||||
|
||||
## Zen2
|
||||
|
||||
### Zen2 experiment details
|
||||
|
||||
* Location: Oracle cloud
* Processor model: AMD Epyc 7742 (Zen2 "Rome")
* Core topology: two sockets, 8 Core Complex Dies (CCDs) per socket, 2 Core Complexes (CCX) per CCD, 4 cores per CCX, 128 cores total
* SMT status: enabled, but not utilized
* Max clock rate: 2.25GHz (base, documented); 3.4GHz boost (single-core, documented); 2.6GHz boost (multicore, estimated)
* Max vector register length: 256 bits (AVX2)
* Max FMA vector IPC: 2
  * Alternatively, FMA vector IPC is 4 when vectors are limited to 128 bits each.
* Peak performance:
  * single-core: 54.4 GFLOPS (double-precision), 108.8 GFLOPS (single-precision)
  * multicore (estimated): 41.6 GFLOPS/core (double-precision), 83.2 GFLOPS/core (single-precision)
* Operating system: Ubuntu 18.04 (Linux kernel 4.15.0)
* Page size: 4096 bytes
* Compiler: gcc 9.3.0
* Results gathered: 8 October 2020
* Implementations tested:
  * BLIS a0849d3 (0.7.0-67)
    * configured with `./configure --enable-cblas auto` (single-threaded)
    * configured with `./configure --enable-cblas -t openmp auto` (multithreaded)
    * sub-configuration exercised: `zen2`
    * Multithreaded (32 cores) execution requested via `export BLIS_NUM_THREADS=32`
  * OpenBLAS 0.3.10
    * configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0 USE_LOCKING=1` (single-threaded)
    * configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=32` (multithreaded)
    * Multithreaded (32 cores) execution requested via `export OPENBLAS_NUM_THREADS=32`
  * BLASFEO 5b26d40
    * configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
    * built BLAS library via `make CC=gcc`
  * Eigen 3.3.90
    * Obtained via the [Eigen GitLab homepage](https://gitlab.com/libeigen/eigen) (24 September 2020)
    * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal):
      ```
      # These lines added after line 60.
      check_cxx_compiler_flag("-march=native" COMPILER_SUPPORTS_MARCH_NATIVE)
      if(COMPILER_SUPPORTS_MARCH_NATIVE)
        set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
      endif()
      ```
    * configured and built BLAS library via `mkdir build; cd build; CC=gcc cmake ..; make blas`
    * installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
    * The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
    * Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
    * Multithreaded (32 cores) execution requested via `export OMP_NUM_THREADS=32`
  * MKL 2020 update 3
    * Single-threaded (1 core) execution requested via `export MKL_NUM_THREADS=1`
    * Multithreaded (32 cores) execution requested via `export MKL_NUM_THREADS=32`
  * libxsmm f0ab9cb (post-1.16.1)
    * compiled with `make AVX=2`; linked with [netlib BLAS](http://www.netlib.org/blas/) 3.6.0 as the fallback library to better show where libxsmm stops handling the computation internally.
* Affinity:
  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0-31"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
  * All executables were run through `numactl --interleave=all`.
* Frequency throttling (via `cpupower`):
  * Driver: acpi-cpufreq
  * Governor: performance
  * Hardware limits (steps): 1.5GHz, 2.0GHz, 2.25GHz
  * Adjusted minimum: 2.25GHz
* Comments:
  * None.
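
The peak performance figures and the core count in the Zen2 experiment details follow from the listed clock rates, vector length, and FMA throughput. The following is a minimal Python sketch of that arithmetic, assuming peak GFLOPS = clock (GHz) × FMA vector IPC × lanes per vector × 2 flops per FMA. The function names are hypothetical and are not part of BLIS.

```python
# Sketch of the peak-performance arithmetic for the Epyc 7742 (Zen2) entry.
# Assumption: peak GFLOPS = clock (GHz) * FMA IPC * lanes per vector * 2,
# where the factor of 2 counts the multiply and the add in each FMA.
# All names below are hypothetical helpers for illustration only.

VECTOR_BITS = 256  # max vector register length (AVX2)
FMA_IPC = 2        # max FMA vector instructions per cycle


def peak_gflops(clock_ghz, element_bits):
    """Return the theoretical per-core peak in GFLOPS."""
    lanes = VECTOR_BITS // element_bits   # 4 doubles or 8 floats per vector
    flops_per_cycle = FMA_IPC * lanes * 2
    return clock_ghz * flops_per_cycle


def total_cores(sockets=2, ccds_per_socket=8, ccxs_per_ccd=2, cores_per_ccx=4):
    """Return the core count implied by the listed core topology."""
    return sockets * ccds_per_socket * ccxs_per_ccd * cores_per_ccx


if __name__ == "__main__":
    # 3.4 GHz is the documented single-core boost; 2.6 GHz is the
    # estimated multicore boost from the experiment details.
    for label, clock in (("single-core", 3.4), ("multicore", 2.6)):
        dp = peak_gflops(clock, 64)
        sp = peak_gflops(clock, 32)
        print(f"{label}: {dp:.1f} GFLOPS (double), {sp:.1f} GFLOPS (single)")
    print("total cores:", total_cores())
```

With 128-bit vectors and an FMA vector IPC of 4, the same formula gives the same number of flops per cycle, which is consistent with the alternative noted in the list.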
|
||||
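
The affinity and thread-count settings in the Zen2 experiment details can be sketched as a small launcher helper. This is a minimal Python sketch under stated assumptions: the helper names (`build_env`, `build_command`) and the executable path are hypothetical, and leaving `GOMP_CPU_AFFINITY` untouched for Eigen and MKL is a guess, since the source only describes BLIS and OpenBLAS. Only the environment variable names and `numactl --interleave=all` come from the experiment details.

```python
# Minimal sketch of the per-implementation environment described in the
# Zen2 affinity notes. Helper names and the executable path are hypothetical.
import os

# Thread-count variable for each implementation, as listed in the details.
NUM_THREADS_VAR = {
    "blis": "BLIS_NUM_THREADS",
    "openblas": "OPENBLAS_NUM_THREADS",
    "eigen": "OMP_NUM_THREADS",
    "mkl": "MKL_NUM_THREADS",
}


def build_env(impl, nthreads, affinity="0-31", base_env=None):
    """Return a copy of the environment configured for one implementation."""
    env = dict(os.environ if base_env is None else base_env)
    env[NUM_THREADS_VAR[impl]] = str(nthreads)
    if impl == "blis":
        # BLIS runs pinned threads via GOMP_CPU_AFFINITY.
        env["GOMP_CPU_AFFINITY"] = affinity
    elif impl == "openblas":
        # OpenBLAS appeared to fall back to one thread when this was set,
        # so it is removed for OpenBLAS runs.
        env.pop("GOMP_CPU_AFFINITY", None)
    # Assumption: for Eigen and MKL the variable is left as inherited.
    return env


def build_command(executable):
    """Wrap an executable so memory is interleaved across NUMA nodes."""
    return ["numactl", "--interleave=all", executable]


if __name__ == "__main__":
    env = build_env("openblas", 32, base_env={"GOMP_CPU_AFFINITY": "0-31"})
    print(build_command("./hypothetical_test_driver.x"))
    print("GOMP_CPU_AFFINITY" in env)
```

The returned command and environment could then be passed to `subprocess.run(cmd, env=env)`; the sketch stops short of launching anything so that it runs on machines without `numactl`.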
|
||||
### Zen2 results
|
||||
|
||||
#### pdf
|
||||
|
||||
* [Zen2 sgemm single-threaded row-stored](graphs/sup/sgemm_rrr_zen2_nt1.pdf)
* [Zen2 sgemm single-threaded column-stored](graphs/sup/sgemm_ccc_zen2_nt1.pdf)
* [Zen2 dgemm single-threaded row-stored](graphs/sup/dgemm_rrr_zen2_nt1.pdf)
* [Zen2 dgemm single-threaded column-stored](graphs/sup/dgemm_ccc_zen2_nt1.pdf)

* [Zen2 sgemm multithreaded (32 cores) row-stored](graphs/sup/sgemm_rrr_zen2_nt32.pdf)
* [Zen2 sgemm multithreaded (32 cores) column-stored](graphs/sup/sgemm_ccc_zen2_nt32.pdf)
* [Zen2 dgemm multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_zen2_nt32.pdf)
* [Zen2 dgemm multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_zen2_nt32.pdf)
|
||||
|
||||
#### png (inline)
|
||||
|
||||
* **Zen2 sgemm single-threaded row-stored**
|
||||

|
||||
* **Zen2 sgemm single-threaded column-stored**
|
||||

|
||||
* **Zen2 dgemm single-threaded row-stored**
|
||||

|
||||
* **Zen2 dgemm single-threaded column-stored**
|
||||

|
||||
|
||||
* **Zen2 sgemm multithreaded (32 cores) row-stored**
|
||||

|
||||
* **Zen2 sgemm multithreaded (32 cores) column-stored**
|
||||

|
||||
* **Zen2 dgemm multithreaded (32 cores) row-stored**
|
||||

|
||||
* **Zen2 dgemm multithreaded (32 cores) column-stored**
|
||||

|
||||
|
||||
---
|
||||
|
||||
|
||||
Binary files changed (image previews omitted; names of the modified images were not captured):

* modified image: before 245 KiB, after 243 KiB
* modified image: before 243 KiB, after 241 KiB
* modified image: before 230 KiB, after 228 KiB
* docs/graphs/sup/dgemm_ccc_zen2_nt1.pdf (binary, normal file)
* docs/graphs/sup/dgemm_ccc_zen2_nt1.png (binary, normal file; after: 346 KiB)
* docs/graphs/sup/dgemm_ccc_zen2_nt32.pdf (binary, normal file)
* docs/graphs/sup/dgemm_ccc_zen2_nt32.png (binary, normal file; after: 208 KiB)
* modified image: before 236 KiB, after 236 KiB
* modified image: before 180 KiB, after 180 KiB
* docs/graphs/sup/dgemm_rrr_zen2_nt1.pdf (binary, normal file)
* docs/graphs/sup/dgemm_rrr_zen2_nt1.png (binary, normal file; after: 326 KiB)
* docs/graphs/sup/dgemm_rrr_zen2_nt32.pdf (binary, normal file)
* docs/graphs/sup/dgemm_rrr_zen2_nt32.png (binary, normal file; after: 206 KiB)
* modified image: before 222 KiB, after 222 KiB
* modified image: before 179 KiB, after 179 KiB
* docs/graphs/sup/sgemm_ccc_zen2_nt1.pdf (binary, normal file)
* docs/graphs/sup/sgemm_ccc_zen2_nt1.png (binary, normal file; after: 310 KiB)
* docs/graphs/sup/sgemm_ccc_zen2_nt32.pdf (binary, normal file)
* docs/graphs/sup/sgemm_ccc_zen2_nt32.png (binary, normal file; after: 183 KiB)
* docs/graphs/sup/sgemm_rrr_zen2_nt1.pdf (binary, normal file)
* docs/graphs/sup/sgemm_rrr_zen2_nt1.png (binary, normal file; after: 303 KiB)
* docs/graphs/sup/sgemm_rrr_zen2_nt32.pdf (binary, normal file)
* docs/graphs/sup/sgemm_rrr_zen2_nt32.png (binary, normal file; after: 185 KiB)
@@ -85,7 +85,7 @@ else
|
||||
xaxisname = 'm = n = k';
|
||||
fontsize = 20;
|
||||
end
|
||||
linesize = 0.5;
|
||||
linesize = 0.8;
|
||||
legend_loc = 'southeast';
|
||||
|
||||
%ax1 = subplot( rows, cols, theid );
|
||||
@@ -188,14 +188,14 @@ set( titl, 'Position', tpos ); % here we nudge it back to centered with box.
|
||||
set( titl, 'FontSize', fontsize );
|
||||
|
||||
if theid > (rows-1)*cols
|
||||
xlab = xlabel( ax1,xaxisname );
|
||||
%tpos = get( xlab, 'Position' )
|
||||
%tpos(2) = tpos(2) + 10;
|
||||
%set( xlab, 'Position', tpos );
|
||||
%tpos = get( xlab, 'Position' )
|
||||
%tpos(2) = tpos(2) + 10;
|
||||
%set( xlab, 'Position', tpos );
|
||||
xlab = xlabel( ax1,xaxisname );
|
||||
end
|
||||
|
||||
if mod(theid-1,cols) == 0
|
||||
ylab = ylabel( ax1,yaxisname );
|
||||
ylab = ylabel( ax1,yaxisname );
|
||||
end
|
||||
|
||||
r_val = 0;
|
||||
|
||||
@@ -1,11 +1,14 @@
|
||||
function r_val = plot_panel_4x5( cfreq, ...
|
||||
dflopspercycle, ...
|
||||
nth, ...
|
||||
thr_str, ...
|
||||
dirpath, ...
|
||||
arch_str, ...
|
||||
vend_str, ...
|
||||
with_eigen )
|
||||
function r_val = plot_panel_4x5 ...
|
||||
( ...
|
||||
cfreq, ...
|
||||
dflopspercycle, ...
|
||||
nth, ...
|
||||
thr_str, ...
|
||||
dirpath, ...
|
||||
arch_str, ...
|
||||
vend_str, ...
|
||||
with_eigen ...
|
||||
)
|
||||
|
||||
impl = 'octave';
|
||||
%impl = 'matlab';
|
||||
@@ -138,7 +141,6 @@ end
|
||||
outfile = sprintf( 'l3_perf_%s_nt%d.pdf', arch_str, nth );
|
||||
|
||||
% Output the graph to pdf format.
|
||||
%print(gcf, 'gemm_md','-fillpage','-dpdf');
|
||||
if strcmp( impl, 'octave' )
|
||||
print( gcf, outfile );
|
||||
else
|
||||
|
||||
@@ -22,3 +22,5 @@ plot_panel_4x5(2.55,8,64,'2s','../results/epyc/merged20190306_0319_0328/jc2ic8jr
|
||||
plot_panel_4x5(3.40,16,1, 'st','../results/zen2/20200929/st', 'zen2','MKL',1); close all; clear all;
|
||||
plot_panel_4x5(2.60,16,64, '1s','../results/zen2/20200929/jc4ic4jr4','zen2','MKL',1); close all; clear all;
|
||||
plot_panel_4x5(2.60,16,128,'2s','../results/zen2/20200929/jc8ic4jr4','zen2','MKL',1); close all; clear all;
|
||||
|
||||
plot_panel_4x5(3.40,16,1, 'st','../results/zen2/20200929/st', 'zen2','MKL',1); close all; clear all; plot_panel_4x5(2.60,16,64, '1s','../results/zen2/20200929/jc4ic4jr4','zen2','MKL',1); close all; clear all; plot_panel_4x5(2.60,16,128,'2s','../results/zen2/20200929/jc8ic4jr4','zen2','MKL',1); close all; clear all;
|
||||
|
||||
@@ -11,7 +11,10 @@ function r_val = plot_l3sup_perf( opname, ...
|
||||
rows, cols, ...
|
||||
cfreq, ...
|
||||
dfps, ...
|
||||
theid, impl )
|
||||
theid, impl, ...
|
||||
fontsize, ...
|
||||
leg_pos_st, leg_pos_st_x, leg_pos_mt, ...
|
||||
sp_margins )
|
||||
|
||||
% Define the column in which the performance rates are found.
|
||||
flopscol = size( data_blissup, 2 );
|
||||
@@ -32,23 +35,8 @@ end
|
||||
% NOTE: We can draw the legend on any graph as long as it has already been
|
||||
% rendered. Since the coordinates are global, we can simply always wait until
|
||||
% the final graph to draw the legend.
|
||||
%if nth == 1
|
||||
% if has_xsmm == 1
|
||||
% legend_plot_id = 2*cols + 1*5;
|
||||
% else
|
||||
% legend_plot_id = 1*cols + 1*5;
|
||||
% end
|
||||
%else
|
||||
% legend_plot_id = 0*cols + 1*6;
|
||||
%end
|
||||
legend_plot_id = cols*rows;
|
||||
|
||||
% Hold the axes.
|
||||
if 1
|
||||
ax1 = subplot( rows, cols, theid );
|
||||
hold( ax1, 'on' );
|
||||
end
|
||||
|
||||
% Set line properties.
|
||||
color_blissup = 'k'; lines_blissup = '-'; markr_blissup = '';
|
||||
color_blisconv = 'k'; lines_blisconv = ':'; markr_blisconv = '';
|
||||
@@ -97,17 +85,17 @@ else
|
||||
yaxisname = 'GFLOPS/core';
|
||||
end
|
||||
|
||||
|
||||
%flopscol = 4;
|
||||
% Set the marker size, line size, and other items.
|
||||
msize = 5;
|
||||
if 1
|
||||
fontsize = 12;
|
||||
else
|
||||
fontsize = 16;
|
||||
end
|
||||
linesize = 0.5;
|
||||
linesize = 0.8;
|
||||
legend_loc = 'southeast';
|
||||
|
||||
%ax1 = subplot( rows, cols, theid );
|
||||
ax1 = subplot_tight( rows, cols, theid, sp_margins );
|
||||
|
||||
% Hold the axes.
|
||||
hold( ax1, 'on' );
|
||||
|
||||
% --------------------------------------------------------------------
|
||||
|
||||
% Automatically detect a column with the increasing problem size.
|
||||
@@ -199,6 +187,7 @@ end
|
||||
|
||||
% xpos ypos
|
||||
%set( leg,'Position',[11.32 6.36 1.15 0.7 ] ); % (1,4tl)
|
||||
|
||||
if nth == 1 && theid == legend_plot_id
|
||||
if has_xsmm == 1
|
||||
% single-threaded, with libxsmm (ccc)
|
||||
@@ -207,13 +196,9 @@ if nth == 1 && theid == legend_plot_id
|
||||
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, bfeo_lg, xsmm_lg, ...
|
||||
'Location', legend_loc );
|
||||
set( leg,'Box','off','Color','none','Units','inches' );
|
||||
if impl == 'octave'
|
||||
set( leg,'FontSize',fontsize );
|
||||
set( leg,'Position',[15.35 4.62 1.9 1.20] ); % (1,4tl)
|
||||
else
|
||||
set( leg,'FontSize',fontsize-3 );
|
||||
set( leg,'Position',[18.20 10.20 1.15 0.7 ] ); % (1,4tl)
|
||||
end
|
||||
set( leg,'FontSize',fontsize );
|
||||
%set( leg,'Position',[15.35 4.62 1.9 1.20] );
|
||||
set( leg,'Position',leg_pos_st_x );
|
||||
else
|
||||
% single-threaded, without libxsmm (rrr, or other)
|
||||
leg = legend( ...
|
||||
@@ -221,13 +206,9 @@ if nth == 1 && theid == legend_plot_id
|
||||
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, bfeo_lg, ...
|
||||
'Location', legend_loc );
|
||||
set( leg,'Box','off','Color','none','Units','inches' );
|
||||
if impl == 'octave'
|
||||
set( leg,'FontSize',fontsize );
|
||||
set( leg,'Position',[15.35 7.40 1.9 1.10] ); % (1,4tl)
|
||||
else
|
||||
set( leg,'FontSize',fontsize-1 );
|
||||
set( leg,'Position',[18.24 10.15 1.15 0.7] ); % (1,4tl)
|
||||
end
|
||||
set( leg,'FontSize',fontsize );
|
||||
%set( leg,'Position',[15.35 7.40 1.9 1.10] );
|
||||
set( leg,'Position',leg_pos_st );
|
||||
end
|
||||
elseif nth > 1 && theid == legend_plot_id
|
||||
% multithreaded
|
||||
@@ -236,13 +217,9 @@ elseif nth > 1 && theid == legend_plot_id
|
||||
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, ...
|
||||
'Location', legend_loc );
|
||||
set( leg,'Box','off','Color','none','Units','inches' );
|
||||
if impl == 'octave'
|
||||
set( leg,'FontSize',fontsize );
|
||||
set( leg,'Position',[18.20 10.30 1.9 0.95] ); % (1,4tl)
|
||||
else
|
||||
set( leg,'FontSize',fontsize-1 );
|
||||
set( leg,'Position',[18.24 10.15 1.15 0.7] ); % (1,4tl)
|
||||
end
|
||||
set( leg,'FontSize',fontsize );
|
||||
%set( leg,'Position',[18.20 10.30 1.9 0.95] );
|
||||
set( leg,'Position',leg_pos_mt );
|
||||
end
|
||||
|
||||
set( ax1,'FontSize',fontsize );
|
||||
@@ -256,16 +233,18 @@ set( titl, 'FontWeight', 'normal' ); % default font style is now 'bold'.
|
||||
% This is a hack to nudge the title back to the center of the box.
|
||||
if impl == 'octave'
|
||||
tpos = get( titl, 'Position' );
|
||||
% For some reason, the titles in the graphs in the last column start
|
||||
% off in a different relative position than the graphs in the other
|
||||
% columns. Here, we manually account for that.
|
||||
if mod(theid-1,cols) == 6
|
||||
tpos(1) = tpos(1) + -10;
|
||||
else
|
||||
tpos(1) = tpos(1) + -40;
|
||||
end
|
||||
% For some reason, the titles in the graphs in certain columns start
|
||||
% off in a different relative position. Here, we manually fix that.
|
||||
%modid = mod(theid-1,cols);
|
||||
%if modid == 0 || modid == 1 || modid == 2
|
||||
% tpos(1) = tpos(1) + 0;
|
||||
%elseif modid == 3 || modid == 4 || modid == 5
|
||||
% tpos(1) = tpos(1) + 0;
|
||||
%else
|
||||
% tpos(1) = tpos(1) + 0;
|
||||
%end
|
||||
set( titl, 'Position', tpos );
|
||||
set( titl, 'FontSize', fontsize );
|
||||
set( titl, 'FontSize', fontsize-1 );
|
||||
else % impl == 'matlab'
|
||||
tpos = get( titl, 'Position' );
|
||||
tpos(1) = tpos(1) + 90;
|
||||
|
||||
@@ -11,26 +11,42 @@ function r_val = plot_panel_trxsh ...
|
||||
pack_str, ...
|
||||
dirpath, ...
|
||||
arch_str, ...
|
||||
vend_str, ...
|
||||
impl ...
|
||||
vend_str ...
|
||||
)
|
||||
|
||||
if 1 == 1
|
||||
%fig = figure('Position', [100, 100, 2400, 1500]);
|
||||
fig = figure('Position', [100, 100, 2400, 1200]);
|
||||
orient( fig, 'portrait' );
|
||||
set(gcf,'PaperUnits', 'inches');
|
||||
if impl == 'matlab'
|
||||
set(gcf,'PaperSize', [11.5 20.4]);
|
||||
set(gcf,'PaperPosition', [0 0 11.5 20.4]);
|
||||
set(gcf,'PaperPositionMode','manual');
|
||||
else % impl == 'octave' % octave 4.x
|
||||
set(gcf,'PaperSize', [12 22.0]);
|
||||
set(gcf,'PaperPositionMode','auto');
|
||||
end
|
||||
set(gcf,'PaperOrientation','landscape');
|
||||
impl = 'octave';
|
||||
|
||||
%subp = 'default';
|
||||
subp = 'tight';
|
||||
|
||||
if strcmp( subp, 'default' )
|
||||
position = [100 100 2400 1200];
|
||||
papersize = [12 22.0];
|
||||
sp_margins = [ 0.070 0.049 ];
|
||||
else
|
||||
position = [100 100 2308 1202];
|
||||
papersize = [12.5 24.0];
|
||||
fontsize = 14;
|
||||
leg_pos_st = [10.85 7.43 1.3 1.2 ];
|
||||
leg_pos_st_x = [14.15 4.35 1.3 1.4 ];
|
||||
leg_pos_mt = [10.85 7.66 1.3 1.0 ];
|
||||
sp_margins = [ 0.063 0.033 ];
|
||||
end
|
||||
|
||||
%fig = figure('Position', [100, 100, 2400, 1500]);
|
||||
fig = figure('Position', position);
|
||||
orient( fig, 'portrait' );
|
||||
set(gcf,'PaperUnits', 'inches');
|
||||
if impl == 'octave'
|
||||
set(gcf,'PaperSize', papersize);
|
||||
set(gcf,'PaperPositionMode','auto');
|
||||
else % impl == 'matlab'
|
||||
set(gcf,'PaperSize', [11.5 20.4]);
|
||||
set(gcf,'PaperPosition', [0 0 11.5 20.4]);
|
||||
set(gcf,'PaperPositionMode','manual');
|
||||
end
|
||||
set(gcf,'PaperOrientation','landscape');
|
||||
|
||||
% Create filename "templates" for the files that contain the performance
|
||||
% results.
|
||||
filetemp_blissup = '%s/output_%s_%s_blissup.m';
|
||||
@@ -92,7 +108,8 @@ for opi = 1:n_opsupnames
|
||||
|
||||
% Only read libxsmm data for single-threaded cases, and cases that use column
|
||||
% storage since that's the only format that libxsmm supports.
|
||||
if nth == 1 && stor_str == 'ccc'
|
||||
%if nth == 1 && stor_str == 'ccc'
|
||||
if nth == 1 && strcmp( stor_str, 'ccc' )
|
||||
data_xsmm = load_data( filetemp_xsmm, dirpath, thr_str, opsupname, vartemp, opname, 'libxsmm' );
|
||||
else
|
||||
data_xsmm = zeros( size( data_blissup, 1 ), size( data_blissup, 2 ) );
|
||||
@@ -113,31 +130,19 @@ for opi = 1:n_opsupnames
|
||||
4, 7, ...
|
||||
cfreq, ...
|
||||
dflopspercycle, ...
|
||||
opi, impl );
|
||||
|
||||
% Clear the variables created for the return values of load_data().
|
||||
%clear data_blissup;
|
||||
%clear data_blisconv;
|
||||
%clear data_eigen;
|
||||
%clear data_open;
|
||||
%clear data_vend;
|
||||
%clear data_bfeo;
|
||||
%clear data_xsmm;
|
||||
|
||||
% Clear the variables used in the raw data files.
|
||||
%clear data_st_*gemm_*;
|
||||
%clear data_mt_*gemm_*;
|
||||
opi, impl, ...
|
||||
fontsize, ...
|
||||
leg_pos_st, leg_pos_st_x, leg_pos_mt, ...
|
||||
sp_margins );
|
||||
end
|
||||
|
||||
% Construct the name of the file to which we will output the graph.
|
||||
outfile = sprintf( 'l3sup_%s_%s_%s_nt%d.pdf', oproot, stor_str, arch_str, nth );
|
||||
|
||||
% Output the graph to pdf format.
|
||||
%print(gcf, 'gemm_md','-fillpage','-dpdf');
|
||||
%print(gcf, outfile,'-bestfit','-dpdf');
|
||||
if impl == 'octave'
|
||||
if strcmp( impl, 'octave' )
|
||||
print(gcf, outfile);
|
||||
else % if impl == 'matlab'
|
||||
else
|
||||
print(gcf, outfile,'-bestfit','-dpdf');
|
||||
end
|
||||
|
||||
|
||||
@@ -1,40 +1,25 @@
|
||||
% kabylake
|
||||
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/kabylake/20200302/mnkt100000_st','kbl','MKL','octave'); close; clear all;
|
||||
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/kabylake/20200302/mnkt100000_st','kbl','MKL'); close; clear all;
|
||||
|
||||
% haswell
|
||||
plot_panel_trxsh(3.5,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/haswell/20200302/mnkt100000_st','has','MKL','octave'); close; clear all;
|
||||
plot_panel_trxsh(3.5,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/haswell/20200302/mnkt100000_st','has','MKL'); close; clear all;
|
||||
|
||||
% epyc
|
||||
plot_panel_trxsh(3.00, 8,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/epyc/20200302/mnkt100000_st','epyc','MKL','octave'); close; clear all;
|
||||
% zen
|
||||
plot_panel_trxsh(3.00, 8,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/epyc/20200302/mnkt100000_st','zen','MKL'); close; clear all;
|
||||
|
||||
% zen2
|
||||
plot_panel_trxsh(3.40,16, 1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(3.40,16, 1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(3.40,16, 1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(3.40,16, 1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
|
||||
|
||||
plot_panel_trxsh(2.60,16,32,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(2.60,16,32,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(2.60,16,32,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
|
||||
plot_panel_trxsh(2.60,16,32,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
|
||||
|
||||
|
||||
|
||||
plot_panel_trxsh(3.40,16, 1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
% Scratchpad
|
||||
% st d
|
||||
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../output_st/d','kbl','MKL','octave');
|
||||
plot_panel_trxsh(3.80,16,1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../output_st/d','kbl','MKL','octave');
|
||||
% mt d
|
||||
plot_panel_trxsh(3.80,16,4,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../output_mt/d','kbl','MKL','octave');
|
||||
plot_panel_trxsh(3.80,16,4,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../output_mt/d','kbl','MKL','octave');
|
||||
% st s
|
||||
plot_panel_trxsh(3.80,16,1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../output_st/s','kbl','MKL','octave');
|
||||
plot_panel_trxsh(3.80,16,1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../output_st/s','kbl','MKL','octave');
|
||||
% mt s
|
||||
plot_panel_trxsh(3.80,16,4,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../output_mt/s','kbl','MKL','octave');
|
||||
plot_panel_trxsh(3.80,16,4,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../output_mt/s','kbl','MKL','octave');
|
||||
plot_panel_trxsh(2.60,16,32,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
|
||||
|
||||
126
test/sup/octave/subplot_tight.m
Normal file
@@ -0,0 +1,126 @@
|
||||
%
|
||||
% Copyright (c) 2016, Nikolay S.
|
||||
% All rights reserved.
|
||||
%
|
||||
% Redistribution and use in source and binary forms, with or without
|
||||
% modification, are permitted provided that the following conditions are
|
||||
% met:
|
||||
%
|
||||
% * Redistributions of source code must retain the above copyright
|
||||
% notice, this list of conditions and the following disclaimer.
|
||||
% * Redistributions in binary form must reproduce the above copyright
|
||||
% notice, this list of conditions and the following disclaimer in
|
||||
% the documentation and/or other materials provided with the distribution
|
||||
%
|
||||
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
|
||||
% AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
|
||||
% IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
|
||||
% ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
|
||||
% LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
|
||||
% CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
|
||||
% SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
|
||||
% INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
|
||||
% CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
|
||||
% ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
|
||||
% POSSIBILITY OF SUCH DAMAGE.
|
||||
%
|
||||
|
||||
function vargout=subplot_tight(m, n, p, margins, varargin)
|
||||
%% subplot_tight
% A subplot function substitute with a user-tunable margins parameter.
%
%% Syntax
% h=subplot_tight(m, n, p);
% h=subplot_tight(m, n, p, margins);
% h=subplot_tight(m, n, p, margins, subplotArgs...);
%
%% Description
% Our goal is to grant the user the ability to define the margins between neighbouring
% subplots. Unfortunately, the Matlab subplot function lacks this functionality, and the
% margins between subplots can reach 40% of figure area, which is pretty lavish. While at
% the beginning the function was implemented as a wrapper function for the Matlab function
% subplot, it was modified due to axes deletion resulting from what Matlab subplot
% detected as overlapping. Therefore, the current implementation makes no use of the Matlab
% subplot function, using axes instead. This can be problematic, as axis and subplot
% parameters are quite different. Set isWrapper to "True" to return to wrapper mode, which
% fully supports subplot format.
%
%% Input arguments (defaults exist):
% margins- two elements vector [vertical,horizontal] defining the margins between
% neighbouring axes. Default value is 0.04
|
||||
%
|
||||
%% Output arguments
|
||||
% same as subplot- none, or axes handle according to function call.
|
||||
%
|
||||
%% Issues & Comments
|
||||
% - Note that if additional elements are used in order to be passed to subplot, margins
|
||||
% parameter must be defined. For default margins value use empty element- [].
|
||||
% -
|
||||
%
|
||||
%% Example
|
||||
% close all;
|
||||
% img=imread('peppers.png');
|
||||
% figSubplotH=figure('Name', 'subplot');
|
||||
% figSubplotTightH=figure('Name', 'subplot_tight');
|
||||
% nElems=17;
|
||||
% subplotRows=ceil(sqrt(nElems)-1);
|
||||
% subplotRows=max(1, subplotRows);
|
||||
% subplotCols=ceil(nElems/subplotRows);
|
||||
% for iElem=1:nElems
|
||||
% figure(figSubplotH);
|
||||
% subplot(subplotRows, subplotCols, iElem);
|
||||
% imshow(img);
|
||||
% figure(figSubplotTightH);
|
||||
% subplot_tight(subplotRows, subplotCols, iElem, [0.0001]);
|
||||
% imshow(img);
|
||||
% end
|
||||
%
|
||||
%% See also
|
||||
% - subplot
|
||||
%
|
||||
%% Revision history
|
||||
% First version: Nikolay S. 2011-03-29.
|
||||
% Last update: Nikolay S. 2012-05-24.
|
||||
%
|
||||
% *List of Changes:*
|
||||
% 2012-05-24
|
||||
% Non wrapping mode (based on axes command) added, to deal with an issue of disappearing
|
||||
% subplots occuring with massive axes.
|
||||
%% Default params
isWrapper=false;
if (nargin<4) || isempty(margins)
    margins=[0.04,0.04]; % default margins value- 4% of figure
end
if length(margins)==1
    margins(2)=margins; % a scalar margin applies both vertically and horizontally
end

%note n and m are switched as Matlab indexing is column-wise, while subplot indexing is row-wise :(
[subplot_col,subplot_row]=ind2sub([n,m],p);


height=(1-(m+1)*margins(1))/m; % single subplot height
width=(1-(n+1)*margins(2))/n;  % single subplot width

% note subplot supports vector p inputs- so a merged subplot of higher dimensions will be created
subplot_cols=1+max(subplot_col)-min(subplot_col); % number of column elements in merged subplot
subplot_rows=1+max(subplot_row)-min(subplot_row); % number of row elements in merged subplot

merged_height=subplot_rows*( height+margins(1) )- margins(1);   % merged subplot height
merged_width= subplot_cols*( width +margins(2) )- margins(2);   % merged subplot width

merged_bottom=(m-max(subplot_row))*(height+margins(1)) +margins(1); % merged subplot bottom position
merged_left=min(subplot_col)*(width+margins(2))-width;              % merged subplot left position
pos=[merged_left, merged_bottom, merged_width, merged_height];


if isWrapper
    h=subplot(m, n, p, varargin{:}, 'Units', 'Normalized', 'Position', pos);
else
    h=axes('Position', pos, varargin{:});
end

if nargout==1
    vargout=h;
end
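The position arithmetic in `subplot_tight.m` can be checked by hand without opening a figure. The sketch below repeats the same formulas in a standalone helper; the name `tight_pos` is hypothetical (it is not part of this commit or of `subplot_tight.m`) and exists only to show what normalized `[left, bottom, width, height]` vector results for a given `m`, `n`, `p`, and `margins`.

```octave
1;  % not a function file: lets the helper below be defined inline in a script

% Hypothetical helper (an illustration, not part of subplot_tight.m).
% Returns the normalized [left, bottom, width, height] vector computed with
% the same formulas as subplot_tight, without creating any axes.
function pos = tight_pos(m, n, p, margins)
    if isscalar(margins)
        margins = [margins, margins];       % scalar applies to both directions
    end
    [col, row] = ind2sub([n, m], p);        % subplot indices run row-wise
    height = (1 - (m+1)*margins(1)) / m;    % height of a single cell
    width  = (1 - (n+1)*margins(2)) / n;    % width of a single cell
    rows = 1 + max(row) - min(row);         % cells spanned vertically
    cols = 1 + max(col) - min(col);         % cells spanned horizontally
    pos = [ min(col)*(width + margins(2)) - width, ...
            (m - max(row))*(height + margins(1)) + margins(1), ...
            cols*(width + margins(2)) - margins(2), ...
            rows*(height + margins(1)) - margins(1) ];
end

% 2x2 grid with 4% margins: each cell is (1 - 3*0.04)/2 = 0.44 wide and tall.
pos_top_left = tight_pos(2, 2, 1, 0.04)      % approx. [0.04 0.52 0.44 0.44]

% A vector p merges cells: the whole top row spans 2*0.44 + 0.04 = 0.92.
pos_top_row = tight_pos(2, 2, [1 2], 0.04)   % approx. [0.04 0.52 0.92 0.44]
```

Because the bottom edge is measured from the bottom of the figure, the first row of a two-row grid sits at 0.52 rather than 0.04; passing the vector `[1 2]` as `p` keeps the same bottom edge and widens the axes across both columns.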