Added Epyc 7742 Zen2 ("Rome") sup perf results.

Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
Author: Field G. Van Zee
Date: 2020-10-09 15:41:09 -05:00
Parent: d98368c32d
Commit: addcd46b05
39 changed files with 360 additions and 155 deletions

View File

@@ -243,6 +243,7 @@ The `runthese.m` file will contain example invocations of the function.
endif()
```
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
* Multithreaded (26 core) execution requested via `export OMP_NUM_THREADS=26`
@@ -323,6 +324,7 @@ The `runthese.m` file will contain example invocations of the function.
endif()
```
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
* Multithreaded (12 core) execution requested via `export OMP_NUM_THREADS=12`
@@ -401,6 +403,7 @@ The `runthese.m` file will contain example invocations of the function.
endif()
```
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
* Multithreaded (32 core) execution requested via `export OMP_NUM_THREADS=32`
@@ -483,6 +486,7 @@ The `runthese.m` file will contain example invocations of the function.
endif()
```
* configured and built BLAS library via `mkdir build; cd build; cmake ..; make blas`
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
* Multithreaded (64 core) execution requested via `export OMP_NUM_THREADS=64`

View File

@@ -12,9 +12,12 @@
* **[Haswell](PerformanceSmall.md#haswell)**
* **[Experiment details](PerformanceSmall.md#haswell-experiment-details)**
* **[Results](PerformanceSmall.md#haswell-results)**
* **[Epyc](PerformanceSmall.md#epyc)**
* **[Experiment details](PerformanceSmall.md#epyc-experiment-details)**
* **[Results](PerformanceSmall.md#epyc-results)**
* **[Zen](PerformanceSmall.md#zen)**
* **[Experiment details](PerformanceSmall.md#zen-experiment-details)**
* **[Results](PerformanceSmall.md#zen-results)**
* **[Zen2](PerformanceSmall.md#zen2)**
* **[Experiment details](PerformanceSmall.md#zen2-experiment-details)**
* **[Results](PerformanceSmall.md#zen2-results)**
* **[Feedback](PerformanceSmall.md#feedback)**
# Introduction
@@ -295,9 +298,9 @@ The `runthese.m` file will contain example invocations of the function.
---
## Epyc
## Zen
### Epyc experiment details
### Zen experiment details
* Location: Oracle cloud
* Processor model: AMD Epyc 7551 (Zen1)
@@ -318,7 +321,7 @@ The `runthese.m` file will contain example invocations of the function.
* BLIS 90db88e (0.6.1-8)
* configured with `./configure --enable-cblas auto` (single-threaded)
* configured with `./configure --enable-cblas -t openmp auto` (multithreaded)
* sub-configuration exercised: `haswell`
* sub-configuration exercised: `zen`
* Multithreaded (32 cores) execution requested via `export BLIS_NUM_THREADS=32`
* OpenBLAS 0.3.8
* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0 USE_LOCKING=1` (single-threaded)
@@ -357,25 +360,124 @@ The `runthese.m` file will contain example invocations of the function.
* Comments:
* libxsmm is highly competitive for very small problems, but quickly gives up once the "large" dimension exceeds about 180-240 (or 64 in the case where all operands are square). Also, libxsmm's `gemm` cannot handle a transposition on matrix A and similarly dispatches the fallback implementation for those cases. libxsmm also does not export CBLAS interfaces, and therefore only appears on the graphs for column-stored matrices.
### Epyc results
### Zen results
#### pdf
* [Epyc single-threaded row-stored](graphs/sup/dgemm_rrr_epyc_nt1.pdf)
* [Epyc single-threaded column-stored](graphs/sup/dgemm_ccc_epyc_nt1.pdf)
* [Epyc multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_epyc_nt32.pdf)
* [Epyc multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_epyc_nt32.pdf)
* [Zen single-threaded row-stored](graphs/sup/dgemm_rrr_zen_nt1.pdf)
* [Zen single-threaded column-stored](graphs/sup/dgemm_ccc_zen_nt1.pdf)
* [Zen multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_zen_nt32.pdf)
* [Zen multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_zen_nt32.pdf)
#### png (inline)
* **Epyc single-threaded row-stored**
![single-threaded row-stored](graphs/sup/dgemm_rrr_epyc_nt1.png)
* **Epyc single-threaded column-stored**
![single-threaded column-stored](graphs/sup/dgemm_ccc_epyc_nt1.png)
* **Epyc multithreaded (32 cores) row-stored**
![multithreaded row-stored](graphs/sup/dgemm_rrr_epyc_nt32.png)
* **Epyc multithreaded (32 cores) column-stored**
![multithreaded column-stored](graphs/sup/dgemm_ccc_epyc_nt32.png)
* **Zen single-threaded row-stored**
![single-threaded row-stored](graphs/sup/dgemm_rrr_zen_nt1.png)
* **Zen single-threaded column-stored**
![single-threaded column-stored](graphs/sup/dgemm_ccc_zen_nt1.png)
* **Zen multithreaded (32 cores) row-stored**
![multithreaded row-stored](graphs/sup/dgemm_rrr_zen_nt32.png)
* **Zen multithreaded (32 cores) column-stored**
![multithreaded column-stored](graphs/sup/dgemm_ccc_zen_nt32.png)
---
## Zen2
### Zen2 experiment details
* Location: Oracle cloud
* Processor model: AMD Epyc 7742 (Zen2 "Rome")
* Core topology: two sockets, 8 Core Complex Dies (CCDs) per socket, 2 Core Complexes (CCX) per CCD, 4 cores per CCX, 128 cores total
* SMT status: enabled, but not utilized
* Max clock rate: 2.25GHz (base, documented); 3.4GHz boost (single-core, documented); 2.6GHz boost (multicore, estimated)
* Max vector register length: 256 bits (AVX2)
* Max FMA vector IPC: 2
* Alternatively, FMA vector IPC is 4 when vectors are limited to 128 bits each.
* Peak performance:
* single-core: 54.4 GFLOPS (double-precision), 108.8 GFLOPS (single-precision)
* multicore (estimated): 41.6 GFLOPS/core (double-precision), 83.2 GFLOPS/core (single-precision)
* Operating system: Ubuntu 18.04 (Linux kernel 4.15.0)
* Page size: 4096 bytes
* Compiler: gcc 9.3.0
* Results gathered: 8 October 2020
* Implementations tested:
* BLIS a0849d3 (0.7.0-67)
* configured with `./configure --enable-cblas auto` (single-threaded)
* configured with `./configure --enable-cblas -t openmp auto` (multithreaded)
* sub-configuration exercised: `zen2`
* Multithreaded (32 cores) execution requested via `export BLIS_NUM_THREADS=32`
* OpenBLAS 0.3.10
* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0 USE_LOCKING=1` (single-threaded)
* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=32` (multithreaded)
* Multithreaded (32 cores) execution requested via `export OPENBLAS_NUM_THREADS=32`
* BLASFEO 5b26d40
* configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
* built BLAS library via `make CC=gcc`
* Eigen 3.3.90
* Obtained via the [Eigen GitLab homepage](https://gitlab.com/libeigen/eigen) (24 September 2020)
* Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal):
```
# These lines added after line 60.
check_cxx_compiler_flag("-march=native" COMPILER_SUPPORTS_MARCH_NATIVE)
if(COMPILER_SUPPORTS_MARCH_NATIVE)
set(CMAKE_CXX_FLAGS "${CMAKE_CXX_FLAGS} -march=native")
endif()
```
* configured and built BLAS library via `mkdir build; cd build; CC=gcc cmake ..; make blas`
* installed headers via `cmake . -DCMAKE_INSTALL_PREFIX=$HOME/flame/eigen; make install`
* The `gemm` implementation was pulled in at compile-time via Eigen headers; other operations were linked to Eigen's BLAS library.
* Single-threaded (1 core) execution requested via `export OMP_NUM_THREADS=1`
* Multithreaded (32 cores) execution requested via `export OMP_NUM_THREADS=32`
* MKL 2020 update 3
* Single-threaded (1 core) execution requested via `export MKL_NUM_THREADS=1`
* Multithreaded (32 cores) execution requested via `export MKL_NUM_THREADS=32`
* libxsmm f0ab9cb (post-1.16.1)
* compiled with `make AVX=2`; linked with [netlib BLAS](http://www.netlib.org/blas/) 3.6.0 as the fallback library to better show where libxsmm stops handling the computation internally.
* Affinity:
* Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0-31"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
* All executables were run through `numactl --interleave=all`.
* Frequency throttling (via `cpupower`):
* Driver: acpi-cpufreq
* Governor: performance
* Hardware limits (steps): 1.5GHz, 2.0GHz, 2.25GHz
* Adjusted minimum: 2.25GHz
* Comments:
* None.
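
The peak performance figures listed above follow from the clock rates, vector width, and FMA issue rate given in the same list. The sketch below reproduces that arithmetic in Python; the function name `peak_gflops` and its parameters are illustrative and not part of BLIS or of the scripts in this commit. It assumes each FMA counts as two floating-point operations per vector lane.

```python
def peak_gflops(clock_ghz, vector_bits, fma_ipc, elem_bits):
    """Theoretical peak GFLOPS for one core (hypothetical helper)."""
    lanes = vector_bits // elem_bits       # elements per vector register
    flops_per_cycle = fma_ipc * lanes * 2  # each FMA = one multiply + one add
    return clock_ghz * flops_per_cycle


if __name__ == "__main__":
    # Clock rates taken from the Zen2 experiment details:
    # 3.4 GHz single-core boost, 2.6 GHz estimated multicore boost.
    for label, ghz in (("single-core", 3.4), ("multicore (estimated)", 2.6)):
        dp = peak_gflops(ghz, vector_bits=256, fma_ipc=2, elem_bits=64)
        sp = peak_gflops(ghz, vector_bits=256, fma_ipc=2, elem_bits=32)
        print(f"{label}: {dp:.1f} GFLOPS/core (double), {sp:.1f} GFLOPS/core (single)")
```

With 256-bit vectors and an FMA vector IPC of 2, this gives 16 double-precision flops per cycle, which matches the `16` passed as the flops-per-cycle argument in the Zen2 `runthese.m` invocations. The alternative configuration mentioned in the list (128-bit vectors at an IPC of 4) yields the same per-cycle figure.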
### Zen2 results
#### pdf
* [Zen2 sgemm single-threaded row-stored](graphs/sup/sgemm_rrr_zen2_nt1.pdf)
* [Zen2 sgemm single-threaded column-stored](graphs/sup/sgemm_ccc_zen2_nt1.pdf)
* [Zen2 dgemm single-threaded row-stored](graphs/sup/dgemm_rrr_zen2_nt1.pdf)
* [Zen2 dgemm single-threaded column-stored](graphs/sup/dgemm_ccc_zen2_nt1.pdf)
* [Zen2 sgemm multithreaded (32 cores) row-stored](graphs/sup/sgemm_rrr_zen2_nt32.pdf)
* [Zen2 sgemm multithreaded (32 cores) column-stored](graphs/sup/sgemm_ccc_zen2_nt32.pdf)
* [Zen2 dgemm multithreaded (32 cores) row-stored](graphs/sup/dgemm_rrr_zen2_nt32.pdf)
* [Zen2 dgemm multithreaded (32 cores) column-stored](graphs/sup/dgemm_ccc_zen2_nt32.pdf)
#### png (inline)
* **Zen2 sgemm single-threaded row-stored**
![sgemm single-threaded row-stored](graphs/sup/sgemm_rrr_zen2_nt1.png)
* **Zen2 sgemm single-threaded column-stored**
![sgemm single-threaded column-stored](graphs/sup/sgemm_ccc_zen2_nt1.png)
* **Zen2 dgemm single-threaded row-stored**
![dgemm single-threaded row-stored](graphs/sup/dgemm_rrr_zen2_nt1.png)
* **Zen2 dgemm single-threaded column-stored**
![dgemm single-threaded column-stored](graphs/sup/dgemm_ccc_zen2_nt1.png)
* **Zen2 sgemm multithreaded (32 cores) row-stored**
![sgemm multithreaded row-stored](graphs/sup/sgemm_rrr_zen2_nt32.png)
* **Zen2 sgemm multithreaded (32 cores) column-stored**
![sgemm multithreaded column-stored](graphs/sup/sgemm_ccc_zen2_nt32.png)
* **Zen2 dgemm multithreaded (32 cores) row-stored**
![dgemm multithreaded row-stored](graphs/sup/dgemm_rrr_zen2_nt32.png)
* **Zen2 dgemm multithreaded (32 cores) column-stored**
![dgemm multithreaded column-stored](graphs/sup/dgemm_ccc_zen2_nt32.png)
---

View File

@@ -85,7 +85,7 @@ else
xaxisname = 'm = n = k';
fontsize = 20;
end
linesize = 0.5;
linesize = 0.8;
legend_loc = 'southeast';
%ax1 = subplot( rows, cols, theid );
@@ -188,14 +188,14 @@ set( titl, 'Position', tpos ); % here we nudge it back to centered with box.
set( titl, 'FontSize', fontsize );
if theid > (rows-1)*cols
xlab = xlabel( ax1,xaxisname );
%tpos = get( xlab, 'Position' )
%tpos(2) = tpos(2) + 10;
%set( xlab, 'Position', tpos );
%tpos = get( xlab, 'Position' )
%tpos(2) = tpos(2) + 10;
%set( xlab, 'Position', tpos );
xlab = xlabel( ax1,xaxisname );
end
if mod(theid-1,cols) == 0
ylab = ylabel( ax1,yaxisname );
ylab = ylabel( ax1,yaxisname );
end
r_val = 0;
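
The two conditions in the hunk above decide which panels of the grid receive axis labels: only the bottom row gets an x-axis label and only the first column gets a y-axis label. A minimal Python sketch of the same index logic follows, assuming panels are numbered row by row starting at 1 (as `subplot` does). The function name `panel_labels` is hypothetical and does not appear in the scripts.

```python
def panel_labels(theid, rows, cols):
    """Return (has_xlabel, has_ylabel) for a 1-based, row-major panel id."""
    has_xlabel = theid > (rows - 1) * cols  # panel lies in the last row
    has_ylabel = (theid - 1) % cols == 0    # panel lies in the first column
    return has_xlabel, has_ylabel


if __name__ == "__main__":
    rows, cols = 4, 5  # grid shape suggested by the name plot_panel_4x5
    for theid in (1, 5, 16, 20):
        print(theid, panel_labels(theid, rows, cols))
```

In a 4x5 grid, panel 16 is the only one that receives both labels, since it sits in the bottom-left corner.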

View File

@@ -1,11 +1,14 @@
function r_val = plot_panel_4x5( cfreq, ...
dflopspercycle, ...
nth, ...
thr_str, ...
dirpath, ...
arch_str, ...
vend_str, ...
with_eigen )
function r_val = plot_panel_4x5 ...
( ...
cfreq, ...
dflopspercycle, ...
nth, ...
thr_str, ...
dirpath, ...
arch_str, ...
vend_str, ...
with_eigen ...
)
impl = 'octave';
%impl = 'matlab';
@@ -138,7 +141,6 @@ end
outfile = sprintf( 'l3_perf_%s_nt%d.pdf', arch_str, nth );
% Output the graph to pdf format.
%print(gcf, 'gemm_md','-fillpage','-dpdf');
if strcmp( impl, 'octave' )
print( gcf, outfile );
else

View File

@@ -22,3 +22,5 @@ plot_panel_4x5(2.55,8,64,'2s','../results/epyc/merged20190306_0319_0328/jc2ic8jr
plot_panel_4x5(3.40,16,1, 'st','../results/zen2/20200929/st', 'zen2','MKL',1); close all; clear all;
plot_panel_4x5(2.60,16,64, '1s','../results/zen2/20200929/jc4ic4jr4','zen2','MKL',1); close all; clear all;
plot_panel_4x5(2.60,16,128,'2s','../results/zen2/20200929/jc8ic4jr4','zen2','MKL',1); close all; clear all;
plot_panel_4x5(3.40,16,1, 'st','../results/zen2/20200929/st', 'zen2','MKL',1); close all; clear all; plot_panel_4x5(2.60,16,64, '1s','../results/zen2/20200929/jc4ic4jr4','zen2','MKL',1); close all; clear all; plot_panel_4x5(2.60,16,128,'2s','../results/zen2/20200929/jc8ic4jr4','zen2','MKL',1); close all; clear all;

View File

@@ -11,7 +11,10 @@ function r_val = plot_l3sup_perf( opname, ...
rows, cols, ...
cfreq, ...
dfps, ...
theid, impl )
theid, impl, ...
fontsize, ...
leg_pos_st, leg_pos_st_x, leg_pos_mt, ...
sp_margins )
% Define the column in which the performance rates are found.
flopscol = size( data_blissup, 2 );
@@ -32,23 +35,8 @@ end
% NOTE: We can draw the legend on any graph as long as it has already been
% rendered. Since the coordinates are global, we can simply always wait until
% the final graph to draw the legend.
%if nth == 1
% if has_xsmm == 1
% legend_plot_id = 2*cols + 1*5;
% else
% legend_plot_id = 1*cols + 1*5;
% end
%else
% legend_plot_id = 0*cols + 1*6;
%end
legend_plot_id = cols*rows;
% Hold the axes.
if 1
ax1 = subplot( rows, cols, theid );
hold( ax1, 'on' );
end
% Set line properties.
color_blissup = 'k'; lines_blissup = '-'; markr_blissup = '';
color_blisconv = 'k'; lines_blisconv = ':'; markr_blisconv = '';
@@ -97,17 +85,17 @@ else
yaxisname = 'GFLOPS/core';
end
%flopscol = 4;
% Set the marker size, line size, and other items.
msize = 5;
if 1
fontsize = 12;
else
fontsize = 16;
end
linesize = 0.5;
linesize = 0.8;
legend_loc = 'southeast';
%ax1 = subplot( rows, cols, theid );
ax1 = subplot_tight( rows, cols, theid, sp_margins );
% Hold the axes.
hold( ax1, 'on' );
% --------------------------------------------------------------------
% Automatically detect a column with the increasing problem size.
@@ -199,6 +187,7 @@ end
% xpos ypos
%set( leg,'Position',[11.32 6.36 1.15 0.7 ] ); % (1,4tl)
if nth == 1 && theid == legend_plot_id
if has_xsmm == 1
% single-threaded, with libxsmm (ccc)
@@ -207,13 +196,9 @@ if nth == 1 && theid == legend_plot_id
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, bfeo_lg, xsmm_lg, ...
'Location', legend_loc );
set( leg,'Box','off','Color','none','Units','inches' );
if impl == 'octave'
set( leg,'FontSize',fontsize );
set( leg,'Position',[15.35 4.62 1.9 1.20] ); % (1,4tl)
else
set( leg,'FontSize',fontsize-3 );
set( leg,'Position',[18.20 10.20 1.15 0.7 ] ); % (1,4tl)
end
set( leg,'FontSize',fontsize );
%set( leg,'Position',[15.35 4.62 1.9 1.20] );
set( leg,'Position',leg_pos_st_x );
else
% single-threaded, without libxsmm (rrr, or other)
leg = legend( ...
@@ -221,13 +206,9 @@ if nth == 1 && theid == legend_plot_id
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, bfeo_lg, ...
'Location', legend_loc );
set( leg,'Box','off','Color','none','Units','inches' );
if impl == 'octave'
set( leg,'FontSize',fontsize );
set( leg,'Position',[15.35 7.40 1.9 1.10] ); % (1,4tl)
else
set( leg,'FontSize',fontsize-1 );
set( leg,'Position',[18.24 10.15 1.15 0.7] ); % (1,4tl)
end
set( leg,'FontSize',fontsize );
%set( leg,'Position',[15.35 7.40 1.9 1.10] );
set( leg,'Position',leg_pos_st );
end
elseif nth > 1 && theid == legend_plot_id
% multithreaded
@@ -236,13 +217,9 @@ elseif nth > 1 && theid == legend_plot_id
blissup_lg, blisconv_lg, eigen_lg, open_lg, vend_lg, ...
'Location', legend_loc );
set( leg,'Box','off','Color','none','Units','inches' );
if impl == 'octave'
set( leg,'FontSize',fontsize );
set( leg,'Position',[18.20 10.30 1.9 0.95] ); % (1,4tl)
else
set( leg,'FontSize',fontsize-1 );
set( leg,'Position',[18.24 10.15 1.15 0.7] ); % (1,4tl)
end
set( leg,'FontSize',fontsize );
%set( leg,'Position',[18.20 10.30 1.9 0.95] );
set( leg,'Position',leg_pos_mt );
end
set( ax1,'FontSize',fontsize );
@@ -256,16 +233,18 @@ set( titl, 'FontWeight', 'normal' ); % default font style is now 'bold'.
% This is a hack to nudge the title back to the center of the box.
if impl == 'octave'
tpos = get( titl, 'Position' );
% For some reason, the titles in the graphs in the last column start
% off in a different relative position than the graphs in the other
% columns. Here, we manually account for that.
if mod(theid-1,cols) == 6
tpos(1) = tpos(1) + -10;
else
tpos(1) = tpos(1) + -40;
end
% For some reason, the titles in the graphs in certain columns start
% off in a different relative position. Here, we manually fix that.
%modid = mod(theid-1,cols);
%if modid == 0 || modid == 1 || modid == 2
% tpos(1) = tpos(1) + 0;
%elseif modid == 3 || modid == 4 || modid == 5
% tpos(1) = tpos(1) + 0;
%else
% tpos(1) = tpos(1) + 0;
%end
set( titl, 'Position', tpos );
set( titl, 'FontSize', fontsize );
set( titl, 'FontSize', fontsize-1 );
else % impl == 'matlab'
tpos = get( titl, 'Position' );
tpos(1) = tpos(1) + 90;

View File

@@ -11,26 +11,42 @@ function r_val = plot_panel_trxsh ...
pack_str, ...
dirpath, ...
arch_str, ...
vend_str, ...
impl ...
vend_str ...
)
if 1 == 1
%fig = figure('Position', [100, 100, 2400, 1500]);
fig = figure('Position', [100, 100, 2400, 1200]);
orient( fig, 'portrait' );
set(gcf,'PaperUnits', 'inches');
if impl == 'matlab'
set(gcf,'PaperSize', [11.5 20.4]);
set(gcf,'PaperPosition', [0 0 11.5 20.4]);
set(gcf,'PaperPositionMode','manual');
else % impl == 'octave' % octave 4.x
set(gcf,'PaperSize', [12 22.0]);
set(gcf,'PaperPositionMode','auto');
end
set(gcf,'PaperOrientation','landscape');
impl = 'octave';
%subp = 'default';
subp = 'tight';
if strcmp( subp, 'default' )
position = [100 100 2400 1200];
papersize = [12 22.0];
sp_margins = [ 0.070 0.049 ];
else
position = [100 100 2308 1202];
papersize = [12.5 24.0];
fontsize = 14;
leg_pos_st = [10.85 7.43 1.3 1.2 ];
leg_pos_st_x = [14.15 4.35 1.3 1.4 ];
leg_pos_mt = [10.85 7.66 1.3 1.0 ];
sp_margins = [ 0.063 0.033 ];
end
%fig = figure('Position', [100, 100, 2400, 1500]);
fig = figure('Position', position);
orient( fig, 'portrait' );
set(gcf,'PaperUnits', 'inches');
if impl == 'octave'
set(gcf,'PaperSize', papersize);
set(gcf,'PaperPositionMode','auto');
else % impl == 'matlab'
set(gcf,'PaperSize', [11.5 20.4]);
set(gcf,'PaperPosition', [0 0 11.5 20.4]);
set(gcf,'PaperPositionMode','manual');
end
set(gcf,'PaperOrientation','landscape');
% Create filename "templates" for the files that contain the performance
% results.
filetemp_blissup = '%s/output_%s_%s_blissup.m';
@@ -92,7 +108,8 @@ for opi = 1:n_opsupnames
% Only read libxsmm data for single-threaded cases, and cases that use column
% storage since that's the only format that libxsmm supports.
if nth == 1 && stor_str == 'ccc'
%if nth == 1 && stor_str == 'ccc'
if nth == 1 && strcmp( stor_str, 'ccc' )
data_xsmm = load_data( filetemp_xsmm, dirpath, thr_str, opsupname, vartemp, opname, 'libxsmm' );
else
data_xsmm = zeros( size( data_blissup, 1 ), size( data_blissup, 2 ) );
@@ -113,31 +130,19 @@ for opi = 1:n_opsupnames
4, 7, ...
cfreq, ...
dflopspercycle, ...
opi, impl );
% Clear the variables created for the return values of load_data().
%clear data_blissup;
%clear data_blisconv;
%clear data_eigen;
%clear data_open;
%clear data_vend;
%clear data_bfeo;
%clear data_xsmm;
% Clear the variables used in the raw data files.
%clear data_st_*gemm_*;
%clear data_mt_*gemm_*;
opi, impl, ...
fontsize, ...
leg_pos_st, leg_pos_st_x, leg_pos_mt, ...
sp_margins );
end
% Construct the name of the file to which we will output the graph.
outfile = sprintf( 'l3sup_%s_%s_%s_nt%d.pdf', oproot, stor_str, arch_str, nth );
% Output the graph to pdf format.
%print(gcf, 'gemm_md','-fillpage','-dpdf');
%print(gcf, outfile,'-bestfit','-dpdf');
if impl == 'octave'
if strcmp( impl, 'octave' )
print(gcf, outfile);
else % if impl == 'matlab'
else
print(gcf, outfile,'-bestfit','-dpdf');
end

View File

@@ -1,40 +1,25 @@
% kabylake
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/kabylake/20200302/mnkt100000_st','kbl','MKL','octave'); close; clear all;
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/kabylake/20200302/mnkt100000_st','kbl','MKL'); close; clear all;
% haswell
plot_panel_trxsh(3.5,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/haswell/20200302/mnkt100000_st','has','MKL','octave'); close; clear all;
plot_panel_trxsh(3.5,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/haswell/20200302/mnkt100000_st','has','MKL'); close; clear all;
% epyc
plot_panel_trxsh(3.00, 8,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/epyc/20200302/mnkt100000_st','epyc','MKL','octave'); close; clear all;
% zen
plot_panel_trxsh(3.00, 8,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/epyc/20200302/mnkt100000_st','zen','MKL'); close; clear all;
% zen2
plot_panel_trxsh(3.40,16, 1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
plot_panel_trxsh(3.40,16, 1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
plot_panel_trxsh(3.40,16, 1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
plot_panel_trxsh(3.40,16, 1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
plot_panel_trxsh(2.60,16,32,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
plot_panel_trxsh(2.60,16,32,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
plot_panel_trxsh(2.60,16,32,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
plot_panel_trxsh(2.60,16,32,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;
plot_panel_trxsh(3.40,16, 1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all; plot_panel_trxsh(3.40,16, 1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../results/zen2/20201006/mnkt100000_st', 'zen2','MKL'); close; clear all;
% Scratchpad
% st d
plot_panel_trxsh(3.80,16,1,'st','d','rrr',[ 6 8 4 ],'lds','uaub','../output_st/d','kbl','MKL','octave');
plot_panel_trxsh(3.80,16,1,'st','d','ccc',[ 6 8 4 ],'lds','uaub','../output_st/d','kbl','MKL','octave');
% mt d
plot_panel_trxsh(3.80,16,4,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../output_mt/d','kbl','MKL','octave');
plot_panel_trxsh(3.80,16,4,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../output_mt/d','kbl','MKL','octave');
% st s
plot_panel_trxsh(3.80,16,1,'st','s','rrr',[ 6 16 4 ],'lds','uaub','../output_st/s','kbl','MKL','octave');
plot_panel_trxsh(3.80,16,1,'st','s','ccc',[ 6 16 4 ],'lds','uaub','../output_st/s','kbl','MKL','octave');
% mt s
plot_panel_trxsh(3.80,16,4,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../output_mt/s','kbl','MKL','octave');
plot_panel_trxsh(3.80,16,4,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../output_mt/s','kbl','MKL','octave');
plot_panel_trxsh(2.60,16,32,'mt','d','rrr',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','d','ccc',[ 6 8 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','s','rrr',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all; plot_panel_trxsh(2.60,16,32,'mt','s','ccc',[ 6 16 10 ],'lds','uaub','../results/zen2/20201006/mnkt100000_mt32','zen2','MKL'); close; clear all;

View File

@@ -0,0 +1,126 @@
%
% Copyright (c) 2016, Nikolay S.
% All rights reserved.
%
% Redistribution and use in source and binary forms, with or without
% modification, are permitted provided that the following conditions are
% met:
%
% * Redistributions of source code must retain the above copyright
% notice, this list of conditions and the following disclaimer.
% * Redistributions in binary form must reproduce the above copyright
% notice, this list of conditions and the following disclaimer in
% the documentation and/or other materials provided with the distribution
%
% THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS"
% AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE
% IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE
% ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR CONTRIBUTORS BE
% LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR
% CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF
% SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS
% INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN
% CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE)
% ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE
% POSSIBILITY OF SUCH DAMAGE.
%
function vargout=subplot_tight(m, n, p, margins, varargin)
%% subplot_tight
% A subplot function substitute with a user-tunable margins parameter.
%
%% Syntax
% h=subplot_tight(m, n, p);
% h=subplot_tight(m, n, p, margins);
% h=subplot_tight(m, n, p, margins, subplotArgs...);
%
%% Description
% Our goal is to grant the user the ability to define the margins between neighbouring
% subplots. Unfortunately, the Matlab subplot function lacks this functionality, and the
% margins between subplots can reach 40% of figure area, which is pretty lavish. While at
% the beginning the function was implemented as a wrapper function for the Matlab function
% subplot, it was modified due to axes deletion resulting from what Matlab subplot
% detected as overlapping. Therefore, the current implementation makes no use of the Matlab
% subplot function, using axes instead. This can be problematic, as axes and subplot
% parameters are quite different. Set isWrapper to "True" to return to wrapper mode, which
% fully supports subplot format.
%
%% Input arguments (defaults exist):
% margins- two elements vector [vertical,horizontal] defining the margins between
% neighbouring axes. Default value is 0.04
%
%% Output arguments
% same as subplot- none, or axes handle according to function call.
%
%% Issues & Comments
% - Note that if additional elements are used in order to be passed to subplot, margins
% parameter must be defined. For default margins value use empty element- [].
% -
%
%% Example
% close all;
% img=imread('peppers.png');
% figSubplotH=figure('Name', 'subplot');
% figSubplotTightH=figure('Name', 'subplot_tight');
% nElems=17;
% subplotRows=ceil(sqrt(nElems)-1);
% subplotRows=max(1, subplotRows);
% subplotCols=ceil(nElems/subplotRows);
% for iElem=1:nElems
% figure(figSubplotH);
% subplot(subplotRows, subplotCols, iElem);
% imshow(img);
% figure(figSubplotTightH);
% subplot_tight(subplotRows, subplotCols, iElem, [0.0001]);
% imshow(img);
% end
%
%% See also
% - subplot
%
%% Revision history
% First version: Nikolay S. 2011-03-29.
% Last update: Nikolay S. 2012-05-24.
%
% *List of Changes:*
% 2012-05-24
% Non-wrapping mode (based on axes command) added, to deal with an issue of disappearing
% subplots occurring with massive axes.
%% Default params
isWrapper=false;
if (nargin<4) || isempty(margins)
margins=[0.04,0.04]; % default margins value- 4% of figure
end
if length(margins)==1
margins(2)=margins;
end
%note n and m are switched as Matlab indexing is column-wise, while subplot indexing is row-wise :(
[subplot_col,subplot_row]=ind2sub([n,m],p);
height=(1-(m+1)*margins(1))/m; % single subplot height
width=(1-(n+1)*margins(2))/n; % single subplot width
% note subplot supports vector p inputs, so a merged subplot of higher dimensions will be created
subplot_cols=1+max(subplot_col)-min(subplot_col); % number of column elements in merged subplot
subplot_rows=1+max(subplot_row)-min(subplot_row); % number of row elements in merged subplot
merged_height=subplot_rows*( height+margins(1) )- margins(1); % merged subplot height
merged_width= subplot_cols*( width +margins(2) )- margins(2); % merged subplot width
merged_bottom=(m-max(subplot_row))*(height+margins(1)) +margins(1); % merged subplot bottom position
merged_left=min(subplot_col)*(width+margins(2))-width; % merged subplot left position
pos=[merged_left, merged_bottom, merged_width, merged_height];
if isWrapper
h=subplot(m, n, p, varargin{:}, 'Units', 'Normalized', 'Position', pos);
else
h=axes('Position', pos, varargin{:});
end
if nargout==1
vargout=h;
end
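
The position arithmetic in `subplot_tight()` can be checked by hand for a single panel. The Python sketch below mirrors the formulas above for a scalar `p` only (the Octave function also accepts a vector `p` for merged subplots, which is not covered here). The name `tight_position` is hypothetical; the 4x7 grid and the margins `[0.063 0.033]` are the values that `plot_panel_trxsh` passes in its 'tight' configuration.

```python
def tight_position(m, n, p, margins=(0.04, 0.04)):
    """Normalized [left, bottom, width, height] of panel p in an m-by-n grid.

    margins is (vertical, horizontal), each a fraction of the figure size.
    """
    mv, mh = margins
    # Row-major, 1-based panel id -> 1-based (row, col), as ind2sub([n,m],p) does.
    row, col = divmod(p - 1, n)
    row, col = row + 1, col + 1
    height = (1 - (m + 1) * mv) / m  # single subplot height
    width = (1 - (n + 1) * mh) / n   # single subplot width
    bottom = (m - row) * (height + mv) + mv
    left = col * (width + mh) - width
    return left, bottom, width, height


if __name__ == "__main__":
    # Top-left and bottom-right panels of the 4x7 grid.
    for p in (1, 28):
        print(p, [round(v, 5) for v in tight_position(4, 7, p, (0.063, 0.033))])
```

The first panel starts one horizontal margin from the left edge, and the last panel ends one horizontal margin from the right edge, so the grid fills the figure with uniform gaps.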