Added BLASFEO results to docs/PerformanceSmall.md.

Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.

Author: Field G. Van Zee
Date: 2019-06-04 16:06:58 -05:00
Parent: 763fa39c30
Commit: cbaa22e1ca
12 changed files with 43 additions and 19 deletions

@@ -137,8 +137,8 @@ size of interest so that we can better assist you.
* Multithreaded (28 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=7`
* Multithreaded (56 core) execution requested via `export BLIS_JC_NT=8 BLIS_IC_NT=7`
* OpenBLAS 52d3f7a
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded, 56 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded, 56 cores)
* Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
* Multithreaded (28 core) execution requested via `export OPENBLAS_NUM_THREADS=28`
* Multithreaded (56 core) execution requested via `export OPENBLAS_NUM_THREADS=56`
@@ -197,8 +197,8 @@ size of interest so that we can better assist you.
* Multithreaded (26 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=13`
* Multithreaded (52 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=13`
* OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded, 52 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded, 52 cores)
* Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
* Multithreaded (26 core) execution requested via `export OPENBLAS_NUM_THREADS=26`
* Multithreaded (52 core) execution requested via `export OPENBLAS_NUM_THREADS=52`
@@ -269,8 +269,8 @@ size of interest so that we can better assist you.
* Multithreaded (12 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=3 BLIS_JR_NT=2`
* Multithreaded (24 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=3 BLIS_JR_NT=2`
* OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded, 24 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded, 24 cores)
* Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
* Multithreaded (12 core) execution requested via `export OPENBLAS_NUM_THREADS=12`
* Multithreaded (24 core) execution requested via `export OPENBLAS_NUM_THREADS=24`
@@ -339,8 +339,8 @@ size of interest so that we can better assist you.
* Multithreaded (32 core) execution requested via `export BLIS_JC_NT=1 BLIS_IC_NT=8 BLIS_JR_NT=4`
* Multithreaded (64 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=8 BLIS_JR_NT=4`
* OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded, 64 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded, 64 cores)
* Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
* Multithreaded (32 core) execution requested via `export OPENBLAS_NUM_THREADS=32`
* Multithreaded (64 core) execution requested via `export OPENBLAS_NUM_THREADS=64`
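
The `BLIS_JC_NT`, `BLIS_IC_NT`, and `BLIS_JR_NT` settings in the hunks above each give the number of ways one loop is parallelized, and in every configuration listed their product matches the stated core count (for example, 4 × 7 = 28 and 2 × 3 × 2 = 12). A minimal shell sketch of that arithmetic follows. `blis_total_threads` is a hypothetical helper, not part of BLIS, and it considers only the three variables that appear in these notes.

```sh
#!/bin/sh
# Hypothetical helper (not part of BLIS): multiply the per-loop thread
# counts to get the total number of threads requested.
# A variable that is unset or empty is treated as 1.
blis_total_threads() {
    echo $(( ${BLIS_JC_NT:-1} * ${BLIS_IC_NT:-1} * ${BLIS_JR_NT:-1} ))
}

# The 28-core setting quoted in the first hunk.
export BLIS_JC_NT=4 BLIS_IC_NT=7
blis_total_threads   # prints 28
```

A check like this may help catch a mismatch between the requested thread count and the machine's core count before a long benchmark run.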

@@ -112,13 +112,15 @@ size of interest so that we can better assist you.
* single-core: 57.6 GFLOPS (double-precision), 115.2 GFLOPS (single-precision)
* Operating system: Gentoo Linux (Linux kernel 5.0.7)
* Compiler: gcc 7.3.0
-* Results gathered: 31 May 2019
+* Results gathered: 31 May 2019, 3 June 2019
* Implementations tested:
* BLIS 6bf449c (0.5.2-42)
* configured with `./configure --enable-cblas auto`
* sub-configuration exercised: `haswell`
* OpenBLAS 0.3.6
-* configured with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* BLASFEO 75a3dd8
+* configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
* Eigen 3.3.90
* Obtained via the [Eigen git mirror](https://github.com/eigenteam/eigen-git-mirror) (30 May 2019)
* Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
@@ -170,13 +172,15 @@ size of interest so that we can better assist you.
* single-core: 24 GFLOPS (double-precision), 48 GFLOPS (single-precision)
* Operating system: Ubuntu 18.04 (Linux kernel 4.15.0)
* Compiler: gcc 7.3.0
-* Results gathered: 31 May 2019
+* Results gathered: 31 May 2019, 3 June 2019
* Implementations tested:
* BLIS 6bf449c (0.5.2-42)
* configured with `./configure --enable-cblas auto`
* sub-configuration exercised: `zen`
* OpenBLAS 0.3.6
-* configured with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* BLASFEO 75a3dd8
+* configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
* Eigen 3.3.90
* Obtained via the [Eigen git mirror](https://github.com/eigenteam/eigen-git-mirror) (30 May 2019)
* Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).

Binary file not shown (before: 151 KiB; after: 168 KiB).

Binary file not shown (before: 175 KiB; after: 198 KiB).

Binary file not shown (before: 152 KiB; after: 170 KiB).

Binary file not shown (before: 184 KiB; after: 206 KiB).

@@ -3,6 +3,7 @@ function r_val = plot_l3sup_perf( opname, ...
data_blislpab, ...
data_eigen, ...
data_open, ...
+data_bfeo, ...
data_vend, vend_str, ...
nth, ...
rows, cols, ...
@@ -31,6 +32,7 @@ color_blissup = 'k'; lines_blissup = '-'; markr_blissup = '';
color_blislpab = 'k'; lines_blislpab = ':'; markr_blislpab = '';
color_eigen = 'm'; lines_eigen = '-.'; markr_eigen = 'o';
color_open = 'r'; lines_open = '--'; markr_open = 'o';
+color_bfeo = 'c'; lines_bfeo = '-'; markr_bfeo = 'o';
color_vend = 'b'; lines_vend = '-.'; markr_vend = '.';
% Compute the peak performance in terms of the number of double flops
@@ -54,6 +56,7 @@ blissup_legend = sprintf( 'BLIS sup' );
blislpab_legend = sprintf( 'BLIS conv' );
eigen_legend = sprintf( 'Eigen' );
open_legend = sprintf( 'OpenBLAS' );
+bfeo_legend = sprintf( 'BLASFEO' );
%vend_legend = sprintf( 'MKL' );
%vend_legend = sprintf( 'ARMPL' );
vend_legend = vend_str;
@@ -113,6 +116,9 @@ eigen_ln = line( x_axis( :, 1 ), data_eigen( :, flopscol ) / nth, ...
open_ln = line( x_axis( :, 1 ), data_open( :, flopscol ) / nth, ...
'Color',color_open, 'LineStyle',lines_open, ...
'LineWidth',linesize );
+bfeo_ln = line( x_axis( :, 1 ), data_bfeo( :, flopscol ) / nth, ...
+'Color',color_bfeo, 'LineStyle',lines_bfeo, ...
+'LineWidth',linesize );
vend_ln = line( x_axis( :, 1 ), data_vend( :, flopscol ) / nth, ...
'Color',color_vend, 'LineStyle',lines_vend, ...
'LineWidth',linesize );
@@ -130,6 +136,9 @@ eigen_ln = line( nan, nan, ...
open_ln = line( nan, nan, ...
'Color',color_open, 'LineStyle',lines_open, ...
'LineWidth',linesize );
+bfeo_ln = line( nan, nan, ...
+'Color',color_bfeo, 'LineStyle',lines_bfeo, ...
+'LineWidth',linesize );
vend_ln = line( nan, nan, ...
'Color',color_vend, 'LineStyle',lines_vend, ...
'LineWidth',linesize );
@@ -168,12 +177,14 @@ if rows == 4 && cols == 7
blislpab_ln ...
eigen_ln ...
open_ln ...
+bfeo_ln ...
vend_ln ...
], ...
blissup_legend, ...
blislpab_legend, ...
eigen_legend, ...
open_legend, ...
+bfeo_legend, ...
vend_legend, ...
'Location', legend_loc );
set( leg,'Box','off' );
@@ -185,8 +196,8 @@ if rows == 4 && cols == 7
set( leg,'FontSize',fontsize );
set( leg,'Position',[11.92 6.54 1.15 0.7 ] ); % (1,4tl)
else
-set( leg,'FontSize',fontsize );
-set( leg,'Position',[18.34 10.22 1.15 0.7 ] ); % (1,4tl)
+set( leg,'FontSize',fontsize-1 );
+set( leg,'Position',[18.24 10.15 1.15 0.7 ] ); % (1,4tl)
end
elseif nth > 1 && theid == legend_plot_id
end

@@ -22,6 +22,7 @@ filetemp_blissup = '%s/output_%s_%s_blissup.m';
filetemp_blislpab = '%s/output_%s_%s_blislpab.m';
filetemp_eigen = '%s/output_%s_%s_eigen.m';
filetemp_open = '%s/output_%s_%s_openblas.m';
+filetemp_bfeo = '%s/output_%s_%s_blasfeo.m';
filetemp_vend = '%s/output_%s_%s_vendor.m';
% Create a variable name "template" for the variables contained in the
@@ -76,6 +77,7 @@ for opi = 1:n_opsupnames
file_blislpab = sprintf( filetemp_blislpab, dirpath, thr_str, opsupname );
file_eigen = sprintf( filetemp_eigen, dirpath, thr_str, opsupname );
file_open = sprintf( filetemp_open, dirpath, thr_str, opsupname );
+file_bfeo = sprintf( filetemp_bfeo, dirpath, thr_str, opsupname );
file_vend = sprintf( filetemp_vend, dirpath, thr_str, opsupname );
% Load the data files.
@@ -87,6 +89,8 @@ for opi = 1:n_opsupnames
run( file_eigen )
%str = sprintf( ' Loading %s', file_open ); disp(str);
run( file_open )
+%str = sprintf( ' Loading %s', file_bfeo ); disp(str);
+run( file_bfeo )
%str = sprintf( ' Loading %s', file_vend ); disp(str);
run( file_vend )
@@ -95,20 +99,23 @@ for opi = 1:n_opsupnames
var_blislpab = sprintf( vartemp, thr_str, opname, 'blislpab' );
var_eigen = sprintf( vartemp, thr_str, opname, 'eigen' );
var_open = sprintf( vartemp, thr_str, opname, 'openblas' );
+var_bfeo = sprintf( vartemp, thr_str, opname, 'blasfeo' );
var_vend = sprintf( vartemp, thr_str, opname, 'vendor' );
% Use eval() to instantiate the variable names constructed above,
% copying each to a simplified name.
-data_blissup = eval( var_blissup ); % e.g. data_st_sgemm_blissup( :, : );
-data_blislpab = eval( var_blislpab ); % e.g. data_st_sgemm_blislpab( :, : );
-data_eigen = eval( var_eigen ); % e.g. data_st_sgemm_eigen( :, : );
-data_open = eval( var_open ); % e.g. data_st_sgemm_openblas( :, : );
-data_vend = eval( var_vend ); % e.g. data_st_sgemm_vendor( :, : );
+data_blissup = eval( var_blissup ); % e.g. data_st_dgemm_blissup( :, : );
+data_blislpab = eval( var_blislpab ); % e.g. data_st_dgemm_blislpab( :, : );
+data_eigen = eval( var_eigen ); % e.g. data_st_dgemm_eigen( :, : );
+data_open = eval( var_open ); % e.g. data_st_dgemm_openblas( :, : );
+data_bfeo = eval( var_bfeo ); % e.g. data_st_dgemm_blasfeo( :, : );
+data_vend = eval( var_vend ); % e.g. data_st_dgemm_vendor( :, : );
%str = sprintf( ' Reading %s', var_blissup ); disp(str);
%str = sprintf( ' Reading %s', var_blislpab ); disp(str);
%str = sprintf( ' Reading %s', var_eigen ); disp(str);
%str = sprintf( ' Reading %s', var_open ); disp(str);
+%str = sprintf( ' Reading %s', var_bfeo ); disp(str);
%str = sprintf( ' Reading %s', var_vend ); disp(str);
% Plot one result in an m x n grid of plots, via the subplot()
@@ -119,6 +126,7 @@ for opi = 1:n_opsupnames
data_blislpab, ...
data_eigen, ...
data_open, ...
+data_bfeo, ...
data_vend, vend_str, ...
nth, ...
4, 7, ...
@@ -131,6 +139,7 @@ for opi = 1:n_opsupnames
clear data_blislpab;
clear data_eigen;
clear data_open;
+clear data_bfeo;
clear data_vend;
end
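
With this change the plotting script runs one more data file per operation, named by the `'%s/output_%s_%s_blasfeo.m'` template, so a missing BLASFEO output file would presumably stop the script at the `run( file_bfeo )` call. The shell sketch below lists the six file names those templates imply for one operation, which could be checked before plotting. `expected_files` is a hypothetical helper, and the directory, thread string, and operation name passed to it are placeholder values rather than names taken from the repository.

```sh
#!/bin/sh
# Hypothetical helper: print the data files the plotting script would look
# for, following the '%s/output_%s_%s_<impl>.m' templates in the diff above.
# Arguments: directory, thread string, operation name.
expected_files() {
    dirpath=$1
    thr_str=$2
    opsupname=$3
    for impl in blissup blislpab eigen openblas blasfeo vendor; do
        printf '%s/output_%s_%s_%s.m\n' "$dirpath" "$thr_str" "$opsupname" "$impl"
    done
}

# Placeholder arguments; substitute the real directory and operation name.
expected_files . st example_op
```

Piping the output through a loop that runs `[ -f "$f" ]` on each name would flag any implementation whose results have not been generated yet.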