Added BLASFEO results to docs/PerformanceSmall.md.

Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results, and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for docs/Performance.md.

@@ -137,8 +137,8 @@ size of interest so that we can better assist you.
 * Multithreaded (28 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=7`
 * Multithreaded (56 core) execution requested via `export BLIS_JC_NT=8 BLIS_IC_NT=7`
 * OpenBLAS 52d3f7a
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded, 56 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded, 56 cores)
 * Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
 * Multithreaded (28 core) execution requested via `export OPENBLAS_NUM_THREADS=28`
 * Multithreaded (56 core) execution requested via `export OPENBLAS_NUM_THREADS=56`
@@ -197,8 +197,8 @@ size of interest so that we can better assist you.
 * Multithreaded (26 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=13`
 * Multithreaded (52 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=13`
 * OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded, 52 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded, 52 cores)
 * Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
 * Multithreaded (26 core) execution requested via `export OPENBLAS_NUM_THREADS=26`
 * Multithreaded (52 core) execution requested via `export OPENBLAS_NUM_THREADS=52`
@@ -269,8 +269,8 @@ size of interest so that we can better assist you.
 * Multithreaded (12 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=3 BLIS_JR_NT=2`
 * Multithreaded (24 core) execution requested via `export BLIS_JC_NT=4 BLIS_IC_NT=3 BLIS_JR_NT=2`
 * OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded, 24 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded, 24 cores)
 * Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
 * Multithreaded (12 core) execution requested via `export OPENBLAS_NUM_THREADS=12`
 * Multithreaded (24 core) execution requested via `export OPENBLAS_NUM_THREADS=24`
@@ -339,8 +339,8 @@ size of interest so that we can better assist you.
 * Multithreaded (32 core) execution requested via `export BLIS_JC_NT=1 BLIS_IC_NT=8 BLIS_JR_NT=4`
 * Multithreaded (64 core) execution requested via `export BLIS_JC_NT=2 BLIS_IC_NT=8 BLIS_JR_NT=4`
 * OpenBLAS 0.3.5
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-* configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded, 64 cores)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded, 64 cores)
 * Single-threaded (1 core) execution requested via `export OPENBLAS_NUM_THREADS=1`
 * Multithreaded (32 core) execution requested via `export OPENBLAS_NUM_THREADS=32`
 * Multithreaded (64 core) execution requested via `export OPENBLAS_NUM_THREADS=64`
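
Reviewer note: in each hunk above, the per-loop BLIS thread counts multiply out to the core count quoted on the same line (for example 2 x 8 x 4 = 64), while OpenBLAS is given the total directly. The following is a minimal sketch of that arithmetic, not part of the commit; treating an unset loop variable as 1 is an assumption of the sketch.

```shell
#!/bin/sh
# Values copied from the 64-core BLIS line in the hunk above.
export BLIS_JC_NT=2 BLIS_IC_NT=8 BLIS_JR_NT=4

# Multiply the per-loop counts; unset variables are treated as 1
# (an assumption made for this sketch).
total=$(( ${BLIS_JC_NT:-1} * ${BLIS_IC_NT:-1} * ${BLIS_JR_NT:-1} ))
echo "BLIS threads requested: ${total}"

# OpenBLAS takes the total thread count as a single value.
export OPENBLAS_NUM_THREADS="${total}"
```

With the values shown, `total` is 64, matching the "64 core" label in the documentation lines.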
@@ -112,13 +112,15 @@ size of interest so that we can better assist you.
 * single-core: 57.6 GFLOPS (double-precision), 115.2 GFLOPS (single-precision)
 * Operating system: Gentoo Linux (Linux kernel 5.0.7)
 * Compiler: gcc 7.3.0
-* Results gathered: 31 May 2019
+* Results gathered: 31 May 2019, 3 June 2019
 * Implementations tested:
 * BLIS 6bf449c (0.5.2-42)
 * configured with `./configure --enable-cblas auto`
 * sub-configuration exercised: `haswell`
 * OpenBLAS 0.3.6
-* configured with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* BLASFEO 75a3dd8
+* configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
 * Eigen 3.3.90
 * Obtained via the [Eigen git mirror](https://github.com/eigenteam/eigen-git-mirror) (30 May 2019)
 * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
@@ -170,13 +172,15 @@ size of interest so that we can better assist you.
 * single-core: 24 GFLOPS (double-precision), 48 GFLOPS (single-precision)
 * Operating system: Ubuntu 18.04 (Linux kernel 4.15.0)
 * Compiler: gcc 7.3.0
-* Results gathered: 31 May 2019
+* Results gathered: 31 May 2019, 3 June 2019
 * Implementations tested:
 * BLIS 6bf449c (0.5.2-42)
 * configured with `./configure --enable-cblas auto`
 * sub-configuration exercised: `zen`
 * OpenBLAS 0.3.6
-* configured with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* configured `Makefile.rule` with `BINARY=64 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
+* BLASFEO 75a3dd8
+* configured `Makefile.rule` with: `BLAS_API=1 FORTRAN_BLAS_API=1 CBLAS_API=1`.
 * Eigen 3.3.90
 * Obtained via the [Eigen git mirror](https://github.com/eigenteam/eigen-git-mirror) (30 May 2019)
 * Prior to compilation, modified top-level `CMakeLists.txt` to ensure that `-march=native` was added to `CXX_FLAGS` variable (h/t Sameer Agarwal).
Binary image changed: 151 KiB before, 168 KiB after.
Binary image changed: 175 KiB before, 198 KiB after.
Binary image changed: 152 KiB before, 170 KiB after.
Binary image changed: 184 KiB before, 206 KiB after.
@@ -3,6 +3,7 @@ function r_val = plot_l3sup_perf( opname, ...
 data_blislpab, ...
 data_eigen, ...
 data_open, ...
+data_bfeo, ...
 data_vend, vend_str, ...
 nth, ...
 rows, cols, ...
@@ -31,6 +32,7 @@ color_blissup = 'k'; lines_blissup = '-'; markr_blissup = '';
 color_blislpab = 'k'; lines_blislpab = ':'; markr_blislpab = '';
 color_eigen = 'm'; lines_eigen = '-.'; markr_eigen = 'o';
 color_open = 'r'; lines_open = '--'; markr_open = 'o';
+color_bfeo = 'c'; lines_bfeo = '-'; markr_bfeo = 'o';
 color_vend = 'b'; lines_vend = '-.'; markr_vend = '.';
 
 % Compute the peak performance in terms of the number of double flops
@@ -54,6 +56,7 @@ blissup_legend = sprintf( 'BLIS sup' );
 blislpab_legend = sprintf( 'BLIS conv' );
 eigen_legend = sprintf( 'Eigen' );
 open_legend = sprintf( 'OpenBLAS' );
+bfeo_legend = sprintf( 'BLASFEO' );
 %vend_legend = sprintf( 'MKL' );
 %vend_legend = sprintf( 'ARMPL' );
 vend_legend = vend_str;
@@ -113,6 +116,9 @@ eigen_ln = line( x_axis( :, 1 ), data_eigen( :, flopscol ) / nth, ...
 open_ln = line( x_axis( :, 1 ), data_open( :, flopscol ) / nth, ...
 'Color',color_open, 'LineStyle',lines_open, ...
 'LineWidth',linesize );
+bfeo_ln = line( x_axis( :, 1 ), data_bfeo( :, flopscol ) / nth, ...
+'Color',color_bfeo, 'LineStyle',lines_bfeo, ...
+'LineWidth',linesize );
 vend_ln = line( x_axis( :, 1 ), data_vend( :, flopscol ) / nth, ...
 'Color',color_vend, 'LineStyle',lines_vend, ...
 'LineWidth',linesize );
@@ -130,6 +136,9 @@ eigen_ln = line( nan, nan, ...
 open_ln = line( nan, nan, ...
 'Color',color_open, 'LineStyle',lines_open, ...
 'LineWidth',linesize );
+bfeo_ln = line( nan, nan, ...
+'Color',color_bfeo, 'LineStyle',lines_bfeo, ...
+'LineWidth',linesize );
 vend_ln = line( nan, nan, ...
 'Color',color_vend, 'LineStyle',lines_vend, ...
 'LineWidth',linesize );
@@ -168,12 +177,14 @@ if rows == 4 && cols == 7
 blislpab_ln ...
 eigen_ln ...
 open_ln ...
+bfeo_ln ...
 vend_ln ...
 ], ...
 blissup_legend, ...
 blislpab_legend, ...
 eigen_legend, ...
 open_legend, ...
+bfeo_legend, ...
 vend_legend, ...
 'Location', legend_loc );
 set( leg,'Box','off' );
@@ -185,8 +196,8 @@ if rows == 4 && cols == 7
 set( leg,'FontSize',fontsize );
 set( leg,'Position',[11.92 6.54 1.15 0.7 ] ); % (1,4tl)
 else
-set( leg,'FontSize',fontsize );
-set( leg,'Position',[18.34 10.22 1.15 0.7 ] ); % (1,4tl)
+set( leg,'FontSize',fontsize-1 );
+set( leg,'Position',[18.24 10.15 1.15 0.7 ] ); % (1,4tl)
 end
 elseif nth > 1 && theid == legend_plot_id
 end
@@ -22,6 +22,7 @@ filetemp_blissup = '%s/output_%s_%s_blissup.m';
 filetemp_blislpab = '%s/output_%s_%s_blislpab.m';
 filetemp_eigen = '%s/output_%s_%s_eigen.m';
 filetemp_open = '%s/output_%s_%s_openblas.m';
+filetemp_bfeo = '%s/output_%s_%s_blasfeo.m';
 filetemp_vend = '%s/output_%s_%s_vendor.m';
 
 % Create a variable name "template" for the variables contained in the
@@ -76,6 +77,7 @@ for opi = 1:n_opsupnames
 file_blislpab = sprintf( filetemp_blislpab, dirpath, thr_str, opsupname );
 file_eigen = sprintf( filetemp_eigen, dirpath, thr_str, opsupname );
 file_open = sprintf( filetemp_open, dirpath, thr_str, opsupname );
+file_bfeo = sprintf( filetemp_bfeo, dirpath, thr_str, opsupname );
 file_vend = sprintf( filetemp_vend, dirpath, thr_str, opsupname );
 
 % Load the data files.
@@ -87,6 +89,8 @@ for opi = 1:n_opsupnames
 run( file_eigen )
 %str = sprintf( ' Loading %s', file_open ); disp(str);
 run( file_open )
+%str = sprintf( ' Loading %s', file_bfeo ); disp(str);
+run( file_bfeo )
 %str = sprintf( ' Loading %s', file_vend ); disp(str);
 run( file_vend )
 
@@ -95,20 +99,23 @@ for opi = 1:n_opsupnames
 var_blislpab = sprintf( vartemp, thr_str, opname, 'blislpab' );
 var_eigen = sprintf( vartemp, thr_str, opname, 'eigen' );
 var_open = sprintf( vartemp, thr_str, opname, 'openblas' );
+var_bfeo = sprintf( vartemp, thr_str, opname, 'blasfeo' );
 var_vend = sprintf( vartemp, thr_str, opname, 'vendor' );
 
 % Use eval() to instantiate the variable names constructed above,
 % copying each to a simplified name.
-data_blissup = eval( var_blissup ); % e.g. data_st_sgemm_blissup( :, : );
-data_blislpab = eval( var_blislpab ); % e.g. data_st_sgemm_blislpab( :, : );
-data_eigen = eval( var_eigen ); % e.g. data_st_sgemm_eigen( :, : );
-data_open = eval( var_open ); % e.g. data_st_sgemm_openblas( :, : );
-data_vend = eval( var_vend ); % e.g. data_st_sgemm_vendor( :, : );
+data_blissup = eval( var_blissup ); % e.g. data_st_dgemm_blissup( :, : );
+data_blislpab = eval( var_blislpab ); % e.g. data_st_dgemm_blislpab( :, : );
+data_eigen = eval( var_eigen ); % e.g. data_st_dgemm_eigen( :, : );
+data_open = eval( var_open ); % e.g. data_st_dgemm_openblas( :, : );
+data_bfeo = eval( var_bfeo ); % e.g. data_st_dgemm_blasfeo( :, : );
+data_vend = eval( var_vend ); % e.g. data_st_dgemm_vendor( :, : );
 
 %str = sprintf( ' Reading %s', var_blissup ); disp(str);
 %str = sprintf( ' Reading %s', var_blislpab ); disp(str);
 %str = sprintf( ' Reading %s', var_eigen ); disp(str);
 %str = sprintf( ' Reading %s', var_open ); disp(str);
+%str = sprintf( ' Reading %s', var_bfeo ); disp(str);
 %str = sprintf( ' Reading %s', var_vend ); disp(str);
 
 % Plot one result in an m x n grid of plots, via the subplot()
@@ -119,6 +126,7 @@ for opi = 1:n_opsupnames
 data_blislpab, ...
 data_eigen, ...
 data_open, ...
+data_bfeo, ...
 data_vend, vend_str, ...
 nth, ...
 4, 7, ...
@@ -131,6 +139,7 @@ for opi = 1:n_opsupnames
 clear data_blislpab;
 clear data_eigen;
 clear data_open;
+clear data_bfeo;
 clear data_vend;
 
 end
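
Reviewer note: the plotting script now expects one additional data file per operation, named by the `'%s/output_%s_%s_blasfeo.m'` template. The sketch below mirrors that template in shell to list the files a run would look for. It is not part of the commit; `data_file` is a hypothetical helper, and the directory `results`, thread string `st`, and operation name `dgemm` are illustrative values only.

```shell
#!/bin/sh
# Hypothetical helper mirroring the Octave template
# '%s/output_%s_%s_<impl>.m' used by the plotting script.
data_file() {
    # $1 = dirpath, $2 = thr_str, $3 = opsupname, $4 = implementation suffix
    printf '%s/output_%s_%s_%s.m\n' "$1" "$2" "$3" "$4"
}

# Illustrative inputs; the suffixes are the ones that appear in the script.
for impl in blissup blislpab eigen openblas blasfeo vendor; do
    data_file results st dgemm "$impl"
done
```

Because the script calls `run( file_bfeo )` unconditionally, a results directory missing the `*_blasfeo.m` file would presumably cause the load step to fail, so listing the expected names beforehand can help catch that.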