mirror of
https://github.com/amd/blis.git
synced 2026-04-20 07:38:53 +00:00
Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library built by Eigen. It appears, however, that Eigen's BLAS library does not support multithreading. (It may be that multithreading is only available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
@@ -49,17 +49,17 @@ Theoretical peak performance, in units of GFLOPS/core, is calculated as the
 product of:
 1. the maximum sustainable clock rate in GHz; and
 2. the maximum number of floating-point operations (flops) that can be
-   executed per cycle.
+   executed per cycle (per core).
 
 Note that the maximum sustainable clock rate may change depending on the
 conditions.
 For example, on some systems the maximum clock rate is higher when only one
 core is active (e.g. single-threaded performance) versus when all cores are
 active (e.g. multithreaded performance).
-The maximum number of flops executable per cycle is generally computed as the
-product of:
+The maximum number of flops executable per cycle (per core) is generally
+computed as the product of:
 1. the maximum number of fused multiply-add (FMA) vector instructions that
-   can be issued per cycle;
+   can be issued per cycle (per core);
 2. the maximum number of elements that can be stored within a single vector
    register (for the datatype in question); and
 3. 2.0, since an FMA instruction fuses two operations (a multiply and an add).
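As a worked instance of the peak-performance definition in the hunk above, the following sketch multiplies the clock rate by the flops executable per cycle (per core). The function name `peak_gflops_per_core` and the sample inputs (2.0 GHz, two FMA issues per cycle, eight double-precision elements per vector register) are illustrative assumptions, not figures taken from this commit or from any machine it documents.

```shell
#!/bin/sh
# Hypothetical helper: theoretical peak in GFLOPS/core.
#   $1 = maximum sustainable clock rate in GHz
#   $2 = FMA vector instructions issued per cycle (per core)
#   $3 = elements per vector register for the datatype in question
peak_gflops_per_core() {
    # The trailing 2.0 accounts for the multiply and the add fused in one FMA.
    awk -v ghz="$1" -v fma="$2" -v elems="$3" \
        'BEGIN { printf "%.1f\n", ghz * fma * elems * 2.0 }'
}

# Assumed example inputs only: 2.0 GHz, 2 FMA/cycle, 8 doubles per register.
peak_gflops_per_core 2.0 2 8
```

Because the sustainable clock rate can differ between single-threaded and multithreaded runs, the first argument would generally need to be chosen separately for each case.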
@@ -90,7 +90,7 @@ allow it to finish.
 
 Where along the x-axis you focus your attention will depend on the segment of
 the problem size range that you care about most. Some people's applications
-depend heavily on smaller problems, where "small" can mean anything from 200
+depend heavily on smaller problems, where "small" can mean anything from 10
 to 1000 or even higher. Some people consider 1000 to be quite large, while
 others insist that 5000 is merely "medium." What each of us considers to be
 small, medium, or large (naturally) depends heavily on the kinds of dense
@@ -132,10 +132,16 @@ size of interest so that we can better assist you.
     * sub-configuration exercised: `thunderx2`
   * OpenBLAS 52d3f7a
     * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded)
+    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=56` (multithreaded, 56 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=28` (multithreaded, 28 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=56` (multithreaded, 56 cores)
   * ARMPL 18.4
+    * Requested threading via `export OMP_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OMP_NUM_THREADS=28` (multithreaded, 28 cores)
+    * Requested threading via `export OMP_NUM_THREADS=56` (multithreaded, 56 cores)
 * Affinity:
-  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 55"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` was unset.
+  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 55"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
 * Frequency throttling (via `cpupower`):
   * No changes made.
 * Comments:
@@ -183,10 +189,16 @@ size of interest so that we can better assist you.
     * sub-configuration exercised: `skx`
   * OpenBLAS 0.3.5
     * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded)
+    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=52` (multithreaded, 52 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=26` (multithreaded, 26 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=52` (multithreaded, 52 cores)
   * MKL 2019 update 1
+    * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export MKL_NUM_THREADS=26` (multithreaded, 26 cores)
+    * Requested threading via `export MKL_NUM_THREADS=52` (multithreaded, 52 cores)
 * Affinity:
-  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 51"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` was unset.
+  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 51"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
 * Frequency throttling (via `cpupower`):
   * No changes made.
 * Comments:
@@ -234,10 +246,16 @@ size of interest so that we can better assist you.
     * sub-configuration exercised: `haswell`
   * OpenBLAS 0.3.5
     * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded)
+    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=24` (multithreaded, 24 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=12` (multithreaded, 12 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=24` (multithreaded, 24 cores)
   * MKL 2018 update 2
+    * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export MKL_NUM_THREADS=12` (multithreaded, 12 cores)
+    * Requested threading via `export MKL_NUM_THREADS=24` (multithreaded, 24 cores)
 * Affinity:
-  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 23"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` was unset.
+  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 23"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
 * Frequency throttling (via `cpupower`):
   * No changes made.
 * Comments:
@@ -286,10 +304,16 @@ size of interest so that we can better assist you.
     * sub-configuration exercised: `zen`
   * OpenBLAS 0.3.5
     * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=0` (single-threaded)
-    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded)
+    * configured with `BINARY=64 NO_CBLAS=1 NO_LAPACK=1 NO_LAPACKE=1 USE_THREAD=1 NUM_THREADS=64` (multithreaded, 64 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=32` (multithreaded, 32 cores)
+    * Requested threading via `export OPENBLAS_NUM_THREADS=64` (multithreaded, 64 cores)
   * MKL 2019 update 1
+    * Requested threading via `export MKL_NUM_THREADS=1` (single-threaded)
+    * Requested threading via `export MKL_NUM_THREADS=32` (multithreaded, 32 cores)
+    * Requested threading via `export MKL_NUM_THREADS=64` (multithreaded, 64 cores)
 * Affinity:
-  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 63"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` was unset.
+  * Thread affinity for BLIS was specified manually via `GOMP_CPU_AFFINITY="0 1 2 3 ... 63"`. However, multithreaded OpenBLAS appears to revert to single-threaded execution if `GOMP_CPU_AFFINITY` is set. Therefore, when measuring OpenBLAS performance, the `GOMP_CPU_AFFINITY` environment variable was unset.
 * Frequency throttling (via `cpupower`):
   * Driver: acpi-cpufreq
   * Governor: performance
@@ -106,6 +106,10 @@ OPENBLASP_LIB := $(HOME_LIB_PATH)/libopenblasp.a
 #ATLAS_LIB := $(HOME_LIB_PATH)/libf77blas.a \
 #             $(HOME_LIB_PATH)/libatlas.a
 
+# Eigen
+EIGEN_LIB := $(HOME_LIB_PATH)/libeigen.a
+EIGENP_LIB := $(HOME_LIB_PATH)/libeigen.a
+
 # MKL
 MKL_LIB := -L$(MKL_LIB_PATH) \
            -lmkl_intel_lp64 \
@@ -199,6 +203,7 @@ DNAT := -DIND=BLIS_NAT
 #STR_1M := -DSTR=\"1m\"
 STR_NAT := -DSTR=\"asm_blis\"
 STR_OBL := -DSTR=\"openblas\"
+STR_EIG := -DSTR=\"eigen\"
 STR_VEN := -DSTR=\"vendor\"
 
 # Single or multithreaded string
@@ -220,6 +225,7 @@ PDEF_2S := -DP_BEGIN=$(P2_BEGIN) -DP_INC=$(P2_INC) -DP_MAX=$(P2_MAX)
 all: all-st all-1s all-2s
 blis: blis-st blis-1s blis-2s
 openblas: openblas-st openblas-1s openblas-2s
+eigen: eigen-st eigen-1s eigen-2s
 vendor: vendor-st vendor-1s vendor-2s
 mkl: vendor
 armpl: vendor
@@ -238,7 +244,7 @@ blis-nat: blis-nat-st blis-nat-1s blis-nat-2s
 # Define the datatypes, operations, and implementations.
 DTS := s d c z
 OPS := gemm hemm herk trmm trsm
-IMPLS := asm_blis openblas vendor
+IMPLS := asm_blis openblas eigen vendor
 
 # Define functions to construct object filenames from the datatypes and
 # operations given an implementation. We define one function for single-
@@ -263,6 +269,13 @@ OPENBLAS_1S_BINS := $(patsubst %.o,%.x,$(OPENBLAS_1S_OBJS))
 OPENBLAS_2S_OBJS := $(call get-2s-objs,openblas)
 OPENBLAS_2S_BINS := $(patsubst %.o,%.x,$(OPENBLAS_2S_OBJS))
 
+EIGEN_ST_OBJS := $(call get-st-objs,eigen)
+EIGEN_ST_BINS := $(patsubst %.o,%.x,$(EIGEN_ST_OBJS))
+EIGEN_1S_OBJS := $(call get-1s-objs,eigen)
+EIGEN_1S_BINS := $(patsubst %.o,%.x,$(EIGEN_1S_OBJS))
+EIGEN_2S_OBJS := $(call get-2s-objs,eigen)
+EIGEN_2S_BINS := $(patsubst %.o,%.x,$(EIGEN_2S_OBJS))
+
 VENDOR_ST_OBJS := $(call get-st-objs,vendor)
 VENDOR_ST_BINS := $(patsubst %.o,%.x,$(VENDOR_ST_OBJS))
 VENDOR_1S_OBJS := $(call get-1s-objs,vendor)
@@ -279,6 +292,10 @@ openblas-st: $(OPENBLAS_ST_BINS)
 openblas-1s: $(OPENBLAS_1S_BINS)
 openblas-2s: $(OPENBLAS_2S_BINS)
 
+eigen-st: $(EIGEN_ST_BINS)
+eigen-1s: $(EIGEN_1S_BINS)
+eigen-2s: $(EIGEN_2S_BINS)
+
 vendor-st: $(VENDOR_ST_BINS)
 vendor-1s: $(VENDOR_1S_BINS)
 vendor-2s: $(VENDOR_2S_BINS)
@@ -293,9 +310,10 @@ armpl-2s: vendor-2s
 
 # Mark the object files as intermediate so that make will remove them
 # automatically after building the binaries on which they depend.
-.INTERMEDIATE: $(BLIS_NAT_ST_OBJS) $(OPENBLAS_ST_OBJS) $(VENDOR_ST_OBJS)
-.INTERMEDIATE: $(BLIS_NAT_1S_OBJS) $(OPENBLAS_1S_OBJS) $(VENDOR_1S_OBJS)
-.INTERMEDIATE: $(BLIS_NAT_2S_OBJS) $(OPENBLAS_2S_OBJS) $(VENDOR_2S_OBJS)
+.INTERMEDIATE: $(BLIS_NAT_ST_OBJS) $(BLIS_NAT_1S_OBJS) $(BLIS_NAT_2S_OBJS)
+.INTERMEDIATE: $(OPENBLAS_ST_OBJS) $(OPENBLAS_1S_OBJS) $(OPENBLAS_2S_OBJS)
+.INTERMEDIATE: $(EIGEN_ST_OBJS) $(EIGEN_1S_OBJS) $(EIGEN_2S_OBJS)
+.INTERMEDIATE: $(VENDOR_ST_OBJS) $(VENDOR_1S_OBJS) $(VENDOR_2S_OBJS)
 
 
 # --Object file rules --
@@ -312,7 +330,8 @@ get-dt-cpp = -DDT=bli_$(1)type
 get-bl-cpp = $(strip \
              $(if $(findstring blis,$(1)),$(STR_NAT) $(BLI_DEF),\
              $(if $(findstring openblas,$(1)),$(STR_OBL) $(BLA_DEF),\
-             $(STR_VEN) $(BLA_DEF))))
+             $(if $(findstring eigen,$(1)),$(STR_EIG) $(BLA_DEF),\
+             $(STR_VEN) $(BLA_DEF)))))
 
 define make-st-rule
 test_$(1)$(2)_$(PS_MAX)_$(3)_st.o: test_$(op).c Makefile
@@ -349,34 +368,44 @@ $(foreach im,$(IMPLS),$(eval $(call make-2s-rule,$(dt),$(op),$(im))))))
 # compatibility layer. This prevents BLIS from inadvertently getting called
 # for the BLAS routines we are trying to test with.
 
-test_%_$(PS_MAX)_openblas_st.x: test_%_$(PS_MAX)_openblas_st.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(OPENBLAS_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-test_%_$(P1_MAX)_openblas_1s.x: test_%_$(P1_MAX)_openblas_1s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-test_%_$(P2_MAX)_openblas_2s.x: test_%_$(P2_MAX)_openblas_2s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-
-test_%_$(PS_MAX)_vendor_st.x: test_%_$(PS_MAX)_vendor_st.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(VENDOR_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-test_%_$(P1_MAX)_vendor_1s.x: test_%_$(P1_MAX)_vendor_1s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-test_%_$(P2_MAX)_vendor_2s.x: test_%_$(P2_MAX)_vendor_2s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
-
-
 test_%_$(PS_MAX)_asm_blis_st.x: test_%_$(PS_MAX)_asm_blis_st.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+	$(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
 
 test_%_$(P1_MAX)_asm_blis_1s.x: test_%_$(P1_MAX)_asm_blis_1s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+	$(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
 
 test_%_$(P2_MAX)_asm_blis_2s.x: test_%_$(P2_MAX)_asm_blis_2s.o $(LIBBLIS_LINK)
-	$(LINKER) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+	$(CC) $(strip $< $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
 
 
+test_%_$(PS_MAX)_openblas_st.x: test_%_$(PS_MAX)_openblas_st.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(OPENBLAS_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P1_MAX)_openblas_1s.x: test_%_$(P1_MAX)_openblas_1s.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P2_MAX)_openblas_2s.x: test_%_$(P2_MAX)_openblas_2s.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(OPENBLASP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+
+test_%_$(PS_MAX)_eigen_st.x: test_%_$(PS_MAX)_eigen_st.o $(LIBBLIS_LINK)
+	$(CXX) $(strip $< $(EIGEN_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P1_MAX)_eigen_1s.x: test_%_$(P1_MAX)_eigen_1s.o $(LIBBLIS_LINK)
+	$(CXX) $(strip $< $(EIGENP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P2_MAX)_eigen_2s.x: test_%_$(P2_MAX)_eigen_2s.o $(LIBBLIS_LINK)
+	$(CXX) $(strip $< $(EIGENP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+
+test_%_$(PS_MAX)_vendor_st.x: test_%_$(PS_MAX)_vendor_st.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(VENDOR_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P1_MAX)_vendor_1s.x: test_%_$(P1_MAX)_vendor_1s.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+test_%_$(P2_MAX)_vendor_2s.x: test_%_$(P2_MAX)_vendor_2s.o $(LIBBLIS_LINK)
+	$(CC) $(strip $< $(VENDORP_LIB) $(LIBBLIS_LINK) $(LDFLAGS) -o $@)
+
+
 # -- Clean rules --
@@ -78,6 +78,10 @@ if [ "${impls}" = "blis" ]; then
 
 	test_impls="asm_blis"
 
+elif [ "${impls}" = "eigen" ]; then
+
+	test_impls="eigen"
+
 elif [ "${impls}" = "other" ]; then
 
 	test_impls="openblas vendor"
@@ -148,13 +152,24 @@ for th in ${threads}; do
 # Set the number of threads according to th.
 if [ "${suf}" = "1s" ] || [ "${suf}" = "2s" ]; then
 
-	export BLIS_JC_NT=${jc_nt}
-	export BLIS_PC_NT=${pc_nt}
-	export BLIS_IC_NT=${ic_nt}
-	export BLIS_JR_NT=${jr_nt}
-	export BLIS_IR_NT=${ir_nt}
-	export OPENBLAS_NUM_THREADS=${nt}
-	export MKL_NUM_THREADS=${nt}
+	# Set the threading parameters based on the implementation
+	# that we are preparing to run.
+	if [ "${im}" = "asm_blis" ]; then
+		unset OMP_NUM_THREADS
+		export BLIS_JC_NT=${jc_nt}
+		export BLIS_PC_NT=${pc_nt}
+		export BLIS_IC_NT=${ic_nt}
+		export BLIS_JR_NT=${jr_nt}
+		export BLIS_IR_NT=${ir_nt}
+	elif [ "${im}" = "openblas" ]; then
+		unset OMP_NUM_THREADS
+		export OPENBLAS_NUM_THREADS=${nt}
+	elif [ "${im}" = "eigen" ]; then
+		export OMP_NUM_THREADS=${nt}
+	elif [ "${im}" = "vendor" ]; then
+		unset OMP_NUM_THREADS
+		export MKL_NUM_THREADS=${nt}
+	fi
 	export nt_use=${nt}
 
 	# Multithreaded OpenBLAS seems to have a problem running
@@ -173,6 +188,7 @@ for th in ${threads}; do
 export BLIS_IC_NT=1
 export BLIS_JR_NT=1
 export BLIS_IR_NT=1
+export OMP_NUM_THREADS=1
 export OPENBLAS_NUM_THREADS=1
 export MKL_NUM_THREADS=1
 export nt_use=1
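The per-implementation threading setup that the runme.sh hunks introduce can be sketched as a standalone function. This is a minimal sketch, not the script's actual structure: the function name `set_threading`, the use of `case` in place of the `if`/`elif` chain, the fallback of 1 for the `*_nt` loop factors, and the error branch for unknown names are all assumptions made for illustration. The environment variable names are those that appear in the diff.

```shell
#!/bin/sh
# Hypothetical wrapper around the threading logic shown in the diff.
#   $1 = implementation name (asm_blis, openblas, eigen, vendor)
#   $2 = total number of threads requested
set_threading() {
    im="$1"
    nt="$2"
    case "${im}" in
        asm_blis)
            # BLIS is driven by its per-loop factors, not OMP_NUM_THREADS.
            unset OMP_NUM_THREADS
            # Defaulting each factor to 1 is an assumption of this sketch.
            export BLIS_JC_NT="${jc_nt:-1}"
            export BLIS_PC_NT="${pc_nt:-1}"
            export BLIS_IC_NT="${ic_nt:-1}"
            export BLIS_JR_NT="${jr_nt:-1}"
            export BLIS_IR_NT="${ir_nt:-1}"
            ;;
        openblas)
            unset OMP_NUM_THREADS
            export OPENBLAS_NUM_THREADS="${nt}"
            ;;
        eigen)
            # Eigen is the only case in the diff that sets OMP_NUM_THREADS.
            export OMP_NUM_THREADS="${nt}"
            ;;
        vendor)
            unset OMP_NUM_THREADS
            export MKL_NUM_THREADS="${nt}"
            ;;
        *)
            # Not in the original script: reject unknown names explicitly.
            echo "unknown implementation: ${im}" >&2
            return 1
            ;;
    esac
    export nt_use="${nt}"
}

# Example call with an assumed thread count of 4.
set_threading eigen 4
echo "eigen: OMP_NUM_THREADS=${OMP_NUM_THREADS}"
```

Setting `OMP_NUM_THREADS` for Eigen may have no effect in practice, since the commit message notes that Eigen's BLAS library does not appear to support multithreading.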