mirror of
https://github.com/amd/blis.git
synced 2026-04-19 23:28:52 +00:00
Merge commit '5013a6cb' into amd-main
* commit '5013a6cb':
  More edits and fixes to docs/FAQ.md.
  Fixed newly broken link to CREDITS in FAQ.md.
  More minor fixes to FAQ.md and Sandboxes.md.
  Updates to FAQ.md, Sandboxes.md, and README.md.
  Safelist 'master', 'dev', 'amd' branches.
  Re-enable and fix fb93d24.
  Reverted fb93d24.
  Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
  Removed last vestige of #define BLIS_NUM_ARCHS.
  Added new packm var3 to 'gemmlike'.
  Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
  Fix more copy-paste errors in the haswell gemmsup code.
  Do a fast test on OSX. [ci skip]
  Fix AArch64 tests and consolidate some other tests.
  Use C++ cross-compiler for ARM tests.
  Attempt to fix cxx-test for OOT builds.
  Updated travis-ci.org link in README.md to .com.
  Disabled (at least temporarily) commit 8e0c425.
  Define BLIS_OS_NONE when using --disable-system.
  Updated stale calls to malloc_intl() in gemmlike.
  Blacklist clang10/gcc9 and older for 'armsve'.
  Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
  Moved lang defs from _macro_def.h to _lang_defs.h.
  Minor tweaks to gemmlike sandbox.
  Added local _check() code to gemmlike sandbox.
  README.md citation updates (e.g. BLIS7 bibtex).
  Tweaks to gemmlike to facilitate 3rd party mods.
  Whitespace tweaks.
  Add row- and column-strides for A/B in obj_ukr_fn_t.
  Clean up some warnings that show up on clang/OSX.
  Remove schema field on obj_t (redundant) and add new API functions.
  Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
  Disabled sanity check in bli_pool_finalize().
  Implement proposed new function pointer fields for obj_t.

AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
47
.travis.yml
@@ -1,71 +1,59 @@
language: c
sudo: required
dist: focal
branches:
only:
- master
- dev
- amd
matrix:
include:
# full testsuite (all tests except for mixed datatype)
# full testsuite (all tests + mixed datatype (gemm_nn only) + salt + SDE + OOT)
- os: linux
compiler: gcc
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto" \
PACKAGES="gcc-8 binutils"
# mixed-datatype testsuite (gemm_nn only)
- os: linux
compiler: gcc
env: OOT=0 TEST=MD SDE=0 THR="none" CONF="auto" \
PACKAGES="gcc-8 binutils"
# salt testsuite (fast set of operations+parameters)
- os: linux
compiler: gcc
env: OOT=0 TEST=SALT SDE=0 THR="none" CONF="auto" \
PACKAGES="gcc-8 binutils"
# test x86_64 ukrs with SDE
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=1 THR="none" CONF="x86_64" \
env: OOT=1 TEST=ALL SDE=1 THR="none" CONF="x86_64" \
PACKAGES="gcc-8 binutils"
# openmp build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="openmp" CONF="auto" \
env: OOT=0 TEST=FAST SDE=0 THR="openmp" CONF="auto" \
PACKAGES="gcc-8 binutils"
# pthreads build
- os: linux
compiler: gcc
env: OOT=0 TEST=0 SDE=0 THR="pthreads" CONF="auto" \
PACKAGES="gcc-8 binutils"
# out-of-tree build
- os: linux
compiler: gcc
env: OOT=1 TEST=0 SDE=0 THR="none" CONF="auto" \
env: OOT=0 TEST=FAST SDE=0 THR="pthreads" CONF="auto" \
PACKAGES="gcc-8 binutils"
# clang build
- os: linux
compiler: clang
env: OOT=0 TEST=0 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# There seems to be some difficulty installing 2 Clang toolchains of different versions.
# Use the TravisCI default.
# PACKAGES="clang-8 binutils"
# macOS with system compiler (clang)
- os: osx
compiler: clang
env: OOT=0 TEST=1 SDE=0 THR="none" CONF="auto"
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="auto"
# cortexa15 build and fast testsuite (qemu)
- os: linux
compiler: arm-linux-gnueabihf-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa15" \
PACKAGES="gcc-arm-linux-gnueabihf libc6-dev-armhf-cross qemu-system-arm qemu-user" \
CC=arm-linux-gnueabihf-gcc CXX=arm-linux-gnueabihf-g++ \
PACKAGES="gcc-arm-linux-gnueabihf g++-arm-linux-gnueabihf libc6-dev-armhf-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-arm -cpu cortex-a15 -L /usr/arm-linux-gnueabihf/"
# cortexa57 build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="cortexa57" \
PACKAGES="gcc-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
CC=aarch64-linux-gnu-gcc CXX=aarch64-linux-gnu-g++ \
PACKAGES="gcc-aarch64-linux-gnu g++-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -L /usr/aarch64-linux-gnu/"
# armsve build and fast testsuite (qemu)
- os: linux
compiler: aarch64-linux-gnu-gcc-10
env: OOT=0 TEST=FAST SDE=0 THR="none" CONF="armsve" \
PACKAGES="gcc-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
CC=aarch64-linux-gnu-gcc-10 CXX=aarch64-linux-gnu-g++-10 \
PACKAGES="gcc-10-aarch64-linux-gnu g++-10-aarch64-linux-gnu libc6-dev-arm64-cross qemu-system-arm qemu-user" \
TESTSUITE_WRAPPER="qemu-aarch64 -cpu max,sve=true,sve512=true -L /usr/aarch64-linux-gnu/"
install:
- if [ "$CC" = "gcc" ] && [ "$TRAVIS_OS_NAME" = "linux" ]; then export CC="gcc-8"; fi
@@ -81,6 +69,7 @@ script:
- $CC --version
- make -j 2
- make install
- $DIST_PATH/travis/cxx/cxx-test.sh $DIST_PATH $(ls -1 include)
# Qemu SVE is failing sgemmt in some cases. Skip as this issue is not observed on real chip (A64fx).
- if [ "$CONF" = "armsve" ]; then sed -i 's/.*\<gemmt\>.*/0/' $DIST_PATH/testsuite/input.operations.fast; fi
- if [ "$TEST" != "0" ]; then travis_wait 30 $DIST_PATH/travis/do_testsuite.sh; fi
@@ -798,7 +798,7 @@ endif()
# Disable tautological comparision warnings in clang.
if("${CMAKE_CXX_COMPILER_ID}" MATCHES "Clang")
list(APPEND CWARNFLAGS -Wno-tautological-compare)
list(APPEND CWARNFLAGS -Wno-tautological-compare -Wno-pass-failed)
endif()

# Add extra warning flags for Windows builds.
@@ -1082,4 +1082,4 @@ add_custom_target(distclean
COMMAND ${CMAKE_COMMAND} -P ${CMAKE_CURRENT_SOURCE_DIR}/build/distclean.cmake
WORKING_DIRECTORY ${CMAKE_BINARY_DIR}
COMMENT "Remove cmake_generated files and executables"
)
)
1
CREDITS
@@ -51,6 +51,7 @@ but many others have contributed code and feedback, including
Tony Kelman @tkelman
Lee Killough @leekillough (Cray)
Mike Kistler @mkistler (IBM, Austin Research Laboratory)
Ivan Korostelev @ivan23kor (University of Alberta)
Kyungmin Lee @kyungminlee (Ohio State University)
Michael Lehn @michael-lehn
Shmuel Levine @ShmuelLevine
6
Makefile
@@ -822,7 +822,7 @@ blastest-bin: check-env blastest-f2c $(BLASTEST_DRV_BIN_PATHS)
blastest-run: $(BLASTEST_DRV_BINS_R)

# f2c object file rule.
$(BASE_OBJ_BLASTEST_PATH)/%.o: $(BLASTEST_F2C_SRC_PATH)/%.c
$(BASE_OBJ_BLASTEST_PATH)/%.o: $(BLASTEST_F2C_SRC_PATH)/%.c $(BLIS_H_FLAT)
ifeq ($(ENABLE_VERBOSE),yes)
$(CC) $(call get-user-cflags-for,$(CONFIG_NAME)) $(BLAT_CFLAGS) -c $< -o $@
else
@@ -831,7 +831,7 @@ else
endif

# driver object file rule.
$(BASE_OBJ_BLASTEST_PATH)/%.o: $(BLASTEST_DRV_SRC_PATH)/%.c
$(BASE_OBJ_BLASTEST_PATH)/%.o: $(BLASTEST_DRV_SRC_PATH)/%.c $(BLIS_H_FLAT)
ifeq ($(ENABLE_VERBOSE),yes)
$(CC) $(call get-user-cflags-for,$(CONFIG_NAME)) $(BLAT_CFLAGS) -c $< -o $@
else
@@ -919,7 +919,7 @@ testsuite: testsuite-run
testsuite-bin: check-env $(TESTSUITE_BIN)

# Object file rule.
$(BASE_OBJ_TESTSUITE_PATH)/%.o: $(TESTSUITE_SRC_PATH)/%.c
$(BASE_OBJ_TESTSUITE_PATH)/%.o: $(TESTSUITE_SRC_PATH)/%.c $(BLIS_H_FLAT)
ifeq ($(ENABLE_VERBOSE),yes)
$(CC) $(call get-user-cflags-for,$(CONFIG_NAME)) -c $< -o $@
else
@@ -33,12 +33,39 @@
*/

#define BLIS_INLINE static
#define BLIS_EXPORT_BLIS
#include "bli_system.h"
#include "bli_type_defs.h"
#include "bli_arch.h"
#include "bli_cpuid.h"
// NOTE: This file will likely only ever get compiled as part of the BLIS
// configure script, and therefore BLIS_CONFIGURETIME_CPUID is guaranteed to
// be #defined. However, we preserve the cpp conditional for consistency with
// the other three files mentioned above.
#ifdef BLIS_CONFIGURETIME_CPUID

// NOTE: If you need to make any changes to this cpp branch, it's probably
// the case that you also need to modify bli_arch.c, bli_cpuid.c, and
// bli_env.c. Don't forget to update these other files as needed!

// The BLIS_ENABLE_SYSTEM macro must be defined so that the correct cpp
// branch in bli_system.h is processed. (This macro is normally defined in
// bli_config.h.)
#define BLIS_ENABLE_SYSTEM

// Use C-style static inline functions for any static inline functions that
// happen to be defined by the headers below. (This macro is normally defined
// in bli_config_macro_defs.h.)
#define BLIS_INLINE static

// Since we're not building a shared library, we can forgo the use of the
// BLIS_EXPORT_BLIS annotations by #defining them to be nothing. (This macro
// is normally defined in bli_config_macro_defs.h.)
#define BLIS_EXPORT_BLIS

#include "bli_system.h"
#include "bli_type_defs.h"
#include "bli_arch.h"
#include "bli_cpuid.h"
//#include "bli_env.h"
#else
#include "blis.h"
#endif

int main( int argc, char** argv )
{
@@ -699,7 +699,7 @@ endif
# Disable tautological comparision warnings in clang.
ifeq ($(CC_VENDOR),clang)
CWARNFLAGS += -Wno-tautological-compare
CWARNFLAGS += -Wno-tautological-compare -Wno-pass-failed
endif

$(foreach c, $(CONFIG_LIST_FAM), $(eval $(call append-var-for,CWARNFLAGS,$(c))))
8
configure
vendored
@@ -1540,6 +1540,8 @@ check_compiler()
# cortexa15: any
# cortexa9: any
#
# armsve: clang11+, gcc10+
#
# generic: any
#
# Note: These compiler requirements were originally modeled after similar
@@ -1585,6 +1587,9 @@ check_compiler()
# gcc 5.x may support POWER9 but it is unverified.
blacklistcc_add "power9"
fi
if [ ${cc_major} -lt 10 ]; then
blacklistcc_add "armsve"
fi
fi

# icc
@@ -1647,6 +1652,9 @@ check_compiler()
#blacklistcc_add "zen"
: # explicit no-op since bash can't handle empty loop bodies.
fi
if [ ${cc_major} -lt 11 ]; then
blacklistcc_add "armsve"
fi
fi
fi
}
67
docs/FAQ.md
@@ -9,6 +9,7 @@ project, as well as those we think a new user or developer might ask. If you do
* [Why should I use BLIS instead of GotoBLAS / OpenBLAS / ATLAS / MKL / ESSL / ACML / Accelerate?](FAQ.md#why-should-i-use-blis-instead-of-gotoblas--openblas--atlas--mkl--essl--acml--accelerate)
* [How is BLIS related to FLAME / libflame?](FAQ.md#how-is-blis-related-to-flame--libflame)
* [What is the difference between BLIS and the AMD fork of BLIS found in AOCL?](FAQ.md#what-is-the-difference-between-blis-and-the-amd-fork-of-blis-found-in-aocl)
* [Who do I contact if I have a question about the AMD version of BLIS?](FAQ.md#who-do-i-contact-if-i-have-a-question-about-the-amd-version-of-blis)
* [Does BLIS automatically detect my hardware?](FAQ.md#does-blis-automatically-detect-my-hardware)
* [I understand that BLIS is mostly a tool for developers?](FAQ.md#i-understand-that-blis-is-mostly-a-tool-for-developers)
* [How do I link against BLIS?](FAQ.md#how-do-i-link-against-blis)
@@ -17,6 +18,7 @@ project, as well as those we think a new user or developer might ask. If you do
* [What is a macrokernel?](FAQ.md#what-is-a-macrokernel)
* [What is a context?](FAQ.md#what-is-a-context)
* [I am used to thinking in terms of column-major/row-major storage and leading dimensions. What is a "row stride" / "column stride"?](FAQ.md#im-used-to-thinking-in-terms-of-column-majorrow-major-storage-and-leading-dimensions-what-is-a-row-stride--column-stride)
* [I'm somewhat new to this matrix stuff. Can you remind me, what is the difference between a matrix row and a matrix column?](FAQ.md#im-somewhat-new-to-this-matrix-stuff-can-you-remind-me-what-is-the-difference-between-a-matrix-row-and-a-matrix-column)
* [Why does BLIS have vector (level-1v) and matrix (level-1m) variations of most level-1 operations?](FAQ.md#why-does-blis-have-vector-level-1v-and-matrix-level-1m-variations-of-most-level-1-operations)
* [What does it mean when a matrix with general stride is column-tilted or row-tilted?](FAQ.md#what-does-it-mean-when-a-matrix-with-general-stride-is-column-tilted-or-row-tilted)
* [I am not really interested in all of these newfangled features in BLIS. Can I just use BLIS as a BLAS library?](FAQ.md#im-not-really-interested-in-all-of-these-newfangled-features-in-blis-can-i-just-use-blis-as-a-blas-library)
@@ -36,8 +38,7 @@ project, as well as those we think a new user or developer might ask. If you do
* [Who funded the development of BLIS?](FAQ.md#who-funded-the-development-of-blis)
* [I found a bug. How do I report it?](FAQ.md#i-found-a-bug-how-do-i-report-it)
* [How do I request a new feature?](FAQ.md#how-do-i-request-a-new-feature)
* [What is the difference between this version of BLIS and the one that AMD maintains?](FAQ.md#what-is-the-difference-between-this-version-of-blis-and-the-one-that-amd-maintains)
* [Who do I contact if I have a question about the AMD version of BLIS?](FAQ.md#who-do-i-contact-if-i-have-a-question-about-the-amd-version-of-blis)
* [I'm a developer and I'd like to study the way matrix multiplication is implemented in BLIS. Where should I start?](FAQ.md#im-a-developer-and-id-like-to-study-the-way-matrix-multiplication-is-implemented-in-blis-where-should-i-start)
* [Where did you get the photo for the BLIS logo / mascot?](FAQ.md#where-did-you-get-the-photo-for-the-blis-logo--mascot)
### Why did you create BLIS?
@@ -60,7 +61,9 @@ homepage](https://github.com/flame/blis#key-features). But here are a few reason
### How is BLIS related to FLAME / `libflame`?
As explained [above](FAQ.md#why-did-you-create-blis?), BLIS was initially a layer within `libflame` that allowed more convenient interfacing to the BLAS. So in some ways, BLIS is a spin-off project. Prior to developing BLIS, [its author](http://www.cs.utexas.edu/users/field/) worked as the primary maintainer of `libflame`. If you look closely, you can also see that the design of BLIS was influenced by some of the more useful and innovative aspects of `libflame`, such as internal object abstractions and control trees. Also, various members of the [SHPC research group](http://shpc.ices.utexas.edu/people.html) and its [collaborators](http://shpc.ices.utexas.edu/collaborators.html) routinely provide insight, feedback, and also contribute code (especially kernels) to the BLIS project.
As explained [above](FAQ.md#why-did-you-create-blis?), BLIS was initially a layer within `libflame` that allowed more convenient interfacing to the BLAS. So in some ways, BLIS is a spin-off project. Prior to developing BLIS, [its primary author](http://www.cs.utexas.edu/users/field/) worked as the primary maintainer of `libflame`. If you look closely, you can also see that the design of BLIS was influenced by some of the more useful and innovative aspects of `libflame`, such as internal object abstractions and control trees.
Note that various members of the [SHPC research group](http://shpc.ices.utexas.edu/people.html) and its [collaborators](http://shpc.ices.utexas.edu/collaborators.html) routinely provide insight, feedback, and also contribute code (especially kernels) to the BLIS project.
### What is the difference between BLIS and the AMD fork of BLIS found in AOCL?
@@ -68,6 +71,10 @@ BLIS, also known as "vanilla BLIS" or "upstream BLIS," is maintained by its [ori
AMD BLIS sometimes contains certain optimizations specific to AMD hardware. Many of these optimizations are (eventually) merged back into upstream BLIS. However, for various reasons, some changes may remain unique to AMD BLIS for quite some time. Thus, if you want the latest optimizations for AMD hardware, feel free to try AMD BLIS. However, please note that neither The University of Texas at Austin nor BLIS's developers can endorse or offer direct support for any outside fork of BLIS, including AMD BLIS.
### Who do I contact if I have a question about the AMD version of BLIS?
For questions or support regarding [AMD's fork of BLIS](https://github.com/amd/blis), please contact the [AMD Optimizing CPU Libraries](https://developer.amd.com/amd-aocl/) group at aoclsupport@amd.com.
### Does BLIS automatically detect my hardware?
On certain architectures (most notably x86_64), yes. In order to use auto-detection, you must specify `auto` as your configuration when running `configure` (Please see the BLIS [Build System](BuildSystem.md) guide for more info.) A runtime detection option is also available. (Please see the [Configuration Guide](ConfigurationHowTo.md) for a comprehensive walkthrough.)
@@ -76,9 +83,9 @@ If automatic hardware detection is requested at configure-time and the build pro
### I understand that BLIS is mostly a tool for developers?
Yes. In order to achieve high performance, BLIS requires that hand-coded kernels and microkernels be written and referenced in a valid [BLIS configuration](ConfigurationHowTo.md). These components are usually written by developers and then included within BLIS for use by others.
It is certainly the case that BLIS began as a tool targeted at developers. In order to achieve high performance, BLIS requires that hand-coded kernels and microkernels be written and referenced in a valid [BLIS configuration](ConfigurationHowTo.md). These components are usually written by developers and then included within BLIS for use by others.
The good news, however, is that end-users can use BLIS too. Once the aforementioned kernels are integrated into BLIS, they can be used without any developer-level knowledge, and many kernels have already been added! Usually, `./configure auto; make; make install` is sufficient for the typical users with typical hardware.
The good news, however, is that BLIS has matured to the point where end-users can use it too! Once the aforementioned kernels are integrated into BLIS, they can be used without any developer-level knowledge, and many kernels have already been added! Usually, `./configure auto; make; make install` is sufficient for the typical users with typical hardware.
### How do I link against BLIS?
@@ -98,9 +105,9 @@ For a more thorough explanation of the microkernel and its role in the overall l
### What is a macrokernel?
The macrokernels are portable codes within the BLIS framework that implement relatively small subproblems within an overall level-3 operation. The overall problem (say, general matrix-matrix multiplication, or `gemm`) is partitioned down, according to cache blocksizes, such that its operands are (1) a suitable size and (2) stored in a special packed format. At that time, the macrokernel is called. The macrokernel is implemented as two loops around the microkernel.
The macrokernels are portable codes within the BLIS framework that implement relatively small subproblems within an overall level-3 operation. The overall problem (say, general matrix-matrix multiplication, or `gemm`) is partitioned down, according to cache blocksizes, such that its `A` and `B` operands are (1) a suitable size and (2) stored in a special packed format. At that time, the macrokernel is called. The macrokernel is implemented as two loops around the microkernel.
The macrokernels in BLIS correspond to the so-called "inner kernels" (or simply "kernels") that formed the fundamental unit of computation in Kazushige Goto's GotoBLAS (and now in the successor library, OpenBLAS).
The macrokernels, along with the microkernel that they call, correspond to the so-called "inner kernels" (or simply "kernels") that formed the fundamental unit of computation in Kazushige Goto's GotoBLAS (and now in the successor library, OpenBLAS).
For more information on macrokernels, please read our [ACM TOMS papers](https://github.com/flame/blis#citations).
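
As a rough illustration of the "two loops around the microkernel" structure described above, here is a minimal C sketch. It is not BLIS's actual macrokernel: the names `ukr_ref` and `macrokernel`, the tiny blocksizes `MR` and `NR`, and the exact packed layouts are assumptions chosen for readability. The sketch also assumes that `m` and `n` are multiples of `MR` and `NR`, so no edge-case handling is shown.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative register blocksizes (assumed values, not BLIS's). */
enum { MR = 2, NR = 2 };

/* Reference microkernel: C(MR x NR) += A_panel * B_panel.
   a: MR x k micropanel, stored column by column (MR contiguous elements).
   b: k x NR micropanel, stored row by row (NR contiguous elements).
   c: MR x NR block of C with row stride rs_c and column stride cs_c. */
void ukr_ref(size_t k, const double *a, const double *b,
             double *c, size_t rs_c, size_t cs_c)
{
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                c[i * rs_c + j * cs_c] += a[p * MR + i] * b[p * NR + j];
}

/* Macrokernel sketch: two loops around the microkernel.
   a_packed holds m/MR micropanels of A back to back (MR*k elements each);
   b_packed holds n/NR micropanels of B back to back (NR*k elements each). */
void macrokernel(size_t m, size_t n, size_t k,
                 const double *a_packed, const double *b_packed,
                 double *c, size_t rs_c, size_t cs_c)
{
    /* Simplifying assumption: no partial (edge) micropanels. */
    assert(m % MR == 0 && n % NR == 0);

    for (size_t jr = 0; jr < n; jr += NR)       /* over micropanels of B */
        for (size_t ir = 0; ir < m; ir += MR)   /* over micropanels of A */
            ukr_ref(k,
                    a_packed + ir * k,          /* micropanel number ir/MR */
                    b_packed + jr * k,          /* micropanel number jr/NR */
                    c + ir * rs_c + jr * cs_c,
                    rs_c, cs_c);
}
```

For example, with a packed 4 x 1 `A` and a packed 1 x 2 `B`, the outer loop runs once and the inner loop runs twice, with each microkernel call updating one 2 x 2 block of `C`.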
@@ -118,6 +125,18 @@ In generalized storage, we have a row stride and a column stride. The row stride
BLIS also supports situations where both the row stride and column stride are non-unit. We call this situation "general stride".
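
To make the stride terminology concrete, here is a small C sketch of how an element's position follows from the two strides. The helper names `elem_offset` and `get_elem` are ours, for illustration only, and are not part of the BLIS API. The only fact the sketch relies on is that element *(i, j)* lives `i*rs + j*cs` elements past the start of the matrix.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in elements, of element (i, j) in a matrix stored with
   row stride rs and column stride cs. (Hypothetical helper.) */
size_t elem_offset(size_t i, size_t j, size_t rs, size_t cs)
{
    return i * rs + j * cs;
}

/* Read element (i, j) of a double-precision matrix with the given strides. */
double get_elem(const double *a, size_t i, size_t j, size_t rs, size_t cs)
{
    assert(a != NULL);
    return a[elem_offset(i, j, rs, cs)];
}
```

For a *3 x 4* matrix, column-major storage corresponds to `rs = 1, cs = 3`, so element (2, 1) sits at offset 5. Row-major storage corresponds to `rs = 4, cs = 1`, giving offset 9. A general stride layout such as `rs = 2, cs = 8` gives offset 12.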
### I'm somewhat new to this matrix stuff. Can you remind me, what is the difference between a matrix row and a matrix column?
Of course! (BLIS's primary author remembers what it was like to get columns and rows confused.)
Matrix columns consist of elements that are vertically aligned. Matrix rows consist of elements that are horizontally aligned. (One way to remember this distinction is that real-life columns are vertical structures that hold up buildings. A row of seats in a stadium, by contrast, is horizontal to the ground.)
Furthermore, it is helpful to know that the number of rows in a matrix constitutes its so-called *m* dimension, and the number of columns constitutes its *n* dimension.
Matrix dimensions are always stated as *m x n*: the number of rows *by* the number of columns.
So, a *3 x 4* matrix contains three rows (each of length four) and four columns (each of length three).
### Why does BLIS have vector (level-1v) and matrix (level-1m) variations of most level-1 operations?
At first glance, it might appear that an element-wise operation such as `copym` or `axpym` would be sufficiently general purpose to cover the cases where the operands are vectors. After all, an *m x 1* matrix can be viewed as a vector of length m and vice versa. But in BLIS, operations on vectors are treated slightly differently than operations on matrices.
@@ -126,15 +145,13 @@ If an application wishes to perform an element-wise operation on two objects, an
However, if an application instead decides to perform an element-wise operation on two objects, and the application calls a level-1v operation, the dimension constraints are slightly relaxed. In this scenario, BLIS only checks that the vector *lengths* are equal. This allows for the vectors to have different orientations (row vs column) while still being considered conformal. So, you could perform a `copyv` operation to copy from an *m x 1* vector to a *1 x m* vector. A `copym` operation on such objects would not be allowed (unless it was executed with the source object containing an implicit transposition).
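
The difference between the two dimension checks can be sketched in C as follows. This is only an illustration of the rule described above, not BLIS's actual checking code: the `dims_t` type and the functions `conformal_1m` and `conformal_1v` are hypothetical names introduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLIS object: only the dimensions matter here. */
typedef struct { size_t m, n; } dims_t;

/* True if the object has at least one unit dimension (m x 1 or 1 x n). */
bool is_vector(dims_t x) { return x.m == 1 || x.n == 1; }

/* Length of a vector: its non-unit dimension (or 1 for a 1 x 1 object). */
size_t vector_length(dims_t x)
{
    assert(is_vector(x));
    return x.m == 1 ? x.n : x.m;
}

/* Level-1m-style check: both dimensions must match, after applying an
   optional (implicit) transposition to the source object x. */
bool conformal_1m(dims_t x, bool trans_x, dims_t y)
{
    size_t xm = trans_x ? x.n : x.m;
    size_t xn = trans_x ? x.m : x.n;
    return xm == y.m && xn == y.n;
}

/* Level-1v-style check: only the vector lengths must match, so a column
   vector and a row vector of equal length are considered conformal. */
bool conformal_1v(dims_t x, dims_t y)
{
    return is_vector(x) && is_vector(y) &&
           vector_length(x) == vector_length(y);
}
```

With a *5 x 1* source and a *1 x 5* destination, `conformal_1v` accepts the pair, while `conformal_1m` accepts it only when the source is marked as transposed.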
Another way to think about level-1v operations is that they will work with any two matrix objects in situations where (a) the corresponding level-1m operation *would have* worked if the input had been transposed, and (b) all operands happen to be vectors (i.e., have one unit dimension).
### What does it mean when a matrix with general stride is column-tilted or row-tilted?
When a matrix is stored with general stride, both the row stride and column stride (let's call them `rs` and `cs`) are non-unit. When `rs` < `cs`, we call the general stride matrix "column-tilted" because it is "closer" to being column-stored (than row-stored). Similarly, when `rs` > `cs`, the matrix is "row-tilted" because it is closer to being row-stored.
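
The comparison above is simple enough to write down directly. The following C sketch follows only the rule as stated in this answer; the `tilt_t` type and the `classify_tilt` function are hypothetical names, positive strides are assumed, and the case `rs == cs` (which the text does not address) is reported separately rather than guessed at.

```c
#include <assert.h>

/* Possible outcomes of the comparison (hypothetical type). */
typedef enum { TILT_COLUMN, TILT_ROW, TILT_UNDETERMINED } tilt_t;

/* Classify a general stride matrix by comparing its row stride rs and
   column stride cs, as described in the text. */
tilt_t classify_tilt(long rs, long cs)
{
    /* General stride: both strides are non-unit (positive strides assumed). */
    assert(rs > 1 && cs > 1);

    if (rs < cs) return TILT_COLUMN;  /* closer to column storage */
    if (rs > cs) return TILT_ROW;     /* closer to row storage */
    return TILT_UNDETERMINED;         /* equal strides: not covered here */
}
```

For instance, `rs = 2, cs = 10` is classified as column-tilted, and `rs = 10, cs = 2` as row-tilted.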
### I'm not really interested in all of these newfangled features in BLIS. Can I just use BLIS as a BLAS library?
Absolutely. Just link your application to BLIS the same way you would link to a BLAS library. For a simple linking example, see the [Linking to BLIS](KernelsHowTo.md#linking-to-blis) section of the BLIS [Build System](BuildSystem.md) guide.
Absolutely! Just link your application to BLIS the same way you would link to a BLAS library. For a simple linking example, see the [Linking to BLIS](KernelsHowTo.md#linking-to-blis) section of the BLIS [Build System](BuildSystem.md) guide.
### What about CBLAS?
@@ -144,11 +161,13 @@ BLIS also contains an optional CBLAS compatibility layer, which leverages the BL
In principle, BLIS's native (and BLAS-like) [typed API](BLISTypedAPI) can be called from Fortran. However, you must ensure that the size of the integer in BLIS is equal to the size of integer used by your Fortran program/compiler/environment. The size of BLIS integers is determined at configure-time. Please see `./configure --help` for the syntax for options related to integer sizes.
You may also want to confirm that your Fortran compiler doesn't perform any name-mangling of called functions or subroutines (such as with additional underscores beyond the single trailing underscore found in the BLAS APIs), and if so, take steps to disable this additional name-mangling. For example, if your source code calls `dgemm()` but your Fortran compiler name-mangles that call to `_dgemm_()` or `dgemm__()`, your program will fail to link against BLIS since BLIS only defines `dgemm_()`.
As for bindings to other languages, please contact the [blis-devel](http://groups.google.com/group/blis-devel) mailing list.
### Do I need to call initialization/finalization functions before being able to use BLIS from my application?
Originally, BLIS did indeed require the application to explicitly setup (initialize) various internal data structures via `bli_init()`. Likewise, calling `bli_finalize()` was recommended to cleanup (finalize) the library. However, since commit 9804adf (circa December 2017), BLIS has implemented self-initialization. These explicit calls to `bli_init()` and `bli_finalize()` are no longer necessary, though experts may still use them in special cases to control the allocation and freeing of resources. This topic is discussed in the BLIS [typed API reference](BLISTypedAPI.md#initialization-and-cleanup).
Originally, BLIS did indeed require the application to explicitly setup (initialize) various internal data structures via `bli_init()`. Likewise, calling `bli_finalize()` was recommended to cleanup (finalize) the library. However, since commit `9804adf` (circa December 2017), BLIS has implemented self-initialization. These explicit calls to `bli_init()` and `bli_finalize()` are no longer necessary, though experts may still use them in special cases to control the allocation and freeing of resources. This topic is discussed in the BLIS [typed API reference](BLISTypedAPI.md#initialization-and-cleanup).
### Does BLIS support multithreading?
@@ -162,7 +181,7 @@ We have integrated some early foundational support for NUMA *development*, but c
### Does BLIS work with GPUs?
BLIS does not currently support graphical processing units (GPUs). However, others have applied the BLIS approach towards frameworks that provide BLAS-like functionality on GPUs. To see how NVIDIA's implementation compares to an analagous approach based on the principles that underlie BLIS, please see a paper by some of our collaborators, ["Implementing Strassen’s Algorithm with CUTLASSon NVIDIA Volta GPUs"](https://apps.cs.utexas.edu/apps/sites/default/files/tech_reports/GPUStrassen.pdf).
BLIS does not currently support graphical processing units (GPUs). However, others have applied the BLIS approach towards frameworks that provide BLAS-like functionality on GPUs. To see how NVIDIA's implementation compares to an analogous approach based on the principles that underlie BLIS, please see a paper by some of our collaborators, ["Implementing Strassen’s Algorithm with CUTLASS on NVIDIA Volta GPUs"](https://apps.cs.utexas.edu/apps/sites/default/files/tech_reports/GPUStrassen.pdf).
### Does BLIS work on _(some architecture)_?
@@ -174,7 +193,7 @@ No. BLIS is a framework for sequential and shared-memory/multicore implementatio
### Can I build BLIS on Mac OS X?
BLIS was designed for use in a GNU/Linux environment. However, we've gone to greath lengths to keep BLIS compatible with other UNIX-like systems as well, such as BSD and OS X. System software requirements for UNIX-like systems are discussed in the BLIS [Build System](BuildSystem.md) guide.
BLIS was designed for use in a GNU/Linux environment. However, we've gone to great lengths to keep BLIS compatible with other UNIX-like systems as well, such as BSD and OS X. System software requirements for UNIX-like systems are discussed in the BLIS [Build System](BuildSystem.md) guide.
### Can I build BLIS on Windows?
@@ -203,7 +222,7 @@ Yes. By default, most configurations output only a static library archive (e.g.
### Can I use the mixed domain / mixed precision support in BLIS?
Yes! As of 5fec95b (circa October 2018), BLIS supports mixed-datatype (mixed domain and/or mixed precision) computation via the `gemm` operation. Documentation on utilizing this new functionality is provided via the [MixedDatatype.md](docs/MixedDatatypes.md) document in the source distribution.
Yes! As of 5fec95b (circa October 2018), BLIS supports mixed-datatype (mixed domain and/or mixed precision) computation via the `gemm` operation. Documentation on utilizing this new functionality is provided via the [MixedDatatype.md](MixedDatatypes.md) document in the source distribution.
If this feature is important or useful to your work, we would love to hear from you. Please contact us via the [blis-devel](http://groups.google.com/group/blis-devel) mailing list and tell us about your application and why you need/want support for BLAS-like operations with mixed-domain/mixed-precision operands.
@@ -214,33 +233,27 @@ Lots of people! For a full list of those involved, see the
### Who funded the development of BLIS?
BLIS was primarily funded by grants from [Microsoft](https://www.microsoft.com/),
[Intel](https://www.intel.com/), [Texas
Instruments](https://www.ti.com/), [AMD](https://www.amd.com/), [Huawei](https://www.hauwei.com/us/), [Oracle](https://www.oracle.com/), and [Facebook](https://www.facebook.com/) as well as grants from the [National Science Foundation](http://www.nsf.gov/) (Awards CCF-0917167 ACI-1148125/1340293, and CCF-1320112).
|
||||
BLIS was primarily funded by a variety of gifts/grants from industry and the National Science Foundation. Please see the "Funding" section of the [BLIS homepage](https://github.com/flame/blis#funding) for more details.
|
||||
|
||||
Reminder: _Any opinions, findings and conclusions or recommendations expressed in this material are those of the author(s) and do not necessarily reflect the views of the National Science Foundation (NSF)._
|
||||
|
||||
### I found a bug. How do I report it?
|
||||
|
||||
If you think you've found a bug, we request that you [open an issue](http://github.com/flame/blis/issues). Don't be shy! Really, it's the best and most convenient way for us to track your issues/bugs/concerns. Other discussions that are not primarily bug-reports should take place via the [blis-devel](http://groups.google.com/group/blis-devel) mailing list.
|
||||
If you think you've found a bug, we request that you [open an issue](http://github.com/flame/blis/issues). Don't be shy! Really, it's the best and most convenient way for us to track your issues/bugs/concerns.
|
||||
|
||||
### How do I request a new feature?
|
||||
|
||||
Feature requests should also be submitted by [opening a new issue](http://github.com/flame/blis/issues).
|
||||
|
||||
### What is the difference between this version of BLIS and the one that AMD maintains?
|
||||
### I'm a developer and I'd like to study the way matrix multiplication is implemented in BLIS. Where should I start?
|
||||
|
||||
AMD has chosen BLIS as the open-source foundation for the BLAS component of their [AMD Optimizing CPU Libraries (AOCL)](https://developer.amd.com/amd-aocl/) toolkit. Our group enjoys a great collaboration and partnership with AMD, and we are pleased to have their enthusiastic support for our project.
|
||||
Great question! The first thing you should know is that the core framework of [level-3 operations](BLISTypedAPI.md#operation-index) was *not* designed to be used to teach or explain a high-performance implementation of matrix multiplication. Rather, it was designed to encode the family of level-3 operations with as little code duplication as possible. Because of this, and also for historical/evolutionary reasons, it can be a little difficult to trace the execution of, say, `gemm` from within the core framework.
|
||||
|
||||
At a technical level, AMD's fork of BLIS is considered to be a downstream variant. AMD uses their fork to develop optimizations specific to AMD hardware. Occasionally, AMD will submit pull requests to merge their features, enhancements, and fixes back into our "plain vanilla" upstream repository. So our upstream BLIS will eventually contain most of the modifications originally developed by AMD in their fork, but with a lag. Similarly, features introduced into the upstream BLIS may not be immediately available in AMD's fork, but eventually their team will perform a merge and synchronize with our latest code.
|
||||
Thankfully, we have an alternative environment in which experts, application developers, and other curious individuals can study BLIS's matrix multiplication implementation. This so-called "sandbox" is a simplified collection of code that strips away much of the framework complexity while also maintaining local definitions for many of the interesting bits. You may find this `gemmlike` sandbox in `sandbox/gemmlike`.
|
||||
|
||||
AMD also uses a different versioning system for AOCL which is independent of the versions used by the [upstream BLIS](http://github.com/flame/blis) project.
|
||||
|
||||
### Who do I contact if I have a question about the AMD version of BLIS?
|
||||
|
||||
For questions or support regarding [AMD's fork of BLIS](https://github.com/amd/blis), please contact the [AMD Optimizing CPU Libraries](https://developer.amd.com/amd-aocl/) group at aoclsupport@amd.com.
|
||||
Sandboxes go beyond the scope of this FAQ. For an introduction, please refer to the [Sandboxes](Sandboxes.md) document, and/or contact the BLIS developers for more information.
|
||||
|
||||
### Where did you get the photo for the BLIS logo / mascot?
|
||||
|
||||
The sleeping ["BLIS cat"](https://github.com/flame/blis/blob/master/README.md) photo was taken by Petar Mitchev and is used with his permission.
|
||||
The sleeping ["BLIS cat"](README.md) photo was taken by Petar Mitchev and is used with his permission.
|
||||
|
||||
|
||||
@@ -37,11 +37,11 @@ utility functions.
|
||||
To enable a sandbox at configure-time, you simply specify it as an option to
|
||||
`configure`. Either of the following usages is accepted:
|
||||
```
|
||||
$ ./configure --enable-sandbox=ref99 auto
|
||||
$ ./configure -s ref99 auto
|
||||
$ ./configure --enable-sandbox=gemmlike auto
|
||||
$ ./configure -s gemmlike auto
|
||||
```
|
||||
Here, we tell `configure` that we want to use the `ref99` sandbox, which
|
||||
corresponds to a sub-directory of `sandbox` named `ref99`. (Reminder: the
|
||||
Here, we tell `configure` that we want to use the `gemmlike` sandbox, which
|
||||
corresponds to a sub-directory of `sandbox` named `gemmlike`. (Reminder: the
|
||||
`auto` argument is the configuration target and thus unrelated to
|
||||
sandboxes.)
|
||||
|
||||
@@ -50,7 +50,7 @@ sizes and shapes, you'll need to disable the skinny/unpacked "sup"
|
||||
sub-framework within BLIS, which is enabled by default. This can be
|
||||
done by passing the `--disable-sup-handling` option to configure:
|
||||
```
|
||||
$ ./configure --enable-sandbox=ref99 --disable-sup-handling auto
|
||||
$ ./configure --enable-sandbox=gemmlike --disable-sup-handling auto
|
||||
```
|
||||
If you leave sup enabled, the sup implementation will, at runtime, detect
|
||||
and handle certain smaller problem sizes upstream of where BLIS calls
|
||||
@@ -62,13 +62,14 @@ As `configure` runs, you should get output that includes lines
|
||||
similar to:
|
||||
```
|
||||
configure: configuring for alternate gemm implementation:
|
||||
configure: sandbox/ref99
|
||||
configure: sandbox/gemmlike
|
||||
```
|
||||
And when you build BLIS, the last files to be compiled will be the source
|
||||
code in the specified sandbox:
|
||||
```
|
||||
Compiling obj/haswell/sandbox/ref99/blx_gemm_ref_var2.o ('haswell' CFLAGS for sandboxes)
|
||||
Compiling obj/haswell/sandbox/ref99/oapi/bli_gemmnat.o ('haswell' CFLAGS for sandboxes)
|
||||
Compiling obj/haswell/sandbox/gemmlike/bli_gemmnat.o ('haswell' CFLAGS for sandboxes)
|
||||
Compiling obj/haswell/sandbox/gemmlike/bls_gemm.o ('haswell' CFLAGS for sandboxes)
|
||||
Compiling obj/haswell/sandbox/gemmlike/bls_gemm_bp_var1.o ('haswell' CFLAGS for sandboxes)
|
||||
...
|
||||
```
|
||||
That's it! After the BLIS library is built, it will contain your chosen
|
||||
@@ -92,16 +93,19 @@ will be found!
|
||||
2. Your sandbox must be written in C99 or C++11. If you write your sandbox in
|
||||
C++11, you must use one of the BLIS-approved file extensions for your source
|
||||
files (`.cc`, `.cpp`, `.cxx`) and your header files (`.hh`, `.hpp`, `.hxx`).
|
||||
Note that `blis.h`
|
||||
already contains all of its definitions inside of an `extern "C"` block, so
|
||||
you should be able to `#include "blis.h"` from your C++11 source code without
|
||||
any issues.
|
||||
Note that `blis.h` already contains all of its definitions inside of an
|
||||
`extern "C"` block, so you should be able to `#include "blis.h"` from your
|
||||
C++11 source code without any issues.
|
||||
|
||||
3. All of your code to replace BLIS's default implementation of `bli_gemmnat()`
|
||||
should reside in the named sandbox directory, or some directory therein.
|
||||
(Obviously.) For example, the "reference" sandbox is located in
|
||||
`sandbox/ref99`. All of the code associated with this sandbox will be
|
||||
contained within `sandbox/ref99`.
|
||||
(Obviously.) For example, the "gemmlike" sandbox is located in
|
||||
`sandbox/gemmlike`. All of the code associated with this sandbox will be
|
||||
contained within `sandbox/gemmlike`. Note that you absolutely *may* include
|
||||
additional code and interfaces within the sandbox, if you wish -- code and
|
||||
interfaces that are not directly or indirectly needed for satisfying the
|
||||
"contract" set forth by the sandbox (i.e., including a local definition
|
||||
of `bli_gemmnat()`).
|
||||
|
||||
4. The *only* header file that is required of your sandbox is `bli_sandbox.h`.
|
||||
It must be named `bli_sandbox.h` because `blis.h` will `#include` this file
|
||||
@@ -116,16 +120,17 @@ you should only place things (e.g. prototypes or type definitions) in
|
||||
Usually, neither of these situations will require any of your local definitions
|
||||
since those local definitions are only needed to define your sandbox
|
||||
implementation of `bli_gemmnat()`, and this function is already prototyped by
|
||||
BLIS.
|
||||
BLIS. *But if you are adding additional APIs and/or operations to the sandbox
|
||||
that are unrelated to `bli_gemmnat()`, then you'll want to `#include` those
|
||||
function prototypes from within `bli_sandbox.h`.*
|
||||
|
||||
5. Your definition of `bli_gemmnat()` should be the **only function you define**
|
||||
in your sandbox that begins with `bli_`. If you define other functions that
|
||||
begin with `bli_`, you risk a namespace collision with existing framework
|
||||
functions. To guarantee safety, please prefix your locally-defined sandbox
|
||||
functions with another prefix. Here, in the `ref99` sandbox, we use the prefix
|
||||
`blx_`. (The `x` is for sandbox. Or experimental.) Also, please avoid the
|
||||
prefix `bla_` since that prefix is also used in BLIS for BLAS compatibility
|
||||
functions.
|
||||
functions with another prefix. Here, in the `gemmlike` sandbox, we use the prefix
|
||||
`bls_`. (The `s` is for sandbox.) Also, please avoid the prefix `bla_` since that
|
||||
prefix is also used in BLIS for BLAS compatibility functions.
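
As a rough illustration of rules 4 and 5, the sketch below shows how locally defined sandbox helpers might be named and prototyped. Everything in it is hypothetical: `bls_dot()` and `bls_scale_add()` are made-up helper names, not functions from the `gemmlike` sandbox, and the two prototypes stand in for declarations that a sandbox author might place in a local header included from `bli_sandbox.h`.

```c
#include <assert.h>
#include <stddef.h>

/* Prototypes a sandbox might collect in a local header and then #include
   from bli_sandbox.h. Both names are hypothetical. Note the bls_ prefix,
   which avoids bli_ (framework functions) and bla_ (BLAS compatibility
   functions). */
double bls_dot( size_t n, const double* x, const double* y );
void   bls_scale_add( size_t n, double alpha, const double* x, double* y );

/* Hypothetical helper: returns the dot product of x and y. */
double bls_dot( size_t n, const double* x, const double* y )
{
    double rho = 0.0;

    for ( size_t i = 0; i < n; ++i )
        rho += x[ i ] * y[ i ];

    return rho;
}

/* Hypothetical helper: computes y := y + alpha * x. */
void bls_scale_add( size_t n, double alpha, const double* x, double* y )
{
    assert( x != NULL && y != NULL );

    for ( size_t i = 0; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```

Because neither helper begins with `bli_`, a sandbox built this way should not collide with existing framework symbols, while `bli_gemmnat()` remains the only `bli_`-prefixed function the sandbox defines.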
|
||||
|
||||
If you follow these rules, you will be much more likely to have a pleasant
|
||||
experience integrating your BLIS sandbox into the larger framework.
|
||||
@@ -207,15 +212,9 @@ enabled in `input.general`. However, if those options *are* enabled and BLIS was
|
||||
built with mixed datatype support, then BLIS assumes that the implementation of
|
||||
`gemm` will support mixing of datatypes. BLIS *must* assume this, because
|
||||
there's no way for it to confirm at runtime that an implementation was written
|
||||
to support mixing datatypes. Note that even the `ref99` sandbox included with
|
||||
to support mixing datatypes. Note that even the `gemmlike` sandbox included with
|
||||
BLIS does not support mixed-datatype computation.
|
||||
|
||||
* **Multithreading in ref99.** The current reference sandbox, `ref99`, does not
|
||||
currently implement multithreading.
|
||||
|
||||
* **Packing matrices in ref99.** The current reference sandbox, `ref99`, does not
|
||||
currently implement packing of matrices A or B.
|
||||
|
||||
## Conclusion
|
||||
|
||||
If you encounter any problems, or are really bummed-out that `gemm` is the
|
||||
|
||||
@@ -34,8 +34,26 @@
|
||||
*/
|
||||
|
||||
#ifdef BLIS_CONFIGURETIME_CPUID
|
||||
|
||||
// NOTE: If you need to make any changes to this cpp branch, it's probably
|
||||
// the case that you also need to modify bli_arch.c, bli_cpuid.c, and
|
||||
// bli_env.c. Don't forget to update these other files as needed!
|
||||
|
||||
// The BLIS_ENABLE_SYSTEM macro must be defined so that the correct cpp
|
||||
// branch in bli_system.h is processed. (This macro is normally defined in
|
||||
// bli_config.h.)
|
||||
#define BLIS_ENABLE_SYSTEM
|
||||
|
||||
// Use C-style static inline functions for any static inline functions that
|
||||
// happen to be defined by the headers below. (This macro is normally defined
|
||||
// in bli_config_macro_defs.h.)
|
||||
#define BLIS_INLINE static
|
||||
|
||||
// Since we're not building a shared library, we can forgo the use of the
|
||||
// BLIS_EXPORT_BLIS annotations by #defining them to be nothing. (This macro
|
||||
// is normally defined in bli_config_macro_defs.h.)
|
||||
#define BLIS_EXPORT_BLIS
|
||||
|
||||
#include "bli_system.h"
|
||||
#include "bli_type_defs.h"
|
||||
#include "bli_arch.h"
|
||||
|
||||
@@ -47,12 +47,31 @@
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_CONFIGURETIME_CPUID
|
||||
|
||||
// NOTE: If you need to make any changes to this cpp branch, it's probably
|
||||
// the case that you also need to modify bli_arch.c, bli_cpuid.c, and
|
||||
// bli_env.c. Don't forget to update these other files as needed!
|
||||
|
||||
// The BLIS_ENABLE_SYSTEM macro must be defined so that the correct cpp
|
||||
// branch in bli_system.h is processed. (This macro is normally defined in
|
||||
// bli_config.h.)
|
||||
#define BLIS_ENABLE_SYSTEM
|
||||
|
||||
// Use C-style static inline functions for any static inline functions that
|
||||
// happen to be defined by the headers below. (This macro is normally defined
|
||||
// in bli_config_macro_defs.h.)
|
||||
#define BLIS_INLINE static
|
||||
|
||||
// Since we're not building a shared library, we can forgo the use of the
|
||||
// BLIS_EXPORT_BLIS annotations by #defining them to be nothing. (This macro
|
||||
// is normally defined in bli_config_macro_defs.h.)
|
||||
#define BLIS_EXPORT_BLIS
|
||||
|
||||
#include "bli_system.h"
|
||||
#include "bli_type_defs.h"
|
||||
#include "bli_cpuid.h"
|
||||
#include "bli_arch.h"
|
||||
#include "bli_cpuid.h"
|
||||
//#include "bli_env.h"
|
||||
#else
|
||||
#include "blis.h"
|
||||
#include "bli_arch.h"
|
||||
|
||||
@@ -34,10 +34,30 @@
|
||||
*/
|
||||
|
||||
#ifdef BLIS_CONFIGURETIME_CPUID
|
||||
|
||||
// NOTE: If you need to make any changes to this cpp branch, it's probably
|
||||
// the case that you also need to modify bli_arch.c, bli_cpuid.c, and
|
||||
// bli_env.c. Don't forget to update these other files as needed!
|
||||
|
||||
// The BLIS_ENABLE_SYSTEM macro must be defined so that the correct cpp
|
||||
// branch in bli_system.h is processed. (This macro is normally defined in
|
||||
// bli_config.h.)
|
||||
#define BLIS_ENABLE_SYSTEM
|
||||
|
||||
// Use C-style static inline functions for any static inline functions that
|
||||
// happen to be defined by the headers below. (This macro is normally defined
|
||||
// in bli_config_macro_defs.h.)
|
||||
#define BLIS_INLINE static
|
||||
|
||||
// Since we're not building a shared library, we can forgo the use of the
|
||||
// BLIS_EXPORT_BLIS annotations by #defining them to be nothing. (This macro
|
||||
// is normally defined in bli_config_macro_defs.h.)
|
||||
#define BLIS_EXPORT_BLIS
|
||||
|
||||
#include "bli_system.h"
|
||||
#include "bli_type_defs.h"
|
||||
//#include "bli_arch.h"
|
||||
//#include "bli_cpuid.h"
|
||||
#include "bli_env.h"
|
||||
#else
|
||||
#include "blis.h"
|
||||
|
||||
@@ -129,7 +129,12 @@ void bli_pool_finalize
|
||||
// Query the total number of blocks currently allocated.
|
||||
const siz_t num_blocks = bli_pool_num_blocks( pool );
|
||||
|
||||
#if 0 // Removing dead code
|
||||
// NOTE: This sanity check has been disabled because bli_pool_reinit()
|
||||
// is currently implemented in terms of bli_pool_finalize() followed by
|
||||
// bli_pool_init(). If that _reinit() takes place when some blocks are
|
||||
// checked out, then we would expect top_index != 0, and therefore this
|
||||
// check is not universally appropriate.
|
||||
#if 0
|
||||
// Query the top_index of the pool.
|
||||
const siz_t top_index = bli_pool_top_index( pool );
|
||||
|
||||
@@ -149,7 +154,6 @@ void bli_pool_finalize
|
||||
|
||||
//bli_abort();
|
||||
}
|
||||
|
||||
#endif
|
||||
|
||||
// Query the free() function pointer for the pool.
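
The note in this hunk says the sanity check is not universally appropriate because a reinit is a finalize followed by an init. A toy model may make that concrete. The sketch below is a hypothetical stand-in: `toy_pool_t` and the `toy_pool_*()` functions are not BLIS APIs, and the only assumption carried over is that `top_index` counts the blocks currently checked out.

```c
#include <assert.h>
#include <stdlib.h>

/* Toy stand-in for a block pool (not the BLIS pool_t). Blocks at indices
   [0, top_index) are considered checked out. */
typedef struct
{
    void** blocks;
    size_t num_blocks;
    size_t top_index;
} toy_pool_t;

void toy_pool_init( toy_pool_t* pool, size_t num_blocks )
{
    pool->blocks = malloc( num_blocks * sizeof( void* ) );
    assert( pool->blocks != NULL );

    for ( size_t i = 0; i < num_blocks; ++i )
        pool->blocks[ i ] = malloc( 64 );

    pool->num_blocks = num_blocks;
    pool->top_index  = 0;
}

/* Hand the next free block to the caller, who now owns it. */
void* toy_pool_checkout( toy_pool_t* pool )
{
    assert( pool->top_index < pool->num_blocks );

    void* block = pool->blocks[ pool->top_index ];
    pool->blocks[ pool->top_index ] = NULL;
    pool->top_index += 1;

    return block;
}

/* Free the blocks still held by the pool. Returns the top_index seen at
   finalize time; an assert( top_index == 0 ) placed here would abort
   whenever a reinit happens while blocks are checked out. */
size_t toy_pool_finalize( toy_pool_t* pool )
{
    size_t top = pool->top_index;

    for ( size_t i = top; i < pool->num_blocks; ++i )
        free( pool->blocks[ i ] );
    free( pool->blocks );

    pool->blocks     = NULL;
    pool->num_blocks = 0;
    pool->top_index  = 0;

    return top;
}

/* Reinit expressed as finalize followed by init, as the comment describes. */
size_t toy_pool_reinit( toy_pool_t* pool, size_t num_blocks )
{
    size_t top = toy_pool_finalize( pool );
    toy_pool_init( pool, num_blocks );
    return top;
}
```

In this model, checking out one block and then calling `toy_pool_reinit()` reports a nonzero `top_index` at finalize time, which is exactly the case the disabled check would have flagged.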
|
||||
|
||||
111
frame/include/bli_lang_defs.h
Normal file
@@ -0,0 +1,111 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
Copyright (C) 2018 - 2019, Advanced Micro Devices, Inc.
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name(s) of the copyright holder(s) nor the names of its
|
||||
contributors may be used to endorse or promote products derived
|
||||
from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#ifndef BLIS_LANG_DEFS_H
|
||||
#define BLIS_LANG_DEFS_H
|
||||
|
||||
|
||||
// -- Undefine restrict for C++ and C89/90 --
|
||||
|
||||
#ifdef __cplusplus
|
||||
// Language is C++; define restrict as nothing.
|
||||
#ifndef restrict
|
||||
#define restrict
|
||||
#endif
|
||||
#elif __STDC_VERSION__ >= 199901L
|
||||
// Language is C99 (or later); do nothing since restrict is recognized.
|
||||
#else
|
||||
// Language is pre-C99; define restrict as nothing.
|
||||
#ifndef restrict
|
||||
#define restrict
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
// -- Define typeof() operator if using non-GNU compiler --
|
||||
|
||||
#ifndef __GNUC__
|
||||
#define typeof __typeof__
|
||||
#else
|
||||
#ifndef typeof
|
||||
#define typeof __typeof__
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
// -- BLIS Thread Local Storage Keyword --
|
||||
|
||||
// __thread for TLS is supported by GCC, CLANG, ICC, and IBMC.
|
||||
// There is a small risk here as __GNUC__ can also be defined by some other
|
||||
// compiler (other than ICC and CLANG which we know define it) that
|
||||
// doesn't support __thread, as __GNUC__ is not quite unique to GCC.
|
||||
// But the possibility of someone using such non-main-stream compiler
|
||||
// for building BLIS is low.
|
||||
#if defined(__GNUC__) || defined(__clang__) || defined(__ICC) || defined(__IBMC__)
|
||||
#define BLIS_THREAD_LOCAL __thread
|
||||
#else
|
||||
#define BLIS_THREAD_LOCAL
|
||||
#endif
|
||||
|
||||
|
||||
// -- BLIS constructor/destructor function attribute --
|
||||
|
||||
// __attribute__((constructor/destructor)) is a GCC extension (clang also accepts it).
|
||||
// There is a small risk here as __GNUC__ can also be defined by some other
|
||||
// compiler (other than ICC and CLANG which we know define it) that
|
||||
// doesn't support this, as __GNUC__ is not quite unique to GCC.
|
||||
// But the possibility of someone using such non-main-stream compiler
|
||||
// for building BLIS is low.
|
||||
|
||||
#if defined(__ICC) || defined(__INTEL_COMPILER)
|
||||
// ICC defines __GNUC__ but doesn't support this
|
||||
#define BLIS_ATTRIB_CTOR
|
||||
#define BLIS_ATTRIB_DTOR
|
||||
#elif defined(__clang__)
|
||||
// CLANG supports __attribute__, but its documentation doesn't
|
||||
// mention support for constructor/destructor. Compiling with
|
||||
// clang and testing shows that it does support them.
|
||||
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
|
||||
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
|
||||
#elif defined(__GNUC__)
|
||||
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
|
||||
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
|
||||
#else
|
||||
#define BLIS_ATTRIB_CTOR
|
||||
#define BLIS_ATTRIB_DTOR
|
||||
#endif
|
||||
|
||||
|
||||
#endif
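
For readers unfamiliar with these macros, here is a minimal sketch of how `BLIS_THREAD_LOCAL` and `BLIS_ATTRIB_CTOR` might be applied. The macro definitions are local copies that follow the same compiler tests as the header, repeated only so the sketch compiles on its own. The `demo_*` names are hypothetical and do not appear in BLIS.

```c
#include <assert.h>

/* Local copies of the macros, following the compiler tests in the header. */
#if defined(__GNUC__) || defined(__clang__) || defined(__ICC) || defined(__IBMC__)
  #define BLIS_THREAD_LOCAL __thread
#else
  #define BLIS_THREAD_LOCAL
#endif

#if defined(__ICC) || defined(__INTEL_COMPILER)
  #define BLIS_ATTRIB_CTOR
#elif defined(__clang__) || defined(__GNUC__)
  #define BLIS_ATTRIB_CTOR __attribute__((constructor))
#else
  #define BLIS_ATTRIB_CTOR
#endif

/* Hypothetical per-thread call counter: each thread sees its own copy
   when BLIS_THREAD_LOCAL expands to __thread. */
static BLIS_THREAD_LOCAL int demo_calls = 0;

/* Hypothetical flag recording whether the constructor ran. */
static int demo_ctor_ran = 0;

/* Runs before main() where BLIS_ATTRIB_CTOR expands to the attribute;
   elsewhere it is an ordinary function that nothing calls. */
void BLIS_ATTRIB_CTOR demo_ctor( void )
{
    demo_ctor_ran = 1;
}

/* Increment and return the calling thread's counter. */
int demo_next_call( void )
{
    return ++demo_calls;
}

/* Report whether the constructor was invoked automatically. */
int demo_ctor_status( void )
{
    return demo_ctor_ran;
}
```

Where the macros expand to nothing, the code still compiles, but the counter becomes shared across threads and the constructor never runs on its own. That is the trade-off the empty fallback definitions accept.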
|
||||
@@ -37,77 +37,6 @@
|
||||
#define BLIS_MACRO_DEFS_H
|
||||
|
||||
|
||||
// -- Undefine restrict for C++ and C89/90 --
|
||||
|
||||
#ifdef __cplusplus
|
||||
// Language is C++; define restrict as nothing.
|
||||
#ifndef restrict
|
||||
#define restrict
|
||||
#endif
|
||||
#elif __STDC_VERSION__ >= 199901L
|
||||
// Language is C99 (or later); do nothing since restrict is recognized.
|
||||
#else
|
||||
// Language is pre-C99; define restrict as nothing.
|
||||
#ifndef restrict
|
||||
#define restrict
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
// -- Define typeof() operator if using non-GNU compiler --
|
||||
|
||||
#ifndef __GNUC__
|
||||
#define typeof __typeof__
|
||||
#else
|
||||
#ifndef typeof
|
||||
#define typeof __typeof__
|
||||
#endif
|
||||
#endif
|
||||
|
||||
|
||||
// -- BLIS Thread Local Storage Keyword --
|
||||
|
||||
// __thread for TLS is supported by GCC, CLANG, ICC, and IBMC.
|
||||
// There is a small risk here as __GNUC__ can also be defined by some other
|
||||
// compiler (other than ICC and CLANG which we know define it) that
|
||||
// doesn't support __thread, as __GNUC__ is not quite unique to GCC.
|
||||
// But the possibility of someone using such non-main-stream compiler
|
||||
// for building BLIS is low.
|
||||
#if defined(__GNUC__) || defined(__clang__) || defined(__ICC) || defined(__IBMC__)
|
||||
#define BLIS_THREAD_LOCAL __thread
|
||||
#else
|
||||
#define BLIS_THREAD_LOCAL
|
||||
#endif
|
||||
|
||||
|
||||
// -- BLIS constructor/destructor function attribute --
|
||||
|
||||
// __attribute__((constructor/destructor)) is supported by GCC only.
|
||||
// There is a small risk here as __GNUC__ can also be defined by some other
|
||||
// compiler (other than ICC and CLANG which we know define it) that
|
||||
// doesn't support this, as __GNUC__ is not quite unique to GCC.
|
||||
// But the possibility of someone using such non-main-stream compiler
|
||||
// for building BLIS is low.
|
||||
|
||||
#if defined(__ICC) || defined(__INTEL_COMPILER)
|
||||
// ICC defines __GNUC__ but doesn't support this
|
||||
#define BLIS_ATTRIB_CTOR
|
||||
#define BLIS_ATTRIB_DTOR
|
||||
#elif defined(__clang__)
|
||||
// CLANG supports __attribute__, but its documentation doesn't
|
||||
// mention support for constructor/destructor. Compiling with
|
||||
// clang and testing shows that it does support.
|
||||
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
|
||||
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
|
||||
#elif defined(__GNUC__)
|
||||
#define BLIS_ATTRIB_CTOR __attribute__((constructor))
|
||||
#define BLIS_ATTRIB_DTOR __attribute__((destructor))
|
||||
#else
|
||||
#define BLIS_ATTRIB_CTOR
|
||||
#define BLIS_ATTRIB_DTOR
|
||||
#endif
|
||||
|
||||
|
||||
// -- Concatenation macros --
|
||||
|
||||
#define BLIS_FUNC_PREFIX_STR "bli"
|
||||
|
||||
@@ -1187,6 +1187,57 @@ BLIS_INLINE stor3_t bli_obj_stor3_from_strides( obj_t* c, obj_t* a, obj_t* b )
|
||||
}
|
||||
|
||||
|
||||
// -- User-provided information macros --
|
||||
|
||||
// User data query
|
||||
|
||||
BLIS_INLINE void* bli_obj_user_data( obj_t* obj )
|
||||
{
|
||||
return obj->user_data;
|
||||
}
|
||||
|
||||
// User data modification
|
||||
|
||||
BLIS_INLINE void bli_obj_set_user_data( void* data, obj_t* obj )
|
||||
{
|
||||
obj->user_data = data;
|
||||
}
|
||||
|
||||
// Function pointer query
|
||||
|
||||
BLIS_INLINE obj_pack_fn_t bli_obj_pack_fn( obj_t* obj )
|
||||
{
|
||||
return obj->pack;
|
||||
}
|
||||
|
||||
BLIS_INLINE obj_ker_fn_t bli_obj_ker_fn( obj_t* obj )
|
||||
{
|
||||
return obj->ker;
|
||||
}
|
||||
|
||||
BLIS_INLINE obj_ukr_fn_t bli_obj_ukr_fn( obj_t* obj )
|
||||
{
|
||||
return obj->ukr;
|
||||
}
|
||||
|
||||
// Function pointer modification
|
||||
|
||||
BLIS_INLINE void bli_obj_set_pack_fn( obj_pack_fn_t pack, obj_t* obj )
|
||||
{
|
||||
obj->pack = pack;
|
||||
}
|
||||
|
||||
BLIS_INLINE void bli_obj_set_ker_fn( obj_ker_fn_t ker, obj_t* obj )
|
||||
{
|
||||
obj->ker = ker;
|
||||
}
|
||||
|
||||
BLIS_INLINE void bli_obj_set_ukr_fn( obj_ukr_fn_t ukr, obj_t* obj )
|
||||
{
|
||||
obj->ukr = ukr;
|
||||
}
|
||||
|
||||
|
||||
// -- Initialization-related macros --
|
||||
|
||||
// Finish the initialization started by the matrix-specific static initializer
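
The new accessors follow the usual query/modification pairing. The sketch below suggests how user data and an attached function pointer could work together. It relies on deliberately small stand-ins: `demo_obj_t` and `demo_ker_fn_t` are hypothetical and far simpler than the real `obj_t` and `obj_ker_fn_t`, and the dispatch logic in `demo_run()` is an assumption for illustration, not the framework's actual behavior.

```c
#include <assert.h>
#include <stddef.h>

/* Minimal stand-ins; the real obj_t and obj_ker_fn_t carry many more
   fields and parameters. All demo_ names are hypothetical. */
struct demo_obj_s;
typedef void (*demo_ker_fn_t)( struct demo_obj_s* a );

typedef struct demo_obj_s
{
    void*         user_data;
    demo_ker_fn_t ker;
} demo_obj_t;

/* Accessors written in the same query/modification style as the hunk. */
static inline void* demo_obj_user_data( demo_obj_t* obj )
{
    return obj->user_data;
}

static inline void demo_obj_set_user_data( void* data, demo_obj_t* obj )
{
    obj->user_data = data;
}

static inline demo_ker_fn_t demo_obj_ker_fn( demo_obj_t* obj )
{
    return obj->ker;
}

static inline void demo_obj_set_ker_fn( demo_ker_fn_t ker, demo_obj_t* obj )
{
    obj->ker = ker;
}

/* Hypothetical kernel: counts invocations through the object's user data. */
static void demo_counting_ker( demo_obj_t* a )
{
    int* count = demo_obj_user_data( a );
    assert( count != NULL );
    *count += 1;
}

/* Dispatch through the attached function pointer, if any. Returns 1 when
   a kernel was called and 0 otherwise. (Assumption: a framework would use
   its default kernel in the NULL case.) */
int demo_run( demo_obj_t* a )
{
    demo_ker_fn_t ker = demo_obj_ker_fn( a );

    if ( ker == NULL ) return 0;

    ker( a );
    return 1;
}
```

Initializing both fields to `NULL`, as the updated object initializers in this commit do for the real fields, keeps the "nothing attached" case easy to detect.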
|
||||
|
||||
@@ -71,28 +71,32 @@
|
||||
#endif
|
||||
|
||||
// Determine the target operating system.
|
||||
#if defined(_WIN32) || defined(__CYGWIN__)
|
||||
#define BLIS_OS_WINDOWS 1
|
||||
#elif defined(__gnu_hurd__)
|
||||
#define BLIS_OS_GNU 1
|
||||
#elif defined(__APPLE__) || defined(__MACH__)
|
||||
#define BLIS_OS_OSX 1
|
||||
#elif defined(__ANDROID__)
|
||||
#define BLIS_OS_ANDROID 1
|
||||
#elif defined(__linux__)
|
||||
#define BLIS_OS_LINUX 1
|
||||
#elif defined(__bgq__)
|
||||
#define BLIS_OS_BGQ 1
|
||||
#elif defined(__bg__)
|
||||
#define BLIS_OS_BGP 1
|
||||
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
|
||||
defined(__bsdi__) || defined(__DragonFly__) || \
|
||||
defined(__FreeBSD_kernel__) || defined(__HAIKU__)
|
||||
#define BLIS_OS_BSD 1
|
||||
#elif defined(EMSCRIPTEN)
|
||||
#define BLIS_OS_EMSCRIPTEN
|
||||
#else
|
||||
#error "Cannot determine operating system"
|
||||
#if defined(BLIS_ENABLE_SYSTEM)
|
||||
#if defined(_WIN32) || defined(__CYGWIN__)
|
||||
#define BLIS_OS_WINDOWS 1
|
||||
#elif defined(__gnu_hurd__)
|
||||
#define BLIS_OS_GNU 1
|
||||
#elif defined(__APPLE__) || defined(__MACH__)
|
||||
#define BLIS_OS_OSX 1
|
||||
#elif defined(__ANDROID__)
|
||||
#define BLIS_OS_ANDROID 1
|
||||
#elif defined(__linux__)
|
||||
#define BLIS_OS_LINUX 1
|
||||
#elif defined(__bgq__)
|
||||
#define BLIS_OS_BGQ 1
|
||||
#elif defined(__bg__)
|
||||
#define BLIS_OS_BGP 1
|
||||
#elif defined(__FreeBSD__) || defined(__NetBSD__) || defined(__OpenBSD__) || \
|
||||
defined(__bsdi__) || defined(__DragonFly__) || \
|
||||
defined(__FreeBSD_kernel__) || defined(__HAIKU__)
|
||||
#define BLIS_OS_BSD 1
|
||||
#elif defined(EMSCRIPTEN)
|
||||
#define BLIS_OS_EMSCRIPTEN
|
||||
#else
|
||||
#error "Cannot determine operating system"
|
||||
#endif
|
||||
#else // #if defined(BLIS_DISABLE_SYSTEM)
|
||||
#define BLIS_OS_NONE
|
||||
#endif
|
||||
|
||||
// A few changes that may be necessary in Windows environments.
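
The effect of wrapping OS detection in `BLIS_ENABLE_SYSTEM` can be sketched in isolation. In the example below, `DEMO_ENABLE_SYSTEM`, `DEMO_OS_NAME`, and `demo_os_name()` are hypothetical stand-ins, and the enabling macro is left undefined to model a `--disable-system` build. Only a few of the platform tests from the hunk are reproduced.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-in for BLIS_ENABLE_SYSTEM. It is left undefined here
   to model a build configured with --disable-system. */
/* #define DEMO_ENABLE_SYSTEM */

#if defined(DEMO_ENABLE_SYSTEM)
  /* OS probing happens only when system support is enabled. */
  #if defined(_WIN32) || defined(__CYGWIN__)
    #define DEMO_OS_NAME "windows"
  #elif defined(__APPLE__) || defined(__MACH__)
    #define DEMO_OS_NAME "osx"
  #elif defined(__linux__)
    #define DEMO_OS_NAME "linux"
  #else
    #error "Cannot determine operating system"
  #endif
#else
  /* No probing at all, mirroring the BLIS_OS_NONE branch, so an unknown
     platform no longer triggers the #error. */
  #define DEMO_OS_NAME "none"
#endif

/* Return the name selected at preprocessing time. */
const char* demo_os_name( void )
{
    return DEMO_OS_NAME;
}
```

The gate only works if the enabling macro is visible before the detection code is preprocessed, which fits the reordering elsewhere in this commit that places `bli_config.h` ahead of `bli_system.h`.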
|
||||
|
||||
@@ -160,7 +160,7 @@ typedef uint32_t objbits_t; // object information bit field
|
||||
// interoperability with BLIS.
|
||||
#ifndef _DEFINED_SCOMPLEX
|
||||
#define _DEFINED_SCOMPLEX
|
||||
typedef struct
|
||||
typedef struct scomplex
|
||||
{
|
||||
float real;
|
||||
float imag;
|
||||
@@ -171,7 +171,7 @@ typedef uint32_t objbits_t; // object information bit field
|
||||
// interoperability with BLIS.
|
||||
#ifndef _DEFINED_DCOMPLEX
|
||||
#define _DEFINED_DCOMPLEX
|
||||
typedef struct
|
||||
typedef struct dcomplex
|
||||
{
|
||||
double real;
|
||||
double imag;
|
||||
@@ -1054,7 +1054,6 @@ typedef enum
|
||||
|
||||
} arch_t;
|
||||
|
||||
|
||||
typedef enum
|
||||
{
|
||||
// Initial value, will be selected for an unrecognized (non-integer)
|
||||
@@ -1275,6 +1274,47 @@ typedef struct constdata_s
|
||||
// -- BLIS object type definitions ---------------------------------------------
|
||||
//
|
||||
|
||||
// Forward declarations for function pointer types
|
||||
struct obj_s;
|
||||
struct cntx_s;
|
||||
struct rntm_s;
|
||||
struct thrinfo_s;
|
||||
|
||||
typedef void (*obj_pack_fn_t)
|
||||
(
|
||||
mdim_t mat,
|
||||
mem_t* mem,
|
||||
struct obj_s* a,
|
||||
struct obj_s* ap,
|
||||
struct cntx_s* cntx,
|
||||
struct rntm_s* rntm,
|
||||
struct thrinfo_s* thread
|
||||
);
|
||||
|
||||
typedef void (*obj_ker_fn_t)
|
||||
(
|
||||
struct obj_s* a,
|
||||
struct obj_s* b,
|
||||
struct obj_s* c,
|
||||
struct cntx_s* cntx,
|
||||
struct rntm_s* rntm,
|
||||
struct thrinfo_s* thread
|
||||
);
|
||||
|
||||
typedef void (*obj_ukr_fn_t)
|
||||
(
|
||||
dim_t m,
|
||||
dim_t n,
|
||||
dim_t k,
|
||||
void* restrict alpha,
|
||||
void* restrict a, inc_t rs_a, inc_t cs_a,
|
||||
void* restrict b, inc_t rs_b, inc_t cs_b,
|
||||
void* restrict beta,
|
||||
void* restrict c, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
struct cntx_s* restrict cntx
|
||||
);
|
||||
|
||||
typedef struct obj_s
|
||||
{
|
||||
// Basic fields
|
||||
@@ -1304,6 +1344,15 @@ typedef struct obj_s
|
||||
// usually MR or NR)
|
||||
dim_t m_panel; // m dimension of a "full" panel
|
||||
dim_t n_panel; // n dimension of a "full" panel
|
||||
|
||||
// User data pointer
|
||||
void* user_data;
|
||||
|
||||
// Function pointers
|
||||
obj_pack_fn_t pack;
|
||||
obj_ker_fn_t ker;
|
||||
obj_ukr_fn_t ukr;
|
||||
|
||||
} obj_t;
|
||||
|
||||
// Pre-initializers. Things that must be set afterwards:
|
||||
@@ -1340,7 +1389,13 @@ typedef struct obj_s
|
||||
.ps = 0, \
|
||||
.pd = 0, \
|
||||
.m_panel = 0, \
|
||||
.n_panel = 0 \
|
||||
.n_panel = 0, \
|
||||
\
|
||||
.user_data = NULL, \
|
||||
\
|
||||
.pack = NULL, \
|
||||
.ker = NULL, \
|
||||
.ukr = NULL \
|
||||
}
|
||||
|
||||
#define BLIS_OBJECT_INITIALIZER_1X1 \
|
||||
@@ -1368,7 +1423,13 @@ typedef struct obj_s
|
||||
.ps = 0, \
|
||||
.pd = 0, \
|
||||
.m_panel = 0, \
|
||||
.n_panel = 0 \
|
||||
.n_panel = 0, \
|
||||
\
|
||||
.user_data = NULL, \
|
||||
\
|
||||
.pack = NULL, \
|
||||
.ker = NULL, \
|
||||
.ukr = NULL \
|
||||
}
|
||||
|
||||
// Define these macros here since they must be updated if contents of
|
||||
@@ -1402,6 +1463,12 @@ BLIS_INLINE void bli_obj_init_full_shallow_copy_of( obj_t* a, obj_t* b )
|
||||
b->pd = a->pd;
|
||||
b->m_panel = a->m_panel;
|
||||
b->n_panel = a->n_panel;
|
||||
|
||||
b->user_data = a->user_data;
|
||||
|
||||
b->pack = a->pack;
|
||||
b->ker = a->ker;
|
||||
b->ukr = a->ukr;
|
||||
}
|
||||
|
||||
BLIS_INLINE void bli_obj_init_subpart_from( obj_t* a, obj_t* b )
|
||||
@@ -1435,6 +1502,12 @@ BLIS_INLINE void bli_obj_init_subpart_from( obj_t* a, obj_t* b )
|
||||
b->pd = a->pd;
|
||||
b->m_panel = a->m_panel;
|
||||
b->n_panel = a->n_panel;
|
||||
|
||||
b->user_data = a->user_data;
|
||||
|
||||
b->pack = a->pack;
|
||||
b->ker = a->ker;
|
||||
b->ukr = a->ukr;
|
||||
}
|
||||
|
||||
// Initializers for global scalar constants.
|
||||
|
||||
@@ -48,15 +48,24 @@ extern "C" {
|
||||
// NOTE: PLEASE DON'T CHANGE THE ORDER IN WHICH HEADERS ARE INCLUDED UNLESS
|
||||
// YOU ARE SURE THAT IT DOESN'T BREAK INTER-HEADER MACRO DEPENDENCIES.
|
||||
|
||||
// -- System headers --
|
||||
// NOTE: This header must be included before bli_config_macro_defs.h.
|
||||
|
||||
#include "bli_system.h"
|
||||
|
||||
|
||||
// -- configure definitions --
|
||||
|
||||
// NOTE: bli_config.h header must be included before any BLIS header.
|
||||
// It is bootstrapped by ./configure and does not depend on later
|
||||
// headers. Moreover, these configuration variables are necessary to change
|
||||
// some default behaviors (e.g. disable OS-detection in bli_system.h in case
|
||||
// of --disable-system).
|
||||
#include "bli_config.h"
|
||||
|
||||
// -- System and language-related headers --
|
||||
|
||||
// NOTE: bli_system.h header must be included before bli_config_macro_defs.h.
|
||||
#include "bli_system.h"
|
||||
#include "bli_lang_defs.h"
|
||||
|
||||
|
||||
// -- configure default definitions --
|
||||
|
||||
#include "bli_config_macro_defs.h"
|
||||
|
||||
|
||||
|
||||
@@ -107,7 +107,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
if ( cdim0 == mnr && !gs && !conja && unitk )
|
||||
{
|
||||
begin_asm()
|
||||
|
||||
|
||||
mov(var(a), rax) // load address of a.
|
||||
|
||||
mov(var(inca), r8) // load inca
|
||||
@@ -122,14 +122,14 @@ void bli_cpackm_haswell_asm_3xk
|
||||
mov(var(one), rdx) // load address of 1.0 constant
|
||||
vbroadcastss(mem(rdx, 0), ymm1) // load 1.0 and duplicate
|
||||
vxorps(ymm0, ymm0, ymm0) // set ymm0 to 0.0.
|
||||
|
||||
|
||||
mov(var(kappa), rcx) // load address of kappa
|
||||
vbroadcastss(mem(rcx, 0), ymm10) // load kappa_r and duplicate
|
||||
vbroadcastss(mem(rcx, 4), ymm11) // load kappa_i and duplicate
|
||||
|
||||
|
||||
|
||||
// now branch on kappa == 1.0
|
||||
|
||||
|
||||
vucomiss(xmm1, xmm10) // set ZF if kappa_r == 1.0.
|
||||
sete(r12b) // r12b = ( ZF == 1 ? 1 : 0 );
|
||||
vucomiss(xmm0, xmm11) // set ZF if kappa_i == 0.0.
|
||||
@@ -143,7 +143,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
|
||||
cmp(imm(8), r8) // set ZF if (8*inca) == 8.
|
||||
jz(.CCOLNONU) // jump to column storage case
|
||||
|
||||
|
||||
// -- kappa non-unit, row storage on A -------------------------------------
|
||||
|
||||
label(.CROWNONU)
|
||||
@@ -156,7 +156,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
label(.CCOLNONU)
|
||||
|
||||
jmp(.CDONE) // jump to end.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -167,7 +167,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
|
||||
|
||||
// -- kappa unit, row storage on A -----------------------------------------
|
||||
|
||||
|
||||
label(.CROWUNIT)
|
||||
|
||||
//lea(mem(r8, r8, 2), r12) // r12 = 3*inca
|
||||
@@ -251,7 +251,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
// -- kappa unit, column storage on A --------------------------------------
|
||||
|
||||
label(.CCOLUNIT)
|
||||
|
||||
|
||||
lea(mem(r10, r10, 2), r13) // r13 = 3*lda
|
||||
|
||||
mov(var(k_iter), rsi) // i = k_iter;
|
||||
@@ -315,8 +315,8 @@ void bli_cpackm_haswell_asm_3xk
|
||||
|
||||
|
||||
label(.CDONE)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
end_asm(
|
||||
: // output operands (none)
|
||||
@@ -372,7 +372,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
@@ -392,7 +392,7 @@ void bli_cpackm_haswell_asm_3xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -107,7 +107,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
if ( cdim0 == mnr && !gs && !conja && unitk )
|
||||
{
|
||||
begin_asm()
|
||||
|
||||
|
||||
mov(var(a), rax) // load address of a.
|
||||
|
||||
mov(var(inca), r8) // load inca
|
||||
@@ -122,14 +122,14 @@ void bli_cpackm_haswell_asm_8xk
|
||||
mov(var(one), rdx) // load address of 1.0 constant
|
||||
vbroadcastss(mem(rdx, 0), ymm1) // load 1.0 and duplicate
|
||||
vxorps(ymm0, ymm0, ymm0) // set ymm0 to 0.0.
|
||||
|
||||
|
||||
mov(var(kappa), rcx) // load address of kappa
|
||||
vbroadcastss(mem(rcx, 0), ymm10) // load kappa_r and duplicate
|
||||
vbroadcastss(mem(rcx, 4), ymm11) // load kappa_i and duplicate
|
||||
|
||||
|
||||
|
||||
// now branch on kappa == 1.0
|
||||
|
||||
|
||||
vucomiss(xmm1, xmm10) // set ZF if kappa_r == 1.0.
|
||||
sete(r12b) // r12b = ( ZF == 1 ? 1 : 0 );
|
||||
vucomiss(xmm0, xmm11) // set ZF if kappa_i == 0.0.
|
||||
@@ -143,7 +143,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
|
||||
cmp(imm(8), r8) // set ZF if (8*inca) == 8.
|
||||
jz(.CCOLNONU) // jump to column storage case
|
||||
|
||||
|
||||
// -- kappa non-unit, row storage on A -------------------------------------
|
||||
|
||||
label(.CROWNONU)
|
||||
@@ -156,7 +156,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
label(.CCOLNONU)
|
||||
|
||||
jmp(.CDONE) // jump to end.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -167,7 +167,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
|
||||
|
||||
// -- kappa unit, row storage on A -----------------------------------------
|
||||
|
||||
|
||||
label(.CROWUNIT)
|
||||
|
||||
lea(mem(r8, r8, 2), r12) // r12 = 3*inca
|
||||
@@ -271,7 +271,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
// -- kappa unit, column storage on A --------------------------------------
|
||||
|
||||
label(.CCOLUNIT)
|
||||
|
||||
|
||||
lea(mem(r10, r10, 2), r13) // r13 = 3*lda
|
||||
|
||||
mov(var(k_iter), rsi) // i = k_iter;
|
||||
@@ -335,8 +335,8 @@ void bli_cpackm_haswell_asm_8xk
|
||||
|
||||
|
||||
label(.CDONE)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
end_asm(
|
||||
: // output operands (none)
|
||||
@@ -392,7 +392,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
@@ -410,7 +410,7 @@ void bli_cpackm_haswell_asm_8xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -107,7 +107,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
if ( cdim0 == mnr && !gs && !conja && unitk )
|
||||
{
|
||||
begin_asm()
|
||||
|
||||
|
||||
mov(var(a), rax) // load address of a.
|
||||
|
||||
mov(var(inca), r8) // load inca
|
||||
@@ -124,14 +124,14 @@ void bli_zpackm_haswell_asm_3xk
|
||||
mov(var(one), rdx) // load address of 1.0 constant
|
||||
vbroadcastsd(mem(rdx, 0), ymm1) // load 1.0 and duplicate
|
||||
vxorpd(ymm0, ymm0, ymm0) // set ymm0 to 0.0.
|
||||
|
||||
|
||||
mov(var(kappa), rcx) // load address of kappa
|
||||
vbroadcastsd(mem(rcx, 0), ymm10) // load kappa_r and duplicate
|
||||
vbroadcastsd(mem(rcx, 8), ymm11) // load kappa_i and duplicate
|
||||
|
||||
|
||||
|
||||
// now branch on kappa == 1.0
|
||||
|
||||
|
||||
vucomisd(xmm1, xmm10) // set ZF if kappa_r == 1.0.
|
||||
sete(r12b) // r12b = ( ZF == 1 ? 1 : 0 );
|
||||
vucomisd(xmm0, xmm11) // set ZF if kappa_i == 0.0.
|
||||
@@ -145,7 +145,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
|
||||
cmp(imm(16), r8) // set ZF if (16*inca) == 16.
|
||||
jz(.ZCOLNONU) // jump to column storage case
|
||||
|
||||
|
||||
// -- kappa non-unit, row storage on A -------------------------------------
|
||||
|
||||
label(.ZROWNONU)
|
||||
@@ -158,7 +158,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
label(.ZCOLNONU)
|
||||
|
||||
jmp(.ZDONE) // jump to end.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -169,7 +169,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
|
||||
|
||||
// -- kappa unit, row storage on A -----------------------------------------
|
||||
|
||||
|
||||
label(.ZROWUNIT)
|
||||
|
||||
//lea(mem(r8, r8, 2), r12) // r12 = 3*inca
|
||||
@@ -257,7 +257,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
// -- kappa unit, column storage on A --------------------------------------
|
||||
|
||||
label(.ZCOLUNIT)
|
||||
|
||||
|
||||
lea(mem(r10, r10, 2), r13) // r13 = 3*lda
|
||||
|
||||
mov(var(k_iter), rsi) // i = k_iter;
|
||||
@@ -321,8 +321,8 @@ void bli_zpackm_haswell_asm_3xk
|
||||
|
||||
|
||||
label(.ZDONE)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
end_asm(
|
||||
: // output operands (none)
|
||||
@@ -378,7 +378,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
@@ -396,7 +396,7 @@ void bli_zpackm_haswell_asm_3xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -107,7 +107,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
if ( cdim0 == mnr && !gs && !conja && unitk )
|
||||
{
|
||||
begin_asm()
|
||||
|
||||
|
||||
mov(var(a), rax) // load address of a.
|
||||
|
||||
mov(var(inca), r8) // load inca
|
||||
@@ -128,10 +128,10 @@ void bli_zpackm_haswell_asm_4xk
|
||||
mov(var(kappa), rcx) // load address of kappa
|
||||
vbroadcastsd(mem(rcx, 0), ymm10) // load kappa_r and duplicate
|
||||
vbroadcastsd(mem(rcx, 8), ymm11) // load kappa_i and duplicate
|
||||
|
||||
|
||||
|
||||
// now branch on kappa == 1.0
|
||||
|
||||
|
||||
vucomisd(xmm1, xmm10) // set ZF if kappa_r == 1.0.
|
||||
sete(r12b) // r12b = ( ZF == 1 ? 1 : 0 );
|
||||
vucomisd(xmm0, xmm11) // set ZF if kappa_i == 0.0.
|
||||
@@ -145,7 +145,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
|
||||
cmp(imm(16), r8) // set ZF if (16*inca) == 16.
|
||||
jz(.ZCOLNONU) // jump to column storage case
|
||||
|
||||
|
||||
// -- kappa non-unit, row storage on A -------------------------------------
|
||||
|
||||
label(.ZROWNONU)
|
||||
@@ -158,7 +158,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
label(.ZCOLNONU)
|
||||
|
||||
jmp(.ZDONE) // jump to end.
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -169,7 +169,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
|
||||
|
||||
// -- kappa unit, row storage on A -----------------------------------------
|
||||
|
||||
|
||||
label(.ZROWUNIT)
|
||||
|
||||
lea(mem(r8, r8, 2), r12) // r12 = 3*inca
|
||||
@@ -267,7 +267,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
// -- kappa unit, column storage on A --------------------------------------
|
||||
|
||||
label(.ZCOLUNIT)
|
||||
|
||||
|
||||
lea(mem(r10, r10, 2), r13) // r13 = 3*lda
|
||||
|
||||
mov(var(k_iter), rsi) // i = k_iter;
|
||||
@@ -331,8 +331,8 @@ void bli_zpackm_haswell_asm_4xk
|
||||
|
||||
|
||||
label(.ZDONE)
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
end_asm(
|
||||
: // output operands (none)
|
||||
@@ -388,7 +388,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
@@ -406,7 +406,7 @@ void bli_zpackm_haswell_asm_4xk
|
||||
(
|
||||
m_edge,
|
||||
n_edge,
|
||||
p_edge, 1, ldp
|
||||
p_edge, 1, ldp
|
||||
);
|
||||
}
|
||||
}
|
||||
|
||||
@@ -1342,17 +1342,6 @@ void bli_dgemmsup_rd_haswell_asm_1x4
|
||||
|
||||
vperm2f128(imm(0x20), ymm2, ymm0, ymm4 )
|
||||
|
||||
|
||||
//vhaddpd( ymm8, ymm5, ymm0 )
|
||||
//vextractf128(imm(1), ymm0, xmm1 )
|
||||
//vaddpd( xmm0, xmm1, xmm0 )
|
||||
|
||||
//vhaddpd( ymm14, ymm11, ymm2 )
|
||||
//vextractf128(imm(1), ymm2, xmm1 )
|
||||
//vaddpd( xmm2, xmm1, xmm2 )
|
||||
|
||||
//vperm2f128(imm(0x20), ymm2, ymm0, ymm5 )
|
||||
|
||||
// xmm4[0:3] = sum(ymm4) sum(ymm7) sum(ymm10) sum(ymm13)
|
||||
|
||||
|
||||
|
||||
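The comment that closes this hunk describes the end state of the reduction in the 1x4 dgemmsup kernel: the accumulators ymm4, ymm7, ymm10 and ymm13 are each collapsed to a single sum, and the four sums end up side by side. A plain-C model of that arithmetic is sketched here. The function name `reduce_accumulators` and the array layout are illustrative assumptions, not BLIS code, and the sketch does not reproduce the actual vhaddpd/vextractf128/vperm2f128 instruction sequence.

```c
#include <stddef.h>

/* Illustrative scalar model, not BLIS code. acc[j] stands in for one
   256-bit accumulator register holding four double-precision partial
   sums; out[j] receives the sum of its four lanes. */
static void reduce_accumulators( const double acc[4][4], double out[4] )
{
    for ( size_t j = 0; j < 4; ++j )
    {
        double sum = 0.0;

        /* Add the four lanes of accumulator j. */
        for ( size_t lane = 0; lane < 4; ++lane )
            sum += acc[j][lane];

        out[j] = sum;
    }
}
```

In this model every lane of every accumulator must hold a defined value before the reduction, which is the property the haswell fix in this merge is concerned with.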
@@ -44,12 +44,15 @@
|
||||
// made available to applications (or the framework) during compilation.
|
||||
|
||||
#include "bls_gemm.h"
|
||||
#include "bls_gemm_check.h"
|
||||
#include "bls_gemm_var.h"
|
||||
|
||||
#include "bls_l3_packm_a.h"
|
||||
#include "bls_l3_packm_b.h"
|
||||
#include "bls_l3_packm_var.h"
|
||||
|
||||
#include "bls_packm_cxk.h"
|
||||
|
||||
#include "bls_l3_decor.h"
|
||||
|
||||
|
||||
|
||||
@@ -94,7 +94,7 @@ void bls_gemm_ex
|
||||
// Check parameters.
|
||||
if ( bli_error_checking_is_enabled() )
|
||||
{
|
||||
bli_gemm_check( alpha, a, b, beta, c, cntx );
|
||||
bls_gemm_check( alpha, a, b, beta, c, cntx );
|
||||
}
|
||||
|
||||
// If C has a zero dimension, return early.
|
||||
|
||||
122
sandbox/gemmlike/bls_gemm_check.c
Normal file
@@ -0,0 +1,122 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name(s) of the copyright holder(s) nor the names of its
|
||||
contributors may be used to endorse or promote products derived
|
||||
from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
void bls_gemm_check
|
||||
(
|
||||
obj_t* alpha,
|
||||
obj_t* a,
|
||||
obj_t* b,
|
||||
obj_t* beta,
|
||||
obj_t* c,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
//bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED );
|
||||
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_noninteger_object( alpha );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_noninteger_object( beta );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_floating_object( a );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_floating_object( b );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_floating_object( c );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check scalar/vector/matrix type.
|
||||
|
||||
e_val = bli_check_scalar_object( alpha );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( beta );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_matrix_object( a );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_matrix_object( b );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_matrix_object( c );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( alpha );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( a );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( b );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( beta );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( c );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check for sufficiently sized stack buffers
|
||||
|
||||
e_val = bli_check_sufficient_stack_buf_size( bli_obj_dt( a ), cntx );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_level3_dims( a, b, c );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check for consistent datatypes.
|
||||
// NOTE: We only perform these tests when mixed datatype support is
|
||||
// disabled.
|
||||
|
||||
e_val = bli_check_consistent_object_datatypes( c, a );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_consistent_object_datatypes( c, b );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
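bls_gemm_check() uses one pattern throughout: call a predicate that returns an error code, then hand that code to a handler. A stand-alone sketch of the pattern for the dimension check follows. It assumes the usual gemm conformality rule (C is m x n, A is m x k, B is k x n) with no transposition applied. The type `sketch_err_t`, its values, and `check_gemm_dims` are hypothetical and are not the BLIS definitions of err_t or bli_check_level3_dims().

```c
#include <stddef.h>

/* Hypothetical error codes for this sketch only. */
typedef enum
{
    SKETCH_SUCCESS = 0,
    SKETCH_NONCONFORMAL_DIMENSIONS = -1
} sketch_err_t;

/* Returns SKETCH_SUCCESS when C (m_c x n_c) can hold the product of
   A (m_a x n_a) and B (m_b x n_b), with no transposition applied. */
static sketch_err_t check_gemm_dims( size_t m_a, size_t n_a,
                                     size_t m_b, size_t n_b,
                                     size_t m_c, size_t n_c )
{
    if ( m_c != m_a ) return SKETCH_NONCONFORMAL_DIMENSIONS;
    if ( n_c != n_b ) return SKETCH_NONCONFORMAL_DIMENSIONS;
    if ( n_a != m_b ) return SKETCH_NONCONFORMAL_DIMENSIONS;

    return SKETCH_SUCCESS;
}
```

A caller would forward the returned code to its error handler, in the same way bls_gemm_check() forwards each e_val to bli_check_error_code().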
49
sandbox/gemmlike/bls_gemm_check.h
Normal file
@@ -0,0 +1,49 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name(s) of the copyright holder(s) nor the names of its
|
||||
contributors may be used to endorse or promote products derived
|
||||
from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based check functions.
|
||||
//
|
||||
|
||||
void bls_gemm_check
|
||||
(
|
||||
obj_t* alpha,
|
||||
obj_t* a,
|
||||
obj_t* b,
|
||||
obj_t* beta,
|
||||
obj_t* c,
|
||||
cntx_t* cntx
|
||||
);
|
||||
|
||||
@@ -61,3 +61,14 @@ GENTPROT( double, d, packm_var1 )
|
||||
GENTPROT( scomplex, c, packm_var1 )
|
||||
GENTPROT( dcomplex, z, packm_var1 )
|
||||
|
||||
//INSERT_GENTPROT_BASIC0( packm_var2 )
|
||||
GENTPROT( float, s, packm_var2 )
|
||||
GENTPROT( double, d, packm_var2 )
|
||||
GENTPROT( scomplex, c, packm_var2 )
|
||||
GENTPROT( dcomplex, z, packm_var2 )
|
||||
|
||||
//INSERT_GENTPROT_BASIC0( packm_var3 )
|
||||
GENTPROT( float, s, packm_var3 )
|
||||
GENTPROT( double, d, packm_var3 )
|
||||
GENTPROT( scomplex, c, packm_var3 )
|
||||
GENTPROT( dcomplex, z, packm_var3 )
|
||||
|
||||
@@ -35,7 +35,7 @@
|
||||
#include "blis.h"
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces to the variants.
|
||||
// Variant 1 provides basic support for packing by calling packm_cxk().
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
@@ -66,13 +66,11 @@ void PASTECH2(bls_,ch,varname) \
|
||||
dim_t it, ic; \
|
||||
dim_t ic0; \
|
||||
doff_t ic_inc; \
|
||||
dim_t panel_len_full; \
|
||||
dim_t panel_len_i; \
|
||||
dim_t panel_len; \
|
||||
dim_t panel_len_max; \
|
||||
dim_t panel_len_max_i; \
|
||||
dim_t panel_dim_i; \
|
||||
dim_t panel_dim; \
|
||||
dim_t panel_dim_max; \
|
||||
inc_t vs_c; \
|
||||
inc_t incc; \
|
||||
inc_t ldc; \
|
||||
inc_t ldp; \
|
||||
conj_t conjc; \
|
||||
@@ -95,10 +93,10 @@ void PASTECH2(bls_,ch,varname) \
|
||||
{ \
|
||||
/* Prepare to pack to row-stored column panels. */ \
|
||||
iter_dim = n; \
|
||||
panel_len_full = m; \
|
||||
panel_len = m; \
|
||||
panel_len_max = m_max; \
|
||||
panel_dim_max = pd_p; \
|
||||
vs_c = cs_c; \
|
||||
incc = cs_c; \
|
||||
ldc = rs_c; \
|
||||
ldp = rs_p; \
|
||||
} \
|
||||
@@ -106,10 +104,10 @@ void PASTECH2(bls_,ch,varname) \
|
||||
{ \
|
||||
/* Prepare to pack to column-stored row panels. */ \
|
||||
iter_dim = m; \
|
||||
panel_len_full = n; \
|
||||
panel_len = n; \
|
||||
panel_len_max = n_max; \
|
||||
panel_dim_max = pd_p; \
|
||||
vs_c = rs_c; \
|
||||
incc = rs_c; \
|
||||
ldc = cs_c; \
|
||||
ldp = cs_p; \
|
||||
} \
|
||||
@@ -147,31 +145,28 @@ void PASTECH2(bls_,ch,varname) \
|
||||
for ( ic = ic0, it = 0; it < n_iter; \
|
||||
ic += ic_inc, it += 1 ) \
|
||||
{ \
|
||||
panel_dim_i = bli_min( panel_dim_max, iter_dim - ic ); \
|
||||
panel_dim = bli_min( panel_dim_max, iter_dim - ic ); \
|
||||
\
|
||||
ctype* restrict c_begin = c_cast + (ic )*vs_c; \
|
||||
ctype* restrict c_begin = c_cast + (ic )*incc; \
|
||||
\
|
||||
ctype* restrict c_use = c_begin; \
|
||||
ctype* restrict p_use = p_begin; \
|
||||
\
|
||||
panel_len_i = panel_len_full; \
|
||||
panel_len_max_i = panel_len_max; \
|
||||
\
|
||||
/* The definition of bli_packm_my_iter() will depend on whether slab
|
||||
or round-robin partitioning was requested at configure-time. (The
|
||||
default is slab.) */ \
|
||||
if ( bli_packm_my_iter( it, it_start, it_end, tid, nt ) ) \
|
||||
{ \
|
||||
PASTEMAC(ch,packm_cxk) \
|
||||
PASTECH2(bls_,ch,packm_cxk) \
|
||||
( \
|
||||
conjc, \
|
||||
schema, \
|
||||
panel_dim_i, \
|
||||
panel_dim, \
|
||||
panel_dim_max, \
|
||||
panel_len_i, \
|
||||
panel_len_max_i, \
|
||||
panel_len, \
|
||||
panel_len_max, \
|
||||
kappa_cast, \
|
||||
c_use, vs_c, ldc, \
|
||||
c_use, incc, ldc, \
|
||||
p_use, ldp, \
|
||||
cntx \
|
||||
); \
|
||||
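The loop in this hunk walks the matrix in steps of panel_dim_max and clamps the final panel with bli_min(). The index arithmetic can be sketched in isolation, assuming the iteration count is the ceiling of iter_dim / panel_dim_max (the expression variant 2 uses for n_iter). The helper names `num_panels` and `panel_dim_at` are hypothetical.

```c
#include <stddef.h>

/* Hypothetical helpers mirroring the index arithmetic of the packing
   loop; they are not BLIS functions. */

/* Number of micropanels needed to cover iter_dim rows or columns. */
static size_t num_panels( size_t iter_dim, size_t panel_dim_max )
{
    return iter_dim / panel_dim_max + ( iter_dim % panel_dim_max ? 1 : 0 );
}

/* Width of the panel that starts at offset ic. Only the last panel can
   be narrower than panel_dim_max. Assumes ic < iter_dim. */
static size_t panel_dim_at( size_t iter_dim, size_t panel_dim_max, size_t ic )
{
    size_t remaining = iter_dim - ic;
    return remaining < panel_dim_max ? remaining : panel_dim_max;
}
```

For iter_dim = 10 and panel_dim_max = 4 this yields three panels of widths 4, 4 and 2.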
244
sandbox/gemmlike/bls_l3_packm_var2.c
Normal file
@@ -0,0 +1,244 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2021, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name(s) of the copyright holder(s) nor the names of its
|
||||
contributors may be used to endorse or promote products derived
|
||||
from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
//
|
||||
// Variant 2 is similar to variant 1, but inlines the contents of packm_cxk().
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, varname ) \
|
||||
\
|
||||
void PASTECH2(bls_,ch,varname) \
|
||||
( \
|
||||
trans_t transc, \
|
||||
pack_t schema, \
|
||||
dim_t m, \
|
||||
dim_t n, \
|
||||
dim_t m_max, \
|
||||
dim_t n_max, \
|
||||
ctype* restrict kappa, \
|
||||
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
|
||||
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
|
||||
dim_t pd_p, inc_t ps_p, \
|
||||
cntx_t* restrict cntx, \
|
||||
thrinfo_t* restrict thread \
|
||||
) \
|
||||
{ \
|
||||
ctype* restrict kappa_cast = kappa; \
|
||||
ctype* restrict c_cast = c; \
|
||||
ctype* restrict p_cast = p; \
|
||||
\
|
||||
dim_t iter_dim; \
|
||||
dim_t n_iter; \
|
||||
dim_t it, ic; \
|
||||
dim_t ic0; \
|
||||
doff_t ic_inc; \
|
||||
dim_t panel_len; \
|
||||
dim_t panel_len_max; \
|
||||
dim_t panel_dim; \
|
||||
dim_t panel_dim_max; \
|
||||
inc_t incc; \
|
||||
inc_t ldc; \
|
||||
inc_t ldp; \
|
||||
conj_t conjc; \
|
||||
\
|
||||
\
|
||||
/* Extract the conjugation bit from the transposition argument. */ \
|
||||
conjc = bli_extract_conj( transc ); \
|
||||
\
|
||||
/* Create flags to indicate row or column storage. Note that the
|
||||
schema bit that encodes row or column is describing the form of
|
||||
micro-panel, not the storage in the micro-panel. Hence the
|
||||
mismatch in "row" and "column" semantics. */ \
|
||||
bool row_stored = bli_is_col_packed( schema ); \
|
||||
/*bool col_stored = bli_is_row_packed( schema );*/ \
|
||||
\
|
||||
/* If the row storage flag indicates row storage, then we are packing
|
||||
to column panels; otherwise, if the strides indicate column storage,
|
||||
we are packing to row panels. */ \
|
||||
if ( row_stored ) \
|
||||
{ \
|
||||
/* Prepare to pack to row-stored column panels. */ \
|
||||
iter_dim = n; \
|
||||
panel_len = m; \
|
||||
panel_len_max = m_max; \
|
||||
panel_dim_max = pd_p; \
|
||||
incc = cs_c; \
|
||||
ldc = rs_c; \
|
||||
ldp = rs_p; \
|
||||
} \
|
||||
else /* if ( col_stored ) */ \
|
||||
{ \
|
||||
/* Prepare to pack to column-stored row panels. */ \
|
||||
iter_dim = m; \
|
||||
panel_len = n; \
|
||||
panel_len_max = n_max; \
|
||||
panel_dim_max = pd_p; \
|
||||
incc = rs_c; \
|
||||
ldc = cs_c; \
|
||||
ldp = cs_p; \
|
||||
} \
|
||||
\
|
||||
/* Compute the total number of iterations we'll need. */ \
|
||||
n_iter = iter_dim / panel_dim_max + ( iter_dim % panel_dim_max ? 1 : 0 ); \
|
||||
\
|
||||
/* Set the initial values and increments for indices related to C and P
|
||||
based on whether reverse iteration was requested. */ \
|
||||
{ \
|
||||
ic0 = 0; \
|
||||
ic_inc = panel_dim_max; \
|
||||
} \
|
||||
\
|
||||
ctype* restrict p_begin = p_cast; \
|
||||
\
|
||||
/* Query the number of threads and thread ids from the current thread's
|
||||
packm thrinfo_t node. */ \
|
||||
const dim_t nt = bli_thread_n_way( thread ); \
|
||||
const dim_t tid = bli_thread_work_id( thread ); \
|
||||
\
|
||||
/* Suppress warnings in case tid isn't used (i.e., as in slab partitioning). */ \
|
||||
( void )nt; \
|
||||
( void )tid; \
|
||||
\
|
||||
dim_t it_start, it_end, it_inc; \
|
||||
\
|
||||
/* Determine the thread range and increment using the current thread's
|
||||
packm thrinfo_t node. NOTE: The definition of bli_thread_range_jrir()
|
||||
will depend on whether slab or round-robin partitioning was requested
|
||||
at configure-time. */ \
|
||||
bli_thread_range_jrir( thread, n_iter, 1, FALSE, &it_start, &it_end, &it_inc ); \
|
||||
\
|
||||
/* Iterate over every logical micropanel in the source matrix. */ \
|
||||
for ( ic = ic0, it = 0; it < n_iter; \
|
||||
ic += ic_inc, it += 1 ) \
|
||||
{ \
|
||||
panel_dim = bli_min( panel_dim_max, iter_dim - ic ); \
|
||||
\
|
||||
ctype* restrict c_begin = c_cast + (ic )*incc; \
|
||||
\
|
||||
ctype* restrict c_use = c_begin; \
|
||||
ctype* restrict p_use = p_begin; \
|
||||
\
|
||||
/* The definition of bli_packm_my_iter() will depend on whether slab
|
||||
or round-robin partitioning was requested at configure-time. (The
|
||||
default is slab.) */ \
|
||||
if ( bli_packm_my_iter( it, it_start, it_end, tid, nt ) ) \
|
||||
{ \
|
||||
/* NOTE: We assume here that kappa = 1 and therefore ignore it. If
|
||||
we're wrong, this will get someone's attention. */ \
|
||||
if ( !PASTEMAC(ch,eq1)( *kappa_cast ) ) \
|
||||
bli_abort(); \
|
||||
\
|
||||
/* Perform the packing, taking conjc into account. */ \
|
||||
if ( bli_is_conj( conjc ) ) \
|
||||
{ \
|
||||
for ( dim_t l = 0; l < panel_len; ++l ) \
|
||||
{ \
|
||||
for ( dim_t i = 0; i < panel_dim; ++i ) \
|
||||
{ \
|
||||
ctype* cli = c_use + (l )*ldc + (i )*incc; \
|
||||
ctype* pli = p_use + (l )*ldp + (i )*1; \
|
||||
\
|
||||
PASTEMAC(ch,copyjs)( *cli, *pli ); \
|
||||
} \
|
||||
} \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
for ( dim_t l = 0; l < panel_len; ++l ) \
|
||||
{ \
|
||||
for ( dim_t i = 0; i < panel_dim; ++i ) \
|
||||
{ \
|
||||
ctype* cli = c_use + (l )*ldc + (i )*incc; \
|
||||
ctype* pli = p_use + (l )*ldp + (i )*1; \
|
||||
\
|
||||
PASTEMAC(ch,copys)( *cli, *pli ); \
|
||||
} \
|
||||
} \
|
||||
} \
|
||||
\
|
||||
/* If panel_dim < panel_dim_max, then we zero those unused rows. */ \
|
||||
if ( panel_dim < panel_dim_max ) \
|
||||
{ \
|
||||
const dim_t i = panel_dim; \
|
||||
const dim_t m_edge = panel_dim_max - panel_dim; \
|
||||
const dim_t n_edge = panel_len_max; \
|
||||
ctype* restrict p_edge = p_use + (i )*1; \
|
||||
\
|
||||
PASTEMAC(ch,set0s_mxn) \
|
||||
( \
|
||||
m_edge, \
|
||||
n_edge, \
|
||||
p_edge, 1, ldp \
|
||||
); \
|
||||
} \
|
||||
\
|
||||
/* If panel_len < panel_len_max, then we zero those unused columns. */ \
|
||||
if ( panel_len < panel_len_max ) \
|
||||
{ \
|
||||
const dim_t j = panel_len; \
|
||||
const dim_t m_edge = panel_dim_max; \
|
||||
const dim_t n_edge = panel_len_max - panel_len; \
|
||||
ctype* restrict p_edge = p_use + (j )*ldp; \
|
||||
\
|
||||
PASTEMAC(ch,set0s_mxn) \
|
||||
( \
|
||||
m_edge, \
|
||||
n_edge, \
|
||||
p_edge, 1, ldp \
|
||||
); \
|
||||
} \
|
||||
} \
|
||||
\
|
||||
/*
|
||||
if ( !row_stored ) \
|
||||
PASTEMAC(ch,fprintm)( stdout, "packm_var1: a packed", panel_dim_max, panel_len_max, \
|
||||
p_use, rs_p, cs_p, "%5.2f", "" ); \
|
||||
else \
|
||||
PASTEMAC(ch,fprintm)( stdout, "packm_var1: b packed", panel_len_max, panel_dim_max, \
|
||||
p_use, rs_p, cs_p, "%5.2f", "" ); \
|
||||
*/ \
|
||||
\
|
||||
p_begin += ps_p; \
|
||||
} \
|
||||
}
|
||||
|
||||
//INSERT_GENTFUNC_BASIC0( packm_var2 )
|
||||
GENTFUNC( float, s, packm_var2 )
|
||||
GENTFUNC( double, d, packm_var2 )
|
||||
GENTFUNC( scomplex, c, packm_var2 )
|
||||
GENTFUNC( dcomplex, z, packm_var2 )
|
||||
|
||||
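Variant 2 inlines three steps per micropanel: copy panel_dim x panel_len elements from the source into the packed buffer, zero the unused rows up to panel_dim_max, and zero the unused columns up to panel_len_max. A minimal real-valued sketch of the resulting layout follows. It assumes kappa = 1 and no conjugation, and it merges the three steps into one loop nest for brevity rather than calling separate copy and set-to-zero routines. The function `pack_panel_sketch` is hypothetical and is not part of BLIS.

```c
#include <stddef.h>

/* Hypothetical stand-alone helper, not part of BLIS. Packs one
   double-precision micropanel and zero-pads its edges.

   c     : source panel; element (i,l) is at c[ i*incc + l*ldc ]
   p     : packed panel; element (i,l) is at p[ i + l*ldp ]
   Elements with i >= panel_dim or l >= panel_len are set to zero. */
static void pack_panel_sketch
     (
       size_t panel_dim, size_t panel_dim_max,
       size_t panel_len, size_t panel_len_max,
       const double* c, ptrdiff_t incc, ptrdiff_t ldc,
       double*       p, ptrdiff_t ldp
     )
{
    for ( size_t l = 0; l < panel_len_max; ++l )
    {
        for ( size_t i = 0; i < panel_dim_max; ++i )
        {
            double* pli = p + ( ptrdiff_t )l * ldp + ( ptrdiff_t )i;

            if ( l < panel_len && i < panel_dim )
                *pli = c[ ( ptrdiff_t )l * ldc + ( ptrdiff_t )i * incc ];
            else
                *pli = 0.0; /* edge padding */
        }
    }
}
```

The packed panel always has unit stride in the panel_dim direction, which is why the macro above indexes p with (i )*1 and passes a row stride of 1 to set0s_mxn.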
200
sandbox/gemmlike/bls_l3_packm_var3.c
Normal file
@@ -0,0 +1,200 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2021, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

//
// Variant 3 is similar to variant 1, except that it parallelizes packing
// along the k dimension. (Our current hypothesis is that this method of
// parallelizing the operation may perform better on some NUMA systems.)
//

#undef GENTFUNC
#define GENTFUNC( ctype, ch, varname ) \
\
void PASTECH2(bls_,ch,varname) \
     ( \
       trans_t transc, \
       pack_t schema, \
       dim_t m, \
       dim_t n, \
       dim_t m_max, \
       dim_t n_max, \
       ctype* restrict kappa, \
       ctype* restrict c, inc_t rs_c, inc_t cs_c, \
       ctype* restrict p, inc_t rs_p, inc_t cs_p, \
       dim_t pd_p, inc_t ps_p, \
       cntx_t* restrict cntx, \
       thrinfo_t* restrict thread \
     ) \
{ \
	ctype* restrict kappa_cast = kappa; \
	ctype* restrict c_cast = c; \
	ctype* restrict p_cast = p; \
\
	dim_t iter_dim; \
	dim_t n_iter; \
	dim_t it, ic; \
	dim_t ic0; \
	doff_t ic_inc; \
	dim_t panel_len; \
	dim_t panel_len_max; \
	dim_t panel_dim; \
	dim_t panel_dim_max; \
	inc_t incc; \
	inc_t ldc; \
	inc_t ldp; \
	conj_t conjc; \
\
\
	/* Extract the conjugation bit from the transposition argument. */ \
	conjc = bli_extract_conj( transc ); \
\
	/* Create flags to indicate row or column storage. Note that the
	   schema bit that encodes row or column is describing the form of
	   micro-panel, not the storage in the micro-panel. Hence the
	   mismatch in "row" and "column" semantics. */ \
	bool row_stored = bli_is_col_packed( schema ); \
	/*bool col_stored = bli_is_row_packed( schema );*/ \
\
	/* If the row storage flag indicates row storage, then we are packing
	   to column panels; otherwise, if the strides indicate column storage,
	   we are packing to row panels. */ \
	if ( row_stored ) \
	{ \
		/* Prepare to pack to row-stored column panels. */ \
		iter_dim = n; \
		panel_len = m; \
		panel_len_max = m_max; \
		panel_dim_max = pd_p; \
		incc = cs_c; \
		ldc = rs_c; \
		ldp = rs_p; \
	} \
	else /* if ( col_stored ) */ \
	{ \
		/* Prepare to pack to column-stored row panels. */ \
		iter_dim = m; \
		panel_len = n; \
		panel_len_max = n_max; \
		panel_dim_max = pd_p; \
		incc = rs_c; \
		ldc = cs_c; \
		ldp = cs_p; \
	} \
\
	/* Compute the total number of iterations we'll need. */ \
	n_iter = iter_dim / panel_dim_max + ( iter_dim % panel_dim_max ? 1 : 0 ); \
\
	/* Set the initial values and increments for indices related to C and P
	   based on whether reverse iteration was requested. */ \
	{ \
		ic0 = 0; \
		ic_inc = panel_dim_max; \
	} \
\
	/* Query the number of threads and thread ids from the current thread's
	   packm thrinfo_t node. */ \
	const dim_t nt = bli_thread_n_way( thread ); \
	const dim_t tid = bli_thread_work_id( thread ); \
\
	/* Suppress warnings in case tid isn't used (ie: as in slab partitioning). */ \
	( void )nt; \
	( void )tid; \
\
	dim_t pr_start, pr_end; \
\
	/* Determine the thread range and increment using the current thread's
	   packm thrinfo_t node. */ \
	bli_thread_range_sub( thread, panel_len, 1, FALSE, &pr_start, &pr_end ); \
\
	/* Define instances of panel_len and panel_len_max that are specific to
	   the local thread. */ \
	dim_t panel_len_loc = pr_end - pr_start; \
	dim_t panel_len_max_loc = panel_len_loc; \
\
	/* If panel_len_max > panel_len, then there are some columns in p that
	   need to be zeroed. Of course, only the last thread will be responsible
	   for this edge region. */ \
	dim_t panel_len_zero = panel_len_max - panel_len; \
	if ( tid == nt - 1 ) panel_len_max_loc += panel_len_zero; \
\
	/* Shift the pointer for c and p to the appropriate locations within the
	   first micropanel. */ \
	dim_t off_loc = pr_start; \
	ctype* restrict c_begin_loc = c_cast + off_loc * ldc; \
	ctype* restrict p_begin_loc = p_cast + off_loc * ldp; \
\
	/* Iterate over every logical micropanel in the source matrix. */ \
	for ( ic = ic0, it = 0; it < n_iter; \
	      ic += ic_inc, it += 1 ) \
	{ \
		panel_dim = bli_min( panel_dim_max, iter_dim - ic ); \
\
		ctype* restrict c_use = c_begin_loc + (ic )*incc; \
		ctype* restrict p_use = p_begin_loc + (it )*ps_p; \
\
		{ \
			PASTECH2(bls_,ch,packm_cxk) \
			( \
			  conjc, \
			  schema, \
			  panel_dim, \
			  panel_dim_max, \
			  panel_len_loc, \
			  panel_len_max_loc, \
			  kappa_cast, \
			  c_use, incc, ldc, \
			  p_use, ldp, \
			  cntx \
			); \
		} \
	} \
}

//INSERT_GENTFUNC_BASIC0( packm_var3 )
GENTFUNC( float, s, packm_var3 )
GENTFUNC( double, d, packm_var3 )
GENTFUNC( scomplex, c, packm_var3 )
GENTFUNC( dcomplex, z, packm_var3 )

/*
	if ( !row_stored ) \
	PASTEMAC(ch,fprintm)( stdout, "packm_var3: a packed", panel_dim_max, panel_len_max, \
	                      p_use, rs_p, cs_p, "%5.2f", "" ); \
	else \
	PASTEMAC(ch,fprintm)( stdout, "packm_var3: b packed", panel_len_max, panel_dim_max, \
	                      p_use, rs_p, cs_p, "%5.2f", "" ); \
*/
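The k-dimension split performed by packm_var3 can be illustrated outside of BLIS. What follows is a minimal sketch in plain C, not the sandbox's implementation: `range_sub`, `num_micropanels`, `k_slice_t`, and `k_slice` are hypothetical names, and the even split with the remainder handed to the lowest-numbered threads is an assumption standing in for whatever policy `bli_thread_range_sub()` actually applies. It shows how each thread derives its offset and local panel length, and why only the last thread extends its write range to cover the zero padding.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for bli_thread_range_sub(): split [0, n) into nt
   contiguous sub-ranges, giving the first (n % nt) threads one extra
   element. The partitioning policy inside BLIS may differ. */
static void range_sub( size_t nt, size_t tid, size_t n,
                       size_t* start, size_t* end )
{
    assert( nt > 0 && tid < nt );

    size_t base = n / nt;
    size_t rem  = n % nt;

    *start = tid * base + ( tid < rem ? tid : rem );
    *end   = *start + base + ( tid < rem ? 1 : 0 );
}

/* Ceiling division, mirroring the n_iter computation: the number of
   micropanels needed to cover iter_dim in steps of panel_dim_max. */
static size_t num_micropanels( size_t iter_dim, size_t panel_dim_max )
{
    return iter_dim / panel_dim_max + ( iter_dim % panel_dim_max ? 1 : 0 );
}

/* Per-thread view of the panel length (the k dimension). */
typedef struct
{
    size_t off;      /* offset of this thread's slice along k */
    size_t len;      /* elements this thread copies from the source */
    size_t len_max;  /* elements this thread writes, padding included */
} k_slice_t;

static k_slice_t k_slice( size_t nt, size_t tid,
                          size_t panel_len, size_t panel_len_max )
{
    size_t start, end;

    assert( panel_len <= panel_len_max );
    range_sub( nt, tid, panel_len, &start, &end );

    k_slice_t s = { start, end - start, end - start };

    /* Only the last thread owns the zero-padded edge region. */
    if ( tid == nt - 1 ) s.len_max += panel_len_max - panel_len;

    return s;
}
```

Under these assumptions, three threads with `panel_len = 10` and `panel_len_max = 12` receive the k-ranges [0,4), [4,7), and [7,10), and only the third thread widens its write range by the two padding columns. Each thread would then offset its source and destination pointers by `off * ldc` and `off * ldp` respectively, as the macro above does with `off_loc`.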
161
sandbox/gemmlike/bls_packm_cxk.c
Normal file
@@ -0,0 +1,161 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/

#include "blis.h"

#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTECH2(bls_,ch,opname) \
     ( \
       conj_t conja, \
       pack_t schema, \
       dim_t panel_dim, \
       dim_t panel_dim_max, \
       dim_t panel_len, \
       dim_t panel_len_max, \
       ctype* kappa, \
       ctype* a, inc_t inca, inc_t lda, \
       ctype* p, inc_t ldp, \
       cntx_t* cntx \
     ) \
{ \
	/* Note that we use panel_dim_max, not panel_dim, to query the packm
	   kernel function pointer. This means that we always use the same
	   kernel, even for edge cases. */ \
	num_t dt = PASTEMAC(ch,type); \
	l1mkr_t ker_id = panel_dim_max; \
\
	PASTECH2(ch,opname,_ker_ft) f; \
\
	/* Query the context for the packm kernel corresponding to the current
	   panel dimension, or kernel id. If the id is invalid, the function will
	   return NULL. */ \
	f = bli_cntx_get_packm_ker_dt( dt, ker_id, cntx ); \
\
	/* If there exists a kernel implementation for the micro-panel dimension
	   provided, we invoke the implementation. Otherwise, we use scal2m. */ \
	/* NOTE: We've disabled calling packm micro-kernels from the context for
	   this implementation. To re-enable, change FALSE to TRUE in the
	   conditional below. */ \
	if ( f != NULL && FALSE ) \
	{ \
		f \
		( \
		  conja, \
		  schema, \
		  panel_dim, \
		  panel_len, \
		  panel_len_max, \
		  kappa, \
		  a, inca, lda, \
		  p, ldp, \
		  cntx \
		); \
	} \
	else \
	{ \
		/* NOTE: We assume here that kappa = 1 and therefore ignore it. If
		   we're wrong, this will get someone's attention. */ \
		if ( !PASTEMAC(ch,eq1)( *kappa ) ) \
			bli_abort(); \
\
		/* Perform the packing, taking conja into account. */ \
		if ( bli_is_conj( conja ) ) \
		{ \
			for ( dim_t l = 0; l < panel_len; ++l ) \
			{ \
				for ( dim_t i = 0; i < panel_dim; ++i ) \
				{ \
					ctype* ali = a + (l )*lda + (i )*inca; \
					ctype* pli = p + (l )*ldp + (i )*1; \
\
					PASTEMAC(ch,copyjs)( *ali, *pli ); \
				} \
			} \
		} \
		else \
		{ \
			for ( dim_t l = 0; l < panel_len; ++l ) \
			{ \
				for ( dim_t i = 0; i < panel_dim; ++i ) \
				{ \
					ctype* ali = a + (l )*lda + (i )*inca; \
					ctype* pli = p + (l )*ldp + (i )*1; \
\
					PASTEMAC(ch,copys)( *ali, *pli ); \
				} \
			} \
		} \
\
		/* If panel_dim < panel_dim_max, then we zero those unused rows. */ \
		if ( panel_dim < panel_dim_max ) \
		{ \
			const dim_t i = panel_dim; \
			const dim_t m_edge = panel_dim_max - panel_dim; \
			const dim_t n_edge = panel_len_max; \
			ctype* restrict p_edge = p + (i )*1; \
\
			PASTEMAC(ch,set0s_mxn) \
			( \
			  m_edge, \
			  n_edge, \
			  p_edge, 1, ldp \
			); \
		} \
\
		/* If panel_len < panel_len_max, then we zero those unused columns. */ \
		if ( panel_len < panel_len_max ) \
		{ \
			const dim_t j = panel_len; \
			const dim_t m_edge = panel_dim_max; \
			const dim_t n_edge = panel_len_max - panel_len; \
			ctype* restrict p_edge = p + (j )*ldp; \
\
			PASTEMAC(ch,set0s_mxn) \
			( \
			  m_edge, \
			  n_edge, \
			  p_edge, 1, ldp \
			); \
		} \
	} \
}

//INSERT_GENTFUNC_BASIC0( packm_cxk )
GENTFUNC( float, s, packm_cxk )
GENTFUNC( double, d, packm_cxk )
GENTFUNC( scomplex, c, packm_cxk )
GENTFUNC( dcomplex, z, packm_cxk )
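The fallback path of packm_cxk (the branch taken while the context kernel is disabled) amounts to a strided copy followed by two zero-fills. A minimal standalone sketch of that pattern is given below, assuming kappa = 1 as the macro itself does. The names `cplx_t` and `pack_cxk_sketch` are hypothetical, `cplx_t` is an illustrative struct rather than the BLIS `dcomplex` definition, and plain loops stand in for the `copys`, `copyjs`, and `set0s_mxn` macros.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Illustrative complex type; an assumption for this sketch only. */
typedef struct { double real; double imag; } cplx_t;

/* Pack a panel_dim x panel_len block of a (element strides inca, lda)
   into p, whose rows within a column are contiguous and whose columns
   are ldp apart, then zero-fill out to panel_dim_max x panel_len_max. */
static void pack_cxk_sketch
     (
       bool          conj,
       size_t        panel_dim, size_t panel_dim_max,
       size_t        panel_len, size_t panel_len_max,
       const cplx_t* a, size_t inca, size_t lda,
       cplx_t*       p, size_t ldp
     )
{
    const cplx_t zero = { 0.0, 0.0 };

    assert( panel_dim <= panel_dim_max && panel_dim_max <= ldp );
    assert( panel_len <= panel_len_max );

    /* Copy the stored elements, conjugating them if requested. */
    for ( size_t l = 0; l < panel_len; ++l )
        for ( size_t i = 0; i < panel_dim; ++i )
        {
            cplx_t v = a[ l*lda + i*inca ];
            if ( conj ) v.imag = -v.imag;
            p[ l*ldp + i ] = v;
        }

    /* Zero the unused rows of every column, padded columns included. */
    for ( size_t l = 0; l < panel_len_max; ++l )
        for ( size_t i = panel_dim; i < panel_dim_max; ++i )
            p[ l*ldp + i ] = zero;

    /* Zero the padded columns over the full panel_dim_max rows. */
    for ( size_t l = panel_len; l < panel_len_max; ++l )
        for ( size_t i = 0; i < panel_dim_max; ++i )
            p[ l*ldp + i ] = zero;
}
```

As in the macro, the two zero-fill regions overlap in the bottom-right corner of the micropanel. Writing that corner twice is harmless and keeps each edge case independent of the other.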
58
sandbox/gemmlike/bls_packm_cxk.h
Normal file
@@ -0,0 +1,58 @@
/*

   BLIS
   An object-based framework for developing high-performance BLAS-like
   libraries.

   Copyright (C) 2014, The University of Texas at Austin

   Redistribution and use in source and binary forms, with or without
   modification, are permitted provided that the following conditions are
   met:
    - Redistributions of source code must retain the above copyright
      notice, this list of conditions and the following disclaimer.
    - Redistributions in binary form must reproduce the above copyright
      notice, this list of conditions and the following disclaimer in the
      documentation and/or other materials provided with the distribution.
    - Neither the name(s) of the copyright holder(s) nor the names of its
      contributors may be used to endorse or promote products derived
      from this software without specific prior written permission.

   THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
   "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
   LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
   A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
   HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
   SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
   LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
   DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
   THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
   (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
   OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.

*/


#undef GENTPROT
#define GENTPROT( ctype, ch, varname ) \
\
void PASTECH2(bls_,ch,varname) \
     ( \
       conj_t conja, \
       pack_t schema, \
       dim_t panel_dim, \
       dim_t panel_dim_max, \
       dim_t panel_len, \
       dim_t panel_len_max, \
       ctype* kappa, \
       ctype* a, inc_t inca, inc_t lda, \
       ctype* p, inc_t ldp, \
       cntx_t* cntx \
     );

//INSERT_GENTPROT_BASIC0( packm_cxk )
GENTPROT( float, s, packm_cxk )
GENTPROT( double, d, packm_cxk )
GENTPROT( scomplex, c, packm_cxk )
GENTPROT( dcomplex, z, packm_cxk )
@@ -121,6 +121,8 @@ void bls_l3_thread_decorator
 	rntm_t* rntm
 )
 {
+	err_t r_val;
+
 	// Query the total number of threads from the context.
 	const dim_t n_threads = bli_rntm_num_threads( rntm );
 
@@ -151,12 +153,12 @@ void bls_l3_thread_decorator
 	#ifdef BLIS_ENABLE_MEM_TRACING
 	printf( "bli_l3_thread_decorator().pth: " );
 	#endif
-	bli_pthread_t* pthreads = bli_malloc_intl( sizeof( bli_pthread_t ) * n_threads );
+	bli_pthread_t* pthreads = bli_malloc_intl( sizeof( bli_pthread_t ) * n_threads, &r_val );
 
 	#ifdef BLIS_ENABLE_MEM_TRACING
 	printf( "bli_l3_thread_decorator().pth: " );
 	#endif
-	thread_data_t* datas = bli_malloc_intl( sizeof( thread_data_t ) * n_threads );
+	thread_data_t* datas = bli_malloc_intl( sizeof( thread_data_t ) * n_threads, &r_val );
 
 	// NOTE: We must iterate backwards so that the chief thread (thread id 0)
 	// can spawn all other threads before proceeding with its own computation.
38
travis/cxx/Makefile
Normal file
@@ -0,0 +1,38 @@
#
#
#  BLIS
#  An object-based framework for developing high-performance BLAS-like
#  libraries.
#
#  Copyright (C) 2021, Southern Methodist University
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are
#  met:
#   - Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#   - Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   - Neither the name(s) of the copyright holder(s) nor the names of its
#     contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
#  HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#

.PHONY: all cxx-test

all: cxx-test
	$(CXX) -std=c++0x -o $(BUILD_DIR)/cxx-test.x -I$(INCLUDE_DIR) cxx-test.cxx -L$(LIB_DIR) -lblis
50
travis/cxx/cxx-test.cxx
Normal file
@@ -0,0 +1,50 @@
//
//
//  BLIS
//  An object-based framework for developing high-performance BLAS-like
//  libraries.
//
//  Copyright (C) 2021, Southern Methodist University
//
//  Redistribution and use in source and binary forms, with or without
//  modification, are permitted provided that the following conditions are
//  met:
//   - Redistributions of source code must retain the above copyright
//     notice, this list of conditions and the following disclaimer.
//   - Redistributions in binary form must reproduce the above copyright
//     notice, this list of conditions and the following disclaimer in the
//     documentation and/or other materials provided with the distribution.
//   - Neither the name(s) of the copyright holder(s) nor the names of its
//     contributors may be used to endorse or promote products derived
//     from this software without specific prior written permission.
//
//  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
//  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
//  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
//  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
//  HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
//  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
//  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
//  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
//  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
//  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
//  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
//
//

#include <vector>

#include "blis.h"

int main()
{
	const int N = 5;
	std::vector<scomplex> A(N*N), B(N*N), C(N*N);
	scomplex one{1.0, 0.0};
	scomplex zero{0.0, 0.0};

	bli_cgemm(BLIS_NO_TRANSPOSE, BLIS_NO_TRANSPOSE, N, N, N,
	          &one, A.data(), 1, N,
	          B.data(), 1, N,
	          &zero, C.data(), 1, N);
}
58
travis/cxx/cxx-test.sh
Executable file
@@ -0,0 +1,58 @@
#!/bin/bash
#
#
#  BLIS
#  An object-based framework for developing high-performance BLAS-like
#  libraries.
#
#  Copyright (C) 2021, Southern Methodist University
#
#  Redistribution and use in source and binary forms, with or without
#  modification, are permitted provided that the following conditions are
#  met:
#   - Redistributions of source code must retain the above copyright
#     notice, this list of conditions and the following disclaimer.
#   - Redistributions in binary form must reproduce the above copyright
#     notice, this list of conditions and the following disclaimer in the
#     documentation and/or other materials provided with the distribution.
#   - Neither the name(s) of the copyright holder(s) nor the names of its
#     contributors may be used to endorse or promote products derived
#     from this software without specific prior written permission.
#
#  THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
#  "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
#  LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
#  A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
#  HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
#  SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
#  LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
#  DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
#  THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
#  (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
#  OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
#
#

SOURCE_DIR=$1
CONFIG=$2

if [ -z $SOURCE_DIR ] || [ -z $CONFIG ]; then
	echo "usage: cxx-test.sh <source dir> <config>"
	exit 1
fi

BUILD_DIR=$(pwd)
INCLUDE_DIR=$BUILD_DIR/include/$CONFIG
LIB_DIR=$BUILD_DIR/lib/$CONFIG

if [ ! -e $INCLUDE_DIR/blis.h ]; then
	echo "could not find blis.h"
	exit 1
fi

if [ ! -e $SOURCE_DIR/travis/cxx/Makefile ]; then
	echo "could not find cxx-test Makefile"
	exit 1
fi

make -C $SOURCE_DIR/travis/cxx INCLUDE_DIR=$INCLUDE_DIR LIB_DIR=$LIB_DIR BUILD_DIR=$BUILD_DIR
@@ -8,19 +8,28 @@ export BLIS_IC_NT=2
 export BLIS_JR_NT=1
 export BLIS_IR_NT=1
 
-if [ "$TEST" = "FAST" ]; then
+if [ "$TEST" = "FAST" -o "$TEST" = "ALL" ]; then
 	make testblis-fast
-elif [ "$TEST" = "MD" ]; then
+	$DIST_PATH/testsuite/check-blistest.sh ./output.testsuite
+fi
+
+if [ "$TEST" = "MD" -o "$TEST" = "ALL" ]; then
 	make testblis-md
-elif [ "$TEST" = "SALT" ]; then
+	$DIST_PATH/testsuite/check-blistest.sh ./output.testsuite
+fi
+
+if [ "$TEST" = "SALT" -o "$TEST" = "ALL" ]; then
 	# Disable multithreading within BLIS.
 	export BLIS_JC_NT=1 BLIS_IC_NT=1 BLIS_JR_NT=1 BLIS_IR_NT=1
 	make testblis-salt
-else
-	make testblis
+	$DIST_PATH/testsuite/check-blistest.sh ./output.testsuite
 fi
 
+if [ "$TEST" = "1" -o "$TEST" = "ALL" ]; then
+	make testblis
+	$DIST_PATH/testsuite/check-blistest.sh ./output.testsuite
+fi
+
-$DIST_PATH/testsuite/check-blistest.sh ./output.testsuite
 make testblas
 $DIST_PATH/blastest/check-blastest.sh