- Updated Makefile to include DTL files in library build
- Updated Makefile to include cpp header file installation
- Updated test/makefile to include extra API added by AMD team.
AMD-Internal: [CPUPL-1559]
Change-Id: I249c6935d5ff5fb645f9deec7e0218575484be13
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch
Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)
Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.
Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.
Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)
Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)
Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.
Minor code consolidation in all level-3 _front() functions.
Reorganized Windows cpp branch of bli_pthreads.c.
Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.
Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.
Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.
AMD-internal-[CPUPL-1523]
Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
Merged contributions from AMD's AOCL BLIS (#448).
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
only the lower or upper triangle of a square matrix C. For now, only
the conventional/large code path will be supported (in vanilla BLIS).
This was accomplished by leveraging the existing variant logic for
herk. However, some of the infrastructure to support a gemmtsup is
included in this commit, including
- A bli_gemmtsup() front-end, similar to bli_gemmsup().
- A bli_gemmtsup_ref() reference handler function.
- A bli_gemmtsup_int() variant chooser function (with variant calls
commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
wrapper to a set of polymorphic CBLAS-like function wrappers defined
in another header (cblas.hh). These two headers are installed if
running the 'install' target with INSTALL_HH is set to 'yes'. (Also
added a set of unit tests that exercise blis.hh, although they are
disabled for now because they aren't compatible with out-of-tree
builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
various minor updates to dotv and scalv kernels. Also added various
sup kernels contributed by AMD to kernels/zen/3. However, these
kernels are (for now) not yet used, in part because they caused
AppVeyor clang failures, and also because I have not found time to
review and vet them.
- Output the python found during configure into the definition of PYTHON
in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
bug surfaced because the gemmt module verifies its computation using
gemm with its beta parameter set to zero, which, on a cortexa15 system
caused the gemm kernel code to unconditionally multiply the
uninitialized C data by beta. The C matrix likely contained
non-numeric values such as NaN, which then would have resulted in a
false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
in bli_l3_blocksize.c, was inadvertantly being defined in terms of
helper functions meant for trmm. This bug was probably harmless since
the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
kernels/zen/3/bli_gemm_small.c since those macros are not used in
vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
Windows systems.
- Various whitespace changes.
Details:
- Amin api returns index of minimum absolute value in a vector.
- Added amin reference blis kernel.
- Added blas and cblas interface for amin.
AMD-Internal: [CPUPL-1155]
Change-Id: I89c1e37e86950a4582bba70a5d8fc70ac915bd3c
Details:
- added cblas extension cblas_?cabs1.
- Functionality : res=|Re(z)|+|Im(z)|, z is a complex number, and res is a value containing the absolute value of a complex number z.
AMD-Internal: [CPUPL-1129]
Change-Id: I4a3c265c89527c8fd3060c5d2ed38b1953ce6343
Modified Makefile in test folder to enable calling BLAS interfaces for BLIS as
well. This is possible by replacing -DBLIS with -DBLAS=\"aocl\" in the
makefile. Also added linking to multi-threaded MKL library.
Change-Id: Iccf2ec99b48bb35da985b69218bc680f678ff7c9
Details:
- The axpby routines perform a vector-vector operation defined as
y = a*x + b*y where a, b are scalars and x, y are vectors.
- This API is part of BLAS-like extension APIs
Change-Id: I17a53b03bba97de7ae1995a9f086084bd241bcdc
AMD-Internal: [CPUPL-1118]
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
the usual targets without having first built BLIS results in a helpful
error message. For example, if BLIS is not yet configured, make will
output:
Makefile:327: *** Cannot proceed: config.mk not detected! Run
configure first. Stop.
Similarly, if BLIS is configured but not yet built, make will output:
Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
make first. Stop.
In previous commits, these actions would result in a rather cryptic
make error such as:
make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
needed by 'blis-nat-st'. Stop.
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
the usual targets without having first built BLIS results in a helpful
error message. For example, if BLIS is not yet configured, make will
output:
Makefile:327: *** Cannot proceed: config.mk not detected! Run
configure first. Stop.
Similarly, if BLIS is configured but not yet built, make will output:
Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
make first. Stop.
In previous commits, these actions would result in a rather cryptic
make error such as:
make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
needed by 'blis-nat-st'. Stop.
Details:
- Added framework code for GEMMT SUP.
- Implemented SUP for GEMMT using similar techniques as native path.
- Moved update routines to frame/util folder.
- Ported update routines for complex datatypes.
Change-Id: I17adfd0586d07f5a23dca6a07b2d48f4c9fcf71c
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>,
Dipal M Zambare <DipalMadhukar.Zambare@amd.com>,
Mangala V <managala.v@amd.com>
Details:
- Added new API Which Computes a matrix-matrix product with general matrices
but updates only the upper or lower triangular part of the result matrix.
cblas_?gemmt() and ?gemmt_().
- These routines are similar to the ?gemm routines, but they only access
and update a triangular part of the square result matrix.
- Added DGEMMT functionality by reusing GEMM kernels.
- Created a new folder for GEMMT under l3, and added GEMMT specific
framework code.
- Modified cntl_create routine to choose different macro kernel for
GEMMT.
- Added routines to copy lower/upper triangular part of a block to the
buffer.
- Defined BLIS, BLAS and CBLAS interface APIs for GEMMT.
- Added test_gemmt.c to test folder and Updated the Makefile.
- Added a macro 'CBLAS' in test_gemm.c to call CBLAS APIs.
Change-Id: Ie00c1a15b9c654b65c687a9ca781cbc6f9641791
Details:
-Kernel is called directly from API call to avoid framework
overhead in case of single and double precisions.
-Currently these changes are applicable only for zen2 configuration.
They will be enabled for zen family processors in future.
-These changes improve performance of BLAS and CBLAS interfaces of API.
They do not affect BLIS-specific APIs.
-setv simd kernel is added for single and double precision elements
Change-Id: I1b343aa232f2571717c2b01ada5914f869883e1a
Signed-off-by: Kiran ND <Kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-817]
Details:
- Separate kernel for copyv function added to improve performance.
- Modified cntx_init file in zen and zen2 configuration
- Added test_copyv.c in test folder
- Modified test/Makefile to include test_copyv.c
Change-Id: I297f539f2ddd2d71997b127a71a460991cd07b41
Signed-off-by: Kiran N D <kiran.Devrajegowda@amd.com>
AMD-Internal: [CPUPL-818]
Details:
-Added SIMD kernels for SWAPV for both single and double precisions.
-Modified cntx_init file for zen and zen2 configurations to choose opt kernels for
SWAPV.
-Added test_swapv.c in test folder.
-Modified test/Makefile to include test_swapv.c
Change-Id: Ida786eec722e634aee0dacdd51c327823c80f01a
Signed-off-by: Meghana Vankadari <Meghana.Vankadari@amd.com>
AMD-Internal: [CPUPL-847]
Details:
- Added missing license header to bli_pwr9_asm_macros_12x6.h.
- Reverted temporary changes to various files in 'test' and 'testsuite'
directories.
- Moved testsuite/jobscripts into testsuite/old.
- Minor whitespace/comment changes across various files.
Implemented and registered power9 dgemm ukernel.
Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel
assumes that elements of B have been duplicated/broadcast during the
packing step. The microkernel uses a column orientation for its
microtile vector registers and thus implements column storage and
general stride IO cases. (A row storage IO case via in-register
transposition may be added at a future date.) It should be noted that
we recommend using this microkernel with gcc and *not* xlc, as issues
with the latter cropped up during development, including but not
limited to slightly incompatible vector register mnemonics in the GNU
extended inline assembly clobber list.
Details:
- NOTE: This is a merge commit of 'master' of git://github.com/amd/blis
into 'amd-master' of flame/blis.
- Fixed a bug in the downstream value of BLIS_NUM_ARCHS, which was
inadvertantly not incremented when the Zen2 subconfiguration was
added.
- In bli_gemm_front(), added a missing conditional constraint around the
call to bli_gemm_small() that ensures that the computation precision
of C matches the storage precision of C.
- In bli_syrk_front(), reorganized and relocated the notrans/trans logic
that existed around the call to bli_syrk_small() into bli_syrk_small()
to minimize the calling code footprint and also to bring that code
into stylistic harmony with similar code in bli_gemm_front() and
bli_trsm_front(). Also, replaced direct accessing of obj_t fields with
proper accessor static functions (e.g. 'a->dim[0]' becomes
'bli_obj_length( a )').
- Added #ifdef BLIS_ENABLE_SMALL_MATRIX guard around prototypes for
bli_gemm_small(), bli_syrk_small(), and bli_trsm_small(). This is
strictly speaking unnecessary, but it serves as a useful visual cue to
those who may be reading the files.
- Removed cpp macro-protected small matrix debugging code from
bli_trsm_front.c.
- Added a GCC_OT_9_1_0 variable to build/config.mk.in to facilitate gcc
version check for availability of -march=znver2, and added appropriate
support to configure script.
- Cleanups to compiler flags common to recent AMD microarchitectures in
config/zen/amd_config.mk, including: removal of -march=znver1 et al.
from CKVECFLAGS (since the -march flag is added within make_defs.mk);
setting CRVECFLAGS similarly to CKVECFLAGS.
- Cleanups to config/zen/bli_cntx_init_zen.c.
- Cleanups, added comments to config/zen/make_defs.mk.
- Cleanups to config/zen2/make_defs.mk, including making use of newly-
added GCC_OT_9_1_0 and existing GCC_OT_6_1_0 to choose the correct
set of compiler flags based on the version of gcc being used.
- Reverted downstream changes to test/test_gemm.c.
- Various whitespace/comment changes.
Details:
- Commented out redundant setting of LIBBLIS_LINK within all driver-
level Makefiles. This variable is already set within common.mk, and
so the only time it should be overridden is if the user wants to link
to a different copy of libblis.
- Very minor changes to build/gen-make-frags/gen-make-frag.sh.
- Whitespace and inconsequential quoting change to configure.
- Moved top-level 'windows' directory into a new 'attic' directory.
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
itself is being built. This macro will not be defined when, for
example, the testsuite or example code compiles code local to those
applications. This was done in part by defining a new cflags function
get-user-cflags-for(), which is now the designated function for
application Makefiles if they wish to inherit a basic set of CFLAGS
from BLIS. (The compiler flags returned are identical to that of
get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
instead of get-frame-cflags-for().
Details:
- Updated the build system so that "lesser" Makefiles, such as those in
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:
make BLIS_INSTALL_PATH=/install/prefix
though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.
Details:
- Added logic to common.mk that will choose a BLIS library against which
to link (LIBBLIS_LINK). The default choice is the static (.a) library;
the shared (.so) library is chosen only if the shared library build was
enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
pre-chosen library against which to link. (Previously, these drivers
unconditionally linked against the static library and would have
failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
so that variables relating to the libblis.[a|so] files, including
paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
and in configure.
- A few other cleanups in the top-level Makefile.
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
Special thanks to AMD for their contributions to-date, especially with
regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
the extra cost of transposing the microtile in registers, this is
much faster than using the general storage case when the underlying
matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
- only one vzeroupper instruction is called prior to returning
- movapd/movupd are used in leiu of movaps/movups for double-real
microkernels. (Note that single-real microkernels still use
movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
comments.
- Updated LICENSE file to explicitly mention that parts are copyright
UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.
Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
sum of the squares via dotv(). If there is a floating-point exception
(FE_OVERFLOW), then the previous (numerically conservative) code is
used; otherwise, the result of dotv() is square-rooted and stored as
the result. This new implementation is only enabled when FE_OVERFLOW
is #defined. If the macro is not #defined, then the previous
implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
Details:
- Fixed a makefile error encountered when building the testsuite directly
in its directory (as opposed to indirectly via 'make test'). The fix
involves introducing a new variable, BUILD_PATH, alongside the existing
DIST_PATH variable. By default, BUILD_PATH is set to the current
directory, and is overridden by other Makefiles used by, for example,
the testsuite and standalone test drivers in testsuite or test,
respectively.
- Some files/directories in common.mk were redefined in terms of
BUILD_DIR, such as the locations of config.mk file and the intermediate
include directory.
Details:
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
sub-directories.
- Factored out much of the top-level Makefile into common.mk. A Makefile
needs only set DIST_PATH to the relative path to the top level of the
BLIS source distribution before including common.mk in order to acquire
all of the definitions typically needed in a Makefile that tests BLIS.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
Details:
- Reverted some changes that were unintentionally included in the
previous commit (9526ce98). Thanks to Tony Kelman for pointing
this out. (Note: a few select changes were not reverted.)
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
for x86_64/core2 were calling the wrong reference code (l instead
of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.