* commit '5013a6cb':
More edits and fixes to docs/FAQ.md.
Fixed newly broken link to CREDITS in FAQ.md.
More minor fixes to FAQ.md and Sandboxes.md.
Updates to FAQ.md, Sandboxes.md, and README.md.
Safelist 'master', 'dev', 'amd' branches.
Re-enable and fix fb93d24.
Reverted fb93d24.
Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
Removed last vestige of #define BLIS_NUM_ARCHS.
Added new packm var3 to 'gemmlike'.
Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
Fix more copy-paste errors in the haswell gemmsup code.
Do a fast test on OSX. [ci skip]
Fix AArch64 tests and consolidate some other tests.
Use C++ cross-compiler for ARM tests.
Attempt to fix cxx-test for OOT builds.
Updated travis-ci.org link in README.md to .com.
Disabled (at least temporarily) commit 8e0c425.
Define BLIS_OS_NONE when using --disable-system.
Updated stale calls to malloc_intl() in gemmlike.
Blacklist clang10/gcc9 and older for 'armsve'.
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible.
Moved lang defs from _macro_def.h to _lang_defs.h.
Minor tweaks to gemmlike sandbox.
Added local _check() code to gemmlike sandbox.
README.md citation updates (e.g. BLIS7 bibtex).
Tweaks to gemmlike to facilitate 3rd party mods.
Whitespace tweaks.
Add row- and column-strides for A/B in obj_ukr_fn_t.
Clean up some warnings that show up on clang/OSX.
Remove schema field on obj_t (redundant) and add new API functions.
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
Disabled sanity check in bli_pool_finalize().
Implement proposed new function pointer fields for obj_t.
AMD-Internal: [CPUPL-2698]
Change-Id: I6fc33351fa824580cf4f25b63f0370383cd9422d
Current BLIS makefile always uses the static library on Linux for
all BLIS test programs. This commit adds the option to use the shared
library instead by specifying e.g.
make checkblis USE_SHARED=yes
Executables are generated in different sub-directories for static
and shared libraries.
AMD-Internal: [CPUPL-4107]
Change-Id: I3ab5d505cfbc5f6ef47aa28fcbb846c52d56c3f2
* commit 'e366665c':
Fixed stale API calls to membrk API in gemmlike.
Fixed bli_init.c compile-time error on OSX clang.
Fixed configure breakage on OSX clang.
Fixed one-time use property of bli_init() (#525).
CREDITS file update.
Added Graviton2 Neoverse N1 performance results.
Remove unnecessary windows/zen2 directory.
Add vzeroupper to Haswell microkernels. (#524)
Fix Win64 AVX512 bug.
Add comment about make checkblas on Windows
CREDITS file update.
Test installation in Travis CI
Add symlink to blis.pc.in for out-of-tree builds
Revert "Always run `make check`."
Always run `make check`.
Fixed configure script bug. Details: - Fixed a kernel list string substitution error by adding the function substitute_words to the configure script. Previously, if the string contained both zen and zen2 and zen needed to be replaced with another string, zen2 was also incorrectly replaced.
Update POWER10.md
Rework POWER10 sandbox
Skip clearing temp microtile in gemmlike sandbox.
Fix asm warning
Sandbox header edits trigger full library rebuild.
Add vhsubpd/vhsubpd.
Fixed bugs in cpackm kernels, gemmlike code.
Armv8A Rename Regs for Safe Darwin Compile
Armv8A Rename Regs for Clang Compile: FP32 Part
Armv8A Rename Regs for Clang Compile: FP64 Part
Asm Flag Mingling for Darwin_Aarch64
Added a new 'gemmlike' sandbox.
Updated Fugaku (a64fx) performance results.
Add explicit compiler check for Windows.
Remove `rm-dupls` function in common.mk.
Travis CI Revert Unnecessary Extras from 91d3636
Adjust TravisCI
Travis Support Arm SVE
Added 512b SVE-based a64fx subconfig + SVE kernels.
Replace bli_dlamch with something less archaic (#498)
Allow clang for ThunderX2 config
AMD-Internal: [CPUPL-2698]
Change-Id: I561ca3959b7049a00cc128dee3617be51ae11bc4
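The configure fix listed above (zen versus zen2 in the kernel list) comes down to replacing whole words rather than substrings. The following is a minimal Python sketch of that idea, not the shell code from configure; only the name substitute_words comes from the log, and the signature and the replacement name "amd" are assumptions made for illustration.

```python
def substitute_words(text, old, new):
    """Replace whole-word occurrences of `old` in a space-separated list.

    Hypothetical Python stand-in for the shell function of the same name.
    """
    return " ".join(new if word == old else word for word in text.split())


kernel_list = "zen zen2 haswell"

# Naive substring replacement also rewrites the "zen" inside "zen2".
print(kernel_list.replace("zen", "amd"))            # amd amd2 haswell

# Word-by-word replacement leaves "zen2" untouched.
print(substitute_words(kernel_list, "zen", "amd"))  # amd zen2 haswell
```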
- Moved *_blis_impl function declaration outside the BLIS_ENABLE_BLAS
guard.
- Changed Makefile to continue to compile bla_ files to get
*_blis_impl interfaces.
- Modify CBLAS headers, bli_macro_defs.h and bli_util_api_wrap.{c,h}
to add BLIS_ENABLE_CBLAS guards.
- Comment out BLIS_ENABLE_BLAS guards in various headers and utility
functions.
- Define BLIS Fortran-style functions lsame_blis_impl and
xerbla_blis_impl. New macros PASTE_LSAME and PASTE_XERBLA are
used in bla_*_check headers and some other places to select
whether to call lsame and xerbla, or the _blis_impl versions.
- Defined various other missing _blis_impl functions.
- In bli_util_api_wrap.c, only define any functions if
BLIS_ENABLE_BLAS is defined, and only define the subroutine
versions of functions like dot, nrm2, etc if BLIS_ENABLE_CBLAS
is defined.
- BLAS layer is needed if CBLAS layer is enabled. Changed header
files build/bli_config.h.in and bli_blas.h, and configure
program to help ensure consistency in generated blis.h header
and configure output.
Undefining BLIS_ENABLE_BLAS_DEFS appears to be broken in UTA BLIS
too, thus BLIS_ENABLE_BLAS_DEFS is currently permanently defined.
AMD-Internal: [CPUPL-3015]
Change-Id: I7c0fe07db85781db46f2c690e174451860b37635
Details:
- Updated Makefile and common.mk so that the targeted configuration's
kernel CFLAGS are applied to source files that are found in a
'kernels' subdirectory within an enabled addon. For now, this
behavior only applies when the 'kernels' directory is at the top
level of the addon directory structure. For example, if there is an
addon named 'foobar', the source code must be located in
addon/foobar/kernels/ in order for it to be compiled with the target
configuration's kernel CFLAGS. Any other source code within
addon/foobar/ will be compiled with general-purpose CFLAGS (the same
ones that were used on all addon code prior to this commit). Thanks
to AMD (esp. Mithun Mohan) for suggesting this change and catching an
intermediate bug in the PR.
- Comment/whitespace updates.
(cherry picked from commit fd885cf98f)
Change-Id: I9a678f78bde90b23a6293ce90377004876f51067
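The path rule described above (kernel CFLAGS only for a top-level 'kernels' directory inside an addon) can be sketched as a small classifier. This is an illustration in Python rather than the actual make logic; the function name cflags_kind and the labels "kernel" and "general" are hypothetical.

```python
from pathlib import PurePosixPath


def cflags_kind(source_path):
    """Return "kernel" for addon/<name>/kernels/..., else "general".

    Hypothetical helper; the real selection happens in Makefile/common.mk.
    """
    parts = PurePosixPath(source_path).parts
    # 'kernels' must be the first directory below addon/<name>/.
    if len(parts) > 3 and parts[0] == "addon" and parts[2] == "kernels":
        return "kernel"
    return "general"


print(cflags_kind("addon/foobar/kernels/k.c"))       # kernel
print(cflags_kind("addon/foobar/util/kernels/k.c"))  # general
```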
- Parametric ReLU (PReLU) is a generalization of leaky ReLU in which the
leakage coefficient is tunable. Support for it is added using the
register-level fusion technique.
- Low precision bench enhancement to check accuracy/performance of
low precision gemm with PReLU.
- Bug fixes in low precision gemm kernels.
AMD-Internal: [CPUPL-2442]
Change-Id: I81336405b185a994297d122b2d868b758ae6dad5
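For reference, a minimal Python sketch of PReLU applied as a fused post-operation. It assumes that "register-level fusion" means the activation is applied to each accumulated value before that value is written back; the function names are hypothetical and this is not the addon's kernel code.

```python
def prelu(x, alpha):
    """Parametric ReLU: pass positives through, scale negatives by alpha."""
    return x if x > 0 else alpha * x


def gemm_row_with_prelu(a_row, b, alpha):
    """Compute one row of A*B and apply PReLU before each result is stored."""
    out = []
    for j in range(len(b[0])):
        acc = 0
        for p, a in enumerate(a_row):
            acc += a * b[p][j]
        # Fused step (assumption): activate while the value is still "in hand".
        out.append(prelu(acc, alpha))
    return out


print(gemm_row_with_prelu([1, -2], [[1, 2], [3, 0]], 0.5))  # [-2.5, 2]
```

Leaky ReLU is the special case in which alpha is a fixed constant instead of a tunable parameter.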
Feature Addition: Added a new variant of low precision GEMM to addon - BFloat16. The kernel takes bf16 type inputs and performs BF16 GEMM operations. The intermediate accumulation and output are in float.
1. Compute kernels will perform computations only if B matrix is reordered in accordance with the usage of AVX-512 BF16 instruction - dpbf16_ps
2. Kernel for packing B matrix is provided
Change-Id: If5d08213068869eff060c9998596d2d2703a6793
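A rough Python sketch of the numeric idea behind the entry above: bf16 inputs, float accumulation. It assumes bf16 is the upper 16 bits of an IEEE-754 float32 and converts by truncation for simplicity (real conversions usually round). It does not model the B reordering required by the AVX-512 BF16 instruction, and Python accumulates in double precision rather than float32. All function names are hypothetical.

```python
import struct


def float_to_bf16_bits(x):
    """Keep the top 16 bits of the float32 encoding (truncation, no rounding)."""
    (bits,) = struct.unpack("<I", struct.pack("<f", x))
    return bits >> 16


def bf16_bits_to_float(b):
    """Widen bf16 back to a float by zero-filling the low 16 bits."""
    (x,) = struct.unpack("<f", struct.pack("<I", (b & 0xFFFF) << 16))
    return x


def bf16_dot(a_bits, b_bits):
    """Dot product of two bf16 vectors, accumulated in a wider float type."""
    acc = 0.0
    for a, b in zip(a_bits, b_bits):
        acc += bf16_bits_to_float(a) * bf16_bits_to_float(b)
    return acc


a = [float_to_bf16_bits(v) for v in (1.0, 2.0)]
b = [float_to_bf16_bits(v) for v in (2.0, 3.0)]
print(bf16_dot(a, b))  # 8.0
```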
- Multi-Threaded int8 GEMM (Input - uint8_t, int8_t, Output - int32_t).
AVX512_vnni based micro-kernel for int8 gemm. Parallelization supported
along m and n dimensions.
- Multi-Threaded B matrix reorder support for sgemm. Reordering B matrix
is packing entire B matrix upfront before sgemm. It allows sgemm to
take advantage of packed B matrix without incurring packing costs during
runtime.
- Makefile updates to addon make rules to compile avx512 code for
selected files in addon folder.
- CPU features query enhancements to check for AVX512_VNNI flag.
- Bench for int8 gemm and sgemm with B matrix reorder. Supports
performance mode for benchmarking and accuracy mode for testing code
correctness.
AMD-Internal: [CPUPL-2102]
Change-Id: I8fb25f5c2fbd97d756f95b623332cb29e3b8d182
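As a plain reference for the data types named above (uint8_t and int8_t inputs, int32_t output), here is a naive Python sketch of the computation. It is not the AVX512_VNNI micro-kernel and does no packing or threading; the function name is hypothetical. Python integers do not overflow, so int32 wraparound is not modelled.

```python
def gemm_u8s8s32(a, b):
    """C = A * B with A in [0, 255], B in [-128, 127], C as wide integers."""
    m, k, n = len(a), len(b), len(b[0])
    c = [[0] * n for _ in range(m)]
    for i in range(m):          # m dimension: one axis that can be split across threads
        for j in range(n):      # n dimension: the other such axis
            acc = 0
            for p in range(k):
                acc += a[i][p] * b[p][j]
            c[i][j] = acc
    return c


print(gemm_u8s8s32([[255, 1]], [[-128], [127]]))  # [[-32513]]
```

The extreme product 255 * -128 already falls outside the int8 and int16 ranges, which is why the accumulator and output use a wider type.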
Details:
- Implemented a new feature called addons, which are similar to
sandboxes except that there is no requirement to define gemm or any
other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
for requesting an addon be included within a BLIS build. configure now
outputs the list of enabled addons into config.mk. It also outputs the
corresponding #include directives for the addons' headers to a new
companion to the bli_config.h header file named bli_addon.h. Because
addons may wish to make use of existing BLIS types within their own
definitions, the addons' headers must be included sometime after that
of bli_config.h (which currently is #included before bli_type_defs.h).
This is why the #include directives needed to go into a new top-level
header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
build with them, and what assumptions their authors should keep in
mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.
Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
- Removed BLIS_CONFIG_EPYC macro
- The code dependent on this macro is handled in
one of the three ways
-- It is updated to work across platforms.
-- Added in architecture/feature specific runtime checks.
-- Duplicated in AMD specific files. Build system is updated to
pick AMD specific files when the library is built for any of the
zen architectures.
AMD-Internal: [CPUPL-1960]
Change-Id: I6f9f8018e41fa48eb43ae4245c9c2c361857f43b
-- Reverted changes made to include lp/ilp info in binary name
This reverts commit c5e6f885f0.
-- Included BLAS int size in 'make showconfig'
-- Renamed amdepyc configuration to amdzen
Change-Id: Ie87ec1c03e105f606aef1eac397ba0d8338906a6
The cpp and testcpp folders exist in the root directory as well as the
vendor directory. Only the folders in the vendor directory are needed.
Removed duplicate directories and updated makefiles to pick the
sources from the vendor folder.
AMD-Internal: [CPUPL-1834]
Change-Id: I178043a09fd746660938b89ecce73c53d6c53409
This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.
-- Ignore aocl dynamic configuration if multithreading is disabled.
AOCL Dynamic will also be disabled in this case.
-- Added following configuration settings in showconfig output
1. Complex return scheme
2. TRSM preinversion status
3. AOCL dynamic active status
AOCL-Internal: [CPUPL-1565]
Change-Id: Id5a31b233fc08dcd871de4a693aab0b2a5d9f1c4
Details:
- Adjusted the top-level Makefile so that any change to a sandbox header
file will result in blis.h being regenerated along with a full
recompilation of the library. Previously, sandbox files were omitted
from the list of header files that, when touched, could trigger a full
rebuild. Why was it like that previously? Because originally we only
envisioned using sandboxes to *replace* gemm, not augment the library
with new functionality. When replacing gemm, blis.h does not need to
contain any local sandbox definitions in order for the user to be able
to (indirectly) use that sandbox. But if you are adding functions to
the library, those functions need to be prototyped so the compiler
can perform type checking against the user's invocation of those new
functions. Thanks to Jeff Diamond for helping us discover this
deficiency in the build system.
- Updated Makefile to include DTL files in library build
- Updated Makefile to include cpp header file installation
- Updated test/makefile to include extra API added by AMD team.
AMD-Internal: [CPUPL-1559]
Change-Id: I249c6935d5ff5fb645f9deec7e0218575484be13
Fixed merge conflicts that arose while downstreaming BLIS code from master to the milan-3.1 branch.
Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (i.e., the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)
Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.
Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.
Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger the conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)
Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)
Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.
Minor code consolidation in all level-3 _front() functions.
Reorganized Windows cpp branch of bli_pthreads.c.
Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.
Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.
Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.
AMD-Internal: [CPUPL-1523]
Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
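The automatic thread-count reduction described above can be sketched as follows. The entry does not say by how much the count is reduced; this sketch assumes it drops by one, which always yields a composite number for primes above 2. The function names and the allow_primes flag (standing in for the suppression macro) are hypothetical.

```python
BLIS_NT_MAX_PRIME = 11  # default threshold named in the entry above


def is_prime(n):
    """Trial division; adequate for realistic thread counts."""
    if n < 2:
        return False
    i = 2
    while i * i <= n:
        if n % i == 0:
            return False
        i += 1
    return True


def adjust_num_threads(nt, allow_primes=False):
    """Assumed policy: drop one thread when nt is a prime above the threshold."""
    if not allow_primes and nt > BLIS_NT_MAX_PRIME and is_prime(nt):
        return nt - 1
    return nt


for nt in (7, 11, 12, 13, 17):
    print(nt, "->", adjust_num_threads(nt))
# 7 -> 7, 11 -> 11, 12 -> 12, 13 -> 12, 17 -> 16
```

A prime thread count can only be factored as 1 x nt, so an automatic two-dimensional factorization has no balanced option; a nearby composite count does.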
Merged contributions from AMD's AOCL BLIS (#448).
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
only the lower or upper triangle of a square matrix C. For now, only
the conventional/large code path will be supported (in vanilla BLIS).
This was accomplished by leveraging the existing variant logic for
herk. However, some of the infrastructure to support a gemmtsup is
included in this commit, including
- A bli_gemmtsup() front-end, similar to bli_gemmsup().
- A bli_gemmtsup_ref() reference handler function.
- A bli_gemmtsup_int() variant chooser function (with variant calls
commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatibility layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
wrapper to a set of polymorphic CBLAS-like function wrappers defined
in another header (cblas.hh). These two headers are installed if
running the 'install' target with INSTALL_HH set to 'yes'. (Also
added a set of unit tests that exercise blis.hh, although they are
disabled for now because they aren't compatible with out-of-tree
builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
various minor updates to dotv and scalv kernels. Also added various
sup kernels contributed by AMD to kernels/zen/3. However, these
kernels are (for now) not yet used, in part because they caused
AppVeyor clang failures, and also because I have not found time to
review and vet them.
- Output the python found during configure into the definition of PYTHON
in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
to bli_gemm_front.c.
- Implemented explicit beta = 0 handling for the sgemm ukernel in
bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
bug surfaced because the gemmt module verifies its computation using
gemm with its beta parameter set to zero, which, on a cortexa15 system
caused the gemm kernel code to unconditionally multiply the
uninitialized C data by beta. The C matrix likely contained
non-numeric values such as NaN, which then would have resulted in a
false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
in bli_l3_blocksize.c, was inadvertently being defined in terms of
helper functions meant for trmm. This bug was probably harmless since
the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
kernels/zen/3/bli_gemm_small.c since those macros are not used in
vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
Windows systems.
- Various whitespace changes.
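Two points from the list above lend themselves to a short illustration: gemmt updates only one triangle of a square C, and beta = 0 must be handled by overwriting C rather than multiplying it, since 0 * NaN is still NaN. The following naive Python sketch shows both under those assumptions; it is not BLIS code and the function name is hypothetical.

```python
def gemmt_lower(alpha, a, b, beta, c):
    """Update only the lower triangle of square C with alpha*A*B + beta*C."""
    n, k = len(c), len(b)
    for i in range(n):
        for j in range(i + 1):  # j <= i: lower triangle, diagonal included
            ab = sum(a[i][p] * b[p][j] for p in range(k))
            if beta == 0:
                # Overwrite: never read C, which may hold NaN or garbage.
                c[i][j] = alpha * ab
            else:
                c[i][j] = alpha * ab + beta * c[i][j]
    return c


nan = float("nan")
c = [[nan, 7.0], [nan, nan]]          # uninitialized-looking C
a = [[1, 2], [3, 4]]
identity = [[1, 0], [0, 1]]
print(gemmt_lower(1.0, a, identity, 0, c))  # [[1.0, 7.0], [3.0, 4.0]]
```

The strictly upper entry (7.0) is left untouched, and the NaN values in the lower triangle do not leak into the result.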
Details:
- Modified the 0644 octal code used during installation of shared
libraries to 0755 (for Linux/OSX only). Thanks to Adam J. Stewart
for reporting this issue via #343.
- CREDITS file update.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
includedir, and sharedir (and also added an --exec-prefix option).
The defaults to these variables are set as follows:
prefix: /usr/local
exec_prefix: ${prefix}
libdir: ${exec_prefix}/lib
includedir: ${prefix}/include
sharedir: ${prefix}/share
The key change, aside from the addition of exec_prefix and its use to
define the default to libdir, is that the variables are substituted
into config.mk with quoting that delays evaluation, meaning the
substituted values may contain unevaluated references to other
variables (namely, ${prefix} and ${exec_prefix}). This more closely
follows GNU conventions, including those used by GNU autoconf, and
also allows make to override any one of the variables *after*
configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
top-level Makefile to use $(wildcard ...) instead of 'find'. This
was motivated by the new way of handling prefix and friends, which
leads to the 'find' command being run on /usr/local (by default),
which can take a while and almost never yields any benefit (since the
user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
non-verbose output) since those statements often end with file or
directory paths, which get confusing to read when punctuated by a
period.
- Trivial change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
over the years.)
- In configure script, changed the default state of threading_model
variable from 'no' to 'off' to match that of debug_type, where there
are similarly more than two valid states. ('no' is still accepted
if given via the --enable-debug= option, though it will be
standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
32812ff.
- CREDITS file update.
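The delayed-evaluation behaviour of prefix and friends described above can be mimicked in a few lines of Python. This is only a model of the idea (values stored as unevaluated strings and resolved at the point of use), not how make or configure implement it; the expand helper and the /opt/blis path are hypothetical.

```python
import re

# Defaults as listed in the entry above, stored unevaluated.
DEFAULTS = {
    "prefix": "/usr/local",
    "exec_prefix": "${prefix}",
    "libdir": "${exec_prefix}/lib",
    "includedir": "${prefix}/include",
    "sharedir": "${prefix}/share",
}


def expand(name, variables):
    """Resolve ${var} references at the time of use, not at definition."""
    value = variables[name]
    return re.sub(r"\$\{(\w+)\}", lambda m: expand(m.group(1), variables), value)


# Overriding prefix after "configure" still moves libdir, because libdir
# was stored as the unevaluated string "${exec_prefix}/lib".
overridden = dict(DEFAULTS, prefix="/opt/blis")
print(expand("libdir", DEFAULTS))    # /usr/local/lib
print(expand("libdir", overridden))  # /opt/blis/lib
```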
Details:
- Changed all occurrences of
micro-kernel -> microkernel
macro-kernel -> macrokernel
micro-panel -> micropanel
in all markdown documents in 'docs' directory. This change is being
made since we've reached the point in adoption and acceptance of
BLIS's insights where words such as "microkernel" are no longer new,
and therefore now merit being unhyphenated.
- Updated "Implementation Notes" sections of KernelsHowTo.md, which
still contained references to nonexistent cpp macros such as
BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
'make check' and 'make check-fast' when running from the local
testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
the TESTSUITE_WRAPPER variable, which at first glance appears to serve
no purpose.
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Modified .travis.yml to automatically employ the simulation of
application-level threading within the testsuite, with supporting
changes to common.mk, the top-level Makefile, and
travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.salt' suffix (similar to those with the '.fast' suffix) for
testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-salt' and 'checkblis-salt'.
Details:
- Modified .travis.yml to automatically test the mixed-datatype support
of the gemm operation, with supporting changes to common.mk, the
top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
'.mixed' suffix (similar to those with the '.fast' suffix) for testing
mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
'testblis-md' and 'checkblis-md'.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
unconditionally, but differently for Windows and non-Windows, but
then disabled the definition of bli_setenv() entirely since BLIS
no longer needs to set environment variables. Updated bli_winsys.h
accordingly, and call bli_sleep() from within testsuite instead of
sleep() directly.
- Use
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
instead of
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
when guarding against local definition of pthread barrier in
testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
should always be set to 200809L when barriers are supported, though I
won't be surprised if we encounter a case in the future where it is
set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
top-level Makefile. These are no longer needed because we no longer
output libraries with the version and configuration name as
substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
* Enable shared
* Enable rdp
* Add support for dll
* Use libblis-symbols.def
* Fix building dlls
* Fix libblis-symbols.def
* Fix soname
* Fix Makefile error
* Fix install target
* Fix missing symbols
* Add BLIS_MINUS_TWO
* Add path to dll
* Fix OSX soname
* Add declspec for dll
* Add -DBLIS_BUILD_DLL
* Replace @enable_shared@ in config
* switch to auto for now
* blis_ -> bli_
* Remove BLIS_BUILD_DLL in make check
* change auto->haswell
* enable_shared_01
* Add wno-macro-redefined
* print out.cblat3
* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY
* Use V=1
* Remove fpic for windows
* Remember LIBPTHREAD
* Remove libm for windows
* Remember AR
* Fix remembering libpthread
* Add Wno-maybe-uninitialized in only gcc
* Don't do blastest for shared for now
* Fix install target
And remove unnecessary change
* test auto and x86_64
* Fix install target again
* Use IS_WIN variable
* Remove leading dot from LIBBLIS_SO_MAJ_EXT
* Make is_win yes/no
* Add comments for windows builds
* Change if else blocks location
Details:
- Use get-user-cflags-for() to generate cflags when compiling BLAS test
drivers and BLIS testsuite from top-level Makefile. Meant to include
these changes in previous commit (4b5437e). Thanks to Isuru Fernando
for pointing out this oversight.
Details:
- Updated the build system so that "lesser" Makefiles, such as those
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:
make BLIS_INSTALL_PATH=/install/prefix
though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.
Details:
- Changed the cleanmk target to delete makefile fragments from their new
home in obj/$(CONFIG_NAME). The old definition worked only because of
a typo (REFERKN_PATH instead of REFKERN_PATH), and only in the
non-verbose (V != 1) case.
Details:
- Redefined the 'test' make target in the top-level Makefile so that the
final result ("everything passed" or "at least one failure") is echoed
to stdout. Note that 'check' is unchanged, and thus is now effectively
a fast version of 'test'.
- Updated docs/BuildSystem.md to reflect the above change.
Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH).
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added #include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.
Details:
- Added support for --enable-arg-max-hack to configure, which will
change the behavior of make when building BLIS so that rather than
invoke the archiver/linker with all of the object files as command
line arguments, those object files are echoed to a temporary file
and then the archiver/linker is fed that temporary file via the @
notation. An example of this can be found in the GNU make docs at
https://www.gnu.org/software/make/manual/make.html#File-Function
- Thanks to Isuru Fernando for prompting this feature.
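The @ notation described above can be sketched in Python as follows: the object file names go into a temporary file, and the command line carries only a reference to that file. The helper name archiver_command, the "cr" archiver mode, and the ".rsp" suffix are illustrative assumptions; the actual mechanism lives in the BLIS makefiles.

```python
import os
import tempfile


def archiver_command(archiver, library, objects):
    """Write object paths to a temporary file and refer to it as @file."""
    fd, path = tempfile.mkstemp(suffix=".rsp", text=True)
    with os.fdopen(fd, "w") as f:
        f.write("\n".join(objects) + "\n")
    # The command length no longer grows with the number of objects.
    return [archiver, "cr", library, "@" + path], path


objects = ["obj/file%d.o" % i for i in range(10000)]
cmd, listfile = archiver_command("ar", "libblis.a", objects)
print(len(cmd))       # 4
os.remove(listfile)   # clean up the temporary file
```

Passing ten thousand paths directly could exceed the operating system's argument-length limit, whereas the four-element command above stays the same size.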
Details:
- Added a c99 sandbox (in sandbox/c99) to serve as a starting point for
others looking to experiment with alternative implementations of gemm
in BLIS. Note that this sandbox implementation is a first draft and
will be refined over time.
- Minor updates to Makefile and common.mk to restrict what source files
get recompiled when sandbox files are touched.
- Added an initial draft of a README.md in sandbox/c99.
Details:
- configure:
- add support for --enable-sandbox=NAME to configure script, where NAME
is a subdirectory of a new 'sandbox' directory that contains an
alternative implementation of gemm. (For now, only implementations of
gemm may be provided via a sandbox.);
- add support for C++ compiler. C++ compilers are handled in a manner
similar to that of C compilers, in that a default search order is
used, and that CXX is searched for first, if the variable is set. In
practice, the C++ compiler that is selected should correspond to the
selected C compiler. (Example: If gcc is selected for C, g++ should
be selected for C++.) The result of the search is output to config.mk
via build/config.mk.in. NOTE: The use of C++ in BLIS is still
hypothetical, but may eventually move to being experimental. This
support was intended only for use of C++ within a gemm sandbox.
- build/config.mk.in:
- define SANDBOX variable containing sandbox subdirectory name.
- build/bli_config.in:
- define either of the BLIS_ENABLE_SANDBOX or BLIS_DISABLE_SANDBOX
macros in bli_config.h.
- common.mk:
- include makefile fragments that were propagated into the specified
sandbox subdirectory;
- generate different CFLAGS for sandboxes, as well as a separate
CXXFLAGS variable for sandboxes when C++ source files are compiled;
- isolate into a single location lists of file suffixes for various
purposes.
- reorganize/clean up code related to identifying header files and
paths.
- Makefile:
- generate object filepaths for and compile source code files found in
sandbox sub-directory;
- remove makefile fragments placed in sandbox sub-directory (cleanmk);
- various other cleanups.
- Added .cc, .cpp, and .cxx to list of suffixes of files to recognize in
makefile fragments (via build/gen-make-frags/suffix_list).
- Updated blis.h to conditionally #include bli_sandbox.h (via a new file,
bli_sbox.h), which each sandbox is assumed to use for any type
definitions and function prototypes it wishes to export out to blis.h.
- Conditionally disable bli_gemmnat() implementation in frame/3 when
BLIS_ENABLE_SANDBOX is defined.
* Fix detection of systems other than Linux and macOS
The way the logic is currently laid out, any platform that isn't Linux
gets assigned the .dylib shared library extension and the macOS-specific
compiler flags. This reverses the logic to check for macOS first, and
have the fallback use the Linux definitions, which apply to most other
systems as well.
* Use SHLIB_EXT instead of SO_SUF
The former is more standard, as jakirkham pointed out in a comment.
Details:
- Use the .dylib shared library suffix on OS X (instead of .so in Linux).
- Link with the -dynamiclib and -install_name options on OS X (instead of
-shared and -soname in Linux).
- Determine operating system (e.g. Linux, Darwin) during configure and
substitute into config.mk.in rather than run 'uname -s' during make.
- Echo operating system during configure.
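The reordered platform check described in the entries above (test for macOS first, let everything else fall back to the Linux definitions) can be sketched as below. The function name and the dictionary layout are hypothetical; the suffixes and option names are the ones mentioned in the log, and the OS strings are those reported by 'uname -s'.

```python
def shared_lib_settings(os_name):
    """Check for macOS first; every other system falls back to Linux-style."""
    if os_name == "Darwin":
        return {"SHLIB_EXT": "dylib", "flags": ["-dynamiclib", "-install_name"]}
    # Linux definitions also apply to most other systems.
    return {"SHLIB_EXT": "so", "flags": ["-shared", "-soname"]}


for name in ("Linux", "Darwin", "FreeBSD"):
    print(name, shared_lib_settings(name)["SHLIB_EXT"])
# Linux so, Darwin dylib, FreeBSD so
```

With the earlier ordering (Linux checked first, macOS as the fallback), a system such as FreeBSD would have received the .dylib suffix.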
Details:
- Tweaked configure so that the shared library is generated by default.
- Updated --help text and configure's feedback messages reporting the
status of the static/shared builds.
- Changed the order of build product installation so that headers are
installed last, after libraries and symlinks.