Adds `--enable-rpath/--disable-rpath` (default disabled) to use an install_name starting with @rpath/. Otherwise, the install_name is set to the absolute path of the installed library, which was the previous behavior.
Details:
- Reworked support for ARM hardware detection in bli_cpuid.c to parse
the result of a CPUID-like instruction.
- Added a64fx support to bli_gks.c.
- #include arm64 and arm32 family headers from bli_arch_config.h.
- Fix the ordering of the "armsve" and "a64fx" strings in the
config_name string array in bli_arch.c. The ordering did not match
the ordering of the corresponding arch_t values in bli_type_defs.h,
as it should have all along.
- Added clang support to make_defs.mk in arm64, cortexa53, cortexa57
subconfigs.
- Updated arm64 and arm32 families in config_registry.
- Updated docs/HardwareSupport.md to reflect added ARM support.
- Thanks to Dave Love, RuQing Xu, and Devin Matthews for their
contributions in this PR (#344).
- RPATH entries (and DYLD_LIBRARY_PATH) do nothing on macOS unless the install_name of the library starts with @rpath/. While the install_name can be set to the absolute install path, this makes the installation non-relocatable. When using @rpath in the install_name, install paths within the normal DYLD_LIBRARY_PATH work with no changes on the user side, but for install paths off the beaten track, users must specify an RPATH entry when linking (or modify DYLD_LIBRARY_PATH at runtime). Perhaps this could be made into a configure-time option.
- Having relocatable testsuite binaries is not necessarily a priority, but it is easy to do with @executable_path (macOS) or $ORIGIN (Linux/BSD).
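The config_name ordering fix noted above (the "armsve" and "a64fx" entries in bli_arch.c) is the kind of mismatch that can be guarded against at compile time. The following is a minimal sketch of one such guard, not what bli_arch.c actually does: demo_arch_t, demo_config_name, and both helper functions are hypothetical names, and the three-entry enum is a stand-in for the real arch_t.

```c
#include <assert.h>   /* static_assert (C11) */
#include <string.h>   /* strcmp */

/* Hypothetical, trimmed-down stand-in for arch_t; not the real enum. */
typedef enum
{
    DEMO_ARCH_ARMSVE = 0,
    DEMO_ARCH_A64FX,
    DEMO_ARCH_GENERIC,
    DEMO_NUM_ARCHS        /* count sentinel; must stay last */
} demo_arch_t;

/* Designated initializers bind each string to its enum value, so the
   order of the rows below no longer matters. */
static const char* demo_config_name[ DEMO_NUM_ARCHS ] =
{
    [ DEMO_ARCH_A64FX   ] = "a64fx",
    [ DEMO_ARCH_ARMSVE  ] = "armsve",
    [ DEMO_ARCH_GENERIC ] = "generic",
};

/* Fails to compile when an arch is added without revisiting the table. */
static_assert( DEMO_NUM_ARCHS == 3, "update demo_config_name" );

/* Map an id to its name; returns "unknown" for out-of-range ids. */
const char* demo_arch_string( int id )
{
    if ( id < 0 || id >= DEMO_NUM_ARCHS ) return "unknown";
    return demo_config_name[ id ];
}

/* Map a name back to its id; returns -1 if no entry matches. */
int demo_arch_id( const char* name )
{
    for ( int id = 0; id < DEMO_NUM_ARCHS; ++id )
        if ( strcmp( demo_config_name[ id ], name ) == 0 ) return id;
    return -1;
}
```

With this layout, swapping two rows of the table (the original bug) has no effect on the id-to-string mapping.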
Details:
- Updated FAQ.md to include two new questions, reordered an existing
question, and also removed an outdated and redundant question about
BLIS vs. AMD BLIS.
- Updated Sandboxes.md to use 'gemmlike' as its main example, along with
other smaller details.
- Added ARM as a funder to README.md.
Details:
- Modified .travis.yml so that only commits to 'master', 'dev', and
'amd' branches get built by Travis CI. Thanks to Devin Matthews for
helping to track down the syntax for this change.
Details:
- Re-enabled the changes made in fb93d24.
- Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
all of which needed the definition (in addition to config_detect.c) in
order for the configure-time hardware detection binary to be compiled
properly. Thanks to Minh Quan Ho for helping identify these additional
files as needing to be updated.
- Added additional comments to all four source files, most notably to
prompt the reader to remember to update all of the files when updating
any of the files. Also made the cpp code in each of the files as
consistent/similar as possible.
- Refer to issue #532 and PR #546 for more history.
Details:
- Re-enable the changes originally made in 8e0c425 but quickly reverted
in 2be78fc.
- Moved the #include of bli_config.h so that it occurs before the
#include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
time it is needed in bli_system.h. This change should have been
in the original 8e0c425, but was accidentally omitted. Thanks to Minh
Quan Ho for catching this.
- Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
cpp conditional branch executes in bli_system.h when compiling the
hardware detection binary. The changes made in 8e0c425 were an attempt
to support the definition of BLIS_OS_NONE when configuring with
--disable-system (in issue #532). That commit failed because, aside
from the required but omitted header reordering (second bullet above),
AppVeyor was unable to compile the hardware detection binary as a
result of missing Windows headers. This commit, which builds on PR
#546, should help fix that issue. Thanks to Minh Quan Ho for his
assistance and patience on this matter.
- There was redundancy between the macro BLIS_MAX_NUM_ERR_MSGS (=200) and
the enum BLIS_ERROR_CODE_MAX (-170), even though both mean the same thing:
the maximum number of error codes/messages.
- The previous compile-time initialization of error messages overlooked that
the 'bli_error_string' array still wastes memory because of its 2D char[][]
declaration. Instead, it should be just an array of pointers to strings in
the .rodata section.
- This commit makes two modifications:
* retire the macros BLIS_MAX_NUM_ERR_MSGS and BLIS_MAX_ERR_MSG_LENGTH everywhere
* switch bli_error_string from char[][] to char *[] to reduce its footprint
from 40KB (200*200) to 1.3KB (170*sizeof(char*)).
(There is no problem using the enum BLIS_ERROR_CODE_MAX at compile time,
since the compiler is smart enough to determine that its value is 170.)
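The footprint arithmetic above can be checked with a small sketch. It assumes a hypothetical three-message table (demo_err_t and demo_error_string are illustrative names, not the real BLIS definitions) and negative error codes, so the table is indexed by the negated code. The two size helpers reproduce the 200*200 and 170*sizeof(char*) figures from the text.

```c
#include <stddef.h>

/* Hypothetical error codes; negative, with a sentinel marking the end. */
typedef enum
{
    DEMO_ERR_UNDEFINED       = -1,
    DEMO_ERR_NULL_POINTER    = -2,
    DEMO_ERR_NOT_IMPLEMENTED = -3,
    DEMO_ERROR_CODE_MAX      = -4   /* sentinel; one past the last code */
} demo_err_t;

/* An array of pointers: each entry is one pointer wide, and the string
   literals themselves live in read-only data. The array length is an
   integer constant expression derived from the enum sentinel. */
static const char* demo_error_string[ -DEMO_ERROR_CODE_MAX ] =
{
    [ -DEMO_ERR_UNDEFINED       ] = "Undefined error.",
    [ -DEMO_ERR_NULL_POINTER    ] = "Null pointer.",
    [ -DEMO_ERR_NOT_IMPLEMENTED ] = "Not yet implemented.",
};

/* Look up a message; codes outside the valid range get a fallback. */
const char* demo_error_string_for_code( int code )
{
    if ( code >= 0 || code <= DEMO_ERROR_CODE_MAX ) return "Unknown error code.";
    return demo_error_string[ -code ];
}

/* Size of the old layout: 200 rows of 200 chars each. */
size_t demo_old_table_bytes( void ) { return sizeof( char[ 200 ][ 200 ] ); }

/* Size of the new layout: 170 pointers. */
size_t demo_new_table_bytes( void ) { return sizeof( char*[ 170 ] ); }
```

On a platform with 8-byte pointers the second helper returns 1360 bytes, which matches the roughly 1.3KB quoted above.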
Details:
- Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
part of the arch_t enum for some time now, and so this change is
mostly about removing any opportunity for confusion for people who
may be reading the code. Thanks to Minh Quan Ho for leading me to
this cleanup.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
variant (bls_l3_packm_var3.c) parallelizes the packing operation over
the k dimension rather than the m or n dimensions. Note that the
gemmlike implementation still uses var1 by default, and use of the new
code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
so that var3 is called instead. Thanks to Jeff Diamond for proposing
this (perhaps NUMA-friendly) solution.
- `ref2` call in `bli_gemmsup_rv_armv8a_asm_d6x8m.c` is commented out.
- `bli_gemmsup_rv_armv8a_asm_d4x8m.c` contains a tail `ref2` call but
it's not called by any upper routine.
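As a rough illustration of the k-dimension parallelization mentioned in the packm var3 entry above, the sketch below splits k iterations into contiguous per-thread ranges and copies only the owned columns. It is a simplified stand-in under stated assumptions, not the contents of bls_l3_packm_var3.c: all function names are hypothetical, the matrix is column-major, and nt must be greater than zero.

```c
#include <stddef.h>

/* First k index owned by thread tid when k iterations are split as
   evenly as possible among nt threads (nt > 0, tid < nt). */
size_t demo_k_start( size_t k, size_t nt, size_t tid )
{
    size_t base = k / nt;
    size_t rem  = k % nt;
    /* The first 'rem' threads each take one extra iteration. */
    return tid * base + ( tid < rem ? tid : rem );
}

/* One past the last k index owned by thread tid. */
size_t demo_k_end( size_t k, size_t nt, size_t tid )
{
    size_t base = k / nt;
    size_t rem  = k % nt;
    return demo_k_start( k, nt, tid ) + base + ( tid < rem ? 1 : 0 );
}

/* Copy this thread's share of the k columns of the m x k column-major
   matrix a (leading dimension lda) into the contiguous buffer p. */
void demo_packm_k_slice( size_t m, size_t k,
                         const double* a, size_t lda,
                         double* p, size_t nt, size_t tid )
{
    size_t k0 = demo_k_start( k, nt, tid );
    size_t k1 = demo_k_end( k, nt, tid );

    for ( size_t j = k0; j < k1; ++j )
        for ( size_t i = 0; i < m; ++i )
            p[ i + j * m ] = a[ i + j * lda ];
}
```

Each thread would call demo_packm_k_slice() with its own tid; because the ranges are disjoint, no two threads write the same element of p.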
Details:
- Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
detecting macro conditionals are considered. This change is to
accommodate a solution to a cross-compilation issue described in
#532.
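The conditional described above might look roughly like the following sketch. The DEMO_ macro names and the particular platform tests are assumptions chosen for illustration and are not copied from bli_system.h. The point is only that the "disable" macro has to be defined before the conditional is evaluated, which is also why the order of the bli_config.h and bli_system.h includes matters.

```c
/* Mimics what configuring with --disable-system would place in the
   generated config header. It must appear before the conditional below;
   remove it to fall through to the OS-detection branches. */
#define DEMO_DISABLE_SYSTEM

#if defined( DEMO_DISABLE_SYSTEM )
  /* No OS services are assumed; skip detection entirely. */
  #define DEMO_OS_NONE
#elif defined( _WIN32 ) || defined( __CYGWIN__ )
  #define DEMO_OS_WINDOWS
#elif defined( __APPLE__ ) || defined( __MACH__ )
  #define DEMO_OS_OSX
#elif defined( __linux__ )
  #define DEMO_OS_LINUX
#else
  #define DEMO_OS_UNKNOWN
#endif

/* Returns 1 when the "no OS" branch was selected, 0 otherwise. */
int demo_os_is_none( void )
{
#ifdef DEMO_OS_NONE
    return 1;
#else
    return 0;
#endif
}

/* Human-readable name of the branch the preprocessor selected. */
const char* demo_os_name( void )
{
#if defined( DEMO_OS_NONE )
    return "none";
#elif defined( DEMO_OS_WINDOWS )
    return "windows";
#elif defined( DEMO_OS_OSX )
    return "osx";
#elif defined( DEMO_OS_LINUX )
    return "linux";
#else
    return "unknown";
#endif
}
```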
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
sandbox. These calls to malloc_intl(), which resided in
bls_l3_decor_pthreads.c, were missing the err_t argument that the
function uses to report errors. Thanks to Jeff Diamond for helping
isolate this issue.
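The calling pattern that the fix above restores, where the allocator reports its status through an extra out-argument, can be sketched as follows. demo_malloc_intl() and demo_err_t are hypothetical stand-ins written for this example; they only mirror the general shape of an allocator that takes an error pointer and are not the BLIS implementation.

```c
#include <stdlib.h>

/* Hypothetical status codes for the sketch. */
typedef enum
{
    DEMO_SUCCESS              = -1,
    DEMO_MALLOC_RETURNED_NULL = -2
} demo_err_t;

/* Allocate 'size' bytes and report the outcome through r_val. */
void* demo_malloc_intl( size_t size, demo_err_t* r_val )
{
    void* p = malloc( size );
    *r_val = ( p == NULL ? DEMO_MALLOC_RETURNED_NULL : DEMO_SUCCESS );
    return p;
}

/* Caller side: always pass the error argument and check it before
   using the pointer. Returns 1 on success, 0 on failure. Intended
   for size > 0, since malloc( 0 ) may legitimately return NULL. */
int demo_alloc_roundtrip( size_t size )
{
    demo_err_t r_val;
    void* p = demo_malloc_intl( size, &r_val );

    if ( r_val != DEMO_SUCCESS ) return 0;

    free( p );
    return 1;
}
```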
Ref cannot handle panel strides (packed cases) and thus cannot be called
from the beginning of `gemmsup` (i.e. it cannot be a dispatch target of
gemmsup for other sizes).
Details:
- Moved miscellaneous language-related definitions, including defs
related to the handling of the 'restrict' keyword, from the top half
of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
#included immediately after "bli_system.h" in blis.h. This change is
an attempt to fix a reported breakage with C++ compilers caused by
the recent introduction of 'restrict' in bli_type_defs.h (which
previously was being included *before* bli_macro_defs.h and its
restrict handling therein). Thanks to Ivan Korostelev for reporting
this issue in #527.
- CREDITS file update.
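The ordering problem behind the 'restrict' fix above can be illustrated with a short sketch. It assumes a hypothetical DEMO_RESTRICT macro rather than the actual definitions in bli_lang_defs.h: the language-dependent definition has to be seen before the first declaration that uses the qualifier, otherwise a C++ compiler meets a keyword it does not know.

```c
#include <stddef.h>

/* Language handling comes first, before any declaration that uses it. */
#ifdef __cplusplus
  /* C++ has no 'restrict' keyword; expand to nothing. */
  #define DEMO_RESTRICT
#else
  /* C99 and later provide 'restrict' directly. */
  #define DEMO_RESTRICT restrict
#endif

/* Declarations that rely on the macro must come after the block above.
   The qualifier tells the compiler that x and y are not modified
   through any other pointer during the call. */
double demo_dot( size_t n,
                 const double* DEMO_RESTRICT x,
                 const double* DEMO_RESTRICT y )
{
    double sum = 0.0;

    for ( size_t i = 0; i < n; ++i )
        sum += x[ i ] * y[ i ];

    return sum;
}
```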