Details:
- Fixed various typecasts in
frame/base/bli_cntx.h
frame/base/bli_mbool.h
frame/base/bli_rntm.h
frame/include/bli_misc_macro_defs.h
frame/include/bli_obj_macro_defs.h
frame/include/bli_param_macro_defs.h
that were missing or being done improperly/incompletely. For example,
many return values were being typecast as
(bool_t)x && y
rather than
(bool_t)(x && y)
Thankfully, none of these deficiencies had manifested as actual bugs
at the time of this commit.
- Changed the return type of bli_env_get_var() from dim_t to gint_t.
This reflects the fact that bli_env_get_var() needs to be able to
return a signed integer, and even though dim_t is currently defined
as a signed integer, it does not intuitively appear to necessarily be
signed by inspection (i.e., an integer named "dim_t" for matrix
"dimension"). Also, updated use of bli_env_get_var() within
bli_pack.c to reflect the changed return type.
- Redefined type of thrcomm_t.barrier_sense field from bool_t to gint_t
and added comments to the bli_thrcomm_*.h files that will explain a
planned replacement of bool_t with C99's bool type.
- Note: These changes are being made to facilitate the substitution of
'bool' for 'bool_t', which will eliminate the namespace conflict with
arm_sve.h as reported in issue #420. This commit implements the first
phase of that transition. Thanks to RuQing Xu for reporting this
issue.
- CREDITS file update.
Details:
- Updated all static function definitions to use the cpp macro
BLIS_INLINE instead of the static keyword. This allows blis.h to
use a different keyword (inline) to define these functions when
compiling with C++, which might otherwise trigger "defined but
not used" warning messages. Thanks to Giorgos Margaritis for
reporting this issue and Devin Matthews for suggesting the fix.
- Updated the following files, which are used by configure's
hardware auto-detection facility, to unconditionally #define
BLIS_INLINE to the static keyword (since we know BLIS will be
compiled with C, not C++):
build/detect/config/config_detect.c
frame/base/bli_arch.c
frame/base/bli_cpuid.c
- CREDITS file update.
Details:
- Relocated the #include "cpuid.h" directive from bli_cpuid.h to
bli_cpuid.c. This was done because cpuid.h (which is pulled into
the post-build blis.h developer header) doesn't protect its
definitions with a preprocessor guard of the form:
#ifndef FOOBAR_H
#define FOOBAR_H
// header contents.
#endif
and as a result, applications (previously) could not #include both
blis.h and cpuid.h (since the former was already including the
latter). Thanks to Bhaskar Nallani for raising this issue via #393
and to Devin Matthews for suggesting this fix.
- CREDITS file update.
* OSX: specify the full path to the location of libblis.dylib so that it can be found at runtime
Before this change:
Appication gives runtime error [when linked with blis]
dyld: Library not loaded: libblis.3.dylib
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
After this change:
balay@kpro lib % otool -L libblis.dylib
libblis.dylib:
/Users/balay/petsc/arch-darwin-c-debug/lib/libblis.3.dylib (compatibility version 0.0.0, current version 0.0.0)
/usr/lib/libSystem.B.dylib (compatibility version 1.0.0, current version 1281.0.0)
* INSTALL_LIBDIR -> libdir as INSTALL_LIBDIR has DESTDIR
Co-Authored-By: Jed Brown <jed@jedbrown.org>
* CREDITS file update.
Co-authored-by: Jed Brown <jed@jedbrown.org>
Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
Details:
- Disabled use of optimized amaxv kernels, which use vector intrinsics
for both 's' and 'd' datatypes. We disable these kernels because the
current implementations fail to observe a semantic property of the
BLAS i?amax_() subroutine, which is to return the index of the
*first* element containing the maximum absolute value (that is, the
first element if there exist two or more elements that contain the
same value). With the optimized kernels disabled, the affected
subconfigurations (haswell, zen, zen2, knl, and skx) will use the
default reference implementations. Thanks to Mat Cross for reporting
this issue via #380.
- CREDITS file update.
Details:
- Fixed an error that manifests only when using C++ (specifically,
modern versions of g++) to compile drivers in 'test' (and likely most
other application code that #includes blis.h. Thanks to Ajay Panyala
for reporting this issue (#374).
Details:
- Fixed a bug in sdsdot_sub() that redundantly added the "alpha" scalar,
named 'sb'. This value was already being added by the underlying
sdsdot_() function. Thus, we no longer add 'sb' within sdsdot_sub().
Thanks to Simon Lukas Märtens for reporting this bug via #367.
- Fixed a second bug in order of typecasting intermediate products in
sdsdot_(). Previously, the "alpha" scalar was being added after the
"outer" typecast to float. However, the operation is supposed to first
add the dot product to the (promoted) scalar and THEN downcast the sum
to float. Thanks to Devin Matthews for catching this bug.
Details:
- Added BLIS_EXPORT_BLIS annotation to function prototypes for
bli_thrcomm_bcast()
bli_thrcomm_barrier()
bli_thread_range_sub()
so that these functions are exported to shared libraries by default.
This (hopefully) fixes issue #366. Thanks to Kyungmin Lee for
reporting this bug.
- CREDITS file update.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
the nuances of setting multithreading parameters via the manual and
automatic ways simultaneously, and also about how these parameters
behave when multithreading is disabled at configure-time. These
changes are an attempt to address the issues that arose in issue #362.
Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
Details:
- Modified the 0644 octal code used during installation of shared
libraries to 0755 (for Linux/OSX only). Thanks to Adam J. Stewart
for reporting this issue via #343.
- CREDITS file update.
* Always use sqsumv to compute normfv on MacOS.
* Unconditionally disable the "dot trick" in normfv.
* Added explanatory comment to normfv definition.
Details:
- Added a comment above the unconditional disabling of the dotv-based
implementation to normfv. Thanks to Roman Yurchak, Devin Matthews,
and Isuru Fernando in helping with this improvement.
- CREDITS file update.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
includedir, and sharedir (and also added an --exec-prefix option).
The defaults to these variables are set as follows:
prefix: /usr/local
exec_prefix: ${prefix}
libdir: ${exec_prefix}/lib
includedir: ${prefix}/include
sharedir: ${prefix}/share
The key change, aside from the addition of exec_prefix and its use to
define the default to libdir, is that the variables are substituted
into config.mk with quoting that delays evaluation, meaning the
substituted values may contain unevaluated references to other
variables (namely, ${prefix} and ${exec_prefix}). This more closely
follows GNU conventions, including those used by GNU autoconf, and
also allows make to override any one of the variables *after*
configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
top-level Makefile to use $(wildcard ...) instead of 'find'. This
was motivated by the new way of handling prefix and friends, which
leads to the 'find' command being run on /usr/local (by default),
which can take a while almost never yielding any benefit (since the
user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
non-verbose output) since those statements often end with file or
directory paths, which get confusing to read when puctuated by a
period.
- Trival change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
over the years.)
- In configure script, changed the default state of threading_model
variable from 'no' to 'off' to match that of debug_type, where there
are similarly more than two valid states. ('no' is still accepted
if given via the --enable-debug= option, though it will be
standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
32812ff.
- CREDITS file update.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
docs/graphs to report on Eigen implementations, where applicable.
Specifically, Eigen implements all level-3 operations sequentially,
however, of those operations it only provides multithreaded gemm.
Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
omitted. Thanks to Sameer Agarwal for his help configuring and
using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
Details:
- Added a new markdown document, docs/Performance.md, which reports
performance of a representative set of level-3 operations across a
variety of hardware architectures, comparing BLIS to OpenBLAS and a
vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
Details:
- After merging PR #303, at Isuru's request, I removed the use of
BLIS_EXPORT_BLIS from all function prototypes *except* those that we
potentially wish to be exported in shared/dynamic libraries. In other
words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
functions that can be considered private or for internal use only.
This is likely the last big modification along the path towards
implementing the functionality spelled out in issue #248. Thanks
again to Isuru Fernando for his initial efforts of sprinkling the
export macros throughout BLIS, which made removing them where
necessary relatively painless. Also, I'd like to thank Tony Kelman,
Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
participating in the initial discussion in issue #37 that was later
summarized and restated in issue #248.
- CREDITS file update.
Details:
- Added "What's New" and "What People Are Saying About BLIS" sections to
README.md.
- Added missing github handles to various individuals' entries in the
CREDITS file.
Details:
- Fixed an issue with specifying threading globally at runtime via
bli_thread_set_num_threads() (the automatic way) or via
bli_thread_set_ways() (the manual way), with bli_thread_init_rntm()
also affected. These functions were not calling bli_init_once() prior
to acting, and therefore their effects on the global rntm_t structure
were being wiped out by the eventual call to bli_init_once(), by some
other BLIS function. Thanks to Ali Emre Gülcü for reporting the
behavior associated with this bug.
- Added additional content to docs/Multithreading.md covering topics of
choosing between OpenMP and pthreads, and specifying affinity via
OpenMP.
- CREDITS file update.
Details:
- Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
and bli_thread_set_ways(). These wrappers are defined in
frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
suggesting these new interfaces.
- Added missing prototype for bli_thread_set_ways() in bli_thread.h and
removed prototypes for non-existent functions bli_thread_set_*_nt().
- CREDITS file update.
Details:
- Fixed a linker error that occurred when attempting to compile and link
the testsuite and/or BLAS test drivers after having configured BLIS to
only generate a shared library (no static library). The chosen
solution involved
(1) adding the local library path, $(BASE_LIB_PATH), to the search
paths for the shared library via the link option
-Wl,-rpath,$(BASE_LIB_PATH).
(2) adding a local symlink to $(BASE_LIB_PATH) that uses the .so major
version number so that ld would find the shared library at
execution time.
Thanks to Sajid Ali for reporting this issue, to Devin Matthews for
pointing out the need for the -rpath option, and to Devangi Parikh for
helping Sajid isolate the problem.
- Added #include <ctype.h> to bli_system.h to avoid a compiler warning
resulting from using toupper() from bli_string.c without a prototype.
Thanks again to Sajid Ali, whose build log revealed this compiler
warning.
- Added '*.so.*' to .gitignore.
- CREDITS file update.
Details:
- Added missing pthreads library linking to example makefile in
docs/BuildSystem.md, as well as similar language to build requirements
at the beginning of the document. Thanks to Stefanos Mavros for
bringing this to our attention.
- Updated CREDITS file.
Details:
- Fixed artifacts of malformed recursive sed expressions used when
preparing 4b36e85, in which most function-like macros were converted
to static functions. The syntactically defective code was contained
entirely in examples/oapi. Thanks to Tony Skjellum for reporting this
issue.
- Update to CREDITS file.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
updated as part of the macro. All current uses of the macro have been
coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.