Details:
- Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
and bli_thread_set_ways(). These wrappers are defined in
frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
suggesting these new interfaces.
- Added missing prototype for bli_thread_set_ways() in bli_thread.h and
removed prototypes for non-existent functions bli_thread_set_*_nt().
- CREDITS file update.
Details:
- Renamed the following C preprocessor macros whose fallback/default
values are specified within frame/include/bli_kernel_macro_defs.h:
BLIS_DEFAULT_MR_THREAD_MAX -> BLIS_THREAD_MAX_IR
BLIS_DEFAULT_NR_THREAD_MAX -> BLIS_THREAD_MAX_JR
BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N
- Renamed the above cpp macro overrides within the knl, skx, and zen
sub-configurations, as well as invocations of those macros in
bli_rntm.c.
- Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer
used by any code within BLIS.
Details:
- Previously, trsm was consolidating all ways of parallelism into the jr
loop. This was unnecessary and to some degree detrimental on some
types of hardware. Now, any parallelism bound for the jc loop will be
applied to the jc loop, while all other loops' parallelism is funneled
to the jr loop. Thanks to Devangi Parikh for helping investigate this
issue and suggesting the fix.
- NOTE: This change affects only left-side trsm. However, currently
right-side trsm is currently implemented in terms of the left-side
case, and thus the change effectively applies to both left and right
cases.
Details:
- Removed a guard from bli_clock_min_diff() that would return 0 if the
time delta was greater than 60 minutes. This was originally intended
to disregard extremely large values under the assumption that the
user probably didn't intend to run a test that long. However, since
it is in bli_clock_min_diff(), it doesn't actually help short-circuit
an implementation that is hanging or looping infinitely, since such
an implementation would first have to finish before the
bli_clock_min_diff() is called. Thanks to Kiran Varaganti for
reporting this issue.
Details:
- gcc 7 introduced new behavior to the -dumpversion option whereby only
the major version component is output. However, as part of this
change, gcc 7 also introduced a new option, -dumpfullversion, which is
guaranteed to always output the major, minor, and revision numbers. If
we are using gcc 7 or later, we re-query the version string with this
new option and then re-parse the result so as to avoid misleading
output from configure (e.g. using gcc 7.3.0 is reported as 7.7.7).
Details:
- Use the modulo operator to limit the size of an integer that is given
to sprintf(). This avoids a warning in some versions of gcc about the
integer potentially overflowing the available space in the string into
which the integer is being printed.
Details:
- Replaced much of "Getting Started" section with a shortened version of
the bullet list of documentation currently shown in the github wiki
page. Thanks to Devangi Parikh for her feedback in this change.
Details:
- Updated the configure script to allow slashes in version string. This
is needed so that downstream maintainers (such as those for Debian)
can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
reporting this issue via PR #256 and providing me the information
needed to debug the problem.
Details:
- Removed the top-level 'debian' directory. This directory is apparently
no longer needed (issue #257). Thanks to M. Zhou and Nico Schlömer for
their contributions.
Details:
- Corrected feedback echoed to user by configure when libmemkind is
found but not explicitly requested. In these cases, configure would
echo a message that it had received an explicit request to enable
libmemkind, which was not accurate, even if the end result was the
same--that libmemkind is enabled by default when it is found. Thanks
To Devangi Parikh for reporting this issue.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
unconditionally, but differently for Windows and non-Windows, but
then disabled the definition of bli_setenv() entirely since BLIS
no longer needs to set environment variables. Updated bli_winsys.h
accordingly, and call bli_sleep() from within testsuite instead of
sleep() directly.
- Use
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
instead of
#if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
when guarding against local definition of pthread barrier in
testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
should always be set to 200809L when barriers are supported, though I
won't be surprised if we encounter a case in the future where it is
set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
top-level Makefile. These are no longer needed because we no longer
output libraries with the version and configuration name as
substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
Details:
- Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux
branch, along with appropriate blocksizes in bli_cntx_init_skx.c and
a prototype in bli_kernels_skx.h. (Devin has not yet written the
sgemm analague, so for now we will continue using the older sgemm
ukernel.)
- Updated frame/include/bli_x86_asm_macros.h with a minor change that
was present within the skx-redux branch.
* Enable shared
* Enable rdp
* Add support for dll
* Use libblis-symbols.def
* Fix building dlls
* Fix libblis-symbols.def
* Fix soname
* Fix Makefile error
* Fix install target
* Fix missing symbols
* Add BLIS_MINUS_TWO
* Add path to dll
* Fix OSX soname
* Add declspec for dll
* Add -DBLIS_BUILD_DLL
* Replace @enable_shared@ in config
* switch to auto for now
* blis_ -> bli_
* Remove BLIS_BUILD_DLL in make check
* change auto->haswell
* enable_shared_01
* Add wno-macro-redefined
* print out.cblat3
* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY
* Use V=1
* Remove fpic for windows
* Remember LIBPTHREAD
* Remove libm for windows
* Remember AR
* Fix remembering libpthread
* Add Wno-maybe-uninitialized in only gcc
* Don't do blastest for shared for now
* Fix install target
And remove unnecessary change
* test auto and x86_64
* Fix install target again
* Use IS_WIN variable
* Remove leading dot from LIBBLIS_SO_MAJ_EXT
* Make is_win yes/no
* Add comments for windows builds
* Change if else blocks location
Details:
- Use get-user-cflags-for() to generate cflags when compiling BLAS test
drivers and BLIS testsuite from top-level Makefile. Meant to include
these changes in previous commit (4b5437e). Thanks to Isuru Fernando
for pointing out this oversight.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
itself is being built. This macro will not be defined when, for
example, the testsuite or example code compiles code local to those
applications. This was done in part by defining a new cflags function
get-user-cflags-for(), which is now the designated function for
application Makefiles if they wish to inherit a basic set of CFLAGS
from BLIS. (The compiler flags returned are identical to that of
get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
instead of get-frame-cflags-for().
Details:
- Added a new sub-configuration 'cortexa53', which is a mirror image
of cortexa57 except that it will use slightly different compiler
flags. Thanks to Mathieu Poumeyrol for making this suggestion after
discovering that the compiler flags being used by cortexa57 were
not working properly in certain OS X environments (the fix to which
is currently pending in pull request #245).
Details:
- Removed four trailing spaces after "BLIS" that occurs in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
Details:
- Added lines associated with the testsuite's new threading option to
input.general.fast. This change was intended for the previous commit
(10d0735).
Details:
- Replaced critical sections that were conditional upon multithreading
being enabled (via pthreads or OpenMP) with unconditional use of
pthreads mutexes. (Why pthreads? Because BLIS already requires it
for its initialization mechanism: pthread_once().) This was done in
bli_error.c, bli_gks.c, bli_l3_ind.c. Also, replaced usage of BLIS's
mtx_t object and bli_mutex_*() API with pthread mutexes in
bli_thread.c. The previous status quo could result in a race condition
if the application called BLIS from more than one thread. The new
pthread-based code should be completely agnostic to the application's
threading configuration. Thanks to AMD for bringing to our attention
the need for a thread-safety review.
- Added an option to the testsuite to simulate application-level
multithreading. Specifically, each thread maintains a counter that is
incremented after each experiment. The thread only executes the
experiment if: counter % n_threads == thread_id. In other words, the
threads simply take turns executing each problem experiment. Also,
POSIX guarantees that fprintf() will not intermingle output, so
output was switched to fprintf() instead of libblis_test_fprintf().
- Changed membrk_t objects to use pthread_mutex_t intead of mtx_t and
replaced use of bli_mutex_init()/_finalize() in bli_membrk.c with
wrappers to pthread_mutex_init()/_destroy().
- Changed the implementation of bli_l3_ind_oper_enable_only() to fix
a race condition; specifically, two threads calling the function with
the same parameters could lead to a non-deterministic outcome.
- Added #include <pthread.h> to bli_cpuid.c and moved the same in
bli_arch.c.
- Added 'const' to declaration of OPT_MARKER in bli_getopt.c.
- Added #include <pthread.h> to bli_system.h.
- Added add-copyright.py script to automate adding new copyright lines
to (and updating existing lines of) source files.
Details:
- Fixed stale filepaths to check-blastest.sh and check-blistest.sh in
travis/do_testsuite.sh and travis/do_sde.sh.
- Create a symbolic link to the 'config' directory so that the top-level
Makefile can find the configs' make_defs.mk files during out-of-tree
builds.
- Added additional case handling to out-of-tree scenario to handle
situations where files 'Makefile', 'common.mk', or 'config' exist but
are not symbolic links. In such cases, configure warns the user and
exits.
- Homogenized various error messages throughout configure.
- Belated thanks to Victor Eijkhout for requesting the feature added
in 0f491e9 whereby lesser Makefiles can compile and link against
an existing installation of BLIS.
Details:
- Updated the build system so that "lesser" Makefiles, such as those in
belonging to example code or the testsuite, may be run even if the
directory is orphaned from the original build tree. This allows a
user to configure, compile, and install BLIS, delete the build tree
(that is, the source distribution, or the build directory for out-
of-tree builds) and then compile example or testsuite code and link
against the installed copy of BLIS (provided the example or testsuite
directory was preserved or obtained from another source). The only
requirement is that make be invoked while setting the
BLIS_INSTALL_PATH variable to the same installation prefix used when
BLIS was configured. The easiest syntax is:
make BLIS_INSTALL_PATH=/install/prefix
though it's also permissible to set BLIS_INSTALL_PATH as an
environment variable prior to running 'make'.
- Updated all lesser Makefiles to implement the new aforementioned build
behavior.
- Relocated check-blastest.sh and check-blistest.sh from build to
blastest and testsuite, respectively, so that if those directories are
copied elsewhere the user can still run 'make check' locally.
- Updated docs/Testsuite.md with language that mentions this new option
of building/linking against an installed copy of BLIS.
Details:
- Changed configure so that the absence of any C++ compiler from the
pre-defined search list does not result in an exit. Instead, in this
situation, the found_cxx variable is assigned 'c++notfound' and the
error message is changed to remind the user that C++ will not be
available in the sandbox. Thanks to Devangi Parikh for reporting this
issue.
- Also tweaked the message when a C++ compiler *is* found to remind any
would-be confused user that BLIS will only use C++ if it is needed by
code in the sandbox.
Details:
- Fixed a bug in the way that the variadic bli_cntx_set_l3_nat_ukrs()
function was defined. This function is meant to take a microkernel id,
microkernel datatype, microkernel address, and microkernel preference
as arguments, and is typically called within the bli_cntx_init_*()
function defined within a sub-configuration for initializing an
appropriate context. The problem is with the final argument: the
microkernel preference. These preferences are actually boolean values,
0 or 1 (encoded as FALSE or TRUE). Since the variadic function does
not give the compiler any type information for any variadic arguments,
they are "promoted" in the course of internal (macroized) processing
according to default argument promotion rules. Thus, integer literals
such as 0 and 1 become int and floating-point literals (such as 0.0 or
1.0) become double. Previous to this commit, we indicated to va_arg()
that the ukernel preference was a 'bool_t', which is a typedef of
int64_t on 64-bit systems. On systems where int is defined as 64 bits,
no problems manifest since int is the same size as the type we passed
in to va_arg(), but on systems where int is 32 bits, the ukernel
preference could be misinterpreted as a garbage value. (This was
observed on a modern armv8 system.) The fix was to interpret the
bool_t value as int and then immediately typecast it to and store it
as a bool_t. Special thanks to Devangi Parikh for helping track down
this issue, including deciphering the use of va_arg() and its
byzantine treatment of types.
- Added explicit typecasts for all invocations of va_arg() in
bli_cntx.c.
Details:
- Fixed a memory leak in the global kernel structure that resulted in 56
bytes per configured architecture (of which only 18 are presently
supported by BLIS). The leak would only manifest if BLIS was
initialized and then finalized before the application terminated.
Thanks to Devangi Parikh for helping track down this leak.