Re-enabled AMD-specific optimizations for zen.
Details:
- Re-enabled Zen-specific cache blocksizes for 'zen' sub-configuration.
- Re-enabled small matrix gemm optimization for 'zen'.
- These were both temporarily disabled during a previous merge simply due to lack of Zen hardware for testing.
Details:
- Execute 'cleanh' target as part of 'clean'
- Remove cblas.h file from 'include/<configname>/' as part of 'cleanh'
target.
- Updated the echoed (non-verbose) text for uniformity.
Details:
- Added logic to build/bump-version.sh that will run './configure auto'
if 'common.mk' is not present (usually because 'make distclean' was run
recently).
Details:
- Added code to examples/oapi/5level1m.c that demonstrates transposing
(and conjugate-transposing) unstructured matrices.
- Comment updates to 6level1m_diag.c to maintain consistency with new
examples in 5level1m.c.
Details:
- Added a new code example file to examples/oapi demonstrating how to use
various utility operations.
- Comment updates to other example files.
- README updates.
Details:
- Previously forgot to add explicit enforcement of a minimum gcc version
in configure script when 'knl' sub-configuration is requested.
- Comment updates to configure.
Details:
- Added an 'examples' directory at the top level.
- Added an 'oapi' subdirectory in 'examples' that contains a tutorial-like
sequence of example code demonstrating the core functionality of BLIS's
object-based API, along with a Makefile and README. Thanks to Victor
Eijkhout for being the first to suggest including such code in BLIS.
Details:
- Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
related functions that can be used to read and write individual
elements of an obj_t.
- Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
create a new object with dimensions conformal to an existing object.
Transposition and conjugation states on the existing object are ignored,
as are structure and uplo fields.
- Defined a new function, bli_datatype_string(), in bli_obj.c that returns
a char* to a string representation of the name of each num_t datatype.
For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
not a valid input argument to bli_datatype_string().
- Added calls to bli_init_once() to various functions in bli_obj.c, the
most important of which was bli_obj_create_without_buffer().
- Removed unintended/extra newline from the end of printv output.
- Whitespace changes to
- frame/base/bli_machval.c
- frame/base/bli_machval.h
- frame/0/copysc/bli_copysc.c
- Trivial changes to README.md and common.mk.
Details:
- Added a file named 'RELEASING' that contains basic notes on how to
create a new version/release of BLIS. This is mostly just a reminder
to myself, but also may become useful if/when others take over
development and administration of the project.
Details:
- Imported the 24x16 knl sgemm microkernel (and its corresponding spackm
kernel) from TBLIS and enabled its use in the knl sub-config. Also
added sgemm microkernel prototype to bli_kernels_knl.h.
- Updated dgemm and dpackm microkernels from TBLIS, which included an
important change regarding the offsets array (changed from extern
declaration to static declaration/definition).
- Activated use of level-1v and -1f zen kernels in skx and knl
sub-configs.
- Removed some old macros no longer needed in bli_family_skx.h now that
libmemkind support exists in configure.
- Moved bli_avx512_macros.h to frame/include and adjusted #includes in
skx and knl kernels accordingly.
- Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
directory.
- Fixed a minor bug in the 'make' output per compile when verboseness
is not turned on. The rule-generating function 'make-kernel-rule' was
previously passing in the name of the config, rather than the name of
the kernel set returned by get-config-for-kset, which could give
misleading information to the user when the kconfig_map mapped a
kernel set to a sub-configuration that did not share the same name.
(This didn't affect the CFLAGS that were actually used.)
- Updated test/3m4m/Makefile, removing acml targets and renaming the
remaining targets.
Details:
- Moved microkernels in kernels/haswell/3 to kernels/haswell/3/old. These
microkernels were no longer being used and only sowed confusion to
anyone inspecting the repository without being fully cognizant of the
build system and how it works (and sometimes even to those who wrote
the build system). Note that the haswell configuration currently
employs the zen microkernels.
Details:
- Switched from querying version of 'objdump' to 'as' (i.e. the
assembler).
- Fixed the outputting of the version of 'as' on OS X, which required
this beauty:
...=$(as -v /dev/null -o /dev/null 2>&1)
- Only add sub-configs to blacklist if the sub-config hasn't already
been added.
Details:
- Added logic to configure that attempts to assemble various small files
containing select instructions designed to reveal whether binutils
(specifically, the assembler) supports emitting those instruction sets.
This information provides additional opportunities to blacklist sub-
configurations that are unsupported by the environment. Thanks to Devin
Matthews for pointing me towards a similar solution in TBLIS as an
example.
- Various other cleanups in configure.
- Reorganized the detection code in the 'build' directory, bringing the
"auto-detect" configuration detection, libmemkind detection, and new
instruction set detection codes into a single new subdirectory named
'detect'.
Details:
- Fixed a failure to observe the value of CC when selecting the compiler
in configure. Thanks to Devangi Parikh for reporting this bug.
- configure now also honors the CC environment variable. That is,
if CC is set in the environment prior to running configure, that value
is used, unless it is overridden by specifying the CC= argument to
configure. If the CC environment variable is not set, the CC= value is
used. If neither the environment variable nor CC= is specified, then
the choice is made internally to configure: first attempting to find
gcc, then clang, and then cc.
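The precedence described above can be sketched in C as follows. This is an
illustration only, not the logic in configure (which is a shell script). The
names example_choose_cc, example_probe_fn, and example_probe_clang_only are
hypothetical, and the probe simply pretends that clang is the only compiler
installed.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical callback that reports whether a compiler can be found. */
typedef int (*example_probe_fn)(const char* name);

/* Hypothetical probe used only for demonstration: pretends that clang is
   the only compiler installed. */
int example_probe_clang_only(const char* name)
{
    return strcmp(name, "clang") == 0;
}

/* Returns the compiler to use: the CC= argument if given, else the CC
   environment value if set, else the first of gcc, clang, cc that the
   probe finds. Returns NULL if nothing is found. */
const char* example_choose_cc(const char* cc_arg, const char* cc_env,
                              example_probe_fn is_available)
{
    static const char* const fallbacks[] = { "gcc", "clang", "cc" };

    assert(is_available != NULL);

    if (cc_arg != NULL && cc_arg[0] != '\0') return cc_arg;
    if (cc_env != NULL && cc_env[0] != '\0') return cc_env;

    for (size_t i = 0; i < sizeof fallbacks / sizeof fallbacks[0]; ++i)
        if (is_available(fallbacks[i])) return fallbacks[i];

    return NULL;
}
```

Passing NULL for both cc_arg and cc_env exercises the internal search order.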
Details:
- Removed some old conditional code in config/knl/make_defs.mk that
added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
code into common.mk that affirmatively filters out -lmemkind from
LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
this issue.) Other minor cleanups to neighboring code in common.mk.
- Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
and then AVX-512 functionality is manually removed via various
-mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
on CC_VENDOR. Similar change to skx/make_defs.mk.
- Comment/whitespace updates.
Details:
- Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
to CKVECFLAGS.
- Added default definitions of two new make variables to most sub-
configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
which are the reference kernel analogues of CKOPTFLAGS and
CKVECFLAGS, which track optimization and vectorization flags for
optimized kernels. Currently, two sub-configurations (knl and skx)
explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
AVX-512 for reference kernels). Thanks to Jeff Hammond, whose feedback
prompted me to make this change (issue #187).
- Changed common.mk so that the get-refkern-cflags-for function returns
the flags associated with the given sub-configuration's CROPTFLAGS
and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).
Details:
- Changed from -v1 to -v0 when calling gen-make-frag.sh from configure.
The directory-by-directory recursive output didn't add much value to
the user, so now we just echo a line for each top-level directory into
which we will recurse (e.g. 'config', 'ref_kernels', 'frame', etc.).
This also helps keep more interesting information (from earlier in the
execution of configure) from scrolling out of the terminal window.
Details:
- Updated the build system and BLAS test drivers to use 64-bit integers
when BLIS is configured for 64-bit integers in the BLAS layer. Also
updated blastest/Makefile accordingly. Thanks to Dave Love for
reporting the need for this feature.
- Added a 'check' target to blastest/Makefile so that the user can see
a summary of the tests.
- Commented out the initial definition of INCLUDE_PATHS in common.mk,
which was used pre-monolithic header, back when BLIS needed paths to
*all* headers, rather than just a select few. This line is no longer
needed since the value of INCLUDE_PATHS is overwritten by a later
definition limited to only the header paths that are needed now.
Details:
- Removed CKOPTFLAGS and CVECFLAGS from the set of CFLAGS used to
compile bli_cntx_ref.c for each configuration. This is necessary
because the file defines functions like bli_cntx_init_skx_ref(),
which are called during BLIS's initialization of the global kernel
structure, potentially being executed by an architecture that lacks
the instruction set used to compile the kernels for, in this example,
skx, which would lead to an illegal instruction error. Thanks to
Dave Love for reporting this issue.
- Further adjusted CFLAGS used when compiling code in the 'config'
directory (e.g. bli_cntx_init_skx.c) as well as code in 'frame' so
as to avoid the aforementioned issue.
Details:
- Added color coding to output of check-blistest.sh, check-blastest.sh
scripts. Success messages are coded green and failures are coded red.
This helps draw the eye toward those messages as the 'make checkblis',
'make checkblis-fast', and 'make checkblas' targets are executed.
- Changed top-level Makefile so that execution will not halt if
'checkblis', 'checkblis-fast', or 'checkblas' targets fail, which
means that the second of the two tests (BLIS and BLAS) run by
'make check' will run even if the first test fails.
Details:
- Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
family in the config_registry file.
- Added logic to configure that avoids committing certain sub-configs to
the configuration/kernel registries if those sub-configs cannot be
handled properly by the chosen compiler. (This was modeled after
similar logic in TBLIS's configure; thanks to Devin Matthews for
pointing this out.) First, the compiler and its version are inspected
and, based on the results, certain configurations are added to a
"blacklist". Then, as the configuration registries are being created,
configurations and/or kernels that match items in the blacklist are
skipped over and not committed to the registries. Under certain
circumstances, omitting a blacklisted configuration will indirectly
invalidate other configurations due to the loss of availability of
the original blacklisted configuration's kernel set. This additional
indirect blacklist is also accounted for.
- Added output to the beginning of configure that echoes information
about the chosen compiler as well as the configurations that are
blacklisted and must be stripped from the registries.
- Various other cleanups in configure, especially with respect to
explicitly declaring local variables in functions.
- Comment updates to config/zen/make_defs.mk regarding choice of -march
flags based on compiler version.
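The direct and indirect blacklisting described above can be sketched as a
small fixed-point computation. This is only a model of the idea, written in C
rather than the shell used by configure. The data layout (each sub-config
listing the sub-configs whose kernel sets it needs) and all names prefixed
with example_ are assumptions made for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define EXAMPLE_MAX_KSETS 4

/* Hypothetical registry entry: a sub-config and the indices of the
   sub-configs whose kernel sets it relies on. */
typedef struct {
    const char* name;
    size_t      n_ksets;
    size_t      ksets[EXAMPLE_MAX_KSETS];
} example_config_t;

/* On entry, blacklisted[] marks the directly blacklisted sub-configs.
   On exit, it also marks every sub-config that lost a kernel set it
   needs. Returns the number of indirectly blacklisted sub-configs. */
size_t example_propagate_blacklist(const example_config_t* cfgs, size_t n,
                                   bool* blacklisted)
{
    size_t n_added = 0;
    bool   changed = true;

    assert(cfgs != NULL && blacklisted != NULL);

    /* Repeat until a full pass adds nothing new. */
    while (changed) {
        changed = false;
        for (size_t i = 0; i < n; ++i) {
            if (blacklisted[i]) continue;
            for (size_t k = 0; k < cfgs[i].n_ksets; ++k) {
                assert(cfgs[i].ksets[k] < n);
                if (blacklisted[cfgs[i].ksets[k]]) {
                    blacklisted[i] = true;
                    changed = true;
                    ++n_added;
                    break;
                }
            }
        }
    }
    return n_added;
}
```

For instance, if a sub-config borrows the kernel set of a blacklisted
sub-config, the loop marks the borrower as blacklisted too, while
sub-configs that only use their own kernels are left alone.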
Details:
- Fixed a compiler warning concerning a type mismatch between the
format specifier of the printf() call in cblas_xerbla.c and its
corresponding (info) argument. The warning manifested when the CBLAS
layer was enabled and the BLAS/CBLAS integer type size was set to 64
(the default is 32). The warning was fixed by changing the specifier
from %d to %jd and typecasting the argument to intmax_t. Thanks to
Dave Love for reporting this issue and submitting the patch.
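The %jd fix can be illustrated with a minimal standalone sketch. It does not
reproduce cblas_xerbla.c; the type example_int_t, the function
example_format_info, and the message text are hypothetical stand-ins. The
point is only that casting to intmax_t and printing with %jd matches the
argument regardless of whether the integer type is 32 or 64 bits wide.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the BLAS/CBLAS integer type when it is
   configured to be 64 bits wide. */
typedef int64_t example_int_t;

/* Formats a message containing 'info'. The cast to intmax_t guarantees
   that the argument matches the %jd conversion on every platform. */
int example_format_info(char* buf, size_t len, example_int_t info)
{
    assert(buf != NULL);
    return snprintf(buf, len, "Parameter %jd was incorrect", (intmax_t)info);
}
```

With %d, the same call would pass a 64-bit argument where printf expects an
int, which is what the compiler warned about.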
Details:
- Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
one to handle a column-stored matrix A and one to handle a row-stored
matrix A. This allows vector instructions to be employed even if A is
stored by rows (and A^T appears stored as columns). Both storage cases
use a common edge case loop. Thanks to Devin Matthews for this idea
and for prototyping the change needed for sdotxf kernel.
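The idea of handling the two storage cases separately can be sketched in
plain C. This is not the zen kernel: it uses scalar loops instead of vector
intrinsics, omits conjugation, and does not share an edge-case loop. The
function name example_dotxf and its argument order are assumptions. It
computes y := beta*y + alpha * A^T * x for an m-by-b matrix A with row
stride rs and column stride cs.

```c
#include <assert.h>
#include <stddef.h>

void example_dotxf(size_t m, size_t b, double alpha,
                   const double* a, size_t rs, size_t cs,
                   const double* x, double beta, double* y)
{
    assert(a != NULL && x != NULL && y != NULL);

    if (rs == 1) {
        /* Column-stored A: each column is contiguous, so each of the b
           dot products walks memory with unit stride. */
        for (size_t j = 0; j < b; ++j) {
            const double* aj  = a + j * cs;
            double        rho = 0.0;
            for (size_t i = 0; i < m; ++i) rho += aj[i] * x[i];
            y[j] = beta * y[j] + alpha * rho;
        }
    } else if (cs == 1) {
        /* Row-stored A: each row is contiguous, so update all b
           accumulators at once while walking along the row. */
        for (size_t j = 0; j < b; ++j) y[j] *= beta;
        for (size_t i = 0; i < m; ++i) {
            const double* ai  = a + i * rs;
            const double  chi = alpha * x[i];
            for (size_t j = 0; j < b; ++j) y[j] += ai[j] * chi;
        }
    } else {
        /* General strides: fall back to strided dot products. */
        for (size_t j = 0; j < b; ++j) {
            double rho = 0.0;
            for (size_t i = 0; i < m; ++i) rho += a[i * rs + j * cs] * x[i];
            y[j] = beta * y[j] + alpha * rho;
        }
    }
}
```

In both special cases the innermost loop touches contiguous elements of A,
which is what makes vector loads applicable in either storage format.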
Details:
- Changed the declaration of k_iter and k_left for d, c, z microkernels
from dim_t to uint64_t. This is needed to ensure compatibility with
the movq instruction used to load the value into registers. This
change should have been made a long time ago, but for some reason
only recently began showing up via Travis CI.
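A minimal sketch of the declaration change follows. The unroll factor of 4
and the names example_split_k and EXAMPLE_K_UNROLL are assumptions for
illustration, and the inline assembly is omitted. The presumed reason for the
change is that movq moves a 64-bit quantity, so the variables it loads should
be exactly 64 bits wide regardless of how dim_t is configured.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical unroll factor for the main k loop; real kernels may differ. */
#define EXAMPLE_K_UNROLL 4

/* Splits k into the number of fully unrolled iterations and the number of
   leftover iterations. Both results are uint64_t so that each occupies
   exactly the eight bytes a 64-bit load reads. */
void example_split_k(int64_t k, uint64_t* k_iter, uint64_t* k_left)
{
    assert(k >= 0 && k_iter != NULL && k_left != NULL);
    *k_iter = (uint64_t)k / EXAMPLE_K_UNROLL;
    *k_left = (uint64_t)k % EXAMPLE_K_UNROLL;
}
```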
Details:
- Fixed a compile-time error that occurred due to the fact that
BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
soon enough to be used in bli_system.h where it is needed to determine
whether hbwmalloc.h should be #included. bli_system.h is now included
after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
for reporting this issue.
- Tweaked the language used by configure to echo the status of the
--with[out]-memkind option.
Details:
- Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
vzeroupper instruction destroyed part of the intermediate result
stored by the vdpps instructions that came right before. (The
vzeroupper intrinsic was removed.)
- Removed remaining vzeroupper intrinsics from other zen kernels.
Previously, the vzeroupper instructions were included because BLIS is
typically compiled with -mfpmath=sse. But it was brought to my
attention that inserting these vzeroupper instructions is unnecessary
for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
code rather than literal SSE instructions, and (b) compilers already
(likely) insert vzeroupper instructions where necessary. Thanks to
Devin Matthews for zeroing in on the dotxf bug.
- Removed -malign-double from bulldozer make_defs.mk. This alignment
was already happening by default since bulldozer is an x86_64 system.
Details:
- Added support for libmemkind to configure. configure attempts to
detect the presence of libmemkind by compiling a small program
containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
successful, it is assumed that libmemkind is present and available.
If present, use of libmemkind is enabled by default, and otherwise
use is disabled by default. If libmemkind is present, the user may
explicitly disable use of the library by running configure with the
--without-memkind option. Furthermore, a configuration may disable
libmemkind, perhaps conditional on some aspect of the build system,
by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
set in config.mk, to 'no'. (The knl configuration makes use of this
latter feature; see below.)
- If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
- Deprecated explicit use of BLIS_NO_HBWMALLOC in
config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
(#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
variable to 'no'.
- common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.
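The interplay of the enable and disable macros can be sketched as follows.
This is not the contents of bli_system.h or bli_kernel_macro_defs.h. The
EXAMPLE_ macros and example_ functions are hypothetical, and the fallback to
malloc() and free() when libmemkind is not enabled is an assumption made so
that the sketch compiles without the library.

```c
#include <assert.h>
#include <stdlib.h>

/* A configuration's -DBLIS_DISABLE_MEMKIND overrides the configure-time
   choice by undefining the enable macro. */
#ifdef BLIS_DISABLE_MEMKIND
  #undef BLIS_ENABLE_MEMKIND
#endif

#ifdef BLIS_ENABLE_MEMKIND
  #include <hbwmalloc.h>
  #define EXAMPLE_MALLOC_POOL hbw_malloc
  #define EXAMPLE_FREE_POOL   hbw_free
#else
  /* Assumed fallback for this sketch when libmemkind is not in use. */
  #define EXAMPLE_MALLOC_POOL malloc
  #define EXAMPLE_FREE_POOL   free
#endif

/* Allocates a block for the memory pool using whichever allocator was
   selected above. */
void* example_pool_alloc(size_t n_bytes)
{
    assert(n_bytes > 0);
    return EXAMPLE_MALLOC_POOL(n_bytes);
}

/* Releases a block obtained from example_pool_alloc(). */
void example_pool_free(void* p)
{
    EXAMPLE_FREE_POOL(p);
}
```

Compiling with -DBLIS_ENABLE_MEMKIND selects the hbw_ functions and requires
linking against libmemkind; adding -DBLIS_DISABLE_MEMKIND reverts to the
fallback even when the enable macro is present.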
Details:
- Added logic to common.mk that will choose a BLIS library against which
to link (LIBBLIS_LINK). The default choice is the static (.a) library;
the shared (.so) library is chosen only if the shared library build was
enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
pre-chosen library against which to link. (Previously, these drivers
unconditionally linked against the static library and would have
failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
so that variables relating to the libblis.[a|so] files, including
paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
and in configure.
- A few other cleanups in the top-level Makefile.