Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Removed four trailing spaces after "BLIS" that occurs in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
Details:
- Historically, the compiler selection has happened statically in the
various make_defs.mk and would only be overriden by setting CC (either
prior to running configure or as a configure argument). However, in
the last couple months, configure has evolved to contain rather
sophisticated compiler detection logic for the purposes of blacklisting
sub-configurations. It only makes sense that configure now fully take
over the responsibility of selecting a compiler from the GNU make side
of the build system. Thanks to Alex Arslan for his help exposing this
issue.
- Substitute found_cc into CC in config.mk via configure.
- Set a new variable, CC_VENDOR, in config.mk via substitution from
configure, and disable the corresponding CC_VENDOR code in common.mk.
- Disabled default compiler selection (usually gcc) in the sub-configs'
various make_def.mk files.
Details:
- Removed some old conditional code in config/knl/make_defs.mk that
added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
code into common.mk that affirmatively filters out -lmemkind from
LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
this issue.) Other minor cleanups to neighboring code in common.mk.
- Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
and then AVX-512 functionality is manually removed via various
-mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
on CC_VENDOR. Similar change to skx/make_defs.mk.
- Comment/whitespace updates.
Details:
- Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
to CKVECFLAGS.
- Added default defintions of two new make variables to most sub-
configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
which correspond to reference kernel analogues of the CKOPTFLAGS
and CKVECFLAGS, which track optimization and vectorization flags for
optimized kernels. Currently, two sub-configurations (knl and skx)
explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
AVX-512 for reference kernels. Thanks to Jeff Hammond, whose feedback
prompted me to make this change (issue #187).
- Changed common.mk so that the get-refkern-cflags-for function returns
the flags associated with the given sub-configuration's CROPTFLAGS
and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).
Details:
- Rewrote code that selects the compiler for the purposes of compiling
the auto-detection executable. CC (if specified) is tried first. Then
gcc. Then clang. The absolute fallback is cc. The previous code was
sort of broken, and seemed to unintentionally always use gcc.
- Moved various configuration-agnostic flags from config/*/make_defs.mk
files to common.mk. The new mechanism appends the configuration-
agnostic flags to the various compiler flag variables initialized in
make_defs.mk. Flags specific to the sub-configuration are still set
in make_defs.mk.
- Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
Also added the flag to the compiler instantiation during configure-
time hardware detection (when clang is selected).
- Added some missing (but mostly-optional) quotes to configure script.
Details:
- Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
sub-configurations' bli_cntx_init_*() functions by passing in 0 for
register and cache blocksizes that correpond to gemm microkernel
datatypes that were not registered, allowing the default values
set by the bli_cntx_init_*_ref() function call to remain.
Details:
- Added missing x86_64 configuration directory, which was intended to be
part of b7ca580.
- Added -Wfatal-errors compiler warning flag to all configurations so that
compilation stops after the first error.
- Changed the vectorization flags for intel64 configuration to be compatible
with 'penryn', the oldest sub-config included in that family.
- Changed the vectorization flags for penryn to target the 'core2'
microarchitecture and ssse3.
Details:
- Added a "generic" configuration that leaves the default blocksizes and
kernels unchanged. This replaces the older "reference" configuration.
Updated auto-detect script and code accordingly.
- Added support for generic configuration to arch_t (bli_type_defs.h),
bli_gks_init() (bli_gks.c), and bli_arch_config.h
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
- Whitespace changes to configurations' make_defs.mk files.
Details:
- Renamed the various configurations' "bli_arch_<configname>.h" header files
(replacing "arch" with "family") to free up the 'bli_arch' namespace for a
different purpose (hardware detection).
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
"bli_arch_config.h" and "bli_arch_config_pre.h", respectively.
Details:
- Reworked the build system around a configuration registry file, named
config_registry', that identifies valid configuration targets, their
constituent sub-configurations, and the kernel sets that are needed by
those sub-configurations. The build system now facilitates the building
of a single library that can contains kernels and cache/register
blocksizes for multiple configurations (microarchitectures). Reference
kernels are also built on a per-configuration basis.
- Updated the Makefile to use new variables set by configure via the
config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
in determining which sub-configurations (CONFIG_LIST) and kernel sets
(KERNEL_LIST) are included in the library, and which make_defs.mk files'
CFLAGS (KCONFIG_MAP) are used when compiling kernels.
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
functions into a standard format that includes the kernel set name
(e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
kernels sub-directory. These files exist to provide prototypes for the
kernels present in those directories.
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
This directory includes a new source file, bli_cntx_ref.c (compiled on
a per-configuration basis), that defines the code needed to initialize
a reference context and a context for induced methods for the
microarchitecture in question.
- Rewrote make_defs.mk files in each configuration so that the compiler
variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
basis.
- Modified bli_config.h.in template so that bli_config.h is generated with
#defines for the config (family) name, the sub-configurations that are
associated with the family, and the kernel sets needed by those
sub-configurations.
- Deprecated all kernel-related information in bli_kernel.h and transferred
what remains to new header files named "bli_arch_<configname>.h", which
are conditionally #included from a new header bli_arch.h. These files
are still needed to set library-wide parameters such as custom
malloc()/free() functions or SIMD alignment values.
- Added bli_cntx_init_<configname>.c files to each configuration directory.
The files contain a function, named the same as the file, that initializes
a "native" context for a particular configuration (microarchitecture). The
idea is that optimized kernels, if available, will be initialized into
these contexts. Other fields will retain pointers to reference functions,
which will be compiled on a per-configuration basis. These bli_cntx_init_*()
functions will be called during the initialization of the global kernel
structure. They are thought of as initializing for "native" execution, but
they also form the basis for contexts that use induced methods. These
functions are prototyped, along with their _ref() and _ind() brethren, by
prototype-generating macros in bli_arch.h.
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
identifies the various sub-configurations.
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
structures (pointers to cntx_t, actually). The first dimension is indexed
over arch_t and the inner dimension is the ind_t (induced method) for
each microarchitecture. When a microarchitecture (configuration) is
"registered" at init-time, the inner array for that configuration in the
2D array is initialized (and allocated, if it hasn't been already). The
cntx_t slot for BLIS_NAT is initialized immediately and those for other
induced method types are initialized and cached on-demand, as needed. At
cntx_t registration, we also store function pointers to cntx_init functions
that will initialize (a) "reference" contexts and (b) contexts for use with
induced methods. We don't cache the full contexts for reference contexts
since they are rarely needed. The functions that initialize these two kinds
of contexts are generated automatically for each targeted sub-configuration
from cpp-templatized code at compile-time. Induced method contexts that
need "stage" adjustments can still obtain them via functions in
bli_cntx_ind_stage.c.
- Added new functions and functionality to bli_cntx.c, such as for setting
the level-1f, level-1v, and packm kernels, and for converting a native
context into one for executing an induced method.
- Moved the checking of register/cache blocksize consistency from being cpp
macros in bli_kernel_macro_defs.h to being runtime checks defined in
bli_check.c and called from bli_gks_register_cntx() at the time that the
global kernel structure's internal context is initialized for a given
microarchitecture/configuration.
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
the previous operation-level cntx_t_init()/_finalize() invocations.
Instead, we now query the gks for a suitable context, usually via
bli_gks_query_cntx().
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
hackery that I was no longer willing to support.)
- Consolidated the 1e and 1r packm kernels for any given register blocksize
into a single kernel that will branch on the schema and support packing
to both formats.
- Added the cntx_t* argument to all packm kernel signatures.
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
and instead obtain the packm kernel from the cntx_t.
- Added bli_calloc_intl(), which serves as the calloc-equivalent to to
bli_malloc_intl(). Useful when we wish to allocate and initialize to
zero/NULL.
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
bli_cntx.h into static functions.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
ar. Previously, "cru" was used, while now we employ only "cr". This
change was prompted by a warning observed on Ubuntu 16.04:
ar: `u' modifier ignored since `D' is the default (see `U')
This caused me to realize that the default mode causes timestamps to be
zero, and thus the 'u' option, which causes only changed object files to
be inserted, is not applicable.
Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.
yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.
Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.
Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.
Lastly, support for OMP in clang has been added (closes#56).
1) Add -- options.
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
4) Add make V=[0,1] option to control build verbosity.
Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (ie: malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by #defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.
Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
NR have been relaxed, and thus it was time to remove the comments
from the top of the bli_kernel.h files of all configurations.
Details:
- Changed semantics of cache and register blocksize extensions so that
the extended values are tracked, rather than just the marginal
extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
that these "max" query routines grab the maximum value for cache
blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.
Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
configurations. It was already going unused in packm_init() since the
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
micro-kernels.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
macro constants. Examples:
BLIS_SAXPYV_KERNEL_REF
BLIS_DDOTXF_KERNEL_REF
BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
with a common base name; [sdcz] datatype flavors of each kernel or
micro-kernel (level-1v, -1f, or 3) may now be named independently.
This means you can now, if you wish, encode the datatype-specific
register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
undefined in bli_kernel.h will default to the corresponding reference
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
datatype chars to match the number of types the kernel WOULD take in
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
your level-1v/-1f kernels. The framework now prvides a _kernel()
function which serves as the obj_t wrapper for whatever kernels are
specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
longer need to include any prototyping headers from within
bli_kernel.h. The framework now generates kernel prototypes, with the
proper type signature, based on the kernel names defined (or defaulted
to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
kernel is left undefined by bli_kernel.h, but its same-precision real
domain equivalent IS defined, BLIS will use a 4m-based implementation
for the datatype x implementations of all level-3 operations, using
only the real gemm micro-kernel.
Details:
- Removed macro constant definitions related to incremental blocksizes
from all configurations' bli_kernel.h files. This change is minor and
is mostly a cleanup related to a previous commit.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
that directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loogson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".