Details:
- Integrated a patch originally authored and submitted by Ricardo Magana
of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
(memory broker) that allows multiple sets of memory pools on, for example,
separate NUMA nodes, each of which has a separate memory space.
- Added membrk field to cntx_t and defined corresponding accessor macros.
- Added membrk field to mem_t object and defined corresponding accessor macros.
- Created new bli_membrk.c file, which contains the new memory broker API,
including:
bli_membrk_init(), bli_membrk_finalize()
bli_membrk_acquire_[mv](), bli_membrk_release(),
bli_membrk_init_pools(), bli_membrk_reinit_pools(),
bli_membrk_finalize_pools(),
bli_membrk_pool_size()
- In bli_mem.c, changed function calls to
bli_mem_init_pools() -> bli_membrk_init()
bli_mem_reinit_pools() -> bli_membrk_reinit()
bli_mem_finalize_pools() -> bli_membrk_finalize()
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
bli_mem_release() -> bli_membrk_release()
- Added bli_mutex.c and related files to frame/thread. These files define
abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
single-threaded execution. This new API is employed within functions
such as bli_membrk_acquire_[mv]() and bli_membrk_release().
Details:
- Fixed compiler warnings about unused variables related to the disabling
of normalization in the structured cases of the rand[vm] and randn[vm]
operations.
Details:
- Defined a new randomization operation, randn, on vectors and matrices.
The randnv and randnm operations randomize each element of the target
object with values from a narrow range of values. Presently, those
values are all integer powers of two, but they do not need to be powers
of two in order to achieve the primary goal, which is to initialize
objects that can be operated on with plenty of precision "slack"
available to allow computations that avoid roundoff. Using this method
of randomization makes it much more likely that testsuite residuals of
properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
special diagonal handling and normalization for matrices with structure.
This is now handled by the testsuite modules by explicitly calling a
testsuite function that loads the diagonal (and scales off-diagonal
elements).
- Added support for randnv and randnm in the testsuite with a new switch
in input.general that universally toggles between use of the classic
randv/randm, which use real values on the interval [-1,1], and
randnv/randnm, which use only values from a narrow range. Currently,
the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
well as 0.0.
- Updated testsuite modules so that a testsutie wrapper function is called
instead of directly calling the randomization operations (such as
bli_randv() and bli_randm()). This wrapper also takes a bool_t that
indicates whether the object's elements should be normalized. (NOTE: As
alluded to above, in the test modules of triangular solve operations such
as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.
Details:
- Reorganized code and renamed files defining APIs related to multithreading.
All code that is not specific to a particular operation is now located in a
new directory: frame/thread. Code is now organized, roughly, by the
namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
We now have the following API namespaces:
- bli_thrinfo_*(): functions related to thrinfo_t objects
- bli_thrcomm_*(): functions related to thrcomm_t objects.
- bli_thread_*(): general-purpose functions, such as initialization,
finalization, and computing ranges. (For now, some macros, such as
bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
#undef was a temporary fix to some macro defaults which were being applied
in the wrong order, which was recently fixed.
Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
- bli_malloc_pool()/bli_free_pool()
- bli_malloc_user()/bli_free_user()
- bli_malloc_intl()/bli_free_intl()
each of which can be configured to call malloc()/free() substitutes,
so long as the substitute functions have the same function type
signatures as malloc() and free() defined by C's stdlib.h. The _pool()
function is called when allocating blocks for the memory pools (used
for packing buffers, primarily), the _user() function is called when
obj_t's are created (via bli_obj_create() and friends), and the _intl()
function is called for internal use by BLIS, such as when creating
control tree nodes or temporary buffers for manipulating internal data
structures. Substitutes for any of the three types of bli_malloc() may
be specified by #defining the following pairs of cpp macros in
bli_kernel.h:
- BLIS_MALLOC_POOL/BLIS_FREE_POOL
- BLIS_MALLOC_USER/BLIS_FREE_USER
- BLIS_MALLOC_INTL/BLIS_FREE_INTL
to be the name of the substitute functions. (Obviously, the object
code that contains these functions must be provided at link-time.)
These macros default to malloc() and free(). Subsitute functions are
also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
terms of a new function, bli_malloc_align(), which aligns memory to an
arbitrary (power of two) alignment boundary, but does so manually,
whereas before alignment was performed behind the scenes by
posix_memalign(). Currently, bli_malloc_intl() is defined in terms
of bli_malloc_noalign(), which serves as a simple wrapper to the
designated function that is passed in (e.g. BLIS_MALLOC_INTL).
Similarly, there are bli_free_align() and bli_free_noalign(), which
are used in concert with their bli_malloc_*() counterparts.
Details:
- Added a new input parameter to input.general that globally toggles
whether testsuite tests are performed on objects whose buffers and
leading dimensions have been aligned, and changed the implementation
of libblis_test_mobj_create() to employ alignment (or not) regardless
of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
for internal integer type size and BLAS/CBLAS integer type size
options.
Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
then redefined reference packm kernels' prototyping headers in terms of
this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
bli_kernel_prototypes.h (using the new template prototypes in
bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
packm kernels specific to induced methods, which are now used in
bli_packm_cxk.c and friends rather than using a locally-defined
function type.
- In bli_packm_cxk.c, extended function pointer for packm kernels array
from out to index 31 (from previous maximum of 17). This allows us to
store the unrolled 30xk kernel in the array for use (on knc, for
example). Note: This should have been done a long time ago.
Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
variants 1f, 2b, and 2f. The bug somehow went undetected since the
big commit (537a1f4), and, strangely, did not manifest via the BLIS
testsuite. The bug finally came to our attention when running thei
libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
for submitting the initial report that led to this bug.
Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
and renamed the file guard to BLIS_F2C_H. This helps when
#including "blis.h" from an application or library that already
#includes an "f2c.h" header.
Details:
- Added #include statements for certain key BLIS headers so that the
definition of f77_int is pulled in when a user compiles application
code with only #include "cblas.h" (and no other BLIS header). This
is necessary since f77_int is now used within the cblas API.
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.
Details:
- Fixed a typo in bli_l1f_ref.h, introduced into bbb8569, that only
manifested when non-reference level-1f kernels were used.
- Added an #undef BLIS_SIMD_ALIGN_SIZE to bli_kernel.h of dunnington
configuration to prevent a compile-time warning until I can figure out
the proper permanent fix.
- Moved frame/1f/kernels/bli_dotxaxpyf_ref_var1.c out of the compilation
path (into 'other' directory). _ref_var2 is used by default, which is
the variant that is built on axpyf and dotxf instead of dotaxpyv.
- Removed section of frame/include/bli_config_macro_defs.h pertaining to
mixed datatype support.
Details:
- Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
numerical operand pointers (ie: all pointers except the cntx_t).
- Updated level-1f reference kernel definitions to use 'restrict' for
all numerical operand pointers. (Level-1v reference kernel definitions
were already updated in bdbda6e.)
- Rewrote the level-1v and level-1f reference kernel prototypes in
bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
(as was already being done for the level-3 micro-kernel prototypes
in bli_l3_ref.h), rather than duplicate the signatures from the
_ker.h files.
- Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
and xpbyv, which were probably meant for inclusion in bdbda6e.
- Converted a number of instances of four spaces, as introduced in
bdbda6e, to tabs.
- Add missing axpby and xpby operations (plus test cases).
- Add special case for scal2v with alpha=1.
- Add restrict qualifiers.
- Add special-case algorithms for incx=incy=1.
Details:
- In all level-3 operations' _cntx_init() functions, replaced calls to
bli_cntx_obj_init() with calls to bli_cntx_obj_clear(), and in all
level-3 operations' _cntx_finalize() functions, removed calls to
bli_cntx_obj_finalize(), leaving those function definitions empty.
- Changed the definition of bli_cntx_obj_clear() so that the clearing
occurs via a single call to memset().
Details:
- Commented out several datatype-aware query functions (those ending in
_dt) from bli_cntx.c, as well as their prototypes in bli_cntx.h, and
added equivalent cpp query macros to bli_cntx.h.
- Added 'bli_config.h' to .gitignore.
Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.
Additionally, configure options which require defining macros (the above plus the threading model), write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.
Lastly, support for OMP in clang has been added (closes#56).
Details:
- Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
by replacing it with the axpyf fusing factor (8), and cleaned up the
relevant section of config/bgq/bli_kernel.h.
- Removed most of the details of the level-3 kernels from the template
kernel code in config/template/kernels/3 and replaced it with a
reference to the relevant kernel wiki maintained on the BLIS github
website.