Details:
- Added malloc_ft and free_ft fields to pool_t, which are provided when
the pool is initialized, to allow bli_pool_alloc_block() and
bli_pool_free_block() to call bli_fmalloc_align()/bli_ffree_align()
with arbitrary align_size values (according to how the pool_t was
initialized).
- Added a block_ptrs_len argument to bli_pool_init() so that the caller
can specify an initial length for the block_ptrs array. Previously,
the array was reallocated, copied, and freed each time a new block
was added to the pool.
- Consolidated the "buf_sys" and "buf_align" pointer fields in pblk_t
into a single "buf" field. Consolidated the bli_pblk API accordingly
and also updated the bli_mem API implementation. This was done
because I had already implemented opaque alignment via
bli_malloc_align(), which allocates extra space and stores the
original pointer returned by malloc() in the element immediately
preceding the aligned address.
- Tweaked bli_membrk_acquire_m() and bli_membrk_release() to call
bli_fmalloc_align() and bli_ffree_align(), which required adding an
align_size field to the membrk_t struct.
- Pass the pack schemas directly into bli_l3_cntl_create_if() rather
than transmit them via objects for A and B.
- Simplified bli_l3_cntl_free_if() and renamed to bli_l3_cntl_free().
The function had not been conditionally freeing control trees for
quite some time. Also, removed obj_t* parameters since they aren't
needed anymore (or never were).
- Spun off OpenMP nesting code in bli_l3_thread_decorator() to a
separate function, bli_l3_thread_decorator_thread_check().
- Renamed:
bli_malloc_align() -> bli_fmalloc_align()
bli_free_align() -> bli_ffree_align()
bli_malloc_noalign() -> bli_fmalloc_noalign()
bli_free_noalign() -> bli_ffree_noalign()
The 'f' is for "function" since they each take a malloc_ft or free_ft
function pointer argument.
- Inserted various printf() calls for the purposes of tracing memory
allocation and freeing, guarded by cpp macro ENABLE_MEM_DEBUG, which,
for now, is intended to be a "hidden" feature rather than one hooked
up to a configure-time option.
- Defined bli_rntm_equals(), which compares two rntm_t for equality.
(There are no use cases for this function yet, but there may be soon.)
- Whitespace changes to function parameter lists in bli_pool.c, .h.
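The opaque alignment scheme described above (allocate extra space, then
store the original pointer just before the aligned address) can be
sketched as follows. This is a minimal illustration, not the BLIS
implementation: the names my_malloc_ft, my_free_ft, my_fmalloc_align(),
and my_ffree_align() are hypothetical stand-ins, and the sketch assumes
align_size is a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical function pointer types, standing in for malloc_ft/free_ft. */
typedef void* (*my_malloc_ft)(size_t size);
typedef void  (*my_free_ft)(void* p);

/* Allocate size bytes aligned to align_size (assumed to be a power of two)
   using the caller-supplied allocator f. */
void* my_fmalloc_align(my_malloc_ft f, size_t size, size_t align_size)
{
    assert(align_size != 0 && (align_size & (align_size - 1)) == 0);

    /* Extra space: one pointer slot plus room to round up to the alignment. */
    void* p_orig = f(size + align_size + sizeof(void*));
    if (p_orig == NULL) return NULL;

    /* Skip the pointer slot, then round up to a multiple of align_size. */
    uintptr_t addr = (uintptr_t)p_orig + sizeof(void*);
    addr = (addr + align_size - 1) & ~(uintptr_t)(align_size - 1);

    /* Store the original pointer in the slot just before the aligned address. */
    memcpy((char*)addr - sizeof(void*), &p_orig, sizeof(void*));

    return (void*)addr;
}

/* Free a block returned by my_fmalloc_align() using the caller-supplied f. */
void my_ffree_align(my_free_ft f, void* p)
{
    if (p == NULL) return;

    /* Recover the original pointer from the slot before the aligned address. */
    void* p_orig;
    memcpy(&p_orig, (char*)p - sizeof(void*), sizeof(void*));

    f(p_orig);
}
```

A caller would pass its allocator pair explicitly, for example
my_fmalloc_align(malloc, 100, 64) followed by my_ffree_align(free, p).
Because the original pointer travels with the block, a single "buf"
pointer is enough to free it later.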
Details:
- Disabled (commented out) the arm32 and arm64 configuration families
in the config_registry file. Having a configuration family registered
only makes sense if BLIS is currently outfitted with runtime hardware
detection logic to choose the appropriate sub-configuration. That
logic is currently missing for ARM architectures, and thus having the
ARM configuration families in the configuration registry only serves
to confuse people. Thanks to Devangi Parikh for suggesting this
change.
Details:
- Parameterized, reorganized, and added comments to matlab scripts in
test/mixeddt/matlab.
- Reordered some lines of code and added comments to plot_l3_perf.m in
test/3m4m/matlab.
Details:
- Updated 3m4m and mixeddt Makefiles and runme.sh scripts, mostly to
port recent changes to the former to the latter.
- Disabled (for now) code in 3m4m/test_*.c files that disables all
induced methods except for the one that is requested from the
Makefile via the IND macro. This was done because we usually want to
test whatever method is enabled automatically for complex datatypes.
(That is, when native complex microkernels are missing, we usually
want to test performance of 1m.)
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Lifted the constraint that 1m only be used when all operands' storage
datatypes (along with the computation datatype) are equal. Now, 1m may
be used as long as all operands are stored in the complex domain. This
change largely consisted of adding the ability to pack to 1e and 1r
formats from one precision to another. It also required adding logic
for handling complex values of alpha to bli_packm_blk_var1_md()
(similar to the logic in bli_packm_blk_var1()).
- Fixed a bug in several virtual microkernels (bli_gemm_md_c2r_ref.c,
bli_gemm1m_ref.c, and bli_gemmtrsm1m_ref.c) that resulted in the wrong
ukernel output preference field being read. Previously, the preference
for the native complex ukernel was being read instead of the preference
for the native real domain ukernel. This bug would not manifest if the
preference for the native complex ukernel happened to be equal to that
of the native real ukernel.
- Added support for testing mixed-precision 1m execution via the gemm
module of the testsuite.
- Tweaked/simplified bli_gemm_front() and bli_gemm_md.c so that pack
schemas are always read from the context, rather than trying to
sometimes embed them directly in the A and B objects. (They are still
embedded, but now uniformly only after reading the schemas from the
context.)
- Redefined cpp macro bli_l3_ind_recast_1m_params() as a static function
and renamed to bli_gemm_ind_recast_1m_params() (since gemm is the only
consumer).
- Added 1m optimization logic (via bli_gemm_ind_recast_1m_params()) to
bli_gemm_ker_var2_md().
- Added explicit handling for beta == 1 and beta == 0 in the reference
gemm1m virtual microkernel in ref_kernels/ind/bli_gemm1m_ref.c.
- Rewrote various level-0 macro defs, including axpyris, axpbyris,
scal2ris, and xpbyris (and their conjugating counterparts) to
explicitly support three operand types and updated invocations to
xpbyris in bli_gemmtrsm1m_ref.c.
- Query and use the storage datatype of the packed object instead of the
storage datatype of the source object in bli_packm_blk_var1().
- Relocated and renamed frame/ind/misc/bli_l3_ind_opt.h to
frame/3/gemm/ind/bli_gemm_ind_opt.h.
- Various whitespace/comment updates.
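The explicit beta == 1 and beta == 0 handling mentioned above can be
sketched for a real-domain tile update C := beta * C + AB. This is an
illustration only: update_c_tile() is a hypothetical name, the tile
layout (column-major, AB stored contiguously) is an assumption, and the
commit message does not state the motivation. One common reason to
special-case beta == 0 is that C is then overwritten rather than
scaled, so stale NaN or Inf values in C do not propagate.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Hypothetical helper: update an m x n column-major tile of C (leading
   dimension ldc) with an m x n column-major product tile ab (leading
   dimension m), computing C := beta * C + AB. */
void update_c_tile(size_t m, size_t n, double beta,
                   const double* ab, double* c, size_t ldc)
{
    for (size_t j = 0; j < n; ++j)
    {
        for (size_t i = 0; i < m; ++i)
        {
            double  ab_ij = ab[i + j * m];
            double* c_ij  = &c[i + j * ldc];

            if (beta == 0.0)
                *c_ij = ab_ij;                  /* overwrite; C is never read */
            else if (beta == 1.0)
                *c_ij += ab_ij;                 /* accumulate; no scaling */
            else
                *c_ij = beta * (*c_ij) + ab_ij; /* general case */
        }
    }
}
```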
Details:
- Added a num_t datatype bitfield to the obj_t in the form of a new
info2 field in the obj_t. This change was made primarily so that in
the case of mixed-datatype gemm, the alpha scalar would not need to
be cast to the storage datatype of B (or A) before then being cast to
the computation datatype just before the macrokernel is called. This
double-casting regime could result in loss of precision if the storage
datatype of B (or A) has lower precision than the computation
datatype. In practice, this was unlikely to be a big deal since most
usage of alpha is for -1.0, 0.0, and 1.0 (or integer multiples
thereof), which can all be represented exactly in single or double
precision.
- The type of objbits_t was changed to uint32_t, so the new format
potentially takes up the same space as the previous obj_t definition,
assuming no padding is inserted by the compiler. Shrinking info to 32
bits and spilling over into a second field was chosen over using the
high 32 bits of a single 64-bit objbits_t info field because many of
the bitwise operations are performed with enums such as num_t, dom_t,
and prec_t, which may take on the type of 32-bit ints. It's easier to
just keep all of those bitwise operations in 32 bits than perform a
million typecasts throughout bli_type_defs.h and bli_obj_macro_defs.h
to ensure that the integers are treated as 64-bit for the purposes of
the ANDs, ORs, and bitshifts.
- Many comment updates.
- Thanks to Devin Matthews and Devangi Parikh for their feedback and
involvement during this commit cycle.
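The precision loss from double-casting described above can be
demonstrated in isolation. This sketch is not BLIS code: the function
name cast_via_single() is hypothetical, and it simply models a double
precision alpha passing through a single precision storage datatype
before reaching a double precision computation datatype.

```c
#include <assert.h>

/* Hypothetical model of the double-casting path: alpha is first cast to
   a single precision storage datatype, then to the double precision
   computation datatype. */
double cast_via_single(double alpha)
{
    float alpha_stored = (float)alpha;  /* cast to storage precision */
    return (double)alpha_stored;        /* cast to computation precision */
}
```

Values such as -1.0, 0.0, and 1.0 survive the round trip unchanged,
since they are exactly representable in single precision. A value such
as 0.1 does not: cast_via_single(0.1) differs from 0.1 because the
intermediate float keeps fewer significand bits.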
Details:
- Adjusted the definition for libblis_test_get_string_for_result() in
testsuite/src/test_libblis.c so that the "FAIL" string is returned if
the computed residual contains either NaN or Inf. Previously, a
residual containing NaN would result in the selection of the "PASS"
string. Thanks to Devin Matthews for reporting this issue (#279).
- Expanded the comment for the macro definitions of bli_isnan() and
bli_isinf() in bli_misc_macro_defs.h to make it more obvious why they
must remain macros.
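The NaN issue described above follows from the fact that every ordered
comparison involving NaN evaluates to false, so a plain threshold test
never flags it. The sketch below illustrates this with hypothetical
functions and a single hypothetical threshold; it is not the testsuite
code.

```c
#include <assert.h>
#include <math.h>
#include <string.h>

/* Hypothetical naive check: a NaN residual compares false against the
   threshold and therefore falls through to "PASS". */
const char* naive_result_string(double resid, double thresh)
{
    return (resid > thresh) ? "FAIL" : "PASS";
}

/* Hypothetical corrected check: test for NaN and Inf explicitly before
   comparing against the threshold. */
const char* checked_result_string(double resid, double thresh)
{
    if (isnan(resid) || isinf(resid)) return "FAIL";

    return (resid > thresh) ? "FAIL" : "PASS";
}
```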
Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
plotting 4x5 panels of performance graphs (using the subplot()
function) for gemm, hemm, herk, trmm, and trsm across all four
floating-point datatypes. I expect to further refine these scripts as
time goes on, but their current state constitutes a good start.
Details:
- Expanded cpp guard in frame/include/bli_x86_asm_macros.h to also check
__MINGW32__ in addition to _WIN32, __clang__, and __MIC__. Thanks to
Isuru Fernando for suggesting this fix, and also to Costas Yamin for
originally reporting the issue (#277).
Details:
- Cleanups to Makefile to allow all test drivers to be built for
OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
test_herk.c.
Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
after the similarly named driver in test. Also updated the Makefile
so that blis-nat-[sm]t would trigger builds for the new driver.
Details:
- Fixed a bug in frame/3/bli_l3_oapi.c in the conditional that divides
use of induced method (1m) execution from native execution. The former
was intended to only be used in cases where all storage datatypes are
complex and the datatype of C is equal to the computation datatype.
(If mixed datatypes are detected, native execution would be used.)
However, the code in bli_gemm() was erroneously checking the execution
datatype instead of the computation datatype. At that point, the
execution datatype is guaranteed to be equal to the storage datatype
even if the computation datatype contains a different value. Thanks to
Devangi Parikh for helping to isolate this bug.
Details:
- Added debug output to bli_malloc.c in order to debug certain kinds of
memory behavior in BLIS. The printf() statements are disabled and must
be enabled manually.
- Whitespace/comment updates in bli_membrk.c.
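Debug output of the kind described above is typically compiled in only
when a cpp macro is defined. The sketch below shows one way to do this;
the macro MY_ENABLE_MEM_DEBUG and the functions traced_malloc() and
traced_free() are hypothetical names, not the ones used in
bli_malloc.c.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Uncomment (or pass -DMY_ENABLE_MEM_DEBUG) to enable tracing. The macro
   name is hypothetical. */
/* #define MY_ENABLE_MEM_DEBUG */

/* Allocate size bytes, printing a trace line when tracing is enabled. */
void* traced_malloc(size_t size)
{
    void* p = malloc(size);

#ifdef MY_ENABLE_MEM_DEBUG
    printf("traced_malloc(): allocated %zu bytes at %p\n", size, p);
#endif

    return p;
}

/* Free a block, printing a trace line when tracing is enabled. */
void traced_free(void* p)
{
#ifdef MY_ENABLE_MEM_DEBUG
    printf("traced_free(): freeing block at %p\n", p);
#endif

    free(p);
}
```

With the macro undefined, the printf() calls are removed by the
preprocessor, so the tracing adds no runtime cost.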