Details:
- Fixed issue #115 by adding implementations for scabs1_() and dcabs1_()
to the BLAS compatibility layer. Thanks to heroxbd for pointing out
their absence.
Details:
- Fixed a bug in the configure script whereby a non-preferred value for
--enable-threading would cause problems in common.mk vis-a-vis detecting
which threading model was chosen. Thanks to heroxbd for reporting this
issue.
Details:
- Disabled the implementation of trsm_r that allows the right-hand matrix
B to be trianglar, and switched to the implementation that simply
transposes the operation (and thus the storage of C) in order to recast
the operation as trsm_l. This avoids the need to use trsm_rl and trsm_ru
macrokernels, which require an awkward swapping of MR and NR. For now,
the support for trsm_r macrokernels, via separate control trees, remains.
- Modified bli_config_macro_defs.h so that BLIS_RELAX_MCNR_NCMR_CONSTRAINTS
is defined by default. This is mostly a safety precaution in case someone
tries to switch back to the previous trsm_r implementation, but also
serves as a convenience on some systems where one does not naturally
choose blocksizes in a way that satisfies MC % NR = 0 and NC % MR = 0.
Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)
Details:
- Updated the changes introduced in 618f433 so that the strides of the
temporary microtile ct used in the macrokernels is determined based
on the storage preference of the microkernel (via the new functions
below), rather than the strides of c. In almost all cases, presently,
this change results in no net effect, as a high-level optimization
in the _front() functions aligns the storage of c to that of the
microkernel's preference. However, I encountered some cases where
this is not always the case in some development code that has yet
to be committed, and therefore I'm generalizing the framework code
in advance.
- Defined two new functions in bli_cntx.c:
bli_cntx_l3_ukr_prefers_rows_dt()
bli_cntx_l3_ukr_prefers_cols_dt()
which return bool_t's based on the current micro-kernel's storage
preferences. For induced methods, the preference of the underlying
real domain microkernel is returned.
- Updated definition of bli_cntx_l3_ukr_dislikes_storage_of(), and
by proxy bli_cntx_l3_ukr_prefers_storage_of(), to be in terms of
the above functions, rather than querying the preferences of the
native microkernel directly (which did the wrong thing for induced
methods).
Details:
- Changed a cpp macro that was meant to prevent using certain trsm_r code
if BLIS_RELAX_MCNR_NCMR_CONSTRAINTS was defined. It was actually coded
incorrectly at first. I've now fixed its location and changed its
consequence to a compile-time #error message.
- Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
- Threads are assigned to loops (ic, jc, ir, and jc) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
- All level-3 BLAS covered.
Details:
- Consolidated the macros that define the lower and upper versions of the
gemmtrsm microkernels into a single macro that is instantiated twice.
Did this for both 3m1 and 4m1 microkernels.
- Consolidated lower and upper versions of the trsm microkernels for 3m1
and 4m1 into single files (each).
Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
needed when handling right-side trsm by allowing the matrix on the
right (matrix B) to be triangular, because it involves swapping
register, but not cache, blocksizes (packing A by NR and B by MR)
and then swapping the operands to gemmtrsm just before that kernel
is called. It may be useful to disable these constraints if, for
example, the developer wishes to test the configuration with
a different set of cache blocksizes where only MC % MR = 0 and
NC % NR = 0 are enforced.
- In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
the enforcement of MC % NR = 0 and NC % MR = 0.