Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.
Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:
Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1
In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.
Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
Details:
- Redefined many of the macros that define bit fields and bit values in
the obj_t info field using the bitshift operator (<<). This makes it
easier to reorder bit fields, or expand existing bit fields, or add
new fields. The bitshifting should be evaluated by the compiler at
compile-time.
Details:
- Instead of inferring the storage format of the micro-panels from within
the packm variants, we now pass in a bool_t value that denotes whether
the packed matrix contains row-stored column panels or column-stored
row panels. This value can then be tested more easily inside the main
packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
now five bits, each with different meaning:
- 4: packed or not packed?
- 3: packed for 3m?
- 2: packed for 4m?
- 1: packed to panels?
- 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
subfield, and renamed some existing macros related to 4m/3m.
Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
and column-stored micro-panels when MR or NR is unit. When either
register blocksize (or both) is equal to one, inspecting the strides of
the affected packed micro-panel is no longer sufficient to determine
whether the micro-panel is a row-stored column panel or a column-stored
row panel (because both strides are unit). At that point, dimension
information is necessary when invoking the bli_is_row_stored_f() and
bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
packm_init() and then passed into the blocked variants to support the
aforementioned update.
Details:
- Added setid call (to zero imaginary parts of diagonal elements) to
early return branches of herk_front() and her2k_front() for cases
where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
- Comment update.
Details:
- Updated herk_front() and her2k_front() to explicitly set the imaginary
components of the diagonal entries of C to zero after the computation
is complete. This is needed in case downstream applications read the
full diagonal entries (i.e., including imaginary part), which could, in
the absence of this modification, accumulate numerical error from
subsequent rank-k/rank-2k updates.
- Updated BLAS compatibility wrappers for herk and her2k to return early
if:
n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
This also results in the imaginary components of diagonal entries NOT
being set to zero (see above), which is consistent with BLAS.
- Updated mkherm to use setid instead of an inlined loop over the
diagonal.
Details:
- Defined a new level-1d operation, setid, which sets the imaginary
elements of an object's diagonal to a single scalar. This can be
useful, for example, when trying to make the diagonal of a Hermitian
matrix real-valued.
Details:
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
for thread-safety. Also added lots of explanatory comments.
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
suffix, and reimplemented for simplicity. Updated all invocations
in BLAS compatibility layer to use _auto() suffix.
Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
after recursive-sed.sh mistakenly replaced them with copies of the
actual files to which they referred. Meant to include this in previous
commit.
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
Details:
- Updated level-2 and level-3 internal back-ends so that the operation's
_check() function is called BEFORE any attempt to return early due to
the presence of zero dimensions. This ordering makes more sense because
(for example) object dimensions should match even if one of them is
zero. Previously, a dimension mismatch could result in an early return
with no error message.
- Updated bli_check_object_buffer() so that NULL buffers result in an
error only if the object is dimensionally non-empty (i.e., only if both
of the object's dimensions are non-zero). This allows BLIS operations
to be performed on dimensionally empty objects (i.e., where at least one
dimension is zero).
- Updated the error message associated with bli_check_object_buffer()
to mention the newly relaxed constraint mentioned above, vis-a-vis
non-zero dimensions.
Details:
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
with actual implementations. (These routines are so rarely used that
this log message will probably lead to some people learning of their
existence for the first time.)
Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
y of non-zero length and a vector x of zero length results in no action.
Given that the operation is y := beta*y + A*x, many (most?) individuals
would expect vector y to still be scaled by beta. BLIS, when called
natively, handles these cases intuitively (with beta scaling).
Unfortunately, many BLAS test suites actually check for the way this
situation is handled. Therefore, we have decided to implement this "bug"
in the compatibility layer so as to provide "bug-for-bug" compatibility
with BLAS.
Details:
- Added a new API family, bli_info_*(), which can be used to query
information about how BLIS was configured. Most of these values are
returned as gint_t, with the exception of the version string which
is char*.
- Changed how the testsuite driver queries information about how BLIS
was configured (from using macro constants directly to using the
new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
an actual naming conflict, but because the name 'info_t' would now be
somewhat misleading in the presence of the new bli_info API, as the
two are unrelated).
Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
Details:
- Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
(instead of scaling by beta) when beta is zero. This will stamp out
any possible infs or NaNs in the output matrix, if it happens to be
uninitialized. Thanks to Tony Kelman for isolating this bug.
Details:
- Added wrappers for micro-kernels so that users may invoke the
micro-kernels without knowing what the function names actually are.
This is useful when an application wishes to call the micro-kernel
from a shared library instance of BLIS, where the application may not
necessarily have the luxury of grabbing the micro-kernel name(s) from
C preprocessor macros at compile-time. Also, since the wrappers use
void* pointers, one's environment does not need to be aware of some
BLIS types such as scomplex and dcomplex. These wrappers now join the
level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
test suite modules, and replaced calls to them with calls to the new
wrappers mentioned above.
Should work fine for small number of threads (up to 8 or maybe even 16).
However, performance is yet untested.
This parallelizes the "JR" loop for the left sided cases
and the "IR" loop for the right sided cases.
Future work is to parallelize the outer loops as well.
Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.