Details:
- Modified all control tree node definitions to include a new field of
type func_t*, which is similar to a blksz_t except that it contains
one function pointer (each typed simply as void*) for each datatype.
We use the func_t* to embed pointers to the micro-kernels to use for
the leaf-level nodes of each control tree. This change is a natural
extension of control trees and will allow more flexibility in the
future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
from the incomming (previously ignored) control tree node and then pass
the queried pointer into the datatype-specific macro-kernel code, which
then casts the pointer to the appropriate type (new typedefs residing
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
is, determined when the datatype-specific macro-kernel functions are
instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
base names if they do not exist already, and then uses those to build
datatype-specific micro-kernel function names. This will allow
developers extra flexibility if they wanted to, for example, name each
of their datatype-specific micro-kernels differently (e.g. double
real might be named bli_dgemm_opt_4x4() while double complex might be
named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
and initializes a func_t object for the corresponding micro-kernels.
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
and then reused via extern wherever possible.
Details:
- Removed commented-out code from macro-kernels that was supposed to
facilitate implementing mixed domain (complex times real) matrix
multiplication. This functionality is still (probably possible),
but I'm getting tired of looking at the code every time I edit
a macro-kernel. Plus, there are probably ways of doing it at a
higher level, via control trees.
Details:
- Removed b_aux field from all control tree node definitions. This field
was being used in certain optimizations (incremental blocking) that were
not actually being employed within BLIS, and are probably not employed
by others.
- Updated all _cntl_obj_create() function definitions and invocations
according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
incremental blocking, but which was never called by BLIS itself.
Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
makes room in the three pack_t bits of the info field of obj_t so that
two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
(Replaced "is contiguous" with more accurate "has unit stride".)
Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
would be uninstalled upon executing "make uninstall-old". Before, if the
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
or directory" message was emitted. This message was harmless, but is now
suppressed in this situation.
Details:
- In the test suite driver, inserted an explicit typecast of the return
value of bli_getopt() prior parsing. The lack of typecast caused a
problem on at least one system whereby a return value of -1 was
interpreted as garbage character. Thanks to Francisco Igual for finding
and submitting this fix.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
a custom version of getopt(), which may be used to parse command line
options passed into a program via argc/argv. I am implementing this
function myself, as opposed to using the version available via unistd.h,
for portability reasons, as the only requirements are string.h (which
is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
path) to the parameters and operations input files: -g may be used to
specify the general input file and -o to specify the operations input
file). If -g or -o or both are not given, default filenames are assumed
(as well as their existence in the current directory).
Details:
- Updated template micro-kernel implementations (located in
config/template/kernels), to adhere to the new auxinfo_t interface.
Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
and the BLAS compatibility layer).
Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
If the char value input to these functions was not one of the defined
values, bli_check_error_code() with the appropriate error code value
would be called, resulting in an abort(). This was unnecessary and
redundant since these routines are currently only used within the
BLAS compatibility layer, and they are only called AFTER parameter
checking has already been performed on the original BLAS char values.
If the application tried to override xerbla() to prevent an abort()
from being called, this error checking would still get in the way.
Thus, instead of reporting the error situation to the framework (ie:
calling abort()), an arbitrary BLIS parameter value is now chosen and
the function returns normally. Thanks to Jeff Hammond for finding and
reporting this issue.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
and trsm macro-kernels. Whereas before we stored whatever value was
provided to the macro-kernel implementation via ps_a/ps_b, now we
store the stride that will advance to the next variable-length
micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
Details:
- Replaced conditional expressions in macro-kernels related to computing
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
with a pointer to a new datatype, auxinfo_t, which is simply a struct
that holds a_next and b_next. The struct may hold other auxiliary
information that may be useful to a micro-kernel, such as micro-panel
stride. Micro-kernels may access struct fields via accessor macros
defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
as well as macro-kernels (for declaring and initializing the structs)
according to above change.
Details:
- Added set of basic scalar macros that take arguments' real and
imaginary components separately, named like the previous set except
with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
"whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
for x86_64/core2 were calling the wrong reference code (l instead
of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
Details:
- Added infrastructure to support a new scalar representation, whereby
every object contains an internal scalar that defaults to 1.0. This
facilitates passing scalars around without having to house them in
separate objects. These "attached" scalars are stored in the internal
atom_t field of the obj_t struct, and are always stored to be the same
datatype as the object to which they are attached. Level-3 variants no
longer take scalar arguments, however, level-3 internal back-ends stll
do; this is so that the calling function can perform subproblems such
as C := C - alpha * A * B on-the-fly without needing to change either
of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):
bli_obj_init_scalar_copy_of()
-> bli_obj_scalar_init_detached_copy_of()
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
bli_obj_create_scalar_with_attached_buffer()
-> bli_obj_create_1x1_with_attached_buffer()
bli_obj_scalar_equals() -> bli_obj_equals()
- Defined new functions:
bli_obj_scalar_detach()
bli_obj_scalar_attach()
bli_obj_scalar_apply_scalar()
bli_obj_scalar_reset()
bli_obj_scalar_has_nonzero_imag()
bli_obj_scalar_equals()
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
bli_obj_is_scalar() -> bli_obj_is_1x1()
- Defined new macros to set and copy internal scalars between objects:
bli_obj_set_internal_scalar()
bli_obj_copy_internal_scalar()
- In level-3 internal back-ends, added conditional blocks where alpha and
beta are checked for non-unit-ness. Those values for alpha and beta are
applied to the scalars attached to aliases of A/B/C, as appropriate,
before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
attached to A and B are multiplied together to obtain alpha, while beta
is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
future support for mixed domain/precision. These can be added back later
once that functionality is given proper treatment. Also, removed the
creating of copy-casts of alpha and beta since typecasting of scalars
is now implicitly handled in the internal back-ends when alpha and
beta are applied to the attached scalars.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
datatypes call the corresponding reference kernel. Previously, these
kernel functions called abort() with a "not yet implemented" error
message.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
unimplemented kernel functions simply call the corresponding reference
implementation. (Previously, these unimplemented functions would
abort() with a "not yet implemented" message.)
Details:
- Removed does_scale field from packm control tree node and
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
_cntl_obj_create() accordingly.
- Redefined/renamted macros that are used in aliasing so that now,
bli_obj_alias_to() does a full alias (shallow copy) while
bli_obj_alias_for_packing() does a partial alias that preserves the
pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
to support heterogeneous datatypes.
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
instead of setm(), scal2m().
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
that already existed in kernels/x86_64/core2-sse3/3.
Details:
- Removed restrict declaration from b_cast and c_cast from
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
are causing problems for xlc only in those two files and no other
macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
macro-kernels. Previously, all restricted pointers were being declared
at the outer-most function scope level. While this violates the C99
standard, very few of the compilers used with BLIS so far have seemed
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
for identifying this bug (and suggesting the fix).
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
Thanks to Francisco Igual for contributing these kernels and
configurations.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
Details:
- Fixed bugs similar to those addressed in cca1e1f51d, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.