Details:
- After merging PR #303, at Isuru's request, I removed the use of
BLIS_EXPORT_BLIS from all function prototypes *except* those that we
potentially wish to be exported in shared/dynamic libraries. In other
words, I removed the use of BLIS_EXPORT_BLIS from all prototypes of
functions that can be considered private or for internal use only.
This is likely the last big modification along the path towards
implementing the functionality spelled out in issue #248. Thanks
again to Isuru Fernando for his initial efforts of sprinkling the
export macros throughout BLIS, which made removing them where
necessary relatively painless. Also, I'd like to thank Tony Kelman,
Nathaniel Smith, Ian Henriksen, Marat Dukhan, and Matthew Brett for
participating in the initial discussion in issue #37 that was later
summarized and restated in issue #248.
- CREDITS file update.
Details:
- Removed explicit reference to The University of Texas at Austin in the
third clause of the license comment blocks of all relevant files and
replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
with format of all other comment blocks.
Details:
- Implemented support for gemm where A, B, and C may have different
storage datatypes, as well as a computational precision (and implied
computation domain) that may be different from the storage precision
of either A or B. This results in 128 different combinations, all
which are implemented within this commit. (For now, the mixed-datatype
functionality is only supported via the object API.) If desired, the
mixed-datatype support may be disabled at configure-time.
- Added a memory-intensive optimization to certain mixed-datatype cases
that requires a single m-by-n matrix be allocated (temporarily) per
call to gemm. This optimization aims to avoid the overhead involved in
repeatedly updating C with general stride, or updating C after a
typecast from the computation precision. This memory optimization may
be disabled at configure-time (provided that the mixed-datatype
support is enabled in the first place).
- Added support for testing mixed-datatype combinations to testsuite.
The user may test gemm with mixed domains, precisions, both, or
neither.
- Added a standalone test driver directory for building and running
mixed-datatype performance experiments.
- Defined a new variation of castm, castnzm, which operates like castm
except that imaginary values are not touched when casting a real
operand to a complex operand. (By contrast, in these situations castm
sets the imaginary components of the destination matrix to zero.)
- Defined bli_obj_imag_is_zero() and substituted calls in lieu of all
usages of bli_obj_imag_equals() that tested against BLIS_ZERO, and
also simplified the implementation of bli_obj_imag_equals().
- Fixed bad behavior from bli_obj_is_real() and bli_obj_is_complex()
when given BLIS_CONSTANT objects.
- Disabled dt_on_output field in auxinfo_t structure as well as all
accessor functions. Also commented out all usage of accessor
functions within macrokernels. (Typecasting in the microkernel is
still feasible, though probably unrealistic for now given the
additional complexity required.)
- Use void function pointer type (instead of void*) for storing function
pointers in bli_l0_fpa.c.
- Added documentation for using gemm with mixed datatypes in
docs/MixedDatatypes.md and example code in examples/oapi/11gemm_md.c.
- Defined level-1d operation xpbyd and level-1m operation xpbym.
- Added xpbym test module to testsuite.
- Updated frame/include/bli_x86_asm_macros.h with additional macros
(courtsey of Devin Matthews).
Details:
- Removed four trailing spaces after "BLIS" that occurs in most files'
commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
Details:
- Implemented a new function, bli_obj_imag_equals(), which compares the
imaginary part of the first argument to the second argument, which may
be a BLIS_CONSTANT or of a regular real datatype.
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
Details:
- Added infrastructure to support a new scalar representation, whereby
every object contains an internal scalar that defaults to 1.0. This
facilitates passing scalars around without having to house them in
separate objects. These "attached" scalars are stored in the internal
atom_t field of the obj_t struct, and are always stored to be the same
datatype as the object to which they are attached. Level-3 variants no
longer take scalar arguments, however, level-3 internal back-ends stll
do; this is so that the calling function can perform subproblems such
as C := C - alpha * A * B on-the-fly without needing to change either
of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):
bli_obj_init_scalar_copy_of()
-> bli_obj_scalar_init_detached_copy_of()
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
bli_obj_create_scalar_with_attached_buffer()
-> bli_obj_create_1x1_with_attached_buffer()
bli_obj_scalar_equals() -> bli_obj_equals()
- Defined new functions:
bli_obj_scalar_detach()
bli_obj_scalar_attach()
bli_obj_scalar_apply_scalar()
bli_obj_scalar_reset()
bli_obj_scalar_has_nonzero_imag()
bli_obj_scalar_equals()
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
bli_obj_is_scalar() -> bli_obj_is_1x1()
- Defined new macros to set and copy internal scalars between objects:
bli_obj_set_internal_scalar()
bli_obj_copy_internal_scalar()
- In level-3 internal back-ends, added conditional blocks where alpha and
beta are checked for non-unit-ness. Those values for alpha and beta are
applied to the scalars attached to aliases of A/B/C, as appropriate,
before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
attached to A and B are multiplied together to obtain alpha, while beta
is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
future support for mixed domain/precision. These can be added back later
once that functionality is given proper treatment. Also, removed the
creating of copy-casts of alpha and beta since typecasting of scalars
is now implicitly handled in the internal back-ends when alpha and
beta are applied to the attached scalars.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.