Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
unimplemented kernel functions simply call the corresponding reference
implementation. (Previously, these unimplemented functions would
abort() with a "not yet implemented" message.)
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
instead of setm(), scal2m().
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
that already existed in kernels/x86_64/core2-sse3/3.
Details:
- Removed restrict declaration from b_cast and c_cast from
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
are causing problems for xlc only in those two files and no other
macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
macro-kernels. Previously, all restricted pointers were being declared
at the outer-most function scope level. While this violates the C99
standard, very few of the compilers used with BLIS so far have seemed
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
for identifying this bug (and suggesting the fix).
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
Thanks to Francisco Igual for contributing these kernels and
configurations.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
Details:
- Fixed bugs similar to those addressed in cca1e1f51d, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
in an "unsafe" manner, such that very large and very small values were
unnecessarily over/under-flowing. Thanks for Vladimir Sukharev for
reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
trans parameter of either operation, it is (a) allowed, and (b) treated
as 'T' (whereas previously it was disallowed). Thanks for Vladimir
Sukharev for finding and reporting this bug.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
the Sandy Bridge architecture, including a dgemm ukernel coded with
AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
computed correctly (ie: do not wraparound) at the edge cases. Thanks to
Tze Meng for helping me identify this bug.
Details:
- Updated the test suite driver (and modules where appropriate) so that
the level-1f fusing factors are output along with the variable dimension.
While this is not strictly necessary, since the fusing factors are output
in the initial parameter summary, it allows extra reassurance to the user
since the fusing factors appear alongside the variable dimension, which
together give a complete picture of the problem size. Similar changes were
made for outputting the register blocksizes when reporting results for the
micro-kernel test modules.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
this out.
Details:
- Added new lines of input to the test suite's input.operations file, which
allows the user to disable entire sections (levels) of tests. Before this
change, the user had to manually disable each operation tests's "master
switch". (This is why input.operations.0 existed: to allow a more
convenient starting point for someone who only wanted to test one or a
few operations.)
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
that directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loogson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".
Details:
- Changed the configure script so that it no longer defaults to the
reference configuration. This change is being made so that the
developer has a firm awareness of which configuration is being used
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
suggested change.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
compilers) when such variables are only used in the complex instances of
the functions. Special thanks to Karl Rupp for suggesting a portable fix
for these warnings.
Details:
- Commented out a !include statement that was attempting to include a
version file that does not yet exist. For now, the version string is
hard-coded into defs.mk.
Details:
- Added a 'windows' directory, which contains a Windows build system
similar to that of libflame's. Thanks to Martin for getting this up
and running.
- Spun off system header #includes into bli_system.h, which is included
in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
by the same name. This change is needed in advance of the upcoming Windows
build system.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
operation tests.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.
Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
The fix involves casting both numerator and denominator to unsigned long.
This fix is more voodoo than science, as I can't be sure why it even
works.