Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)
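For readers unfamiliar with the term, the broadcast-based scheme can be sketched in plain C roughly as follows. This is only an illustration, not the assembly in bli_gemm_asm_d4x12.c: the function name ref_gemm_bcast_4x12 is hypothetical, and the packed layout (k columns of MR elements for A, k rows of NR elements for B) and the row-major C tile are assumptions made for the example.

```c
#include <stddef.h>

enum { MR = 4, NR = 12 };  /* double-precision tile shape named above */

/* Hypothetical reference routine (not BLIS code): C += A * B for one
   MR x NR tile. a holds k columns of MR contiguous elements, b holds
   k rows of NR contiguous elements (assumed packed layout), and c is
   row-major with row stride ldc. */
static void ref_gemm_bcast_4x12(size_t k, const double *a, const double *b,
                                double *c, size_t ldc)
{
    for (size_t p = 0; p < k; ++p) {
        const double *bp = b + p * NR;         /* one row of B, reused for every row of C */
        for (size_t i = 0; i < MR; ++i) {
            const double aip = a[p * MR + i];  /* one element of A, "broadcast" across the row */
            for (size_t j = 0; j < NR; ++j)    /* stands in for the vector lanes */
                c[i * ldc + j] += aip * bp[j];
        }
    }
}
```

Each element of A is replicated across a whole row of the tile, so the inner loop needs no shuffling of B; in a vectorized kernel that loop would presumably map onto vector registers.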
Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
constraints.
- Added missing c and z microkernels, which are based on the corresponding
kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
is desirable to have a microkernel with a column preference).
Details:
- Updated the top-level Makefile, build/config.mk.in template, and
configure script so that object files corresponding to source files
belonging to the BLAS compatibility layer are not compiled (or archived)
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
of converting (overwriting) some, such as enable_blas2blis and
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the "1" or "0"
values are now stored in new variables that live alongside the originals
(with the suffix "_01"). This is convenient since some values need to be
sed-substituted into the config.mk.in template, which requires "yes" or
"no", while some need to be written to the bli_config.h.in template,
which requires "0" or "1".
Updated BLIS4 TOMS citation in README.md.
Added complex gemm micro-kernels for haswell.
Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).
Change-Id: I512ab90784ecbb7cdaee24928d2ccebb544ba5c1
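The induced transpose mentioned above rests on a standard identity: a column-major m x n buffer is the same memory as a row-major n x m buffer holding the transpose, so C = A*B in column storage can be computed as C^T = B^T * A^T by a row-preferring routine. A minimal sketch, assuming contiguous storage; gemm_rowmajor and gemm_colmajor are hypothetical names, not BLIS functions:

```c
#include <stddef.h>

/* Hypothetical row-preferring reference gemm (not BLIS code):
   C (m x n) += A (m x k) * B (k x n), all row-major and contiguous. */
static void gemm_rowmajor(size_t m, size_t n, size_t k,
                          const double *a, const double *b, double *c)
{
    for (size_t i = 0; i < m; ++i)
        for (size_t p = 0; p < k; ++p)
            for (size_t j = 0; j < n; ++j)
                c[i * n + j] += a[i * k + p] * b[p * n + j];
}

/* Column-major caller: C (m x n) += A (m x k) * B (k x n), all
   column-major. Reinterpreting each buffer as the row-major transpose,
   this computes C^T += B^T * A^T by swapping the operands and the
   m and n dimensions. No data is moved. */
static void gemm_colmajor(size_t m, size_t n, size_t k,
                          const double *a, const double *b, double *c)
{
    gemm_rowmajor(n, m, k, b, a, c);
}
```

The caller never sees the swap, which is why the kernel's storage preference is mostly invisible at the level-3 interface.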
Details:
- Relaxed the base pointer and leading dimension alignment restrictions
in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
instead of vmovaps/vmovapd. These changes mimic those made to the haswell
microkernels in e0d2fa0 and ee2c139.
- Updated testsuite modules as well as standalone test drivers in 'test'
directory to use DBL_MAX as the initial time candidate. Thanks to Devin
Matthews for suggesting this change.
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
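The DBL_MAX change above follows the usual best-of-N timing pattern: seed the running minimum with the largest finite double so that the first real measurement always replaces it. A minimal sketch, not the testsuite code; min_time, best_of, and busy_work are hypothetical names, and clock() (CPU time) stands in for whatever timer the drivers actually use:

```c
#include <float.h>  /* DBL_MAX */
#include <time.h>   /* clock, CLOCKS_PER_SEC */

/* Hypothetical helper (not a BLIS API): keep the smaller of two timings. */
static double min_time(double best, double candidate)
{
    return candidate < best ? candidate : best;
}

/* Placeholder workload standing in for the operation being timed. */
static void busy_work(void)
{
    volatile double x = 0.0;
    for (int i = 0; i < 100000; ++i)
        x += (double)i;
}

/* Run fn n_repeats times and return the smallest CPU time in seconds. */
static double best_of(int n_repeats, void (*fn)(void))
{
    double best = DBL_MAX;  /* initial candidate: any measurement is smaller */
    for (int r = 0; r < n_repeats; ++r) {
        clock_t t0 = clock();
        fn();
        double elapsed = (double)(clock() - t0) / CLOCKS_PER_SEC;
        best = min_time(best, elapsed);
    }
    return best;
}
```

A call such as best_of(3, busy_work) then returns the fastest of three runs. Seeding with DBL_MAX avoids having to pick an arbitrary "large enough" starting value.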
Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.