Updated copyright information for kernels/zen/bli_trsm_small.c file
Removed separate kernels for zen2 architecture
Instead, added threshold conditions in the zen kernels for both ROME and NAPLES
Change-Id: Ifd715731741d649b6ad16b123a86dbd6665d97e5
config/zen/bli_family_zen.h: deleted macro BLIS_ENBLE_ZEN_BLOCK_SIZES
config/zen/make_defs.mk: removed compiler flag -mno-avx256-split-unaligned-store
frame/base/bli_cpuid.c: ROME is family 17h, but its model numbers start at 0x30.
test/test_gemm.c: commented out #define FILE_IN_OUT (enabling it caused a compilation error when BLIS is configured as amd64)
Now a single configuration, ./configure amd64, works for both ROME and Naples
Change-Id: I91b4fc35380f8a35b4f4c345da040c6b5910b4a2
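The ROME/Naples distinction depends on the family and model fields reported by CPUID leaf 1. The following is a minimal sketch of that decoding. It assumes AMD's convention that the extended family/model bits are folded in when the base family is 0xF. The type and function names are hypothetical, and the "model >= 0x30 means ROME" rule comes from the commit text, not from bli_cpuid.c itself.

```c
#include <stdint.h>

/* Family and model as decoded from CPUID leaf 1 (hypothetical type). */
typedef struct
{
    uint32_t family;
    uint32_t model;
} cpu_sig_t;

/* Decode the EAX value of CPUID leaf 1. On AMD parts the extended
   family/model fields are folded in only when the base family is 0xF. */
static cpu_sig_t decode_cpuid_eax( uint32_t eax )
{
    uint32_t base_model  = ( eax >>  4 ) & 0xF;
    uint32_t base_family = ( eax >>  8 ) & 0xF;
    uint32_t ext_model   = ( eax >> 16 ) & 0xF;
    uint32_t ext_family  = ( eax >> 20 ) & 0xFF;

    cpu_sig_t sig = { base_family, base_model };

    if ( base_family == 0xF )
    {
        sig.family = base_family + ext_family;
        sig.model  = ( ext_model << 4 ) | base_model;
    }
    return sig;
}

/* Assumed rule, taken from the commit text: family 17h parts with a
   model number of 0x30 or above are treated as ROME. */
static int is_rome( cpu_sig_t sig )
{
    return sig.family == 0x17 && sig.model >= 0x30;
}

/* Remaining family 17h models are treated as Naples in this sketch. */
static int is_naples( cpu_sig_t sig )
{
    return sig.family == 0x17 && sig.model < 0x30;
}
```

With this decoding, the illustrative EAX value 0x00830F10 yields family 0x17, model 0x31 (ROME), and 0x00800F12 yields family 0x17, model 0x01 (Naples).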
A macro 'FILE_IN_OUT' is defined to read input parameters from a csv file.
Format for input file:
Each line defines a gemm problem with the following parameters: m k n cs_a cs_b cs_c
The operation performed is always C = C - A*B, with all matrices stored in column-major format.
When the macro is disabled, the test reverts to its original implementation.
Usage: ./test_gemm_<mkl/blis/openblas>.x input.csv output.csv
GEMM is called through the BLAS interface.
For BLIS, the test application also prints either 'S', indicating the small gemm routine, or 'N', indicating conventional BLIS gemm.
For MKL/OpenBLAS, ignore this character.
Change-Id: I0924ef2c1f7bdea48d4cdb230b888e2af2c86a36
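The input format and operation described above can be sketched as follows. One function parses a line, and a reference loop computes C = C - A*B, which is the BLAS dgemm case alpha = -1, beta = 1. This is not the test_gemm.c code. The struct and function names are hypothetical. Double precision is assumed, and cs_a, cs_b, and cs_c are assumed to be column strides (leading dimensions). Commas are treated like spaces because the actual delimiter of the csv file is not stated.

```c
#include <stdio.h>
#include <string.h>

/* One gemm problem as described by a line of the input file. */
typedef struct
{
    int m, k, n;
    int cs_a, cs_b, cs_c;
} gemm_prob_t;

/* Parse "m k n cs_a cs_b cs_c"; commas are treated as whitespace so either
   separator works (an assumption). Returns 1 on success, 0 otherwise. */
static int parse_gemm_line( const char* line, gemm_prob_t* p )
{
    char buf[ 256 ];
    strncpy( buf, line, sizeof( buf ) - 1 );
    buf[ sizeof( buf ) - 1 ] = '\0';
    for ( char* s = buf; *s; ++s )
        if ( *s == ',' ) *s = ' ';

    return sscanf( buf, "%d %d %d %d %d %d",
                   &p->m, &p->k, &p->n,
                   &p->cs_a, &p->cs_b, &p->cs_c ) == 6;
}

/* Reference C := C - A*B, column-major: A is m x k (column stride cs_a),
   B is k x n (cs_b), C is m x n (cs_c). */
static void ref_gemm_minus( const gemm_prob_t* p,
                            const double* a, const double* b, double* c )
{
    for ( int j = 0; j < p->n; ++j )
        for ( int l = 0; l < p->k; ++l )
        {
            double blj = b[ l + j * p->cs_b ];
            for ( int i = 0; i < p->m; ++i )
                c[ i + j * p->cs_c ] -= a[ i + l * p->cs_a ] * blj;
        }
}
```

A line such as "2,2,2,2,2,2" describes a 2x2x2 problem with all column strides equal to 2.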
Details:
- Replaced direct usage of _Pragma( "omp simd" ) in reference kernels
with PRAGMA_SIMD, which is defined as a function of the compiler being
used in a new bli_pragma_macro_defs.h file. That definition is cleared
when BLIS detects that the -fopenmp-simd command line option is
unsupported. Thanks to Devin Matthews and Jeff Hammond for suggestions
that guided this commit.
- Updated configure and bli_config.h.in so that the appropriate anchor
is substituted in (when the corresponding pragma omp simd support is
present).
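The following sketch shows one way a compiler-dependent PRAGMA_SIMD could be written. It is not the contents of bli_pragma_macro_defs.h. HYP_ENABLE_OMP_SIMD is a hypothetical stand-in for the anchor that configure substitutes when -fopenmp-simd is supported. When that macro is absent, PRAGMA_SIMD expands to nothing, mirroring the cleared definition described above.

```c
#include <stddef.h>

/* Hypothetical guard: only emit the pragma when configure has confirmed
   -fopenmp-simd support and a known compiler is in use. */
#if defined( HYP_ENABLE_OMP_SIMD ) && \
    ( defined( __GNUC__ ) || defined( __clang__ ) || defined( __INTEL_COMPILER ) )
  #define PRAGMA_SIMD _Pragma( "omp simd" )
#else
  #define PRAGMA_SIMD
#endif

/* y := y + alpha * x, written as a simple loop that the compiler may
   vectorize when PRAGMA_SIMD expands to the omp simd pragma. */
static void axpyv_sketch( size_t n, double alpha, const double* x, double* y )
{
    PRAGMA_SIMD
    for ( size_t i = 0; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```

Because the macro collapses to nothing when unsupported, kernel source can use PRAGMA_SIMD unconditionally.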
Details:
- Changed all occurrences of
micro-kernel -> microkernel
macro-kernel -> macrokernel
micro-panel -> micropanel
in all markdown documents in the 'docs' directory. This change is being
made since we've reached the point in adoption and acceptance of
BLIS's insights where words such as "microkernel" are no longer new,
and therefore now merit being unhyphenated.
- Updated the "Implementation Notes" sections of KernelsHowTo.md, which
still contained references to nonexistent cpp macros such as
BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
'make check' and 'make check-fast' when running from the local
testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
the TESTSUITE_WRAPPER variable, which at first glance appears to serve
no purpose.
Details:
- Fixed code in the skx subconfiguration that became a bug after
committing bdd46f9. Specifically, the bli_cntx_init_skx() function
was overwriting default blocksizes for the scomplex and dcomplex
microkernels despite the fact that only single and double real
microkernels were being registered. This was not a problem prior to
bdd46f9 since all microkernels used dynamically-queried (at runtime)
register blocksizes for loop bounds. However, post-bdd46f9, this
became a bug because the reference ukernels for scomplex and dcomplex
were written with their register blocksizes hard-coded as constant
loop bounds, which conflicted with the erroneous scomplex and dcomplex
values that bli_cntx_init_skx() was setting in the context. The
lesson here is that, going forward, subconfigurations must not set
any blocksizes for datatypes corresponding to default/reference
microkernels. (Note that a blocksize is left unchanged by the
bli_cntx_set_blkszs() function if it was set to -1.)
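The fix relies on the convention that a value of -1 leaves a blocksize unchanged. The following is a minimal sketch of that convention using a hypothetical, simplified blocksize object with one value per datatype. It is not the real blksz_t or the bli_cntx_set_blkszs() signature.

```c
/* Hypothetical datatype indices, in the order s, d, c, z. */
enum { DT_S, DT_D, DT_C, DT_Z, DT_NUM };

/* Simplified blocksize object: one value per datatype. */
typedef struct
{
    long v[ DT_NUM ];
} blksz_sketch_t;

/* Overwrite each datatype's value unless the new value is -1, in which
   case the existing (default/reference) value is kept. */
static void blksz_set_sketch( blksz_sketch_t* b,
                              long s, long d, long c, long z )
{
    long vals[ DT_NUM ] = { s, d, c, z };

    for ( int i = 0; i < DT_NUM; ++i )
        if ( vals[ i ] != -1 ) b->v[ i ] = vals[ i ];
}
```

Suppose a subconfiguration starts from defaults of 16, 8, 8, 4 and registers only real-domain microkernels with hypothetical blocksizes 32 and 16. It would pass -1 for scomplex and dcomplex so that those entries keep matching the reference kernels.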
Details:
- Fixed a bug that manifested any time a configuration was used in which
optimized microkernels were registered and the trsm operation (or
kernel) was invoked. The bug resulted from the optimized microkernels'
register blocksizes conflicting with the hard-coded values--expressed
in the form of constant loop bounds--used in the new reference trsm
ukernels that were introduced in bdd46f9. The fix was easy: reverting
to the implementation that uses variable-bound loops, which
amounted to changing an #if 0 to #if 1 (since I preserved the older
implementation in the file alongside the new code based on constant-
bound loops). It should be noted that this fix must be permanent,
since the trsm kernel code with constant-bound loops can never work
with gemm ukernels that use different register blocksizes.
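A variable-bound kernel works with whatever register blocksizes the context supplies. The following sketch illustrates this with a reference lower-triangular solve that takes mr and nr as runtime arguments. The name and storage layout are assumptions: L is an mr x mr column-major block, and B is mr x nr column-major with leading dimension ldb. Real BLIS trsm microkernels operate on packed micropanels and may store pre-inverted diagonals; this sketch divides by the diagonal instead.

```c
/* Solve L * X = B in place by forward substitution, where mr and nr are
   runtime values rather than compile-time constants. L is mr x mr lower
   triangular (column-major, leading dim mr); B is overwritten by X. */
static void trsm_l_ref_sketch( int mr, int nr, const double* l,
                               double* b, int ldb )
{
    for ( int i = 0; i < mr; ++i )
    {
        double inv = 1.0 / l[ i + i * mr ];

        for ( int j = 0; j < nr; ++j )
        {
            double sum = b[ i + j * ldb ];

            /* Rows p < i of B already hold the solved values x_p. */
            for ( int p = 0; p < i; ++p )
                sum -= l[ i + p * mr ] * b[ p + j * ldb ];

            b[ i + j * ldb ] = sum * inv;
        }
    }
}
```

A constant-bound version would hard-code mr and nr. It would then silently mismatch any gemm microkernel registered with different values, which is why the variable-bound form must stay.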
Details:
- Rewrote level-1v, -1f, and -3 reference kernels in terms of simplified
indexing annotated by the #pragma omp simd directive, which a compiler
can use to vectorize certain constant-bounded loops. (The new kernels
actually use _Pragma("omp simd") since the kernels are defined via
templatizing macros.) Modest speedup was observed in most cases using
gcc 5.4.0; the speedup may improve with newer versions. Thanks to Devin
Matthews for suggesting this via issues #286 and #259.
- Updated default blocksizes defined in ref_kernels/bli_cntx_ref.c to
be 4x16, 4x8, 4x8, and 4x4 for single, double, scomplex and dcomplex,
respectively, with a default row preference for the gemm ukernel. Also
updated axpyf, dotxf, and dotxaxpyf fusing factors to 8, 6, and 4,
respectively, for all datatypes.
- Modified configure to verify that -fopenmp-simd is a valid compiler
option (via a new detect/omp_simd/omp_simd_detect.c file).
- Added a new header in which prefetch macros are defined according to
which compiler is detected (via macros such as __GNUC__). These
prefetch macros are not yet employed anywhere, though.
- Updated the year in copyrights of template license headers in
build/templates and removed AMD as a default copyright holder.
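The following sketch shows a constant-bounded reference gemm microkernel annotated with _Pragma("omp simd"). It uses the 4x8 double-precision default and writes a row-stored microtile to match the row preference. The packing layout is an assumption: a holds MR values per k iteration and b holds NR values per k iteration. The names are hypothetical. Without -fopenmp-simd, the pragma is not recognized and is ignored.

```c
#define MR_D 4
#define NR_D 8

/* c := beta * c + alpha * a * b for one MR_D x NR_D microtile, where a and
   b are packed micropanels and c is row-stored with row stride rs_c. */
static void dgemm_ukr_sketch( int k, double alpha, const double* a,
                              const double* b, double beta,
                              double* c, int rs_c )
{
    double ab[ MR_D * NR_D ] = { 0.0 };

    /* Constant inner bounds let the compiler vectorize the j loop. */
    for ( int p = 0; p < k; ++p )
        for ( int i = 0; i < MR_D; ++i )
        {
            _Pragma( "omp simd" )
            for ( int j = 0; j < NR_D; ++j )
                ab[ i * NR_D + j ] += a[ p * MR_D + i ] * b[ p * NR_D + j ];
        }

    for ( int i = 0; i < MR_D; ++i )
        for ( int j = 0; j < NR_D; ++j )
        {
            double* cij = &c[ i * rs_c + j ];

            /* Overwrite when beta is zero so stale values never propagate. */
            *cij = ( beta == 0.0 ? 0.0 : beta * *cij )
                 + alpha * ab[ i * NR_D + j ];
        }
}
```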
Details:
- Guard typedef of ftnlen in f2c_types.h with a #ifndef HAVE_BLIS_H
directive to prevent the redefinition of that type. Thanks to Jeff
Diamond for reporting this compiler warning (and apologies for the
delay in committing a fix).
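The guard pattern can be shown in a single translation unit. The first part stands in for blis.h, which is assumed to define HAVE_BLIS_H and a typedef of ftnlen. The second part stands in for f2c_types.h. Both the underlying types (int and long) and the helper function are hypothetical. Without the #ifndef, the two typedefs would conflict.

```c
/* Stand-in for a blis.h-like header that already provides ftnlen. */
#define HAVE_BLIS_H
typedef int ftnlen;

/* Stand-in for f2c_types.h: the typedef is skipped when blis.h has
   already been seen, preventing a conflicting redefinition. */
#ifndef HAVE_BLIS_H
typedef long ftnlen;
#endif

/* Code after both headers sees exactly one definition of ftnlen. */
static ftnlen name_length( const char* s )
{
    ftnlen n = 0;
    while ( s[ n ] ) ++n;
    return n;
}
```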
Details:
- Add os_name to the list of variables in which the '/' character is
escaped. This is meant to address (or at least make progress toward
addressing) #293. Thanks to Isuru Fernando for spotting this as the
potential fix, and also thanks to M. Zhou for the original report.
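Escaping '/' matters when a value is spliced into a sed s/.../.../ expression that uses '/' as its delimiter. This is assumed here to be the reason configure escapes these variables. A value such as "GNU/Linux" (as printed by uname -o on many Linux systems) would otherwise end the pattern early. The transformation is sketched below in C, with a hypothetical function name.

```c
#include <stddef.h>

/* Copy src into dst, inserting a backslash before every '/'.
   dst must have room for twice the length of src plus one. */
static void escape_slashes( const char* src, char* dst )
{
    size_t j = 0;

    for ( size_t i = 0; src[ i ] != '\0'; ++i )
    {
        if ( src[ i ] == '/' ) dst[ j++ ] = '\\';
        dst[ j++ ] = src[ i ];
    }
    dst[ j ] = '\0';
}
```

For example, "GNU/Linux" becomes "GNU\/Linux", which is safe to place inside a '/'-delimited sed substitution.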
Details:
- Added a variant set of matlab scripts geared to producing plots that
reflect performance data gathered with and without extra memory
optimizations enabled. These scripts reside (for now) in
test/mixeddt/matlab/wawoxmem.
Details:
- Removed malloc_ft and free_ft function pointer arguments from the
interface to bli_apool_init() after deciding that there is no need to
specify the malloc()/free() for blocks within the apool. (The apool
blocks are actually just array_t structs.) Instead, we simply call
bli_malloc_intl()/_free_intl() directly. This has the added benefit
of allowing additional output when memory tracing is enabled via
--enable-mem-tracing. Also made corresponding changes elsewhere in
the apool API.
- Changed the inner pools (elements of the array_t within the apool_t)
to use BLIS_MALLOC_POOL and BLIS_FREE_POOL instead of BLIS_MALLOC_INTL
and BLIS_FREE_INTL.
- Disabled definitions of bli_malloc_pool() and bli_free_pool() since
there are no longer any consumers of these functions.
- Very minor comment / printf() updates.
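The following sketch illustrates the design change: the pool's init function takes no malloc/free function pointers and always calls one internal allocator, which also gives a single place to emit tracing output. Every name here is hypothetical and simplified. The real apool_t and array_t interfaces differ.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical tracing switch, standing in for --enable-mem-tracing. */
static int trace_mem = 0;

/* Hypothetical stand-ins for internal allocators like bli_malloc_intl(). */
static void* malloc_intl_sketch( size_t size )
{
    void* p = malloc( size );
    if ( trace_mem ) printf( "malloc_intl: %zu bytes at %p\n", size, p );
    return p;
}

static void free_intl_sketch( void* p )
{
    if ( trace_mem ) printf( "free_intl: %p\n", p );
    free( p );
}

/* Each pool block is just an array of pointers. */
typedef struct { void** elems; size_t n; } array_sketch_t;

typedef struct
{
    array_sketch_t** blocks;
    size_t           num_blocks;
} apool_sketch_t;

static void apool_finalize_sketch( apool_sketch_t* pool )
{
    for ( size_t i = 0; i < pool->num_blocks; ++i )
    {
        free_intl_sketch( pool->blocks[ i ]->elems );
        free_intl_sketch( pool->blocks[ i ] );
    }
    free_intl_sketch( pool->blocks );
    pool->blocks     = NULL;
    pool->num_blocks = 0;
}

/* Note the signature: no malloc_ft/free_ft arguments. Returns 0 on success. */
static int apool_init_sketch( apool_sketch_t* pool, size_t num_blocks,
                              size_t elems_per_block )
{
    pool->num_blocks = 0;
    pool->blocks = malloc_intl_sketch( num_blocks * sizeof( *pool->blocks ) );
    if ( pool->blocks == NULL ) return -1;

    for ( size_t i = 0; i < num_blocks; ++i )
    {
        array_sketch_t* arr = malloc_intl_sketch( sizeof( *arr ) );
        if ( arr == NULL ) { apool_finalize_sketch( pool ); return -1; }

        arr->n     = elems_per_block;
        arr->elems = malloc_intl_sketch( elems_per_block * sizeof( void* ) );
        if ( arr->elems == NULL )
        {
            free_intl_sketch( arr );
            apool_finalize_sketch( pool );
            return -1;
        }
        pool->blocks[ pool->num_blocks++ ] = arr;
    }
    return 0;
}
```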
* Initialize error messages at compile time
- Assign strings directly to the bli_error_string array instead of
filling it via snprintf() at execution time.
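Compile-time initialization of an error-message table can be sketched with designated initializers. This removes the need for runtime init/finalize functions. The negative error codes, the messages, and the lookup helper are hypothetical and are not the actual bli_error_string contents.

```c
/* Hypothetical error codes, indexed by their magnitude. */
enum
{
    ERR_SUCCESS      =  0,
    ERR_INVALID_SIDE = -1,
    ERR_INVALID_UPLO = -2,
    ERR_MAX_CODE     = -3
};

/* Filled in by the compiler; no snprintf() calls at execution time. */
static const char* const error_string_sketch[ -ERR_MAX_CODE ] =
{
    [ -ERR_SUCCESS ]      = "Success.",
    [ -ERR_INVALID_SIDE ] = "Invalid side parameter value.",
    [ -ERR_INVALID_UPLO ] = "Invalid uplo parameter value.",
};

/* Look up a message, falling back for codes outside the table. */
static const char* error_string_for_code( int code )
{
    if ( code > 0 || code <= ERR_MAX_CODE ) return "Unknown error.";
    return error_string_sketch[ -code ];
}
```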
* Retired bli_error_init(), _finalize().
Details:
- Removed functions obviated by changes in 80e8dc6: bli_error_init(),
bli_error_finalize(), and bli_error_init_msgs(), as well as calls to
the former two in bli_init.c.
* Regenerated symbols in build/libblis-symbols.def.
Details:
- Reran ./build/regen-symbols.sh after running
'configure --enable-cblas auto'.
Details:
- Updated/added comments about Fedora, OpenSUSE, and GNU Guix under the
newly-renamed "External GNU/Linux packages" section. Thanks to Dave
Love for providing these revisions.