All AMD specific optimization in BLIS are enclosed in BLIS_CONFIG_EPYC
pre-preprocessor, this was not defined in CMake which are resulting in
overall lower performance.
Updated version number to 3.1.1
Change-Id: I9848b695a599df07da44e77e71a64414b28c75b9
Details:
- Since GEMM kernel prefers row-storage, if input C matrix is in col-major order,
entire operation is transposed. In that case uplo(c) needs to be toggled
before kernel-variant selection.
- disabled "bli_gemmsup_ref_var1n2m_opt_cases" inside gemmtsup.
- Updated version number to 2.2.1
Change-Id: I0a85df1141fc4a98d98ea4e0c3d42db8602fa69b
Details:
- Updated the API and semantics of packm kernels such that they must now
handle edge cases, meaning that a c-by-k packm kernel must be able to
pack edge cases that are fewer than c rows/columns and be able to
zero-fill the remaining elements. They must also be able to zero-fill
the equivalent region when copying fewer than k columns/rows (which is
needed by trsm). The new packm kernel API is generally:
void packm_kernel
(
conj_t conja,
dim_t cdim,
dim_t n,
dim_t n_max,
ctype* restrict kappa,
ctype* restrict a, inc_t inca, inc_t lda,
ctype* restrict p, inc_t ldp,
cntx_t* restrict cntx
);
where cdim and n are the dimensions (short and long, respectively) of
the submatrix being copied from the source matrix A, and n_max is the
"full" long dimension (corresponding to the k dimension in gemm) of
the micropanel. The "full" short dimension (corresponding to the
register blocksize MR or NR) is not part of the API because it is
known intrinsically by the packm kernel implementation. Thanks to
Devin Matthews for prompting us to make this change (#282).
- Updated all reference packm kernels in ref_kernels/1m according to
above changes, as well as all optimized packm kernels (which only
consisted of those for knl).
- Bumped the major soname version number in 'so_version' to 2. At first
I was considering leaving it unchanged, but I couldn't escape the
reality that the packm kernel API is much closer to an expert API
than it is some obscure helper function interface within the framework
that nobody would ever notice.
- Removed reference packm kernels for mr/nr = 30. The only sub-config
that would have been using those kernels is knc, which is likely no
longer being used by very many people (if any). (This also mostly
offset the larger object code footprint incurred by moving the edge-
case handling into the individual packm kernels.)
- Fixed an obscure race condition for 3mh and 4mh induced methods in
which those implementations were modifying the contexts stored in the
gks rather than a local copy.
- Fixed a minor bug in the testsuite that prevented non-1m-based induced
method implementations of trsm from executing.
Details:
- Split existing typed APIs into two subsets of interfaces: one for use
with expert parameters, such as the cntx_t*, and one without. This
separation was already in place for the object APIs, and after this
commit the typed and object APIs will have similar expert and non-
expert APIs. The expert functions will be suffixed with "_ex" just as
is the case for expert interfaces in the object APIs.
- Updated internal invocations of typed APIs (functions such as
bli_?setm() and bli_?scalv()) throughout BLIS to reflect use of the
new explictly expert APIs.
- Updated example code in examples/tapi to reflect the existence (and
usage) of non-expert APIs.
- Bumped the major soname version number in 'so_version'. While code
compiled against a previous version/commit will likely still work
(since the old typed function symbol names still exist in the new API,
just with one less function argument) the semantics of the function
have changed if the cntx_t* parameter the application passes in is
non-NULL. For example, calling bli_daxpyv() with a non-NULL context
does not behave the same way now as it did before; before, the
context would be used in the computation, and now the context would
be ignored since the interace for that function no longer expects a
context argument.
Details:
- Changed the naming conventions used for installed libraries and
symlinks to more closely mirror patterns used by typical GNU/Linux
libraries. Whereas previously static and shared libraries were
installed and symlinked as follows:
(library) libblis-0.3.2-15-haswell.a
(library) libblis-0.3.2-15-haswell.so
(symlink) libblis.a -> libblis-0.3.2-15-haswell.a
(symlink) libblis.so -> libblis-0.3.2-15-haswell.so
we now use the following naming conventions:
(library) libblis.a
(symlink) libblis.so -> libblis.so.0.1.2
(symlink) libblis.so.0 -> libblis.so.0.1.2
(library) libblis.so.0.1.2
where 0.1.2 indicates shared library major, minor, and build versions
of 0, 1, and 2, respectively. The conventional version string can
still be queried by linking to the library in question and then calling
bli_info_get_version_str(). (The testsuite binary does this
automatically at startup.)
- Added logic to common.mk to set the soname field in the shared library
via the -soname linker flag.
- Added a 'so_version' file to the top-level directory containing two
lines. The first line specifies the .so major version number, and the
second line specifies the minor and build version numbers joined with
a '.'. This file is read by configure and those values substituted
into build/config.mk.in to define SO_MAJOR, SO_MINORB, and SO_MMB
variables.