Mirror of https://github.com/amd/blis.git (synced 2026-05-11 17:50:00 +00:00)
CHANGELOG update (0.3.1)
622
CHANGELOG
@@ -1,10 +1,622 @@
commit 1f28d7c86e17730f05bd239c8e8d67e3e7510a4f (HEAD -> master, tag: 0.3.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 17:13:15 2018 -0500

Version file update (0.3.1)

commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516 (origin/master, origin/dev, origin/amd, origin/HEAD, dev, amd)
Merge: 786d15c5 3c91c7ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 16:08:18 2018 -0500

Merge branch 'dev' of github.com:flame/blis into dev

commit 786d15c5ef09f1f647b126b63d57e76d5810c58e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 4 16:06:47 2018 -0500

Added skx, knl to x86_64 configuration family.

Details:
- Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
family in the config_registry file.
- Added logic to configure that avoids committing certain sub-configs to
the configuration/kernel registries if those sub-configs cannot be
handled properly by the chosen compiler. (This was modeled after
similar logic in TBLIS's configure; thanks to Devin Matthews for
pointing this out.) First, the compiler and its version are inspected
and, based on the results, certain configurations are added to a
"blacklist". Then, as the configuration registries are being created,
configurations and/or kernels that match items in the blacklist are
skipped over and not committed to the registries. Under certain
circumstances, omitting a blacklisted configuration will indirectly
invalidate other configurations due to the loss of availability of
the original blacklisted configuration's kernel set. This additional
indirect blacklist is also accounted for.
- Added output to the beginning of configure that echoes information
about the chosen compiler as well as the configurations that are
blacklisted and must be stripped from the registries.
- Various other cleanups in configure, especially with respect to
explicitly declaring local variables in functions.
- Comment updates to config/zen/make_defs.mk regarding choice of -march
flags based on compiler version.

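The direct and indirect blacklisting described above can be sketched in C. This is a hypothetical illustration, not the logic in BLIS's configure, which is a shell script. The struct config_sketch table, its kernel-set lists, and apply_blacklist_sketch() are invented names. In the test data, the haswell entry borrows zen kernels only to mirror the config_registry arrangement described in commit 34862ae.

```c
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: a sub-configuration and the kernel sets it
   needs. A NULL entry terminates the kernels list. */
struct config_sketch {
    const char *name;
    const char *kernels[4];
};

/* Return true if s appears among the n strings in list. */
static bool in_list_sketch(const char *s, const char *const *list, size_t n)
{
    for (size_t i = 0; i < n; ++i)
        if (strcmp(s, list[i]) == 0)
            return true;
    return false;
}

/* Set keep[i] to false for every configuration that is blacklisted
   directly, or indirectly because one of the kernel sets it needs belongs
   to a blacklisted configuration. */
void apply_blacklist_sketch(const struct config_sketch *cfgs, size_t ncfg,
                            const char *const *blacklist, size_t nbl,
                            bool *keep)
{
    for (size_t i = 0; i < ncfg; ++i) {
        keep[i] = !in_list_sketch(cfgs[i].name, blacklist, nbl);
        for (size_t k = 0; keep[i] && cfgs[i].kernels[k] != NULL; ++k)
            if (in_list_sketch(cfgs[i].kernels[k], blacklist, nbl))
                keep[i] = false; /* indirect: a needed kernel set is gone */
    }
}
```

A single pass suffices in this sketch because it assumes only a direct blacklisting removes a kernel set. An indirectly dropped configuration still has a buildable kernel set of its own.
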
commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 2 12:40:25 2018 -0500

Fixed 64b type mismatch warning in cblas_xerbla.c.

Details:
- Fixed a compiler warning concerning a type mismatch between the
format specifier of the printf() call in cblas_xerbla.c and its
corresponding (info) argument. The warning manifested when the CBLAS
layer was enabled and the BLAS/CBLAS integer type size was set to 64
(the default is 32). The warning was fixed by changing the specifier
from %d to %jd and typecasting the argument to intmax_t. Thanks to
Dave Love for reporting this issue and submitting the patch.

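The fix pairs the %jd conversion with an explicit intmax_t cast. That combination is well-defined whatever the width of the integer type. Below is a minimal sketch, assuming a hypothetical 64-bit stand-in type (f77_int_sketch) and an illustrative message format rather than the exact text printed by cblas_xerbla.c.

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical stand-in for the BLAS/CBLAS integer type when it is
   configured to be 64 bits wide. */
typedef int64_t f77_int_sketch;

/* Format an xerbla-style message into buf. Casting info to intmax_t lets
   one %jd specifier match the argument whether the integer type is 32 or
   64 bits wide. */
int format_xerbla_msg(char *buf, size_t len, const char *rout,
                      f77_int_sketch info)
{
    return snprintf(buf, len, "Parameter %jd to routine %s was incorrect\n",
                    (intmax_t)info, rout);
}
```

With a 32-bit integer type the same call still works, because the cast widens the value before it reaches snprintf().
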
commit 71eaf449a812fe2bd640d21513ec83974b2edb45
Merge: 6a628184 ae9a5be5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 27 17:21:43 2018 -0500

Merge branch 'dev'

commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f
Author: dnp <devangiparikh@gmail.com>
Date: Tue Mar 27 17:01:23 2018 -0500

Fixed bug in skx sgemm microkernel

commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 17:40:04 2018 -0500

Row storage optimizations to zen dotxf kernels.

Details:
- Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
one to handle a column-stored matrix A and one to handle a row-stored
matrix A. This allows vector instructions to be employed even if A is
stored by rows (and A^T appears stored as columns). Both storage cases
use a common edge case loop. Thanks to Devin Matthews for this idea
and for prototyping the change needed for the sdotxf kernel.

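Below is a scalar reference sketch of the two loop orders. It assumes dotxf computes y := beta*y + alpha*A^T*x for an m x b matrix A with real entries and no conjugation. dotxf_sketch() is a hypothetical name, and the real zen kernels use AVX intrinsics rather than plain loops.

```c
#include <stddef.h>

/* y := beta*y + alpha*A^T*x, where A is m x b with row stride rs_a and
   column stride cs_a, x has m elements, and y has b elements. */
void dotxf_sketch(size_t m, size_t b, double alpha,
                  const double *a, ptrdiff_t rs_a, ptrdiff_t cs_a,
                  const double *x, double beta, double *y)
{
    /* Scale y by beta first (this sketch does not special-case beta == 0). */
    for (size_t j = 0; j < b; ++j)
        y[j] *= beta;

    if (rs_a == 1) {
        /* Column storage: each column of A is contiguous, so compute b
           independent dot products, each streaming down one column. */
        for (size_t j = 0; j < b; ++j) {
            const double *col = a + (ptrdiff_t)j * cs_a;
            double rho = 0.0;
            for (size_t i = 0; i < m; ++i)
                rho += col[i] * x[i];
            y[j] += alpha * rho;
        }
    } else {
        /* Row storage (or general strides): walk A one row at a time. When
           cs_a == 1 each row is contiguous, so the inner update over j is a
           unit-stride axpy that maps naturally onto vector instructions. */
        for (size_t i = 0; i < m; ++i) {
            const double *row = a + (ptrdiff_t)i * rs_a;
            const double axi = alpha * x[i];
            for (size_t j = 0; j < b; ++j)
                y[j] += axi * row[(ptrdiff_t)j * cs_a];
        }
    }
}
```

Both branches produce the same result up to floating-point rounding, since they differ only in the order in which the products are accumulated.
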
commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 15:35:17 2018 -0500

Make k_iter/k_left uint64_t in bulldozer fma ukrs.

Details:
- Changed the declaration of k_iter and k_left for d, c, z microkernels
from dim_t to uint64_t. This is needed to ensure compatibility with
the movq instruction used to load the value into registers. This
change should have been made a long time ago, but for some reason
the problem only recently began showing up via Travis CI.

commit 6a628184f6938673440e4cdd4fed0208c51fd1f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 26 14:48:16 2018 -0500

Fixed a memkind-related compile-time bug on knl.

Details:
- Fixed a compile-time error that occurred due to the fact that
BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
soon enough to be used in bli_system.h where it is needed to determine
whether hbwmalloc.h should be #included. bli_system.h is now included
after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
for reporting this issue.
- Tweaked the language used by configure to echo the status of the
--with[out]-memkind option.

commit e2192a8fd58ec3657434ddd407033e097edad8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 23 12:53:48 2018 -0500

Removed vzeroupper intrinsics from zen kernels.

Details:
- Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
vzeroupper instruction destroyed part of the intermediate result
stored by the vdpps instructions that came right before. (The
vzeroupper intrinsic was removed.)
- Removed remaining vzeroupper intrinsics from other zen kernels.
Previously, the vzeroupper instructions were included because BLIS is
typically compiled with -mfpmath=sse. But it was brought to my
attention that inserting these vzeroupper instructions is unnecessary
for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
code rather than literal SSE instructions, and (b) compilers already
(likely) insert vzeroupper instructions where necessary. Thanks to
Devin Matthews for zeroing in on the dotxf bug.
- Removed -malign-double from bulldozer make_defs.mk. This alignment
was already happening by default since bulldozer is an x86_64 system.

commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 22 18:21:30 2018 -0500

Added build system support for libmemkind.

Details:
- Added support for libmemkind to configure. configure attempts to
detect the presence of libmemkind by compiling a small program
containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
successful, it is assumed that libmemkind is present and available.
If present, use of libmemkind is enabled by default, and otherwise
use is disabled by default. If libmemkind is present, the user may
explicitly disable use of the library by running configure with the
--without-memkind option. Furthermore, a configuration may disable
libmemkind, perhaps conditional on some aspect of the build system,
by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
set in config.mk, to 'no'. (The knl configuration makes use of this
latter feature; see below.)
- If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
- Deprecated explicit use of BLIS_NO_HBWMALLOC in
config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
(#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
variable to 'no'.
- common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.

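The macro arrangement described above can be sketched as follows. This is an illustration under stated assumptions, not the contents of bli_system.h or bli_kernel_macro_defs.h. Only BLIS_ENABLE_MEMKIND, BLIS_DISABLE_MEMKIND, BLIS_MALLOC_POOL, BLIS_FREE_POOL, hbw_malloc() and hbw_free() come from the text; the two helper functions are hypothetical. When neither memkind macro is defined, the sketch falls back to malloc() and free().

```c
#include <stdlib.h>

/* A configuration may force memkind off even if configure enabled it. */
#ifdef BLIS_DISABLE_MEMKIND
  #undef BLIS_ENABLE_MEMKIND
#endif

#ifdef BLIS_ENABLE_MEMKIND
  #include <hbwmalloc.h>
  #define BLIS_MALLOC_POOL hbw_malloc /* high-bandwidth memory */
  #define BLIS_FREE_POOL   hbw_free
#else
  #define BLIS_MALLOC_POOL malloc     /* ordinary heap memory */
  #define BLIS_FREE_POOL   free
#endif

/* Hypothetical helpers showing how pool memory would be obtained and
   released through the selected macros. */
void *pool_alloc_sketch(size_t size) { return BLIS_MALLOC_POOL(size); }
void  pool_free_sketch(void *p)      { BLIS_FREE_POOL(p); }
```
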
commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 21 18:39:16 2018 -0500

Updates to top-level and test driver Makefiles.

Details:
- Added logic to common.mk that will choose a BLIS library against which
to link (LIBBLIS_LINK). The default choice is the static (.a) library;
the shared (.so) library is chosen only if the shared library build was
enabled and the static one was disabled.
- Updated the various test driver Makefiles to reference this common,
pre-chosen library against which to link. (Previously, these drivers
unconditionally linked against the static library and would have
failed if the static library build was disabled at configure-time.)
- Renamed many of the variables in common.mk and the top-level Makefile
so that variables relating to the libblis.[a|so] files, including
paths to those files, begin with "LIBBLIS".
- Shuffled around some of the library definitions from the top-level
Makefile to common.mk.
- Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
and in configure.
- A few other cleanups in the top-level Makefile.

commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 21 15:47:11 2018 -0500

Added input.operations.fast file for 'make check'.

Details:
- Added an 'input.operations.fast' file to the testsuite directory to go
along with the 'input.general.fast' file used by the 'make check'
target in the top-level Makefile. This will allow the "fast" check
to prune operations and/or parameter combinations from the test
space in order to save time.
- Currently, input.operations.fast prunes trmm3 and all transposition
and conjugation parameters from the level-3 test space.
- Reduced problem size tested in input.general.fast to 100 and disabled
testing of the 1m method.

commit c441caa95aabe69f54e2160eb67bf4ca76a66c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 17:56:02 2018 -0500

README update.

Details:
- Minor updates to README.md.
- Minor change to blastest/Makefile.

commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 15:35:45 2018 -0500

Added .gitkeep file to blastest/obj.

Details:
- Added an empty file named '.gitkeep' to blastest/obj/ so that git will
track the otherwise empty directory. (This is already done for the BLIS
testsuite in testsuite/obj.)

commit 0e6d000db9291342913dc5f8590a28c67bbcbc95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 15:08:43 2018 -0500

Updated .gitignore to ignore BLAS test out.* files.

commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 14:33:50 2018 -0500

Fixes to .travis.yml.

Details:
- Invoke the full BLIS testsuite via 'make testblis' instead of the fast
version via 'blistest-fast' (which was wrong anyway, since the correct
fast target is 'testblis-fast').
- Invoke the BLAS tests via 'make testblas' instead of 'blastest'.

commit 664ec4813d8b53121cce7a68bef47da656ece9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 20 13:54:58 2018 -0500

Integrated f2c'ed netlib BLAS test suite.

Details:
- Created a new test suite that exercises only the BLAS compatibility
layer found in BLIS. The test suite is a straightforward port of code
obtained from netlib LAPACK, run through f2c and linked to a stripped-down
version of libf2c that is compiled along with the test drivers
(to prevent any obvious ABI issues). The new BLAS test suite can be
run from within its new local directory, 'blastest' (through its local
'make ; make run' targets) or from the top-level Makefile (via the
'make testblas' target). Output files are created in whichever directory
the test drivers are run from, whether it be the 'blastest' directory, the
top-level source distribution directory, or the out-of-tree directory
in which 'configure' was run. Also, the results of the BLAS test suite
can be checked via 'make checkblas', which summarizes the presence or
absence of test failures in a single line printed to stdout.
- Updated the 'test' target to run both 'testblis' and 'testblas'.
- Added a new 'testblis-fast' target that runs the BLIS testsuite with
smaller problem sizes, allowing it to finish more quickly.
- Added a 'make check' target, which runs 'checkblis-fast' and
'checkblas'.
- Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of
'testblis' before (calling the check-blistest.sh script to check the
result manually).
- Renamed some targets in the top-level Makefile to be consistent between
BLAS and BLIS.

commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 19 18:19:43 2018 -0500

Fixed a few obscure bugs in the BLAS API.

Details:
- Fixed a missing parameter in the definition of sdsdot_(). The 'sb'
argument was missing. Strangely, the argument is omitted from dsdot_()
in the BLAS API.
- Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string
passed to xerbla_() by the bla_ger_check() macro.
- For bla_syrk_check() and bla_syr2k_check() macros, only allow
conjugate-transpose (trans='c') as a valid argument for the real
domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the
argument was allowed even for the complex domain equivalents, which
was inconsistent with the BLAS API.)

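In the reference BLAS, sdsdot adds the single-precision scalar sb to a dot product accumulated in double precision and returns the result in single precision. By contrast, dsdot has no sb argument and returns the double-precision result. Below is a minimal sketch of the sdsdot semantics, not BLIS's implementation. It assumes positive strides only, and sdsdot_sketch() is a hypothetical name.

```c
/* Return sb + sum(sx[i*incx] * sy[i*incy]), accumulated in double and
   rounded to float at the end. Positive strides only (assumption). */
float sdsdot_sketch(int n, float sb, const float *sx, int incx,
                    const float *sy, int incy)
{
    double acc = (double)sb; /* the 'sb' argument seeds the accumulator */
    for (int i = 0; i < n; ++i)
        acc += (double)sx[i * incx] * (double)sy[i * incy];
    return (float)acc;
}
```

The double-precision accumulator matters. With sx = {1e8f, 1.0f, -1e8f} and sy all ones, this sketch returns 1.0f, whereas strict float accumulation would round 1e8 + 1 back to 1e8 and return 0.
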
commit fe7d7f1e43e4c26249eed83d4188beee1ba96202
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 18 19:43:06 2018 -0500

Fixed cpp macro parameter "ch" typo in bla_ger.c.

Details:
- Previously, the BLAS routine-generating macro in bla_ger.c was
incorrectly passing MKSTR(ch) into the _check() macro when it
should have been passing in the char that was available, chxy.
I've instead changed the name of the macro parameter from chxy
to ch. A similar change was made to bla_ger.h for consistency.
Thanks to Dave Love for helping track this down. (NOTE: This is
actually the root cause of the bug that was first patched by
increasing the length of the operation name strings passed into
xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH,
in 3d1a5a7. In theory, that change could be backed out now.)
- Applied the aforementioned chxy->ch change to bla_dot.[ch], as well as
frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed
to happen, but for naming consistency).
- Reformatted function signatures/prototypes of CBLAS functions and
function calls to BLAS in frame/compat/cblas/f77_sub/*.c.

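The bug can be reproduced with the preprocessor alone. In this sketch, MKSTR() is a standalone stand-in for the BLIS macro of that name, and OPNAME_BUGGY()/OPNAME_FIXED() are hypothetical. Because ch is not a parameter of OPNAME_BUGGY(), MKSTR(ch) stringizes the literal token ch. The resulting name ("chgerc" rather than "dgerc") is one character longer than intended, which is consistent with the overlong operation name strings mentioned in the note.

```c
#include <string.h>

/* Stand-in for a stringizing macro such as the MKSTR() named above. */
#define MKSTR(s) #s

/* Buggy: the parameter is chxy, but the body stringizes ch, which is not
   a parameter, so MKSTR(ch) always yields the literal "ch". */
#define OPNAME_BUGGY(chxy, opname) MKSTR(ch) #opname

/* Fixed: renaming the parameter to ch makes MKSTR(ch) expand to the
   character actually passed in. */
#define OPNAME_FIXED(ch, opname) MKSTR(ch) #opname
```
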
commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 13:05:56 2018 -0500

Convert op names to uppercase before calling xerbla_().

Details:
- Defined a new function, bli_string_mkupper(), that calls toupper() on
every character of a string up to the terminating null character.
- Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
BLAS _check() macros. This prevents the BLAS testsuite from complaining
that the operation name (e.g. "dgemm") does not match the expected
value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.

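Below is a minimal sketch of such a function. string_mkupper_sketch() is a hypothetical stand-in, not the BLIS source of bli_string_mkupper(). The cast to unsigned char before calling toupper() avoids undefined behavior for characters whose char value is negative.

```c
#include <ctype.h>

/* Convert s to uppercase in place, stopping at the terminating null. */
void string_mkupper_sketch(char *s)
{
    for (; *s != '\0'; ++s)
        *s = (char)toupper((unsigned char)*s);
}
```
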
commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 12:24:07 2018 -0500

Fixed printf() format overflow.

Details:
- Increased the length of operation name strings passed to xerbla_() in
the level-2 and level-3 operation _check() functions, found in
frame/compat/check. This avoids a format specifier overflow warning from
gcc 7. Thanks to Dave Love for reporting this issue and suggesting the
fix.

commit c73055f028684d998e03b2392093c393782bbfe7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 15 16:08:21 2018 -0500

Return after non-zero info in BLAS checks.

Details:
- Previously, when calling the BLAS compatibility layer, discovering a
parameter check failure would result in the proper setting of the
info parameter (printed by xerbla_()), but would also come with an
immediate abort() rather than a return. This was incorrect behavior
for two overlapping reasons.
(1) BLAS should return gracefully to the caller in the event of a
bad set of parameters, not abort().
(2) When BLIS was being tested via the BLAS testsuite, BLIS's
xerbla_() would correctly get preempted/overridden by the
xerbla_() in the BLAS testsuite, but execution would then
erroneously continue on to the BLIS implementation with bad
parameter values.
- The previous issue was addressed by disabling the abort() in BLIS's
xerbla_(), changing all of the BLAS _check() functions to cpp macros,
and adding a return statement to the end of each _check() macro's
"if ( info != 0 )" conditional.
Thanks to Dave Love for reporting this issue.

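The checks become macros because a return statement inside a macro body returns from the enclosing BLAS routine, which a helper function cannot do. Below is a minimal sketch of the pattern. xerbla_sketch(), VEC_SCALE_CHECK_SKETCH(), vec_scale_sketch() and the argument positions reported are all hypothetical rather than BLIS's actual _check() macros. Requiring a positive stride is a simplification that keeps the sketch memory-safe.

```c
/* Records the most recently reported bad argument position. */
static int last_info = 0;

/* Stand-in for xerbla_(): report the failure but do not abort(). */
static void xerbla_sketch(const char *opname, int info)
{
    (void)opname;     /* a real xerbla_ would print the routine name */
    last_info = info; /* position of the offending argument */
}

/* On a bad argument, report it and return from the *calling* function. */
#define VEC_SCALE_CHECK_SKETCH(opname, n, incx) \
    do {                                        \
        int info_ = 0;                          \
        if ((n) < 0)          info_ = 1;        \
        else if ((incx) <= 0) info_ = 4;        \
        if (info_ != 0) {                       \
            xerbla_sketch(opname, info_);       \
            return;                             \
        }                                       \
    } while (0)

/* Hypothetical BLAS-like routine: x := alpha * x. */
void vec_scale_sketch(int n, double alpha, double *x, int incx)
{
    VEC_SCALE_CHECK_SKETCH("VEC_SCALE", n, incx);
    for (int i = 0; i < n; ++i)
        x[i * incx] *= alpha;
}
```

Wrapping the macro body in do { ... } while (0) keeps it usable as a single statement, for example after an unbraced if.
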
commit c4f1d18b97a6a8c3ea0366aa759db597a664062a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 19:10:09 2018 -0500

Minor typo fix to printing arch in testsuite.

Details:
- Was mistakenly calling bli_cpuid_query_id() instead of
bli_arch_query_id() in the recent addition to the testsuite output
that prints the active sub-configuration. The former function is
only used for multi-architecture builds, whereas the latter is the
more general option that also works for single configuration
(including 'configure auto') builds.

commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 17:43:42 2018 -0500

Make arm32 and arm64 families work. (#176)

commit fc6a1842518a0820c6708c285611346d5a1419da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 15:31:17 2018 -0500

Print sub-configuration name in testsuite output.

Details:
- Added a line to the testsuite output that prints the name of the
current/active sub-configuration. This is useful when linking the
testsuite against multi-configuration builds because it confirms
the sub-configuration that is actually being employed at runtime.
Thanks to Devin Matthews for suggesting this feature.

commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e
Merge: 290dd4a9 b1a15ae6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:27:44 2018 -0500

Merge pull request #173 from devinamatthews/dev

Fix Cortex-A9 and Cortex-A15 configs.

commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:26:44 2018 -0500

Use BLIS_H_FLAT

commit 290dd4a9feee447e69b40ad108954af78e196f7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 13:15:37 2018 -0500

Allow arbitrarily deep configuration families.

Details:
- Updated configure so that configuration families specified in the
config_registry are no longer constrained to being only one level
deep. For example, previously the x86_64 family could not be defined
concisely in terms of, say, intel64 and amd64 families, and instead
had to be defined as containing "haswell, sandybridge, penryn, zen,
etc." In other words, families were constrained to only having
singleton configurations as their members. That constraint is now
lifted.
- Redefined the x86_64 family in config_registry in terms of intel64 and
amd64.

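Deeper families require recursive expansion. That can be sketched in C, although configure itself is a shell script. The family table below is a hypothetical, abbreviated stand-in for config_registry, built only from sub-configuration names that appear in this log. expand_family_sketch() is likewise an invented name, and it performs no duplicate removal or cycle detection.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical family table; member lists are NULL-terminated. */
struct family_sketch {
    const char *name;
    const char *members[8];
};

static const struct family_sketch families_sketch[] = {
    { "x86_64",  { "intel64", "amd64", NULL } },
    { "intel64", { "skx", "knl", "haswell", "sandybridge", "penryn", NULL } },
    { "amd64",   { "zen", "bulldozer", NULL } },
};

/* Return the family named name, or NULL if name is not a family. */
static const struct family_sketch *find_family_sketch(const char *name)
{
    size_t n = sizeof families_sketch / sizeof families_sketch[0];
    for (size_t i = 0; i < n; ++i)
        if (strcmp(families_sketch[i].name, name) == 0)
            return &families_sketch[i];
    return NULL;
}

/* Append the leaf sub-configurations reachable from name to out[] (at most
   cap entries) and return the updated count. A name that is not a family
   is treated as a singleton configuration. */
size_t expand_family_sketch(const char *name, const char **out,
                            size_t count, size_t cap)
{
    const struct family_sketch *f = find_family_sketch(name);
    if (f == NULL) {
        if (count < cap)
            out[count++] = name;
        return count;
    }
    for (size_t i = 0; f->members[i] != NULL; ++i)
        count = expand_family_sketch(f->members[i], out, count, cap);
    return count;
}
```
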
commit 9cee78e006d56543ac02fc9c488905c0434e60ae
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:09:48 2018 -0500

Fix Cortex-A9 and Cortex-A15 configs.

Tested with QEMU.

commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 13 16:04:40 2018 -0500

Updates to ARM hardware detection support.

Details:
- Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
sub-configurations are supported. However, the functions that detect
features specific to a15 and a9 are identical, and since a15 is tested
first, it will always be chosen for arm32 hardware (even if both
sub-configurations were enabled at configure-time and the library is
linked and run on an a9). Thus, more work needs to be done to
distinguish these two.
- Added a cpp guard around x86_64 portions of bli_cpuid.c. Now, either
the x86_64 or ARM code will be compiled (or neither, if neither
environment is detected).
- In bli_arch_query_id(), call bli_cpuid_query_id() when the
BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
- Added arm64 and arm32 configuration families to config_registry.
- Added a note to the arch_t typedef enum in bli_type_defs.h reminding
the developer to update the string array in bli_arch.c whenever new
enum values are added or existing values are reordered.

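A compile-time check can back up such a reminder by verifying that the string array and the enum have the same number of entries. Below is a sketch with a hypothetical, abbreviated enum. The names arch_sketch_t, ARCH_NUM_SKETCH and arch_names_sketch are invented and do not correspond to bli_type_defs.h or bli_arch.c; the check assumes a C11 compiler for _Static_assert.

```c
/* Hypothetical, abbreviated architecture enum. */
typedef enum {
    ARCH_SKX_SKETCH,
    ARCH_HASWELL_SKETCH,
    ARCH_ZEN_SKETCH,
    ARCH_CORTEXA57_SKETCH,
    ARCH_NUM_SKETCH /* keep last: counts the values above */
} arch_sketch_t;

/* Strings must appear in the same order as the enum values. */
static const char *const arch_names_sketch[] = {
    "skx",
    "haswell",
    "zen",
    "cortexa57",
};

/* Compilation fails if an enum value is added or removed without a
   matching change to the string array. */
_Static_assert(sizeof arch_names_sketch / sizeof arch_names_sketch[0]
                   == ARCH_NUM_SKETCH,
               "arch_names_sketch is out of sync with arch_sketch_t");

const char *arch_string_sketch(arch_sketch_t id)
{
    return arch_names_sketch[id];
}
```

The check catches a missing or extra entry but not a reordering, so the comment-based reminder is still useful.
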
commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 11 16:59:50 2018 -0500

Fixed misnamed kernels in _cntx_init_cortexa57.c.

Details:
- Changed incorrect kernel function names in bli_cntx_init_cortexa57.c:
bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12
bli_dgemm_cortexa57_asm_6x8 -> bli_dgemm_armv8a_asm_6x8
Thanks to Jacob Gorm Hansen for reporting this issue.

commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 7 12:54:06 2018 -0600

Tweaked common.mk, Makefile, skx/knl make_defs.mk.

Details:
- Reorganized linker-related section of common.mk so that LDFLAGS set
in a sub-configuration's make_defs.mk file will not be immediately
(and erroneously) overridden by the default values.
- Re-enabled redirected (to file) output of the testsuite when run from
the top-level Makefile via 'make test'. (For some reason, it was
commented-out for the non-verbose case.)
- Removed old/unnecessary code from the make_defs.mk files of skx and
knl sub-configurations.

commit 8b0475a87daa177916e2caac0e530c6a57fa07cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 6 06:39:44 2018 -0600

Fixed typo in attempted fix in 1a8350f7.

Details:
- Mistakenly entered 148 as knl mc blocksize for double real when the
value should have been 144. Thanks to Dave Love for reporting this.

commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 18:00:45 2018 -0600

Fixed missing flags during shared object build.

Details:
- Fixed a bug in common.mk that caused warning, position-independent
code, miscellaneous, and general preprocessor flags to be omitted
from the configuration family-specific variables that hold those
values, as registered by the family's make_defs.mk file. This would
most obviously manifest when targeting a configuration family such as
'intel64' while simultaneously configuring for a shared object build,
as the key '-fPIC' flag would be omitted at compile-time and prevent
successful linking. Thanks to Dave Love for reporting this bug.
- Other cleanups to common.mk for readability and clarity.

commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 13:32:00 2018 -0600

Fixed cache blocksize bug in knl configuration.

Details:
- Changed the mc blocksize for double real execution in the knl
sub-configuration from 160 to 148. The old value was not a multiple of
mr (which is 24), and thus the safeguards in bli_gks_register_cntx()
were tripping. Thanks to Dave Love for reporting this issue.
- Switched the knl sub-configuration to use default blocksizes for
datatypes not supported by native kernels.
- Fixed typos in bli_error.c that prevented certain error strings
(which report maximum cache blocksizes not being multiples of their
corresponding register blocksize) from properly initializing.

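The safeguard rests on simple arithmetic: a cache blocksize must be a whole multiple of its register blocksize. With mr = 24, neither 160 (160 mod 24 = 16) nor 148 (148 mod 24 = 4) qualifies. This is why commit 8b0475a later replaced 148 with 144 (= 6 x 24). The check below is a hypothetical sketch of that condition, not the code in bli_gks_register_cntx().

```c
#include <stdbool.h>

/* Return true if the cache blocksize mc is a positive whole multiple of
   the register blocksize mr. */
bool blksz_is_multiple_sketch(int mc, int mr)
{
    return mr > 0 && mc > 0 && mc % mr == 0;
}
```
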
commit c09fffa827fe6241dc20193a1c404496664220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 3 13:13:39 2018 -0600

Added missing cntx_t* arg in knl packm kernels.

Details:
- Added the missing cntx_t* argument to the function signature of packm
kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this
issue.

commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 1 14:36:39 2018 -0600

Enable non-unit vector stride tests by default.

Details:
- Changed the "vector storage schemes to test" parameter in the testsuite's
input.general file to "cj". This means that both unit stride column
vectors and non-unit stride column vectors will be tested in
operations with vector operands (e.g. level-1v, level-1f, level-2).
- Very minor comment (typo) changes to input.operations.

commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 17:01:47 2018 -0600

Added individual operation overrides in testsuite.

Details:
- Updated the testsuite driver so that setting one or more individual
operation test switches to "2" in input.operations will enable ONLY
those operations and disable all others, regardless of the values of
the section overrides and other operation switches. This makes it
very easy to quickly test only one or two operations, and equally
easy to revert to the previous combination of operation tests.
- Added more comments to input.operations describing the use of
individual "enable only" overrides.

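Below is a sketch of the "enable only" rule as described. If any switch is 2, exactly the operations set to 2 run. Otherwise, the sketch assumes an operation runs when its switch is 1 and its section is enabled. The function name, the flattened arrays, and the treatment of section overrides as one boolean per operation are simplifying assumptions, not the testsuite driver's actual data structures.

```c
#include <stdbool.h>
#include <stddef.h>

/* sw[i] is operation i's switch (0, 1, or 2); section_on[i] says whether
   the section containing operation i is enabled. On return, run[i] says
   whether operation i should be tested. */
void resolve_op_switches_sketch(const int *sw, const bool *section_on,
                                bool *run, size_t n)
{
    bool any_only = false;
    for (size_t i = 0; i < n; ++i)
        if (sw[i] == 2)
            any_only = true;

    for (size_t i = 0; i < n; ++i)
        run[i] = any_only ? (sw[i] == 2) /* "2" overrides everything else */
                          : (section_on[i] && sw[i] == 1);
}
```

Reverting is just a matter of changing the 2s back to their earlier values, since every other switch keeps its setting.
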
commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 15:30:14 2018 -0600

Use zen kernels in haswell sub-configuration.

Details:
- Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv,
dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf
and dotxf. This works because these kernels simply target AVX/AVX2,
and therefore work without modification on haswell hardware.
- Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen
kernels are essentially identical to those used by haswell, except that
the zen kernels are now a bit more up-to-date. In the future, I may
continue to maintain duplicates, or I may keep the kernels named after
one architecture (zen or haswell) but used by both sub-configurations.
- In config_registry, enable use of both haswell and zen kernels for the
haswell sub-configuration. This is necessary in order to make zen
kernels visible when registering kernels in bli_cntx_init_haswell.c.
- Enable use of assembly-based complex gemm microkernels for zen,
bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in
bli_cntx_init_zen.c. This was actually intended for 1681333.

commit d9079655c9cbb903c6761d79194a21b7c0a322bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

CHANGELOG update (0.3.0)

commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58 (tag: 0.3.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

Version file update (0.3.0)

commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:38:19 2018 -0600

@@ -40,7 +652,7 @@ Date: Fri Feb 23 16:33:32 2018 -0600
contained. To remedy this situation, we now selectively use movss to
load any element that could be the last element in the matrix.

commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442 (origin/rt)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 14:31:26 2018 -0600

@@ -272,7 +884,7 @@ Date: Thu Jan 4 20:51:35 2018 -0600
time hardware detection (when clang is selected).
- Added some missing (but mostly-optional) quotes to configure script.

commit 5a7005dd44ed3174abbe360981e367fd41c99b4b
Merge: 7be88705 3bc99a96
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Jan 3 12:05:12 2018 +0530
@@ -321,7 +933,7 @@ Date: Sat Dec 23 15:32:03 2017 -0600
is used by the auto-detection script to printf() the name of the
sub-configuration corresponding to the detected hardware.

commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1 (origin/selfinit)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 21 19:22:57 2017 -0600