diff --git a/CHANGELOG b/CHANGELOG index 4b8218ffb..f7cd6246d 100644 --- a/CHANGELOG +++ b/CHANGELOG @@ -1,10 +1,622 @@ -commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58 (HEAD -> master, tag: 0.3.0) +commit 1f28d7c86e17730f05bd239c8e8d67e3e7510a4f (HEAD -> master, tag: 0.3.1) +Author: Field G. Van Zee +Date: Wed Apr 4 17:13:15 2018 -0500 + + Version file update (0.3.1) + +commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516 (origin/master, origin/dev, origin/amd, origin/HEAD, dev, amd) +Merge: 786d15c5 3c91c7ae +Author: Field G. Van Zee +Date: Wed Apr 4 16:08:18 2018 -0500 + + Merge branch 'dev' of github.com:flame/blis into dev + +commit 786d15c5ef09f1f647b126b63d57e76d5810c58e +Author: Field G. Van Zee +Date: Wed Apr 4 16:06:47 2018 -0500 + + Added skx, knl to x86_64 configuration family. + + Details: + - Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration + family in the config_registry file. + - Added logic to configure that avoids committing certain sub-configs to + the configuration/kernel registries if those sub-configs cannot be + handled properly by the chosen compiler. (This was modeled after + similar logic in TBLIS's configure; thanks to Devin Matthews for + pointing this out.) First, the compiler and its version are inspected + and, based on the results, certain configurations are added to a + "blacklist". Then, as the configuration registries are being created, + configurations and/or kernels that match items in the blacklist are + skipped over and not committed to the registries. Under certain + circumstances, omitting a blacklisted configuration will indirectly + invalidate other configurations due to the loss of availability of + the original blacklisted configuration's kernel set. This additional + indirect blacklist is also accounted for. + - Added output to the beginning of configure that echoes information + about the chosen compiler as well as the configurations that are + blacklisted and must be stripped from the registries. 
+ - Various other cleanups in configure, especially with respect to + explicitly declaring local variables in functions. + - Comment updates to config/zen/make_defs.mk regarding choice of -march + flags based on compiler version. + +commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b +Author: Field G. Van Zee +Date: Mon Apr 2 12:40:25 2018 -0500 + + Fixed 64b type mismatch warning in cblas_xerbla.c. + + Details: + - Fixed a compiler warning concerning a type mismatch between the + format specifier of the printf() call in cblas_xerbla.c and its + corresponding (info) argument. The warning manifested when the CBLAS + layer was enabled and the BLAS/CBLAS integer type size was set to 64 + (the default is 32). The warning was fixed by changing the specifier + from %d to %jd and typecasting the argument to intmax_t. Thanks to + Dave Love for reporting this issue and submitting the patch. + +commit 71eaf449a812fe2bd640d21513ec83974b2edb45 +Merge: 6a628184 ae9a5be5 +Author: Field G. Van Zee +Date: Tue Mar 27 17:21:43 2018 -0500 + + Merge branch 'dev' + +commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f +Author: dnp +Date: Tue Mar 27 17:01:23 2018 -0500 + + Fixed bug in skx sgemm microkernel + +commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2 +Author: Field G. Van Zee +Date: Mon Mar 26 17:40:04 2018 -0500 + + Row storage optimizations to zen dotxf kernels. + + Details: + - Split the main loop bodies of zen's [sd]dotxf kernels into two cases: + one to handle a column-stored matrix A and one to handle a row-stored + matrix A. This allows vector instructions to be employed even if A is + stored by rows (and A^T appears stored as columns). Both storage cases + use a common edge case loop. Thanks to Devin Matthews for this idea + and for prototyping the change needed for sdotxf kernel. + +commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2 +Author: Field G. Van Zee +Date: Mon Mar 26 15:35:17 2018 -0500 + + Make k_iter/k_left uint64_t in bulldozer fma ukrs. 
+ + Details: + - Changed the declaration of k_iter and k_left for d, c, z microkernels + from dim_t to uint64_t. This is needed to ensure compatibility with + the movq instruction used to load the value into registers. This + change should have been made a long time ago, but for some reason + only recently began showing up via Travis CI. + +commit 6a628184f6938673440e4cdd4fed0208c51fd1f9 +Author: Field G. Van Zee +Date: Mon Mar 26 14:48:16 2018 -0500 + + Fixed a memkind-related compile-time bug on knl. + + Details: + - Fixed a compile-time error that occurred due to the fact that + BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined + soon enough to be used in bli_system.h where it is needed to determine + whether hbwmalloc.h should be #included. bli_system.h is now included + after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love + for reporting this issue. + - Tweaked the language used by configure to echo the status of the + --with[out]-memkind option. + +commit e2192a8fd58ec3657434ddd407033e097edad8f4 +Author: Field G. Van Zee +Date: Fri Mar 23 12:53:48 2018 -0500 + + Removed vzeroupper intrinsics from zen kernels. + + Details: + - Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a + vzeroupper instruction destroyed part of the intermediate result + stored by the vdpps instructions that came right before. (The + vzeroupper intrinsic was removed.) + - Removed remaining vzeroupper intrinsics from other zen kernels. + Previously, the vzeroupper instructions were included because BLIS is + typically compiled with -mfpmath=sse. But it was brought to my + attention that inserting these vzeroupper instructions is unnecessary + for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar + code rather than literal SSE instructions, and (b) compilers already + (likely) insert vzeroupper instructions where necessary. Thanks to + Devin Matthews for zeroing in on the dotxf bug. 
+ - Removed -malign-double from bulldozer make_defs.mk. This alignment + was already happening by default since bulldozer is an x86_64 system. + +commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85 +Author: Field G. Van Zee +Date: Thu Mar 22 18:21:30 2018 -0500 + + Added build system support for libmemkind. + + Details: + - Added support for libmemkind to configure. configure attempts to + detect the presence of libmemkind by compiling a small program + containing #include <hbwmalloc.h> and a call to hbw_malloc(). If + successful, it is assumed that libmemkind is present and available. + If present, use of libmemkind is enabled by default, and otherwise + use is disabled by default. If libmemkind is present, the user may + explicitly disable use of the library by running configure with the + --without-memkind option. Furthermore, a configuration may disable + libmemkind, perhaps conditional on some aspect of the build system, + by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS + make variable and setting the BLIS_ENABLE_MEMKIND makefile variable, + set in config.mk, to 'no'. (The knl configuration makes use of this + latter feature; see below.) + - If enabled at configure-time, bli_system.h will #include <hbwmalloc.h> + and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and + BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively. + - Deprecated explicit use of BLIS_NO_HBWMALLOC in + config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in + config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides + (#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it + would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile + variable to 'no'. + - common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled. + +commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a +Author: Field G. Van Zee +Date: Wed Mar 21 18:39:16 2018 -0500 + + Updates to top-level and test driver Makefiles. 
+ + Details: + - Added logic to common.mk that will choose a BLIS library against which + to link (LIBBLIS_LINK). The default choice is the static (.a) library; + the shared (.so) library is chosen only if the shared library build was + enabled and the static one was disabled. + - Updated the various test driver Makefiles to reference this common, + pre-chosen library against which to link. (Previously, these drivers + unconditionally linked against the static library and would have + failed if the static library build was disabled at configure-time.) + - Renamed many of the variables in common.mk and the top-level Makefile + so that variables relating to the libblis.[a|so] files, including + paths to those files, begin with "LIBBLIS". + - Shuffled around some of the library definitions from the top-level + Makefile to common.mk. + - Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and + the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in + and in configure. + - A few other cleanups in the top-level Makefile. + +commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3 +Author: Field G. Van Zee +Date: Wed Mar 21 15:47:11 2018 -0500 + + Added input.operations.fast file for 'make check'. + + Details: + - Added an 'input.operations.fast' file to testsuite directory to go + along with the 'input.general.fast' file used by the 'make check' + target in the top-level Makefile. This will allow the "fast" check + to prune operations and/or parameter combinations from the test + space in order to save time. + - Currently, input.operations.fast prunes trmm3 and all transposition + and conjugation parameters from the level-3 test space. + - Reduced problem size tested in input.general.fast to 100 and disabled + testing of 1m method. + +commit c441caa95aabe69f54e2160eb67bf4ca76a66c34 +Author: Field G. Van Zee +Date: Tue Mar 20 17:56:02 2018 -0500 + + README update. + + Details: + - Minor updates to README.md. + - Minor change to blastest/Makefile. 
+ +commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f +Author: Field G. Van Zee +Date: Tue Mar 20 15:35:45 2018 -0500 + + Added .gitkeep file to blastest/obj. + + Details: + - Added an empty file named '.gitkeep' to blastest/obj/ so that git will + track the otherwise empty directory. (This is already done for the BLIS + testsuite in testsuite/obj.) + +commit 0e6d000db9291342913dc5f8590a28c67bbcbc95 +Author: Field G. Van Zee +Date: Tue Mar 20 15:08:43 2018 -0500 + + Updated .gitignore to ignore BLAS test out.* files. + +commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5 +Author: Field G. Van Zee +Date: Tue Mar 20 14:33:50 2018 -0500 + + Fixes to .travis.yml. + + Details: + - Invoke the full BLIS testsuite via 'make testblis' instead of the fast + version via 'blistest-fast' (which was wrong anyway, since the correct + fast target is 'testblis-fast'). + - Invoke the BLAS tests via 'make testblas' instead of 'blastest'. + +commit 664ec4813d8b53121cce7a68bef47da656ece9cb +Author: Field G. Van Zee +Date: Tue Mar 20 13:54:58 2018 -0500 + + Integrated f2c'ed netlib BLAS test suite. + + Details: + - Created a new test suite that exercises only the BLAS compatibility + found in BLIS. The test suite is a straightforward port of code + obtained from netlib LAPACK, run through f2c and linked to a stripped- + down version of libf2c that is compiled along with the test drivers + (to prevent any obvious ABI issues). The new BLAS test suite can be + run from within its new local directory, 'blastest' (through its local + 'make ; make run' targets) or from the top-level Makefile (via the + 'make testblas' target). Output files are created in whatever directory + the test drivers are run, whether it be the 'blastest' directory, the + top-level source distribution directory, or the out-of-tree directory + in which 'configure' was run. 
Also, the results of the BLAS test suite + can be checked via 'make checkblas', which summarizes the presence or + absence of test failures in a single line printed to stdout. + - Updated the 'test' target to run both 'testblis' and 'testblas'. + - Added a new 'testblis-fast' target that runs the BLIS testsuite with + smaller problem sizes, allowing it to finish more quickly. + - Added a 'make check' target, which runs 'checkblis-fast' and + 'checkblas'. + - Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of + 'testblis' before (calling the check-blistest.sh script to check the + result manually). + - Renamed some targets in the top-level Makefile to be consistent between + BLAS and BLIS. + +commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e +Author: Field G. Van Zee +Date: Mon Mar 19 18:19:43 2018 -0500 + + Fixed a few obscure bugs in the BLAS API. + + Details: + - Fixed a missing parameter in the definition of sdsdot_(). The 'sb' + argument was missing. Strangely, the argument is omitted from dsdot_() + in the BLAS API. + - Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string + passed to xerbla_() by the bla_ger_check() macro. + - For bla_syrk_check() and bla_syr2k_check() macros, only allow + conjugate-transpose (trans='c') as a valid argument for the real + domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the + argument was allowed even for the complex domain equivalents, which + was inconsistent with the BLAS API.) + +commit fe7d7f1e43e4c26249eed83d4188beee1ba96202 +Author: Field G. Van Zee +Date: Sun Mar 18 19:43:06 2018 -0500 + + Fixed cpp macro parameter "ch" typo in bla_ger.c. + + Details: + - Previously, the BLAS routine-generating macro in bla_ger.c was + incorrectly passing MKSTR(ch) into the _check() macro when it + should have been passing in the char that was available, chxy. + I've instead changed the name of the macro parameter from chxy + to ch. A similar change was made to bla_ger.h for consistency. 
+ Thanks to Dave Love for helping track this down. (NOTE: This is + actually the root cause of the bug that was first patched by + increasing the length of the operation name strings passed into + xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH, + in 3d1a5a7. In theory, that change could be backed out now.) + - Applied aforementioned chxy->ch change to bla_dot.[ch], as well as + frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed + to happen, but for naming consistency). + - Reformatted function signatures/prototypes of CBLAS functions and + function calls to BLAS in frame/compat/cblas/f77_sub/*.c. + +commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb +Author: Field G. Van Zee +Date: Fri Mar 16 13:05:56 2018 -0500 + + Convert op names to uppercase before calling xerbla_(). + + Details: + - Defined a new function, bli_string_mkupper(), that calls toupper() on + every non-NULL character in a string. + - Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3 + BLAS _check() macros. This prevents the BLAS testsuite from complaining + that the operation name (e.g. "dgemm") does not match the expected + value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue. + +commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223 +Author: Field G. Van Zee +Date: Fri Mar 16 12:24:07 2018 -0500 + + Fixed printf() format overflow. + + Details: + - Increased the length of operation name strings passed to xerbla_() in + the level-2 and level-3 operation _check() functions, found in + frame/compat/check. This avoids a format specifier overflow warning by + gcc 7. Thanks to Dave Love for reporting this issue and suggesting the + fix. + +commit c73055f028684d998e03b2392093c393782bbfe7 +Author: Field G. Van Zee +Date: Thu Mar 15 16:08:21 2018 -0500 + + Return after non-zero info in BLAS checks. 
+ + Details: + - Previously, when calling the BLAS compatibility layer, discovering a + parameter check failure would result in the proper setting of the + info parameter (printed by xerbla_()), but would also come with an + immediate abort() rather than a return. This was incorrect behavior + for two overlapping reasons. + (1) BLAS should return gracefully to the caller in the event of a + bad set of parameters, not abort(). + (2) When BLIS was being tested via the BLAS testsuite, BLIS's + xerbla_() would correctly get preempted/overridden by the + xerbla_() in the BLAS testsuite, but execution would then + erroneously continue on to the BLIS implementation with bad + parameter values. + - The previous issue was addressed by disabling the abort() in BLIS's + xerbla_(), changing all of the BLAS _check() functions to cpp macros, + and adding a return statement to the end of each _check() macro's + "if ( info != 0 )" conditional. + Thanks to Dave Love for reporting this issue. + +commit c4f1d18b97a6a8c3ea0366aa759db597a664062a +Author: Field G. Van Zee +Date: Wed Mar 14 19:10:09 2018 -0500 + + Minor typo fix to printing arch in testsuite. + + Details: + - Mistakenly was calling bli_cpuid_query_id() instead of + bli_arch_query_id() in the recent addition to the testsuite output + that prints the active sub-configuration. The former function is + only used for multi-architecture builds, whereas the latter is the + more general option that also works for single configuration + (including 'configure auto') builds. + +commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d +Author: Devin Matthews +Date: Wed Mar 14 17:43:42 2018 -0500 + + Make arm32 and arm64 families work. (#176) + +commit fc6a1842518a0820c6708c285611346d5a1419da +Author: Field G. Van Zee +Date: Wed Mar 14 15:31:17 2018 -0500 + + Print sub-configuration name in testsuite output. + + Details: + - Added a line to the testsuite output that prints the name of the + current/active sub-configuration. 
This is useful when linking the + testsuite against multi-configuration builds because it confirms + the sub-configuration that is actually being employed at runtime. + Thanks to Devin Matthews for suggesting this feature. + +commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e +Merge: 290dd4a9 b1a15ae6 +Author: Devin Matthews +Date: Wed Mar 14 13:27:44 2018 -0500 + + Merge pull request #173 from devinamatthews/dev + + Fix Cortex-A9 and Cortex-A15 configs. + +commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff +Author: Devin Matthews +Date: Wed Mar 14 13:26:44 2018 -0500 + + Use BLIS_H_FLAT + +commit 290dd4a9feee447e69b40ad108954af78e196f7e +Author: Field G. Van Zee +Date: Wed Mar 14 13:15:37 2018 -0500 + + Allow arbitrarily deep configuration families. + + Details: + - Updated configure so that configuration families specified in the + config_registry are no longer constrained as being only one level + deep. For example, previously the x86_64 family could not be defined + concisely in terms of, say, intel64 and amd64 families, and instead + had to be defined as containing "haswell, sandybridge, penryn, zen, + etc." In other words, families were constrained to only having + singleton configurations as their members. That constraint is now + lifted. + - Redefined x86_64 family in config_registry in terms of intel64 and + amd64. + +commit 9cee78e006d56543ac02fc9c488905c0434e60ae +Author: Devin Matthews +Date: Wed Mar 14 13:09:48 2018 -0500 + + Fix Cortex-A9 and Cortex-A15 configs. + + Tested with QEMU. + +commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455 +Author: Field G. Van Zee +Date: Tue Mar 13 16:04:40 2018 -0500 + + Updates to ARM hardware detection support. + + Details: + - Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c. + Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit) + sub-configurations are supported. 
However, the functions that detect + features specific to a15 and a9 are identical, and since a15 is tested + first, it will always be chosen for arm32 hardware (even if both + sub-configurations were enabled at configure-time and the library is + linked and run on an a9). Thus, more work needs to be done to + distinguish these two. + - Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either + the x86_64 or ARM code will be compiled (or neither, if neither + environment is detected). + - In bli_arch_query_id(), call bli_cpuid_query_id() when the + BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined. + - Added arm64 and arm32 configuration families to config_registry. + - Added a note to the arch_t typedef enum in bli_type_defs.h reminding + the developer to update the string array in bli_arch.c whenever new + enum values are added or existing values are reordered. + +commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8 +Author: Field G. Van Zee +Date: Sun Mar 11 16:59:50 2018 -0500 + + Fixed misnamed kernels in _cntx_init_cortexa57.c. + + Details: + - Changed incorrect kernel function names in bli_cntx_init_cortexa57.c: + bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12 + bli_dgemm_cortexa57_asm_6x8 -> bli_dgemm_armv8a_asm_6x8 + Thanks to Jacob Gorm Hansen for reporting this issue. + +commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a +Author: Field G. Van Zee +Date: Wed Mar 7 12:54:06 2018 -0600 + + Tweaked common.mk, Makefile, skx/knl make_defs.mk. + + Details: + - Reorganized linker-related section of common.mk so that LDFLAGS set + in a sub-configuration's make_defs.mk file will not be immediately + (and erroneously) overridden by the default values. + - Re-enabled redirected (to file) output of the testsuite when run from + the top-level Makefile via 'make test'. (For some reason, it was + commented-out for the non-verbose case.) + - Removed old/unnecessary code from the make_defs.mk files of skx and + knl sub-configurations. 
+ +commit 8b0475a87daa177916e2caac0e530c6a57fa07cf +Author: Field G. Van Zee +Date: Tue Mar 6 06:39:44 2018 -0600 + + Fixed typo in attempted fix in 1a8350f7. + + Details: + - Mistakenly entered 148 as knl mc blocksize for double real when the + value should have been 144. Thanks to Dave Love for reporting this. + +commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b +Author: Field G. Van Zee +Date: Mon Mar 5 18:00:45 2018 -0600 + + Fixed missing flags during shared object build. + + Details: + - Fixed a bug in common.mk that caused warning, position-independent + code, miscellaneous, and general preprocessor flags to be omitted + from the configuration family-specific variables that hold those + values, as registered by the family's make_defs.mk file. This would + most obviously manifest when targeting a configuration family such as + 'intel64' while simultaneously configuring for a shared object build, + as the key '-fPIC' flag would be omitted at compile-time and prevent + successful linking. Thanks to Dave Love for reporting this bug. + - Other cleanups to common.mk for readability and clarity. + +commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9 +Author: Field G. Van Zee +Date: Mon Mar 5 13:32:00 2018 -0600 + + Fixed cache blocksize bug in knl configuration. + + Details: + - Changed the mc blocksize for double real execution in the knl sub- + configuration from 160 to 148. The old value was not a multiple of + mr (which is 24), and thus the safeguards in bli_gks_register_cntx() + were tripping. Thanks to Dave Love for reporting this issue. + - Switch knl sub-configuration to use default blocksizes for datatypes + not supported by native kernels. + - Fixed typos in bli_error.c that prevented certain error strings + (which report maximum cache blocksizes not being multiples of their + corresponding register blocksize) from properly initializing. + +commit c09fffa827fe6241dc20193a1c404496664220de +Author: Field G. 
Van Zee +Date: Sat Mar 3 13:13:39 2018 -0600 + + Added missing cntx_t* arg in knl packm kernels. + + Details: + - Added the missing cntx_t* argument to the function signature of packm + kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this + issue. + +commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb +Author: Field G. Van Zee +Date: Thu Mar 1 14:36:39 2018 -0600 + + Enable non-unit vector stride tests by default. + + Details: + - Change "vector storage schemes to test" parameter in testsuite's + input.general file to "cj". This means that both unit stride column + vectors and non-unit stride column vectors will be tested in + operations with vector operands (e.g. level-1v, level-1f, level-2). + - Very minor comment (typo) changes to input.operations. + +commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454 +Author: Field G. Van Zee +Date: Wed Feb 28 17:01:47 2018 -0600 + + Added individual operation overrides in testsuite. + + Details: + - Updated the testsuite driver so that setting one or more individual + operation test switches to "2" in input.operations will enable ONLY + those operations and disable all others, regardless of the values of + the section overrides and other operation switches. This makes it + very easy to quickly test only one or two operations, and equally + easy to revert back to the previous combination of operation tests. + - Added more comments to input.operations describing the use of + individual "enable only" overrides. + +commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b +Author: Field G. Van Zee +Date: Wed Feb 28 15:30:14 2018 -0600 + + Use zen kernels in haswell sub-configuration. + + Details: + - Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv, + dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf + and dotxf. This works because these kernels simply target AVX/AVX2, + and therefore work without modification on haswell hardware. 
+ - Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen + kernels are essentially identical to those used by haswell, except that + now zen kernels are a bit more up-to-date. In the future, I may + continue to maintain duplicates, or I may keep the kernels named after + one architecture (zen or haswell) but used by both sub-configurations. + - In config_registry, enable use of both haswell and zen kernels for the + haswell sub-configuration. This is necessary in order to make zen + kernels visible when registering kernels in bli_cntx_init_haswell.c. + - Enable use of assembly-based complex gemm microkernels for zen, + bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in + bli_cntx_init_zen.c. This was actually intended for 1681333. + +commit d9079655c9cbb903c6761d79194a21b7c0a322bc +Author: Field G. Van Zee +Date: Fri Feb 23 17:42:48 2018 -0600 + + CHANGELOG update (0.3.0) + +commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58 (tag: 0.3.0) Author: Field G. Van Zee Date: Fri Feb 23 17:42:48 2018 -0600 Version file update (0.3.0) -commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d (origin/master, origin/HEAD) +commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d Author: Field G. Van Zee Date: Fri Feb 23 17:38:19 2018 -0600 @@ -40,7 +652,7 @@ Date: Fri Feb 23 16:33:32 2018 -0600 contained. To remedy this situation, we now selectively use movss to load any element that could be the last element in the matrix. -commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442 (origin/rt, rt) +commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442 (origin/rt) Author: Field G. Van Zee Date: Fri Feb 23 14:31:26 2018 -0600 @@ -272,7 +884,7 @@ Date: Thu Jan 4 20:51:35 2018 -0600 time hardware detection (when clang is selected). - Added some missing (but mostly-optional) quotes to configure script. 
-commit 5a7005dd44ed3174abbe360981e367fd41c99b4b (origin/amd, amd) +commit 5a7005dd44ed3174abbe360981e367fd41c99b4b Merge: 7be88705 3bc99a96 Author: Nisanth M P Date: Wed Jan 3 12:05:12 2018 +0530 @@ -321,7 +933,7 @@ Date: Sat Dec 23 15:32:03 2017 -0600 is used by the auto-detection script to printf() the name of the sub-configuration corresponding to the detected hardware. -commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1 (origin/selfinit, selfinit) +commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1 (origin/selfinit) Author: Field G. Van Zee Date: Thu Dec 21 19:22:57 2017 -0600