amd/blis - blis - Public git mirror


mirror of https://github.com/amd/blis.git synced 2026-05-11 17:50:00 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	c09fffa827	Added missing cntx_t* arg in knl packm kernels. Details: - Added the missing cntx_t* argument to the function signature of packm kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this issue.	2018-03-03 13:13:39 -06:00
Field G. Van Zee	1ef9360b1f	Enable non-unit vector stride tests by default. Details: - Change "vector storage schemes to test" parameter in testsuite's input.general file to "cj". This means that both unit stride column vectors and non-unit stride column vectors will be tested in operations with vector operands (e.g. level-1v, level-1f, level-2). - Very minor comment (typo) changes to input.operations.	2018-03-01 14:36:39 -06:00
Field G. Van Zee	8c4e55a1a1	Added individual operation overrides in testsuite. Details: - Updated the testsuite driver so that setting one or more individual operation test switches to "2" in input.operations will enable ONLY those operations and disable all others, regardless of the values of the section overrides and other operation switches. This makes it very easy to quickly test only one or two operations, and equally easy to revert to the previous combination of operation tests. - Added more comments to input.operations describing the use of individual "enable only" overrides.	2018-02-28 17:01:47 -06:00
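The "enable only" rule described above can be sketched as a small decision function. This is a minimal sketch under stated assumptions: `op_switch_t`, `any_enable_only()`, and `op_is_enabled()` are hypothetical names, and the way section overrides combine with operation switches in the non-override case is simplified; the testsuite's real parsing code is not shown here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of one line of input.operations: a switch value
   (0 = off, 1 = on, 2 = "enable only") plus the value of the section
   override that governs it (0 = section disabled). */
typedef struct
{
    int section_override;
    int op_switch;
} op_switch_t;

/* True if at least one operation requested "enable only" mode. */
static bool any_enable_only( const op_switch_t* ops, size_t n )
{
    for ( size_t i = 0; i < n; ++i )
        if ( ops[ i ].op_switch == 2 ) return true;
    return false;
}

/* Decide whether operation i should be tested. */
static bool op_is_enabled( const op_switch_t* ops, size_t n, size_t i )
{
    assert( i < n );

    /* Any "2" overrides everything: only the "2" operations run, and
       section overrides and other switches are ignored. */
    if ( any_enable_only( ops, n ) )
        return ops[ i ].op_switch == 2;

    /* Otherwise fall back to the usual section + operation switches. */
    return ops[ i ].section_override != 0 && ops[ i ].op_switch != 0;
}
```

Because the check scans all switches first, reverting to the normal combination only requires changing the "2" back to "1" or "0".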
Field G. Van Zee	34862aed89	Use zen kernels in haswell sub-configuration. Details: - Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv, dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf and dotxf. This works because these kernels simply target AVX/AVX2, and therefore work without modification on haswell hardware. - Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen kernels are essentially identical to those used by haswell, except that the zen kernels are now a bit more up-to-date. In the future, I may continue to maintain duplicates, or I may keep the kernels named after one architecture (zen or haswell) but used by both sub-configurations. - In config_registry, enable use of both haswell and zen kernels for the haswell sub-configuration. This is necessary in order to make zen kernels visible when registering kernels in bli_cntx_init_haswell.c. - Enable use of assembly-based complex gemm microkernels for zen, bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in bli_cntx_init_zen.c. This was actually intended for `1681333`.	2018-02-28 15:30:14 -06:00
Field G. Van Zee	d9079655c9	CHANGELOG update (0.3.0)	2018-02-23 17:42:48 -06:00
Field G. Van Zee	709f8361eb	Version file update (0.3.0) 0.3.0	2018-02-23 17:42:48 -06:00
Field G. Van Zee	3defc7265c	Applied `34b72a3` to non-active/unused microkernels. Details: - Applied the read-beyond-bounds bugfix in `34b72a3` to other haswell and zen kernels (ie: other microtile shapes) which are not used by default. This was done mostly in case someone decided to pick up these kernels and start using them, not because it affects BLIS's behavior out-of-the-box.	2018-02-23 17:38:19 -06:00
Field G. Van Zee	34b72a3517	Fixed obscure read-beyond-bounds bug in sgemm ukrs. Details: - Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C is stored with general stride (ie: both rs and cs are non-unit). The bug was rooted in the way those microkernels read from matrix C-- namely, they used vmovlps/vmovhps instead of movss. By loading two floats at a time, even if one of them was treated as junk, the assembly code could be written in a more concise manner. However, under certain conditions--if m % mr == 0 and n % nr == 0 and the underlying matrix is not an internal "view" into a larger matrix-- this could result in the very last vmovhps of the last (bottom-right) microkernel invocation reading beyond valid memory. Specifically, the low 32 bits read would always be valid, but the high 32 bits could reside beyond the bounds of the array in which the output C matrix is contained. To remedy this situation, we now selectively use movss to load any element that could be the last element in the matrix.	2018-02-23 16:33:32 -06:00
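The bound violation described above comes down to offset arithmetic, sketched below. The helper names `elem_offset()` and `floats_touched()` are hypothetical; the sketch only models how far a load reaches in memory, not the microkernel's actual instruction sequence.

```c
#include <assert.h>
#include <stddef.h>

/* Offset, in floats, of element (i,j) of a matrix with row stride rs and
   column stride cs (general stride: both may be non-unit). */
static size_t elem_offset( size_t i, size_t j, size_t rs, size_t cs )
{
    return i * rs + j * cs;
}

/* Minimum number of floats the buffer holding an m x n matrix must contain
   for every element to be loadable. 'width' is the number of consecutive
   floats a single load touches: 1 for a movss-style scalar load, 2 for a
   vmovlps/vmovhps-style 64-bit load whose upper float is ignored. */
static size_t floats_touched( size_t m, size_t n, size_t rs, size_t cs,
                              size_t width )
{
    assert( m > 0 && n > 0 && width > 0 );

    /* The bottom-right element has the largest offset, so a wide load of
       that element is the one that can step past the end of the array. */
    return elem_offset( m - 1, n - 1, rs, cs ) + width;
}
```

For a 2 x 3 matrix with rs = 3 and cs = 7, the last element sits at offset 17, so an 18-float allocation suffices for scalar loads. A 64-bit load of that element also reads float 18, which is one past the end. Using a scalar load for any element that could be the last one removes the overrun.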
Field G. Van Zee	5112e1859e	Added missing 'restrict' to some kernels' cntx_t. Details: - Added missing 'restrict' keyword to cntx_t argument of function signatures corresponding to level-1v, level-1f, and level-1m kernels. This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and bli_l1m_ker_prot.h. (The 'restrict' was already being used to qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.) - Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and bli_l3_ukr.h that help explain how those headers function to produce kernel prototypes using the prototype macros defined in the files mentioned above.	2018-02-23 14:31:26 -06:00
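The prototype-macro pattern mentioned above can be illustrated roughly as follows. `COPYV_KER_PROT` and the `copyv_sketch` kernels are hypothetical, and their argument list is simplified. BLIS's real macros in bli_l1v_ker_prot.h take different arguments. The point is only that one macro produces both the declarations and the definition's signature, including the `restrict`-qualified `cntx_t*`.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in for BLIS's context type; its real contents do not matter here. */
typedef struct cntx_s { int unused; } cntx_t;

/* Hypothetical prototype macro: every pointer argument, now including the
   context, is restrict-qualified. */
#define COPYV_KER_PROT( ctype, ch, opname ) \
void ch##opname \
     ( \
       size_t                n, \
       const ctype* restrict x, ptrdiff_t incx, \
             ctype* restrict y, ptrdiff_t incy, \
       cntx_t*      restrict cntx \
     )

/* A header expands the macro once per datatype to declare the kernels... */
COPYV_KER_PROT( float,  s, copyv_sketch );
COPYV_KER_PROT( double, d, copyv_sketch );

/* ...and a kernel file reuses it so the definition cannot drift from the
   declaration. */
COPYV_KER_PROT( double, d, copyv_sketch )
{
    (void)cntx; /* a real kernel might query blocksizes from cntx */
    assert( n == 0 || ( x != NULL && y != NULL ) );
    for ( size_t i = 0; i < n; ++i )
        y[ (ptrdiff_t)i * incy ] = x[ (ptrdiff_t)i * incx ];
}
```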
Field G. Van Zee	1fa8af95d8	Merge branch 'rt'	2018-02-21 17:54:02 -06:00
Field G. Van Zee	c084b03b31	Merge branch 'rt'	2018-02-21 17:52:17 -06:00
Field G. Van Zee	16813335bd	Merge branch 'amd' into rt Details: - Merged contributions made by AMD via 'amd' branch (see summary below). Special thanks to AMD for their contributions to-date, especially with regard to intrinsic- and assembly-based kernels. - Added column storage output cases to microkernels in bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with the extra cost of transposing the microtile in registers, this is much faster than using the general storage case when the underlying matrix is column-stored. - Added s and d assembly-based zen gemmtrsm_u microkernels (including the column storage optimization mentioned above). - Updated zen sub-configuration to reflect presence of new native kernels. - Temporarily reverted zen sub-configuration's level-3 cache blocksizes to smaller haswell values. - Temporarily disabled small matrix handling for zen configuration family in config/zen/bli_family_zen.h. - Updated zen CFLAGS according to changes in `1e4365b`. - Updated haswell microkernels such that: - only one vzeroupper instruction is executed prior to returning - movapd/movupd are used in lieu of movaps/movups for double-real microkernels. (Note that single-real microkernels still use movaps/movups.) - Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is now included via frame/include/bli_arch_config.h. - Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation in testsuite/src/test_amaxv.c). - Added early return for alpha == 0 in bli_dotxv_ref.c. - Integrated changes from `f07b176`, including a fix for undefined behavior when executing the 1m method under certain conditions. - Updated config_registry; no longer need haswell kernels for zen sub-configuration. - Tweaked marginal and pass thresholds for dotxf. - Reformatted level-1v, -1f, and -3 amd kernels and inserted additional comments. - Updated LICENSE file to explicitly mention that parts are copyright UT-Austin and AMD. - Added AMD copyright to header templates in build/templates. Summary of previous changes from 'amd' branch: - Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and s and d assembly-based zen gemmtrsm_l microkernels (d6x8). - Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv, and scalv, with extra-unrolling variants for axpyv and scalv. - Added a small matrix handler to bli_gemm_front(), with the handler implemented in kernels/zen/3/bli_gemm_small_matrix.c. - Added additional logic to sumsqv that first attempts to compute the sum of the squares via dotv(). If there is a floating-point exception (FE_OVERFLOW), then the previous (numerically conservative) code is used; otherwise, the result of dotv() is square-rooted and stored as the result. This new implementation is only enabled when FE_OVERFLOW is #defined. If the macro is not #defined, then the previous implementation is used. - Added axpyv and dotv standalone test drivers to test directory. - Added zen support to old cpuid_x86.c driver in build/auto-detect/old. - Added thread-local and __attribute__-related macros to bli_macro_defs.h.	2018-02-21 17:43:32 -06:00
Devin Matthews	5d03b6e6e1	Fix asm macro include line for KNL. Fixes #167 .	2018-02-19 11:31:30 -06:00
Field G. Van Zee	f07b176c84	Fixed an obscure bug in the 1m implementation. Details: - Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution. Previously, the function probed the context that was in the process of being updated for use with 1m--this context being previously initialized/copied from a native context--for its storage preference to determine which "variant" (row- or column-oriented) of 1m would be needed. However, the _cntx_ref() function was not updating the method field of the context until AFTER this query, and the conditional which depended on it, had taken place, meaning the storage preference query function would mistakenly think the context was for native execution, since the context's method field would still be set to BLIS_NAT. This would lead it to incorrectly grab the storage preference of the complex domain microkernel rather than the corresponding real domain microkernel, which could cause the storage preference predicate to evaluate to the wrong value, which would lead to the _cntx_ref() function choosing the wrong variant. This could lead to undefined behavior at runtime. The method is now explicitly set within the context prior to calling the storage preference query function. - Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c. - Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk, which are appropriate for gcc 6.x and newer. (Mistakenly used -march=bdver4 instead of -march=znver1.)	2018-02-15 18:36:54 -06:00
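The 1m fix above is purely an ordering change. The sketch below uses hypothetical, heavily simplified types (`cntx_sketch_t`, `method_sketch_t`) in place of BLIS's cntx_t and method field. It shows why querying the storage preference before setting the method returns the complex-domain microkernel's preference instead of the real-domain one.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, heavily simplified stand-in for a context. */
typedef enum { METHOD_NAT, METHOD_1M } method_sketch_t;

typedef struct
{
    method_sketch_t method;
    bool complex_ukr_prefers_rows; /* preference of the complex-domain ukr */
    bool real_ukr_prefers_rows;    /* preference of the real-domain ukr */
} cntx_sketch_t;

/* Storage-preference query: under 1m the real-domain microkernel is what
   actually runs, so its preference is the one that matters. */
static bool cntx_prefers_rows( const cntx_sketch_t* c )
{
    return c->method == METHOD_1M ? c->real_ukr_prefers_rows
                                  : c->complex_ukr_prefers_rows;
}

/* Returns true if the row-oriented 1m variant should be used. The fix is
   one of ordering: set the method field first, then query. */
static bool cntx_init_1m( cntx_sketch_t* c )
{
    c->method = METHOD_1M;
    return cntx_prefers_rows( c );
}
```

With the buggy ordering, the query would still see METHOD_NAT and read the complex microkernel's preference. If the two preferences differ, that selects the wrong 1m variant.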
Field G. Van Zee	1f94bb7b96	Document how to enable zen-specific instructions. Details: - Added as a comment in config/zen/make_defs.mk the list of compiler flags that could be added to manually enable the instructions provided by the Zen microarchitecture that are not already implied by -march=bdver4. This information, along with the previous commit's flags to selectively disable Bulldozer instructions no longer present in Zen, was gathered from [1]. I hesitate to enable use of these instructions since I don't have any Zen hardware to test on yet. [1] https://wiki.gentoo.org/wiki/Ryzen	2018-01-19 12:46:53 -06:00
Field G. Van Zee	1e4365b21b	Augment zen CFLAGS to prevent illegal instruction. Details: - Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so that compiling with -march=bdver4 on zen-based architectures does not result in an illegal instruction error at runtime. Note: This fix is only needed for gcc 5.4; gcc 6.3 or later supports the use of -march=znver1, which can be used in lieu of the augmented set of flags based on bdver4. Thanks to Nisanth Padinharepatt for reporting this error.	2018-01-18 12:03:51 -06:00
Field G. Van Zee	fa74af4e1f	Minor labeling update for './configure -c' output. Details: - Print the name of the configuration in the output of the kernel-to-config map (and chosen pairs list) as a subtle way to remind the user that these only apply to the targeted configuration (whereas the config list and kernel list are printed without regard to which configuration was actually targeted).	2018-01-09 13:43:15 -06:00
Field G. Van Zee	5cdea756c7	Merge branch 'rt'	2018-01-07 19:45:20 -06:00
Devin Matthews	9d8858b5cf	Merge pull request #164 from devinamatthews/master Don't use memkind for skx configuration.	2018-01-07 10:03:25 -06:00
Devin Matthews	f7df64daf6	Don't use memkind for skx configuration. Fixes #163 .	2018-01-07 09:37:25 -06:00
Field G. Van Zee	1e7a4896e0	Minor error handling in update-version-file.sh. Details: - Added explicit handling of situations when 'git describe --tags' returns an error. This command is used by update-version-file.sh when deciding whether or not to update the version file prior to configuration. - Removed bli_packm.c and bli_unpackm.c, as they contained no source code.	2018-01-05 12:33:48 -06:00
Field G. Van Zee	0b3ca3cfb6	Intelligently select compiler for auto-detection. Details: - Rewrote code that selects the compiler for the purposes of compiling the auto-detection executable. CC (if specified) is tried first, then gcc, then clang; the absolute fallback is cc. The previous code was sort of broken, and seemed to unintentionally always use gcc. - Moved various configuration-agnostic flags from config/*/make_defs.mk files to common.mk. The new mechanism appends the configuration-agnostic flags to the various compiler flag variables initialized in make_defs.mk. Flags specific to the sub-configuration are still set in make_defs.mk. - Added -Wno-tautological-compare to CMISCFLAGS when clang is in use. Also added the flag to the compiler invocation during configure-time hardware detection (when clang is selected). - Added some missing (but mostly-optional) quotes to configure script.	2018-01-04 20:51:35 -06:00
Nisanth M P	5a7005dd44	Merge changes in AMD beta release 0.95 into amd branch	2018-01-03 12:37:53 +05:30
Field G. Van Zee	0b9c5127e9	Enabled C99, added stdint.h to auto-detect build. Details: - Added "-std=c99" to compiler arguments when building auto-detection driver in configure script. - Added #include <stdint.h> to all three source files needed by auto-detection program.	2017-12-23 15:53:44 -06:00
Field G. Van Zee	0ce5e19c31	Reimplemented configure-time hardware detection. Details: - Reimplemented the hardware detection functionality invoked when running "./configure auto". Previously, a standalone script in build/auto-detect that used CPUID was used. However, the script attempted to enumerate all models for each microarchitecture supported. The new approach recycles the same code used for runtime hardware detection introduced in `2c51356`. This has two immediate benefits. First, it reduces and consolidates the code required to detect microarchitectures via the CPUID instruction. Second, it provides an indirect way of testing at configure-time the code that is used to detect hardware at runtime. This code is (a) only activated when targeting a configuration family (such as intel64 or amd64) at configure-time and (b) somewhat difficult to test in practice, since it relies on having access to older microarchitectures. - The above change required placing conditional cpp macro blocks in bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include a bare-bones set of headers that does not rely on the presence of a bli_config.h header. This is needed because bli_config.h has not been created yet when configure-time auto-detection takes place. - Defined a new function in bli_arch.c, bli_arch_string(), which takes an arch_t id and returns a pointer to a string that contains the lowercase name of the corresponding microarchitecture. This function is used by the auto-detection script to printf() the name of the sub-configuration corresponding to the detected hardware.	2017-12-23 15:32:03 -06:00
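An id-to-name lookup like the bli_arch_string() function described above is commonly written as a table indexed by the enum. The enum below is a hypothetical subset (`arch_sketch_t`, `arch_string()`), not BLIS's real arch_t, whose entries and order differ.

```c
#include <assert.h>

/* Hypothetical subset of architecture ids. */
typedef enum
{
    ARCH_SKX,
    ARCH_KNL,
    ARCH_HASWELL,
    ARCH_ZEN,
    ARCH_GENERIC,
    ARCH_NUM   /* must stay last: number of ids */
} arch_sketch_t;

/* Lowercase names, indexed by id, matching the sub-configuration names. */
static const char* const arch_names[ ARCH_NUM ] =
{
    "skx", "knl", "haswell", "zen", "generic"
};

/* Map an id to its name, in the spirit of bli_arch_string(). */
static const char* arch_string( arch_sketch_t id )
{
    assert( id < ARCH_NUM );
    return arch_names[ id ];
}
```

An auto-detection driver would then only need something like `printf( "%s\n", arch_string( id ) );` once the id has been obtained from CPUID-based detection.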
Field G. Van Zee	9804adfd40	Added option to disable pack buffer memory pools. Details: - Added a new configure option, --[en|dis]able-packbuf-pools, which will enable or disable the use of internal memory pools for managing buffers used for packing. When disabled, the function specified by the cpp macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed (and BLIS_FREE_POOL is called when the buffer is ready to be released, usually at the end of a loop). When enabled, which was the status quo prior to this commit, a memory pool data structure is created and managed to provide threads with packing buffers. The memory pool minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls BLIS_MALLOC_POOL), but does so through a somewhat more complex mechanism that may incur additional overhead in some (but not all) situations. The new option defaults to --enable-packbuf-pools. - Removed the reinitialization of the memory pools from the level-3 front-ends and replaced it with automatic reinitialization within the pool API's implementation. This required an extra argument to bli_pool_checkout_block() in the form of a requested size, but hides the complexity entirely from BLIS. And since bli_pool_checkout_block() is only ever called within a critical section, this change fixes a potential race condition in which threads using contexts with different cache blocksizes--most likely a heterogeneous environment--can check out pool blocks that are too small for the submatrices they wish to pack. Thanks to Nisanth Padinharepatt for reporting this potential issue. - Removed several functions in light of the relocation of pool reinit, including bli_membrk_reinit_pools(), bli_memsys_reinit(), bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool(). - Updated the testsuite to print whether the memory pools are enabled or disabled.	2017-12-21 19:22:57 -06:00
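The key idea above is that checkout receives the requested size and, if the pooled blocks are too small, reinitializes the pool inside the same critical section. The following is a minimal sketch with a mutex-protected stack of blocks. `pool_t`, `block_t`, `pool_checkout()`, `pool_checkin()`, and `POOL_CAP` are hypothetical and much simpler than BLIS's pool API.

```c
#include <assert.h>
#include <pthread.h>
#include <stdlib.h>

#define POOL_CAP 8 /* hypothetical fixed capacity */

typedef struct { void* buf; size_t size; } block_t;

/* Hypothetical, simplified pool; not BLIS's pool_t. */
typedef struct
{
    pthread_mutex_t lock;
    block_t         stack[ POOL_CAP ]; /* blocks available for checkout */
    size_t          avail;             /* number of entries in stack */
    size_t          block_size;        /* current size of pooled blocks */
} pool_t;

/* Check out a block of at least req bytes. If the pool's blocks are too
   small, the pool reinitializes itself here, inside the critical section,
   so no caller can grab an undersized block in between. */
static block_t pool_checkout( pool_t* p, size_t req )
{
    block_t b;

    pthread_mutex_lock( &p->lock );
    if ( req > p->block_size )
    {
        for ( size_t i = 0; i < p->avail; ++i ) free( p->stack[ i ].buf );
        p->avail      = 0;
        p->block_size = req;
    }
    if ( p->avail > 0 )
    {
        b = p->stack[ --p->avail ];
    }
    else
    {
        b.size = p->block_size;
        b.buf  = malloc( b.size );
    }
    pthread_mutex_unlock( &p->lock );

    assert( b.buf != NULL );
    return b;
}

/* Return a block. Blocks handed out before a reinit are smaller than the
   current block size, so they are freed rather than pooled. */
static void pool_checkin( pool_t* p, block_t b )
{
    pthread_mutex_lock( &p->lock );
    if ( b.size < p->block_size || p->avail == POOL_CAP ) free( b.buf );
    else                                                  p->stack[ p->avail++ ] = b;
    pthread_mutex_unlock( &p->lock );
}
```

Because the size check and the reinit share one lock acquisition, two threads with different blocksize requirements cannot interleave between "check size" and "take block", which is the race the commit describes.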
Field G. Van Zee	107801aaae	Merge branch 'master' into selfinit	2017-12-18 16:29:28 -06:00
Field G. Van Zee	0084531d3e	Updated flatten-headers.py for python3. Details: - Modified flatten-headers.py to work with python 3.x. This mostly amounted to removing print statements (which I replaced with calls to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan Husmann for pointing out the script's incompatibility with python 3. - Other minor changes/cleanups.	2017-12-17 18:58:25 -06:00
Field G. Van Zee	90b11b79c3	Modest performance boost to flatten-headers.py. Details: - Updated flatten-headers.py to pre-compile the main regular expression used to isolate #include directives and the header filenames they reference. The compiled regex object is then used over and over on each header file in the tree of referenced headers. This appears to have provided a 1.7-2x performance increase in the best case. - Other minor tweaks, such as renaming the main recursive function from replace_pass() to flatten_header().	2017-12-17 17:34:32 -06:00
Field G. Van Zee	99dee87f30	Reimplemented flatten-headers.sh in python. Details: - Added flatten-headers.py, a python implementation of the bash script flatten-headers.sh. The new script appears to be 25-100x faster, depending on the operating system, filesystem, etc. The python script abides by the same command line interface as its predecessor and targets python 2.7 or later. (Thanks to Devin Matthews for suggesting that I look into a python replacement for higher performance.) - Activated use of flatten-headers.py in common.mk via the FLATTEN_H variable. - Made minor tweaks to flatten-headers.sh such as spelling corrections in comments.	2017-12-17 16:47:27 -06:00
Field G. Van Zee	d9c0574599	Allow travis failures of OS X builds that run testsuite. Details: - Added an allowance for OS X builds that run the testsuite to fail. There seems to be an issue with 1m when running in Travis CI under OS X and clang, but only in double-precision. Haven't been able to reproduce the error on my own, and thus, I can't debug it. (Hopefully it is simply a version-specific compiler bug.)	2017-12-14 17:13:42 -06:00
Field G. Van Zee	86cd23b737	Fixed testsuite Makefile brokenness from `9091a207`. Details: - Fixed a makefile error encountered when building the testsuite directly in its directory (as opposed to indirectly via 'make test'). The fix involves introducing a new variable, BUILD_PATH, alongside the existing DIST_PATH variable. By default, BUILD_PATH is set to the current directory, and is overridden by other Makefiles used by, for example, the testsuite and standalone test drivers in testsuite or test, respectively. - Some files/directories in common.mk were redefined in terms of BUILD_DIR, such as the locations of config.mk file and the intermediate include directory.	2017-12-14 15:47:41 -06:00
Field G. Van Zee	6a3a8924c0	Temporarily show Makefile's testsuite output. Details: - Disabled redirection of testsuite output for 'test' target. This is part of an attempt to debug a segmentation fault on OS X via Travis.	2017-12-14 13:20:02 -06:00
Field G. Van Zee	9a01080dd4	Merge branch 'master' into selfinit	2017-12-14 11:27:19 -06:00
Field G. Van Zee	a32e8a47c0	Added an exclusion to .travis.yml. Details: - Added exclusion for out-of-tree builds on OS X (clang).	2017-12-13 16:31:36 -06:00
Field G. Van Zee	b9f7d987df	Cleaned up after previous travis oot debugging. Details: - Removed debugging output from common.mk related to Travis CI out-of-tree builds. - Other minor cleanups to common.mk.	2017-12-13 16:22:09 -06:00
Field G. Van Zee	9091a207aa	Attempted fix to travis oot build failure. Details: - Found the likely cause of the Travis CI out-of-tree build failures: config.mk was being read from DIST_PATH, rather than the current directory.	2017-12-13 16:12:34 -06:00
Field G. Van Zee	c01c71c33e	Added debugging output to Makefile. Details: - Added $(info ...) statements in key locations in an attempt to reveal why Travis CI doesn't like building BLIS out-of-tree.	2017-12-13 15:58:50 -06:00
Field G. Van Zee	784289d69d	Updated SHELL in common.mk from /bin/bash to bash.	2017-12-13 15:31:27 -06:00
Field G. Van Zee	d9bb1d1d4e	Defined SHELL in common.mk so "echo -n" works. Details: - Defined the SHELL variable in common.mk as "/bin/bash" so that the -n option can be used with echo in the Makefile rule for flattening blis.h. Thanks to Devin Matthews for suggesting this fix.	2017-12-13 15:27:54 -06:00
Field G. Van Zee	9289a08667	Attempt 3 on .travis.yml.	2017-12-13 15:14:27 -06:00
Field G. Van Zee	720bfcf0ef	More fixes to .travis.yml. Details: - Fixed a mistake (hopefully) in `d0c4dd0` that resulted in many more osx/clang sub-tests than intended. - Shortened the variable names in an effort to make them more readable via the Travis CI web interface.	2017-12-13 14:52:28 -06:00
Field G. Van Zee	8717c9c97f	Added 'pwd' commands to .travis.yml for debugging. Details: - Added 'pwd' commands to the script portion of the .travis.yml file in an attempt to uncover the problem with the recent out-of-tree build testing changes made in `d0c4dd0`.	2017-12-13 14:36:37 -06:00
Field G. Van Zee	83316485ce	Simplified/fixed self-initialization. Details: - Fixed a race condition in self-initialization whereby the bli_is_init static variable could be erroneously read as TRUE by thread 1 while thread 0 is still executing bli_init_apis(), thus allowing thread 1 to use the library before it is actually ready. Thanks to Minh Quan Ho and Devin Matthews for pointing out this issue. - Part of the solution to the aforementioned race condition involved replacing the runtime initialization of the global scalar constants (e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static initialization of those same constants. This eliminates the need for bli_const_init() altogether. (The static initialization is made concise via preprocessor macros.) - Defined bli_gks_query_cntx_noinit(), which behaves just like bli_gks_query_cntx(), except that it does not call bli_init_once(). This function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and bli_memsys_init() so as to not result in any recursion into bli_init_once(). - Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants. They have no use in BLIS or its test products, and we have little reason to believe they are used by others. - Removed testsuite/out file, which was accidentally committed as part of `70640a3`.	2017-12-13 14:14:50 -06:00
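Static initialization via a preprocessor macro, as described above, can look roughly like the sketch below. The struct layout and the names `const_sketch_t`, `CONST_INIT`, and `CONST_ONE` are hypothetical; BLIS's actual constants in bli_const.c are more involved. The relevant property is that the values exist before any code runs, so there is no init function for threads to race on.

```c
#include <assert.h>

/* Hypothetical "one value, every datatype" constant. */
typedef struct { float  real, imag; } scomplex_sketch;
typedef struct { double real, imag; } dcomplex_sketch;

typedef struct
{
    float           s;
    double          d;
    scomplex_sketch c;
    dcomplex_sketch z;
    int             i;
} const_sketch_t;

/* Preprocessor macro that expands one value into a full static
   initializer, keeping each definition to a single line. */
#define CONST_INIT( v ) \
    { (float)(v), (double)(v), { (float)(v), 0.0f }, { (double)(v), 0.0 }, (int)(v) }

/* Initialized before main() runs: there is no init function to call,
   so there is nothing for two threads to race on. */
static const const_sketch_t CONST_ONE       = CONST_INIT(  1 );
static const const_sketch_t CONST_ZERO      = CONST_INIT(  0 );
static const const_sketch_t CONST_MINUS_ONE = CONST_INIT( -1 );
```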
Field G. Van Zee	6526d1d4ae	Added temp_dir argument to flatten-headers.sh. Details: - Added "temp_dir" argument to flatten-headers.sh so that the caller can specify where intermediate files should be created as the script runs. - Updated flatten-headers.sh to create intermediate files in temp_dir instead of alongside the corresponding source files. This should now (once again) allow out-of-tree builds where the BLIS distribution is read-only, or where the out-of-tree build is running concurrently with another out-of-tree build. (Thanks to Devin Matthews for pointing out the possibility of simultaneous out-of-tree builds.)	2017-12-12 13:50:43 -06:00
Field G. Van Zee	94755017c9	Merge branch 'master' of github.com:flame/blis	2017-12-12 12:50:41 -06:00
Field G. Van Zee	d0c4dd000f	Added out-of-tree build test to .travis.yml file. Details: - Modified .travis.yml file to include an out-of-tree build test (using the "auto" configure target). Thanks to Devin Matthews for this suggestion.	2017-12-12 12:47:53 -06:00
Devin Matthews	5cf7b0c4e5	Ignore blis.h.interm [ci skip]	2017-12-12 12:38:48 -06:00
Field G. Van Zee	8d8ff74d15	Further attempt to fix out-of-tree builds. Details: - Fix applied in `87978f6` was necessary but not sufficient to fix out-of-tree builds. It turns out that using a source tree that had already built the target erroneously gave the impression that out-of-tree builds were working again, when in fact they were still broken. The additional changes in this commit should complete the fix that was started in the aforementioned commit. Thanks to Devin Matthews and Shaden Smith for their help in isolating this issue.	2017-12-12 12:32:50 -06:00
Field G. Van Zee	70640a3710	Implemented library self-initialization. Details: - Defined two new functions in bli_init.c: bli_init_once() and bli_finalize_once(). Each is implemented with pthread_once(), which guarantees that, among the threads that pass in the same pthread_once_t data structure, exactly one thread will execute a user-defined function. (Thus, there is now a runtime dependency against libpthread even when multithreading is not enabled at configure-time.) - Added calls to bli_init_once() to top-level user APIs for all computational operations as well as many other functions in BLIS to all but guarantee that BLIS will self-initialize through the normal use of its functions. - Rewrote and simplified bli_init() and bli_finalize() and related functions. - Added -lpthread to LDFLAGS in common.mk. - Modified the bli_init_auto()/_finalize_auto() functions used by the BLAS compatibility layer to take and return no arguments. (The previous API that tracked whether BLIS was initialized, and then only finalized if it was initialized in the same function, was too cute by half and borderline useless because by default BLIS stays initialized when auto-initialized via the compatibility layer.) - Removed static variables that track initialization of the sub-APIs in bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and bli_ind.c. We don't need to track initialization at the sub-API level, especially now that BLIS can self-initialize. - Added a critical section around the changing of the error checking level in bli_error.c. - Deprecated bli_ind_oper_has_avail() as well as all functions bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation name. These functions had no use cases within BLIS and likely none outside of BLIS. - Commented out calls to bli_init() and bli_finalize() in testsuite's main() function, and likewise for standalone test drivers in 'test' directory, so that self-initialization is exercised by default.	2017-12-11 17:18:43 -06:00
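The pthread_once() pattern described above can be sketched as follows. `lib_init_once()`, `lib_init_impl()`, `lib_add_one()`, and `lib_init_count` are hypothetical names, not BLIS functions. A useful property of pthread_once() is that concurrent callers block until the single initialization call has returned. That closes the kind of race fixed later in `8331648`, where a flag could read as TRUE before initialization finished.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t lib_once = PTHREAD_ONCE_INIT;
static int            lib_init_count = 0; /* hypothetical: counts inits */

/* One-time setup (memory pools, contexts, ...) would go here. */
static void lib_init_impl( void )
{
    ++lib_init_count;
}

/* Analogous in spirit to bli_init_once(): cheap to call repeatedly, and
   pthread_once() makes concurrent callers wait until the single call to
   lib_init_impl() has returned. */
static void lib_init_once( void )
{
    int r = pthread_once( &lib_once, lib_init_impl );
    assert( r == 0 );
    (void)r;
}

/* Hypothetical top-level API function that self-initializes. */
static int lib_add_one( int x )
{
    lib_init_once();
    return x + 1;
}
```

Every public entry point calls the init function first, so users never need an explicit initialization call. This is also why the library picks up a link dependency on libpthread.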


1117 Commits