1111 Commits

Author SHA1 Message Date
Field G. Van Zee
3defc7265c Applied 34b72a3 to non-active/unused microkernels.
Details:
- Applied the read-beyond-bounds bugfix in 34b72a3 to other haswell and
  zen kernels (ie: other microtile shapes) which are not used by default.
  This was done mostly in case someone decided to pick up these kernels
  and start using them, not because it affects BLIS's behavior
  out-of-the-box.
2018-02-23 17:38:19 -06:00
Field G. Van Zee
34b72a3517 Fixed obscure read-beyond-bounds bug in sgemm ukrs.
Details:
- Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and
  bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C
  is stored with general stride (ie: both rs and cs are non-unit). The
  bug was rooted in the way those microkernels read from matrix C--
  namely, they used vmovlps/vmovhps instead of movss. By loading two
  floats at a time, even if one of them was treated as junk, the
  assembly code could be written in a more concise manner. However,
  under certain conditions--if m % mr == 0 and n % nr == 0 and the
  underlying matrix is not an internal "view" into a larger matrix--
  this could result in the very last vmovhps of the last (bottom-right)
  microkernel invocation reading beyond valid memory. Specifically, the
  low 32 bits read would always be valid, but the high 32 bits could
  reside beyond the bounds of the array in which the output C matrix is
  contained. To remedy this situation, we now selectively use movss to
  load any element that could be the last element in the matrix.
2018-02-23 16:33:32 -06:00
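
The difference between the two loading strategies described in 34b72a3 can be sketched in portable C. This is not the microkernel's assembly: load_pair_lo(), load_single(), and gather_column() are hypothetical helpers that only mimic an 8-byte load (as vmovlps/vmovhps perform) and a 4-byte load (as movss performs), and positive strides are assumed so that the bottom-right element is the last one in the buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Mimics an 8-byte load: reads p[0] and p[1] but keeps only p[0].
   The caller must guarantee that p[1] is readable. */
static float load_pair_lo(const float *p)
{
    float two[2];
    memcpy(two, p, sizeof two);   /* touches 8 bytes */
    return two[0];                /* the second float is treated as junk */
}

/* Mimics a 4-byte load: reads exactly one float. */
static float load_single(const float *p)
{
    float one;
    memcpy(&one, p, sizeof one);  /* touches 4 bytes */
    return one;
}

/* Gathers column j of an m x n matrix stored with general stride
   (row stride rs, column stride cs) into dst. Only the element that
   could be the last one in the underlying buffer uses the narrow load. */
static void gather_column(const float *c, size_t m, size_t n,
                          size_t rs, size_t cs, size_t j, float *dst)
{
    assert(m > 0 && j < n);
    for (size_t i = 0; i < m; ++i) {
        const float *p = c + i * rs + j * cs;
        int could_be_last = (i == m - 1 && j == n - 1);
        dst[i] = could_be_last ? load_single(p) : load_pair_lo(p);
    }
}
```

For every element other than the bottom-right one, some element lives at a larger offset, so the extra four bytes read by the wide load still fall inside the buffer.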
Field G. Van Zee
5112e1859e Added missing 'restrict' to some kernels' cntx_t*.
Details:
- Added missing 'restrict' keyword to cntx_t* argument of function
  signatures corresponding to level-1v, level-1f, and level-1m kernels.
  This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and
  bli_l1m_ker_prot.h. (The 'restrict' was already being used to
  qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.)
- Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and
  bli_l3_ukr.h that help explain how those headers function to produce
  kernel prototypes using the prototype macros defined in the files
  mentioned above.
2018-02-23 14:31:26 -06:00
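
As a rough illustration of what a restrict-qualified context argument looks like in a level-1v style kernel signature, consider the sketch below. The type my_cntx_t and the function my_daxpyv() are hypothetical stand-ins, not the prototypes generated by the BLIS headers.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a context type. */
typedef struct { int unused; } my_cntx_t;

/* y := y + alpha * x. Every pointer argument, including the context,
   is restrict-qualified, which promises the compiler that none of the
   pointed-to objects alias one another. */
static void my_daxpyv(size_t n,
                      const double *restrict alpha,
                      const double *restrict x,
                      double *restrict y,
                      my_cntx_t *restrict cntx)
{
    (void)cntx;                   /* unused in this sketch */
    assert(alpha != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += (*alpha) * x[i];
}
```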
Field G. Van Zee
1fa8af95d8 Merge branch 'rt' 2018-02-21 17:54:02 -06:00
Field G. Van Zee
c084b03b31 Merge branch 'rt' 2018-02-21 17:52:17 -06:00
Field G. Van Zee
16813335bd Merge branch 'amd' into rt
Details:
- Merged contributions made by AMD via 'amd' branch (see summary below).
  Special thanks to AMD for their contributions to-date, especially with
  regard to intrinsic- and assembly-based kernels.
- Added column storage output cases to microkernels in
  bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
  the extra cost of transposing the microtile in registers, this is
  much faster than using the general storage case when the underlying
  matrix is column-stored.
- Added s and d assembly-based zen gemmtrsm_u microkernel (including
  column storage optimization mentioned above).
- Updated zen sub-configuration to reflect presence of new native
  kernels.
- Temporarily reverted zen sub-configuration's level-3 cache blocksizes
  to smaller haswell values.
- Temporarily disabled small matrix handling for zen configuration
  family in config/zen/bli_family_zen.h.
- Updated zen CFLAGS according to changes in 1e4365b.
- Updated haswell microkernels such that:
  - only one vzeroupper instruction is called prior to returning
  - movapd/movupd are used in lieu of movaps/movups for double-real
    microkernels. (Note that single-real microkernels still use
    movaps/movups.)
- Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
  now included via frame/include/bli_arch_config.h.
- Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
  in testsuite/src/test_amaxv.c).
- Added early return for alpha == 0 in bli_dotxv_ref.c.
- Integrated changes from f07b176, including a fix for undefined
  behavior when executing the 1m method under certain conditions.
- Updated config_registry; no longer need haswell kernels for zen
  sub-configuration.
- Tweaked marginal and pass thresholds for dotxf.
- Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
  comments.
- Updated LICENSE file to explicitly mention that parts are copyright
  UT-Austin and AMD.
- Added AMD copyright to header templates in build/templates.

Summary of previous changes from 'amd' branch.
- Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
  s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
- Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
  and scalv, with extra-unrolling variants for axpyv and scalv.
- Added a small matrix handler to bli_gemm_front(), with the handler
  implemented in kernels/zen/3/bli_gemm_small_matrix.c.
- Added additional logic to sumsqv that first attempts to compute the
  sum of the squares via dotv(). If there is a floating-point exception
  (FE_OVERFLOW), then the previous (numerically conservative) code is
  used; otherwise, the result of dotv() is square-rooted and stored as
  the result. This new implementation is only enabled when FE_OVERFLOW
  is #defined. If the macro is not #defined, then the previous
  implementation is used.
- Added axpyv and dotv standalone test drivers to test directory.
- Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
- Added thread-local and __attribute__-related macros to bli_macro_defs.h.
2018-02-21 17:43:32 -06:00
Devin Matthews
5d03b6e6e1 Fix asm macro include line for KNL. Fixes #167. 2018-02-19 11:31:30 -06:00
Field G. Van Zee
f07b176c84 Fixed an obscure bug in the 1m implementation.
Details:
- Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in
  ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution.
  Previously, the function probed the context that was in the process of
  being updated for use with 1m--this context being previously
  initialized/copied from a native context--for its storage preference
  to determine which "variant" (row- or column-oriented) of 1m would be
  needed. However, the _cntx_ref() function was not updating the method
  field of the context until AFTER this query, and the conditional which
  depended on it, had taken place, meaning the storage preference query
  function would mistakenly think the context was for native execution,
  since the context's method field would still be set to BLIS_NAT. This
  would lead it to incorrectly grab the storage preference of the complex
  domain microkernel rather than the corresponding real domain
  microkernel, which could cause the storage preference predicate to
  evaluate to the wrong value, which would lead to the _cntx_ref()
  function choosing the wrong variant. This could lead to undefined
  behavior at runtime. The method is now explicitly set within the
  context prior to calling the storage preference query function.
- Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c.
- Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk,
  which are appropriate for gcc 6.x and newer. (Mistakenly used
  -march=bdver4 instead of -march=znver1.)
2018-02-15 18:36:54 -06:00
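
The ordering problem fixed in f07b176 can be reduced to a few lines. Everything in this sketch is hypothetical (my_cntx_t, prefers_rows(), and the two init functions are illustrative names, not the code in bli_cntx_ref.c); it only shows why the method field has to be set before the storage preference is queried.

```c
#include <assert.h>
#include <stddef.h>

typedef enum { MY_METHOD_NAT, MY_METHOD_1M } my_method_t;
typedef enum { MY_VARIANT_ROW, MY_VARIANT_COL } my_variant_t;

typedef struct {
    my_method_t method;
    int complex_ukr_prefers_rows; /* preference of the complex microkernel */
    int real_ukr_prefers_rows;    /* preference of the real microkernel */
} my_cntx_t;

/* The query consults the method field to decide whose preference applies. */
static int prefers_rows(const my_cntx_t *c)
{
    assert(c != NULL);
    return (c->method == MY_METHOD_1M) ? c->real_ukr_prefers_rows
                                       : c->complex_ukr_prefers_rows;
}

/* Buggy order: the query runs while method is still MY_METHOD_NAT. */
static my_variant_t init_1m_buggy(my_cntx_t *c)
{
    my_variant_t v = prefers_rows(c) ? MY_VARIANT_ROW : MY_VARIANT_COL;
    c->method = MY_METHOD_1M;
    return v;
}

/* Fixed order: set the method first, then query. */
static my_variant_t init_1m_fixed(my_cntx_t *c)
{
    c->method = MY_METHOD_1M;
    return prefers_rows(c) ? MY_VARIANT_ROW : MY_VARIANT_COL;
}
```

When the real and complex microkernels disagree on storage preference, the two functions return different variants for the same starting context.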
Field G. Van Zee
1f94bb7b96 Document how to enable zen-specific instructions.
Details:
- Added as a comment in config/zen/make_defs.mk the list of compiler flags
  that could be added to manually enable the instructions provided by the
  Zen microarchitecture that are not already implied by -march=bdver4.
  This information, along with the previous commit's flags to selectively
  disable Bulldozer instructions no longer present in Zen, was gathered
  from [1]. I hesitate to enable use of these instructions since I don't
  have any Zen hardware to test on yet.
  [1] https://wiki.gentoo.org/wiki/Ryzen
2018-01-19 12:46:53 -06:00
Field G. Van Zee
1e4365b21b Augment zen CFLAGS to prevent illegal instruction.
Details:
- Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so
  that compiling with -march=bdver4 on zen-based architectures does not
  result in an illegal instruction error at runtime. Note: This fix is
  only needed for gcc 5.4; gcc 6.3 or later supports the use of
  -march=znver1, which can be used in lieu of the augmented set of flags
  based on bdver4. Thanks to Nisanth Padinharepatt for reporting this
  error.
2018-01-18 12:03:51 -06:00
Field G. Van Zee
fa74af4e1f Minor labeling update for './configure -c' output.
Details:
- Print the name of the configuration in the output of the
  kernel-to-config map (and chosen pairs list) as a subtle way to remind
  the user that these only apply to the targeted configuration (whereas
  the config list and kernel list are printed without regard to which
  configuration was actually targeted).
2018-01-09 13:43:15 -06:00
Field G. Van Zee
5cdea756c7 Merge branch 'rt' 2018-01-07 19:45:20 -06:00
Devin Matthews
9d8858b5cf Merge pull request #164 from devinamatthews/master
Don't use memkind for skx configuration.
2018-01-07 10:03:25 -06:00
Devin Matthews
f7df64daf6 Don't use memkind for skx configuration. Fixes #163. 2018-01-07 09:37:25 -06:00
Field G. Van Zee
1e7a4896e0 Minor error handling in update-version-file.sh.
Details:
- Added explicit handling of situations when 'git describe --tags'
  returns an error. This command is used by update-version-file.sh
  when deciding whether or not to update the version file prior to
  configuration.
- Removed bli_packm.c and bli_unpackm.c, as they contained no source
  code.
2018-01-05 12:33:48 -06:00
Field G. Van Zee
0b3ca3cfb6 Intelligently select compiler for auto-detection.
Details:
- Rewrote code that selects the compiler for the purposes of compiling
  the auto-detection executable. CC (if specified) is tried first. Then
  gcc. Then clang. The absolute fallback is cc. The previous code was
  sort of broken, and seemed to unintentionally always use gcc.
- Moved various configuration-agnostic flags from config/*/make_defs.mk
  files to common.mk. The new mechanism appends the configuration-
  agnostic flags to the various compiler flag variables initialized in
  make_defs.mk. Flags specific to the sub-configuration are still set
  in make_defs.mk.
- Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
  Also added the flag to the compiler instantiation during configure-
  time hardware detection (when clang is selected).
- Added some missing (but mostly-optional) quotes to configure script.
2018-01-04 20:51:35 -06:00
Nisanth M P
5a7005dd44 Merge changes in AMD beta release 0.95 into amd branch 2018-01-03 12:37:53 +05:30
Field G. Van Zee
0b9c5127e9 Enabled C99, added stdint.h to auto-detect build.
Details:
- Added "-std=c99" to compiler arguments when building auto-detection
  driver in configure script.
- Added #include <stdint.h> to all three source files needed by auto-
  detection program.
2017-12-23 15:53:44 -06:00
Field G. Van Zee
0ce5e19c31 Reimplemented configure-time hardware detection.
Details:
- Reimplemented the hardware detection functionality invoked when running
  "./configure auto". Previously, a standalone script in build/auto-detect
  that used CPUID was used. However, the script attempted to enumerate all
  models for each microarchitecture supported. The new approach recycles
  the same code used for runtime hardware detection introduced in 2c51356.
  This has two immediate benefits. First, it reduces and consolidates the
  code required to detect microarchitectures via the CPUID instruction.
  Second, it provides an indirect way of testing at configure-time the
  code that is used to detect hardware at runtime. This code is (a) only
  activated when targeting a configuration family (such as intel64 or
  amd64) at configure-time and (b) somewhat difficult to test in
  practice, since it relies on having access to older microarchitectures.
- The above change required placing conditional cpp macro blocks in
  bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
  a bare-bones set of headers that does not rely on the presence of a
  bli_config.h header. This is needed because bli_config.h has not been
  created yet when configure-time auto-detection takes place.
- Defined a new function in bli_arch.c, bli_arch_string(), which takes
  an arch_t id and returns a pointer to a string that contains the
  lowercase name of the corresponding microarchitecture. This function
  is used by the auto-detection script to printf() the name of the
  sub-configuration corresponding to the detected hardware.
2017-12-23 15:32:03 -06:00
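
A function of the kind described for bli_arch_string() could be written as a simple table lookup. The enum values and the name my_arch_string() below are assumptions made for this sketch and do not reproduce the actual arch_t definition.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical architecture ids; the real arch_t has more entries. */
typedef enum {
    MY_ARCH_HASWELL,
    MY_ARCH_ZEN,
    MY_ARCH_GENERIC,
    MY_NUM_ARCHS
} my_arch_t;

/* Maps an id to the lowercase name of the matching sub-configuration. */
static const char *my_arch_string(my_arch_t id)
{
    static const char *const names[MY_NUM_ARCHS] = {
        [MY_ARCH_HASWELL] = "haswell",
        [MY_ARCH_ZEN]     = "zen",
        [MY_ARCH_GENERIC] = "generic",
    };
    assert((int)id >= 0 && (int)id < MY_NUM_ARCHS);
    return names[id];
}
```

An auto-detection driver would then only need to print the returned string for the configure script to capture.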
Field G. Van Zee
9804adfd40 Added option to disable pack buffer memory pools.
Details:
- Added a new configure option, --[en|dis]able-packbuf-pools, which will
  enable or disable the use of internal memory pools for managing buffers
  used for packing. When disabled, the function specified by the cpp
  macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
  (and BLIS_FREE_POOL is called when the buffer is ready to be released,
  usually at the end of a loop). When enabled, which was the status quo
  prior to this commit, a memory pool data structure is created and
  managed to provide threads with packing buffers. The memory pool
  minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
  BLIS_MALLOC_POOL), but does so through a somewhat more complex
  mechanism that may incur additional overhead in some (but not all)
  situations. The new option defaults to --enable-packbuf-pools.
- Removed the reinitialization of the memory pools from the level-3
  front-ends and replaced it with automatic reinitialization within the
  pool API's implementation. This required an extra argument to
  bli_pool_checkout_block() in the form of a requested size, but hides
  the complexity entirely from BLIS. And since bli_pool_checkout_block()
  is only ever called within a critical section, this change fixes a
  potential race condition in which threads using contexts with different
  cache blocksizes--most likely a heterogeneous environment--can check
  out pool blocks that are too small for the submatrices they wish to
  pack. Thanks to Nisanth Padinharepatt for reporting this potential
  issue.
- Removed several functions in light of the relocation of pool reinit,
  including bli_membrk_reinit_pools(), bli_memsys_reinit(),
  bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
- Updated the testsuite to print whether the memory pools are enabled or
  disabled.
2017-12-21 19:22:57 -06:00
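
The idea of passing a requested size to the checkout function, so that the pool can reinitialize itself when its blocks are too small, can be sketched as follows. This is a deliberately tiny, single-slot pool with hypothetical names (my_pool_t, my_blk_t, my_pool_checkout(), my_pool_checkin()); it is not the bli_pool API, and it assumes the caller already holds whatever lock protects the pool.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct {
    void  *buf;
    size_t size;                  /* size the block was allocated with */
} my_blk_t;

typedef struct {
    size_t block_size;            /* size of blocks the pool hands out */
    void  *cached;                /* one cached free block, for brevity */
} my_pool_t;

/* Checks out a block of at least req_size bytes. If the pool's blocks
   are too small, the pool reinitializes itself before handing one out.
   The returned buf is NULL if allocation fails. */
static my_blk_t my_pool_checkout(my_pool_t *pool, size_t req_size)
{
    assert(pool != NULL && req_size > 0);
    if (pool->block_size < req_size) {
        free(pool->cached);       /* discard undersized cached block */
        pool->cached = NULL;
        pool->block_size = req_size;
    }
    my_blk_t blk = { pool->cached, pool->block_size };
    if (blk.buf != NULL)
        pool->cached = NULL;      /* reuse the cached block */
    else
        blk.buf = malloc(pool->block_size);
    return blk;
}

/* Returns a block. Blocks allocated before a reinitialization are
   freed rather than cached, since they no longer match block_size. */
static void my_pool_checkin(my_pool_t *pool, my_blk_t blk)
{
    if (blk.size == pool->block_size && pool->cached == NULL)
        pool->cached = blk.buf;
    else
        free(blk.buf);
}
```

Because the size check happens inside the checkout call, callers with different blocksize requirements cannot receive a block that is too small for them.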
Field G. Van Zee
107801aaae Merge branch 'master' into selfinit 2017-12-18 16:29:28 -06:00
Field G. Van Zee
0084531d3e Updated flatten-headers.py for python3.
Details:
- Modified flatten-headers.py to work with python 3.x. This mostly
  amounted to removing print statements (which I replaced with calls
  to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan
  Husmann for pointing out the script's incompatibility with python 3.
- Other minor changes/cleanups.
2017-12-17 18:58:25 -06:00
Field G. Van Zee
90b11b79c3 Modest performance boost to flatten-headers.py.
Details:
- Updated flatten-headers.py to pre-compile the main regular expression
  used to isolate #include directives and the header filenames they
  reference. The compiled regex object is then used over and over on
  each header file in the tree of referenced headers. This appears to
  have provided a 1.7-2x performance increase in the best case.
- Other minor tweaks, such as renaming the main recursive function from
  replace_pass() to flatten_header().
2017-12-17 17:34:32 -06:00
Field G. Van Zee
99dee87f30 Reimplemented flatten-headers.sh in python.
Details:
- Added flatten-headers.py, a python implementation of the bash script
  flatten-headers.sh. The new script appears to be 25-100x faster,
  depending on the operating system, filesystem, etc. The python script
  abides by the same command line interface as its predecessor and
  targets python 2.7 or later. (Thanks to Devin Matthews for suggesting
  that I look into a python replacement for higher performance.)
- Activated use of flatten-headers.py in common.mk via the FLATTEN_H
  variable.
- Made minor tweaks to flatten-headers.sh such as spelling corrections
  in comments.
2017-12-17 16:47:27 -06:00
Field G. Van Zee
d9c0574599 Allow travis failures of OS X builds that run testsuite.
Details:
- Added an allowance for OS X builds that run the testsuite to fail.
  There seems to be an issue with 1m when running in Travis CI under
  OS X and clang, but only in double-precision. Haven't been able to
  reproduce the error on my own, and thus, I can't debug it. (Hopefully
  it is simply a version-specific compiler bug.)
2017-12-14 17:13:42 -06:00
Field G. Van Zee
86cd23b737 Fixed testsuite Makefile brokenness from 9091a207.
Details:
- Fixed a makefile error encountered when building the testsuite directly
  in its directory (as opposed to indirectly via 'make test'). The fix
  involves introducing a new variable, BUILD_PATH, alongside the existing
  DIST_PATH variable. By default, BUILD_PATH is set to the current
  directory, and is overridden by other Makefiles used by, for example,
  the testsuite and standalone test drivers in testsuite or test,
  respectively.
- Some files/directories in common.mk were redefined in terms of
  BUILD_PATH, such as the locations of config.mk file and the intermediate
  include directory.
2017-12-14 15:47:41 -06:00
Field G. Van Zee
6a3a8924c0 Temporarily show Makefile's testsuite output.
Details:
- Disabled redirection of testsuite output for 'test' target. This is
  part of an attempt to debug a segmentation fault on OS X via Travis.
2017-12-14 13:20:02 -06:00
Field G. Van Zee
9a01080dd4 Merge branch 'master' into selfinit 2017-12-14 11:27:19 -06:00
Field G. Van Zee
a32e8a47c0 Added an exclusion to .travis.yml.
Details:
- Added exclusion for out-of-tree builds on OS X (clang).
2017-12-13 16:31:36 -06:00
Field G. Van Zee
b9f7d987df Cleaned up after previous travis oot debugging.
Details:
- Removed debugging output from common.mk related to Travis CI
  out-of-tree builds.
- Other minor cleanups to common.mk.
2017-12-13 16:22:09 -06:00
Field G. Van Zee
9091a207aa Attempted fix to travis oot build failure.
Details:
- Found the likely cause of the Travis CI out-of-tree build failures:
  config.mk was being read from DIST_PATH, rather than the current
  directory.
2017-12-13 16:12:34 -06:00
Field G. Van Zee
c01c71c33e Added debugging output to Makefile.
Details:
- Added $(info ...) statements in key locations in an attempt to reveal
  why Travis CI doesn't like building BLIS out-of-tree.
2017-12-13 15:58:50 -06:00
Field G. Van Zee
784289d69d Updated SHELL in common.mk from /bin/bash to bash. 2017-12-13 15:31:27 -06:00
Field G. Van Zee
d9bb1d1d4e Defined SHELL in common.mk so "echo -n" works.
Details:
- Defined the SHELL variable in common.mk as "/bin/bash" so that the
  -n option can be used with echo in the Makefile rule for flattening
  blis.h. Thanks to Devin Matthews for suggesting this fix.
2017-12-13 15:27:54 -06:00
Field G. Van Zee
9289a08667 Attempt 3 on .travis.yml. 2017-12-13 15:14:27 -06:00
Field G. Van Zee
720bfcf0ef More fixes to .travis.yml.
Details:
- Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
  osx/clang sub-tests than intended.
- Shortened the variable names in an effort to make them more readable
  via the Travis CI web interface.
2017-12-13 14:52:28 -06:00
Field G. Van Zee
8717c9c97f Added 'pwd' commands to .travis.yml for debugging.
Details:
- Added 'pwd' commands to the script portion of the .travis.yml file in
  an attempt to uncover the problem with the recent out-of-tree build
  testing changes made in d0c4dd0.
2017-12-13 14:36:37 -06:00
Field G. Van Zee
83316485ce Simplified/fixed self-initialization.
Details:
- Fixed a race condition in self-initialization whereby the bli_is_init
  static variable could be erroneously read as TRUE by thread 1 while
  thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
  use the library before it is actually ready. Thanks to Minh Quan Ho
  and Devin Matthews for pointing out this issue.
- Part of the solution to the aforementioned race condition involved
  replacing the runtime initialization of the global scalar constants
  (e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
  initialization of those same constants. This eliminates the need for
  bli_const_init() altogether. (The static initialization is made concise
  via preprocessor macros.)
- Defined bli_gks_query_cntx_noinit(), which behaves just like
  bli_gks_query_cntx(), except that it does not call bli_init_once(). This
  function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
  bli_memsys_init() so as to not result in any recursion into
  bli_init_once().
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
  They have no use in BLIS or its test products, and we have little reason
  to believe they are used by others.
- Removed testsuite/out file, which was accidentally committed as part
  of 70640a3.
2017-12-13 14:14:50 -06:00
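
Replacing runtime initialization of global constants with static initialization via macros might look something like the sketch below. The struct layout and the names my_const_t, MY_CONST_INIT, MY_ONE, MY_ZERO, and MY_MINUS_ONE are assumptions for illustration and do not mirror bli_const.c.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constant object holding one value in two precisions. */
typedef struct {
    float  s;
    double d;
} my_const_t;

/* Expands to a static initializer, so no init function has to run
   (and no thread can observe a half-initialized constant). */
#define MY_CONST_INIT(val) { (float)(val), (double)(val) }

static const my_const_t MY_ONE       = MY_CONST_INIT( 1.0);
static const my_const_t MY_ZERO      = MY_CONST_INIT( 0.0);
static const my_const_t MY_MINUS_ONE = MY_CONST_INIT(-1.0);

static double my_const_as_double(const my_const_t *c)
{
    assert(c != NULL);
    return c->d;
}
```

Since the values are fixed at compile time, they are valid before any library initialization code executes.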
Field G. Van Zee
6526d1d4ae Added temp_dir argument to flatten-headers.sh.
Details:
- Added "temp_dir" argument to flatten-headers.sh so that the caller can
  specify where intermediate files should be created as the script runs.
- Updated flatten-headers.sh to create intermediate files in temp_dir
  instead of alongside the corresponding source files. This should now
  (once again) allow out-of-tree builds where the BLIS distribution is
  read-only, or where the out-of-tree build is running concurrently with
  another out-of-tree build. (Thanks to Devin Matthews for pointing out
  the possibility of simultaneous out-of-tree builds.)
2017-12-12 13:50:43 -06:00
Field G. Van Zee
94755017c9 Merge branch 'master' of github.com:flame/blis 2017-12-12 12:50:41 -06:00
Field G. Van Zee
d0c4dd000f Added out-of-tree build test to .travis.yml file.
Details:
- Modified .travis.yml file to include an out-of-tree build test (using
  the "auto" configure target). Thanks to Devin Matthews for this
  suggestion.
2017-12-12 12:47:53 -06:00
Devin Matthews
5cf7b0c4e5 Ignore blis.h.interm [ci skip] 2017-12-12 12:38:48 -06:00
Field G. Van Zee
8d8ff74d15 Further attempt to fix out-of-tree builds.
Details:
- Fix applied in 87978f6 was necessary but not sufficient to fix
  out-of-tree builds. It turns out that using a source tree that had
  already built the target erroneously gave the impression that
  out-of-tree builds were working again, when in fact they were still
  broken. The additional changes in this commit should complete the
  fix that was started in the aforementioned commit. Thanks to Devin
  Matthews and Shaden Smith for their help in isolating this issue.
2017-12-12 12:32:50 -06:00
Field G. Van Zee
70640a3710 Implemented library self-initialization.
Details:
- Defined two new functions in bli_init.c: bli_init_once() and
  bli_finalize_once(). Each is implemented with pthread_once(), which
  guarantees that, among the threads that pass in the same pthread_once_t
  data structure, exactly one thread will execute a user-defined function.
  (Thus, there is now a runtime dependency against libpthread even when
  multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
  computational operations as well as many other functions in BLIS to
  all but guarantee that BLIS will self-initialize through the normal
  use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
  functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
  BLAS compatibility layer to take and return no arguments. (The
  previous API that tracked whether BLIS was initialized, and then
  only finalized if it was initialized in the same function, was too
  cute by half and borderline useless because by default BLIS stays
  initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
  bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
  bli_ind.c. We don't need to track initialization at the sub-API level,
  especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
  level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
  bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
  name. These functions had no use cases within BLIS and likely none
  outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
  main() function, and likewise for standalone test drivers in 'test'
  directory, so that self-initialization is exercised by default.
2017-12-11 17:18:43 -06:00
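
The self-initialization pattern built on pthread_once() can be sketched as below. Only pthread_once() and PTHREAD_ONCE_INIT are real API here; my_init_apis(), my_init_once(), and my_dot() are hypothetical names standing in for the library's init routine and a top-level user function.

```c
#include <assert.h>
#include <pthread.h>

static pthread_once_t my_once = PTHREAD_ONCE_INIT;
static int my_init_count = 0;     /* how many times the init body ran */

/* The real library would set up its global state here. */
static void my_init_apis(void)
{
    ++my_init_count;
}

/* Safe to call from any thread, any number of times. When it returns,
   my_init_apis() has run exactly once and has finished. */
static void my_init_once(void)
{
    int rc = pthread_once(&my_once, my_init_apis);
    assert(rc == 0);
    (void)rc;
}

/* A user-facing operation triggers initialization on entry. */
static double my_dot(const double *x, const double *y, int n)
{
    my_init_once();
    double acc = 0.0;
    for (int i = 0; i < n; ++i)
        acc += x[i] * y[i];
    return acc;
}
```

On older C libraries this requires linking with -lpthread, which matches the runtime dependency noted in the commit.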
Field G. Van Zee
70a64432ee Fixed off-by-one indexing in bli_cpuid.c.
Details:
- In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
  whereby a string-terminating NUL character, '\0', is written beyond
  the bounds of the model_num string.
- Minor whitespace and formatting edits to bli_cpuid.c.
2017-12-11 13:14:20 -06:00
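
The class of bug fixed here, writing the terminator one past the end of a buffer, is avoided by clamping the copy length to the buffer size minus one. The helper below is a generic sketch with a hypothetical name, copy_model_num(), and does not reproduce the code in vpu_count().

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies up to len characters of src into dst and terminates the
   result. At most dst_size - 1 characters are copied, so the '\0'
   always lands inside the buffer. Returns the number copied. */
static size_t copy_model_num(char *dst, size_t dst_size,
                             const char *src, size_t len)
{
    assert(dst != NULL && src != NULL && dst_size > 0);
    if (len > dst_size - 1)
        len = dst_size - 1;
    memcpy(dst, src, len);
    dst[len] = '\0';              /* len <= dst_size - 1: in bounds */
    return len;
}
```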
Field G. Van Zee
87978f6261 Fixed broken out-of-tree builds since 52f9e6f.
Details:
- Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
  script in common.mk so that the script could be found during out-of-tree
  builds. Thanks to Devin Matthews for reporting this bug.
2017-12-11 12:49:03 -06:00
Field G. Van Zee
513ef4d040 Various typecasting fixes, mis-typed enums, etc.
Details:
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
- Properly typecast integer arguments to match format specifier in various
  calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
  bli_util_oapi.c.
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
  bli_cntx.h.
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
  l1fkr_t or l1vkr_t).
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
  value BLIS_GEMM_UKR in bli_cntx_ref.c.
- NOTE: These issues were identified via compiler warnings when building
  BLIS with clang on a rather old installation of OS X:
    $ clang --version
    Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
    Target: x86_64-apple-darwin15.2.0
    Thread model: posix
2017-12-11 12:35:59 -06:00
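
Two of the warning categories mentioned above are easy to reproduce in isolation. In this sketch, my_dim_t, my_is_valid_id(), and my_format_dim() are hypothetical; the actual BLIS types and call sites differ.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical integer type whose width may vary by platform. */
typedef long long my_dim_t;

/* For an unsigned id, a test such as "id < 0" is always false and
   draws a compiler warning; only the upper bound needs checking. */
static int my_is_valid_id(unsigned int id, unsigned int num_ids)
{
    return id < num_ids;
}

/* Casting the argument to the exact type named by the format
   specifier keeps printf-style calls correct regardless of how
   my_dim_t is defined. */
static int my_format_dim(char *buf, size_t buf_size, my_dim_t dim)
{
    assert(buf != NULL && buf_size > 0);
    return snprintf(buf, buf_size, "%ld", (long)dim);
}
```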
prangana
3bc99a96a3 Fix merge conflicts after rebase with release branch
Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630
betarelease-0.95
2017-12-11 13:07:59 +05:30
Nisanth M P
3a44118398 Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
2017-12-11 12:41:02 +05:30
Field G. Van Zee
268a56c06e Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  config/bulldozer/bli_kernel.h. Not sure where this value came from, but
  it would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
  in 8f150f2.
2017-12-11 12:12:29 +05:30