Commit Graph

1886 Commits

Author SHA1 Message Date
Field G. Van Zee
86cd23b737 Fixed testsuite Makefile brokenness from 9091a207.
Details:
- Fixed a makefile error encountered when building the testsuite directly
  in its directory (as opposed to indirectly via 'make test'). The fix
  involves introducing a new variable, BUILD_PATH, alongside the existing
  DIST_PATH variable. By default, BUILD_PATH is set to the current
  directory, and is overridden by other Makefiles used by, for example,
  the testsuite and standalone test drivers in testsuite or test,
  respectively.
- Some files/directories in common.mk were redefined in terms of
  BUILD_DIR, such as the locations of config.mk file and the intermediate
  include directory.
2017-12-14 15:47:41 -06:00
Field G. Van Zee
6a3a8924c0 Temporarily show Makefile's testsuite output.
Details:
- Disabled redirection of testsuite output for 'test' target. This is
  part of an attempt to debug a segmentation fault on OS X via Travis.
2017-12-14 13:20:02 -06:00
Field G. Van Zee
9a01080dd4 Merge branch 'master' into selfinit 2017-12-14 11:27:19 -06:00
Field G. Van Zee
a32e8a47c0 Added an exclusion to .travis.yml.
Details:
- Added exclusion for out-of-tree builds on OS X (clang).
2017-12-13 16:31:36 -06:00
Field G. Van Zee
b9f7d987df Cleaned up after previous travis oot debugging.
Details:
- Removed debugging output from common.mk related to Travis CI
  out-of-tree builds.
- Other minor cleanups to common.mk.
2017-12-13 16:22:09 -06:00
Field G. Van Zee
9091a207aa Attempted fix to travis oot build failure.
Details:
- Found the likely cause of the Travis CI out-of-tree build failures:
  config.mk was being read from DIST_PATH, rather than the current
  directory.
2017-12-13 16:12:34 -06:00
Field G. Van Zee
c01c71c33e Added debugging output to Makefile.
Details:
- Added $(info ...) statements in key locations in an attempt to reveal
  why Travis CI doesn't like building BLIS out-of-tree.
2017-12-13 15:58:50 -06:00
Field G. Van Zee
784289d69d Updated SHELL in common.mk from /bin/bash to bash. 2017-12-13 15:31:27 -06:00
Field G. Van Zee
d9bb1d1d4e Defined SHELL in common.mk so "echo -n" works.
Details:
- Defined the SHELL variable in common.mk as "/bin/bash" so that the
  -n option can be used with echo in the Makefile rule for flattening
  blis.h. Thanks to Devin Matthews for suggesting this fix.
2017-12-13 15:27:54 -06:00
Field G. Van Zee
9289a08667 Attempt 3 on .travis.yml. 2017-12-13 15:14:27 -06:00
Field G. Van Zee
720bfcf0ef More fixes to .travis.yml.
Details:
- Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
  osx/clang sub-tests than intended.
- Shortened the variable names in an effort to make them more readable
  via the Travis CI web interface.
2017-12-13 14:52:28 -06:00
Field G. Van Zee
8717c9c97f Added 'pwd' commands to .travis.yml for debugging.
Details:
- Added 'pwd' commands to the script portion of the .travis.yml file in
  an attempt to uncover the problem with the recent out-of-tree build
  testing changes made in d0c4dd0.
2017-12-13 14:36:37 -06:00
Field G. Van Zee
83316485ce Simplified/fixed self-initialization.
Details:
- Fixed a race condition in self-initialization whereby the bli_is_init
  static variable could be erroneously read as TRUE by thread 1 while
  thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
  use the library before it is actually ready. Thanks to to Minh Quan Ho
  and Devin Matthews for pointing out this issue.
- Part of the solution to the aforementioned race condition was involved
  replacing the runtime initialization of the global scalar constants
  (e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
  initialization of those same constants. This eliminates the need for
  bli_const_init() altogether. (The static initialization is made concise
  via preprocess macros.)
- Defined bli_gks_query_cntx_noinit(), which behaves just like
  bli_gks_query_cntx(), except that it does not call bli_init_once(). This
  function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
  bli_memsys_init() so as to not result in any recursion into
  bli_init_once().
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
  They have no use in BLIS or its test products, and we have little reason
  to believe they are used by others.
- Removed testsuite/out file, which was accidentally committed as part
  of 70640a3.
2017-12-13 14:14:50 -06:00
Field G. Van Zee
6526d1d4ae Added temp_dir argument to flatten-headers.sh.
Details:
- Added "temp_dir" argument to flatten-headers.sh so that the caller can
  specify where intermediate files should be created as the script runs.
- Updated flatten-headers.sh to create intermediate files in temp_dir
  instead of alongside the corresponding source files. This should now
  (once again) allow out-of-tree builds where the BLIS distribution is
  read-only, or where the out-of-tree build is running concurrently with
  another out-of-tree build. (Thanks to Devin Matthews for pointing out
  the possibility of simultaneous out-of-tree builds.)
2017-12-12 13:50:43 -06:00
Field G. Van Zee
94755017c9 Merge branch 'master' of github.com:flame/blis 2017-12-12 12:50:41 -06:00
Field G. Van Zee
d0c4dd000f Added out-of-tree build test to .travis.yml file.
Details:
- Modified .travis.yml file to include an out-of-tree build test (using
  the "auto" configure target). Thanks to Devin Matthews for this
  suggestion.
2017-12-12 12:47:53 -06:00
Devin Matthews
5cf7b0c4e5 Ignore blis.h.interm [ci skip] 2017-12-12 12:38:48 -06:00
Field G. Van Zee
8d8ff74d15 Further attempt to fix out-of-tree builds.
Details:
- Fix applied in 87978f6 was necessary but not sufficient to fix
  out-of-tree builds. It turns out that using a source tree that had
  already built the target erroneously gave the impression that
  out-of-tree builds were working again, when in fact they were still
  broken. The additional changes in this commit should complete the
  fix that was started in the aforementioned commit. Thanks to Devin
  Matthews and Shaden Smith for their help in isolating this issue.
2017-12-12 12:32:50 -06:00
Field G. Van Zee
70640a3710 Implemented library self-initialization.
Details:
- Defined two new functions in bli_init.c: bli_init_once() and
  bli_finalize_once(). Each is implemented with pthread_once(), which
  guarantees that, among the threads that pass in the same pthread_once_t
  data structure, exactly one thread will execute a user-defined function.
  (Thus, there is now a runtime dependency against libpthread even when
  multithreading is not enabled at configure-time.)
- Added calls to bli_init_once() to top-level user APIs for all
  computational operations as well as many other functions in BLIS to
  all but guarantee that BLIS will self-initialize through the normal
  use of its functions.
- Rewrote and simplified bli_init() and bli_finalize() and related
  functions.
- Added -lpthread to LDFLAGS in common.mk.
- Modified the bli_init_auto()/_finalize_auto() functions used by the
  BLAS compatibility layer to take and return no arguments. (The
  previous API that tracked whether BLIS was initialized, and then
  only finalized if it was initialized in the same function, was too
  cute by half and borderline useless because by default BLIS stays
  initialized when auto-initialized via the compatibility layer.)
- Removed static variables that track initialization of the sub-APIs in
  bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
  bli_ind.c. We don't need to track initialization at the sub-API level,
  especially now that BLIS can self-initialize.
- Added a critical section around the changing of the error checking
  level in bli_error.c.
- Deprecated bli_ind_oper_has_avail() as well as all functions
  bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
  name. These functions had no use cases within BLIS and likely none
  outside of BLIS.
- Commented out calls to bli_init() and bli_finalize() in testsuite's
  main() function, and likewise for standalone test drivers in 'test'
  directory, so that self-initialization is exercised by default.
2017-12-11 17:18:43 -06:00
Field G. Van Zee
70a64432ee Fixed off-by-one indexing in bli_cpuid.c.
Details:
- In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
  whereby a string-terminating NULL character, '\0', is written beyond
  the bounds of the model_num string.
- Minor whitespace and formatting edits to bli_cpuid.c.
2017-12-11 13:14:20 -06:00
Field G. Van Zee
87978f6261 Fixed broken out-of-tree builds since 52f9e6f.
Details:
- Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
  script in common.mk so that the script could be found during out-of-tree
  builds. Thanks to Devin Matthews for reporting this bug.
2017-12-11 12:49:03 -06:00
Field G. Van Zee
513ef4d040 Various typecasting fixes, mis-typed enums, etc.
Details:
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
- Properly typecast integer arguments to match format specifier in various
  calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
  bli_util_oapi.c.
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
  bli_cntx.h.
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
  l1fkr_t or l1vkr_t).
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
  value BLIS_GEMM_UKR in bli_cntx_ref.c.
- NOTE: These issues were identified via compiler warnings when building
  BLIS with clang on a rather old installation of OS X:
    $ clang --version
    Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
    Target: x86_64-apple-darwin15.2.0
    Thread model: posix
2017-12-11 12:35:59 -06:00
prangana
3bc99a96a3 Fix merge conflicts after rebase with release branch
Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630
betarelease-0.95
2017-12-11 13:07:59 +05:30
Nisanth M P
3a44118398 Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
2017-12-11 12:41:02 +05:30
Field G. Van Zee
268a56c06e Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  config/bulldozer/bli_kernel.h. Not sure where this value came from, but
  it would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
  in 8f150f2.
2017-12-11 12:12:29 +05:30
Devin Matthews
510a6863e2 Fix CVECFLAGS for bulldozer config. 2017-12-11 12:12:29 +05:30
Nisanth M P
c669716790 Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
2017-12-11 12:12:29 +05:30
Field G. Van Zee
24e64a9d08 Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-12-11 12:12:29 +05:30
Nisanth M P
9c0a3c4c02 Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are stil using BLIS.

This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that need to be executed after main exits.

Similarly bli_init() can be made to run before application enters main()
so that application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
2017-12-11 12:12:29 +05:30
Nisanth M P
83f31253eb Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
2017-12-11 12:12:29 +05:30
sthangar
e923402e68 The inner loop paralleization is turned off by default, the JR and IR loop parameters are set to 1 by default
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
2017-12-11 12:12:29 +05:30
Field G. Van Zee
a64c15de19 Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
42dcd589c3 Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
  into a k x k triangular matrix for the purposes of obtaining an mr x k
  micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
  very large k (depending on the product of mr x kc on that architecture).
  The bug arose from the fact that the test module was triggering the
  allocation of blocks from the internal memory pools, which are limited in
  size. This allocation imposes an implicit assumption that the micro-
  panel being tested with will fit inside, and this assumption is violated
  for large values of k. Arbitrarily large k may now be tested for both
  operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
  statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
  issues.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
206beb68ff Updated bibtex info for BLIS5 (3m4m) article. 2017-12-11 12:12:29 +05:30
sthangar
0c8c0363ae Bug fix for the testsuite build failing
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
2017-12-11 12:12:29 +05:30
sthangar
63d1c84465 Adding auto hardware detection for Zen
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
2017-12-11 12:12:29 +05:30
Devin Matthews
537fb2a895 Add vzeroupper to Intel AVX kernels. 2017-12-11 12:12:29 +05:30
Field G. Van Zee
7628de3f76 Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
a666fd4e26 Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
  bli_determine_blocksize_b_sub(). This isn't actually needed by any
  current use case within BLIS, but handling the situation is nonetheless
  prudent. Thanks to Minh Quan for reporting this issue and requesting
  the fix.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
0c8afa546d Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
  entries to be released and then re-acquired unnecessarily. (In essence,
  the "<" operands in the conditional that guards the
  release-and-reacquire code block simply needed to be swapped.) The bug
  should have only affected performance (rather than the computed result).
  Thanks to Minh Quan for identifying and reporting the bug.
2017-12-11 12:12:29 +05:30
Devin Matthews
6cf68a185d Change lsame_ signature to match lapacke. 2017-12-11 12:12:29 +05:30
Field G. Van Zee
6a9bd97295 Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
  that function not taking the parameter. Oops.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
95adc43d80 Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
  cntl_t struct. Updated all accessor functions/macros accordingly, as well
  as all consumers and intermediaries of the family parameter (such as
  bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
  change was motivated by the desire to keep the context limited, as much
  as possible, to information about the computing environment. (The family
  field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
  that operate only on a single struct to contain the "_node" suffix to
  differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
  They weren't being used and probably never will be.
2017-12-11 12:12:29 +05:30
Devin Matthews
a98e4aa547 Clang can't make up it's mind what to support. 2017-12-11 12:11:07 +05:30
Devin Matthews
32eb36c3e8 Add default #define for __has_extension. 2017-12-11 12:11:07 +05:30
Devin Matthews
2a9aa134f7 Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143. 2017-12-11 12:11:07 +05:30
Field G. Van Zee
6f07a034d5 Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
  ar. Previously, "cru" was used, while now we employ only "cr". This
  change was prompted by a warning observed on Ubuntu 16.04:

    ar: `u' modifier ignored since `D' is the default (see `U')

  This caused me to realize that the default mode causes timestamps to be
  zero, and thus the 'u' option, which causes only changed object files to
  be inserted, is not applicable.
2017-12-11 12:11:07 +05:30
Field G. Van Zee
32bc03f9ee Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
  version string at configure-time. The help text also now describes the
  usage information.
- Changed the way the version string is communicated to the Makefile.
  Previously, it was read into the VERSION variable from the 'version' file
  via $(shell cat ...). Now, the VERSION variable is instead set in
  config.mk (via a configure-substituted anchor from config.mk.in).
2017-12-11 12:08:58 +05:30
Field G. Van Zee
befaee6dd8 Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
  bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
  implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
  implementation goes through the same motions as the previous codes, but
  protects its loads and increments with GNU atomic built-ins. These atomic
  statements take memory ordering parameters that allow us to specify just
  enough constraints for the barrier to work as intended on weakly-ordered
  hardware. The prior implementation was only guaranteed to work on systems
  with strongly- ordered memory. (Thanks to Devin Matthews for suggesting
  this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
  bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
  consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
  simply "comm".
2017-12-11 12:08:58 +05:30
Field G. Van Zee
8f739cc847 Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-12-11 12:08:58 +05:30