Commit Graph

1015 Commits

Author SHA1 Message Date
Nisanth M P
5a7005dd44 Merge changes in AMD beta release 0.95 into amd branch 2018-01-03 12:37:53 +05:30
prangana
3bc99a96a3 Fix merge conflicts after rebase with release branch
Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630
betarelease-0.95
2017-12-11 13:07:59 +05:30
Nisanth M P
3a44118398 Added AMD copyright line to the changed files in last 3 commits
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
2017-12-11 12:41:02 +05:30
Field G. Van Zee
268a56c06e Revert to default SIMD alignment for bulldozer.
Details:
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
  config/bulldozer/bli_kernel.h. Not sure where this value came from, but
  it would seem to allow for insufficient starting address alignment for
  any matrices created via bli_malloc_user(), such as via
  bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
  led us to this bug.
- This commit is a manual patch of the same fix made to the 'rt' branch
  in 8f150f2.
2017-12-11 12:12:29 +05:30
Devin Matthews
510a6863e2 Fix CVECFLAGS for bulldozer config. 2017-12-11 12:12:29 +05:30
Nisanth M P
c669716790 Adding __attribute__((constructor/destructor)) for CLANG case.
CLANG supports __attribute__, but its documentation doesn't
mention support for constructor/destructor. Compiling with
clang and testing shows that it does support this.

Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
2017-12-11 12:12:29 +05:30
Field G. Van Zee
24e64a9d08 Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-12-11 12:12:29 +05:30
Nisanth M P
9c0a3c4c02 Thread Safety: Move bli_init() before and bli_finalize() after main()
BLIS provides APIs to initialize and finalize its global context.
One application thread can finalize BLIS, while other threads
in the application are stil using BLIS.

This issue can be solved by removing bli_finalize() from API.
One way to do this is by getting bli_finalize() to execute by default
after application exits from main().

GCC supports this behaviour with the help of __attribute__((destructor))
added to the function that need to be executed after main exits.

Similarly bli_init() can be made to run before application enters main()
so that application need not call it.

Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
2017-12-11 12:12:29 +05:30
Nisanth M P
83f31253eb Thread safety: Make the global induced method status array local to thread
BLIS retains a global status array for induced methods, and provides
APIs to modify this state during runtime. So, one application thread
can modify the state, before another starts the corresponding
BLIS operation.

This patch solves this issue by making the induced method status array
local to threads.

Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
2017-12-11 12:12:29 +05:30
sthangar
e923402e68 The inner loop paralleization is turned off by default, the JR and IR loop parameters are set to 1 by default
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
2017-12-11 12:12:29 +05:30
Field G. Van Zee
a64c15de19 Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
42dcd589c3 Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
  into a k x k triangular matrix for the purposes of obtaining an mr x k
  micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
  very large k (depending on the product of mr x kc on that architecture).
  The bug arose from the fact that the test module was triggering the
  allocation of blocks from the internal memory pools, which are limited in
  size. This allocation imposes an implicit assumption that the micro-
  panel being tested with will fit inside, and this assumption is violated
  for large values of k. Arbitrarily large k may now be tested for both
  operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
  statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
  issues.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
206beb68ff Updated bibtex info for BLIS5 (3m4m) article. 2017-12-11 12:12:29 +05:30
sthangar
0c8c0363ae Bug fix for the testsuite build failing
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
2017-12-11 12:12:29 +05:30
sthangar
63d1c84465 Adding auto hardware detection for Zen
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
2017-12-11 12:12:29 +05:30
Devin Matthews
537fb2a895 Add vzeroupper to Intel AVX kernels. 2017-12-11 12:12:29 +05:30
Field G. Van Zee
7628de3f76 Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
a666fd4e26 Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
  bli_determine_blocksize_b_sub(). This isn't actually needed by any
  current use case within BLIS, but handling the situation is nonetheless
  prudent. Thanks to Minh Quan for reporting this issue and requesting
  the fix.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
0c8afa546d Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
  entries to be released and then re-acquired unnecessarily. (In essence,
  the "<" operands in the conditional that guards the
  release-and-reacquire code block simply needed to be swapped.) The bug
  should have only affected performance (rather than the computed result).
  Thanks to Minh Quan for identifying and reporting the bug.
2017-12-11 12:12:29 +05:30
Devin Matthews
6cf68a185d Change lsame_ signature to match lapacke. 2017-12-11 12:12:29 +05:30
Field G. Van Zee
6a9bd97295 Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
  that function not taking the parameter. Oops.
2017-12-11 12:12:29 +05:30
Field G. Van Zee
95adc43d80 Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
  cntl_t struct. Updated all accessor functions/macros accordingly, as well
  as all consumers and intermediaries of the family parameter (such as
  bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
  change was motivated by the desire to keep the context limited, as much
  as possible, to information about the computing environment. (The family
  field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
  that operate only on a single struct to contain the "_node" suffix to
  differentiate with those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
  They weren't being used and probably never will be.
2017-12-11 12:12:29 +05:30
Devin Matthews
a98e4aa547 Clang can't make up it's mind what to support. 2017-12-11 12:11:07 +05:30
Devin Matthews
32eb36c3e8 Add default #define for __has_extension. 2017-12-11 12:11:07 +05:30
Devin Matthews
2a9aa134f7 Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143. 2017-12-11 12:11:07 +05:30
Field G. Van Zee
6f07a034d5 Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
  ar. Previously, "cru" was used, while now we employ only "cr". This
  change was prompted by a warning observed on Ubuntu 16.04:

    ar: `u' modifier ignored since `D' is the default (see `U')

  This caused me to realize that the default mode causes timestamps to be
  zero, and thus the 'u' option, which causes only changed object files to
  be inserted, is not applicable.
2017-12-11 12:11:07 +05:30
Field G. Van Zee
32bc03f9ee Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
  version string at configure-time. The help text also now describes the
  usage information.
- Changed the way the version string is communicated to the Makefile.
  Previously, it was read into the VERSION variable from the 'version' file
  via $(shell cat ...). Now, the VERSION variable is instead set in
  config.mk (via a configure-substituted anchor from config.mk.in).
2017-12-11 12:08:58 +05:30
Field G. Van Zee
befaee6dd8 Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
  bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
  implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
  implementation goes through the same motions as the previous codes, but
  protects its loads and increments with GNU atomic built-ins. These atomic
  statements take memory ordering parameters that allow us to specify just
  enough constraints for the barrier to work as intended on weakly-ordered
  hardware. The prior implementation was only guaranteed to work on systems
  with strongly- ordered memory. (Thanks to Devin Matthews for suggesting
  this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
  bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
  consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
  simply "comm".
2017-12-11 12:08:58 +05:30
Field G. Van Zee
8f739cc847 Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-12-11 12:08:58 +05:30
Marat Dukhan
1016383307 Fix Emscripten builds 2017-12-11 12:08:58 +05:30
Minh Quan HO
c09b30d115 set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
2017-12-11 12:08:58 +05:30
sthangar
997628ed97 Reducing the framework overhead of GEMV routines
Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684
2017-12-11 12:08:58 +05:30
Kiran Varaganti
ee86906616 Improved efficiency of dGEMM for large matrices by reducing TLB load misses and majorly L3 cache misses. This is achieved by changing the packed block sizes of matrix A & B. Now the optimum values are MC_D = 510 and KC_D = 1024.
Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4
2017-12-11 12:08:58 +05:30
Devin Matthews
7b933b90b1 Add new SSI acknowledgment 2017-12-11 12:08:21 +05:30
sthangar
3485abba4b Checked in the small matrix code to compute GEMM called with A transpose case
Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462
2017-12-11 12:07:31 +05:30
Devin Matthews
de16beb83b PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%. 2017-12-11 12:07:31 +05:30
Devin Matthews
25d0e61854 Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d.
2017-12-11 12:07:31 +05:30
Devin Matthews
c5bdd84b35 Change PACKDIM_MR (double) for haswell to 8. 2017-12-11 12:07:31 +05:30
Field G. Van Zee
172789d562 Restored deleted lines from makefile fragments. 2017-12-11 12:07:31 +05:30
Devin Matthews
3ea9bd2c8e Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
2017-12-11 12:07:31 +05:30
Devin Matthews
49438409ee Remove shebangs from makefiles. 2017-12-11 12:07:31 +05:30
J M Dieterich
497e264047 Fix if/else structure. Thanks to TravisCI. 2017-12-11 12:07:31 +05:30
J M Dieterich
835035c56a Mark piledriver compilable w/ clang. 2017-12-11 12:06:40 +05:30
J M Dieterich
6cdb533472 Mark bulldozer compilable w/ clang. 2017-12-11 12:06:40 +05:30
J M Dieterich
a85697d622 Correct error message. 2017-12-11 12:06:40 +05:30
J M Dieterich
e0c64cad27 Indeed once can compile for carrizo also using clang. 2017-12-11 12:06:40 +05:30
J M Dieterich
4aafe0505d A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash 2017-12-11 12:06:40 +05:30
Field G. Van Zee
abaeaa68ea Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
  and matrices. This is one of the few operations in BLIS that does not
  have its own test module within the testsuite, hence why it went
  undetected for so long. The bad 1-norms were being used to normalize
  matrices in the testsuite after initialization, which led to some
  matrices containing a combination of "large" and "small" values. This
  tended to push the residuals computed after each test away from zero.
  In some cases, they were off *just* enough to the testsuite to label
  it a "failure". Many thanks to Jeff Hammond for reporting this bug.
  (Wonky details: the bug was due to improperly-defined level-0 scalar
  macros for abval2, an operation that computes the absolute square,
  or complex magnitude/modulus. Certain complex domain instances of
  abval2 were being incorrectly defined in terms of real-only solutions,
  leading to bad results. This level-0 operation forms the basis of
  norm1v/norm1m. absq2 was also affected, but almost nothing uses
  this operation.)
2017-12-11 12:05:22 +05:30
Devin Matthews
cc3107ae1c Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THEADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123. 2017-12-11 12:05:22 +05:30
Field G. Van Zee
c8ab91f70d Disable complex 3m/4m in testsuite by default.
Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
  and 4m. This will improve testing runtime on Travis CI as well as for
  anyone manually running the testsuite using default test parameters.
  Thanks to Devin Matthews for suggesting this change.
2017-12-11 12:05:22 +05:30