Commit Graph

846 Commits

Author SHA1 Message Date
Field G. Van Zee
375342799c Removed a duplicate bli_avx512_macros.h header.
Details:
- Removed a duplicate header file that was causing problems during
  installation for the 'knl' configuration. Thanks to Victor Eijkhout
  for reporting this issue.
2017-10-18 13:41:25 -05:00
Field G. Van Zee
e02d3cb841 Fixed a pthread typo in previous commit.
Details:
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
2017-09-26 19:02:53 -05:00
Field G. Van Zee
f5962a1aae Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
Details:
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
  into a k x k triangular matrix for the purposes of obtaining an mr x k
  micropanel of A with which to test.
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
  very large k (depending on the product of mr x kc on that architecture).
  The bug arose from the fact that the test module was triggering the
  allocation of blocks from the internal memory pools, which are limited in
  size. This allocation imposes an implicit assumption that the micropanel
  being tested with will fit inside, and this assumption is violated
  for large values of k. Arbitrarily large k may now be tested for both
  operation tests.
- Added OpenMP/pthread critical sections around the setting or getting of
  statuses from the induced method operation lookup table in bli_l3_ind.c.
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
  issues.
2017-09-26 17:00:04 -05:00
Field G. Van Zee
8e917b256c Updated bibtex info for BLIS5 (3m4m) article. 2017-09-09 14:10:15 -05:00
Devin Matthews
adafe974b4 Merge pull request #150 from devinamatthews/vzeroupper
Add vzeroupper to Intel AVX kernels.
2017-08-15 15:17:21 -05:00
Devin Matthews
7dc78b49f9 Add vzeroupper to Intel AVX kernels. 2017-08-15 10:02:25 -05:00
Field G. Van Zee
f86ce54d6f Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-08-10 16:24:28 -05:00
Field G. Van Zee
60a1eeb231 Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
  bli_determine_blocksize_b_sub(). This isn't actually needed by any
  current use case within BLIS, but handling the situation is nonetheless
  prudent. Thanks to Minh Quan for reporting this issue and requesting
  the fix.
2017-08-05 13:04:31 -05:00
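The edge case described in 60a1eeb231 can be illustrated with a simplified sketch. This is not the BLIS implementation: the function name `sketch_blocksize_b` is hypothetical, any maximum-blocksize logic is omitted, and it assumes that when moving backward the partial (edge) block is taken on the first iteration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified backward blocksize query.
 *   i     : amount of the dimension already processed
 *   dim   : total size of the dimension
 *   b_alg : nominal blocksize
 * Returns the size of the next block, or 0 when i == dim. */
size_t sketch_blocksize_b( size_t i, size_t dim, size_t b_alg )
{
    assert( b_alg > 0 );
    assert( i <= dim );

    size_t dim_left = dim - i;

    /* Explicit handling of i == dim: nothing remains to be partitioned. */
    if ( dim_left == 0 ) return 0;

    /* Assumption: the leftover block is consumed first, so every later
       iteration sees a remainder that is a multiple of b_alg. */
    size_t edge = dim_left % b_alg;

    return ( edge != 0 ) ? edge : b_alg;
}
```

With dim = 10 and b_alg = 4, successive calls yield blocks of 2, 4, and 4, and a final call with i == dim returns 0 rather than a spurious full block.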
Field G. Van Zee
b01c808299 Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
  entries to be released and then re-acquired unnecessarily. (In essence,
  the "<" operands in the conditional that guards the
  release-and-reacquire code block simply needed to be swapped.) The bug
  should have only affected performance (rather than the computed result).
  Thanks to Minh Quan for identifying and reporting the bug.
2017-08-04 14:17:44 -05:00
Devin Matthews
05925dd5d3 Merge pull request #146 from devinamatthews/master
Change lsame_ signature to match lapacke.
2017-08-01 09:31:01 -05:00
Devin Matthews
cecdc05d28 Change lsame_ signature to match lapacke. 2017-07-31 15:19:51 -05:00
Field G. Van Zee
803bbef0a3 Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
  that function not taking the parameter. Oops.
2017-07-29 20:17:05 -05:00
Field G. Van Zee
c63980f4ca Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
  cntl_t struct. Updated all accessor functions/macros accordingly, as well
  as all consumers and intermediaries of the family parameter (such as
  bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
  change was motivated by the desire to keep the context limited, as much
  as possible, to information about the computing environment. (The family
  field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
  that operate only on a single struct to contain the "_node" suffix to
  differentiate them from those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
  They weren't being used and probably never will be.
2017-07-29 14:53:39 -05:00
Field G. Van Zee
0783739556 Merge pull request #139 from Maratyszcza/emscripten
Fix Emscripten builds
2017-07-21 16:49:48 -05:00
Field G. Van Zee
ad8610b441 Merge branch 'master' into emscripten 2017-07-21 15:18:33 -05:00
Devin Matthews
ca1d1d8560 Merge pull request #144 from devinamatthews/fix_atomics_on_bgq
Add fallbacks to __sync_* or __c11_atomic_* builtins...
2017-07-21 09:49:50 -05:00
Devin Matthews
733faf848d Clang can't make up its mind what to support. 2017-07-20 14:50:13 -05:00
Devin Matthews
7425d0744d Add default #define for __has_extension. 2017-07-20 12:54:58 -05:00
Devin Matthews
b537b5bbe8 Merge pull request #133 from devinamatthews/haswell-packdim
Fix prefetching in haswell ukernel
2017-07-20 10:58:39 -05:00
Devin Matthews
8823f91a14 Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143. 2017-07-20 10:04:34 -05:00
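A minimal sketch of the fallback idea in 8823f91a14 follows. The detection test and all `sketch_` names are assumptions made for illustration. It keys off the predefined `__ATOMIC_ACQ_REL` macro, which may not be how BLIS detects support. The `__c11_atomic_*` tier mentioned in the commit is omitted here.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: compilers that provide the __atomic_* builtins also
   predefine the memory-order macros such as __ATOMIC_ACQ_REL. */
#if defined(__ATOMIC_ACQ_REL)
#define SKETCH_HAVE_GNU_ATOMICS 1
#else
#define SKETCH_HAVE_GNU_ATOMICS 0
#endif

/* Atomically add v to *p and return the new value. */
static inline int sketch_atomic_add_fetch( int* p, int v )
{
    assert( p != NULL );
#if SKETCH_HAVE_GNU_ATOMICS
    return __atomic_add_fetch( p, v, __ATOMIC_ACQ_REL );
#else
    /* Legacy builtin: always a full barrier, so stronger than needed. */
    return __sync_add_and_fetch( p, v );
#endif
}

/* Atomically read *p. */
static inline int sketch_atomic_load( int* p )
{
    assert( p != NULL );
#if SKETCH_HAVE_GNU_ATOMICS
    return __atomic_load_n( p, __ATOMIC_ACQUIRE );
#else
    /* The __sync_* family has no plain load; adding zero reads the value. */
    return __sync_fetch_and_add( p, 0 );
#endif
}
```

The `__sync_*` path gives up the fine-grained memory orders but keeps the code correct on toolchains that predate `__atomic_*`.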
Field G. Van Zee
1f1ec0db93 Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
  ar. Previously, "cru" was used, while now we employ only "cr". This
  change was prompted by a warning observed on Ubuntu 16.04:

    ar: `u' modifier ignored since `D' is the default (see `U')

  This caused me to realize that the default mode causes timestamps to be
  zero, and thus the 'u' option, which causes only changed object files to
  be inserted, is not applicable.
2017-07-19 15:40:48 -05:00
Field G. Van Zee
5caaba2d61 Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
  version string at configure-time. The help text also now describes the
  usage information.
- Changed the way the version string is communicated to the Makefile.
  Previously, it was read into the VERSION variable from the 'version' file
  via $(shell cat ...). Now, the VERSION variable is instead set in
  config.mk (via a configure-substituted anchor from config.mk.in).
2017-07-19 13:51:53 -05:00
Field G. Van Zee
13175c5fb7 Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
  bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
  implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
  implementation goes through the same motions as the previous codes, but
  protects its loads and increments with GNU atomic built-ins. These atomic
  statements take memory ordering parameters that allow us to specify just
  enough constraints for the barrier to work as intended on weakly-ordered
  hardware. The prior implementation was only guaranteed to work on systems
  with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
  this change and for his crash course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
  bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
  consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
  simply "comm".
2017-07-18 17:56:00 -05:00
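The barrier described in 13175c5fb7 can be sketched as a sense-reversing barrier built on the GNU atomic built-ins. The struct, its field names, and the `sketch_` function names below are hypothetical, and the memory orders are one plausible choice rather than a confirmed copy of bli_thrcomm_barrier_atomic().

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier state; names are illustrative, not BLIS's. */
typedef struct
{
    size_t n_threads; /* number of participating threads      */
    size_t arrived;   /* threads that have reached the barrier */
    int    sense;     /* flips each time the barrier opens     */
} sketch_barrier_t;

void sketch_barrier_init( sketch_barrier_t* b, size_t n_threads )
{
    assert( n_threads > 0 );
    b->n_threads = n_threads;
    b->arrived   = 0;
    b->sense     = 0;
}

void sketch_barrier_wait( sketch_barrier_t* b )
{
    /* Record the sense on entry. A relaxed load suffices because the
       sense cannot flip again until this thread has arrived. */
    int orig_sense = __atomic_load_n( &b->sense, __ATOMIC_RELAXED );

    /* Announce arrival. Acquire-release orders this thread's earlier
       writes before the increment and makes prior arrivals visible. */
    size_t my_arrival =
        __atomic_add_fetch( &b->arrived, 1, __ATOMIC_ACQ_REL );

    if ( my_arrival == b->n_threads )
    {
        /* Last thread in: reset the count, then open the barrier by
           flipping the sense with release semantics. */
        __atomic_store_n( &b->arrived, 0, __ATOMIC_RELAXED );
        __atomic_fetch_xor( &b->sense, 1, __ATOMIC_RELEASE );
    }
    else
    {
        /* Spin until the last thread flips the sense. */
        while ( __atomic_load_n( &b->sense, __ATOMIC_ACQUIRE ) == orig_sense )
            ;
    }
}
```

The release on the flip paired with the acquire in the spin loop is what lets waiting threads observe the reset count and each other's prior writes on weakly-ordered hardware, without relying on 'volatile'.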
Field G. Van Zee
0e58ba1b3a Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-07-17 19:03:22 -05:00
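As a rough illustration of the get/set pattern in 0e58ba1b3a, the sketch below wraps POSIX getenv()/setenv(). The `sketch_` function names, their signatures, and the -1 "unset" convention are all assumptions; the real bli_thread_get_env() and bli_thread_set_env() signatures are not given in the commit message.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical getter: returns the variable as a positive integer,
   or -1 if it is unset or does not parse cleanly. */
long sketch_thread_get_env( const char* name )
{
    const char* str = getenv( name );
    if ( str == NULL ) return -1;

    char* end = NULL;
    errno = 0;
    long val = strtol( str, &end, 10 );

    if ( errno != 0 || end == str || *end != '\0' || val < 1 ) return -1;
    return val;
}

/* Hypothetical setter: stores n_threads in the named variable,
   overwriting any existing value. Returns 0 on success. */
int sketch_thread_set_env( const char* name, long n_threads )
{
    assert( n_threads >= 1 );

    char buf[ 32 ];
    snprintf( buf, sizeof( buf ), "%ld", n_threads );
    return setenv( name, buf, 1 );
}

/* Convenience wrappers in the spirit of the *_jc_nt() routines. */
long sketch_thread_get_jc_nt( void )
{
    return sketch_thread_get_env( "BLIS_JC_NT" );
}

int sketch_thread_set_jc_nt( long n_threads )
{
    return sketch_thread_set_env( "BLIS_JC_NT", n_threads );
}
```

A caller would set the value before invoking a threaded operation, for example `sketch_thread_set_jc_nt( 4 );`, instead of exporting the variable in the shell.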
Marat Dukhan
8772a0b33a Fix Emscripten builds 2017-07-13 21:39:24 -07:00
Field G. Van Zee
72c8b49bb8 Merge pull request #138 from hominhquan/membrk_set_free_fp
Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
2017-07-12 14:58:12 -05:00
Minh Quan HO
ba7cada51a set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
2017-07-07 10:52:05 +02:00
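The problem fixed in ba7cada51a can be pictured with a small sketch: a broker struct holding allocation and release function pointers, where the release path fails if the initializer never set the second pointer. The struct and all `sketch_` names are hypothetical and do not reflect the actual membrk_t layout.

```c
#include <assert.h>
#include <stdlib.h>

typedef void* (*sketch_malloc_ft)( size_t );
typedef void  (*sketch_free_ft)( void* );

/* Hypothetical memory broker with pluggable allocate/free routines. */
typedef struct
{
    sketch_malloc_ft malloc_fp;
    sketch_free_ft   free_fp;
} sketch_membrk_t;

void sketch_membrk_init( sketch_membrk_t* m )
{
    m->malloc_fp = malloc;
    /* The kind of assignment the commit describes as missing: without
       it, releasing a general-use buffer would call an unset pointer. */
    m->free_fp   = free;
}

void* sketch_membrk_acquire( sketch_membrk_t* m, size_t size )
{
    assert( m->malloc_fp != NULL );
    return m->malloc_fp( size );
}

void sketch_membrk_release( sketch_membrk_t* m, void* p )
{
    assert( m->free_fp != NULL );
    m->free_fp( p );
}
```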
Devin Matthews
70cc825b55 Update LICENSE
Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].
2017-06-06 21:58:21 -05:00
Devin Matthews
cf54c77bc7 Add new SSI acknowledgment 2017-06-06 20:23:17 -05:00
Devin Matthews
7f41bb0a0b PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%. 2017-05-26 14:49:31 -04:00
Devin Matthews
d87614af3f Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d.
2017-05-26 14:47:36 -04:00
Devin Matthews
681eec913d Change PACKDIM_MR (double) for haswell to 8. 2017-05-26 12:28:09 -05:00
Field G. Van Zee
6e04f9df01 Restored deleted lines from makefile fragments. 2017-05-17 13:03:52 -05:00
Devin Matthews
ec5c0c0448 Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
2017-05-17 12:29:44 -05:00
Devin Matthews
555ddc30d4 Remove shebangs from makefiles. 2017-05-17 12:27:14 -05:00
Devin Matthews
f26bd7f42e Merge pull request #128 from iotamudelta/master
Portability and clang
2017-05-17 11:58:41 -05:00
J M Dieterich
169fb05f22 Fix if/else structure. Thanks to TravisCI. 2017-05-16 23:11:22 -04:00
J M Dieterich
0579dfea0b Restore version. 2017-05-16 22:58:07 -04:00
J M Dieterich
a75b05c23d Mark piledriver compilable w/ clang. 2017-05-16 22:23:27 -04:00
J M Dieterich
7541d46e2b Mark bulldozer compilable w/ clang. 2017-05-16 22:12:12 -04:00
J M Dieterich
91f897073e Correct error message. 2017-05-16 22:06:59 -04:00
J M Dieterich
f5131e1e49 Indeed one can compile for carrizo also using clang. 2017-05-16 22:03:23 -04:00
J M Dieterich
5fa4e9439c A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash 2017-05-16 21:50:49 -04:00
Tyler Michael Smith
cbf8710a1b Merge pull request #127 from devinamatthews/fix_blis_nt_xx
Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS
2017-05-08 11:21:20 -05:00
Field G. Van Zee
cf39d3ef3b Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
  and matrices. This is one of the few operations in BLIS that does not
  have its own test module within the testsuite, hence why it went
  undetected for so long. The bad 1-norms were being used to normalize
  matrices in the testsuite after initialization, which led to some
  matrices containing a combination of "large" and "small" values. This
  tended to push the residuals computed after each test away from zero.
  In some cases, they were off *just* enough for the testsuite to label
  the test a "failure". Many thanks to Jeff Hammond for reporting this bug.
  (Wonky details: the bug was due to improperly-defined level-0 scalar
  macros for abval2, an operation that computes the absolute square,
  or complex magnitude/modulus. Certain complex domain instances of
  abval2 were being incorrectly defined in terms of real-only solutions,
  leading to bad results. This level-0 operation forms the basis of
  norm1v/norm1m. absq2 was also affected, but almost nothing uses
  this operation.)
2017-05-05 15:06:56 -05:00
Devin Matthews
799485124f Merge pull request #121 from jeffhammond/not-real-knl
allow KNL build without hbwmalloc (i.e. emulated)
2017-05-04 10:52:09 -05:00
Devin Matthews
fdc66f12d4 Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123. 2017-05-04 10:36:04 -05:00
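The precedence rule in fdc66f12d4 can be sketched as a pure function over already-parsed values. The struct, the function name, and the convention that a value of zero or less means "unset" are assumptions for illustration. How a bare total thread count is distributed across loops is not modeled.

```c
#include <assert.h>

/* Hypothetical result of resolving the threading settings. */
typedef struct
{
    int  per_loop;       /* 1 if explicit per-loop values are in effect */
    long jc, ic, jr, ir; /* ways of parallelism for each loop           */
    long total;          /* total number of threads implied             */
} sketch_ways_t;

/* Unset values (<= 0) default to 1. */
static long sketch_or_one( long v )
{
    return ( v > 0 ) ? v : 1;
}

sketch_ways_t sketch_resolve_ways( long num_threads,
                                   long jc, long ic, long jr, long ir )
{
    sketch_ways_t w;

    w.per_loop = ( jc > 0 || ic > 0 || jr > 0 || ir > 0 );

    /* Missing per-loop values are defaulted to 1. */
    w.jc = sketch_or_one( jc );
    w.ic = sketch_or_one( ic );
    w.jr = sketch_or_one( jr );
    w.ir = sketch_or_one( ir );

    if ( w.per_loop )
    {
        /* Any explicit per-loop value overrides the global count. */
        w.total = w.jc * w.ic * w.jr * w.ir;
    }
    else
    {
        /* Only the global count applies. The per-loop fields stay at 1
           here because the distribution is left to the library. */
        w.total = sketch_or_one( num_threads );
    }

    assert( w.total >= 1 );
    return w;
}
```

For example, a global count of 8 combined with only the IC value set to 2 resolves to 1 x 2 x 1 x 1, that is, two threads in total rather than eight.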
Field G. Van Zee
773a24efb2 Merge branch 'master' of github.com:flame/blis 2017-05-03 15:07:59 -05:00
Field G. Van Zee
dd58c9545c Disable complex 3m/4m in testsuite by default.
Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
  and 4m. This will improve testing runtime on Travis CI as well as for
  anyone manually running the testsuite using default test parameters.
  Thanks to Devin Matthews for suggesting this change.
2017-05-03 15:04:51 -05:00
Jeff Hammond
0df3541f54 allow KNL build without hbwmalloc.h (i.e. emulated)
we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.
2017-05-02 19:35:38 -07:00