1886 Commits

Author SHA1 Message Date
Devin Matthews
7dc78b49f9 Add vzeroupper to Intel AVX kernels. 2017-08-15 10:02:25 -05:00
Field G. Van Zee
f86ce54d6f Removed trailing enum commas from bli_type_defs.h.
Details:
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
  Erling Andersen for pointing out this inconsistency and suggesting
  the change.
2017-08-10 16:24:28 -05:00
Field G. Van Zee
60a1eeb231 Added edge handling to _determine_blocksize_b().
Details:
- Added explicit handling of situations where i == dim to
  bli_determine_blocksize_b_sub(). This isn't actually needed by any
  current use case within BLIS, but handling the situation is nonetheless
  prudent. Thanks to Minh Quan for reporting this issue and requesting
  the fix.
2017-08-05 13:04:31 -05:00
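The edge case described in the entry above can be sketched as follows. This is a minimal illustration, not the BLIS source: the name sketch_blocksize_b, the sketch_dim_t type, and the parameter order are assumptions, and the body only mirrors the idea of returning an empty block when i == dim.

```c
#include <assert.h>

/* Hypothetical stand-in for BLIS's dim_t; an assumption for this sketch. */
typedef long sketch_dim_t;

/* Sketch of a blocksize choice for an algorithm that moves "backward", so
   that any partial (edge) block is handled first. Not the BLIS code. */
sketch_dim_t sketch_blocksize_b(sketch_dim_t i, sketch_dim_t dim,
                                sketch_dim_t b_alg, sketch_dim_t b_max)
{
    assert(b_alg > 0 && b_alg <= b_max && i >= 0 && i <= dim);

    /* How much of the dimension remains, including the block chosen now. */
    sketch_dim_t dim_left = dim - i;

    /* Edge case from the commit message: i == dim leaves nothing to do.
       Without this guard, 0 % b_alg == 0 below would report a full block. */
    if (dim_left == 0) return 0;

    /* Size of the partial block, if the remainder is not a multiple of b_alg. */
    sketch_dim_t dim_at_edge = dim_left % b_alg;

    if (dim_at_edge == 0) return b_alg;     /* whole blocks only */
    if (dim_left <= b_max) return dim_left; /* remainder fits in one block */
    return dim_at_edge;                     /* peel off the edge block first */
}
```

With dim = 10 and b_alg = b_max = 4, successive calls at i = 0, 2, 6 yield blocks of 2, 4, 4, and a call at i = 10 yields 0.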
Field G. Van Zee
b01c808299 Fixed a minor bug in level-3 packm management.
Details:
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
  entries to be released and then re-acquired unnecessarily. (In essence,
  the "<" operands in the conditional that guards the
  release-and-reacquire code block simply needed to be swapped.) The bug
  should have only affected performance (rather than the computed result).
  Thanks to Minh Quan for identifying and reporting the bug.
2017-08-04 14:17:44 -05:00
Field G. Van Zee
8b379069fc Merge branch 'master' into rt 2017-08-01 15:30:40 -05:00
Devin Matthews
05925dd5d3 Merge pull request #146 from devinamatthews/master
Change lsame_ signature to match lapacke.
2017-08-01 09:31:01 -05:00
Devin Matthews
cecdc05d28 Change lsame_ signature to match lapacke. 2017-07-31 15:19:51 -05:00
Field G. Van Zee
803bbef0a3 Fixed pthreads compile bug with previous commit.
Details:
- Erroneously passed family parameter into l3int_t function despite
  that function not taking the parameter. Oops.
2017-07-29 20:17:05 -05:00
Field G. Van Zee
c63980f4ca Moved 'family' field from cntx_t to cntl_t.
Details:
- Removed the family field inside the cntx_t struct and re-added it to the
  cntl_t struct. Updated all accessor functions/macros accordingly, as well
  as all consumers and intermediaries of the family parameter (such as
  bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
  change was motivated by the desire to keep the context limited, as much
  as possible, to information about the computing environment. (The family
  field, by contrast, is a descriptor about the operation being executed.)
- Added additional functions to bli_blksz_*() API.
- Added additional functions to bli_cntx_*() API.
- Minor updates to bli_func.c, bli_mbool.c.
- Removed 'obj' from bli_blksz_*() API names.
- Removed 'obj' from bli_cntx_*() API names.
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
  that operate only on a single struct to contain the "_node" suffix to
  differentiate them from those routines that operate on the entire tree.
- Added enums for packm and unpackm kernels to bli_type_defs.h.
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
  They weren't being used and probably never will be.
2017-07-29 14:53:39 -05:00
Field G. Van Zee
0783739556 Merge pull request #139 from Maratyszcza/emscripten
Fix Emscripten builds
2017-07-21 16:49:48 -05:00
Field G. Van Zee
ad8610b441 Merge branch 'master' into emscripten 2017-07-21 15:18:33 -05:00
Devin Matthews
ca1d1d8560 Merge pull request #144 from devinamatthews/fix_atomics_on_bgq
Add fallbacks to __sync_* or __c11_atomic_* builtins...
2017-07-21 09:49:50 -05:00
Devin Matthews
733faf848d Clang can't make up its mind what to support. 2017-07-20 14:50:13 -05:00
Devin Matthews
7425d0744d Add default #define for __has_extension. 2017-07-20 12:54:58 -05:00
Devin Matthews
b537b5bbe8 Merge pull request #133 from devinamatthews/haswell-packdim
Fix prefetching in haswell ukernel
2017-07-20 10:58:39 -05:00
Devin Matthews
8823f91a14 Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143. 2017-07-20 10:04:34 -05:00
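A fallback of the kind this entry describes might look like the sketch below. It is an illustration under assumptions, not the code from the commit: the SKETCH_* macro and sketch_* function names are hypothetical, the version check is a guess at a reasonable test, and the __c11_atomic_* branch is left out for brevity. The default definitions of __has_builtin and __has_extension keep the #if expressions valid on compilers that lack those feature-test macros.

```c
#include <assert.h>

/* Compilers without these feature-test macros get harmless defaults.
   (Clang's __has_extension(c_atomic) could gate a __c11_atomic_* branch,
   which this sketch omits.) */
#ifndef __has_builtin
#define __has_builtin(x) 0
#endif
#ifndef __has_extension
#define __has_extension(x) 0
#endif

/* Assumed detection rule: GCC 4.7 or later, or any compiler that reports
   the __atomic_add_fetch builtin. */
#if (defined(__GNUC__) && !defined(__clang__) && \
     (__GNUC__ * 100 + __GNUC_MINOR__ >= 407)) || \
    __has_builtin(__atomic_add_fetch)
#define SKETCH_HAVE_ATOMIC_BUILTINS 1
#else
#define SKETCH_HAVE_ATOMIC_BUILTINS 0
#endif

/* Atomically add v to *p and return the new value. */
int sketch_atomic_add_fetch(int *p, int v)
{
#if SKETCH_HAVE_ATOMIC_BUILTINS
    return __atomic_add_fetch(p, v, __ATOMIC_ACQ_REL);
#else
    /* Legacy builtin; it always acts as a full barrier. */
    return __sync_add_and_fetch(p, v);
#endif
}

/* Atomically read *p. */
int sketch_atomic_load(int *p)
{
#if SKETCH_HAVE_ATOMIC_BUILTINS
    return __atomic_load_n(p, __ATOMIC_ACQUIRE);
#else
    /* Adding zero is a common way to express a load with __sync_*. */
    return __sync_fetch_and_add(p, 0);
#endif
}
```

Callers use only the sketch_* wrappers, so the choice of builtin family stays in one place.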
Field G. Van Zee
1f1ec0db93 Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
  ar. Previously, "cru" was used, while now we employ only "cr". This
  change was prompted by a warning observed on Ubuntu 16.04:

    ar: `u' modifier ignored since `D' is the default (see `U')

  This caused me to realize that the default mode causes timestamps to be
  zero, and thus the 'u' option, which causes only changed object files to
  be inserted, is not applicable.
2017-07-19 15:40:48 -05:00
Field G. Van Zee
5caaba2d61 Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
  version string at configure-time. The help text also now describes the
  usage information.
- Changed the way the version string is communicated to the Makefile.
  Previously, it was read into the VERSION variable from the 'version' file
  via $(shell cat ...). Now, the VERSION variable is instead set in
  config.mk (via a configure-substituted anchor from config.mk.in).
2017-07-19 13:51:53 -05:00
Field G. Van Zee
13175c5fb7 Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
  bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
  implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
  implementation goes through the same motions as the previous codes, but
  protects its loads and increments with GNU atomic built-ins. These atomic
  statements take memory ordering parameters that allow us to specify just
  enough constraints for the barrier to work as intended on weakly-ordered
  hardware. The prior implementation was only guaranteed to work on systems
  with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
  this change and his crash-course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
  bli_thrcomm_*.h.
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
  consistent with that of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
  simply "comm".
2017-07-18 17:56:00 -05:00
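The barrier described in the entry above can be approximated by a sense-reversing barrier built on GNU atomic built-ins. The sketch below is an assumption-laden illustration rather than the contents of bli_thrcomm_barrier_atomic(): the struct, its field names, and the exact memory orders chosen are hypothetical.

```c
#include <assert.h>

/* Hypothetical barrier state; field names are illustrative, not BLIS's. */
typedef struct
{
    int n_threads; /* number of participating threads */
    int sense;     /* toggles between 0 and 1 each time the barrier opens */
    int arrived;   /* threads that have reached the current barrier */
} sketch_barrier_t;

void sketch_barrier(sketch_barrier_t *b)
{
    assert(b != 0 && b->n_threads > 0);

    /* Remember the sense this barrier episode started with. */
    int orig_sense = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);

    /* Register arrival. ACQ_REL orders this thread's earlier writes before
       the increment and makes earlier arrivals' writes visible to it. */
    int my_arrival = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (my_arrival == b->n_threads)
    {
        /* Last thread in: reset the count, then flip the sense with release
           ordering so the reset is visible to any thread that sees the flip. */
        __atomic_store_n(&b->arrived, 0, __ATOMIC_RELAXED);
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    }
    else
    {
        /* All other threads spin until the sense changes. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == orig_sense)
            ; /* empty loop body */
    }
}
```

Each participating thread calls sketch_barrier() on a shared sketch_barrier_t whose n_threads matches the number of callers; the acquire/release pairing is what lets the waiting threads observe the last thread's writes on weakly-ordered hardware.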
Field G. Van Zee
0e58ba1b3a Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-07-17 19:03:22 -05:00
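The get/set pattern listed in the entry above could be built on the C environment functions roughly as follows. This is a sketch under assumptions: the sketch_* names and signatures are hypothetical and are not the bli_thread_*() routines themselves, and it relies on POSIX setenv() being available.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

/* Store an integer value in the named environment variable.
   Returns 0 on success, as setenv() does. */
int sketch_thread_set_env(const char *name, long value)
{
    char buf[32];
    assert(name != NULL);
    snprintf(buf, sizeof buf, "%ld", value);
    return setenv(name, buf, 1); /* 1: overwrite any existing value */
}

/* Read an integer from the named environment variable, or return
   fallback if the variable is unset or does not start with a number. */
long sketch_thread_get_env(const char *name, long fallback)
{
    const char *str;
    char *end;
    long value;

    assert(name != NULL);
    str = getenv(name);
    if (str == NULL) return fallback;

    value = strtol(str, &end, 10);
    return (end == str) ? fallback : value;
}

/* Convenience wrappers in the spirit of the routines listed above. */
int sketch_set_num_threads(long n)
{
    return sketch_thread_set_env("BLIS_NUM_THREADS", n);
}

long sketch_get_num_threads(void)
{
    return sketch_thread_get_env("BLIS_NUM_THREADS", -1);
}
```

A wrapper per variable (for example one for BLIS_JC_NT) would follow the same two-line shape as the BLIS_NUM_THREADS pair shown here.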
Marat Dukhan
8772a0b33a Fix Emscripten builds 2017-07-13 21:39:24 -07:00
Field G. Van Zee
72c8b49bb8 Merge pull request #138 from hominhquan/membrk_set_free_fp
Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
2017-07-12 14:58:12 -05:00
Minh Quan HO
ba7cada51a set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
2017-07-07 10:52:05 +02:00
Kiran Varaganti
1241301869 Merge "Reducing the framework overhead of GEMV routines" into amd-staging 2017-07-05 02:24:00 -04:00
sthangar
25ead66fb7 Reducing the framework overhead of GEMV routines
Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684
2017-07-05 10:41:28 +05:30
Kiran Varaganti
969b67e880 Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, above all, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.
Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4
2017-07-04 12:57:32 +05:30
Devin Matthews
70cc825b55 Update LICENSE
Remove totally unnecessary first 9 lines and hopefully get GitHub to recognize it as 3BSD [ci skip].
2017-06-06 21:58:21 -05:00
Devin Matthews
cf54c77bc7 Add new SSI acknowledgment 2017-06-06 20:23:17 -05:00
prangana
d6ef56c6db Update version number
Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4
betarelease-0.9
2017-06-01 16:22:23 +05:30
prangana
897bfa0e92 Update version number
Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4
2017-06-01 16:11:09 +05:30
Santanu Thangaraj
99d0ba5606 Merge "Checked in the small matrix code to compute GEMM called with A transpose case" into amd-staging 2017-06-01 02:19:02 -04:00
sthangar
6d17e0120f Checked in the small matrix code to compute GEMM called with A transpose case
Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462
2017-06-01 11:42:54 +05:30
prangana
9d93f8481a Update Licence File
Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2
2017-05-30 14:00:03 +05:30
prangana
be2c7eb851 Update Licence File
Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2
2017-05-30 10:10:52 +05:30
Devin Matthews
7f41bb0a0b PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%. 2017-05-26 14:49:31 -04:00
Devin Matthews
d87614af3f Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d.
2017-05-26 14:47:36 -04:00
Devin Matthews
681eec913d Change PACKDIM_MR (double) for haswell to 8. 2017-05-26 12:28:09 -05:00
praveeng
0a3ae0ecaa frame/3/gemm/bli_gemm_front.c
Change-Id: I52a0fbc1d33bb948d430942323bbc5fe44e3ca13
2017-05-20 16:53:50 +05:30
Field G. Van Zee
6e04f9df01 Restored deleted lines from makefile fragments. 2017-05-17 13:03:52 -05:00
Devin Matthews
ec5c0c0448 Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
2017-05-17 12:29:44 -05:00
Devin Matthews
555ddc30d4 Remove shebangs from makefiles. 2017-05-17 12:27:14 -05:00
Devin Matthews
f26bd7f42e Merge pull request #128 from iotamudelta/master
Portability and clang
2017-05-17 11:58:41 -05:00
J M Dieterich
169fb05f22 Fix if/else structure. Thanks to TravisCI. 2017-05-16 23:11:22 -04:00
J M Dieterich
0579dfea0b Restore version. 2017-05-16 22:58:07 -04:00
J M Dieterich
a75b05c23d Mark piledriver compilable w/ clang. 2017-05-16 22:23:27 -04:00
J M Dieterich
7541d46e2b Mark bulldozer compilable w/ clang. 2017-05-16 22:12:12 -04:00
J M Dieterich
91f897073e Correct error message. 2017-05-16 22:06:59 -04:00
J M Dieterich
f5131e1e49 Indeed one can compile for carrizo also using clang. 2017-05-16 22:03:23 -04:00
J M Dieterich
5fa4e9439c A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash 2017-05-16 21:50:49 -04:00
Field G. Van Zee
1f3a58197e Housekeeping, induced method file/function renames.
Details:
- Renamed all level-3 induced method files to use the "_vir.c" suffix
  instead of "_ref.c". Also renamed functions within these files
  accordingly.
- Renamed cpp macro definitions in frame/ind/include according to the
  above changes.
- Removed frame/3/old.
2017-05-08 16:10:03 -05:00