Commit Graph

826 Commits

Author SHA1 Message Date
Devin Matthews
b537b5bbe8 Merge pull request #133 from devinamatthews/haswell-packdim
Fix prefetching in haswell ukernel
2017-07-20 10:58:39 -05:00
Field G. Van Zee
1f1ec0db93 Updated ar option list used by all configurations.
Details:
- Dropped 'u' from the list of modifiers passed into the library archiver
  ar. Previously, "cru" was used, while now we employ only "cr". This
  change was prompted by a warning observed on Ubuntu 16.04:

    ar: `u' modifier ignored since `D' is the default (see `U')

  This caused me to realize that the default mode causes timestamps to be
  zero, and thus the 'u' option, which causes only changed object files to
  be inserted, is not applicable.
2017-07-19 15:40:48 -05:00
Field G. Van Zee
5caaba2d61 Added --force-version=STRING option to configure.
Details:
- Added an option to configure that allows the user to force an arbitrary
  version string at configure-time. The help text now also describes this
  option and its usage.
- Changed the way the version string is communicated to the Makefile.
  Previously, it was read into the VERSION variable from the 'version' file
  via $(shell cat ...). Now, the VERSION variable is instead set in
  config.mk (via a configure-substituted anchor from config.mk.in).
2017-07-19 13:51:53 -05:00
Field G. Van Zee
13175c5fb7 Updated openmp/pthread barriers with GNU atomics.
Details:
- Updated the non-tree openmp and pthreads barriers defined in
  bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
  implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
  implementation goes through the same motions as the previous codes, but
  protects its loads and increments with GNU atomic built-ins. These atomic
  statements take memory ordering parameters that allow us to specify just
  enough constraints for the barrier to work as intended on weakly-ordered
  hardware. The prior implementation was only guaranteed to work on systems
  with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
  this change and for his crash course in atomics and memory ordering.)
- Removed 'volatile' from structs' barrier field declarations in
  bli_thrcomm_*.h.
- Updated bli_thrcomm_pthreads.? files to use renamed struct barrier fields
  consistent with those of the _openmp.? files.
- Updated other bli_thrcomm_* files to rename "communicator" variables to
  simply "comm".
2017-07-18 17:56:00 -05:00
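To illustrate the barrier change described in 13175c5fb7, here is a minimal sketch of a sense-reversing barrier whose loads and increments go through GNU atomic built-ins. The type, field, and function names (demo_comm_t, demo_comm_init, demo_barrier_atomic) are hypothetical and are not taken from the BLIS sources; only the __atomic_* built-ins are real GCC/Clang features. The memory orders shown are one plausible choice, not necessarily the ones used in the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical barrier state; names are illustrative only. */
typedef struct
{
    int n_threads;               /* number of participating threads */
    int barrier_sense;           /* flips each time the barrier completes */
    int barrier_threads_arrived; /* threads that reached the barrier so far */
} demo_comm_t;

void demo_comm_init( demo_comm_t* comm, int n_threads )
{
    assert( comm != NULL && n_threads > 0 );
    comm->n_threads               = n_threads;
    comm->barrier_sense           = 0;
    comm->barrier_threads_arrived = 0;
}

void demo_barrier_atomic( demo_comm_t* comm )
{
    assert( comm != NULL );
    if ( comm->n_threads == 1 ) return;

    /* Remember the sense of the current round before announcing arrival.
       The sense cannot flip until this thread has also arrived. */
    int my_sense = __atomic_load_n( &comm->barrier_sense, __ATOMIC_RELAXED );

    /* Announce arrival. Acquire-release ordering makes the last arriver
       observe the writes of every thread that arrived before it. */
    int arrived = __atomic_add_fetch( &comm->barrier_threads_arrived, 1,
                                      __ATOMIC_ACQ_REL );

    if ( arrived == comm->n_threads )
    {
        /* Last thread in: reset the counter, then publish the flipped
           sense with release ordering so waiters see the reset. */
        __atomic_store_n( &comm->barrier_threads_arrived, 0, __ATOMIC_RELAXED );
        __atomic_fetch_xor( &comm->barrier_sense, 1, __ATOMIC_RELEASE );
    }
    else
    {
        /* Spin until the last thread flips the sense. */
        while ( __atomic_load_n( &comm->barrier_sense, __ATOMIC_ACQUIRE ) == my_sense )
            ;
    }
}
```

Each participating thread would call demo_barrier_atomic() on the same demo_comm_t; the acquire load in the spin loop pairs with the release flip, which is what lets the scheme work on weakly-ordered hardware without 'volatile'.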
Field G. Van Zee
0e58ba1b3a Added API to set mt environment variables.
Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
  pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
    bli_thread_get_jc_nt()
    bli_thread_get_ic_nt()
    bli_thread_get_jr_nt()
    bli_thread_get_ir_nt()
    bli_thread_get_num_threads()
    bli_thread_set_jc_nt()
    bli_thread_set_ic_nt()
    bli_thread_set_jr_nt()
    bli_thread_set_ir_nt()
    bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.
2017-07-17 19:03:22 -05:00
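As a rough picture of what get/set wrappers like those added in 0e58ba1b3a might do, the sketch below reads and writes an integer environment variable with POSIX getenv()/setenv() and validates it with strtol() and errno. The demo_* functions, their signatures, and the -1 "unset" convention are assumptions made for illustration; the actual BLIS signatures are not shown in this log. Only the variable name BLIS_JC_NT comes from the commit message.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L /* for setenv() */
#endif

#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: parse an integer environment variable. Returns
   'fallback' if the variable is unset, empty, or not a clean integer. */
long demo_thread_get_env( const char* name, long fallback )
{
    assert( name != NULL );

    const char* str = getenv( name );
    if ( str == NULL || *str == '\0' ) return fallback;

    char* end = NULL;
    errno = 0;
    long value = strtol( str, &end, 10 );

    /* Reject overflow (errno set) and trailing garbage. */
    if ( errno != 0 || *end != '\0' ) return fallback;

    return value;
}

/* Hypothetical helper: store an integer in an environment variable.
   Returns 0 on success and -1 on failure, mirroring setenv(). */
int demo_thread_set_env( const char* name, long value )
{
    assert( name != NULL );

    char buf[ 32 ];
    snprintf( buf, sizeof( buf ), "%ld", value );

    return setenv( name, buf, 1 ); /* 1 = overwrite any existing value */
}

/* Convenience wrappers in the spirit of the ones listed in the commit. */
long demo_thread_get_jc_nt( void )   { return demo_thread_get_env( "BLIS_JC_NT", -1 ); }
int  demo_thread_set_jc_nt( long n ) { return demo_thread_set_env( "BLIS_JC_NT", n ); }
```

The remaining wrappers (ic, jr, ir, num_threads) would follow the same two-line pattern with a different variable name.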
Field G. Van Zee
72c8b49bb8 Merge pull request #138 from hominhquan/membrk_set_free_fp
Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
2017-07-12 14:58:12 -05:00
Minh Quan HO
ba7cada51a set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers
The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init
2017-07-07 10:52:05 +02:00
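The kind of bug fixed in ba7cada51a can be sketched with a small allocator object that stores its malloc and free routines as function pointers. Everything below (demo_membrk_t, its fields, and the demo_* functions) is a hypothetical stand-in, not the BLIS membrk implementation; it only shows why an initializer that forgets to set the free pointer breaks the release path.

```c
#include <assert.h>
#include <stdlib.h>

typedef void* (*demo_malloc_ft)( size_t size );
typedef void  (*demo_free_ft)( void* p );

/* Hypothetical allocator object with pluggable malloc/free routines. */
typedef struct
{
    demo_malloc_ft malloc_fp;
    demo_free_ft   free_fp;
} demo_membrk_t;

void demo_membrk_init( demo_membrk_t* membrk )
{
    assert( membrk != NULL );
    membrk->malloc_fp = malloc;
    membrk->free_fp   = free; /* omitting this line leaves free_fp unset */
}

void* demo_membrk_acquire( demo_membrk_t* membrk, size_t size )
{
    assert( membrk != NULL && membrk->malloc_fp != NULL );
    return membrk->malloc_fp( size );
}

void demo_membrk_release( demo_membrk_t* membrk, void* p )
{
    /* Releasing a buffer goes through free_fp, so it must have been set. */
    assert( membrk != NULL && membrk->free_fp != NULL );
    membrk->free_fp( p );
}
```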
Devin Matthews
70cc825b55 Update LICENSE
Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].
2017-06-06 21:58:21 -05:00
Devin Matthews
cf54c77bc7 Add new SSI acknowledgment
2017-06-06 20:23:17 -05:00
Devin Matthews
7f41bb0a0b PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.
2017-05-26 14:49:31 -04:00
Devin Matthews
d87614af3f Revert "Change PACKDIM_MR (double) for haswell to 8."
This reverts commit 681eec913d.
2017-05-26 14:47:36 -04:00
Devin Matthews
681eec913d Change PACKDIM_MR (double) for haswell to 8.
2017-05-26 12:28:09 -05:00
Field G. Van Zee
6e04f9df01 Restored deleted lines from makefile fragments.
2017-05-17 13:03:52 -05:00
Devin Matthews
ec5c0c0448 Change to /bin/sh.
All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.
2017-05-17 12:29:44 -05:00
Devin Matthews
555ddc30d4 Remove shebangs from makefiles.
2017-05-17 12:27:14 -05:00
Devin Matthews
f26bd7f42e Merge pull request #128 from iotamudelta/master
Portability and clang
2017-05-17 11:58:41 -05:00
J M Dieterich
169fb05f22 Fix if/else structure. Thanks to TravisCI.
2017-05-16 23:11:22 -04:00
J M Dieterich
0579dfea0b Restore version.
2017-05-16 22:58:07 -04:00
J M Dieterich
a75b05c23d Mark piledriver compilable w/ clang.
2017-05-16 22:23:27 -04:00
J M Dieterich
7541d46e2b Mark bulldozer compilable w/ clang.
2017-05-16 22:12:12 -04:00
J M Dieterich
91f897073e Correct error message.
2017-05-16 22:06:59 -04:00
J M Dieterich
f5131e1e49 Indeed one can compile for carrizo also using clang.
2017-05-16 22:03:23 -04:00
J M Dieterich
5fa4e9439c A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash
2017-05-16 21:50:49 -04:00
Tyler Michael Smith
cbf8710a1b Merge pull request #127 from devinamatthews/fix_blis_nt_xx
Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS
2017-05-08 11:21:20 -05:00
Field G. Van Zee
cf39d3ef3b Fixed a bug in norm1v, norm1m.
Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
  and matrices. This is one of the few operations in BLIS that does not
  have its own test module within the testsuite, hence why it went
  undetected for so long. The bad 1-norms were being used to normalize
  matrices in the testsuite after initialization, which led to some
  matrices containing a combination of "large" and "small" values. This
  tended to push the residuals computed after each test away from zero.
  In some cases, they were off *just* enough for the testsuite to label
  the result a "failure". Many thanks to Jeff Hammond for reporting this bug.
  (Wonky details: the bug was due to improperly-defined level-0 scalar
  macros for abval2, an operation that computes the absolute square,
  or complex magnitude/modulus. Certain complex domain instances of
  abval2 were being incorrectly defined in terms of real-only solutions,
  leading to bad results. This level-0 operation forms the basis of
  norm1v/norm1m. absq2 was also affected, but almost nothing uses
  this operation.)
2017-05-05 15:06:56 -05:00
Devin Matthews
799485124f Merge pull request #121 from jeffhammond/not-real-knl
allow KNL build without hbwmalloc (i.e. emulated)
2017-05-04 10:52:09 -05:00
Devin Matthews
fdc66f12d4 Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.
2017-05-04 10:36:04 -05:00
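The precedence rule stated in fdc66f12d4 can be sketched as a small resolver: if any per-loop value is set, the global thread count is ignored and every missing per-loop value becomes 1. The struct, the function, the -1 "unset" convention, and the auto_nt field (a total left for some later automatic factorization) are all assumptions made for this example and are not taken from the BLIS sources.

```c
#include <assert.h>

typedef struct
{
    long jc, ic, jr, ir; /* per-loop ways of parallelism */
    long auto_nt;        /* total thread count left to factorize, or -1 */
} demo_ways_t;

/* Each argument is the parsed value of the corresponding environment
   variable, or -1 if the variable is unset (hypothetical convention). */
demo_ways_t demo_resolve_ways( long num_threads,
                               long jc, long ic, long jr, long ir )
{
    demo_ways_t w = { 1, 1, 1, 1, -1 };

    if ( jc > 0 || ic > 0 || jr > 0 || ir > 0 )
    {
        /* At least one per-loop value is set: it overrides the global
           count, and any per-loop value left unset stays at 1. */
        if ( jc > 0 ) w.jc = jc;
        if ( ic > 0 ) w.ic = ic;
        if ( jr > 0 ) w.jr = jr;
        if ( ir > 0 ) w.ir = ir;
    }
    else if ( num_threads > 0 )
    {
        /* Only the global count is set: record it for later use. */
        w.auto_nt = num_threads;
    }

    assert( w.jc >= 1 && w.ic >= 1 && w.jr >= 1 && w.ir >= 1 );
    return w;
}
```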
Field G. Van Zee
773a24efb2 Merge branch 'master' of github.com:flame/blis
2017-05-03 15:07:59 -05:00
Field G. Van Zee
dd58c9545c Disable complex 3m/4m in testsuite by default.
Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
  and 4m. This will improve testing runtime on Travis CI as well as for
  anyone manually running the testsuite using default test parameters.
  Thanks to Devin Matthews for suggesting this change.
2017-05-03 15:04:51 -05:00
Jeff Hammond
0df3541f54 allow KNL build without hbwmalloc.h (i.e. emulated)
we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install an hbwmalloc implementation on such
systems, it is easier not to: the performance of SDE execution is not
representative anyway, so there is no reason to emulate HBW allocation.
2017-05-02 19:35:38 -07:00
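One common way to make a dependency like hbwmalloc optional, as 0df3541f54 describes, is a compile-time switch that falls back to the standard allocator. The sketch below assumes a hypothetical macro DEMO_HAVE_HBWMALLOC and hypothetical demo_pool_* wrappers; hbw_malloc() and hbw_free() are the real memkind entry points declared in hbwmalloc.h, but how BLIS itself selects between them is not shown in this log.

```c
#include <assert.h>
#include <stdlib.h>

/* DEMO_HAVE_HBWMALLOC is a hypothetical build-time switch, e.g. set by a
   configure script when hbwmalloc.h is available. */
#ifdef DEMO_HAVE_HBWMALLOC
#include <hbwmalloc.h>
#endif

void* demo_pool_malloc( size_t size )
{
    assert( size > 0 );
#ifdef DEMO_HAVE_HBWMALLOC
    return hbw_malloc( size ); /* high-bandwidth memory on real KNL */
#else
    return malloc( size );     /* plain heap when emulating, e.g. under SDE */
#endif
}

void demo_pool_free( void* p )
{
#ifdef DEMO_HAVE_HBWMALLOC
    hbw_free( p );
#else
    free( p );
#endif
}
```

Callers only ever see demo_pool_malloc() and demo_pool_free(), so the same binary layout builds on machines without the memkind headers.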
Field G. Van Zee
b88542591d Merge pull request #107 from jeffhammond/intel-compilers-no-use-libm
never use libm with Intel compilers
2017-05-02 19:22:41 -05:00
Field G. Van Zee
43007f7b65 Fixed stray parentheses in README citations.
2017-05-02 16:48:43 -05:00
Field G. Van Zee
a4f1d0b880 CHANGELOG update (0.2.2)
2017-05-02 16:38:43 -05:00
Field G. Van Zee
940a707ac7 Version file update (0.2.2) 0.2.2
2017-05-02 16:38:42 -05:00
Field G. Van Zee
d5a5e003ea Fixed a trsm1m bug that affected right-side cases.
Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
  was nondeterministic behavior (usually segmentation faults) for certain
  problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
  cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
  which explicitly directed the virtual gemm micro-kernel to use temporary
  space if the storage preference of the [real domain] gemm ukernel did
  not match the storage of the output matrix C. In the context of gemm,
  this handling is not needed because agreement between the storage pref
  and the matrix is guaranteed by a high-level optimization in BLIS.
  However, this optimization is not applied to trsm because the storage
  of C is not necessarily the same as the storage of the micro-panels of
  B--both of which are updated by the micro-kernel during a trsm
  operation. Thus, the guarantee of storage/preference agreement is not
  in place for trsm, which means we must handle that case within the
  virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
  3m1, 4m1a, and 1m.
2017-05-02 15:48:30 -05:00
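The temporary-space handling that d5a5e003ea restores can be sketched as a wrapper around a micro-kernel that only knows how to write row-stored output. When the output tile is stored differently, the wrapper computes into a small row-stored scratch tile and copies the result out with general strides. The names, the 2x2 tile size, the packed layouts of a and b, and the overwrite (beta = 0) semantics are all simplifying assumptions for this example; this is not the bli_gemm1m_ukr_ref.c code.

```c
#include <assert.h>
#include <stddef.h>

enum { DEMO_MR = 2, DEMO_NR = 2 };

/* Stand-in "native" micro-kernel: C := A * B for one MR x NR tile.
   It assumes row-stored output (unit column stride). 'a' holds k packed
   columns of length MR and 'b' holds k packed rows of length NR. */
void demo_ukr_row_pref( size_t k, const double* a, const double* b,
                        double* c, size_t rs_c )
{
    for ( size_t i = 0; i < DEMO_MR; ++i )
        for ( size_t j = 0; j < DEMO_NR; ++j )
        {
            double sum = 0.0;
            for ( size_t p = 0; p < k; ++p )
                sum += a[ p * DEMO_MR + i ] * b[ p * DEMO_NR + j ];
            c[ i * rs_c + j ] = sum;
        }
}

/* Wrapper that tolerates any storage of C. */
void demo_virtual_ukr( size_t k, const double* a, const double* b,
                       double* c, size_t rs_c, size_t cs_c )
{
    assert( a != NULL && b != NULL && c != NULL );

    if ( cs_c == 1 )
    {
        /* Storage matches the kernel's preference: write C directly. */
        demo_ukr_row_pref( k, a, b, c, rs_c );
        return;
    }

    /* Mismatch: compute into row-stored scratch space, then copy out
       using the caller's row and column strides. */
    double ct[ DEMO_MR * DEMO_NR ];
    demo_ukr_row_pref( k, a, b, ct, DEMO_NR );

    for ( size_t i = 0; i < DEMO_MR; ++i )
        for ( size_t j = 0; j < DEMO_NR; ++j )
            c[ i * rs_c + j * cs_c ] = ct[ i * DEMO_NR + j ];
}
```

Without the scratch path, calling the row-preferring kernel on a column-stored tile would write elements to the wrong addresses, which matches the nondeterministic failures the commit message reports.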
Field G. Van Zee
e80993e71f Merge branch 'master' into 1m
2017-05-02 12:30:28 -05:00
Field G. Van Zee
ca3a792477 README.md update.
Details:
- Updated bibtex entries for the 4th BLIS paper, and added entries for the
  5th and 6th BLIS papers.
2017-05-02 12:09:39 -05:00
Field G. Van Zee
6e7de6ef84 Minor updates to test/3m4m.
Details:
- Updated initial problem size and increment in Makefile.
- Updated code in test_gemm.c to correctly query kc from context.
2017-03-17 12:10:24 -05:00
Field G. Van Zee
f484c6cd43 Whitespace reformatting to armv8a kernels file.
Details:
- Updated formatting of function signature/header in
  kernels/armv8a/3/bli_gemm_opt_4x4.c.
2017-03-17 12:07:27 -05:00
Field G. Van Zee
a509fbd5ac Merge branch 'master' into 1m
2017-02-21 17:06:16 -06:00
Field G. Van Zee
69b4846ae9 Disabled experiment-related 1m code.
Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
  specifically inserted to facilitate the benchmarking of 1m block-panel
  and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
  reflect changes used/needed during benchmarking.
2017-02-21 15:33:39 -06:00
Devin Matthews
513944e4a9 Merge pull request #118 from devinamatthews/master
Handle k=0 correctly in KNL dgemm ukernel.
2017-02-20 10:04:33 -05:00
Devin Matthews
0e18f68cf1 Handle k=0 correctly in KNL dgemm ukernel.
2017-02-20 09:03:21 -06:00
Devin Matthews
8b462a0e8c Merge pull request #117 from devinamatthews/master
Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.
2017-02-19 23:03:03 -05:00
Devin Matthews
7d42fc0796 Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.
2017-02-19 21:10:55 -05:00
Field G. Van Zee
c362afc525 Added missing "level-0" BLAS [sd]cabs1_().
Details:
- Fixed issue #115 by adding implementations for scabs1_() and dcabs1_()
  to the BLAS compatibility layer. Thanks to heroxbd for pointing out
  their absence.
2017-02-09 11:54:59 -06:00
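For readers unfamiliar with the routines added in c362afc525: in the reference BLAS, SCABS1 and DCABS1 return the sum of the absolute values of the real and imaginary parts of a complex number. A minimal sketch follows; the demo_* names and struct types are hypothetical, and the pointer argument only mimics the Fortran-style pass-by-reference convention rather than reproducing the BLIS compatibility layer.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

typedef struct { float  real; float  imag; } demo_scomplex;
typedef struct { double real; double imag; } demo_dcomplex;

/* |re(z)| + |im(z)| in single precision. */
float demo_scabs1( const demo_scomplex* z )
{
    assert( z != NULL );
    return fabsf( z->real ) + fabsf( z->imag );
}

/* |re(z)| + |im(z)| in double precision. */
double demo_dcabs1( const demo_dcomplex* z )
{
    assert( z != NULL );
    return fabs( z->real ) + fabs( z->imag );
}
```

Note that this is not the complex modulus: for z = -3 + 4i it yields 7, whereas the modulus is 5.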
Field G. Van Zee
018180c938 Fixed a minor bug in configure (issue #114).
Details:
- Fixed a bug in the configure script whereby a non-preferred value for
  --enable-threading would cause problems in common.mk vis-a-vis detecting
  which threading model was chosen. Thanks to heroxbd for reporting this
  issue.
2017-02-08 11:20:52 -06:00
Devin Matthews
ddf45e7177 Merge pull request #113 from devinamatthews/knl_thread_params
Change default threading parameters for KNL.
2017-01-27 14:25:40 -06:00
Devin Matthews
78e1b16e16 Change default threading parameters for KNL.
2017-01-27 14:22:20 -06:00
Field G. Van Zee
1c732d3ddc Added 1m-specific APIs for bp, pb gemm algorithms.
Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
  body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
  bli_cntl_free() can check if the thread parameter is NULL, and if so,
  call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
  terms of bli_gemm1mxx_cntx_init(), which behaves the same as
  bli_gemm1m_cntx_init() did before, except that an extra bool parameter
  (is_pb) is used to support both bp and pb algorithms (including to
  support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
  when true, will toggle the boolean return value of routines such as
  bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
  causing BLIS to transpose the operation to achieve disagreement (rather
  than agreement) between the storage of C and the micro-kernel output
  preference. This disagreement is needed for panel-block implementations,
  since they induce a transposition of the suboperation immediately before
  the macro-kernel is called, which changes the apparent storage of C. For
  now, anti-preference is used only with the pb algorithm for 1m (and not
  with any other non-1m implementation).
- Defined new functions,
    bli_cntx_l3_ukr_eff_prefers_storage_of()
    bli_cntx_l3_ukr_eff_dislikes_storage_of()
    bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
    bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
  which are identical to their non-"eff" (effectively) counterparts except
  that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
  bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
  in terms of the existing block-panel macro-kernel _ker_var2(). This
  technique requires inducing transposes on all operands and swapping
  A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
  also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
  specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
    bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
    bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
    bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
  and updated all instantiations. Also updated the field names in the
  cntx_t struct.
- Comment updates.
2017-01-25 16:25:46 -06:00
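The "anti-preference" idea from 1c732d3ddc boils down to inverting the answer of a storage-preference query when a flag is set. The sketch below shows that toggle in isolation. The types, fields, and function names are hypothetical simplifications; the real bli_cntx_l3_ukr_eff_prefers_storage_of() operates on BLIS context and object types that are not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef enum { DEMO_ROW_STORED, DEMO_COL_STORED } demo_storage_t;

/* Hypothetical, stripped-down context. */
typedef struct
{
    bool ukr_prefers_rows; /* micro-kernel output preference */
    bool anti_pref;        /* when true, invert preference queries */
} demo_cntx_t;

/* Plain query: does the storage of C match the micro-kernel preference? */
bool demo_ukr_prefers_storage_of( const demo_cntx_t* cntx, demo_storage_t c )
{
    assert( cntx != NULL );
    bool c_is_rows = ( c == DEMO_ROW_STORED );
    return c_is_rows == cntx->ukr_prefers_rows;
}

/* "Effective" query: same answer, toggled when anti_pref is set, so that
   callers end up arranging disagreement instead of agreement. */
bool demo_ukr_eff_prefers_storage_of( const demo_cntx_t* cntx, demo_storage_t c )
{
    bool prefers = demo_ukr_prefers_storage_of( cntx, c );
    return cntx->anti_pref ? !prefers : prefers;
}

bool demo_ukr_eff_dislikes_storage_of( const demo_cntx_t* cntx, demo_storage_t c )
{
    return !demo_ukr_eff_prefers_storage_of( cntx, c );
}
```

With anti_pref left false, the "eff" queries behave exactly like the plain ones, which is why initializing the field to FALSE keeps existing code paths unchanged.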