Commit Graph

1507 Commits

Author SHA1 Message Date
Isuru Fernando
707a5e7f9b No conda for mingw build 2018-11-21 00:29:08 -06:00
Isuru Fernando
65b0565c0a Check MinGW-w64 2018-11-21 00:29:08 -06:00
Isuru Fernando
9ddffba584 Fix MinGW build failure
Fixes https://github.com/flame/blis/issues/278
2018-11-21 00:23:34 -06:00
Field G. Van Zee
e769bf46b0 Tweak testsuite to issue FAIL for NaN, Inf (#279).
Details:
- Adjusted the definition for libblis_test_get_string_for_result() in
  testsuite/src/test_libblis.c so that the "FAIL" string is returned if
  the computed residual contains either NaN or Inf. Previously, a
  residual containing NaN would result in the selection of the "PASS"
  string. Thanks to Devin Matthews for reporting this issue (#279).
- Expounded on comment for the macro definitions of bli_isnan() and
  bli_isinf() in bli_misc_macro_defs.h to make it more obvious why they
  must remain macros.
2018-11-20 16:16:53 -06:00
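The NaN behavior described in the commit above follows from IEEE 754 semantics: every ordered comparison involving NaN evaluates to false, so a threshold test alone never flags a NaN residual. The following is a minimal sketch of that effect, not the testsuite's actual code; the function names `ex_naive_result_string()` and `ex_result_string()` and the single-threshold interface are assumptions made for illustration.

```c
#include <math.h>

/* Hypothetical, simplified result selection (illustration only).
   A NaN residual makes (resid > thresh) false, so this version
   reports "PASS" for NaN. */
const char* ex_naive_result_string(double resid, double thresh)
{
    if (resid > thresh) return "FAIL";
    return "PASS";
}

/* Same selection with an explicit NaN/Inf check performed first. */
const char* ex_result_string(double resid, double thresh)
{
    if (isnan(resid) || isinf(resid)) return "FAIL";
    if (resid > thresh) return "FAIL";
    return "PASS";
}
```

Calling both functions with the `NAN` macro from `<math.h>` shows the difference: the naive version selects "PASS", the checked version selects "FAIL".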
Field G. Van Zee
279deae18f Added 4x5 matlab plotting scripts to test/3m4m.
Details:
- Added a new directory, test/3m4m/matlab, containing matlab scripts for
  plotting 4x5 panels of performance graphs (using the subplot()
  function) for gemm, hemm, herk, trmm, and trsm across all four
  floating-point datatypes. I expect to further refine these scripts as
  time goes on, but their current state constitutes a good start.
2018-11-16 11:34:19 -06:00
Field G. Van Zee
7b02c72665 CREDITS file update. 2018-11-14 13:49:55 -06:00
Field G. Van Zee
84dd298a27 Patch to fix msys2/Windows build failure (#277).
Details:
- Expanded cpp guard in frame/include/bli_x86_asm_macros.h to also check
  __MINGW32__ in addition to _WIN32, __clang__, and __MIC__. Thanks to
  Isuru Fernando for suggesting this fix, and also to Costas Yamin for
  originally reporting the issue (#277).
2018-11-14 13:47:45 -06:00
Field G. Van Zee
7b5ba7319b Merge branch 'dev' of github.com:flame/blis into dev 2018-11-14 12:32:01 -06:00
Field G. Van Zee
52392932dc Minor fixes to test/3m4m drivers.
Details:
- Cleanups to Makefile to allow all test drivers to be built for
  OpenBLAS and MKL in addition to BLIS.
- Fixed copy-paste typos in test_hemm in calls to ssymm_() and dsymm_().
- Fixed incorrect types for betap in BLAS cpp macro branch of
  test_herk.c.
2018-11-13 22:23:38 +00:00
Field G. Van Zee
4f12e36a0d Fixed number of columns in first output line.
Details:
- In previous commit, forgot to remove output column corresponding to
  the k dimension.
2018-11-13 14:23:12 -06:00
Field G. Van Zee
a2e0cdd7de Added hemm test driver to test/3m4m.
Details:
- Added a new test_hemm.c test driver to test/3m4m, which was modeled
  after the similarly named driver in test. Also updated Makefile
  so that blis-nat-[sm]t would trigger builds for the new driver.
2018-11-13 14:15:11 -06:00
Field G. Van Zee
0f9b53e84b Fixed a bug in high-level mixeddt conditional.
Details:
- Fixed a bug in frame/3/bli_l3_oapi.c in the conditional that divides
  use of induced method (1m) execution from native execution. The former
  was intended to only be used in cases where all storage datatypes are
  complex and the datatype of C is equal to the computation datatype.
  (If mixed datatypes are detected, native execution would be used.)
  However, the code in bli_gemm() was erroneously checking the execution
  datatype instead of the computation datatype, which at that point is
  guaranteed to be equal to the storage datatype even if the computation
  datatype contains a different value. Thanks to Devangi Parikh for
  helping in isolating this bug.
2018-11-13 13:03:15 -06:00
Field G. Van Zee
ce719f816d More edits to mixeddt matlab scripts.
Details:
- Renamed scripts in test/mixeddt/matlab:
    plot_case_all.m -> plot_dom_all.m
    plot_case_md.m  -> plot_dom_case.m
    plot_all_md.m   -> plot_dt_all.m
- Added plot_dt_select.m in order to plot select graphs for the main
  body of the mixeddt paper, and added additional related legend
  handling in plot_gemm_perf.m.
- Added test/mixeddt/matlab/output and a .gitkeep file within it in order
  to force git to recognize the directory.
2018-11-10 14:48:43 -06:00
Field G. Van Zee
bf99e7c14b Minor updates to test/mixeddt driver.
Details:
- Cleaned up test/mixeddt Makefile in preparation for gathering new
  data for mixeddt paper, including renaming implementations to
  "internal" and "ad-hoc" to match the terminology to be used in the
  paper.
- Added new matlab scripts for generating 8 figures, each covering all
  mixed-precision cases for each mixed-domain case.
- Updated the runme.sh script according to changes to Makefile.
- Fixed a minor bug in test_gemm.c that may have given incorrect
  performance in complex, homogeneous storage datatype cases where
  the computation precision was equal to the storage precisions.
  (Examples: zzzd, cccs.)
2018-11-08 18:47:17 -06:00
Field G. Van Zee
4bbb454bf3 Testsuite docs update for mixed-datatype gemm.
Details:
- Updated docs/Testsuite.md to include mention of the new mixed-domain
  and mixed-precision settings, including descriptions.
- Updated docs/MixedDatatypes.md to include a brief section on running
  the testsuite to exercise mixed-datatype functionality, which mostly
  amounts to a link to the Testsuite.md document.
- Minor verbiage change to testsuite output to correct a misleading
  label associated with the value returned by the query function
  bli_info_get_simd_num_registers(). (The function does not return the
  number of SIMD registers present in the hardware, but rather a maximum
  assumed value for the purposes of allocating temporary microtile
  workspace on the function stack.)
2018-11-03 19:11:01 -05:00
Field G. Van Zee
16401ae922 Merge branch 'dev' 2018-11-03 19:09:43 -05:00
Field G. Van Zee
2d403a1535 Merge pull request #275 from RhysU/patch-1
Spelling in FAQ
2018-11-01 20:18:53 -05:00
Rhys Ulerich
4a12979f65 Spelling in FAQ 2018-11-01 20:20:59 -04:00
Field G. Van Zee
f19c33af4c Disallow 64b BLAS integers + 32b BLIS integers.
Details:
- Print an error message from configure if the user attempts to
  explicitly configure BLIS for simultaneous use of 64-bit integers in
  the BLAS API with 32-bit integers in the BLIS API.
- Added cpp macro conditional to bli_type_defs.h to mandate that BLIS
  integers be 64 bits if the BLAS integers are 64 bits. This and the
  above item take care of issue #274. Thanks to Devin Matthews and
  Jeff Hammond for suggesting these safeguards.
- Slight reorganization and relabeling (for clarity) of BLAS/CBLAS
  sections and BLIS integer size line of the testsuite configuration
  output.
- Very minor edits to docs/MixedDatatypes.md.
2018-10-26 17:07:15 -05:00
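A preprocessor conditional of the kind described in the commit above might look like the following sketch. The macro names `EX_BLAS_INT_SIZE` and `EX_BLIS_INT_SIZE` and the type `ex_blis_int_t` are hypothetical, and whether the real header overrides the setting or reports an error is not stated in the commit message; this sketch assumes an override.

```c
#include <stdint.h>

/* Hypothetical configuration values; in a real build these would
   come from the configure step. */
#ifndef EX_BLAS_INT_SIZE
#define EX_BLAS_INT_SIZE 64
#endif
#ifndef EX_BLIS_INT_SIZE
#define EX_BLIS_INT_SIZE 32
#endif

/* If the BLAS API uses 64-bit integers, force the native integer
   size to 64 bits as well (assumed behavior, for illustration). */
#if EX_BLAS_INT_SIZE == 64 && EX_BLIS_INT_SIZE != 64
#undef EX_BLIS_INT_SIZE
#define EX_BLIS_INT_SIZE 64
#endif

#if EX_BLIS_INT_SIZE == 64
typedef int64_t ex_blis_int_t;
#else
typedef int32_t ex_blis_int_t;
#endif
```

With the defaults shown, the 32-bit request is overridden and `ex_blis_int_t` is a 64-bit type.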
Field G. Van Zee
e90e7f309b CHANGELOG update (0.5.0) 2018-10-25 14:09:43 -05:00
Field G. Van Zee
be7c57819c Version file update (0.5.0) 0.5.0 2018-10-25 14:09:40 -05:00
Field G. Van Zee
75da7f2a20 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- Updated docs/FAQ.md to reflect recent developments, and other edits.
- Minor updates to RELEASING.
2018-10-25 14:02:41 -05:00
Field G. Van Zee
6fbc456fb3 Added SALT testing to Travis CI.
Details:
- Modified .travis.yml to automatically employ the simulation of
  application-level threading within the testsuite, with supporting
  changes to common.mk, the top-level Makefile, and
  travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.salt' suffix (similar to those with the '.fast' suffix) for
  testing application-level threading.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-salt' and 'checkblis-salt'.
2018-10-25 13:20:25 -05:00
Field G. Van Zee
0e27963a67 Add bli_pthread_mutex_trylock().
Details:
- Added the missing bli_pthread_mutex_trylock() function and prototype
  to the non-Windows sections of bli_pthread.c and .h. This function
  isn't needed by BLIS, but I figured why not make the Windows and
  non-Windows sections consistent with one another.
2018-10-24 12:16:19 -05:00
Field G. Van Zee
4b683740c1 Defined bli_pthread_cond_*() and related defs.
Details:
- Added function definitions for bli_pthread_cond_*() as well as related
  types and constants to bli_pthread.c, and corresponding prototypes to
  bli_pthread.h.
2018-10-24 11:56:16 -05:00
Field G. Van Zee
4b4f8072b9 Define bli_pthreads barrier types on OS X.
Details:
- Fully define bli_pthreads barrier-related types on OS X. Only typedef
  those types in terms of pthreads types on non-Windows, non-Apple OSes
  (i.e. Linux).
2018-10-24 11:31:46 -05:00
Field G. Van Zee
ad98790dce Fix names of Windows pthread initializer macros.
Details:
- Renamed the PTHREAD_ initializer macros in the Windows cpp case to use
  BLIS_ prefixes to match their non-Windows counterparts.
2018-10-23 20:35:05 -05:00
Field G. Van Zee
06c23954e6 Defined unified bli_pthreads_*() API for all OSes.
Details:
- Expanded the bli_pthread_*() -> pthread_*() wrappers in
  frame/thread/bli_pthread.c to include cases for Windows taken from
  frame/base/bli_pthread_wrap.c. Now, bli_pthread_*() is always defined
  and always used by BLIS and the BLIS testsuite (in lieu of calling
  pthreads directly, as before). The implementation used in this new
  API depends on whether we are building for Windows, and to a lesser
  extent, whether we are building on OS X. For the core API, Windows
  uses Windows threads, non-Windows (Linux, OS X) uses pthreads.
  OS X and Windows get barriers implemented in terms of other
  bli_pthread_*() functions, and Linux gets barriers implemented in
  terms of pthread_barrier*(). This commit addresses issue #273.
- Fixed a bug in the Linux definition of bli_pthread_mutex_unlock(),
  which was erroneously calling pthread_mutex_lock().
- Minor changes to configure so that the auto-detection executable
  can be built given the above changes (most notably, turning on
  POSIX extensions via -D_GNU_SOURCE).
- Removed temporary play-test code for shiftd that accidentally got
  committed into test/3m4m/test_gemm.c.
2018-10-23 19:16:54 -05:00
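On POSIX systems, a thin wrapper layer of the kind described in the commit above can be sketched as follows. The `ex_` names are hypothetical and cover only the mutex calls; the sketch assumes a platform that provides `<pthread.h>`. It also shows why the unlock bug mentioned above matters: an unlock wrapper that calls `pthread_mutex_lock()` would leave the mutex held, so the next attempt to lock it would block.

```c
#include <pthread.h>

/* Hypothetical wrapper type and functions (illustration only). */
typedef pthread_mutex_t ex_pthread_mutex_t;

int ex_pthread_mutex_lock(ex_pthread_mutex_t* mutex)
{
    return pthread_mutex_lock(mutex);
}

int ex_pthread_mutex_trylock(ex_pthread_mutex_t* mutex)
{
    return pthread_mutex_trylock(mutex);
}

int ex_pthread_mutex_unlock(ex_pthread_mutex_t* mutex)
{
    /* Must forward to pthread_mutex_unlock(), not pthread_mutex_lock(). */
    return pthread_mutex_unlock(mutex);
}
```

Callers then use only the `ex_` layer, so a different implementation (for example, one built on native Windows primitives) could be substituted behind the same names.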
Field G. Van Zee
eac7d267a0 Unconditionally define bli_l3_thread_entry().
Details:
- Define a dummy bli_l3_thread_entry() function when multithreading is
  disabled altogether, or enabled via OpenMP. This function was
  originally necessary only when multithreading is enabled via pthreads.
  By defining the function no matter the threading options given, it is
  less likely that an AppVeyor Windows build will complain due to a
  missing symbol in the DLL. (To be clear: AppVeyor was working fine
  before, but a problem may have arisen if it were switched to an
  OpenMP build.)
- Removed the prototype for bli_l3_thread_entry() from
  bli_thrcomm_pthreads.c and placed it in bli_thrcomm.h.
- Regenerated the symbols list file build/libblis-symbols.def.
2018-10-22 18:10:59 -05:00
Field G. Van Zee
4ee986f0a7 Added mixed-datatype testing to Travis CI (#271).
Details:
- Modified .travis.yml to automatically test the mixed-datatype support
  of the gemm operation, with supporting changes to common.mk, the
  top-level Makefile, and travis/do_testsuite.sh.
- Added a new pair of input files to testsuite directory with the
  '.mixed' suffix (similar to those with the '.fast' suffix) for testing
  mixed-datatype gemm.
- Updated docs/BuildSystem.md to document the new make targets
  'testblis-md' and 'checkblis-md'.
2018-10-22 14:09:44 -05:00
Field G. Van Zee
c3c6ebc9c6 Fixed thrinfo_t printing for small problems.
Details:
- Fixed a bug in the code that prints out the communicator and work ids
  from the various threads' thrinfo_t nodes. This bug manifested when
  the dimension being parallelized was not large enough for every
  thread to be assigned actual work (since the minimum amount of work is
  determined by the register blocksize in the dimension being
  parallelized). In those cases, the threads that receive no work in
  that dimension do not finish building their thrinfo_t tree, leaving
  lower-level nodes non-existent. (The bug itself was usually observed as
  a segfault when the printing code attempted to dereference all the way
  down the thrinfo_t tree.) The solution involves explicitly checking
  each node as it is dereferenced, and if at any time NULL is found, all
  subsequent communicator and work ids are set to -1.
2018-10-21 18:48:54 -05:00
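The NULL-checking walk described in the commit above can be sketched as follows. The struct `ex_node_t` is a hypothetical stand-in for a thrinfo_t node with only the fields needed here, and `ex_collect_ids()` is an illustrative name, not a BLIS function.

```c
#include <stddef.h>

/* Hypothetical stand-in for one node of a per-thread info tree. */
typedef struct ex_node
{
    int             comm_id;
    int             work_id;
    struct ex_node* sub_node; /* NULL if the tree was not built further */
} ex_node_t;

/* Record the communicator and work ids for `depth` levels. Once a
   NULL node is reached, all remaining ids are set to -1 instead of
   dereferencing further. */
void ex_collect_ids(const ex_node_t* node, int depth,
                    int* comm_ids, int* work_ids)
{
    for (int i = 0; i < depth; ++i)
    {
        if (node != NULL)
        {
            comm_ids[i] = node->comm_id;
            work_ids[i] = node->work_id;
            node        = node->sub_node;
        }
        else
        {
            comm_ids[i] = -1;
            work_ids[i] = -1;
        }
    }
}
```

A thread whose tree stops after two levels then reports its two real id pairs followed by -1 for every deeper level.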
Field G. Van Zee
73a222c0d9 Minor edits to 'configure --help' text. 2018-10-20 14:13:04 -05:00
Field G. Van Zee
14f3d5e6df Refresh libblis-symbols.def post-merge 090e4f0. 2018-10-19 20:39:35 -05:00
Field G. Van Zee
090e4f08fc Merge branch 'master' into dev 2018-10-19 18:41:10 -05:00
Field G. Van Zee
0854e880b0 Merge pull request #261 from flame/win-pthreads
Implement missing pthreads function on Windows
2018-10-19 18:05:00 -05:00
Field G. Van Zee
c9be5889fb Added "Known issues" section to Multithreading.md.
Details:
- Added known issues section to Multithreading.md.
- Trivial changes to MixedDatatypes.md, Sandboxes.md.
2018-10-19 17:42:40 -05:00
Field G. Van Zee
343a2715eb Whitespace changes to configure, bli_pthread_wrap.
Details:
- Mostly whitespace changes (spaces to tabs) to configure and
  bli_pthread_wrap.c and .h.
2018-10-19 16:59:19 -05:00
Field G. Van Zee
3678a1cd51 Merge branch 'master' into win-pthreads 2018-10-19 16:11:31 -05:00
Field G. Van Zee
4e38a8d4ee Implemented python version checking in configure.
Details:
- Added python version checking to configure script. (Recall that python
  is needed to execute the flatten-headers.py script.) Minimum versions
  of python needed are currently as follows:
    python2: 2.7 or later
    python3: 3.5 or later
  The standard search order for python interpreters is:
    python python3 python2
  The PYTHON environment variable is also supported and will be checked
  before the standard search order list.
- Updated BuildSystem.md to include: a minimum make version; mention
  that the C compiler must actually be a C99 compiler; and the caveat
  that Windows builds do not require pthreads since BLIS can provide
  an implementation of pthreads internally.
2018-10-19 15:54:15 -05:00
Field G. Van Zee
85397cd4fa Added explanatory comment to bli_pthread.c.
Details:
- Added a verbose comment to bli_pthread.c that explains why a bli_
  wrapper to pthreads APIs is useful.
2018-10-19 13:12:43 -05:00
Field G. Van Zee
53c07035ef Refresh libblis-symbols.def from bb6df28.
Details:
- Forgot to regenerate the symbols file after the previous commit
  (bb6df281) in which shiftd operation was introduced.
2018-10-19 12:53:03 -05:00
Field G. Van Zee
473ce54f5f Added bli_pthread_*() API.
Details:
- Defined a bli_pthread_*() API so that the testsuite, when being linked
  against a Windows DLL, will be able to access pthreads functionality
  without those pthreads functions being explicitly exported by the DLL.
  Instead, we export the bli_pthread_*() layer, which uses types and
  functions that are identical to pthreads, but adds a 'bli_' prefix.
  Only a few basic functions are present in the bli_pthread_*() API
  for now. Thanks to Devin Matthews and Isuru Fernando for their help
  on a related PR (#261) that this commit will hopefully facilitate.
- Updated testsuite so that it calls bli_pthread_*() layer instead of
  pthread_*() functions directly.
- Regenerated build/libblis-symbols.def.
- Updated a comment in build/regen-symbols.sh.
2018-10-18 19:03:56 -05:00
Field G. Van Zee
bb6df2814f Defined a new level-1d operation: shiftd.
Details:
- Defined a new level-1d operation called 'shiftd', including object and
  typed APIs. This operation adds a scalar value to every element along
  an arbitrary diagonal of a matrix. Currently, shiftd is implemented in
  terms of the addv kernel. (The scalar is passed in as the x vector
  with an increment of zero.)
- Replaced ad-hoc usage of setd and addd (after creating a temporary
  matrix object) with use of shiftd, which is much more concise, in
  various test driver files in the testsuite. Similar changes were made
  to the standalone test drivers and the example code.
- Added documentation entries in BLISObjectAPI.md and BLISTypedAPI.md
  for bli_shiftd() and bli_?shiftd(), respectively.
- Added observed object properties to level-1d documentation in
  BLISObjectAPI.md.
2018-10-18 17:11:39 -05:00
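The idea of expressing a diagonal shift through a vector-add kernel, with the scalar passed as a vector of increment zero, can be sketched as follows. The functions `ex_addv()` and `ex_shiftd()` are hypothetical, double-precision-only illustrations and do not reproduce the BLIS signatures; the sketch assumes a positive diagonal offset selects a superdiagonal.

```c
#include <stddef.h>

/* y[i*incy] += x[i*incx] for i = 0 .. n-1. */
static void ex_addv(ptrdiff_t n, const double* x, ptrdiff_t incx,
                    double* y, ptrdiff_t incy)
{
    for (ptrdiff_t i = 0; i < n; ++i)
        y[i * incy] += x[i * incx];
}

/* Add alpha to every element on diagonal `diagoff` of the m x n
   matrix a, stored with row stride rs and column stride cs. */
void ex_shiftd(ptrdiff_t m, ptrdiff_t n, ptrdiff_t diagoff,
               double alpha, double* a, ptrdiff_t rs, ptrdiff_t cs)
{
    ptrdiff_t i0  = diagoff < 0 ? -diagoff : 0; /* first row on the diagonal    */
    ptrdiff_t j0  = diagoff > 0 ?  diagoff : 0; /* first column on the diagonal */
    ptrdiff_t lm  = m - i0;
    ptrdiff_t ln  = n - j0;
    ptrdiff_t len = lm < ln ? lm : ln;

    if (len <= 0) return; /* diagonal lies outside the matrix */

    /* incx == 0 makes every element of "x" the same scalar; stepping
       by rs + cs in y moves one element down the diagonal. */
    ex_addv(len, &alpha, 0, a + i0 * rs + j0 * cs, rs + cs);
}
```

For a 3x3 column-major matrix (rs = 1, cs = 3), a call with `diagoff` 0 touches elements 0, 4, and 8 of the underlying array, and a call with `diagoff` 1 touches elements 3 and 7.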
Field G. Van Zee
53e0a0c9b3 Merge branch 'master' into win-pthreads 2018-10-18 14:54:59 -05:00
Field G. Van Zee
ec67679990 Refreshed Windows symbol list; added regen script.
Details:
- Moved windows/build/libblis-symbols.def to build/libblis-symbols.def.
  Updated link commands in common.mk accordingly.
- Added a new script build/regen-symbols.sh that will regenerate the
  libblis-symbols.def file in its new location after building a
  haswell-targeted shared library. Thanks to Isuru Fernando for
  providing the symbol generation command.
- Ran the new script to refresh the symbols file.
2018-10-18 14:27:02 -05:00
Field G. Van Zee
fdad54ab8e Removed old symbol from libblis-symbols.def.
Details:
- Removed bli_gemm_ker_var1() from windows/build/libblis-symbols.def
  since this function is no longer compiled.
2018-10-18 12:43:22 -05:00
Field G. Van Zee
49d3f9fcbb Merge branch 'master' into dev 2018-10-17 18:00:40 -05:00
Field G. Van Zee
3c52725693 Renamed/moved l3 zen ukernels to haswell kernel set.
Details:
- Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and
  then updated the file contents to use the 'haswell' infix.
- Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to
  above function renames.
- Moved/updated the corresponding prototypes in bli_kernels_zen.h to
  bli_kernels_haswell.h.
- Updated config_registry according to above changes.
- NOTE: This rename reflects the fact that haswell microkernels are
  specifically written to overcome the floating-point latency for FMA
  instructions on Intel Haswell-like architectures, which can issue two
  FMA instructions per cycle. These ukernels happen to work fine on AMD
  Zen-based architectures. However, Zen only issues one FMA per cycle,
  which, while halving its floating-point throughput, gives it extra
  flexibility in the design of its microkernels--namely, mr and nr can
  be smaller and still overcome the floating-point latency for those
  single-issue cores. A smaller value of mr and nr allows for a larger
  value of kc, which may be useful in some situations. In the future,
  we may write such Zen-specific microkernels to take advantage of this
  additional flexibility.
2018-10-17 14:56:22 -05:00
Field G. Van Zee
71c5832d5f Consolidated slab/rr-explicit level-3 macrokernels.
Details:
- Consolidated the *sl.c and *rr.c level-3 macrokernels into a single
  file per sl/rr pair, with those files named as they were before
  c92762e. The consolidation does not take away the *option* of using
  slab or round-robin assignment of micropanels to threads; it merely
  *hides* the choice within the definitions of functions such as
  bli_thread_range_jrir(), bli_packm_my_iter(), and bli_is_last_iter()
  rather than expose that choice explicitly in the code. The choice of
  slab or rr is not always hidden, however; there are some cases
  involving herk and trmm, for example, that require some part of the
  computation to use rr unconditionally. (The --thread-part-jrir option
  controls the partitioning in all other cases.)
- Note: Originally, the sl and rr macrokernels were separated out for
  clarity. However, aside from the additional binary code bloat, I later
  deemed that clarity not worth the price of maintaining the additional
  (mostly similar) codes.
2018-10-17 14:11:01 -05:00
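The difference between slab and round-robin (rr) assignment of iterations to threads can be sketched as follows. The function names are hypothetical and the way leftover iterations are distributed in the slab case is an assumption; the commit message does not describe how the BLIS functions it names are implemented.

```c
/* Slab: thread `tid` of `nt` gets one contiguous range [start, end)
   of the n_iter iterations. Leftover iterations go to the
   lowest-numbered threads (an assumption for this sketch). */
void ex_range_slab(int tid, int nt, int n_iter, int* start, int* end)
{
    int base = n_iter / nt;
    int rem  = n_iter % nt;

    *start = tid * base + (tid < rem ? tid : rem);
    *end   = *start + base + (tid < rem ? 1 : 0);
}

/* Round-robin: thread `tid` owns iterations tid, tid + nt,
   tid + 2*nt, and so on. Returns nonzero if `tid` owns iteration i. */
int ex_owns_iter_rr(int tid, int nt, int i)
{
    return i % nt == tid;
}
```

With 10 iterations and 4 threads, slab gives thread 1 the range [3, 6), while round-robin gives it iterations 1, 5, and 9. Hiding this choice behind helper functions lets a single loop body serve both schemes.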
Field G. Van Zee
57eab3a4f0 CREDITS file update. 2018-10-17 11:29:20 -05:00