1444 Commits

Author SHA1 Message Date
Field G. Van Zee
57eab3a4f0 CREDITS file update. 2018-10-17 11:29:20 -05:00
Ye Luo
6722ec2181 Fix bgclang compilation on BGQ (#270)
* Fix bgq kernels

* Support bgq with bgclang
2018-10-17 11:26:00 -05:00
Field G. Van Zee
b9c61d03f5 Merge branch 'nested-omp-patch' 2018-10-16 14:39:57 -05:00
Field G. Van Zee
5a1e461ffe Execute flatten-headers.py via $(PYTHON).
Details:
- Execute build/flatten-headers.py python script via $(PYTHON) in
  common.mk. This allows distributions that define the current/preferred
  python interpreter in the PYTHON environment variable to use that
  interpreter when executing flatten-headers.py. Thanks to Isuru
  Fernando for this suggestion, and to Dave Love for submitting the
  initial issue/request.
2018-10-16 14:21:45 -05:00
Field G. Van Zee
ed65771482 Fixed merge fail on testsuite threading macros.
Details:
- Applied the following C preprocessor macro renames

    BLIS_DEFAULT_MR_THREAD_MAX  -> BLIS_THREAD_MAX_IR
    BLIS_DEFAULT_NR_THREAD_MAX  -> BLIS_THREAD_MAX_JR
    BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
    BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N

  in src/test_libblis.c. This is apparently the result of a failure by
  git to properly merge the 'master' and 'amd' branches in the previous
  commit. (The 'master' branch contained a commit, 53a9ab1, in which
  these same cpp macros were renamed throughout the source distribution.)
2018-10-15 17:54:45 -05:00
Field G. Van Zee
dc5fd898af Merge branch 'amd' 2018-10-15 17:41:35 -05:00
Field G. Van Zee
3612ecac98 Added comments to nested OpenMP handling code.
Details:
- Added comments to bli_thrcomm_openmp.c relating to changes made in
  6ac0c80 and 1064d79.
2018-10-11 15:16:41 -05:00
Field G. Van Zee
667d3929ee Added Fortran APIs for some thread functions.
Details:
- Defined Fortran-77 compatible APIs for bli_thread_set_num_threads()
  and bli_thread_set_ways(). These wrappers are defined in
  frame/compat/blis/thread/b77_thread.c. Thanks to Kay Dewhurst for
  suggesting these new interfaces.
- Added missing prototype for bli_thread_set_ways() in bli_thread.h and
  removed prototypes for non-existent functions bli_thread_set_*_nt().
- CREDITS file update.
2018-10-11 11:47:57 -05:00
Devin Matthews
1064d79711 Adjust rntm_t struct as well. 2018-10-11 11:14:25 -05:00
Devin Matthews
6ac0c80560 Fix OMP nesting problem.
Detect when OpenMP uses fewer threads than requested and correct accordingly, so that we don't wait forever for nonexistent threads. Fixes #267.
2018-10-11 10:45:07 -05:00
Field G. Van Zee
53a9ab1c85 Renamed thread auto-factorization macro constants.
Details:
- Renamed the following C preprocessor macros whose fallback/default
  values are specified within frame/include/bli_kernel_macro_defs.h:

    BLIS_DEFAULT_MR_THREAD_MAX  -> BLIS_THREAD_MAX_IR
    BLIS_DEFAULT_NR_THREAD_MAX  -> BLIS_THREAD_MAX_JR
    BLIS_DEFAULT_M_THREAD_RATIO -> BLIS_THREAD_RATIO_M
    BLIS_DEFAULT_N_THREAD_RATIO -> BLIS_THREAD_RATIO_N

- Renamed the above cpp macro overrides within the knl, skx, and zen
  sub-configurations, as well as invocations of those macros in
  bli_rntm.c.
- Moved config/zen/bli_kernel.h to an 'old' directory as it is no longer
  used by any code within BLIS.
2018-10-10 15:11:09 -05:00
Field G. Van Zee
637c2ce794 Updated column index range for irun.py -q.
Details:
- Forgot to apply the column index range fix in 10f179f to situations
  when "quiet" mode (-q) is requested. This commit applies the new
  column index range modifications to the quiet case.
2018-10-09 17:18:04 -05:00
Field G. Van Zee
e2a59400bd Allow trsm_l parallelism in the jc loop.
Details:
- Previously, trsm was consolidating all ways of parallelism into the jr
  loop. This was unnecessary and to some degree detrimental on some
  types of hardware. Now, any parallelism bound for the jc loop will be
  applied to the jc loop, while all other loops' parallelism is funneled
  to the jr loop. Thanks to Devangi Parikh for helping investigate this
  issue and suggesting the fix.
- NOTE: This change affects only left-side trsm. However,
  right-side trsm is currently implemented in terms of the left-side
  case, and thus the change effectively applies to both left and right
  cases.
2018-10-09 15:29:48 -05:00
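The redistribution described in the commit above can be sketched as follows. This is a minimal illustration, not BLIS code: the function names, the dict representation, and the loop list (jc, pc, ic, jr, ir) are assumptions, and "funneled" is read here as multiplying the remaining ways of parallelism into the jr loop.

```python
from math import prod

# Hypothetical loop names for this sketch, ordered from outermost to innermost.
LOOPS = ("jc", "pc", "ic", "jr", "ir")


def trsm_ways_old(requested):
    """Old behavior as described: all ways of parallelism go to the jr loop."""
    ways = {loop: 1 for loop in LOOPS}
    ways["jr"] = prod(requested.get(loop, 1) for loop in LOOPS)
    return ways


def trsm_ways_new(requested):
    """New behavior as described: jc keeps its ways; the rest go to jr."""
    ways = {loop: 1 for loop in LOOPS}
    ways["jc"] = requested.get("jc", 1)
    ways["jr"] = prod(requested.get(loop, 1) for loop in LOOPS if loop != "jc")
    return ways


if __name__ == "__main__":
    req = {"jc": 2, "ic": 4, "jr": 3}
    print(trsm_ways_old(req))
    print(trsm_ways_new(req))
```

In both versions the total number of ways (the product over all loops) is unchanged; only its placement differs.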
Field G. Van Zee
f1dba506c9 Output threading status/params from testsuite.
Details:
- Updated testsuite to output various parameters related to parallelism
  in BLIS. These parameters include:
  - threading status: disabled, openmp, or pthreads;
  - thread partitioning for jr/ir loops: slab or rr (round-robin);
  - ways of parallelism from environment variables, and also actual
    values used by gemm, herk, trmm_l, trmm_r, trsm_l, and trsm_r for
    square problems (assuming all dimensions are set to 1000);
  - automatic thread factorization parameters.
- Also output the status of two relatively new configure-time options:
  libmemkind and the sandbox.
2018-10-08 17:59:41 -05:00
Field G. Van Zee
10f179fb13 Updated irun.py to use updated column index range.
Details:
- Updated the irun.py script so that it updates the matlab column index
  range (if found) to reflect the additional columns of data that are
  substituted in. Thanks to Devangi Parikh for recognizing and reporting
  this issue.
2018-10-08 14:36:38 -05:00
Field G. Van Zee
c244a716c9 Added missing -r option to configure --help output.
Details:
- Added inadvertently-omitted mention of -r option-equivalent to
  --thread-part-jrir to the output for 'configure --help'. Also made
  minor edits to the same text.
2018-10-07 20:59:40 -05:00
Field G. Van Zee
c92762ecdc Added option of slab or rr partitioning in jr/ir.
Details:
- Updated existing macrokernel function names and definitions to
  explicitly use slab assignment of micropanels to threads, then created
  duplicate versions of macrokernels that explicitly use round-robin
  assignment instead of slab. NOTE: As in ac18949, trsm_r macrokernels
  were not substantially updated in this commit because they are
  currently disabled in bli_trsm_front.c.
- Updated existing packing function (in bli_packm_blk_var1.c) to
  explicitly use slab partitioning, and then duplicated for round-robin.
- Updated control tree initialization to use the appropriate macrokernel
  and packm function pointers depending on which method (slab or rr) was
  enabled at configure-time.
- Updated configure script to accept new --thread-part-jrir=[slab|rr]
  option (-r [slab|rr] for short), which allows the user to explicitly
  request either slab or round-robin assignment (partitioning) of
  micropanels to threads.
- Updated sandbox/ref99 according to above changes.
- Minor updates to build/add-copyright.py.
2018-10-07 20:30:32 -05:00
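To make the difference between the two partitioning methods concrete, here is a minimal sketch of how loop iterations (micropanels) might be assigned to threads under each scheme. It is an illustration only: the function names are hypothetical, and the way leftover iterations are spread across threads in the slab case is an assumption of this sketch, not a description of the BLIS macrokernels.

```python
def slab_range(n_iter, n_threads, tid):
    """Contiguous ("slab") assignment: thread tid gets one block of iterations."""
    assert 0 <= tid < n_threads
    base, extra = divmod(n_iter, n_threads)
    # Assumption of this sketch: the first `extra` threads each take one
    # leftover iteration so that block sizes differ by at most one.
    start = tid * base + min(tid, extra)
    end = start + base + (1 if tid < extra else 0)
    return range(start, end)


def rr_range(n_iter, n_threads, tid):
    """Interleaved (round-robin) assignment: every n_threads-th iteration."""
    assert 0 <= tid < n_threads
    return range(tid, n_iter, n_threads)


if __name__ == "__main__":
    for tid in range(3):
        print(tid, list(slab_range(10, 3, tid)), list(rr_range(10, 3, tid)))
```

Both schemes cover every iteration exactly once; slab gives each thread neighboring micropanels, while round-robin interleaves them across threads.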
Field G. Van Zee
98e01ea04b Merge branch 'master' into amd 2018-10-04 20:44:12 -05:00
Field G. Van Zee
541b8a3b3e Removed 1h short-circuit from bli_clock_min_diff().
Details:
- Removed a guard from bli_clock_min_diff() that would return 0 if the
  time delta was greater than 60 minutes. This was originally intended
  to disregard extremely large values under the assumption that the
  user probably didn't intend to run a test that long. However, since
  it is in bli_clock_min_diff(), it doesn't actually help short-circuit
  an implementation that is hanging or looping infinitely, since such
  an implementation would first have to finish before the
  bli_clock_min_diff() is called. Thanks to Kiran Varaganti for
  reporting this issue.
2018-10-04 20:39:06 -05:00
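The reasoning in the commit above can be illustrated with a small timing helper. This is a hedged sketch, not the BLIS implementation: the name clock_min_diff, the injectable `now` clock, and the rejection of non-positive deltas are assumptions made for illustration.

```python
import time


def clock_min_diff(time_min, time_start, now=time.perf_counter):
    """Return the smaller of time_min and the time elapsed since time_start."""
    time_diff = now() - time_start
    # Assumption of this sketch: a non-positive delta is treated as a bad
    # clock reading and ignored.
    if time_diff <= 0.0:
        return time_min
    # No upper bound is applied: a very long run is still a valid sample.
    return min(time_min, time_diff)


if __name__ == "__main__":
    best = float("inf")
    for _ in range(3):
        t0 = time.perf_counter()
        sum(range(100000))  # stand-in for the operation being timed
        # This call happens only after the timed operation has returned, so a
        # guard inside clock_min_diff() could never interrupt a hanging run.
        best = clock_min_diff(best, t0)
    print(best)
```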
Devangi N. Parikh
8bf30eb473 Fixed runme.sh in test/studies/thunderx2
Details:
- Fixed the setting of threads for a single core run.
2018-10-03 22:22:29 -04:00
Devangi N. Parikh
f6f2456ba2 Fixed the Makefile in test/studies/thunderx2
Details:
- Fixed target for make-all-st and make-all-mt so that the armpl
  targets are built
2018-10-03 21:43:46 -04:00
Field G. Van Zee
743a1a6dec Fixed misleading version query from gcc 7+.
Details:
- gcc 7 introduced new behavior to the -dumpversion option whereby only
  the major version component is output. However, as part of this
  change, gcc 7 also introduced a new option, -dumpfullversion, which is
  guaranteed to always output the major, minor, and revision numbers. If
  we are using gcc 7 or later, we re-query the version string with this
  new option and then re-parse the result so as to avoid misleading
  output from configure (e.g. gcc 7.3.0 being reported as 7.7.7).
2018-10-03 14:40:10 -05:00
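The re-query logic described above might look roughly like the following. This is a sketch in Python rather than the shell code used by configure; the function name and the `query_full_version` callable (a hypothetical hook standing in for running `gcc -dumpfullversion`) are assumptions made so that the example runs without gcc installed.

```python
def resolve_gcc_version(dumpversion_out, query_full_version):
    """Return (major, minor, revision) as strings.

    dumpversion_out:    text printed by `gcc -dumpversion`
    query_full_version: zero-argument callable returning the text printed by
                        `gcc -dumpfullversion` (hypothetical hook)
    """
    version = dumpversion_out.strip()
    major = int(version.split(".")[0])
    if major >= 7:
        # gcc 7+ may print only the major component, so query again with the
        # option that always reports major, minor, and revision.
        version = query_full_version().strip()
    parts = version.split(".")
    # Pad defensively in case fewer than three components are present.
    parts += ["0"] * (3 - len(parts))
    return tuple(parts[:3])


if __name__ == "__main__":
    print(resolve_gcc_version("7\n", lambda: "7.3.0\n"))
```

For compilers older than gcc 7 the callable is never invoked, which matters because `-dumpfullversion` was introduced in gcc 7.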
Field G. Van Zee
de07840ba5 Whitespace, https updates to README.md.
Details:
- Reformatted to fit all lines within 80 columns, unless a link is too
  long to fit on a single line.
- Changed some links from http to https.
2018-10-03 13:57:25 -05:00
Field G. Van Zee
9d5f1c4f3b Patch to avoid gcc warning in blastest/f2c/open.c.
Details:
- Use the modulo operator to limit the size of an integer that is given
  to sprintf(). This avoids a warning in some versions of gcc about the
  integer potentially overflowing the available space in the string into
  which the integer is being printed.
2018-10-01 17:39:26 -05:00
Field G. Van Zee
0c3cd00ba7 More README.md updates.
Details:
- Replaced much of "Getting Started" section with a shortened version of
  the bullet list of documentation currently shown in the github wiki
  page. Thanks to Devangi Parikh for her feedback on this change.
2018-10-01 16:18:25 -05:00
Field G. Van Zee
8eaf34bd23 Very minor README.md update. 2018-10-01 14:29:07 -05:00
Field G. Van Zee
599090e0eb README.md update.
Details:
- Added language mentioning SHPC group to Introduction.
2018-10-01 14:04:30 -05:00
Field G. Van Zee
ac18949a4b Multithreading optimizations for l3 macrokernels.
Details:
- Adjusted the method by which micropanels are assigned to threads in
  the 2nd (jr) and 1st (ir) loops around the microkernel to (mostly)
  employ contiguous "slab" partitioning rather than interleaved (round
  robin) partitioning. The new partitioning schemes and related details
  for specific families of operations are listed below:
  - gemm: slab partitioning.
  - herk: slab partitioning for region corresponding to non-triangular
          region of C; round robin partitioning for triangular region.
  - trmm: slab partitioning for region corresponding to non-triangular
          region of B; round robin partitioning for triangular region.
          (NOTE: This affects both left- and right-side macrokernels:
          trmm_ll, trmm_lu, trmm_rl, trmm_ru.)
  - trsm: slab partitioning.
          (NOTE: This affects only left-side macrokernels trsm_ll,
          trsm_lu; right-side macrokernels were not touched.)
  Also note that the previous macrokernels were preserved inside of
  the 'other' directory of each operation family directory (e.g.
  frame/3/gemm/other, frame/3/herk/other, etc).
- Updated gemm macrokernel in sandbox/ref99 in light of above changes
  and fixed a stale function pointer type in blx_gemm_int.c
  (gemm_voft -> gemm_var_oft).
- Added standalone test drivers in test/3m4m for herk, trmm, and trsm
  and minor changes to test/3m4m/Makefile.
- Updated the arguments and definitions of bli_*_get_next_[ab]_upanel()
  and bli_trmm_?_?r_my_iter() macros defined in bli_l3_thrinfo.h.
- Renamed bli_thread_get_range*() APIs to bli_thread_range*().
2018-09-30 18:54:56 -05:00
Field G. Van Zee
b952ca8feb CREDITS file update. 2018-09-28 16:12:32 -05:00
Field G. Van Zee
7d96fc437e Allow slashes ('/') in version tags.
Details:
- Updated the configure script to allow slashes in version string. This
  is needed so that downstream maintainers (such as those for Debian)
  can create local tags such as "upstream/0.4.1". Thanks to M. Zhou for
  reporting this issue via PR #256 and providing me the information
  needed to debug the problem.
2018-09-28 15:40:45 -05:00
Field G. Van Zee
5fdddf6f37 Removed 'debian' directory.
Details:
- Removed the top-level 'debian' directory. This directory is apparently
  no longer needed (issue #257). Thanks to M. Zhou and Nico Schlömer for
  their contributions.
2018-09-28 11:25:54 -05:00
Field G. Van Zee
60b2650d74 Added statistics-collecting irun.py script.
Details:
- Added irun.py script to 'build' directory. This irun.py script is a
  python script for repeatedly invoking a test driver executable, such
  as those found in test/3m4m, and replacing the performance output column
  with four columns that aggregate statistics. Specifically, the script
  reports the minimum, average, maximum, and standard deviation for each
  problem size. This script is useful especially (though not
  exclusively) when trying to determine the impact of relatively minor
  changes to the code, or other small optimizations that may be
  difficult to distinguish from "noise." One way this "noise" manifests
  is that a test executable may run slightly slower or faster for all
  problem sizes (and all implementations) tested by the executable over
  the life of a single execution. The cause of these minor
  across-the-board perturbations in the overall performance signatures is
  unknown, though we hypothesize that it may relate to any number of
  issues such as operating system scheduling, where in memory the
  program is loaded, or how the CPU clock frequency is throttled at the
  time of execution. Regardless of the source of these subtle
  performance anomalies, the statistical properties reported by the
  irun.py script help the user to more precisely characterize the
  underlying performance exhibited by any given test driver, which
  allows him or her to make better judgments about the true difference
  in performance between two implementations, or minor changes within a
  single implementation.
2018-09-24 15:04:45 -05:00
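The kind of aggregation described above can be sketched as follows. This is not the irun.py source: the function name, the input layout (one list of (problem size, performance) pairs per run), and the use of the population standard deviation are assumptions made for illustration.

```python
import statistics


def aggregate_runs(runs):
    """Aggregate repeated runs of a test driver.

    runs: list of runs, each a list of (problem_size, performance) pairs.
    Returns {problem_size: (minimum, average, maximum, std_deviation)}.
    """
    by_size = {}
    for run in runs:
        for size, perf in run:
            by_size.setdefault(size, []).append(perf)
    return {
        size: (
            min(vals),
            statistics.mean(vals),
            max(vals),
            # Assumption: population standard deviation, which is also
            # defined when only a single run is available.
            statistics.pstdev(vals),
        )
        for size, vals in by_size.items()
    }


if __name__ == "__main__":
    runs = [[(100, 10.0), (200, 20.0)], [(100, 14.0), (200, 20.0)]]
    for size, stats in aggregate_runs(runs).items():
        print(size, stats)
```

A large standard deviation relative to the gap between two implementations' averages suggests the difference may be noise rather than a real change in performance.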
Field G. Van Zee
807a654888 Fixed confusing configure message for libmemkind.
Details:
- Corrected feedback echoed to user by configure when libmemkind is
  found but not explicitly requested. In these cases, configure would
  echo a message that it had received an explicit request to enable
  libmemkind, which was not accurate, even if the end result was the
  same--that libmemkind is enabled by default when it is found. Thanks
  to Devangi Parikh for reporting this issue.
2018-09-20 15:41:05 -05:00
Devangi N. Parikh
02adab427c Created a 'thunderx2' subdirectory within test/studies
Details:
- Created a 'thunderx2' subdirectory within test/studies to house
  various level-3 test drivers used to measure performance on
  ThunderX2.
2018-09-20 14:38:50 -04:00
Field G. Van Zee
d7537fb51d Merge branch 'dev' 2018-09-12 15:24:20 -05:00
Devangi N. Parikh
dad07245db Fixed yet another bug in runme script in test/studies
Details:
- Fixed another copy-paste bug
2018-09-12 04:16:58 -05:00
Devangi N. Parikh
e669057fe3 Fixed bug in runme script in test/studies
Details:
- Fixed bug in runme script for skx studies that set the number of
  threads incorrectly
2018-09-11 22:29:42 -05:00
Devangi N. Parikh
232fdc3df3 Updated runme script in test/studies.
Details:
- Updated runme script for skx studies to run multithreading tests
  on 1 and 2 sockets.
2018-09-10 18:45:50 -05:00
Field G. Van Zee
c03728f1f4 Various minor cleanups.
Details:
- Rewrote bli_winsys.c to define bli_setenv() and bli_sleep()
  unconditionally (though differently for Windows and non-Windows), but
  then disabled the definition of bli_setenv() entirely since BLIS
  no longer needs to set environment variables. Updated bli_winsys.h
  accordingly, and call bli_sleep() from within testsuite instead of
  sleep() directly.
- Use
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS != 200809L)
  instead of
    #if !defined(_POSIX_BARRIERS) || (_POSIX_BARRIERS < 0)
  when guarding against local definition of pthread barrier in
  testsuite. (The description for unistd.h implies that _POSIX_BARRIERS
  should always be set to 200809L when barriers are supported, though I
  won't be surprised if we encounter a case in the future where it is
  set to something else such as 1 while still supported.)
- Removed old _VERS_CONF_INST definitions and installation rules in
  top-level Makefile. These are no longer needed because we no longer
  output libraries with the version and configuration name as
  substrings.
- Comment/whitespace updates in Makefile, config.mk.in, common.mk,
  configure, bli_extern_defs.h, and test_libblis.h.
- Added mention of 1m to README.md and other trivial tweaks.
2018-09-10 17:54:27 -05:00
Field G. Van Zee
e249a00a82 Imported skx dgemm ukernel from skx-redux branch.
Details:
- Added the new bli_dgemm_skx_asm_16x14.c microkernel from the skx-redux
  branch, along with appropriate blocksizes in bli_cntx_init_skx.c and
  a prototype in bli_kernels_skx.h. (Devin has not yet written the
  sgemm analogue, so for now we will continue using the older sgemm
  ukernel.)
- Updated frame/include/bli_x86_asm_macros.h with a minor change that
  was present within the skx-redux branch.
2018-09-10 16:48:35 -05:00
Isuru Fernando
e93b01ff60 Windows DLL support (#246)
* Enable shared

* Enable rdp

* Add support for dll

* Use libblis-symbols.def

* Fix building dlls

* Fix libblis-symbols.def

* Fix soname

* Fix Makefile error

* Fix install target

* Fix missing symbols

* Add BLIS_MINUS_TWO

* Add path to dll

* Fix OSX soname

* Add declspec for dll

* Add -DBLIS_BUILD_DLL

* Replace @enable_shared@ in config

* switch to auto for now

* blis_ -> bli_

* Remove BLIS_BUILD_DLL in make check

* change auto->haswell

* enable_shared_01

* Add wno-macro-redefined

* print out.cblat3

* BLIS_BUILD_DLL -> BLIS_IS_BUILDING_LIBRARY

* Use V=1

* Remove fpic for windows

* Remember LIBPTHREAD

* Remove libm for windows

* Remember AR

* Fix remembering libpthread

* Add Wno-maybe-uninitialized in only gcc

* Don't do blastest for shared for now

* Fix install target

And remove unnecessary change

* test auto and x86_64

* Fix install target again

* Use IS_WIN variable

* Remove leading dot from LIBBLIS_SO_MAJ_EXT

* Make is_win yes/no

* Add comments for windows builds

* Change if else blocks location
2018-09-09 15:57:43 -05:00
Field G. Van Zee
1330d5c4bc Employ "user" cflags for tl Makefile test targets.
Details:
- Use get-user-cflags-for() to generate cflags when compiling BLAS test
  drivers and BLIS testsuite from top-level Makefile. Meant to include
  these changes in previous commit (4b5437e). Thanks to Isuru Fernando
  for pointing out this oversight.
2018-09-07 19:37:59 -05:00
Field G. Van Zee
4b5437ec7a Define a cpp macro specific to BLIS compilation.
Details:
- Tweaked the cflags functions in common.mk so that a new preprocessor
  macro, BLIS_IS_BUILDING_LIBRARY, is defined, but only when BLIS
  itself is being built. This macro will not be defined when, for
  example, the testsuite or example code compiles code local to those
  applications. This was done in part by defining a new cflags function
  get-user-cflags-for(), which is now the designated function for
  application Makefiles if they wish to inherit a basic set of CFLAGS
  from BLIS. (The compiler flags returned are identical to that of
  get-frame-cflags-for() except that -DBLIS_IS_BUILDING_LIBRARY is
  omitted.)
- Updated all test driver-like makefiles to call get-user-cflags-for()
  instead of get-frame-cflags-for().
2018-09-07 17:24:32 -05:00
Field G. Van Zee
cc2cca4f56 Merge branch 'dev' 2018-09-06 17:12:13 -05:00
Jeff Hammond
e19e721287 Merge pull request #244 from kali/pthread-barrier-osx
add an adhoc impl for pthread_barrier
2018-09-06 14:58:49 -07:00
Jeff Hammond
b3d0702cf2 Merge branch 'master' into pthread-barrier-osx 2018-09-06 14:58:23 -07:00
Mathieu Poumeyrol
4e7d06700f second __APPLE__ 2018-09-06 23:48:31 +02:00
Field G. Van Zee
fb81c7fc66 Defined cortexa53 sub-configuration.
Details:
- Added a new sub-configuration 'cortexa53', which is a mirror image
  of cortexa57 except that it will use slightly different compiler
  flags. Thanks to Mathieu Poumeyrol for making this suggestion after
  discovering that the compiler flags being used by cortexa57 were
  not working properly in certain OS X environments (the fix to which
  is currently pending in pull request #245).
2018-09-06 16:29:39 -05:00
Mathieu Poumeyrol
24ecc0d94a use _POSIX_BARRIERS instead of __APPLE__ 2018-09-06 22:10:16 +02:00
Mathieu Poumeyrol
97965b0905 cortexa9 and cortexa53 travis build + qemu test (#245) 2018-09-06 14:10:29 -05:00