Commit Graph

115 Commits

Author SHA1 Message Date
Devrajegowda, Kiran
85fa9e4107 resolved merge conflicts when merged with public repo master branch
Change-Id: Iad6ba809680ba5081cc9d7879794ef58cc8f8a40
2019-11-25 14:46:48 +05:30
Field G. Van Zee
8f399c8940 Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
2019-11-12 15:32:57 -06:00
Field G. Van Zee
a617301f93 Updates to docs/CodingConventions.md. 2019-10-08 17:14:05 -05:00
Field G. Van Zee
171f100691 Merge remote-tracking branch 'loveshack/emacs' 2019-10-04 11:18:23 -05:00
Field G. Van Zee
702486b125 Removed stray FAQ section introduced in 1907000. 2019-10-02 16:35:41 -05:00
Field G. Van Zee
1907000ad6 Updated to FAQ (AMD-related questions).
Details:
- Added a couple potential frequently-asked questions/answers releated
  to AMD's fork of BLIS.
- Updated existing answers to other questions.
2019-10-02 16:31:54 -05:00
Field G. Van Zee
834f30a0da Mention mixeddt paper in docs/MixedDatatypes.md. 2019-10-02 12:45:56 -05:00
Dave Love
05d58edfe0 Note .dir-locals.el in docs 2019-10-02 10:45:50 +01:00
ShmuelLevine
6c8f2d1486 Fix description for function bli_*pxby2v (#340)
Fix typo in BLISTypedAPI.md for bli_?axpy2v() description.
2019-09-17 15:43:46 -05:00
Field G. Van Zee
b5679c1520 Inserted Multithreading links into BuildSystem.md.
Details:
- Inserted brief disclaimers about default disabled multithreading
  and default single-threadedness to BuildSystem.md along with links to
  the Multithreading.md document. Thanks to Jeff Diamond for suggesting
  these additions.
- Trivial reword of sentence regarding automatically-detected
  architectures.
2019-09-17 14:00:37 -05:00
Field G. Van Zee
80e6c10b72 Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
  PerformanceSmall.md that briefly nudges the motivated reader in the
  right direction if he/she wishes to run the same performance
  benchmarks used to produce the graphs shown in those documents.
  Thanks to Dave Love for making this suggestion.
2019-08-29 12:12:08 -05:00
Field G. Van Zee
14cb426414 Updated OpenBLAS, Eigen sup results.
Details:
- Updated the results shown in docs/PerformanceSmall.md for OpenBLAS and
  Eigen.
2019-08-28 17:04:33 -05:00
Field G. Van Zee
d5a05a15a7 Cropped whitespace from new sup graphs.
Details:
- Previously forgot crop whitespace from the new .png graphs
  added/updated in docs/graphs/sup.
2019-08-26 16:54:31 -05:00
Field G. Van Zee
a6c80171a3 Fixed contents links in docs/PerformanceSmall.md.
Details:
- Corrected links in contents section of docs/PerformanceSmall.md,
  which were erroneously directing readers to the corresponding
  sections of docs/Performance.md.
2019-08-26 16:51:31 -05:00
Field G. Van Zee
40781774df Updated sup performance graphs with libxsmm.
Details:
- Added libxsmm to column-stored sup graphs presented in
  docs/PerformanceSmall.md.
- Updated sup results for BLASFEO.
- Added sup results for Lonestar5 (Haswell).
- Addresses issue #326.
2019-08-26 16:47:37 -05:00
Field G. Van Zee
66c43ca427 Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
  using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
  accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
  containing the mpnpkp results do not need to be preprocessed in order
  to plot half the problem size range (ie: up to 400 instead of the
  800 range of the other shape cases).
- Trivial updates to runme.m.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
2bf1ad11a7 Fixed formatting/typo in docs/PerformanceSmall.md. 2019-08-23 14:18:09 +05:30
Field G. Van Zee
bb4a01f130 Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.
2019-08-23 14:18:09 +05:30
Field G. Van Zee
2ae8faa252 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
d5903a8393 Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
  Epyc sections.
- Added emphasis to certain passages.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
55e7b045c3 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
defe789b8c Minor rewording of language around mt env. vars. 2019-08-23 14:18:08 +05:30
Field G. Van Zee
73970bf124 Added BLIS theading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
184ba1b3d5 GNU-like handling of installation prefix et al.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
  includedir, and sharedir (and also added an --exec-prefix option).
  The defaults to these variables are set as follows:
    prefix:      /usr/local
    exec_prefix: ${prefix}
    libdir:      ${exec_prefix}/lib
    includedir:  ${prefix}/include
    sharedir:    ${prefix}/share
  The key change, aside from the addition of exec_prefix and its use to
  define the default to libdir, is that the variables are substituted
  into config.mk with quoting that delays evaluation, meaning the
  substituted values may contain unevaluated references to other
  variables (namely, ${prefix} and ${exec_prefix}). This more closely
  follows GNU conventions, including those used by GNU autoconf, and
  also allows make to override any one of the variables *after*
  configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
  prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
  top-level Makefile to use $(wildcard ...) instead of 'find'. This
  was motivated by the new way of handling prefix and friends, which
  leads to the 'find' command being run on /usr/local (by default),
  which can take a while almost never yielding any benefit (since the
  user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
  non-verbose output) since those statements often end with file or
  directory paths, which get confusing to read when puctuated by a
  period.
- Trival change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
  over the years.)
- In configure script, changed the default state of threading_model
  variable from 'no' to 'off' to match that of debug_type, where there
  are similarly more than two valid states. ('no' is still accepted
  if given via the --enable-debug= option, though it will be
  standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
  32812ff.
- CREDITS file update.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
959d8d906a Minor update to docs/HardwareSupport.md document.
Details:
- Added more details and clarifying language to implications of 1m and
  the recycling of microkernels between microarchitectures.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
22768bf959 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
cb45eb9ae2 Minor text updates (Eigen) to docs/Performance.md.
Details:
- Added/updated a few more details, mostly regarding Eigen.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
3e94a0ffd2 Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  however, of those operations it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
a9270bef26 Allow disabling of BLAS prototypes at compile-time.
Details:
- Modified bli_blas.h so that:
  - By default, if the BLAS layer is enabled at configure-time, BLAS
    prototypes are also enabled within blis.h;
  - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
    blis.h, BLAS prototypes are skipped over entirely so that, for
    example, the application or some other header pulled in by the
    application may prototype the BLAS functions without causing any
    duplication.
- Updated docs/BuildSystem.md to document the feature above, and
  related text.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
a1c8b11b3f Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
366c4b14c0 More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
  and reworded the sentence after about normalization.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
14bc42f3f7 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
cd81a6a50a Very minor tweaks to Performance.md. 2019-08-23 14:18:08 +05:30
Field G. Van Zee
6385a3ed51 Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
  apparently the result of copy-pasting.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
25db9033d9 Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
c6793be46e Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2019-08-23 14:18:08 +05:30
Field G. Van Zee
3e16033ead Updates to docs/Multithreading.md.
Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
  disabled by default; and (b) even with multithreading enabled, the
  user must specify multithreading at runtime in order to observe
  parallelism. Thanks to M. Zhou for suggesting these clarifications
  in #292.
- Also made explicit that only the environment variable and global
  runtime API methods are available when using the BLAS API. If the
  user wishes to use the local runtime API (specify multithreading on
  a per-call basis), one of the native BLIS APIs must be used.
2019-08-23 14:18:07 +05:30
Field G. Van Zee
2f387e32ef Added Eigen -march=native hack to perf docs.
Details:
- Spell out the hack given to me by Sameer Agarwal in order to get Eigen
  to build with -march=native (which is critically important for Eigen)
  in docs/Performance.md and docs/PerformanceSmall.md.
2019-08-22 14:27:30 -05:00
Field G. Van Zee
e6ac4ebcb6 Added page size, source location to perf docs.
Details:
- Added the page size, as returned via 'getconf -a | grep PAGE_SIZE',
  and the location of the performance drivers to docs/Performance.md
  (test/3) and docs/PerformanceSmall.md (test/sup). Thanks to Dave
  Love for suggesting these additions in #325.
2019-08-20 13:49:47 -05:00
Field G. Van Zee
c152109e9a Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
  using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
  accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
  containing the mpnpkp results do not need to be preprocessed in order
  to plot half the problem size range (ie: up to 400 instead of the
  800 range of the other shape cases).
- Trivial updates to runme.m.
2019-06-19 13:23:24 -05:00
Field G. Van Zee
ce671917b2 Fixed formatting/typo in docs/PerformanceSmall.md. 2019-06-06 14:17:21 -05:00
Field G. Van Zee
cbaa22e1ca Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.
2019-06-04 16:06:58 -05:00
Field G. Van Zee
0f1b3bf49e ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
2019-06-03 18:35:19 -05:00
Field G. Van Zee
27da2e8400 Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
  Epyc sections.
- Added emphasis to certain passages.
2019-06-03 17:14:56 -05:00
Field G. Van Zee
09ba05c6f8 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-06-03 16:53:19 -05:00
Field G. Van Zee
755730608d Minor rewording of language around mt env. vars. 2019-05-23 17:34:36 -05:00
Field G. Van Zee
ba31abe73c Added BLIS theading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-05-23 14:59:53 -05:00
Field G. Van Zee
89a70cccf8 GNU-like handling of installation prefix et al.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
  includedir, and sharedir (and also added an --exec-prefix option).
  The defaults to these variables are set as follows:
    prefix:      /usr/local
    exec_prefix: ${prefix}
    libdir:      ${exec_prefix}/lib
    includedir:  ${prefix}/include
    sharedir:    ${prefix}/share
  The key change, aside from the addition of exec_prefix and its use to
  define the default to libdir, is that the variables are substituted
  into config.mk with quoting that delays evaluation, meaning the
  substituted values may contain unevaluated references to other
  variables (namely, ${prefix} and ${exec_prefix}). This more closely
  follows GNU conventions, including those used by GNU autoconf, and
  also allows make to override any one of the variables *after*
  configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
  prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
  top-level Makefile to use $(wildcard ...) instead of 'find'. This
  was motivated by the new way of handling prefix and friends, which
  leads to the 'find' command being run on /usr/local (by default),
  which can take a while almost never yielding any benefit (since the
  user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
  non-verbose output) since those statements often end with file or
  directory paths, which get confusing to read when puctuated by a
  period.
- Trival change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
  over the years.)
- In configure script, changed the default state of threading_model
  variable from 'no' to 'off' to match that of debug_type, where there
  are similarly more than two valid states. ('no' is still accepted
  if given via the --enable-debug= option, though it will be
  standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
  32812ff.
- CREDITS file update.
2019-04-11 18:33:08 -05:00
Field G. Van Zee
bec90e0b6a Minor update to docs/HardwareSupport.md document.
Details:
- Added more details and clarifying language to implications of 1m and
  the recycling of microkernels between microarchitectures.
2019-04-02 17:45:13 -05:00
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00