Commit Graph

105 Commits

Author SHA1 Message Date
Field G. Van Zee
ceb9b95a96 Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-06-18 17:15:25 -05:00
Isuru Fernando
31af73c11a Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-06-18 13:35:54 -05:00
Isuru Fernando
35e38fb693 FIx typo in FAQ 2020-06-16 09:08:31 -07:00
Isuru Fernando
943a21def0 Add build instructions for Windows (#404) 2020-05-21 14:09:21 -05:00
Field G. Van Zee
fbef422f0d Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-05-21 10:30:41 -05:00
Field G. Van Zee
c53b5153be Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
  (and perhaps completely?) due to some substitution commands used at
  the end of configure that include '\n' characters that are not
  properly interpreted by the version of sed included on some versions
  of OS X. This new documentation addresses issue #398.
2020-05-05 12:39:12 -05:00
Yingbo Ma
4d87eb24e8 Update KernelsHowTo.md (#395) 2020-04-27 16:02:47 -05:00
Field G. Van Zee
8bde63ffd7 Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
  bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
  van de Geijn for reporting this omission.
2020-04-18 12:50:12 -05:00
Field G. Van Zee
b04de636c1 ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
2020-04-07 14:37:43 -05:00
Field G. Van Zee
0f9e0399e1 Updated sup performance graphs; added mt results.
Details:
- Reran all existing single-threaded performance experiments comparing
  BLIS sup to other implementations (including the conventional code
  path within BLIS), using the latest versions (where appropriate).
- Added multithreaded results for the three existing hardware types
  showcased in docs/PerformanceSmall.md: Kaby Lake, Haswell, and Epyc
  (Zen1).
- Various minor updates to the text in docs/PerformanceSmall.md.
- Updates to the octave scripts in test/sup/octave, test/supmt/octave.
2020-03-05 17:03:21 -06:00
Field G. Van Zee
c0558fde45 Support multithreading within the sup framework.
Details:
- Added multithreading support to the sup framework (via either OpenMP
  or pthreads). Both variants 1n and 2m now have the appropriate
  threading infrastructure, including data partitioning logic, to
  parallelize computation. This support handles all four combinations
  of packing on matrices A and B (neither, A only, B only, or both).
  This implementation tries to be a little smarter when automatic
  threading is requested (e.g. via BLIS_NUM_THREADS) in that it will
  recalculate the factorization in units of micropanels (rather than
  using the raw dimensions) in bli_l3_sup_int.c, when the final
  problem shape is known and after threads have already been spawned.
- Implemented bli_?packm_sup_var2(), which packs to conventional row-
  or column-stored matrices. (This is used for the rrc and crc storage
  cases.) Previously, copym was used, but that would no longer suffice
  because it could not be parallelized.
- Minor reorganization of packing-related sup functions. Specifically,
  bli_packm_sup_init_mem_[ab]() are called from within packm_sup_[ab]()
  instead of from the variant functions. This has the effect of making
  the variant functions more readable.
- Added additional bli_thrinfo_set_*() static functions to bli_thrinfo.h
  and inserted usage of these functions within bli_thrinfo_init(), which
  previously was accessing thrinfo_t fields via the -> operator.
- Renamed bli_partition_2x2() to bli_thread_partition_2x2().
- Added an auto_factor field to the rntm_t struct in order to track
  whether automatic thread factorization was originally requested.
- Added new test drivers in test/supmt that perform multithreaded sup
  tests, as well as appropriate octave/matlab scripts to plot the
  resulting output files.
- Added additional language to docs/Multithreading.md to make it clear
  that specifying any BLIS_*_NT variable, even if it is set to 1, will
  be considered manual specification for the purposes of determining
  whether to auto-factorize via BLIS_NUM_THREADS.
- Minor comment updates.
2020-02-17 14:08:08 -06:00
Field G. Van Zee
5db8e710a2 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2020-01-14 15:59:59 -06:00
Dave Love
f391b3e2e7 Fix parsing in vpu_count on workstation SKX (#351)
* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
2020-01-06 14:15:48 -06:00
Field G. Van Zee
8f399c8940 Tweaked/added notes to docs/Multithreading.md.
Details:
- Added language to docs/Multithreading.md cautioning the reader about
  the nuances of setting multithreading parameters via the manual and
  automatic ways simultaneously, and also about how these parameters
  behave when multithreading is disabled at configure-time. These
  changes are an attempt to address the issues that arose in issue #362.
  Thanks to Jérémie du Boisberranger for his feedback on this topic.
- CREDITS file update.
2019-11-12 15:32:57 -06:00
Field G. Van Zee
a617301f93 Updates to docs/CodingConventions.md. 2019-10-08 17:14:05 -05:00
Field G. Van Zee
171f100691 Merge remote-tracking branch 'loveshack/emacs' 2019-10-04 11:18:23 -05:00
Field G. Van Zee
702486b125 Removed stray FAQ section introduced in 1907000. 2019-10-02 16:35:41 -05:00
Field G. Van Zee
1907000ad6 Updated to FAQ (AMD-related questions).
Details:
- Added a couple potential frequently-asked questions/answers releated
  to AMD's fork of BLIS.
- Updated existing answers to other questions.
2019-10-02 16:31:54 -05:00
Field G. Van Zee
834f30a0da Mention mixeddt paper in docs/MixedDatatypes.md. 2019-10-02 12:45:56 -05:00
Dave Love
05d58edfe0 Note .dir-locals.el in docs 2019-10-02 10:45:50 +01:00
ShmuelLevine
6c8f2d1486 Fix description for function bli_*pxby2v (#340)
Fix typo in BLISTypedAPI.md for bli_?axpy2v() description.
2019-09-17 15:43:46 -05:00
Field G. Van Zee
b5679c1520 Inserted Multithreading links into BuildSystem.md.
Details:
- Inserted brief disclaimers about default disabled multithreading
  and default single-threadedness to BuildSystem.md along with links to
  the Multithreading.md document. Thanks to Jeff Diamond for suggesting
  these additions.
- Trivial reword of sentence regarding automatically-detected
  architectures.
2019-09-17 14:00:37 -05:00
Field G. Van Zee
80e6c10b72 Added reproduction section to Performance docs.
Details:
- Added section titled "Reproduction" to both Performance.md and
  PerformanceSmall.md that briefly nudges the motivated reader in the
  right direction if he/she wishes to run the same performance
  benchmarks used to produce the graphs shown in those documents.
  Thanks to Dave Love for making this suggestion.
2019-08-29 12:12:08 -05:00
Field G. Van Zee
14cb426414 Updated OpenBLAS, Eigen sup results.
Details:
- Updated the results shown in docs/PerformanceSmall.md for OpenBLAS and
  Eigen.
2019-08-28 17:04:33 -05:00
Field G. Van Zee
d5a05a15a7 Cropped whitespace from new sup graphs.
Details:
- Previously forgot crop whitespace from the new .png graphs
  added/updated in docs/graphs/sup.
2019-08-26 16:54:31 -05:00
Field G. Van Zee
a6c80171a3 Fixed contents links in docs/PerformanceSmall.md.
Details:
- Corrected links in contents section of docs/PerformanceSmall.md,
  which were erroneously directing readers to the corresponding
  sections of docs/Performance.md.
2019-08-26 16:51:31 -05:00
Field G. Van Zee
40781774df Updated sup performance graphs with libxsmm.
Details:
- Added libxsmm to column-stored sup graphs presented in
  docs/PerformanceSmall.md.
- Updated sup results for BLASFEO.
- Added sup results for Lonestar5 (Haswell).
- Addresses issue #326.
2019-08-26 16:47:37 -05:00
Field G. Van Zee
2f387e32ef Added Eigen -march=native hack to perf docs.
Details:
- Spell out the hack given to me by Sameer Agarwal in order to get Eigen
  to build with -march=native (which is critically important for Eigen)
  in docs/Performance.md and docs/PerformanceSmall.md.
2019-08-22 14:27:30 -05:00
Field G. Van Zee
e6ac4ebcb6 Added page size, source location to perf docs.
Details:
- Added the page size, as returned via 'getconf -a | grep PAGE_SIZE',
  and the location of the performance drivers to docs/Performance.md
  (test/3) and docs/PerformanceSmall.md (test/sup). Thanks to Dave
  Love for suggesting these additions in #325.
2019-08-20 13:49:47 -05:00
Field G. Van Zee
c152109e9a Updated BLASFEO results in PerformanceSmall.md.
Details:
- Updated the BLASFEO performance graphs shown in PerformanceSmall.md
  using a new commit of BLASFEO (2c9f312); updated PerformanceSmall.md
  accordingly.
- Updated test/sup/octave/plot_l3sup_perf.m so that the .m files
  containing the mpnpkp results do not need to be preprocessed in order
  to plot half the problem size range (ie: up to 400 instead of the
  800 range of the other shape cases).
- Trivial updates to runme.m.
2019-06-19 13:23:24 -05:00
Field G. Van Zee
ce671917b2 Fixed formatting/typo in docs/PerformanceSmall.md. 2019-06-06 14:17:21 -05:00
Field G. Van Zee
cbaa22e1ca Added BLASFEO results to docs/PerformanceSmall.md.
Details:
- Updated the graphs linked in PerformanceSmall.md with BLASFEO results,
  and added documenting language accordingly.
- Updated scripts in test/sup/octave to plot BLASFEO data.
- Minor tweak to language re: how OpenBLAS was configured for
  docs/Performance.md.
2019-06-04 16:06:58 -05:00
Field G. Van Zee
0f1b3bf49e ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
- CREDITS file update.
2019-06-03 18:35:19 -05:00
Field G. Van Zee
27da2e8400 Minor edits to docs/PerformanceSmall.md.
Details:
- Added performance analysis to "Comments" section of both Kaby Lake and
  Epyc sections.
- Added emphasis to certain passages.
2019-06-03 17:14:56 -05:00
Field G. Van Zee
09ba05c6f8 Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2019-06-03 16:53:19 -05:00
Field G. Van Zee
755730608d Minor rewording of language around mt env. vars. 2019-05-23 17:34:36 -05:00
Field G. Van Zee
ba31abe73c Added BLIS theading info to Performance.md.
Details:
- Documented the BLIS environment variables that were set
  (e.g. BLIS_JC_NT, BLIS_IC_NT, BLIS_JR_NT) for each machine and
  threading configuration in order to achieve the parallelism reported
  on in docs/Performance.md.
2019-05-23 14:59:53 -05:00
Field G. Van Zee
89a70cccf8 GNU-like handling of installation prefix et al.
Details:
- Changed the default installation prefix from $HOME/lib to /usr/local.
- Modified the way configure internally handles the prefix, libdir,
  includedir, and sharedir (and also added an --exec-prefix option).
  The defaults to these variables are set as follows:
    prefix:      /usr/local
    exec_prefix: ${prefix}
    libdir:      ${exec_prefix}/lib
    includedir:  ${prefix}/include
    sharedir:    ${prefix}/share
  The key change, aside from the addition of exec_prefix and its use to
  define the default to libdir, is that the variables are substituted
  into config.mk with quoting that delays evaluation, meaning the
  substituted values may contain unevaluated references to other
  variables (namely, ${prefix} and ${exec_prefix}). This more closely
  follows GNU conventions, including those used by GNU autoconf, and
  also allows make to override any one of the variables *after*
  configure has already been run (e.g. during 'make install').
- Updates to build/config.mk.in pursuant to above changes.
- Updates to output of 'configure --help' pursuant to above changes.
- Updated docs/BuildSystem.md to reflect the new default installation
  prefix, as well as mention EXECPREFIX and SHAREDIR.
- Changed the definitions of the UNINSTALL_OLD_* variables in the
  top-level Makefile to use $(wildcard ...) instead of 'find'. This
  was motivated by the new way of handling prefix and friends, which
  leads to the 'find' command being run on /usr/local (by default),
  which can take a while almost never yielding any benefit (since the
  user will very rarely use the uninstall-old targets).
- Removed periods from the end of descriptive output statements (i.e.,
  non-verbose output) since those statements often end with file or
  directory paths, which get confusing to read when puctuated by a
  period.
- Trival change to 'make showconfig' output.
- Removed my name from 'configure --help'. (Many have contributed to it
  over the years.)
- In configure script, changed the default state of threading_model
  variable from 'no' to 'off' to match that of debug_type, where there
  are similarly more than two valid states. ('no' is still accepted
  if given via the --enable-debug= option, though it will be
  standardized to 'off' prior to config.mk being written out.)
- Minor variable name change in flatten-headers.py that was intended for
  32812ff.
- CREDITS file update.
2019-04-11 18:33:08 -05:00
Field G. Van Zee
bec90e0b6a Minor update to docs/HardwareSupport.md document.
Details:
- Added more details and clarifying language to implications of 1m and
  the recycling of microkernels between microarchitectures.
2019-04-02 17:45:13 -05:00
Field G. Van Zee
7bc75882f0 Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2019-03-28 17:40:50 -05:00
Field G. Van Zee
20ea7a1217 Minor text updates (Eigen) to docs/Performance.md.
Details:
- Added/updated a few more details, mostly regarding Eigen.
2019-03-27 18:09:17 -05:00
Field G. Van Zee
2c85e1dd9d Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  however, of those operations it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2019-03-27 16:29:51 -05:00
Field G. Van Zee
e593221383 Merge branch 'master' into dev 2019-03-26 15:51:45 -05:00
Field G. Van Zee
feefcab442 Allow disabling of BLAS prototypes at compile-time.
Details:
- Modified bli_blas.h so that:
  - By default, if the BLAS layer is enabled at configure-time, BLAS
    prototypes are also enabled within blis.h;
  - But if the user #defines BLIS_DISABLE_BLAS_DEFS prior to including
    blis.h, BLAS prototypes are skipped over entirely so that, for
    example, the application or some other header pulled in by the
    application may prototype the BLAS functions without causing any
    duplication.
- Updated docs/BuildSystem.md to document the feature above, and
  related text.
2019-03-21 18:11:20 -05:00
Field G. Van Zee
288843b06d Added Eigen support to test/3 Makefile, runme.sh.
Details:
- Added targets to test/3/Makefile that link against a BLAS library
  build by Eigen. It appears, however, that Eigen's BLAS library does
  not support multithreading. (It may be that multithreading is only
  available when using the native C++ APIs.)
- Updated runme.sh with a few Eigen-related tweaks.
- Minor tweaks to docs/Performance.md.
2019-03-20 17:52:23 -05:00
Field G. Van Zee
153e0be21d More minor tweaks to docs/Performance.md.
Details:
- Defined GFLOPS as billions of floating-point operations per second,
  and reworded the sentence after about normalization.
2019-03-19 17:53:18 -05:00
Field G. Van Zee
64560cd924 ReleaseNotes.md update in advance of next version.
Details:
- Updated ReleaseNotes.md in preparation for next version.
2019-03-19 17:04:20 -05:00
Field G. Van Zee
ab5ad557ea Very minor tweaks to Performance.md. 2019-03-19 16:50:41 -05:00
Field G. Van Zee
03c4a25e1a Minor fixes to docs/Performance.md.
Details:
- Fixed some incorrect labels associated with the pdf/png graphs,
  apparently the result of copy-pasting.
2019-03-19 16:47:15 -05:00
Field G. Van Zee
fe6dd8b132 Fixed broken section links in docs/Performance.md.
Details:
- Fixed a few broken section links in the Contents section.
2019-03-19 16:30:23 -05:00