Commit Graph

145 Commits

Author SHA1 Message Date
Field G. Van Zee
b36fb0fbc5 Fixed newly broken link to CREDITS in FAQ.md. 2021-09-28 18:47:45 -05:00
Field G. Van Zee
3442d4002b More minor fixes to FAQ.md and Sandboxes.md. 2021-09-28 18:43:23 -05:00
Field G. Van Zee
89aaf00650 Updates to FAQ.md, Sandboxes.md, and README.md.
Details:
- Updated FAQ.md to include two new questions, reordered an existing
  question, and also removed an outdated and redundant question about
  BLIS vs. AMD BLIS.
- Updated Sandboxes.md to use 'gemmlike' as its main example, along with
  other smaller details.
- Added ARM as a funder to README.md.
2021-09-28 18:34:33 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Field G. Van Zee
689fa0f403 Merge branch 'master' into dev 2021-06-13 19:44:14 -05:00
Field G. Van Zee
82af05f54c Updated Fugaku (a64fx) performance results.
Details:
- Updated the performance graphs (pdfs and pngs) for the Fugaku/a64fx
  entry within Performance.md, and also updated the experiment details
  accordingly. Thanks to RuQing Xu for re-running the BLIS and SSL2
  experiments reflected in this commit.
- In Performance.md, added an English translation of the project name
  under which the Fugaku results were gathered, courtesy of RuQing Xu.
2021-05-25 15:25:08 -05:00
Field G. Van Zee
f0e8634775 Defined eqsc, eqv, eqm to test object equality.
Details:
- Defined eqsc, eqv, and eqm operations, which set a bool depending on
  whether the two scalars, two vectors, or two matrix operands are equal
  (element-wise). eqsc and eqv support implicit conjugation and eqm
  supports diagonal offset, diag, uplo, and trans parameters (in a
  manner consistent with other level-1m operations). These operations
  are currently housed under frame/util, at least for now, because they
  are not computational in nature.
- Redefined bli_obj_equals() in terms of eqsc, eqv, and eqm.
- Documented eqsc, eqv, and eqm in BLISObjectAPI.md and BLISTypedAPI.md.
  Also:
  - Documented getsc and setsc in both docs.
  - Reordered entry for setijv in BLISTypedAPI.md, and added separator
    bars to both docs.
  - Added missing "Observed object properties" clauses to various
    level-1v entries in BLISObjectAPI.md.
- Defined bli_apply_trans() in bli_param_macro_defs.h.
- Defined supporting _check() function, bli_l0_xxbsc_check(), in
  bli_l0_check.c for eqsc.
- Programming style and whitespace updates to bli_l1m_unb_var1.c.
- Whitespace updates to bli_l0_oapi.c, bli_l1m_oapi.c
- Consolidated redundant macro redefinition for copym function pointer
  type in bli_l1m_ft.h.
- Added macros to bli_oapi_ba.h, _ex.h, and bli_tapi_ba.h, _ex.h that
  allow oapi and tapi source files to forego defining certain expert
  functions. (Certain operations such as printv and printm do not need
  to have both basic and expert interfaces. This also includes eqsc, eqv,
  and eqm.)
2021-05-12 18:45:32 -05:00
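The element-wise comparison with implicit conjugation described in this commit can be sketched as follows. This is a standalone illustration, not the BLIS implementation: the type `example_dcomplex` and the function `example_eqv()` are hypothetical names, and the sketch assumes unit-stride vectors of double-precision complex values.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical complex type for illustration only. */
typedef struct { double real; double imag; } example_dcomplex;

/* Returns true if conjx(x) equals y element-wise, where conjx(x) is x with
   its imaginary parts negated when conjx is true. Hypothetical helper. */
bool example_eqv(bool conjx, size_t n,
                 const example_dcomplex* x, const example_dcomplex* y)
{
    assert(n == 0 || (x != NULL && y != NULL));

    for (size_t i = 0; i < n; ++i)
    {
        /* Apply the conjugation on the fly rather than modifying x. */
        double xi_imag = conjx ? -x[i].imag : x[i].imag;

        if (x[i].real != y[i].real || xi_imag != y[i].imag)
            return false;
    }
    return true;
}
```

Two vectors that are conjugates of each other compare equal only when `conjx` is true; an empty vector compares equal to anything.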
Field G. Van Zee
6a89c7d8f9 Defined setijv, getijv to set/get vector elements.
Details:
- Defined getijv, setijv operations to get and set elements of a vector,
  in bli_setgetijv.c and .h.
- Renamed bli_setgetij.c and .h to bli_setgetijm.c and .h, respectively.
- Added additional bounds checking to getijm and setijm to prevent
  actions with negative indices.
- Added documentation to BLISObjectAPI.md and BLISTypedAPI.md for getijv
  and setijv.
- Added documentation to BLISTypedAPI.md for getijm and setijm, which
  were inadvertently missing.
- Added a new entry to the FAQ titled "Why does BLIS have vector
  (level-1v) and matrix (level-1m) variations of most level-1
  operations?"
- Comment updates.
2021-05-01 18:54:48 -05:00
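The bounds checking mentioned above, including rejection of negative indices, might look like the sketch below. The struct `example_vec_t` and the functions `example_getijv()` and `example_setijv()` are hypothetical and are not the BLIS signatures; the sketch assumes a real double-precision vector with an element stride.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical vector descriptor: n elements, inc apart, stored in buf. */
typedef struct { long n; long inc; double* buf; } example_vec_t;

/* Reads element i into *out. Returns false, leaving *out untouched, when i
   is negative or past the end of the vector. */
bool example_getijv(long i, const example_vec_t* x, double* out)
{
    assert(x != NULL && out != NULL);
    if (i < 0 || i >= x->n) return false;
    *out = x->buf[i * x->inc];
    return true;
}

/* Writes val to element i. Returns false, writing nothing, when i is
   negative or past the end of the vector. */
bool example_setijv(double val, long i, example_vec_t* x)
{
    assert(x != NULL);
    if (i < 0 || i >= x->n) return false;
    x->buf[i * x->inc] = val;
    return true;
}
```

Returning a status instead of aborting is a design choice of this sketch; it lets the caller decide how to react to an out-of-range index.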
Gaëtan Cassiers
1f3461a5a5 Fix typo in FAQ.md 2021-04-21 16:49:05 +02:00
Field G. Van Zee
6280757be3 Minor updates to a64fx section of Performance.md. 2021-04-07 13:03:56 -05:00
RuQing Xu
1e6ed823c6 Additional A64fx Comments (#490)
* Performance.md Update A64fx Comments

- Reason for ARMPL's missing data;
- Additional envs / flags for kernel selection;
- Update BLIS SRC commit.

* Include Another Fix in armsve-cfg-vendor

A prototype had been forgotten, so the void* pointer was not fully returned.
2021-04-07 12:59:26 -05:00
Field G. Van Zee
2688f21a5b Added Fujitsu A64fx (512-bit SVE) perf results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on the "Fugaku"
  Fujitsu A64fx supercomputer at the RIKEN Center for Computational
  Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan
  Nassyr for their work in developing and optimizing A64fx support in
  BLIS and RuQing for gathering the performance data that is reflected
  in these new graphs.
2021-04-06 19:02:37 -05:00
Field G. Van Zee
e56d9f2d94 ReleaseNotes.md update in advance of next version. 2021-03-22 17:40:50 -05:00
Field G. Van Zee
4493cf516e Redefined BLIS_NUM_ARCHS to update automatically.
Details:
- Changed BLIS_NUM_ARCHS from a cpp macro definition to the last enum
  value in the arch_t enum. This means that it no longer needs to get
  updated manually whenever new subconfigurations are added to BLIS.
  Also removed the explicit initial index assignment of 0 from the
  first enum value, which was unnecessary due to how the C language
  standard mandates indexing of enum values. Thanks to Devin Matthews
  for originally submitting this as a PR in #446.
- Updated docs/ConfigurationHowTo.md to reflect the aforementioned
  change.
2021-03-15 13:12:49 -05:00
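The enum technique described in this commit can be illustrated with a minimal sketch. The enumerator names below are hypothetical placeholders, not the real arch_t entries; only the pattern (a trailing enumerator that doubles as the count) is taken from the commit message.

```c
#include <assert.h>

/* Hypothetical architecture ids; the real arch_t has many more entries. */
typedef enum
{
    EXAMPLE_ARCH_A,    /* implicitly 0, so no "= 0" is needed */
    EXAMPLE_ARCH_B,
    EXAMPLE_ARCH_C,
    EXAMPLE_NUM_ARCHS  /* always equals the number of entries above it */
} example_arch_t;

/* An array sized by the trailing enumerator grows automatically whenever a
   new id is inserted before EXAMPLE_NUM_ARCHS. */
static const char* example_arch_names[EXAMPLE_NUM_ARCHS] = { "a", "b", "c" };

const char* example_arch_name(example_arch_t id)
{
    assert((int)id >= 0 && (int)id < (int)EXAMPLE_NUM_ARCHS);
    return example_arch_names[id];
}
```

Because C assigns 0 to the first enumerator and increments by one thereafter, the last enumerator tracks the count with no manual upkeep.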
RuQing Xu
b8dcc5bc75 Fixed typed API definition for gemmt (#476)
Details:
- Fixed incorrect definition and prototype of bli_?gemmt() in 
  frame/3/bli_l3_tapi.c and .h, respectively. gemmt was previously
  defined identically to gemm, which was wrong because it did not
  take into account the uplo property of C.
- Fixed incorrect API documentation for her2k/syr2k in BLISTypedAPI.md.
  Specifically, the document erroneously listed only a single transab
  parameter instead of transa and transb.
2021-03-01 16:58:24 -06:00
Field G. Van Zee
ed50c94738 Merge branch 'master' into dev 2021-01-04 14:31:44 -06:00
Devin Matthews
6d3bafacd7 Update BuildSystem.md
Add git version >= 1.8.5 requirement (see #462).
2020-11-28 17:17:56 -06:00
Field G. Van Zee
64856ea5a6 Auto-reduce (by default) prime numbers of threads.
Details:
- When requesting multithreaded parallelism by specifying the total
  number of threads (whether it be via environment variable, globally at
  runtime, or locally at runtime), reduce the number of threads actually
  used by one if the original value (a) is prime and (b) exceeds a
  minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which is set
  to 11 by default. If, when specifying the total number of threads (and
  not the individual ways of parallelism for each loop), prime numbers
  of threads are desired, this feature may be overridden by defining the
  BLIS_ENABLE_AUTO_PRIME_NUM_THREADS macro in the bli_family_*.h that
  corresponds to the configuration family targeted at configure-time.
  (For now, there is no configure option(s) to control this feature.)
  Thanks to Jeff Diamond for suggesting this change.
- Defined a new function in bli_thread.c, bli_is_prime(), that returns a
  bool that determines whether an integer is prime. This function is
  implemented in terms of existing functions in bli_thread.c.
- Updated docs/Multithreading.md to document the above feature, along
  with unrelated minor edits.
2020-11-23 16:54:51 -06:00
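The thread-count adjustment described above can be sketched as follows. The names `EXAMPLE_NT_MAX_PRIME`, `example_is_prime()`, and `example_adjust_num_threads()` are hypothetical, and the primality test uses plain trial division as an assumption; the commit states that the real bli_is_prime() is built from existing functions in bli_thread.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the threshold macro named in the commit. */
#define EXAMPLE_NT_MAX_PRIME 11

/* Trial division; adequate for realistic thread counts. */
static bool example_is_prime(long n)
{
    if (n < 2) return false;
    for (long d = 2; d * d <= n; ++d)
        if (n % d == 0) return false;
    return true;
}

/* Returns the number of threads to use: one fewer than requested when the
   request is prime and exceeds the threshold, otherwise the request itself. */
long example_adjust_num_threads(long nt)
{
    assert(nt >= 1);
    if (nt > EXAMPLE_NT_MAX_PRIME && example_is_prime(nt))
        return nt - 1;
    return nt;
}
```

Under these assumptions a request for 13 threads becomes 12, which factors into several ways of parallelism, while 11 and smaller primes are left alone.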
Field G. Van Zee
55933b6ff6 Added missing attribution to docs/ReleaseNotes.md. 2020-11-20 10:39:32 -06:00
Field G. Van Zee
2928ec750d ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
2020-11-18 18:31:35 -06:00
Field G. Van Zee
9bb23e6c2a Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
2020-11-16 15:55:45 -06:00
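The idea of dummying out a threading layer so that calling code stays unchanged can be sketched as below. Everything here is hypothetical: the macro `EXAMPLE_DISABLE_SYSTEM` and the `example_mutex_*` wrappers are illustrative names, not the bli_pthread API, and the macro is defined at the top so that the sketch builds without pthreads.

```c
#include <assert.h>

/* Hypothetical switch. Remove this definition to use real POSIX threads. */
#define EXAMPLE_DISABLE_SYSTEM 1

#ifdef EXAMPLE_DISABLE_SYSTEM

/* Dummy implementation: locking is a no-op that always succeeds. */
typedef int example_mutex_t;
#define EXAMPLE_MUTEX_INITIALIZER 0
static int example_mutex_lock(example_mutex_t* m)   { (void)m; return 0; }
static int example_mutex_unlock(example_mutex_t* m) { (void)m; return 0; }

#else

/* Real implementation: thin wrappers around POSIX threads. */
#include <pthread.h>
typedef pthread_mutex_t example_mutex_t;
#define EXAMPLE_MUTEX_INITIALIZER PTHREAD_MUTEX_INITIALIZER
static int example_mutex_lock(example_mutex_t* m)   { return pthread_mutex_lock(m); }
static int example_mutex_unlock(example_mutex_t* m) { return pthread_mutex_unlock(m); }

#endif

static example_mutex_t example_counter_lock = EXAMPLE_MUTEX_INITIALIZER;
static long            example_counter      = 0;

/* Calling code is identical in both builds. */
long example_increment(void)
{
    int err = example_mutex_lock(&example_counter_lock);
    assert(err == 0);
    long value = ++example_counter;
    example_mutex_unlock(&example_counter_lock);
    return value;
}
```

With the dummy build, `example_increment()` is only safe when a single thread calls it, which matches the simulator use case that motivated the option.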
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatibility layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertently being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitates compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
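Two points from this commit, updating only one triangle of C in gemmt and treating beta = 0 as an overwrite rather than a multiplication, can be sketched with a naive reference loop. The function `example_gemmt()` is hypothetical and assumes row-major double-precision matrices with no transposition; it is not the BLIS code path.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Computes C := beta*C + alpha*A*B on only the lower (lower == true) or
   upper triangle of the n-by-n matrix C, diagonal included. A is n-by-k and
   B is k-by-n; all matrices are row-major. Hypothetical reference routine. */
void example_gemmt(bool lower, size_t n, size_t k,
                   double alpha, const double* a, const double* b,
                   double beta, double* c)
{
    assert(n == 0 || c != NULL);

    for (size_t i = 0; i < n; ++i)
    {
        size_t j_begin = lower ? 0     : i;
        size_t j_end   = lower ? i + 1 : n;

        for (size_t j = j_begin; j < j_end; ++j)
        {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i * k + p] * b[p * n + j];

            /* When beta is zero, overwrite C instead of scaling it, so that
               NaN or Inf in uninitialized C cannot reach the result. */
            if (beta == 0.0) c[i * n + j] = alpha * ab;
            else             c[i * n + j] = beta * c[i * n + j] + alpha * ab;
        }
    }
}
```

Elements outside the selected triangle are never read or written, which is what distinguishes this operation from a plain gemm.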
Field G. Van Zee
e14424f55b Merge branch 'dev' 2020-11-07 13:02:50 -06:00
Field G. Van Zee
0cfe1aac22 Relocated operation index to ToC in API docs.
Details:
- Moved the "Operation index" section of both the BLISObjectAPI.md and
  BLISTypedAPI.md docs to appear immediately after the table of contents
  of each document. This allows the reader to quickly jump to the
  documentation for any operation without having to scroll through much
  of the document (when rendered via a web browser).
- Fixed a mistake in the BLISObjectAPI.md for the setd operation, which
  does *not* observe the diag property of its matrix argument. Thanks to
  Jeff Diamond for reporting this.
2020-10-30 17:10:36 -05:00
Field G. Van Zee
eccdd75a2d Whitespace tweak in docs/PerformanceSmall.md. 2020-10-09 15:44:16 -05:00
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
d98368c32d Another tweak to line thickness of Zen2 graphs. 2020-10-08 19:05:51 -05:00
Field G. Van Zee
1855dfbdaa Tweaked line thickness in Zen2 graphs once more.
Details:
- Decreased (relative to previous commit) line thickness in recent Zen2
  graphs.
2020-10-08 19:01:00 -05:00
Field G. Van Zee
0991611e7e Increased line thickness in recent Zen2 graphs.
Details:
- Increased the width of the lines in the graphs introduced in 74ec6b8.
2020-10-08 18:54:49 -05:00
Field G. Van Zee
8273cbacd7 README.md, docs/FAQ.md updates.
Details:
- Added a frequently asked question to docs/FAQ.md regarding the
  difference between upstream (vanilla) BLIS and AMD BLIS.
- Updated the name of ICES in the README.md to reflect the Oden
  rebranding.
2020-10-07 14:51:33 -05:00
Field G. Van Zee
a178a822ad Added Zen2 links to docs/Performance.md Contents. 2020-09-30 16:00:52 -05:00
Field G. Van Zee
74ec6b8f45 Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on an Epyc 7742
  "Rome" server with AMD's Zen2 microarchitecture. Special thanks
  to Jeff Diamond for facilitating access to the system via the
  Oracle Cloud.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
2020-09-30 15:54:18 -05:00
Devin Matthews
5d653a11a0 Update Multithreading.md
Addresses the issue raised in #426.
2020-08-06 17:58:26 -05:00
Field G. Van Zee
882dcb11bf Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
  documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
2020-08-06 17:28:14 -05:00
Field G. Van Zee
f4894512e5 Very minor updates to previous commit. 2020-08-06 17:20:00 -05:00
Field G. Van Zee
adedb893ae Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
  BLISObjectAPI.md. Previously, only accessor functions were documented.
  Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
  (08level2.c and 09level3.c).
2020-08-06 17:14:01 -05:00
Field G. Van Zee
6e522e5823 Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-07-30 19:31:37 -05:00
Field G. Van Zee
00e14cb6d8 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-07-29 14:24:34 -05:00
Field G. Van Zee
a6437a5c11 Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-07-20 19:21:07 -05:00
Giorgos Margaritis
171ecc1dc6 Update Multithreading.md 2020-07-20 12:24:06 +03:00
Field G. Van Zee
ceb9b95a96 Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-06-18 17:15:25 -05:00
Isuru Fernando
31af73c11a Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-06-18 13:35:54 -05:00
Isuru Fernando
35e38fb693 Fix typo in FAQ 2020-06-16 09:08:31 -07:00
Isuru Fernando
943a21def0 Add build instructions for Windows (#404) 2020-05-21 14:09:21 -05:00
Field G. Van Zee
fbef422f0d Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-05-21 10:30:41 -05:00
Field G. Van Zee
c53b5153be Documented Perl prerequisite for build system.
Details:
- Added Perl to list of prerequisites for building BLIS. This is in part
  (and perhaps completely?) due to some substitution commands used at
  the end of configure that include '\n' characters that are not
  properly interpreted by the version of sed included on some versions
  of OS X. This new documentation addresses issue #398.
2020-05-05 12:39:12 -05:00
Yingbo Ma
4d87eb24e8 Update KernelsHowTo.md (#395) 2020-04-27 16:02:47 -05:00
Field G. Van Zee
8bde63ffd7 Adding missing conjy to her2/syr2 in typed API doc.
Details:
- Fixed a missing argument (conjy) in the function signatures of
  bli_?her2() and bli_?syr2() in docs/BLISTypedAPI.md. Thanks to Robert
  van de Geijn for reporting this omission.
2020-04-18 12:50:12 -05:00
Field G. Van Zee
b04de636c1 ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
2020-04-07 14:37:43 -05:00
Field G. Van Zee
0f9e0399e1 Updated sup performance graphs; added mt results.
Details:
- Reran all existing single-threaded performance experiments comparing
  BLIS sup to other implementations (including the conventional code
  path within BLIS), using the latest versions (where appropriate).
- Added multithreaded results for the three existing hardware types
  showcased in docs/PerformanceSmall.md: Kaby Lake, Haswell, and Epyc
  (Zen1).
- Various minor updates to the text in docs/PerformanceSmall.md.
- Updates to the octave scripts in test/sup/octave, test/supmt/octave.
2020-03-05 17:03:21 -06:00