Commit Graph

185 Commits

Author SHA1 Message Date
Field G. Van Zee
7a0ba4194f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.

Change-Id: Ie7efdea366481ce25075cb2459bdbcfd52309717
2022-03-31 12:03:27 +05:30
lcpu
7401effc03 BLIS:merge:
Merge conflicts araised has been fixed while downstreaming BLIS code from master to milan-3.1 branch

Implemented an automatic reduction in the number of threads when the user requests parallelism via a single number (ie: the automatic way) and (a) that number of threads is prime, and (b) that number exceeds a minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which defaults to 11. If prime numbers are really desired, this feature may be suppressed by defining the macro BLIS_ENABLE_AUTO_PRIME_NUM_THREADS in the appropriate configuration family's bli_family_*.h. (Jeff Diamond)

Changed default value of BLIS_THREAD_RATIO_M from 2 to 1, which leads to slightly different automatic thread factorizations.

Enable the 1m method only if the real domain microkernel is not a reference kernel. BLIS now forgoes use of 1m if both the real and complex domain kernels are reference implementations.

Relocated the general stride handling for gemmsup. This fixed an issue whereby gemm would fail to trigger to conventional code path for cases that use general stride even after gemmsup rejected the problem. (RuQing Xu)

Fixed an incorrect function signature (and prototype) of bli_?gemmt(). (RuQing Xu)

Redefined BLIS_NUM_ARCHS to be part of the arch_t enum, which means it will be updated automatically when defining future subconfigs.

Minor code consolidation in all level-3 _front() functions.

Reorganized Windows cpp branch of bli_pthreads.c.

Implemented bli_pthread_self() and _equals(), but left them commented out (via cpp guards) due to issues with getting the Windows versions working. Thankfully, these functions aren't yet needed by BLIS.

Allow disabling of trsm diagonal pre-inversion at compile time via --disable-trsm-preinversion.

Fixed obscure testsuite bug for the gemmt test module that relates to its dependency on gemv.

AMD-internal-[CPUPL-1523]

Change-Id: I0d1df018e2df96a23dc4383d01d98b324d5ac5cd
2021-04-27 11:09:48 +05:30
Gaëtan Cassiers
1f3461a5a5 Fix typo in FAQ.md 2021-04-21 16:49:05 +02:00
Field G. Van Zee
6280757be3 Minor updates to a64fx section of Performance.md. 2021-04-07 13:03:56 -05:00
RuQing Xu
1e6ed823c6 Additional A64fx Comments (#490)
* Performance.md Update A64fx Comments

- Reason for ARMPL's missing data;
- Additional envs / flags for kernel selection;
- Update BLIS SRC commit.

* Include Another Fix in armsve-cfg-vendor

A prototype was forgotten, causing that void* pointer was not fully returned.
2021-04-07 12:59:26 -05:00
Field G. Van Zee
2688f21a5b Added Fujitsu A64fx (512-bit SVE) perf results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on the "Fugaku"
  Fujitsu A64fx supercomputer at the RIKEN Center for Computational
  Science in Kobe, Japan. Special thanks to RuQing Xu and Stepan
  Nassyr for their work in developing and optimizing A64fx support in
  BLIS and RuQing for gathering the performance data that is reflected
  in these new graphs.
2021-04-06 19:02:37 -05:00
Field G. Van Zee
e56d9f2d94 ReleaseNotes.md update in advance of next version. 2021-03-22 17:40:50 -05:00
Field G. Van Zee
4493cf516e Redefined BLIS_NUM_ARCHS to update automatically.
Details:
- Changed BLIS_NUM_ARCHS from a cpp macro definition to the last enum
  value in the arch_t enum. This means that it no longer needs to get
  updated manually whenever new subconfigurations are added to BLIS.
  Also removed the explicit initial index assigment of 0 from the
  first enum value, which was unnecessary due to how the C language
  standard mandates indexing of enum values. Thanks to Devin Matthews
  for originally submitting this as a PR in #446.
- Updated docs/ConfigurationHowTo.md to reflect the aforementioned
  change.
2021-03-15 13:12:49 -05:00
RuQing Xu
b8dcc5bc75 Fixed typed API definition for gemmt (#476)
Details:
- Fixed incorrect definition and prototype of bli_?gemmt() in 
  frame/3/bli_l3_tapi.c and .h, respectively. gemmt was previously
  defined identically to gemm, which was wrong because it did not
  take into account the uplo property of C.
- Fixed incorrect API documentation for her2k/syr2k in BLISTypedAPI.md.
  Specifically, the document erroneously listed only a single transab
  parameter instead of transa and transb.
2021-03-01 16:58:24 -06:00
Field G. Van Zee
ed50c94738 Merge branch 'master' into dev 2021-01-04 14:31:44 -06:00
Devin Matthews
6d3bafacd7 Update BuildSystem.md
Add git version >= 1.8.5 requirement (see #462).
2020-11-28 17:17:56 -06:00
Field G. Van Zee
64856ea5a6 Auto-reduce (by default) prime numbers of threads.
Details:
- When requesting multithreaded parallelism by specifying the total
  number of threads (whether it be via environment variable, globally at
  runtime, or locally at runtime), reduce the number of threads actually
  used by one if the original value (a) is prime and (b) exceeds a
  minimum threshold defined by the macro BLIS_NT_MAX_PRIME, which is set
  to 11 by default. If, when specifying the total number of threads (and
  not the individual ways of parallelism for each loop), prime numbers
  of threads are desired, this feature may be overridden by defining the
  BLIS_ENABLE_AUTO_PRIME_NUM_THREADS macro in the bli_family_*.h that
  corresponds to the configuration family targeted at configure-time.
  (For now, there is no configure option(s) to control this feature.)
  Thanks to Jeff Diamond for suggesting this change.
- Defined a new function in bli_thread.c, bli_is_prime(), that returns a
  bool that determines whether an integer is prime. This function is
  implemented in terms of existing functions in bli_thread.c.
- Updated docs/Multithreading.md to document the above feature, along
  with unrelated minor edits.
2020-11-23 16:54:51 -06:00
Field G. Van Zee
55933b6ff6 Added missing attribution to docs/ReleaseNotes.md. 2020-11-20 10:39:32 -06:00
Field G. Van Zee
2928ec750d ReleaseNotes.md update in advance of next version.
Details:
- Updated docs/ReleaseNotes.md in preparation for next version.
2020-11-18 18:31:35 -06:00
Field G. Van Zee
9bb23e6c2a Added support for systemless build (no pthreads).
Details:
- Added a configure option, --[enable|disable]-system, which determines
  whether the modest operating system dependencies in BLIS are included.
  The most notable example of this on Linux and BSD/OSX is the use of
  POSIX threads to ensure thread safety for when application-level
  threads call BLIS. When --disable-system is given, the bli_pthreads
  implementation is dummied out entirely, allowing the calling code
  within BLIS to remain unchanged. Why would anyone want to build BLIS
  like this? The motivating example was submitted via #454 in which a
  user wanted to build BLIS for a simulator such as gem5 where thread
  safety may not be a concern (and where the operating system is largely
  absent anyway). Thanks to Stepan Nassyr for suggesting this feature.
- Another, more minor side effect of the --disable-system option is that
  the implementation of bli_clock() unconditionally returns 0.0 instead
  of the time elapsed since some fixed point in the past. The reasoning
  for this is that if the operating system is truly minimal, the system
  function call upon which bli_clock() would normally be implemented
  (e.g. clock_gettime()) may not be available.
- Refactored preprocess-guarded code in bli_pthread.c and bli_pthread.h
  to remove redundancies.
- Removed old comments and commented #include of "bli_pthread_wrap.h"
  from bli_system.h.
- Documented bli_clock() and bli_clock_min_diff() in BLISObjectAPI.md
  and BLISTypedAPI.md, with a note that both are non-functional when
  BLIS is configured with --disable-system.
2020-11-16 15:55:45 -06:00
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatiblity layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH is set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling in for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertantly being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers that facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
Field G. Van Zee
e14424f55b Merge branch 'dev' 2020-11-07 13:02:50 -06:00
Field G. Van Zee
0cfe1aac22 Relocated operation index to ToC in API docs.
Details:
- Moved the "Operation index" section of both the BLISObjectAPI.md and
  BLISTypedAPI.md docs to appear immediately after the table of contents
  of each document. This allows the reader to quickly jump to the
  documentation for any operation without having to scroll through much
  of the document (when rendered via a web browser).
- Fixed a mistake in the BLISObjectAPI.md for the setd operation, which
  does *not* observe the diag property of its matrix argument. Thanks to
  Jeff Diamond for reporting this.
2020-10-30 17:10:36 -05:00
Field G. Van Zee
eccdd75a2d Whitespace tweak in docs/PerformanceSmall.md. 2020-10-09 15:44:16 -05:00
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
d98368c32d Another tweak to line thickness of Zen2 graphs. 2020-10-08 19:05:51 -05:00
Field G. Van Zee
1855dfbdaa Tweaked line thickness in Zen2 graphs once more.
Details:
- Decreased (relative to previous commit) line thickness in recent Zen2
  graphs.
2020-10-08 19:01:00 -05:00
Field G. Van Zee
0991611e7e Increased line thickness in recent Zen2 graphs.
Details:
- Increased the width of the lines in the graphs introduced in 74ec6b8.
2020-10-08 18:54:49 -05:00
Field G. Van Zee
8273cbacd7 README.md, docs/FAQ.md updates.
Details:
- Added a frequently asked question to docs/FAQ.md regarding the
  difference between upstream (vanilla) BLIS and AMD BLIS.
- Updated the name of ICES in the README.md to reflect the Oden
  rebranding.
2020-10-07 14:51:33 -05:00
Field G. Van Zee
a178a822ad Added Zen2 links to docs/Performance.md Contents. 2020-09-30 16:00:52 -05:00
Field G. Van Zee
74ec6b8f45 Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on an Epyc 7742
  "Rome" server with AMD's Zen2 microarchitecture. Special thanks
  to Jeff Diamond for facilitating access to the system via the
  Oracle Cloud.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
2020-09-30 15:54:18 -05:00
Devin Matthews
5d653a11a0 Update Multithreading.md
Addresses the issue raised in #426.
2020-08-06 17:58:26 -05:00
Field G. Van Zee
882dcb11bf Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
  documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
2020-08-06 17:28:14 -05:00
Field G. Van Zee
f4894512e5 Very minor updates to previous commit. 2020-08-06 17:20:00 -05:00
Field G. Van Zee
adedb893ae Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
  BLISObjectAPI.md. Previously, only accessor functions were documented.
  Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
  (08level2.c and 09level3.c).
2020-08-06 17:14:01 -05:00
dzambare
267a959af1 Rebased amd-staging-milan-3.0 branch on master
-- Rebased on top of master commit # 6e522e5823
  -- Updated merged code to remove duplicated code added by auto-merging
  -- Updated merged code to rename bool_t type
  -- Updated merged code to rename bli_thread_obarrier
  -- Updated merged code to rename bli_thread_obroadcast

Change-Id: I39879f1ef3b42ecbe5808af3b559d88c36dbbf6c
AMD-Internal: [CPUPL-1067]
2020-08-06 10:09:29 +05:30
Field G. Van Zee
89105695cc Added sup performance graphs/document to 'docs'.
Details:
- Added a new markdown document, docs/PerformanceSmall.md, which
  publishes new performance graphs for Kaby Lake and Epyc showcasing
  the new BLIS sup (small/skinny/unpacked) framework logic and kernels.
  For now, only single-threaded dgemm performance is shown.
- Reorganized graphs in docs/graphs into docs/graphs/large, with new
  graphs being placed in docs/graphs/sup.
- Updates to scripts in test/sup/octave, mostly to allow decent output
  in both GNU octave and Matlab.
- Updated README.md to mention and refer to the new PerformanceSmall.md
  document.
2020-08-03 11:48:43 +05:30
Field G. Van Zee
ba8fde137c Updated Eigen results in docs/graphs with 3.3.90.
Details:
- Updated the level-3 performance graphs in docs/graphs with new Eigen
  results, this time using a development version cloned from their git
  mirror on March 27, 2019 (version 3.3.90). Performance is improved
  over 3.3.7, though still noticeably short of BLIS/MKL in most cases.
- Very minor updates to docs/Performance.md and matlab scripts in
  test/3/matlab.
2020-08-03 11:48:06 +05:30
Field G. Van Zee
1bc2002706 Added Eigen results to performance graphs.
Details:
- Updated the Haswell, SkylakeX, and Epyc performance graphs in
  docs/graphs to report on Eigen implementations, where applicable.
  Specifically, Eigen implements all level-3 operations sequentially,
  however, of those operations it only provides multithreaded gemm.
  Thus, mt results for symm/hemm, syrk/herk, trmm, and trsm are
  omitted. Thanks to Sameer Agarwal for his help configuring and
  using Eigen.
- Updated docs/Performance.md to note the new implementation tested.
- CREDITS file update.
2020-08-03 11:48:05 +05:30
Field G. Van Zee
ec6776a1b5 Added docs/Performance.md and docs/graphs subdir.
Details:
- Added a new markdown document, docs/Performance.md, which reports
  performance of a representative set of level-3 operations across a
  variety of hardware architectures, comparing BLIS to OpenBLAS and a
  vendor library (MKL on Intel/AMD, ARMPL on ARM). Performance graphs,
  in pdf and png formats, reside in docs/graphs.
- Updated README.md to link to new Performance.md document.
- Minor updates to CREDITS, docs/Multithreading.md.
- Minor updates to matlab scripts in test/3/matlab.
2020-08-03 11:47:40 +05:30
Field G. Van Zee
5904e18d40 Updates to docs/Multithreading.md.
Details:
- Made extra explicit the fact that: (a) multithreading in BLIS is
  disabled by default; and (b) even with multithreading enabled, the
  user must specify multithreading at runtime in order to observe
  parallelism. Thanks to M. Zhou for suggesting these clarifications
  in #292.
- Also made explicit that only the environment variable and global
  runtime API methods are available when using the BLAS API. If the
  user wishes to use the local runtime API (specify multithreading on
  a per-call basis), one of the native BLIS APIs must be used.
2020-08-03 11:47:18 +05:30
Field G. Van Zee
c04966262b Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-08-03 11:27:13 +05:30
Field G. Van Zee
fd5db714f4 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-08-03 11:27:13 +05:30
Field G. Van Zee
5cbdbe495f Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-08-03 11:23:40 +05:30
Giorgos Margaritis
7db89fe91d Update Multithreading.md 2020-08-03 11:23:40 +05:30
Field G. Van Zee
2d7a43d7ef Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-08-03 11:22:32 +05:30
Isuru Fernando
51c36e8019 Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-08-03 11:22:32 +05:30
Isuru Fernando
c7f9684384 FIx typo in FAQ 2020-08-03 11:22:32 +05:30
Isuru Fernando
454047caa3 Add build instructions for Windows (#404) 2020-08-03 11:22:32 +05:30
Field G. Van Zee
713313562b Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-08-03 11:22:32 +05:30
Field G. Van Zee
6e522e5823 Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-07-30 19:31:37 -05:00
Field G. Van Zee
00e14cb6d8 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-07-29 14:24:34 -05:00
Field G. Van Zee
a6437a5c11 Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-07-20 19:21:07 -05:00
Giorgos Margaritis
171ecc1dc6 Update Multithreading.md 2020-07-20 12:24:06 +03:00
Field G. Van Zee
ceb9b95a96 Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-06-18 17:15:25 -05:00