Commit Graph

1924 Commits

Author SHA1 Message Date
Field G. Van Zee
b9899bedff CREDITS file update. 2020-11-18 16:52:41 -06:00
Field G. Van Zee
88ad841434 Squash-merge 'pr' into 'squash'. (#457)
Merged contributions from AMD's AOCL BLIS (#448).
  
Details:
- Added support for level-3 operation gemmt, which performs a gemm on
  only the lower or upper triangle of a square matrix C. For now, only
  the conventional/large code path will be supported (in vanilla BLIS).
  This was accomplished by leveraging the existing variant logic for
  herk. However, some of the infrastructure to support a gemmtsup is
  included in this commit, including
  - A bli_gemmtsup() front-end, similar to bli_gemmsup().
  - A bli_gemmtsup_ref() reference handler function.
  - A bli_gemmtsup_int() variant chooser function (with variant calls
    commented out).
- Added support for inducing complex domain gemmt via the 1m method.
- Added gemmt APIs to the BLAS and CBLAS compatibility layers.
- Added gemmt test module to testsuite.
- Added standalone gemmt test driver to 'test' directory.
- Documented gemmt APIs in BLISObjectAPI.md and BLISTypedAPI.md.
- Added a C++ template header (blis.hh) containing a BLAS-inspired
  wrapper to a set of polymorphic CBLAS-like function wrappers defined
  in another header (cblas.hh). These two headers are installed if
  running the 'install' target with INSTALL_HH set to 'yes'. (Also
  added a set of unit tests that exercise blis.hh, although they are
  disabled for now because they aren't compatible with out-of-tree
  builds.) These files now live in the 'vendor' top-level directory.
- Various updates to 'zen' and 'zen2' subconfigurations, particularly
  within the context initialization functions.
- Added s and d copyv, setv, and swapv kernels to kernels/zen/1, and
  various minor updates to dotv and scalv kernels. Also added various
  sup kernels contributed by AMD to kernels/zen/3. However, these
  kernels are (for now) not yet used, in part because they caused
  AppVeyor clang failures, and also because I have not found time to
  review and vet them.
- Output the python found during configure into the definition of PYTHON
  in build/config.mk (via build/config.mk.in).
- Added early-return checks (A, B, or C with zero dimension; alpha = 0)
  to bli_gemm_front.c.
- Implemented explicit beta = 0 handling for the sgemm ukernel in
  bli_gemm_armv7a_int_d4x4.c, which was previously missing. This latent
  bug surfaced because the gemmt module verifies its computation using
  gemm with its beta parameter set to zero, which, on a cortexa15 system,
  caused the gemm kernel code to unconditionally multiply the
  uninitialized C data by beta. The C matrix likely contained
  non-numeric values such as NaN, which then would have resulted in a
  false failure.
- Fixed a bug whereby the implementation for bli_herk_determine_kc(),
  in bli_l3_blocksize.c, was inadvertently being defined in terms of
  helper functions meant for trmm. This bug was probably harmless since
  the trmm code should have also done the right thing for herk.
- Used cpp macros to neutralize the various AOCL_DTL_TRACE_ macros in
  kernels/zen/3/bli_gemm_small.c since those macros are not used in
  vanilla BLIS.
- Added cpp guard to definition of bli_mem_clear() in bli_mem.h to
  accommodate C++'s stricter type checking.
- Added cpp guard to test/*.c drivers to facilitate compilation on
  Windows systems.
- Various whitespace changes.
2020-11-14 09:39:48 -06:00
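The gemmt operation and the beta = 0 fix described in 88ad841434 can be illustrated together with a small sketch. This is not the BLIS implementation: the function name ref_dgemmt_lower, the row-major layout, and the argument order are all assumptions made for illustration. It updates only the lower triangle of C and, when beta is zero, overwrites C rather than scaling it, so that non-numeric values in uninitialized storage cannot leak into the result.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference gemmt for row-major double matrices:
   C := beta*C + alpha*A*B, updating only the lower triangle of the
   n-by-n matrix C (entries with j <= i). A is n-by-k, B is k-by-n. */
void ref_dgemmt_lower(size_t n, size_t k,
                      double alpha, const double *a, const double *b,
                      double beta, double *c)
{
    assert(a != NULL && b != NULL && c != NULL);

    for (size_t i = 0; i < n; ++i) {
        for (size_t j = 0; j <= i; ++j) {
            /* Inner product of row i of A with column j of B. */
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i * k + p] * b[p * n + j];

            /* When beta is zero, overwrite C instead of scaling it, so
               NaN or Inf in uninitialized C cannot propagate. */
            if (beta == 0.0)
                c[i * n + j] = alpha * ab;
            else
                c[i * n + j] = beta * c[i * n + j] + alpha * ab;
        }
    }
}
```

Entries strictly above the diagonal are never read or written, which is the property that distinguishes gemmt from a full gemm.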
Field G. Van Zee
234b8b0cf4 Increased dotxaxpyf testsuite thresholds.
Details:
- Increased the test thresholds used by the dotxaxpyf testsuite module
  by a factor of five in order to avoid residuals that unnecessarily
  fall in the MARGINAL range. This commit should fix #455. Thanks to
  @nagsingh for reporting this issue.
2020-11-12 19:11:16 -06:00
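To show why scaling the thresholds moves borderline residuals out of the MARGINAL range, here is a minimal sketch. It assumes a two-threshold scheme (one bound separating pass from marginal, one separating marginal from fail); the type names, function names, and any numeric values used with them are hypothetical and are not taken from the testsuite.

```c
#include <assert.h>

typedef enum { RESULT_PASS, RESULT_MARGINAL, RESULT_FAIL } result_t;

/* Hypothetical pair of thresholds: residuals above 'fail' are failures,
   residuals between 'pass' and 'fail' are marginal. */
typedef struct { double pass; double fail; } thresh_t;

result_t classify_residual(double resid, thresh_t t)
{
    assert(t.pass <= t.fail);
    if (resid > t.fail) return RESULT_FAIL;
    if (resid > t.pass) return RESULT_MARGINAL;
    return RESULT_PASS;
}

/* Scale both bounds by the same factor (five in the commit above). */
thresh_t scale_thresholds(thresh_t t, double factor)
{
    thresh_t s = { t.pass * factor, t.fail * factor };
    return s;
}
```

A residual that sits slightly above the original pass bound is classified as marginal before scaling and as a pass afterwards, while a residual far above the fail bound is still reported as a failure.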
Field G. Van Zee
ed612dd82c Updated README.md with sgemmsup blurb.
Details:
- Added an entry to the "What's New" section of the README.md to
  announce the availability of sgemmsup.
2020-11-07 13:09:42 -06:00
Field G. Van Zee
e14424f55b Merge branch 'dev' 2020-11-07 13:02:50 -06:00
Field G. Van Zee
0cfe1aac22 Relocated operation index to ToC in API docs.
Details:
- Moved the "Operation index" section of both the BLISObjectAPI.md and
  BLISTypedAPI.md docs to appear immediately after the table of contents
  of each document. This allows the reader to quickly jump to the
  documentation for any operation without having to scroll through much
  of the document (when rendered via a web browser).
- Fixed a mistake in the BLISObjectAPI.md for the setd operation, which
  does *not* observe the diag property of its matrix argument. Thanks to
  Jeff Diamond for reporting this.
2020-10-30 17:10:36 -05:00
Field G. Van Zee
2a0682f8e5 Implemented runtime subconfig selection (#451).
Details:
- Implemented support for the user manually overriding the automatic
  subconfiguration selection that happens at runtime. This override
  can be requested by setting the BLIS_ARCH_TYPE environment variable.
  The variable must be set to the arch_t id (as enumerated in
  bli_type_defs.h) corresponding to the desired subconfiguration. If a
  value outside this enumerated range is given, BLIS will abort with an
  error message. If the value is in the valid range but corresponds to a
  subconfiguration that was not activated at configure-time/compile-time,
  BLIS will abort with a (different) error message. Thanks to decandia50
  for suggesting this feature via issue #451.
- Defined a new function bli_gks_lookup_id to return the address of an
  internal data structure within the gks. If this address is NULL, then
  it indicates that the subconfig corresponding to the arch_t id passed
  into the function was not compiled into BLIS. This function is used
  in the second of the two abort scenarios described above.
- Defined the enumerated error code BLIS_UNINITIALIZED_GKS_CNTX, which
  is returned for the latter of the two abort scenarios mentioned above,
  along with a corresponding error message and a function to perform
  the error check.
- Added cpp macro branching to bli_env.c to support compilation of the
  auto-detect.x executable during configure-time. This cpp branch is
  similar to the cpp code already found in bli_arch.c and bli_cpuid.c.
- Cleaned up the auto_detect() function to facilitate easier maintenance
  going forward. Also added a convenient debug switch that outputs the
  compilation command for the auto-detect.x executable and exits.
2020-10-18 18:04:03 -05:00
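The override logic described in 2a0682f8e5 can be sketched roughly as follows. Everything here apart from the BLIS_ARCH_TYPE variable name is a hypothetical stand-in: the three-entry enumeration, the status codes, and select_arch() are invented for illustration, and treating a non-numeric value as out of range is an assumption. The table of pointers mimics a lookup that yields NULL for a subconfiguration that was not compiled in.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

/* Hypothetical stand-in for the arch_t enumeration. */
enum { ARCH_A = 0, ARCH_B = 1, ARCH_C = 2, NUM_ARCHS = 3 };

typedef enum {
    SEL_OK,            /* override accepted */
    SEL_NOT_SET,       /* variable absent: fall back to auto-detection */
    SEL_OUT_OF_RANGE,  /* first abort scenario */
    SEL_NOT_COMPILED   /* second abort scenario */
} sel_status_t;

/* compiled[i] is non-NULL only if subconfig i was built into the library. */
sel_status_t select_arch(const char *value,
                         const void *const compiled[NUM_ARCHS],
                         int *id_out)
{
    assert(compiled != NULL && id_out != NULL);

    if (value == NULL)
        return SEL_NOT_SET;

    char *end = NULL;
    errno = 0;
    long id = strtol(value, &end, 10);

    /* Assumption: unparsable input is rejected like an out-of-range id. */
    if (errno != 0 || end == value || *end != '\0' || id < 0 || id >= NUM_ARCHS)
        return SEL_OUT_OF_RANGE;

    if (compiled[id] == NULL)
        return SEL_NOT_COMPILED;

    *id_out = (int)id;
    return SEL_OK;
}
```

A caller would pass getenv("BLIS_ARCH_TYPE") as the first argument and abort with a distinct message for each of the two error statuses.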
Field G. Van Zee
eccdd75a2d Whitespace tweak in docs/PerformanceSmall.md. 2020-10-09 15:44:16 -05:00
Field G. Van Zee
7677e9ba60 Merge branch 'dev' of github.com:flame/blis into dev 2020-10-09 15:41:25 -05:00
Field G. Van Zee
addcd46b05 Added Epyc 7742 Zen2 ("Rome") sup perf results.
Details:
- Added single-threaded and multithreaded sup performance results to
  docs/PerformanceSmall.md for both sgemm and dgemm. These results were
  gathered on an Epyc 7742 "Rome" server featuring AMD's Zen2
  microarchitecture. Special thanks to Jeff Diamond for facilitating
  access to the system via the Oracle Cloud.
- Updates to octave scripts in test/sup/octave for use with Octave 5.2
  and for use with subplot_tight().
- Minor updates to octave scripts in test/3/octave.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
- Decreased line thickness slightly in large/conventional Zen2 graphs.
  I'm done tweaking those this time. Really.
- Added missing line regarding eigen header installation for each
  microarchitecture section.
2020-10-09 15:41:09 -05:00
Field G. Van Zee
a0849d390d Register l3 sup kernels in zen2 subconfig.
Details:
- Registered full suite of sgemm and dgemm sup millikernels, blocksizes,
  and crossover thresholds in bli_cntx_init_zen2.c.
- Minor updates to test/sup/runme.sh for running on Zen2 Epyc 7742
  system.
2020-10-09 20:22:17 +00:00
Field G. Van Zee
d98368c32d Another tweak to line thickness of Zen2 graphs. 2020-10-08 19:05:51 -05:00
Field G. Van Zee
1855dfbdaa Tweaked line thickness in Zen2 graphs once more.
Details:
- Decreased (relative to previous commit) line thickness in recent Zen2
  graphs.
2020-10-08 19:01:00 -05:00
Field G. Van Zee
0991611e7e Increased line thickness in recent Zen2 graphs.
Details:
- Increased the width of the lines in the graphs introduced in 74ec6b8.
2020-10-08 18:54:49 -05:00
Field G. Van Zee
8273cbacd7 README.md, docs/FAQ.md updates.
Details:
- Added a frequently asked question to docs/FAQ.md regarding the
  difference between upstream (vanilla) BLIS and AMD BLIS.
- Updated the name of ICES in the README.md to reflect the Oden
  rebranding.
2020-10-07 14:51:33 -05:00
Field G. Van Zee
a178a822ad Added Zen2 links to docs/Performance.md Contents. 2020-09-30 16:00:52 -05:00
Field G. Van Zee
74ec6b8f45 Added Epyc 7742 Zen2 ("Rome") performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on an Epyc 7742
  "Rome" server with AMD's Zen2 microarchitecture. Special thanks
  to Jeff Diamond for facilitating access to the system via the
  Oracle Cloud.
- Renamed files containing the previous Zen performance results for
  consistency with the new results.
2020-09-30 15:54:18 -05:00
Field G. Van Zee
bc4a213a2c Updated matlab (now octave) plot code in test/3.
Details:
- Renamed test/3/matlab to test/3/octave.
- Within test/3, updated and tuned plot_l3_perf.m and plot_panel_4x5.m
  files for use with octave (which is free and doesn't crash on me
  mid-way through my use of subplot).
- Updated runthese.m scratchpad for zen2 invocations.
- Added Nikolay S.'s subplot_tight() function, along with its license.
2020-09-30 15:28:20 -05:00
Field G. Van Zee
c77ddc4181 Added optional numactl usage to test/3/runme.sh. 2020-09-30 20:15:43 +00:00
Nicholai Tukanov
2d8ec164e7 Add POWER10 support to BLIS (#450) 2020-09-29 16:52:18 -05:00
Field G. Van Zee
4fd8d9fec2 Tweaked zen2 subconfig's MC cache blocksizes.
Details:
- Updated the MC cache blocksizes registered by the 'zen2' subconfig.
- Minor updates to test/3/Makefile and test/3/runme.sh.
2020-09-28 23:39:05 +00:00
Field G. Van Zee
5efcdeffd5 More minor README.md updates. 2020-09-25 14:25:24 -05:00
Field G. Van Zee
9e940f8aad Added 1m SISC bibtex to README.md.
Details:
- Added final citation info to 1m bibtex in README.md file.
- Updated draft 1m paper link.
- Changed some http to https.
2020-09-25 13:53:35 -05:00
Field G. Van Zee
e293cae2d1 Implemented sgemmsup assembly kernels.
Details:
- Created a set of single-precision real millikernels and microkernels
  comparable to the dgemmsup kernels that already exist within BLIS.
- Added prototypes for all kernels within bli_kernels_haswell.h.
- Registered entry-point millikernels in bli_cntx_init_haswell.c and
  bli_cntx_init_zen.c.
- Added sgemmsup support to the Makefile, runme.sh script, and source
  file in test/sup. This included edits that allow for separate "small"
  dimensions for single- and double-precision as well as for single-
  vs. multithreaded execution.
2020-09-15 16:09:11 -05:00
Field G. Van Zee
2765c6f37c Type saga continues; fixed sgemm ukernel signature.
Details:
- Changed double* pointers in sgemm function signature to float*. At
  this point I've lost track of whether this was my fault or another
  dormant bug like the one described in ece9f6a, but at this point I
  no longer care. It's one of those days (aka I didn't ask for this).
2020-09-12 17:48:15 -05:00
Field G. Van Zee
0779559509 Fixed missing restrict in knl sgemm prototype.
Details:
- Added a missing 'restrict' qualifier in the sgemm ukernel prototype
  for knl. (Not sure how that code was ever compiling before now.)
2020-09-12 17:37:21 -05:00
Field G. Van Zee
ece9f6a3ef Fixed dormant type bugs in bli_kernels_knl.h.
Details:
- Fixed dormant type mismatches in the use of the prototype-generating
  macros in bli_kernels_knl.h. Specifically, some float prototypes
  were incorrectly using double as their ctype. This didn't actually
  matter until the type changes in 645d771, as previously those types
  were not used since packm was prototyped with void* pointers.
2020-09-12 17:22:42 -05:00
Field G. Van Zee
8ebb3b60e1 Fixed accidental breakage in 645d771.
Details:
- In trying to clean up kappa_cast variables in the reference packm
  kernels, which I initially believed to be redundant given the other
  void* -> ctype* changes in 645d771, I accidentally ended up violating
  restrict semantics for 1e/1r packing and possibly other packm kernels.
  (Normally, my pre-commit testsuite run would have caught this, but I
  was unknowingly using an edited input.operations file in which I'd
  disabled most tests as part of unrelated work.) This commit reverts
  the kappa_cast changes in 645d771.
2020-09-12 17:00:47 -05:00
Field G. Van Zee
645d771a14 Minor packm kernel type cleanup (void* -> ctype*).
Details:
- Changed all void* function arguments in reference packm kernels to
  those of the native type (ctype*). These pointers no longer need to
  be void* and are better represented by their native types anyway.
  (See below for details.) Updated knl packm kernels accordingly.
- In the definition of the PACKM_KER_PROT prototype macro template in
  frame/1m/bli_l1m_ker_prot.h, changed the pointer types for kappa, a,
  and p from void* to ctype*. They were originally void* because these
  function signatures had to share the same type so they could all be
  stored in a single array of that shared type, from which they were
  queried and called by packm_cxk(). This is no longer how the function
  pointers are stored, and so it no longer makes sense to force the
  caller of packm kernels to use void*, only so that the implementor
  of the packm kernels can typecast back to the native datatype within
  the kernel definition. This change has no effect internally within
  BLIS because currently all packm kernels are called after querying
  the function addresses from the context and then typecasting to the
  appropriate function pointer type, which is based upon type-specific
  function pointers like float* and double*.
- Removed a comment in frame/1m/bli_l1m_ft_ker.h that was outdated and
  misleading due to changes to the handling of packm kernels since
  moving them into the context.
2020-09-12 15:31:56 -05:00
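The pattern described in 645d771a14 (kernels declared with native pointer types, stored generically, and cast back to a type-specific function pointer before the call) might look like the sketch below. The names generic_fp, spackm_ft, ctx_t, spackm_sketch, and pack_with_ctx are hypothetical, and the kernel body is a trivial scaling loop rather than a real packing routine.

```c
#include <assert.h>
#include <stddef.h>

/* Generic function-pointer type used only for storage (hypothetical). */
typedef void (*generic_fp)(void);

/* Type-specific kernel signature: native float* rather than void*. */
typedef void (*spackm_ft)(size_t n, const float *kappa,
                          const float *a, float *p);

/* Hypothetical stand-in for a packm kernel: p := kappa * a. */
void spackm_sketch(size_t n, const float *kappa, const float *a, float *p)
{
    for (size_t i = 0; i < n; ++i)
        p[i] = *kappa * a[i];
}

/* Hypothetical context holding one kernel slot. */
typedef struct { generic_fp packm_ker; } ctx_t;

void pack_with_ctx(const ctx_t *ctx, size_t n, float kappa,
                   const float *a, float *p)
{
    assert(ctx != NULL && ctx->packm_ker != NULL);

    /* Query the address, then cast back to the type-specific signature
       before calling, so the kernel itself never needs void* arguments. */
    spackm_ft ker = (spackm_ft)ctx->packm_ker;
    ker(n, &kappa, a, p);
}
```

Because the cast happens at the call site, neither the caller nor the kernel author has to convert between void* and the native datatype.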
Field G. Van Zee
54bf6c3554 Minor README.md update.
Details:
- Added a new entry to the "What people are saying about BLIS" section.
2020-09-10 15:42:01 -05:00
Field G. Van Zee
e50b4d4046 Minor update to README.md (SIAM Best Paper Prize). 2020-09-09 14:12:53 -05:00
Devin Matthews
a8efb72074 Merge pull request #434 from flame/intel-zdot
Add an option to change the complex return type.
2020-09-07 16:18:19 -05:00
Field G. Van Zee
97e87f2c9f Whitespace/comment updates to #434 PR. 2020-09-07 15:56:42 -05:00
Devin Matthews
b0c4da1732 Merge pull request #436 from flame/s390x
Add checks so that s390x is detected as 64-bit.
2020-09-07 15:47:54 -05:00
Field G. Van Zee
810e90ee80 Minor README.md update.
Details:
- Added HPE to list of funders.
- Changed http to https in funders' website links.
2020-09-01 16:11:40 -05:00
Devin Matthews
7d41128219 Use -O2 for all framework code. (#435)
It seems that -O3 might be causing intermittent problems with the f2c'ed packed and banded code. -O3 is retained for kernel code. Fixes #341 and fixes #342.
2020-08-13 17:50:58 -05:00
Dave Love
9c5b485d35 Don't override -mcpu with -march on ARM (#353)
* Use -mcpu for ARM
See the GCC doc about -march, -mtune, and -mcpu and maybe
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu

* Fix typo in flags

* Fix typo in cortexa9 flags

* Modify cortexa53 compilation flags to fix failing BLAS check (#341)
2020-08-07 15:11:18 -05:00
Devin Matthews
c253d14a72 Also handle Intel-style complex return in CBLAS interface. 2020-08-07 09:39:04 -05:00
Devin Matthews
5d653a11a0 Update Multithreading.md
Addresses the issue raised in #426.
2020-08-06 17:58:26 -05:00
Devin Matthews
b1b5870dd3 Add checks so that s390x is detected as 64-bit. 2020-08-06 17:34:20 -05:00
Field G. Van Zee
882dcb11bf Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
  documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
2020-08-06 17:28:14 -05:00
Field G. Van Zee
f4894512e5 Very minor updates to previous commit. 2020-08-06 17:20:00 -05:00
Field G. Van Zee
adedb893ae Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
  BLISObjectAPI.md. Previously, only accessor functions were documented.
  Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
  (08level2.c and 09level3.c).
2020-08-06 17:14:01 -05:00
Devin Matthews
5b5278ff49 Use #ifdef instead of #if as macro may be undefined. 2020-08-06 14:19:37 -05:00
Devin Matthews
7fdc0fc893 Add an option to change the complex return type.
ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.
2020-08-06 14:09:23 -05:00
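The two return conventions contrasted in 7fdc0fc893 can be sketched in C as a pair of functions with different shapes. These are simplified illustrations, not the actual BLAS signatures: the names zdotu_gnu_style and zdotu_intel_style are hypothetical, and the increment arguments and Fortran pass-by-reference details are omitted.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>

/* GNU-style convention: the complex result is returned by value. */
double complex zdotu_gnu_style(size_t n, const double complex *x,
                               const double complex *y)
{
    double complex sum = 0.0;
    for (size_t i = 0; i < n; ++i)
        sum += x[i] * y[i];   /* unconjugated dot product */
    return sum;
}

/* Intel-style convention: the result is written through an extra
   leading parameter, and the function itself returns nothing. */
void zdotu_intel_style(double complex *result, size_t n,
                       const double complex *x, const double complex *y)
{
    assert(result != NULL);
    *result = zdotu_gnu_style(n, x, y);
}
```

The two functions compute the same value but are not call-compatible, which is why a single library build cannot serve both conventions.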
Field G. Van Zee
6e522e5823 Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-07-30 19:31:37 -05:00
Field G. Van Zee
00e14cb6d8 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-07-29 14:24:34 -05:00
Field G. Van Zee
2c554c2fce Redefined bool_t typedef in terms of C99 bool.
Details:
- Changed the typedef that defines bool_t from:

    typedef gint_t bool_t;

  where gint_t is a signed integer that forms the basis of most other
  integers in BLIS, to:

    typedef bool bool_t;

- Changed BLIS's TRUE and FALSE macro definitions from being in terms of
  integer literals:

    #define TRUE  1
    #define FALSE 0

  to being in terms of C99 boolean constants:

    #define TRUE  true
    #define FALSE false

  which are provided by stdbool.h.
- This commit constitutes the second phase of a transition toward using
  C99's bool instead of bool_t, which will address issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7.
2020-07-24 15:57:19 -05:00
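One reason a typecast cleanup is a sensible precursor to this change is that conversion to C99 bool normalizes any nonzero value to 1, whereas an integer-based typedef preserves the value. The sketch below illustrates that difference only; old_bool_t, new_bool_t, and the helper functions are hypothetical, and the commit does not state which casts in BLIS were affected.

```c
#include <assert.h>
#include <stdbool.h>

typedef long old_bool_t;  /* hypothetical stand-in for an integer-based bool_t */
typedef bool new_bool_t;  /* bool_t after the redefinition */

/* With an integer typedef, the masked bits survive the cast unchanged. */
old_bool_t is_flag_set_old(unsigned flags, unsigned mask)
{
    assert(mask != 0);
    return (old_bool_t)(flags & mask);
}

/* With C99 bool, any nonzero value converts to exactly 1 (true). */
new_bool_t is_flag_set_new(unsigned flags, unsigned mask)
{
    assert(mask != 0);
    return (new_bool_t)(flags & mask);
}
```

Code that compared the old result against the literal 1, or used it as an array index, would behave differently once the typedef changed, which is consistent with the later promotion of index-like uses to dim_t.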
Field G. Van Zee
e01dd12558 Fail-safe updates to Makefiles in 'test' dir.
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
  the usual targets without having first built BLIS results in a helpful
  error message. For example, if BLIS is not yet configured, make will
  output:

    Makefile:327: *** Cannot proceed: config.mk not detected! Run
    configure first.  Stop.

  Similarly, if BLIS is configured but not yet built, make will output:

    Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
    make first.  Stop.

  In previous commits, these actions would result in a rather cryptic
  make error such as:

    make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
    needed by 'blis-nat-st'.  Stop.
2020-07-24 15:41:46 -05:00
Devin Matthews
b4f47f7540 Add BLIS_EXPORT_BLIS to bli_abort. (#429)
Fixes #428.
2020-07-24 13:56:13 -05:00