Commit Graph

2125 Commits

Author SHA1 Message Date
RuQing Xu
4bfadf9b56 Firestorm Block Size Fixes 2021-10-06 01:51:26 +09:00
RuQing Xu
40baf83f0e Armv8 Handle *beta == 0 for GEMMSUP ??r Case. 2021-10-06 01:00:52 +09:00
Devin Matthews
079fbd42ce Merge branch 'master' into arm64-hi-bw 2021-10-04 17:21:48 -05:00
Devin Matthews
9905f44347 Merge pull request #553 from flame/rpath-fix
Add an option to use an @rpath-dependent install_name on macOS
2021-10-04 15:58:59 -05:00
Devin Matthews
6d3036e31d Merge pull request #545 from hominhquan/clean_error
bli_error: more cleanup on the error strings array
2021-10-04 15:58:43 -05:00
Devin Matthews
53377fcca9 Merge pull request #554 from flame/armsve-cleanup
Move unused ARM SVE kernels to "old" directory.
2021-10-04 15:45:53 -05:00
Devin Matthews
80c5366e4a Move unused ARM SVE kernels to "old" directory. 2021-10-04 15:40:28 -05:00
Devin Matthews
64a421f698 Add an option to control whether or not to use @rpath.
Adds `--enable-rpath`/`--disable-rpath` (default: disabled) to use an install_name starting with @rpath/. Otherwise, the install_name is set to the absolute path of the installed library, which was the previous behavior.
2021-10-04 13:40:43 -05:00
Devin Matthews
c4a31683dd Fix $ORIGIN usage on linux. 2021-10-04 13:27:10 -05:00
Dave Love
d0a0b4b841 Arm micro-architecture dispatch (#344)
Details:
- Reworked support for ARM hardware detection in bli_cpuid.c to parse 
  the result of a CPUID-like instruction.
- Added a64fx support to bli_gks.c.
- #include arm64 and arm32 family headers from bli_arch_config.h.
- Fix the ordering of the "armsve" and "a64fx" strings in the 
  config_name string array in bli_arch.c. The ordering did not match
  the ordering of the corresponding arch_t values in bli_type_defs.h,
  as it should have all along.
- Added clang support to make_defs.mk in arm64, cortexa53, cortexa57 
  subconfigs.
- Updated arm64 and arm32 families in config_registry.
- Updated docs/HardwareSupport.md to reflect added ARM support.
- Thanks to Dave Love, RuQing Xu, and Devin Matthews for their
  contributions in this PR (#344).
2021-10-04 13:03:04 -05:00
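The config_name ordering bug fixed above is typical of any string table kept in parallel with an enum. The C sketch below shows one way to make such a table resistant to that drift. It is not the actual BLIS code: `arch_id_t`, its enumerators, and the two lookup helpers are hypothetical. Designated initializers bind each string to its enumerator, a trailing sentinel counts the entries, and a C11 `_Static_assert` rejects a table shorter than the enum.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical, trimmed-down architecture enum (not BLIS's arch_t).
   The last enumerator is a sentinel equal to the number of entries. */
typedef enum
{
    ARCH_CORTEXA53 = 0,
    ARCH_CORTEXA57,
    ARCH_ARMSVE,
    ARCH_A64FX,
    ARCH_GENERIC,
    NUM_ARCHS
} arch_id_t;

/* Designated initializers pair each string with its enumerator, so
   reordering the enum cannot silently mismatch the strings. */
static const char* const config_name[] =
{
    [ARCH_CORTEXA53] = "cortexa53",
    [ARCH_CORTEXA57] = "cortexa57",
    [ARCH_ARMSVE]    = "armsve",
    [ARCH_A64FX]     = "a64fx",
    [ARCH_GENERIC]   = "generic",
};

/* Fails to compile if an enumerator was added at the end without a string. */
_Static_assert( sizeof config_name / sizeof config_name[0] == NUM_ARCHS,
                "config_name needs one entry per architecture" );

/* Hypothetical helper: enum -> string. */
static const char* arch_string( arch_id_t id )
{
    assert( (unsigned)id < (unsigned)NUM_ARCHS );
    assert( config_name[ id ] != NULL );  /* catches a gap in the table */
    return config_name[ id ];
}

/* Hypothetical helper: string -> enum; returns NUM_ARCHS if not found. */
static arch_id_t arch_from_string( const char* s )
{
    for ( int i = 0; i < NUM_ARCHS; ++i )
        if ( config_name[ i ] != NULL && strcmp( config_name[ i ], s ) == 0 )
            return (arch_id_t)i;
    return NUM_ARCHS;
}
```

The commit itself fixed the problem by reordering the strings; the designated-initializer form is an alternative that lets the compiler enforce the correspondence.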
Devin Matthews
91408d161a Use @rpath-based install name on macOS and use relocatable RPATH entries for testsuite binaries.
- RPATH entries (and DYLD_LIBRARY_PATH) do nothing on macOS unless the install_name of the library starts with @rpath/. While the install_name can be set to the absolute install path, this makes the installation non-relocatable. When using @rpath in the install_name, install paths within the normal DYLD_LIBRARY_PATH work with no changes on the user side, but for install paths off the beaten track, users must specify an RPATH entry when linking (or modify DYLD_LIBRARY_PATH at runtime). Perhaps this could be made into a configure-time option.
- Having relocatable testsuite binaries is not necessarily a priority, but it is easy to achieve with @executable_path (macOS) or $ORIGIN (Linux/BSD).
2021-10-04 11:37:48 -05:00
RuQing Xu
f5c03e9fe8 Armv8 Handle *beta == 0 for GEMMSUP ?rc Case. 2021-10-03 16:51:51 +09:00
RuQing Xu
abc648352c Armv8 Fix 6x8 Row-Maj Ukr
- Fixed for 6x8 only; 4x4 and 4x8 are still pending.
- Installed in the firestorm config, since benchmarks appear to show better performance:
   Old:
blis_dgemm_ukr_c                     6     8   320    36.87   2.43e-17   PASS
blis_dgemm_ukr_c                     6     8   352    40.55   1.04e-17   PASS
blis_dgemm_ukr_c                     6     8   384    44.24   5.68e-17   PASS
blis_dgemm_ukr_c                     6     8   416    41.67   3.51e-17   PASS
blis_dgemm_ukr_c                     6     8   448    34.41   2.94e-17   PASS
blis_dgemm_ukr_c                     6     8   480    42.53   2.35e-17   PASS

   New:
blis_dgemm_ukr_r                     6     8   352    50.69   1.59e-17   PASS
blis_dgemm_ukr_r                     6     8   384    49.15   5.55e-17   PASS
blis_dgemm_ukr_r                     6     8   416    50.44   2.86e-17   PASS
blis_dgemm_ukr_r                     6     8   448    46.92   3.12e-17   PASS
blis_dgemm_ukr_r                     6     8   480    48.08   4.08e-17   PASS
2021-10-03 13:14:19 +09:00
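For reference, here is a plain-C sketch of what a 6x8 double-precision gemm micro-kernel computes under the usual BLIS-style contract: a packed MR x k micro-panel of A, a packed k x NR micro-panel of B, and C addressed through general strides rs_c/cs_c, so row-major C has cs_c == 1. It is a hedged illustration, not the Armv8 assembly kernel; the function name `dgemm_ref_6x8` is hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define MR 6
#define NR 8

/* Hypothetical reference micro-kernel: C := beta*C + alpha*A*B, where
   - a is a packed MR x k micro-panel (column p starts at a + p*MR),
   - b is a packed k x NR micro-panel (row p starts at b + p*NR),
   - C is addressed via row stride rs_c and column stride cs_c
     (row-major: cs_c == 1; column-major: rs_c == 1). */
static void dgemm_ref_6x8( size_t k, double alpha,
                           const double* a, const double* b,
                           double beta, double* c,
                           ptrdiff_t rs_c, ptrdiff_t cs_c )
{
    assert( a != NULL && b != NULL && c != NULL );

    double ab[ MR ][ NR ] = { { 0.0 } };

    /* Accumulate k rank-1 updates into a local MR x NR tile. */
    for ( size_t p = 0; p < k; ++p )
        for ( size_t i = 0; i < MR; ++i )
            for ( size_t j = 0; j < NR; ++j )
                ab[ i ][ j ] += a[ p * MR + i ] * b[ p * NR + j ];

    /* Write back. With row-major C the inner j loop walks contiguous
       memory. When beta == 0, C is overwritten without being read. */
    for ( size_t i = 0; i < MR; ++i )
        for ( size_t j = 0; j < NR; ++j )
        {
            double* cij = c + (ptrdiff_t)i * rs_c + (ptrdiff_t)j * cs_c;
            *cij = ( beta == 0.0 ) ? alpha * ab[ i ][ j ]
                                   : beta * *cij + alpha * ab[ i ][ j ];
        }
}
```

Calling it with `rs_c = NR, cs_c = 1` updates a row-major 6x8 tile; swapping the strides gives the column-major case.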
Devin Matthews
0a45bc0fbc Merge pull request #552 from flame/armsve_beta_0
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
2021-10-02 18:59:43 -05:00
Devin Matthews
13dbd5b5d3 Apply patch from @xrq-phys. 2021-10-02 16:08:05 -05:00
Devin Matthews
ae0eeeaf77 Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. 2021-09-29 16:43:38 -05:00
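Several commits here add explicit beta == 0 handling. The reason is that, by BLAS convention, beta == 0 means C is output-only, yet the naive update `beta*c + alpha*ab` still reads C, and 0 * NaN (or 0 * Inf) is NaN under IEEE 754 arithmetic. A minimal scalar sketch with hypothetical function names (not the kernels' actual code):

```c
#include <assert.h>
#include <math.h>

/* Hypothetical scalar form of a micro-kernel's final update step. */
static double update_naive( double beta, double c, double alpha, double ab )
{
    return beta * c + alpha * ab;  /* reads c even when beta == 0 */
}

static double update_beta0_aware( double beta, double c, double alpha, double ab )
{
    /* beta == 0: ignore the old contents of C entirely, so garbage
       (uninitialized memory, NaN, Inf) cannot leak into the result. */
    if ( beta == 0.0 ) return alpha * ab;
    return beta * c + alpha * ab;
}

/* Returns nonzero if the naive update lets a NaN in C survive beta == 0. */
static int naive_leaks_nan( void )
{
    return isnan( update_naive( 0.0, NAN, 1.0, 3.0 ) ) != 0;
}
```

In a real kernel the branch is usually taken once per micro-tile, so the cost of the check is negligible.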
Field G. Van Zee
5013a6cb71 More edits and fixes to docs/FAQ.md. 2021-09-29 10:38:50 -05:00
Field G. Van Zee
b36fb0fbc5 Fixed newly broken link to CREDITS in FAQ.md. 2021-09-28 18:47:45 -05:00
Field G. Van Zee
3442d4002b More minor fixes to FAQ.md and Sandboxes.md. 2021-09-28 18:43:23 -05:00
Field G. Van Zee
89aaf00650 Updates to FAQ.md, Sandboxes.md, and README.md.
Details:
- Updated FAQ.md to include two new questions, reordered an existing
  question, and also removed an outdated and redundant question about
  BLIS vs. AMD BLIS.
- Updated Sandboxes.md to use 'gemmlike' as its main example, along with
  other smaller details.
- Added ARM as a funder to README.md.
2021-09-28 18:34:33 -05:00
Field G. Van Zee
c52c43115e Merge branch 'dev' 2021-09-26 15:56:54 -05:00
Field G. Van Zee
1fc23d2141 Safelist 'master', 'dev', 'amd' branches.
Details:
- Modified .travis.yml so that only commits to 'master', 'dev', and
  'amd' branches get built by Travis CI. Thanks to Devin Matthews for
  helping to track down the syntax for this change.
2021-09-21 14:54:20 -05:00
Field G. Van Zee
1f527a93b9 Re-enable and fix fb93d24.
Details:
- Re-enabled the changes made in fb93d24.
- Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
  all of which needed the definition (in addition to config_detect.c) in
  order for the configure-time hardware detection binary to be compiled
  properly. Thanks to Minh Quan Ho for helping identify these additional
  files as needing to be updated.
- Added additional comments to all four source files, most notably to
  prompt the reader to remember to update all of the files when updating
  any of the files. Also made the cpp code in each of the files as
  consistent/similar as possible.
- Refer to issue #532 and PR #546 for more history.
2021-09-20 17:56:36 -05:00
Field G. Van Zee
7b39c14920 Reverted fb93d24.
Details:
- The latest changes in fb93d24 are still causing problems. Reverting
  and preparing to move them to a branch.
2021-09-20 16:13:50 -05:00
Field G. Van Zee
fb93d242a4 Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
Details:
- Re-enable the changes originally made in 8e0c425 but quickly reverted
  in 2be78fc.
- Moved the #include of bli_config.h so that it occurs before the
  #include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
  or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
  time it is needed in bli_system.h. This change should have been
  in the original 8e0c425, but was accidentally omitted. Thanks to Minh
  Quan Ho for catching this.
- Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
  cpp conditional branch executes in bli_system.h when compiling the
  hardware detection binary. The changes made in 8e0c425 were an attempt
  to support the definition of BLIS_OS_NONE when configuring with
  --disable-system (in issue #532).  That commit failed because, aside
  from the required but omitted header reordering (second bullet above),
  AppVeyor was unable to compile the hardware detection binary as a
  result of missing Windows headers. This commit, which builds on PR
  #546, should help fix that issue. Thanks to Minh Quan Ho for his
  assistance and patience on this matter.
2021-09-20 15:42:08 -05:00
Minh Quan HO
eaa554aa52 bli_error: more cleanup on the error strings array
- There was redundancy between the macro BLIS_MAX_NUM_ERR_MSGS (=200) and
  the enum BLIS_ERROR_CODE_MAX (-170), although both mean the same thing:
  the maximum number of error codes/messages.
- The previous compile-time initialization of the error messages overlooked
  that the 'bli_error_string' array still wasted memory because of its 2D
  char[][] declaration. Instead, it should be just an array of pointers to
  strings in the .rodata section.
- This commit makes two modifications:
   * retires the macros BLIS_MAX_NUM_ERR_MSGS and BLIS_MAX_ERR_MSG_LENGTH everywhere;
   * switches bli_error_string from char[][] to char *[] to reduce its footprint
     from 40KB (200*200) to 1.3KB (170*sizeof(char*)).
     (Using the enum BLIS_ERROR_CODE_MAX at compile time is not a problem,
     since the compiler can resolve the array size (170) at compile time.)
2021-09-20 10:39:05 +02:00
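The footprint difference described above can be checked directly with `sizeof`. In this hedged sketch the error codes, messages, and `MSG_LEN` are hypothetical (and positive, unlike BLIS's negative err_t values); only the two array layouts mirror the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; the sentinel gives the number of codes. */
enum { ERR_OK = 0, ERR_NULL_POINTER, ERR_BAD_DIM, ERR_CODE_MAX };

/* 2D layout: every row reserves MSG_LEN bytes, however short the text. */
#define MSG_LEN 200
static const char err_str_2d[ ERR_CODE_MAX ][ MSG_LEN ] =
{
    [ERR_OK]           = "Success.",
    [ERR_NULL_POINTER] = "Encountered unexpected null pointer.",
    [ERR_BAD_DIM]      = "Invalid matrix dimensions.",
};

/* Pointer layout: the array holds only ERR_CODE_MAX pointers; each string
   literal is stored once (typically in .rodata). An enum constant is an
   integer constant expression, so it can size the array at compile time. */
static const char* const err_str_ptr[ ERR_CODE_MAX ] =
{
    [ERR_OK]           = "Success.",
    [ERR_NULL_POINTER] = "Encountered unexpected null pointer.",
    [ERR_BAD_DIM]      = "Invalid matrix dimensions.",
};

/* Hypothetical lookup helper. */
static const char* error_string( int code )
{
    assert( code >= 0 && code < ERR_CODE_MAX );
    return err_str_ptr[ code ];
}
```

Here `sizeof err_str_2d` is 3 * 200 bytes, while `sizeof err_str_ptr` is 3 pointers plus the actual string bytes stored elsewhere.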
Field G. Van Zee
52f29f739d Removed last vestige of #define BLIS_NUM_ARCHS.
Details:
- Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
  and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
  part of the arch_t enum for some time now, and so this change is
  mostly about removing any opportunity for confusion for people who
  may be reading the code. Thanks to Minh Quan Ho for leading me to this
  cleanup.
2021-09-17 08:38:29 -05:00
Field G. Van Zee
849aae09f4 Added new packm var3 to 'gemmlike'.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
  variant (bls_l3_packm_var3.c) parallelizes the packing operation over
  the k dimension rather than the m or n dimensions. Note that the
  gemmlike implementation still uses var1 by default, and use of the new
  code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
  so that var3 is called instead. Thanks to Jeff Diamond for proposing
  this (perhaps NUMA-friendly) solution.
2021-09-16 14:47:45 -05:00
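To illustrate what "parallelizing the packing over the k dimension" can mean, here is a hedged C sketch; it is not bls_l3_packm_var3.c, and `partition_k` and `pack_a_krange` are hypothetical. Each thread takes a contiguous range of k and packs those columns into every MR-row micro-panel, so threads write disjoint parts of the packed buffer and need only a barrier once packing is done.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: split [0, k) into nt nearly equal contiguous ranges and
   return thread tid's half-open range [*kb, *ke). The first k % nt
   threads get one extra iteration. */
static void partition_k( size_t k, size_t nt, size_t tid,
                         size_t* kb, size_t* ke )
{
    assert( nt > 0 && tid < nt );
    size_t base = k / nt, rem = k % nt;
    *kb = tid * base + ( tid < rem ? tid : rem );
    *ke = *kb + base + ( tid < rem ? 1 : 0 );
}

/* Hypothetical: pack columns [kb, ke) of a column-major m x k matrix A
   (leading dimension lda) into micro-panels of mr rows. Micro-panel ip
   starts at ap + ip*ps, and its column p is stored at offset p*mr.
   Rows beyond m are zero-padded. */
static void pack_a_krange( size_t m, size_t k, size_t mr,
                           const double* a, size_t lda,
                           double* ap, size_t ps,
                           size_t kb, size_t ke )
{
    assert( ps >= mr * k && ke <= k );
    size_t n_panels = ( m + mr - 1 ) / mr;
    for ( size_t ip = 0; ip < n_panels; ++ip )
        for ( size_t p = kb; p < ke; ++p )
            for ( size_t i = 0; i < mr; ++i )
            {
                size_t row = ip * mr + i;
                ap[ ip * ps + p * mr + i ] =
                    ( row < m ) ? a[ row + p * lda ] : 0.0;
            }
}
```

Compared with splitting over m (whole micro-panels per thread), splitting over k gives every thread a slice of every panel, which is one way the approach might interact better with NUMA first-touch placement, as the commit speculates.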
Devin Matthews
b6f71fd378 Merge pull request #544 from flame/haswell-gemmsup-fpe
Fix more copy-paste errors in the haswell gemmsup code.
2021-09-16 12:24:33 -05:00
Devin Matthews
e3dc1954ff Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
The fix is to use the same (valid) source register twice in the horizontal addition.
2021-09-16 10:59:37 -05:00
Devin Matthews
5191c43fac Fix more copy-paste errors in the haswell gemmsup code.
Fixes #486.
2021-09-16 10:16:17 -05:00
Devin Matthews
9293a68eb6 Merge pull request #534 from flame/cxx_test
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible
2021-09-10 14:13:29 -05:00
Devin Matthews
98ce6e8bc9 Do a fast test on OSX. [ci skip] 2021-09-10 14:12:13 -05:00
Devin Matthews
c76fcad0c2 Fix AArch64 tests and consolidate some other tests. 2021-09-10 13:57:02 -05:00
Devin Matthews
e486d666ff Use C++ cross-compiler for ARM tests. 2021-09-10 13:50:16 -05:00
Devin Matthews
fbb3560cb8 Attempt to fix cxx-test for OOT builds. 2021-09-10 13:38:27 -05:00
Devin Matthews
9c0064f3f6 Fix config_name in bli_arch.c 2021-09-10 10:39:04 -05:00
Field G. Van Zee
ade10f4278 Updated travis-ci.org link in README.md to .com. 2021-08-27 12:47:12 -05:00
Field G. Van Zee
2be78fc977 Disabled (at least temporarily) commit 8e0c425.
Details:
- Reverted changes in 8e0c425 due to AppVeyor build failures that we do
  not yet understand.
2021-08-27 12:17:26 -05:00
RuQing Xu
820f11a469 Arm Whole GEMMSUP Call Route is Asm/Int Optimized
- `ref2` call in `bli_gemmsup_rv_armv8a_asm_d6x8m.c` is commented out.
- `bli_gemmsup_rv_armv8a_asm_d4x8m.c` contains a tail `ref2` call but
  it's not called by any upper routine.
2021-08-27 13:40:26 +09:00
Field G. Van Zee
8e0c4255de Define BLIS_OS_NONE when using --disable-system.
Details:
- Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
  when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
  detecting macro conditionals are considered. This change is to
  accommodate a solution to a cross-compilation issue described in
  #532.
2021-08-26 15:29:18 -05:00
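The pattern described above, falling back to a "no OS" macro when system support is disabled, can be sketched with the preprocessor alone. All `SKETCH_*` names are hypothetical stand-ins, not the contents of bli_system.h. As the later fb93d24 commit notes, a disable macro like this only takes effect if the header defining it is processed before this block.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical OS selection. SKETCH_DISABLE_SYSTEM would normally come
   from a generated config header that must be #included first. */
#if defined(SKETCH_DISABLE_SYSTEM)
  #define SKETCH_OS_NONE 1      /* e.g. bare-metal cross-compilation */
#elif defined(_WIN32) || defined(__CYGWIN__)
  #define SKETCH_OS_WINDOWS 1
#elif defined(__APPLE__)
  #define SKETCH_OS_MACOS 1
#elif defined(__linux__)
  #define SKETCH_OS_LINUX 1
#else
  #define SKETCH_OS_OTHER 1
#endif

/* Reports which branch the preprocessor selected. */
static const char* detected_os( void )
{
#if defined(SKETCH_OS_NONE)
    return "none";
#elif defined(SKETCH_OS_WINDOWS)
    return "windows";
#elif defined(SKETCH_OS_MACOS)
    return "macos";
#elif defined(SKETCH_OS_LINUX)
    return "linux";
#else
    return "other";
#endif
}

/* Nonzero when code may rely on OS services (threads, files, ...). */
static int have_os_services( void )
{
    return strcmp( detected_os(), "none" ) != 0;
}
```

Compiling the same file with `-DSKETCH_DISABLE_SYSTEM` selects the "none" branch regardless of the host OS.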
Field G. Van Zee
d6eb70fbc3 Updated stale calls to malloc_intl() in gemmlike.
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
  sandbox. These calls to malloc_intl(), which resided in
  bls_l3_decor_pthreads.c, were missing the err_t argument that the
  function uses to report errors. Thanks to Jeff Diamond for helping
  isolate this issue.
2021-08-26 13:12:39 -05:00
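The fix above concerns an allocator that reports errors through an out-parameter. Here is a hedged sketch of that calling convention with hypothetical names (`err_t_sketch`, `malloc_intl_sketch`); the real bli_malloc_intl() signature should be checked in the BLIS sources. When the prototype is visible, a C compiler rejects a call that omits the error argument, which is how stale call sites tend to surface.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical error type for this sketch. */
typedef enum
{
    ERR_SUCCESS              =  0,
    ERR_MALLOC_RETURNED_NULL = -1
} err_t_sketch;

/* Hypothetical allocator: returns the pointer and stores a status code
   in *r_val so callers can react to allocation failure. */
static void* malloc_intl_sketch( size_t size, err_t_sketch* r_val )
{
    assert( r_val != NULL );
    void* p = malloc( size );
    *r_val = ( p == NULL && size != 0 ) ? ERR_MALLOC_RETURNED_NULL
                                        : ERR_SUCCESS;
    return p;
}
```

A caller passes the address of a local status variable and checks it before using the returned pointer.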
Field G. Van Zee
2f7325b2b7 Blacklist clang10/gcc9 and older for 'armsve'.
Details:
- Prohibit use of clang 10.x and older or gcc 9.x and older for the
  'armsve' subconfiguration. Addresses issue #535.
2021-08-23 15:04:05 -05:00
RuQing Xu
7e2951e61f Arm: DGEMMSUP 'Macro' Edge Cases Stop Calling Ref
The reference kernel cannot handle panel strides (packed cases), and thus
cannot be called from the beginning of `gemmsup` (i.e., it cannot be the
dispatch target when gemmsup hands off to other sizes).
2021-08-23 17:06:44 +09:00
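Why a kernel that ignores the panel stride breaks on packed inputs: consecutive micro-panels in a packed buffer sit `ps` elements apart, and `ps` may exceed `mr*k` (for example, because of padding or alignment). A hedged sketch with hypothetical helpers, where a routine that assumes back-to-back panels reads the wrong data:

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical: sum of all mr*k entries of micro-panel ip in a packed
   buffer whose consecutive micro-panels are ps elements apart. */
static double panel_sum( const double* ap, size_t ip, size_t ps,
                         size_t mr, size_t k )
{
    assert( ps >= mr * k );
    const double* panel = ap + ip * ps;
    double s = 0.0;
    for ( size_t t = 0; t < mr * k; ++t ) s += panel[ t ];
    return s;
}

/* Hypothetical stride-unaware version: assumes panels are packed back to
   back (ps == mr*k), which is wrong whenever panels are padded. */
static double panel_sum_naive( const double* ap, size_t ip,
                               size_t mr, size_t k )
{
    return panel_sum( ap, ip, mr * k, mr, k );
}
```

With mr = 2, k = 3 and ps = 8, panel 1 starts at offset 8; the naive version starts at offset 6 and picks up two padding entries instead.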
RuQing Xu
4fd82b0e93 Header Typo 2021-08-23 05:18:32 +09:00
RuQing Xu
35409ebe67 Arm: DGEMMSUP ??r(rv) Invoke Edge Size
Plus some fixes at the edges.

TODO: Ensure that no ref kernel appears at the beginning of gemmsup
kernels, as ref does not recognise panel strides.
2021-08-23 04:51:47 +09:00
RuQing Xu
a361492c24 Arm: DGEMMSUP ?rc(rd) Invoke Edge Size 2021-08-23 01:13:39 +09:00
Devin Matthews
eaea67401c Merge branch 'master' into cxx_test 2021-08-21 16:09:31 -05:00
Devin Matthews
5fc65cdd9e Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. 2021-08-21 15:59:27 -05:00
Field G. Van Zee
e320ec6d5c Moved lang defs from _macro_defs.h to _lang_defs.h.
Details:
- Moved miscellaneous language-related definitions, including defs
  related to the handling of the 'restrict' keyword, from the top half
  of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
  #included immediately after "bli_system.h" in blis.h. This change is
  an attempt to fix a reported recent breakage with C++ compilers due
  to the recent introduction of 'restrict' in bli_type_defs.h (which
  previously was being included *before* bli_macro_defs.h and its
  restrict handling therein). Thanks to Ivan Korostelev for reporting
  this issue in #527.
- CREDITS file update.
2021-08-20 17:15:20 -05:00
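The underlying issue is that `restrict` is a C99 keyword with no C++ equivalent, so a header shared by C and C++ needs a shim, and that shim must be processed before any header that spells the keyword. A hedged sketch of such a shim follows; `SKETCH_RESTRICT` and `axpy_sketch` are hypothetical names, not what bli_lang_defs.h actually defines.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical shim: use C99 restrict in C; in C++ fall back to the
   widely supported __restrict extension, or drop the qualifier. */
#if defined(__cplusplus)
  #if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
    #define SKETCH_RESTRICT __restrict
  #else
    #define SKETCH_RESTRICT
  #endif
#else
  #define SKETCH_RESTRICT restrict
#endif

/* restrict promises the compiler that x and y do not alias, which can
   enable vectorization of the loop. */
static void axpy_sketch( size_t n, double alpha,
                         const double* SKETCH_RESTRICT x,
                         double* SKETCH_RESTRICT y )
{
    for ( size_t i = 0; i < n; ++i ) y[ i ] += alpha * x[ i ];
}
```

If a header using `SKETCH_RESTRICT` were included before the shim, C++ builds would fail with an undefined identifier, which mirrors the ordering problem this commit addresses.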