Commit Graph

2160 Commits

Author SHA1 Message Date
Minh Quan Ho
81e1034632 Alloc at least 1 elem in pool_t block_ptrs. (#560)
Details:
- Previously, the block_ptrs field of the pool_t was allowed to be
  initialized as any unsigned integer, including 0. However, a length of
  0 could be problematic given that the result of malloc(0) is
  implementation-defined and therefore varies across implementations.
  As a safety measure, we check for block_ptrs array lengths of 0 and,
  in that case, increase them to 1.
- Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
2021-10-13 13:28:02 -05:00
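The guard described in this commit can be sketched as follows. This is a minimal illustration, not the code in bli_pool.c: `toy_pool_t`, `toy_pool_init`, and `toy_pool_finalize` are hypothetical names, and the struct carries only the fields needed to show the idea.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the block_ptrs bookkeeping of a pool_t. */
typedef struct
{
    void** block_ptrs;     /* array of block addresses */
    size_t block_ptrs_len; /* capacity of block_ptrs */
    size_t num_blocks;     /* number of blocks currently stored */
} toy_pool_t;

/* Initialize the pool, raising a requested length of 0 to 1 so that
   malloc() is never called with a size of zero. Returns 0 on success
   and -1 if the allocation fails. */
int toy_pool_init( toy_pool_t* pool, size_t block_ptrs_len )
{
    assert( pool != NULL );

    if ( block_ptrs_len == 0 ) block_ptrs_len = 1;

    pool->block_ptrs = malloc( block_ptrs_len * sizeof( void* ) );
    if ( pool->block_ptrs == NULL ) return -1;

    pool->block_ptrs_len = block_ptrs_len;
    pool->num_blocks     = 0;
    return 0;
}

/* Release the array and reset the bookkeeping fields. */
void toy_pool_finalize( toy_pool_t* pool )
{
    free( pool->block_ptrs );
    pool->block_ptrs     = NULL;
    pool->block_ptrs_len = 0;
    pool->num_blocks     = 0;
}
```

Calling `toy_pool_init( &pool, 0 )` leaves `pool.block_ptrs_len` at 1, so later code can rely on a non-empty array without caring what a given C library returns for a zero-sized request.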
Minh Quan Ho
327481a4b0 Fix insufficient pool-growing logic in bli_pool.c. (#559)
Details:
- The current mechanism for growing a pool_t doubles the length of the
  block_ptrs array every time the array length needs to be increased
  due to new blocks being added. However, that logic did not take into
  account the new total number of blocks, and the fact that the caller
  may be requesting more blocks than would fit even after doubling the
  current length of block_ptrs. The code comments now contain two
  illustrative examples that show why, even after doubling, we must
  always have at least enough room to fit all of the old blocks plus
  the newly requested blocks.
- This commit also happens to fix a memory corruption issue that stems
  from growing any pool_t that is initialized with a block_ptrs length
  of 0. (Previously, the memory pool for packed buffers of C was 
  initialized with a block_ptrs length of 0, but because it is unused 
  this bug did not manifest by default.)
- Co-authored-by: Minh Quan Ho <minh-quan.ho@kalray.eu>
2021-10-12 12:53:04 -05:00
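The sizing rule this commit describes (double the array, but never end up with less room than old blocks plus new blocks) can be sketched as below. `toy_grown_len` is a hypothetical helper written for illustration; the actual logic and its two commented examples live in bli_pool.c and may be structured differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: return the block_ptrs length to use when num_new
   blocks are added to a pool that holds num_cur blocks in an array of
   length cur_len. */
size_t toy_grown_len( size_t cur_len, size_t num_cur, size_t num_new )
{
    assert( num_cur <= cur_len );

    size_t needed = num_cur + num_new;

    /* No growth is required if everything already fits. */
    if ( needed <= cur_len ) return cur_len;

    /* Doubling alone is insufficient when the request is large or when
       cur_len is 0, so take the larger of the two candidates. */
    size_t doubled = 2 * cur_len;
    return ( doubled > needed ) ? doubled : needed;
}
```

With a full array of length 4, adding one block gives a new length of 8 (doubling suffices), while adding ten blocks gives 14 (doubling to 8 would be too small). Starting from a length of 0, doubling yields 0 again, which matches the block_ptrs length-0 case mentioned in the commit; the `needed` term covers it.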
Devin Matthews
32a6d93ef6 Merge pull request #543 from xrq-phys/armsve-packm-fix
ARMSVE Block SVE-Intrinsic Kernels for GCC 8-9
2021-10-09 15:53:54 -05:00
Devin Matthews
408906fdd8 Merge pull request #542 from xrq-phys/armsve-zgemm
Arm SVE CGEMM / ZGEMM Natural Kernels
2021-10-09 15:50:25 -05:00
RuQing Xu
ccf16289d2 Arm SVE C/ZGEMM Fix FMOV 0 Mistake
FMOV [hsd]M, #imm does not allow zero immediate.
Use wzr, xzr instead.
2021-10-08 12:34:14 +09:00
RuQing Xu
82b61283b2 SH Kernel Unused Either 2021-10-08 12:17:29 +09:00
RuQing Xu
1749dfa493 Arm SVE C/ZGEMM Support *beta==0 2021-10-08 12:13:08 +09:00
RuQing Xu
4b648e47da Arm SVE Config armsve Use ZGEMM/CGEMM 2021-10-08 12:13:08 +09:00
RuQing Xu
f76ea905e2 Arm SVE: Update Perf. Graph
Pic. size seems a bit different from upstream.
Generated w/ MATLAB. Open to any change.
2021-10-08 12:13:08 +09:00
RuQing Xu
66a018e6ad Arm SVE CGEMM 2Vx10 Unindex Process Alpha=1.0 2021-10-08 12:13:08 +09:00
RuQing Xu
9e1e781cb5 Arm SVE ZGEMM 2Vx10 Unindex Process Alpha=1.0 2021-10-08 12:13:08 +09:00
RuQing Xu
f7c6c2b119 A64FX Config Use ZGEMM/CGEMM 2021-10-08 12:13:08 +09:00
RuQing Xu
e4cabb977d Arm SVE Typo Fix ZGEMM/CGEMM C Prefetch Reg 2021-10-08 12:13:08 +09:00
RuQing Xu
b677e0d61b Arm SVE Add SGEMM 2Vx10 Unindexed 2021-10-08 12:13:07 +09:00
RuQing Xu
3f68e8309f Arm SVE ZGEMM Support Gather Load / Scatt. St. 2021-10-08 12:13:07 +09:00
RuQing Xu
c19db2ff82 Arm SVE Add ZGEMM 2Vx10 Unindexed 2021-10-08 12:13:07 +09:00
RuQing Xu
e13abde30b Arm SVE Add ZGEMM 2Vx7 Unindexed 2021-10-08 12:13:06 +09:00
RuQing Xu
49b9d7998e Arm SVE Add ZGEMM 2Vx8 Unindexed 2021-10-08 12:12:48 +09:00
Devin Matthews
4277fec0d0 Merge pull request #533 from xrq-phys/arm64-hi-bw
ARMv8 PACKM and GEMMSUP Kernels + Apple Firestorm Subconfig
2021-10-07 13:47:22 -05:00
Devin Matthews
2329d99016 Update Travis CI badge
[ci skip]
2021-10-07 12:37:58 -05:00
RuQing Xu
f44149f787 Armv8 Trash New Bulk Kernels
- They didn't yield much improvement.
- Can't register row-preferential and column-preferential ukrs at the same time.
  Doing so will break 1m.
2021-10-08 02:35:58 +09:00
Devin Matthews
70b52cadc5 Enable testing 1m in make check. 2021-10-07 12:34:35 -05:00
RuQing Xu
2604f40713 Config ArmSVE Unregister 12xk. Move 12xk to Old 2021-10-07 02:39:00 +09:00
RuQing Xu
1e3200326b Revert __has_include(). Distinguish w/ BLIS_FAMILY_** 2021-10-07 02:37:14 +09:00
RuQing Xu
a4066f278a Register firestorm into arm64 Metaconfig 2021-10-07 02:26:05 +09:00
RuQing Xu
d7a3372247 Armv8 DGEMMSUP Fix Edge 6x4 Switch Case Typo 2021-10-07 02:25:14 +09:00
RuQing Xu
2920dde5ac Armv8 DGEMMSUP Fix 8x4m Store Inst. Typo 2021-10-07 02:01:45 +09:00
Devin Matthews
14b13583f1 Add test for Apple M1 (firestorm)
This test will run on Linux, but all the kernels should run just fine. This does not test autodetection but then none of the other ARM tests do either.
2021-10-06 10:22:34 -05:00
RuQing Xu
a024715065 Firestorm CPUID Dispatcher
Commenting out <sys/sysctl.h> due to possibly a Xcode bug.
2021-10-07 00:15:54 +09:00
RuQing Xu
b9da6d55fe Armv8 GEMMSUP Edge Cases Require Signed Ints
Fix a bug in bli_gemmsup_rd_armv8a_asm_d6x8m.c.
For safety upon similar strategies in the future,
 change all [mn]_[iter/left] into signed ints.
2021-10-06 12:25:54 +09:00
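The commit message does not spell out the failure mode, so the following is only one hazard that signed counters avoid, shown under that assumption: a "remaining work" counter that is decremented past zero. The function names are hypothetical and the loops are plain C rather than the kernel's assembly.

```c
#include <assert.h>
#include <stdint.h>

/* Count the trips of "while ( left > 0 ) left -= step" with an unsigned
   counter. If left is not a multiple of step, the subtraction wraps
   around to a huge value instead of going negative, so the loop is
   capped at max_trips to keep this demonstration finite. */
uint64_t trips_unsigned( uint64_t left, uint64_t step, uint64_t max_trips )
{
    uint64_t trips = 0;
    while ( left > 0 && trips < max_trips ) { left -= step; ++trips; }
    return trips;
}

/* The same loop with a signed counter stops as soon as left drops to
   zero or below. */
int64_t trips_signed( int64_t left, int64_t step )
{
    assert( step > 0 );

    int64_t trips = 0;
    while ( left > 0 ) { left -= step; ++trips; }
    return trips;
}
```

For `left = 6` and `step = 4`, the signed version finishes after 2 trips, whereas the unsigned version only stops when it reaches the cap, because 2 - 4 wraps to a large positive value.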
Devin Matthews
34919de3df Make error checking level a thread-local variable.
Previously, this was a global variable. Setting the value was synchronized via a mutex but reading the value was not. Of course, these accesses are almost certainly atomic, but there is still the possibility of one thread attempting to set the value and then reading the value set by another thread. For correct operation under user threading (e.g. pthreads), this should probably be thread-local with no mutex.
2021-10-05 15:22:31 -05:00
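A thread-local setting of this kind can be sketched with the C11 `_Thread_local` storage class. The enum and function names below are hypothetical stand-ins rather than the BLIS API, and the commit may rely on a different keyword or macro to obtain thread-local storage.

```c
#include <assert.h>

/* Hypothetical error checking levels. */
typedef enum
{
    TOY_NO_ERROR_CHECKING   = 0,
    TOY_FULL_ERROR_CHECKING = 1
} toy_errchk_level_t;

/* Each thread gets its own copy, initialized to full checking. Because
   no other thread can see this copy, no mutex is needed. */
static _Thread_local toy_errchk_level_t toy_errchk_level =
    TOY_FULL_ERROR_CHECKING;

/* Query the calling thread's level. */
toy_errchk_level_t toy_error_checking_level( void )
{
    return toy_errchk_level;
}

/* Set the calling thread's level; other threads are unaffected. */
void toy_error_checking_level_set( toy_errchk_level_t level )
{
    assert( level == TOY_NO_ERROR_CHECKING ||
            level == TOY_FULL_ERROR_CHECKING );
    toy_errchk_level = level;
}
```

With this layout, a thread that sets the level and then reads it back always sees its own value, which is the property the commit message says the mutex-protected global could not guarantee.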
Devin Matthews
c3024993c3 Fix data race in testsuite. 2021-10-05 15:20:27 -05:00
Devin Matthews
353a0d8257 Update .appveyor.yml
[ci skip]
2021-10-05 14:24:17 -05:00
RuQing Xu
4bfadf9b56 Firestorm Block Size Fixes 2021-10-06 01:51:26 +09:00
RuQing Xu
40baf83f0e Armv8 Handle *beta == 0 for GEMMSUP ??r Case. 2021-10-06 01:00:52 +09:00
Devin Matthews
079fbd42ce Merge branch 'master' into arm64-hi-bw 2021-10-04 17:21:48 -05:00
Devin Matthews
9905f44347 Merge pull request #553 from flame/rpath-fix
Add an option to use an @rpath-dependent install_name on macOS
2021-10-04 15:58:59 -05:00
Devin Matthews
6d3036e31d Merge pull request #545 from hominhquan/clean_error
bli_error: more cleanup on the error strings array
2021-10-04 15:58:43 -05:00
Devin Matthews
53377fcca9 Merge pull request #554 from flame/armsve-cleanup
Move unused ARM SVE kernels to "old" directory.
2021-10-04 15:45:53 -05:00
Devin Matthews
80c5366e4a Move unused ARM SVE kernels to "old" directory. 2021-10-04 15:40:28 -05:00
Devin Matthews
64a421f698 Add an option to control whether or not to use @rpath.
Adds `--enable-rpath/--disable-rpath` (default disabled) to use an install_name starting with @rpath/. Otherwise, set the install_name to the absolute path of the installed library, which was the previous behavior.
2021-10-04 13:40:43 -05:00
Devin Matthews
c4a31683dd Fix $ORIGIN usage on linux. 2021-10-04 13:27:10 -05:00
Dave Love
d0a0b4b841 Arm micro-architecture dispatch (#344)
Details:
- Reworked support for ARM hardware detection in bli_cpuid.c to parse 
  the result of a CPUID-like instruction.
- Added a64fx support to bli_gks.c.
- #include arm64 and arm32 family headers from bli_arch_config.h.
- Fix the ordering of the "armsve" and "a64fx" strings in the 
  config_name string array in bli_arch.c. The ordering did not match
  the ordering of the corresponding arch_t values in bli_type_defs.h,
  as it should have all along.
- Added clang support to make_defs.mk in arm64, cortexa53, cortexa57 
  subconfigs.
- Updated arm64 and arm32 families in config_registry.
- Updated docs/HardwareSupport.md to reflect added ARM support.
- Thanks to Dave Love, RuQing Xu, and Devin Matthews for their
  contributions in this PR (#344).
2021-10-04 13:03:04 -05:00
Devin Matthews
91408d161a Use @rpath-based install name on macOS and use relocatable RPATH entries for testsuite binaries.
- RPATH entries (and DYLD_LIBRARY_PATH) do nothing on macOS unless the install_name of the library starts with @rpath/. While the install_name can be set to the absolute install path, this makes the installation non-relocatable. When using @rpath in the install_name, install paths within the normal DYLD_LIBRARY_PATH work with no changes on the user side, but for install paths off the beaten track, users must specify an RPATH entry when linking (or modify DYLD_LIBRARY_PATH at runtime). Perhaps this could be made into a configure-time option.
- Having relocatable testsuite binaries is not necessarily a priority but it is easy to do with @executable_path (macOS) or $ORIGIN (linux/BSD).
2021-10-04 11:37:48 -05:00
RuQing Xu
f5c03e9fe8 Armv8 Handle *beta == 0 for GEMMSUP ?rc Case. 2021-10-03 16:51:51 +09:00
RuQing Xu
abc648352c Armv8 Fix 6x8 Row-Maj Ukr
- Fixed for 6x8 only, 4x4 & 4x8 pending;
- Installed to config firestorm as benchmark seems to show better perf:
   Old:
blis_dgemm_ukr_c                     6     8   320    36.87   2.43e-17   PASS
blis_dgemm_ukr_c                     6     8   352    40.55   1.04e-17   PASS
blis_dgemm_ukr_c                     6     8   384    44.24   5.68e-17   PASS
blis_dgemm_ukr_c                     6     8   416    41.67   3.51e-17   PASS
blis_dgemm_ukr_c                     6     8   448    34.41   2.94e-17   PASS
blis_dgemm_ukr_c                     6     8   480    42.53   2.35e-17   PASS

   New:
blis_dgemm_ukr_r                     6     8   352    50.69   1.59e-17   PASS
blis_dgemm_ukr_r                     6     8   384    49.15   5.55e-17   PASS
blis_dgemm_ukr_r                     6     8   416    50.44   2.86e-17   PASS
blis_dgemm_ukr_r                     6     8   448    46.92   3.12e-17   PASS
blis_dgemm_ukr_r                     6     8   480    48.08   4.08e-17   PASS
2021-10-03 13:14:19 +09:00
Devin Matthews
0a45bc0fbc Merge pull request #552 from flame/armsve_beta_0
Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.
2021-10-02 18:59:43 -05:00
Devin Matthews
13dbd5b5d3 Apply patch from @xrq-phys. 2021-10-02 16:08:05 -05:00
Devin Matthews
ae0eeeaf77 Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. 2021-09-29 16:43:38 -05:00
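Why beta == 0 gets its own branch can be shown with a scalar sketch of the update C := beta * C + alpha * AB. By BLAS convention, C need not be set on input when beta is zero, so it may hold NaN or Inf, and 0 * NaN is still NaN. `toy_update_c` is a hypothetical plain-C helper, not one of the assembly microkernels touched by the commit.

```c
#include <assert.h>
#include <stddef.h>

/* Apply c[i] := beta * c[i] + alpha * ab[i] for i in [0, n).
   When beta is exactly zero, c is overwritten without being read, so
   NaN or Inf values left in an uninitialized c cannot leak into the
   result. */
void toy_update_c( size_t n, double alpha, const double* ab,
                   double beta, double* c )
{
    assert( ab != NULL && c != NULL );

    if ( beta == 0.0 )
    {
        for ( size_t i = 0; i < n; ++i )
            c[i] = alpha * ab[i];
    }
    else
    {
        for ( size_t i = 0; i < n; ++i )
            c[i] = beta * c[i] + alpha * ab[i];
    }
}
```

Without the first branch, a NaN already present in `c` would survive the multiplication by zero and contaminate the output.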
Field G. Van Zee
5013a6cb71 More edits and fixes to docs/FAQ.md. 2021-09-29 10:38:50 -05:00