Commit Graph

2058 Commits

Author SHA1 Message Date
Field G. Van Zee
e320ec6d5c Moved lang defs from _macro_defs.h to _lang_defs.h.
Details:
- Moved miscellaneous language-related definitions, including defs
  related to the handling of the 'restrict' keyword, from the top half
  of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
  #included immediately after "bli_system.h" in blis.h. This change is
  an attempt to fix a report of recent breakage of C++ compilers due
  to the recent introduction of 'restrict' in bli_type_defs.h (which
  previously was being included *before* bli_macro_defs.h and its
  restrict handling therein). Thanks to Ivan Korostelev for reporting
  this issue in #527.
- CREDITS file update.
2021-08-20 17:15:20 -05:00
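The kind of 'restrict' handling described in e320ec6d5c can be sketched roughly as follows. This is a minimal illustration and not the contents of bli_lang_defs.h: the macro name MY_RESTRICT and the function axpy_restrict() are hypothetical, and the compiler checks are assumptions about common toolchains.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical macro (not the name used by BLIS): expand to a spelling of
   'restrict' that the current compiler accepts. */
#if defined(__cplusplus)
  /* 'restrict' is not a C++ keyword; many compilers offer __restrict. */
  #if defined(__GNUC__) || defined(__clang__) || defined(_MSC_VER)
    #define MY_RESTRICT __restrict
  #else
    #define MY_RESTRICT
  #endif
#elif defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L
  #define MY_RESTRICT restrict
#else
  #define MY_RESTRICT
#endif

/* y := y + alpha * x, with the promise that x and y do not overlap. */
static void axpy_restrict(size_t n, double alpha,
                          const double* MY_RESTRICT x,
                          double* MY_RESTRICT y)
{
    assert(x != NULL && y != NULL);
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

The ordering issue in the commit follows from this pattern: the macro must be defined before any header that uses it in a declaration, otherwise a C++ compiler sees the bare keyword.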
Field G. Van Zee
3b275f810b Minor tweaks to gemmlike sandbox.
Details:
- In the gemmlike sandbox, changed the loop index variable of inner
  loop of packm_cxk() from 'd' to 'i' (and likewise for the
  corresponding inlined code within packm_var2()).
- Pack matrices A and B using packm_var1() instead of packm_var2().
2021-08-19 16:06:46 -05:00
Field G. Van Zee
3eccfd456e Added local _check() code to gemmlike sandbox.
Details:
- Added code to the gemmlike sandbox that handles parameter checking.
  Previously, the gemmlike implementation called bli_gemm_check(), which
  resides within the BLIS framework proper. Certain modifications that a
  user may wish to perform on the sandbox, such as adding a new matrix
  or vector operand, would have required additional checks, and so these
  changes make it easier for such a person to implement those checks for
  their custom gemm-like operation.
2021-08-19 13:22:10 -05:00
Field G. Van Zee
7144230cdb README.md citation updates (e.g. BLIS7 bibtex). 2021-08-18 13:25:39 -05:00
Field G. Van Zee
4a955e9390 Tweaks to gemmlike to facilitate 3rd party mods.
Details:
- Changed the implementation in the 'gemmlike' sandbox to more easily
  allow others to provide custom implementations of packm. These changes
  include:
  - Calling a local version of packm_cxk() that can be modified. This
    version of packm_cxk() uses inlined loops in packm_cxk() rather
    than querying the context for packm kernels (or even using scal2m).
  - Providing two variants of packm, one of which calls the
    aforementioned packm_cxk(), the other of which inlines the contents
    of packm_cxk() into the variant itself, making it self-contained.
    To switch from one to the other, simply change which function gets
    called within bls_packm_a() and bls_packm_b().
  - Simplified and cleaned up some variant names in both variants of
    packm, relative to their parent code.
2021-08-16 13:49:27 -05:00
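A self-contained packing routine with inlined loops, in the spirit of the local packm_cxk() described in 4a955e9390, might look like the sketch below. It is a simplified assumption, not the sandbox code: it handles only real double-precision data, applies no scaling, and the name pack_cxk_sketch() and its parameters are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for a local packm_cxk(): copy an
   m x k slice of a (element (i,j) at a[i*rs_a + j*cs_a]) into the packed
   micropanel p, stored as k columns of mr contiguous elements. Rows
   m..mr-1 are zero-padded so a microkernel can always read mr rows. */
static void pack_cxk_sketch(size_t m, size_t k, size_t mr,
                            const double* a, size_t rs_a, size_t cs_a,
                            double* p)
{
    assert(m <= mr);
    for (size_t j = 0; j < k; ++j)
    {
        /* Inner loop over the panel dimension, indexed by 'i'. */
        for (size_t i = 0; i < m; ++i)
            p[j*mr + i] = a[i*rs_a + j*cs_a];

        /* Zero-fill the edge rows of this column of the micropanel. */
        for (size_t i = m; i < mr; ++i)
            p[j*mr + i] = 0.0;
    }
}
```

Because nothing is looked up at run time, a third party could change the loop body (for example, to apply a custom transformation while packing) without touching any other code.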
Devin Matthews
2c0b4150e4 Merge pull request #527 from flame/obj_t_makeover
Implement proposed new function pointer fields for obj_t.
2021-08-14 18:41:35 -05:00
Field G. Van Zee
4b8ed99d92 Whitespace tweaks. 2021-08-13 15:31:10 -05:00
Devin Matthews
c99fae50ac Merge pull request #530 from flame/fix_clang_warnings
Clean up some warnings that show up on clang/OSX.
2021-08-13 14:48:00 -05:00
Devin Matthews
e6d68bc4fd Merge pull request #529 from flame/fix_make_check_dependencies
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
2021-08-13 14:47:46 -05:00
Devin Matthews
1772db029e Add row- and column-strides for A/B in obj_ukr_fn_t. 2021-08-13 14:46:35 -05:00
Devin Matthews
4f70eb7913 Clean up some warnings that show up on clang/OSX. 2021-08-13 11:12:43 -05:00
Devin Matthews
3cddce1e2a Remove schema field on obj_t (redundant) and add new API functions. 2021-08-12 22:32:34 -05:00
Devin Matthews
ec06b6a503 Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.
2021-08-12 19:27:31 -05:00
Field G. Van Zee
20a1c4014c Disabled sanity check in bli_pool_finalize().
Details:
- Disabled a sanity check in bli_pool_finalize() that was meant to alert
  the user if a pool_t was being finalized while some blocks were still
  checked out. However, this is exactly the situation that might happen
  when a pool_t is re-initialized for a larger blocksize, and currently
  bli_pool_reinit() is implemented as _finalize() followed by _init().
  So, this sanity check is not universally appropriate. Thanks to
  AMD-India for reporting this issue.
2021-08-12 14:44:04 -05:00
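The situation described in 20a1c4014c can be illustrated with a toy pool that tracks only counts. Everything here (toy_pool_t and its functions) is hypothetical and much simpler than pool_t; it only shows why a reinit built from finalize followed by init can legitimately reach finalize with blocks still checked out.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical toy pool: tracks only counts, no real memory. */
typedef struct
{
    size_t block_size;   /* size of each block, in bytes */
    size_t num_blocks;   /* total blocks owned by the pool */
    size_t checked_out;  /* blocks currently lent to callers */
} toy_pool_t;

static void toy_pool_init(toy_pool_t* pool, size_t block_size,
                          size_t num_blocks)
{
    pool->block_size  = block_size;
    pool->num_blocks  = num_blocks;
    pool->checked_out = 0;
}

static void toy_pool_checkout(toy_pool_t* pool)
{
    assert(pool->checked_out < pool->num_blocks);
    pool->checked_out++;
}

/* Returns the number of blocks that were still checked out. A strict
   version would assert that this count is zero; that kind of check is
   what fails when finalize is reached through a reinit. */
static size_t toy_pool_finalize(toy_pool_t* pool)
{
    size_t outstanding = pool->checked_out;
    pool->block_size  = 0;
    pool->num_blocks  = 0;
    pool->checked_out = 0;
    return outstanding;
}

/* Reinitialize for a new block size as finalize followed by init. */
static size_t toy_pool_reinit(toy_pool_t* pool, size_t new_block_size,
                              size_t num_blocks)
{
    size_t outstanding = toy_pool_finalize(pool);
    toy_pool_init(pool, new_block_size, num_blocks);
    return outstanding;
}
```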
Field G. Van Zee
e366665cd2 Fixed stale API calls to membrk API in gemmlike.
Details:
- Updated stale calls to the bli_membrk API within the 'gemmlike'
  sandbox. This API is now called bli_pba (packed block allocator).
  Ideally, this forgotten update would have been included as part of
  21911d6, which is when the branch that introduced the membrk->pba
  changes was merged into 'master'.
- Comment updates.
2021-08-12 14:06:53 -05:00
Devin Matthews
64a1f786d5 Implement proposed new function pointer fields for obj_t.
The added fields:
1. `pack_t schema`: storing the pack schema on the object allows the macrokernel to act accordingly without side-channel information from the rntm_t and cntx_t. The pack schema and "pack_[ab]" fields could be removed from those structs.
2. `void* user_data`: this field can be used to store any sort of additional information provided by the user. The pointer is propagated to submatrix objects and copies, but is otherwise ignored by the framework and the default implementations of the following three fields. User-specified pack, kernel, or ukr functions can do whatever they want with the data, and the user is 100% responsible for allocating, assigning, and freeing this buffer.
3. `obj_pack_fn_t pack`: the function called when a matrix is packed. This function receives the expected arguments, as well as a mdim_t and mem_t* as memory must be allocated inside this function, and behavior may differ based on which matrix is being packed (i.e. transposition for B). This could also be achieved by passing a desired pack schema, but this would require additional information to travel down the control tree.
4. `obj_ker_fn_t ker`: the function called when we get to the "second loop", or the macro-kernel. Behavior may depend on the pack schemas of the input matrices. The default implementation would perform the inner two loops around the ukr, and then call either the default ukr or a user-supplied one (next field).
5. `obj_ukr_fn_t ukr`: the function called by the default macrokernel. This would replace the various current "virtual" microkernels, and could also be used to supply user-defined behavior. Users could supply both a custom kernel (above) and microkernel, although the user-specified kernel does **not** necessarily have to call the ukr function specified on the obj_t.

Note that no macros or functions for accessing these new fields have been defined yet. That is next once these are finalized. Addresses https://github.com/flame/blis/projects/1#card-62357687.
2021-08-11 18:11:47 -05:00
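The idea of storing a user-data pointer and overridable function pointers on an object, as proposed in 64a1f786d5, can be sketched as below. All type and function names are hypothetical and the signatures are invented for illustration; they do not match obj_t, obj_ukr_fn_t, or any BLIS API.

```c
#include <assert.h>
#include <stddef.h>

/* All names below are hypothetical; they only mirror the shape of the
   proposal (a user_data pointer plus an overridable function pointer). */
typedef struct sketch_obj_s sketch_obj_t;
typedef double (*sketch_ukr_fn_t)(double a, double b, const sketch_obj_t* c);

struct sketch_obj_s
{
    void*           user_data; /* owned and interpreted by the user only */
    sketch_ukr_fn_t ukr;       /* NULL selects the default behavior */
};

/* Default "microkernel": a plain product that ignores user_data. */
static double default_ukr(double a, double b, const sketch_obj_t* c)
{
    (void)c;
    return a * b;
}

/* User-supplied "microkernel": scales the product by a factor that the
   user stored in user_data. */
static double scaled_ukr(double a, double b, const sketch_obj_t* c)
{
    assert(c->user_data != NULL);
    const double* scale = (const double*)c->user_data;
    return (*scale) * a * b;
}

/* Stand-in for a default macrokernel: call the function stored on the
   object, falling back to the default when none was set. */
static double run_kernel(double a, double b, const sketch_obj_t* c)
{
    sketch_ukr_fn_t fn = c->ukr ? c->ukr : default_ukr;
    return fn(a, b, c);
}
```

In this sketch the framework side (run_kernel) never dereferences user_data; only the user's function does, which matches the ownership rule stated for the `user_data` field.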
Field G. Van Zee
a32257eeab Fixed bli_init.c compile-time error on OSX clang.
Details:
- Fixed a compile-time error in bli_init.c when compiling with OSX's
  clang. This error was introduced in 868b901, which introduced a
  post-declaration struct assignment where the RHS was a struct
  initialization expression (i.e. { ... }). This use of struct
  initializer expressions apparently works with gcc despite it not
  being strict C99. The fix included in this commit declares a temporary
  variable for the purposes of being initialized to the desired value,
  via the struct initializer, and then copies the temporary struct (via
  '=' struct assignment) to the persistent struct. Thanks to Devin
  Matthews for his help with this.
2021-08-05 16:23:02 -05:00
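The fix described in a32257eeab follows a general C pattern, sketched here with hypothetical names (ctl_t, CTL_INIT, persistent_ctl); it is not the code from bli_init.c. A brace-enclosed list is an initializer rather than an expression, so it cannot appear on the right-hand side of a plain assignment.

```c
#include <assert.h>

/* Hypothetical type and initializer macro. The macro expands to a
   brace-enclosed list, as some platform-provided macros do. */
typedef struct { int state; int count; } ctl_t;
#define CTL_INIT { 0, 0 }

/* Fine: the brace list is used as an initializer at the declaration. */
static ctl_t persistent_ctl = CTL_INIT;

/* Simulate some use of the persistent struct. */
static void use_ctl(void)
{
    persistent_ctl.state = 1;
    persistent_ctl.count++;
}

static void reset_ctl(void)
{
    /* persistent_ctl = CTL_INIT;
       The line above is not valid C, because a brace list is not an
       expression. */

    /* Portable alternative: initialize a temporary, then copy it with
       ordinary struct assignment. */
    const ctl_t fresh = CTL_INIT;
    persistent_ctl = fresh;

    assert(persistent_ctl.state == 0 && persistent_ctl.count == 0);
}
```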
Field G. Van Zee
c8728cfbd1 Fixed configure breakage on OSX clang.
Details:
- Accept either 'clang' or 'LLVM' in vendor string when grepping for
  the version number (after determining that we're working with clang).
  Thanks to Devin Matthews for this fix.
2021-08-05 15:17:09 -05:00
Field G. Van Zee
868b90138e Fixed one-time use property of bli_init() (#525).
Details:
- Fixes a rather obvious bug that resulted in a segmentation fault
  whenever the calling application tried to re-initialize BLIS after
  its first init/finalize cycle. The bug resulted from the fact that
  the bli_init.c APIs made no effort to allow bli_init() to be called
  subsequent times at all due to it, and bli_finalize(), being
  implemented in terms of pthread_once(). This has been fixed by
  resetting the pthread_once_t control variable for initialization
  at the end of bli_finalize_apis(), and by resetting the control
  variable for finalization at the end of bli_init_apis(). Thanks to
  @lschork2 for reporting this issue (#525), and to Minh Quan Ho and
  Devin Matthews for suggesting the chosen solution.
- CREDITS file update.
2021-08-04 18:31:01 -05:00
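The reset scheme described in 868b90138e can be sketched as follows. The names (lib_init, lib_finalize, init_apis, finalize_apis, reset_once) are hypothetical and this is not the code in bli_init.c. The sketch also assumes, as the commit does, that overwriting a pthread_once_t with a freshly initialized value re-arms it; POSIX does not explicitly sanction that. The program may need to be linked with the platform's threads library.

```c
#include <assert.h>
#include <pthread.h>

/* Hypothetical names throughout. */
static pthread_once_t once_init     = PTHREAD_ONCE_INIT;
static pthread_once_t once_finalize = PTHREAD_ONCE_INIT;
static int init_calls     = 0;  /* how many times setup actually ran */
static int is_initialized = 0;

/* Reset a control variable via a temporary, since PTHREAD_ONCE_INIT may
   expand to a brace-enclosed initializer on some platforms. */
static void reset_once(pthread_once_t* ctl)
{
    const pthread_once_t fresh = PTHREAD_ONCE_INIT;
    *ctl = fresh;
}

static void init_apis(void)
{
    is_initialized = 1;
    init_calls++;
    reset_once(&once_finalize);  /* allow a later finalize to run */
}

static void finalize_apis(void)
{
    is_initialized = 0;
    reset_once(&once_init);      /* allow a later init to run again */
}

static void lib_init(void)
{
    int err = pthread_once(&once_init, init_apis);
    assert(err == 0);
    (void)err;
}

static void lib_finalize(void)
{
    int err = pthread_once(&once_finalize, finalize_apis);
    assert(err == 0);
    (void)err;
}
```

Each routine resets the other routine's control variable, never its own, so a control variable is not modified while pthread_once() is still executing on it.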
Field G. Van Zee
8dba1e752c CREDITS file update. 2021-07-27 12:38:24 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Devin Matthews
fab5c86d68 Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
P10 sandbox rework
2021-07-13 16:46:21 -05:00
Devin Matthews
84f9dcd449 Remove unnecessary windows/zen2 directory. 2021-07-13 16:45:44 -05:00
Field G. Van Zee
21911d6ed3 Merge branch 'dev' 2021-07-09 18:10:46 -05:00
Devin Matthews
17729cf449 Add vzeroupper to Haswell microkernels. (#524)
Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm' 
  microkernels so as to avoid a performance penalty when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part 
  of the haswell kernels, but were inadvertently removed during a source 
  code shuffle some time ago when we were managing duplicate 'haswell' 
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down 
  and re-inserting the missing instructions.
2021-07-09 14:59:48 -05:00
Devin Matthews
c9a7f59aa8 Merge pull request #522 from flame/windows-avx512
Fix Win64 AVX512 bug.
2021-07-08 14:00:38 -05:00
Devin Matthews
9a8e649c5a Fix Win64 AVX512 bug.
Use `-march=haswell` for kernels. Fixes #514.
2021-07-08 11:40:00 -05:00
Devin Matthews
75f03907c5 Add comment about make checkblas on Windows
[ci skip]
2021-07-07 15:44:11 -05:00
Devin Matthews
4651583b12 Merge pull request #520 from flame/travis-ci-install
Test installation in Travis CI
2021-07-07 01:11:20 -05:00
Field G. Van Zee
69205ac266 CREDITS file update.
Details:
- Thanks to Chengguo Sun for submitting #515 (5ef7f68).
- Thanks to Andrew Wildman for submitting #519 (551c6b4).
- Whitespace update to configure (spaces to tabs).
2021-07-06 20:39:22 -05:00
Devin Matthews
174f7fc9a1 Test installation in Travis CI 2021-07-06 19:35:55 -05:00
Devin Matthews
551c6b4ee8 Merge pull request #519 from awild82/oot_build_bugfix
Fix installation from out-of-tree builds
2021-07-06 19:32:53 -05:00
Andrew Wildman
f648df4e55 Add symlink to blis.pc.in for out-of-tree builds 2021-07-06 16:35:12 -07:00
Devin Matthews
78eac6a0ab Revert "Always run make check."
This reverts commit a201a53440.
2021-07-06 11:05:43 -05:00
Devin Matthews
a201a53440 Always run make check.
I'm concerned that problems may lurk for `x86_64` builds on Windows which may be uncovered by a fuller `make check`.
2021-07-05 21:39:18 -05:00
Devin Matthews
5ef7f684dc Merge pull request #515 from chengguosun/bug-fix
Fixed configure script bug.
2021-07-05 21:35:07 -05:00
sunchengguo
ad6231cca3 Fixed configure script bug.
Details:
- Fixed a kernel list string substitution error by adding the function
  substitute_words to the configure script. Previously, if the string
  contained both zen and zen2 and zen needed to be replaced with another
  string, then zen2 was also incorrectly replaced.
2021-07-06 07:30:00 -04:00
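The difference between substring and whole-word substitution, which is the bug fixed in ad6231cca3, can be illustrated in C. The actual fix is a shell function in configure; the function below merely reuses the name substitute_words for a hypothetical C analogue whose signature is an assumption made for this sketch.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical C analogue of a whole-word substitution. Replaces each
   space-separated word in 'list' that equals 'from' with 'to', writing
   the result to 'out' (capacity out_cap bytes). Returns 0 on success or
   -1 if the output buffer is too small. */
static int substitute_words(const char* list, const char* from,
                            const char* to, char* out, size_t out_cap)
{
    size_t used     = 0;
    size_t from_len = strlen(from);
    const char* p   = list;

    assert(out_cap > 0);
    out[0] = '\0';

    while (*p != '\0')
    {
        /* Skip separators, then measure the next word. */
        while (*p == ' ') ++p;
        if (*p == '\0') break;
        size_t len = strcspn(p, " ");

        /* Compare the whole word, so "zen2" never matches "zen". */
        int match = (len == from_len && strncmp(p, from, len) == 0);
        const char* word = match ? to : p;
        size_t word_len  = match ? strlen(to) : len;

        /* Need room for an optional separator, the word, and the NUL. */
        size_t sep = (used > 0) ? 1 : 0;
        if (used + sep + word_len + 1 > out_cap) return -1;
        if (sep) out[used++] = ' ';
        memcpy(out + used, word, word_len);
        used += word_len;
        out[used] = '\0';

        p += len;
    }
    return 0;
}
```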
nicholaiTukanov
d073fc9aca Update POWER10.md 2021-07-02 19:54:33 -05:00
nicholaiTukanov
907226c0af Rework POWER10 sandbox
- Add a testsuite for gathering performance (in GFLOPs) and measuring correctness for the POWER10 GEMM reduced precision/integer kernels.
- Reworked GENERIC_GEMM template to hardcode the cache parameters.
- Remove kernel wrapper that only allowed matrices that weren't transposed or conjugated. However, the kernels still assume the matrices are not transposed. This wrapper was removed for performance reasons.
- Renamed and restructured files and functions for clarity.
- Edited the POWER10 document to reflect new changes.
2021-07-02 19:47:18 -05:00
Field G. Van Zee
aaa10c87e1 Skip clearing temp microtile in gemmlike sandbox.
Details:
- Removed code from gemmlike sandbox files bls_gemm_bp_var1.c and
  bls_gemm_bp_var2.c that initializes the elements of the temporary
  microtile to zero. This code, introduced recently in 7f7d726, did
  not actually fix any bug (despite that commit's log entry). The
  microtile does not need to be initialized because it is completely
  overwritten by a "beta = 0" invocation of gemm prior to it being
  read. Any NaNs or Infs present at the outset would have no impact
  on the output matrix C. Thanks to Devin Matthews for reminding me
  of this.
2021-06-21 17:53:52 -05:00
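The reasoning in aaa10c87e1 depends on "beta = 0" meaning that the output is overwritten rather than scaled. A minimal sketch of that convention is shown below, using a hypothetical reference routine ref_gemm() and helper poison() that are not part of BLIS or the sandbox.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Fill a buffer with NaN to mimic an uninitialized temporary microtile. */
static void poison(size_t n, double* x)
{
    for (size_t i = 0; i < n; ++i)
        x[i] = NAN;
}

/* Hypothetical reference kernel: C := beta*C + alpha*A*B for column-major
   m x n C, m x k A, and k x n B. When beta == 0, C is overwritten rather
   than scaled, so prior contents (even NaN or Inf) never reach the
   result. Computing 0.0 * NaN instead would yield NaN. */
static void ref_gemm(size_t m, size_t n, size_t k,
                     double alpha, const double* a, const double* b,
                     double beta, double* c)
{
    assert(a != NULL && b != NULL && c != NULL);
    for (size_t j = 0; j < n; ++j)
    {
        for (size_t i = 0; i < m; ++i)
        {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[i + p*m] * b[p + j*k];

            if (beta == 0.0)
                c[i + j*m] = alpha * ab;                     /* overwrite */
            else
                c[i + j*m] = beta * c[i + j*m] + alpha * ab; /* update */
        }
    }
}
```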
Devin Matthews
bc10a3f2ff Merge pull request #492 from flame/thunderx2-clang
Allow clang for ThunderX2 config
2021-06-18 19:01:08 -05:00
Devin Matthews
bf72763663 Merge pull request #506 from xrq-phys/arm64-mac
BLIS on Darwin_Aarch64
2021-06-18 18:59:43 -05:00
Devin Matthews
e28f2a2dfc Merge pull request #513 from nicholaiTukanov/asm_warning_p9_fix
Fix assembler warning in POWER9 DGEMM
2021-06-15 19:35:07 -05:00
nicholai
56ffca6a9b Fix asm warning 2021-06-15 18:17:39 -05:00
Field G. Van Zee
689fa0f403 Merge branch 'master' into dev 2021-06-13 19:44:14 -05:00
Field G. Van Zee
d10e05bbd1 Sandbox header edits trigger full library rebuild.
Details:
- Adjusted the top-level Makefile so that any change to a sandbox header
  file will result in blis.h being regenerated along with a full
  recompilation of the library. Previously, sandbox files were omitted
  from the list of header files that, when touched, could trigger a full
  rebuild. Why was it like that previously? Because originally we only
  envisioned using sandboxes to *replace* gemm, not augment the library
  with new functionality. When replacing gemm, blis.h does not need to
  contain any local sandbox definitions in order for the user to be able
  to (indirectly) use that sandbox. But if you are adding functions to
  the library, those functions need to be prototyped so the compiler
  can perform type checking against the user's invocation of those new
  functions. Thanks to Jeff Diamond for helping us discover this
  deficiency in the build system.
2021-06-13 19:36:16 -05:00
Devin Matthews
7c3eb44efa Add vhsubpd/vhsubpd.
Horizontal subtraction instructions added to bli_x86_asm_macros.h, currently unused [ci skip].
2021-06-02 11:28:22 -05:00
Field G. Van Zee
7f7d72610c Fixed bugs in cpackm kernels, gemmlike code.
Details:
- Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and
  bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the
  kappa scalar was incorrectly loaded at an offset of 8 bytes (instead
  of 4 bytes) from the real component. This was almost certainly a copy-
  paste bug carried over from the corresponding zpackm kernels. Thanks to
  Devin Matthews for bringing this to my attention.
- Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and
  bls_gemm_bp_var2.c that initializes the elements of the temporary
  microtile to zero. (This bug was never observed in output but rather
  noticed analytically. It probably would have also manifested as
  intermittent failures, this time involving edge cases.)
- Minor commented-out/disabled changes to testsuite/src/test_gemm.c
  relating to debugging.
2021-05-31 16:50:18 -05:00
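The offset error fixed in the cpackm kernels by 7f7d72610c can be illustrated in C. The types and the function below are hypothetical stand-ins rather than BLIS definitions, and the sketch assumes the usual layout in which the imaginary part immediately follows the real part.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for single- and double-precision complex types. */
typedef struct { float  real; float  imag; } sketch_scomplex;
typedef struct { double real; double imag; } sketch_dcomplex;

/* Load the imaginary part of a single-precision kappa from a byte offset,
   mimicking the address arithmetic an assembly kernel performs. */
static float load_imag_bytes(const sketch_scomplex* kappa, size_t byte_offset)
{
    /* An offset copied from the double-precision case (sizeof(double))
       would point past the end of the single-precision struct. */
    assert(byte_offset + sizeof(float) <= sizeof(sketch_scomplex));

    const unsigned char* base = (const unsigned char*)kappa;
    float value;
    memcpy(&value, base + byte_offset, sizeof value);
    return value;
}
```

Deriving the offset from the element type (for example with offsetof) rather than hard-coding it avoids carrying the double-precision value into single-precision code.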
RuQing Xu
5fc93e2806 Armv8A Rename Regs for Safe Darwin Compile
Avoid x18 use in FP32 kernel:
- C address lines x[18-26] renamed to x[19-27] (reg index +1)
- Original role of x27 fulfilled by x5 which is free after k-loop pert.

FP64 does not require changing since x18 is not used there.
2021-05-29 18:44:47 +09:00
RuQing Xu
9f4a4a3cfb Armv8A Rename Regs for Clang Compile: FP32 Part
Roughly the same as 916e1fa , additionally with x15 clobbering removed.
- x15: Not used at all.

Compilation w/ Clang shows warning about x18 reservation, but
compilation itself is OK and all tests passed.
2021-05-29 17:21:28 +09:00