2080 Commits

Author SHA1 Message Date
Field G. Van Zee
c52c43115e Merge branch 'dev' 2021-09-26 15:56:54 -05:00
Field G. Van Zee
1fc23d2141 Safelist 'master', 'dev', 'amd' branches.
Details:
- Modified .travis.yml so that only commits to 'master', 'dev', and
  'amd' branches get built by Travis CI. Thanks to Devin Matthews for
  helping to track down the syntax for this change.
2021-09-21 14:54:20 -05:00
Field G. Van Zee
1f527a93b9 Re-enable and fix fb93d24.
Details:
- Re-enabled the changes made in fb93d24.
- Defined BLIS_ENABLE_SYSTEM in bli_arch.c, bli_cpuid.c, and bli_env.c,
  all of which needed the definition (in addition to config_detect.c) in
  order for the configure-time hardware detection binary to be compiled
  properly. Thanks to Minh Quan Ho for helping identify these additional
  files as needing to be updated.
- Added additional comments to all four source files, most notably to
  prompt the reader to remember to update all of the files when updating
  any of the files. Also made the cpp code in each of the files as
  consistent/similar as possible.
- Refer to issues #532 and PR #546 for more history.
2021-09-20 17:56:36 -05:00
Field G. Van Zee
7b39c14920 Reverted fb93d24.
Details:
- The latest changes in fb93d24 are still causing problems. Reverting
  and preparing to move them to a branch.
2021-09-20 16:13:50 -05:00
Field G. Van Zee
fb93d242a4 Re-enable and fix 8e0c425 (BLIS_ENABLE_SYSTEM).
Details:
- Re-enable the changes originally made in 8e0c425 but quickly reverted
  in 2be78fc.
- Moved the #include of bli_config.h so that it occurs before the
  #include of bli_system.h. This allows the #define BLIS_ENABLE_SYSTEM
  or #define BLIS_DISABLE_SYSTEM in bli_config.h to be processed by the
  time it is needed in bli_system.h. This change should have been
  in the original 8e0c425, but was accidentally omitted. Thanks to Minh
  Quan Ho for catching this.
- Add #define BLIS_ENABLE_SYSTEM to config_detect.c so that the proper
  cpp conditional branch executes in bli_system.h when compiling the
  hardware detection binary. The changes made in 8e0c425 were an attempt
  to support the definition of BLIS_OS_NONE when configuring with
  --disable-system (in issue #532).  That commit failed because, aside
  from the required but omitted header reordering (second bullet above),
  AppVeyor was unable to compile the hardware detection binary as a
  result of missing Windows headers. This commit, which builds on PR
  #546, should help fix that issue. Thanks to Minh Quan Ho for his
  assistance and patience on this matter.
2021-09-20 15:42:08 -05:00
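The header-ordering requirement described in fb93d24 can be illustrated in a single translation unit. This is a minimal sketch, assuming simplified stand-ins for bli_config.h and bli_system.h; `EXAMPLE_OS_NAME` and `example_os_name()` are hypothetical names used only for illustration, and the real OS detection in bli_system.h is more involved.

```c
/* Stand-in for bli_config.h: per the commit message, this header carries
   either #define BLIS_ENABLE_SYSTEM or #define BLIS_DISABLE_SYSTEM. */
#define BLIS_DISABLE_SYSTEM

/* Stand-in for bli_system.h: the conditional below can only see the macro
   above because the "config" part comes first. If the two parts were
   swapped, the #else branch would be taken no matter what was configured. */
#if defined(BLIS_DISABLE_SYSTEM)
  #define BLIS_OS_NONE
  #define EXAMPLE_OS_NAME "none"      /* hypothetical, illustration only */
#else
  #define EXAMPLE_OS_NAME "detected"  /* hypothetical, illustration only */
#endif

/* Hypothetical helper that reports which branch the preprocessor took. */
const char* example_os_name( void )
{
    return EXAMPLE_OS_NAME;
}
```

With the definition placed first, `example_os_name()` returns "none"; moving the `#define` below the conditional would silently select the other branch, which is the class of mistake the reordering addresses.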
Field G. Van Zee
52f29f739d Removed last vestige of #define BLIS_NUM_ARCHS.
Details:
- Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
  and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
  part of the arch_t enum for some time now, and so this change is
  mostly about removing any opportunity for confusion for people who
  may be reading the code. Thanks to Minh Quan Ho for leading me to
  this cleanup.
2021-09-17 08:38:29 -05:00
Field G. Van Zee
849aae09f4 Added new packm var3 to 'gemmlike'.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
  variant (bls_l3_packm_var3.c) parallelizes the packing operation over
  the k dimension rather than the m or n dimensions. Note that the
  gemmlike implementation still uses var1 by default, and use of the new
  code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
  so that var3 is called instead. Thanks to Jeff Diamond for proposing
  this (perhaps NUMA-friendly) solution.
2021-09-16 14:47:45 -05:00
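The idea of parallelizing the packing operation over the k dimension can be sketched as follows. This is not the code in bls_l3_packm_var3.c: the function names are hypothetical, the matrix is assumed to be column-major, and the packed buffer is a plain contiguous m x k copy rather than the micropanel layout that the sandbox uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: compute the half-open range [*start, *end) of the
   k dimension assigned to thread `tid` out of `nt` threads. */
void example_k_range( size_t k, size_t nt, size_t tid,
                      size_t* start, size_t* end )
{
    assert( nt > 0 && tid < nt );

    size_t chunk = k / nt;
    size_t rem   = k % nt;

    /* The first `rem` threads each take one extra index. */
    *start = tid * chunk + ( tid < rem ? tid : rem );
    *end   = *start + chunk + ( tid < rem ? 1 : 0 );
}

/* Copy an m x k column-major matrix `a` (leading dimension `lda`) into a
   contiguous m x k buffer `p`. Each thread copies only the columns that
   fall inside its own slice of the k dimension, so threads never write
   to the same part of `p`. */
void example_packm_over_k( size_t m, size_t k,
                           const double* a, size_t lda,
                           double* p, size_t nt, size_t tid )
{
    size_t k0, k1;
    example_k_range( k, nt, tid, &k0, &k1 );

    for ( size_t l = k0; l < k1; ++l )
        for ( size_t i = 0; i < m; ++i )
            p[ l*m + i ] = a[ l*lda + i ];
}
```

Each thread would call `example_packm_over_k()` with its own `tid`; with k = 10 and four threads the slices are [0,3), [3,6), [6,8), and [8,10).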
Devin Matthews
b6f71fd378 Merge pull request #544 from flame/haswell-gemmsup-fpe
Fix more copy-paste errors in the haswell gemmsup code.
2021-09-16 12:24:33 -05:00
Devin Matthews
e3dc1954ff Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
The fix is to use the same (valid) source register twice in the horizontal addition.
2021-09-16 10:59:37 -05:00
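The effect of using the same source register twice can be seen in a scalar model of the instruction. This is a portable C sketch of the lane arithmetic of the 256-bit `vhaddpd`, not the kernel's inline assembly; the function names are hypothetical.

```c
/* Scalar model of AVX "vhaddpd dst, a, b" on four doubles:
   dst = { a0+a1, b0+b1, a2+a3, b2+b3 }.
   Lanes 1 and 3 depend only on b, so an uninitialized b puts arbitrary
   values (possibly NaN or Inf) through the adder. */
void example_hadd_pd( const double a[4], const double b[4], double dst[4] )
{
    dst[0] = a[0] + a[1];
    dst[1] = b[0] + b[1];
    dst[2] = a[2] + a[3];
    dst[3] = b[2] + b[3];
}

/* Reduce a single accumulator to one scalar. Passing `acc` as both
   sources means every lane of the result is derived from valid data. */
double example_sum4( const double acc[4] )
{
    double t[4];
    example_hadd_pd( acc, acc, t );
    return t[0] + t[2];
}
```

For an accumulator holding {1, 2, 3, 4}, the horizontal add yields {3, 3, 7, 7} and the reduction returns 10, with no lane ever reading a register that was not written.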
Devin Matthews
5191c43fac Fix more copy-paste errors in the haswell gemmsup code.
Fixes #486.
2021-09-16 10:16:17 -05:00
Devin Matthews
9293a68eb6 Merge pull request #534 from flame/cxx_test
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible
2021-09-10 14:13:29 -05:00
Devin Matthews
98ce6e8bc9 Do a fast test on OSX. [ci skip] 2021-09-10 14:12:13 -05:00
Devin Matthews
c76fcad0c2 Fix AArch64 tests and consolidate some other tests. 2021-09-10 13:57:02 -05:00
Devin Matthews
e486d666ff Use C++ cross-compiler for ARM tests. 2021-09-10 13:50:16 -05:00
Devin Matthews
fbb3560cb8 Attempt to fix cxx-test for OOT builds. 2021-09-10 13:38:27 -05:00
Field G. Van Zee
ade10f4278 Updated travis-ci.org link in README.md to .com. 2021-08-27 12:47:12 -05:00
Field G. Van Zee
2be78fc977 Disabled (at least temporarily) commit 8e0c425.
Details:
- Reverted changes in 8e0c425 due to AppVeyor build failures that we do
  not yet understand.
2021-08-27 12:17:26 -05:00
Field G. Van Zee
8e0c4255de Define BLIS_OS_NONE when using --disable-system.
Details:
- Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
  when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
  detecting macro conditionals are considered. This change is to
  accommodate a solution to a cross-compilation issue described in
  #532.
2021-08-26 15:29:18 -05:00
Field G. Van Zee
d6eb70fbc3 Updated stale calls to malloc_intl() in gemmlike.
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
  sandbox. These calls to malloc_intl(), which resided in
  bls_l3_decor_pthreads.c, were missing the err_t argument that the
  function uses to report errors. Thanks to Jeff Diamond for helping
  isolate this issue.
2021-08-26 13:12:39 -05:00
Field G. Van Zee
2f7325b2b7 Blacklist clang10/gcc9 and older for 'armsve'.
Details:
- Prohibit use of clang 10.x and older or gcc 9.x and older for the
  'armsve' subconfiguration. Addresses issue #535.
2021-08-23 15:04:05 -05:00
Devin Matthews
eaea67401c Merge branch 'master' into cxx_test 2021-08-21 16:09:31 -05:00
Devin Matthews
5fc65cdd9e Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. 2021-08-21 15:59:27 -05:00
Field G. Van Zee
e320ec6d5c Moved lang defs from _macro_def.h to _lang_defs.h.
Details:
- Moved miscellaneous language-related definitions, including defs
  related to the handling of the 'restrict' keyword, from the top half
  of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
  #included immediately after "bli_system.h" in blis.h. This change is
  an attempt to fix a report of recent breakage of C++ compilers due
  to the recent introduction of 'restrict' in bli_type_defs.h (which
  previously was being included *before* bli_macro_defs.h and its
  restrict handling therein). Thanks to Ivan Korostelev for reporting
  this issue in #527.
- CREDITS file update.
2021-08-20 17:15:20 -05:00
Field G. Van Zee
3b275f810b Minor tweaks to gemmlike sandbox.
Details:
- In the gemmlike sandbox, changed the loop index variable of inner
  loop of packm_cxk() from 'd' to 'i' (and likewise for the
  corresponding inlined code within packm_var2()).
- Pack matrices A and B using packm_var1() instead of packm_var2().
2021-08-19 16:06:46 -05:00
Field G. Van Zee
3eccfd456e Added local _check() code to gemmlike sandbox.
Details:
- Added code to the gemmlike sandbox that handles parameter checking.
  Previously, the gemmlike implementation called bli_gemm_check(), which
  resides within the BLIS framework proper. Certain modifications that a
  user may wish to perform on the sandbox, such as adding a new matrix
  or vector operand, would have required additional checks, and so these
  changes make it easier for such a person to implement those checks for
  their custom gemm-like operation.
2021-08-19 13:22:10 -05:00
Field G. Van Zee
7144230cdb README.md citation updates (e.g. BLIS7 bibtex). 2021-08-18 13:25:39 -05:00
Field G. Van Zee
4a955e9390 Tweaks to gemmlike to facilitate 3rd party mods.
Details:
- Changed the implementation in the 'gemmlike' sandbox to more easily
  allow others to provide custom implementations of packm. These changes
  include:
  - Calling a local version of packm_cxk() that can be modified. This
    version of packm_cxk() uses inlined loops in packm_cxk() rather
    than querying the context for packm kernels (or even using scal2m).
  - Providing two variants of packm, one of which calls the
    aforementioned packm_cxk(), the other of which inlines the contents
    of packm_cxk() into the variant itself, making it self-contained.
    To switch from one to the other, simply change which function gets
    called within bls_packm_a() and bls_packm_b().
  - Simplified and cleaned up some variant names in both variants of
    packm, relative to their parent code.
2021-08-16 13:49:27 -05:00
Devin Matthews
2c0b4150e4 Merge pull request #527 from flame/obj_t_makeover
Implement proposed new function pointer fields for obj_t.
2021-08-14 18:41:35 -05:00
Field G. Van Zee
4b8ed99d92 Whitespace tweaks. 2021-08-13 15:31:10 -05:00
Devin Matthews
c99fae50ac Merge pull request #530 from flame/fix_clang_warnings
Clean up some warnings that show up on clang/OSX.
2021-08-13 14:48:00 -05:00
Devin Matthews
e6d68bc4fd Merge pull request #529 from flame/fix_make_check_dependencies
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
2021-08-13 14:47:46 -05:00
Devin Matthews
1772db029e Add row- and column-strides for A/B in obj_ukr_fn_t. 2021-08-13 14:46:35 -05:00
Devin Matthews
4f70eb7913 Clean up some warnings that show up on clang/OSX. 2021-08-13 11:12:43 -05:00
Devin Matthews
3cddce1e2a Remove schema field on obj_t (redundant) and add new API functions. 2021-08-12 22:32:34 -05:00
Devin Matthews
ec06b6a503 Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.
2021-08-12 19:27:31 -05:00
Field G. Van Zee
20a1c4014c Disabled sanity check in bli_pool_finalize().
Details:
- Disabled a sanity check in bli_pool_finalize() that was meant to alert
  the user if a pool_t was being finalized while some blocks were still
  checked out. However, this is exactly the situation that might happen
  when a pool_t is re-initialized for a larger blocksize, and currently
  bli_pool_reinit() is implemented as _finalize() followed by _init().
  So, this sanity check is not universally appropriate. Thanks to
  AMD-India for reporting this issue.
2021-08-12 14:44:04 -05:00
Field G. Van Zee
e366665cd2 Fixed stale API calls to membrk API in gemmlike.
Details:
- Updated stale calls to the bli_membrk API within the 'gemmlike'
  sandbox. This API is now called bli_pba (packed block allocator).
  Ideally, this forgotten update would have been included as part of
  21911d6, which is when the branch that introduced the membrk->pba
  changes was merged into 'master'.
- Comment updates.
2021-08-12 14:06:53 -05:00
Devin Matthews
64a1f786d5 Implement proposed new function pointer fields for obj_t.
The added fields:
1. `pack_t schema`: storing the pack schema on the object allows the macrokernel to act accordingly without side-channel information from the rntm_t and cntx_t. The pack schema and "pack_[ab]" fields could be removed from those structs.
2. `void* user_data`: this field can be used to store any sort of additional information provided by the user. The pointer is propagated to submatrix objects and copies, but is otherwise ignored by the framework and the default implementations of the following three fields. User-specified pack, kernel, or ukr functions can do whatever they want with the data, and the user is 100% responsible for allocating, assigning, and freeing this buffer.
3. `obj_pack_fn_t pack`: the function called when a matrix is packed. This function receives the expected arguments, as well as an mdim_t and a mem_t*, since memory must be allocated inside this function and behavior may differ based on which matrix is being packed (i.e. transposition for B). This could also be achieved by passing a desired pack schema, but this would require additional information to travel down the control tree.
4. `obj_ker_fn_t ker`: the function called when we get to the "second loop", or the macro-kernel. Behavior may depend on the pack schemas of the input matrices. The default implementation would perform the inner two loops around the ukr, and then call either the default ukr or a user-supplied one (next field).
5. `obj_ukr_fn_t ukr`: the function called by the default macrokernel. This would replace the various current "virtual" microkernels, and could also be used to supply user-defined behavior. Users could supply both a custom kernel (above) and microkernel, although the user-specified kernel does **not** necessarily have to call the ukr function specified on the obj_t.

Note that no macros or functions for accessing these new fields have been defined yet. That is next once these are finalized. Addresses https://github.com/flame/blis/projects/1#card-62357687.
2021-08-11 18:11:47 -05:00
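The general pattern of carrying a user pointer and overridable function pointers on an object can be sketched with a toy type. Everything below is hypothetical: `toy_obj_t` is not obj_t, the signature of `toy_ukr_fn_t` is not that of obj_ukr_fn_t, and dispatching on the output operand is an assumption made only to keep the example small.

```c
#include <stddef.h>

typedef struct toy_obj_s toy_obj_t;

/* Hypothetical function pointer type, standing in for the idea of a
   per-object microkernel hook. */
typedef void (*toy_ukr_fn_t)( const toy_obj_t* a, const toy_obj_t* b,
                              toy_obj_t* c );

struct toy_obj_s
{
    double       value;     /* stand-in for the matrix data */
    void*        user_data; /* owned entirely by the user */
    toy_ukr_fn_t ukr;       /* operation attached to this object */
};

/* Default behavior: c += a * b. It never looks at user_data. */
void toy_default_ukr( const toy_obj_t* a, const toy_obj_t* b, toy_obj_t* c )
{
    c->value += a->value * b->value;
}

/* User-supplied behavior: scale the product by a factor that the user
   stored behind c->user_data. */
void toy_scaled_ukr( const toy_obj_t* a, const toy_obj_t* b, toy_obj_t* c )
{
    double alpha = *(const double*)c->user_data;
    c->value += alpha * a->value * b->value;
}

/* Framework-side dispatch: call whatever function the object carries
   (assumed here to be the one stored on the output operand). */
void toy_gemm( const toy_obj_t* a, const toy_obj_t* b, toy_obj_t* c )
{
    c->ukr( a, b, c );
}
```

A caller that leaves `ukr` set to `toy_default_ukr` gets the stock behavior, while one that installs `toy_scaled_ukr` and points `user_data` at its own scalar changes the computation without touching the dispatch code. Allocation and lifetime of the pointed-to data remain the caller's responsibility, as the commit message states for the real field.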
Field G. Van Zee
a32257eeab Fixed bli_init.c compile-time error on OSX clang.
Details:
- Fixed a compile-time error in bli_init.c when compiling with OSX's
  clang. This error was introduced in 868b901, which introduced a
  post-declaration struct assignment where the RHS was a struct
  initialization expression (i.e. { ... }). This use of struct
  initializer expressions apparently works with gcc despite it not
  being strict C99. The fix included in this commit declares a temporary
  variable for the purposes of being initialized to the desired value,
  via the struct initializer, and then copies the temporary struct (via
  '=' struct assignment) to the persistent struct. Thanks to Devin
  Matthews for his help with this.
2021-08-05 16:23:02 -05:00
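The workaround described in a32257e follows a common C pattern. This is a minimal sketch with a hypothetical struct and initializer macro, not the contents of bli_init.c.

```c
/* Hypothetical type and initializer macro, standing in for any type
   whose initializer expands to a brace-enclosed list. */
typedef struct { int a; int b; } example_pair_t;
#define EXAMPLE_PAIR_INIT { 1, 2 }

/* Using the initializer in a declaration is always fine. */
static example_pair_t example_state = EXAMPLE_PAIR_INIT;

void example_reset( void )
{
    /* A brace-enclosed initializer is not an expression, so the line
       below is rejected when used after the declaration:

           example_state = EXAMPLE_PAIR_INIT;
    */

    /* Instead, initialize a temporary with the initializer and then
       copy it over the persistent struct with '=' struct assignment. */
    const example_pair_t init = EXAMPLE_PAIR_INIT;
    example_state = init;
}
```

The temporary costs nothing in practice, and the same function body compiles whether the initializer macro expands to a scalar or to a brace-enclosed list.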
Field G. Van Zee
c8728cfbd1 Fixed configure breakage on OSX clang.
Details:
- Accept either 'clang' or 'LLVM' in vendor string when grepping for
  the version number (after determining that we're working with clang).
  Thanks to Devin Matthews for this fix.
2021-08-05 15:17:09 -05:00
Field G. Van Zee
868b90138e Fixed one-time use property of bli_init() (#525).
Details:
- Fixes a rather obvious bug that resulted in a segmentation fault
  whenever the calling application tried to re-initialize BLIS after
  its first init/finalize cycle. The bug arose because bli_init() and
  bli_finalize() are both implemented in terms of pthread_once(), and
  the bli_init.c APIs made no provision for calling bli_init() more
  than once. This has been fixed by
  resetting the pthread_once_t control variable for initialization
  at the end of bli_finalize_apis(), and by resetting the control
  variable for finalization at the end of bli_init_apis(). Thanks to
  @lschork2 for reporting this issue (#525), and to Minh Quan Ho and
  Devin Matthews for suggesting the chosen solution.
- CREDITS file update.
2021-08-04 18:31:01 -05:00
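The re-arming scheme described in 868b901 can be sketched as follows. All `example_` names are hypothetical and the bodies are placeholders; only `pthread_once()`, `pthread_once_t`, and `PTHREAD_ONCE_INIT` are real. POSIX specifies only static initialization with `PTHREAD_ONCE_INIT`, so resetting a control variable by assignment is an assumption that holds on common implementations rather than a guarantee.

```c
#include <pthread.h>

static pthread_once_t example_once_init     = PTHREAD_ONCE_INIT;
static pthread_once_t example_once_finalize = PTHREAD_ONCE_INIT;

static int example_init_count     = 0;  /* how often init really ran */
static int example_is_initialized = 0;

static void example_init_apis( void )
{
    example_is_initialized = 1;
    example_init_count++;

    /* Re-arm finalization so that the next example_finalize() does its
       work. The temporary avoids assigning a brace initializer. */
    const pthread_once_t once_reset = PTHREAD_ONCE_INIT;
    example_once_finalize = once_reset;
}

static void example_finalize_apis( void )
{
    example_is_initialized = 0;

    /* Re-arm initialization so that the library can be brought up again
       after this finalize. */
    const pthread_once_t once_reset = PTHREAD_ONCE_INIT;
    example_once_init = once_reset;
}

void example_init( void )
{
    pthread_once( &example_once_init, example_init_apis );
}

void example_finalize( void )
{
    pthread_once( &example_once_finalize, example_finalize_apis );
}
```

Repeated calls to `example_init()` still run the setup only once, but an init/finalize/init sequence now runs it a second time instead of leaving the library torn down.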
Field G. Van Zee
8dba1e752c CREDITS file update. 2021-07-27 12:38:24 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Devin Matthews
fab5c86d68 Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
P10 sandbox rework
2021-07-13 16:46:21 -05:00
Devin Matthews
84f9dcd449 Remove unnecessary windows/zen2 directory. 2021-07-13 16:45:44 -05:00
Field G. Van Zee
21911d6ed3 Merge branch 'dev' 2021-07-09 18:10:46 -05:00
Devin Matthews
17729cf449 Add vzeroupper to Haswell microkernels. (#524)
Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm' 
  microkernels so as to avoid a performance penalty when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part 
  of the haswell kernels, but were inadvertently removed during a source 
  code shuffle some time ago when we were managing duplicate 'haswell' 
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down 
  and re-inserting the missing instructions.
2021-07-09 14:59:48 -05:00
Devin Matthews
c9a7f59aa8 Merge pull request #522 from flame/windows-avx512
Fix Win64 AVX512 bug.
2021-07-08 14:00:38 -05:00
Devin Matthews
9a8e649c5a Fix Win64 AVX512 bug.
Use `-march=haswell` for kernels. Fixes #514.
2021-07-08 11:40:00 -05:00
Devin Matthews
75f03907c5 Add comment about make checkblas on Windows
[ci skip]
2021-07-07 15:44:11 -05:00