Commit Graph

2076 Commits

Author SHA1 Message Date
Minh Quan HO
eaa554aa52 bli_error: more cleanup on the error strings array
- There was redundancy between the macro BLIS_MAX_NUM_ERR_MSGS (=200) and
  the enum BLIS_ERROR_CODE_MAX (-170), while they both mean the same thing:
  the maximal number of error codes/messages.
- The previous initialization of error messages at compile time ignored that
  the 'bli_error_string' array still occupies useless memory due to 2D char[][]
  declaration. Instead, it should be just an array of pointers, pointing at
  strings in .rodata section.
- This commit makes two modifications:
   * retire macros BLIS_MAX_NUM_ERR_MSGS and BLIS_MAX_ERR_MSG_LENGTH everywhere
   * switch bli_error_string from char[][] to char *[] to reduce its footprint
     from 40KB (200*200) to 1.3KB (170*sizeof(char*)).
     (No problem to use the enum BLIS_ERROR_CODE_MAX at compile-time,
     since compiler is smart enough to determine its value is 170.)
2021-09-20 10:39:05 +02:00
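The footprint argument in eaa554aa52 can be checked with a small sketch. Everything prefixed `demo_`/`DEMO_` is a hypothetical stand-in, not a BLIS identifier, and the two messages are invented. The lookup assumes negative error codes, as the (-170) in the message suggests.

```c
#include <stddef.h>

/* Hypothetical stand-ins; these are not the real BLIS identifiers. */
enum {
    DEMO_NUM_ERR_MSGS = 170, /* plays the role of |BLIS_ERROR_CODE_MAX| */
    DEMO_OLD_MAX_MSGS = 200, /* plays the role of BLIS_MAX_NUM_ERR_MSGS */
    DEMO_OLD_MSG_LEN  = 200  /* plays the role of BLIS_MAX_ERR_MSG_LENGTH */
};

/* Old layout: every slot reserves DEMO_OLD_MSG_LEN bytes, used or not. */
static const char demo_err_2d[DEMO_OLD_MAX_MSGS][DEMO_OLD_MSG_LEN] = {
    [1] = "Generic failure",
    [2] = "Invalid argument",
};

/* New layout: one pointer per slot; the text itself is a string literal. */
static const char *demo_err_ptr[DEMO_NUM_ERR_MSGS] = {
    [1] = "Generic failure",
    [2] = "Invalid argument",
};

/* Size of each table in bytes. */
size_t demo_old_footprint(void) { return sizeof demo_err_2d; }
size_t demo_new_footprint(void) { return sizeof demo_err_ptr; }

/* Map a negative error code to its message, with a fallback string. */
const char *demo_error_string(int code)
{
    if (code > 0 || code <= -DEMO_NUM_ERR_MSGS)
        return "Unknown error";
    if (demo_err_ptr[-code] == NULL)
        return "Unknown error";
    return demo_err_ptr[-code];
}
```

With 8-byte pointers, `demo_new_footprint()` is 1360 bytes against 40000 for the 2D array, which matches the 1.3KB and 40KB figures quoted in the message.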
Field G. Van Zee
52f29f739d Removed last vestige of #define BLIS_NUM_ARCHS.
Details:
- Removed the commented-out #define BLIS_NUM_ARCHS in bli_type_defs.h
  and its associated (now outdated) comments. BLIS_NUM_ARCHS has been
  part of the arch_t enum for some time now, and so this change is
  mostly about removing any opportunity for confusion for people who
  may be reading the code. Thanks to Minh Quan Ho for leading me to
  this cleanup.
2021-09-17 08:38:29 -05:00
Field G. Van Zee
849aae09f4 Added new packm var3 to 'gemmlike'.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
  variant (bls_l3_packm_var3.c) parallelizes the packing operation over
  the k dimension rather than the m or n dimensions. Note that the
  gemmlike implementation still uses var1 by default, and use of the new
  code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
  so that var3 is called instead. Thanks to Jeff Diamond for proposing
  this (perhaps NUMA-friendly) solution.
2021-09-16 14:47:45 -05:00
Devin Matthews
b6f71fd378 Merge pull request #544 from flame/haswell-gemmsup-fpe
Fix more copy-paste errors in the haswell gemmsup code.
2021-09-16 12:24:33 -05:00
Devin Matthews
e3dc1954ff Fix problem where uninitialized registers are included in vhaddpd in the Mx1 gemmsup kernels for haswell.
The fix is to use the same (valid) source register twice in the horizontal addition.
2021-09-16 10:59:37 -05:00
Devin Matthews
5191c43fac Fix more copy-paste errors in the haswell gemmsup code.
Fixes #486.
2021-09-16 10:16:17 -05:00
Devin Matthews
9293a68eb6 Merge pull request #534 from flame/cxx_test
Add test to Travis using C++ compiler to make sure blis.h is C++-compatible
2021-09-10 14:13:29 -05:00
Devin Matthews
98ce6e8bc9 Do a fast test on OSX. [ci skip] 2021-09-10 14:12:13 -05:00
Devin Matthews
c76fcad0c2 Fix AArch64 tests and consolidate some other tests. 2021-09-10 13:57:02 -05:00
Devin Matthews
e486d666ff Use C++ cross-compiler for ARM tests. 2021-09-10 13:50:16 -05:00
Devin Matthews
fbb3560cb8 Attempt to fix cxx-test for OOT builds. 2021-09-10 13:38:27 -05:00
Field G. Van Zee
ade10f4278 Updated travis-ci.org link in README.md to .com. 2021-08-27 12:47:12 -05:00
Field G. Van Zee
2be78fc977 Disabled (at least temporarily) commit 8e0c425.
Details:
- Reverted changes in 8e0c425 due to AppVeyor build failures that we do
  not yet understand.
2021-08-27 12:17:26 -05:00
Field G. Van Zee
8e0c4255de Define BLIS_OS_NONE when using --disable-system.
Details:
- Modified bli_system.h so that the cpp macro BLIS_OS_NONE is defined
  when BLIS_DISABLE_SYSTEM is defined. Otherwise, the previous OS-
  detecting macro conditionals are considered. This change is to
  accommodate a solution to a cross-compilation issue described in
  #532.
2021-08-26 15:29:18 -05:00
Field G. Van Zee
d6eb70fbc3 Updated stale calls to malloc_intl() in gemmlike.
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
  sandbox. These calls to malloc_intl(), which resided in
  bls_l3_decor_pthreads.c, were missing the err_t argument that the
  function uses to report errors. Thanks to Jeff Diamond for helping
  isolate this issue.
2021-08-26 13:12:39 -05:00
Field G. Van Zee
2f7325b2b7 Blacklist clang10/gcc9 and older for 'armsve'.
Details:
- Prohibit use of clang 10.x and older or gcc 9.x and older for the
  'armsve' subconfiguration. Addresses issue #535.
2021-08-23 15:04:05 -05:00
Devin Matthews
eaea67401c Merge branch 'master' into cxx_test 2021-08-21 16:09:31 -05:00
Devin Matthews
5fc65cdd9e Add test to Travis using C++ compiler to make sure blis.h is C++-compatible. 2021-08-21 15:59:27 -05:00
Field G. Van Zee
e320ec6d5c Moved lang defs from _macro_def.h to _lang_defs.h.
Details:
- Moved miscellaneous language-related definitions, including defs
  related to the handling of the 'restrict' keyword, from the top half
  of bli_macro_defs.h into a new file, bli_lang_defs.h, which is now
  #included immediately after "bli_system.h" in blis.h. This change is
  an attempt to fix a report of recent breakage of C++ compilers due
  to the recent introduction of 'restrict' in bli_type_defs.h (which
  previously was being included *before* bli_macro_defs.h and its
  restrict handling therein). Thanks to Ivan Korostelev for reporting
  this issue in #527.
- CREDITS file update.
2021-08-20 17:15:20 -05:00
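The ordering problem fixed in e320ec6d5c can be illustrated with a minimal sketch: the language definitions that handle 'restrict' must be seen before any type definition that uses the keyword. The macro and function names below are hypothetical, and the three numbered sections only stand in for separate headers. The real bli_lang_defs.h may handle the keyword differently.

```c
#include <stddef.h>

/* 1. Language definitions come first (stand-in for bli_lang_defs.h). */
#ifdef __cplusplus
  /* C++ has no 'restrict' keyword; __restrict__ is a GCC/Clang extension. */
  #define DEMO_RESTRICT __restrict__
#else
  #define DEMO_RESTRICT restrict
#endif

/* 2. Type definitions may now use the qualifier
      (stand-in for bli_type_defs.h). */
typedef void (*demo_axpy_fn)(size_t n, double alpha,
                             const double *DEMO_RESTRICT x,
                             double *DEMO_RESTRICT y);

/* 3. A function matching that type: y := y + alpha * x.
      The qualifier promises that x and y do not overlap. */
void demo_axpy(size_t n, double alpha,
               const double *DEMO_RESTRICT x,
               double *DEMO_RESTRICT y)
{
    for (size_t i = 0; i < n; i++)
        y[i] += alpha * x[i];
}
```

If section 2 were placed before section 1, a C++ compiler would reject the typedef, which is the kind of breakage the commit describes.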
Field G. Van Zee
3b275f810b Minor tweaks to gemmlike sandbox.
Details:
- In the gemmlike sandbox, changed the loop index variable of inner
  loop of packm_cxk() from 'd' to 'i' (and likewise for the
  corresponding inlined code within packm_var2()).
- Pack matrices A and B using packm_var1() instead of packm_var2().
2021-08-19 16:06:46 -05:00
Field G. Van Zee
3eccfd456e Added local _check() code to gemmlike sandbox.
Details:
- Added code to the gemmlike sandbox that handles parameter checking.
  Previously, the gemmlike implementation called bli_gemm_check(), which
  resides within the BLIS framework proper. Certain modifications that a
  user may wish to perform on the sandbox, such as adding a new matrix
  or vector operand, would have required additional checks, and so these
  changes make it easier for such a person to implement those checks for
  their custom gemm-like operation.
2021-08-19 13:22:10 -05:00
Field G. Van Zee
7144230cdb README.md citation updates (e.g. BLIS7 bibtex). 2021-08-18 13:25:39 -05:00
Field G. Van Zee
4a955e9390 Tweaks to gemmlike to facilitate 3rd party mods.
Details:
- Changed the implementation in the 'gemmlike' sandbox to more easily
  allow others to provide custom implementations of packm. These changes
  include:
  - Calling a local version of packm_cxk() that can be modified. This
    version of packm_cxk() uses inlined loops in packm_cxk() rather
    than querying the context for packm kernels (or even using scal2m).
  - Providing two variants of packm, one of which calls the
    aforementioned packm_cxk(), the other of which inlines the contents
    of packm_cxk() into the variant itself, making it self-contained.
    To switch from one to the other, simply change which function gets
    called within bls_packm_a() and bls_packm_b().
  - Simplified and cleaned up some variant names in both variants of
    packm, relative to their parent code.
2021-08-16 13:49:27 -05:00
Devin Matthews
2c0b4150e4 Merge pull request #527 from flame/obj_t_makeover
Implement proposed new function pointer fields for obj_t.
2021-08-14 18:41:35 -05:00
Field G. Van Zee
4b8ed99d92 Whitespace tweaks. 2021-08-13 15:31:10 -05:00
Devin Matthews
c99fae50ac Merge pull request #530 from flame/fix_clang_warnings
Clean up some warnings that show up on clang/OSX.
2021-08-13 14:48:00 -05:00
Devin Matthews
e6d68bc4fd Merge pull request #529 from flame/fix_make_check_dependencies
Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
2021-08-13 14:47:46 -05:00
Devin Matthews
1772db029e Add row- and column-strides for A/B in obj_ukr_fn_t. 2021-08-13 14:46:35 -05:00
Devin Matthews
4f70eb7913 Clean up some warnings that show up on clang/OSX. 2021-08-13 11:12:43 -05:00
Devin Matthews
3cddce1e2a Remove schema field on obj_t (redundant) and add new API functions. 2021-08-12 22:32:34 -05:00
Devin Matthews
ec06b6a503 Add dependency on the "flat" blis.h file for the BLIS and BLAS testsuite objects.
This fixes a bug where "make -j<N> check" may fail after a change to one or more header files, or where testsuite code doesn't get properly recompiled after internal changes.
2021-08-12 19:27:31 -05:00
Field G. Van Zee
20a1c4014c Disabled sanity check in bli_pool_finalize().
Details:
- Disabled a sanity check in bli_pool_finalize() that was meant to alert
  the user if a pool_t was being finalized while some blocks were still
  checked out. However, this is exactly the situation that might happen
  when a pool_t is re-initialized for a larger blocksize, and currently
  bli_pool_reinit() is implemented as _finalize() followed by _init().
  So, this sanity check is not universally appropriate. Thanks to
  AMD-India for reporting this issue.
2021-08-12 14:44:04 -05:00
Field G. Van Zee
e366665cd2 Fixed stale API calls to membrk API in gemmlike.
Details:
- Updated stale calls to the bli_membrk API within the 'gemmlike'
  sandbox. This API is now called bli_pba (packed block allocator).
  Ideally, this forgotten update would have been included as part of
  21911d6, which is when the branch that introduced the membrk->pba
  changes was merged into 'master'.
- Comment updates.
2021-08-12 14:06:53 -05:00
Devin Matthews
64a1f786d5 Implement proposed new function pointer fields for obj_t.
The added fields:
1. `pack_t schema`: storing the pack schema on the object allows the macrokernel to act accordingly without side-channel information from the rntm_t and cntx_t. The pack schema and "pack_[ab]" fields could be removed from those structs.
2. `void* user_data`: this field can be used to store any sort of additional information provided by the user. The pointer is propagated to submatrix objects and copies, but is otherwise ignored by the framework and the default implementations of the following three fields. User-specified pack, kernel, or ukr functions can do whatever they want with the data, and the user is 100% responsible for allocating, assigning, and freeing this buffer.
3. `obj_pack_fn_t pack`: the function called when a matrix is packed. This function receives the expected arguments, as well as a mdim_t and mem_t* as memory must be allocated inside this function, and behavior may differ based on which matrix is being packed (i.e. transposition for B). This could also be achieved by passing a desired pack schema, but this would require additional information to travel down the control tree.
4. `obj_ker_fn_t ker`: the function called when we get to the "second loop", or the macro-kernel. Behavior may depend on the pack schemas of the input matrices. The default implementation would perform the inner two loops around the ukr, and then call either the default ukr or a user-supplied one (next field).
5. `obj_ukr_fn_t ukr`: the function called by the default macrokernel. This would replace the various current "virtual" microkernels, and could also be used to supply user-defined behavior. Users could supply both a custom kernel (above) and microkernel, although the user-specified kernel does **not** necessarily have to call the ukr function specified on the obj_t.

Note that no macros or functions for accessing these new fields have been defined yet. That is next once these are finalized. Addresses https://github.com/flame/blis/projects/1#card-62357687.
2021-08-11 18:11:47 -05:00
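The dispatch idea in 64a1f786d5 can be sketched as follows: an object carries its own kernel and microkernel pointers plus an opaque user_data pointer. All names and signatures here are hypothetical, since the commit message does not give the real ones. The `schema` and `pack` fields are left out, and a vector of doubles stands in for a matrix.

```c
#include <stddef.h>

typedef struct demo_obj demo_obj_t;

/* Hypothetical signatures; the real obj_ker_fn_t / obj_ukr_fn_t differ. */
typedef void (*demo_ukr_fn_t)(const demo_obj_t *c, double a, double b,
                              double *c_elem);
typedef void (*demo_ker_fn_t)(const demo_obj_t *a, const demo_obj_t *b,
                              demo_obj_t *c);

struct demo_obj {
    double       *buf;       /* element storage (a vector, for brevity) */
    size_t        n;         /* number of elements */
    void         *user_data; /* owned by the user, ignored by defaults */
    demo_ker_fn_t ker;       /* called at the "macro-kernel" level */
    demo_ukr_fn_t ukr;       /* called by the default kernel */
};

/* Default microkernel: accumulate one product, ignoring user_data. */
void demo_default_ukr(const demo_obj_t *c, double a, double b, double *c_elem)
{
    (void)c;
    *c_elem += a * b;
}

/* User microkernel: scale the product by a factor kept in user_data. */
void demo_scaled_ukr(const demo_obj_t *c, double a, double b, double *c_elem)
{
    const double scale = *(const double *)c->user_data;
    *c_elem += scale * a * b;
}

/* Default kernel: loop over elements and call whichever ukr is set on c. */
void demo_default_ker(const demo_obj_t *a, const demo_obj_t *b, demo_obj_t *c)
{
    for (size_t i = 0; i < c->n; i++)
        c->ukr(c, a->buf[i], b->buf[i], &c->buf[i]);
}

/* Attach a buffer and install the default function pointers. */
void demo_obj_init(demo_obj_t *o, double *buf, size_t n)
{
    o->buf = buf;
    o->n = n;
    o->user_data = NULL;
    o->ker = demo_default_ker;
    o->ukr = demo_default_ukr;
}

/* Entry point: dispatch through the kernel stored on the output object. */
void demo_run(const demo_obj_t *a, const demo_obj_t *b, demo_obj_t *c)
{
    c->ker(a, b, c);
}
```

Swapping `c.ukr` for `demo_scaled_ukr` and pointing `c.user_data` at a double changes the behavior without touching the default kernel. A user-supplied kernel could also ignore `ukr` entirely, as the message notes.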
Field G. Van Zee
a32257eeab Fixed bli_init.c compile-time error on OSX clang.
Details:
- Fixed a compile-time error in bli_init.c when compiling with OSX's
  clang. This error was introduced in 868b901, which introduced a
  post-declaration struct assignment where the RHS was a struct
  initialization expression (i.e. { ... }). This use of struct
  initializer expressions apparently works with gcc despite it not
  being strict C99. The fix included in this commit declares a temporary
  variable for the purposes of being initialized to the desired value,
  via the struct initializer, and then copies the temporary struct (via
  '=' struct assignment) to the persistent struct. Thanks to Devin
  Matthews for his help with this.
2021-08-05 16:23:02 -05:00
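The workaround described in a32257eeab can be sketched with a hypothetical struct type. A bare brace list is not an expression in C, so it can initialize a declaration but cannot appear on the right-hand side of an assignment. The type `demo_once_t` and the macro `DEMO_ONCE_INIT` are invented for illustration and only mimic a platform whose initializer macro expands to braces.

```c
/* Hypothetical control type whose initializer macro expands to braces. */
typedef struct {
    int state;
    int count;
} demo_once_t;

#define DEMO_ONCE_INIT { 0, 0 }

/* Fine: a brace list is valid when initializing a declaration. */
static demo_once_t demo_ctl = DEMO_ONCE_INIT;

/* Record one use of the control variable. */
void demo_touch(void)
{
    demo_ctl.state = 1;
    demo_ctl.count++;
}

/* Report how many uses have been recorded since the last reset. */
int demo_count(void)
{
    return demo_ctl.count;
}

/* Reset the persistent struct to its initial value. */
void demo_reset(void)
{
    /* demo_ctl = DEMO_ONCE_INIT;   <- would not compile here */

    /* Initialize a temporary via the brace list, then copy it with '='. */
    const demo_once_t fresh = DEMO_ONCE_INIT;
    demo_ctl = fresh;
}
```

A C99 compound literal, `(demo_once_t)DEMO_ONCE_INIT`, would be another way to write the reset. The temporary-variable form above is the one the commit message describes.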
Field G. Van Zee
c8728cfbd1 Fixed configure breakage on OSX clang.
Details:
- Accept either 'clang' or 'LLVM' in vendor string when grepping for
  the version number (after determining that we're working with clang).
  Thanks to Devin Matthews for this fix.
2021-08-05 15:17:09 -05:00
Field G. Van Zee
868b90138e Fixed one-time use property of bli_init() (#525).
Details:
- Fixes a rather obvious bug that resulted in a segmentation fault
  whenever the calling application tried to re-initialize BLIS after
  its first init/finalize cycle. The bug resulted from the fact that
  the bli_init.c APIs made no effort to allow bli_init() to be called
  subsequent times at all due to it, and bli_finalize(), being
  implemented in terms of pthread_once(). This has been fixed by
  resetting the pthread_once_t control variable for initialization
  at the end of bli_finalize_apis(), and by resetting the control
  variable for finalization at the end of bli_init_apis(). Thanks to
  @lschork2 for reporting this issue (#525), and to Minh Quan Ho and
  Devin Matthews for suggesting the chosen solution.
- CREDITS file update.
2021-08-04 18:31:01 -05:00
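The reset scheme described in 868b90138e can be sketched as below. All `demo_` names are hypothetical. The sketch assumes that assigning a fresh value to a pthread_once_t re-arms it, which POSIX does not spell out, and that no other thread is inside init or finalize while the reset happens.

```c
#include <pthread.h>

/* One control variable per direction, as in the commit description. */
static pthread_once_t demo_once_init     = PTHREAD_ONCE_INIT;
static pthread_once_t demo_once_finalize = PTHREAD_ONCE_INIT;

/* A pristine copy used for resets; initializing a temporary and copying
   it avoids assigning the initializer macro directly. */
static const pthread_once_t demo_once_fresh = PTHREAD_ONCE_INIT;

static int demo_is_up;
static int demo_init_calls;

/* Runs at most once per init/finalize cycle. */
static void demo_init_apis(void)
{
    demo_is_up = 1;
    demo_init_calls++;
    /* Re-arm finalization so the next demo_finalize() does real work. */
    demo_once_finalize = demo_once_fresh;
}

/* Runs at most once per init/finalize cycle. */
static void demo_finalize_apis(void)
{
    demo_is_up = 0;
    /* Re-arm initialization so the library can be brought up again. */
    demo_once_init = demo_once_fresh;
}

void demo_init(void)
{
    pthread_once(&demo_once_init, demo_init_apis);
}

void demo_finalize(void)
{
    pthread_once(&demo_once_finalize, demo_finalize_apis);
}

int demo_is_initialized(void) { return demo_is_up; }
int demo_init_count(void)     { return demo_init_calls; }
```

Each routine resets the other direction's control variable, never its own, so no control variable is modified while pthread_once() is still running on it.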
Field G. Van Zee
8dba1e752c CREDITS file update. 2021-07-27 12:38:24 -05:00
Field G. Van Zee
cc9206df66 Added Graviton2 Neoverse N1 performance results.
Details:
- Added single-threaded and multithreaded performance results to
  docs/Performance.md. These results were gathered on a Graviton2
  Neoverse N1 server. Special thanks to Nicholai Tukanov for
  collecting these results via the Arm-HPC/AWS hackathon.
- Corrected what was supposed to be a temporary tweak to the legend
  labels in test/3/octave/plot_l3_perf.m.
2021-07-16 15:48:37 -05:00
Devin Matthews
fab5c86d68 Merge pull request #516 from nicholaiTukanov/p10-sandbox-rework
P10 sandbox rework
2021-07-13 16:46:21 -05:00
Devin Matthews
84f9dcd449 Remove unnecessary windows/zen2 directory. 2021-07-13 16:45:44 -05:00
Field G. Van Zee
21911d6ed3 Merge branch 'dev' 2021-07-09 18:10:46 -05:00
Devin Matthews
17729cf449 Add vzeroupper to Haswell microkernels. (#524)
Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm' 
  microkernels so as to avoid a performance penalty when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part 
  of the haswell kernels, but were inadvertently removed during a source 
  code shuffle some time ago when we were managing duplicate 'haswell' 
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down 
  and re-inserting the missing instructions.
2021-07-09 14:59:48 -05:00
Devin Matthews
c9a7f59aa8 Merge pull request #522 from flame/windows-avx512
Fix Win64 AVX512 bug.
2021-07-08 14:00:38 -05:00
Devin Matthews
9a8e649c5a Fix Win64 AVX512 bug.
Use `-march=haswell` for kernels. Fixes #514.
2021-07-08 11:40:00 -05:00
Devin Matthews
75f03907c5 Add comment about make checkblas on Windows
[ci skip]
2021-07-07 15:44:11 -05:00
Devin Matthews
4651583b12 Merge pull request #520 from flame/travis-ci-install
Test installation in Travis CI
2021-07-07 01:11:20 -05:00
Field G. Van Zee
69205ac266 CREDITS file update.
Details:
- Thanks to Chengguo Sun for submitting #515 (5ef7f68).
- Thanks to Andrew Wildman for submitting #519 (551c6b4).
- Whitespace update to configure (spaces to tabs).
2021-07-06 20:39:22 -05:00
Devin Matthews
174f7fc9a1 Test installation in Travis CI 2021-07-06 19:35:55 -05:00
Devin Matthews
551c6b4ee8 Merge pull request #519 from awild82/oot_build_bugfix
Fix installation from out-of-tree builds
2021-07-06 19:32:53 -05:00