Commit Graph

2214 Commits

Author SHA1 Message Date
Field G. Van Zee
88cab8383c CHANGELOG update (0.9.0) 2022-04-01 08:12:06 -05:00
Field G. Van Zee
14c86f66b2 Version file update (0.9.0) 2022-04-01 08:12:06 -05:00
Field G. Van Zee
99bb9002f1 ReleaseNotes.md update in advance of next version. 2022-04-01 08:10:59 -05:00
Field G. Van Zee
bee7678b25 CREDITS file update. 2022-03-31 14:09:39 -05:00
Field G. Van Zee
cf06364327 Fixed typo in BLAS gemm3m call to _check().
Details:
- Fixed an unresolved symbol issue leftover from #590 whereby ?gemm3m_()
  as defined in bla_gemm3m.c was referencing bla_gemm3m_check(), which
  does not exist. It should have simply called the _check() function for
  gemm.
2022-03-29 16:18:25 -05:00
Dipal M Zambare
1ec020b33e AMD kernel updates; frame-specific AMD updates. (#597)
Details:
- Allow building BLIS with certain framework files (each with the '_amd'
  suffix) that have been customized by AMD for Zen-based hardware. These
  customized files were derived from portable versions of the same files
  (i.e., those without the '_amd' suffix). Whether the portable or AMD-
  specific files are compiled is now controlled by a new configure
  option, --[en|dis]able-amd-frame-tweaks. This option is disabled by
  default in vanilla BLIS, though AMD may choose to enable it by default
  in their fork. For now, the added AMD-specific files are:
  - bli_gemv_unf_var2_amd.c
  - bla_copy_amd.c
  - bla_gemv_amd.c
  These files reside in 'amd' subdirectories found within the directory
  housing their generic counterparts.
- Register optimized real-domain copyv, setv, and swapv kernels in
  bli_cntx_init_zen.c.
- Various minor updates to level-1v kernels in 'zen' kernel set.
- Added caxpyf kernel as well as saxpyf and multiple daxpyf kernels to
  the 'zen' kernel set
- If the problem passed to ?gemm_() in bla_gemm.c has a unit m or n dim,
  call gemv instead and return early.
- Combined variable declarations with their initialization in various
  level-2 and level-3 BLAS compatibility files, and also inserted
  'const' qualifier in those same declaration statements.
- Moved frame/compat/bla_gemmt.c and .h to frame/compat/extra/ .
- Added copyv and swapv test drivers to 'test' directory.
- Whitespace, comment changes.
2022-03-29 16:15:36 -05:00
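Note on 1ec020b33e: the unit-dimension shortcut in ?gemm_() can be pictured roughly as below. This is only a sketch under simplifying assumptions (column-major storage, no transposition or conjugation parameters, double precision only); toy_dgemv() and toy_dgemm() are hypothetical names and are not the code in bla_gemm.c.

```c
#include <assert.h>
#include <stddef.h>

/* y := alpha * op(A) * x + beta * y, with A stored column-major (m x n).
   trans == 0 uses A; trans != 0 uses A^T. Hypothetical helper. */
static void toy_dgemv(int trans, size_t m, size_t n, double alpha,
                      const double *a, size_t lda,
                      const double *x, size_t incx,
                      double beta, double *y, size_t incy)
{
    size_t leny = trans ? n : m;
    size_t lenx = trans ? m : n;
    for (size_t i = 0; i < leny; ++i) {
        double acc = 0.0;
        for (size_t j = 0; j < lenx; ++j) {
            double aij = trans ? a[j + i * lda] : a[i + j * lda];
            acc += aij * x[j * incx];
        }
        y[i * incy] = alpha * acc + beta * y[i * incy];
    }
}

/* C := alpha * A * B + beta * C (all column-major, no transposes).
   Returns 1 if the call was forwarded to gemv, 0 otherwise. */
static int toy_dgemm(size_t m, size_t n, size_t k, double alpha,
                     const double *a, size_t lda,
                     const double *b, size_t ldb,
                     double beta, double *c, size_t ldc)
{
    assert(lda >= m && ldb >= k && ldc >= m);
    if (n == 1) {
        /* C is one column: c := alpha * A * b + beta * c. */
        toy_dgemv(0, m, k, alpha, a, lda, b, 1, beta, c, 1);
        return 1; /* early return */
    }
    if (m == 1) {
        /* C is one row: c^T := alpha * B^T * a^T + beta * c^T. */
        toy_dgemv(1, k, n, alpha, b, ldb, a, lda, beta, c, ldc);
        return 1; /* early return */
    }
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double acc = 0.0;
            for (size_t p = 0; p < k; ++p)
                acc += a[i + p * lda] * b[p + j * ldb];
            c[i + j * ldc] = alpha * acc + beta * c[i + j * ldc];
        }
    return 0;
}
```

In the m == 1 case the row of A and the row of C are walked with strides lda and ldc, which is why a transposed gemv on B is enough.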
Bhaskar Nallani
0db2bd5341 Added BLAS/CBLAS APIs for gemm3m. (#590)
Details:
- Created ?gemm3m_() and cblas_?gemm3m() APIs that (for now) simply
  invoke the 1m implementation unconditionally. (Note that these APIs
  bypass sup handling.)
- Added BLAS prototypes for gemm3m in frame/compat/bla_gemm3m.h.
- Added CBLAS prototypes for gemm3m in frame/compat/cblas/src/cblas.h.
- Relocated: 
    frame/compat/cblas/src/cblas_?gemmt.c 
  files into
    frame/compat/cblas/src/extra/ 
- Relocated frame/compat/bla_gemmt.? into frame/compat/extra/ .
- Minor reorganization of prototypes and cpp macro directives in 
  bli_blas.h, cblas.h, and cblas_f77.h.
- Trivial whitespace change to cblas_zgemm.c.
2022-03-24 18:41:55 -05:00
Devin Matthews
d6810000e9 Update Multithreading.md
Add notes about `BLIS_IR_NT` (should typically be 1) and `BLIS_JR_NT` (should typically be small, e.g. <= 4). [ci skip]
2022-03-14 10:29:54 -05:00
Field G. Van Zee
f1dbb0e514 Trivial whitespace change; commit log addendum.
Details:
- A co-attribution to Mithun Mohan was inadvertently omitted from the
  commit log for headline change in the previous commit, 7c07b47.
2022-03-11 13:38:28 -06:00
Field G. Van Zee
7c07b477e4 Avoid gemmsup barriers when not packing A or B. (#622)
Details:
- Implemented a multithreaded optimization for the special (and common)
  case of employing the gemmsup code path when the user requests
  (implicitly or explicitly) that neither A nor B be packed during
  computation. This optimization takes the form of a greatly reduced
  code branch in bli_thrinfo_sup_create_for_cntl(), which avoids a
  broadcast and two barriers, and results in higher performance when
  obtaining two-way or higher parallelism within BLIS. Thanks to
  Bhaskar Nallani of AMD for proposing this change via issue #605.
- Added an early return branch to bli_thrinfo_create_for_cntl() that
  detects and quickly handles cases where no parallelism is being
  obtained within BLIS (i.e., single-threaded execution). Note that
  this special case handling was/is already present in
  bli_thrinfo_sup_create_for_cntl().
- CREDITS file update.
2022-03-11 13:28:50 -06:00
Ivan Korostelev
cad10410b2 POWER10: edge cases in microkernel (#620)
Use new API for POWER10 gemm microkernel
2022-03-10 09:58:14 -06:00
Field G. Van Zee
71851a0549 Fixed level-3 performance bug in haswell ukernels.
Details:
- Fixed a performance regression affecting nearly all level-3 operations
  that use the 'haswell' sgemm and dgemm microkernels. This regression
  was introduced in 54fa28b, caused by an ill-formed conditional
  expression in the assembly code that controls whether cache lines of C
  should be prefetched as rows or as columns. Essentially, the two
  branches were reversed, causing incomplete prefetching to occur for
  both row- and column-stored instances of matrix C. Thanks to Devin
  Matthews for his help finding and fixing this bug.
2022-03-08 17:38:09 -06:00
Field G. Van Zee
84732bf956 Revamp how tools are handled/checked by configure.
Details:
- Consolidate handling of tools that are specifiable via CC, CXX, FC, 
  PYTHON, AR, and RANLIB into one bash function, select_tool_w_env().
  - If the user specifies a tool via an environment variable (e.g. 
    CC=gcc) and that tool does not seem valid, print an error message 
    and abort configure, unless the tool is optional (e.g. CXX or FC), 
    in which case a warning message is printed instead.
  - The definition of "seems valid" above amounts to:
    - responding to at least one of a basic set of command line options 
      (e.g. --version, -V, -h) if the os_name is Linux (since GNU tools 
      tend to respond to flags such as --version) or if the tool in 
      question is CC, CXX, FC, or PYTHON (which tend to respond to the 
      expected flags regardless of OS)
    - the binary merely existing for AR and RANLIB on Darwin/OSX/BSD. 
      (These OSes tend to have non-GNU versions of ar and ranlib, which 
      typically do not respond to --version and friends.)
- This PR addresses #584. Thanks to Devin Matthews for suggesting some
  of the changes in this commit.
2022-02-28 12:19:31 -06:00
RuQing Xu
d5146582b1 ArmSVE Ensure Non-zero Block Size (#615)
Fixes #613. There are several macros/environment variables which need to be tuned to get good cache block sizes. It would be nice to have a way of getting values automatically.
2022-02-22 12:35:46 -06:00
RuQing Xu
4d83523097 Add armsve to arm64 Metaconfig (#614)
Availability of the `armsve` subconfig is controlled by the compiler version (gcc/clang). Tested for SVE and non-SVE. Fixes #612.
2022-02-22 10:03:47 -06:00
Field G. Van Zee
c9700f369a Renamed SIMD-related macro constants for clarity.
Details:
- Renamed the following macros defined in bli_kernel_macro_defs.h:

    BLIS_SIMD_NUM_REGISTERS -> BLIS_SIMD_MAX_NUM_REGISTERS
    BLIS_SIMD_SIZE          -> BLIS_SIMD_MAX_SIZE

  Also updated all instances of these macros elsewhere, including
  subconfigurations, source code, and documentation. Thanks to Devin
  Matthews for suggesting this change.
2022-02-15 15:36:52 -06:00
Field G. Van Zee
ee9ff988c4 Move edge cases to gemmtrsm ukrs; doc updates.
Details:
- Moved edge-case handling into the gemmtrsm microkernel. This required
  changing the microkernel API to take m and n dimension parameters as
  well as updating all existing gemmtrsm microkernel function pointer
  types, function signatures, and related definitions to take m and n
  dimensions. Also updated all existing gemmtrsm kernels in the
  'kernels' directory (which for now is limited to haswell and penryn
  kernel sets, plus native and 1m-based reference kernels in
  'ref_kernels') to take m and n dimensions, and implemented edge-case
  handling within those microkernels via a collection of new C
  preprocessor macros defined within bli_edge_case_macro_defs.h. Note
  that the edge-case handling for gemm-like operations had already
  been relocated into the gemm microkernel in 54fa28b.
- Added descriptive comments to GEMM_UKR_SETUP_CT() and related macros in
  bli_edge_case_macro_defs.h to allow for easier reading.
- Updated docs/KernelsHowTo.md to reflect above changes. Also cleaned up
  the bullet under "Implementation Notes for gemm" that covers alignment
  issues. (Thanks to Ivan Korostelev for pointing out the confusing and
  outdated language in issue #591.)
- Other minor tweaks to KernelsHowTo.md.
2022-02-15 15:01:51 -06:00
Devin Matthews
2506159346 Don't use -Wl,-flat-namespace.
Flat namespaces can cause problems due to conflicting system libraries,
etc., so just mark `xerbla_` as a weak symbol on macOS instead.
2022-02-13 20:11:55 -06:00
Devin Matthews
5a4d3f5208 Use -flat_namespace option to link on macOS
Fixes #611.
2022-02-13 17:28:30 -06:00
Devin Matthews
26742910a0 Update CC_VENDOR logic
Look for `GCC` in addition to `gcc` to handle weird conda version strings. [ci skip]
2022-02-13 16:53:45 -06:00
RuQing Xu
2f3872e01d ArmSVE Adopts Label Wrapper
For clang (& armclang?) compilation.

Hopefully solves #609 .
2022-02-07 09:54:11 -06:00
RuQing Xu
72089bb291 ArmSVE Use Predicate in M-Direction
No need to query MR during kernel runtime.
2022-02-07 09:54:11 -06:00
Ruqing Xu
9cc897f374 Fix SVE Compil. 2022-02-07 09:54:11 -06:00
RuQing Xu
b5df1811f1 Armv8a, ArmSVE: Simplify Gen-C 2022-02-07 09:54:11 -06:00
Devin Matthews
35195bb5ce Add armclang detection to configure.
armclang is treated as regular clang. Fixes #606. [ci skip]
2022-01-31 10:29:50 -06:00
Field G. Van Zee
0be9282cdc Updated zen3 macro constant names.
Details:
- In config/zen3/bli_family_zen3.h, renamed:
    BLIS_SMALL_MATRIX_A_THRES_M_SYRK -> _M_GEMMT
    BLIS_SMALL_MATRIX_A_THRES_N_SYRK -> _N_GEMMT
  Thanks to Jeff Diamond for helping spot the stale _SYRK naming.
2022-01-26 17:46:24 -06:00
Jeff Hammond
0ab20c0e72 the Apple local label thing is required by Clang in general
@egaudry and I both saw this issue on Linux with Clang 10.

```
Compiling obj/thunderx2/kernels/armv8a/3/sup/bli_gemmsup_rv_armv8a_asm_d4x8m.o ('thunderx2' CFLAGS for kernels)
kernels/armv8a/3/bli_gemm_armv8a_asm_d6x8.c:171:49: fatal error: invalid symbol redefinition
        "                                            \n\t"
                                                       ^
<inline asm>:90:5: note: instantiated into assembly here
           .SLOOPKITER:
           ^
1 error generated.
```

Signed-off-by: Jeff Hammond <jehammond@nvidia.com>
2022-01-17 09:28:03 -06:00
Devin Matthews
81f93be056 Fix row-/column-major pref. in 16x8 haswell sgemm ukr (unused) 2022-01-10 18:32:25 -06:00
Devin Matthews
268ce1f29a Relax alignment constraints
Remove alignment of temporary AB buffer in edge case handling macros unless alignment is specifically requested (e.g. Core2, SDB/IVB). Fixes #595.
2022-01-10 18:32:25 -06:00
Field G. Van Zee
3f2440b022 Added m, n dims to gemmd/gemmlike ukernel calls.
Details:
- Updated the gemmd addon and the gemmlike sandbox code to use the new
  microkernel calling sequence, which now includes m and n dimensions so
  that the microkernel has all the information necessary to handle edge
  cases. Thanks to Jeff Diamond for catching this, which ideally would
  have been included in commit 54fa28b.
- Retired var2 of both gemmd and gemmlike to 'attic' directories and
  removed their corresponding prototypes. In both cases, var2 was a
  variant of the block-panel algorithm where edge-case handling was
  abstracted away to a microkernel wrapper. (Since this is now the
  official behavior of BLIS microkernels, I saw no need to have it
  included as a separate code path.)
- Comment updates.
2022-01-06 14:57:36 -06:00
Field G. Van Zee
864bfab448 CREDITS file update. 2022-01-04 15:10:34 -06:00
Devin Matthews
466b68a3ad Add unique tag to branch labels for Apple ARM64.
Add `%=` tag to branch labels, which expands to a unique identifier for each inline assembly block. This prevents duplicate symbol errors on Apple Silicon (#594). Fixes #594. [ci skip] since we can't test Apple Silicon anyways...
2022-01-02 14:59:41 -06:00
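Note on 466b68a3ad: a small illustration of the `%=` tag in GCC/Clang extended inline assembly. The function and label names are made up for this sketch, and the loop body has nothing to do with the BLIS kernels; it only shows that each instantiation of the asm block gets its own label, so inlining the function at two call sites should not produce a duplicate-symbol error. A plain C fallback is used on architectures the sketch does not cover.

```c
#include <assert.h>

/* Counts n down to zero inside an inline assembly loop. The label is
   suffixed with %=, which the compiler replaces with a number unique to
   each instance of this asm statement. Hypothetical example function. */
static inline unsigned toy_spin_down(unsigned n)
{
    assert(n > 0);
#if defined(__x86_64__) || defined(__i386__)
    __asm__ volatile(
        ".Ltoyspin%=:          \n\t"
        "subl $1, %0           \n\t"
        "jnz  .Ltoyspin%=      \n\t"
        : "+r"(n) : : "cc");
#elif defined(__aarch64__)
    __asm__ volatile(
        ".Ltoyspin%=:          \n\t"
        "subs %w0, %w0, #1     \n\t"
        "b.ne .Ltoyspin%=      \n\t"
        : "+r"(n) : : "cc");
#else
    while (n) --n; /* portable fallback, no assembly */
#endif
    return n;
}

/* Two call sites: if both are inlined, the asm block is emitted twice. */
static unsigned toy_spin_twice(unsigned a, unsigned b)
{
    return toy_spin_down(a) + toy_spin_down(b);
}
```

Without the `%=` suffix, an assembler that sees both inlined copies would be given the same label twice, which is the kind of "invalid symbol redefinition" error quoted in the log for 0ab20c0e72.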
RuQing Xu
08174a2f6e Evict <arm_sve.h> Requirement for SVE GEMM
For 8<= GCC < 10 compatibility.
2022-01-01 09:29:11 -06:00
Devin Matthews
54fa28bd84 Move edge cases to gemm ukr; more user-custom mods. (#583)
Details:
- Moved edge-case handling into the gemm microkernel. This required
  changing the microkernel API to take m and n dimension parameters.
  This required updating all existing gemm microkernel function pointer
  types, function signatures, and related definitions to take m and n
  dimensions. We also updated all existing kernels in the 'kernels' 
  directory to take m and n dimensions, and implemented edge-case 
  handling within those microkernels via a collection of new C 
  preprocessor macros defined within bli_edge_case_macro_defs.h. Also
  removed the assembly code that formerly would handle general stride 
  IO on the microtile, since this can now be handled by the same code
  that does edge cases.
- Pass the obj_t.ker_fn (of matrix C) into bli_gemm_cntl_create() and
  bli_trsm_cntl_create(), where this function pointer is used in lieu of 
  the default macrokernel when it is non-NULL, and ignored when it is
  NULL.
- Re-implemented macrokernel in bli_gemm_ker_var2.c to be a single
  function using byte pointers rather than one function for each
  floating-point datatype. Also, obtain the microkernel function pointer
  from the .ukr field of the params struct embedded within the obj_t
  for matrix C (assuming params is non-NULL and contains a non-NULL
  value in the .ukr field). Communicate both the gemm microkernel
  pointer to use as well as the params struct to the microkernel via
  the auxinfo_t struct.
- Defined gemm_ker_params_t type (for the aforementioned obj_t.params 
  struct) in bli_gemm_var.h.
- Retired the separate _md macrokernel for mixed datatype computation.
  We now use the reimplemented bli_gemm_ker_var2() instead.
- Updated gemmt macrokernels to pass m and n dimensions into microkernel
  calls.
- Removed edge-case handling from trmm and trsm macrokernels.
- Moved most of bli_packm_alloc() code into a new helper function,
  bli_packm_alloc_ex().
- Fixed a typo bug in bli_gemmtrsm_u_template_noopt_mxn.c.
- Added test/syrk_diagonal and test/tensor_contraction directories with
  associated code to test those operations.
2021-12-24 08:00:33 -06:00
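Note on 54fa28bd84: one way to picture edge-case handling inside a microkernel that receives m and n. This is a sketch only. The 4x4 register blocksizes, the packing layout, and the name toy_dgemm_ukr() are assumptions for illustration; the actual kernels rely on the macros in bli_edge_case_macro_defs.h, which are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

enum { TOY_MR = 4, TOY_NR = 4 }; /* assumed register blocksizes */

/* C(0:m,0:n) := beta * C + alpha * A * B for one microtile.
   a: packed MR x k micropanel, element (i,p) at a[i + p*TOY_MR]
   b: packed k x NR micropanel, element (p,j) at b[j + p*TOY_NR]
   c: output tile with row stride rs_c and column stride cs_c. */
static void toy_dgemm_ukr(size_t m, size_t n, size_t k,
                          double alpha, const double *a, const double *b,
                          double beta, double *c, size_t rs_c, size_t cs_c)
{
    assert(m <= TOY_MR && n <= TOY_NR);

    /* Always compute the full MR x NR product into a local buffer. */
    double ab[TOY_MR * TOY_NR] = { 0.0 };
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < TOY_NR; ++j)
            for (size_t i = 0; i < TOY_MR; ++i)
                ab[i + j * TOY_MR] += a[i + p * TOY_MR] * b[j + p * TOY_NR];

    /* Write back only the m x n part that exists in C. Because rs_c and
       cs_c are arbitrary, the same loop also covers general stride. */
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = beta * (*cij) + alpha * ab[i + j * TOY_MR];
        }
}
```

With this shape the macrokernel can call the same function for interior and edge tiles, which matches the stated reason for removing edge-case handling from the trmm and trsm macrokernels.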
Kiran
961d9d509d Re-add BLIS_ENABLE_ZEN_BLOCK_SIZES macro for 'zen'.
Details:
- Added previously-deleted cpp macro block to bli_cntx_init_zen.c 
  targeting the Naples microarchitecture that enabled different cache 
  blocksizes when the number of threads exceeds 16. This commit 
  represents PR #573.
2021-12-07 15:30:38 -06:00
Devin Matthews
cf7d616a2f Enable user-customized packm ukernel/variant. (#549)
Details:
- Added four new fields to obj_t: .pack_fn, .pack_params, .ker_fn, and
  .ker_params. These fields store pointers to functions and data that
  will allow the user to more flexibly create custom operations while  
  recycling BLIS's existing partitioning infrastructure.
- Updated typed API to packm variant and structure-aware kernels to 
  replace the diagonal offset with panel offsets, and changed strides 
  of both C and P to inc/ldim semantics. Updated object API to the packm
  variant to include rntm_t*.
- Removed the packm variant function pointer from the packm cntl_t node
  definition since it has been replaced by the .pack_fn pointer in the 
  obj_t.
- Updated bli_packm_int() to read the new packm variant function pointer
  from the obj_t and call it instead of from the cntl_t node.
- Moved some of the logic of bli_l3_packm.c to a new file,
  bli_packm_alloc.c.
- Rewrote bli_packm_blk_var1.c so that it uses byte (char*) pointers
  instead of typed pointers, allowing a single function to be used
  regardless of datatype. This obviated having a separate implementation
  in bli_packm_blk_var1_md.c. Also relegated handling of scalars to a 
  new function, bli_packm_scalar().
- Employed a new standard whereby right-hand matrix operands ("B") are
  always packed as column-stored row panels -- that is, identically to 
  that of left-hand matrix operands ("A"). This means that while we pack
  matrix A normally, we actually pack B in a transposed state. This
  allowed us to simplify a lot of code throughout the framework, and
  also affected some of the logic in bli_l3_packa() and _packb().
- Simplified bli_packm_init.c in light of the new B^T convention
  described above. bli_packm_init()--which is now called from within
  bli_packm_blk_var1()--also now calls bli_packm_alloc() and returns
  a bool that indicates whether packing should be performed (or
  skipped).
- Consolidated bli_gemm_int() and bli_trsm_int() into a bli_l3_int(),
  which, among other things, defaults the new .pack_fn field of the 
  obj_t to bli_packm_blk_var1() if the field is NULL.
- Defined a new function, bli_obj_reset_origin(), which permanently
  refocuses the view of an object so that it "forgets" any offsets from 
  its original pointer. This function also sets the object's root field 
  to itself. Calls to bli_obj_reset_origin() for each matrix operand
  appear in the _front() functions, after the obj_t's are aliased. This
  resetting of the underlying matrices' origins is needed in preparation
  for more advanced features from within custom packm kernels.
- Redefined bli_pba_rntm_set_pba() from a regular function to a static 
  inline function.
- Updated gemm_ukr, gemmtrsm_ukr, and trsm_ukr testsuite modules to use
  libblis_test_pobj_create() to create local packed objects. Previously,
  these packed objects were created by calling lower-level functions.
2021-12-02 17:10:03 -06:00
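Note on cf7d616a2f: the byte-pointer idea behind the rewritten bli_packm_blk_var1.c, reduced to a toy. The function below is hypothetical and much simpler than real packing (no micropanels, no scaling, no conjugation); it only shows how addressing through char* plus an element size lets one function serve every datatype.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Copies an m x n matrix with row stride rs and column stride cs (both in
   units of elements) into a contiguous column-major buffer. The element
   type is known only by its size in bytes. Hypothetical helper. */
static void toy_pack_any(size_t m, size_t n, size_t elem_size,
                         const void *src, size_t rs, size_t cs, void *dst)
{
    assert(elem_size > 0);
    const char *s = (const char *)src;
    char *d = (char *)dst;
    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i)
            memcpy(d + (i + j * m) * elem_size,
                   s + (i * rs + j * cs) * elem_size,
                   elem_size);
}
```

The same call works for float, double, or a complex struct by passing the matching sizeof, which is the property that made a separate mixed-datatype implementation unnecessary in the commit above.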
Field G. Van Zee
e229e049ca Added recu-sed.sh script to 'build' directory.
Details:
- Added a recursive sed script to the 'build' directory.
2021-12-01 17:36:22 -06:00
Field G. Van Zee
12c66a4acc Minor updates to README.md, docs/Addons.md.
Details:
- Add additional mentions of addons to README.md, including in the
  "What's New" section.
- Removed mention of sandboxes from the long list of advantages
  provided by BLIS.
- Very minor description update to opening line of Addons.md.
2021-11-19 14:43:53 -06:00
Field G. Van Zee
a4bc03b990 Brief mention/link to Addons.md in README.md.
Details:
- Add a blurb about the new addons feature to the "Documentation for
  BLIS developers" section of the README.md, which also links to the
  Addons.md document.
2021-11-19 13:29:00 -06:00
Field G. Van Zee
b727645eb7 Merge branch 'dev' 2021-11-19 13:22:09 -06:00
Madan mohan Manokar
9be97c150e Support all four dts in test/test_her[2][k].c (#578)
Details:
- Replaced the hard-coded calls to double-precision real syr, syr2, 
  syrk, and syr2k in the corresponding standalone test drivers in the 
  'test' directory with conditional branches that will call the 
  appropriate BLAS interface depending on which datatype is enabled. 
  Thanks to Madan mohan Manokar for this improvement.
- CREDITS file update.
2021-11-17 13:16:46 -06:00
Dipal M Zambare
26e4b6b293 Added support for AMD's Zen3 microarchitecture.
Details:
- Added a new 'zen3' subconfiguration targeting support for the AMD Zen3
  microarchitecture (#561). Thanks to AMD for this contribution.
- Restructured clang and AOCC support for zen, zen2, and zen3
  make_defs.mk files. The clang and AOCC version detection now happens
  in configure, not in the subconfigurations' makefile fragments. That
  is, we've added logic to configure that detects the version of
  clang/AOCC, outputs an appropriate variable to config.mk
  (i.e., CLANG_OT_*, AOCC_OT_*), and then checks for it within the
  makefile fragment (as is currently done for the GCC_OT_* variables).
- Added configure support for a GCC_OT_10_1_0 variable (and associated
  substitution anchor) to communicate whether the gcc version is older
  than 10.1.0, and use this variable to check for recent enough versions
  of gcc to use -march=znver3 in the zen3 subconfig.
- Inlined the contents of config/zen/amd_config.mk into the zen and zen2
  make_defs.mk so that the files are self-contained, harmonizing the
  format of all three Zen-based subconfigurations' make_defs.mk files.
- Added indenting (with spaces) of GNU make conditionals for easier
  reading in zen, zen2, and zen3 make_defs.mk files.
- Adjusted the range of models checked by bli_cpuid_is_zen() (which was
  previously 0x00 ~ 0xff and is now 0x00 ~ 0x2f) so that it is
  completely disjoint from the models checked by bli_cpuid_is_zen2()
  (0x30 ~ 0xff). This is normally necessary because Zen and Zen2
  microarchitectures share the same family (23, or 0x17), and so the
  model code is the only way to differentiate the two. But in our case,
  fixing the model range for zen *wasn't* actually necessary since we
  checked for zen2 first, and therefore the wide zen range acted like
  the 'else' of an 'if-else' statement. That said, the change helps
  improve clarity for the reader by encoding useful knowledge, which
  was obtained from https://en.wikichip.org/wiki/amd/cpuid .
- Added zen2.def and zen3.def files to the collection in travis/cpuid.
  Note that support for zen, zen2, and zen3 is now present, and while
  all three microarchitectures have identical instruction sets from
  the perspective of BLIS microkernels, they each correspond to
  different subconfigurations and therefore merit separate testing.
  Thanks to Devin Matthews for his guidance in hacking these files as
  slight modifications of zen.def.
- Enabled testing of zen2 and zen3 via the SDE in travis/do_sde.sh.
  Now, zen, zen2, and zen3 are tested through the SDE via Travis CI
  builds.
- Updated travis/do_sde.sh to grab the SDE tarball from a new ci-utils
  repository on GitHub rather than on Intel's website. This change was
  made in an attempt to circumvent recent troubles with Travis CI not
  being able to download the SDE directly from Intel's website via curl.
  Thanks to Devin Matthews for suggesting the idea.
- Updated travis/do_sde.sh to grab the latest version (8.69.1) of the
  Intel SDE from the flame/ci-utils repository.
- Updated .travis.yml to use gcc 9. The file was previously using gcc 8,
  which did not support -march=znver2.
- Created amd64_legacy umbrella family in config_registry for targeting
  older (bulldozer, piledriver, steamroller, and excavator)
  microarchitectures and moved those same subconfigs out of the amd64
  umbrella family. However, x86_64 retains amd64_legacy as a constituent
  member.
- Fixed a bug in configure related to the building of the so-called
  config list. When processing the contents of config_registry,
  configure creates a series of structures and lists that allow for
  various mappings related to configuration families, subconfigs, and
  kernel sets. Two of those lists are built via substitution of
  umbrella families with their subconfig members, and one of those
  lists was improperly performing the substitution in a way that would
  erroneously match on partial umbrella family names. That code was
  changed to match the code that was already doing the substitution
  properly, via substitute_words(). Also added comments noting the
  importance of using substitute_words() in both instances.
- Comment updates.
2021-11-17 13:02:00 -06:00
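Note on 26e4b6b293: the disjoint model ranges for family 0x17 can be written down as two predicates. This is a sketch using the ranges quoted in the commit message (0x00 ~ 0x2f for zen, 0x30 ~ 0xff for zen2); the toy_ functions are hypothetical and, unlike the real bli_cpuid_is_zen() and bli_cpuid_is_zen2(), do not look at CPU feature flags.

```c
#include <assert.h>

/* Family 23 (0x17) is shared by Zen and Zen2, so only the model
   number tells them apart. */
static int toy_is_zen(unsigned family, unsigned model)
{
    assert(model <= 0xff);
    return family == 0x17 && model <= 0x2f;
}

static int toy_is_zen2(unsigned family, unsigned model)
{
    assert(model <= 0xff);
    return family == 0x17 && model >= 0x30;
}
```

Because the two ranges no longer overlap, the result does not depend on which predicate is tested first.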
Field G. Van Zee
74c0c62221 Reverted cbc88fe.
Details:
- Reverted the annotation of some markdown code blocks with 'bash'
  after realizing that the in-browser syntax highlighting was not
  worthwhile.
2021-11-16 16:06:33 -06:00
Field G. Van Zee
cbc88feb51 Marked some markdown shell code blocks as 'bash'.
Details:
- Annotated the code blocks that represent shell commands and output as
  'bash' in README.md and BuildSystem.md.
2021-11-16 16:02:39 -06:00
Field G. Van Zee
78cd1b0451 Added 'Example Code' section to README.md.
Details:
- Inserted a new 'Example Code' section into the README.md immediately
  after the 'Getting Started' section. Thanks to Devin Matthews for
  recommending this addition.
- Moved the 'Performance' section of the README down slightly so that it
  appears after the 'Documentation' section.
2021-11-16 15:53:40 -06:00
Field G. Van Zee
7bde468c6f Added support for addons.
Details:
- Implemented a new feature called addons, which are similar to
  sandboxes except that there is no requirement to define gemm or any
  other particular operation.
- Updated configure to accept --enable-addon=<name> or -a <name> syntax
  for requesting an addon be included within a BLIS build. configure now
  outputs the list of enabled addons into config.mk. It also outputs the
  corresponding #include directives for the addons' headers to a new
  companion to the bli_config.h header file named bli_addon.h. Because
  addons may wish to make use of existing BLIS types within their own
  definitions, the addons' headers must be included sometime after that
  of bli_config.h (which currently is #included before bli_type_defs.h).
  This is why the #include directives needed to go into a new top-level
  header file rather than the existing bli_config.h file.
- Added a markdown document, docs/Addons.md, to explain addons, how to
  build with them, and what assumptions their authors should keep in
  mind as they create them.
- Added a gemmlike-like implementation of sandwich gemm called 'gemmd'
  as an addon in addon/gemmd. The code uses a 'bao_' prefix for local
  functions, including the user-level object and typed APIs.
- Updated .gitignore so that git ignores bli_addon.h files.
2021-11-13 16:39:37 -06:00
Meghana-vankadari
7bc8ab485e Added BLAS/CBLAS APIs for axpby, gemm_batch. (#566)
Details:
- Expanded the BLAS compatibility layer to include support for 
  ?axpby_() and ?gemm_batch_(). The former is a straightforward
  BLAS-like interface into the axpbyv operation while the latter
  implements a batched gemm via loops over bli_?gemm(). Also
  expanded the CBLAS compatibility layer to include support for
  cblas_?axpby() and cblas_?gemm_batch(), which serve as wrappers to 
  the corresponding (new) BLAS-like APIs. Thanks to Meghana Vankadari
  for submitting these new APIs via #566.
- Fixed a long-standing bug in common.mk that for some reason never
  manifested until now. Previously, CBLAS source files were compiled
  *without* the location of cblas.h being specified via a -I flag.
  I'm not sure why this worked, but it may be due to the fact that
  the cblas.h file resided in the same directory as all of the CBLAS
  source, and perhaps compilers implicitly add a -I flag for the
  directory that corresponds to the location of the source file being
  compiled. This bug only showed up because some CBLAS-like source code
  was moved into an 'extra' subdirectory of that frame/compat/cblas/src
  directory. After moving the code, compilation for those files failed
  (because the cblas.h header file, presumably, could not be found in
  the same location). This bug was fixed within common.mk by explicitly
  adding the cblas.h directory to the list of -I flags passed to the
  compiler.
- Added test_axpbyv.c and test_gemm_batch.c files to 'test' directory,
  and updated test/Makefile to build those drivers.
- Fixed typo in error message string in cblas_sgemm.c.
2021-11-11 16:46:14 -06:00
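Note on 7bc8ab485e: the axpbyv operation that ?axpby_() exposes computes y := beta * y + alpha * x. A minimal reference version is sketched below, assuming double precision and positive increments only; toy_daxpby() is a hypothetical name and not the BLIS or BLAS entry point.

```c
#include <assert.h>
#include <stddef.h>

/* y := beta * y + alpha * x over n elements, with element strides
   incx and incy. Negative increments are not handled in this sketch. */
static void toy_daxpby(size_t n, double alpha, const double *x, size_t incx,
                       double beta, double *y, size_t incy)
{
    assert(incx > 0 && incy > 0);
    for (size_t i = 0; i < n; ++i)
        y[i * incy] = alpha * x[i * incx] + beta * y[i * incy];
}
```

Setting beta to 1 gives the familiar axpy update, and setting alpha to 0 gives a plain scaling of y.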
Devin Matthews
28b0982ea7 Refactored her[2]k/syr[2]k in terms of gemmt. (#531)
Details:
- Renamed herk macrokernels and supporting files and functions to gemmt, 
  which is possible since at the macrokernel level they are identical. 
  Then recast herk/her2k/syrk/syr2k in terms of gemmt within the expert
  level-3 oapi (bli_l3_oapi_ex.c) while also redefining them as literal
  functions rather than cpp macros that instantiate multiple functions.
  Thanks to Devin Matthews for his efforts on this issue (#531).
- Check that the maximum stack buffer size is sufficiently large
  relative to the register blocksizes for each datatype, and do so when
  the context is initialized rather than when an operation is called.
  Note that with this change, users who pass in their own contexts into
  the expert interfaces currently will *not* have any checks performed.
  Thanks to Devin Matthews for suggesting this change.
2021-11-10 12:34:50 -06:00
Field G. Van Zee
cfa3db3f34 Fixed bug in mixed-dt gemm introduced in e9da642.
Details:
- Fixed a bug that broke certain mixed-datatype gemm behavior. This
  bug was introduced recently in e9da642 when the code that performs
  the operation transposition (for microkernel IO preference purposes)
  was moved up so that it occurred sooner. However, when I moved that
  code, I failed to notice that there was a cpp-protected "if"
  conditional that applied to the entire code block that was moved. Once
  the code block was relocated, the orphaned if-statement was now
  (erroneously) glomming on to the next thing that happened to be in the
  function, which happened to be the call to bli_rntm_set_ways_for_op(),
  causing a rather odd memory exhaustion error in the sba due to the
  num_threads field of the rntm_t still being -1 (because the rntm_t
  fields were never processed as they should have been). Thanks to
  @ArcadioN09 (Snehith) for reporting this error and helpfully including
  relevant memory trace output.
2021-11-03 18:13:56 -05:00
Field G. Van Zee
f065a8070f Removed support for 3m, 4m induced methods.
Details:
- Removed support for all induced methods except for 1m. This included
  removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
  code that existed only to support those implementations. These
  implementations were rarely used and posed code maintenance challenges
  for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
  and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
  and streamlined the remaining code in that directory. The *ind(),
  *nat(), and *1m() APIs were all removed. (These additional API layers
  no longer made as much sense with only one induced method (1m) being
  supported.) The bli_ind.c file (and header) were moved to frame/base
  and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
  frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
  pointer arithmetic that was previously expressed in terms of the
  bli_ptr_inc_by_frac() static inline function (whose definition was
  also removed).
- Removed the following subdirectories of level-0 macro headers from
  frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
  defined in these directories were used exclusively for 3m and 4m
  method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
  light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
  accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
  variants of the 3m and 4m methods. This leaves two bits unused within
  the pack format portion of the schema bitfield. (See bli_type_defs.h
  for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
  into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
  and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
  operations' _front() functions to the corresponding _ex() function of
  the object API. (This change roughly maintains where the _check()
  functions are called in the call stack but lays the groundwork for
  future changes that may come to the level-3 object APIs.) Minor
  modifications to bli_l3_check.c to allow the check() functions to be
  called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
  induced methods, and updated the standalone test drivers in the 'test'
  directory to reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
  bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
  of the *nat() functions no longer existing.) Also updated the existing
  'power10' and 'gemmlike' sandboxes to come into compliance with the
  new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
  to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
  bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
2021-10-28 16:05:43 -05:00