Commit Graph

7 Commits

Author SHA1 Message Date
Field G. Van Zee
72f6ed0637 Declare/define static functions via BLIS_INLINE.
Details:
- Updated all static function definitions to use the cpp macro
  BLIS_INLINE instead of the static keyword. This allows blis.h to
  use a different keyword (inline) to define these functions when
  compiling with C++, which might otherwise trigger "defined but
  not used" warning messages. Thanks to Giorgos Margaritis for
  reporting this issue and Devin Matthews for suggesting the fix.
- Updated the following files, which are used by configure's
  hardware auto-detection facility, to unconditionally #define
  BLIS_INLINE to the static keyword (since we know BLIS will be
  compiled with C, not C++):
    build/detect/config/config_detect.c
    frame/base/bli_arch.c
    frame/base/bli_cpuid.c
- CREDITS file update.
2020-07-03 17:55:54 -05:00
Field G. Van Zee
f808d829c5 Handle edge cases, zero-filling in packm kernels.
Details:
- Updated the API and semantics of packm kernels such that they must now
  handle edge cases, meaning that a c-by-k packm kernel must be able to
  pack edge cases that are fewer than c rows/columns and be able to
  zero-fill the remaining elements. They must also be able to zero-fill
  the equivalent region when copying fewer than k columns/rows (which is
  needed by trsm). The new packm kernel API is generally:

    void packm_kernel
         (
           conj_t           conja,
           dim_t            cdim,
           dim_t            n,
           dim_t            n_max,
           ctype*  restrict kappa,
           ctype*  restrict a, inc_t inca, inc_t lda,
           ctype*  restrict p,             inc_t ldp,
           cntx_t* restrict cntx
         );

  where cdim and n are the dimensions (short and long, respectively) of
  the submatrix being copied from the source matrix A, and n_max is the
  "full" long dimension (corresponding to the k dimension in gemm) of
  the micropanel. The "full" short dimension (corresponding to the
  register blocksize MR or NR) is not part of the API because it is
  known intrinsically by the packm kernel implementation. Thanks to
  Devin Matthews for prompting us to make this change (#282).
- Updated all reference packm kernels in ref_kernels/1m according to
  above changes, as well as all optimized packm kernels (which only
  consisted of those for knl).
- Bumped the major soname version number in 'so_version' to 2. At first
  I was considering leaving it unchanged, but I couldn't escape the
  reality that the packm kernel API is much closer to an expert API
  than it is some obscure helper function interface within the framework
  that nobody would ever notice.
- Removed reference packm kernels for mr/nr = 30. The only sub-config
  that would have been using those kernels is knc, which is likely no
  longer being used by very many people (if any). (This also mostly
  offset the larger object code footprint incurred by moving the edge-
  case handling into the individual packm kernels.)
- Fixed an obscure race condition for 3mh and 4mh induced methods in
  which those implementations were modifying the contexts stored in the
  gks rather than a local copy.
- Fixed a minor bug in the testsuite that prevented non-1m-based induced
  method implementations of trsm from executing.
2018-12-12 15:22:59 -06:00
Field G. Van Zee
0645f239fb Remove UT-Austin from copyright headers' clause 3.
Details:
- Removed explicit reference to The University of Texas at Austin in the
  third clause of the license comment blocks of all relevant files and
  replaced it with a more all-encompassing "copyright holder(s)".
- Removed duplicate words ("derived") from a few kernels' license
  comment blocks.
- Homogenized license comment block in kernels/zen/3/bli_gemm_small.c
  with format of all other comment blocks.
2018-12-04 14:31:06 -06:00
Field G. Van Zee
4fa4cb0734 Trivial comment header updates.
Details:
- Removed four trailing spaces after "BLIS" that occurs in most files'
  commented-out license headers.
- Added UT copyright lines to some files. (These files previously had
  only AMD copyright lines but were contributed to by both UT and AMD.)
- In some files' copyright lines, expanded 'The University of Texas' to
  'The University of Texas at Austin'.
- Fixed various typos/misspellings in some license headers.
2018-08-29 18:06:41 -05:00
Field G. Van Zee
7ed415824d Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
  Meant to include this change in previous commit.
2014-07-14 16:14:33 -05:00
Field G. Van Zee
5c2c6c8561 Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
  University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
  2012).
2014-07-14 16:05:03 -05:00
Field G. Van Zee
6363a9f658 Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
  virtual complex micro-kernels which are implemented via only real
  domain micro-kernels. Two new implementations are provided: 4m and 3m.
  4m implements complex matrix multiplication in terms of four real
  matrix multiplications, where as 3m uses only three and thus is
  capable of even higher (than peak) performance. However, the 3m method
  has somewhat weaker numerical properties, making it less desirable
  in general.
- Further refined packing routines, which were recently revamped, and
  added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
  into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
2014-02-19 17:00:52 -06:00