Commit Graph

9 Commits

Author SHA1 Message Date
Field G. Van Zee
849aae09f4 Added new packm var3 to 'gemmlike'.
Details:
- Defined a new packm variant for the 'gemmlike' sandbox. This new
  variant (bls_l3_packm_var3.c) parallelizes the packing operation over
  the k dimension rather than the m or n dimensions. Note that the
  gemmlike implementation still uses var1 by default, and use of the new
  code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c
  so that var3 is called instead. Thanks to Jeff Diamond for proposing
  this (perhaps NUMA-friendly) solution.
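
As a rough illustration of what "parallelizing the packing operation over the k dimension" can mean, the following is a minimal sketch and not the code in bls_l3_packm_var3.c. The helper names `k_range_for_thread` and `pack_panel_k_slice` are hypothetical, the element type is fixed to double, and only a single column-major micropanel is packed. Each thread copies a disjoint slice of the k columns, so the threads write to non-overlapping parts of the packed buffer.

```c
#include <stddef.h>

// Hypothetical helper (not part of BLIS): compute the half-open range
// [*k_start, *k_end) of the k dimension assigned to thread `tid` out of
// `nt` threads. The first (k % nt) threads each take one extra column,
// so range sizes differ by at most one.
static void k_range_for_thread(size_t k, size_t nt, size_t tid,
                               size_t *k_start, size_t *k_end)
{
    size_t base = k / nt;
    size_t rem  = k % nt;
    *k_start = tid * base + (tid < rem ? tid : rem);
    *k_end   = *k_start + base + (tid < rem ? 1 : 0);
}

// Hypothetical per-thread packing routine: copy this thread's slice of
// an mr x k column-major block `a` (leading dimension lda) into the
// packed micropanel `p`, where column l of the panel starts at p[l*mr].
static void pack_panel_k_slice(size_t mr, size_t k,
                               const double *a, size_t lda,
                               double *p, size_t nt, size_t tid)
{
    size_t k_start, k_end;
    k_range_for_thread(k, nt, tid, &k_start, &k_end);

    for (size_t l = k_start; l < k_end; ++l)
        for (size_t i = 0; i < mr; ++i)
            p[l * mr + i] = a[i + l * lda];
}
```

In a threaded setting, every thread would call `pack_panel_k_slice()` with its own `tid` and then synchronize before the packed panel is read.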
2021-09-16 14:47:45 -05:00
Field G. Van Zee
d6eb70fbc3 Updated stale calls to malloc_intl() in gemmlike.
Details:
- Updated two out-of-date calls to bli_malloc_intl() within the gemmlike
  sandbox. These calls to malloc_intl(), which resided in
  bls_l3_decor_pthreads.c, were missing the err_t argument that the
  function uses to report errors. Thanks to Jeff Diamond for helping
  isolate this issue.
2021-08-26 13:12:39 -05:00
Field G. Van Zee
3b275f810b Minor tweaks to gemmlike sandbox.
Details:
- In the gemmlike sandbox, changed the loop index variable of the inner
  loop of packm_cxk() from 'd' to 'i' (and likewise for the
  corresponding inlined code within packm_var2()).
- Pack matrices A and B using packm_var1() instead of packm_var2().
2021-08-19 16:06:46 -05:00
Field G. Van Zee
3eccfd456e Added local _check() code to gemmlike sandbox.
Details:
- Added code to the gemmlike sandbox that handles parameter checking.
  Previously, the gemmlike implementation called bli_gemm_check(), which
  resides within the BLIS framework proper. Certain modifications that a
  user may wish to perform on the sandbox, such as adding a new matrix
  or vector operand, would have required additional checks, and so these
  changes make it easier for such a person to implement those checks for
  their custom gemm-like operation.
2021-08-19 13:22:10 -05:00
Field G. Van Zee
4a955e9390 Tweaks to gemmlike to facilitate 3rd party mods.
Details:
- Changed the implementation in the 'gemmlike' sandbox to more easily
  allow others to provide custom implementations of packm. These changes
  include:
  - Calling a local version of packm_cxk() that can be modified. This
    version of packm_cxk() uses inlined loops rather than querying the
    context for packm kernels (or even using scal2m).
  - Providing two variants of packm, one of which calls the
    aforementioned packm_cxk(), the other of which inlines the contents
    of packm_cxk() into the variant itself, making it self-contained.
    To switch from one to the other, simply change which function gets
    called within bls_packm_a() and bls_packm_b().
  - Simplified and cleaned up some variable names in both variants of
    packm, relative to their parent code.
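
For readers unfamiliar with what a loop-based packm_cxk() looks like, here is a minimal sketch under simplifying assumptions. It is not the sandbox's actual function: the name `my_packm_cxk`, the argument list, and the restriction to double precision are all hypothetical. It copies an m x k block into a packed micropanel whose columns are `pd` elements apart, scales by kappa, and zero-fills the unused rows.

```c
#include <stddef.h>

// Hypothetical, simplified stand-in for a local packm_cxk(): copy an
// m x k block of `a` (row stride rs_a, column stride cs_a) into the
// packed micropanel `p`, whose columns each hold pd elements (pd >= m).
// Elements are scaled by kappa; rows m..pd-1 are set to zero so that a
// full-size microkernel can safely read the whole panel.
static void my_packm_cxk(size_t m, size_t k, size_t pd, double kappa,
                         const double *a, size_t rs_a, size_t cs_a,
                         double *p)
{
    for (size_t l = 0; l < k; ++l) {
        for (size_t i = 0; i < m; ++i)
            p[l * pd + i] = kappa * a[i * rs_a + l * cs_a];

        for (size_t i = m; i < pd; ++i)
            p[l * pd + i] = 0.0;
    }
}
```

Because the routine is plain C with no context lookups, a third party could change the copy loop (for example, to apply a custom transformation while packing) without touching the rest of the framework.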
2021-08-16 13:49:27 -05:00
Field G. Van Zee
e366665cd2 Fixed stale API calls to membrk API in gemmlike.
Details:
- Updated stale calls to the bli_membrk API within the 'gemmlike'
  sandbox. This API is now called bli_pba (packed block allocator).
  Ideally, this forgotten update would have been included as part of
  21911d6, which is when the branch where the membrk->pba changes were
  introduced was merged into 'master'.
- Comment updates.
2021-08-12 14:06:53 -05:00
Field G. Van Zee
aaa10c87e1 Skip clearing temp microtile in gemmlike sandbox.
Details:
- Removed code from gemmlike sandbox files bls_gemm_bp_var1.c and
  bls_gemm_bp_var2.c that initializes the elements of the temporary
  microtile to zero. This code, introduced recently in 7f7d726, did
  not actually fix any bug (despite that commit's log entry). The
  microtile does not need to be initialized because it is completely
  overwritten by a "beta = 0" invocation of gemm prior to it being
  read. Any NaNs or Infs present at the outset would have no impact
  on the output matrix C. Thanks to Devin Matthews for reminding me
  of this.
2021-06-21 17:53:52 -05:00
Field G. Van Zee
7f7d72610c Fixed bugs in cpackm kernels, gemmlike code.
Details:
- Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and
  bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the
  kappa scalar was incorrectly loaded at an offset of 8 bytes (instead
  of 4 bytes) from the real component. This was almost certainly a
  copy-paste bug carried over from the corresponding zpackm kernels. Thanks to
  Devin Matthews for bringing this to my attention.
- Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and
  bls_gemm_bp_var2.c that initializes the elements of the temporary
  microtile to zero. (This bug was never observed in output but rather
  noticed analytically. It probably would have also manifested as
  intermittent failures, this time involving edge cases.)
- Minor commented-out/disabled changes to testsuite/src/test_gemm.c
  relating to debugging.
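
The 4-byte versus 8-byte offsets mentioned in the first item follow directly from the component sizes of single- and double-precision complex values. The sketch below uses local stand-in types (`my_scomplex`, `my_dcomplex`, both hypothetical names modeled on a real/imag pair of floats or doubles) to show where the imaginary component sits in each case.

```c
#include <stddef.h>

// Local stand-ins for single- and double-precision complex types, each
// storing the real component first and the imaginary component second.
typedef struct { float  real; float  imag; } my_scomplex;
typedef struct { double real; double imag; } my_dcomplex;

// Byte offset of the imaginary component from the start of the value.
// For the single-precision type this equals sizeof(float) (4 bytes on
// common platforms); for the double-precision type it equals
// sizeof(double) (8 bytes on common platforms).
static size_t imag_offset_c(void) { return offsetof(my_scomplex, imag); }
static size_t imag_offset_z(void) { return offsetof(my_dcomplex, imag); }
```

Assembly copied from a double-precision kernel that hard-codes the larger offset would therefore read past the end of a single-precision kappa.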
2021-05-31 16:50:18 -05:00
Field G. Van Zee
213dce32d2 Added a new 'gemmlike' sandbox.
Details:
- Added a new sandbox called 'gemmlike', which implements sequential and
  multithreaded gemm in the style of gemmsup but also unconditionally
  employs packing. The purpose of this sandbox is to
  (1) avoid select abstractions, such as objects and control trees, in
      order to allow readers to better understand how a real-world
      implementation of high-performance gemm can be constructed;
  (2) provide a starting point for expert users who wish to build
      something that is gemm-like without "reinventing the wheel."
  Thanks to Jeff Diamond, Tze Meng Low, Nicholai Tukanov, and Devangi
  Parikh for requesting and inspiring this work.
- The functions defined in this sandbox currently use the "bls_" prefix
  instead of "bli_" in order to avoid any symbol collisions in the main
  library.
- The sandbox contains two variants, each of which implements gemm via a
  block-panel algorithm. The only difference between the two is that
  variant 1 calls the microkernel directly while variant 2 calls the
  microkernel indirectly, via a function wrapper, which allows the edge
  case handling to be abstracted away from the classic five loops.
- This sandbox implementation utilizes the conventional gemm microkernel
  (not the skinny/unpacked gemmsup kernels).
- Updated some typos in the comments of a few files in the main
  framework.
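
To make the variant 2 idea more concrete, here is a minimal sketch of a wrapper that hides edge-case handling from the calling loops. It is not the sandbox's code: `full_ukr`, `ukr_wrapper`, the 4 x 4 block sizes, and the packed-panel layout are all assumptions made for illustration. For a partial tile, the wrapper computes a full tile into a temporary buffer with beta = 0 and then accumulates only the valid m x n portion into C.

```c
#include <stddef.h>

enum { MR = 4, NR = 4 };  // illustrative register block sizes (assumed)

// Stand-in for a full-tile microkernel: C := beta*C + A*B on an MR x NR
// tile. `a` is a packed MR x k micropanel (a[l*MR + i]) and `b` a packed
// k x NR micropanel (b[l*NR + j]). With beta == 0, C is overwritten.
static void full_ukr(size_t k, const double *a, const double *b,
                     double beta, double *c, size_t rs_c, size_t cs_c)
{
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i) {
            double ab = 0.0;
            for (size_t l = 0; l < k; ++l)
                ab += a[l * MR + i] * b[l * NR + j];
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = (beta == 0.0) ? ab : beta * (*cij) + ab;
        }
}

// Hypothetical wrapper: update only the leading m x n part of C, where
// m <= MR and n <= NR. Full tiles go straight to the microkernel; edge
// tiles are computed into a temporary microtile first.
static void ukr_wrapper(size_t m, size_t n, size_t k,
                        const double *a, const double *b,
                        double beta, double *c, size_t rs_c, size_t cs_c)
{
    if (m == MR && n == NR) {
        full_ukr(k, a, b, beta, c, rs_c, cs_c);
        return;
    }

    double ct[MR * NR];  // not initialized: beta == 0 overwrites it fully
    full_ukr(k, a, b, 0.0, ct, 1, MR);

    for (size_t j = 0; j < n; ++j)
        for (size_t i = 0; i < m; ++i) {
            double *cij = &c[i * rs_c + j * cs_c];
            *cij = (beta == 0.0) ? ct[i + j * MR]
                                 : beta * (*cij) + ct[i + j * MR];
        }
}
```

With a wrapper of this shape, the loops around the microkernel can call `ukr_wrapper()` for every tile and need no special branch for the matrix edges.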
2021-05-28 14:49:57 -05:00