amd/blis - blis - Public git mirror

mirror of https://github.com/amd/blis.git synced 2026-05-22 01:18:18 +00:00

Author	SHA1	Message	Date
Field G. Van Zee	849aae09f4	Added new packm var3 to 'gemmlike'. Details: - Defined a new packm variant for the 'gemmlike' sandbox. This new variant (bls_l3_packm_var3.c) parallelizes the packing operation over the k dimension rather than the m or n dimensions. Note that the gemmlike implementation still uses var1 by default, and use of the new code would require changing bls_l3_packm_a.c and/or bls_l3_packm_b.c so that var3 is called instead. Thanks to Jeff Diamond for proposing this (perhaps NUMA-friendly) solution.	2021-09-16 14:47:45 -05:00
Field G. Van Zee	d6eb70fbc3	Updated stale calls to malloc_intl() in gemmlike. Details: - Updated two out-of-date calls to bli_malloc_intl() within the gemmlike sandbox. These calls to malloc_intl(), which resided in bls_l3_decor_pthreads.c, were missing the err_t argument that the function uses to report errors. Thanks to Jeff Diamond for helping isolate this issue.	2021-08-26 13:12:39 -05:00
Field G. Van Zee	3b275f810b	Minor tweaks to gemmlike sandbox. Details: - In the gemmlike sandbox, changed the loop index variable of inner loop of packm_cxk() from 'd' to 'i' (and likewise for the corresponding inlined code within packm_var2()). - Pack matrices A and B using packm_var1() instead of packm_var2().	2021-08-19 16:06:46 -05:00
Field G. Van Zee	3eccfd456e	Added local _check() code to gemmlike sandbox. Details: - Added code to the gemmlike sandbox that handles parameter checking. Previously, the gemmlike implementation called bli_gemm_check(), which resides within the BLIS framework proper. Certain modifications that a user may wish to perform on the sandbox, such as adding a new matrix or vector operand, would have required additional checks, and so these changes make it easier for such a person to implement those checks for their custom gemm-like operation.	2021-08-19 13:22:10 -05:00
Field G. Van Zee	4a955e9390	Tweaks to gemmlike to facilitate 3rd party mods. Details: - Changed the implementation in the 'gemmlike' sandbox to more easily allow others to provide custom implementations of packm. These changes include: - Calling a local version of packm_cxk() that can be modified. This version of packm_cxk() uses inlined loops in packm_cxk() rather than querying the context for packm kernels (or even using scal2m). - Providing two variants of packm, one of which calls the aforementioned packm_cxk(), the other of which inlines the contents of packm_cxk() into the variant itself, making it self-contained. To switch from one to the other, simply change which function gets called within bls_packm_a() and bls_packm_b(). - Simplified and cleaned up some variant names in both variants of packm, relative to their parent code.	2021-08-16 13:49:27 -05:00
Field G. Van Zee	e366665cd2	Fixed stale API calls to membrk API in gemmlike. Details: - Updated stale calls to the bli_membrk API within the 'gemmlike' sandbox. This API is now called bli_pba (packed block allocator). Ideally, this forgotten update would have been included as part of `21911d6`, which is when the branch that introduced the membrk->pba changes was merged into 'master'. - Comment updates.	2021-08-12 14:06:53 -05:00
Field G. Van Zee	aaa10c87e1	Skip clearing temp microtile in gemmlike sandbox. Details: - Removed code from gemmlike sandbox files bls_gemm_bp_var1.c and bls_gemm_bp_var2.c that initializes the elements of the temporary microtile to zero. This code, introduced recently in `7f7d726`, did not actually fix any bug (despite that commit's log entry). The microtile does not need to be initialized because it is completely overwritten by a "beta = 0" invocation of gemm prior to it being read. Any NaNs or Infs present at the outset would have no impact on the output matrix C. Thanks to Devin Matthews for reminding me of this.	2021-06-21 17:53:52 -05:00
Field G. Van Zee	7f7d72610c	Fixed bugs in cpackm kernels, gemmlike code. Details: - Fixed intermittent bugs in bli_packm_haswell_asm_c3xk.c and bli_packm_haswell_asm_c8xk.c whereby the imaginary component of the kappa scalar was incorrectly loaded at an offset of 8 bytes (instead of 4 bytes) from the real component. This was almost certainly a copy-paste bug carried over from the corresponding zpackm kernels. Thanks to Devin Matthews for bringing this to my attention. - Added missing code to gemmlike sandbox files bls_gemm_bp_var1.c and bls_gemm_bp_var2.c that initializes the elements of the temporary microtile to zero. (This bug was never observed in output but rather noticed analytically. It probably would have also manifested as intermittent failures, this time involving edge cases.) - Minor commented-out/disabled changes to testsuite/src/test_gemm.c relating to debugging.	2021-05-31 16:50:18 -05:00
Field G. Van Zee	213dce32d2	Added a new 'gemmlike' sandbox. Details: - Added a new sandbox called 'gemmlike', which implements sequential and multithreaded gemm in the style of gemmsup but also unconditionally employs packing. The purpose of this sandbox is to (1) avoid select abstractions, such as objects and control trees, in order to allow readers to better understand how a real-world implementation of high-performance gemm can be constructed; (2) provide a starting point for expert users who wish to build something that is gemm-like without "reinventing the wheel." Thanks to Jeff Diamond, Tze Meng Low, Nicholai Tukanov, and Devangi Parikh for requesting and inspiring this work. - The functions defined in this sandbox currently use the "bls_" prefix instead of "bli_" in order to avoid any symbol collisions in the main library. - The sandbox contains two variants, each of which implements gemm via a block-panel algorithm. The only difference between the two is that variant 1 calls the microkernel directly while variant 2 calls the microkernel indirectly, via a function wrapper, which allows the edge case handling to be abstracted away from the classic five loops. - This sandbox implementation utilizes the conventional gemm microkernel (not the skinny/unpacked gemmsup kernels). - Updated some typos in the comments of a few files in the main framework.	2021-05-28 14:49:57 -05:00
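
Commit 7f7d72610c above describes a kappa scalar whose imaginary part was loaded 8 bytes past the real part instead of 4. The following is a minimal C sketch of why those two offsets differ between single- and double-precision complex values. The struct names `my_scomplex` and `my_dcomplex` and the helper `load_float_at()` are hypothetical stand-ins written for this illustration; they are not taken from the BLIS sources, and the real kernels are written in assembly. The sketch also assumes a 4-byte `float` and an 8-byte `double`.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for single- and double-precision complex types. */
typedef struct { float  real; float  imag; } my_scomplex;
typedef struct { double real; double imag; } my_dcomplex;

/* Byte offset of the imaginary component in each layout. */
size_t imag_offset_c(void) { return offsetof(my_scomplex, imag); }
size_t imag_offset_z(void) { return offsetof(my_dcomplex, imag); }

/* Read a float located byte_offset bytes past base, mimicking what an
   assembly kernel does when it loads from "address + displacement". */
float load_float_at(const void *base, size_t byte_offset)
{
    float v;
    memcpy(&v, (const char *)base + byte_offset, sizeof v);
    return v;
}
```

With a buffer holding two adjacent `my_scomplex` values, a load at offset 4 returns the first element's imaginary part, while a load at offset 8 returns whatever follows the scalar in memory (here, the next element's real part). In a kernel, the contents of that location are arbitrary, which is consistent with the failures being described as intermittent.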

9 Commits
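
Commit aaa10c87e1 argues that the temporary microtile does not need to be zeroed, because a "beta = 0" gemm invocation overwrites it before it is read. The sketch below illustrates that reasoning with a plain C reference microkernel. The function `ref_ukr()`, its argument order, and the packed layouts are assumptions made for this example and do not reproduce the BLIS microkernel interface; the key assumption is that the kernel assigns to the tile, without reading it, when beta is zero.

```c
#include <math.h>
#include <stddef.h>

/* Hypothetical reference microkernel: ct := alpha * a * b + beta * ct
   for an mr x nr tile. a is packed as a[p*mr + i], b as b[p*nr + j],
   and ct is row-major. When beta == 0, ct is written but never read. */
void ref_ukr(size_t mr, size_t nr, size_t k,
             double alpha, const double *a, const double *b,
             double beta, double *ct)
{
    for (size_t i = 0; i < mr; ++i) {
        for (size_t j = 0; j < nr; ++j) {
            double ab = 0.0;
            for (size_t p = 0; p < k; ++p)
                ab += a[p * mr + i] * b[p * nr + j];
            if (beta == 0.0)
                ct[i * nr + j] = alpha * ab;           /* overwrite */
            else
                ct[i * nr + j] = alpha * ab + beta * ct[i * nr + j];
        }
    }
}

/* Fill the temporary tile with NaN, run a beta = 0 update, and return 1
   if the result equals the expected product (no NaN leaked through). */
int temp_tile_survives_nan(void)
{
    enum { MR = 2, NR = 2, K = 3 };
    const double a[K * MR] = { 1, 2, 3, 4, 5, 6 };  /* A = [1 3 5; 2 4 6] */
    const double b[K * NR] = { 1, 0, 0, 1, 1, 1 };  /* B = [1 0; 0 1; 1 1] */
    const double expected[MR * NR] = { 6, 8, 8, 10 };
    double ct[MR * NR];

    for (size_t i = 0; i < MR * NR; ++i)
        ct[i] = NAN;                                /* uninitialized stand-in */

    ref_ukr(MR, NR, K, 1.0, a, b, 0.0, ct);

    for (size_t i = 0; i < MR * NR; ++i)
        if (ct[i] != expected[i])
            return 0;
    return 1;
}
```

Had the kernel computed `beta * ct` unconditionally, the NaNs would propagate (0 * NaN is NaN), so the argument depends on the beta = 0 case being handled as an overwrite.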