Commit Graph

2 Commits

Author SHA1 Message Date
Field G. Van Zee
26a4b8f6f9 Implemented 3m2, 3m3 induced algorithms (gemm only).
Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
  support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
  as an argument instead of computing it locally. Exception: for trmm,
  is_p must be computed locally, since it changes for triangular
  packed matrices. Also exposed is_p in interface to dt-specific
  packm_blk_var2 (and _var1, even though it does not use imaginary
  stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
  they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
  rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
  and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
  kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
  frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
  are obtained from the func_t objects rather than the cpp macros that
  define the function names.
- Updated test/3m4m driver, Makefile, and run script.
2015-04-01 10:44:54 -05:00
Field G. Van Zee
f1a6b7d028 Reorganized code for induced complex methods.
Details:
- Consolidated most of the code relating to induced complex methods
  (e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
  are now enabled on a per-operation basis. The current "available"
  (enabled and implemented) implementation can then be queried on
  an operation basis. Micro-kernel func_t objects as well as blksz_t
  objects can also be queried in a similar maner.
- Redefined several micro-kernel and operation-related functions in
  bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
  and nr blksz_t objects for each cache blocksize (and are NULL for
  register blocksizes). Renamed the sub-blocksize field "sub" to
  "mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
  trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
  identifies an operation. For now, only level-3 id values are defined,
  along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
  are tested (one at a time) rather than only testing the first
  available method.
- Reformated summary at the beginning of testsuite output so that
  blocksize and micro-kernel info is shown for each induced method
  that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
  testsuite output (from approx. 90 to 80).
2015-03-18 15:37:10 -05:00