Commit Graph

458 Commits

Author SHA1 Message Date
Field G. Van Zee
6f8575ab25 Added zgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.
2014-10-10 10:01:45 -05:00
Field G. Van Zee
23ce7ee542 Merge branch 'master' of github.com:flame/blis 2014-10-09 16:41:22 -05:00
Field G. Van Zee
99fd9a3971 Fixed two minor bugs.
Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
  modules whereby the uplo bits of some packed matrix objects were not
  being set properly, resulting in false FAILURE results for those
  tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
  "not yet implemented" abort() when creating a 1x1 object with non-unit
  strides.
2014-10-09 16:38:04 -05:00
Tyler Smith
7a8ad47fb2 Minor changes to knc configuration, including a preference for row-major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0
2014-10-08 15:52:13 -05:00
Field G. Van Zee
76b7c34af0 Fixed a bug in the pack schema-related bit macros.
Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
  include all six bits presently used in the pack schema bitfield of
  the info field of obj_t structs. Prior to this commit, the macro
  constant only included the lowest five bits, which excluded the
  "is or is not packed" bit. This manifested as a strange bug in
  probably many level-2 codes that invoked packing, though we only
  observed it in ger before fixing. Thanks to Devin Matthews for
  finding and reporting this bug.
2014-10-02 14:15:38 -05:00
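The effect of a mask that is one bit too narrow can be sketched as follows. This is only an illustration: the bit positions and the SKETCH_* names are hypothetical and are not taken from bli_type_defs.h. It assumes a six-bit schema field whose top bit records whether the object is packed.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a six-bit pack schema field in the low bits of the
   info word, with the top bit of the field meaning "is packed". */
#define SKETCH_SCHEMA_BITS_5  UINT32_C(0x1F)  /* too narrow: five bits */
#define SKETCH_SCHEMA_BITS_6  UINT32_C(0x3F)  /* full field: six bits  */
#define SKETCH_PACKED_BIT     UINT32_C(0x20)

/* Extract the schema field from an info word using the given mask. */
static uint32_t sketch_schema_of( uint32_t info, uint32_t mask )
{
    assert( mask == SKETCH_SCHEMA_BITS_5 || mask == SKETCH_SCHEMA_BITS_6 );
    return info & mask;
}

/* Report whether the extracted schema says the object is packed. */
static int sketch_is_packed( uint32_t info, uint32_t mask )
{
    return ( sketch_schema_of( info, mask ) & SKETCH_PACKED_BIT ) != 0;
}
```

With the five-bit mask, an info word such as 0x23 loses its packed bit during extraction, so a packed object is reported as unpacked.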
Field G. Van Zee
a5763e3322 Added extra output to bli_obj_print().
Details:
- Print extra values from info field of obj_t struct within
  bli_obj_print().
2014-10-02 13:28:17 -05:00
Tyler Smith
9bba209fc4 Fixed bug when packing anywhere besides in blk_var_1 for gemm. 2014-09-29 14:56:36 -05:00
Tyler Smith
614a4afc92 Merge branch 'master' of http://github.com/flame/blis 2014-09-26 10:49:57 -05:00
Field G. Van Zee
4a7df04e8a Added 30xk support for packm ukernels.
Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
  definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
  included 30xk kernels.
- Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.
2014-09-22 16:06:15 -05:00
Field G. Van Zee
b6d4bd792e Fixed missing tabs from Makefile patch. 2014-09-22 16:02:37 -05:00
Field G. Van Zee
32630f9b6f Comment update to virtual micro-kernels. 2014-09-19 17:18:20 -05:00
Field G. Van Zee
13447cffea Minor bugfix to top-level Makefile.
Details:
- Applied a patch that allows the top-level Makefile to work on certain
  systems. The patch simply separates out the source-to-object code
  generation rules for .c and .S files into two separate rules. Thanks
  to Devin Matthews for submitting this patch.
2014-09-19 13:00:48 -05:00
Field G. Van Zee
e80a453784 Fixed bug introduced by bugfix in 25b258d.
Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
  a+lda because in the latter case, alignment could cancel out and
  still allow the optimized code to run when it shouldn't. Thanks
  to Devin for pointing this out.
2014-09-18 10:24:20 -05:00
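A minimal sketch of why checking a+lda can let misaligned data through. The 32-byte requirement, the SKETCH_* and sketch_* names, and the use of integer byte addresses are assumptions made for illustration; the actual kernel code is not shown in this log. The sketch also assumes 8-byte doubles.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment requirement of the optimized code path, in bytes. */
#define SKETCH_ALIGN 32

/* Nonzero if x (an address or a byte count) is a multiple of SKETCH_ALIGN. */
static int sketch_is_aligned( uintptr_t x )
{
    return ( x % SKETCH_ALIGN ) == 0;
}

/* Flawed check: tests only the address of the second column, a + lda.
   The address a is given in bytes; lda is counted in doubles. */
static int sketch_check_sum( uintptr_t a, size_t lda )
{
    return sketch_is_aligned( a + lda * sizeof( double ) );
}

/* Check described in the commit: test the byte size of the leading
   dimension on its own, so that a misaligned base address cannot mask a
   misaligned stride. */
static int sketch_check_stride( size_t lda )
{
    return sketch_is_aligned( lda * sizeof( double ) );
}

/* A base at byte 16 with lda = 2 (16 bytes): the sum 32 looks aligned,
   yet the columns start at bytes 16, 32, 48, ... and most are misaligned. */
static void sketch_demo( void )
{
    assert( sketch_check_sum( 16, 2 ) );
    assert( !sketch_check_stride( 2 ) );
}
```

In this example the sum check would allow the optimized path to run, while the stride check correctly rejects it.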
Field G. Van Zee
25b258d61f Fixed a non-fatal problem with bugfix in a68b316c.
Details:
- The bugfix in a68b316c was inadvertently checking alignment of the
  leading dimension itself, rather than the byte size of the leading
  dimension. Now, we simply check alignment of a+lda.
2014-09-18 10:10:49 -05:00
Field G. Van Zee
96302d4fc8 Renamed bli_info_get_*_ukr_type() functions.
Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
  This makes them consistent with the bli_info_get_*_impl_string()
  functions.
2014-09-18 09:43:40 -05:00
Field G. Van Zee
a68b316ca4 Fixed alignment bugs in level-1f kernels.
Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
  were attempting to compute problems with unaligned leading dimensions
  with optimized code, rather than (correctly) using the reference
  implementations. Thanks to Devin Matthews for reporting this bug.
2014-09-17 11:10:07 -05:00
Field G. Van Zee
870761eb90 Merge branch 'master' of github.com:flame/blis 2014-09-16 18:20:49 -05:00
Field G. Van Zee
e9899be090 Added high-level implementations of 4m, 3m.
Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
  high levels, respectively. APIs for trmm and trsm were NOT added due
  to the fact that these approaches are inherently incompatible with
  implementing 4m or 3m at high levels (because the input right-hand
  side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
  3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
  to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
  real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
  level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
  that order) and execute the first one that is enabled, or the native
  implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
  that the user can query a string that describes the implementation
  that is currently enabled.
- Updated test suite to output implementation types for each level-3
  operation, as well as micro-kernel types for each of the five micro-
  kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
  type) whereby the diagonal elements of the packed micro-panels could
  get tainted if the source matrix's imaginary diagonal part contained
  garbage.
2014-09-16 18:19:32 -05:00
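For readers unfamiliar with the terms, the arithmetic behind 4m and 3m can be illustrated on scalars: a complex product computed with four real multiplications versus three. This is only a scalar sketch of the identities. The commit applies the methods to whole matrices, and the type and function names below are hypothetical.

```c
#include <assert.h>

/* Hypothetical complex type for this sketch. */
typedef struct { double re, im; } sketch_cplx_t;

static sketch_cplx_t sketch_make( double re, double im )
{
    sketch_cplx_t c;
    c.re = re;
    c.im = im;
    return c;
}

/* "4m" style: four real multiplications. */
static sketch_cplx_t sketch_mul_4m( sketch_cplx_t a, sketch_cplx_t b )
{
    return sketch_make( a.re * b.re - a.im * b.im,
                        a.re * b.im + a.im * b.re );
}

/* "3m" style: three real multiplications and a few extra additions. */
static sketch_cplx_t sketch_mul_3m( sketch_cplx_t a, sketch_cplx_t b )
{
    double t1 = a.re * b.re;
    double t2 = a.im * b.im;
    double t3 = ( a.re + a.im ) * ( b.re + b.im );
    return sketch_make( t1 - t2, t3 - t1 - t2 );
}

/* Both forms agree when every intermediate is exactly representable;
   in general their rounding behavior can differ. */
static void sketch_check_agree( sketch_cplx_t a, sketch_cplx_t b )
{
    sketch_cplx_t x = sketch_mul_4m( a, b );
    sketch_cplx_t y = sketch_mul_3m( a, b );
    assert( x.re == y.re && x.im == y.im );
}
```

For example, (1 + 2i)(3 + 4i) gives -5 + 10i with either form.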
Tyler Smith
a2b59a37f1 Fixed make defs so that they actually compile for bulldozer 2014-09-15 10:44:44 -05:00
Tyler Smith
86fc7e4076 Added bulldozer configuration and updated piledriver micro-kernel 2014-09-15 10:35:46 -05:00
Field G. Van Zee
0644e61a79 Minor updates to bli_packm_init.c. 2014-09-11 12:55:34 -05:00
Field G. Van Zee
9dc9b44a05 Renamed bli_obj_pack_status() to _pack_schema().
Details:
- Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
  order to help avoid confusion as to what the macro returns.
2014-09-11 12:03:28 -05:00
Field G. Van Zee
cf5efdde05 Pass pack_t schemas into ukernels via auxinfo_t.
Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
  A and B into the datatype-specific functions, where they are now
  inserted into a newly-expanded auxinfo_t struct. This gives the
  micro-kernels access to the pack_t schema values embedded in the
  control trees, which determine the precise format into which the
  matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
  remove densify argument. Meant to include this in commit c472993b.
2014-09-11 11:47:56 -05:00
Field G. Van Zee
cc8d2b8277 Updated old test drivers in 'test'. 2014-09-09 13:48:22 -05:00
Field G. Van Zee
c472993bbc Removed densify argument to packm_cntl_obj_create().
Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
  This argument was inserted very early in BLIS's development, when it
  was anticipated that the developer may sometimes wish to pack a
  Hermitian, symmetric, or triangular matrix without making it dense.
  But as it turns out, if we are packing a matrix, we always want to
  make it dense in some way or another due to the fact that the micro-
  kernel only multiplies dense micro-panels. Thus, unless/until there
  is a real need for the feature, it seems reasonable to remove it from
  the packm_cntl API.
2014-09-09 13:42:04 -05:00
Field G. Van Zee
5c43ee3871 Moved trmm4m/3m_cntl files to 'old' directory.
Details:
- Meant to include this in previous commit.
2014-09-08 15:19:29 -05:00
Field G. Van Zee
7b2f469d54 Retired trmm_t control tree definitions, usage.
Details:
- Replaced all trmm_t control tree instances and usage with that of
  gemm_t. This change is similar to the recent retirement of the herk_t
  control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
  assume that k is a multiple of MR (when A is triangular) or NR (when
  B is triangular). This means that bottom-right micro-panels packed for
  trmm will have different zero-padding when k is not already a multiple
  of the relevant register blocksize. While this creates a seemingly
  arbitrary and unnecessary distinction between trmm and trsm packing,
  it actually allows trmm to be handled with one control tree, instead
  of one for left and one for right side cases. Furthermore, since only
  one tree is required, it can now be handled by the gemm tree, and thus
  the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
  multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
  of which are to facilitate above-mentioned changes whereby k is no
  longer required to be a multiple of register blocksize when packing
  triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.
2014-09-08 14:49:50 -05:00
Field G. Van Zee
576e9e9255 Retired herk_t control tree definitions, usage.
Details:
- Replaced all herk_t control tree instances and usage with that of
  gemm_t, since the two types presently have the same fields. This means
  that herk, her2k, syrk, and syr2k can simply use the gemm control tree
  as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.
2014-09-07 16:12:52 -05:00
Field G. Van Zee
b2fed052c9 Minor code cleanup to bli_packm_struc_cxk*.c
Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
  Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
  equal to rs_p and cs_p.
2014-09-03 17:07:25 -05:00
Field G. Van Zee
023ce77096 Minor update to packm_cxk kernels.
Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
  respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
  functions. This makes the code a little easier to read since "m" and
  "n" have connotations that are not applicable here.
- Comment updates.
2014-09-03 10:47:53 -05:00
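To show how the panel_dim and panel_len names read in practice, here is a hedged sketch of a reference-style packing loop. It is not the BLIS kernel: the function name, the argument list, and the reading of panel_dim as the short (register-blocked) dimension and panel_len as the long dimension are assumptions. The scalar is called kappa, as elsewhere in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference packing routine, not the BLIS kernel itself.
   Copies a panel_dim x panel_len block of a (element strides inca and lda)
   into the buffer p, scaling each element by kappa. Consecutive slices
   along the panel length are stored ldp elements apart in p. */
static void sketch_pack_cxk( size_t panel_dim, size_t panel_len,
                             double kappa,
                             const double* a, size_t inca, size_t lda,
                             double* p, size_t ldp )
{
    assert( ldp >= panel_dim );

    for ( size_t l = 0; l < panel_len; ++l )      /* along the panel length  */
        for ( size_t d = 0; d < panel_dim; ++d )  /* across the panel width  */
            p[ l * ldp + d ] = kappa * a[ d * inca + l * lda ];
}
```

Because the source strides are passed explicitly, the same loop handles row-stored and column-stored sources; only inca and lda change.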
Field G. Van Zee
189def3667 Retired portions of bli_kernel_3m/4m_macro_defs.h.
Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
  4m/3m-specific blocksizes after realizing that this can be done in
  bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
  are used.
- The maximum cache values for 4m/3m are still needed when computing mem
  pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
  definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
  bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
  terms of the regular register blocksizes are now in place.
2014-09-01 16:23:17 -05:00
Field G. Van Zee
af521ee6f2 Changed semantics of blocksize extensions.
Details:
- Changed semantics of cache and register blocksize extensions so that
  the extended values are tracked, rather than just the marginal
  extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
  that these "max" query routines grab the maximum value for cache
  blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
2014-09-01 14:06:46 -05:00
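The change in semantics amounts to storing the extended value itself rather than the amount by which the default is extended. A small sketch with made-up numbers follows; the values and the sketch_* names are hypothetical and do not come from any BLIS configuration.

```c
#include <assert.h>

/* Hypothetical numbers for illustration only. */
#define SKETCH_DEFAULT_MC  128
#define SKETCH_EXTEND_MC    32   /* old semantics: marginal extension   */
#define SKETCH_MAXIMUM_MC  160   /* new semantics: extended value itself */

/* Convert an old-style marginal extension into a new-style maximum. */
static int sketch_max_from_ext( int def, int ext )
{
    assert( ext >= 0 );
    return def + ext;
}

/* Recover the marginal extension from a new-style maximum. */
static int sketch_ext_from_max( int def, int max )
{
    assert( max >= def );
    return max - def;
}
```

Under these assumed numbers, an old extension of 32 on a default of 128 corresponds to a new maximum of 160.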
Field G. Van Zee
07f23aefd5 Pass pack schema into packm_struc_cxk*().
Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
  the pack_t schema. This allows the implementation to more easily
  determine how the micro-panel is stored (row-stored column panel
  or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.
2014-08-31 11:58:50 -05:00
Field G. Van Zee
f032ba9b11 Reorganized packm implementation.
Details:
- Reorganized packm variants and structure-aware kernels so that all
  routines for a given pack format (4m, 3m, regular) reside in a single
  file.
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work
  for both 4m and 3m, and adjusted 4m/3m _cntl_init() functions
  accordingly.
- Added a new packm_ker_t function pointer type to
  bli_kernel_type_defs.h to facilitate function pointer typecasting in
  the datatype-specific packm_blk_var2() functions.
- Deprecated _blk_var3.
- Fixed a bug in the triangular micro-panel packing facility that
  affected trmm and trmm3 with unit diagonals.
2014-08-30 16:21:20 -05:00
Field G. Van Zee
c6793cecb7 Reorganized #includes for scalar macro headers.
Details:
- Reordered the #include statements in bli_scalar_macro_defs.h so that
  conventional, ri-, and ri3-based macros are grouped together.
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.
2014-08-28 17:14:48 -05:00
Field G. Van Zee
b4da890728 Whitespace, comments updates on packm_blk_var?.c. 2014-08-28 14:10:32 -05:00
Field G. Van Zee
46e46a1d83 Minor updates to packm blocked, cxk_3m/4m code.
Details:
- Added 'const' qualifier to inlined packing code that handles
  micro-panel packing that is too large for an existing packm ukernel.
- Comment updates.
2014-08-28 12:05:45 -05:00
Field G. Van Zee
908dc688b5 Pass pack schema into blocked packm routines.
Details:
- Rather than passing the packm blocked routines a boolean value that
  represents whether the matrix is being packed to row or column storage,
  we now pass in the pack schema itself.
2014-08-28 11:55:12 -05:00
Field G. Van Zee
a0ff6066e0 Merge branch 'master' of github.com:flame/blis 2014-08-24 15:56:21 -05:00
Field G. Van Zee
c4c99c4813 Renamed packm scalar from beta to kappa.
Details:
- The packm implementation (i.e. source files in frame/1m/packm and
  frame/1m/packm/ukernels) interchangeably used the names "beta" and
  "kappa" to refer to the optional scalar to be applied during packing.
  This commit renames all uses of "beta" to be "kappa", since "beta"
  sometimes evokes the scalar specifically on the output matrix of a
  level-2 or level-3 operation.
2014-08-24 15:52:22 -05:00
Field G. Van Zee
d40b32bc24 Merge branch 'master' of github.com:flame/blis 2014-08-24 13:46:36 -05:00
Field G. Van Zee
6c25c379fa Consolidated unpackm ukernels into single file.
Details:
- Reorganized unpackm ukernels into a single file,
  bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
  ukernels in commit 4cc2b46.
2014-08-24 13:44:10 -05:00
Field G. Van Zee
9331f79443 Merge branch 'master' of github.com:flame/blis 2014-08-24 10:54:21 -05:00
Field G. Van Zee
670b63926a Added whitespace to bli_obj_scalar_ routine calls.
Details:
- Added extra spaces to align arguments of
  bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
  the fact that the function was previously named
  bli_obj_init_scalar_copy_of() and the name change, performed in
  b444489f, was done via recursive sed commands which left subsequent
  lines untouched.
2014-08-24 10:46:27 -05:00
Field G. Van Zee
7fc48a7d92 Combined 4m/3m bits into an expanded bitfield.
Details:
- Combined the 4m/3m bits into an expanded bitfield, which will encode
  the packing "format" of the micro-panels. This will allow for more
  easily and compactly encoding additional formats.
- Other minor comment/whitespace updates to bli_type_defs.h.
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
  format bitfield.
- Comment update to bli_kernel_post_macro_defs.h.
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.
2014-08-23 16:50:58 -05:00
Field G. Van Zee
ef0143cc14 Renamed _ri, _ri3 packm ukernels to _4m, _3m.
Details:
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
  helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
- Updated names of cpp macros that correspond to packm ukernels.
2014-08-23 14:02:27 -05:00
Field G. Van Zee
b0ccac1161 Cleaned up front-end layering for 4m/3m.
Details:
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
  and bli_gemm4m_entry()) to hide the control trees from the code that
  decides whether to execute native or 4m-based implementations. The
  layering was also applied to 3m.
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
  rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
  the groundwork for users to be able to change at runtime which
  implementation is called by the main front-ends (e.g. bli_gemm()).
- Retired some experimental gemm code that hadn't been touched in
  months.
2014-08-21 19:21:52 -05:00
Field G. Van Zee
bedec95451 Added bli_4m API for querying 4m enabled state.
Details:
- Added bli_4m.c (and header), which defines a simple API that can be
  used to query, enable, and disable 4m-based complex support in BLIS.
  The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
  the variable that determines the state (enabled or disabled).
- Changed bli_info*() API so that all cache and register blocksize-
  related query routines return the blksz_t objects' values as they
  exist at runtime, rather than return the values as determined by the
  configuration system (e.g. bli_kernel.h, or defaults for those values
  not specified). This sets the foundation for being able to change
  those blocksizes at runtime.
2014-08-21 18:25:48 -05:00
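The pattern described here, a compile-time macro that only sets the initial value of a runtime switch, might look roughly like the following. All names are hypothetical stand-ins: the real bli_4m.c interface is not reproduced in this log, and SKETCH_ENABLE_4M merely plays the role of a macro such as BLIS_ENABLE_?COMPLEX_VIA_4M.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the configuration macro; defined here so the
   sketch compiles on its own. */
#ifndef SKETCH_ENABLE_4M
#define SKETCH_ENABLE_4M 0
#endif

typedef enum { SKETCH_IMPL_NATIVE, SKETCH_IMPL_4M } sketch_impl_t;

/* The macro only initializes the state; it can be changed at runtime. */
static bool sketch_4m_state = ( SKETCH_ENABLE_4M != 0 );

static bool sketch_4m_is_enabled( void ) { return sketch_4m_state; }
static void sketch_4m_enable( void )     { sketch_4m_state = true; }
static void sketch_4m_disable( void )    { sketch_4m_state = false; }

/* A front-end would branch on the runtime query rather than on the macro. */
static sketch_impl_t sketch_choose_impl( void )
{
    return sketch_4m_is_enabled() ? SKETCH_IMPL_4M : SKETCH_IMPL_NATIVE;
}

/* Walk through the state transitions. */
static void sketch_demo( void )
{
    sketch_4m_enable();
    assert( sketch_choose_impl() == SKETCH_IMPL_4M );
    sketch_4m_disable();
    assert( sketch_choose_impl() == SKETCH_IMPL_NATIVE );
}
```

Keeping the state behind query, enable, and disable functions is what lets the choice of implementation change without recompiling.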
Tyler Smith
b541b667ca Merge branch 'master' of http://github.com/flame/blis
Conflicts:
	frame/3/trsm/bli_trsm_blk_var2b.c
	frame/3/trsm/bli_trsm_blk_var2f.c
2014-08-20 14:44:51 -05:00
Tyler Smith
699a8151ca Some improvements to trsm parallelism 2014-08-20 14:43:17 -05:00