Commit Graph

491 Commits

Author SHA1 Message Date
Field G. Van Zee
81114824a0 Minor 4m/3m consolidation to mem_pool_macro_defs.h.
Details:
- Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
  reduce code and improve readability.
2015-01-06 12:15:21 -06:00
Tyler Michael Smith
36a9b7b743 reduced the default number of MC by KC blocks for bgq 2014-12-17 21:55:50 +00:00
Field G. Van Zee
c60619c7c3 Minor tweaks for 3m4m test drivers.
Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
  half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
  computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
  cases.
2014-12-16 17:08:22 -06:00
Field G. Van Zee
c6929ba6a5 Added 4m_1b to test/3m4m test driver and script. 2014-12-16 11:27:50 -06:00
Field G. Van Zee
785d480805 Merge branch 'master' of github.com:flame/blis 2014-12-12 14:34:19 -06:00
Field G. Van Zee
9456f330af Added 4m_1b implementation for gemm.
Details:
- Added yet another 4m-based implementation for complex domain level-3
  operations. This method, which the 3m/4m paper identifies as Algorithm
  "4m_1b", fissures the first loop around the micro-kernel so that the
  real sub-panel of the current micro-panel of B is multiplied against
  (both sub-panels of) all micro-panels of A, before doing the same for
  the imaginary sub-panel of the micro-panel of B. For now, only gemm is
  supported, and 4m_1b (labeled "4mb" within the framework) is not yet
  integrated into the test suite.
2014-12-12 14:31:57 -06:00
Field G. Van Zee
4156c0880d Fixed obscure level-2 packing / general stride bug.
Details:
- Fixed a bug in certain structured level-2 operations that manifested
  only when the structured matrix was provided to BLIS as a matrix stored
  with general stride. The bug was introduced in c472993b when the
  densify field was removed from the packm control tree node and
  associated APIs. Since then, the packed object was unconditionally
  marked with an uplo field of BLIS_DENSE. This is fine for level-3
  operations where micro-panels are always densified, but in level-2
  contexts, the underlying unblocked variant (fused or unfused) of
  structured operations (e.g. trmv) still needs to know whether to
  execute its "lower" or "upper" branches of code. Since this field
  was unconditionally being set to BLIS_DENSE, the unblocked variants
  always executed the "else" branch, which happened to be the
  "lower" case code. Thus, running an upper case produced the wrong
  answer. This most obviously manifested in the form of failures for
  trmm, trmm3, and trsm in the test suite.
  The bug was fixed by setting the packed object's uplo field to
  BLIS_DENSE only if the schema indicated that micro-panels were to be
  packed. Otherwise, we can assume we are packing to regular row or
  column storage, as is the case with level-2 packing. Thanks to
  Francisco Igual for reporting the testsuite failures and ultimately
  leading us to this bug.
2014-12-09 16:03:14 -06:00
Field G. Van Zee
689f60a578 Merge pull request #21 from figual/master
Adding armv8a configuration and micro-kernels.
2014-12-07 14:03:30 -06:00
Francisco D. Igual
483e4d6a3f Adding armv8a configuration and micro-kernels.
Only sgemm micro-kernel is fully functional at this point.
2014-12-07 20:27:49 +01:00
Tyler Smith
bef24e67e0 Fixed a type of race condition exposed by pthreads implementation.
The lead thread of the inner thread communicator could exit the subproblem, move on to the next iteration of the loop, and modify a1_pack, b1_pack, or c1_pack while other threads were still using them.

Barriers were inserted to fix this.
2014-11-26 18:00:56 -06:00
Field G. Van Zee
76bde44411 Merge branch 'master' of github.com:flame/blis 2014-11-26 17:25:24 -06:00
Tyler Michael Smith
f3d729e504 Added static mutex to bli_init and bli_finalize 2014-11-26 22:25:24 -06:00
Tyler Michael Smith
d71cc79786 Refactored bli_threading files and added support for pthreads 2014-11-26 21:36:39 -06:00
Field G. Van Zee
e56e61438f Minor cleanups to bli_threading.h and friends.
Details:
- No longer need to define BLIS_ENABLE_MULTITHREADING manually in
  bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
  BLIS_ENABLE_PTHREADS is defined.
- Added sanity check to prevent both BLIS_ENABLE_OPENMP and
  BLIS_ENABLE_PTHREADS from being enabled simultaneously.
- Reorganization of bli_threading*.h header files, which led to
  simplification of threading-related part of blis.h.
- Added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
  file.
2014-11-26 17:20:35 -06:00
Field G. Van Zee
3be2744cbe Update to template gemm ukernel comments.
Details:
- Updated comments on alignment of a1 and b1 to match wiki.
2014-11-21 12:28:08 -06:00
Field G. Van Zee
994429c688 Merge pull request #20 from TimmyLiu/master
#define PASTEF773 required by cblas compatibility layer
2014-11-20 13:55:35 -06:00
Timmy
694029d9d7 #define PASTEF773 required by cblas compatibility layer 2014-11-19 15:25:14 -06:00
Field G. Van Zee
58796abda6 Removed KC constraint comments from _kernel.h files.
Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
  NR has been relaxed, and thus it was time to remove the comments
  from the top of the bli_kernel.h files of all configurations.
2014-11-06 14:31:52 -06:00
Field G. Van Zee
7bbc95a54f Added new piledriver micro-kernels.
Details:
- Added new micro-kernels for the AMD piledriver architecture (one
  for each datatype).
- Updates and tweaks to piledriver configuration.
- Added 3xk packm micro-kernel support.
- Explicitly unrolled some of the smaller packm micro-kernels.
- Added notes to avx/sandybridge and piledriver micro-kernel files
  acknowledging the influence of the corresponding kernel code in
  OpenBLAS.
2014-10-29 10:52:23 -05:00
Field G. Van Zee
59613f1d55 Added separate micro-panel alignment for A and B.
Details:
- Changed the recently-added micro-panel alignment macros so that we now
  have two sets--one for micro-panels of matrix A and one for micro-
  panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
  bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.
2014-10-23 17:21:37 -05:00
Field G. Van Zee
a8e12884ee CHANGELOG update (0.1.6) 2014-10-23 11:35:48 -05:00
Field G. Van Zee
38ea5022e4 Version file update (0.1.6) 0.1.6 2014-10-23 11:35:45 -05:00
Field G. Van Zee
a3e6341bdb Factored common code from blocksize functions.
Details:
- Split bli_determine_blocksize_[fb]() into two functions each, the
  newer ones ending with the _sub suffix. These new sub-functions are
  now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
  eliminates redundant code and will allow any future tweaks to the
  core sub-functions to automatically be inherited by the operation-
  specific versions.
2014-10-23 11:13:28 -05:00
Field G. Van Zee
4674ca8cff Extended newly relaxed KC to hemm, symm.
Details:
- These changes were intended for the previous commit.
- Defined bli_gemm_determine_kc_[fb]() and bli_gemm_determine_kc_[fb](),
  which determine blocksizes for gemm-based operations, taking special
  care to "nudge" the kc dimension up to a multiple of MR or NR for
  hemm and symm operations, as needed.
- Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f()
  instead of bli_determine_blocksize_f().
- Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.
2014-10-23 10:50:59 -05:00
Field G. Van Zee
ab954ba6f8 Relaxed constraint that KC be multiple of MR, NR.
Details:
- Relaxed a long-held requirement in register blocksizes that required
  the kernel programmer to choose a KC that was divisible by both MR
  and NR. This was very constraining on some architectures that did not
  use register blocksizes that were powers of two. The constraint is
  now enforced only for trmm and trsm, where it is needed, and it is
  now handled by "nudging" kc upward at runtime, if necessary, to be a
  multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
  which determine blocksizes for trmm and trsm, taking special care to
  "nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
  instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension
  unmodified if the dimension multiple is zero (to avoid division by
  zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
  bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.
2014-10-23 10:12:27 -05:00
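
The runtime "nudging" of kc described in ab954ba6f8 can be sketched roughly as below. This is a minimal illustration, not the BLIS source: the name `align_dim_to_mult` and the `dim_t` typedef are stand-ins chosen for this example, and the real `bli_align_dim_to_mult()` may differ in signature and detail.

```c
#include <stddef.h>

/* Hypothetical stand-in for the framework's dimension type. */
typedef size_t dim_t;

/* Round dim up to the nearest multiple of dim_mult.
   If dim_mult is zero, return dim unmodified so that we never
   divide by zero (the safeguard mentioned in the commit message). */
static dim_t align_dim_to_mult(dim_t dim, dim_t dim_mult)
{
    if (dim_mult == 0) return dim;

    return ((dim + dim_mult - 1) / dim_mult) * dim_mult;
}
```

Under these assumptions, a kc of 256 with a register blocksize of 6 would be nudged up to 258, while a kc that is already a multiple is returned unchanged.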
Tyler Smith
95cdae65d6 Fixed bug in KNC microkernel where k=0 and beta != 1 2014-10-22 16:30:16 -05:00
Field G. Van Zee
e64dba5633 Re-implemented micro-panel alignment.
Details:
- This commit re-implements a feature that was removed in commit
  c2b2ab62. It was removed because, at the time, I wasn't sure how the
  micro-panel alignment feature would interact with the 4m method (when
  applied at the micro-kernel level), and so it seemed safer to disable
  the feature entirely rather than allow possible breakage. This commit
  revisits the issue and safely re-implements the feature in a way that
  is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
  space.
- Modified packm_init and blocked variants to align whole micro-panels
  by a datatype-specific alignment value that may be set by the
  configuration. (If it is not set by the configuration, it will default
  to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
  - storage stride is handled properly given the new micro-panel
    alignment behavior;
  - indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
    trsm, is more robust (e.g. will work if the applicable packing
    register blocksize is odd);
  - imaginary strides are computed and stored within auxinfo_t structs,
    which allows the virtual micro-kernels to more easily determine how
    to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
  within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.
2014-10-20 19:23:06 -05:00
Field G. Van Zee
add16b0e54 Added 3m4m test driver subdir of 'test'.
Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
  as well as assembly-based and OpenBLAS implementations of gemm
  in single and multithreaded modes.
2014-10-17 11:49:24 -05:00
Field G. Van Zee
e171504a72 Use correct definition of bli_is_last_iter().
Details:
- As intended for previous commit, the new definition of
  bli_is_last_iter() is now disabled in favor of the old
  definition.
2014-10-17 11:25:59 -05:00
Field G. Van Zee
0d954087b2 Minor changes and fixes.
Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
  arguments, which allows the macro to correctly compute whether a
  given iteration is the last that the thread will compute in that
  particular loop. The new definition, however, remains disabled
  (commented out) until someone can look at this more closely, as
  the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
  gigaflops do not disrupt the column alignment of the output.
2014-10-17 11:19:34 -05:00
Field G. Van Zee
d1e86e1876 More minor tweaks to sandybridge/avx micro-kernel.
Details:
- Re-enabled use of b_next for dgemm and cgemm micro-kernels.
2014-10-12 13:43:47 -05:00
Field G. Van Zee
7b6fe4cae5 Minor tweaks to sandybridge/avx micro-kernels.
Details:
- Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
- Removed usage of b_next in all x86_64/avx gemm micro-kernels.
2014-10-12 12:01:51 -05:00
Field G. Van Zee
a6a156e9fe Added cgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.
2014-10-10 14:26:41 -05:00
Field G. Van Zee
6f8575ab25 Added zgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.
2014-10-10 10:01:45 -05:00
Field G. Van Zee
23ce7ee542 Merge branch 'master' of github.com:flame/blis 2014-10-09 16:41:22 -05:00
Field G. Van Zee
99fd9a3971 Fixed two minor bugs.
Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
  modules whereby the uplo bits of some packed matrix objects were not
  being set properly, resulting in false FAILURE results for those
  tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
  "not yet implemented" abort() when creating a 1x1 object with non-unit
  strides.
2014-10-09 16:38:04 -05:00
Tyler Smith
7a8ad47fb2 Minor changes to knc configuration, including a preference for row-major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0
2014-10-08 15:52:13 -05:00
Field G. Van Zee
76b7c34af0 Fixed a bug in the pack schema-related bit macros.
Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
  include all six bits presently used in the pack schema bitfield of
  the info field of obj_t structs. Prior to this commit, the macro
  constant only included the lowest five bits, which excluded the
  "is or is not packed" bit. This manifested as a strange bug in
  probably many level-2 codes that invoked packing, though we only
  observed it in ger before fixing. Thanks to Devin Matthews for
  finding and reporting this bug.
2014-10-02 14:15:38 -05:00
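
The effect of a too-narrow mask, as described in 76b7c34af0, can be illustrated with the sketch below. The bit positions, macro names, and helper functions are assumptions made for this example only; they are not copied from bli_type_defs.h.

```c
#include <stdint.h>

/* Hypothetical layout: a six-bit pack schema field inside a 32-bit
   info word, whose highest bit is the "is or is not packed" flag. */
#define SCHEMA_SHIFT       16
#define SCHEMA_MASK_5BITS  (UINT32_C(0x1F) << SCHEMA_SHIFT)  /* omits the packed bit */
#define SCHEMA_MASK_6BITS  (UINT32_C(0x3F) << SCHEMA_SHIFT)  /* covers the whole field */
#define PACKED_BIT         (UINT32_C(0x20) << SCHEMA_SHIFT)

/* Overwrite the schema bits selected by mask, leaving other bits alone. */
static uint32_t set_schema(uint32_t info, uint32_t schema, uint32_t mask)
{
    return (info & ~mask) | (schema & mask);
}

/* Read back the schema bits selected by mask. */
static uint32_t get_schema(uint32_t info, uint32_t mask)
{
    return info & mask;
}
```

With the five-bit mask, a schema value that has the packed bit set loses that bit when stored, so a later query sees the object as unpacked. With the six-bit mask the bit survives.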
Field G. Van Zee
a5763e3322 Added extra output to bli_obj_print().
Details:
- Print extra values from info field of obj_t struct within
  bli_obj_print().
2014-10-02 13:28:17 -05:00
Tyler Smith
9bba209fc4 Fixed bug when packing anywhere besides in blk_var_1 for gemm. 2014-09-29 14:56:36 -05:00
Tyler Smith
614a4afc92 Merge branch 'master' of http://github.com/flame/blis 2014-09-26 10:49:57 -05:00
Field G. Van Zee
4a7df04e8a Added 30xk support for packm ukernels.
Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
  definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
  included 30xk kernels.
- Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.
2014-09-22 16:06:15 -05:00
Field G. Van Zee
b6d4bd792e Fixed missing tabs from Makefile patch. 2014-09-22 16:02:37 -05:00
Field G. Van Zee
32630f9b6f Comment update to virtual micro-kernels. 2014-09-19 17:18:20 -05:00
Field G. Van Zee
13447cffea Minor bugfix to top-level Makefile.
Details:
- Applied a patch that allows the top-level Makefile to work on certain
  systems. The patch simply separates out the source-to-object code
  generation rules for .c and .S files into two separate rules. Thanks
  to Devin Matthews for submitting this patch.
2014-09-19 13:00:48 -05:00
Field G. Van Zee
e80a453784 Fixed bug introduced by bugfix in 25b258d.
Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
  a+lda because in the latter case, alignment could cancel out and
  still allow the optimized code to run when it shouldn't. Thanks
  to Devin for pointing this out.
2014-09-18 10:24:20 -05:00
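
The cancellation that e80a453784 refers to can be shown with a small sketch. The function names and the use of integer addresses instead of real pointers are assumptions for illustration; the log does not show how the level-1f kernels combine this check with their other alignment tests.

```c
#include <stddef.h>
#include <stdint.h>

/* Nonzero if value is a multiple of align (align must be nonzero). */
static int is_aligned(uintptr_t value, size_t align)
{
    return value % align == 0;
}

/* The check from 25b258d as described: alignment of a + lda,
   i.e. of the address of the second column. */
static int second_column_aligned(uintptr_t a_addr, size_t lda, size_t align)
{
    return is_aligned(a_addr + lda * sizeof(double), align);
}

/* The check from e80a453784 as described: alignment of the
   leading dimension measured in bytes. */
static int stride_aligned(size_t lda, size_t align)
{
    return is_aligned(lda * sizeof(double), align);
}
```

For example, with a 16-byte requirement, a base address of 0x1008 and lda = 3 give a second-column address of 0x1020, which is aligned even though both the base address and the 24-byte stride are not. Checking the stride on its own catches this case.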
Field G. Van Zee
25b258d61f Fixed a non-fatal problem with bugfix in a68b316c.
Details:
- The bugfix in a68b316c was inadvertently checking alignment of the
  leading dimension itself, rather than the byte size of the leading
  dimension. Now, we simply check alignment of a+lda.
2014-09-18 10:10:49 -05:00
Field G. Van Zee
96302d4fc8 Renamed bli_info_get_*_ukr_type() functions.
Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
  This makes them consistent with the bli_info_get_*_impl_string()
  functions.
2014-09-18 09:43:40 -05:00
Field G. Van Zee
a68b316ca4 Fixed alignment bugs in level-1f kernels.
Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
  were attempting to compute problems with unaligned leading dimensions
  with optimized code, rather than (correctly) using the reference
  implementations. Thanks to Devin Matthews for reporting this bug.
2014-09-17 11:10:07 -05:00
Field G. Van Zee
870761eb90 Merge branch 'master' of github.com:flame/blis 2014-09-16 18:20:49 -05:00