Commit Graph

4 Commits

Author SHA1 Message Date
Field G. Van Zee
874707c1b1 Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)
2013-04-05 17:19:43 -05:00
Field G. Van Zee
7cbda15291 Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
  single-char macros are also defined.
2013-04-04 15:25:43 -05:00
Field G. Van Zee
37308f9a50 Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.
2013-03-26 12:43:14 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
2013-03-24 20:01:49 -05:00