Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.

Details:
- Fixed a family of bugs in the triangular level-3 operations for
  certain complex implementations (3m1 and 4m1a) that only manifest if
  one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
  - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
    for the triangular case.
  - Fixed incorrect computation of the imaginary stride stored in the
    auxinfo_t struct in the trmm and trsm macro-kernels.
  - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
    cases where the register blocksize for the triangular matrix is
    odd. Introduced a new byte-granular pointer arithmetic macro,
    bli_ptr_add(), that computes the correct value.
- Added a cpp macro to bli_macro_defs.h for the typeof() operator,
  defined in terms of __typeof__; it is used by the bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
  for singleton problems: a scalar is ambiguously both row-stored and
  column-stored, and because (by dumb luck) we check for row storage
  first, the wrong parameter combination code was executed.
- Added commented-out debugging lines to the 3m1/4m1a and reference
  micro-kernels and to the trsm_ll macro-kernel.
Field G. Van Zee
2015-10-30 18:25:04 -05:00
parent 46294d80e5
commit 37e55ca39b
17 changed files with 222 additions and 65 deletions

@@ -49,6 +49,17 @@
#endif
// -- Define typeof() operator if using non-GNU compiler --
#ifndef __GNUC__
#define typeof __typeof__
#else
#ifndef typeof
#define typeof __typeof__
#endif
#endif
// -- Boolean values --
#ifndef TRUE

@@ -653,6 +653,20 @@
bli_is_rpi_packed( schema ) )
// pointer-related
// p1 = p0 + (num/dem)
#define bli_ptr_add( p1, p0, num, dem ) \
{ \
p1 = ( typeof( p1 ) ) \
( ( char* )(p0) + ( ( (num) * sizeof( *(p0) ) \
) / (dem) \
) \
); \
}
// return datatype for char