Fixed x86_64 kernel bugs and other minor issues.

Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with the trmv/trsv bugfix (both of which now use the same
  macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (i.e., 1 x 1). This fixes a bug whereby, under certain
  conditions (e.g. dotv_opt_var1), the returned increment was greater than 1
  because the column stride of the object (e.g. rho) was inflated for
  alignment purposes (albeit unnecessarily, since the object has only one
  element). That value was invalid only because the code expected 1 (in
  order to perform contiguous vector loads).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and used it accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where the output operand
  still needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
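
The forward and backward blocksize rules described above can be sketched as follows. This is only an illustration: the names `BLOCKSIZE_DIM_F`, `BLOCKSIZE_DIM_B`, `partition_forward` and `partition_backward` are hypothetical stand-ins, not part of the library, and the macro arguments are parenthesized here for safety. In the backward case, `i` is assumed to count the elements already consumed from the end of the dimension.

```c
#include <stddef.h>

/* Hypothetical local stand-ins for the blocksize_dim macros. */
#define MIN_OF(a, b) ((a) < (b) ? (a) : (b))

/* Forward: take a full block unless fewer elements remain. */
#define BLOCKSIZE_DIM_F(i, dim, b_alg) (MIN_OF((b_alg), (dim) - (i)))

/* Backward: take the remainder (if any) in the first iteration, so that
   every later block begins on a multiple of b_alg. */
#define BLOCKSIZE_DIM_B(i, dim, b_alg) \
    ((i) == 0 && (dim) % (b_alg) != 0 ? (dim) % (b_alg) : (b_alg))

/* Record the block sizes chosen when moving forward through dim.
   Returns the number of blocks written to sizes. */
size_t partition_forward(int dim, int b_alg, int *sizes)
{
    size_t n = 0;
    for (int i = 0; i < dim; ) {
        int b = BLOCKSIZE_DIM_F(i, dim, b_alg);
        sizes[n++] = b;
        i += b;
    }
    return n;
}

/* Record the block sizes chosen when moving backward through dim.
   Here i counts elements already consumed from the end. */
size_t partition_backward(int dim, int b_alg, int *sizes)
{
    size_t n = 0;
    for (int i = 0; i < dim; ) {
        int b = BLOCKSIZE_DIM_B(i, dim, b_alg);
        sizes[n++] = b;
        i += b;
    }
    return n;
}
```

With dim = 10 and b_alg = 4, the forward walk yields blocks of 4, 4, 2, while the backward walk yields 2, 4, 4. The backward blocks therefore cover [8,10), [4,8) and [0,4), so each subpartition still starts at a multiple of b_alg.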
Field G. Van Zee
2013-05-24 16:28:10 -05:00
parent d57ec42b34
commit 2d9c667f3c
82 changed files with 480 additions and 175 deletions

@@ -485,8 +485,10 @@ bli_obj_width_stored( obj )
 #define bli_obj_vector_inc( x ) \
 \
+( bli_obj_is_scalar( x ) ? 1 : \
 ( bli_obj_length( x ) == 1 ? bli_obj_col_stride( x ) \
-: bli_obj_row_stride( x ) )
+: bli_obj_row_stride( x ) ) \
+)
 #define bli_obj_is_vector( x ) \
 \
@@ -506,6 +508,11 @@ bli_obj_width_stored( obj )
 ( bli_obj_length( obj ) == 0 || \
 bli_obj_width( obj ) == 0 )
+#define bli_obj_is_scalar( x ) \
+\
+( bli_obj_length( x ) == 1 && \
+bli_obj_width( x ) == 1 )
 // Dimension modification
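
The effect of the bli_obj_vector_inc() change can be illustrated with a simplified model. The `obj_view` struct and the functions `vector_inc_old` and `vector_inc_new` below are hypothetical and exist only for this sketch; the real object type carries far more state. The assumption is that a 1 x 1 object may carry a column stride padded for alignment.

```c
/* Hypothetical, simplified view of an object's shape and strides. */
typedef struct {
    int length;      /* number of rows */
    int width;       /* number of columns */
    int row_stride;  /* distance between consecutive rows */
    int col_stride;  /* distance between consecutive columns */
} obj_view;

/* Mirrors the new scalar test: the object is 1 x 1. */
static int is_scalar(const obj_view *x)
{
    return x->length == 1 && x->width == 1;
}

/* Old rule: a 1 x n object reports its column stride, even when n == 1. */
int vector_inc_old(const obj_view *x)
{
    return x->length == 1 ? x->col_stride : x->row_stride;
}

/* New rule: a 1 x 1 object always reports an increment of 1. */
int vector_inc_new(const obj_view *x)
{
    if (is_scalar(x))
        return 1;
    return x->length == 1 ? x->col_stride : x->row_stride;
}
```

For a 1 x 1 object with a column stride of 4, the old rule reports 4 while the new rule reports 1, which is what a kernel relying on contiguous vector loads expects. Objects that are not 1 x 1 get the same increment from both rules.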

@@ -314,6 +314,19 @@
 else { mt = n; nt = m; rst = cs; cst = rs; } \
 }
+// blocksize-related
+#define bli_determine_blocksize_dim_f( i, dim, b_alg ) \
+\
+( bli_min( b_alg, dim - i ) )
+#define bli_determine_blocksize_dim_b( i, dim, b_alg ) \
+\
+( i == 0 && dim % b_alg != 0 ? dim % b_alg \
+: b_alg )
 // stride-related
 #define bli_vector_inc( trans, m, n, rs, cs ) \