Mirror of https://github.com/amd/blis.git, synced 2026-05-11 09:39:59 +00:00
Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking into
  account how the choice of variant needed to be altered for upper-stored
  matrices (given that only lower-stored algorithms are explicitly
  implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now use
  the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1
  because the column stride of the object (e.g. rho) was inflated for
  alignment purposes (albeit unnecessarily since there is only one element
  in the object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
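The last item in the commit message (an output operand that still needs scaling when k = 0) can be illustrated with a small reference routine. This is a minimal sketch, not the BLIS back-end: `gemm_ref` and `scale_matrix` are hypothetical names, the layout is assumed to be column-major with no transposition, and the early-exit structure is only one plausible way to handle zero dimensions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: scale an m x n column-major matrix in place,
   C := beta * C. */
void scale_matrix( size_t m, size_t n, double beta, double *c, size_t ldc )
{
	for ( size_t j = 0; j < n; ++j )
		for ( size_t i = 0; i < m; ++i )
			c[ i + j*ldc ] *= beta;
}

/* Hypothetical reference gemm: C := beta * C + alpha * A * B,
   where A is m x k, B is k x n, and C is m x n (all column-major). */
void gemm_ref( size_t m, size_t n, size_t k,
               double alpha, const double *a, size_t lda,
                             const double *b, size_t ldb,
               double beta,  double *c, size_t ldc )
{
	/* An empty output means there is nothing to compute or scale. */
	if ( m == 0 || n == 0 ) return;

	assert( c != NULL && ldc >= m );

	/* With k == 0 the product A * B contributes nothing, but C must
	   still be scaled by beta. Returning before this point would leave
	   C unscaled, which is the kind of case the commit message names. */
	if ( k == 0 || alpha == 0.0 )
	{
		scale_matrix( m, n, beta, c, ldc );
		return;
	}

	for ( size_t j = 0; j < n; ++j )
	{
		for ( size_t i = 0; i < m; ++i )
		{
			double sum = 0.0;
			for ( size_t p = 0; p < k; ++p )
				sum += a[ i + p*lda ] * b[ p + j*ldb ];
			c[ i + j*ldc ] = beta * c[ i + j*ldc ] + alpha * sum;
		}
	}
}
```

Calling `gemm_ref( 2, 2, 0, 1.0, NULL, 1, NULL, 1, 0.5, c, 2 )` never dereferences the A and B pointers and halves every element of `c`.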
@@ -485,8 +485,10 @@ bli_obj_width_stored( obj )
 
 #define bli_obj_vector_inc( x ) \
 \
+	( bli_obj_is_scalar( x ) ? 1 : \
 	( bli_obj_length( x ) == 1 ? bli_obj_col_stride( x ) \
-	                           : bli_obj_row_stride( x ) )
+	                           : bli_obj_row_stride( x ) ) \
+	)
 
 #define bli_obj_is_vector( x ) \
 \
@@ -506,6 +508,11 @@ bli_obj_width_stored( obj )
 	( bli_obj_length( obj ) == 0 || \
 	  bli_obj_width( obj ) == 0 )
 
+#define bli_obj_is_scalar( x ) \
+\
+	( bli_obj_length( x ) == 1 && \
+	  bli_obj_width( x ) == 1 )
+
 
 // Dimension modification
 
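The effect of the bli_obj_vector_inc() change in the two hunks above can be checked in isolation. The sketch below is an illustration under assumptions: `toy_obj_t`, `make_obj`, `vector_inc_before`, and `vector_inc_after` are hypothetical names, and the four accessor macros are simplified stand-ins for the real object accessors. Only the bodies of the two increment macros follow the diff.

```c
#include <assert.h>

/* Hypothetical stand-in for the object type: just the fields read here. */
typedef struct
{
	long m;   /* length (number of rows)    */
	long n;   /* width (number of columns)  */
	long rs;  /* row stride                 */
	long cs;  /* column stride              */
} toy_obj_t;

/* Simplified accessors (assumed, not copied from the library). */
#define bli_obj_length( x )      ( (x).m )
#define bli_obj_width( x )       ( (x).n )
#define bli_obj_row_stride( x )  ( (x).rs )
#define bli_obj_col_stride( x )  ( (x).cs )

/* Added by the commit. */
#define bli_obj_is_scalar( x ) \
	( bli_obj_length( x ) == 1 && bli_obj_width( x ) == 1 )

/* Macro body before the commit (renamed here for comparison). */
#define old_obj_vector_inc( x ) \
	( bli_obj_length( x ) == 1 ? bli_obj_col_stride( x ) \
	                           : bli_obj_row_stride( x ) )

/* Macro body after the commit: a 1 x 1 object always reports 1. */
#define bli_obj_vector_inc( x ) \
	( bli_obj_is_scalar( x ) ? 1 : \
	( bli_obj_length( x ) == 1 ? bli_obj_col_stride( x ) \
	                           : bli_obj_row_stride( x ) ) \
	)

/* Hypothetical constructor for the toy object. */
toy_obj_t make_obj( long m, long n, long rs, long cs )
{
	assert( m >= 0 && n >= 0 );
	toy_obj_t o = { m, n, rs, cs };
	return o;
}

long vector_inc_before( toy_obj_t x ) { return old_obj_vector_inc( x ); }
long vector_inc_after( toy_obj_t x )  { return bli_obj_vector_inc( x ); }
```

For a 1 x 1 object whose column stride was padded to 4, `vector_inc_before` reports 4 while `vector_inc_after` reports 1. Genuine row and column vectors give the same increment under both versions.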
@@ -314,6 +314,19 @@
 	else { mt = n; nt = m; rst = cs; cst = rs; } \
 }
 
+
+// blocksize-related
+
+#define bli_determine_blocksize_dim_f( i, dim, b_alg ) \
+\
+	( bli_min( b_alg, dim - i ) )
+
+#define bli_determine_blocksize_dim_b( i, dim, b_alg ) \
+\
+	( i == 0 && dim % b_alg != 0 ? dim % b_alg \
+	                             : b_alg )
+
+
 // stride-related
 
 #define bli_vector_inc( trans, m, n, rs, cs ) \
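To see how the two blocksize macros in this hunk differ, the sketch below records the block sizes they produce when a dimension is traversed forward and backward. It is an illustration under assumptions: `partition_forward` and `partition_backward` are hypothetical helpers, `bli_min` is redefined locally, the macro arguments are parenthesized for safety, and the start offset `dim - iter - b` is one plausible way a backward-moving variant might locate its block.

```c
#include <assert.h>
#include <stddef.h>

/* Local stand-in for the library's min macro. */
#define bli_min( a, b ) ( (a) < (b) ? (a) : (b) )

/* Forward: full blocks first, remainder (if any) last. */
#define bli_determine_blocksize_dim_f( i, dim, b_alg ) \
	( bli_min( (b_alg), (dim) - (i) ) )

/* Backward: remainder (if any) first, then only full blocks. */
#define bli_determine_blocksize_dim_b( i, dim, b_alg ) \
	( (i) == 0 && (dim) % (b_alg) != 0 ? (dim) % (b_alg) \
	                                   : (b_alg) )

/* Hypothetical helper: walk 0..dim forward and record each block size.
   Returns the number of blocks visited. */
size_t partition_forward( long dim, long b_alg, long *sizes, size_t cap )
{
	size_t n = 0;
	assert( b_alg > 0 );
	for ( long i = 0; i < dim; )
	{
		long b = bli_determine_blocksize_dim_f( i, dim, b_alg );
		if ( n < cap ) sizes[n] = b;
		++n;
		i += b;
	}
	return n;
}

/* Hypothetical helper: walk from the far end toward 0. `iter` counts the
   elements already consumed, so the current block spans
   [dim - iter - b, dim - iter). Returns the number of blocks visited. */
size_t partition_backward( long dim, long b_alg,
                           long *sizes, long *starts, size_t cap )
{
	size_t n = 0;
	assert( b_alg > 0 );
	for ( long iter = 0; iter < dim; )
	{
		long b = bli_determine_blocksize_dim_b( iter, dim, b_alg );
		if ( n < cap ) { sizes[n] = b; starts[n] = dim - iter - b; }
		++n;
		iter += b;
	}
	return n;
}
```

With `dim = 10` and `b_alg = 4`, the forward walk yields blocks of 4, 4, 2, while the backward walk yields 2, 4, 4 at offsets 8, 4, 0. Taking the edge case in the first backward iteration keeps every later block starting at a multiple of `b_alg`, which appears to be the alignment property the trmv_l/trsv_u fix relies on.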