mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
Details:
- Fixed a family of bugs in the triangular level-3 operations for
  certain complex implementations (3m1 and 4m1a) that only manifest if
  one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
  - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
    for the triangular case.
  - Fixed the incorrect computation of imaginary stride, as stored in
    the auxinfo_t struct in trmm and trsm macro-kernels.
  - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
    cases where the register blocksize for the triangular matrix is
    odd. Introduced a new byte-granular pointer arithmetic macro,
    bli_ptr_add(), that computes the correct value.
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
  terms of __typeof__, which is used by the bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
  for singleton problems because the inherent ambiguity of whether a
  scalar is row-stored or column-stored causes the wrong parameter
  combination code to be executed (by dumb luck of our checking for
  row storage first).
- Added commented-out debugging lines to 3m1/4m1a and reference
  micro-kernels, and trsm_ll macro-kernel.
@@ -49,6 +49,17 @@
#endif


// -- Define typeof() operator if using non-GNU compiler --

#ifndef __GNUC__
  #define typeof __typeof__
#else
  #ifndef typeof
    #define typeof __typeof__
  #endif
#endif


// -- Boolean values --

#ifndef TRUE

@@ -653,6 +653,20 @@
bli_is_rpi_packed( schema ) )


// pointer-related

// p1 = p0 + (num/dem)
#define bli_ptr_add( p1, p0, num, dem ) \
{ \
    p1 = ( typeof( p1 ) ) \
         ( ( char* )(p0) + ( ( (num) * sizeof( *(p0) ) \
                             ) / (dem) \
                           ) \
         ); \
}





// return datatype for char
