Mirror of https://github.com/amd/blis.git (synced 2026-05-11 09:39:59 +00:00)
Fixed a performance bug in trsm.
Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the reference gemm microkernel was hard-coded, and thus always called, even when GEMM_UKERNEL was defined to point to an optimized microkernel. This manifested as artificially low trsm performance for all problem sizes, but especially for small problem sizes, as it only affected blocks of A that intersected the diagonal. Thanks to Mike Kistler of IBM for helping me find this bug.
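A rough way to see why small problems suffered most: if the triangular matrix A is viewed as an nb-by-nb grid of square blocks, only the nb diagonal blocks out of nb*(nb+1)/2 stored blocks touch the diagonal. The sketch below computes that share. The function name `diag_block_fraction` and the uniform-block model are illustrative assumptions, not part of BLIS, and the share of blocks is only a proxy for the share of run time.

```c
/* Illustrative model only, not BLIS code.
 * For a triangular matrix partitioned into an nb-by-nb grid of blocks,
 * return the fraction of stored blocks that intersect the diagonal. */
static double diag_block_fraction(int nb)
{
    int diag  = nb;                 /* one diagonal block per block row */
    int total = nb * (nb + 1) / 2;  /* blocks in the stored triangle    */
    return (double)diag / (double)total;
}
```

Under this model the fraction is 2/(nb+1): with a single block every block is on the diagonal, while with 99 block rows only 2% are, which is consistent with the slowdown being most visible for small problem sizes.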
@@ -69,5 +69,5 @@ void PASTEMAC(ch,varname)( \
 c, rs_c, cs_c ); \
 }

-INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ref_mxn, gemm_ref_mxn, trsm_l_ref_mxn )
+INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ref_mxn, GEMM_UKERNEL, TRSM_L_UKERNEL )

@@ -69,5 +69,5 @@ void PASTEMAC(ch,varname)( \
 c, rs_c, cs_c ); \
 }

-INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ref_mxn, gemm_ref_mxn, trsm_u_ref_mxn )
+INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ref_mxn, GEMM_UKERNEL, TRSM_U_UKERNEL )

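The pattern behind this one-line change in each file can be sketched in isolation. The example below is a minimal stand-in, not BLIS code: `ref_gemm_ukr`, `opt_gemm_ukr`, `MY_GEMM_UKERNEL`, and `GEN_WRAPPER` are hypothetical names chosen to mimic, respectively, a reference microkernel, an optimized one, a configuration macro in the spirit of GEMM_UKERNEL, and a function-generating macro in the spirit of INSERT_GENTFUNC_BASIC2.

```c
/* Hypothetical stand-ins for two gemm microkernels. Each returns a tag
 * so a caller can tell which one actually ran. */
static int ref_gemm_ukr(void) { return 0; }  /* reference (slow) kernel */
static int opt_gemm_ukr(void) { return 1; }  /* optimized kernel        */

/* Configuration macro that selects the microkernel for this build. */
#define MY_GEMM_UKERNEL opt_gemm_ukr

/* Generator macro: defines a wrapper `name` that calls `gemmukr`. */
#define GEN_WRAPPER(name, gemmukr) \
static int name(void) { return gemmukr(); }

/* Buggy form: the reference kernel is named directly, so the
 * configuration macro is never consulted. */
GEN_WRAPPER(wrapper_buggy, ref_gemm_ukr)

/* Fixed form: the kernel is named through the configuration macro,
 * which the preprocessor expands to opt_gemm_ukr. */
GEN_WRAPPER(wrapper_fixed, MY_GEMM_UKERNEL)
```

Calling `wrapper_buggy()` always reaches the reference kernel regardless of how `MY_GEMM_UKERNEL` is defined, whereas `wrapper_fixed()` follows the configured choice. Both forms compile and produce correct results, which is how a hard-coded kernel of this kind can go unnoticed until performance is measured.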