Fixed a performance bug in trsm.

Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
  (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
  reference gemm microkernel was hard-coded, and thus always called, even
  when GEMM_UKERNEL was defined to point to an optimized microkernel. This
  manifested as artificially low trsm performance for all problem sizes, but
  especially for small ones: the bug only affected blocks of A that
  intersect the diagonal, and those blocks make up a larger share of a
  small problem. Thanks to Mike Kistler of IBM for helping me find this bug.
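
To illustrate why small problems were hit hardest, here is a rough block-counting sketch. It is a simplified model, not a measurement and not BLIS code: it assumes a lower-triangular n x n matrix A partitioned into square b x b blocks with n a multiple of b, and the function name `diag_block_fraction` and the block size used below are hypothetical.

```c
#include <assert.h>

/* Hypothetical helper (not part of BLIS): fraction of the b x b blocks in
   the lower triangle of an n x n matrix that intersect the diagonal.
   Assumes n is a positive multiple of b. */
static double diag_block_fraction( int n, int b )
{
    assert( n > 0 && b > 0 && n % b == 0 );

    int nb    = n / b;                 /* blocks per dimension; also the
                                          number of diagonal blocks */
    int total = nb * ( nb + 1 ) / 2;   /* blocks on or below the diagonal */

    /* Algebraically equal to 2 / (nb + 1). */
    return ( double )nb / ( double )total;
}
```

With an assumed block size of 256, every block of a 256 x 256 problem touches the diagonal (fraction 1), while for a 4096 x 4096 problem the fraction is 16/136, roughly 12%. Under this model the slow path dominates small problems and fades, without disappearing, as the problem grows.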
Field G. Van Zee
2013-04-08 19:08:43 -05:00
parent a7252e40b5
commit 54988e8dca
2 changed files with 2 additions and 2 deletions

bli_gemmtrsm_l_ref_mxn.c

@@ -69,5 +69,5 @@ void PASTEMAC(ch,varname)( \
c, rs_c, cs_c ); \
}
-INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ref_mxn, gemm_ref_mxn, trsm_l_ref_mxn )
+INSERT_GENTFUNC_BASIC2( gemmtrsm_l_ref_mxn, GEMM_UKERNEL, TRSM_L_UKERNEL )

bli_gemmtrsm_u_ref_mxn.c

@@ -69,5 +69,5 @@ void PASTEMAC(ch,varname)( \
c, rs_c, cs_c ); \
}
-INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ref_mxn, gemm_ref_mxn, trsm_u_ref_mxn )
+INSERT_GENTFUNC_BASIC2( gemmtrsm_u_ref_mxn, GEMM_UKERNEL, TRSM_U_UKERNEL )
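
The pattern behind this change can be sketched in a few lines of standalone C. This is a minimal illustration, not the BLIS source: `GEN_WRAPPER`, `gemm_ref`, `gemm_opt`, the two wrappers, and the call counters are hypothetical stand-ins, and only the name `GEMM_UKERNEL` is taken from the commit. A function-generating macro pastes whatever kernel name it is given into the wrapper body, so passing a concrete reference kernel bypasses the configuration macro, while passing the configuration macro picks up whichever kernel it is defined to.

```c
#include <assert.h>

/* Hypothetical stand-ins for a reference and an optimized microkernel.
   Each one only counts its calls so the dispatch can be observed. */
static int calls_ref = 0;
static int calls_opt = 0;

static void gemm_ref( void ) { calls_ref++; }
static void gemm_opt( void ) { calls_opt++; }

/* Configuration macro: selects the microkernel that should be used. */
#define GEMM_UKERNEL gemm_opt

/* Hypothetical function-generating macro: defines a wrapper named fname
   whose body calls the kernel given as gemmukr. */
#define GEN_WRAPPER( fname, gemmukr ) \
static void fname( void ) \
{ \
    gemmukr(); \
}

/* Before the fix: the reference kernel is named directly, so the setting
   of GEMM_UKERNEL has no effect on this wrapper. */
GEN_WRAPPER( wrapper_hardcoded, gemm_ref )

/* After the fix: the macro argument is expanded before substitution, so
   this wrapper calls whatever GEMM_UKERNEL is defined to (gemm_opt here). */
GEN_WRAPPER( wrapper_configured, GEMM_UKERNEL )
```

Calling `wrapper_hardcoded()` increments only `calls_ref`, and calling `wrapper_configured()` increments only `calls_opt`, even though both wrappers come from the same generator macro.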