mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.