mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
Optimisation of DTRSM and ZTRSM
1. Replaced the extract instruction with a cast when accessing the first 128 bits, since the cast instruction needs no cycles while the extract takes a few cycles.
2. Added prefetch of the A buffer when computing the GEMM operation.
3. Added prefetch of the C11 buffer before the TRSM operation, with an offset of 7 to cs_c.

With the above changes, performance improvements were observed in the single-threaded case.

Change-Id: Id377c490ddac8b06384acfa9a6d89dbe11bbc7be
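Items 1 and 3 can be illustrated with a minimal sketch. This is not the code from the commit. It assumes x86-64 with AVX and a GCC- or Clang-style compiler (for the target attribute). The function names low_half_cast, low_half_extract, prefetch_c11 and self_check are hypothetical, and the address c11 + j*cs_c + 7 is only one possible reading of "offset of 7 to cs_c".

```c
#include <assert.h>
#include <immintrin.h>
#include <stddef.h>

/* Low 128 bits via a cast: a pure reinterpretation of the register,
 * for which compilers normally emit no instruction. */
__attribute__((target("avx")))
void low_half_cast(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm_storeu_pd(dst, _mm256_castpd256_pd128(v));
}

/* Same result via an explicit extract of lane 0, which may be emitted
 * as a real instruction. */
__attribute__((target("avx")))
void low_half_extract(const double *src, double *dst)
{
    __m256d v = _mm256_loadu_pd(src);
    _mm_storeu_pd(dst, _mm256_extractf128_pd(v, 0));
}

/* Issue a prefetch hint for each of n_cols columns of a column-major
 * block, 7 elements into the column (assumed interpretation; cs_c is
 * the column stride in elements). A prefetch is only a hint and does
 * not change program results. */
void prefetch_c11(const double *c11, size_t cs_c, size_t n_cols)
{
    for (size_t j = 0; j < n_cols; ++j)
        _mm_prefetch((const char *)(c11 + j * cs_c + 7), _MM_HINT_T0);
}

/* Small self-check: both variants must return the first two doubles. */
void self_check(void)
{
    const double src[4] = {1.0, 2.0, 3.0, 4.0};
    double a[2], b[2];
    low_half_cast(src, a);
    low_half_extract(src, b);
    assert(a[0] == b[0] && a[1] == b[1]);
}
```

The cast and the extract of lane 0 return the same values, so swapping one for the other changes only the generated instructions. Whether that is measurably faster depends on the compiler and the CPU.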