mirror of
https://github.com/amd/blis.git
synced 2026-05-13 18:52:14 +00:00
Details:
- Fixed a performance regression affecting nearly all level-3 operations
that use the 'haswell' sgemm and dgemm microkernels. This regression
was introduced in 54fa28b, caused by an ill-formed conditional
expression in the assembly code that controls whether cache lines of C
should be prefetched as rows or as columns. Essentially, the two
branches were reversed, causing incomplete prefetching to occur for
both row- and column-stored instances of matrix C. Thanks to Devin
Matthews for his help finding and fixing this bug.