mirror of
https://github.com/amd/blis.git
synced 2026-05-11 09:39:59 +00:00
Details:
- Previously, trsm was consolidating all ways of parallelism into the jr loop. This was unnecessary and, on some types of hardware, somewhat detrimental. Now, any parallelism bound for the jc loop is applied to the jc loop, while all other loops' parallelism is funneled to the jr loop. Thanks to Devangi Parikh for helping investigate this issue and suggesting the fix.
- NOTE: This change affects only left-side trsm. However, right-side trsm is currently implemented in terms of the left-side case, and thus the change effectively applies to both left and right cases.
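
The redistribution described above can be illustrated with a minimal C sketch. This is not the code from the commit: the type `loop_ways_t` and the functions `trsm_ways_old` and `trsm_ways_new` are hypothetical names introduced here for illustration, and the sketch assumes the usual five loops around the microkernel (jc, pc, ic, jr, ir), of which only jc and jr are named in the message above.

```c
#include <assert.h>

/* Hypothetical container (not a BLIS type): the number of ways of
   parallelism requested for each loop around the microkernel. */
typedef struct
{
    int jc, pc, ic, jr, ir;
} loop_ways_t;

/* Sanity check: every loop must have at least one way. */
void check_ways( loop_ways_t w )
{
    assert( w.jc >= 1 && w.pc >= 1 && w.ic >= 1 && w.jr >= 1 && w.ir >= 1 );
}

/* Previous behavior as described: all ways are consolidated into jr. */
loop_ways_t trsm_ways_old( loop_ways_t req )
{
    check_ways( req );
    loop_ways_t out = { 1, 1, 1, req.jc * req.pc * req.ic * req.jr * req.ir, 1 };
    return out;
}

/* New behavior as described: jc keeps its own ways, and the ways of
   every other loop are funneled into jr. */
loop_ways_t trsm_ways_new( loop_ways_t req )
{
    check_ways( req );
    loop_ways_t out = { req.jc, 1, 1, req.pc * req.ic * req.jr * req.ir, 1 };
    return out;
}
```

Under these assumptions, a request of 2 ways in jc, 4 in ic, and 3 in jr was previously mapped to 24 ways in jr alone; with the change it becomes 2 ways in jc and 12 in jr, so the total thread count is unchanged and only its placement differs.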