mirror of
https://github.com/amd/blis.git
synced 2026-04-20 07:38:53 +00:00
Fix inefficiencies in the zen4 SUP RD kernel for SGEMM

The loops that handle the 8-iteration and 1-iteration cases of the K-loop were loading data into ymm/xmm registers while performing the computation on zmm registers. For matrices with certain values of k, this caused the kernel to run multiple unnecessary iterations.

Fixed by introducing masked loads and masked computations for these cases.

AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7762
Co-authored-by: Rohan Rayan <rohrayan@amd.com>
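The commit does not show the kernel code, so what follows is only a minimal, portable C sketch of the general idea and not the BLIS kernel. It emulates a 16-lane zmm register with a plain float array and computes a single k-dimension dot product, as an RD (dot-product) style kernel does. Full 16-wide blocks use an all-ones mask. The k % 16 leftover elements are handled in one masked load-and-FMA pass instead of separate narrower passes. The helper names `maskz_load16`, `fmadd16`, and `sdot_masked` are hypothetical. On real AVX-512 hardware, the corresponding intrinsics would be `_mm512_maskz_loadu_ps`, `_mm512_fmadd_ps`, and `_mm512_reduce_add_ps`.

```c
#include <stdint.h>

#define LANES 16 /* a zmm register holds 16 single-precision floats */

/* Hypothetical emulation of _mm512_maskz_loadu_ps: lane i is read from src[i]
 * only if bit i of mask is set, otherwise it is zeroed. Masked-off lanes never
 * touch memory, so reading past the end of a short k tail is avoided. */
static void maskz_load16(uint16_t mask, const float *src, float dst[LANES])
{
    for (int i = 0; i < LANES; ++i)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0.0f;
}

/* Hypothetical emulation of a lane-wise fused multiply-add: acc += a * b. */
static void fmadd16(const float a[LANES], const float b[LANES], float acc[LANES])
{
    for (int i = 0; i < LANES; ++i)
        acc[i] += a[i] * b[i];
}

/* Dot product of a row of A and a column of B along k. Full blocks use an
 * all-ones mask; the remaining k % 16 elements take ONE masked iteration.
 * The number of vector iterations performed is written to *iters. */
static float sdot_masked(int k, const float *a, const float *b, int *iters)
{
    float acc[LANES] = {0};
    float va[LANES], vb[LANES];
    int p = 0, n = 0;

    for (; p + LANES <= k; p += LANES, ++n) {
        maskz_load16(0xFFFFu, a + p, va);
        maskz_load16(0xFFFFu, b + p, vb);
        fmadd16(va, vb, acc);
    }

    int rem = k - p;
    if (rem > 0) {
        /* enable only the low `rem` lanes for the tail of the k-loop */
        uint16_t mask = (uint16_t)((1u << rem) - 1u);
        maskz_load16(mask, a + p, va);
        maskz_load16(mask, b + p, vb);
        fmadd16(va, vb, acc); /* zeroed lanes contribute nothing */
        ++n;
    }

    /* horizontal reduction of the 16 partial sums */
    float sum = 0.0f;
    for (int i = 0; i < LANES; ++i)
        sum += acc[i];
    if (iters)
        *iters = n;
    return sum;
}
```

For example, with k = 25 this sketch performs two vector iterations: one full 16-wide block and one masked 9-lane tail. The design choice it illustrates is that the tail stays on the same 16-wide registers as the main loop, so no width change is needed between the load and the computation.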