mirror of
https://github.com/amd/blis.git
synced 2026-07-02 21:27:52 +00:00
Fixing a memory issue in the cgemm Zen4 packing kernel

In the loop section that handles the leftover m and k iterations, the load operations in the k-direction were missing mask instructions; these have now been added.

Resolved memory-access issues in the SGEMM SUP kernels on AVX2 and AVX-512 by correcting instructions that could read invalid addresses in the C matrix.

Standardized all sgemm instruction macros to lowercase in the Zen4 kernel to improve readability and code consistency.

AMD-Internal: CPUPL-8117
AMD-Internal: CPUPL-8189
Co-authored-by: Rohan Rayan <rohrayan@amd.com>
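
For readers unfamiliar with why an unmasked load in the leftover k iterations is a memory bug, the following is a minimal portable sketch of the idea, not the kernel code from this commit. It emulates a zero-masked vector load in scalar C so it runs without AVX-512 hardware. `LANES`, `tail_mask`, `masked_load`, and `sum_row` are hypothetical names chosen for illustration, and the 8-element width is an assumption.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* hypothetical vector width, in elements */

/* Build a mask with the low `remaining` bits set (remaining <= LANES). */
static uint8_t tail_mask(size_t remaining)
{
    assert(remaining <= LANES);
    return (uint8_t)((1u << remaining) - 1u);
}

/* Scalar emulation of a zero-masked vector load: lane i reads src[i]
 * only when bit i of `mask` is set. Masked-off lanes are set to zero
 * and their source addresses are never touched. */
static void masked_load(float dst[LANES], const float *src, uint8_t mask)
{
    for (size_t i = 0; i < LANES; ++i)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0.0f;
}

/* Sum k elements of a row. Full blocks use an all-ones mask; the
 * leftover block uses a tail mask so no element past row[k-1] is read. */
static float sum_row(const float *row, size_t k)
{
    float acc = 0.0f;
    float v[LANES];
    size_t p = 0;

    for (; p + LANES <= k; p += LANES) {
        masked_load(v, row + p, 0xFF);
        for (size_t i = 0; i < LANES; ++i)
            acc += v[i];
    }

    if (p < k) {
        /* Leftover k iterations: an unmasked load here would read
         * LANES elements and run past the end of the row. */
        masked_load(v, row + p, tail_mask(k - p));
        for (size_t i = 0; i < LANES; ++i)
            acc += v[i];
    }
    return acc;
}
```

With an 11-element row, `sum_row` handles one full block of 8 and then a masked block of 3. Replacing the tail mask with `0xFF` would make the second load read 5 elements beyond the buffer, which is the class of out-of-bounds read described above.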