mirror of
https://github.com/amd/blis.git
synced 2026-05-12 18:15:37 +00:00
Added all fringe kernels with mask load store support Fringe kernels cover m direction from 5 to 1 and n direction from 15 to 1 for row storage format - New edge kernels that uses masked load-store instructions for handling corner cases. - Mask load-store instruction macros are added. vmaskmovps, VMASKMOVPS for masked load-store. - It improves performance by reducing branching overhead and by being more cache friendly. - Mask load-store is added only for row storage format AMD-Internal: [CPUPL-4041] Change-Id: I563c036c79bf8e476a8ebde37f8f6db751fb3456