blis/kernels at ebf8721a5ca47d01200a1ecabd8c3079a5bfddde - blis

amd/blis

mirror of https://github.com/amd/blis.git synced 2026-05-12 01:59:59 +00:00

Files

Rayan, Rohan ebf8721a5c Optimizing sgemm rd kernels on zen3 (#293)

Fixes some inefficiencies in the zen (AVX2) SUP RD kernel for SGEMM.
After the main loop that steps through k in blocks of 8, the remainder was processed by a loop that stepped through k one element at a time.
This caused many unnecessary iterations whenever k was not a multiple of 8 (up to 7 single-element iterations for the remainder).
This has been fixed by introducing masked operations for the k remainder (k < 8).
When the remainder of k is 1, it is handled with the original non-masked code (via a branch), because the masking operation would add more overhead than it saves.
Some unnecessary instructions in the zen4 kernels have also been removed.

AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7775
Co-authored-by: rohrayan@amd.com

2026-02-04 09:08:11 +05:30
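The remainder strategy described in this commit can be sketched in portable C. This is a hypothetical illustration, not the BLIS kernel. The kernel itself targets AVX2, whereas here an 8-lane vector is emulated with a plain array so the control flow compiles and runs on any machine. `vec8`, `load_masked`, `fma8`, `hsum`, and `dot_rd` are invented names. `load_masked` mimics `_mm256_maskload_ps`, which zeroes the lanes whose mask bit is clear. The sketch also assumes that the rd kernel forms each element of C as a dot product along k.

```c
#include <assert.h>

/* Hypothetical sketch: an 8-lane float vector emulated in plain C so the
   remainder logic runs on any machine (the real kernel uses AVX2). */
#define LANES 8

typedef struct { float v[LANES]; } vec8;

/* Emulated masked load: lanes below `count` are read from memory, the rest
   are set to zero, like _mm256_maskload_ps with the trailing mask bits
   cleared. Memory past p[count - 1] is never touched. */
static vec8 load_masked(const float *p, int count) {
    vec8 r;
    for (int i = 0; i < LANES; ++i)
        r.v[i] = (i < count) ? p[i] : 0.0f;
    return r;
}

/* Lane-wise acc += a * b (a vector FMA does this in one instruction). */
static void fma8(vec8 *acc, vec8 a, vec8 b) {
    for (int i = 0; i < LANES; ++i)
        acc->v[i] += a.v[i] * b.v[i];
}

/* Horizontal sum of all lanes. */
static float hsum(vec8 x) {
    float s = 0.0f;
    for (int i = 0; i < LANES; ++i)
        s += x.v[i];
    return s;
}

/* Dot product of a[0..k) and b[0..k). *iters receives the number of
   k-loop steps taken, to make the cost of the remainder visible. */
static float dot_rd(const float *a, const float *b, int k, int *iters) {
    assert(k >= 0);
    vec8 acc = {{0.0f}};
    float tail = 0.0f;
    int i = 0;
    *iters = 0;

    /* Main loop: 8 elements of k per step. */
    for (; i + LANES <= k; i += LANES) {
        fma8(&acc, load_masked(a + i, LANES), load_masked(b + i, LANES));
        (*iters)++;
    }

    int rem = k - i; /* 0..7 elements left */
    if (rem == 1) {
        /* One leftover: plain unmasked multiply, no mask setup. */
        tail = a[i] * b[i];
        (*iters)++;
    } else if (rem > 1) {
        /* 2..7 leftovers: one masked step instead of rem single steps. */
        fma8(&acc, load_masked(a + i, rem), load_masked(b + i, rem));
        (*iters)++;
    }
    return hsum(acc) + tail;
}
```

For k = 11, the main loop runs once and the three leftover elements take a single masked step, for 2 iterations in total. A one-element remainder loop would need 1 + 3 = 4.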

armsve

Merge commit 'cfa3db3f' into amd-main

2024-07-08 06:09:11 -04:00

armv7a

Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs.

2021-09-29 16:43:38 -05:00
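The following is a minimal sketch of why an explicit beta == 0 branch matters. It assumes the usual BLAS convention that C need not be initialized on input when beta is zero. `update_c` and `update_c_naive` are hypothetical names that operate on a single element. They are not the armsve or armv7a microkernel code, which is not shown here.

```c
#include <math.h> /* NAN and isnan, for building inputs to the helpers */

/* Hypothetical helpers for one element of C := beta*C + alpha*(A*B),
   where `ab` is the already-computed dot product for that element. */

/* Unconditional form: always reads c_old. If C was never initialized and
   holds NaN, 0.0f * NaN is NaN, so the result is NaN even when beta == 0. */
static float update_c_naive(float c_old, float ab, float alpha, float beta) {
    return beta * c_old + alpha * ab;
}

/* Explicit beta == 0 branch: C is overwritten without being read. */
static float update_c(float c_old, float ab, float alpha, float beta) {
    if (beta == 0.0f)
        return alpha * ab;
    return beta * c_old + alpha * ab;
}
```

For example, `update_c(NAN, 2.0f, 1.0f, 0.0f)` returns 2.0f. `update_c_naive` returns NaN for the same arguments because IEEE 754 defines 0 * NaN as NaN.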

armv8a

Armv8 Trash New Bulk Kernels

2021-10-08 02:35:58 +09:00

bgq

Replaced use of bool_t type with C99 bool.