Files
blis/kernels
Rayan, Rohan 9cbb1c45d8 Improving sgemm rd kernel on zen4/zen5 (#292)
Fixing some inefficiencies on the zen4 SUP RD kernel for SGEMM
The loops for the 8 and 1 iteration of the K-loop were performing loads on ymm/xmm registers and computation on zmm registers
This caused multiple unnecessary iterations in the kernel for matrices with certain k-values.
Fixed by introducing masked loads and computations for these cases

AMD-Internal: https://amd.atlassian.net/browse/CPUPL-7762
Co-authored-by: Rohan Rayan <rohrayan@amd.com>
2025-12-17 18:48:50 +05:30
..
2021-10-08 02:35:58 +09:00
2024-08-05 15:35:08 -04:00
2025-09-04 17:14:06 +01:00
2024-08-05 15:35:08 -04:00
2024-08-05 15:35:08 -04:00
2023-11-23 08:54:31 -05:00
2020-07-22 18:24:26 +05:30