Files
blis/kernels
Sharma, Shubham 1dca574a9d Improved fringe case handling for AVXPV kernel (#228)
- Current kernel uses masked AVX512 instructions to handle fringe cases.
- These instructions are slow on genoa.
- To handle sizes less than 8, AVX2 and SSE code has been added.
- Existing masked AVX512 code is performing better when n > 8 therefore it is still kept for handling larger sizes where n % 8 != 0.

AMD-Internal: [CPUPL-7467]
2025-10-10 14:25:35 +05:30
..
2021-10-08 02:35:58 +09:00
2024-08-05 15:35:08 -04:00
2025-09-04 17:14:06 +01:00
2024-08-05 15:35:08 -04:00
2024-08-05 15:35:08 -04:00
2023-11-23 08:54:31 -05:00
2020-07-22 18:24:26 +05:30