blis/kernels
S, Hari Govind d29f3f0b5e Fix GCC 12+ instruction scheduling issue in complex scalv kernel (#149)
Replace fused multiply-add (FMA) intrinsics with explicit multiply and add/subtract operations in bli_cscalv_zen_int to fix incorrect results when the kernel is built with GCC 12 or later.

The original code used a register-reuse pattern with _mm256_fmaddsub_ps() that caused the GCC 12+ instruction scheduler to generate assembly with corrupted intermediate values because of register-allocation conflicts. GCC 11 and earlier handled the same pattern correctly.

Changes:
- Replace _mm256_fmaddsub_ps() with _mm256_mul_ps() followed by _mm256_addsub_ps()
- Eliminate temporary-register reuse to avoid the instruction-scheduling conflict
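The replacement works because _mm256_addsub_ps(_mm256_mul_ps(a, b), c) computes a*b - c in even lanes and a*b + c in odd lanes, which is the same lane pattern as _mm256_fmaddsub_ps(a, b, c). The portable scalar sketch below models those intrinsics with plain loops to show how that pattern yields a complex scale of interleaved (re, im) data. It assumes the kernel follows the common broadcast / pair-swap / addsub scheme; the actual bli_cscalv_zen_int source may be organized differently. The names model_mul, model_addsub, model_swap_pairs and cscal_block are hypothetical and are not BLIS functions.

```c
#include <assert.h>

/* One __m256 register holds 8 floats = 4 interleaved complex values. */
#define LANES 8
static_assert(LANES % 2 == 0, "each register must hold whole (re, im) pairs");

/* Scalar stand-in for _mm256_mul_ps: lane-wise product. */
static void model_mul(const float *a, const float *b, float *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = a[i] * b[i];
}

/* Scalar stand-in for _mm256_addsub_ps: subtract in even lanes, add in odd lanes. */
static void model_addsub(const float *a, const float *b, float *out) {
    for (int i = 0; i < LANES; ++i)
        out[i] = (i % 2 == 0) ? a[i] - b[i] : a[i] + b[i];
}

/* Scalar stand-in for _mm256_permute_ps(x, 0xB1): swap each (re, im) pair. */
static void model_swap_pairs(const float *x, float *out) {
    for (int i = 0; i < LANES; i += 2) {
        out[i] = x[i + 1];
        out[i + 1] = x[i];
    }
}

/* Hypothetical helper: y = alpha * x for 4 interleaved complex floats,
 * using mul + addsub instead of fmaddsub, with no temporary reused. */
void cscal_block(float alpha_re, float alpha_im, const float *x, float *y) {
    float re_bcast[LANES], im_bcast[LANES];
    float swapped[LANES], im_prod[LANES], re_prod[LANES];

    for (int i = 0; i < LANES; ++i) {
        re_bcast[i] = alpha_re;   /* like _mm256_set1_ps(alpha_re) */
        im_bcast[i] = alpha_im;   /* like _mm256_set1_ps(alpha_im) */
    }
    model_swap_pairs(x, swapped);            /* [im0, re0, im1, re1, ...] */
    model_mul(im_bcast, swapped, im_prod);   /* [ai*im0, ai*re0, ...]     */
    model_mul(re_bcast, x, re_prod);         /* [ar*re0, ar*im0, ...]     */
    /* Even lanes: ar*re - ai*im (real part); odd lanes: ar*im + ai*re (imag part). */
    model_addsub(re_prod, im_prod, y);
}
```

For example, with alpha = 2 + 1i and x = {1 + 2i, 3 + 4i, -1 + 0.5i, 0 - 2i}, cscal_block fills y with {0 + 5i, 2 + 11i, -2.5 + 0i, 2 - 4i}. One consequence of the change is that the fused instruction rounds once, while a separate multiply followed by addsub rounds twice. Results can therefore differ from the FMA version in the last bit for some inputs.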

AMD-Internal: [CPUPL-6445]
2025-08-22 14:23:43 +05:30