Mirror of https://github.com/amd/blis.git, synced 2026-06-29 02:37:05 +00:00
APIs such as GEMV use DOTXF for the part of the problem that is a multiple of fuse_factor and DOTXV for the remainder. DOTXF and DOTXV used different numbers of temporary accumulation registers (rho), so the two kernels rounded differently. The discrepancy can be significant when sizes are small and the problem is split about equally between DOTXF and DOTXV.

To fix this, the number of temporary accumulation registers (and therefore the round-off behavior) has been made identical across both kernels.

Known related GCC bugs for reference:
- GCC Bug #56812: incorrect code with vzeroupper and register allocation
- GCC Bug #95483: vzeroupper clobbers live values
- GCC Bug #101617: wrong code generation with AVX intrinsics and transitions

AMD-Internal: CPUPL-8015
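
The effect of the accumulator count on round-off can be reproduced with a small scalar sketch. This is not the BLIS kernel code: `dot_with_accumulators`, `MAX_ACC`, and the round-robin assignment of elements to accumulators are hypothetical, chosen only to mimic how an unrolled kernel spreads partial sums over several rho registers. It assumes IEEE-754 single precision with no excess intermediate precision.

```c
#include <assert.h>
#include <stddef.h>

#define MAX_ACC 8 /* hypothetical upper bound on accumulator count */

/*
 * Hypothetical helper, not a BLIS kernel: computes dot(x, y) using
 * n_acc partial sums. Element i is added to accumulator i % n_acc,
 * and the partial sums are then reduced in index order.
 */
static float dot_with_accumulators(const float *x, const float *y,
                                   size_t n, size_t n_acc)
{
    float rho[MAX_ACC] = {0.0f};

    assert(n_acc >= 1 && n_acc <= MAX_ACC);

    /* Accumulate products round-robin across the partial sums. */
    for (size_t i = 0; i < n; ++i)
        rho[i % n_acc] += x[i] * y[i];

    /* Final reduction of the partial sums into one result. */
    float sum = 0.0f;
    for (size_t k = 0; k < n_acc; ++k)
        sum += rho[k];

    return sum;
}
```

For x = {1.0e8f, 1.0f, -1.0e8f, 1.0f} and y = {1.0f, 1.0f, 1.0f, 1.0f}, a single accumulator absorbs the first 1.0f into 1.0e8f and returns 1.0f, while two accumulators keep the small terms separate and return 2.0f. Two code paths that use the same accumulator count and the same assignment order agree bit for bit, which is the property this change relies on.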