Sharma, Shubham 4e84bbfb68 Ensure Accumulation Consistency Across DOTXF and DOTXV kernels (#325)
APIs like GEMV use DOTXF for the part of the problem that is a multiple of fuse_factor and DOTXV for the remainder that is not.

DOTXF and DOTXV use different numbers of temporary accumulation registers (rho).

This results in different round-off errors, which can be significant when sizes are small and the problem is split about equally between DOTXF and DOTXV.
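The effect can be reproduced with a scalar sketch. The helper `dot_with_accumulators` below is hypothetical and is not one of the BLIS kernels; it only assumes that partial sums are distributed round-robin over `num_acc` accumulators and then reduced in index order, which is roughly what a vectorized kernel with several rho registers does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: dot product whose partial sums are spread
   round-robin over num_acc accumulators (1 <= num_acc <= 8) and
   then reduced in index order. */
double dot_with_accumulators(const double *x, const double *y,
                             size_t n, int num_acc)
{
    assert(num_acc >= 1 && num_acc <= 8);

    double rho[8] = {0.0};

    /* Element i goes to accumulator i mod num_acc. */
    for (size_t i = 0; i < n; ++i)
        rho[i % (size_t)num_acc] += x[i] * y[i];

    /* Final reduction of the accumulators. */
    double sum = 0.0;
    for (int k = 0; k < num_acc; ++k)
        sum += rho[k];
    return sum;
}
```

For x = {1e16, 1, -1e16, 1} and y = {1, 1, 1, 1}, one accumulator gives 1.0 (the first 1 is absorbed by 1e16 before the cancellation), while two accumulators give 2.0 (the large terms cancel in one accumulator and the small terms add up in the other). The inputs are identical; only the number of accumulators changes.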

To fix this, the number of temporary accumulation registers (and therefore the round-off behavior) has been made identical across both kernels.
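A minimal scalar sketch of the idea follows. It is not the actual kernel code: `NUM_RHO`, `FUSE_FACTOR`, `dotxf_sketch`, `dotxv_sketch` and `gemv_sketch` are hypothetical names, and the real kernels use SIMD registers instead of arrays. The point is only that both paths assign element j to accumulator j mod `NUM_RHO` and reduce in the same order, so a row produces the same result regardless of which path handles it.

```c
#include <assert.h>
#include <stddef.h>

#define NUM_RHO     4  /* hypothetical accumulator count shared by both paths */
#define FUSE_FACTOR 4  /* hypothetical number of rows handled per fused call */

/* Fused path: FUSE_FACTOR dot products in one pass over x. */
static void dotxf_sketch(const double *a, size_t lda, const double *x,
                         double *y, size_t n)
{
    double rho[FUSE_FACTOR][NUM_RHO] = {{0.0}};
    for (size_t j = 0; j < n; ++j)
        for (int r = 0; r < FUSE_FACTOR; ++r)
            rho[r][j % NUM_RHO] += a[(size_t)r * lda + j] * x[j];
    for (int r = 0; r < FUSE_FACTOR; ++r) {
        double sum = 0.0;
        for (int k = 0; k < NUM_RHO; ++k)
            sum += rho[r][k];
        y[r] = sum;
    }
}

/* Remainder path: one dot product, same accumulator layout and order. */
static double dotxv_sketch(const double *a, const double *x, size_t n)
{
    double rho[NUM_RHO] = {0.0};
    for (size_t j = 0; j < n; ++j)
        rho[j % NUM_RHO] += a[j] * x[j];
    double sum = 0.0;
    for (int k = 0; k < NUM_RHO; ++k)
        sum += rho[k];
    return sum;
}

/* y = A * x for a row-major m x n matrix A. */
void gemv_sketch(const double *a, const double *x, double *y,
                 size_t m, size_t n)
{
    assert(a && x && y);
    size_t i = 0;
    for (; i + FUSE_FACTOR <= m; i += FUSE_FACTOR)  /* multiple of fuse factor */
        dotxf_sketch(a + i * n, n, x, y + i, n);
    for (; i < m; ++i)                              /* leftover rows */
        y[i] = dotxv_sketch(a + i * n, x, n);
}
```

With m = 5 and five identical rows, the first four outputs come from the fused path and the fifth from the remainder path. Because the accumulation pattern is shared, all five compare equal bit for bit.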

Known related GCC bugs to reference

GCC Bug #56812 — incorrect code with vzeroupper and register allocation
GCC Bug #95483 — vzeroupper clobbers live values
GCC Bug #101617 — wrong code generation with AVX intrinsics and transitions

AMD-Internal:CPUPL-8015
2026-03-03 16:38:55 +05:30