Files
Rayan, Rohan d512e3a736 Converting mul+add to FMA for ddot, daxpy and daxpyf zen kernels
- Replace mul+add with FMA for ddot, daxpy and daxpyf
- Use masked operations where possible
- Non-unit-stride code paths still use scalar loops, but now use FMAs for accuracy

AMD-Internal: CPUPL-8055
Co-authored-by: Rohan Rayan <rohrayan@amd.com>
2026-03-20 11:07:12 +05:30