Mirror of https://github.com/amd/blis.git (synced 2026-06-29 02:37:05 +00:00)
- Replace mul+add with FMA for ddot, daxpy and daxpyf
- Use masked operations where possible
- Non-unit stride code paths still use scalar loops, but use FMAs for accuracy

AMD-Internal: CPUPL-8055
Co-authored-by: Rohan Rayan <rohrayan@amd.com>