mirror of
https://github.com/amd/blis.git
synced 2026-05-12 10:05:38 +00:00
- Performance of u8s8s16os16 came down by 40% after the
introduction of post-ops
- Analysis revealed that the target compiler assumed false
dependency and was generating sub-optimal code due to the
post-ops structure
- Inserted vzeroupper to hint the compiler that no ISA change
will occur
AMD-Internal: [CPUPL-2447]
Change-Id: I0b383b9742ad237d0e053394602428872691ef0c