From 76fbf1233d68d00a54ebf62691d6457cef623c8b Mon Sep 17 00:00:00 2001
From: Devin Matthews
Date: Fri, 9 Jul 2021 14:59:48 -0500
Subject: [PATCH] Add vzeroupper to Haswell microkernels. (#524)

Details:
- Added vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm'
  microkernels so as to avoid a performance penalty when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part
  of the haswell kernels, but were inadvertently removed during a source
  code shuffle some time ago when we were managing duplicate 'haswell'
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down
  and re-inserting the missing instructions.

Change-Id: I418fea9fed27ba3ad7d395cf96d1be507955d8e9
---
 kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c | 7 ++++---
 1 file changed, 4 insertions(+), 3 deletions(-)

diff --git a/kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c b/kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c
index 59e239fe1..e6d47268f 100644
--- a/kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c
+++ b/kernels/haswell/3/bli_gemm_haswell_asm_d6x8.c
@@ -870,7 +870,7 @@ void bli_sgemm_haswell_asm_6x16
 	label(.SDONE)
-
+	vzeroupper()
 	end_asm(
 	: // output operands (none)
@@ -1624,6 +1624,7 @@ void bli_dgemm_haswell_asm_6x8
 	label(.DDONE)
+	vzeroupper()
@@ -2158,7 +2159,7 @@ void bli_cgemm_haswell_asm_3x8
 	label(.CDONE)
-
+	vzeroupper()
 	end_asm(
 	: // output operands (none)
@@ -2758,7 +2759,7 @@ void bli_zgemm_haswell_asm_3x4
 	label(.ZDONE)
-
+	vzeroupper()
 	end_asm(
 	: // output operands (none)