Add vzeroupper to Haswell microkernels. (#524)

Details:
- Added a vzeroupper instruction to the end of all 'gemm' and 'gemmtrsm'
  microkernels to avoid the performance penalty incurred when mixing AVX
  and SSE instructions. These vzeroupper instructions were once part
  of the haswell kernels, but were inadvertently removed during a source
  code shuffle some time ago when we were managing duplicate 'haswell'
  and 'zen' kernel sets. Thanks to Devin Matthews for tracking this down
  and re-inserting the missing instructions.
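
  The pattern described above can be sketched in plain C with intrinsics.
  This is an illustration only, not the BLIS code: the real microkernels
  are hand-written inline assembly that calls the vzeroupper() macro before
  end_asm, and the function names below (saxpy_avx_sketch, saxpy_sketch)
  are hypothetical. It assumes an x86 target built with GCC or Clang. The
  point is the placement: the upper halves of the YMM registers are cleared
  once, after the last 256-bit instruction and before control returns to
  code that may use legacy SSE encodings.

```c
#include <immintrin.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): y[i] += alpha * x[i] with 256-bit AVX.
   The target attribute lets this one function use AVX without -mavx globally. */
__attribute__((target("avx")))
static void saxpy_avx_sketch(size_t n, float alpha, const float *x, float *y)
{
    __m256 va = _mm256_set1_ps(alpha);
    size_t i = 0;

    /* Main loop: eight floats per iteration in YMM registers. */
    for (; i + 8 <= n; i += 8) {
        __m256 vx = _mm256_loadu_ps(x + i);
        __m256 vy = _mm256_loadu_ps(y + i);
        vy = _mm256_add_ps(vy, _mm256_mul_ps(va, vx));
        _mm256_storeu_ps(y + i, vy);
    }

    /* Scalar tail for the remaining n % 8 elements. */
    for (; i < n; ++i)
        y[i] += alpha * x[i];

    /* Same role as vzeroupper() at the end of a microkernel: zero the upper
       128 bits of every YMM register so later SSE code pays no transition
       penalty. */
    _mm256_zeroupper();
}

/* Hypothetical dispatcher: use the AVX path only when the CPU supports it. */
void saxpy_sketch(size_t n, float alpha, const float *x, float *y)
{
    if (__builtin_cpu_supports("avx")) {
        saxpy_avx_sketch(n, alpha, x, y);
        return;
    }
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

  Compilers often insert this instruction on their own for intrinsics code
  built with AVX enabled. Hand-written assembly gets no such help, which is
  why the instruction has to appear explicitly in each microkernel.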

Change-Id: I418fea9fed27ba3ad7d395cf96d1be507955d8e9
Devin Matthews
2021-07-09 14:59:48 -05:00
committed by Dipal M Zambare
parent 2a81437bd8
commit 76fbf1233d

@@ -870,7 +870,7 @@ void bli_sgemm_haswell_asm_6x16
label(.SDONE)
vzeroupper()
end_asm(
: // output operands (none)
@@ -1624,6 +1624,7 @@ void bli_dgemm_haswell_asm_6x8
label(.DDONE)
vzeroupper()
@@ -2158,7 +2159,7 @@ void bli_cgemm_haswell_asm_3x8
label(.CDONE)
vzeroupper()
end_asm(
: // output operands (none)
@@ -2758,7 +2759,7 @@ void bli_zgemm_haswell_asm_3x4
label(.ZDONE)
vzeroupper()
end_asm(
: // output operands (none)