Add AVX2 path for TRSM+GEMM combination.

- Enabled AVX2 TRSM + GEMM kernel path. When GEMM is called
  from a TRSM context, it invokes AVX2 GEMM kernels instead
  of the default AVX512 GEMM kernels.

- The default context has the block sizes for AVX512 GEMM
  kernels. However, TRSM uses AVX2 GEMM kernels, which
  need different block sizes.

- Added new API bli_zen4_override_trsm_blkszs(). It overrides
  the default block sizes in the context with the block sizes
  needed for AVX2 GEMM kernels.

- Added new API bli_zen4_restore_default_blkszs(). It restores
  the block sizes to their default values (as needed by default
  AVX512 GEMM kernels).

- Updated bli_trsm_front() to override the block sizes in the
  context with those needed by TRSM + AVX2 GEMM kernels and to
  restore the default values at the end of the function. This
  is done in bli_trsm_front() so that the context is overridden
  before any threads are created.
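
The override/restore ordering described above can be sketched as
follows. This is a minimal illustration, not the code from this
commit: demo_cntx_t, the demo_* functions, and the block size values
are all hypothetical stand-ins, and the real BLIS context type and
the signatures of the two new APIs are not reproduced here. The point
it shows is that the shared context is changed once before the
threaded work starts and put back once after it finishes.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a context; NOT the real BLIS cntx_t. */
typedef struct {
    size_t mc, kc, nc; /* cache block sizes */
} demo_cntx_t;

/* Placeholder values chosen only for illustration. */
static const demo_cntx_t DEMO_AVX512_BLKSZS = { 144, 480, 4080 };
static const demo_cntx_t DEMO_AVX2_BLKSZS   = {  72, 256, 4080 };

/* Hypothetical analogue of the override API: switch to AVX2 sizes. */
static void demo_override_trsm_blkszs(demo_cntx_t *cntx)
{
    *cntx = DEMO_AVX2_BLKSZS;
}

/* Hypothetical analogue of the restore API: back to the defaults. */
static void demo_restore_default_blkszs(demo_cntx_t *cntx)
{
    *cntx = DEMO_AVX512_BLKSZS;
}

/* Stand-in for the threaded work; reports the mc value it observed. */
static size_t demo_run_threads(const demo_cntx_t *cntx)
{
    return cntx->mc;
}

/* Hypothetical analogue of the front-end function. */
static size_t demo_trsm_front(demo_cntx_t *cntx)
{
    assert(cntx != NULL);

    /* Override once, before any thread is created, so that every
       thread reads the same AVX2 block sizes. */
    demo_override_trsm_blkszs(cntx);

    size_t mc_seen = demo_run_threads(cntx);

    /* Restore once, after all work is done, so that later GEMM calls
       see the AVX512 defaults again. */
    demo_restore_default_blkszs(cntx);

    return mc_seen;
}
```

Doing the override inside the threaded region instead would let one
thread change block sizes while another is already reading them,
which is the situation the placement in the front-end avoids.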

AMD-Internal: [CPUPL-2225]
Change-Id: Ie92d0fc40f94a32dfb865fe3771dc14ed7884c55
This commit is contained in:
Dipal M Zambare
2022-05-18 11:01:41 +05:30
committed by Dipal M. Zambare
parent d4bb906094
commit 2ba2fb2b63
12 changed files with 311 additions and 34 deletions

@@ -802,10 +802,11 @@ typedef enum
 BLIS_GEMMTRSM_L_UKR,
 BLIS_GEMMTRSM_U_UKR,
 BLIS_TRSM_L_UKR,
-BLIS_TRSM_U_UKR
+BLIS_TRSM_U_UKR,
+BLIS_GEMM_AVX2_UKR
 } l3ukr_t;
-#define BLIS_NUM_LEVEL3_UKRS 5
+#define BLIS_NUM_LEVEL3_UKRS 6
 typedef enum