Added new ZTRSM small code path for ZEN5

- Added new ZTRSM kernels for right and left variants.
- Kernel dimensions are 12x4.
- 12x4 ZGEMM SUP kernels are used internally
  to solve the GEMM subproblem.
- These kernels do not support conjugate transpose.
- Only column major inputs are supported.
- Tuned thresholds to pick the efficient code path for ZEN5.

AMD-Internal: [CPUPL-6356]
Change-Id: I33ba3d337b0fcd972ca9cfe4668cb23d2b279b6e
Shubham Sharma
2025-02-06 18:01:10 +05:30
parent 2e687d8847
commit f8c83fedb6
8 changed files with 3545 additions and 1648 deletions

File diff suppressed because it is too large

@@ -75,6 +75,11 @@ TRSMSMALL_KER_PROT( d, trsm_small_XAutB_XAlB_ZEN5 )
TRSMSMALL_KER_PROT( d, trsm_small_AltXB_AuXB_ZEN5 )
TRSMSMALL_KER_PROT( d, trsm_small_AutXB_AlXB_ZEN5 )
TRSMSMALL_KER_PROT( z, trsm_small_XAltB_XAuB_ZEN5 )
TRSMSMALL_KER_PROT( z, trsm_small_XAutB_XAlB_ZEN5 )
TRSMSMALL_KER_PROT( z, trsm_small_AltXB_AuXB_ZEN5 )
TRSMSMALL_KER_PROT( z, trsm_small_AutXB_AlXB_ZEN5 )
#ifdef BLIS_ENABLE_OPENMP
err_t bli_trsm_small_mt_ZEN5
(