Threshold tuning for code paths and optimal thread selection for ZGEMM (ZEN5)

- Updated the thresholds for entering the AVX512 SUP code path in
  ZGEMM (on ZEN5). This caters to inputs that scale well with
  multithreaded execution in the SUP path.

- Also updated the thresholds that decide the ideal number of
  threads, based on the 'm', 'n' and 'k' values. The thread-setting
  logic determines the number of tiles to compute and uses that
  count to further tune the number of threads.

- This logic builds on the assumption that the current thread
  factorization logic is optimal. An additional data analysis was
  therefore performed (on the existing ZEN4 and the new ZEN5
  thresholds) to also cover the corner cases where this assumption
  does not hold.

- As future work, we could reimplement the thread factorization
  for GEMM, which would additionally require a new round of
  threshold tuning for every datatype.
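
The tile-based thread selection described above can be sketched roughly as
follows. This is only an illustration, not the code in this commit: the tile
sizes SKETCH_MR and SKETCH_NR and the helper names are hypothetical, 'k' is
ignored for brevity, and the real logic applies tuned thresholds on 'm', 'n'
and 'k'.

```c
#include <assert.h>

/* Hypothetical tile dimensions, chosen only for illustration. The real
   values depend on the kernel and are not stated in the commit message. */
#define SKETCH_MR 12
#define SKETCH_NR 4

/* Number of tiles of size 'tile' needed to cover 'dim' elements
   (ceiling division). */
static long tiles_along(long dim, long tile)
{
    assert(dim >= 0 && tile > 0);
    return (dim + tile - 1) / tile;
}

/* Hypothetical helper: cap the thread count at the number of output
   tiles, so that no thread is left without a tile to compute. */
static long ideal_threads_sketch(long m, long n, long max_threads)
{
    long m_tiles = tiles_along(m, SKETCH_MR);
    long n_tiles = tiles_along(n, SKETCH_NR);
    long total_tiles = m_tiles * n_tiles;

    long nt = (total_tiles < max_threads) ? total_tiles : max_threads;
    return (nt < 1) ? 1 : nt; /* always use at least one thread */
}
```

Under these assumed tile sizes, a 24 x 8 output covers 2 x 2 = 4 tiles, so
ideal_threads_sketch(24, 8, 64) returns 4, while a 2000 x 2000 output has far
more tiles than threads and keeps all 64.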

AMD-Internal: [CPUPL-7028]

Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
Balasubramanian, Vignesh
2025-08-01 16:02:12 +05:30
committed by GitHub
parent 1bb1160061
commit c96e7eb197
4 changed files with 468 additions and 237 deletions

@@ -87,10 +87,8 @@ bool bli_cntx_gemmsup_thresh_is_met_zen4( obj_t* a, obj_t* b, obj_t* c, cntx_t*
// The threshold for m is a single value, but for n, it is
// also based on the packing size of A, since the kernels are
// column preferential
if( ( ( m <= 1380 ) || ( n <= 1520 ) || ( k <= 128 ) ) ) return TRUE;
if( ( m <= 1380 ) || ( n <= 1520 ) || ( k <= 128 ) ) return TRUE;
// For all combinations in small sizes
if( ( m <= 512 ) && ( n <= 512 ) && ( k <= 512 ) ) return TRUE;
return FALSE;
}
else if( dt == BLIS_SCOMPLEX )

@@ -87,10 +87,8 @@ bool bli_cntx_gemmsup_thresh_is_met_zen5( obj_t* a, obj_t* b, obj_t* c, cntx_t*
// The threshold for m is a single value, but for n, it is
// also based on the packing size of A, since the kernels are
// column preferential
if( ( m <= 60 ) || ( ( n <= 60 ) && ( m <= 960 ) && ( k <= 16384 ) ) || ( k <= 8 ) ) return TRUE;
if( ( m <= 1380 ) || ( n <= 1520 ) || ( k <= 128 ) ) return TRUE;
// For all combinations in small sizes
if( ( m <= 216 ) && ( n <= 216 ) && ( k <= 216 ) ) return TRUE;
return FALSE;
}
else if( dt == BLIS_SCOMPLEX )
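
To try out the shape of these checks in isolation, the conditions shown in the
zen4 hunk can be mirrored in a small standalone function. This is a sketch
only: the function name is hypothetical and not part of BLIS, and it assumes
the conditions belong to the double-complex branch, which the hunk itself does
not show.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical standalone mirror of the conditions in the zen4 hunk.
   Returns true when the input sizes would take the SUP code path. */
static bool zgemm_sup_thresh_zen4_sketch(long m, long n, long k)
{
    assert(m >= 0 && n >= 0 && k >= 0);

    /* Any single dimension at or below its threshold is sufficient. */
    if ((m <= 1380) || (n <= 1520) || (k <= 128)) return true;

    /* For all combinations in small sizes. */
    if ((m <= 512) && (n <= 512) && (k <= 512)) return true;

    return false;
}
```

For example, sizes (1380, 5000, 5000) pass because 'm' alone meets its
threshold, while (2000, 2000, 2000) fail every check. As written here, any
input that satisfies the second check already satisfies the first.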