- Updated the thresholds for entering the AVX512 SUP codepath in
ZGEMM (on ZEN5). This caters to inputs that scale well with
multithreaded execution (in the SUP path).
- Also updated the thresholds used to decide the ideal number of
threads, based on the 'm', 'n' and 'k' values. The thread-setting
logic determines the number of tiles to compute, and uses that
count to further tune the number of threads.
- This logic builds on the assumption that the current thread
factorization logic is optimal. Additional data analysis was
therefore performed (on the existing ZEN4 and the new ZEN5
thresholds) to also cover the corner cases where this assumption
does not hold.
- As future work, we could reimplement the thread factorization
for GEMM, which would additionally require a new round of
threshold tuning for every datatype.
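
For illustration only, the tile-based thread selection described above
can be sketched as follows. This is a minimal sketch, not the code in
this change: the tile sizes SKETCH_MR and SKETCH_NR and the function
sketch_ideal_threads() are hypothetical names and values, and the
sketch ignores 'k' and the tuned thresholds entirely.

```c
#include <stddef.h>

/* Hypothetical tile sizes, chosen only for this sketch. The actual
   ZGEMM SUP kernel dimensions are not stated in this message. */
enum { SKETCH_MR = 12, SKETCH_NR = 4 };

/* Ceiling division; b must be nonzero. */
static size_t sketch_ceil_div(size_t a, size_t b)
{
    return (a + b - 1) / b;
}

/* Hypothetical helper: count the output tiles for an m x n result and
   cap the thread count so that no thread is left without a tile. */
size_t sketch_ideal_threads(size_t m, size_t n, size_t max_threads)
{
    /* Degenerate inputs fall back to a single thread. */
    if (m == 0 || n == 0 || max_threads == 0)
        return 1;

    size_t tiles = sketch_ceil_div(m, SKETCH_MR) * sketch_ceil_div(n, SKETCH_NR);

    /* Never use more threads than tiles, nor more than are available. */
    return tiles < max_threads ? tiles : max_threads;
}
```

With these assumed tile sizes, a 24 x 8 output has only 4 tiles, so
requesting 64 threads would be reduced to 4; a real implementation
would also apply the tuned 'm', 'n' and 'k' thresholds.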

AMD-Internal: [CPUPL-7028]
Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>