mirror of
https://github.com/amd/blis.git
synced 2026-04-19 15:18:52 +00:00
- Updated the thresholds for entering the AVX512 SUP code path in ZGEMM (on ZEN5). This caters to inputs that scale well with multithreaded execution in the SUP path.
- Also updated the thresholds that decide the ideal number of threads, based on the 'm', 'n' and 'k' values. The thread-setting logic determines the number of tiles to compute and uses that count to tune further for the optimal number of threads.
- This logic builds on the assumption that the current thread factorization logic is optimal. An additional data analysis was therefore performed (on the existing ZEN4 and the new ZEN5 thresholds) to cover the corner cases where this assumption does not hold.
- As future work, we could reimplement the thread factorization for GEMM, which would additionally require a new round of threshold tuning for every datatype.

AMD-Internal: [CPUPL-7028]
Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
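
For illustration only, the tile-based thread selection described above might look roughly like the sketch below. It is not the code from this commit: the function name `pick_threads`, the tile dimensions, and every threshold value are hypothetical placeholders chosen to show the idea of counting tiles from 'm' and 'n' and letting 'k' adjust how many tiles each thread should own.

```c
#include <stddef.h>

/* Hypothetical tile dimensions; illustrative values only,
 * not the actual ZGEMM kernel sizes used in BLIS. */
enum { TILE_M = 12, TILE_N = 4 };

/* Hypothetical thresholds: below K_SMALL each tile carries little work,
 * so a thread is asked to own more tiles before it is worth spawning. */
enum { K_SMALL = 64, TILES_PER_THREAD = 2, TILES_PER_THREAD_SMALL_K = 4 };

static size_t ceil_div(size_t a, size_t b) { return (a + b - 1) / b; }

/* Returns a thread count in [1, max_threads] for an m x n x k problem. */
size_t pick_threads(size_t m, size_t n, size_t k, size_t max_threads)
{
    if (m == 0 || n == 0 || k == 0 || max_threads == 0)
        return 1;

    /* Number of output tiles available to distribute across threads. */
    size_t tiles = ceil_div(m, TILE_M) * ceil_div(n, TILE_N);

    /* Require more tiles per thread when k is small. */
    size_t per_thread = (k < K_SMALL) ? TILES_PER_THREAD_SMALL_K
                                      : TILES_PER_THREAD;

    /* Threads justified by the tile count, at least one. */
    size_t by_tiles = tiles / per_thread;
    if (by_tiles < 1)
        by_tiles = 1;

    /* Never exceed the number of threads the caller made available. */
    return (by_tiles < max_threads) ? by_tiles : max_threads;
}
```

In this sketch a 48 x 8 problem yields 8 tiles, so with 8 threads available it would use 4 threads when k is large and 2 when k is below the assumed small-k cutoff. The real thresholds are tuned per architecture and datatype, as the message notes.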