blis/kernels
Balasubramanian, Vignesh c96e7eb197 Threshold tuning for code-paths and optimal thread selection for ZGEMM(ZEN5)
- Updated the thresholds for entering the AVX512 SUP code path in
  ZGEMM (on ZEN5). This caters to inputs that scale well with
  multithreaded execution in the SUP path.

- Also updated the thresholds that decide the ideal thread count,
  based on the 'm', 'n' and 'k' values. The thread-setting logic
  determines the number of tiles to compute and uses that count
  to further tune the number of threads.

- This logic builds on the assumption that the current thread
  factorization logic is optimal. An additional data analysis was
  therefore performed (on the existing ZEN4 and the new ZEN5
  thresholds) to also cover the corner cases where this assumption
  does not hold.

- As future work, we could reimplement the thread factorization
  for GEMM, which would additionally require a new round of
  threshold tuning for every datatype.
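
The tile-based thread selection described above can be pictured with a
rough sketch. This is not the code from this commit: the function name
suggest_threads, the tile sizes MR and NR, and the small-work cutoff are
all hypothetical values chosen for illustration, and the real ZEN5
thresholds are tuned empirically per code path.

```c
#include <assert.h>

/* Hypothetical micro-tile sizes; the actual ZEN5 ZGEMM SUP kernel
   dimensions may differ. */
enum { MR = 12, NR = 4 };

/* Ceiling division for positive integers. */
static long ceil_div(long a, long b) { return (a + b - 1) / b; }

/* Illustrative only: count the MR x NR tiles of the output matrix and
   never request more threads than there are tiles. Very small problems
   (by a made-up cutoff on m*n*k) stay single-threaded. */
static long suggest_threads(long m, long n, long k, long max_threads)
{
    const long small_work = 1000L; /* hypothetical cutoff, not a tuned value */
    long tiles;

    assert(m > 0 && n > 0 && k > 0 && max_threads > 0);

    if (m * n * k < small_work) return 1;

    tiles = ceil_div(m, MR) * ceil_div(n, NR);
    return tiles < max_threads ? tiles : max_threads;
}
```

Under these assumed values, a 24 x 8 output with k = 64 has only four
tiles, so at most four threads would be requested even if sixteen are
available, while a 480 x 400 output would use all sixteen.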

AMD-Internal: [CPUPL-7028]

Co-authored-by: Vignesh Balasubramanian <vignbala@amd.com>
2025-08-01 16:02:12 +05:30