- Added Smart Threading logic for AVX-512 based SGEMM SUP.
- ic and jc are now calculated to distribute work across the allocated
threads, using logic similar to that used for Zen3.
- A Zen4 architecture-specific Native-to-SUP check has been added to
redirect some Native inputs to the SUP path, because some Native cases
perform better as SUP in a multi-threaded environment.
- To support this, the SUP thresholds BLIS_MT and BLIS_NT have been
increased from 512 and 200 to 682 and 512, respectively.
- Further optimizations to the work distribution logic will be added
subsequently.
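
For illustration only, the idea of choosing ic and jc for a given thread
count can be sketched as below. This is not the code added by this
change: split_threads, thread_split_t and the cost heuristic (pick the
factor pair ic * jc == nt that minimises the per-thread tile's
half-perimeter m/ic + n/jc) are hypothetical and assumed purely for the
example.

```c
#include <assert.h>

/* Hypothetical result type: ic threads split the m dimension,
   jc threads split the n dimension, with ic * jc == nt. */
typedef struct { int ic; int jc; } thread_split_t;

/* Hypothetical heuristic, NOT the BLIS smart-threading logic:
   try every factor pair of nt and keep the one whose per-thread
   tile (m/ic by n/jc) has the smallest half-perimeter. */
static thread_split_t split_threads(long m, long n, int nt)
{
    assert(m > 0 && n > 0 && nt >= 1);

    thread_split_t best = { 1, nt };
    double best_cost = -1.0;

    for (int ic = 1; ic <= nt; ++ic) {
        if (nt % ic != 0)
            continue;               /* only exact factorisations of nt */

        int jc = nt / ic;
        double cost = (double)m / ic + (double)n / jc;

        /* Strict comparison: on a tie the smaller ic is kept. */
        if (best_cost < 0.0 || cost < best_cost) {
            best_cost = cost;
            best.ic = ic;
            best.jc = jc;
        }
    }
    return best;
}
```

Under this assumed heuristic, a square 1024 x 1024 problem on 16 threads
is split 4 x 4, while a tall 4096 x 256 problem on 8 threads puts all
8 threads on the m dimension.
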
AMD-Internal: [CPUPL-3248]
Change-Id: Ibccbbefef251010ec94bd37ffc86c35b7866a5ca