mirror of
https://github.com/amd/blis.git
synced 2026-05-13 10:35:38 +00:00
- Computation was completing only partially because BLIS could not launch the required number of threads: rntm returned a thread count greater than the maximum number of threads that can be launched in the subsequent parallel region.
- Added 'omp_get_num_threads' calls inside the parallel regions to get the actual number of threads spawned. Work is now distributed based on the number of threads actually launched in each region.

AMD-Internal: [CPUPL-3268]
Change-Id: I086ad4b9b644f966b7bab439e43222396f0c2bf0
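
The pattern described in this commit can be illustrated with a minimal sketch. This is not the BLIS code touched by the change: `partition_range` and `scale_vector` are hypothetical names invented for illustration, and only `omp_get_num_threads`, `omp_get_thread_num`, and the `num_threads` clause are standard OpenMP. The idea is that the work split uses the team size reported inside the parallel region, not the requested count, so no range is left unprocessed if the runtime grants fewer threads.

```c
#include <assert.h>
#include <stddef.h>
#ifdef _OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: split n items into nt contiguous ranges and
 * return the half-open range [*start, *end) owned by thread tid.
 * The first (n % nt) threads each take one extra item. */
static void partition_range(size_t n, int nt, int tid,
                            size_t *start, size_t *end)
{
    assert(nt > 0 && tid >= 0 && tid < nt);
    size_t base = n / (size_t)nt;
    size_t rem  = n % (size_t)nt;
    size_t t    = (size_t)tid;
    *start = t * base + (t < rem ? t : rem);
    *end   = *start + base + (t < rem ? 1 : 0);
}

/* Hypothetical kernel: scale x[0..n) by alpha while requesting
 * `requested` threads. Returns the number of threads actually used. */
static int scale_vector(double *x, size_t n, double alpha, int requested)
{
    int actual = 1;
    if (requested < 1) requested = 1;
    (void)requested; /* unused when compiled without OpenMP */

#ifdef _OPENMP
    #pragma omp parallel num_threads(requested)
#endif
    {
#ifdef _OPENMP
        /* Ask the runtime for the real team size; it may be smaller
         * than `requested` (thread limits, nesting, dynamic adjustment). */
        int nt  = omp_get_num_threads();
        int tid = omp_get_thread_num();
#else
        int nt  = 1; /* serial fallback */
        int tid = 0;
#endif
        size_t start, end;
        partition_range(n, nt, tid, &start, &end);
        for (size_t i = start; i < end; ++i)
            x[i] *= alpha;

        if (tid == 0)
            actual = nt; /* written by a single thread only */
    }
    return actual;
}
```

Had the ranges been computed from `requested` instead of the value returned inside the region, any range assigned to a thread that was never spawned would be skipped, which matches the partial-completion symptom described above.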