mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
Details:
1. Runtime Thread Control Feature is enhanced to create a provision
for the application to allocate a different number of threads to
BLIS from the number of threads application is using for itself.
2. In the previous implementation, if application sets BLIS_NUM_THREADS
with a valid value, BLIS internally calls omp_set_num_threads() API with
same value. Due to this, application could not differentiate between
the number of threads used in BLIS library and the application.
3. With the current solution, if Application wants to allocate
different number of threads for BLIS API and application, Application
can choose either BLIS_NUM_THREADS environment variable or
bli_thread_set_num_threads(nt) API for BLIS,
and OpenMP APIs or environment variables for itself, respectively.
4. If BLIS_NUM_THREADS is set with a valid value, same value
will be used in the subsequent parallel regions unless
bli_thread_set_num_threads() API is used by the Application
to modify the desired number of threads during BLIS API execution.
5. Once BLIS_NUM_THREADS environment variable or
bli_thread_set_num_threads(nt) API is used by the application,
BLIS module would always give precedence to these values. BLIS API would
not consider the values set using OpenMP API omp_set_num_threads(nt) API
or OMP_NUM_THREADS environment variable.
6. If BLIS_NUM_THREADS is not set, then if Application is multithreaded and
issued omp_set_num_threads(nt) with desired number of threads,
omp_get_max_threads() API will fetch the number of threads set earlier.
7. If BLIS_NUM_THREADS is not set, omp_set_num_threads(nt) is not called
by the application, but only OMP_NUM_THREADS is set,
omp_get_max_threads() API will fetch the value of OMP_NUM_THREADS.
8. If both environment variables are not set, or if they are set with
invalid values, and omp_set_num_threads(nt) is not issued by
application, omp_get_max_threads() API will return the number of the
cores in the current context.
9. BLIS will initialize rntm->num_threads with the same value.
However if omp_set_nested is false - BLIS APIs called from parallel
threads will run in sequential. But if nested parallelism is enabled
Then each application will launch MT BLIS.
10. Order of precedence used for number of threads:
0. value set using bli_thread_set_num_threads(nt) by the application
1. valid value set for BLIS_NUM_THREADS environment variable
2. omp_set_num_threads(nt) issued by the application
3. valid value set for OMP_NUM_THREADS environment variable
4. Number of cores
11. If nt is not a valid value for omp_set_num_threads(nt) API,
number of threads would be set to 1.
omp_get_max_threads() API will return 1.
12. OMP_NUM_THREADS env. variable is applicable only when OpenMP is enabled.
AMD-Internal: [CPUPL-2342]
Change-Id: I2041ac1d824f0b57a23a2a69abd6017c800f21b6