Change usage of global_rntm and tl_rntm to elimate need
for mutex operations when accessing global_rntm. Usage of
these data structures is now as follows:
* global_rntm is set once during bli_init_apis and includes
all getenv calls to check BLIS threading and error printing
environment variables. global_rntm is then read-only.
* tl_rntm is intialized once from global_rntm on each
application thread. Any calls to BLIS set threading/ways
APIs will update tl_rntm for that application thread only
(Previously they updated global_rntm for all application threads).
* Re-initialize info_value in tl_rntm in every call to bli_init APIs.
* In bli_rntm_init_from_global() we initialize the local (per API
call) rntm as a copy of tl_rntm and then update threading values
in bli_thread_update_rntm_from_env() to reflect the current status
of OpenMP runtime ICVs.
AMD-Internal: [CPUPL-6168][SWLCSG-3143]
Change-Id: Ib9387ee2b51f507ed08cc38267057109acea14a6