mirror of
https://github.com/amd/blis.git
synced 2026-04-20 07:38:53 +00:00
Revamped bli_init() to use TLS where feasible. (#767)
Details:
- Revamped bli_init_apis() and bli_finalize_apis() to use separate
bli_pthread_switch_t objects for each of the five sub-API init
functions, with the objects for the 'ind' and 'rntm' sub-APIs being
declared with BLIS_THREAD_LOCAL. This allows some APIs to be treated
as thread-local and the rest as thread-shared. Thanks to Edward Smyth
for requesting application thread-specific rntm_t structs, which
inspired these changes.
- Combined bli_thread_init_rntm_from_env() and bli_pack_init_rntm_from_env()
into a new function, bli_rntm_init_from_env(), which is called from a new
bli_rntm_init() function in bli_rntm.c. Then removed the (now empty)
bli_pack_init() and _finalize() function defs.
- Deprecated bli_rntm_init() for the purposes of initializing a rntm_t
(temporarily preserving it as bli_rntm_clear() in a cpp-undefined code
block) so that the function name could be used for the aforementioned
bli_rntm_init() function.
- Updated libblis_test_pobj_create() in test_libblis.c to use a static
rntm_t initializer instead of the deprecated bli_rntm_init()
function-based option.
- Minor updates to docs/Multithreading.md, including removal of
bli_rntm_init() in the example of how to initialize rntm_t structs.
- Changed the return value of bli_gks_init(), bli_ind_init(),
bli_memsys_init(), bli_thread_init(), and bli_rntm_init() (and their
finalize() counterparts) from 'void' to 'int' so that those functions
match the function type expected by bli_pthread_switch_on()/_off().
Those init/finalize functions now return 0 to indicate success, which
is needed so that the switch actually changes state from off to on
and vice versa.
- Defined bli_thread_reset(), which copies the contents of the
global_rntm_at_init struct into the global_rntm struct (for the
current application thread).
- Guarded calls to bli_pthread_mutex_lock()/_unlock() in
- bli_pack_set_pack_a() and _pack_b()
- bli_rntm_init_from_global()
- bli_thread_set_ways()
- bli_thread_set_num_threads()
- bli_thread_set_thread_impl()
- bli_thread_reset()
- bli_l3_ind_oper_set_enable()
with #ifdef BLIS_DISABLE_TLS (since TLS precludes the possibility of
race conditions).
- In frame/base/bli_rntm.c, declared global_rntm, global_rntm_at_init,
and global_rntm_mutex as BLIS_THREAD_LOCAL so that separate
application threads can change the number of ways of BLIS parallelism
independently from one another.
- Restricted access to global_rntm to a new private (not exported) function,
bli_global_rntm(). Defined a similar function for a rntm_t new to
this commit, global_rntm_at_init, which preserves the state of the
global rntm at initialization-time.
- In frame/3/bli_l3_ind.c, added a guard to the declaration of the
static variable oper_st_mutex with #ifdef BLIS_DISABLE_TLS so that the
mutex is omitted altogether when TLS is enabled (which prevents the
compiler from warning about an unused variable).
- Removed redundant code from bli_thread.c:
#ifdef BLIS_ENABLE_HPX
#include "bli_thread_hpx.h"
#endif
since this code is already present in bli_thread.h.
- Thanks to Minh Quan Ho for his review of and feedback on this commit.
- Comment updates.
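
The snapshot-and-reset idea behind bli_thread_reset() can be sketched roughly as follows. This is only an illustration under simplifying assumptions: sketch_settings_t and every sketch_* function are hypothetical stand-ins, not BLIS types or APIs, and the real rntm_t carries more fields.

```c
#include <stdbool.h>

// Hypothetical, simplified stand-in for rntm_t.
typedef struct
{
    int  num_threads;
    bool pack_a;
    bool pack_b;
} sketch_settings_t;

#define SKETCH_SETTINGS_INITIALIZER { 1, false, false }

// Each application thread gets its own live copy and its own snapshot
// taken at initialization time.
static _Thread_local sketch_settings_t live    = SKETCH_SETTINGS_INITIALIZER;
static _Thread_local sketch_settings_t at_init = SKETCH_SETTINGS_INITIALIZER;

// Similar in spirit to bli_rntm_init(): populate the live copy, then
// snapshot it. Returns 0 to indicate success.
int sketch_init( int num_threads_from_env )
{
    live.num_threads = num_threads_from_env;
    at_init = live;
    return 0;
}

void sketch_set_num_threads( int nt ) { live.num_threads = nt; }
int  sketch_get_num_threads( void )   { return live.num_threads; }

// Similar in spirit to bli_thread_reset(): discard runtime changes by
// copying the snapshot back over the live copy.
void sketch_reset( void ) { live = at_init; }
```

Calling sketch_init( 4 ), then sketch_set_num_threads( 16 ), then sketch_reset() leaves the calling thread back at 4 threads without touching any other thread's copy.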
@@ -205,6 +205,8 @@ If you still wish to set the parallelization scheme globally, but you want to do
 
 **Note**: Regardless of which way ([automatic](Multithreading.md#globally-at-runtime-the-automatic-way) or [manual](Multithreading.md#globally-at-runtime-the-manual-way)) the global runtime API is used to specify multithreading, that specification will affect operation of BLIS through **both** the BLAS compatibility layer as well as the native ([typed](docs/BLISTypedAPI.md) and [object](docs/BLISObjectAPI.md)) APIs that are unique to BLIS.
 
+If BLIS is being used by two or more application-level threads, each of those application threads will track their own global state for the purpose of specifying parallelism. We felt this makes sense because each application thread may wish to specify a different parallelization scheme without affecting the scheme for the other application thread(s).
+
 ### Globally at runtime: the automatic way
 
 If you simply want to specify an overall number of threads and let BLIS choose a thread factorization automatically, use the following function:
@@ -281,10 +283,6 @@ If you want to initialize it as part of the declaration, you may do so via the d
 ```c
 rntm_t rntm = BLIS_RNTM_INITIALIZER;
 ```
-Alternatively, you can perform the same initialization by passing the address of the `rntm_t` to an initialization function:
-```c
-bli_rntm_init( &rntm );
-```
 As of this writing, BLIS treats a default-initialized `rntm_t` as a request for single-threaded execution.
 
 **Note**: If you choose to **not** initialize the `rntm_t` object and then pass it into a level-3 operation, **you will almost surely observe undefined behavior!** Please don't do this!
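
The per-application-thread behavior described in the documentation change above can be illustrated with a small sketch. It assumes C11 `_Thread_local` and POSIX threads; `sketch_set_nt()`, `sketch_get_nt()`, and the other `sketch_*` names are hypothetical and are not part of the BLIS API.

```c
#include <pthread.h>
#include <stddef.h>

// Hypothetical per-thread setting. With _Thread_local, every application
// thread sees its own copy, each starting from the same initial value.
static _Thread_local int sketch_nt = 1;

void sketch_set_nt( int nt ) { sketch_nt = nt; }
int  sketch_get_nt( void )   { return sketch_nt; }

// What a freshly created thread observed before and after its own update.
typedef struct
{
    int requested;
    int seen_before;
    int seen_after;
} sketch_report_t;

static void* sketch_worker( void* arg )
{
    sketch_report_t* r = arg;

    r->seen_before = sketch_get_nt();
    sketch_set_nt( r->requested );
    r->seen_after  = sketch_get_nt();

    return NULL;
}

// Run the worker on a new thread and return what it observed. The fields
// stay at -1 if the thread could not be created.
sketch_report_t sketch_run_on_new_thread( int requested )
{
    sketch_report_t r = { requested, -1, -1 };
    pthread_t       t;

    if ( pthread_create( &t, NULL, sketch_worker, &r ) == 0 )
        pthread_join( t, NULL );

    return r;
}
```

If the calling thread first sets its own value to 4 and then runs `sketch_run_on_new_thread( 8 )`, the new thread starts from the default of 1, ends at 8, and the caller's value remains 4.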
@@ -36,8 +36,8 @@
 #include "blis.h"
 
 // This array tracks whether a particular operation is implemented for each of
-// the induced methods.
-static bool bli_l3_ind_oper_impl[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] =
+// the induced methods. This array is meant to be read-only.
+static const bool bli_l3_ind_oper_impl[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] =
 {
 /* gemm gemmt hemm herk her2k symm syrk syr2k trmm3 trmm trsm */
 /* 1m */ { TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE },
@@ -64,6 +64,11 @@ bool bli_l3_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] =
     {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE} },
 };
 
+// A mutex to allow synchronous access to the bli_l3_ind_oper_st array.
+#ifdef BLIS_DISABLE_TLS
+static bli_pthread_mutex_t oper_st_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
+#endif
+
 // -----------------------------------------------------------------------------
 
 #undef GENFUNC
@@ -191,9 +196,6 @@ void bli_l3_ind_oper_set_enable_all( opid_t oper, num_t dt, bool status )
 
 // -----------------------------------------------------------------------------
 
-// A mutex to allow synchronous access to the bli_l3_ind_oper_st array.
-static bli_pthread_mutex_t oper_st_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
-
 void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool status )
 {
     num_t idt;
@@ -218,8 +220,11 @@ void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool statu
 
     idt = bli_ind_map_cdt_to_index( dt );
 
-    // Acquire the mutex protecting bli_l3_ind_oper_st.
+    // If TLS is disabled, we need to use a mutex to protect the status array
+    // since it will be shared with all application threads.
+#ifdef BLIS_DISABLE_TLS
     bli_pthread_mutex_lock( &oper_st_mutex );
+#endif
 
     // BEGIN CRITICAL SECTION
     {
@@ -227,8 +232,9 @@ void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool statu
     }
     // END CRITICAL SECTION
 
-    // Release the mutex protecting bli_l3_ind_oper_st.
+#ifdef BLIS_DISABLE_TLS
     bli_pthread_mutex_unlock( &oper_st_mutex );
+#endif
 }
 
 bool bli_l3_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt )
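
The guard pattern used in the hunks above (thread-local data when TLS is available, shared data plus a mutex otherwise) can be sketched in isolation. This is a minimal sketch, not the BLIS code: SKETCH_DISABLE_TLS, SKETCH_THREAD_LOCAL, and the sketch_* functions are hypothetical names chosen for illustration.

```c
#include <pthread.h>
#include <stdbool.h>

// Define SKETCH_DISABLE_TLS at compile time to model a build without
// thread-local storage. Both macros are hypothetical.
#ifdef SKETCH_DISABLE_TLS
  #define SKETCH_THREAD_LOCAL
#else
  #define SKETCH_THREAD_LOCAL _Thread_local
#endif

#define SKETCH_NUM_OPS 3

// Thread-local when TLS is available; otherwise shared by all threads.
static SKETCH_THREAD_LOCAL bool sketch_enabled[ SKETCH_NUM_OPS ] =
    { true, true, true };

// The mutex exists only in the shared build, which also avoids an
// unused-variable warning in the thread-local build.
#ifdef SKETCH_DISABLE_TLS
static pthread_mutex_t sketch_mutex = PTHREAD_MUTEX_INITIALIZER;
#endif

void sketch_set_enabled( int op, bool status )
{
    // Only the shared build needs to serialize writers.
#ifdef SKETCH_DISABLE_TLS
    pthread_mutex_lock( &sketch_mutex );
#endif

    sketch_enabled[ op ] = status;

#ifdef SKETCH_DISABLE_TLS
    pthread_mutex_unlock( &sketch_mutex );
#endif
}

bool sketch_get_enabled( int op ) { return sketch_enabled[ op ]; }
```

Compiling with -DSKETCH_DISABLE_TLS switches the same source to the shared, mutex-protected variant.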
@@ -59,10 +59,18 @@ typedef void (*ind_cntx_init_ft)( ind_t method, cntx_t* cntx );
 static cntx_t* cached_cntx_nat = NULL;
 static cntx_t* cached_cntx_ind = NULL;
 
+// A mutex to allow synchronous access to the gks when it needs to be updated
+// with a new entry corresponding to a context for an ind_t value.
+static bli_pthread_mutex_t gks_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
+
 // -----------------------------------------------------------------------------
 
-void bli_gks_init( void )
+int bli_gks_init( void )
 {
+    // NOTE: This function is called once by ONLY ONE application thread per
+    // library init/finalize cycle (see bli_init.c). Thus, a mutex is not
+    // needed to protect the data initialization.
+
     {
         // Initialize the internal data structure we use to track registered
         // contexts.
@@ -261,11 +269,13 @@ void bli_gks_init( void )
     cached_cntx_nat = ( cntx_t* )bli_gks_query_nat_cntx_noinit();
     cached_cntx_ind = ( cntx_t* )bli_gks_query_ind_cntx_noinit( BLIS_1M );
 #endif
+
+    return 0;
 }
 
 // -----------------------------------------------------------------------------
 
-void bli_gks_finalize( void )
+int bli_gks_finalize( void )
 {
     arch_t id;
     ind_t ind;
@@ -318,6 +328,8 @@ void bli_gks_finalize( void )
     cached_cntx_nat = NULL;
     cached_cntx_ind = NULL;
 #endif
+
+    return 0;
 }
 
 // -----------------------------------------------------------------------------
@@ -613,10 +625,6 @@ const cntx_t* bli_gks_query_ind_cntx_noinit
 
 // -----------------------------------------------------------------------------
 
-// A mutex to allow synchronous access to the gks when it needs to be updated
-// with a new entry corresponding to a context for an ind_t value.
-static bli_pthread_mutex_t gks_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
-
 const cntx_t* bli_gks_query_ind_cntx_impl
      (
        ind_t ind
@@ -35,8 +35,8 @@
 #ifndef BLIS_GKS_H
 #define BLIS_GKS_H
 
-void bli_gks_init( void );
-void bli_gks_finalize( void );
+int bli_gks_init( void );
+int bli_gks_finalize( void );
 
 void bli_gks_init_index( void );
 
@@ -42,8 +42,14 @@ static const char* bli_ind_impl_str[BLIS_NUM_IND_METHODS] =
 
 // -----------------------------------------------------------------------------
 
-void bli_ind_init( void )
+int bli_ind_init( void )
 {
+    // NOTE: If TLS is enabled, this function is called once by EACH application
+    // thread per library init/finalize cycle (see bli_init.c). In this case,
+    // the threads will initialize thread-local data (see bli_l3_ind.c). If TLS
+    // is disabled, this function is called once by ONLY ONE application thread.
+    // In neither case is a mutex needed to protect the data initialization.
+
     // NOTE: We intentionally call bli_gks_query_nat_cntx_noinit() in order
     // to avoid the internal call to bli_init_once().
     const cntx_t* cntx = bli_gks_query_nat_cntx_noinit();
@@ -62,10 +68,13 @@ void bli_ind_init( void )
 
     if ( c_is_ref && !s_is_ref ) bli_ind_enable_dt( BLIS_1M, BLIS_SCOMPLEX );
     if ( z_is_ref && !d_is_ref ) bli_ind_enable_dt( BLIS_1M, BLIS_DCOMPLEX );
+
+    return 0;
 }
 
-void bli_ind_finalize( void )
+int bli_ind_finalize( void )
 {
+    return 0;
 }
 
 // -----------------------------------------------------------------------------
@@ -38,8 +38,8 @@
 // level-3 induced method management
 #include "bli_l3_ind.h"
 
-void bli_ind_init( void );
-void bli_ind_finalize( void );
+int bli_ind_init( void );
+int bli_ind_finalize( void );
 
 BLIS_EXPORT_BLIS void bli_ind_enable( ind_t method );
 BLIS_EXPORT_BLIS void bli_ind_disable( ind_t method );
@@ -64,28 +64,34 @@ void bli_finalize_auto( void )
 
 // -----------------------------------------------------------------------------
 
-static bli_pthread_switch_t lib_state = BLIS_PTHREAD_SWITCH_INIT;
-
 void bli_init_once( void )
 {
-    bli_pthread_switch_on( &lib_state, bli_init_apis );
+    bli_init_apis();
 }
 
 void bli_finalize_once( void )
 {
-    bli_pthread_switch_off( &lib_state, bli_finalize_apis );
+    bli_finalize_apis();
 }
 
 // -----------------------------------------------------------------------------
 
+static bli_pthread_switch_t gks_g_state = BLIS_PTHREAD_SWITCH_INIT;
+static BLIS_THREAD_LOCAL
+bli_pthread_switch_t ind_l_state = BLIS_PTHREAD_SWITCH_INIT;
+static bli_pthread_switch_t thread_g_state = BLIS_PTHREAD_SWITCH_INIT;
+static BLIS_THREAD_LOCAL
+bli_pthread_switch_t rntm_l_state = BLIS_PTHREAD_SWITCH_INIT;
+static bli_pthread_switch_t memsys_g_state = BLIS_PTHREAD_SWITCH_INIT;
+
 int bli_init_apis( void )
 {
     // Initialize various sub-APIs.
-    bli_gks_init();
-    bli_ind_init();
-    bli_thread_init();
-    bli_pack_init();
-    bli_memsys_init();
+    bli_pthread_switch_on( &gks_g_state, bli_gks_init );
+    bli_pthread_switch_on( &ind_l_state, bli_ind_init );
+    bli_pthread_switch_on( &thread_g_state, bli_thread_init );
+    bli_pthread_switch_on( &rntm_l_state, bli_rntm_init );
+    bli_pthread_switch_on( &memsys_g_state, bli_memsys_init );
 
     return 0;
 }
@@ -93,11 +99,11 @@ int bli_init_apis( void )
 int bli_finalize_apis( void )
 {
     // Finalize various sub-APIs.
-    bli_memsys_finalize();
-    bli_pack_finalize();
-    bli_thread_finalize();
-    bli_ind_finalize();
-    bli_gks_finalize();
+    bli_pthread_switch_off( &memsys_g_state, bli_memsys_finalize );
+    bli_pthread_switch_off( &rntm_l_state, bli_rntm_finalize );
+    bli_pthread_switch_off( &thread_g_state, bli_thread_finalize );
+    bli_pthread_switch_off( &ind_l_state, bli_ind_finalize );
+    bli_pthread_switch_off( &gks_g_state, bli_gks_finalize );
 
     return 0;
 }
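
The commit message notes that the init/finalize functions must return 0 so that a switch actually changes state. A minimal sketch of such a one-shot switch follows, to make that contract concrete. It is an assumption-laden illustration: sketch_switch_t and the sketch_* functions are hypothetical and do not reproduce the actual bli_pthread_switch_t implementation.

```c
#include <pthread.h>
#include <stdbool.h>

// Hypothetical one-shot switch; a stand-in, not BLIS's bli_pthread_switch_t.
typedef struct
{
    pthread_mutex_t mutex;
    bool            is_on;
} sketch_switch_t;

#define SKETCH_SWITCH_INIT { PTHREAD_MUTEX_INITIALIZER, false }

// Call init() only while the switch is off, and flip the switch on only if
// init() reports success by returning 0.
int sketch_switch_on( sketch_switch_t* sw, int (*init)( void ) )
{
    int r_val = 0;

    pthread_mutex_lock( &sw->mutex );
    if ( !sw->is_on )
    {
        r_val = init();
        if ( r_val == 0 ) sw->is_on = true;
    }
    pthread_mutex_unlock( &sw->mutex );

    return r_val;
}

// Mirror image: call deinit() only while the switch is on, and flip the
// switch off only if deinit() returns 0.
int sketch_switch_off( sketch_switch_t* sw, int (*deinit)( void ) )
{
    int r_val = 0;

    pthread_mutex_lock( &sw->mutex );
    if ( sw->is_on )
    {
        r_val = deinit();
        if ( r_val == 0 ) sw->is_on = false;
    }
    pthread_mutex_unlock( &sw->mutex );

    return r_val;
}

// Example hooks that count how often they actually run.
int sketch_init_calls = 0;
int sketch_fini_calls = 0;

int sketch_sub_init( void )     { ++sketch_init_calls; return 0; }
int sketch_sub_finalize( void ) { ++sketch_fini_calls; return 0; }
int sketch_failing_init( void ) { return 1; }
```

Under this sketch, a switch object shared by all threads runs its init hook once per init/finalize cycle, while a switch object declared thread-local would run it once per application thread, which is the distinction the `_g_state` and `_l_state` objects above appear to draw.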
@@ -36,8 +36,12 @@
 
 #include "blis.h"
 
-void bli_memsys_init( void )
+int bli_memsys_init( void )
 {
+    // NOTE: This function is called once by ONLY ONE application thread per
+    // library init/finalize cycle (see bli_init.c). Thus, a mutex is not
+    // needed to protect the data initialization.
+
     // Query a native context so we have something to pass into
     // bli_pba_init_pools().
     // NOTE: We intentionally call bli_gks_query_nat_cntx_noinit() in order
@@ -49,14 +53,18 @@ void bli_memsys_init( void )
 
     // Initialize the small block allocator and its data structures.
     bli_sba_init();
+
+    return 0;
 }
 
-void bli_memsys_finalize( void )
+int bli_memsys_finalize( void )
 {
     // Finalize the small block allocator and its data structures.
     bli_sba_finalize();
 
     // Finalize the packing block allocator and its data structures.
     bli_pba_finalize();
+
+    return 0;
 }
 
@@ -37,10 +37,8 @@
 #ifndef BLIS_MEMSYS_H
 #define BLIS_MEMSYS_H
 
-// -----------------------------------------------------------------------------
-
-void bli_memsys_init( void );
-void bli_memsys_finalize( void );
+int bli_memsys_init( void );
+int bli_memsys_finalize( void );
 
 
 #endif
@@ -35,26 +35,6 @@
 
 #include "blis.h"
 
-// The global rntm_t structure. (The definition resides in bli_rntm.c.)
-extern rntm_t global_rntm;
-
-// A mutex to allow synchronous access to global_rntm. (The definition
-// resides in bli_rntm.c.)
-extern bli_pthread_mutex_t global_rntm_mutex;
-
-// -----------------------------------------------------------------------------
-
-void bli_pack_init( void )
-{
-    // Read the environment variables and use them to initialize the
-    // global runtime object.
-    bli_pack_init_rntm_from_env( &global_rntm );
-}
-
-void bli_pack_finalize( void )
-{
-}
-
 // -----------------------------------------------------------------------------
 
 void bli_pack_get_pack_a( bool* pack_a )
@@ -62,7 +42,7 @@ void bli_pack_get_pack_a( bool* pack_a )
     // We must ensure that global_rntm has been initialized.
     bli_init_once();
 
-    *pack_a = bli_rntm_pack_a( &global_rntm );
+    *pack_a = bli_rntm_pack_a( bli_global_rntm() );
 }
 
 // -----------------------------------------------------------------------------
@@ -72,7 +52,7 @@ void bli_pack_get_pack_b( bool* pack_b )
     // We must ensure that global_rntm has been initialized.
     bli_init_once();
 
-    *pack_b = bli_rntm_pack_b( &global_rntm );
+    *pack_b = bli_rntm_pack_b( bli_global_rntm() );
 }
 
 // ----------------------------------------------------------------------------
@@ -82,13 +62,17 @@ void bli_pack_set_pack_a( bool pack_a )
     // We must ensure that global_rntm has been initialized.
     bli_init_once();
 
-    // Acquire the mutex protecting global_rntm.
-    bli_pthread_mutex_lock( &global_rntm_mutex );
+    // If TLS is disabled, we need to use a mutex to protect the global rntm_t
+    // since it will be shared with all application threads.
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_lock( bli_global_rntm_mutex() );
+#endif
 
-    bli_rntm_set_pack_a( pack_a, &global_rntm );
+    bli_rntm_set_pack_a( pack_a, bli_global_rntm() );
 
-    // Release the mutex protecting global_rntm.
-    bli_pthread_mutex_unlock( &global_rntm_mutex );
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
+#endif
 }
 
 // ----------------------------------------------------------------------------
@@ -98,60 +82,16 @@ void bli_pack_set_pack_b( bool pack_b )
     // We must ensure that global_rntm has been initialized.
     bli_init_once();
 
-    // Acquire the mutex protecting global_rntm.
-    bli_pthread_mutex_lock( &global_rntm_mutex );
+    // If TLS is disabled, we need to use a mutex to protect the global rntm_t
+    // since it will be shared with all application threads.
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_lock( bli_global_rntm_mutex() );
+#endif
 
-    bli_rntm_set_pack_b( pack_b, &global_rntm );
+    bli_rntm_set_pack_b( pack_b, bli_global_rntm() );
 
-    // Release the mutex protecting global_rntm.
-    bli_pthread_mutex_unlock( &global_rntm_mutex );
-}
-
-// ----------------------------------------------------------------------------
-
-void bli_pack_init_rntm_from_env
-     (
-       rntm_t* rntm
-     )
-{
-    // NOTE: We don't need to acquire the global_rntm_mutex here because this
-    // function is only called from bli_pack_init(), which is only called
-    // by bli_init_once().
-
-    bool pack_a;
-    bool pack_b;
-
-#if 1 //def BLIS_ENABLE_SELECTIVE_PACKING
-
-    // Try to read BLIS_PACK_A and BLIS_PACK_B. For each variable, default to
-    // -1 if it is unset.
-    gint_t pack_a_env = bli_env_get_var( "BLIS_PACK_A", -1 );
-    gint_t pack_b_env = bli_env_get_var( "BLIS_PACK_B", -1 );
-
-    // Enforce the default behavior first, then check for affirmative FALSE, and
-    // finally assume anything else is TRUE.
-    if ( pack_a_env == -1 ) pack_a = FALSE; // default behavior
-    else if ( pack_a_env == 0 ) pack_a = FALSE; // zero is FALSE
-    else pack_a = TRUE; // anything else is TRUE
-
-    if ( pack_b_env == -1 ) pack_b = FALSE; // default behavior
-    else if ( pack_b_env == 0 ) pack_b = FALSE; // zero is FALSE
-    else pack_b = TRUE; // anything else is TRUE
-
-#else
-
-    pack_a = TRUE;
-    pack_b = TRUE;
-
-#endif
-
-    // Save the results back in the runtime object.
-    bli_rntm_set_pack_a( pack_a, rntm );
-    bli_rntm_set_pack_b( pack_b, rntm );
-
-#if 0
-    printf( "bli_pack_init_rntm_from_env()\n" );
-    bli_rntm_print( rntm );
-#endif
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
+#endif
 }
 
@@ -35,15 +35,10 @@
 #ifndef BLIS_PACK_H
 #define BLIS_PACK_H
 
-void bli_pack_init( void );
-void bli_pack_finalize( void );
-
 BLIS_EXPORT_BLIS void bli_pack_get_pack_a( bool* pack_a );
 BLIS_EXPORT_BLIS void bli_pack_get_pack_b( bool* pack_b );
 BLIS_EXPORT_BLIS void bli_pack_set_pack_a( bool pack_a );
 BLIS_EXPORT_BLIS void bli_pack_set_pack_b( bool pack_b );
 
-void bli_pack_init_rntm_from_env( rntm_t* rntm );
-
 #endif
 
@@ -34,27 +34,214 @@
 
 #include "blis.h"
 
-// The global rntm_t structure, which holds the global thread settings
-// along with a few other key parameters.
-rntm_t global_rntm = BLIS_RNTM_INITIALIZER;
+// The global rntm_t structure, which holds the global thread settings along
+// with a few other key parameters, along with a spare copy to capture a
+// snapshot at init-time and a mutex to control access to both structs.
+static BLIS_THREAD_LOCAL
+rntm_t global_rntm = BLIS_RNTM_INITIALIZER;
+static BLIS_THREAD_LOCAL
+rntm_t global_rntm_at_init = BLIS_RNTM_INITIALIZER;
+static BLIS_THREAD_LOCAL
+bli_pthread_mutex_t global_rntm_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
 
-// A mutex to allow synchronous access to global_rntm.
-bli_pthread_mutex_t global_rntm_mutex = BLIS_PTHREAD_MUTEX_INITIALIZER;
+// Private functions to access the above static variables.
+rntm_t* bli_global_rntm( void ) { return &global_rntm; }
+rntm_t* bli_global_rntm_at_init( void ) { return &global_rntm_at_init; }
+bli_pthread_mutex_t*
+bli_global_rntm_mutex( void ) { return &global_rntm_mutex; }
 
 // -----------------------------------------------------------------------------
 
+int bli_rntm_init( void )
+{
+    // NOTE: If TLS is enabled, this function is called once by EACH application
+    // thread per library init/finalize cycle (see bli_init.c). In this case,
+    // the threads will initialize thread-local data (see vars above). If TLS
+    // is disabled, this function is called once by ONLY ONE application thread.
+    // In neither case is a mutex needed to protect the data initialization.
+
+    rntm_t* gr = bli_global_rntm();
+    rntm_t* grai = bli_global_rntm_at_init();
+
+    // Read the threading-related and sup packing-related environment variables
+    // and use them to initialize the global_rntm object.
+    bli_rntm_init_from_env( gr );
+
+    // Copy the contents of the global_rntm object into the global_rntm_at_init
+    // object, which is intended to remain unchanged for the duration of the
+    // current init/finalize cycle.
+    *grai = *gr;
+
+    return 0;
+}
+
+int bli_rntm_finalize( void )
+{
+    return 0;
+}
+
+// -----------------------------------------------------------------------------
+
+void bli_rntm_init_from_env
+     (
+       rntm_t* rntm
+     )
+{
+
+#ifdef BLIS_ENABLE_MULTITHREADING
+
+    timpl_t ti = BLIS_SINGLE;
+
+    // Try to read BLIS_THREAD_IMPL.
+    char* ti_env = bli_env_get_str( "BLIS_THREAD_IMPL" );
+
+    // If BLIS_THREAD_IMPL was not set, try to read BLIS_TI.
+    if ( ti_env == NULL ) ti_env = bli_env_get_str( "BLIS_TI" );
+
+    if ( ti_env != NULL )
+    {
+        // If BLIS_THREAD_IMPL was set, parse the value. If the value was
+        // anything other than a "openmp" or "pthreads" (or reasonable
+        // variations thereof), interpret it as a request for single-threaded
+        // execution.
+        if ( !strncmp( ti_env, "openmp", 6 ) ) ti = BLIS_OPENMP;
+        else if ( !strncmp( ti_env, "omp", 3 ) ) ti = BLIS_OPENMP;
+        else if ( !strncmp( ti_env, "pthreads", 8 ) ) ti = BLIS_POSIX;
+        else if ( !strncmp( ti_env, "pthread", 7 ) ) ti = BLIS_POSIX;
+        else if ( !strncmp( ti_env, "posix", 5 ) ) ti = BLIS_POSIX;
+        else if ( !strncmp( ti_env, "hpx", 3 ) ) ti = BLIS_HPX;
+        else ti = BLIS_SINGLE;
+
+#ifdef PRINT_IMPL
+        printf( "detected BLIS_THREAD_IMPL=%s.\n",
+                bli_thread_get_thread_impl_str( ti );
+#endif
+    }
+    else
+    {
+        // If BLIS_THREAD_IMPL was unset, default to the implementation that
+        // was determined at configure-time.
+        ti = BLIS_SINGLE;
+
+#ifdef BLIS_ENABLE_OPENMP_AS_DEFAULT
+        ti = BLIS_OPENMP;
+#endif
+#ifdef BLIS_ENABLE_PTHREADS_AS_DEFAULT
+        ti = BLIS_POSIX;
+#endif
+#ifdef BLIS_ENABLE_HPX_AS_DEFAULT
+        ti = BLIS_HPX;
+#endif
+
+#ifdef PRINT_IMPL
+        printf( "BLIS_THREAD_IMPL unset; defaulting to BLIS_THREAD_IMPL=%s.\n",
+                bli_thread_get_thread_impl_str( ti );
+#endif
+    }
+
+    // ------------------------------------------------------------------------
+
+    // Try to read BLIS_NUM_THREADS first.
+    dim_t nt = bli_env_get_var( "BLIS_NUM_THREADS", -1 );
+
+    // If BLIS_NUM_THREADS was not set, try to read BLIS_NT.
+    if ( nt == -1 ) nt = bli_env_get_var( "BLIS_NT", -1 );
+
+    // If neither BLIS_NUM_THREADS nor BLIS_NT were set, try OMP_NUM_THREADS.
+    if ( nt == -1 ) nt = bli_env_get_var( "OMP_NUM_THREADS", -1 );
+
+    // ------------------------------------------------------------------------
+
+    // Read the environment variables for the number of threads (ways of
+    // parallelism) for each individual loop.
+    dim_t jc = bli_env_get_var( "BLIS_JC_NT", -1 );
+    dim_t pc = bli_env_get_var( "BLIS_PC_NT", -1 );
+    dim_t ic = bli_env_get_var( "BLIS_IC_NT", -1 );
+    dim_t jr = bli_env_get_var( "BLIS_JR_NT", -1 );
+    dim_t ir = bli_env_get_var( "BLIS_IR_NT", -1 );
+
+    // ------------------------------------------------------------------------
+
+    // Save the results back in the runtime object.
+    bli_rntm_set_thread_impl_only( ti, rntm );
+    bli_rntm_set_num_threads_only( nt, rntm );
+    bli_rntm_set_ways_only( jc, pc, ic, jr, ir, rntm );
+
+    // ------------------------------------------------------------------------
+
+    // This function, bli_thread_init_rntm_from_env(), is only called when BLIS
+    // is initialized, and so we need to go one step further and process the
+    // rntm's contents into a standard form to ensure, for example, that none of
+    // the ways of parallelism are negative or zero (in case the user queries
+    // them later).
+    bli_rntm_sanitize( rntm );
+
+#else
+
+    // When multithreading is disabled, the global rntm can keep the values it
+    // was assigned at (static) initialization time.
+
+#endif
+
+    //printf( "bli_thread_init_rntm_from_env()\n" ); bli_rntm_print( rntm );
+
+    // ------------------------------------------------------------------------
+
+    bool pack_a;
+    bool pack_b;
+
+#if 1
+
+    // Try to read BLIS_PACK_A and BLIS_PACK_B. For each variable, default to
+    // -1 if it is unset.
+    gint_t pack_a_env = bli_env_get_var( "BLIS_PACK_A", -1 );
+    gint_t pack_b_env = bli_env_get_var( "BLIS_PACK_B", -1 );
+
+    // Enforce the default behavior first, then check for affirmative FALSE, and
+    // finally assume anything else is TRUE.
+    if ( pack_a_env == -1 ) pack_a = FALSE; // default behavior
+    else if ( pack_a_env == 0 ) pack_a = FALSE; // zero is FALSE
+    else pack_a = TRUE; // anything else is TRUE
+
+    if ( pack_b_env == -1 ) pack_b = FALSE; // default behavior
+    else if ( pack_b_env == 0 ) pack_b = FALSE; // zero is FALSE
+    else pack_b = TRUE; // anything else is TRUE
+
+#else
+
+    pack_a = TRUE;
+    pack_b = TRUE;
+
+#endif
+
+    // Save the results back in the runtime object.
+    bli_rntm_set_pack_a( pack_a, rntm );
+    bli_rntm_set_pack_b( pack_b, rntm );
+
+#if 0
+    printf( "bli_pack_init_rntm_from_env()\n" );
+    bli_rntm_print( rntm );
+#endif
+}
+
+// ----------------------------------------------------------------------------
+
 void bli_rntm_init_from_global( rntm_t* rntm )
 {
     // We must ensure that global_rntm has been initialized.
     bli_init_once();
 
-    // Acquire the mutex protecting global_rntm.
-    bli_pthread_mutex_lock( &global_rntm_mutex );
+    // If TLS is disabled, we need to use a mutex to protect the global rntm_t
+    // since it will be shared with all application threads.
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_lock( bli_global_rntm_mutex() );
+#endif
 
     *rntm = global_rntm;
 
-    // Release the mutex protecting global_rntm.
-    bli_pthread_mutex_unlock( &global_rntm_mutex );
+#ifdef BLIS_DISABLE_TLS
+    bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
+#endif
 }
 
 // -----------------------------------------------------------------------------
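
The environment-variable handling in bli_rntm_init_from_env() follows two small patterns: a priority chain for the thread count and a tri-state parse for the packing flags. The sketch below restates them in standalone form. It is only an approximation under stated assumptions: the sketch_* helpers are hypothetical, they rely on getenv() and strtol() rather than BLIS's bli_env_get_var(), and they evaluate all three variables eagerly instead of lazily.

```c
#include <stdlib.h>

// Hypothetical helper: convert a raw environment string to an integer,
// returning fallback when the variable is unset (s == NULL).
long sketch_parse_int( const char* s, long fallback )
{
    if ( s == NULL ) return fallback;
    return strtol( s, NULL, 10 );
}

// Return the first candidate that is not -1, or -1 if none was set.
long sketch_first_set( long a, long b, long c )
{
    if ( a != -1 ) return a;
    if ( b != -1 ) return b;
    return c;
}

// Priority chain: BLIS_NUM_THREADS, then BLIS_NT, then OMP_NUM_THREADS.
long sketch_num_threads_from_env( void )
{
    return sketch_first_set(
        sketch_parse_int( getenv( "BLIS_NUM_THREADS" ), -1 ),
        sketch_parse_int( getenv( "BLIS_NT" ),          -1 ),
        sketch_parse_int( getenv( "OMP_NUM_THREADS" ),  -1 ) );
}

// Tri-state flag: unset (-1) and 0 both mean false; anything else is true.
int sketch_pack_flag( long env_value )
{
    return ( env_value != -1 && env_value != 0 );
}
```

One consequence of using -1 as the "unset" sentinel, in this sketch as in the code above, is that a variable explicitly set to -1 is indistinguishable from one that is not set at all.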
@@ -267,7 +267,8 @@ BLIS_INLINE void bli_rntm_clear_l3_sup( rntm_t* rntm )
         .l3_sup = TRUE, \
         } \
 
-BLIS_INLINE void bli_rntm_init( rntm_t* rntm )
+#if 0
+//BLIS_INLINE void bli_rntm_clear( rntm_t* rntm )
 {
     bli_rntm_clear_thread_impl( rntm );
 
@@ -279,6 +280,7 @@ BLIS_INLINE void bli_rntm_init( rntm_t* rntm )
     bli_rntm_clear_pack_b( rntm );
     bli_rntm_clear_l3_sup( rntm );
 }
+#endif
 
 //
 // -- rntm_t total thread calculation ------------------------------------------
@@ -304,6 +306,15 @@ BLIS_INLINE dim_t bli_rntm_calc_num_threads
 // -- Function prototypes ------------------------------------------------------
 //
 
+rntm_t* bli_global_rntm( void );
+rntm_t* bli_global_rntm_at_init( void );
+bli_pthread_mutex_t* bli_global_rntm_mutex( void );
+
+int bli_rntm_init( void );
+int bli_rntm_finalize( void );
+
+void bli_rntm_init_from_env( rntm_t* rntm );
+
 BLIS_EXPORT_BLIS void bli_rntm_init_from_global( rntm_t* rntm );
 
 BLIS_EXPORT_BLIS void bli_rntm_set_num_threads
@@ -35,18 +35,10 @@
 
 #include "blis.h"
 
-#ifdef BLIS_ENABLE_HPX
-#include "bli_thread_hpx.h"
-#endif
-
 // A global communicator that is hard-coded for single-threaded execution.
 thrcomm_t BLIS_SINGLE_COMM = {};
 
-// The global rntm_t structure. (The definition resides in bli_rntm.c.)
-extern rntm_t global_rntm;
-
-// A mutex to allow synchronous access to global_rntm. (The definition
-// resides in bli_rntm.c.)
-extern bli_pthread_mutex_t global_rntm_mutex;
+// -----------------------------------------------------------------------------
 
 typedef void (*thread_launch_t)
      (
@@ -80,17 +72,22 @@ static thread_launch_t thread_launch_fpa[ BLIS_NUM_THREAD_IMPLS ] =
 
 // -----------------------------------------------------------------------------
 
-void bli_thread_init( void )
+int bli_thread_init( void )
 {
+    // NOTE: This function is called once by ONLY ONE application thread per
+    // library init/finalize cycle (see bli_init.c). Thus, a mutex is not
+    // needed to protect the data initialization.
+
     bli_thrcomm_init( BLIS_SINGLE, 1, &BLIS_SINGLE_COMM );
 
-    // Read the environment variables and use them to initialize the
-    // global runtime object.
-    bli_thread_init_rntm_from_env( &global_rntm );
+    return 0;
 }
 
-void bli_thread_finalize( void )
+int bli_thread_finalize( void )
 {
     bli_thrcomm_cleanup( &BLIS_SINGLE_COMM );
+
+    return 0;
 }
 
 // -----------------------------------------------------------------------------
@@ -653,7 +650,7 @@ dim_t bli_thread_get_jc_nt( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_jc_ways( &global_rntm );
return bli_rntm_jc_ways( bli_global_rntm() );
}

dim_t bli_thread_get_pc_nt( void )
@@ -661,7 +658,7 @@ dim_t bli_thread_get_pc_nt( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_pc_ways( &global_rntm );
return bli_rntm_pc_ways( bli_global_rntm() );
}

dim_t bli_thread_get_ic_nt( void )
@@ -669,7 +666,7 @@ dim_t bli_thread_get_ic_nt( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_ic_ways( &global_rntm );
return bli_rntm_ic_ways( bli_global_rntm() );
}

dim_t bli_thread_get_jr_nt( void )
@@ -677,7 +674,7 @@ dim_t bli_thread_get_jr_nt( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_jr_ways( &global_rntm );
return bli_rntm_jr_ways( bli_global_rntm() );
}

dim_t bli_thread_get_ir_nt( void )
@@ -685,7 +682,7 @@ dim_t bli_thread_get_ir_nt( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_ir_ways( &global_rntm );
return bli_rntm_ir_ways( bli_global_rntm() );
}

dim_t bli_thread_get_num_threads( void )
@@ -693,7 +690,7 @@ dim_t bli_thread_get_num_threads( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_num_threads( &global_rntm );
return bli_rntm_num_threads( bli_global_rntm() );
}

timpl_t bli_thread_get_thread_impl( void )
@@ -701,7 +698,7 @@ timpl_t bli_thread_get_thread_impl( void )
// We must ensure that global_rntm has been initialized.
bli_init_once();

return bli_rntm_thread_impl( &global_rntm );
return bli_rntm_thread_impl( bli_global_rntm() );
}

static const char* bli_timpl_string[BLIS_NUM_THREAD_IMPLS] =
@@ -726,16 +723,20 @@ void bli_thread_set_ways( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir )

#ifdef BLIS_ENABLE_MULTITHREADING

// Acquire the mutex protecting global_rntm.
bli_pthread_mutex_lock( &global_rntm_mutex );
// If TLS is disabled, we need to use a mutex to protect the global rntm_t
// since it will be shared with all application threads.
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_lock( bli_global_rntm_mutex() );
#endif

bli_rntm_set_ways_only( jc, 1, ic, jr, ir, &global_rntm );
bli_rntm_set_ways_only( jc, 1, ic, jr, ir, bli_global_rntm() );

// Ensure that the rntm_t is in a consistent state.
bli_rntm_sanitize( &global_rntm );
bli_rntm_sanitize( bli_global_rntm() );

// Release the mutex protecting global_rntm.
bli_pthread_mutex_unlock( &global_rntm_mutex );
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
#endif

#else

@@ -752,16 +753,20 @@ void bli_thread_set_num_threads( dim_t n_threads )

#ifdef BLIS_ENABLE_MULTITHREADING

// Acquire the mutex protecting global_rntm.
bli_pthread_mutex_lock( &global_rntm_mutex );
// If TLS is disabled, we need to use a mutex to protect the global rntm_t
// since it will be shared with all application threads.
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_lock( bli_global_rntm_mutex() );
#endif

bli_rntm_set_num_threads_only( n_threads, &global_rntm );
bli_rntm_set_num_threads_only( n_threads, bli_global_rntm() );

// Ensure that the rntm_t is in a consistent state.
bli_rntm_sanitize( &global_rntm );
bli_rntm_sanitize( bli_global_rntm() );

// Release the mutex protecting global_rntm.
bli_pthread_mutex_unlock( &global_rntm_mutex );
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
#endif

#else

@@ -776,123 +781,54 @@ void bli_thread_set_thread_impl( timpl_t ti )
// We must ensure that global_rntm has been initialized.
bli_init_once();

// Acquire the mutex protecting global_rntm.
bli_pthread_mutex_lock( &global_rntm_mutex );
// If TLS is disabled, we need to use a mutex to protect the global rntm_t
// since it will be shared with all application threads.
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_lock( bli_global_rntm_mutex() );
#endif

bli_rntm_set_thread_impl_only( ti, &global_rntm );
bli_rntm_set_thread_impl_only( ti, bli_global_rntm() );

// Release the mutex protecting global_rntm.
bli_pthread_mutex_unlock( &global_rntm_mutex );
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
#endif
}

// ----------------------------------------------------------------------------

//#define PRINT_IMPL

void bli_thread_init_rntm_from_env
(
rntm_t* rntm
)
void bli_thread_reset( void )
{
// NOTE: We don't need to acquire the global_rntm_mutex here because this
// function is only called from bli_thread_init(), which is only called
// by bli_init_once().
// We must ensure that global_rntm_at_init has been initialized.
bli_init_once();

#ifdef BLIS_ENABLE_MULTITHREADING
// If TLS is disabled, we need to use a mutex to protect the global rntm_t
// since it will be shared with all application threads.
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_lock( bli_global_rntm_mutex() );
#endif

timpl_t ti = BLIS_SINGLE;
// Overwrite the global rntm_t with the contents of the snapshot we took
// at initialization.

// Try to read BLIS_THREAD_IMPL.
char* ti_env = bli_env_get_str( "BLIS_THREAD_IMPL" );
rntm_t* src = bli_global_rntm_at_init();
rntm_t* dst = bli_global_rntm();

// If BLIS_THREAD_IMPL was not set, try to read BLIS_TI.
if ( ti_env == NULL ) ti_env = bli_env_get_str( "BLIS_TI" );
timpl_t ti = bli_rntm_thread_impl( src );
bool af = bli_rntm_auto_factor( src );
dim_t nt = bli_rntm_num_threads( src );

if ( ti_env != NULL )
{
// If BLIS_THREAD_IMPL was set, parse the value. If the value was
// anything other than a "openmp" or "pthreads" (or reasonable
// variations thereof), interpret it as a request for single-threaded
// execution.
if ( !strncmp( ti_env, "openmp", 6 ) ) ti = BLIS_OPENMP;
else if ( !strncmp( ti_env, "omp", 3 ) ) ti = BLIS_OPENMP;
else if ( !strncmp( ti_env, "pthreads", 8 ) ) ti = BLIS_POSIX;
else if ( !strncmp( ti_env, "pthread", 7 ) ) ti = BLIS_POSIX;
else if ( !strncmp( ti_env, "posix", 5 ) ) ti = BLIS_POSIX;
else if ( !strncmp( ti_env, "hpx", 3 ) ) ti = BLIS_HPX;
else ti = BLIS_SINGLE;
bli_rntm_set_thread_impl_only( ti, dst );
bli_rntm_set_auto_factor_only( af, dst );
bli_rntm_set_num_threads_only( nt, dst );

#ifdef PRINT_IMPL
printf( "detected BLIS_THREAD_IMPL=%s.\n",
bli_thread_get_thread_impl_str( ti );
#endif
}
else
{
// If BLIS_THREAD_IMPL was unset, default to the implementation that
// was determined at configure-time.
ti = BLIS_SINGLE;
dim_t jc = bli_rntm_jc_ways( src );
dim_t pc = bli_rntm_pc_ways( src );
dim_t ic = bli_rntm_ic_ways( src );
dim_t jr = bli_rntm_jr_ways( src );
dim_t ir = bli_rntm_ir_ways( src );

#ifdef BLIS_ENABLE_OPENMP_AS_DEFAULT
ti = BLIS_OPENMP;
#endif
#ifdef BLIS_ENABLE_PTHREADS_AS_DEFAULT
ti = BLIS_POSIX;
#endif
#ifdef BLIS_ENABLE_HPX_AS_DEFAULT
ti = BLIS_HPX;
#endif
bli_rntm_set_ways_only( jc, pc, ic, jr, ir, dst );

#ifdef PRINT_IMPL
printf( "BLIS_THREAD_IMPL unset; defaulting to BLIS_THREAD_IMPL=%s.\n",
bli_thread_get_thread_impl_str( ti );
#endif
}

// ------------------------------------------------------------------------

// Try to read BLIS_NUM_THREADS first.
dim_t nt = bli_env_get_var( "BLIS_NUM_THREADS", -1 );

// If BLIS_NUM_THREADS was not set, try to read BLIS_NT.
if ( nt == -1 ) nt = bli_env_get_var( "BLIS_NT", -1 );

// If neither BLIS_NUM_THREADS nor BLIS_NT were set, try OMP_NUM_THREADS.
if ( nt == -1 ) nt = bli_env_get_var( "OMP_NUM_THREADS", -1 );

// ------------------------------------------------------------------------

// Read the environment variables for the number of threads (ways of
// parallelism) for each individual loop.
dim_t jc = bli_env_get_var( "BLIS_JC_NT", -1 );
dim_t pc = bli_env_get_var( "BLIS_PC_NT", -1 );
dim_t ic = bli_env_get_var( "BLIS_IC_NT", -1 );
dim_t jr = bli_env_get_var( "BLIS_JR_NT", -1 );
dim_t ir = bli_env_get_var( "BLIS_IR_NT", -1 );

// ------------------------------------------------------------------------

// Save the results back in the runtime object.
bli_rntm_set_thread_impl_only( ti, rntm );
bli_rntm_set_num_threads_only( nt, rntm );
bli_rntm_set_ways_only( jc, pc, ic, jr, ir, rntm );

// ------------------------------------------------------------------------

// This function, bli_thread_init_rntm_from_env(), is only called when BLIS
// is initialized, and so we need to go one step further and process the
// rntm's contents into a standard form to ensure, for example, that none of
// the ways of parallelism are negative or zero (in case the user queries
// them later).
bli_rntm_sanitize( rntm );

#else

// When multithreading is disabled, the global rntm can keep the values it
// was assigned at (static) initialization time.

#endif

//printf( "bli_thread_init_rntm_from_env()\n" ); bli_rntm_print( rntm );
#ifdef BLIS_DISABLE_TLS
bli_pthread_mutex_unlock( bli_global_rntm_mutex() );
#endif
}


@@ -53,8 +53,8 @@ typedef void (*thread_func_t)( thrcomm_t* gl_comm, dim_t tid, const void* params
#include "bli_thread_single.h"

// Initialization-related prototypes.
void bli_thread_init( void );
void bli_thread_finalize( void );
int bli_thread_init( void );
int bli_thread_finalize( void );

// -----------------------------------------------------------------------------

@@ -126,8 +126,7 @@ BLIS_EXPORT_BLIS const char* bli_thread_get_thread_impl_str( timpl_t ti );
BLIS_EXPORT_BLIS void bli_thread_set_ways( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir );
BLIS_EXPORT_BLIS void bli_thread_set_num_threads( dim_t value );
BLIS_EXPORT_BLIS void bli_thread_set_thread_impl( timpl_t ti );

void bli_thread_init_rntm_from_env( rntm_t* rntm );
BLIS_EXPORT_BLIS void bli_thread_reset( void );


#endif

@@ -2716,8 +2716,7 @@ thrinfo_t* libblis_test_pobj_create( bszid_t bmult_id_m, bszid_t bmult_id_n, inv
if ( inv_diag == BLIS_NO_INVERT_DIAG ) does_inv_diag = FALSE;
else does_inv_diag = TRUE;

rntm_t rntm;
bli_rntm_init( &rntm );
rntm_t rntm = BLIS_RNTM_INITIALIZER;

// Create a control tree node for the packing operation.
cntl_t* cntl = bli_packm_cntl_create_node