Defined rntm_t to relocate cntx_t.thrloop (#235).

Details:
- Defined a new struct datatype, rntm_t (runtime), to house the thrloop
  field of the cntx_t (context). The thrloop array holds the number of
  ways of parallelism (thread "splits") to extract per level-3
  algorithmic loop until those values can be used to create a
  corresponding node in the thread control tree (thrinfo_t structure),
  which (for any given level-3 invocation) usually happens by the time
  the macrokernel is called for the first time.
- Relocating the thrloop from the cntx_t remedies a thread-safety issue
  when invoking level-3 operations from two or more application threads.
  The race condition existed because the cntx_t, a pointer to which is
  usually queried from the global kernel structure (gks), is supposed to
  be read-only. However, the previous code would write to the cntx_t's
  thrloop field *after* it had been queried, thus violating its read-only
  status. In practice, this would not cause a problem when a sequential
  application made a multithreaded call to BLIS, nor when two or more
  application threads used the same parallelization scheme when calling
  BLIS, because in either case all application threads would be using
  the same ways of parallelism for each loop. The true effects of the
  race condition were limited to situations where two or more application
  threads used *different* parallelization schemes for any given level-3
  call.
- In remedying the above race condition, the application or calling
  library can now specify the parallelization scheme on a per-call basis.
  All that is required is that the thread encode its request for
  parallelism into the rntm_t struct prior to passing the address of the
  rntm_t to one of the expert interfaces of either the typed or object
  APIs. This allows, for example, one application thread to extract 4-way
  parallelism from a call to gemm while another application thread
  requests 2-way parallelism. Or, two threads could each request 4-way
  parallelism, but from different loops.
- A rntm_t* parameter has been added to the function signatures of most
  of the level-3 implementation stack (with the most notable exception
  being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert
  APIs. (A few internal functions gained the rntm_t* parameter even
  though they currently have no use for it, such as bli_l3_packm().)
  This required some internal calls to some of those functions to
  be updated since BLIS was already using those operations internally
  via the expert interfaces. For situations where a rntm_t object is
  not available, such as within packm/unpackm implementations, NULL is
  passed in to the relevant expert interfaces. This is acceptable for
  now since parallelism is not obtained for non-level-3 operations.
- Revamped how global parallelism is encoded. First, the conventional
  environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only
  read once, at library initialization. (Thanks to Nathaniel Smith for
  suggesting this to avoid repeated calls to getenv(), which can be slow.)
  Those values are recorded to a global rntm_t object. Public APIs, in
  bli_thread.c, are still available to get/set these values from the
  global rntm_t, though now the "set" functions have additional logic
  to ensure that the values are set in a synchronous manner via a mutex.
  If/when NULL is passed into an expert API (meaning the user opted to
  not provide a custom rntm_t), the values from the global rntm_t are
  copied to a local rntm_t, which is then passed down the function stack.
  Calling a basic API is equivalent to calling the expert APIs with NULL
  for the cntx and rntm parameters, which means the semantic behavior of
  these basic APIs (vis-a-vis multithreading) is unchanged from before.
- Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op()
  and reimplemented it, with the function now able to treat the
  incoming rntm_t in a manner agnostic to its origin--whether it came
  from the application or is an internal copy of the global rntm_t.
- Removed various global runtime APIs for setting the number of ways of
  parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well
  as the corresponding "get" functions. The new model simplifies these
  interfaces so that one must either set the total number of threads, OR
  set all of the ways of parallelism for each loop simultaneously (in a
  single function call).
- Updated sandbox/ref99 according to the above changes.
- Rewrote/augmented docs/Multithreading.md to document the three methods
  (and two specific ways within each method) of requesting parallelism
  in BLIS.
- Removed old, disabled code from bli_l3_thrinfo.c.
- Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.
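
The per-call model described above (a caller-supplied rntm_t, or NULL to
fall back on a copy of the global one) can be sketched roughly as follows.
This is a minimal, self-contained illustration and not the BLIS
implementation: every name prefixed with sketch_ or SKETCH_ is
hypothetical, the loop count of 5 is an assumption, locking on the read
path is an assumption (the message above only states that the "set"
functions use a mutex), and treating the total thread count as the product
of the per-loop ways is likewise an assumption made for the example.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical number of parallelizable loops (assumption). */
#define SKETCH_NUM_LOOPS 5

/* Hypothetical stand-in for rntm_t: ways of parallelism per loop. */
typedef struct
{
    int ways[ SKETCH_NUM_LOOPS ];
} sketch_rntm_t;

/* Global defaults. Written only while holding the mutex. */
static sketch_rntm_t   sketch_global_rntm  = { { 1, 1, 1, 1, 1 } };
static pthread_mutex_t sketch_global_mutex = PTHREAD_MUTEX_INITIALIZER;

/* Set all ways of parallelism at once, synchronized via the mutex. */
void sketch_global_set_ways( const int ways[ SKETCH_NUM_LOOPS ] )
{
    pthread_mutex_lock( &sketch_global_mutex );
    for ( int i = 0; i < SKETCH_NUM_LOOPS; ++i )
        sketch_global_rntm.ways[ i ] = ways[ i ];
    pthread_mutex_unlock( &sketch_global_mutex );
}

/* Produce a call-local copy: the caller's request if one was given,
   otherwise a snapshot of the global defaults. Shared state is never
   written here, so concurrent callers cannot disturb one another. */
sketch_rntm_t sketch_rntm_resolve( const sketch_rntm_t* user )
{
    sketch_rntm_t local;

    if ( user != NULL )
    {
        local = *user;
    }
    else
    {
        /* Locking on read is an assumption of this sketch. */
        pthread_mutex_lock( &sketch_global_mutex );
        local = sketch_global_rntm;
        pthread_mutex_unlock( &sketch_global_mutex );
    }

    return local;
}

/* Stand-in for an expert-interface level-3 call. Returns the total
   number of threads the call would use, taken here to be the product
   of the per-loop ways (assumption). */
int sketch_gemm_ex( const sketch_rntm_t* rntm )
{
    sketch_rntm_t local = sketch_rntm_resolve( rntm );
    int           total = 1;

    for ( int i = 0; i < SKETCH_NUM_LOOPS; ++i )
    {
        assert( local.ways[ i ] >= 1 );
        total *= local.ways[ i ];
    }

    return total;
}
```

With this shape, one application thread can pass a struct requesting 4-way
parallelism while another passes one requesting 2-way parallelism, and
neither request is ever stored in state visible to the other call.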
Field G. Van Zee
2018-07-17 18:37:32 -05:00
parent 323eaaab99
commit ecbebe7c2e
177 changed files with 2210 additions and 1166 deletions

@@ -67,7 +67,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_7 \
bli_call_ft_8 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -75,7 +75,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_x, inc_x, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -110,14 +111,15 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, index ); \
\
/* Invoke the typed function. */ \
bli_call_ft_5 \
bli_call_ft_6 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
n, \
buf_x, incx, \
buf_index, \
cntx \
cntx, \
rntm \
); \
}
@@ -168,7 +170,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_9 \
bli_call_ft_10 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -178,7 +180,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_beta, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -223,7 +226,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_8 \
bli_call_ft_9 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -232,7 +235,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, inc_x, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -270,7 +274,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y, rho ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_9 \
bli_call_ft_10 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -280,7 +284,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_y, inc_y, \
buf_rho, \
cntx \
cntx, \
rntm \
); \
}
@@ -334,7 +339,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -346,7 +351,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_y, inc_y, \
buf_beta, \
buf_rho, \
cntx \
cntx, \
rntm \
); \
}
@@ -376,13 +382,14 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_4 \
bli_call_ft_5 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
n, \
buf_x, inc_x, \
cntx \
cntx, \
rntm \
); \
}
@@ -424,7 +431,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_6 \
bli_call_ft_7 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -432,7 +439,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, inc_x, \
cntx \
cntx, \
rntm \
); \
}
@@ -466,14 +474,15 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_6 \
bli_call_ft_7 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
n, \
buf_x, inc_x, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -518,7 +527,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_8 \
bli_call_ft_9 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -527,7 +536,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_beta, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -72,7 +72,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y ); \
\
/* Invoke the typed function. */ \
bli_call_ft_12 \
bli_call_ft_13 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -83,7 +83,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -135,7 +136,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -147,7 +148,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -181,7 +183,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_7 \
bli_call_ft_8 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -189,7 +191,8 @@ void PASTEMAC(opname,EX_SUF) \
m, \
n, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}
@@ -234,7 +237,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_9 \
bli_call_ft_10 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -244,7 +247,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}
@@ -281,7 +285,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( alpha, x ); \
\
/* Invoke the typed function. */ \
bli_call_ft_8 \
bli_call_ft_9 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -290,7 +294,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -88,7 +88,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alphay = bli_obj_buffer_for_1x1( dt, &alphay_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_12 \
bli_call_ft_13 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -100,7 +100,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_y, inc_y, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}
@@ -154,7 +155,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -166,7 +167,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_a, rs_a, cs_a, \
buf_x, inc_x, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -219,7 +221,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -232,7 +234,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_y, inc_y, \
buf_rho, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}
@@ -301,7 +304,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_20 \
bli_call_ft_21 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -318,7 +321,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta, \
buf_y, inc_y, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}
@@ -378,7 +382,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -391,7 +395,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_beta, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -73,7 +73,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -85,7 +85,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -138,7 +139,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -151,7 +152,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}
@@ -212,7 +214,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_internal_scalar_buffer( &x_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -224,7 +226,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}
@@ -271,7 +274,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -283,7 +286,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -77,7 +77,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -94,7 +95,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -140,7 +142,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -162,7 +165,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
one, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -212,7 +216,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -230,7 +235,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -280,7 +286,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
return; \
} \
@@ -298,7 +305,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -319,7 +327,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -364,7 +373,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -51,7 +51,8 @@ void PASTEMAC(ch,opname) \
dim_t n, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
@@ -167,7 +168,8 @@ void PASTEMAC(ch,opname) \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
@@ -284,7 +286,8 @@ void PASTEMAC(ch,opname) \
dim_t n, \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \

@@ -50,7 +50,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
dim_t n, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( addm )
@@ -72,7 +73,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( axpym )
@@ -92,7 +94,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
dim_t n, \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( scalm )

@@ -89,7 +89,10 @@ void PASTEMAC(ch,opname) \
kappa, \
a, inca, lda, \
p, 1, ldp, \
cntx \
cntx, \
/* The rntm_t* can safely be NULL as long as it's not used by
scal2m_ex(). */ \
NULL \
); \
} \
}

@@ -181,7 +181,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -203,7 +204,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -236,7 +238,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one, \
p_br, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -450,7 +453,8 @@ void PASTEMAC(ch,varname) \
p11_n, \
c11, rs_c, cs_c, \
p11, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -481,7 +485,8 @@ void PASTEMAC(ch,varname) \
p11_n, \
kappa, \
p11, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -544,7 +549,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
kappa, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -557,7 +563,8 @@ void PASTEMAC(ch,varname) \
m_panel, \
n_panel, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -586,7 +593,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -183,7 +183,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -195,7 +196,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -207,7 +209,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -231,7 +234,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -243,7 +247,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -255,7 +260,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -290,7 +296,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -300,7 +307,8 @@ void PASTEMAC(ch,varname) \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -521,7 +529,8 @@ void PASTEMAC(ch,varname) \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
@@ -537,7 +546,8 @@ void PASTEMAC(ch,varname) \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -689,7 +699,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -699,7 +710,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Update the diagonal of the p11 section of the rpi panel.
@@ -757,7 +769,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -769,7 +782,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -781,7 +795,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -182,7 +182,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -194,7 +195,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -217,7 +219,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -229,7 +232,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -264,7 +268,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -274,7 +279,8 @@ void PASTEMAC(ch,varname) \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -495,7 +501,8 @@ void PASTEMAC(ch,varname) \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
@@ -511,7 +518,8 @@ void PASTEMAC(ch,varname) \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -634,7 +642,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -644,7 +653,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -689,7 +699,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -701,7 +712,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -185,7 +185,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -207,7 +208,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -581,7 +583,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -163,7 +163,8 @@ void PASTEMAC(ch,varname) \
kappa_cast, \
c_cast, rs_c, cs_c, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If uploc is upper or lower, then the structure of c is necessarily
@@ -205,7 +206,8 @@ void PASTEMAC(ch,varname) \
kappa_cast, \
c_cast, rs_c, cs_c, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
else /* if ( bli_is_triangular( strucc ) ) */ \
@@ -239,7 +241,8 @@ void PASTEMAC(ch,varname) \
n, \
zero, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -265,7 +268,8 @@ void PASTEMAC(ch,varname) \
n_max, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -283,7 +287,8 @@ void PASTEMAC(ch,varname) \
n_max - n, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
}

@@ -246,7 +246,8 @@ void PASTEMAC(ch,varname) \
one, \
p_begin, rs_p, cs_p, \
c_begin, rs_c, cs_c, \
cntx \
cntx, \
NULL \
); \
} \
else \

@@ -89,7 +89,8 @@ void PASTEMAC(ch,opname) \
kappa, \
p, 1, ldp, \
a, inca, lda, \
cntx \
cntx, \
NULL \
); \
} \
}

@@ -122,7 +122,8 @@ void PASTEMAC(ch,varname)( \
n, \
p_cast, rs_p, cs_p, \
c_cast, rs_c, cs_c, \
cntx \
cntx, \
NULL \
); \
}

@@ -90,7 +90,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -103,7 +103,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_beta, \
buf_y, incy, \
cntx \
cntx, \
rntm \
); \
}
@@ -154,7 +155,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -166,7 +167,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_y, incy, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}
@@ -223,7 +225,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -236,7 +238,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_beta, \
buf_y, incy, \
cntx \
cntx, \
rntm \
); \
}
@@ -284,7 +287,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_10 \
bli_call_ft_11 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -294,7 +297,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, incx, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}
@@ -346,7 +350,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -358,7 +362,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_y, incy, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}
@@ -407,7 +412,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -418,7 +423,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_a, rs_a, cs_a, \
buf_x, incx, \
cntx \
cntx, \
rntm \
); \
}

@@ -82,7 +82,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m_y, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
return; \
} \
@@ -206,7 +207,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
return; \
} \
@@ -461,7 +463,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
return; \
} \

@@ -79,7 +79,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -91,7 +92,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -79,7 +79,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -91,7 +92,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -102,7 +102,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -114,7 +115,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -101,7 +101,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -113,7 +114,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -109,7 +109,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -121,7 +122,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -109,7 +109,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -121,7 +122,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -87,7 +87,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
PASTECH(ch,dotv_ft) kfp_tv; \

@@ -87,7 +87,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
PASTECH(ch,axpyv_ft) kfp_av; \

@@ -81,7 +81,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
if ( bli_does_notrans( transa ) ) \

@@ -80,7 +80,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
if ( bli_does_notrans( transa ) ) \

@@ -53,7 +53,9 @@ void bli_l3_cntl_create_if
// values for unpacked objects. Notice that we do this even if the
// caller passed in a custom control tree; that's because we still need
// to reset the pack schema of a and b, which were modified by the
// operation's _front() function.
// operation's _front() function. However, in order for this to work,
// the level-3 thread entry function (or omp parallel region) must
// alias thread-local copies of objects a and b.
pack_t schema_a = bli_obj_pack_schema( a );
pack_t schema_b = bli_obj_pack_schema( b );

@@ -70,11 +70,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
}
@@ -114,11 +114,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,ind)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
}
@@ -155,11 +155,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, beta, c, cntx ); \
PASTEMAC(opname,ind)( alpha, a, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx ); \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx, rntm ); \
} \
}
@@ -195,11 +195,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, cntx ); \
PASTEMAC(opname,ind)( side, alpha, a, b, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx ); \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx, rntm ); \
} \
}

@@ -52,7 +52,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
GENTDEF( gemm )
@@ -73,7 +74,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
GENTDEF( hemm )
@@ -92,7 +94,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
GENTDEF( herk )
@@ -110,7 +113,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);
GENTDEF( trmm )

@@ -39,6 +39,7 @@ void bli_l3_packm
obj_t* x,
obj_t* x_pack,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)

@@ -37,6 +37,7 @@ void bli_l3_packm
obj_t* x,
obj_t* x_pack,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
);

@@ -89,7 +89,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&bo, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -150,7 +151,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&bo, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -204,7 +206,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&ao, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -264,7 +267,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&bo, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -316,7 +320,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&ao, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -375,7 +380,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&bo, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -438,7 +444,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&bo, \
&betao, \
&co, \
cntx \
cntx, \
rntm \
); \
}
@@ -491,7 +498,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
&alphao, \
&ao, \
&bo, \
cntx \
cntx, \
rntm \
); \
}


@@ -122,7 +122,7 @@ void bli_l3_thrinfo_create_root
(
dim_t id,
thrcomm_t* gl_comm,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t** thread
)
@@ -136,7 +136,7 @@ void bli_l3_thrinfo_create_root
// Use the blocksize id of the current (root) control tree node to
// query the top-most ways of parallelism to obtain.
bszid_t bszid = bli_cntl_bszid( cntl );
dim_t xx_way = bli_cntx_way_for_bszid( bszid, cntx );
dim_t xx_way = bli_rntm_ways_for( bszid, rntm );
// Determine the work id for this thrinfo_t node.
dim_t work_id = gl_comm_id / ( n_threads / xx_way );
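The work id computation above can be read as assigning consecutive global thread ids to the same group at the root loop. A minimal sketch of that arithmetic follows; `root_work_id` is a hypothetical helper written for illustration (it is not a BLIS function), and it assumes that `xx_way` evenly divides `n_threads`, which the diff itself does not state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not part of BLIS): maps a thread's id within the
   global communicator to its work id at the root loop, mirroring the
   expression gl_comm_id / ( n_threads / xx_way ).
   Assumption: xx_way evenly divides n_threads. */
static size_t root_work_id( size_t gl_comm_id, size_t n_threads, size_t xx_way )
{
    assert( xx_way > 0 && n_threads % xx_way == 0 );
    assert( gl_comm_id < n_threads );

    /* Each of the xx_way groups holds this many consecutive thread ids. */
    size_t threads_per_group = n_threads / xx_way;

    return gl_comm_id / threads_per_group;
}
```

Under these assumptions, with 8 threads and 2-way parallelism at the root, ids 0 through 3 share work id 0 and ids 4 through 7 share work id 1.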
@@ -259,196 +259,6 @@ void bli_l3_thrinfo_print_paths
// -----------------------------------------------------------------------------
#if 0
thrinfo_t** bli_l3_thrinfo_create_roots
(
cntx_t* cntx,
cntl_t* cntl
)
{
// Query the context for the total number of threads to use.
dim_t n_threads = bli_cntx_get_num_threads( cntx );
// Create a global thread communicator for all the threads.
thrcomm_t* gl_comm = bli_thrcomm_create( n_threads );
// Allocate an array of thrinfo_t pointers, one for each thread.
thrinfo_t** paths = bli_malloc_intl( n_threads * sizeof( thrinfo_t* ) );
// Use the blocksize id of the current (root) control tree node to
// query the top-most ways of parallelism to obtain.
bszid_t bszid = bli_cntl_bszid( cntl );
dim_t xx_way = bli_cntx_way_for_bszid( bszid, cntx );
dim_t gl_comm_id;
// Create one thrinfo_t node for each thread in the (global) communicator.
for ( gl_comm_id = 0; gl_comm_id < n_threads; ++gl_comm_id )
{
dim_t work_id = gl_comm_id / ( n_threads / xx_way );
paths[ gl_comm_id ] = bli_thrinfo_create
(
gl_comm,
gl_comm_id,
xx_way,
work_id,
TRUE,
NULL
);
}
return paths;
}
//#define PRINT_THRINFO
thrinfo_t** bli_l3_thrinfo_create_full_paths
(
cntx_t* cntx
)
{
dim_t jc_way = bli_cntx_jc_way( cntx );
dim_t pc_way = bli_cntx_pc_way( cntx );
dim_t ic_way = bli_cntx_ic_way( cntx );
dim_t jr_way = bli_cntx_jr_way( cntx );
dim_t ir_way = bli_cntx_ir_way( cntx );
dim_t gl_nt = jc_way * pc_way * ic_way * jr_way * ir_way;
dim_t jc_nt = pc_way * ic_way * jr_way * ir_way;
dim_t pc_nt = ic_way * jr_way * ir_way;
dim_t ic_nt = jr_way * ir_way;
dim_t jr_nt = ir_way;
dim_t ir_nt = 1;
assert( gl_nt != 0 );
#ifdef PRINT_THRINFO
printf( " gl jc kc pb ic pa jr ir\n" );
printf( "xx_nt: %4lu %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
gl_nt, jc_nt, pc_nt, pc_nt, ic_nt, ic_nt, jr_nt, ir_nt );
printf( "\n" );
printf( " jc kc pb ic pa jr ir\n" );
printf( "xx_way: %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
jc_way, pc_way, (dim_t)0, ic_way, (dim_t)0, jr_way, ir_way );
printf( "=================================================\n" );
#endif
thrinfo_t** paths = bli_malloc_intl( gl_nt * sizeof( thrinfo_t* ) );
thrcomm_t* gl_comm = bli_thrcomm_create( gl_nt );
for( int a = 0; a < jc_way; a++ )
{
thrcomm_t* jc_comm = bli_thrcomm_create( jc_nt );
for( int b = 0; b < pc_way; b++ )
{
thrcomm_t* pc_comm = bli_thrcomm_create( pc_nt );
for( int c = 0; c < ic_way; c++ )
{
thrcomm_t* ic_comm = bli_thrcomm_create( ic_nt );
for( int d = 0; d < jr_way; d++ )
{
thrcomm_t* jr_comm = bli_thrcomm_create( jr_nt );
for( int e = 0; e < ir_way; e++ )
{
//thrcomm_t* ir_comm = bli_thrcomm_create( ir_nt );
dim_t ir_comm_id = 0;
dim_t jr_comm_id = e*ir_nt + ir_comm_id;
dim_t ic_comm_id = d*jr_nt + jr_comm_id;
dim_t pc_comm_id = c*ic_nt + ic_comm_id;
dim_t jc_comm_id = b*pc_nt + pc_comm_id;
dim_t gl_comm_id = a*jc_nt + jc_comm_id;
// macro-kernel loops
thrinfo_t* ir_info
=
bli_l3_thrinfo_create( jr_comm, jr_comm_id,
ir_way, e,
NULL );
thrinfo_t* jr_info
=
bli_l3_thrinfo_create( ic_comm, ic_comm_id,
jr_way, d,
ir_info );
// packa
thrinfo_t* pa_info
=
bli_packm_thrinfo_create( ic_comm, ic_comm_id,
ic_nt, ic_comm_id,
jr_info );
// blk_var1
thrinfo_t* ic_info
=
bli_l3_thrinfo_create( pc_comm, pc_comm_id,
ic_way, c,
pa_info );
// packb
thrinfo_t* pb_info
=
bli_packm_thrinfo_create( pc_comm, pc_comm_id,
pc_nt, pc_comm_id,
ic_info );
// blk_var3
thrinfo_t* pc_info
=
bli_l3_thrinfo_create( jc_comm, jc_comm_id,
pc_way, b,
pb_info );
// blk_var2
thrinfo_t* jc_info
=
bli_l3_thrinfo_create( gl_comm, gl_comm_id,
jc_way, a,
pc_info );
paths[gl_comm_id] = jc_info;
#ifdef PRINT_THRINFO
{
dim_t gl_comm_id = bli_thread_ocomm_id( jc_info );
dim_t jc_comm_id = bli_thread_ocomm_id( pc_info );
dim_t pc_comm_id = bli_thread_ocomm_id( pb_info );
dim_t pb_comm_id = bli_thread_ocomm_id( ic_info );
dim_t ic_comm_id = bli_thread_ocomm_id( pa_info );
dim_t pa_comm_id = bli_thread_ocomm_id( jr_info );
dim_t jr_comm_id = bli_thread_ocomm_id( ir_info );
dim_t jc_work_id = bli_thread_work_id( jc_info );
dim_t pc_work_id = bli_thread_work_id( pc_info );
dim_t pb_work_id = bli_thread_work_id( pb_info );
dim_t ic_work_id = bli_thread_work_id( ic_info );
dim_t pa_work_id = bli_thread_work_id( pa_info );
dim_t jr_work_id = bli_thread_work_id( jr_info );
dim_t ir_work_id = bli_thread_work_id( ir_info );
printf( " gl jc pb kc pa ic jr \n" );
printf( "comm ids: %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
gl_comm_id, jc_comm_id, pc_comm_id, pb_comm_id, ic_comm_id, pa_comm_id, jr_comm_id );
printf( "work ids: %4ld %4ld %4lu %4lu %4ld %4ld %4ld\n",
jc_work_id, pc_work_id, pb_work_id, ic_work_id, pa_work_id, jr_work_id, ir_work_id );
printf( "-------------------------------------------------\n" );
}
#endif
}
}
}
}
}
#ifdef PRINT_THRINFO
exit(1);
#endif
return paths;
}
#endif
void bli_l3_thrinfo_free_paths
(
thrinfo_t** threads


@@ -61,17 +61,6 @@
// thrinfo_t APIs specific to level-3 operations.
//
#if 0
thrinfo_t* bli_l3_thrinfo_create
(
thrcomm_t* ocomm,
dim_t ocomm_id,
dim_t n_way,
dim_t work_id,
thrinfo_t* sub_node
);
#endif
void bli_l3_thrinfo_init
(
thrinfo_t* thread,
@@ -98,7 +87,7 @@ void bli_l3_thrinfo_create_root
(
dim_t id,
thrcomm_t* gl_comm,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t** thread
);
@@ -110,19 +99,6 @@ void bli_l3_thrinfo_print_paths
// -----------------------------------------------------------------------------
#if 0
thrinfo_t** bli_l3_thrinfo_create_roots
(
cntx_t* cntx,
cntl_t* cntl
);
thrinfo_t** bli_l3_thrinfo_create_full_paths
(
cntx_t* cntx
);
#endif
void bli_l3_thrinfo_free_paths
(
thrinfo_t** threads


@@ -49,6 +49,7 @@ typedef void (*PASTECH(opname,_voft)) \
obj_t* b, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);
@@ -64,6 +65,7 @@ typedef void (*PASTECH(opname,_voft)) \
obj_t* b, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);


@@ -40,6 +40,7 @@ void bli_gemm_blk_var1
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -87,6 +88,7 @@ void bli_gemm_blk_var1
&BLIS_ONE,
&c1,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -40,6 +40,7 @@ void bli_gemm_blk_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -87,6 +88,7 @@ void bli_gemm_blk_var2
&BLIS_ONE,
&c1,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -40,6 +40,7 @@ void bli_gemm_blk_var3
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -83,6 +84,7 @@ void bli_gemm_blk_var3
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -43,6 +43,7 @@ void bli_gemm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -86,34 +87,38 @@ void bli_gemm_front
bli_obj_induce_trans( &c_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_GEMM,
BLIS_LEFT, // ignored for gemm/hemm/symm
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
// to bli_gemm_cntl_create() (via bli_l3_thread_decorator() and
// bli_l3_cntl_create_if()). This allows us to access the schemas from
// the control tree, which hopefully reduces some confusion, particularly
// in bli_packm_init().
if ( bli_cntx_method( cntx ) == BLIS_NAT )
{
bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &b_local );
}
else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
{
pack_t schema_a = bli_cntx_schema_a_block( cntx );
pack_t schema_b = bli_cntx_schema_b_panel( cntx );
// A sort of hack for communicating the desired pach schemas for A and B
// to bli_gemm_cntl_create() (via bli_l3_thread_decorator() and
// bli_l3_cntl_create_if()). This allows us to access the schemas from
// the control tree, which hopefully reduces some confusion, particularly
// in bli_packm_init().
if ( bli_cntx_method( cntx ) == BLIS_NAT )
{
bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &b_local );
}
else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
{
pack_t schema_a = bli_cntx_schema_a_block( cntx );
pack_t schema_b = bli_cntx_schema_b_panel( cntx );
bli_obj_set_pack_schema( schema_a, &a_local );
bli_obj_set_pack_schema( schema_b, &b_local );
bli_obj_set_pack_schema( schema_a, &a_local );
bli_obj_set_pack_schema( schema_b, &b_local );
}
}
// Invoke the internal back-end via the thread handler.
@@ -127,6 +132,7 @@ void bli_gemm_front
beta,
&c_local,
cntx,
rntm,
cntl
);
}
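The change in this front-end (ways of parallelism are now written to the rntm_t rather than to the cntx_t) follows a general pattern: keep shared state read-only and put per-call state in a caller-owned struct. The sketch below illustrates that pattern only. Every name in it (`shared_ctx_t`, `call_rt_t`, `run_op`, and the helpers) is hypothetical and is not part of the BLIS API; the five-entry `ways` array is an assumption modeled on the jc, pc, ic, jr, and ir loops.

```c
#include <assert.h>

enum { NUM_LOOPS = 5 };  /* illustrative loop count: jc, pc, ic, jr, ir */

/* Hypothetical stand-in for a shared context that callers must not modify. */
typedef struct { int method; } shared_ctx_t;

/* Hypothetical stand-in for per-call runtime state owned by the caller. */
typedef struct { int ways[ NUM_LOOPS ]; } call_rt_t;

/* Set every loop to 1-way (sequential) parallelism. */
static void call_rt_init( call_rt_t* rt )
{
    for ( int i = 0; i < NUM_LOOPS; ++i ) rt->ways[ i ] = 1;
}

/* Total threads requested: the product of the per-loop ways. */
static int call_rt_num_threads( const call_rt_t* rt )
{
    int n = 1;
    for ( int i = 0; i < NUM_LOOPS; ++i )
    {
        assert( rt->ways[ i ] >= 1 );
        n *= rt->ways[ i ];
    }
    return n;
}

/* Stand-in for a level-3 entry point. The context is const-qualified, so
   a write to it would not compile; per-call state lives only in rt. */
static int run_op( const shared_ctx_t* ctx, const call_rt_t* rt )
{
    (void)ctx;  /* only read in this sketch */
    return call_rt_num_threads( rt );
}
```

In this sketch, two application threads could share one `shared_ctx_t` while each passes its own `call_rt_t`, so one can request 4-way parallelism and the other 2-way without either writing to shared state.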


@@ -40,6 +40,7 @@ void bli_gemm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -42,6 +42,7 @@ void bli_gemm_int
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -102,7 +103,7 @@ void bli_gemm_int
}
// Create the next node in the thrinfo_t structure.
bli_thrinfo_grow( cntx, cntl, thread );
bli_thrinfo_grow( rntm, cntl, thread );
// Extract the function pointer from the current control tree node.
f = bli_cntl_var_func( cntl );
@@ -124,6 +125,7 @@ void bli_gemm_int
&b_local,
&c_local,
cntx,
rntm,
cntl,
thread
);


@@ -40,6 +40,7 @@ void bli_gemm_int
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
);


@@ -40,6 +40,7 @@ void bli_gemm_ker_var1
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -51,6 +52,6 @@ void bli_gemm_ker_var1
bli_obj_induce_trans( b );
bli_obj_induce_trans( c );
bli_gemm_ker_var2( b, a, c, cntx, cntl, thread );
bli_gemm_ker_var2( b, a, c, cntx, rntm, cntl, thread );
}


@@ -50,6 +50,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -62,6 +63,7 @@ void bli_gemm_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -147,6 +149,7 @@ void bli_gemm_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -169,6 +172,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -40,6 +40,7 @@ void bli_gemm_packa
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -52,6 +53,7 @@ void bli_gemm_packa
a,
&a_pack,
cntx,
rntm,
cntl,
thread
);
@@ -65,6 +67,7 @@ void bli_gemm_packa
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);
@@ -78,6 +81,7 @@ void bli_gemm_packb
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -90,6 +94,7 @@ void bli_gemm_packb
b,
&b_pack,
cntx,
rntm,
cntl,
thread
);
@@ -103,6 +108,7 @@ void bli_gemm_packb
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
obj_t* b, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);
@@ -85,6 +86,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
);


@@ -50,6 +50,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -62,6 +63,7 @@ void bli_gemm4mb_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -127,6 +129,7 @@ void bli_gemm4mb_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -43,6 +43,7 @@ void bli_hemm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -87,15 +88,17 @@ void bli_hemm_front
bli_obj_swap( &a_local, &b_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_HEMM,
BLIS_LEFT, // ignored for gemm/hemm/symm
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -128,6 +131,7 @@ void bli_hemm_front
beta,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -41,5 +41,6 @@ void bli_hemm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -42,6 +42,7 @@ void bli_her2k_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -105,15 +106,17 @@ void bli_her2k_front
bli_obj_induce_trans( &c_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_HER2K,
BLIS_LEFT, // ignored for her[2]k/syr[2]k
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -152,6 +155,7 @@ void bli_her2k_front
beta,
&c_local,
cntx,
rntm,
cntl
);
@@ -165,6 +169,7 @@ void bli_her2k_front
&BLIS_ONE,
&c_local,
cntx,
rntm,
cntl
);


@@ -40,5 +40,6 @@ void bli_her2k_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -41,6 +41,7 @@ void bli_herk_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -85,15 +86,17 @@ void bli_herk_front
bli_obj_induce_trans( &c_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_HERK,
BLIS_LEFT, // ignored for her[2]k/syr[2]k
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -126,6 +129,7 @@ void bli_herk_front
beta,
&c_local,
cntx,
rntm,
cntl
);


@@ -39,5 +39,6 @@ void bli_herk_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -51,6 +51,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -63,6 +64,7 @@ void bli_herk_l_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -131,6 +133,7 @@ void bli_herk_l_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -154,6 +157,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -51,6 +51,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -63,6 +64,7 @@ void bli_herk_u_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -131,6 +133,7 @@ void bli_herk_u_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -154,6 +157,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
obj_t* ah, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);
@@ -84,6 +85,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
);


@@ -45,6 +45,7 @@ void bli_herk_x_ker_var2
obj_t* ah,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -66,6 +67,7 @@ void bli_herk_x_ker_var2
ah,
c,
cntx,
rntm,
cntl,
thread
);


@@ -43,6 +43,7 @@ void bli_symm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -86,15 +87,17 @@ void bli_symm_front
bli_obj_swap( &a_local, &b_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_SYMM,
BLIS_LEFT, // ignored for gemm/hemm/symm
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -127,6 +130,7 @@ void bli_symm_front
beta,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -41,5 +41,6 @@ void bli_symm_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -42,6 +42,7 @@ void bli_syr2k_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -86,15 +87,17 @@ void bli_syr2k_front
bli_obj_induce_trans( &c_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_SYR2K,
BLIS_LEFT, // ignored for her[2]k/syr[2]k
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -133,6 +136,7 @@ void bli_syr2k_front
beta,
&c_local,
cntx,
rntm,
cntl
);
@@ -146,6 +150,7 @@ void bli_syr2k_front
&BLIS_ONE,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -40,5 +40,6 @@ void bli_syr2k_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -41,6 +41,7 @@ void bli_syrk_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -79,15 +80,17 @@ void bli_syrk_front
bli_obj_induce_trans( &c_local );
}
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_SYRK,
BLIS_LEFT, // ignored for her[2]k/syr[2]k
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -120,6 +123,7 @@ void bli_syrk_front
beta,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -39,5 +39,6 @@ void bli_syrk_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -41,6 +41,7 @@ void bli_trmm_front
obj_t* a,
obj_t* b,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -129,15 +130,17 @@ void bli_trmm_front
bli_obj_set_as_root( &b_local );
bli_obj_set_as_root( &c_local );
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_TRMM,
side,
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -170,6 +173,7 @@ void bli_trmm_front
&BLIS_ZERO,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -39,5 +39,6 @@ void bli_trmm_front
obj_t* a,
obj_t* b,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trmm_ll_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -125,6 +127,7 @@ void bli_trmm_ll_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* jr_thread \
) \
{ \


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trmm_lu_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -125,6 +127,7 @@ void bli_trmm_lu_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* jr_thread \
) \
{ \


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trmm_rl_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -125,6 +127,7 @@ void bli_trmm_rl_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* jr_thread \
) \
{ \


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trmm_ru_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -125,6 +127,7 @@ void bli_trmm_ru_ker_var2
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* jr_thread \
) \
{ \


@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
obj_t* b, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);
@@ -84,6 +85,7 @@ void PASTEMAC(ch,varname) \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
);


@@ -46,6 +46,7 @@ void bli_trmm_xx_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -80,6 +81,7 @@ void bli_trmm_xx_ker_var2
b,
c,
cntx,
rntm,
cntl,
thread
);


@@ -43,6 +43,7 @@ void bli_trmm3_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -128,15 +129,17 @@ void bli_trmm3_front
bli_obj_set_as_root( &b_local );
bli_obj_set_as_root( &c_local );
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_TRMM3,
side,
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -169,6 +172,7 @@ void bli_trmm3_front
beta,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -41,5 +41,6 @@ void bli_trmm3_front
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -40,6 +40,7 @@ void bli_trsm_blk_var1
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -87,6 +88,7 @@ void bli_trsm_blk_var1
&BLIS_ONE,
&c1,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -40,6 +40,7 @@ void bli_trsm_blk_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -87,6 +88,7 @@ void bli_trsm_blk_var2
&BLIS_ONE,
&c1,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -40,6 +40,7 @@ void bli_trsm_blk_var3
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -83,6 +84,7 @@ void bli_trsm_blk_var3
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -41,6 +41,7 @@ void bli_trsm_front
obj_t* a,
obj_t* b,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
)
{
@@ -120,15 +121,17 @@ void bli_trsm_front
bli_obj_set_as_root( &b_local );
bli_obj_set_as_root( &c_local );
// Record the threading for each level within the context.
bli_cntx_set_thrloop_from_env
// Parse and interpret the contents of the rntm_t object to properly
// set the ways of parallelism for each loop, and then make any
// additional modifications necessary for the current operation.
bli_rntm_set_ways_for_op
(
BLIS_TRSM,
side,
bli_obj_length( &c_local ),
bli_obj_width( &c_local ),
bli_obj_width( &a_local ),
cntx
rntm
);
// A sort of hack for communicating the desired pach schemas for A and B
@@ -161,6 +164,7 @@ void bli_trsm_front
alpha,
&c_local,
cntx,
rntm,
cntl
);
}


@@ -39,5 +39,6 @@ void bli_trsm_front
obj_t* a,
obj_t* b,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl
);


@@ -42,6 +42,7 @@ void bli_trsm_int
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -118,7 +119,7 @@ void bli_trsm_int
bli_thread_obarrier( thread );
// Create the next node in the thrinfo_t structure.
bli_thrinfo_grow( cntx, cntl, thread );
bli_thrinfo_grow( rntm, cntl, thread );
// Extract the function pointer from the current control tree node.
f = bli_cntl_var_func( cntl );
@@ -130,6 +131,7 @@ void bli_trsm_int
&b_local,
&c_local,
cntx,
rntm,
cntl,
thread
);


@@ -40,6 +40,7 @@ void bli_trsm_int
obj_t* beta,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
);


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* alpha2,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trsm_ll_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -128,6 +130,7 @@ void bli_trsm_ll_ker_var2
buf_alpha2,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
void* alpha2, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* alpha2,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trsm_lu_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -128,6 +130,7 @@ void bli_trsm_lu_ker_var2
buf_alpha2,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
void* alpha2, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -40,6 +40,7 @@ void bli_trsm_packa
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -52,6 +53,7 @@ void bli_trsm_packa
a,
&a_pack,
cntx,
rntm,
cntl,
thread
);
@@ -65,6 +67,7 @@ void bli_trsm_packa
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);
@@ -78,6 +81,7 @@ void bli_trsm_packb
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -90,6 +94,7 @@ void bli_trsm_packb
b,
&b_pack,
cntx,
rntm,
cntl,
thread
);
@@ -103,6 +108,7 @@ void bli_trsm_packb
&BLIS_ONE,
c,
cntx,
rntm,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* alpha2,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trsm_rl_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -128,6 +130,7 @@ void bli_trsm_rl_ker_var2
buf_alpha2,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
void* alpha2, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
void* alpha2,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
@@ -61,6 +62,7 @@ void bli_trsm_ru_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -128,6 +130,7 @@ void bli_trsm_ru_ker_var2
buf_alpha2,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
void* alpha2, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \


@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
obj_t* b, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm, \
cntl_t* cntl, \
thrinfo_t* thread \
);
@@ -86,6 +87,7 @@ void PASTEMAC(ch,varname) \
void* alpha2, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
);


@@ -46,6 +46,7 @@ void bli_trsm_xx_ker_var2
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
@@ -80,6 +81,7 @@ void bli_trsm_xx_ker_var2
b,
c,
cntx,
rntm,
cntl,
thread
);


@@ -243,6 +243,12 @@ void bli_cntl_mark_family
cntl_t* cntl
)
{
// This function sets the family field of all cntl tree nodes that are
// children of cntl. It's used by bli_l3_cntl_create_if() after making
// a copy of a user-given cntl tree, if the user provided one, to mark
// the operation family, which is used to determine appropriate behavior
// by various functions when executing the blocked variants.
// Set the family of the root node.
bli_cntl_set_family( family, cntl );
@@ -257,3 +263,31 @@ void bli_cntl_mark_family
}
}
// -----------------------------------------------------------------------------
dim_t bli_cntl_calc_num_threads_in
(
rntm_t* rntm,
cntl_t* cntl
)
{
dim_t n_threads_in = 1;
for ( ; cntl != NULL; cntl = bli_cntl_sub_node( cntl ) )
{
bszid_t bszid = bli_cntl_bszid( cntl );
dim_t cur_way;
// We assume bszid is in {KR,MR,NR,MC,KC,NR} if it is not
// BLIS_NO_PART.
if ( bszid != BLIS_NO_PART )
cur_way = bli_rntm_ways_for( bszid, rntm );
else
cur_way = 1;
n_threads_in *= cur_way;
}
return n_threads_in;
}
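The traversal in bli_cntl_calc_num_threads_in() multiplies the ways of parallelism of every partitioning node along the control tree's chain of sub-nodes, counting non-partitioning nodes as 1. A self-contained sketch of the same idea follows. The `node_t` type, the `NO_PART` constant, and the plain `ways` array are hypothetical stand-ins for cntl_t, BLIS_NO_PART, and the rntm_t query; they are assumptions made for illustration, not BLIS definitions.

```c
#include <assert.h>
#include <stddef.h>

#define NO_PART (-1)  /* hypothetical marker for a non-partitioning node */

/* Hypothetical stand-in for a control tree node: a loop id plus a link
   to the next (sub) node in the chain. */
typedef struct node_s
{
    int            loop_id;
    struct node_s* sub;
} node_t;

/* Multiply the ways of parallelism of each partitioning node from the
   given node down to the end of the chain. Nodes marked NO_PART
   contribute a factor of 1. */
static int num_threads_in( const int* ways, size_t n_ways, const node_t* node )
{
    int n_threads_in = 1;

    for ( ; node != NULL; node = node->sub )
    {
        int cur_way = 1;

        if ( node->loop_id != NO_PART )
        {
            assert( (size_t)node->loop_id < n_ways );
            cur_way = ways[ node->loop_id ];
        }

        n_threads_in *= cur_way;
    }

    return n_threads_in;
}
```

Called on the root of the chain, this yields the total number of threads; called on an interior node, it yields the number of threads working beneath that node.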


@@ -109,6 +109,14 @@ void bli_cntl_mark_family
// -----------------------------------------------------------------------------
dim_t bli_cntl_calc_num_threads_in
(
rntm_t* rntm,
cntl_t* cntl
);
// -----------------------------------------------------------------------------
// cntl_t query (fields only)
static opid_t bli_cntl_family( cntl_t* cntl )


@@ -42,34 +42,6 @@ void bli_cntx_clear( cntx_t* cntx )
// -----------------------------------------------------------------------------
dim_t bli_cntx_get_num_threads_in
(
cntx_t* cntx,
cntl_t* cntl
)
{
dim_t n_threads_in = 1;
for ( ; cntl != NULL; cntl = bli_cntl_sub_node( cntl ) )
{
bszid_t bszid = bli_cntl_bszid( cntl );
dim_t cur_way;
// We assume bszid is in {KR,MR,NR,MC,KC,NC} if it is not
// BLIS_NO_PART.
if ( bszid != BLIS_NO_PART )
cur_way = bli_cntx_way_for_bszid( bszid, cntx );
else
cur_way = 1;
n_threads_in *= cur_way;
}
return n_threads_in;
}
// -----------------------------------------------------------------------------
void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... )
{
// This function can be called from the bli_cntx_init_*() function for
@@ -872,146 +844,6 @@ void bli_cntx_set_packm_kers( dim_t n_kers, ... )
bli_free_intl( ker_fps );
}
// -----------------------------------------------------------------------------
void bli_cntx_set_thrloop_from_env
(
opid_t l3_op,
side_t side,
dim_t m,
dim_t n,
dim_t k,
cntx_t* cntx
)
{
dim_t jc, pc, ic, jr, ir;
#ifdef BLIS_ENABLE_MULTITHREADING
int nthread = bli_thread_get_env( "BLIS_NUM_THREADS", -1 );
if ( nthread == -1 )
nthread = bli_thread_get_env( "OMP_NUM_THREADS", -1 );
if ( nthread < 1 ) nthread = 1;
bli_partition_2x2( nthread, m*BLIS_DEFAULT_M_THREAD_RATIO,
n*BLIS_DEFAULT_N_THREAD_RATIO, &ic, &jc );
for ( ir = BLIS_DEFAULT_MR_THREAD_MAX ; ir > 1 ; ir-- )
{
if ( ic % ir == 0 )
{
ic /= ir;
break;
}
}
for ( jr = BLIS_DEFAULT_NR_THREAD_MAX ; jr > 1 ; jr-- )
{
if ( jc % jr == 0 )
{
jc /= jr;
break;
}
}
pc = 1;
dim_t jc_env = bli_thread_get_env( "BLIS_JC_NT", -1 );
dim_t ic_env = bli_thread_get_env( "BLIS_IC_NT", -1 );
dim_t jr_env = bli_thread_get_env( "BLIS_JR_NT", -1 );
dim_t ir_env = bli_thread_get_env( "BLIS_IR_NT", -1 );
if (jc_env != -1 || ic_env != -1 || jr_env != -1 || ir_env != -1)
{
jc = (jc_env == -1 ? 1 : jc_env);
ic = (ic_env == -1 ? 1 : ic_env);
jr = (jr_env == -1 ? 1 : jr_env);
ir = (ir_env == -1 ? 1 : ir_env);
}
#else
jc = 1;
pc = 1;
ic = 1;
jr = 1;
ir = 1;
#endif
if ( l3_op == BLIS_TRMM )
{
// We reconfigure the parallelism from trmm_r due to a dependency in
// the jc loop. (NOTE: This dependency does not exist for trmm3.)
if ( bli_is_right( side ) )
{
bli_cntx_set_thrloop
(
1,
pc,
ic,
jr * jc,
ir,
cntx
);
}
else // if ( bli_is_left( side ) )
{
bli_cntx_set_thrloop
(
jc,
pc,
ic,
jr,
ir,
cntx
);
}
}
else if ( l3_op == BLIS_TRSM )
{
if ( bli_is_right( side ) )
{
bli_cntx_set_thrloop
(
1,
1,
ic * pc * jc * ir * jr,
1,
1,
cntx
);
}
else // if ( bli_is_left( side ) )
{
bli_cntx_set_thrloop
(
1,
1,
1,
ic * pc * jc * jr * ir,
1,
cntx
);
}
}
else // any other level-3 operation besides trmm/trsm
{
bli_cntx_set_thrloop
(
jc,
pc,
ic,
jr,
ir,
cntx
);
}
}
// -----------------------------------------------------------------------------
void bli_cntx_print( cntx_t* cntx )


@@ -60,8 +60,6 @@ typedef struct cntx_s
pack_t schema_b;
pack_t schema_c;
dim_t* thrloop;
membrk_t* membrk;
} cntx_t;
*/
@@ -124,10 +122,6 @@ static pack_t bli_cntx_schema_c_panel( cntx_t* cntx )
{
return cntx->schema_c_panel;
}
static dim_t* bli_cntx_thrloop( cntx_t* cntx )
{
return cntx->thrloop;
}
static membrk_t* bli_cntx_get_membrk( cntx_t* cntx )
{
return cntx->membrk;
@@ -379,47 +373,6 @@ static void* bli_cntx_get_unpackm_ker_dt( num_t dt, l1mkr_t ker_id, cntx_t* cntx
// -----------------------------------------------------------------------------
static dim_t bli_cntx_jc_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_NC ];
}
static dim_t bli_cntx_pc_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_KC ];
}
static dim_t bli_cntx_ic_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_MC ];
}
static dim_t bli_cntx_jr_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_NR ];
}
static dim_t bli_cntx_ir_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_MR ];
}
static dim_t bli_cntx_pr_way( cntx_t* cntx )
{
return cntx->thrloop[ BLIS_KR ];
}
static dim_t bli_cntx_way_for_bszid( bszid_t bszid, cntx_t* cntx )
{
return cntx->thrloop[ bszid ];
}
static dim_t bli_cntx_get_num_threads( cntx_t* cntx )
{
return bli_cntx_jc_way( cntx ) *
bli_cntx_pc_way( cntx ) *
bli_cntx_ic_way( cntx ) *
bli_cntx_jr_way( cntx ) *
bli_cntx_ir_way( cntx );
}
// -----------------------------------------------------------------------------
static bool_t bli_cntx_l3_nat_ukr_prefers_rows_dt( num_t dt, l3ukr_t ukr_id, cntx_t* cntx )
{
bool_t prefs = bli_cntx_get_l3_nat_ukr_prefs_dt( dt, ukr_id, cntx );
@@ -584,24 +537,12 @@ static void bli_cntx_set_unpackm_ker_dt( void* fp, num_t dt, l1mkr_t ker_id, cnt
bli_func_set_dt( fp, dt, func );
}
static void bli_cntx_set_thrloop( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir, cntx_t* cntx )
{
cntx->thrloop[ BLIS_NC ] = jc;
cntx->thrloop[ BLIS_KC ] = pc;
cntx->thrloop[ BLIS_MC ] = ic;
cntx->thrloop[ BLIS_NR ] = jr;
cntx->thrloop[ BLIS_MR ] = ir;
cntx->thrloop[ BLIS_KR ] = 1;
}
// -----------------------------------------------------------------------------
// Function prototypes
void bli_cntx_clear( cntx_t* cntx );
dim_t bli_cntx_get_num_threads_in( cntx_t* cntx, cntl_t* cntl );
void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... );
void bli_cntx_set_ind_blkszs( ind_t method, dim_t n_bs, ... );
@@ -611,16 +552,6 @@ void bli_cntx_set_l1f_kers( dim_t n_kers, ... );
void bli_cntx_set_l1v_kers( dim_t n_kers, ... );
void bli_cntx_set_packm_kers( dim_t n_kers, ... );
void bli_cntx_set_thrloop_from_env
(
opid_t l3_op,
side_t side,
dim_t m,
dim_t n,
dim_t k,
cntx_t* cntx
);
void bli_cntx_print( cntx_t* cntx );
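The thrloop setters and getters removed from bli_cntx.h wrote to, and read from, a context that the commit message describes as shared and nominally read-only. The sketch below illustrates why that is a problem when callers want different schemes, and why a per-call structure avoids it. It simulates one particular interleaving sequentially rather than using real threads, and every type and function name in it is hypothetical.

```c
#include <assert.h>

/* Hypothetical shared context (one instance for all callers). */
typedef struct { long ic_ways; } shared_cntx_t;

/* Hypothetical per-call runtime (one instance per caller). */
typedef struct { long ic_ways; } percall_rntm_t;

static shared_cntx_t g_cntx = { 1 };

/* Old pattern: a caller records its request in the shared context... */
void old_request( long ways )
{
    assert( ways > 0 );
    g_cntx.ic_ways = ways;
}

/* ...and later reads back whatever the shared context holds. */
long old_execute( void )
{
    return g_cntx.ic_ways;
}

/* New pattern: the request travels with the call. */
long new_execute( const percall_rntm_t* rntm )
{
    return rntm->ic_ways;
}

/* Interleaving: A requests, B requests, then A executes.
   Returns the ways that A actually observes. */
long old_interleaved( long a_ways, long b_ways )
{
    old_request( a_ways );
    old_request( b_ways ); /* overwrites A's request */
    return old_execute();
}

long new_interleaved( long a_ways, long b_ways )
{
    percall_rntm_t a = { a_ways };
    percall_rntm_t b = { b_ways };
    ( void )b;             /* B's request cannot affect A */
    return new_execute( &a );
}
```

When both callers request the same value the old pattern happens to give the right answer, which matches the commit message's observation that the race only mattered when the parallelization schemes differed.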


@@ -34,12 +34,15 @@
#include "blis.h"
void bli_obj_create( num_t dt,
dim_t m,
dim_t n,
inc_t rs,
inc_t cs,
obj_t* obj )
void bli_obj_create
(
num_t dt,
dim_t m,
dim_t n,
inc_t rs,
inc_t cs,
obj_t* obj
)
{
bli_init_once();
@@ -48,13 +51,16 @@ void bli_obj_create( num_t dt,
bli_obj_alloc_buffer( rs, cs, 1, obj );
}
void bli_obj_create_with_attached_buffer( num_t dt,
dim_t m,
dim_t n,
void* p,
inc_t rs,
inc_t cs,
obj_t* obj )
void bli_obj_create_with_attached_buffer
(
num_t dt,
dim_t m,
dim_t n,
void* p,
inc_t rs,
inc_t cs,
obj_t* obj
)
{
bli_init_once();
@@ -63,10 +69,13 @@ void bli_obj_create_with_attached_buffer( num_t dt,
bli_obj_attach_buffer( p, rs, cs, 1, obj );
}
void bli_obj_create_without_buffer( num_t dt,
dim_t m,
dim_t n,
obj_t* obj )
void bli_obj_create_without_buffer
(
num_t dt,
dim_t m,
dim_t n,
obj_t* obj
)
{
siz_t elem_size;
void* s;
@@ -112,10 +121,13 @@ void bli_obj_create_without_buffer( num_t dt,
else if ( bli_is_dcomplex( dt ) ) { bli_zset1s( *(( dcomplex* )s) ); }
}
void bli_obj_alloc_buffer( inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj )
void bli_obj_alloc_buffer
(
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj
)
{
dim_t n_elem = 0;
dim_t m, n;
@@ -178,11 +190,14 @@ void bli_obj_alloc_buffer( inc_t rs,
bli_obj_set_imag_stride( is, obj );
}
void bli_obj_attach_buffer( void* p,
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj )
void bli_obj_attach_buffer
(
void* p,
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj
)
{
bli_init_once();
@@ -201,24 +216,34 @@ void bli_obj_attach_buffer( void* p,
bli_obj_set_imag_stride( is, obj );
}
void bli_obj_create_1x1( num_t dt,
obj_t* obj )
void bli_obj_create_1x1
(
num_t dt,
obj_t* obj
)
{
bli_obj_create_without_buffer( dt, 1, 1, obj );
bli_obj_alloc_buffer( 1, 1, 1, obj );
}
void bli_obj_create_1x1_with_attached_buffer( num_t dt,
void* p,
obj_t* obj )
void bli_obj_create_1x1_with_attached_buffer
(
num_t dt,
void* p,
obj_t* obj
)
{
bli_obj_create_without_buffer( dt, 1, 1, obj );
bli_obj_attach_buffer( p, 1, 1, 1, obj );
}
void bli_obj_create_conf_to( obj_t* s, obj_t* d )
void bli_obj_create_conf_to
(
obj_t* s,
obj_t* d
)
{
const num_t dt = bli_obj_dt( s );
const dim_t m = bli_obj_length( s );
@@ -229,7 +254,10 @@ void bli_obj_create_conf_to( obj_t* s, obj_t* d )
bli_obj_create( dt, m, n, rs, cs, d );
}
void bli_obj_free( obj_t* obj )
void bli_obj_free
(
obj_t* obj
)
{
if ( bli_error_checking_is_enabled() )
bli_obj_free_check( obj );
@@ -246,7 +274,11 @@ void bli_obj_free( obj_t* obj )
}
#if 0
//void bli_obj_create_const( double value, obj_t* obj )
//void bli_obj_create_const
(
double value,
obj_t* obj
)
{
gint_t* temp_i;
float* temp_s;
@@ -273,7 +305,11 @@ void bli_obj_free( obj_t* obj )
*temp_i = ( gint_t ) value;
}
//void bli_obj_create_const_copy_of( obj_t* a, obj_t* b )
//void bli_obj_create_const_copy_of
(
obj_t* a,
obj_t* b
)
{
gint_t* temp_i;
float* temp_s;
@@ -328,12 +364,15 @@ void bli_obj_free( obj_t* obj )
}
#endif
void bli_adjust_strides( dim_t m,
dim_t n,
siz_t elem_size,
inc_t* rs,
inc_t* cs,
inc_t* is )
void bli_adjust_strides
(
dim_t m,
dim_t n,
siz_t elem_size,
inc_t* rs,
inc_t* cs,
inc_t* is
)
{
// Here, we check the strides that were input from the user and modify
// them if needed.
@@ -422,7 +461,10 @@ static siz_t dt_sizes[6] =
sizeof( constdata_t )
};
siz_t bli_dt_size( num_t dt )
siz_t bli_dt_size
(
num_t dt
)
{
if ( bli_error_checking_is_enabled() )
bli_dt_size_check( dt );
@@ -439,7 +481,10 @@ static char* dt_names[ BLIS_NUM_FP_TYPES+1 ] =
"int"
};
char* bli_dt_string( num_t dt )
char* bli_dt_string
(
num_t dt
)
{
if ( bli_error_checking_is_enabled() )
bli_dt_string_check( dt );
@@ -447,7 +492,11 @@ char* bli_dt_string( num_t dt )
return dt_names[dt];
}
dim_t bli_align_dim_to_mult( dim_t dim, dim_t dim_mult )
dim_t bli_align_dim_to_mult
(
dim_t dim,
dim_t dim_mult
)
{
// We return the dimension unmodified if the multiple is zero
// (to avoid division by zero).
@@ -460,7 +509,12 @@ dim_t bli_align_dim_to_mult( dim_t dim, dim_t dim_mult )
return dim;
}
dim_t bli_align_dim_to_size( dim_t dim, siz_t elem_size, siz_t align_size )
dim_t bli_align_dim_to_size
(
dim_t dim,
siz_t elem_size,
siz_t align_size
)
{
dim = ( ( dim * ( dim_t )elem_size +
( dim_t )align_size - 1
@@ -473,7 +527,11 @@ dim_t bli_align_dim_to_size( dim_t dim, siz_t elem_size, siz_t align_size )
return dim;
}
dim_t bli_align_ptr_to_size( void* p, size_t align_size )
dim_t bli_align_ptr_to_size
(
void* p,
size_t align_size
)
{
dim_t dim;
@@ -484,6 +542,7 @@ dim_t bli_align_ptr_to_size( void* p, size_t align_size )
return dim;
}
#if 0
static num_t type_union[BLIS_NUM_FP_TYPES][BLIS_NUM_FP_TYPES] =
{
// s c d z
@@ -500,8 +559,13 @@ num_t bli_dt_union( num_t dt1, num_t dt2 )
return type_union[dt1][dt2];
}
#endif
void bli_obj_print( char* label, obj_t* obj )
void bli_obj_print
(
char* label,
obj_t* obj
)
{
bli_init_once();


@@ -34,67 +34,118 @@
#include "bli_obj_check.h"
void bli_obj_create( num_t dt,
dim_t m,
dim_t n,
inc_t rs,
inc_t cs,
obj_t* obj );
void bli_obj_create
(
num_t dt,
dim_t m,
dim_t n,
inc_t rs,
inc_t cs,
obj_t* obj
);
void bli_obj_create_with_attached_buffer( num_t dt,
dim_t m,
dim_t n,
void* p,
inc_t rs,
inc_t cs,
obj_t* obj );
void bli_obj_create_with_attached_buffer
(
num_t dt,
dim_t m,
dim_t n,
void* p,
inc_t rs,
inc_t cs,
obj_t* obj
);
void bli_obj_create_without_buffer( num_t dt,
dim_t m,
dim_t n,
obj_t* obj );
void bli_obj_create_without_buffer
(
num_t dt,
dim_t m,
dim_t n,
obj_t* obj
);
void bli_obj_alloc_buffer( inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj );
void bli_obj_alloc_buffer
(
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj
);
void bli_obj_attach_buffer( void* p,
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj );
void bli_obj_attach_buffer
(
void* p,
inc_t rs,
inc_t cs,
inc_t is,
obj_t* obj
);
void bli_obj_create_1x1( num_t dt,
obj_t* obj );
void bli_obj_create_1x1
(
num_t dt,
obj_t* obj
);
void bli_obj_create_1x1_with_attached_buffer( num_t dt,
void* p,
obj_t* obj );
void bli_obj_create_1x1_with_attached_buffer
(
num_t dt,
void* p,
obj_t* obj
);
void bli_obj_create_conf_to( obj_t* s, obj_t* d );
void bli_obj_create_conf_to
(
obj_t* s,
obj_t* d
);
void bli_obj_free( obj_t* obj );
void bli_obj_free
(
obj_t* obj
);
//void bli_obj_create_const( double value, obj_t* obj );
void bli_adjust_strides
(
dim_t m,
dim_t n,
siz_t elem_size,
inc_t* rs,
inc_t* cs,
inc_t* is
);
//void bli_obj_create_const_copy_of( obj_t* a, obj_t* b );
siz_t bli_dt_size
(
num_t dt
);
void bli_adjust_strides( dim_t m,
dim_t n,
siz_t elem_size,
inc_t* rs,
inc_t* cs,
inc_t* is );
char* bli_dt_string
(
num_t dt
);
siz_t bli_dt_size( num_t dt );
char* bli_dt_string( num_t dt );
dim_t bli_align_dim_to_mult
(
dim_t dim,
dim_t dim_mult
);
dim_t bli_align_dim_to_mult( dim_t dim, dim_t dim_mult );
dim_t bli_align_dim_to_size( dim_t dim, siz_t elem_size, siz_t align_size );
dim_t bli_align_ptr_to_size( void* p, size_t align_size );
dim_t bli_align_dim_to_size
(
dim_t dim,
siz_t elem_size,
siz_t align_size
);
num_t bli_dt_union( num_t dt1, num_t dt2 );
dim_t bli_align_ptr_to_size
(
void* p,
size_t align_size
);
void bli_obj_print( char* label, obj_t* obj );
void bli_obj_print
(
char* label,
obj_t* obj
);

frame/base/bli_rntm.c

@@ -0,0 +1,255 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// -----------------------------------------------------------------------------
void bli_rntm_set_ways_for_op
(
opid_t l3_op,
side_t side,
dim_t m,
dim_t n,
dim_t k,
rntm_t* rntm
)
{
// Set the number of ways for each loop, if needed, depending on what
// kind of information is already stored in the rntm_t object.
bli_rntm_set_ways_from_rntm( m, n, k, rntm );
#if 0
printf( "bli_rntm_set_ways_for_op()\n" );
bli_rntm_print( rntm );
#endif
// Now modify the number of ways, if necessary, based on the operation.
if ( l3_op == BLIS_TRMM ||
l3_op == BLIS_TRSM )
{
dim_t jc = bli_rntm_jc_ways( rntm );
dim_t pc = bli_rntm_pc_ways( rntm );
dim_t ic = bli_rntm_ic_ways( rntm );
dim_t jr = bli_rntm_jr_ways( rntm );
dim_t ir = bli_rntm_ir_ways( rntm );
// Notice that, if we do need to update the ways, we don't need to
// update the num_threads field since we only reshuffle where the
// parallelism is extracted, not the total amount of parallelism.
if ( l3_op == BLIS_TRMM )
{
// We reconfigure the parallelism extracted from trmm_r due to a
// dependency in the jc loop. (NOTE: This dependency does not exist
// for trmm3.)
if ( bli_is_left( side ) )
{
bli_rntm_set_ways_only
(
jc,
pc,
ic,
jr,
ir,
rntm
);
}
else // if ( bli_is_right( side ) )
{
bli_rntm_set_ways_only
(
1,
pc,
ic,
jr * jc,
ir,
rntm
);
}
}
else if ( l3_op == BLIS_TRSM )
{
// For trsm_l, we extract all parallelism from the jr loop, and
// for trsm_r, we extract all parallelism from the ic loop.
if ( bli_is_left( side ) )
{
bli_rntm_set_ways_only
(
1,
1,
1,
ic * pc * jc * jr * ir,
1,
rntm
);
}
else // if ( bli_is_right( side ) )
{
bli_rntm_set_ways_only
(
1,
1,
ic * pc * jc * ir * jr,
1,
1,
rntm
);
}
}
}
}
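The comments in bli_rntm_set_ways_for_op() note that the trmm/trsm adjustments only move parallelism between loops and never change the total thread count. A minimal sketch of that invariant follows; ways5_t, the EX_* enums, and the function names are hypothetical and only mirror the argument order (jc, pc, ic, jr, ir) used in the diff.

```c
#include <assert.h>

/* Hypothetical containers; not the real rntm_t, opid_t, or side_t. */
typedef struct { long jc, pc, ic, jr, ir; } ways5_t;
typedef enum { EX_GEMM, EX_TRMM, EX_TRSM } ex_op_t;
typedef enum { EX_LEFT, EX_RIGHT } ex_side_t;

long total_ways( ways5_t w )
{
    return w.jc * w.pc * w.ic * w.jr * w.ir;
}

/* Redistribute the ways for trmm/trsm while preserving their product. */
ways5_t reshuffle_ways( ex_op_t op, ex_side_t side, ways5_t w )
{
    const long total = total_ways( w );
    ways5_t    out   = w;

    if ( op == EX_TRMM && side == EX_RIGHT )
    {
        /* Fold the jc parallelism into the jr loop. */
        out.jc = 1;
        out.jr = w.jr * w.jc;
    }
    else if ( op == EX_TRSM )
    {
        /* trsm_l: everything in jr; trsm_r: everything in ic. */
        out.jc = out.pc = out.ic = out.jr = out.ir = 1;
        if ( side == EX_LEFT ) out.jr = total;
        else                   out.ic = total;
    }
    /* trmm_l and all other operations keep the ways unchanged. */

    /* The reshuffle never changes the total amount of parallelism. */
    assert( total_ways( out ) == total );
    return out;
}
```

For example, with (jc, pc, ic, jr, ir) = (2, 1, 3, 2, 1), the right-side trmm case yields jc = 1 and jr = 4, while left-side trsm places all 12 ways in jr.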
void bli_rntm_set_ways_from_rntm
(
dim_t m,
dim_t n,
dim_t k,
rntm_t* rntm
)
{
dim_t nt = bli_rntm_num_threads( rntm );
dim_t jc = bli_rntm_jc_ways( rntm );
dim_t pc = bli_rntm_pc_ways( rntm );
dim_t ic = bli_rntm_ic_ways( rntm );
dim_t jr = bli_rntm_jr_ways( rntm );
dim_t ir = bli_rntm_ir_ways( rntm );
bool_t nt_set = FALSE;
bool_t ways_set = FALSE;
#ifdef BLIS_ENABLE_MULTITHREADING
// If the rntm was fed in as a copy of the global runtime via
// bli_thread_init_rntm(), we know that either the num_threads
// field will be set and all of the ways unset, or vice versa.
// However, we can't be sure that a user-provided rntm_t isn't
// initialized uncleanly. So here we have to enforce some rules
// to get the rntm_t into a predictable state.
// First, we establish whether or not the number of threads is set.
if ( nt > 0 ) nt_set = TRUE;
// Next, we establish whether or not any of the ways of parallelism
// for each loop were set. If any of the ways are set (positive), then
// we assume the user wanted to use those positive values and default
// the non-positive values to 1.
if ( jc > 0 || pc > 0 || ic > 0 || jr > 0 || ir > 0 )
{
ways_set = TRUE;
if ( jc < 1 ) jc = 1;
if ( pc < 1 ) pc = 1;
if ( ic < 1 ) ic = 1;
if ( jr < 1 ) jr = 1;
if ( ir < 1 ) ir = 1;
}
// Now we use the values of nt_set and ways_set to determine how to
// interpret the original values we found in the rntm_t object.
if ( ways_set == TRUE )
{
// If the ways were set, then we use the values that were given
// and interpreted above (we set any non-positive value to 1).
// The only thing left to do is calculate the correct number of
// threads.
nt = jc * pc * ic * jr * ir;
}
else if ( ways_set == FALSE && nt_set == TRUE )
{
// If the ways were not set but the number of threads was set, then
// we attempt to automatically generate a thread factorization that
// will work given the problem size. Thus, here we only set the
// ways and leave the number of threads unchanged.
pc = 1;
bli_partition_2x2( nt, m*BLIS_DEFAULT_M_THREAD_RATIO,
n*BLIS_DEFAULT_N_THREAD_RATIO, &ic, &jc );
for ( ir = BLIS_DEFAULT_MR_THREAD_MAX ; ir > 1 ; ir-- )
{
if ( ic % ir == 0 ) { ic /= ir; break; }
}
for ( jr = BLIS_DEFAULT_NR_THREAD_MAX ; jr > 1 ; jr-- )
{
if ( jc % jr == 0 ) { jc /= jr; break; }
}
}
else // if ( ways_set == FALSE && nt_set == FALSE )
{
// If neither the ways nor the number of threads were set, then
// the rntm was not meaningfully changed since initialization,
// and thus we'll default to single-threaded execution.
nt = 1;
jc = pc = ic = jr = ir = 1;
}
#else
// When multithreading is disabled, always set the rntm_t ways
// values to 1.
nt = 1;
jc = pc = ic = jr = ir = 1;
#endif
// Save the results back in the runtime object.
bli_rntm_set_num_threads_only( nt, rntm );
bli_rntm_set_ways_only( jc, pc, ic, jr, ir, rntm );
}
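The three cases handled by bli_rntm_set_ways_from_rntm() (explicit ways, thread count only, nothing set) can be sketched in isolation as below. This is an illustration under stated assumptions: rt_sketch_t and both functions are hypothetical, unset fields are represented here by -1, and the call to bli_partition_2x2() is replaced by a trivial stand-in that assigns every thread to the ic loop before peeling off an ir factor. The real code factorizes based on the problem dimensions m and n.

```c
#include <assert.h>

/* Hypothetical mirror of the fields read from rntm_t; -1 means "unset". */
typedef struct { long nt, jc, pc, ic, jr, ir; } rt_sketch_t;

/* Move the largest divisor of *outer in (1, max_inner] into the returned
   inner count. Returns 1 and leaves *outer alone if no such divisor exists. */
long peel_factor( long* outer, long max_inner )
{
    long inner;

    for ( inner = max_inner; inner > 1; inner-- )
        if ( *outer % inner == 0 ) { *outer /= inner; break; }

    return ( inner < 1 ? 1 : inner );
}

/* Bring the runtime into a predictable state, following the same
   precedence as the diff: explicit ways win over a thread count. */
void normalize_sketch( rt_sketch_t* r, long mr_max )
{
    const int nt_set   = ( r->nt > 0 );
    const int ways_set = ( r->jc > 0 || r->pc > 0 || r->ic > 0 ||
                           r->jr > 0 || r->ir > 0 );

    if ( ways_set )
    {
        /* Default any unset ways to 1, then derive the thread count. */
        if ( r->jc < 1 ) r->jc = 1;
        if ( r->pc < 1 ) r->pc = 1;
        if ( r->ic < 1 ) r->ic = 1;
        if ( r->jr < 1 ) r->jr = 1;
        if ( r->ir < 1 ) r->ir = 1;
        r->nt = r->jc * r->pc * r->ic * r->jr * r->ir;
    }
    else if ( nt_set )
    {
        /* Stand-in factorization (NOT bli_partition_2x2): put all
           threads in ic, then peel an ir factor off of it. */
        r->jc = r->pc = r->jr = 1;
        r->ic = r->nt;
        r->ir = peel_factor( &r->ic, mr_max );
    }
    else
    {
        /* Nothing was requested: single-threaded execution. */
        r->nt = r->jc = r->pc = r->ic = r->jr = r->ir = 1;
    }
}
```

Note that when both a thread count and at least one way are supplied, the thread count is overwritten by the product of the ways, which is the behavior of the first branch in the diff.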
void bli_rntm_print
(
rntm_t* rntm
)
{
dim_t nt = bli_rntm_num_threads( rntm );
dim_t jc = bli_rntm_jc_ways( rntm );
dim_t pc = bli_rntm_pc_ways( rntm );
dim_t ic = bli_rntm_ic_ways( rntm );
dim_t jr = bli_rntm_jr_ways( rntm );
dim_t ir = bli_rntm_ir_ways( rntm );
printf( "rntm contents nt jc pc ic jr ir\n" );
printf( " %4d%4d%4d%4d%4d%4d\n", (int)nt, (int)jc, (int)pc,
(int)ic, (int)jr, (int)ir );
}

Some files were not shown because too many files have changed in this diff.