mirror of
https://github.com/amd/blis.git
synced 2026-04-19 23:28:52 +00:00
Defined rntm_t to relocate cntx_t.thrloop (#235).
Details:
- Defined a new struct datatype, rntm_t (runtime), to house the thrloop field of the cntx_t (context). The thrloop array holds the number of ways of parallelism (thread "splits") to extract per level-3 algorithmic loop until those values can be used to create a corresponding node in the thread control tree (thrinfo_t structure), which (for any given level-3 invocation) usually happens by the time the macrokernel is called for the first time.
- Relocating the thrloop from the cntx_t remedies a thread-safety issue when invoking level-3 operations from two or more application threads. The race condition existed because the cntx_t, a pointer to which is usually queried from the global kernel structure (gks), is supposed to be read-only. However, the previous code would write to the cntx_t's thrloop field *after* it had been queried, thus violating its read-only status. In practice, this would not cause a problem when a sequential application made a multithreaded call to BLIS, nor when two or more application threads used the same parallelization scheme when calling BLIS, because in either case all application threads would be using the same ways of parallelism for each loop. The true effects of the race condition were limited to situations where two or more application threads used *different* parallelization schemes for any given level-3 call.
- In remedying the above race condition, the application or calling library can now specify the parallelization scheme on a per-call basis. All that is required is that the thread encode its request for parallelism into the rntm_t struct prior to passing the address of the rntm_t to one of the expert interfaces of either the typed or object APIs. This allows, for example, one application thread to extract 4-way parallelism from a call to gemm while another application thread requests 2-way parallelism. Or, two threads could each request 4-way parallelism, but from different loops.
- A rntm_t* parameter has been added to the function signatures of most of the level-3 implementation stack (with the most notable exception being packm) as well as all level-1v, -1d, -1f, -1m, and -2 expert APIs. (A few internal functions gained the rntm_t* parameter even though they currently have no use for it, such as bli_l3_packm().) This required some internal calls to some of those functions to be updated since BLIS was already using those operations internally via the expert interfaces. For situations where a rntm_t object is not available, such as within packm/unpackm implementations, NULL is passed in to the relevant expert interfaces. This is acceptable for now since parallelism is not obtained for non-level-3 operations.
- Revamped how global parallelism is encoded. First, the conventional environment variables such as BLIS_NUM_THREADS and BLIS_*_NT are only read once, at library initialization. (Thanks to Nathaniel Smith for suggesting this to avoid repeated calls to getenv(), which can be slow.) Those values are recorded in a global rntm_t object. Public APIs, in bli_thread.c, are still available to get/set these values from the global rntm_t, though now the "set" functions have additional logic to ensure that the values are set in a synchronous manner via a mutex. If/when NULL is passed into an expert API (meaning the user opted to not provide a custom rntm_t), the values from the global rntm_t are copied to a local rntm_t, which is then passed down the function stack. Calling a basic API is equivalent to calling the expert APIs with NULL for the cntx and rntm parameters, which means the semantic behavior of these basic APIs (vis-a-vis multithreading) is unchanged from before.
- Renamed bli_cntx_set_thrloop_from_env() to bli_rntm_set_ways_for_op() and reimplemented it, with the function now being able to treat the incoming rntm_t in a manner agnostic to its origin--whether it came from the application or is an internal copy of the global rntm_t.
- Removed various global runtime APIs for setting the number of ways of parallelism for individual loops (e.g. bli_thread_set_*_nt()) as well as the corresponding "get" functions. The new model simplifies these interfaces so that one must either set the total number of threads, OR set all of the ways of parallelism for each loop simultaneously (in a single function call).
- Updated sandbox/ref99 according to above changes.
- Rewrote/augmented docs/Multithreading.md to document the three methods (and two specific ways within each method) of requesting parallelism in BLIS.
- Removed old, disabled code from bli_l3_thrinfo.c.
- Whitespace changes to code (e.g. bli_obj.c) and docs/BuildSystem.md.
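The thread-safety argument in the commit message can be illustrated with a small, self-contained C model. This is a sketch only. `toy_rntm_t`, `toy_set_global_num_threads()` and `toy_resolve_total_threads()` are hypothetical names, not BLIS functions, and the real `rntm_t` layout differs. The model shows two properties the message describes. First, the global settings are written only under a mutex. Second, a caller-supplied per-call object is read but never written back to shared state, so two application threads can request different parallelism at the same time. A `NULL` pointer falls back to a private copy of the global settings, mirroring how the expert APIs treat `NULL`. Giving a manual (per-loop) request precedence over an automatic one is an assumption of this sketch.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for rntm_t; NOT the real BLIS layout.
   A value <= 0 means "unset". */
typedef struct
{
    long num_threads;  /* automatic request: total number of threads */
    long ways[ 5 ];    /* manual request: jc, pc, ic, jr, ir          */
} toy_rntm_t;

/* Global settings, analogous to values read from the environment once at
   initialization. All access goes through a mutex. */
static toy_rntm_t      toy_global_rntm = { 1, { 0, 0, 0, 0, 0 } };
static pthread_mutex_t toy_global_lock = PTHREAD_MUTEX_INITIALIZER;

/* Mutex-guarded "set" for the global settings. */
void toy_set_global_num_threads( long n )
{
    pthread_mutex_lock( &toy_global_lock );
    toy_global_rntm.num_threads = n;
    for ( int i = 0; i < 5; ++i ) toy_global_rntm.ways[ i ] = 0;
    pthread_mutex_unlock( &toy_global_lock );
}

/* Resolve the total thread count for one call. A caller-supplied object is
   only read, never written; NULL means "use a private copy of the globals". */
long toy_resolve_total_threads( const toy_rntm_t* rntm )
{
    toy_rntm_t local;

    if ( rntm == NULL )
    {
        pthread_mutex_lock( &toy_global_lock );
        local = toy_global_rntm;
        pthread_mutex_unlock( &toy_global_lock );
    }
    else
    {
        local = *rntm;
    }

    /* Manual request (given precedence in this sketch): unset loops count
       as 1, and pc (index 1) is skipped because it is currently ignored. */
    int  any_set = 0;
    long product = 1;
    for ( int i = 0; i < 5; ++i )
    {
        if ( i == 1 || local.ways[ i ] <= 0 ) continue;
        any_set  = 1;
        product *= local.ways[ i ];
    }
    if ( any_set ) return product;

    /* Automatic request; if nothing is set, run single-threaded. */
    return local.num_threads > 0 ? local.num_threads : 1;
}
```

In this model, a thread holding `{ 0, { 2, 1, 4, 1, 1 } }` gets 8 threads no matter what another thread passes to `toy_set_global_num_threads()`. That independence is the property the relocation of `thrloop` out of the shared `cntx_t` is meant to provide.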
@@ -163,15 +163,15 @@ configure: configured to build within top-level directory of source distribution
 ```
 The installation prefix can be specified via the `--prefix=PREFIX` option:
 ```
-$ ./configure --prefix=/usr <configname>
+$ ./configure --prefix=/usr <configname>
 ```
 This will cause libraries to eventually be installed (via `make install`) to `PREFIX/lib` and development headers to be installed to `PREFIX/include`. (The default value of `PREFIX` is `$(HOME)/blis`.) You can also specify the library install directory separately from the development header install directory with the `--libdir=LIBDIR` and `--includedir=INCDIR` options, respectively:
 ```
-$ ./configure --libdir=/usr/lib --includedir=/usr/include <configname>
+$ ./configure --libdir=/usr/lib --includedir=/usr/include <configname>
 ```
 The `--libdir=LIBDIR` and `--includedir=INCDIR` options will override any `PREFIX` path, whether it was specified explicitly via `--prefix` or implicitly (via the default). That is, `LIBDIR` defaults to `PREFIX/lib` and `INCDIR` defaults to `PREFIX/include`, but each will be overridden by its respective `--libdir`/`--includedir` option. So,
 ```
-$ ./configure --libdir=/usr/lib <configname>
+$ ./configure --libdir=/usr/lib <configname>

 ```
 will configure BLIS to install libraries to `/usr/lib` and header files to the default location (`$HOME/blis/include`).
@@ -179,7 +179,7 @@ Also, note that `configure` will create any installation directories that do not

 For a complete list of supported `configure` options and arguments, run `configure` with the `-h` option:
 ```
-$ ./configure -h
+$ ./configure -h
 ```
 The output from this invocation of `configure` should give you an up-to-date list of options and their descriptions.
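The precedence rule for `--libdir` and `--includedir` described in the BuildSystem hunk can be modeled in a few lines of C. This only illustrates the documented behavior: an explicit directory wins, otherwise the directory is `PREFIX/lib` or `PREFIX/include`, with `PREFIX` defaulting to `$HOME/blis`. `configure` itself is a shell script, and `resolve_install_dir()` is a hypothetical helper, not part of BLIS.

```c
#include <stdio.h>
#include <stdlib.h>

/* Write "<base>" or "<base>/<subdir>" into out.
   Return 0 on success, -1 on an encoding error or truncation. */
static int format_dir( char* out, size_t out_size,
                       const char* base, const char* subdir )
{
    int n = ( subdir != NULL )
          ? snprintf( out, out_size, "%s/%s", base, subdir )
          : snprintf( out, out_size, "%s", base );
    return ( n < 0 || (size_t)n >= out_size ) ? -1 : 0;
}

/* Hypothetical model of the documented precedence:
   prefix       : value of --prefix, or NULL if not given
   explicit_dir : value of --libdir / --includedir, or NULL if not given
   subdir       : "lib" or "include" */
int resolve_install_dir( const char* prefix, const char* explicit_dir,
                         const char* subdir, char* out, size_t out_size )
{
    /* 1. An explicit --libdir/--includedir always wins. */
    if ( explicit_dir != NULL )
        return format_dir( out, out_size, explicit_dir, NULL );

    /* 2. Otherwise derive the directory from an explicit --prefix. */
    if ( prefix != NULL )
        return format_dir( out, out_size, prefix, subdir );

    /* 3. Otherwise fall back to the default PREFIX, $HOME/blis. */
    const char* home = getenv( "HOME" );
    if ( home == NULL ) return -1;

    char default_prefix[ 4096 ];
    if ( format_dir( default_prefix, sizeof default_prefix, home, "blis" ) != 0 )
        return -1;
    return format_dir( out, out_size, default_prefix, subdir );
}
```

With `--libdir=/usr/lib` and no `--prefix`, this model resolves libraries to `/usr/lib` and headers to `$HOME/blis/include`, which matches the `configure` example in that hunk.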
@@ -1,64 +1,83 @@
-## Contents
+# Contents

 * **[Contents](Multithreading.md#contents)**
 * **[Introduction](Multithreading.md#introduction)**
 * **[Enabling multithreading](Multithreading.md#enabling-multithreading)**
 * **[Specifying multithreading](Multithreading.md#specifying-multithreading)**
-  * [The automatic way](Multithreading.md#the-automatic-way)
-  * [The manual way](Multithreading.md#the-manual-way)
+  * [Globally via environment variables](Multithreading.md#globally-via-environment-variables)
+    * [The automatic way](Multithreading.md#environment-variables-the-automatic-way)
+    * [The manual way](Multithreading.md#environment-variables-the-manual-way)
+  * [Globally at runtime](Multithreading.md#globally-at-runtime)
+    * [The automatic way](Multithreading.md#globally-at-runtime-the-automatic-way)
+    * [The manual way](Multithreading.md#globally-at-runtime-the-manual-way)
+  * [Locally at runtime](Multithreading.md#locally-at-runtime)
+    * [Initializing a rntm_t](Multithreading.md#initializing-a-rntm_t)
+    * [The automatic way](Multithreading.md#locally-at-runtime-the-automatic-way)
+    * [The manual way](Multithreading.md#locally-at-runtime-the-manual-way)
+    * [Using the expert interfaces](Multithreading.md#locally-at-runtime-using-the-expert-interfaces)

-## Introduction

-Our paper [Anatomy of High-Performance Many-Threaded Matrix Multiplication](https://github.com/flame/blis#citations), presented at IPDPS'14, identified 5 loops around the micro-kernel as opportunities for parallelization. Within BLIS, we have enabled parallelism for 4 of those loops and have extended it to the rest of the level-3 operations except for `trsm`.
+# Introduction

-## Enabling multithreading
+Our paper [Anatomy of High-Performance Many-Threaded Matrix Multiplication](https://github.com/flame/blis#citations), presented at IPDPS'14, identified 5 loops around the micro-kernel as opportunities for parallelization within level-3 operations such as `gemm`. Within BLIS, we have enabled parallelism for 4 of those loops and have extended it to the rest of the level-3 operations except for `trsm`.

-Note that BLIS disables multithreading by default.
+# Enabling multithreading

+Note that BLIS disables multithreading by default. In order to extract multithreaded parallelism from BLIS, you must first enable multithreading explicitly at configure-time.

 As of this writing, BLIS optionally supports multithreading via either OpenMP or POSIX threads.

 To enable multithreading via OpenMP, you must provide the `--enable-threading` option to the `configure` script:
 ```
-$ ./configure --enable-threading=openmp haswell
+$ ./configure --enable-threading=openmp auto
 ```
 In this example, we configure for the `haswell` configuration. Similarly, to enable multithreading via POSIX threads (pthreads), specify the threading model as `pthreads` instead of `openmp`:
 ```
-$ ./configure --enable-threading=pthreads haswell
+$ ./configure --enable-threading=pthreads auto
 ```
 You can also use the shorthand option for `--enable-threading`, which is `-t`:
 ```
 $ ./configure -t pthreads
 ```
 For more complete and up-to-date information on the `--enable-threading` option, simply run `configure` with the `--help` (or `-h`) option:
 ```
-$ ./configure --help
+$ ./configure --help
 ```


-## Specifying multithreading
+# Specifying multithreading

-There are two broad ways to specify multithreading in BLIS: the "automatic way" or the "manual way".
+There are three broad methods of specifying multithreading in BLIS:
+* [Globally via environment variables](Multithreading.md#globally-via-environment-variables)
+* [Globally at runtime](Multithreading.md#globally-at-runtime)
+* [Locally at runtime](Multithreading.md#locally-at-runtime) (that is, on a per-call, thread-safe basis)

-### The automatic way
+Within these three broad methods there are two specific ways of expressing a request for parallelism. First, the user may express a single number--the total number of threads, or ways of parallelism, to use within a single operation such as `gemm`. We call this the "automatic" way. Alternatively, the user may express the number of ways of parallelism to obtain within *each loop* of the level-3 operation. We call this the "manual" way. The latter way is actually what BLIS eventually needs before it can perform its multithreading; the former is viable only because we have a heuristic for determining a reasonable instance of the latter when given the former.
+This pattern--automatic or manual--holds regardless of which of the three methods is used.

-The simplest way to enable multithreading in BLIS is to simply set the total number of threads you wish BLIS to employ in its parallelization. This total number of threads is captured by the `BLIS_NUM_THREADS` environment variable. You can set this variable prior to executing your BLIS-linked executable:
+Regardless of which method is employed, and which specific way within each method, after setting the number of threads, the application may simply call the desired level-3 operation via either the BLAS, the [typed API](docs/BLISTypedAPI.md), or the [object API](docs/BLISObjectAPI.md), and the operation will execute in a multithreaded manner.

+## Globally via environment variables

+The most common method of specifying multithreading in BLIS is globally via environment variables. With this method, the user sets one or more environment variables in the shell before launching the BLIS-linked executable.

+Regardless of whether you end up using the automatic or manual way of expressing a request for multithreading, note that the environment variables are read (via `getenv()`) by BLIS **only once**, when the library is initialized. Subsequent to library initialization, the global settings for parallelization may only be changed via the [global runtime API](Multithreading.md#globally-at-runtime). If this constraint is not a problem, then environment variables may work fine for you.

+### Environment variables: the automatic way

+The automatic way of specifying parallelism entails simply setting the total number of threads you wish BLIS to employ in its parallelization. This total number of threads is captured by the `BLIS_NUM_THREADS` environment variable. You can set this variable prior to executing your BLIS-linked executable:
 ```
-$ export BLIS_NUM_THREADS=16
-$ ./my_blis_program
-```
-This causes BLIS to automatically determine a reasonable threading strategy based on what is known about your architecture. If `BLIS_NUM_THREADS` is not set, then BLIS also looks at the value of `OMP_NUM_THREADS`, if set. If neither variable is set, the default number of threads is 1.

-Alternatively, any time after calling `bli_init()` but before `bli_finalize()`, you can also set (or change) the value of `BLIS_NUM_THREADS` at run-time:
-```
-bli_thread_set_num_threads( 8 );
-```
-Similarly, the current value of `BLIS_NUM_THREADS` can always be queried as follows:
-```
-dim_t num_threads = bli_thread_get_num_threads();
+$ export GOMP_CPU_AFFINITY="..."  # optional step when using GNU libgomp.
+$ export BLIS_NUM_THREADS=16
+$ ./my_blis_program
 ```
+This causes BLIS to automatically determine a reasonable threading strategy based on what is known about the operation and problem size. If `BLIS_NUM_THREADS` is not set, then BLIS also looks at the value of `OMP_NUM_THREADS`, if set. If neither variable is set, the default number of threads is 1.

-### The manual way
+### Environment variables: the manual way

-The "manual way" of specifying parallelism in BLIS involves specifying which loops within the matrix multiplication algorithm to parallelize, and the degree of parallelism to be obtained from those loops.
+The manual way of specifying parallelism involves communicating which loops within the matrix multiplication algorithm to parallelize and the degree of parallelism to be obtained from each of those loops.

-The below chart describes the five loops used in BLIS's matrix multiplication operations.
+The below chart describes the five loops used in BLIS's matrix multiplication operations.

 | Loop around micro-kernel | Environment variable | Direction | Notes |
 |:-------------------------|:---------------------|:----------|:------------|
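The environment-variable rules in the hunk above can be sketched as follows: read `BLIS_NUM_THREADS`, fall back to `OMP_NUM_THREADS`, default to 1, and read only once at initialization. The names `toy_pick_num_threads()` and `toy_num_threads()` are hypothetical. BLIS's own initialization code is not part of this commit, so using `pthread_once()` for the read-once behavior is an assumption of the sketch. So is treating a malformed value as "unset". The loop-specific `BLIS_xx_NT` variables are omitted here.

```c
#include <pthread.h>
#include <stdlib.h>

/* Parse a strictly positive decimal integer; return 0 if s is NULL, empty,
   or not a positive integer. */
static long toy_parse_positive( const char* s )
{
    if ( s == NULL || *s == '\0' ) return 0;

    char* end;
    long  v = strtol( s, &end, 10 );
    return ( *end == '\0' && v > 0 ) ? v : 0;
}

/* Documented precedence: BLIS_NUM_THREADS, then OMP_NUM_THREADS, then 1. */
long toy_pick_num_threads( const char* blis_nt, const char* omp_nt )
{
    long nt = toy_parse_positive( blis_nt );
    if ( nt == 0 ) nt = toy_parse_positive( omp_nt );
    return ( nt > 0 ) ? nt : 1;
}

static long           toy_nt   = 1;
static pthread_once_t toy_once = PTHREAD_ONCE_INIT;

/* Runs exactly once, however many threads race to call toy_num_threads(). */
static void toy_read_env_once( void )
{
    toy_nt = toy_pick_num_threads( getenv( "BLIS_NUM_THREADS" ),
                                   getenv( "OMP_NUM_THREADS" ) );
}

/* Later changes to the environment are deliberately not observed. */
long toy_num_threads( void )
{
    pthread_once( &toy_once, toy_read_env_once );
    return toy_nt;
}
```

The precedence logic is a pure function of two strings, so it can be tested without modifying the process environment. Only `toy_read_env_once()` touches `getenv()`, which matches the "read once" rule.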
@@ -68,9 +87,11 @@ The below chart describes the five loops used in BLIS's matrix multiplication op
 | 2nd loop | `BLIS_JR_NT` | `n` | |
 | 1st loop | `BLIS_IR_NT` | `m` | |

-Note: Parallelization of the 4th loop is not currently enabled because each iteration of the loop updates the same part of the matrix C. Thus to parallelize it requires either a reduction or mutex locks when updating C.
+**Note**: Parallelization of the 4th loop is not currently enabled because each iteration of the loop updates the same part of the output matrix C. Thus, to safely parallelize it requires either a reduction or mutex locks when updating C.

-Parallelization in BLIS is hierarchical. So if we parallelize multiple loops, the total number of threads will be the product of the amount of parallelism for each loop. Thus the total number of threads used is `BLIS_IR_NT * BLIS_JR_NT * BLIS_IC_NT * BLIS_JC_NT`.
+Parallelization in BLIS is hierarchical. So if we parallelize multiple loops, the total number of threads will be the product of the amount of parallelism for each loop. Thus the total number of threads used is the product of all the values:
+`BLIS_JC_NT * BLIS_IC_NT * BLIS_JR_NT * BLIS_IR_NT`.
+Note that if you set at least one of these loop-specific variables, any others that are unset will default to 1.

 In general, the way to choose how to set these environment variables is as follows: The amount of parallelism from the M and N dimensions should be roughly the same. Thus `BLIS_IR_NT * BLIS_IC_NT` should be roughly equal to `BLIS_JR_NT * BLIS_JC_NT`.

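Two rules in the hunk above lend themselves to a quick check. First, the total thread count is the product of the per-loop values, with unset loop variables defaulting to 1. Second, M-dimension parallelism (`BLIS_IC_NT * BLIS_IR_NT`) should roughly equal N-dimension parallelism (`BLIS_JC_NT * BLIS_JR_NT`). The sketch below is illustrative only. `toy_ways_t`, `toy_total_threads()` and `toy_balanced_split()` are hypothetical, and the balancing routine is a simple divisor search, not the heuristic BLIS uses for its automatic way.

```c
/* Hypothetical per-loop request; 0 means "unset". */
typedef struct { long jc, ic, jr, ir; } toy_ways_t;

static long toy_or_one( long v ) { return v > 0 ? v : 1; }

/* Total threads = BLIS_JC_NT * BLIS_IC_NT * BLIS_JR_NT * BLIS_IR_NT,
   treating any unset loop as 1. */
long toy_total_threads( toy_ways_t w )
{
    return toy_or_one( w.jc ) * toy_or_one( w.ic )
         * toy_or_one( w.jr ) * toy_or_one( w.ir );
}

/* Split nt threads into m-dimension ways (ic * ir) and n-dimension ways
   (jc * jr) that are as close as possible: choose the largest divisor of nt
   not exceeding sqrt(nt). When nt is not a perfect square, the n side gets
   the larger factor. */
void toy_balanced_split( long nt, long* m_ways, long* n_ways )
{
    long best = 1;
    for ( long d = 1; d * d <= nt; ++d )
        if ( nt % d == 0 ) best = d;

    *m_ways = best;
    *n_ways = nt / best;
}
```

For 16 threads this gives a 4 x 4 split and for 12 threads a 3 x 4 split. A prime count such as 7 cannot be balanced and degenerates to 1 x 7, which suggests why a reasonable thread count for the manual way is usually one with several small factors.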
@@ -81,18 +102,123 @@ Next, which combinations of loops to parallelize depends on which caches are sha



-As with specifying parallelism via `BLIS_NUM_THREADS`, you can set the `BLIS_xx_NT` environment variables in the shell, prior to launching your BLIS-linked executable, or you can set (or update) the environment variables at run-time. Here are some examples of using the run-time API:
+## Globally at runtime

+If you still wish to set the parallelization scheme globally, but you want to do so at runtime, BLIS provides a thread-safe API for specifying multithreading. Think of these functions as a way to modify the same internal data structure into which the environment variables are read. (Recall that the environment variables are only read once, when BLIS is initialized.)
+
+### Globally at runtime: the automatic way
+
+If you simply want to specify an overall number of threads and let BLIS choose a thread factorization automatically, use the following function:
 ```c
-bli_thread_set_jc_nt( 2 ); // Set BLIS_JC_NT to 2.
-bli_thread_set_ic_nt( 4 ); // Set BLIS_IC_NT to 4.
-bli_thread_set_jr_nt( 3 ); // Set BLIS_JR_NT to 3.
-bli_thread_set_ir_nt( 1 ); // Set BLIS_IR_NT to 1.
+void bli_thread_set_num_threads( dim_t n_threads );
 ```
-There are also equivalent "get" functions that allow you to query the current values for the `BLIS_xx_NT` variables:
+This function takes one integer--the total number of threads for BLIS to utilize in any one operation. So, for example, if we call
 ```c
-dim_t jc_nt = bli_thread_get_jc_nt();
-dim_t ic_nt = bli_thread_get_ic_nt();
-dim_t jr_nt = bli_thread_get_jr_nt();
-dim_t ir_nt = bli_thread_get_ir_nt();
+bli_thread_set_num_threads( 4 );
 ```
+we are requesting that the global number of threads be set to 4. You may also query the global number of threads at any time via
+```c
+dim_t bli_thread_get_num_threads( void );
+```
+This function may be called in the usual way:
+```c
+dim_t nt = bli_thread_get_num_threads();
+```
+
+### Globally at runtime: the manual way
+
+If you want to specify the number of ways of parallelism to obtain for each loop, use the following function:
+```c
+void bli_thread_set_ways( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir );
+```
+This function takes one integer for each loop in the level-3 operations. (**Note**: even though the function takes a `pc` argument, it will be ignored until parallelism is supported in the `KC` loop.)
+So, for example, if we call
+```c
+bli_thread_set_ways( 2, 1, 4, 1, 1 );
+```
+we are requesting two ways of parallelism in the `JC` loop and four ways of parallelism in the `IC` loop.
+Unlike environment variables, which only allow the user to set the parallelization strategy prior to running the executable, `bli_thread_set_ways()` may be called any time during the normal course of the BLIS-linked application's execution.
+
+## Locally at runtime
+
+In addition to the global methods based on environment variables and runtime function calls, BLIS also provides a local, *per-call* method of requesting parallelism at runtime. This method has the benefit of being thread-safe and flexible; your application can spawn two threads, with each thread requesting different degrees of parallelism from their respective calls to level-3 operations.
+
+As with environment variables and the global runtime API, there are two ways to specify parallelism: the automatic way and the manual way. Both ways involve allocating a BLIS-specific object, initializing the object and encoding the desired parallelization, and then passing a pointer to the object into one of the expert interfaces of either the [typed](docs/BLISTypedAPI.md) or [object](docs/BLISObjectAPI.md) APIs. We provide examples of utilizing this threading object below.
+
+### Initializing a rntm_t
+
+Before specifying the parallelism (automatically or manually), you must first allocate a special BLIS object called a `rntm_t` (runtime). The object is quite small (about 64 bytes), and so we recommend declaring it as a local variable on the function stack:
+```c
+rntm_t rntm;
+```
+We **strongly recommend** initializing the `rntm_t`. This can be done in either of two ways.
+If you want to also initialize it as part of the declaration, you may do so via the default `BLIS_RNTM_INITIALIZER` macro:
+```c
+rntm_t rntm = BLIS_RNTM_INITIALIZER;
+```
+Alternatively, you can perform the same initialization by passing the address of the `rntm_t` to an initialization function:
+```c
+bli_rntm_init( &rntm );
+```
+As of this writing, BLIS treats a default-initialized `rntm_t` as a request for single-threaded execution.
+
+**Note**: If you choose to **not** initialize the `rntm_t` object, you **must** set its parallelism via either the automatic way or the manual way, described below. Passing a completely uninitialized `rntm_t` to a level-3 operation **will almost surely result in undefined behavior!**
+
+### Locally at runtime: the automatic way
+
+Once your `rntm_t` is initialized, you may request automatic parallelization by encoding only the total number of threads into the `rntm_t` via the following function:
+```c
+void bli_rntm_set_num_threads( dim_t n_threads, rntm_t* rntm );
+```
+As with `bli_thread_set_num_threads()` [discussed previously](Multithreading.md#globally-at-runtime-the-automatic-way), this function takes a single integer. It also takes the address of the `rntm_t` to modify. So, for example, if (after declaring and initializing a `rntm_t` as discussed above) we call
+```c
+bli_rntm_set_num_threads( 6, &rntm );
+```
+the `rntm_t` object will be encoded to use a total of 6 threads.
+
+### Locally at runtime: the manual way
+
+Once your `rntm_t` is initialized, you may manually encode the ways of parallelism for each loop into the `rntm_t` by using the following function:
+```c
+void bli_rntm_set_ways( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir, rntm_t* rntm );
+```
+As with `bli_thread_set_ways()` [discussed previously](Multithreading.md#globally-at-runtime-the-manual-way), this function takes one integer for each loop in the level-3 operations. It also takes the address of the `rntm_t` to modify.
+(**Note**: even though the function takes a `pc` argument, it will be ignored until parallelism is supported in the `KC` loop.)
+So, for example, if we call
+```c
+bli_rntm_set_ways( 1, 1, 2, 3, 1, &rntm );
+```
+we are requesting two ways of parallelism in the `IC` loop and three ways of parallelism in the `JR` loop.
+
+### Locally at runtime: using the expert interfaces
+
+Regardless of whether you encoded parallelism into your `rntm_t` object via the automatic or manual way, eventually you must use the data structure when calling a BLIS operation.
+
+Let's assume you wish to call `gemm`. To do so, simply use the expert interface, which takes two additional arguments: a `cntx_t` (context) and a `rntm_t`. For the context, you may simply pass in `NULL` and BLIS will select a default context (which is exactly what happens when you call the basic/non-expert interfaces). Here is an example of such a call:
+```c
+bli_gemm_ex( &alpha, &a, &b, &beta, &c, NULL, &rntm );
+```
+This will cause `gemm` to execute and be parallelized in the manner encoded by `rntm`.
+
+To summarize, using a `rntm_t` involves three steps:
+```c
+// Declare and initialize a rntm_t object.
+rntm_t rntm = BLIS_RNTM_INITIALIZER;
+
+// Call ONE (not both) of the following to encode your parallelization into
+// the rntm_t. (These are examples only--use numbers that make sense for your
+// application!)
+bli_rntm_set_num_threads( 6, &rntm );
+bli_rntm_set_ways( 1, 1, 2, 3, 1, &rntm );
+
+// Finally, call BLIS via an expert interface and pass in your rntm_t.
+bli_gemm_ex( &alpha, &a, &b, &beta, &c, NULL, &rntm );
+```
+Note that `rntm_t` objects may be reused over and over again once they are initialized; there is no need to reinitialize them and re-encode their threading values!
+
+# Conclusion
+
+Please send us feedback if you have any concerns or questions, or [open an issue](http://github.com/flame/blis/issues) if you observe any reproducible behavior that you think is erroneous. (You are welcome to use the issue feature to start any non-trivial dialogue; we don't restrict them only to bug reports!)
+
+Thanks for your interest in BLIS.
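The main benefit of the per-call method is that concurrently running application threads can make different parallelism requests. That can be demonstrated without BLIS itself. In the sketch below, each pthread owns its own request object and passes a pointer to a stand-in for an expert interface. `toy_rntm_t`, `TOY_RNTM_INITIALIZER`, `toy_rntm_set_num_threads()`, `toy_rntm_set_ways()` and `toy_gemm_ex()` are hypothetical names that only mimic the shape of the documented `rntm_t` API; they are not BLIS functions. `toy_gemm_ex()` performs no computation and only reports the thread count it would use. Having each setter clear the other kind of request is a choice of this sketch, made to reflect the "call ONE (not both)" advice.

```c
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for rntm_t (0 means "unset"); NOT the BLIS type. */
typedef struct
{
    long num_threads;
    long jc, pc, ic, jr, ir;
} toy_rntm_t;

/* Default initialization: nothing requested, i.e. single-threaded. */
#define TOY_RNTM_INITIALIZER { 0, 0, 0, 0, 0, 0 }

/* In this sketch each setter clears the other kind of request. */
void toy_rntm_set_num_threads( long nt, toy_rntm_t* r )
{
    r->num_threads = nt;
    r->jc = r->pc = r->ic = r->jr = r->ir = 0;
}

void toy_rntm_set_ways( long jc, long pc, long ic, long jr, long ir,
                        toy_rntm_t* r )
{
    r->num_threads = 0;
    r->jc = jc; r->pc = pc; r->ic = ic; r->jr = jr; r->ir = ir;
}

static long toy_or_one( long v ) { return v > 0 ? v : 1; }

/* Stand-in for an expert interface: reports the thread count it would use.
   The pc value is ignored, as the documentation notes for the KC loop. */
long toy_gemm_ex( const toy_rntm_t* r )
{
    if ( r->num_threads > 0 ) return r->num_threads;
    return toy_or_one( r->jc ) * toy_or_one( r->ic )
         * toy_or_one( r->jr ) * toy_or_one( r->ir );
}

/* Each application thread owns its request object, so no locking is needed. */
typedef struct { toy_rntm_t rntm; long used; } toy_job_t;

static void* toy_worker( void* arg )
{
    toy_job_t* job = arg;
    job->used = toy_gemm_ex( &job->rntm );
    return NULL;
}

/* Run two application threads with different requests; 0 on success. */
int toy_run_two_threads( long* used_a, long* used_b )
{
    toy_job_t a = { TOY_RNTM_INITIALIZER, 0 };
    toy_job_t b = { TOY_RNTM_INITIALIZER, 0 };

    toy_rntm_set_num_threads( 4, &a.rntm );        /* automatic: 4 threads */
    toy_rntm_set_ways( 1, 1, 2, 3, 1, &b.rntm );   /* manual: 2 x 3 ways   */

    pthread_t ta, tb;
    if ( pthread_create( &ta, NULL, toy_worker, &a ) != 0 ) return -1;
    if ( pthread_create( &tb, NULL, toy_worker, &b ) != 0 )
    {
        pthread_join( ta, NULL );
        return -1;
    }
    pthread_join( ta, NULL );
    pthread_join( tb, NULL );

    *used_a = a.used;
    *used_b = b.used;
    return 0;
}
```

With BLIS itself, the corresponding calls are `bli_rntm_set_num_threads()`, `bli_rntm_set_ways()` and `bli_gemm_ex()`, as documented in the hunk above.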
@@ -67,7 +67,7 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x, y ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_7 \
+bli_call_ft_8 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -75,7 +75,8 @@ void PASTEMAC(opname,EX_SUF) \
 n, \
 buf_x, inc_x, \
 buf_y, inc_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -110,14 +111,15 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x, index ); \
 \
 /* Invoke the typed function. */ \
-bli_call_ft_5 \
+bli_call_ft_6 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
 n, \
 buf_x, incx, \
 buf_index, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -168,7 +170,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_9 \
+bli_call_ft_10 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -178,7 +180,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_x, inc_x, \
 buf_beta, \
 buf_y, inc_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -223,7 +226,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_8 \
+bli_call_ft_9 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -232,7 +235,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha, \
 buf_x, inc_x, \
 buf_y, inc_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -270,7 +274,7 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x, y, rho ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_9 \
+bli_call_ft_10 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -280,7 +284,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_x, inc_x, \
 buf_y, inc_y, \
 buf_rho, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -334,7 +339,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_11 \
+bli_call_ft_12 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -346,7 +351,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_y, inc_y, \
 buf_beta, \
 buf_rho, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -376,13 +382,14 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_4 \
+bli_call_ft_5 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
 n, \
 buf_x, inc_x, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -424,7 +431,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_6 \
+bli_call_ft_7 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -432,7 +439,8 @@ void PASTEMAC(opname,EX_SUF) \
 n, \
 buf_alpha, \
 buf_x, inc_x, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -466,14 +474,15 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x, y ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_6 \
+bli_call_ft_7 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
 n, \
 buf_x, inc_x, \
 buf_y, inc_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -518,7 +527,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_8 \
+bli_call_ft_9 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -527,7 +536,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_x, inc_x, \
 buf_beta, \
 buf_y, inc_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }
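The hunks above pass the new `rntm` argument through the object-API wrappers. That extra argument is why the index of each `bli_call_ft_N` invocation goes up by one. The sketch below shows the same pattern in miniature, using hypothetical names (`toy_rntm_t`, `toy_axpyv_ex()`, `toy_axpyv()`). The expert function accepts optional `cntx` and `rntm` pointers, and the basic function simply forwards `NULL` for both. This matches the commit message's statement that a basic call is equivalent to an expert call with `NULL` for those parameters. Per the commit message, parallelism is not obtained for non-level-3 operations, so the sketch accepts `rntm` without using it.

```c
#include <stddef.h>

/* Hypothetical stand-in for rntm_t; NOT the BLIS type. */
typedef struct { long num_threads; } toy_rntm_t;

/* Expert interface: y := y + alpha * x, with optional context and runtime
   pointers (either may be NULL). Positive strides are assumed. */
void toy_axpyv_ex( size_t n, double alpha,
                   const double* x, size_t incx,
                   double*       y, size_t incy,
                   const void* cntx, const toy_rntm_t* rntm )
{
    (void)cntx;  /* a real library would use it to select kernels     */
    (void)rntm;  /* accepted but unused: level-1v is not threaded here */

    for ( size_t i = 0; i < n; ++i )
        y[ i * incy ] += alpha * x[ i * incx ];
}

/* Basic interface: equivalent to the expert call with NULL cntx and rntm. */
void toy_axpyv( size_t n, double alpha,
                const double* x, size_t incx,
                double*       y, size_t incy )
{
    toy_axpyv_ex( n, alpha, x, incx, y, incy, NULL, NULL );
}
```

Keeping the extra parameter in every expert signature, even where it is currently unused, lets the API stay uniform. A later change could then thread `rntm` into more operations without altering their signatures again.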
@@ -72,7 +72,7 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x, y ); \
 \
 /* Invoke the typed function. */ \
-bli_call_ft_12 \
+bli_call_ft_13 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -83,7 +83,8 @@ void PASTEMAC(opname,EX_SUF) \
 n, \
 buf_x, rs_x, cs_x, \
 buf_y, rs_y, cs_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -135,7 +136,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
 \
 /* Invoke the typed function. */ \
-bli_call_ft_13 \
+bli_call_ft_14 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -147,7 +148,8 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha, \
 buf_x, rs_x, cs_x, \
 buf_y, rs_y, cs_y, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -181,7 +183,7 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( x ); \
 \
 /* Invoke the void pointer-based function. */ \
-bli_call_ft_7 \
+bli_call_ft_8 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -189,7 +191,8 @@ void PASTEMAC(opname,EX_SUF) \
 m, \
 n, \
 buf_x, rs_x, cs_x, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -234,7 +237,7 @@ void PASTEMAC(opname,EX_SUF) \
 buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
 \
 /* Invoke the typed function. */ \
-bli_call_ft_9 \
+bli_call_ft_10 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -244,7 +247,8 @@ void PASTEMAC(opname,EX_SUF) \
 n, \
 buf_alpha, \
 buf_x, rs_x, cs_x, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -281,7 +285,7 @@ void PASTEMAC(opname,EX_SUF) \
 PASTEMAC(opname,_check)( alpha, x ); \
 \
 /* Invoke the typed function. */ \
-bli_call_ft_8 \
+bli_call_ft_9 \
 ( \
 dt, \
 PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -290,7 +294,8 @@ void PASTEMAC(opname,EX_SUF) \
 n, \
 buf_alpha, \
 buf_x, rs_x, cs_x, \
-cntx \
+cntx, \
+rntm \
 ); \
 }
@@ -88,7 +88,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alphay = bli_obj_buffer_for_1x1( dt, &alphay_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_12 \
bli_call_ft_13 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -100,7 +100,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_y, inc_y, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}

@@ -154,7 +155,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -166,7 +167,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_a, rs_a, cs_a, \
buf_x, inc_x, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -219,7 +221,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -232,7 +234,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_y, inc_y, \
buf_rho, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}

@@ -301,7 +304,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_20 \
bli_call_ft_21 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -318,7 +321,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta, \
buf_y, inc_y, \
buf_z, inc_z, \
cntx \
cntx, \
rntm \
); \
}

@@ -378,7 +382,7 @@ void PASTEMAC(opname,EX_SUF) \
if ( bli_obj_has_trans( a ) ) { bli_swap_incs( &rs_a, &cs_a ); } \
\
/* Invoke the void pointer-based function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -391,7 +395,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, inc_x, \
buf_beta, \
buf_y, inc_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -73,7 +73,7 @@ void PASTEMAC(opname,EX_SUF) \
PASTEMAC(opname,_check)( x, y ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -85,7 +85,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -138,7 +139,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -151,7 +152,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, rs_x, cs_x, \
buf_y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
}

@@ -212,7 +214,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_internal_scalar_buffer( &x_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -224,7 +226,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -271,7 +274,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -283,7 +286,8 @@ void PASTEMAC(opname,EX_SUF) \
n, \
buf_alpha, \
buf_x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -77,7 +77,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -94,7 +95,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -140,7 +142,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -162,7 +165,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
one, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -212,7 +216,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -230,7 +235,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -280,7 +286,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
return; \
} \
@@ -298,7 +305,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
alpha, \
x, rs_x, cs_x, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
\
/* When the diagonal of an upper- or lower-stored matrix is unit,
@@ -319,7 +327,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
y, rs_y, cs_y, \
cntx \
cntx, \
rntm \
); \
} \
}
@@ -364,7 +373,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
n, \
alpha, \
x, rs_x, cs_x, \
cntx \
cntx, \
rntm \
); \
}

@@ -51,7 +51,8 @@ void PASTEMAC(ch,opname) \
dim_t n, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
@@ -167,7 +168,8 @@ void PASTEMAC(ch,opname) \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
@@ -284,7 +286,8 @@ void PASTEMAC(ch,opname) \
dim_t n, \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \

@@ -50,7 +50,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
dim_t n, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

INSERT_GENTPROT_BASIC0( addm )
@@ -72,7 +73,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
ctype* y, inc_t rs_y, inc_t cs_y, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

INSERT_GENTPROT_BASIC0( axpym )
@@ -92,7 +94,8 @@ void PASTEMAC2(ch,opname,_unb_var1) \
dim_t n, \
ctype* alpha, \
ctype* x, inc_t rs_x, inc_t cs_x, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

INSERT_GENTPROT_BASIC0( scalm )

@@ -89,7 +89,10 @@ void PASTEMAC(ch,opname) \
kappa, \
a, inca, lda, \
p, 1, ldp, \
cntx \
cntx, \
/* The rntm_t* can safely be NULL as long as it's not used by
scal2m_ex(). */ \
NULL \
); \
} \
}

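Throughout the packing and level-2 code, the new `rntm_t*` argument is passed as `NULL` to callees such as `scal2m_ex()` and the `setm`/`setd` expert calls. This relies on those callees never reading it, as the inline comment notes. The sketch below illustrates that contract with a simplified runtime type. The names `sk_rntm_t`, `sk_scal2m`, and `sk_effective_threads` are hypothetical. The second function shows a defensive alternative for a callee that does need the runtime object; it is not a description of what BLIS does.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified runtime object (not the real rntm_t layout). */
typedef struct { int num_threads; } sk_rntm_t;

/* A level-1m-style routine that never needs threading information.
   It accepts rntm only to keep the uniform expert signature and never
   dereferences it, so callers may legitimately pass NULL. */
static void sk_scal2m(int m, int n, double alpha,
                      const double* a, int lda,
                      double* p, int ldp,
                      const sk_rntm_t* rntm)
{
    (void)rntm;  /* intentionally unused */
    for (int j = 0; j < n; ++j)
        for (int i = 0; i < m; ++i)
            p[i + j * ldp] = alpha * a[i + j * lda];  /* column-major */
}

/* A routine that DOES consult rntm should treat NULL as "use defaults"
   instead of dereferencing it blindly. */
static int sk_effective_threads(const sk_rntm_t* rntm)
{
    if (rntm == NULL || rntm->num_threads < 1) return 1;
    return rntm->num_threads;
}
```

The risk with passing `NULL` is future drift: if a callee later starts reading `rntm`, every `NULL` call site becomes a crash. An explicit `NULL` check, as in the second function, keeps that failure mode local.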
@@ -181,7 +181,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -203,7 +204,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -236,7 +238,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one, \
p_br, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -450,7 +453,8 @@ void PASTEMAC(ch,varname) \
p11_n, \
c11, rs_c, cs_c, \
p11, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -481,7 +485,8 @@ void PASTEMAC(ch,varname) \
p11_n, \
kappa, \
p11, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -544,7 +549,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
kappa, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -557,7 +563,8 @@ void PASTEMAC(ch,varname) \
m_panel, \
n_panel, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -586,7 +593,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero, \
p, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -183,7 +183,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -195,7 +196,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -207,7 +209,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -231,7 +234,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -243,7 +247,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -255,7 +260,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -290,7 +296,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -300,7 +307,8 @@ void PASTEMAC(ch,varname) \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -521,7 +529,8 @@ void PASTEMAC(ch,varname) \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
@@ -537,7 +546,8 @@ void PASTEMAC(ch,varname) \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -689,7 +699,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -699,7 +710,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Update the diagonal of the p11 section of the rpi panel.
@@ -757,7 +769,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -769,7 +782,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -781,7 +795,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_rpi, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -182,7 +182,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -194,7 +195,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -217,7 +219,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -229,7 +232,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -264,7 +268,8 @@ void PASTEMAC(ch,varname) \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -274,7 +279,8 @@ void PASTEMAC(ch,varname) \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -495,7 +501,8 @@ void PASTEMAC(ch,varname) \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
@@ -511,7 +518,8 @@ void PASTEMAC(ch,varname) \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
@@ -634,7 +642,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
@@ -644,7 +653,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -689,7 +699,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
@@ -701,7 +712,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -185,7 +185,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -207,7 +208,8 @@ void PASTEMAC(ch,varname) \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -581,7 +583,8 @@ void PASTEMAC(ch,varname) \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \

@@ -163,7 +163,8 @@ void PASTEMAC(ch,varname) \
kappa_cast, \
c_cast, rs_c, cs_c, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
\
/* If uploc is upper or lower, then the structure of c is necessarily
@@ -205,7 +206,8 @@ void PASTEMAC(ch,varname) \
kappa_cast, \
c_cast, rs_c, cs_c, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
else /* if ( bli_is_triangular( strucc ) ) */ \
@@ -239,7 +241,8 @@ void PASTEMAC(ch,varname) \
n, \
zero, \
p_cast, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
} \
@@ -265,7 +268,8 @@ void PASTEMAC(ch,varname) \
n_max, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
\
@@ -283,7 +287,8 @@ void PASTEMAC(ch,varname) \
n_max - n, \
zero, \
p_edge, rs_p, cs_p, \
cntx \
cntx, \
NULL \
); \
} \
}

@@ -246,7 +246,8 @@ void PASTEMAC(ch,varname) \
one, \
p_begin, rs_p, cs_p, \
c_begin, rs_c, cs_c, \
cntx \
cntx, \
NULL \
); \
} \
else \

@@ -89,7 +89,8 @@ void PASTEMAC(ch,opname) \
kappa, \
p, 1, ldp, \
a, inca, lda, \
cntx \
cntx, \
NULL \
); \
} \
}

@@ -122,7 +122,8 @@ void PASTEMAC(ch,varname)( \
n, \
p_cast, rs_p, cs_p, \
c_cast, rs_c, cs_c, \
cntx \
cntx, \
NULL \
); \
}

@@ -90,7 +90,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -103,7 +103,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_beta, \
buf_y, incy, \
cntx \
cntx, \
rntm \
); \
}

@@ -154,7 +155,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -166,7 +167,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_y, incy, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}

@@ -223,7 +225,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_beta = bli_obj_buffer_for_1x1( dt, &beta_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_14 \
bli_call_ft_15 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -236,7 +238,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_beta, \
buf_y, incy, \
cntx \
cntx, \
rntm \
); \
}

@@ -284,7 +287,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_10 \
bli_call_ft_11 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -294,7 +297,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_x, incx, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}

@@ -346,7 +350,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_13 \
bli_call_ft_14 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -358,7 +362,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_x, incx, \
buf_y, incy, \
buf_a, rs_a, cs_a, \
cntx \
cntx, \
rntm \
); \
}

@@ -407,7 +412,7 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha = bli_obj_buffer_for_1x1( dt, &alpha_local ); \
\
/* Invoke the typed function. */ \
bli_call_ft_11 \
bli_call_ft_12 \
( \
dt, \
PASTECH(opname,BLIS_TAPI_EX_SUF), \
@@ -418,7 +423,8 @@ void PASTEMAC(opname,EX_SUF) \
buf_alpha, \
buf_a, rs_a, cs_a, \
buf_x, incx, \
cntx \
cntx, \
rntm \
); \
}

@@ -82,7 +82,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m_y, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
return; \
} \
@@ -206,7 +207,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
return; \
} \
@@ -461,7 +463,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
return; \
} \

@@ -79,7 +79,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -91,7 +92,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -79,7 +79,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -91,7 +92,8 @@ void PASTEMAC(ch,varname) \
n_elem, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -102,7 +102,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -114,7 +115,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -101,7 +101,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -113,7 +114,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -109,7 +109,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -121,7 +122,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -109,7 +109,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -121,7 +122,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -100,7 +100,8 @@ void PASTEMAC(ch,varname) \
m, \
zero, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
else \
@@ -112,7 +113,8 @@ void PASTEMAC(ch,varname) \
m, \
beta, \
y, incy, \
cntx \
cntx, \
NULL \
); \
} \
\

@@ -87,7 +87,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
PASTECH(ch,dotv_ft) kfp_tv; \

@@ -87,7 +87,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
PASTECH(ch,axpyv_ft) kfp_av; \

@@ -81,7 +81,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
if ( bli_does_notrans( transa ) ) \

@@ -80,7 +80,8 @@ void PASTEMAC(ch,varname) \
m, \
alpha, \
x, incx, \
cntx \
cntx, \
NULL \
); \
\
if ( bli_does_notrans( transa ) ) \

@@ -53,7 +53,9 @@ void bli_l3_cntl_create_if
// values for unpacked objects. Notice that we do this even if the
// caller passed in a custom control tree; that's because we still need
// to reset the pack schema of a and b, which were modified by the
// operation's _front() function.
// operation's _front() function. However, in order for this to work,
// the level-3 thread entry function (or omp parallel region) must
// alias thread-local copies of objects a and b.
pack_t schema_a = bli_obj_pack_schema( a );
pack_t schema_b = bli_obj_pack_schema( b );

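The revised comment says that resetting the pack schemas only works if each level-3 thread entry function (or OpenMP parallel region) aliases thread-local copies of `a` and `b`. The sketch below shows why copying the descriptor by value helps, using a simplified object type. The names `tl_obj_t`, `tl_schema_t`, `tl_obj_alias`, and `tl_thread_body` are hypothetical and do not reproduce BLIS's `obj_t` or its aliasing routines. To keep the example free of threading libraries, the "threads" run one after another.

```c
#include <assert.h>

/* Hypothetical, simplified object descriptor: the data buffer is shared,
   but metadata such as the pack schema lives in each copy. */
typedef enum {
    TL_NOT_PACKED = 0,
    TL_PACKED_ROW_PANELS,
    TL_PACKED_COL_PANELS
} tl_schema_t;

typedef struct {
    double*     buffer;  /* shared element storage  */
    tl_schema_t schema;  /* per-descriptor metadata */
} tl_obj_t;

/* "Alias" by copying the descriptor by value: the buffer pointer is
   shared, while later metadata edits touch only the copy. */
static tl_obj_t tl_obj_alias(const tl_obj_t* src)
{
    return *src;
}

/* Body run by each thread: it changes metadata only on its own alias,
   so no thread writes to the caller's shared descriptor. Returns the
   schema this thread set on its local copy. */
static tl_schema_t tl_thread_body(const tl_obj_t* shared_a, int tid)
{
    tl_obj_t a_local = tl_obj_alias(shared_a);
    a_local.schema = (tid % 2 == 0) ? TL_PACKED_ROW_PANELS
                                    : TL_PACKED_COL_PANELS;
    /* ... packing and computation on a_local would follow ... */
    return a_local.schema;
}
```

Because each thread edits only its own copy, concurrent schema changes never race on the shared descriptor, while the element data is still reached through the shared `buffer` pointer.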
@@ -70,11 +70,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
}

@@ -114,11 +114,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,ind)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx ); \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
}

@@ -155,11 +155,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, beta, c, cntx ); \
PASTEMAC(opname,ind)( alpha, a, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx ); \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx, rntm ); \
} \
}

@@ -195,11 +195,11 @@ void PASTEMAC(opname,EX_SUF) \
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, cntx ); \
PASTEMAC(opname,ind)( side, alpha, a, b, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx ); \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx, rntm ); \
} \
}

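Each level-3 `_ex` front-end either calls the operation's `ind` function, which tries an enabled induced method and otherwise falls back to native execution, or calls `nat` directly. Either way, `rntm` is now forwarded. The condition that selects the `ind` branch lies outside these hunks, so the sketch below reduces it to a boolean `take_ind_branch`. The sketch uses hypothetical names (`dp_cntx_t`, `dp_rntm_t`, `dp_gemm_ex`, `dp_gemm_ind`, `dp_gemm_nat`) and only models the control flow.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins for the context and runtime objects. */
typedef struct { int placeholder; } dp_cntx_t;
typedef struct { int num_threads; } dp_rntm_t;

typedef enum { DP_NATIVE, DP_INDUCED } dp_path_t;

/* Records the rntm that the final implementation received, so callers
   can check that the pointer was forwarded untouched. */
static const dp_rntm_t* dp_last_rntm = NULL;

static dp_path_t dp_gemm_nat(dp_cntx_t* cntx, dp_rntm_t* rntm)
{
    (void)cntx;
    dp_last_rntm = rntm;
    return DP_NATIVE;
}

/* Uses an induced method when one is enabled; otherwise falls back to
   native execution, mirroring the comment in the hunks. */
static dp_path_t dp_gemm_ind(bool ind_enabled, dp_cntx_t* cntx, dp_rntm_t* rntm)
{
    (void)cntx;
    if (!ind_enabled) return dp_gemm_nat(cntx, rntm);
    dp_last_rntm = rntm;
    return DP_INDUCED;
}

/* Front-end shape: one branch goes through the induced-method chooser,
   the other straight to native; rntm is passed along on both. */
static dp_path_t dp_gemm_ex(bool take_ind_branch, bool ind_enabled,
                            dp_cntx_t* cntx, dp_rntm_t* rntm)
{
    if (take_ind_branch) return dp_gemm_ind(ind_enabled, cntx, rntm);
    return dp_gemm_nat(cntx, rntm);
}
```

Because the pointer is threaded through unchanged, a per-call parallelization request set by the caller reaches whichever implementation finally runs.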
@@ -52,7 +52,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

GENTDEF( gemm )
@@ -73,7 +74,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

GENTDEF( hemm )
@@ -92,7 +94,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

GENTDEF( herk )
@@ -110,7 +113,8 @@ typedef void (*PASTECH(opname,_oft)) \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx \
cntx_t* cntx, \
rntm_t* rntm \
);

GENTDEF( trmm )

@@ -39,6 +39,7 @@ void bli_l3_packm
obj_t* x,
obj_t* x_pack,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)

@@ -37,6 +37,7 @@ void bli_l3_packm
obj_t* x,
obj_t* x_pack,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
);

@@ -89,7 +89,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &bo, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -150,7 +151,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &bo, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -204,7 +206,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &ao, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -264,7 +267,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &bo, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -316,7 +320,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &ao, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -375,7 +380,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &bo, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -438,7 +444,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &bo, \
 &betao, \
 &co, \
-cntx \
+cntx, \
+rntm \
 ); \
 }

@@ -491,7 +498,8 @@ void PASTEMAC2(ch,opname,EX_SUF) \
 &alphao, \
 &ao, \
 &bo, \
-cntx \
+cntx, \
+rntm \
 ); \
 }


@@ -122,7 +122,7 @@ void bli_l3_thrinfo_create_root
 (
 dim_t id,
 thrcomm_t* gl_comm,
-cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t** thread
 )
@@ -136,7 +136,7 @@ void bli_l3_thrinfo_create_root
 // Use the blocksize id of the current (root) control tree node to
 // query the top-most ways of parallelism to obtain.
 bszid_t bszid = bli_cntl_bszid( cntl );
-dim_t xx_way = bli_cntx_way_for_bszid( bszid, cntx );
+dim_t xx_way = bli_rntm_ways_for( bszid, rntm );

 // Determine the work id for this thrinfo_t node.
 dim_t work_id = gl_comm_id / ( n_threads / xx_way );
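The work id computed in bli_l3_thrinfo_create_root() splits the n_threads threads of the global communicator into xx_way contiguous groups of equal size; a thread's work id is the index of the group it falls into. A minimal C sketch of just that arithmetic follows. The function name sketch_work_id() is hypothetical (not part of BLIS), and it assumes, as the integer division implies, that xx_way evenly divides n_threads.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function) mirroring
       work_id = gl_comm_id / ( n_threads / xx_way );
   The n_threads threads are split into xx_way contiguous groups of
   equal size, and a thread's work id is the index of its group. */
size_t sketch_work_id( size_t comm_id, size_t n_threads, size_t xx_way )
{
    /* Assumed preconditions: xx_way evenly divides n_threads and
       comm_id is a valid id within the communicator. */
    assert( xx_way > 0 && n_threads % xx_way == 0 );
    assert( comm_id < n_threads );

    size_t group_size = n_threads / xx_way;  /* threads per group */
    return comm_id / group_size;
}
```

For example, with 8 threads and xx_way = 2, threads 0 through 3 get work id 0 and threads 4 through 7 get work id 1.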
@@ -259,196 +259,6 @@ void bli_l3_thrinfo_print_paths

 // -----------------------------------------------------------------------------

-#if 0
-thrinfo_t** bli_l3_thrinfo_create_roots
-(
-cntx_t* cntx,
-cntl_t* cntl
-)
-{
-// Query the context for the total number of threads to use.
-dim_t n_threads = bli_cntx_get_num_threads( cntx );
-
-// Create a global thread communicator for all the threads.
-thrcomm_t* gl_comm = bli_thrcomm_create( n_threads );
-
-// Allocate an array of thrinfo_t pointers, one for each thread.
-thrinfo_t** paths = bli_malloc_intl( n_threads * sizeof( thrinfo_t* ) );
-
-// Use the blocksize id of the current (root) control tree node to
-// query the top-most ways of parallelism to obtain.
-bszid_t bszid = bli_cntl_bszid( cntl );
-dim_t xx_way = bli_cntx_way_for_bszid( bszid, cntx );
-
-dim_t gl_comm_id;
-
-// Create one thrinfo_t node for each thread in the (global) communicator.
-for ( gl_comm_id = 0; gl_comm_id < n_threads; ++gl_comm_id )
-{
-dim_t work_id = gl_comm_id / ( n_threads / xx_way );
-
-paths[ gl_comm_id ] = bli_thrinfo_create
-(
-gl_comm,
-gl_comm_id,
-xx_way,
-work_id,
-TRUE,
-NULL
-);
-}
-
-return paths;
-}
-
-//#define PRINT_THRINFO
-
-thrinfo_t** bli_l3_thrinfo_create_full_paths
-(
-cntx_t* cntx
-)
-{
-dim_t jc_way = bli_cntx_jc_way( cntx );
-dim_t pc_way = bli_cntx_pc_way( cntx );
-dim_t ic_way = bli_cntx_ic_way( cntx );
-dim_t jr_way = bli_cntx_jr_way( cntx );
-dim_t ir_way = bli_cntx_ir_way( cntx );
-
-dim_t gl_nt = jc_way * pc_way * ic_way * jr_way * ir_way;
-dim_t jc_nt = pc_way * ic_way * jr_way * ir_way;
-dim_t pc_nt = ic_way * jr_way * ir_way;
-dim_t ic_nt = jr_way * ir_way;
-dim_t jr_nt = ir_way;
-dim_t ir_nt = 1;
-
-assert( gl_nt != 0 );
-
-#ifdef PRINT_THRINFO
-printf( " gl jc kc pb ic pa jr ir\n" );
-printf( "xx_nt: %4lu %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
-gl_nt, jc_nt, pc_nt, pc_nt, ic_nt, ic_nt, jr_nt, ir_nt );
-printf( "\n" );
-printf( " jc kc pb ic pa jr ir\n" );
-printf( "xx_way: %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
-jc_way, pc_way, (dim_t)0, ic_way, (dim_t)0, jr_way, ir_way );
-printf( "=================================================\n" );
-#endif
-
-thrinfo_t** paths = bli_malloc_intl( gl_nt * sizeof( thrinfo_t* ) );
-
-thrcomm_t* gl_comm = bli_thrcomm_create( gl_nt );
-
-for( int a = 0; a < jc_way; a++ )
-{
-thrcomm_t* jc_comm = bli_thrcomm_create( jc_nt );
-
-for( int b = 0; b < pc_way; b++ )
-{
-thrcomm_t* pc_comm = bli_thrcomm_create( pc_nt );
-
-for( int c = 0; c < ic_way; c++ )
-{
-thrcomm_t* ic_comm = bli_thrcomm_create( ic_nt );
-
-for( int d = 0; d < jr_way; d++ )
-{
-thrcomm_t* jr_comm = bli_thrcomm_create( jr_nt );
-
-for( int e = 0; e < ir_way; e++ )
-{
-//thrcomm_t* ir_comm = bli_thrcomm_create( ir_nt );
-dim_t ir_comm_id = 0;
-dim_t jr_comm_id = e*ir_nt + ir_comm_id;
-dim_t ic_comm_id = d*jr_nt + jr_comm_id;
-dim_t pc_comm_id = c*ic_nt + ic_comm_id;
-dim_t jc_comm_id = b*pc_nt + pc_comm_id;
-dim_t gl_comm_id = a*jc_nt + jc_comm_id;
-
-// macro-kernel loops
-thrinfo_t* ir_info
-=
-bli_l3_thrinfo_create( jr_comm, jr_comm_id,
-ir_way, e,
-NULL );
-thrinfo_t* jr_info
-=
-bli_l3_thrinfo_create( ic_comm, ic_comm_id,
-jr_way, d,
-ir_info );
-// packa
-thrinfo_t* pa_info
-=
-bli_packm_thrinfo_create( ic_comm, ic_comm_id,
-ic_nt, ic_comm_id,
-jr_info );
-// blk_var1
-thrinfo_t* ic_info
-=
-bli_l3_thrinfo_create( pc_comm, pc_comm_id,
-ic_way, c,
-pa_info );
-// packb
-thrinfo_t* pb_info
-=
-bli_packm_thrinfo_create( pc_comm, pc_comm_id,
-pc_nt, pc_comm_id,
-ic_info );
-// blk_var3
-thrinfo_t* pc_info
-=
-bli_l3_thrinfo_create( jc_comm, jc_comm_id,
-pc_way, b,
-pb_info );
-// blk_var2
-thrinfo_t* jc_info
-=
-bli_l3_thrinfo_create( gl_comm, gl_comm_id,
-jc_way, a,
-pc_info );
-
-paths[gl_comm_id] = jc_info;
-
-#ifdef PRINT_THRINFO
-{
-dim_t gl_comm_id = bli_thread_ocomm_id( jc_info );
-dim_t jc_comm_id = bli_thread_ocomm_id( pc_info );
-dim_t pc_comm_id = bli_thread_ocomm_id( pb_info );
-dim_t pb_comm_id = bli_thread_ocomm_id( ic_info );
-dim_t ic_comm_id = bli_thread_ocomm_id( pa_info );
-dim_t pa_comm_id = bli_thread_ocomm_id( jr_info );
-dim_t jr_comm_id = bli_thread_ocomm_id( ir_info );
-
-dim_t jc_work_id = bli_thread_work_id( jc_info );
-dim_t pc_work_id = bli_thread_work_id( pc_info );
-dim_t pb_work_id = bli_thread_work_id( pb_info );
-dim_t ic_work_id = bli_thread_work_id( ic_info );
-dim_t pa_work_id = bli_thread_work_id( pa_info );
-dim_t jr_work_id = bli_thread_work_id( jr_info );
-dim_t ir_work_id = bli_thread_work_id( ir_info );
-
-printf( " gl jc pb kc pa ic jr \n" );
-printf( "comm ids: %4lu %4lu %4lu %4lu %4lu %4lu %4lu\n",
-gl_comm_id, jc_comm_id, pc_comm_id, pb_comm_id, ic_comm_id, pa_comm_id, jr_comm_id );
-printf( "work ids: %4ld %4ld %4lu %4lu %4ld %4ld %4ld\n",
-jc_work_id, pc_work_id, pb_work_id, ic_work_id, pa_work_id, jr_work_id, ir_work_id );
-printf( "-------------------------------------------------\n" );
-}
-#endif
-
-}
-}
-}
-}
-}
-#ifdef PRINT_THRINFO
-exit(1);
-#endif
-
-
-return paths;
-}
-#endif
-
 void bli_l3_thrinfo_free_paths
 (
 thrinfo_t** threads

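The removed bli_l3_thrinfo_create_full_paths() built each thread's global id as a mixed-radix number, gl_comm_id = a*jc_nt + b*pc_nt + c*ic_nt + d*jr_nt + e*ir_nt, where each xx_nt is the product of the ways of all loops nested inside loop xx (and ir_nt = 1). The C sketch below, using hypothetical helper names that are not part of BLIS, composes such an id in Horner form and decomposes it back into per-loop work ids. It illustrates only the index arithmetic and creates no thrinfo_t or thrcomm_t objects.

```c
#include <assert.h>
#include <stddef.h>

/* Loops modeled, outermost first: jc, pc, ic, jr, ir. */
enum { SKETCH_N_LOOPS = 5 };

/* Compose a global thread id from per-loop work ids (hypothetical helper).
   Horner evaluation of a*jc_nt + b*pc_nt + c*ic_nt + d*jr_nt + e*ir_nt,
   where xx_nt is the product of the ways of the loops inside xx. */
size_t sketch_compose_id( const size_t way[ SKETCH_N_LOOPS ],
                          const size_t work[ SKETCH_N_LOOPS ] )
{
    size_t id = 0;
    for ( int i = 0; i < SKETCH_N_LOOPS; ++i )
    {
        assert( work[ i ] < way[ i ] );
        id = id * way[ i ] + work[ i ];
    }
    return id;
}

/* Inverse: recover each loop's work id from a global thread id,
   peeling off the innermost (ir) digit first. */
void sketch_decompose_id( const size_t way[ SKETCH_N_LOOPS ], size_t id,
                          size_t work[ SKETCH_N_LOOPS ] )
{
    for ( int i = SKETCH_N_LOOPS - 1; i >= 0; --i )
    {
        work[ i ] = id % way[ i ];
        id /= way[ i ];
    }
}
```

With ways {2, 1, 3, 1, 2} for jc, pc, ic, jr, ir (12 threads in total), the per-loop work ids {1, 0, 2, 0, 1} map to global id 1*6 + 0*6 + 2*2 + 0*2 + 1 = 11, and decomposing 11 recovers the same work ids.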
@@ -61,17 +61,6 @@
 // thrinfo_t APIs specific to level-3 operations.
 //

-#if 0
-thrinfo_t* bli_l3_thrinfo_create
-(
-thrcomm_t* ocomm,
-dim_t ocomm_id,
-dim_t n_way,
-dim_t work_id,
-thrinfo_t* sub_node
-);
-#endif
-
 void bli_l3_thrinfo_init
 (
 thrinfo_t* thread,
@@ -98,7 +87,7 @@ void bli_l3_thrinfo_create_root
 (
 dim_t id,
 thrcomm_t* gl_comm,
-cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t** thread
 );
@@ -110,19 +99,6 @@ void bli_l3_thrinfo_print_paths

 // -----------------------------------------------------------------------------

-#if 0
-thrinfo_t** bli_l3_thrinfo_create_roots
-(
-cntx_t* cntx,
-cntl_t* cntl
-);
-
-thrinfo_t** bli_l3_thrinfo_create_full_paths
-(
-cntx_t* cntx
-);
-#endif
-
 void bli_l3_thrinfo_free_paths
 (
 thrinfo_t** threads

@@ -49,6 +49,7 @@ typedef void (*PASTECH(opname,_voft)) \
 obj_t* b, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );
@@ -64,6 +65,7 @@ typedef void (*PASTECH(opname,_voft)) \
 obj_t* b, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );

@@ -40,6 +40,7 @@ void bli_gemm_blk_var1
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -87,6 +88,7 @@ void bli_gemm_blk_var1
 &BLIS_ONE,
 &c1,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -40,6 +40,7 @@ void bli_gemm_blk_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -87,6 +88,7 @@ void bli_gemm_blk_var2
 &BLIS_ONE,
 &c1,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -40,6 +40,7 @@ void bli_gemm_blk_var3
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -83,6 +84,7 @@ void bli_gemm_blk_var3
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -43,6 +43,7 @@ void bli_gemm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -86,34 +87,38 @@ void bli_gemm_front
 bli_obj_induce_trans( &c_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_GEMM,
 BLIS_LEFT, // ignored for gemm/hemm/symm
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

-// A sort of hack for communicating the desired pach schemas for A and B
-// to bli_gemm_cntl_create() (via bli_l3_thread_decorator() and
-// bli_l3_cntl_create_if()). This allows us to access the schemas from
-// the control tree, which hopefully reduces some confusion, particularly
-// in bli_packm_init().
-if ( bli_cntx_method( cntx ) == BLIS_NAT )
-{
-bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
-bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &b_local );
-}
-else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
-{
-pack_t schema_a = bli_cntx_schema_a_block( cntx );
-pack_t schema_b = bli_cntx_schema_b_panel( cntx );
+// A sort of hack for communicating the desired pach schemas for A and B
+// to bli_gemm_cntl_create() (via bli_l3_thread_decorator() and
+// bli_l3_cntl_create_if()). This allows us to access the schemas from
+// the control tree, which hopefully reduces some confusion, particularly
+// in bli_packm_init().
+if ( bli_cntx_method( cntx ) == BLIS_NAT )
+{
+bli_obj_set_pack_schema( BLIS_PACKED_ROW_PANELS, &a_local );
+bli_obj_set_pack_schema( BLIS_PACKED_COL_PANELS, &b_local );
+}
+else // if ( bli_cntx_method( cntx ) != BLIS_NAT )
+{
+pack_t schema_a = bli_cntx_schema_a_block( cntx );
+pack_t schema_b = bli_cntx_schema_b_panel( cntx );

-bli_obj_set_pack_schema( schema_a, &a_local );
-bli_obj_set_pack_schema( schema_b, &b_local );
+bli_obj_set_pack_schema( schema_a, &a_local );
+bli_obj_set_pack_schema( schema_b, &b_local );
+}
 }

 // Invoke the internal back-end via the thread handler.
@@ -127,6 +132,7 @@ void bli_gemm_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -40,6 +40,7 @@ void bli_gemm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );


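With this change, bli_gemm_front() and the other front ends take the per-loop ways of parallelism from the caller-supplied rntm_t (via bli_rntm_set_ways_for_op() and, later, bli_rntm_ways_for()) instead of writing them into the cntx_t, which is meant to be read-only. The C sketch below shows the idea with a hypothetical stand-in struct: its layout and every name in it are assumptions for illustration, not the real rntm_t definition or API. Because each caller owns its own object, two application threads can request different splits without writing to shared state.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a per-call runtime object. This is NOT the
   real rntm_t layout; it only models "one ways-of-parallelism value per
   level-3 loop", kept outside the shared, read-only context. */
typedef enum
{
    SKETCH_JC = 0, SKETCH_PC, SKETCH_IC, SKETCH_JR, SKETCH_IR,
    SKETCH_NUM_LOOPS
} sketch_loop_t;

typedef struct
{
    size_t ways[ SKETCH_NUM_LOOPS ];
} sketch_rntm_t;

/* Record the requested split for each loop (each value must be >= 1). */
void sketch_rntm_set_ways( sketch_rntm_t* rntm,
                           size_t jc, size_t pc, size_t ic,
                           size_t jr, size_t ir )
{
    assert( jc >= 1 && pc >= 1 && ic >= 1 && jr >= 1 && ir >= 1 );
    rntm->ways[ SKETCH_JC ] = jc;
    rntm->ways[ SKETCH_PC ] = pc;
    rntm->ways[ SKETCH_IC ] = ic;
    rntm->ways[ SKETCH_JR ] = jr;
    rntm->ways[ SKETCH_IR ] = ir;
}

/* Read-only query of one loop's ways. */
size_t sketch_rntm_ways_for( const sketch_rntm_t* rntm, sketch_loop_t loop )
{
    return rntm->ways[ loop ];
}

/* Total threads implied by the request: the product of all ways. */
size_t sketch_rntm_num_threads( const sketch_rntm_t* rntm )
{
    size_t n = 1;
    for ( int i = 0; i < SKETCH_NUM_LOOPS; ++i )
        n *= rntm->ways[ i ];
    return n;
}
```

For instance, one caller could request 4-way parallelism in the ic loop while another requests 2-way parallelism in the jc loop; each object reports its own total thread count (4 and 2, respectively), and neither request touches the other.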
@@ -42,6 +42,7 @@ void bli_gemm_int
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -102,7 +103,7 @@ void bli_gemm_int
 }

 // Create the next node in the thrinfo_t structure.
-bli_thrinfo_grow( cntx, cntl, thread );
+bli_thrinfo_grow( rntm, cntl, thread );

 // Extract the function pointer from the current control tree node.
 f = bli_cntl_var_func( cntl );
@@ -124,6 +125,7 @@ void bli_gemm_int
 &b_local,
 &c_local,
 cntx,
+rntm,
 cntl,
 thread
 );

@@ -40,6 +40,7 @@ void bli_gemm_int
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 );

@@ -40,6 +40,7 @@ void bli_gemm_ker_var1
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -51,6 +52,6 @@ void bli_gemm_ker_var1
 bli_obj_induce_trans( b );
 bli_obj_induce_trans( c );

-bli_gemm_ker_var2( b, a, c, cntx, cntl, thread );
+bli_gemm_ker_var2( b, a, c, cntx, rntm, cntl, thread );
 }


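bli_gemm_ker_var1() reuses bli_gemm_ker_var2() by inducing transposes and passing the operands in swapped order (b, a, c), which relies on the identity C^T = B^T A^T. The C sketch below illustrates that identity with a naive, hypothetical strided-matrix view in which a transpose only swaps the dimensions and the row/column strides, so no data moves. It is a reference triple loop, not a BLIS kernel, and it does not model how bli_obj_induce_trans() is actually implemented.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical strided view: element (i,j) lives at buf[ i*rs + j*cs ]. */
typedef struct { double* buf; size_t m, n, rs, cs; } sketch_view_t;

/* "Induce" a transpose by swapping dimensions and strides only. */
sketch_view_t sketch_trans( sketch_view_t v )
{
    sketch_view_t t = { v.buf, v.n, v.m, v.cs, v.rs };
    return t;
}

/* Naive reference C := A * B on strided views. */
void sketch_gemm( sketch_view_t a, sketch_view_t b, sketch_view_t c )
{
    assert( a.m == c.m && b.n == c.n && a.n == b.m );
    for ( size_t i = 0; i < c.m; ++i )
        for ( size_t j = 0; j < c.n; ++j )
        {
            double sum = 0.0;
            for ( size_t p = 0; p < a.n; ++p )
                sum += a.buf[ i*a.rs + p*a.cs ] * b.buf[ p*b.rs + j*b.cs ];
            c.buf[ i*c.rs + j*c.cs ] = sum;
        }
}

/* Same product, structured like gemm_ker_var1: compute C^T = B^T * A^T
   with every operand transposed by view, which writes the same C. */
void sketch_gemm_via_trans( sketch_view_t a, sketch_view_t b, sketch_view_t c )
{
    sketch_gemm( sketch_trans( b ), sketch_trans( a ), sketch_trans( c ) );
}
```

With a row-stored 2x3 A = [1 2 3; 4 5 6] and 3x2 B = [7 8; 9 10; 11 12], both functions produce C = [58 64; 139 154].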
@@ -50,6 +50,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -62,6 +63,7 @@ void bli_gemm_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -147,6 +149,7 @@ void bli_gemm_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -169,6 +172,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -40,6 +40,7 @@ void bli_gemm_packa
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -52,6 +53,7 @@ void bli_gemm_packa
 a,
 &a_pack,
 cntx,
+rntm,
 cntl,
 thread
 );
@@ -65,6 +67,7 @@ void bli_gemm_packa
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );
@@ -78,6 +81,7 @@ void bli_gemm_packb
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -90,6 +94,7 @@ void bli_gemm_packb
 b,
 &b_pack,
 cntx,
+rntm,
 cntl,
 thread
 );
@@ -103,6 +108,7 @@ void bli_gemm_packb
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
 obj_t* b, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );
@@ -85,6 +86,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 );


@@ -50,6 +50,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -62,6 +63,7 @@ void bli_gemm4mb_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -127,6 +129,7 @@ void bli_gemm4mb_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -43,6 +43,7 @@ void bli_hemm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -87,15 +88,17 @@ void bli_hemm_front
 bli_obj_swap( &a_local, &b_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_HEMM,
 BLIS_LEFT, // ignored for gemm/hemm/symm
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -128,6 +131,7 @@ void bli_hemm_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -41,5 +41,6 @@ void bli_hemm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -42,6 +42,7 @@ void bli_her2k_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -105,15 +106,17 @@ void bli_her2k_front
 bli_obj_induce_trans( &c_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_HER2K,
 BLIS_LEFT, // ignored for her[2]k/syr[2]k
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -152,6 +155,7 @@ void bli_her2k_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );

@@ -165,6 +169,7 @@ void bli_her2k_front
 &BLIS_ONE,
 &c_local,
 cntx,
+rntm,
 cntl
 );


@@ -40,5 +40,6 @@ void bli_her2k_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -41,6 +41,7 @@ void bli_herk_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -85,15 +86,17 @@ void bli_herk_front
 bli_obj_induce_trans( &c_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_HERK,
 BLIS_LEFT, // ignored for her[2]k/syr[2]k
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -126,6 +129,7 @@ void bli_herk_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );


@@ -39,5 +39,6 @@ void bli_herk_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -51,6 +51,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -63,6 +64,7 @@ void bli_herk_l_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -131,6 +133,7 @@ void bli_herk_l_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -154,6 +157,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -51,6 +51,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -63,6 +64,7 @@ void bli_herk_u_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -131,6 +133,7 @@ void bli_herk_u_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -154,6 +157,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
 obj_t* ah, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );
@@ -84,6 +85,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 );


@@ -45,6 +45,7 @@ void bli_herk_x_ker_var2
 obj_t* ah,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -66,6 +67,7 @@ void bli_herk_x_ker_var2
 ah,
 c,
 cntx,
+rntm,
 cntl,
 thread
 );

@@ -43,6 +43,7 @@ void bli_symm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -86,15 +87,17 @@ void bli_symm_front
 bli_obj_swap( &a_local, &b_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_SYMM,
 BLIS_LEFT, // ignored for gemm/hemm/symm
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -127,6 +130,7 @@ void bli_symm_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -41,5 +41,6 @@ void bli_symm_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -42,6 +42,7 @@ void bli_syr2k_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -86,15 +87,17 @@ void bli_syr2k_front
 bli_obj_induce_trans( &c_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_SYR2K,
 BLIS_LEFT, // ignored for her[2]k/syr[2]k
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -133,6 +136,7 @@ void bli_syr2k_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );

@@ -146,6 +150,7 @@ void bli_syr2k_front
 &BLIS_ONE,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -40,5 +40,6 @@ void bli_syr2k_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -41,6 +41,7 @@ void bli_syrk_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -79,15 +80,17 @@ void bli_syrk_front
 bli_obj_induce_trans( &c_local );
 }

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_SYRK,
 BLIS_LEFT, // ignored for her[2]k/syr[2]k
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -120,6 +123,7 @@ void bli_syrk_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -39,5 +39,6 @@ void bli_syrk_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -41,6 +41,7 @@ void bli_trmm_front
 obj_t* a,
 obj_t* b,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -129,15 +130,17 @@ void bli_trmm_front
 bli_obj_set_as_root( &b_local );
 bli_obj_set_as_root( &c_local );

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_TRMM,
 side,
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -170,6 +173,7 @@ void bli_trmm_front
 &BLIS_ZERO,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -39,5 +39,6 @@ void bli_trmm_front
 obj_t* a,
 obj_t* b,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trmm_ll_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -125,6 +127,7 @@ void bli_trmm_ll_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* jr_thread \
 ) \
 { \

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trmm_lu_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -125,6 +127,7 @@ void bli_trmm_lu_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* jr_thread \
 ) \
 { \

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trmm_rl_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -125,6 +127,7 @@ void bli_trmm_rl_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* jr_thread \
 ) \
 { \

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* beta,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trmm_ru_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -125,6 +127,7 @@ void bli_trmm_ru_ker_var2
 buf_beta,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -146,6 +149,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* jr_thread \
 ) \
 { \

@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
 obj_t* b, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );
@@ -84,6 +85,7 @@ void PASTEMAC(ch,varname) \
 void* beta, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 );


@@ -46,6 +46,7 @@ void bli_trmm_xx_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -80,6 +81,7 @@ void bli_trmm_xx_ker_var2
 b,
 c,
 cntx,
+rntm,
 cntl,
 thread
 );

@@ -43,6 +43,7 @@ void bli_trmm3_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -128,15 +129,17 @@ void bli_trmm3_front
 bli_obj_set_as_root( &b_local );
 bli_obj_set_as_root( &c_local );

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_TRMM3,
 side,
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -169,6 +172,7 @@ void bli_trmm3_front
 beta,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

@@ -41,5 +41,6 @@ void bli_trmm3_front
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -40,6 +40,7 @@ void bli_trsm_blk_var1
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -87,6 +88,7 @@ void bli_trsm_blk_var1
 &BLIS_ONE,
 &c1,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -40,6 +40,7 @@ void bli_trsm_blk_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -87,6 +88,7 @@ void bli_trsm_blk_var2
 &BLIS_ONE,
 &c1,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -40,6 +40,7 @@ void bli_trsm_blk_var3
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -83,6 +84,7 @@ void bli_trsm_blk_var3
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -41,6 +41,7 @@ void bli_trsm_front
 obj_t* a,
 obj_t* b,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 )
 {
@@ -120,15 +121,17 @@ void bli_trsm_front
 bli_obj_set_as_root( &b_local );
 bli_obj_set_as_root( &c_local );

-// Record the threading for each level within the context.
-bli_cntx_set_thrloop_from_env
+// Parse and interpret the contents of the rntm_t object to properly
+// set the ways of parallelism for each loop, and then make any
+// additional modifications necessary for the current operation.
+bli_rntm_set_ways_for_op
 (
 BLIS_TRSM,
 side,
 bli_obj_length( &c_local ),
 bli_obj_width( &c_local ),
 bli_obj_width( &a_local ),
-cntx
+rntm
 );

 // A sort of hack for communicating the desired pach schemas for A and B
@@ -161,6 +164,7 @@ void bli_trsm_front
 alpha,
 &c_local,
 cntx,
+rntm,
 cntl
 );
 }

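Both bli_trmm3_front and bli_trsm_front now pass the problem dimensions to bli_rntm_set_ways_for_op, whose comment says it makes "additional modifications necessary for the current operation." Judging from the bli_cntx_set_thrloop_from_env code that this commit removes from bli_cntx.c, those modifications move parallelism away from loops that carry dependencies in trmm and trsm. The following is a minimal standalone sketch of that remapping, assuming a hypothetical ways_t record and hypothetical op_t/side_t enums; it mirrors the removed cntx_t logic and is not the implementation of bli_rntm_set_ways_for_op, which this diff does not show.

```c
#include <assert.h>

/* Hypothetical stand-ins for BLIS types (not the real definitions). */
typedef enum { OP_GEMM, OP_TRMM, OP_TRMM3, OP_TRSM } op_t;
typedef enum { SIDE_LEFT, SIDE_RIGHT } side_t;

/* Ways of parallelism for the five level-3 loops, outermost first. */
typedef struct { int jc, pc, ic, jr, ir; } ways_t;

/* Remap the requested ways for operations whose loops carry dependencies,
   following the rules in the removed bli_cntx_set_thrloop_from_env(). */
static ways_t adjust_ways_for_op( op_t op, side_t side, ways_t w )
{
    int    total = w.jc * w.pc * w.ic * w.jr * w.ir;
    ways_t r     = w;

    if ( op == OP_TRMM && side == SIDE_RIGHT )
    {
        /* trmm_r has a dependency in the jc loop: fold jc into jr. */
        r.jr = w.jr * w.jc;
        r.jc = 1;
    }
    else if ( op == OP_TRSM )
    {
        /* trsm funnels all parallelism into a single loop:
           ic for right-side solves, jr for left-side solves. */
        r = ( ways_t ){ 1, 1, 1, 1, 1 };
        if ( side == SIDE_RIGHT ) r.ic = total;
        else                      r.jr = total;
    }

    /* Every other operation (trmm3 included) keeps the request unchanged. */
    return r;
}
```

For example, a request of 2, 1, 3, 2, 1 ways (jc, pc, ic, jr, ir) becomes 1, 1, 3, 4, 1 for right-side trmm and puts all 12 ways on jr for left-side trsm.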
@@ -39,5 +39,6 @@ void bli_trsm_front
 obj_t* a,
 obj_t* b,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl
 );

@@ -42,6 +42,7 @@ void bli_trsm_int
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -118,7 +119,7 @@ void bli_trsm_int
 bli_thread_obarrier( thread );

 // Create the next node in the thrinfo_t structure.
-bli_thrinfo_grow( cntx, cntl, thread );
+bli_thrinfo_grow( rntm, cntl, thread );

 // Extract the function pointer from the current control tree node.
 f = bli_cntl_var_func( cntl );
@@ -130,6 +131,7 @@ void bli_trsm_int
 &b_local,
 &c_local,
 cntx,
+rntm,
 cntl,
 thread
 );

@@ -40,6 +40,7 @@ void bli_trsm_int
 obj_t* beta,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 );

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* alpha2,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trsm_ll_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -128,6 +130,7 @@ void bli_trsm_ll_ker_var2
 buf_alpha2,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
 void* alpha2, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* alpha2,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trsm_lu_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -128,6 +130,7 @@ void bli_trsm_lu_ker_var2
 buf_alpha2,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
 void* alpha2, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -40,6 +40,7 @@ void bli_trsm_packa
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -52,6 +53,7 @@ void bli_trsm_packa
 a,
 &a_pack,
 cntx,
+rntm,
 cntl,
 thread
 );
@@ -65,6 +67,7 @@ void bli_trsm_packa
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );
@@ -78,6 +81,7 @@ void bli_trsm_packb
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -90,6 +94,7 @@ void bli_trsm_packb
 b,
 &b_pack,
 cntx,
+rntm,
 cntl,
 thread
 );
@@ -103,6 +108,7 @@ void bli_trsm_packb
 &BLIS_ONE,
 c,
 cntx,
+rntm,
 bli_cntl_sub_node( cntl ),
 bli_thrinfo_sub_node( thread )
 );

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* alpha2,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trsm_rl_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -128,6 +130,7 @@ void bli_trsm_rl_ker_var2
 buf_alpha2,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
 void* alpha2, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -49,6 +49,7 @@ typedef void (*FUNCPTR_T)(
 void* alpha2,
 void* c, inc_t rs_c, inc_t cs_c,
 cntx_t* cntx,
+rntm_t* rntm,
 thrinfo_t* thread
 );

@@ -61,6 +62,7 @@ void bli_trsm_ru_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -128,6 +130,7 @@ void bli_trsm_ru_ker_var2
 buf_alpha2,
 buf_c, rs_c, cs_c,
 cntx,
+rntm,
 thread );
 }

@@ -149,6 +152,7 @@ void PASTEMAC(ch,varname) \
 void* alpha2, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 ) \
 { \

@@ -46,6 +46,7 @@ void PASTEMAC0(opname) \
 obj_t* b, \
 obj_t* c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 cntl_t* cntl, \
 thrinfo_t* thread \
 );
@@ -86,6 +87,7 @@ void PASTEMAC(ch,varname) \
 void* alpha2, \
 void* c, inc_t rs_c, inc_t cs_c, \
 cntx_t* cntx, \
+rntm_t* rntm, \
 thrinfo_t* thread \
 );


@@ -46,6 +46,7 @@ void bli_trsm_xx_ker_var2
 obj_t* b,
 obj_t* c,
 cntx_t* cntx,
+rntm_t* rntm,
 cntl_t* cntl,
 thrinfo_t* thread
 )
@@ -80,6 +81,7 @@ void bli_trsm_xx_ker_var2
 b,
 c,
 cntx,
+rntm,
 cntl,
 thread
 );

@@ -243,6 +243,12 @@ void bli_cntl_mark_family
 cntl_t* cntl
 )
 {
+// This function sets the family field of all cntl tree nodes that are
+// children of cntl. It's used by bli_l3_cntl_create_if() after making
+// a copy of a user-given cntl tree, if the user provided one, to mark
+// the operation family, which is used to determine appropriate behavior
+// by various functions when executing the blocked variants.
+
 // Set the family of the root node.
 bli_cntl_set_family( family, cntl );

@@ -257,3 +263,31 @@ void bli_cntl_mark_family
 }
 }

+// -----------------------------------------------------------------------------
+
+dim_t bli_cntl_calc_num_threads_in
+(
+rntm_t* rntm,
+cntl_t* cntl
+)
+{
+dim_t n_threads_in = 1;
+
+for ( ; cntl != NULL; cntl = bli_cntl_sub_node( cntl ) )
+{
+bszid_t bszid = bli_cntl_bszid( cntl );
+dim_t cur_way;
+
+// We assume bszid is in {KR,MR,NR,MC,KC,NR} if it is not
+// BLIS_NO_PART.
+if ( bszid != BLIS_NO_PART )
+cur_way = bli_rntm_ways_for( bszid, rntm );
+else
+cur_way = 1;
+
+n_threads_in *= cur_way;
+}
+
+return n_threads_in;
+}
+

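bli_cntl_calc_num_threads_in walks the control tree from the given node down through its sub-nodes and multiplies together the ways of parallelism that the rntm_t records for each node's blocksize ID, counting BLIS_NO_PART nodes as one way. A minimal standalone sketch of that traversal follows; bsz_t, toy_rntm_t, and toy_cntl_t are hypothetical simplifications, not BLIS's bszid_t, rntm_t, or cntl_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocksize IDs; BSZ_NO_PART marks a node that does not
   partition its operands. */
typedef enum
{
    BSZ_KR, BSZ_MR, BSZ_NR, BSZ_MC, BSZ_KC, BSZ_NC,
    BSZ_NUM,
    BSZ_NO_PART = -1
} bsz_t;

/* Hypothetical runtime record: ways of parallelism indexed by blocksize ID. */
typedef struct { int ways[ BSZ_NUM ]; } toy_rntm_t;

/* Hypothetical control-tree node with a single child (sub-node). */
typedef struct toy_cntl_s
{
    bsz_t              bszid;
    struct toy_cntl_s* sub;
} toy_cntl_t;

/* Threads participating at this node and below: the product of the ways
   for every partitioning node on the path down to the leaf. */
static int calc_num_threads_in( const toy_rntm_t* rntm, const toy_cntl_t* cntl )
{
    int n_threads_in = 1;

    for ( ; cntl != NULL; cntl = cntl->sub )
    {
        /* Nodes that do not partition contribute a factor of one. */
        int cur_way = ( cntl->bszid != BSZ_NO_PART )
                      ? rntm->ways[ cntl->bszid ]
                      : 1;

        n_threads_in *= cur_way;
    }

    return n_threads_in;
}
```

For a hypothetical NC, KC, MC, NR, MR chain with 2, 1, 3, 2, and 1 ways, the root reports 12 threads while the MC node reports 6, which is why the count must be taken from the runtime record supplied with the call rather than from shared state.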
@@ -109,6 +109,14 @@ void bli_cntl_mark_family

 // -----------------------------------------------------------------------------

+dim_t bli_cntl_calc_num_threads_in
+(
+rntm_t* rntm,
+cntl_t* cntl
+);
+
+// -----------------------------------------------------------------------------
+
 // cntl_t query (fields only)

 static opid_t bli_cntl_family( cntl_t* cntl )

@@ -42,34 +42,6 @@ void bli_cntx_clear( cntx_t* cntx )

 // -----------------------------------------------------------------------------

-dim_t bli_cntx_get_num_threads_in
-(
-cntx_t* cntx,
-cntl_t* cntl
-)
-{
-dim_t n_threads_in = 1;
-
-for ( ; cntl != NULL; cntl = bli_cntl_sub_node( cntl ) )
-{
-bszid_t bszid = bli_cntl_bszid( cntl );
-dim_t cur_way;
-
-// We assume bszid is in {KR,MR,NR,MC,KC,NR} if it is not
-// BLIS_NO_PART.
-if ( bszid != BLIS_NO_PART )
-cur_way = bli_cntx_way_for_bszid( bszid, cntx );
-else
-cur_way = 1;
-
-n_threads_in *= cur_way;
-}
-
-return n_threads_in;
-}
-
-// -----------------------------------------------------------------------------
-
 void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... )
 {
 // This function can be called from the bli_cntx_init_*() function for
@@ -872,146 +844,6 @@ void bli_cntx_set_packm_kers( dim_t n_kers, ... )
 bli_free_intl( ker_fps );
 }

-// -----------------------------------------------------------------------------
-
-void bli_cntx_set_thrloop_from_env
-(
-opid_t l3_op,
-side_t side,
-dim_t m,
-dim_t n,
-dim_t k,
-cntx_t* cntx
-)
-{
-dim_t jc, pc, ic, jr, ir;
-
-#ifdef BLIS_ENABLE_MULTITHREADING
-
-int nthread = bli_thread_get_env( "BLIS_NUM_THREADS", -1 );
-
-if ( nthread == -1 )
-nthread = bli_thread_get_env( "OMP_NUM_THREADS", -1 );
-
-if ( nthread < 1 ) nthread = 1;
-
-bli_partition_2x2( nthread, m*BLIS_DEFAULT_M_THREAD_RATIO,
-n*BLIS_DEFAULT_N_THREAD_RATIO, &ic, &jc );
-
-for ( ir = BLIS_DEFAULT_MR_THREAD_MAX ; ir > 1 ; ir-- )
-{
-if ( ic % ir == 0 )
-{
-ic /= ir;
-break;
-}
-}
-
-for ( jr = BLIS_DEFAULT_NR_THREAD_MAX ; jr > 1 ; jr-- )
-{
-if ( jc % jr == 0 )
-{
-jc /= jr;
-break;
-}
-}
-
-pc = 1;
-
-dim_t jc_env = bli_thread_get_env( "BLIS_JC_NT", -1 );
-dim_t ic_env = bli_thread_get_env( "BLIS_IC_NT", -1 );
-dim_t jr_env = bli_thread_get_env( "BLIS_JR_NT", -1 );
-dim_t ir_env = bli_thread_get_env( "BLIS_IR_NT", -1 );
-
-if (jc_env != -1 || ic_env != -1 || jr_env != -1 || ir_env != -1)
-{
-jc = (jc_env == -1 ? 1 : jc_env);
-ic = (ic_env == -1 ? 1 : ic_env);
-jr = (jr_env == -1 ? 1 : jr_env);
-ir = (ir_env == -1 ? 1 : ir_env);
-}
-
-#else
-
-jc = 1;
-pc = 1;
-ic = 1;
-jr = 1;
-ir = 1;
-
-#endif
-
-if ( l3_op == BLIS_TRMM )
-{
-// We reconfigure the parallelism from trmm_r due to a dependency in
-// the jc loop. (NOTE: This dependency does not exist for trmm3.)
-if ( bli_is_right( side ) )
-{
-bli_cntx_set_thrloop
-(
-1,
-pc,
-ic,
-jr * jc,
-ir,
-cntx
-);
-}
-else // if ( bli_is_left( side ) )
-{
-bli_cntx_set_thrloop
-(
-jc,
-pc,
-ic,
-jr,
-ir,
-cntx
-);
-}
-}
-else if ( l3_op == BLIS_TRSM )
-{
-if ( bli_is_right( side ) )
-{
-bli_cntx_set_thrloop
-(
-1,
-1,
-ic * pc * jc * ir * jr,
-1,
-1,
-cntx
-);
-}
-else // if ( bli_is_left( side ) )
-{
-bli_cntx_set_thrloop
-(
-1,
-1,
-1,
-ic * pc * jc * jr * ir,
-1,
-cntx
-);
-}
-}
-else // any other level-3 operation besides trmm/trsm
-{
-bli_cntx_set_thrloop
-(
-jc,
-pc,
-ic,
-jr,
-ir,
-cntx
-);
-}
-}
-
-
 // -----------------------------------------------------------------------------

 void bli_cntx_print( cntx_t* cntx )

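The removed bli_cntx_set_thrloop_from_env derived the ways of parallelism from BLIS_NUM_THREADS (falling back to OMP_NUM_THREADS): it split the thread count between the ic and jc loops with bli_partition_2x2, then peeled the largest factor not exceeding BLIS_DEFAULT_MR_THREAD_MAX off ic to obtain ir, and the largest factor not exceeding BLIS_DEFAULT_NR_THREAD_MAX off jc to obtain jr. The sketch below reproduces that factorization in isolation, under several assumptions: bli_partition_2x2 is not part of this diff, so partition_2x2 is a hypothetical stand-in that picks the factor pair making the weighted per-thread subproblem closest to square; the ratio and max parameters stand in for the BLIS_DEFAULT_* macros, whose values are not shown here; and the BLIS_JC_NT/BLIS_IC_NT/BLIS_JR_NT/BLIS_IR_NT overrides are omitted.

```c
#include <assert.h>
#include <stdlib.h>

/* Ways of parallelism for the five level-3 loops (hypothetical record). */
typedef struct { int jc, pc, ic, jr, ir; } ways_t;

/* Hypothetical stand-in for bli_partition_2x2(): factor nthread = ic * jc
   so that the per-thread subproblem (m/ic by n/jc) is as square as possible.
   Ties keep the smaller ic. */
static void partition_2x2( int nthread, long m, long n, int* ic, int* jc )
{
    long best = -1;

    for ( int f = 1; f <= nthread; ++f )
    {
        if ( nthread % f != 0 ) continue;

        long diff = labs( m / f - n / ( nthread / f ) );

        if ( best < 0 || diff < best )
        {
            best = diff;
            *ic  = f;
            *jc  = nthread / f;
        }
    }
}

/* Divide *outer by the largest factor in [2, max_inner] that divides it and
   return that factor, or return 1 if none does. This is the search the
   removed code used to obtain ir from ic and jr from jc. */
static int peel_factor( int* outer, int max_inner )
{
    for ( int inner = max_inner; inner > 1; --inner )
    {
        if ( *outer % inner == 0 )
        {
            *outer /= inner;
            return inner;
        }
    }
    return 1;
}

static ways_t ways_from_nthread( int nthread, long m, long n,
                                 long m_ratio, long n_ratio,
                                 int mr_max, int nr_max )
{
    ways_t w = { 1, 1, 1, 1, 1 };   /* pc stays 1, as in the removed code */

    if ( nthread < 1 ) nthread = 1;

    partition_2x2( nthread, m * m_ratio, n * n_ratio, &w.ic, &w.jc );
    w.ir = peel_factor( &w.ic, mr_max );
    w.jr = peel_factor( &w.jc, nr_max );

    return w;
}
```

With 4 threads, m = n = 1000, ratios of 2 and 1, and maxima of 1 and 4, the sketch yields ic = 2 and jr = 2 with every other loop at 1; the product of the ways always equals the requested thread count.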
@@ -60,8 +60,6 @@ typedef struct cntx_s
 pack_t schema_b;
 pack_t schema_c;

-dim_t* thrloop;
-
 membrk_t* membrk;
 } cntx_t;
 */
@@ -124,10 +122,6 @@ static pack_t bli_cntx_schema_c_panel( cntx_t* cntx )
 {
 return cntx->schema_c_panel;
 }
-static dim_t* bli_cntx_thrloop( cntx_t* cntx )
-{
-return cntx->thrloop;
-}
 static membrk_t* bli_cntx_get_membrk( cntx_t* cntx )
 {
 return cntx->membrk;
@@ -379,47 +373,6 @@ static void* bli_cntx_get_unpackm_ker_dt( num_t dt, l1mkr_t ker_id, cntx_t* cntx

 // -----------------------------------------------------------------------------

-static dim_t bli_cntx_jc_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_NC ];
-}
-static dim_t bli_cntx_pc_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_KC ];
-}
-static dim_t bli_cntx_ic_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_MC ];
-}
-static dim_t bli_cntx_jr_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_NR ];
-}
-static dim_t bli_cntx_ir_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_MR ];
-}
-static dim_t bli_cntx_pr_way( cntx_t* cntx )
-{
-return cntx->thrloop[ BLIS_KR ];
-}
-
-static dim_t bli_cntx_way_for_bszid( bszid_t bszid, cntx_t* cntx )
-{
-return cntx->thrloop[ bszid ];
-}
-
-static dim_t bli_cntx_get_num_threads( cntx_t* cntx )
-{
-return bli_cntx_jc_way( cntx ) *
-bli_cntx_pc_way( cntx ) *
-bli_cntx_ic_way( cntx ) *
-bli_cntx_jr_way( cntx ) *
-bli_cntx_ir_way( cntx );
-}
-
-// -----------------------------------------------------------------------------
-
 static bool_t bli_cntx_l3_nat_ukr_prefers_rows_dt( num_t dt, l3ukr_t ukr_id, cntx_t* cntx )
 {
 bool_t prefs = bli_cntx_get_l3_nat_ukr_prefs_dt( dt, ukr_id, cntx );
@@ -584,24 +537,12 @@ static void bli_cntx_set_unpackm_ker_dt( void* fp, num_t dt, l1mkr_t ker_id, cnt
 bli_func_set_dt( fp, dt, func );
 }

-static void bli_cntx_set_thrloop( dim_t jc, dim_t pc, dim_t ic, dim_t jr, dim_t ir, cntx_t* cntx )
-{
-cntx->thrloop[ BLIS_NC ] = jc;
-cntx->thrloop[ BLIS_KC ] = pc;
-cntx->thrloop[ BLIS_MC ] = ic;
-cntx->thrloop[ BLIS_NR ] = jr;
-cntx->thrloop[ BLIS_MR ] = ir;
-cntx->thrloop[ BLIS_KR ] = 1;
-}
-
 // -----------------------------------------------------------------------------

 // Function prototypes

 void bli_cntx_clear( cntx_t* cntx );

-dim_t bli_cntx_get_num_threads_in( cntx_t* cntx, cntl_t* cntl );
-
 void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... );

 void bli_cntx_set_ind_blkszs( ind_t method, dim_t n_bs, ... );
@@ -611,16 +552,6 @@ void bli_cntx_set_l1f_kers( dim_t n_kers, ... );
 void bli_cntx_set_l1v_kers( dim_t n_kers, ... );
 void bli_cntx_set_packm_kers( dim_t n_kers, ... );

-void bli_cntx_set_thrloop_from_env
-(
-opid_t l3_op,
-side_t side,
-dim_t m,
-dim_t n,
-dim_t k,
-cntx_t* cntx
-);
-
 void bli_cntx_print( cntx_t* cntx );



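These hunks delete the thrloop field and every accessor that wrote to it, so a cntx_t queried from the global kernel structure is no longer modified after it is handed out. The deterministic sketch below illustrates why that matters: with one shared, writable record, a second caller's request can overwrite the first caller's before the first caller reads it back, whereas a record owned by each call cannot be clobbered. The types are hypothetical simplifications of cntx_t and rntm_t, and the interleaving of two application threads is simulated sequentially rather than with real threads.

```c
#include <assert.h>

/* Loop indices for the five level-3 loops (hypothetical ordering). */
enum { LOOP_JC, LOOP_PC, LOOP_IC, LOOP_JR, LOOP_IR, NUM_LOOPS };

/* Old design (hypothetical): ways stored in a context shared by all callers. */
typedef struct { int thrloop[ NUM_LOOPS ]; } shared_cntx_t;

/* New design (hypothetical): each call carries its own runtime record. */
typedef struct { int ways[ NUM_LOOPS ]; } call_rntm_t;

/* Record a request for jc-way and ic-way parallelism (all other loops 1). */
static void set_ways( int* ways, int jc, int ic )
{
    for ( int i = 0; i < NUM_LOOPS; ++i ) ways[ i ] = 1;
    ways[ LOOP_JC ] = jc;
    ways[ LOOP_IC ] = ic;
}

static int total_threads( const int* ways )
{
    int n = 1;
    for ( int i = 0; i < NUM_LOOPS; ++i ) n *= ways[ i ];
    return n;
}

/* Caller A records 4 ways, caller B records 2 ways before A reads back.
   Returns the thread count caller A observes with a shared context. */
static int caller_a_sees_shared( void )
{
    shared_cntx_t gks_cntx;                   /* stands in for the one gks copy */
    set_ways( gks_cntx.thrloop, 4, 1 );       /* caller A's request */
    set_ways( gks_cntx.thrloop, 1, 2 );       /* caller B overwrites it */
    return total_threads( gks_cntx.thrloop ); /* A now sees B's request */
}

/* Same interleaving, but each caller owns its runtime record. */
static int caller_a_sees_private( void )
{
    call_rntm_t rntm_a, rntm_b;
    set_ways( rntm_a.ways, 4, 1 );            /* caller A's request */
    set_ways( rntm_b.ways, 1, 2 );            /* caller B's request */
    return total_threads( rntm_a.ways );      /* A still sees its own */
}
```

The shared version reports 2 threads to caller A, which asked for 4; the per-call version reports 4. This matches the commit's observation that the race only mattered when application threads requested different parallelization schemes.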
@@ -34,12 +34,15 @@

 #include "blis.h"

-void bli_obj_create( num_t dt,
-dim_t m,
-dim_t n,
-inc_t rs,
-inc_t cs,
-obj_t* obj )
+void bli_obj_create
+(
+num_t dt,
+dim_t m,
+dim_t n,
+inc_t rs,
+inc_t cs,
+obj_t* obj
+)
 {
 bli_init_once();

@@ -48,13 +51,16 @@ void bli_obj_create( num_t dt,
 bli_obj_alloc_buffer( rs, cs, 1, obj );
 }

-void bli_obj_create_with_attached_buffer( num_t dt,
-dim_t m,
-dim_t n,
-void* p,
-inc_t rs,
-inc_t cs,
-obj_t* obj )
+void bli_obj_create_with_attached_buffer
+(
+num_t dt,
+dim_t m,
+dim_t n,
+void* p,
+inc_t rs,
+inc_t cs,
+obj_t* obj
+)
 {
 bli_init_once();

@@ -63,10 +69,13 @@ void bli_obj_create_with_attached_buffer( num_t dt,
 bli_obj_attach_buffer( p, rs, cs, 1, obj );
 }

-void bli_obj_create_without_buffer( num_t dt,
-dim_t m,
-dim_t n,
-obj_t* obj )
+void bli_obj_create_without_buffer
+(
+num_t dt,
+dim_t m,
+dim_t n,
+obj_t* obj
+)
 {
 siz_t elem_size;
 void* s;
@@ -112,10 +121,13 @@ void bli_obj_create_without_buffer( num_t dt,
 else if ( bli_is_dcomplex( dt ) ) { bli_zset1s( *(( dcomplex* )s) ); }
 }

-void bli_obj_alloc_buffer( inc_t rs,
-inc_t cs,
-inc_t is,
-obj_t* obj )
+void bli_obj_alloc_buffer
+(
+inc_t rs,
+inc_t cs,
+inc_t is,
+obj_t* obj
+)
 {
 dim_t n_elem = 0;
 dim_t m, n;
@@ -178,11 +190,14 @@ void bli_obj_alloc_buffer( inc_t rs,
 bli_obj_set_imag_stride( is, obj );
 }

-void bli_obj_attach_buffer( void* p,
-inc_t rs,
-inc_t cs,
-inc_t is,
-obj_t* obj )
+void bli_obj_attach_buffer
+(
+void* p,
+inc_t rs,
+inc_t cs,
+inc_t is,
+obj_t* obj
+)
 {
 bli_init_once();

@@ -201,24 +216,34 @@ void bli_obj_attach_buffer( void* p,
 bli_obj_set_imag_stride( is, obj );
 }

-void bli_obj_create_1x1( num_t dt,
-obj_t* obj )
+void bli_obj_create_1x1
+(
+num_t dt,
+obj_t* obj
+)
 {
 bli_obj_create_without_buffer( dt, 1, 1, obj );

 bli_obj_alloc_buffer( 1, 1, 1, obj );
 }

-void bli_obj_create_1x1_with_attached_buffer( num_t dt,
-void* p,
-obj_t* obj )
+void bli_obj_create_1x1_with_attached_buffer
+(
+num_t dt,
+void* p,
+obj_t* obj
+)
 {
 bli_obj_create_without_buffer( dt, 1, 1, obj );

 bli_obj_attach_buffer( p, 1, 1, 1, obj );
 }

-void bli_obj_create_conf_to( obj_t* s, obj_t* d )
+void bli_obj_create_conf_to
+(
+obj_t* s,
+obj_t* d
+)
 {
 const num_t dt = bli_obj_dt( s );
 const dim_t m = bli_obj_length( s );
@@ -229,7 +254,10 @@ void bli_obj_create_conf_to( obj_t* s, obj_t* d )
 bli_obj_create( dt, m, n, rs, cs, d );
 }

-void bli_obj_free( obj_t* obj )
+void bli_obj_free
+(
+obj_t* obj
+)
 {
 if ( bli_error_checking_is_enabled() )
 bli_obj_free_check( obj );
@@ -246,7 +274,11 @@ void bli_obj_free( obj_t* obj )
 }

 #if 0
-//void bli_obj_create_const( double value, obj_t* obj )
+//void bli_obj_create_const
+(
+double value,
+obj_t* obj
+)
 {
 gint_t* temp_i;
 float* temp_s;
@@ -273,7 +305,11 @@ void bli_obj_free( obj_t* obj )
 *temp_i = ( gint_t ) value;
 }

-//void bli_obj_create_const_copy_of( obj_t* a, obj_t* b )
+//void bli_obj_create_const_copy_of
+(
+obj_t* a,
+obj_t* b
+)
 {
 gint_t* temp_i;
 float* temp_s;
@@ -328,12 +364,15 @@ void bli_obj_free( obj_t* obj )
 }
 #endif

-void bli_adjust_strides( dim_t m,
-dim_t n,
-siz_t elem_size,
-inc_t* rs,
-inc_t* cs,
-inc_t* is )
+void bli_adjust_strides
+(
+dim_t m,
+dim_t n,
+siz_t elem_size,
+inc_t* rs,
+inc_t* cs,
+inc_t* is
+)
 {
 // Here, we check the strides that were input from the user and modify
 // them if needed.
@@ -422,7 +461,10 @@ static siz_t dt_sizes[6] =
 sizeof( constdata_t )
 };

-siz_t bli_dt_size( num_t dt )
+siz_t bli_dt_size
+(
+num_t dt
+)
 {
 if ( bli_error_checking_is_enabled() )
 bli_dt_size_check( dt );
@@ -439,7 +481,10 @@ static char* dt_names[ BLIS_NUM_FP_TYPES+1 ] =
 "int"
 };

-char* bli_dt_string( num_t dt )
+char* bli_dt_string
+(
+num_t dt
+)
 {
 if ( bli_error_checking_is_enabled() )
 bli_dt_string_check( dt );
@@ -447,7 +492,11 @@ char* bli_dt_string( num_t dt )
 return dt_names[dt];
 }

-dim_t bli_align_dim_to_mult( dim_t dim, dim_t dim_mult )
+dim_t bli_align_dim_to_mult
+(
+dim_t dim,
+dim_t dim_mult
+)
 {
 // We return the dimension unmodified if the multiple is zero
 // (to avoid division by zero).
@@ -460,7 +509,12 @@ dim_t bli_align_dim_to_mult( dim_t dim, dim_t dim_mult )
 return dim;
 }

-dim_t bli_align_dim_to_size( dim_t dim, siz_t elem_size, siz_t align_size )
+dim_t bli_align_dim_to_size
+(
+dim_t dim,
+siz_t elem_size,
+siz_t align_size
+)
 {
 dim = ( ( dim * ( dim_t )elem_size +
 ( dim_t )align_size - 1
@@ -473,7 +527,11 @@ dim_t bli_align_dim_to_size( dim_t dim, siz_t elem_size, siz_t align_size )
 return dim;
 }

-dim_t bli_align_ptr_to_size( void* p, size_t align_size )
+dim_t bli_align_ptr_to_size
+(
+void* p,
+size_t align_size
+)
 {
 dim_t dim;

@@ -484,6 +542,7 @@ dim_t bli_align_ptr_to_size( void* p, size_t align_size )
 return dim;
 }
+
 #if 0
 static num_t type_union[BLIS_NUM_FP_TYPES][BLIS_NUM_FP_TYPES] =
 {
 // s c d z
@@ -500,8 +559,13 @@ num_t bli_dt_union( num_t dt1, num_t dt2 )

 return type_union[dt1][dt2];
 }
 #endif
+
-void bli_obj_print( char* label, obj_t* obj )
+void bli_obj_print
+(
+char* label,
+obj_t* obj
+)
 {
 bli_init_once();

Some files were not shown because too many files have changed in this diff.