Removed support for 3m, 4m induced methods.

Details:
- Removed support for all induced methods except for 1m. This included
  removing code related to 3mh, 3m1, 4mh, 4m1a, and 4m1b as well as any
  code that existed only to support those implementations. These
  implementations were rarely used and posed code maintenance challenges
  for BLIS's maintainers going forward.
- Removed reference kernels for packm that pack 3m and 4m micropanels,
  and removed 3m/4m-related code from bli_cntx_ref.c.
- Removed support for 3m/4m from the code in frame/ind, then reorganized
  and streamlined the remaining code in that directory. The *ind(),
  *nat(), and *1m() APIs were all removed. (These additional API layers
  no longer made as much sense with only one induced method (1m) being
  supported.) The bli_ind.c file (and header) were moved to frame/base
  and bli_l3_ind.c (and header) and bli_l3_ind_tapi.h were moved to
  frame/3.
- Removed 3m/4m support from the code in frame/1m/packm.
- Removed 3m/4m support from trmm/trsm macrokernels and simplified some
  pointer arithmetic that was previously expressed in terms of the
  bli_ptr_inc_by_frac() static inline function (whose definition was
  also removed).
- Removed the following subdirectories of level-0 macro headers from
  frame/include/level0: ri3, rih, ri, ro, rpi. The level-0 scalar macros
  defined in these directories were used exclusively for 3m and 4m
  method codes.
- Simplified bli_cntx_set_blkszs() and bli_cntx_set_ind_blkszs() in
  light of 1m being the only induced method left within BLIS.
- Removed dt_on_output field within auxinfo_t and its associated
  accessor functions.
- Re-indexed the 1e/1r pack schemas after removing those associated with
  variants of the 3m and 4m methods. This leaves two bits unused within
  the pack format portion of the schema bitfield. (See bli_type_defs.h
  for more info.)
- Spun off the basic and expert interfaces to the object and typed APIs
  into separate files: bli_l3_oapi.c and bli_l3_oapi_ex.c; bli_l3_tapi.c
  and bli_l3_tapi_ex.c.
- Moved the level-3 operation-specific _check function calls from the
  operations' _front() functions to the corresponding _ex() function of
  the object API. (This change roughly maintains where the _check()
  functions are called in the call stack but lays the groundwork for
  future changes that may come to the level-3 object APIs.) Minor
  modifications to bli_l3_check.c to allow the check() functions to be
  called from the expert interface APIs.
- Removed support within the testsuite for testing the aforementioned
  induced methods, and updated the standalone test drivers in the 'test'
  directory to reflect the retirement of those induced methods.
- Modified the sandbox contract so that the user is obliged to define
  bli_gemm_ex() instead of bli_gemmnat(). (This change was made in light
  of the *nat() functions no longer existing.) Also updated the existing
  'power10' and 'gemmlike' sandboxes to come into compliance with the
  new sandbox rules.
- Updated BLISObjectAPI.md, BLISTypedAPI.md, Testsuite.md documentation
  to reflect the retirement of 3m/4m, and also modified Sandboxes.md to
  bring the document into alignment with new conventions.
- Updated various comments; removed segments of commented-out code.
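
For readers unfamiliar with the one induced method that remains, the following is a minimal sketch of the idea behind the 1e/1r pack schemas mentioned above. It assumes the usual description of 1m: one operand is "expanded" so that each complex element becomes a 2x2 real block, the other is "reordered" so that real and imaginary parts occupy separate rows, and a purely real matrix product then yields the complex result. The function names and the row-major layout are hypothetical and chosen for illustration only; BLIS's actual micropanel layouts and kernels differ in detail.

```c
#include <assert.h>
#include <stddef.h>

/* All function names below are hypothetical and are not part of BLIS. */

/* Expand an m x k complex matrix, given as separate row-major real and
   imaginary arrays, into a 2m x 2k real matrix. Each complex element
   (re, im) becomes the 2x2 block [ re -im ; im re ]. */
void expand_1e( size_t m, size_t k,
                const double* ar, const double* ai, double* ae )
{
    assert( ar != NULL && ai != NULL && ae != NULL );
    const size_t lde = 2 * k;
    for ( size_t i = 0; i < m; ++i )
        for ( size_t p = 0; p < k; ++p )
        {
            const double re = ar[ i*k + p ];
            const double im = ai[ i*k + p ];
            ae[ (2*i  )*lde + 2*p   ] =  re;
            ae[ (2*i  )*lde + 2*p+1 ] = -im;
            ae[ (2*i+1)*lde + 2*p   ] =  im;
            ae[ (2*i+1)*lde + 2*p+1 ] =  re;
        }
}

/* Reorder a k x n complex matrix into a 2k x n real matrix whose rows
   alternate between real parts and imaginary parts. */
void reorder_1r( size_t k, size_t n,
                 const double* br, const double* bi, double* b1r )
{
    assert( br != NULL && bi != NULL && b1r != NULL );
    for ( size_t p = 0; p < k; ++p )
        for ( size_t j = 0; j < n; ++j )
        {
            b1r[ (2*p  )*n + j ] = br[ p*n + j ];
            b1r[ (2*p+1)*n + j ] = bi[ p*n + j ];
        }
}

/* Plain real matrix product, row-major: C (M x N) = A (M x K) * B (K x N). */
void real_gemm( size_t M, size_t N, size_t K,
                const double* a, const double* b, double* c )
{
    assert( a != NULL && b != NULL && c != NULL );
    for ( size_t i = 0; i < M; ++i )
        for ( size_t j = 0; j < N; ++j )
        {
            double sum = 0.0;
            for ( size_t p = 0; p < K; ++p )
                sum += a[ i*K + p ] * b[ p*N + j ];
            c[ i*N + j ] = sum;
        }
}
```

Calling real_gemm( 2*m, n, 2*k, ae, b1r, c ) on the outputs of the two helpers produces a 2m x n real matrix whose even rows hold the real parts and whose odd rows hold the imaginary parts of the complex product. No complex-domain kernel is needed, which is why 1m is useful when only real-domain microkernels are available.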
This commit is contained in:
Field G. Van Zee
2021-10-28 16:05:43 -05:00
parent e8caf200a9
commit f065a8070f
163 changed files with 1455 additions and 17026 deletions


@@ -2336,16 +2336,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
```
Possible implementation (ie: the `ind_t method` argument) types are:
* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
* `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
* `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).
**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.
Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
* `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
* `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.


@@ -2015,16 +2015,9 @@ char* bli_info_get_trsm_u_ukr_impl_string( ind_t method, num_t dt )
```
Possible implementation (ie: the `ind_t method` argument) types are:
* `BLIS_3MH`: Implementation based on the 3m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_3M1`: Implementation based on the 3m method applied within the 1st loop around the microkernel.
* `BLIS_4MH`: Implementation based on the 4m method applied at the highest level, outside the 5th loop around the microkernel.
* `BLIS_4M1B`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that the 1st loop is fissured into two loops, the first of which multiplies the real part of the current micropanel of packed matrix B (against all real and imaginary parts of packed matrix A), and the second of which multiplies the imaginary part of the current micropanel of packed matrix B.
* `BLIS_4M1A`: Implementation based on the 4m method applied within the 1st loop around the microkernel. Computation is ordered such that real and imaginary components of the current micropanels are completely used before proceeding to the next virtual microkernel invocation.
* `BLIS_1M`: Implementation based on the 1m method. (This is the default induced method when real domain kernels are present but complex kernels are missing.)
* `BLIS_NAT`: Implementation based on "native" execution (ie: NOT an induced method).
**NOTE**: `BLIS_3M3` and `BLIS_3M2` have been deprecated from the `typedef enum` of `ind_t`, and `BLIS_4M1B` is also effectively no longer available, though the `typedef enum` value still exists.
Possible microkernel types (ie: the return values for `bli_info_get_*_ukr_impl_string()`) are:
* `BLIS_REFERENCE_UKERNEL` (`"refrnce"`): This value is returned when the queried microkernel is provided by the reference implementation.
* `BLIS_VIRTUAL_UKERNEL` (`"virtual"`): This value is returned when the queried microkernel is driven by the "virtual" microkernel provided by an induced method. This happens for any `method` value that is not `BLIS_NAT` (ie: native), but only applies to the complex domain.


@@ -17,13 +17,9 @@ Simply put, a sandbox in BLIS provides an alternative implementation to the
`gemm` operation.
To get a little more specific, a sandbox provides an alternative implementation
to the function `bli_gemmnat()`, which is the object-based API call for
computing the `gemm` operation via native execution.
**Note**: Native execution simply means that an induced method will not be used.
It's what you probably already think of when you think of implementing the
`gemm` operation: a series of loops around an optimized (usually assembly-based)
microkernel with some packing functions thrown in at various levels.
to the function `bli_gemm_ex()`, which is the
[expert interface](BLISObjectAPI.md#basic-vs-expert-interfaces) for calling the
[object-based API](BLISObjectAPI.md#gemm) for the `gemm` operation.
Why sandboxes? Sometimes you want to experiment with tweaks or changes to
the `gemm` operation, but you want to do so in a simple environment rather than
@@ -45,18 +41,11 @@ corresponds to a sub-directory of `sandbox` named `gemmlike`. (Reminder: the
`auto` argument is the configuration target and thus unrelated to
sandboxes.)
NOTE: If you want your sandbox implementation to handle *all* problem
sizes and shapes, you'll need to disable the skinny/unpacked "sup"
sub-framework within BLIS, which is enabled by default. This can be
done by passing the `--disable-sup-handling` option to configure:
```
$ ./configure --enable-sandbox=gemmlike --disable-sup-handling auto
```
If you leave sup enabled, the sup implementation will, at runtime, detect
and handle certain smaller problem sizes upstream of where BLIS calls
`bli_gemmnat()` while all other problems will fall to your sandbox
implementation. Thus, you should only leave sup enabled if you are fine
with those smaller problems being handled by sup.
NOTE: Using your own sandbox implementation means that BLIS will call your
sandbox for *all* problem sizes and shapes, for *all* datatypes supported
by BLIS. If you intend to only implement a subset of this functionality
within your sandbox, you should be sure to redirect execution back into
the core framework for the parts that you don't wish to reimplement yourself.
As `configure` runs, you should get output that includes lines
similar to:
@@ -67,13 +56,12 @@ configure: sandbox/gemmlike
And when you build BLIS, the last files to be compiled will be the source
code in the specified sandbox:
```
Compiling obj/haswell/sandbox/gemmlike/bli_gemmnat.o ('haswell' CFLAGS for sandboxes)
Compiling obj/haswell/sandbox/gemmlike/bls_gemm.o ('haswell' CFLAGS for sandboxes)
Compiling obj/haswell/sandbox/gemmlike/bls_gemm_bp_var1.o ('haswell' CFLAGS for sandboxes)
...
```
That's it! After the BLIS library is built, it will contain your chosen
sandbox's implementation of `bli_gemmnat()` instead of the default
sandbox's implementation of `bli_gemm_ex()` instead of the default BLIS
implementation.
## Sandbox rules
@@ -97,7 +85,7 @@ Note that `blis.h` already contains all of its definitions inside of an
`extern "C"` block, so you should be able to `#include "blis.h"` from your
C++11 source code without any issues.
3. All of your code to replace BLIS's default implementation of `bli_gemmnat()`
3. All of your code to replace BLIS's default implementation of `bli_gemm_ex()`
should reside in the named sandbox directory, or some directory therein.
(Obviously.) For example, the "gemmlike" sandbox is located in
`sandbox/gemmlike`. All of the code associated with this sandbox will be
@@ -105,7 +93,7 @@ contained within `sandbox/gemmlike`. Note that you absolutely *may* include
additional code and interfaces within the sandbox, if you wish -- code and
interfaces that are not directly or indirectly needed for satisfying the
"contract" set forth by the sandbox (i.e., including a local definition
of `bli_gemmnat()`).
of `bli_gemm_ex()`).
4. The *only* header file that is required of your sandbox is `bli_sandbox.h`.
It must be named `bli_sandbox.h` because `blis.h` will `#include` this file
@@ -119,12 +107,12 @@ you should only place things (e.g. prototypes or type definitions) in
(b) an *application* that calls your sandbox-enabled BLIS library.
Usually, neither of these situations will require any of your local definitions
since those local definitions are only needed to define your sandbox
implementation of `bli_gemmnat()`, and this function is already prototyped by
implementation of `bli_gemm_ex()`, and this function is already prototyped by
BLIS. *But if you are adding additional APIs and/or operations to the sandbox
that are unrelated to `bli_gemmnat()`, then you'll want to `#include` those
that are unrelated to `bli_gemm_ex()`, then you'll want to `#include` those
function prototypes from within `bli_sandbox.h`*
5. Your definition of `bli_gemmnat()` should be the **only function you define**
5. Your definition of `bli_gemm_ex()` should be the **only function you define**
in your sandbox that begins with `bli_`. If you define other functions that
begin with `bli_`, you risk a namespace collision with existing framework
functions. To guarantee safety, please prefix your locally-defined sandbox
@@ -147,9 +135,9 @@ For example, with a BLIS sandbox you **can** do the following kinds of things:
kernels, which can already be customized within each sub-configuration);
- try inlining your functions manually;
- pivot away from using `obj_t` objects at higher algorithmic level (such as
immediately after calling `bli_gemmnat()`) to try to avoid some overhead;
immediately after calling `bli_gemm_ex()`) to try to avoid some overhead;
- create experimental implementations of new BLAS-like operations (provided
that you also provide an implementation of `bli_gemmnat()`).
that you also provide an implementation of `bli_gemm_ex()`).
You **cannot**, however, use a sandbox to do the following kinds of things:
- define new datatypes (half-precision, quad-precision, short integer, etc.)
@@ -167,8 +155,8 @@ Another important limitation is the fact that the build system currently uses
# Example framework CFLAGS used by 'haswell' sub-configuration
-O3 -Wall -Wno-unused-function -Wfatal-errors -fPIC -std=c99
-D_POSIX_C_SOURCE=200112L -I./include/haswell -I./frame/3/
-I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
-I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
-I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
-DBLIS_VERSION_STRING=\"0.3.2-51\"
```
which are likely more general-purpose than the `CFLAGS` used for, say,
optimized kernels or even reference kernels.
@@ -176,8 +164,8 @@ optimized kernels or even reference kernels.
# Example optimized kernel CFLAGS used by 'haswell' sub-configuration
-O3 -mavx2 -mfma -mfpmath=sse -march=core-avx2 -Wall -Wno-unused-function
-Wfatal-errors -fPIC -std=c99 -D_POSIX_C_SOURCE=200112L -I./include/haswell
-I./frame/3/ -I./frame/ind/ukernels/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/
-I./frame/include -DBLIS_VERSION_STRING=\"0.3.2-51\"
-I./frame/3/ -I./frame/1m/ -I./frame/1f/ -I./frame/1/ -I./frame/include
-DBLIS_VERSION_STRING=\"0.3.2-51\"
```
(To see precisely which flags are being employed for any given file, enable
verbosity at compile-time via `make V=1`.) Compiling sandboxes with these more


@@ -128,11 +128,6 @@ sdcz # Datatype(s) to test:
300 # Problem size: maximum to test
100 # Problem size: increment between experiments
# Complex level-3 implementations to test
1 # 3mh ('1' = enable; '0' = disable)
1 # 3m1 ('1' = enable; '0' = disable)
1 # 4mh ('1' = enable; '0' = disable)
1 # 4m1b ('1' = enable; '0' = disable)
1 # 4m1a ('1' = enable; '0' = disable)
1 # 1m ('1' = enable; '0' = disable)
1 # native ('1' = enable; '0' = disable)
1 # Simulate application-level threading:
@@ -169,7 +164,7 @@ _**Test gemm with mixed-precision operands?**_ This boolean determines whether `
_**Problem size.**_ These values determine the first problem size to test, the maximum problem size to test, and the increment between problem sizes. Note that the maximum problem size only bounds the range of problem sizes; it is not guaranteed to be tested. Example: If the initial problem size is 128, the maximum is 1000, and the increment is 64, then the last problem size to be tested will be 960.
_**Complex level-3 implementations to test.**_ With the exception of the switch marked `native`, these switches control whether experimental complex domain implementations are tested (when applicable). These implementations employ induced methods complex matrix multiplication and apply to some (though not all) of the level-3 operations. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.
_**Complex level-3 implementations to test.**_ This section lists which complex domain implementations of level-3 operations are tested. If you don't know what these are, you can ignore them. The `native` switch corresponds to native execution of complex domain level-3 operations, which we test by default. We also test the `1m` method, since it is the induced method of choice when optimized complex microkernels are not available. Note that all of these induced method tests (including `native`) are automatically disabled if the `c` and `z` datatypes are disabled.
_**Simulate application-level threading.**_ This setting specifies the number of threads the testsuite will spawn, and is meant to allow the user to exercise BLIS as a multithreaded application might if it were to make multiple concurrent calls to BLIS operations. (Note that the threading controlled by this option is orthogonal to, and has no effect on, whatever multithreading may be employed _within_ BLIS, as specified by the environment variables described in the [Multithreading](Multithreading.md) documentation.) When this option is set to 1, the testsuite is run with only one thread. When set to n > 1 threads, the spawned threads will parallelize (in round-robin fashion) the total set of tests specified by the testsuite input files, executing them in roughly the same order as that of a sequential execution.


@@ -110,28 +110,6 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
INSERT_GENTDEF( unpackm_cxk )
// packm_3mis_ker
// packm_4mi_ker
#undef GENTDEF
#define GENTDEF( ctype, ch, opname, tsuf ) \
\
typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);
INSERT_GENTDEF( packm_cxk_3mis )
INSERT_GENTDEF( packm_cxk_4mi )
// packm_rih_ker
// packm_1er_ker
#undef GENTDEF
@@ -150,12 +128,8 @@ typedef void (*PASTECH3(ch,opname,_ker,tsuf)) \
cntx_t* restrict cntx \
);
INSERT_GENTDEF( packm_cxk_rih )
INSERT_GENTDEF( packm_cxk_1er )
#endif


@@ -74,51 +74,6 @@ INSERT_GENTPROT_BASIC0( unpackm_14xk_ker_name )
INSERT_GENTPROT_BASIC0( unpackm_16xk_ker_name )
// 3mis packm kernels
#undef GENTPROT
#define GENTPROT PACKM_3MIS_KER_PROT
INSERT_GENTPROT_BASIC0( packm_2xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_3mis_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_3mis_ker_name )
// 4mi packm kernels
#undef GENTPROT
#define GENTPROT PACKM_4MI_KER_PROT
INSERT_GENTPROT_BASIC0( packm_2xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_4mi_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_4mi_ker_name )
// rih packm kernels
#undef GENTPROT
#define GENTPROT PACKM_RIH_KER_PROT
INSERT_GENTPROT_BASIC0( packm_2xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_4xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_6xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_8xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_10xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_12xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_14xk_rih_ker_name )
INSERT_GENTPROT_BASIC0( packm_16xk_rih_ker_name )
// 1e/1r packm kernels
#undef GENTPROT


@@ -70,58 +70,6 @@ void PASTEMAC(ch,varname) \
);
// 3mis packm kernels
#define PACKM_3MIS_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);
// 4mi packm kernels
#define PACKM_4MI_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t is_p, inc_t ldp, \
cntx_t* restrict cntx \
);
// rih packm kernels
#define PACKM_RIH_KER_PROT( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
pack_t schema, \
dim_t cdim, \
dim_t n, \
dim_t n_max, \
ctype* restrict kappa, \
ctype* restrict a, inc_t inca, inc_t lda, \
ctype* restrict p, inc_t ldp, \
cntx_t* restrict cntx \
);
// 1e/1r packm kernels
#define PACKM_1ER_KER_PROT( ctype, ch, varname ) \


@@ -43,15 +43,9 @@
#include "bli_packm_var.h"
#include "bli_packm_struc_cxk.h"
#include "bli_packm_struc_cxk_4mi.h"
#include "bli_packm_struc_cxk_3mis.h"
#include "bli_packm_struc_cxk_rih.h"
#include "bli_packm_struc_cxk_1er.h"
#include "bli_packm_cxk.h"
#include "bli_packm_cxk_4mi.h"
#include "bli_packm_cxk_3mis.h"
#include "bli_packm_cxk_rih.h"
#include "bli_packm_cxk_1er.h"
// Mixed datatype support.


@@ -71,31 +71,10 @@ static func_t packm_struc_cxk_kers[BLIS_NUM_PACK_SCHEMA_TYPES] =
// 0000 row/col panels
{ { bli_spackm_struc_cxk, bli_cpackm_struc_cxk,
bli_dpackm_struc_cxk, bli_zpackm_struc_cxk, } },
// 0001 row/col panels: 4m interleaved
{ { NULL, bli_cpackm_struc_cxk_4mi,
NULL, bli_zpackm_struc_cxk_4mi, } },
// 0010 row/col panels: 3m interleaved
{ { NULL, bli_cpackm_struc_cxk_3mis,
NULL, bli_zpackm_struc_cxk_3mis, } },
// 0011 row/col panels: 4m separated (NOT IMPLEMENTED)
{ { NULL, NULL,
NULL, NULL, } },
// 0100 row/col panels: 3m separated
{ { NULL, bli_cpackm_struc_cxk_3mis,
NULL, bli_zpackm_struc_cxk_3mis, } },
// 0101 row/col panels: real only
{ { NULL, bli_cpackm_struc_cxk_rih,
NULL, bli_zpackm_struc_cxk_rih, } },
// 0110 row/col panels: imaginary only
{ { NULL, bli_cpackm_struc_cxk_rih,
NULL, bli_zpackm_struc_cxk_rih, } },
// 0111 row/col panels: real+imaginary only
{ { NULL, bli_cpackm_struc_cxk_rih,
NULL, bli_zpackm_struc_cxk_rih, } },
// 1000 row/col panels: 1m-expanded (1e)
// 0001 row/col panels: 1m-expanded (1e)
{ { NULL, bli_cpackm_struc_cxk_1er,
NULL, bli_zpackm_struc_cxk_1er, } },
// 1001 row/col panels: 1m-reordered (1r)
// 0010 row/col panels: 1m-reordered (1r)
{ { NULL, bli_cpackm_struc_cxk_1er,
NULL, bli_zpackm_struc_cxk_1er, } },
};
@@ -204,15 +183,6 @@ void bli_packm_blk_var1
}
#if 0
if ( bli_is_4mi_packed( schema ) ) packm_kers = packm_struc_cxk_4mi_kers;
else if ( bli_is_3mi_packed( schema ) ||
bli_is_3ms_packed( schema ) ) packm_kers = packm_struc_cxk_3mis_kers;
else if ( bli_is_ro_packed( schema ) ||
bli_is_io_packed( schema ) ||
bli_is_rpi_packed( schema ) ) packm_kers = packm_struc_cxk_rih_kers;
else packm_kers = packm_struc_cxk_kers;
#else
// The original idea here was to read the packm_ukr from the context
// if it is non-NULL. The problem is, it requires that we be able to
// assume that the packm_ukr field is initialized to NULL, which it
@@ -238,7 +208,6 @@ void bli_packm_blk_var1
//packm_kers = bli_cntx_packm_ukrs( cntx );
packm_kers = cntx_packm_kers;
}
#endif
#endif
// Query the datatype-specific function pointer from the func_t object.
@@ -336,8 +305,6 @@ void PASTEMAC(ch,varname) \
bool row_stored; \
bool col_stored; \
inc_t is_p_use; \
dim_t ss_num; \
dim_t ss_den; \
\
ctype* restrict c_use; \
ctype* restrict p_use; \
@@ -408,17 +375,6 @@ void PASTEMAC(ch,varname) \
m_panel_max = &panel_dim_max; \
n_panel_max = &panel_len_max_i; \
} \
\
/* Compute the storage stride scaling. Usually this is just 1. However,
in the case of interleaved 3m, we need to scale by 3/2, and in the
cases of real-only, imag-only, or summed-only, we need to scale by
1/2. In both cases, we are compensating for the fact that pointer
arithmetic occurs in terms of complex elements rather than real
elements. */ \
if ( bli_is_3mi_packed( schema ) ) { ss_num = 3; ss_den = 2; } \
else if ( bli_is_3ms_packed( schema ) ) { ss_num = 1; ss_den = 2; } \
else if ( bli_is_rih_packed( schema ) ) { ss_num = 1; ss_den = 2; } \
else { ss_num = 1; ss_den = 1; } \
\
/* Compute the total number of iterations we'll need. */ \
n_iter = iter_dim / panel_dim_max + ( iter_dim % panel_dim_max ? 1 : 0 ); \
@@ -549,7 +505,7 @@ void PASTEMAC(ch,varname) \
/* NOTE: This value is usually LESS than ps_p because triangular
matrices usually have several micro-panels that are shorter
than a "full" micro-panel. */ \
p_inc = ( is_p_use * ss_num ) / ss_den; \
p_inc = is_p_use; \
} \
else if ( bli_is_herm_or_symm( strucc ) ) \
{ \
@@ -705,29 +661,6 @@ bli_thread_barrier( thread ); \
bli_thread_barrier( thread ); \
} \
*/
/*
if ( bli_is_4mi_packed( schema ) ) { \
printf( "packm_var2: is_p_use = %lu\n", is_p_use ); \
if ( col_stored ) { \
if ( 0 ) \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: a_r", *m_panel_use, *n_panel_use, \
( ctype_r* )c_use, 2*rs_c, 2*cs_c, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: ap_r", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: ap_i", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use + is_p_use, rs_p, cs_p, "%4.1f", "" ); \
} \
if ( row_stored ) { \
if ( 0 ) \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: b_r", *m_panel_use, *n_panel_use, \
( ctype_r* )c_use, 2*rs_c, 2*cs_c, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_r", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_i", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use + is_p_use, rs_p, cs_p, "%4.1f", "" ); \
} \
} \
*/
/*
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_rpi", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \


@@ -1,204 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conja, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Note that we use panel_dim_max, not panel_dim, to query the packm
kernel function pointer. This means that we always use the same
kernel, even for edge cases. */ \
num_t dt = PASTEMAC(ch,type); \
l1mkr_t ker_id = panel_dim_max; \
\
PASTECH2(ch,opname,_ker_ft) f; \
\
/* Query the context for the packm kernel corresponding to the current
panel dimension, or kernel id. If the id is invalid, the function will
return NULL. */ \
f = bli_cntx_get_packm_ker_dt( dt, ker_id, cntx ); \
\
/* If there exists a kernel implementation for the micro-panel dimension
provided, we invoke the implementation. Otherwise, we use scal2m. */ \
if ( f != NULL ) \
{ \
f \
( \
conja, \
panel_dim, \
panel_len, \
panel_len_max, \
kappa, \
a, inca, lda, \
p, is_p, ldp, \
cntx \
); \
} \
else \
{ \
/* Treat the micro-panel as panel_dim x panel_len and column-stored
(unit row stride). */ \
\
PASTEMAC(ch,scal2ri3s_mxn) \
( \
conja, \
panel_dim, \
panel_len, \
kappa, \
a, inca, lda, \
p, 1, ldp, is_p \
); \
\
/* If panel_dim < panel_dim_max, then we zero those unused rows. */ \
if ( panel_dim < panel_dim_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
const dim_t i = panel_dim; \
const dim_t m_edge = panel_dim_max - i; \
const dim_t n_edge = panel_len_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*1; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*1; \
ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (i )*1; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, 1, ldp, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, 1, ldp, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_rpi, 1, ldp, \
cntx, \
NULL \
); \
} \
\
/* If panel_len < panel_len_max, then we zero those unused columns. */ \
if ( panel_len < panel_len_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
const dim_t j = panel_len; \
const dim_t m_edge = panel_dim_max; \
const dim_t n_edge = panel_len_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*ldp; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*ldp; \
ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (j )*ldp; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, 1, ldp, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, 1, ldp, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_rpi, 1, ldp, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC0( packm_cxk_3mis )


@@ -1,53 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_cxk_3mis )


@@ -1,146 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conja, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Note that we use panel_dim_max, not panel_dim, to query the packm
kernel function pointer. This means that we always use the same
kernel, even for edge cases. */ \
num_t dt = PASTEMAC(ch,type); \
l1mkr_t ker_id = panel_dim_max; \
\
PASTECH2(ch,opname,_ker_ft) f; \
\
/* Query the context for the packm kernel corresponding to the current
panel dimension, or kernel id. If the id is invalid, the function will
return NULL. */ \
f = bli_cntx_get_packm_ker_dt( dt, ker_id, cntx ); \
\
/* If there exists a kernel implementation for the micro-panel dimension
provided, we invoke the implementation. Otherwise, we use scal2m. */ \
if ( f != NULL ) \
{ \
f \
( \
conja, \
panel_dim, \
panel_len, \
panel_len_max, \
kappa, \
a, inca, lda, \
p, is_p, ldp, \
cntx \
); \
} \
else \
{ \
/* Treat the micro-panel as panel_dim x panel_len and column-stored
(unit row stride). */ \
\
PASTEMAC(ch,scal2ris_mxn) \
( \
conja, \
panel_dim, \
panel_len, \
kappa, \
a, inca, lda, \
p, 1, ldp, is_p \
); \
\
/* If panel_dim < panel_dim_max, then we zero those unused rows. */ \
if ( panel_dim != panel_dim_max ) \
{ \
const dim_t i = panel_dim; \
const dim_t m_edge = panel_dim_max - i; \
const dim_t n_edge = panel_len_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*1; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*1; \
\
PASTEMAC(chr,set0s_mxn) \
( \
m_edge, \
n_edge, \
p_edge_r, 1, ldp \
); \
PASTEMAC(chr,set0s_mxn) \
( \
m_edge, \
n_edge, \
p_edge_i, 1, ldp \
); \
} \
\
/* If panel_len < panel_len_max, then we zero those unused columns. */ \
if ( panel_len != panel_len_max ) \
{ \
const dim_t j = panel_len; \
const dim_t m_edge = panel_dim_max; \
const dim_t n_edge = panel_len_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*ldp; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*ldp; \
\
PASTEMAC(chr,set0s_mxn) \
( \
m_edge, \
n_edge, \
p_edge_r, 1, ldp \
); \
PASTEMAC(chr,set0s_mxn) \
( \
m_edge, \
n_edge, \
p_edge_i, 1, ldp \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC0( packm_cxk_4mi )
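
The fallback branch of the removed 4mi packing routine above (scale-and-copy into separate real and imaginary sub-panels, then zero the unused rows and columns) can be sketched in plain C roughly as follows. This is only an illustration under stated assumptions, not BLIS code: `sketch_dcomplex` and `sketch_pack_4mi` are hypothetical names, the scaling is assumed to compute kappa times the (optionally conjugated) source element, and the copy and the two zero-fill passes are merged into a single loop for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type; stands in for ctype in the macro above. */
typedef struct { double re, im; } sketch_dcomplex;

/* Pack a panel_dim x panel_len block of a into p, storing real parts at p
   and imaginary parts at p + is_p (both column-stored with unit row stride
   and leading dimension ldp). Everything outside the block, up to
   panel_dim_max x panel_len_max, is zero-filled. */
void sketch_pack_4mi
     (
       int conj,
       size_t panel_dim, size_t panel_dim_max,
       size_t panel_len, size_t panel_len_max,
       sketch_dcomplex kappa,
       const sketch_dcomplex* a, size_t inca, size_t lda,
       double* p, size_t is_p, size_t ldp
     )
{
    assert( panel_dim <= panel_dim_max );
    assert( panel_len <= panel_len_max );

    double* p_r = p;        /* real sub-panel */
    double* p_i = p + is_p; /* imaginary sub-panel */

    for ( size_t j = 0; j < panel_len_max; ++j )
    for ( size_t i = 0; i < panel_dim_max; ++i )
    {
        double re = 0.0, im = 0.0; /* edge region defaults to zero */

        if ( i < panel_dim && j < panel_len )
        {
            sketch_dcomplex x = a[ i*inca + j*lda ];
            double xi = conj ? -x.im : x.im;

            /* (re + i*im) = kappa * x */
            re = kappa.re * x.re - kappa.im * xi;
            im = kappa.im * x.re + kappa.re * xi;
        }

        p_r[ i + j*ldp ] = re;
        p_i[ i + j*ldp ] = im;
    }
}
```

With `is_p` equal to the number of real elements in one sub-panel, the imaginary parts land directly behind the real parts, which is the layout the `p_edge_r` / `p_edge_i` pointers in the removed code address.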


@@ -1,53 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
conj_t conja, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_cxk_4mi )


@@ -1,151 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conja, \
pack_t schema, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Note that we use panel_dim_max, not panel_dim, to query the packm
kernel function pointer. This means that we always use the same
kernel, even for edge cases. */ \
num_t dt = PASTEMAC(ch,type); \
l1mkr_t ker_id = panel_dim_max; \
\
PASTECH2(ch,opname,_ker_ft) f; \
\
/* Query the context for the packm kernel corresponding to the current
panel dimension, or kernel id. If the id is invalid, the function will
return NULL. */ \
f = bli_cntx_get_packm_ker_dt( dt, ker_id, cntx ); \
\
/* If there exists a kernel implementation for the micro-panel dimension
provided, we invoke the implementation. Otherwise, we use scal2m. */ \
if ( 0 && f != NULL ) \
{ \
f \
( \
conja, \
schema, \
panel_dim, \
panel_len, \
panel_len_max, \
kappa, \
a, inca, lda, \
p, ldp, \
cntx \
); \
} \
else \
{ \
/* Treat the micro-panel as panel_dim x panel_len and column-stored
(unit row stride). */ \
\
PASTEMAC(ch,scal2rihs_mxn) \
( \
schema, \
conja, \
panel_dim, \
panel_len, \
kappa, \
a, inca, lda, \
p, 1, ldp \
); \
\
/* If panel_dim < panel_dim_max, then we zero those unused rows. */ \
if ( panel_dim != panel_dim_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
const dim_t i = panel_dim; \
const dim_t m_edge = panel_dim_max - i; \
const dim_t n_edge = panel_len_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*1; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, 1, ldp, \
cntx, \
NULL \
); \
} \
\
/* If panel_len < panel_len_max, then we zero those unused columns. */ \
if ( panel_len != panel_len_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
const dim_t j = panel_len; \
const dim_t m_edge = panel_dim_max; \
const dim_t n_edge = panel_len_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*ldp; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, 1, ldp, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC0( packm_cxk_rih )


@@ -316,7 +316,7 @@ siz_t bli_packm_init_pack
bli_is_panel_packed( schema ) )
{
dim_t m_panel;
dim_t ps_p, ps_p_orig;
dim_t ps_p;
// The panel dimension (for each datatype) should be equal to the
// default (logical) blocksize multiple in the m dimension.
@@ -341,58 +341,17 @@ siz_t bli_packm_init_pack
// dimension of the matrix is not a whole multiple of MR.
ps_p = cs_p * n_p_pad;
// As a general rule, we don't want micropanel strides to be odd. This
// is primarily motivated by our desire to support interleaved 3m
// micropanels, in which case we have to scale the panel stride
// by 3/2. That division by 2 means the numerator (prior to being
// scaled by 3) must be even.
// As a general rule, we don't want micropanel strides to be odd.
// NOTE: This safety feature *may* not be necessary anymore, but was
// definitely needed to support certain variations of the 3m method.
if ( bli_is_odd( ps_p ) ) ps_p += 1;
// Preserve this early panel stride value for use later, if needed.
ps_p_orig = ps_p;
// Here, we adjust the panel stride, if necessary. Remember: ps_p is
// always interpreted as being in units of the datatype of the object
// which is not necessarily how the micropanels will be stored. For
// interleaved 3m, we will increase ps_p by 50%, and for ro/io/rpi,
// we halve ps_p. Why? Because the macro-kernel indexes in units of
// the complex datatype. So these changes "trick" it into indexing
// the correct amount.
if ( bli_is_3mi_packed( schema ) )
{
ps_p = ( ps_p * 3 ) / 2;
}
else if ( bli_is_3ms_packed( schema ) ||
bli_is_ro_packed( schema ) ||
bli_is_io_packed( schema ) ||
bli_is_rpi_packed( schema ) )
{
// The division by 2 below assumes that ps_p is an even number.
// However, it is possible that, at this point, ps_p is odd.
// If it is indeed odd, we nudge it higher.
if ( bli_is_odd( ps_p ) ) ps_p += 1;
// Despite the fact that the packed micropanels will contain
// real elements, the panel stride that we store in the obj_t
// (which is passed into the macro-kernel) needs to be in units
// of complex elements, since the macro-kernel will index through
// micropanels via complex pointer arithmetic for trmm/trsm.
// Since the indexing "increment" will be twice as large as each
// actual stored element, we divide the panel_stride by 2.
ps_p = ps_p / 2;
}
// Set the imaginary stride (in units of fundamental elements) for
// 3m and 4m (separated or interleaved). We use ps_p_orig since
// that variable tracks the number of real part elements contained
// within each micropanel of the source matrix. Therefore, this
// is the number of real elements that must be traversed before
// reaching the imaginary part (3mi/4mi) of the packed micropanel,
// or the real part of the next micropanel (3ms).
if ( bli_is_3mi_packed( schema ) ) is_p = ps_p_orig;
else if ( bli_is_4mi_packed( schema ) ) is_p = ps_p_orig;
else if ( bli_is_3ms_packed( schema ) ) is_p = ps_p_orig * ( m_p_pad / m_panel );
else is_p = 1;
// Set the imaginary stride (in units of fundamental elements).
// This is the number of real elements that must be traversed before
// reaching the imaginary part of the packed micropanel. NOTE: the
// imaginary stride is mostly vestigial and left over from the 3m
// and 4m implementations.
is_p = 1;
// Store the strides and panel dimension in P.
bli_obj_set_strides( rs_p, cs_p, p );
@@ -409,7 +368,7 @@ siz_t bli_packm_init_pack
bli_is_panel_packed( schema ) )
{
dim_t n_panel;
dim_t ps_p, ps_p_orig;
dim_t ps_p;
// The panel dimension (for each datatype) should be equal to the
// default (logical) blocksize multiple in the n dimension.
@@ -435,58 +394,17 @@ siz_t bli_packm_init_pack
// dimension of the matrix is not a whole multiple of NR.
ps_p = m_p_pad * rs_p;
// As a general rule, we don't want micropanel strides to be odd. This
// is primarily motivated by our desire to support interleaved 3m
// micropanels, in which case we have to scale the panel stride
// by 3/2. That division by 2 means the numerator (prior to being
// scaled by 3) must be even.
// As a general rule, we don't want micropanel strides to be odd.
// NOTE: This safety feature *may* not be necessary anymore, but was
// definitely needed to support certain variations of the 3m method.
if ( bli_is_odd( ps_p ) ) ps_p += 1;
// Preserve this early panel stride value for use later, if needed.
ps_p_orig = ps_p;
// Here, we adjust the panel stride, if necessary. Remember: ps_p is
// always interpreted as being in units of the datatype of the object
// which is not necessarily how the micropanels will be stored. For
// interleaved 3m, we will increase ps_p by 50%, and for ro/io/rpi,
// we halve ps_p. Why? Because the macro-kernel indexes in units of
// the complex datatype. So these changes "trick" it into indexing
// the correct amount.
if ( bli_is_3mi_packed( schema ) )
{
ps_p = ( ps_p * 3 ) / 2;
}
else if ( bli_is_3ms_packed( schema ) ||
bli_is_ro_packed( schema ) ||
bli_is_io_packed( schema ) ||
bli_is_rpi_packed( schema ) )
{
// The division by 2 below assumes that ps_p is an even number.
// However, it is possible that, at this point, ps_p is odd.
// If it is indeed odd, we nudge it higher.
if ( bli_is_odd( ps_p ) ) ps_p += 1;
// Despite the fact that the packed micropanels will contain
// real elements, the panel stride that we store in the obj_t
// (which is passed into the macro-kernel) needs to be in units
// of complex elements, since the macro-kernel will index through
// micropanels via complex pointer arithmetic for trmm/trsm.
// Since the indexing "increment" will be twice as large as each
// actual stored element, we divide the panel_stride by 2.
ps_p = ps_p / 2;
}
// Set the imaginary stride (in units of fundamental elements) for
// 3m and 4m (separated or interleaved). We use ps_p_orig since
// that variable tracks the number of real part elements contained
// within each micropanel of the source matrix. Therefore, this
// is the number of real elements that must be traversed before
// reaching the imaginary part (3mi/4mi) of the packed micropanel,
// or the real part of the next micropanel (3ms).
if ( bli_is_3mi_packed( schema ) ) is_p = ps_p_orig;
else if ( bli_is_4mi_packed( schema ) ) is_p = ps_p_orig;
else if ( bli_is_3ms_packed( schema ) ) is_p = ps_p_orig * ( n_p_pad / n_panel );
else is_p = 1;
// Set the imaginary stride (in units of fundamental elements).
// This is the number of real elements that must be traversed before
// reaching the imaginary part of the packed micropanel. NOTE: the
// imaginary stride is mostly vestigial and left over from the 3m
// and 4m implementations.
is_p = 1;
// Store the strides and panel dimension in P.
bli_obj_set_strides( rs_p, cs_p, p );
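
To make the before/after behavior of the two hunks above easier to compare, the stride logic can be restated as small standalone C functions. This is a sketch under stated assumptions rather than the library's code: the `sketch_schema_t` enum and all `sketch_*` functions are hypothetical stand-ins for the `bli_is_*_packed( schema )` predicates and the inline arithmetic, and `n_micropanels` stands for `m_p_pad / m_panel` (or `n_p_pad / n_panel` in the second hunk).

```c
#include <assert.h>

/* Hypothetical stand-in for the bli_is_*_packed( schema ) predicates. */
typedef enum
{
    SKETCH_NATIVE,    /* native or 1e/1r packing: no adjustment */
    SKETCH_3MI,       /* interleaved 3m */
    SKETCH_4MI,       /* interleaved 4m */
    SKETCH_3MS,       /* separated 3m */
    SKETCH_RO_IO_RPI  /* real-only, imaginary-only, or real+imaginary */
} sketch_schema_t;

/* Nudge an odd stride up to the next even value. */
static long sketch_make_even( long ps )
{
    return ( ps % 2 != 0 ) ? ps + 1 : ps;
}

/* Panel stride before this commit: scaled by 3/2 for interleaved 3m and
   halved for 3ms/ro/io/rpi, because the macro-kernel indexes in units of
   the complex datatype. */
long sketch_old_panel_stride( long ps_p, sketch_schema_t schema )
{
    assert( ps_p >= 0 );
    ps_p = sketch_make_even( ps_p );

    if ( schema == SKETCH_3MI ) return ( ps_p * 3 ) / 2;
    if ( schema == SKETCH_3MS ||
         schema == SKETCH_RO_IO_RPI ) return ps_p / 2;
    return ps_p;
}

/* Imaginary stride before this commit, derived from the unscaled
   (but evened) panel stride. */
long sketch_old_imag_stride( long ps_p, sketch_schema_t schema,
                             long n_micropanels )
{
    long ps_p_orig = sketch_make_even( ps_p );

    if ( schema == SKETCH_3MI || schema == SKETCH_4MI ) return ps_p_orig;
    if ( schema == SKETCH_3MS ) return ps_p_orig * n_micropanels;
    return 1;
}

/* After this commit: only the odd-to-even nudge remains, and the
   imaginary stride is always 1. */
long sketch_new_panel_stride( long ps_p ) { return sketch_make_even( ps_p ); }
long sketch_new_imag_stride( void )       { return 1; }
```

For example, an initial stride of 7 is first nudged to 8; the old code would then have produced 12 for interleaved 3m and 4 for the ro/io/rpi schemas, whereas the new code leaves it at 8 for every schema.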


@@ -1,842 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
) \
{ \
dim_t panel_dim; \
dim_t panel_dim_max; \
dim_t panel_len; \
dim_t panel_len_max; \
inc_t incc, ldc; \
inc_t ldp; \
\
\
/* Determine the dimensions and relative strides of the micro-panel
based on its pack schema. */ \
if ( bli_is_col_packed( schema ) ) \
{ \
/* Prepare to pack to row-stored column panel. */ \
panel_dim = n_panel; \
panel_dim_max = n_panel_max; \
panel_len = m_panel; \
panel_len_max = m_panel_max; \
incc = cs_c; \
ldc = rs_c; \
ldp = rs_p; \
} \
else /* if ( bli_is_row_packed( schema ) ) */ \
{ \
/* Prepare to pack to column-stored row panel. */ \
panel_dim = m_panel; \
panel_dim_max = m_panel_max; \
panel_len = n_panel; \
panel_len_max = n_panel_max; \
incc = rs_c; \
ldc = cs_c; \
ldp = cs_p; \
} \
\
\
/* Handle micro-panel packing based on the structure of the matrix
being packed. */ \
if ( bli_is_general( strucc ) ) \
{ \
/* For micro-panels of general matrices, we can call the pack
kernel front-end directly. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
} \
else if ( bli_is_herm_or_symm( strucc ) ) \
{ \
/* Call a helper function for micro-panels of Hermitian/symmetric
matrices. */ \
PASTEMAC(ch,packm_herm_cxk_3mis) \
( \
strucc, \
diagoffc, \
uploc, \
conjc, \
schema, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
is_p, ldp, \
cntx \
); \
} \
else /* ( bli_is_triangular( strucc ) ) */ \
{ \
/* Call a helper function for micro-panels of triangular
matrices. */ \
PASTEMAC(ch,packm_tri_cxk_3mis) \
( \
strucc, \
diagoffc, \
diagc, \
uploc, \
conjc, \
schema, \
invdiag, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
is_p, ldp, \
cntx \
); \
} \
\
\
/* If m_panel < m_panel_max, or n_panel < n_panel_max, we would normally
fill the edge region (the bottom m_panel_max - m_panel rows or right-
side n_panel_max - n_panel columns) of the micropanel with zeros.
However, this responsibility has been moved to the packm microkernel.
This change allows experts to use custom kernels that pack to custom
packing formats when the problem size is not a nice multiple of the
register blocksize. */ \
/*
if ( m_panel != m_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t i = m_panel; \
dim_t m_edge = m_panel_max - i; \
dim_t n_edge = n_panel_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*rs_p; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*rs_p; \
ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (i )*rs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
*/ \
\
/*
if ( n_panel != n_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t j = n_panel; \
dim_t m_edge = m_panel_max; \
dim_t n_edge = n_panel_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*cs_p; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*cs_p; \
ctype_r* p_edge_rpi = ( ctype_r* )p + 2*is_p + (j )*cs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_rpi, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
*/ \
\
\
if ( bli_is_triangular( strucc ) ) \
{ \
/* If this panel is an edge case in both panel dimension and length,
then it must be a bottom-right corner case. Set the part of the
diagonal that extends into the zero-padded region to identity.
NOTE: This is actually only necessary when packing for trsm, as
it helps prevent NaNs and Infs from creeping into the computation.
However, we set the region to identity for trmm as well. Those
1.0's end up getting multiplied by the 0.0's in the zero-padded
region of the other matrix, so there is no harm in this. */ \
if ( m_panel != m_panel_max && \
n_panel != n_panel_max ) \
{ \
ctype_r* restrict one_r = PASTEMAC(chr,1); \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t i = m_panel; \
dim_t j = n_panel; \
dim_t m_br = m_panel_max - i; \
dim_t n_br = n_panel_max - j; \
ctype_r* p_br_r = ( ctype_r* )p + (i )*rs_p + (j )*cs_p; \
ctype_r* p_br_i = ( ctype_r* )p + is_p + (i )*rs_p + (j )*cs_p; \
\
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
m_br, \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
m_br, \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_3mis, packm_cxk_3mis )
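
The bottom-right corner handling at the end of the removed routine above (set the real part of the padded diagonal to one and its imaginary part to zero, so that a triangular solve never divides by a padded zero) can be illustrated with a small standalone function. This is a hedged sketch only: `sketch_set_corner_identity` is a hypothetical name, it works on `double` rather than the macro-generated real type, and it writes the diagonal directly instead of calling the `setd` routines used in the original.

```c
#include <assert.h>
#include <stddef.h>

/* If the micropanel is an edge case in both dimensions, write 1.0 to the
   real parts and 0.0 to the imaginary parts of the diagonal of the padded
   bottom-right corner. Real parts live at p, imaginary parts at p + is_p. */
void sketch_set_corner_identity
     (
       double* p, size_t is_p, size_t rs_p, size_t cs_p,
       size_t m_panel, size_t n_panel,
       size_t m_panel_max, size_t n_panel_max
     )
{
    assert( m_panel <= m_panel_max );
    assert( n_panel <= n_panel_max );

    /* Only a panel that is short in BOTH dimensions has such a corner. */
    if ( m_panel == m_panel_max || n_panel == n_panel_max ) return;

    size_t m_br   = m_panel_max - m_panel;
    size_t n_br   = n_panel_max - n_panel;
    size_t n_diag = ( m_br < n_br ) ? m_br : n_br;

    double* p_br_r = p + m_panel*rs_p + n_panel*cs_p;
    double* p_br_i = p_br_r + is_p;

    for ( size_t d = 0; d < n_diag; ++d )
    {
        p_br_r[ d*rs_p + d*cs_p ] = 1.0;
        p_br_i[ d*rs_p + d*cs_p ] = 0.0;
    }
}
```

As the comment in the removed code notes, this is only strictly needed for trsm; for trmm the ones are multiplied by zeros from the padded region of the other operand, so they are harmless.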
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
doff_t diagoffc_abs; \
dim_t i, j; \
bool row_stored; \
bool col_stored; \
\
\
/* Create flags to indicate row or column storage. Note that the
schema bit that encodes row or column is describing the form of
micro-panel, not the storage in the micro-panel. Hence the
mismatch in "row" and "column" semantics. */ \
row_stored = bli_is_col_packed( schema ); \
col_stored = bli_is_row_packed( schema ); \
\
\
/* Handle the case where the micro-panel does NOT intersect the
diagonal separately from the case where it does intersect. */ \
if ( !bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) \
{ \
/* If the current panel is unstored, we need to make a few
adjustments so we refer to the data where it is actually
stored, also taking conjugation into account. (Note this
implicitly assumes we are operating on a dense panel
within a larger symmetric or Hermitian matrix, since a
general matrix would not contain any unstored region.) */ \
if ( bli_is_unstored_subpart_n( diagoffc, uploc, m_panel, n_panel ) ) \
{ \
c = c + diagoffc * ( doff_t )cs_c + \
-diagoffc * ( doff_t )rs_c; \
bli_swap_incs( &incc, &ldc ); \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc ); \
} \
\
/* Pack the full panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
} \
else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \
{ \
ctype_r* restrict p_r = ( ctype_r* )p; \
\
ctype_r* restrict one_r = PASTEMAC(chr,1); \
ctype_r* restrict minus_one_r = PASTEMAC(chr,m1); \
\
ctype* restrict c10; \
ctype_r* restrict p10; \
dim_t p10_dim, p10_len; \
inc_t incc10, ldc10; \
doff_t diagoffc10; \
conj_t conjc10; \
\
ctype* restrict c12; \
ctype_r* restrict p12; \
dim_t p12_dim, p12_len; \
inc_t incc12, ldc12; \
doff_t diagoffc12; \
conj_t conjc12; \
\
/* Sanity check. Diagonals should not intersect the short end of
a micro-panel. If they do, then the constraint that cache
blocksizes be whole multiples of the register blocksizes was
somehow violated. */ \
if ( ( col_stored && diagoffc < 0 ) || \
( row_stored && diagoffc > 0 ) ) \
bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \
\
diagoffc_abs = bli_abs( diagoffc ); \
\
if ( ( row_stored && bli_is_upper( uploc ) ) || \
( col_stored && bli_is_lower( uploc ) ) ) \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs; \
p10 = p_r; \
c10 = c; \
incc10 = incc; \
ldc10 = ldc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
diagoffc12 = diagoffc_abs - j; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
c12 = c12 + diagoffc12 * ( doff_t )cs_c + \
-diagoffc12 * ( doff_t )rs_c; \
incc12 = ldc; \
ldc12 = incc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc12 ); \
} \
else /* if ( ( row_stored && bli_is_lower( uploc ) ) || \
( col_stored && bli_is_upper( uploc ) ) ) */ \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs + panel_dim; \
diagoffc10 = diagoffc; \
p10 = p_r; \
c10 = c; \
c10 = c10 + diagoffc10 * ( doff_t )cs_c + \
-diagoffc10 * ( doff_t )rs_c; \
incc10 = ldc; \
ldc10 = incc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
incc12 = incc; \
ldc12 = ldc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc10 ); \
} \
\
/* Pack to p10. For upper storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc10, \
p10_dim, \
panel_dim_max, \
p10_len, \
p10_len, \
kappa, \
c10, incc10, ldc10, \
( ctype* )p10, is_p, ldp, \
cntx \
); \
\
/* Pack to p12. For lower storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc12, \
p12_dim, \
panel_dim_max, \
p12_len, \
p12_len, \
kappa, \
c12, incc12, ldc12, \
( ctype* )p12, is_p, ldp, \
cntx \
); \
\
/* Pack the stored triangle of c11 to p11. */ \
{ \
dim_t p11_m = panel_dim; \
dim_t p11_n = panel_dim; \
inc_t rs_c11 = 2*rs_c; \
inc_t cs_c11 = 2*cs_c; \
dim_t j2 = diagoffc_abs; \
ctype* c11 = ( ctype* )c + (j2 )*ldc; \
ctype_r* p11 = ( ctype_r* )p_r + (j2 )*ldp; \
ctype_r* c11_r = ( ctype_r* )c11; \
ctype_r* c11_i = ( ctype_r* )c11 + 1; \
ctype_r* p11_r = ( ctype_r* )p11; \
ctype_r* p11_i = ( ctype_r* )p11 + is_p; \
ctype_r* alpha_r = one_r; \
ctype_r* alpha_i = ( bli_is_conj( conjc ) ? minus_one_r : one_r ); \
ctype_r kappa_r = PASTEMAC(ch,real)( *kappa ); \
ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \
\
/* Copy the real part of the stored triangle of c11 to p11_r. */ \
PASTEMAC2(chr,scal2m,BLIS_TAPI_EX_SUF) \
( \
0, \
BLIS_NONUNIT_DIAG, \
uploc, \
BLIS_NO_TRANSPOSE, \
p11_m, \
p11_n, \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
scaling by -1 if conjugation on c was requested. */ \
PASTEMAC2(chr,scal2m,BLIS_TAPI_EX_SUF) \
( \
0, \
BLIS_NONUNIT_DIAG, \
uploc, \
BLIS_NO_TRANSPOSE, \
p11_m, \
p11_n, \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
imaginary components of the diagonal of p11 in case the
corresponding elements in c11 were not already zero. */ \
if ( bli_is_hermitian( strucc ) ) \
{ \
for ( i = 0; i < p11_m; ++i ) \
{ \
ctype_r* pi11_i = p11_i + (i )*rs_p + (i )*cs_p; \
\
PASTEMAC(chr,set0s)( *pi11_i ); \
} \
} \
\
/* Apply kappa to the part of p11 that corresponds to the stored
part of c11 that was copied above. */ \
if ( bli_is_upper( uploc ) ) \
{ \
PASTEMAC(ch,scalris_mxn_u) \
( \
0, \
p11_m, \
p11_n, \
&kappa_r, \
&kappa_i, \
p11_r, \
p11_i, rs_p, cs_p \
); \
} \
else \
{ \
PASTEMAC(ch,scalris_mxn_l) \
( \
0, \
p11_m, \
p11_n, \
&kappa_r, \
&kappa_i, \
p11_r, \
p11_i, rs_p, cs_p \
); \
} \
\
/* Update the p11 section of the rpi panel. It simply needs
to contain the sum p11_r + p11_i. */ \
{ \
ctype_r* p11_rpi = p11_i + is_p; \
\
for ( j = 0; j < p11_n; ++j ) \
for ( i = 0; i < p11_m; ++i ) \
{ \
ctype_r* pi11_r = p11_r + (i )*rs_p + (j )*cs_p; \
ctype_r* pi11_i = p11_i + (i )*rs_p + (j )*cs_p; \
ctype_r* pi11_rpi = p11_rpi + (i )*rs_p + (j )*cs_p; \
\
PASTEMAC(chr,add3s) \
( \
*pi11_r, \
*pi11_i, \
*pi11_rpi \
); \
} \
} \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_3mis, packm_cxk_3mis )
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Pack the panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
\
\
/* Tweak the panel according to its triangular structure */ \
{ \
ctype_r* p_r = ( ctype_r* )p + 0; \
ctype_r* p_i = ( ctype_r* )p + is_p; \
ctype_r* p_rpi = ( ctype_r* )p + 2*is_p; \
\
dim_t j = bli_abs( diagoffp ); \
ctype_r* p11_r = p_r + (j )*ldp; \
ctype_r* p11_i = p_i + (j )*ldp; \
ctype_r* p11_rpi = p_rpi + (j )*ldp; \
\
dim_t p11_m = m_panel; \
dim_t p11_n = n_panel; \
\
dim_t min_p11_m_n; \
\
if ( diagoffp < 0 ) p11_m -= j; \
else if ( diagoffp > 0 ) p11_n -= j; \
\
min_p11_m_n = bli_min( p11_m, p11_n ); \
\
\
/* If the diagonal of c is implicitly unit, explicitly set the
diagonal of the packed panel to kappa. */ \
if ( bli_is_unit_diag( diagc ) ) \
{ \
ctype_r kappa_r = PASTEMAC(ch,real)( *kappa ); \
ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \
dim_t i; \
\
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
m_panel, \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
m_panel, \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx, \
NULL \
); \
\
/* Update the diagonal of the p11 section of the rpi panel.
It simply needs to contain the sum of diagonals of p11_r
and p11_i. */ \
for ( i = 0; i < min_p11_m_n; ++i ) \
{ \
ctype_r* pi11_r = p11_r + (i )*rs_p + (i )*cs_p; \
ctype_r* pi11_i = p11_i + (i )*rs_p + (i )*cs_p; \
ctype_r* pi11_rpi = p11_rpi + (i )*rs_p + (i )*cs_p; \
\
PASTEMAC(chr,add3s)( *pi11_r, *pi11_i, *pi11_rpi ); \
} \
} \
\
/* If requested, invert the diagonal of the packed panel. Note
that we do not need to update the rpi panel since inverted
diagonals are only needed by trsm, which does not use the
p11 section of the rpi panel. */ \
if ( invdiag == TRUE ) \
{ \
dim_t i; \
\
for ( i = 0; i < min_p11_m_n; ++i ) \
{ \
ctype_r* pi11_r = p11_r + (i )*rs_p + (i )*cs_p; \
ctype_r* pi11_i = p11_i + (i )*rs_p + (i )*cs_p; \
\
PASTEMAC(ch,invertris)( *pi11_r, *pi11_i ); \
} \
} \
\
/* Set the region opposite the diagonal of p to zero. To do this,
we need to reference the "unstored" region on the other side of
the diagonal. This amounts to toggling uploc and then shifting
the diagonal offset to shrink the newly referenced region (by
one diagonal). Note that this zero-filling is not needed for
trsm, since the unstored region is not referenced by the trsm
micro-kernel; however, zero-filling is needed for trmm, which
uses the gemm micro-kernel. */ \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
uplo_t uplop = uploc; \
\
bli_toggle_uplo( &uplop ); \
bli_shift_diag_offset_to_shrink_uplo( uplop, &diagoffp ); \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_rpi, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_tri_cxk_3mis, packm_cxk_3mis )
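
The removed 3m helpers above address a packed micro-panel as three real sub-panels (real, imaginary, and real plus imaginary), each `is_p` real elements apart. The following is only a minimal sketch of that layout, assuming double precision, a unit `kappa`, no conjugation, and column-major input; `pack_3m_sketch` is a hypothetical name and is not part of BLIS.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): split an m x n column-major
   complex matrix c, stored as interleaved (re,im) doubles with leading
   dimension ldc counted in complex elements, into three real sub-panels
   that start is_p doubles apart inside p. */
void pack_3m_sketch( size_t m, size_t n,
                     const double* c, size_t ldc,
                     double* p, size_t is_p, size_t ldp )
{
    double* p_r   = p;            /* real parts        */
    double* p_i   = p + is_p;     /* imaginary parts   */
    double* p_rpi = p + 2*is_p;   /* real + imaginary  */

    for ( size_t j = 0; j < n; ++j )
        for ( size_t i = 0; i < m; ++i )
        {
            double re = c[ 2*( i + j*ldc )     ];
            double im = c[ 2*( i + j*ldc ) + 1 ];

            p_r  [ i + j*ldp ] = re;
            p_i  [ i + j*ldp ] = im;
            p_rpi[ i + j*ldp ] = re + im;
        }
}
```

The third sub-panel is the one the removed code refers to through `p_rpi` and `p11_rpi`; it has to be refreshed whenever the real or imaginary sub-panel is modified after packing, which is what the `add3s` loops above did.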


@@ -1,121 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_struc_cxk_3mis )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_herm_cxk_3mis )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_tri_cxk_3mis )


@@ -1,757 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
) \
{ \
dim_t panel_dim; \
dim_t panel_dim_max; \
dim_t panel_len; \
dim_t panel_len_max; \
inc_t incc, ldc; \
inc_t ldp; \
\
\
/* Determine the dimensions and relative strides of the micro-panel
based on its pack schema. */ \
if ( bli_is_col_packed( schema ) ) \
{ \
/* Prepare to pack to row-stored column panel. */ \
panel_dim = n_panel; \
panel_dim_max = n_panel_max; \
panel_len = m_panel; \
panel_len_max = m_panel_max; \
incc = cs_c; \
ldc = rs_c; \
ldp = rs_p; \
} \
else /* if ( bli_is_row_packed( schema ) ) */ \
{ \
/* Prepare to pack to column-stored row panel. */ \
panel_dim = m_panel; \
panel_dim_max = m_panel_max; \
panel_len = n_panel; \
panel_len_max = n_panel_max; \
incc = rs_c; \
ldc = cs_c; \
ldp = cs_p; \
} \
\
\
/* Handle micro-panel packing based on the structure of the matrix
being packed. */ \
if ( bli_is_general( strucc ) ) \
{ \
/* For micro-panels of general matrices, we can call the pack
kernel front-end directly. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
} \
else if ( bli_is_herm_or_symm( strucc ) ) \
{ \
/* Call a helper function for micro-panels of Hermitian/symmetric
matrices. */ \
PASTEMAC(ch,packm_herm_cxk_4mi) \
( \
strucc, \
diagoffc, \
uploc, \
conjc, \
schema, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
is_p, ldp, \
cntx \
); \
} \
else /* ( bli_is_triangular( strucc ) ) */ \
{ \
/* Call a helper function for micro-panels of triangular
matrices. */ \
PASTEMAC(ch,packm_tri_cxk_4mi) \
( \
strucc, \
diagoffc, \
diagc, \
uploc, \
conjc, \
schema, \
invdiag, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
is_p, ldp, \
cntx \
); \
} \
\
\
/* If m_panel < m_panel_max, or n_panel < n_panel_max, we would normally
fill the edge region (the bottom m_panel_max - m_panel rows or right-
side n_panel_max - n_panel columns) of the micropanel with zeros.
However, this responsibility has been moved to the packm microkernel.
This change allows experts to use custom kernels that pack to custom
packing formats when the problem size is not a nice multiple of the
register blocksize. */ \
/*
if ( m_panel != m_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t i = m_panel; \
dim_t m_edge = m_panel_max - i; \
dim_t n_edge = n_panel_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*rs_p; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (i )*rs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
\
if ( n_panel != n_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t j = n_panel; \
dim_t m_edge = m_panel_max; \
dim_t n_edge = n_panel_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*cs_p; \
ctype_r* p_edge_i = ( ctype_r* )p + is_p + (j )*cs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
*/ \
\
\
if ( bli_is_triangular( strucc ) ) \
{ \
/* If this panel is an edge case in both panel dimension and length,
then it must be a bottom-right corner case. Set the part of the
diagonal that extends into the zero-padded region to identity.
NOTE: This is actually only necessary when packing for trsm, as
it helps prevent NaNs and Infs from creeping into the computation.
However, we set the region to identity for trmm as well. Those
1.0's end up getting multiplied by the 0.0's in the zero-padded
region of the other matrix, so there is no harm in this. */ \
if ( m_panel != m_panel_max && \
n_panel != n_panel_max ) \
{ \
ctype_r* restrict one_r = PASTEMAC(chr,1); \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t i = m_panel; \
dim_t j = n_panel; \
dim_t m_br = m_panel_max - i; \
dim_t n_br = n_panel_max - j; \
ctype_r* p_br_r = ( ctype_r* )p + (i )*rs_p + (j )*cs_p; \
ctype_r* p_br_i = ( ctype_r* )p + is_p + (i )*rs_p + (j )*cs_p; \
\
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
m_br, \
n_br, \
one_r, \
p_br_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
m_br, \
n_br, \
zero_r, \
p_br_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_4mi, packm_cxk_4mi )
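
The schema-dependent setup at the top of `packm_struc_cxk_4mi` above can be looked at in isolation. The sketch below mirrors only that branch; the struct and function names are hypothetical (not BLIS API), and a plain `int` flag stands in for `bli_is_col_packed( schema )`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical container for the values derived from the pack schema. */
typedef struct
{
    size_t    panel_dim, panel_dim_max;  /* short (register-blocked) side */
    size_t    panel_len, panel_len_max;  /* long side                     */
    ptrdiff_t incc, ldc;                 /* strides used to walk c        */
    ptrdiff_t ldp;                       /* leading dimension of p        */
} panel_params_sketch;

/* Hypothetical helper: col_packed != 0 corresponds to a row-stored column
   panel, col_packed == 0 to a column-stored row panel. */
panel_params_sketch panel_params_from_schema
     (
       int       col_packed,
       size_t    m_panel,     size_t    n_panel,
       size_t    m_panel_max, size_t    n_panel_max,
       ptrdiff_t rs_c,        ptrdiff_t cs_c,
       ptrdiff_t rs_p,        ptrdiff_t cs_p
     )
{
    panel_params_sketch r;

    if ( col_packed )
    {
        r.panel_dim = n_panel;  r.panel_dim_max = n_panel_max;
        r.panel_len = m_panel;  r.panel_len_max = m_panel_max;
        r.incc = cs_c;  r.ldc = rs_c;  r.ldp = rs_p;
    }
    else
    {
        r.panel_dim = m_panel;  r.panel_dim_max = m_panel_max;
        r.panel_len = n_panel;  r.panel_len_max = n_panel_max;
        r.incc = rs_c;  r.ldc = cs_c;  r.ldp = cs_p;
    }

    return r;
}
```

In both branches `panel_dim` is the short side of the micro-panel and `panel_len` the long side, which is why the later helpers can be written once in terms of `incc`, `ldc`, and `ldp`.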
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
doff_t diagoffc_abs; \
dim_t i, j; \
bool row_stored; \
bool col_stored; \
\
\
/* Create flags to indicate row or column storage. Note that the
schema bit that encodes row or column is describing the form of
micro-panel, not the storage in the micro-panel. Hence the
mismatch in "row" and "column" semantics. */ \
row_stored = bli_is_col_packed( schema ); \
col_stored = bli_is_row_packed( schema ); \
\
\
/* Handle the case where the micro-panel does NOT intersect the
diagonal separately from the case where it does intersect. */ \
if ( !bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) \
{ \
/* If the current panel is unstored, we need to make a few
adjustments so we refer to the data where it is actually
stored, also taking conjugation into account. (Note this
implicitly assumes we are operating on a dense panel
within a larger symmetric or Hermitian matrix, since a
general matrix would not contain any unstored region.) */ \
if ( bli_is_unstored_subpart_n( diagoffc, uploc, m_panel, n_panel ) ) \
{ \
c = c + diagoffc * ( doff_t )cs_c + \
-diagoffc * ( doff_t )rs_c; \
bli_swap_incs( &incc, &ldc ); \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc ); \
} \
\
/* Pack the full panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
} \
else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \
{ \
ctype_r* restrict p_r = ( ctype_r* )p; \
\
ctype_r* restrict one_r = PASTEMAC(chr,1); \
ctype_r* restrict minus_one_r = PASTEMAC(chr,m1); \
\
ctype* restrict c10; \
ctype_r* restrict p10; \
dim_t p10_dim, p10_len; \
inc_t incc10, ldc10; \
doff_t diagoffc10; \
conj_t conjc10; \
\
ctype* restrict c12; \
ctype_r* restrict p12; \
dim_t p12_dim, p12_len; \
inc_t incc12, ldc12; \
doff_t diagoffc12; \
conj_t conjc12; \
\
/* Sanity check. Diagonals should not intersect the short end of
a micro-panel. If they do, then the constraint that cache
blocksizes be a whole multiple of the register blocksizes was
somehow violated. */ \
if ( ( col_stored && diagoffc < 0 ) || \
( row_stored && diagoffc > 0 ) ) \
bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \
\
diagoffc_abs = bli_abs( diagoffc ); \
\
if ( ( row_stored && bli_is_upper( uploc ) ) || \
( col_stored && bli_is_lower( uploc ) ) ) \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs; \
p10 = p_r; \
c10 = c; \
incc10 = incc; \
ldc10 = ldc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
diagoffc12 = diagoffc_abs - j; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
c12 = c12 + diagoffc12 * ( doff_t )cs_c + \
-diagoffc12 * ( doff_t )rs_c; \
incc12 = ldc; \
ldc12 = incc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc12 ); \
} \
else /* if ( ( row_stored && bli_is_lower( uploc ) ) || \
( col_stored && bli_is_upper( uploc ) ) ) */ \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs + panel_dim; \
diagoffc10 = diagoffc; \
p10 = p_r; \
c10 = c; \
c10 = c10 + diagoffc10 * ( doff_t )cs_c + \
-diagoffc10 * ( doff_t )rs_c; \
incc10 = ldc; \
ldc10 = incc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
incc12 = incc; \
ldc12 = ldc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc10 ); \
} \
\
/* Pack to p10. For upper storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc10, \
p10_dim, \
panel_dim_max, \
p10_len, \
p10_len, \
kappa, \
c10, incc10, ldc10, \
( ctype* )p10, is_p, ldp, \
cntx \
); \
\
/* Pack to p12. For lower storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc12, \
p12_dim, \
panel_dim_max, \
p12_len, \
p12_len, \
kappa, \
c12, incc12, ldc12, \
( ctype* )p12, is_p, ldp, \
cntx \
); \
\
/* Pack the stored triangle of c11 to p11. */ \
{ \
dim_t p11_m = panel_dim; \
dim_t p11_n = panel_dim; \
inc_t rs_c11 = 2*rs_c; \
inc_t cs_c11 = 2*cs_c; \
dim_t j2 = diagoffc_abs; \
ctype* c11 = ( ctype* )c + (j2 )*ldc; \
ctype_r* p11 = ( ctype_r* )p_r + (j2 )*ldp; \
ctype_r* c11_r = ( ctype_r* )c11; \
ctype_r* c11_i = ( ctype_r* )c11 + 1; \
ctype_r* p11_r = ( ctype_r* )p11; \
ctype_r* p11_i = ( ctype_r* )p11 + is_p; \
ctype_r* alpha_r = one_r; \
ctype_r* alpha_i = ( bli_is_conj( conjc ) ? minus_one_r : one_r ); \
ctype_r kappa_r = PASTEMAC(ch,real)( *kappa ); \
ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \
\
/* Copy the real part of the stored triangle of c11 to p11_r. */ \
PASTEMAC2(chr,scal2m,BLIS_TAPI_EX_SUF) \
( \
0, \
BLIS_NONUNIT_DIAG, \
uploc, \
BLIS_NO_TRANSPOSE, \
p11_m, \
p11_n, \
alpha_r, \
c11_r, rs_c11, cs_c11, \
p11_r, rs_p, cs_p, \
cntx, \
NULL \
); \
\
/* Copy the imaginary part of the stored triangle of c11 to p11_i,
scaling by -1 if conjugation on c was requested. */ \
PASTEMAC2(chr,scal2m,BLIS_TAPI_EX_SUF) \
( \
0, \
BLIS_NONUNIT_DIAG, \
uploc, \
BLIS_NO_TRANSPOSE, \
p11_m, \
p11_n, \
alpha_i, \
c11_i, rs_c11, cs_c11, \
p11_i, rs_p, cs_p, \
cntx, \
NULL \
); \
\
/* If source matrix c is Hermitian, we have to zero out the
imaginary components of the diagonal of p11 in case the
corresponding elements in c11 were not already zero. */ \
if ( bli_is_hermitian( strucc ) ) \
{ \
for ( i = 0; i < p11_m; ++i ) \
{ \
ctype_r* pi11_i = p11_i + (i )*rs_p + (i )*cs_p; \
\
PASTEMAC(chr,set0s)( *pi11_i ); \
} \
} \
\
/* Apply kappa to the part of p11 that corresponds to the stored
part of c11 that was copied above. */ \
if ( bli_is_upper( uploc ) ) \
{ \
PASTEMAC(ch,scalris_mxn_u) \
( \
0, \
p11_m, \
p11_n, \
&kappa_r, \
&kappa_i, \
p11_r, \
p11_i, rs_p, cs_p \
); \
} \
else \
{ \
PASTEMAC(ch,scalris_mxn_l) \
( \
0, \
p11_m, \
p11_n, \
&kappa_r, \
&kappa_i, \
p11_r, \
p11_i, rs_p, cs_p \
); \
} \
/*
PASTEMAC(chr,fprintm)( stdout, "packm_herm_cxk: ap_r copied", m_panel_max, n_panel_max, \
p_r + 0*is_p, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_herm_cxk: ap_i copied", m_panel_max, n_panel_max, \
p_r + 1*is_p, rs_p, cs_p, "%4.1f", "" ); \
*/ \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_4mi, packm_cxk_4mi )
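
The Hermitian helper above never reads the unstored triangle of `c`: it reflects across the diagonal, toggles conjugation, and zeroes the imaginary part of diagonal entries. A per-element sketch of that rule is shown below. It is an illustration only; the removed code works on whole sub-panels by swapping strides and toggling `conjc`, and the type and function names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type for the sketch (not a BLIS type). */
typedef struct { double re, im; } zcomplex_sketch;

/* Hypothetical helper: return logical element (i,j) of a Hermitian matrix
   of which only one triangle (lower if lower_stored != 0, else upper) holds
   valid data. rs_c and cs_c are row and column strides in elements. */
zcomplex_sketch herm_elem_sketch
     (
       const zcomplex_sketch* c, size_t rs_c, size_t cs_c,
       int lower_stored, size_t i, size_t j
     )
{
    int stored = lower_stored ? ( i >= j ) : ( i <= j );
    zcomplex_sketch v;

    if ( stored )
    {
        v = c[ i*rs_c + j*cs_c ];

        /* Diagonal entries of a Hermitian matrix are real; ignore whatever
           happens to be stored in the imaginary part. */
        if ( i == j ) v.im = 0.0;
    }
    else
    {
        /* Unstored region: read the mirrored element and conjugate it. */
        v = c[ j*rs_c + i*cs_c ];
        v.im = -v.im;
    }

    return v;
}
```

For a symmetric (rather than Hermitian) matrix the same reflection would apply without the sign flip and without zeroing the diagonal's imaginary part, matching the `bli_is_hermitian( strucc )` guards in the removed code.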
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Pack the panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, is_p, ldp, \
cntx \
); \
\
\
/* Tweak the panel according to its triangular structure */ \
{ \
ctype_r* p_r = ( ctype_r* )p; \
ctype_r* p_i = ( ctype_r* )p + is_p; \
\
dim_t j = bli_abs( diagoffp ); \
ctype_r* p11_r = p_r + (j )*ldp; \
ctype_r* p11_i = p_i + (j )*ldp; \
\
/* If the diagonal of c is implicitly unit, explicitly set the
diagonal of the packed panel to kappa. */ \
if ( bli_is_unit_diag( diagc ) ) \
{ \
ctype_r kappa_r = PASTEMAC(ch,real)( *kappa ); \
ctype_r kappa_i = PASTEMAC(ch,imag)( *kappa ); \
\
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
m_panel, \
n_panel, \
&kappa_r, \
p_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setd,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
m_panel, \
n_panel, \
&kappa_i, \
p_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
\
\
/* If requested, invert the diagonal of the packed panel. */ \
if ( invdiag == TRUE ) \
{ \
dim_t i; \
\
for ( i = 0; i < panel_dim; ++i ) \
{ \
ctype_r* pi11_r = p11_r + (i )*rs_p + (i )*cs_p; \
ctype_r* pi11_i = p11_i + (i )*rs_p + (i )*cs_p; \
\
PASTEMAC(ch,invertris)( *pi11_r, *pi11_i ); \
} \
} \
\
\
/* Set the region opposite the diagonal of p to zero. To do this,
we need to reference the "unstored" region on the other side of
the diagonal. This amounts to toggling uploc and then shifting
the diagonal offset to shrink the newly referenced region (by
one diagonal). Note that this zero-filling is not needed for
trsm, since the unstored region is not referenced by the trsm
micro-kernel; however, zero-filling is needed for trmm, which
uses the gemm micro-kernel. */ \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
uplo_t uplop = uploc; \
\
bli_toggle_uplo( &uplop ); \
bli_shift_diag_offset_to_shrink_uplo( uplop, &diagoffp ); \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx, \
NULL \
); \
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_i, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_tri_cxk_4mi, packm_cxk_4mi )
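
When `invdiag` is set, the triangular helper above inverts each diagonal element of the packed panel, with the real and imaginary parts living in separate sub-panels. A minimal sketch of that step follows, assuming double precision and using the textbook formula 1/(a+bi) = (a-bi)/(a^2+b^2). The function name is hypothetical, and the actual `invertris` macro in BLIS may use a different, more overflow-robust formulation.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): invert the first k diagonal
   elements of a complex panel whose real parts live in p_r and imaginary
   parts in p_i, both addressed with the same row/column strides. */
void invert_diag_ri_sketch( size_t k,
                            double* p_r, double* p_i,
                            size_t rs_p, size_t cs_p )
{
    for ( size_t i = 0; i < k; ++i )
    {
        double* a_r = p_r + i*rs_p + i*cs_p;
        double* a_i = p_i + i*rs_p + i*cs_p;

        /* Squared magnitude; assumed nonzero (a singular diagonal is the
           caller's problem in this sketch). */
        double d = (*a_r) * (*a_r) + (*a_i) * (*a_i);

        double inv_r =  (*a_r) / d;
        double inv_i = -(*a_i) / d;

        *a_r = inv_r;
        *a_i = inv_i;
    }
}
```

Pre-inverting the diagonal at pack time lets a triangular solve multiply by the stored reciprocal instead of dividing inside its inner loop.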


@@ -1,121 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_struc_cxk_4mi )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_herm_cxk_4mi )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_tri_cxk_4mi )


@@ -1,625 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
) \
{ \
dim_t panel_dim; \
dim_t panel_dim_max; \
dim_t panel_len; \
dim_t panel_len_max; \
inc_t incc, ldc; \
inc_t ldp; \
\
\
/* Determine the dimensions and relative strides of the micro-panel
based on its pack schema. */ \
if ( bli_is_col_packed( schema ) ) \
{ \
/* Prepare to pack to row-stored column panel. */ \
panel_dim = n_panel; \
panel_dim_max = n_panel_max; \
panel_len = m_panel; \
panel_len_max = m_panel_max; \
incc = cs_c; \
ldc = rs_c; \
ldp = rs_p; \
} \
else /* if ( bli_is_row_packed( schema ) ) */ \
{ \
/* Prepare to pack to column-stored row panel. */ \
panel_dim = m_panel; \
panel_dim_max = m_panel_max; \
panel_len = n_panel; \
panel_len_max = n_panel_max; \
incc = rs_c; \
ldc = cs_c; \
ldp = cs_p; \
} \
\
\
/* Handle micro-panel packing based on the structure of the matrix
being packed. */ \
if ( bli_is_general( strucc ) ) \
{ \
/* For micro-panels of general matrices, we can call the pack
kernel front-end directly. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
schema, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, ldp, \
cntx \
); \
} \
else if ( bli_is_herm_or_symm( strucc ) ) \
{ \
/* Call a helper function for micro-panels of Hermitian/symmetric
matrices. */ \
PASTEMAC(ch,packm_herm_cxk_rih) \
( \
strucc, \
diagoffc, \
uploc, \
conjc, \
schema, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
ldp, \
cntx \
); \
} \
else /* ( bli_is_triangular( strucc ) ) */ \
{ \
/* Call a helper function for micro-panels of triangular
matrices. */ \
PASTEMAC(ch,packm_tri_cxk_rih) \
( \
strucc, \
diagoffc, \
diagc, \
uploc, \
conjc, \
schema, \
invdiag, \
m_panel, \
n_panel, \
m_panel_max, \
n_panel_max, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, rs_c, cs_c, \
incc, ldc, \
p, rs_p, cs_p, \
ldp, \
cntx \
); \
} \
\
\
/* If m_panel < m_panel_max, or n_panel < n_panel_max, we would normally
fill the edge region (the bottom m_panel_max - m_panel rows or right-
side n_panel_max - n_panel columns) of the micropanel with zeros.
However, this responsibility has been moved to the packm microkernel.
This change allows experts to use custom kernels that pack to custom
packing formats when the problem size is not a nice multiple of the
register blocksize. */ \
/*
if ( m_panel != m_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t i = m_panel; \
dim_t m_edge = m_panel_max - i; \
dim_t n_edge = n_panel_max; \
ctype_r* p_edge_r = ( ctype_r* )p + (i )*rs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
\
if ( n_panel != n_panel_max ) \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
dim_t j = n_panel; \
dim_t m_edge = m_panel_max; \
dim_t n_edge = n_panel_max - j; \
ctype_r* p_edge_r = ( ctype_r* )p + (j )*cs_p; \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
0, \
BLIS_NONUNIT_DIAG, \
BLIS_DENSE, \
m_edge, \
n_edge, \
zero_r, \
p_edge_r, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
*/ \
\
\
if ( bli_is_triangular( strucc ) ) \
{ \
/* If this panel is an edge case in both panel dimension and length,
then it must be a bottom-right corner case. Set the part of the
diagonal that extends into the zero-padded region to identity.
NOTE: This is actually only necessary when packing for trsm, as
it helps prevent NaNs and Infs from creeping into the computation.
However, we set the region to identity for trmm as well. Those
1.0's end up getting multiplied by the 0.0's in the zero-padded
region of the other matrix, so there is no harm in this. */ \
if ( m_panel != m_panel_max && \
n_panel != n_panel_max ) \
{ \
/* We don't need this case if we aren't supporting trsm.
Why? Because trmm's packm control tree node should be
using k dimension multiples of 1 (kr == 1), which means
there will never be zero padding at the far end of a
micro-panel. */ \
} \
} \
\
\
/*
{ \
if ( bli_is_col_packed( schema ) ) \
PASTEMAC(chr,fprintm)( stdout, "packm_struc_cxk_rih: bp copied", m_panel_max, n_panel_max, \
( ctype_r* )p, rs_p, cs_p, "%4.1f", "" ); \
else if ( bli_is_row_packed( schema ) ) \
PASTEMAC(chr,fprintm)( stdout, "packm_struc_cxk_rih: ap copied", m_panel_max, n_panel_max, \
( ctype_r* )p, rs_p, cs_p, "%4.1f", "" ); \
} \
*/ \
\
\
}
INSERT_GENTFUNCCO_BASIC( packm_struc_cxk_rih, packm_cxk_rih )
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t ldp, \
cntx_t* cntx \
) \
{ \
bool row_stored; \
bool col_stored; \
doff_t diagoffc_abs; \
dim_t j; \
\
\
/* Create flags to indicate row or column storage. Note that the
schema bit that encodes row or column is describing the form of
micro-panel, not the storage in the micro-panel. Hence the
mismatch in "row" and "column" semantics. */ \
row_stored = bli_is_col_packed( schema ); \
col_stored = bli_is_row_packed( schema ); \
\
\
/* Handle the case where the micro-panel does NOT intersect the
diagonal separately from the case where it does intersect. */ \
if ( !bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) \
{ \
/* If the current panel is unstored, we need to make a few
adjustments so we refer to the data where it is actually
stored, also taking conjugation into account. (Note this
implicitly assumes we are operating on a dense panel
within a larger symmetric or Hermitian matrix, since a
general matrix would not contain any unstored region.) */ \
if ( bli_is_unstored_subpart_n( diagoffc, uploc, m_panel, n_panel ) ) \
{ \
c = c + diagoffc * ( doff_t )cs_c + \
-diagoffc * ( doff_t )rs_c; \
bli_swap_incs( &incc, &ldc ); \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc ); \
} \
\
/* Pack the full panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
schema, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, ldp, \
cntx \
); \
} \
else /* if ( bli_intersects_diag_n( diagoffc, m_panel, n_panel ) ) */ \
{ \
ctype_r* restrict p_r = ( ctype_r* )p; \
\
ctype* restrict c10; \
ctype_r* restrict p10; \
dim_t p10_dim, p10_len; \
inc_t incc10, ldc10; \
doff_t diagoffc10; \
conj_t conjc10; \
\
ctype* restrict c12; \
ctype_r* restrict p12; \
dim_t p12_dim, p12_len; \
inc_t incc12, ldc12; \
doff_t diagoffc12; \
conj_t conjc12; \
\
/* Sanity check. Diagonals should not intersect the short end of
a micro-panel. If they do, then the constraint that cache
blocksizes be whole multiples of the register blocksizes was
somehow violated. */ \
if ( ( col_stored && diagoffc < 0 ) || \
( row_stored && diagoffc > 0 ) ) \
bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \
\
diagoffc_abs = bli_abs( diagoffc ); \
\
if ( ( row_stored && bli_is_upper( uploc ) ) || \
( col_stored && bli_is_lower( uploc ) ) ) \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs; \
p10 = p_r; \
c10 = c; \
incc10 = incc; \
ldc10 = ldc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
diagoffc12 = diagoffc_abs - j; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
c12 = c12 + diagoffc12 * ( doff_t )cs_c + \
-diagoffc12 * ( doff_t )rs_c; \
incc12 = ldc; \
ldc12 = incc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc12 ); \
} \
else /* if ( ( row_stored && bli_is_lower( uploc ) ) || \
( col_stored && bli_is_upper( uploc ) ) ) */ \
{ \
p10_dim = panel_dim; \
p10_len = diagoffc_abs + panel_dim; \
diagoffc10 = diagoffc; \
p10 = p_r; \
c10 = c; \
c10 = c10 + diagoffc10 * ( doff_t )cs_c + \
-diagoffc10 * ( doff_t )rs_c; \
incc10 = ldc; \
ldc10 = incc; \
conjc10 = conjc; \
\
p12_dim = panel_dim; \
p12_len = panel_len - p10_len; \
j = p10_len; \
p12 = p_r + (j )*ldp; \
c12 = c + (j )*ldc; \
incc12 = incc; \
ldc12 = ldc; \
conjc12 = conjc; \
\
if ( bli_is_hermitian( strucc ) ) \
bli_toggle_conj( &conjc10 ); \
} \
\
/* Pack to p10. For upper storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc10, \
schema, \
p10_dim, \
panel_dim_max, \
p10_len, \
p10_len, \
kappa, \
c10, incc10, ldc10, \
( ctype* )p10, ldp, \
cntx \
); \
\
/* Pack to p12. For lower storage, this includes the unstored
triangle of c11. */ \
/* NOTE: Since we're only packing partial panels here, we pass in
p1x_len as panel_len_max; otherwise, the packm kernel will zero-
fill the columns up to panel_len_max, which is not what we need
or want to happen. */ \
PASTEMAC(ch,kername) \
( \
conjc12, \
schema, \
p12_dim, \
panel_dim_max, \
p12_len, \
p12_len, \
kappa, \
c12, incc12, ldc12, \
( ctype* )p12, ldp, \
cntx \
); \
\
/* Pack the stored triangle of c11 to p11. */ \
{ \
dim_t j2 = diagoffc_abs; \
/*ctype_r* restrict p_r = ( ctype_r* )p;*/ \
ctype* restrict c11 = c + (j2 )*ldc; \
ctype_r* restrict p11_r = p_r + (j2 )*ldp; \
\
PASTEMAC(ch,scal2rihs_mxn_uplo) \
( \
schema, \
uploc, \
conjc, \
panel_dim, \
kappa, \
c11, rs_c, cs_c, \
p11_r, rs_p, cs_p \
); \
\
/* If we are packing a micro-panel with Hermitian structure,
we must take special care of the diagonal. Now, if kappa
were guaranteed to be unit, all we would need to do is
explicitly zero out the imaginary part of the diagonal of
p11, in case the diagonal of the source matrix contained
garbage (non-zero) imaginary values. HOWEVER, since kappa
can be non-unit, things become a little more complicated.
In general, we must re-apply the kappa scalar to ONLY the
real part of the diagonal of the source matrix and save
the result to the diagonal of p11. */ \
if ( bli_is_hermitian( strucc ) ) \
{ \
PASTEMAC3(ch,chr,ch,scal2rihs_mxn_diag) \
( \
schema, \
panel_dim, \
panel_dim, \
kappa, \
c11, rs_c, cs_c, \
p11_r, rs_p, cs_p \
); \
} \
\
/*
PASTEMAC(chr,fprintm)( stdout, "packm_herm_cxk: ap_r copied", m_panel_max, n_panel_max, \
p_r + 0*is_p, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_herm_cxk: ap_i copied", m_panel_max, n_panel_max, \
p_r + 1*is_p, rs_p, cs_p, "%4.1f", "" ); \
*/ \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_herm_cxk_rih, packm_cxk_rih )
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, varname, kername ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t ldp, \
cntx_t* cntx \
) \
{ \
/* Pack the panel. */ \
PASTEMAC(ch,kername) \
( \
conjc, \
schema, \
panel_dim, \
panel_dim_max, \
panel_len, \
panel_len_max, \
kappa, \
c, incc, ldc, \
p, ldp, \
cntx \
); \
\
\
/* Tweak the panel according to its triangular structure */ \
{ \
ctype_r* p_r = ( ctype_r* )p; \
\
dim_t j = bli_abs( diagoffp ); \
ctype_r* p11_r = p_r + (j )*ldp; \
\
/* If the diagonal of c is implicitly unit, explicitly set the
diagonal of the packed panel to kappa. */ \
if ( bli_is_unit_diag( diagc ) ) \
{ \
PASTEMAC(ch,setrihs_mxn_diag) \
( \
schema, \
panel_dim, \
panel_dim, \
kappa, \
p11_r, rs_p, cs_p \
); \
} \
\
\
/* If requested, invert the diagonal of the packed panel. */ \
if ( invdiag == TRUE ) \
{ \
/* We don't need this case if we aren't supporting trsm. */ \
} \
\
\
/* Set the region opposite the diagonal of p to zero. To do this,
we need to reference the "unstored" region on the other side of
the diagonal. This amounts to toggling uploc and then shifting
the diagonal offset to shrink the newly referenced region (by
one diagonal). */ \
{ \
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
uplo_t uplop = uploc; \
\
bli_toggle_uplo( &uplop ); \
bli_shift_diag_offset_to_shrink_uplo( uplop, &diagoffp ); \
\
PASTEMAC2(chr,setm,BLIS_TAPI_EX_SUF) \
( \
BLIS_NO_CONJUGATE, \
diagoffp, \
BLIS_NONUNIT_DIAG, \
uplop, \
m_panel, \
n_panel, \
zero_r, \
p_r, rs_p, cs_p, \
cntx, \
NULL \
); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC( packm_tri_cxk_rih, packm_cxk_rih )
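
Note on the unstored-panel adjustment in packm_herm_cxk_rih above: the update `c + diagoffc*cs_c - diagoffc*rs_c`, followed by swapping the two strides, redirects a panel in the unstored triangle to its mirror image across the diagonal. The following is a minimal real-valued sketch of that idea, not the BLIS implementation. The names `reflect_unstored` and `panel_elem` are hypothetical, and the conjugation toggle that the Hermitian case also needs is omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): read element (r, c) of a panel
   whose top-left corner is at `base`, using the given row/column strides. */
static double panel_elem(const double *base, ptrdiff_t rs, ptrdiff_t cs,
                         size_t r, size_t c)
{
    return base[(ptrdiff_t)r * rs + (ptrdiff_t)c * cs];
}

/* Redirect a panel that lies in the unstored triangle to its mirror image.
   For a panel whose top-left corner is matrix element (i0, j0), the diagonal
   offset is assumed to be diagoff = i0 - j0. On return, *c points at matrix
   element (j0, i0) and the strides are swapped, so reading panel element
   (r, c) through the new pointer yields stored element (j0 + c, i0 + r). */
static void reflect_unstored(const double **c, ptrdiff_t *rs, ptrdiff_t *cs,
                             ptrdiff_t diagoff)
{
    assert(c != NULL && rs != NULL && cs != NULL);

    /* Same displacement as diagoff*cs - diagoff*rs, computed as one offset
       so that no out-of-range intermediate pointer is formed. */
    *c += diagoff * (*cs - *rs);

    /* Swap the strides so the stored data is traversed transposed. */
    ptrdiff_t tmp = *rs;
    *rs = *cs;
    *cs = tmp;
}
```

A caller would apply `reflect_unstored` only when the panel does not intersect the diagonal and lies entirely in the unstored triangle, which is the condition the macro tests with `bli_is_unstored_subpart_n()`.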

@@ -1,121 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffp, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t is_p, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_struc_cxk_rih )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_herm_cxk_rih )
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
\
void PASTEMAC(ch,varname) \
( \
struc_t strucc, \
doff_t diagoffc, \
diag_t diagc, \
uplo_t uploc, \
conj_t conjc, \
pack_t schema, \
bool invdiag, \
dim_t m_panel, \
dim_t n_panel, \
dim_t m_panel_max, \
dim_t n_panel_max, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* restrict kappa, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
inc_t incc, inc_t ldc, \
ctype* restrict p, inc_t rs_p, inc_t cs_p, \
inc_t ldp, \
cntx_t* cntx \
);
INSERT_GENTPROTCO_BASIC0( packm_tri_cxk_rih )
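
Note on the `panel_dim`/`panel_len` parameters in the helper prototypes above: in the removed packm_struc_cxk_rih definition they are derived from `m_panel`/`n_panel` according to the row/column bit of the pack schema. The sketch below restates that selection in plain C under stated assumptions: `panel_kind_t`, `panel_params_t`, and `panel_params` are hypothetical names standing in for `pack_t` and the local variables of the macro, not BLIS APIs.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the row/column bit of the pack schema. */
typedef enum { PACKED_ROW_PANELS, PACKED_COL_PANELS } panel_kind_t;

/* Hypothetical bundle of the values the macro keeps in local variables. */
typedef struct {
    size_t    panel_dim, panel_dim_max; /* short (register-blocked) dimension */
    size_t    panel_len, panel_len_max; /* long dimension of the micro-panel  */
    ptrdiff_t incc, ldc;                /* strides through the source matrix  */
    ptrdiff_t ldp;                      /* leading dimension of packed panel  */
} panel_params_t;

static panel_params_t panel_params(panel_kind_t kind,
                                   size_t m_panel, size_t n_panel,
                                   size_t m_panel_max, size_t n_panel_max,
                                   ptrdiff_t rs_c, ptrdiff_t cs_c,
                                   ptrdiff_t rs_p, ptrdiff_t cs_p)
{
    assert(m_panel <= m_panel_max && n_panel <= n_panel_max);

    panel_params_t pp;
    if (kind == PACKED_COL_PANELS) {
        /* Column panel, stored by rows: n is the short dimension. */
        pp.panel_dim = n_panel;  pp.panel_dim_max = n_panel_max;
        pp.panel_len = m_panel;  pp.panel_len_max = m_panel_max;
        pp.incc = cs_c;  pp.ldc = rs_c;  pp.ldp = rs_p;
    } else {
        /* Row panel, stored by columns: m is the short dimension. */
        pp.panel_dim = m_panel;  pp.panel_dim_max = m_panel_max;
        pp.panel_len = n_panel;  pp.panel_len_max = n_panel_max;
        pp.incc = rs_c;  pp.ldc = cs_c;  pp.ldp = cs_p;
    }
    return pp;
}
```

Returning the values as a struct is only a convenience for the sketch; the original code passes them to the helpers as individual arguments.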

@@ -48,23 +48,13 @@
#include "bli_l3_packm.h"
#include "bli_l3_schema.h"
// Prototype object APIs (expert and non-expert).
#include "bli_oapi_ex.h"
// Prototype object APIs (basic and expert).
#include "bli_l3_oapi.h"
#include "bli_xapi_undef.h"
#include "bli_l3_oapi_ex.h"
#include "bli_oapi_ba.h"
#include "bli_l3_oapi.h"
#include "bli_xapi_undef.h"
// Prototype typed APIs (expert and non-expert).
#include "bli_tapi_ex.h"
// Prototype typed APIs (basic and expert).
#include "bli_l3_tapi.h"
#include "bli_xapi_undef.h"
#include "bli_tapi_ba.h"
#include "bli_l3_tapi.h"
#include "bli_xapi_undef.h"
#include "bli_l3_tapi_ex.h"
// Define function types for small/unpacked handlers/kernels.
#include "bli_l3_sup_oft.h"

@@ -98,7 +98,7 @@ void bli_hemm_check
{
err_t e_val;
// Perform checks common to hemm/symm.
// Perform checks common to hemm/symm/trmm/trsm.
bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx );
@@ -248,7 +248,7 @@ void bli_syr2k_check
bli_check_error_code( e_val );
}
void bli_trmm_check
void bli_trmm3_check
(
side_t side,
obj_t* alpha,
@@ -261,7 +261,7 @@ void bli_trmm_check
{
err_t e_val;
// Perform checks common to hemm/symm.
// Perform checks common to hemm/symm/trmm/trsm.
bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx );
@@ -271,22 +271,41 @@ void bli_trmm_check
bli_check_error_code( e_val );
}
void bli_trmm_check
(
side_t side,
obj_t* alpha,
obj_t* a,
obj_t* b,
cntx_t* cntx
)
{
err_t e_val;
// Perform checks common to hemm/symm/trmm/trsm.
bli_hemm_basic_check( side, alpha, a, b, &BLIS_ZERO, b, cntx );
// Check object structure.
e_val = bli_check_triangular_object( a );
bli_check_error_code( e_val );
}
void bli_trsm_check
(
side_t side,
obj_t* alpha,
obj_t* a,
obj_t* b,
obj_t* beta,
obj_t* c,
cntx_t* cntx
)
{
err_t e_val;
// Perform checks common to hemm/symm.
// Perform checks common to hemm/symm/trmm/trsm.
bli_hemm_basic_check( side, alpha, a, b, beta, c, cntx );
bli_hemm_basic_check( side, alpha, a, b, &BLIS_ZERO, b, cntx );
// Check object structure.

@@ -72,8 +72,7 @@ void PASTEMAC(opname,_check) \
GENPROT( hemm )
GENPROT( symm )
GENPROT( trmm )
GENPROT( trsm )
GENPROT( trmm3 )
#undef GENPROT
@@ -92,6 +91,22 @@ GENPROT( herk )
GENPROT( syrk )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx \
);
GENPROT( trmm )
GENPROT( trsm )
// -----------------------------------------------------------------------------
void bli_gemm_basic_check

@@ -35,23 +35,13 @@
#include "blis.h"
static void_fp bli_l3_ind_oper_fp[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] =
// This array tracks whether a particular operation is implemented for each of
// the induced methods.
static bool bli_l3_ind_oper_impl[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS] =
{
/* gemm gemmt hemm herk her2k symm syrk syr2k trmm3 trmm trsm */
/* 3mh */ { bli_gemm3mh, NULL, bli_hemm3mh, bli_herk3mh, bli_her2k3mh, bli_symm3mh,
bli_syrk3mh, bli_syr2k3mh, bli_trmm33mh, NULL, NULL },
/* 3m1 */ { bli_gemm3m1, NULL, bli_hemm3m1, bli_herk3m1, bli_her2k3m1, bli_symm3m1,
bli_syrk3m1, bli_syr2k3m1, bli_trmm33m1, bli_trmm3m1, bli_trsm3m1 },
/* 4mh */ { bli_gemm4mh, NULL, bli_hemm4mh, bli_herk4mh, bli_her2k4mh, bli_symm4mh,
bli_syrk4mh, bli_syr2k4mh, bli_trmm34mh, NULL, NULL },
/* 4mb */ { bli_gemm4mb, NULL, NULL, NULL, NULL, NULL,
NULL, NULL, NULL, NULL, NULL },
/* 4m1 */ { bli_gemm4m1, NULL, bli_hemm4m1, bli_herk4m1, bli_her2k4m1, bli_symm4m1,
bli_syrk4m1, bli_syr2k4m1, bli_trmm34m1, bli_trmm4m1, bli_trsm4m1 },
/* 1m */ { bli_gemm1m, NULL, bli_hemm1m, bli_herk1m, bli_her2k1m, bli_symm1m,
bli_syrk1m, bli_syr2k1m, bli_trmm31m, bli_trmm1m, bli_trsm1m },
/* nat */ { bli_gemmnat, bli_gemmtnat, bli_hemmnat, bli_herknat, bli_her2knat, bli_symmnat,
bli_syrknat, bli_syr2knat, bli_trmm3nat, bli_trmmnat, bli_trsmnat },
/* 1m */ { TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE },
/* nat */ { TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE, TRUE }
};
//
@@ -67,16 +57,6 @@ bool bli_l3_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] =
{
/* gemm gemmt hemm herk her2k symm syrk syr2k trmm3 trmm trsm */
/* c z */
/* 3mh */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* 3m1 */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* 4mh */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* 4mb */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* 4m1 */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* 1m */ { {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE},
{FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE}, {FALSE,FALSE} },
/* nat */ { {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE}, {TRUE,TRUE},
@@ -88,16 +68,14 @@ bool bli_l3_ind_oper_st[BLIS_NUM_IND_METHODS][BLIS_NUM_LEVEL3_OPS][2] =
#undef GENFUNC
#define GENFUNC( opname, optype ) \
\
void_fp PASTEMAC(opname,ind_get_avail)( num_t dt ) \
ind_t PASTEMAC(opname,ind_find_avail)( num_t dt ) \
{ \
return bli_ind_oper_get_avail( optype, dt ); \
return bli_l3_ind_oper_find_avail( optype, dt ); \
}
/*
bool PASTEMAC(opname,ind_has_avail)( num_t dt )
{
return bli_ind_oper_has_avail( optype, dt );
}
*/
//bool PASTEMAC(opname,ind_has_avail)( num_t dt )
//{
// return bli_ind_oper_has_avail( optype, dt );
//}
GENFUNC( gemm, BLIS_GEMM )
GENFUNC( gemmt, BLIS_GEMMT )
@@ -116,16 +94,16 @@ GENFUNC( trsm, BLIS_TRSM )
#if 0
bool bli_l3_ind_oper_is_avail( opid_t oper, ind_t method, num_t dt )
{
void_fp func;
bool stat;
bool enabled;
bool stat;
// If the datatype is real, it is never available.
if ( !bli_is_complex( dt ) ) return FALSE;
func = bli_l3_ind_oper_get_func( oper, method );
stat = bli_l3_ind_oper_get_enable( oper, method, dt );
enabled = bli_l3_ind_oper_is_impl( oper, method );
stat = bli_l3_ind_oper_get_enable( oper, method, dt );
return ( func != NULL && stat == TRUE );
return ( enabled == TRUE && stat == TRUE );
}
#endif
@@ -148,11 +126,11 @@ ind_t bli_l3_ind_oper_find_avail( opid_t oper, num_t dt )
// current operation and datatype.
for ( im = 0; im < BLIS_NUM_IND_METHODS; ++im )
{
void_fp func = bli_l3_ind_oper_get_func( oper, im );
bool stat = bli_l3_ind_oper_get_enable( oper, im, dt );
bool enabled = bli_l3_ind_oper_is_impl( oper, im );
bool stat = bli_l3_ind_oper_get_enable( oper, im, dt );
if ( func != NULL &&
stat == TRUE ) return im;
if ( enabled == TRUE &&
stat == TRUE ) return im;
}
// This return statement should never execute since the native index
@@ -258,8 +236,7 @@ bool bli_l3_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt )
// -----------------------------------------------------------------------------
void_fp bli_l3_ind_oper_get_func( opid_t oper, ind_t method )
bool bli_l3_ind_oper_is_impl( opid_t oper, ind_t method )
{
return bli_l3_ind_oper_fp[ method ][ oper ];
return bli_l3_ind_oper_impl[ method ][ oper ];
}
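
Note on the change above from a function-pointer table to a boolean "implemented" table: the availability search now returns the first method that is both implemented and enabled. The sketch below illustrates that lookup in isolation. It is a simplification, not the BLIS code: the enum values, table names, and `find_avail` are hypothetical, the enable table is not split per datatype as `bli_l3_ind_oper_st` is, and routing real datatypes straight to the native method is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical, reduced method and operation indices. The native method is
   listed last so that it acts as the fallback, as in the tables above. */
enum { METHOD_1M = 0, METHOD_NAT = 1, NUM_METHODS = 2 };
enum { OP_GEMM = 0, OP_TRSM = 1, NUM_OPS = 2 };

/* Whether each method is implemented for each operation. */
static const bool method_impl[NUM_METHODS][NUM_OPS] = {
    /* 1m  */ { true, true },
    /* nat */ { true, true },
};

/* Whether each method is currently enabled (mutable at run time). */
static bool method_enabled[NUM_METHODS][NUM_OPS] = {
    /* 1m  */ { false, false },
    /* nat */ { true,  true  },
};

/* Return the first method that is both implemented and enabled. */
static int find_avail(int oper, bool is_complex)
{
    assert(0 <= oper && oper < NUM_OPS);

    /* Assumption: induced methods apply only to complex datatypes. */
    if (!is_complex) return METHOD_NAT;

    for (int im = 0; im < NUM_METHODS; ++im) {
        if (method_impl[im][oper] && method_enabled[im][oper]) return im;
    }

    /* Not reached while the native row stays implemented and enabled. */
    return METHOD_NAT;
}
```

Because the table holds booleans rather than function pointers, the caller is left to dispatch on the returned index itself, which fits the removal of the per-method `*ind()`, `*nat()`, and `*1m()` API layers described in the commit message.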

@@ -41,7 +41,7 @@
#undef GENPROT
#define GENPROT( opname ) \
\
void_fp PASTEMAC(opname,ind_get_avail)( num_t dt );
ind_t PASTEMAC(opname,ind_find_avail)( num_t dt );
/*bool PASTEMAC(opname,ind_has_avail)( num_t dt ); */
GENPROT( gemm )
@@ -70,7 +70,7 @@ void bli_l3_ind_oper_set_enable_all( opid_t oper, num_t dt, bool status );
void bli_l3_ind_oper_set_enable( opid_t oper, ind_t method, num_t dt, bool status );
bool bli_l3_ind_oper_get_enable( opid_t oper, ind_t method, num_t dt );
void_fp bli_l3_ind_oper_get_func( opid_t oper, ind_t method );
bool bli_l3_ind_oper_is_impl( opid_t oper, ind_t method );
#endif

@@ -53,11 +53,6 @@ void PASTEMAC(ch,opname) \
cntx_t* restrict cntx \
);
INSERT_GENTPROT_BASIC0( gemm3mh_ukr_name )
INSERT_GENTPROT_BASIC0( gemm3m1_ukr_name )
INSERT_GENTPROT_BASIC0( gemm4mh_ukr_name )
INSERT_GENTPROT_BASIC0( gemm4mb_ukr_name )
INSERT_GENTPROT_BASIC0( gemm4m1_ukr_name )
INSERT_GENTPROT_BASIC0( gemm1m_ukr_name )
@@ -77,10 +72,6 @@ void PASTEMAC(ch,opname) \
cntx_t* restrict cntx \
);
INSERT_GENTPROT_BASIC0( gemmtrsm3m1_l_ukr_name )
INSERT_GENTPROT_BASIC0( gemmtrsm3m1_u_ukr_name )
INSERT_GENTPROT_BASIC0( gemmtrsm4m1_l_ukr_name )
INSERT_GENTPROT_BASIC0( gemmtrsm4m1_u_ukr_name )
INSERT_GENTPROT_BASIC0( gemmtrsm1m_l_ukr_name )
INSERT_GENTPROT_BASIC0( gemmtrsm1m_u_ukr_name )
@@ -97,10 +88,6 @@ void PASTEMAC(ch,opname) \
cntx_t* restrict cntx \
);
INSERT_GENTPROT_BASIC0( trsm3m1_l_ukr_name )
INSERT_GENTPROT_BASIC0( trsm3m1_u_ukr_name )
INSERT_GENTPROT_BASIC0( trsm4m1_l_ukr_name )
INSERT_GENTPROT_BASIC0( trsm4m1_u_ukr_name )
INSERT_GENTPROT_BASIC0( trsm1m_l_ukr_name )
INSERT_GENTPROT_BASIC0( trsm1m_u_ukr_name )

@@ -4,8 +4,7 @@
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Copyright (C) 2021, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -33,187 +32,31 @@
*/
// Guard the function definitions so that they are only compiled when
// #included from files that define the object API macros.
#ifdef BLIS_ENABLE_OAPI
#include "blis.h"
//
// Define object-based interfaces.
// Define object-based interfaces (basic).
//
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
void PASTEMAC0(opname) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* If the rntm is non-NULL, it may indicate that we should forgo sup
handling altogether. */ \
bool enable_sup = TRUE; \
if ( rntm != NULL ) enable_sup = bli_rntm_l3_sup( rntm ); \
\
if ( enable_sup ) \
{ \
/* Execute the small/unpacked oapi handler. If it finds that the problem
does not fall within the thresholds that define "small", or for some
other reason decides not to use the small/unpacked implementation,
the function returns with BLIS_FAILURE, which causes execution to
proceed towards the conventional implementation. */ \
err_t result = PASTEMAC(opname,sup)( alpha, a, b, beta, c, cntx, rntm ); \
if ( result == BLIS_SUCCESS ) \
{ \
return; \
} \
} \
\
/* Only proceed with an induced method if each of the operands have a
complex storage datatype. NOTE: Allowing precisions to vary while
using 1m, which is what we do here, is unique to gemm; other level-3
operations use 1m only if all storage datatypes are equal (and they
ignore the computation precision). If any operands are real, skip the
induced method chooser function and proceed directly with native
execution. */ \
if ( bli_obj_is_complex( c ) && \
bli_obj_is_complex( a ) && \
bli_obj_is_complex( b ) ) \
{ \
/* Invoke the operation's "ind" function--its induced method front-end.
For complex problems, it calls the highest priority induced method
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC(opname,_ex)( alpha, a, b, beta, c, NULL, NULL ); \
}
GENFRONT( gemm )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* If the rntm is non-NULL, it may indicate that we should forgo sup
handling altogether. */ \
/*
bool enable_sup = TRUE; \
if ( rntm != NULL ) enable_sup = bli_rntm_l3_sup( rntm ); \
*/ \
\
/* NOTE: The sup handling for gemmt is disabled here because gemmtsup
is not yet fully implemented. */ \
/*
if ( enable_sup ) \
{ \
*/ \
/* Execute the small/unpacked oapi handler. If it finds that the problem
does not fall within the thresholds that define "small", or for some
other reason decides not to use the small/unpacked implementation,
the function returns with BLIS_FAILURE, which causes execution to
proceed towards the conventional implementation. */ \
/*
err_t result = PASTEMAC(opname,sup)( alpha, a, b, beta, c, cntx, rntm ); \
if ( result == BLIS_SUCCESS ) \
{ \
return; \
} \
} \
*/ \
\
/* Only proceed with an induced method if each of the operands have a
complex storage datatype. NOTE: Allowing precisions to vary while
using 1m, which is what we do here, is unique to gemm; other level-3
operations use 1m only if all storage datatypes are equal (and they
ignore the computation precision). If any operands are real, skip the
induced method chooser function and proceed directly with native
execution. */ \
if ( bli_obj_is_complex( c ) && \
bli_obj_is_complex( a ) && \
bli_obj_is_complex( b ) ) \
{ \
/* FIXME: BLIS does not yet support induced methods for gemmt. Thus,
we call the native implementation code path for now. */ \
/*PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx, rntm );*/ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
}
GENFRONT( gemmt )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* Only proceed with an induced method if each of the operands have a
complex storage datatype. NOTE: Allowing precisions to vary while
using 1m, which is what we do here, is unique to gemm; other level-3
operations use 1m only if all storage datatypes are equal (and they
ignore the computation precision). If any operands are real, skip the
induced method chooser function and proceed directly with native
execution. */ \
if ( bli_obj_is_complex( c ) && \
bli_obj_is_complex( a ) && \
bli_obj_is_complex( b ) ) \
{ \
/* Invoke the operation's "ind" function--its induced method front-end.
For complex problems, it calls the highest priority induced method
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
} \
}
GENFRONT( her2k )
GENFRONT( syr2k )
@@ -221,7 +64,7 @@ GENFRONT( syr2k )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
void PASTEMAC0(opname) \
( \
side_t side, \
obj_t* alpha, \
@@ -229,32 +72,11 @@ void PASTEMAC(opname,EX_SUF) \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* Only proceed with an induced method if all operands have the same
(complex) datatype. If any datatypes differ, skip the induced method
chooser function and proceed directly with native execution, which is
where mixed datatype support will be implemented (if at all). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( c ) && \
bli_obj_dt( b ) == bli_obj_dt( c ) && \
bli_obj_is_complex( c ) ) \
{ \
/* Invoke the operation's "ind" function--its induced method front-end.
For complex problems, it calls the highest priority induced method
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx, rntm ); \
} \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC(opname,_ex)( side, alpha, a, b, beta, c, NULL, NULL ); \
}
GENFRONT( hemm )
@@ -265,37 +87,17 @@ GENFRONT( trmm3 )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
void PASTEMAC0(opname) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* Only proceed with an induced method if all operands have the same
(complex) datatype. If any datatypes differ, skip the induced method
chooser function and proceed directly with native execution, which is
where mixed datatype support will be implemented (if at all). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( c ) && \
bli_obj_is_complex( c ) ) \
{ \
/* Invoke the operation's "ind" function--its induced method front-end.
For complex problems, it calls the highest priority induced method
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( alpha, a, beta, c, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx, rntm ); \
} \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC(opname,_ex)( alpha, a, beta, c, NULL, NULL ); \
}
GENFRONT( herk )
@@ -305,42 +107,19 @@ GENFRONT( syrk )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,EX_SUF) \
void PASTEMAC0(opname) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b \
BLIS_OAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_OAPI_EX_DECLS \
\
/* Only proceed with an induced method if all operands have the same
(complex) datatype. If any datatypes differ, skip the induced method
chooser function and proceed directly with native execution, which is
where mixed datatype support will be implemented (if at all). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( b ) && \
bli_obj_is_complex( b ) ) \
{ \
/* Invoke the operation's "ind" function--its induced method front-end.
For complex problems, it calls the highest priority induced method
that is available (ie: implemented and enabled), and if none are
enabled, it calls native execution. (For real problems, it calls
the operation's native execution interface.) */ \
PASTEMAC(opname,ind)( side, alpha, a, b, cntx, rntm ); \
} \
else \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx, rntm ); \
} \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC(opname,_ex)( side, alpha, a, b, NULL, NULL ); \
}
GENFRONT( trmm )
GENFRONT( trsm )
#endif
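
Note on the hunks above: each basic object-API function is now a thin wrapper that forwards to its `_ex` counterpart and passes NULL for both the context and the runtime. A minimal standalone sketch of that forwarding pattern follows. All names here (`demo_cntx_t`, `demo_rntm_t`, `demo_op`, `demo_op_ex`) are hypothetical stand-ins, not BLIS types or functions, and the arithmetic exists only so the effect of the defaults can be observed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for cntx_t and rntm_t; these are NOT the BLIS types. */
typedef struct { int blksz; } demo_cntx_t;
typedef struct { int num_threads; } demo_rntm_t;

/* Library-wide defaults that the expert interface falls back to (assumed values). */
static const demo_cntx_t demo_default_cntx = { 8 };
static const demo_rntm_t demo_default_rntm = { 1 };

/* Expert interface: a NULL cntx or rntm means "use the default". */
int demo_op_ex( int x, const demo_cntx_t* cntx, const demo_rntm_t* rntm )
{
    assert( x >= 0 );
    if ( cntx == NULL ) cntx = &demo_default_cntx;
    if ( rntm == NULL ) rntm = &demo_default_rntm;
    return x * cntx->blksz * rntm->num_threads;
}

/* Basic interface: forwards to the expert interface and requests defaults. */
int demo_op( int x )
{
    return demo_op_ex( x, NULL, NULL );
}
```

With this layout the basic entry point carries no logic of its own, so default handling lives in exactly one place (the expert function).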

@@ -35,20 +35,19 @@
//
// Prototype object-based interfaces.
// Prototype object-based interfaces (basic).
//
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC0(opname) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
);
GENPROT( gemm )
@@ -60,7 +59,7 @@ GENPROT( syr2k )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC0(opname) \
( \
side_t side, \
obj_t* alpha, \
@@ -68,7 +67,6 @@ BLIS_EXPORT_BLIS void PASTEMAC(opname,EX_SUF) \
obj_t* b, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
);
GENPROT( hemm )
@@ -79,13 +77,12 @@ GENPROT( trmm3 )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC0(opname) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c \
BLIS_OAPI_EX_PARAMS \
);
GENPROT( herk )
@@ -95,13 +92,12 @@ GENPROT( syrk )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC0(opname) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b \
BLIS_OAPI_EX_PARAMS \
);
GENPROT( trmm )

@@ -1,46 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// Include cpp macros that instantiate the API definition templates as
// omitting expert parameters.
#include "bli_oapi_ba.h"
// Define the macro protecting the object API definitions.
#define BLIS_ENABLE_OAPI
// Include the object API definitions here.
#include "bli_l3_oapi.c"

@@ -4,7 +4,7 @@
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2021, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -34,13 +34,305 @@
#include "blis.h"
// Include cpp macros that instantiate the API definition templates as
// having expert parameters.
#include "bli_oapi_ex.h"
//
// Define object-based interfaces (expert).
//
// Define the macro protecting the object API definitions.
#define BLIS_ENABLE_OAPI
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* If the rntm is non-NULL, it may indicate that we should forgo sup
handling altogether. */ \
bool enable_sup = TRUE; \
if ( rntm != NULL ) enable_sup = bli_rntm_l3_sup( rntm ); \
\
if ( enable_sup ) \
{ \
/* Execute the small/unpacked oapi handler. If it finds that the problem
does not fall within the thresholds that define "small", or for some
other reason decides not to use the small/unpacked implementation,
the function returns with BLIS_FAILURE, which causes execution to
proceed towards the conventional implementation. */ \
err_t result = PASTEMAC(opname,sup)( alpha, a, b, beta, c, cntx, rntm ); \
if ( result == BLIS_SUCCESS ) \
{ \
return; \
} \
} \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Default to using native execution. */ \
num_t dt = bli_obj_dt( c ); \
ind_t im = BLIS_NAT; \
\
/* If each matrix operand has a complex storage datatype, try to get an
induced method (if one is available and enabled). NOTE: Allowing
precisions to vary while using 1m, which is what we do here, is unique
to gemm; other level-3 operations use 1m only if all storage datatypes
are equal (and they ignore the computation precision). */ \
if ( bli_obj_is_complex( c ) && \
bli_obj_is_complex( a ) && \
bli_obj_is_complex( b ) ) \
{ \
/* Find the highest priority induced method that is both enabled and
available for the current operation. (If an induced method is
available but not enabled, or simply unavailable, BLIS_NAT will
be returned here.) */ \
im = PASTEMAC(opname,ind_find_avail)( dt ); \
} \
\
/* If necessary, obtain a valid context from the gks using the induced
method id determined above. */ \
if ( cntx == NULL ) cntx = bli_gks_query_ind_cntx( im, dt ); \
\
/* Check the operands. */ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( alpha, a, b, beta, c, cntx ); \
\
/* Invoke the operation's front-end and request the default control tree. */ \
PASTEMAC(opname,_front)( alpha, a, b, beta, c, cntx, rntm, NULL ); \
}
// Include the object API definitions here.
#include "bli_l3_oapi.c"
// If a sandbox was enabled, we forgo defining bli_gemm_ex() since it will be
// defined in the sandbox environment.
#ifndef BLIS_ENABLE_SANDBOX
GENFRONT( gemm )
#endif
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Default to using native execution. */ \
num_t dt = bli_obj_dt( c ); \
ind_t im = BLIS_NAT; \
\
/* If all matrix operands are complex and of the same storage datatype, try
to get an induced method (if one is available and enabled). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( c ) && \
bli_obj_dt( b ) == bli_obj_dt( c ) && \
bli_obj_is_complex( c ) ) \
{ \
/* Find the highest priority induced method that is both enabled and
available for the current operation. (If an induced method is
available but not enabled, or simply unavailable, BLIS_NAT will
be returned here.) */ \
im = PASTEMAC(opname,ind_find_avail)( dt ); \
} \
\
/* If necessary, obtain a valid context from the gks using the induced
method id determined above. */ \
if ( cntx == NULL ) cntx = bli_gks_query_ind_cntx( im, dt ); \
\
/* Check the operands. */ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( alpha, a, b, beta, c, cntx ); \
\
/* Invoke the operation's front-end and request the default control tree. */ \
PASTEMAC(opname,_front)( alpha, a, b, beta, c, cntx, rntm, NULL ); \
}
GENFRONT( gemmt )
GENFRONT( her2k )
GENFRONT( syr2k )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Default to using native execution. */ \
num_t dt = bli_obj_dt( c ); \
ind_t im = BLIS_NAT; \
\
/* If all matrix operands are complex and of the same storage datatype, try
to get an induced method (if one is available and enabled). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( c ) && \
bli_obj_dt( b ) == bli_obj_dt( c ) && \
bli_obj_is_complex( c ) ) \
{ \
/* Find the highest priority induced method that is both enabled and
available for the current operation. (If an induced method is
available but not enabled, or simply unavailable, BLIS_NAT will
be returned here.) */ \
im = PASTEMAC(opname,ind_find_avail)( dt ); \
} \
\
/* If necessary, obtain a valid context from the gks using the induced
method id determined above. */ \
if ( cntx == NULL ) cntx = bli_gks_query_ind_cntx( im, dt ); \
\
/* Check the operands. */ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( side, alpha, a, b, beta, c, cntx ); \
\
/* Invoke the operation's front-end and request the default control tree. */ \
PASTEMAC(opname,_front)( side, alpha, a, b, beta, c, cntx, rntm, NULL ); \
}
GENFRONT( hemm )
GENFRONT( symm )
GENFRONT( trmm3 )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Default to using native execution. */ \
num_t dt = bli_obj_dt( c ); \
ind_t im = BLIS_NAT; \
\
/* If all matrix operands are complex and of the same storage datatype, try
to get an induced method (if one is available and enabled). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( c ) && \
bli_obj_is_complex( c ) ) \
{ \
/* Find the highest priority induced method that is both enabled and
available for the current operation. (If an induced method is
available but not enabled, or simply unavailable, BLIS_NAT will
be returned here.) */ \
im = PASTEMAC(opname,ind_find_avail)( dt ); \
} \
\
/* If necessary, obtain a valid context from the gks using the induced
method id determined above. */ \
if ( cntx == NULL ) cntx = bli_gks_query_ind_cntx( im, dt ); \
\
/* Check the operands. */ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( alpha, a, beta, c, cntx ); \
\
/* Invoke the operation's front-end and request the default control tree. */ \
PASTEMAC(opname,_front)( alpha, a, beta, c, cntx, rntm, NULL ); \
}
GENFRONT( herk )
GENFRONT( syrk )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Default to using native execution. */ \
num_t dt = bli_obj_dt( b ); \
ind_t im = BLIS_NAT; \
\
/* If all matrix operands are complex and of the same storage datatype, try
to get an induced method (if one is available and enabled). */ \
if ( bli_obj_dt( a ) == bli_obj_dt( b ) && \
bli_obj_is_complex( b ) ) \
{ \
/* Find the highest priority induced method that is both enabled and
available for the current operation. (If an induced method is
available but not enabled, or simply unavailable, BLIS_NAT will
be returned here.) */ \
im = PASTEMAC(opname,ind_find_avail)( dt ); \
} \
\
/* If necessary, obtain a valid context from the gks using the induced
method id determined above. */ \
if ( cntx == NULL ) cntx = bli_gks_query_ind_cntx( im, dt ); \
\
/* Check the operands. */ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( side, alpha, a, b, cntx ); \
\
/* Invoke the operation's front-end and request the default control tree. */ \
PASTEMAC(opname,_front)( side, alpha, a, b, cntx, rntm, NULL ); \
}
GENFRONT( trmm )
GENFRONT( trsm )
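
Note on the expert definitions above: every non-gemm operation follows the same two steps before calling its front-end. It takes a local copy of the runtime (or initializes one from global settings when NULL is passed), and it defaults to native execution unless all operands share one complex storage datatype. The sketch below restates those two steps in standalone C. Every identifier is a hypothetical stand-in: `demo_find_avail()` only imitates the role of `ind_find_avail`, and the `enabled_1m` flag replaces whatever state the real library consults.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-ins; these are NOT the BLIS enums or structs. */
typedef enum { DEMO_FLOAT, DEMO_DOUBLE, DEMO_SCOMPLEX, DEMO_DCOMPLEX } demo_dt_t;
typedef enum { DEMO_1M, DEMO_NAT } demo_ind_t;
typedef struct { int num_threads; } demo_rntm_t;

static bool demo_is_complex( demo_dt_t dt )
{
    return dt == DEMO_SCOMPLEX || dt == DEMO_DCOMPLEX;
}

/* Imitates the "find available induced method" query: returns 1m only if it
   is enabled, otherwise native execution. */
static demo_ind_t demo_find_avail( bool enabled_1m )
{
    return enabled_1m ? DEMO_1M : DEMO_NAT;
}

/* Default to native; consider an induced method only when all operands have
   the same complex storage datatype. */
demo_ind_t demo_choose_method( demo_dt_t dt_a, demo_dt_t dt_b, demo_dt_t dt_c,
                               bool enabled_1m )
{
    demo_ind_t im = DEMO_NAT;
    if ( dt_a == dt_c && dt_b == dt_c && demo_is_complex( dt_c ) )
        im = demo_find_avail( enabled_1m );
    return im;
}

/* Always work on a local runtime: copy the caller's, or fall back to an
   assumed global default of one thread when none is given. */
void demo_rntm_init_local( const demo_rntm_t* rntm, demo_rntm_t* local )
{
    assert( local != NULL );
    if ( rntm == NULL ) local->num_threads = 1;
    else                *local = *rntm;
}
```

Copying the runtime means later adjustments made on the local copy never leak back into the caller's object.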

@@ -5,6 +5,7 @@
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -33,22 +34,80 @@
*/
#undef GENTPROTCO
#define GENTPROTCO( ctype, ctype_r, ch, chr, varname ) \
//
// Prototype object-based interfaces (expert).
//
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC(ch,varname) \
BLIS_EXPORT_BLIS void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
conj_t conja, \
pack_t schema, \
dim_t panel_dim, \
dim_t panel_dim_max, \
dim_t panel_len, \
dim_t panel_len_max, \
ctype* kappa, \
ctype* a, inc_t inca, inc_t lda, \
ctype* p, inc_t ldp, \
cntx_t* cntx \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROTCO_BASIC0( packm_cxk_rih )
GENPROT( gemm )
GENPROT( gemmt )
GENPROT( her2k )
GENPROT( syr2k )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
);
GENPROT( hemm )
GENPROT( symm )
GENPROT( trmm3 )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
);
GENPROT( herk )
GENPROT( syrk )
#undef GENPROT
#define GENPROT( opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
);
GENPROT( trmm )
GENPROT( trsm )

@@ -275,29 +275,6 @@ bli_thread_barrier( thread ); \
bli_thread_barrier( thread ); \
} \
*/
/*
if ( bli_is_4mi_packed( schema ) ) { \
printf( "packm_var2: is_p_use = %lu\n", is_p_use ); \
if ( col_stored ) { \
if ( 0 ) \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: a_r", *m_panel_use, *n_panel_use, \
( ctype_r* )c_use, 2*rs_c, 2*cs_c, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: ap_r", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: ap_i", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use + is_p_use, rs_p, cs_p, "%4.1f", "" ); \
} \
if ( row_stored ) { \
if ( 0 ) \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: b_r", *m_panel_use, *n_panel_use, \
( ctype_r* )c_use, 2*rs_c, 2*cs_c, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_r", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_i", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use + is_p_use, rs_p, cs_p, "%4.1f", "" ); \
} \
} \
*/
/*
PASTEMAC(chr,fprintm)( stdout, "packm_var2: bp_rpi", *m_panel_max, *n_panel_max, \
( ctype_r* )p_use, rs_p, cs_p, "%4.1f", "" ); \

@@ -4,8 +4,7 @@
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Copyright (C) 2021, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -33,18 +32,16 @@
*/
// Guard the function definitions so that they are only compiled when
// #included from files that define the typed API macros.
#ifdef BLIS_ENABLE_TAPI
#include "blis.h"
//
// Define BLAS-like interfaces with typed operands.
// Define BLAS-like interfaces with typed operands (basic).
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
trans_t transa, \
trans_t transb, \
@@ -56,55 +53,70 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, k, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
transa, \
transb, \
m, n, k, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
INSERT_GENTFUNC_BASIC0( gemm )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
) \
{ \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
uploc, \
transa, \
transb, \
m, k, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
INSERT_GENTFUNC_BASIC0( gemmt )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, struca ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -117,50 +129,24 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_conj( conja, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( struca, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
uploa, \
conja, \
transb, \
m, n, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -171,7 +157,7 @@ INSERT_GENTFUNC_BASIC( symm, BLIS_SYMMETRIC )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -181,44 +167,21 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_init_finish_1x1( dt_r, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt_r, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
uploc, \
transa, \
m, k, \
alpha, \
a, rs_a, cs_a, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -228,7 +191,7 @@ INSERT_GENTFUNCR_BASIC0( herk )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -240,50 +203,23 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt_r, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
uploc, \
transa, \
transb, \
m, k, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -293,7 +229,7 @@ INSERT_GENTFUNCR_BASIC0( her2k )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -303,43 +239,21 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
uploc, \
transa, \
m, k, \
alpha, \
a, rs_a, cs_a, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -349,7 +263,7 @@ INSERT_GENTFUNC_BASIC0( syrk )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -361,49 +275,23 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
uploc, \
transa, \
transb, \
m, k, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -413,69 +301,7 @@ INSERT_GENTFUNC_BASIC0( syr2k )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, k, m, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( gemmt )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -489,51 +315,25 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
uploa, \
transa, \
diaga, \
transb, \
m, n, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
beta, \
c, rs_c, cs_c, \
NULL, \
NULL \
); \
}
@@ -543,7 +343,7 @@ INSERT_GENTFUNC_BASIC0( trmm3 )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,EX_SUF) \
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -554,48 +354,25 @@ void PASTEMAC2(ch,opname,EX_SUF) \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b \
BLIS_TAPI_EX_PARAMS \
) \
{ \
bli_init_once(); \
\
BLIS_TAPI_EX_DECLS \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, n, b, rs_b, cs_b, &bo ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
/* Invoke the expert interface and request default cntx_t and rntm_t
objects. */ \
PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
cntx, \
rntm \
uploa, \
transa, \
diaga, \
m, n, \
alpha, \
a, rs_a, cs_a, \
b, rs_b, cs_b, \
NULL, \
NULL \
); \
}
INSERT_GENTFUNC_BASIC0( trmm )
INSERT_GENTFUNC_BASIC0( trsm )
#endif
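
The basic typed wrappers in this file forward to their expert counterparts and pass NULL for the context and runtime arguments, which requests the defaults. Below is a minimal sketch of that layering. All names in it (`toy_cntx_t`, `toy_rntm_t`, `toy_dscal_ex`, `toy_dscal`) are hypothetical stand-ins chosen for illustration and are not part of BLIS.

```c
#include <stddef.h>

/* Hypothetical stand-ins for cntx_t and rntm_t; these are not BLIS types. */
typedef struct { int blocksize; }   toy_cntx_t;
typedef struct { int num_threads; } toy_rntm_t;

static const toy_cntx_t toy_default_cntx = { 64 };
static const toy_rntm_t toy_default_rntm = { 1 };

/* Record the settings used by the most recent expert call (demo only). */
static int toy_last_blocksize   = 0;
static int toy_last_num_threads = 0;

/* Expert interface: a NULL cntx or rntm means "use the default". */
void toy_dscal_ex( int n, double alpha, double* x,
                   const toy_cntx_t* cntx, const toy_rntm_t* rntm )
{
    if ( cntx == NULL ) cntx = &toy_default_cntx;
    if ( rntm == NULL ) rntm = &toy_default_rntm;

    toy_last_blocksize   = cntx->blocksize;
    toy_last_num_threads = rntm->num_threads;

    for ( int i = 0; i < n; ++i ) x[ i ] *= alpha;
}

/* Basic interface: same operation, no expert parameters. It simply
   forwards to the expert interface and requests the defaults. */
void toy_dscal( int n, double alpha, double* x )
{
    toy_dscal_ex( n, alpha, x, NULL, NULL );
}
```

With this split, only the expert function carries real logic; the basic function is a thin forwarding layer, so the two cannot drift apart.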


@@ -35,13 +35,13 @@
//
// Prototype BLAS-like interfaces with typed operands.
// Prototype BLAS-like interfaces with typed operands (basic).
//
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
trans_t transa, \
trans_t transb, \
@@ -53,7 +53,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( gemm )
@@ -61,7 +60,7 @@ INSERT_GENTPROT_BASIC0( gemm )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -74,7 +73,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( hemm )
@@ -84,7 +82,7 @@ INSERT_GENTPROT_BASIC0( symm )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -94,7 +92,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROTR_BASIC0( herk )
@@ -103,7 +100,7 @@ INSERT_GENTPROTR_BASIC0( herk )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -115,7 +112,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROTR_BASIC0( her2k )
@@ -124,7 +120,7 @@ INSERT_GENTPROTR_BASIC0( her2k )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -134,7 +130,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( syrk )
@@ -143,7 +138,7 @@ INSERT_GENTPROT_BASIC0( syrk )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -155,17 +150,16 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( syr2k )
INSERT_GENTPROT_BASIC0( gemmt )
INSERT_GENTPROT_BASIC0( syr2k )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -179,7 +173,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( trmm3 )
@@ -188,7 +181,7 @@ INSERT_GENTPROT_BASIC0( trmm3 )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
BLIS_EXPORT_BLIS void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
@@ -199,7 +192,6 @@ BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,EX_SUF) \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b \
BLIS_TAPI_EX_PARAMS \
);
INSERT_GENTPROT_BASIC0( trmm )


@@ -1,46 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// Include cpp macros that instantiate the API definition templates as
// omitting expert parameters.
#include "bli_tapi_ba.h"
// Define the macro protecting the typed API definitions.
#define BLIS_ENABLE_TAPI
// Include the typed API definitions here.
#include "bli_l3_tapi.c"


@@ -5,6 +5,7 @@
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -34,13 +35,553 @@
#include "blis.h"
// Include cpp macros that instantiate the API definition templates as
// having expert parameters.
#include "bli_tapi_ex.h"
//
// Define BLAS-like interfaces with typed operands (expert).
//
// Define the macro protecting the typed API definitions.
#define BLIS_ENABLE_TAPI
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t n, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, k, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
// Include the typed API definitions here.
#include "bli_l3_tapi.c"
INSERT_GENTFUNC_BASIC0( gemm )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, struca ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
conj_t conja, \
trans_t transb, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_conj( conja, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( struca, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC( hemm, BLIS_HERMITIAN )
INSERT_GENTFUNC_BASIC( symm, BLIS_SYMMETRIC )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype_r* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_init_finish_1x1( dt_r, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt_r, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNCR_BASIC0( herk )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt_r, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNCR_BASIC0( her2k )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( syrk )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( syr2k )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, k, m, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( gemmt )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
trans_t transb, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
obj_t betao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t co = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
bli_obj_init_finish_1x1( dt, beta, &betao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_init_finish( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( trmm3 )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC2(ch,opname,BLIS_OAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao = BLIS_OBJECT_INITIALIZER_1X1; \
obj_t ao = BLIS_OBJECT_INITIALIZER; \
obj_t bo = BLIS_OBJECT_INITIALIZER; \
\
dim_t mn_a; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
\
bli_obj_init_finish_1x1( dt, alpha, &alphao ); \
\
bli_obj_init_finish( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_init_finish( dt, m, n, b, rs_b, cs_b, &bo ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC(opname,BLIS_OAPI_EX_SUF) \
( \
side, \
&alphao, \
&ao, \
&bo, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( trmm )
INSERT_GENTFUNC_BASIC0( trsm )
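
The definitions above are written once as a `GENTFUNC` template and then instantiated per datatype by the `INSERT_GENTFUNC_*` macros, with `PASTEMAC`/`PASTEMAC2` building each function name by token pasting. The following is a reduced sketch of that technique only. The macro and function names (`TOY_PASTE`, `TOY_GENTFUNC`, `TOY_INSERT_GENTFUNC_REAL`, `toy_ssum`, `toy_dsum`) are assumptions made for this example and do not reproduce the actual BLIS macro definitions.

```c
#include <stddef.h>

/* Build a function name from a type character and an operation name. */
#define TOY_PASTE( ch, opname ) toy_ ## ch ## opname

/* Function template: the body is written once, parameterized on the
   C type and the one-letter type prefix. */
#define TOY_GENTFUNC( ctype, ch, opname ) \
\
ctype TOY_PASTE(ch,opname) \
     ( \
       const ctype* x, \
       size_t       n  \
     ) \
{ \
    ctype acc = 0; \
    for ( size_t i = 0; i < n; ++i ) acc += x[ i ]; \
    return acc; \
}

/* Instantiate the template for each supported real datatype. */
#define TOY_INSERT_GENTFUNC_REAL( opname ) \
\
TOY_GENTFUNC( float,  s, opname ) \
TOY_GENTFUNC( double, d, opname )

/* Expands to definitions of toy_ssum() and toy_dsum(). */
TOY_INSERT_GENTFUNC_REAL( sum )
```

A call such as `toy_dsum( x, n )` then resolves to the double-precision instance, in the same way that `bli_dgemm` is one instance of the `gemm` template.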


@@ -5,6 +5,7 @@
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
@@ -33,10 +34,14 @@
*/
//
// Prototype BLAS-like interfaces with typed operands (expert).
//
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
trans_t transa, \
trans_t transb, \
@@ -52,18 +57,12 @@ void PASTEMAC(ch,opname) \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( gemm3mh )
INSERT_GENTPROT_BASIC0( gemm3m1 )
INSERT_GENTPROT_BASIC0( gemm4mh )
INSERT_GENTPROT_BASIC0( gemm4mb )
INSERT_GENTPROT_BASIC0( gemm4m1 )
INSERT_GENTPROT_BASIC0( gemm1m )
INSERT_GENTPROT_BASIC0( gemm )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
@@ -80,17 +79,34 @@ void PASTEMAC(ch,opname) \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( hemm3mh )
INSERT_GENTPROT_BASIC0( hemm3m1 )
INSERT_GENTPROT_BASIC0( hemm4mh )
INSERT_GENTPROT_BASIC0( hemm4m1 )
INSERT_GENTPROT_BASIC0( hemm1m )
INSERT_GENTPROT_BASIC0( hemm )
INSERT_GENTPROT_BASIC0( symm )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype_r* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROTR_BASIC0( herk )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -103,71 +119,36 @@ void PASTEMAC(ch,opname) \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntmx \
rntm_t* rntm \
);
INSERT_GENTPROTR_BASIC0( her2k3mh )
INSERT_GENTPROTR_BASIC0( her2k3m1 )
INSERT_GENTPROTR_BASIC0( her2k4mh )
INSERT_GENTPROTR_BASIC0( her2k4m1 )
INSERT_GENTPROTR_BASIC0( her2k1m )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype_r* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntmx \
);
INSERT_GENTPROTR_BASIC0( herk3mh )
INSERT_GENTPROTR_BASIC0( herk3m1 )
INSERT_GENTPROTR_BASIC0( herk4mh )
INSERT_GENTPROTR_BASIC0( herk4m1 )
INSERT_GENTPROTR_BASIC0( herk1m )
INSERT_GENTPROTR_BASIC0( her2k )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
conj_t conja, \
trans_t transb, \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t n, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( symm3mh )
INSERT_GENTPROT_BASIC0( symm3m1 )
INSERT_GENTPROT_BASIC0( symm4mh )
INSERT_GENTPROT_BASIC0( symm4m1 )
INSERT_GENTPROT_BASIC0( symm1m )
INSERT_GENTPROT_BASIC0( syrk )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
uplo_t uploc, \
trans_t transa, \
@@ -183,41 +164,14 @@ void PASTEMAC(ch,opname) \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( syr2k3mh )
INSERT_GENTPROT_BASIC0( syr2k3m1 )
INSERT_GENTPROT_BASIC0( syr2k4mh )
INSERT_GENTPROT_BASIC0( syr2k4m1 )
INSERT_GENTPROT_BASIC0( syr2k1m )
INSERT_GENTPROT_BASIC0( gemmt )
INSERT_GENTPROT_BASIC0( syr2k )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( syrk3mh )
INSERT_GENTPROT_BASIC0( syrk3m1 )
INSERT_GENTPROT_BASIC0( syrk4mh )
INSERT_GENTPROT_BASIC0( syrk4m1 )
INSERT_GENTPROT_BASIC0( syrk1m )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
@@ -235,17 +189,13 @@ void PASTEMAC(ch,opname) \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( trmm33mh )
INSERT_GENTPROT_BASIC0( trmm33m1 )
INSERT_GENTPROT_BASIC0( trmm34mh )
INSERT_GENTPROT_BASIC0( trmm34m1 )
INSERT_GENTPROT_BASIC0( trmm31m )
INSERT_GENTPROT_BASIC0( trmm3 )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
BLIS_EXPORT_BLIS void PASTEMAC2(ch,opname,BLIS_TAPI_EX_SUF) \
( \
side_t side, \
uplo_t uploa, \
@@ -260,30 +210,6 @@ void PASTEMAC(ch,opname) \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( trmm3m1 )
INSERT_GENTPROT_BASIC0( trmm4m1 )
INSERT_GENTPROT_BASIC0( trmm1m )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
cntx_t* cntx, \
rntm_t* rntm \
);
INSERT_GENTPROT_BASIC0( trsm3m1 )
INSERT_GENTPROT_BASIC0( trsm4m1 )
INSERT_GENTPROT_BASIC0( trsm1m )
INSERT_GENTPROT_BASIC0( trmm )
INSERT_GENTPROT_BASIC0( trsm )


@@ -53,10 +53,6 @@ void bli_gemm_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_gemm_check( alpha, a, b, beta, c, cntx );
// If C has a zero dimension, return early.
if ( bli_obj_has_zero_dim( c ) )
{


@@ -112,17 +112,6 @@ void bli_gemm_int
// Extract the function pointer from the current control tree node.
f = bli_cntl_var_func( cntl );
// Somewhat hackish support for 4m1b method implementation.
{
ind_t im = bli_cntx_method( cntx );
if ( im != BLIS_NAT )
{
if ( im == BLIS_4M1B )
if ( f == bli_gemm_ker_var2 ) f = bli_gemm4mb_ker_var2;
}
}
// Invoke the variant.
f
(


@@ -219,7 +219,17 @@ void PASTEMAC(ch,varname) \
/*const dim_t PACKNR = rs_b;*/ \
\
/* Query the context for the micro-kernel address and cast it to its
function pointer type. */ \
function pointer type. Note that the virtual gemm ukernel is queried
instead of the native gemm ukernel. This is needed for certain
situations for the 1m method that require an extra layer of logic
to allow for handling (for example) complex values of beta. Also
note that under certain circumstances, the real-domain version of
this macrokernel will be called for 1m (NOT the complex version)
as an optimization. In these cases, the corresponding real-domain
slots within the cntx_t's virtual gemm ukernel func_t will contain
pointers to the *native* gemm ukernel, thanks to logic in the
context initialization function for the induced method (defined
in bli_cntx_ref.c). */ \
PASTECH(ch,gemm_ukr_ft) \
gemm_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \
\
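
The comment in this hunk says the macrokernel always queries the virtual gemm ukernel, and that for real datatypes the virtual slot holds a pointer to the native ukernel. The sketch below illustrates one way such a table could be populated. It is not the logic in bli_cntx_ref.c; every identifier (`toy_cntx_t`, `toy_cntx_init_ind`, `toy_get_vir_ukr`, and so on) is hypothetical.

```c
#include <stddef.h>

/* Hypothetical datatype ids: two real, two complex. */
typedef enum { TOY_S, TOY_D, TOY_C, TOY_Z, TOY_NUM_DT } toy_dt_t;

typedef const char* (*toy_ukr_ft)( void );

static const char* toy_native_ukr( void )  { return "native";  }
static const char* toy_virtual_ukr( void ) { return "virtual"; }

/* One native slot and one virtual slot per datatype. */
typedef struct
{
    toy_ukr_ft nat[ TOY_NUM_DT ];
    toy_ukr_ft vir[ TOY_NUM_DT ];
} toy_cntx_t;

static int toy_is_complex( toy_dt_t dt )
{
    return dt == TOY_C || dt == TOY_Z;
}

/* Initialize a context for an induced method: complex slots receive the
   virtual ukernel, while real slots alias the native ukernel. */
void toy_cntx_init_ind( toy_cntx_t* cntx )
{
    for ( int dt = 0; dt < TOY_NUM_DT; ++dt )
    {
        cntx->nat[ dt ] = toy_native_ukr;
        cntx->vir[ dt ] = toy_is_complex( ( toy_dt_t )dt )
                          ? toy_virtual_ukr
                          : toy_native_ukr;
    }
}

/* The caller always asks for the virtual slot, whatever the datatype. */
toy_ukr_ft toy_get_vir_ukr( toy_dt_t dt, const toy_cntx_t* cntx )
{
    return cntx->vir[ dt ];
}
```

Because the real-domain virtual slots alias the native function, a caller can use a single query path for both domains without an extra branch.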


@@ -368,8 +368,6 @@ void PASTEMAC2(chc,che,varname) \
then accumulate it into C via the xpbys_mxn macro. */ \
/*if ( 1 )*/ \
{ \
/*bli_auxinfo_set_dt_on_output( dte, &aux );*/ \
\
/* Invoke the gemm micro-kernel. */ \
gemm_ukr \
( \
@@ -392,48 +390,6 @@ void PASTEMAC2(chc,che,varname) \
c11, rs_c, cs_c \
); \
} \
/*
else if ( m_cur == MR && n_cur == NR ) \
{ \
bli_auxinfo_set_dt_on_output( dtc, &aux ); \
\
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
( ctype_e* )beta_cast, \
( ctype_e* )c11, rs_c, cs_c, \
&aux, \
cntx \
); \
} \
else \
{ \
bli_auxinfo_set_dt_on_output( dte, &aux ); \
\
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
zero, \
ct, rs_ct, cs_ct, \
&aux, \
cntx \
); \
\
PASTEMAC3(che,chc,chc,xpbys_mxn) \
( \
m_cur, n_cur, \
ct, rs_ct, cs_ct, \
beta_cast, \
c11, rs_c, cs_c \
); \
} \
*/ \
} \
} \
\


@@ -62,9 +62,6 @@ GENPROT( gemm_ker_var1 )
GENPROT( gemm_ker_var2 )
// Headers for induced algorithms:
GENPROT( gemm4mb_ker_var2 ) // 4m1b
//
// Prototype BLAS-like interfaces with void pointer operands.
@@ -94,6 +91,3 @@ void PASTEMAC(ch,varname) \
INSERT_GENTPROT_BASIC0( gemm_ker_var2 )
// Headers for induced algorithms:
INSERT_GENTPROT_BASIC0( gemm4mb_ker_var2 ) // 4m1b


@@ -1,365 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2018 - 2019, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#define FUNCPTR_T gemm_fp
typedef void (*FUNCPTR_T)(
pack_t schema_a,
pack_t schema_b,
dim_t m,
dim_t n,
dim_t k,
void* alpha,
void* a, inc_t cs_a, inc_t is_a,
dim_t pd_a, inc_t ps_a,
void* b, inc_t rs_b, inc_t is_b,
dim_t pd_b, inc_t ps_b,
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
rntm_t* rntm,
thrinfo_t* thread
);
static FUNCPTR_T GENARRAY(ftypes,gemm4mb_ker_var2);
void bli_gemm4mb_ker_var2
(
obj_t* a,
obj_t* b,
obj_t* c,
cntx_t* cntx,
rntm_t* rntm,
cntl_t* cntl,
thrinfo_t* thread
)
{
num_t dt_exec = bli_obj_exec_dt( c );
pack_t schema_a = bli_obj_pack_schema( a );
pack_t schema_b = bli_obj_pack_schema( b );
dim_t m = bli_obj_length( c );
dim_t n = bli_obj_width( c );
dim_t k = bli_obj_width( a );
void* buf_a = bli_obj_buffer_at_off( a );
inc_t cs_a = bli_obj_col_stride( a );
inc_t is_a = bli_obj_imag_stride( a );
dim_t pd_a = bli_obj_panel_dim( a );
inc_t ps_a = bli_obj_panel_stride( a );
void* buf_b = bli_obj_buffer_at_off( b );
inc_t rs_b = bli_obj_row_stride( b );
inc_t is_b = bli_obj_imag_stride( b );
dim_t pd_b = bli_obj_panel_dim( b );
inc_t ps_b = bli_obj_panel_stride( b );
void* buf_c = bli_obj_buffer_at_off( c );
inc_t rs_c = bli_obj_row_stride( c );
inc_t cs_c = bli_obj_col_stride( c );
obj_t scalar_a;
obj_t scalar_b;
void* buf_alpha;
void* buf_beta;
FUNCPTR_T f;
// Detach and multiply the scalars attached to A and B.
bli_obj_scalar_detach( a, &scalar_a );
bli_obj_scalar_detach( b, &scalar_b );
bli_mulsc( &scalar_a, &scalar_b );
// Grab the addresses of the internal scalar buffers for the scalar
// merged above and the scalar attached to C.
buf_alpha = bli_obj_internal_scalar_buffer( &scalar_b );
buf_beta = bli_obj_internal_scalar_buffer( c );
// Index into the type combination array to extract the correct
// function pointer.
f = ftypes[dt_exec];
// Invoke the function.
f( schema_a,
schema_b,
m,
n,
k,
buf_alpha,
buf_a, cs_a, is_a,
pd_a, ps_a,
buf_b, rs_b, is_b,
pd_b, ps_b,
buf_beta,
buf_c, rs_c, cs_c,
cntx,
rntm,
thread );
}
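
The removed function above selects a type-specific implementation by indexing an array of function pointers with the execution datatype (`f = ftypes[dt_exec]`). The following is a small sketch of that dispatch pattern, using hypothetical names (`toy_num_t`, `toy_ftypes`, `toy_sum`) rather than the BLIS `GENARRAY` machinery.

```c
#include <stddef.h>

/* Hypothetical datatype ids used as array indices. */
typedef enum { TOY_FLOAT = 0, TOY_DOUBLE = 1, TOY_NUM_TYPES } toy_num_t;

/* All type-specific variants share one void-pointer signature. */
typedef double (*toy_sum_fp)( const void* x, size_t n );

static double toy_sum_f( const void* x, size_t n )
{
    const float* xf  = x;
    double       acc = 0.0;
    for ( size_t i = 0; i < n; ++i ) acc += xf[ i ];
    return acc;
}

static double toy_sum_d( const void* x, size_t n )
{
    const double* xd  = x;
    double        acc = 0.0;
    for ( size_t i = 0; i < n; ++i ) acc += xd[ i ];
    return acc;
}

/* The entry order must match the enum values above. */
static const toy_sum_fp toy_ftypes[ TOY_NUM_TYPES ] =
{
    toy_sum_f,
    toy_sum_d
};

/* Datatype-agnostic front end: look up the variant, then invoke it. */
double toy_sum( toy_num_t dt, const void* x, size_t n )
{
    toy_sum_fp f = toy_ftypes[ dt ];
    return f( x, n );
}
```

The front end stays free of per-type branches; supporting another datatype means adding one enum value and one table entry.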
#undef GENTFUNC
#define GENTFUNC( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
pack_t schema_a, \
pack_t schema_b, \
dim_t m, \
dim_t n, \
dim_t k, \
void* alpha, \
void* a, inc_t cs_a, inc_t is_a, \
dim_t pd_a, inc_t ps_a, \
void* b, inc_t rs_b, inc_t is_b, \
dim_t pd_b, inc_t ps_b, \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm, \
thrinfo_t* thread \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
\
/* Alias some constants to simpler names. */ \
const dim_t MR = pd_a; \
const dim_t NR = pd_b; \
/*const dim_t PACKMR = cs_a;*/ \
/*const dim_t PACKNR = rs_b;*/ \
\
/* Query the context for the micro-kernel address and cast it to its
function pointer type. */ \
PASTECH(ch,gemm_ukr_ft) \
gemm_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \
\
/* Temporary C buffer for edge cases. */ \
ctype ct[ BLIS_STACK_BUF_MAX_SIZE \
/ sizeof( ctype ) ] \
__attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \
const bool col_pref = bli_cntx_l3_vir_ukr_prefers_cols_dt( dt, BLIS_GEMM_UKR, cntx ); \
const inc_t rs_ct = ( col_pref ? 1 : NR ); \
const inc_t cs_ct = ( col_pref ? MR : 1 ); \
\
ctype* restrict zero = PASTEMAC(ch,0); \
ctype* restrict one = PASTEMAC(ch,1); \
ctype* restrict a_cast = a; \
ctype* restrict b_cast = b; \
ctype* restrict c_cast = c; \
ctype* restrict alpha_cast = alpha; \
ctype* restrict beta_cast = beta; \
ctype* restrict b1; \
ctype* restrict c1; \
\
dim_t m_iter, m_left; \
dim_t n_iter, n_left; \
dim_t i, j; \
dim_t ii; \
dim_t m_cur; \
dim_t n_cur; \
inc_t rstep_a; \
inc_t cstep_b; \
inc_t rstep_c, cstep_c; \
auxinfo_t aux; \
\
/*
Assumptions/assertions:
rs_a == 1
cs_a == PACKMR
pd_a == MR
ps_a == stride to next micro-panel of A
rs_b == PACKNR
cs_b == 1
pd_b == NR
ps_b == stride to next micro-panel of B
rs_c == (no assumptions)
cs_c == (no assumptions)
*/ \
\
/* If any dimension is zero, return immediately. */ \
if ( bli_zero_dim3( m, n, k ) ) return; \
\
/* Clear the temporary C buffer in case it has any infs or NaNs. */ \
PASTEMAC(ch,set0s_mxn)( MR, NR, \
ct, rs_ct, cs_ct ); \
\
/* Compute number of primary and leftover components of the m and n
dimensions. */ \
n_iter = n / NR; \
n_left = n % NR; \
\
m_iter = m / MR; \
m_left = m % MR; \
\
if ( n_left ) ++n_iter; \
if ( m_left ) ++m_iter; \
\
/* Determine some increments used to step through A, B, and C. */ \
rstep_a = ps_a; \
\
cstep_b = ps_b; \
\
rstep_c = rs_c * MR; \
cstep_c = cs_c * NR; \
\
/* Save the pack schemas of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_schema_a( schema_a, &aux ); \
bli_auxinfo_set_schema_b( schema_b, &aux ); \
\
/* Save the imaginary stride of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( is_a, &aux ); \
bli_auxinfo_set_is_b( is_b, &aux ); \
\
thrinfo_t* caucus = bli_thrinfo_sub_node( thread ); \
dim_t jr_num_threads = bli_thread_n_way( thread ); \
dim_t jr_thread_id = bli_thread_work_id( thread ); \
dim_t ir_num_threads = bli_thread_n_way( caucus ); \
dim_t ir_thread_id = bli_thread_work_id( caucus ); \
\
dim_t jr_inc = jr_num_threads; \
dim_t ir_inc = ir_num_threads; \
\
/* Loop over the n dimension (NR columns at a time). */ \
for ( j = jr_thread_id; j < n_iter; j += jr_num_threads ) \
{ \
ctype* restrict a1; \
ctype* restrict c11; \
ctype* restrict b2; \
\
b1 = b_cast + j * cstep_b; \
c1 = c_cast + j * cstep_c; \
\
n_cur = ( bli_is_not_edge_f( j, n_iter, n_left ) ? NR : n_left ); \
\
/* Initialize our next panel of B to be the current panel of B. */ \
b2 = b1; \
\
/* In the 4mb method, we execute the ir loop twice: once for b_r
and once for b_i. */ \
for ( ii = 0; ii < 2; ++ii ) \
{ \
ctype* restrict beta_use; \
\
if ( ii == 0 ) \
{ \
bli_auxinfo_set_schema_b( BLIS_PACKED_COL_PANELS_RO, &aux ); \
beta_use = beta_cast; \
} \
else \
{ \
bli_auxinfo_set_schema_b( BLIS_PACKED_COL_PANELS_IO, &aux ); \
beta_use = one; \
} \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = ir_thread_id; i < m_iter; i += ir_num_threads ) \
{ \
ctype* restrict a2; \
\
a1 = a_cast + i * rstep_a; \
c11 = c1 + i * rstep_c; \
\
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = bli_gemm_get_next_a_upanel( a1, rstep_a, ir_inc ); \
if ( bli_is_last_iter_rr( i, m_iter, ir_thread_id, ir_num_threads ) ) \
{ \
a2 = a_cast; \
b2 = bli_gemm_get_next_b_upanel( b1, cstep_b, jr_inc ); \
if ( bli_is_last_iter_rr( j, n_iter, jr_thread_id, jr_num_threads ) ) \
b2 = b_cast; \
} \
\
/* Save addresses of next panels of A and B to the auxinfo_t
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
{ \
/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3 (4m1b): c before", 8, 6, c11, rs_c, cs_c, "%4.1f", "" );*/ \
/* Invoke the gemm micro-kernel. */ \
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
beta_use, \
c11, rs_c, cs_c, \
&aux, \
cntx \
); \
/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3 (4m1b): c after", 8, 6, c11, rs_c, cs_c, "%4.1f", "" );*/ \
} \
else \
{ \
/* Invoke the gemm micro-kernel. */ \
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
zero, \
ct, rs_ct, cs_ct, \
&aux, \
cntx \
); \
\
/* Scale the bottom edge of C and add the result from above. */ \
PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \
ct, rs_ct, cs_ct, \
beta_use, \
c11, rs_c, cs_c ); \
} \
} \
} \
} \
/*printf( "gemm_ker_var3 (4m1b): returning\n" );*/ \
\
/*PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3: b1", k, NR, b1, NR, 1, "%4.1f", "" ); \
PASTEMAC(ch,fprintm)( stdout, "gemm_ker_var3: a1", MR, k, a1, 1, MR, "%4.1f", "" );*/ \
}
INSERT_GENTFUNC_BASIC0( gemm4mb_ker_var2 )
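
Note on the removed macrokernel above: it splits the m and n dimensions into MR- and NR-sized blocks, counts a trailing partial (edge) block as one extra iteration, and distributes iterations across threads round-robin. The following is a minimal standalone sketch of that partitioning logic only. The helper names (`block_count`, `block_size`, `blocks_for_thread`) are hypothetical and are not BLIS functions; the sketch does not reproduce the kernel itself.

```c
#include <assert.h>

/* Hypothetical helpers for illustration; these are not BLIS functions. */

/* Number of blocks of size blk needed to cover dim, counting a final
   partial (edge) block if dim is not a multiple of blk. */
static long block_count( long dim, long blk )
{
    assert( dim >= 0 && blk > 0 );
    long n_iter = dim / blk;
    if ( dim % blk ) ++n_iter;
    return n_iter;
}

/* Size of block i: every block is blk wide except a trailing edge block,
   which holds the leftover elements. */
static long block_size( long i, long dim, long blk )
{
    long n_iter = block_count( dim, blk );
    long n_left = dim % blk;
    assert( 0 <= i && i < n_iter );
    return ( i != n_iter - 1 || n_left == 0 ) ? blk : n_left;
}

/* Number of blocks visited by thread tid (of nt threads) when iterations
   are assigned round-robin: j = tid, tid + nt, tid + 2*nt, ... */
static long blocks_for_thread( long tid, long nt, long dim, long blk )
{
    long count = 0;
    for ( long j = tid; j < block_count( dim, blk ); j += nt )
        ++count;
    return count;
}
```

With n = 10 and NR = 4 this gives three blocks of widths 4, 4, and 2; with two threads, thread 0 takes blocks 0 and 2 and thread 1 takes block 1.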

@@ -1,363 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#define FUNCPTR_T gemm_fp
typedef void (*FUNCPTR_T)(
pack_t schema_a,
pack_t schema_b,
dim_t m,
dim_t n,
dim_t k,
void* alpha,
void* a, inc_t cs_a, inc_t is_a,
dim_t pd_a, inc_t ps_a,
void* b, inc_t rs_b, inc_t is_b,
dim_t pd_b, inc_t ps_b,
void* beta,
void* c, inc_t rs_c, inc_t cs_c,
cntx_t* cntx,
thrinfo_t* thread
);
static FUNCPTR_T GENARRAY(ftypes,gemm3m2_ker_var2);
void bli_gemm3m2_ker_var2
(
obj_t* a,
obj_t* b,
obj_t* c,
cntx_t* cntx,
cntl_t* cntl,
thrinfo_t* thread
)
{
num_t dt_exec = bli_obj_exec_dt( c );
pack_t schema_a = bli_obj_pack_schema( a );
pack_t schema_b = bli_obj_pack_schema( b );
dim_t m = bli_obj_length( c );
dim_t n = bli_obj_width( c );
dim_t k = bli_obj_width( a );
void* buf_a = bli_obj_buffer_at_off( a );
inc_t cs_a = bli_obj_col_stride( a );
inc_t is_a = bli_obj_imag_stride( a );
dim_t pd_a = bli_obj_panel_dim( a );
inc_t ps_a = bli_obj_panel_stride( a );
void* buf_b = bli_obj_buffer_at_off( b );
inc_t rs_b = bli_obj_row_stride( b );
inc_t is_b = bli_obj_imag_stride( b );
dim_t pd_b = bli_obj_panel_dim( b );
inc_t ps_b = bli_obj_panel_stride( b );
void* buf_c = bli_obj_buffer_at_off( c );
inc_t rs_c = bli_obj_row_stride( c );
inc_t cs_c = bli_obj_col_stride( c );
obj_t scalar_a;
obj_t scalar_b;
void* buf_alpha;
void* buf_beta;
FUNCPTR_T f;
// Detach and multiply the scalars attached to A and B.
bli_obj_scalar_detach( a, &scalar_a );
bli_obj_scalar_detach( b, &scalar_b );
bli_mulsc( &scalar_a, &scalar_b );
// Grab the addresses of the internal scalar buffers for the scalar
// merged above and the scalar attached to C.
buf_alpha = bli_obj_internal_scalar_buffer( &scalar_b );
buf_beta = bli_obj_internal_scalar_buffer( c );
// Index into the type combination array to extract the correct
// function pointer.
f = ftypes[dt_exec];
// Invoke the function.
f( schema_a,
schema_b,
m,
n,
k,
buf_alpha,
buf_a, cs_a, is_a,
pd_a, ps_a,
buf_b, rs_b, is_b,
pd_b, ps_b,
buf_beta,
buf_c, rs_c, cs_c,
cntx,
thread );
}
#undef GENTFUNC
#define GENTFUNC( ctype, ch, varname ) \
\
void PASTEMAC(ch,varname) \
( \
pack_t schema_a, \
pack_t schema_b, \
dim_t m, \
dim_t n, \
dim_t k, \
void* alpha, \
void* a, inc_t cs_a, inc_t is_a, \
dim_t pd_a, inc_t ps_a, \
void* b, inc_t rs_b, inc_t is_b, \
dim_t pd_b, inc_t ps_b, \
void* beta, \
void* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
thrinfo_t* thread \
) \
{ \
const num_t dt = PASTEMAC(ch,type); \
\
/* Alias some constants to simpler names. */ \
const dim_t MR = pd_a; \
const dim_t NR = pd_b; \
/*const dim_t PACKMR = cs_a;*/ \
/*const dim_t PACKNR = rs_b;*/ \
\
/* Query the context for the micro-kernel address and cast it to its
function pointer type. */ \
PASTECH(ch,gemm_ukr_ft) \
gemm_ukr = bli_cntx_get_l3_vir_ukr_dt( dt, BLIS_GEMM_UKR, cntx ); \
\
/* Temporary C buffer for edge cases. */ \
ctype ct[ BLIS_STACK_BUF_MAX_SIZE \
/ sizeof( ctype ) ] \
__attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \
const bool col_pref = bli_cntx_l3_vir_ukr_prefers_cols_dt( dt, BLIS_GEMM_UKR, cntx ); \
const inc_t rs_ct = ( col_pref ? 1 : NR ); \
const inc_t cs_ct = ( col_pref ? MR : 1 ); \
\
ctype* restrict zero = PASTEMAC(ch,0); \
ctype* restrict one = PASTEMAC(ch,1); \
ctype* restrict a_cast = a; \
ctype* restrict b_cast = b; \
ctype* restrict c_cast = c; \
ctype* restrict alpha_cast = alpha; \
ctype* restrict beta_cast = beta; \
ctype* restrict b1; \
ctype* restrict c1; \
\
dim_t m_iter, m_left; \
dim_t n_iter, n_left; \
dim_t i, j; \
dim_t ii; \
dim_t m_cur; \
dim_t n_cur; \
inc_t rstep_a; \
inc_t cstep_b; \
inc_t rstep_c, cstep_c; \
auxinfo_t aux; \
\
/*
Assumptions/assertions:
rs_a == 1
cs_a == PACKMR
pd_a == MR
ps_a == stride to next micro-panel of A
rs_b == PACKNR
cs_b == 1
pd_b == NR
ps_b == stride to next micro-panel of B
rs_c == (no assumptions)
cs_c == (no assumptions)
*/ \
\
/* If any dimension is zero, return immediately. */ \
if ( bli_zero_dim3( m, n, k ) ) return; \
\
/* Clear the temporary C buffer in case it has any infs or NaNs. */ \
PASTEMAC(ch,set0s_mxn)( MR, NR, \
ct, rs_ct, cs_ct ); \
\
/* Compute number of primary and leftover components of the m and n
dimensions. */ \
n_iter = n / NR; \
n_left = n % NR; \
\
m_iter = m / MR; \
m_left = m % MR; \
\
if ( n_left ) ++n_iter; \
if ( m_left ) ++m_iter; \
\
/* Determine some increments used to step through A, B, and C. */ \
rstep_a = ps_a; \
\
cstep_b = ps_b; \
\
rstep_c = rs_c * MR; \
cstep_c = cs_c * NR; \
\
/* Save the pack schemas of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_schema_a( schema_a, &aux ); \
bli_auxinfo_set_schema_b( schema_b, &aux ); \
\
/* Save the imaginary stride of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( is_a, &aux ); \
bli_auxinfo_set_is_b( is_b, &aux ); \
\
thrinfo_t* caucus = bli_thrinfo_sub_node( thread ); \
dim_t jr_num_threads = bli_thread_n_way( thread ); \
dim_t jr_thread_id = bli_thread_work_id( thread ); \
dim_t ir_num_threads = bli_thread_n_way( caucus ); \
dim_t ir_thread_id = bli_thread_work_id( caucus ); \
\
/* Loop over the n dimension (NR columns at a time). */ \
for ( j = jr_thread_id; j < n_iter; j += jr_num_threads ) \
{ \
ctype* restrict a1; \
ctype* restrict c11; \
ctype* restrict b2; \
\
b1 = b_cast + j * cstep_b; \
c1 = c_cast + j * cstep_c; \
\
n_cur = ( bli_is_not_edge_f( j, n_iter, n_left ) ? NR : n_left ); \
\
/* Initialize our next panel of B to be the current panel of B. */ \
b2 = b1; \
\
/* In the 3m2 method, we execute the ir loop thrice: once for
a_r[ir] * b_r, once for a_i[ir] * b_i, and once for
a_{r+i}[ir] * b_{r+i}. */ \
for ( ii = 0; ii < 3; ++ii ) \
{ \
ctype* restrict beta_use; \
\
if ( ii == 0 ) \
{ \
bli_auxinfo_set_schema_a( BLIS_PACKED_ROW_PANELS_RO, &aux ); \
bli_auxinfo_set_schema_b( BLIS_PACKED_COL_PANELS_RO, &aux ); \
beta_use = beta_cast; \
} \
else if ( ii == 1 ) \
{ \
bli_auxinfo_set_schema_a( BLIS_PACKED_ROW_PANELS_IO, &aux ); \
bli_auxinfo_set_schema_b( BLIS_PACKED_COL_PANELS_IO, &aux ); \
beta_use = one; \
} \
else \
{ \
bli_auxinfo_set_schema_a( BLIS_PACKED_ROW_PANELS_RPI, &aux ); \
bli_auxinfo_set_schema_b( BLIS_PACKED_COL_PANELS_RPI, &aux ); \
beta_use = one; \
} \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = ir_thread_id; i < m_iter; i += ir_num_threads ) \
{ \
ctype* restrict a2; \
\
a1 = a_cast + i * rstep_a; \
c11 = c1 + i * rstep_c; \
\
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = bli_gemm_get_next_a_upanel( caucus, a1, rstep_a ); \
if ( bli_is_last_iter( i, m_iter, ir_thread_id, ir_num_threads ) ) \
{ \
a2 = a_cast; \
b2 = bli_gemm_get_next_b_upanel( thread, b1, cstep_b ); \
if ( bli_is_last_iter( j, n_iter, jr_thread_id, jr_num_threads ) ) \
b2 = b_cast; \
} \
\
/* Save addresses of next panels of A and B to the auxinfo_t
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
{ \
/* Invoke the gemm micro-kernel. */ \
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
beta_use, \
c11, rs_c, cs_c, \
&aux, \
cntx \
); \
} \
else \
{ \
/* Invoke the gemm micro-kernel. */ \
gemm_ukr \
( \
k, \
alpha_cast, \
a1, \
b1, \
zero, \
ct, rs_ct, cs_ct, \
&aux, \
cntx \
); \
\
/* Scale the bottom edge of C and add the result from above. */ \
PASTEMAC(ch,xpbys_mxn)( m_cur, n_cur, \
ct, rs_ct, cs_ct, \
beta_use, \
c11, rs_c, cs_c ); \
} \
} \
} \
} \
\
/*PASTEMAC(ch,fprintm)( stdout, "gemm3m2_ker_var2: b1", k, NR, b1, NR, 1, "%4.1f", "" ); \
PASTEMAC(ch,fprintm)( stdout, "gemm3m2_ker_var2: a1", MR, k, a1, 1, MR, "%4.1f", "" );*/ \
}
INSERT_GENTFUNC_BASIC0( gemm3m2_ker_var2 )
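
Note on the removed 3m2 macrokernel above: its three passes correspond to the three real products a_r * b_r, a_i * b_i, and a_{r+i} * b_{r+i}, with beta applied only on the first pass and one used afterwards. The scalar sketch below illustrates the arithmetic identity behind that scheme. It is an illustration under simplifying assumptions (a real-valued beta, a hypothetical `cplx_t` type and `mul3m` function), not the BLIS implementation, which operates on packed micro-panels.

```c
/* Illustrative only: a scalar version of the "3m" idea, not BLIS code. */
typedef struct { double r, i; } cplx_t;   /* hypothetical complex type */

/* Compute c := beta*c + a*b using three real multiplications for a*b.
   Assumes beta is real to keep the sketch short. */
static cplx_t mul3m( cplx_t a, cplx_t b, double beta, cplx_t c )
{
    /* The three real products, one per pass of the ii loop. */
    double p_rr  = a.r * b.r;                      /* real-only      */
    double p_ii  = a.i * b.i;                      /* imag-only      */
    double p_rpi = ( a.r + a.i ) * ( b.r + b.i );  /* real plus imag */

    /* beta is applied once; later contributions accumulate with one. */
    c.r = beta * c.r;
    c.i = beta * c.i;

    /* Re(a*b) = a_r*b_r - a_i*b_i
       Im(a*b) = (a_r + a_i)*(b_r + b_i) - a_r*b_r - a_i*b_i */
    c.r += p_rr - p_ii;
    c.i += p_rpi - p_rr - p_ii;
    return c;
}
```

For a = 1 + 2i, b = 3 + 4i, beta = 2, and c = 1 + 1i, the result is -3 + 12i, matching the conventional four-multiplication product -5 + 10i added to 2c.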

@@ -1,142 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
void bli_gemm3m3_packa
(
obj_t* a,
obj_t* b,
obj_t* c,
cntx_t* cntx,
cntl_t* cntl,
thrinfo_t* thread
)
{
obj_t a_pack;
// Make a copy of the context for each stage.
cntx_t cntx_ro = *cntx;
cntx_t cntx_io = *cntx;
cntx_t cntx_rpi = *cntx;
// -----------------------------------------------------
// Initialize the context for the real-only stage.
bli_gemm3m3_cntx_stage( 0, &cntx_ro );
// Pack the real-only part of matrix A.
bli_l3_packm
(
a,
&a_pack,
&cntx_ro,
cntl,
thread
);
// Proceed with execution using packed matrix A.
bli_gemm_int
(
&BLIS_ONE,
&a_pack,
b,
&BLIS_ONE,
c,
cntx,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);
// Only apply beta within the first of three subproblems.
bli_obj_scalar_reset( c );
// -----------------------------------------------------
// Initialize the context for the imag-only stage.
bli_gemm3m3_cntx_stage( 1, &cntx_io );
// Pack the imag-only part of matrix A.
bli_l3_packm
(
a,
&a_pack,
&cntx_io,
cntl,
thread
);
// Proceed with execution using packed matrix A.
bli_gemm_int
(
&BLIS_ONE,
&a_pack,
b,
&BLIS_ONE,
c,
cntx,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);
// -----------------------------------------------------
// Initialize the context for the real+imag stage.
bli_gemm3m3_cntx_stage( 2, &cntx_rpi );
// Pack the real+imag part of matrix A.
bli_l3_packm
(
a,
&a_pack,
&cntx_rpi,
cntl,
thread
);
// Proceed with execution using packed matrix A.
bli_gemm_int
(
&BLIS_ONE,
&a_pack,
b,
&BLIS_ONE,
c,
cntx,
bli_cntl_sub_node( cntl ),
bli_thrinfo_sub_node( thread )
);
}

@@ -53,10 +53,6 @@ void bli_gemmt_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_gemmt_check( alpha, a, b, beta, c, cntx );
// If C has a zero dimension, return early.
if ( bli_obj_has_zero_dim( c ) )
{

@@ -53,10 +53,6 @@ void bli_hemm_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_hemm_check( side, alpha, a, b, beta, c, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -55,10 +55,6 @@ void bli_her2k_front
obj_t b_local;
obj_t ah_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_her2k_check( alpha, a, b, beta, c, cntx );
// If alpha is zero, scale by beta, zero the imaginary components of
// the diagonal elements, and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )

@@ -51,10 +51,6 @@ void bli_herk_front
obj_t ah_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_herk_check( alpha, a, beta, c, cntx );
// If alpha is zero, scale by beta, zero the imaginary components of
// the diagonal elements, and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )

@@ -279,9 +279,6 @@ void PASTEMAC(ch,varname) \
/* Save the imaginary stride of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( is_a, &aux ); \
bli_auxinfo_set_is_b( is_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* The 'thread' argument points to the thrinfo_t node for the 2nd (jr)
loop around the microkernel. Here we query the thrinfo_t node for the

@@ -281,9 +281,6 @@ void PASTEMAC(ch,varname) \
/* Save the imaginary stride of A and B to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( is_a, &aux ); \
bli_auxinfo_set_is_b( is_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* The 'thread' argument points to the thrinfo_t node for the 2nd (jr)
loop around the microkernel. Here we query the thrinfo_t node for the

@@ -53,10 +53,6 @@ void bli_symm_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_symm_check( side, alpha, a, b, beta, c, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -54,10 +54,6 @@ void bli_syr2k_front
obj_t b_local;
obj_t at_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_syr2k_check( alpha, a, b, beta, c, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -69,10 +69,6 @@ void bli_syrk_front
#endif
#endif
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_syrk_check( alpha, a, beta, c, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -52,10 +52,6 @@ void bli_trmm_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_trmm_check( side, alpha, a, b, &BLIS_ZERO, b, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -203,9 +203,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_a_num; \
inc_t ss_a_den; \
inc_t ps_a_cur; \
inc_t is_a_cur; \
auxinfo_t aux; \
@@ -243,30 +240,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = k; \
\
/* Compute indexing scaling factor for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_a ) || \
bli_is_3mi_packed( schema_a ) || \
bli_is_rih_packed( schema_a ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. And if we are packing real-only, imag-only, or
summed-only, we need to scale the computed panel sizes by 1/2
to compensate for the fact that the pointer arithmetic occurs
in terms of complex elements rather than real elements. */ \
if ( bli_is_3mi_packed( schema_a ) ) { ss_a_num = 3; ss_a_den = 2; } \
else if ( bli_is_rih_packed( schema_a ) ) { ss_a_num = 1; ss_a_den = 2; } \
else { ss_a_num = 1; ss_a_den = 1; } \
\
/* If there is a zero region above where the diagonal of A intersects the
left edge of the block, adjust the pointer to C and treat this case as
@@ -317,9 +290,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of B to the auxinfo_t object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* The 'thread' argument points to the thrinfo_t node for the 2nd (jr)
loop around the microkernel. Here we query the thrinfo_t node for the
@@ -387,12 +357,12 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_a_cur = k_a1011 * PACKMR; \
is_a_cur += ( bli_is_odd( is_a_cur ) ? 1 : 0 ); \
ps_a_cur = ( is_a_cur * ss_a_num ) / ss_a_den; \
ps_a_cur = is_a_cur; \
\
/* NOTE: ir loop parallelism disabled for now. */ \
/*if ( bli_trmm_my_iter( i, ir_thread ) ) {*/ \
\
b1_i = b1 + ( off_a1011 * PACKNR ) / off_scl; \
b1_i = b1 + off_a1011 * PACKNR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \
@@ -408,10 +378,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( is_a_cur, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
@@ -479,10 +445,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
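
Note on the hunks above: the removed `off_scl` factor existed because the 4m/3m pack formats store real values in sub-panels that are still addressed through complex-typed pointers, so an offset counted in real elements had to be halved before being added to the pointer. With only native and 1m formats left, the offset is applied directly. The sketch below illustrates that unit conversion in isolation. The type `cplx_t` and the helpers `advance_by_reals` and `round_up_even` are hypothetical names for illustration, not BLIS APIs.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { double r, i; } cplx_t;   /* hypothetical complex type */

/* Advance a complex-typed pointer by n_real real-valued elements.
   One complex element spans two reals, so the offset is halved.
   Assumes n_real is even so that the division is exact. */
static cplx_t* advance_by_reals( cplx_t* p, size_t n_real )
{
    assert( n_real % 2 == 0 );
    return p + n_real / 2;
}

/* Round a stride up to the next even value; this mirrors the adjustment
   applied to is_a_cur in the hunk above. */
static size_t round_up_even( size_t s )
{
    return s + ( s % 2 );
}
```

In this sketch, advancing by six reals moves the pointer by three complex elements, which is the same number of bytes as six doubles.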

@@ -203,9 +203,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_a_num; \
inc_t ss_a_den; \
inc_t ps_a_cur; \
inc_t is_a_cur; \
auxinfo_t aux; \
@@ -243,30 +240,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = k; \
\
/* Compute indexing scaling factor for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_a ) || \
bli_is_3mi_packed( schema_a ) || \
bli_is_rih_packed( schema_a ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. And if we are packing real-only, imag-only, or
summed-only, we need to scale the computed panel sizes by 1/2
to compensate for the fact that the pointer arithmetic occurs
in terms of complex elements rather than real elements. */ \
if ( bli_is_3mi_packed( schema_a ) ) { ss_a_num = 3; ss_a_den = 2; } \
else if ( bli_is_rih_packed( schema_a ) ) { ss_a_num = 1; ss_a_den = 2; } \
else { ss_a_num = 1; ss_a_den = 1; } \
\
/* If there is a zero region to the left of where the diagonal of A
intersects the top edge of the block, adjust the pointer to B and
@@ -278,7 +251,7 @@ void PASTEMAC(ch,varname) \
i = diagoffa; \
k = k - i; \
diagoffa = 0; \
b_cast = b_cast + ( i * PACKNR ) / off_scl; \
b_cast = b_cast + i * PACKNR; \
} \
\
/* If there is a zero region below where the diagonal of A intersects the
@@ -324,9 +297,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of B to the auxinfo_t object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* The 'thread' argument points to the thrinfo_t node for the 2nd (jr)
loop around the microkernel. Here we query the thrinfo_t node for the
@@ -394,12 +364,12 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_a_cur = k_a1112 * PACKMR; \
is_a_cur += ( bli_is_odd( is_a_cur ) ? 1 : 0 ); \
ps_a_cur = ( is_a_cur * ss_a_num ) / ss_a_den; \
ps_a_cur = is_a_cur; \
\
/* NOTE: ir loop parallelism disabled for now. */ \
/*if ( bli_trmm_my_iter( i, ir_thread ) ) {*/ \
\
b1_i = b1 + ( off_a1112 * PACKNR ) / off_scl; \
b1_i = b1 + off_a1112 * PACKNR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \
@@ -415,10 +385,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( is_a_cur, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
@@ -486,10 +452,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \

@@ -203,9 +203,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_b_num; \
inc_t ss_b_den; \
inc_t ps_b_cur; \
inc_t is_b_cur; \
auxinfo_t aux; \
@@ -243,30 +240,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = k; \
\
/* Compute indexing scaling factor for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_b ) || \
bli_is_3mi_packed( schema_b ) || \
bli_is_rih_packed( schema_b ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. And if we are packing real-only, imag-only, or
summed-only, we need to scale the computed panel sizes by 1/2
to compensate for the fact that the pointer arithmetic occurs
in terms of complex elements rather than real elements. */ \
if ( bli_is_3mi_packed( schema_b ) ) { ss_b_num = 3; ss_b_den = 2; } \
else if ( bli_is_rih_packed( schema_b ) ) { ss_b_num = 1; ss_b_den = 2; } \
else { ss_b_num = 1; ss_b_den = 1; } \
\
/* If there is a zero region above where the diagonal of B intersects
the left edge of the panel, adjust the pointer to A and treat this
@@ -278,7 +251,7 @@ void PASTEMAC(ch,varname) \
j = -diagoffb; \
k = k - j; \
diagoffb = 0; \
a_cast = a_cast + ( j * PACKMR ) / off_scl; \
a_cast = a_cast + j * PACKMR; \
} \
\
/* If there is a zero region to the right of where the diagonal
@@ -324,9 +297,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of A to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
thrinfo_t* caucus = bli_thrinfo_sub_node( thread ); \
\
@@ -387,10 +357,6 @@ void PASTEMAC(ch,varname) \
b2 = b1; \
\
{ \
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = ir_start; i < ir_end; i += ir_inc ) \
{ \
@@ -504,13 +470,9 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_b_cur = k_b1121 * PACKNR; \
is_b_cur += ( bli_is_odd( is_b_cur ) ? 1 : 0 ); \
ps_b_cur = ( is_b_cur * ss_b_num ) / ss_b_den; \
ps_b_cur = is_b_cur; \
\
if ( bli_trmm_my_iter_rr( j, thread ) ) { \
\
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object. */ \
bli_auxinfo_set_is_b( is_b_cur, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
@@ -522,7 +484,7 @@ void PASTEMAC(ch,varname) \
\
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
a1_i = a1 + ( off_b1121 * PACKMR ) / off_scl; \
a1_i = a1 + off_b1121 * PACKMR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \

@@ -203,9 +203,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_b_num; \
inc_t ss_b_den; \
inc_t ps_b_cur; \
inc_t is_b_cur; \
auxinfo_t aux; \
@@ -243,30 +240,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = k; \
\
/* Compute indexing scaling factor for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_b ) || \
bli_is_3mi_packed( schema_b ) || \
bli_is_rih_packed( schema_b ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. And if we are packing real-only, imag-only, or
summed-only, we need to scale the computed panel sizes by 1/2
to compensate for the fact that the pointer arithmetic occurs
in terms of complex elements rather than real elements. */ \
if ( bli_is_3mi_packed( schema_b ) ) { ss_b_num = 3; ss_b_den = 2; } \
else if ( bli_is_rih_packed( schema_b ) ) { ss_b_num = 1; ss_b_den = 2; } \
else { ss_b_num = 1; ss_b_den = 1; } \
\
/* If there is a zero region to the left of where the diagonal of B
intersects the top edge of the panel, adjust the pointer to C and
@@ -325,9 +298,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of A to the auxinfo_t object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* The 'thread' argument points to the thrinfo_t node for the 2nd (jr)
loop around the microkernel. Here we query the thrinfo_t node for the
@@ -409,13 +379,9 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_b_cur = k_b0111 * PACKNR; \
is_b_cur += ( bli_is_odd( is_b_cur ) ? 1 : 0 ); \
ps_b_cur = ( is_b_cur * ss_b_num ) / ss_b_den; \
ps_b_cur = is_b_cur; \
\
if ( bli_trmm_my_iter_rr( j, thread ) ) { \
\
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object. */ \
bli_auxinfo_set_is_b( is_b_cur, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
@@ -427,7 +393,7 @@ void PASTEMAC(ch,varname) \
\
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
a1_i = a1 + ( off_b0111 * PACKMR ) / off_scl; \
a1_i = a1 + off_b0111 * PACKMR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \
@@ -542,10 +508,6 @@ void PASTEMAC(ch,varname) \
This allows the current macro-kernel to work for both trmm
and trmm3. */ \
{ \
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = ir_start; i < ir_end; i += ir_inc ) \
{ \

@@ -53,10 +53,6 @@ void bli_trmm3_front
obj_t b_local;
obj_t c_local;
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_trmm_check( side, alpha, a, b, beta, c, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -59,10 +59,6 @@ void bli_trsm_front
#endif
#endif
// Check parameters.
if ( bli_error_checking_is_enabled() )
bli_trsm_check( side, alpha, a, b, &BLIS_ZERO, b, cntx );
// If alpha is zero, scale by beta and return.
if ( bli_obj_equals( alpha, &BLIS_ZERO ) )
{

@@ -209,9 +209,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_a_num; \
inc_t ss_a_den; \
inc_t ps_a_cur; \
inc_t is_a_cur; \
auxinfo_t aux; \
@@ -249,29 +246,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = ( k % MR != 0 ? k + MR - ( k % MR ) : k ); \
\
/* Compute indexing scaling factor for for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_a ) || \
bli_is_3mi_packed( schema_a ) || \
bli_is_rih_packed( schema_a ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. Note that real-only, imag-only, and summed-only
packing formats are not applicable here since trsm is a two-
operand operation only (unlike trmm, which is capable of three-
operand). */ \
if ( bli_is_3mi_packed( schema_a ) ) { ss_a_num = 3; ss_a_den = 2; } \
else { ss_a_num = 1; ss_a_den = 1; } \
\
/* If there is a zero region above where the diagonal of A intersects the
left edge of the block, adjust the pointer to C and treat this case as
@@ -339,9 +313,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of B to the auxinfo_t object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* We don't bother querying the thrinfo_t node for the 1st loop because
we can't parallelize that loop in trsm due to the inter-iteration
@@ -411,18 +382,18 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_a_cur = k_a1011 * PACKMR; \
is_a_cur += ( bli_is_odd( is_a_cur ) ? 1 : 0 ); \
ps_a_cur = ( is_a_cur * ss_a_num ) / ss_a_den; \
ps_a_cur = is_a_cur; \
\
/* Compute the addresses of the panel A10 and the triangular
block A11. */ \
a10 = a1; \
/* a11 = a1 + ( k_a10 * PACKMR ) / off_scl; */ \
a11 = bli_ptr_inc_by_frac( a1, sizeof( ctype ), k_a10 * PACKMR, off_scl ); \
a11 = a1 + k_a10 * PACKMR; \
/*a11 = bli_ptr_inc_by_frac( a1, sizeof( ctype ), k_a10 * PACKMR, 1 );*/ \
\
/* Compute the addresses of the panel B01 and the block
B11. */ \
b01 = b1 + ( off_a10 * PACKNR ) / off_scl; \
b11 = b1 + ( off_a11 * PACKNR ) / off_scl; \
b01 = b1 + off_a10 * PACKNR; \
b11 = b1 + off_a11 * PACKNR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1 + ps_a_cur; \
@@ -438,10 +409,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( is_a_cur, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
@@ -502,10 +469,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
@@ -553,44 +516,11 @@ void PASTEMAC(ch,varname) \
} \
\
/*
if ( bli_is_4mi_packed( schema_a ) ){ \
PASTEMAC(d,fprintm)( stdout, "trsm4m1_ll_ker_var2: b_r before", k, n, \
( double* )b, rs_b, 1, "%4.1f", "" ); \
PASTEMAC(d,fprintm)( stdout, "trsm4m1_ll_ker_var2: b_i before", k, n, \
( double* )b+72, rs_b, 1, "%4.1f", "" ); \
}else{ \
PASTEMAC(d,fprintm)( stdout, "trsmnat_ll_ker_var2: b_r before", k, n, \
( double* )b, 2*rs_b, 2, "%4.1f", "" ); \
PASTEMAC(d,fprintm)( stdout, "trsmnat_ll_ker_var2: b_i before", k, n, \
( double* )b+1, 2*rs_b, 2, "%4.1f", "" ); \
} \
*/ \
\
/*
PASTEMAC(d,fprintm)( stdout, "trsm_ll_ker_var2: a11p_r computed", MR, MR, \
( double* )a11, 1, PACKMR, "%4.1f", "" ); \
*/ \
\
/*
if ( bli_is_4mi_packed( schema_a ) ){ \
PASTEMAC(d,fprintm)( stdout, "trsm4m1_ll_ker_var2: b_r after", k, n, \
( double* )b, rs_b, 1, "%4.1f", "" ); \
PASTEMAC(d,fprintm)( stdout, "trsm4m1_ll_ker_var2: b_i after", k, n, \
( double* )b+72, rs_b, 1, "%4.1f", "" ); \
}else{ \
PASTEMAC(d,fprintm)( stdout, "trsmnat_ll_ker_var2: b_r after", k, n, \
( double* )b, 2*rs_b, 2, "%4.1f", "" ); \
PASTEMAC(d,fprintm)( stdout, "trsmnat_ll_ker_var2: b_i after", k, n, \
( double* )b+1, 2*rs_b, 2, "%4.1f", "" ); \
} \
PASTEMAC(d,fprintm)( stdout, "trsm_ll_ker_var2: b_r", m, n, \
( double* )c, 1, cs_c, "%4.1f", "" ); \
PASTEMAC(d,fprintm)( stdout, "trsm_ll_ker_var2: b_i", m, n, \
( double* )c + 8*9, 1, cs_c, "%4.1f", "" ); \
*/ \
\
/*
PASTEMAC(ch,fprintm)( stdout, "trsm_ll_ker_var2: a1 (diag)", MR, k_a1011, a1, 1, MR, "%5.2f", "" ); \
PASTEMAC(ch,fprintm)( stdout, "trsm_ll_ker_var2: a11 (diag)", MR, MR, a11, 1, MR, "%5.2f", "" ); \
PASTEMAC(ch,fprintm)( stdout, "trsm_ll_ker_var2: b1 (diag)", k_a1011, NR, bp_i, NR, 1, "%5.2f", "" ); \
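
Note on the panel-stride change in the hunks above: a minimal sketch, assuming the old and new computations of ps_a_cur behave as the removed and added lines suggest. The names panel_stride_old_sketch, panel_stride_new_sketch, and dim_sketch_t are hypothetical stand-ins and are not part of BLIS. The sketch shows that once the 3/2 scaling for interleaved 3m is gone, the panel stride is simply the imaginary stride rounded up to an even value.

```c
#include <assert.h>

/* Hypothetical stand-in for BLIS's dim_t/inc_t integer types. */
typedef long dim_sketch_t;

/* Old form: compute k * PACKMR, round up to the next even value, then
   scale by ss_num / ss_den (3/2 for interleaved 3m, otherwise 1/1). */
static dim_sketch_t panel_stride_old_sketch( dim_sketch_t k_panel,
                                             dim_sketch_t packmr,
                                             int          is_3mi )
{
    dim_sketch_t is_cur = k_panel * packmr;
    dim_sketch_t ss_num = ( is_3mi ? 3 : 1 );
    dim_sketch_t ss_den = ( is_3mi ? 2 : 1 );

    is_cur += ( is_cur % 2 != 0 ? 1 : 0 );

    assert( ss_den != 0 );
    return ( is_cur * ss_num ) / ss_den;
}

/* New form: the same even-rounded value, with no further scaling. */
static dim_sketch_t panel_stride_new_sketch( dim_sketch_t k_panel,
                                             dim_sketch_t packmr )
{
    dim_sketch_t is_cur = k_panel * packmr;

    is_cur += ( is_cur % 2 != 0 ? 1 : 0 );

    return is_cur;
}
```

For any schema other than interleaved 3m, the two functions return the same value, which is consistent with the commit treating this as a simplification rather than a behavior change for the remaining code paths.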

@@ -210,9 +210,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_a_num; \
inc_t ss_a_den; \
inc_t ps_a_cur; \
inc_t is_a_cur; \
auxinfo_t aux; \
@@ -250,29 +247,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = ( k % MR != 0 ? k + MR - ( k % MR ) : k ); \
\
/* Compute indexing scaling factor for for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_a ) || \
bli_is_3mi_packed( schema_a ) || \
bli_is_rih_packed( schema_a ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. Note that real-only, imag-only, and summed-only
packing formats are not applicable here since trsm is a two-
operand operation only (unlike trmm, which is capable of three-
operand). */ \
if ( bli_is_3mi_packed( schema_a ) ) { ss_a_num = 3; ss_a_den = 2; } \
else { ss_a_num = 1; ss_a_den = 1; } \
\
/* If there is a zero region to the left of where the diagonal of A
intersects the top edge of the block, adjust the pointer to B and
@@ -284,7 +258,7 @@ void PASTEMAC(ch,varname) \
i = diagoffa; \
k = k - i; \
diagoffa = 0; \
b_cast = b_cast + ( i * PACKNR ) / off_scl; \
b_cast = b_cast + i * PACKNR; \
} \
\
/* If there is a zero region below where the diagonal of A intersects the
@@ -347,9 +321,6 @@ void PASTEMAC(ch,varname) \
\
/* Save the imaginary stride of B to the auxinfo_t object. */ \
bli_auxinfo_set_is_b( istep_b, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
/* We don't bother querying the thrinfo_t node for the 1st loop because
we can't parallelize that loop in trsm due to the inter-iteration
@@ -421,18 +392,18 @@ void PASTEMAC(ch,varname) \
intersecting micro-panel. */ \
is_a_cur = k_a1112 * PACKMR; \
is_a_cur += ( bli_is_odd( is_a_cur ) ? 1 : 0 ); \
ps_a_cur = ( is_a_cur * ss_a_num ) / ss_a_den; \
ps_a_cur = is_a_cur; \
\
/* Compute the addresses of the triangular block A11 and the
panel A12. */ \
a11 = a1; \
/* a12 = a1 + ( k_a11 * PACKMR ) / off_scl; */ \
a12 = bli_ptr_inc_by_frac( a1, sizeof( ctype ), k_a11 * PACKMR, off_scl ); \
a12 = a1 + k_a11 * PACKMR; \
/*a12 = bli_ptr_inc_by_frac( a1, sizeof( ctype ), k_a11 * PACKMR, 1 );*/ \
\
/* Compute the addresses of the panel B01 and the block
B11. */ \
b11 = b1 + ( off_a11 * PACKNR ) / off_scl; \
b21 = b1 + ( off_a12 * PACKNR ) / off_scl; \
b11 = b1 + off_a11 * PACKNR; \
b21 = b1 + off_a12 * PACKNR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1 + ps_a_cur; \
@@ -448,10 +419,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( is_a_cur, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \
@@ -512,10 +479,6 @@ void PASTEMAC(ch,varname) \
object. */ \
bli_auxinfo_set_next_a( a2, &aux ); \
bli_auxinfo_set_next_b( b2, &aux ); \
\
/* Save the 4m1/3m1 imaginary stride of A to the auxinfo_t
object. */ \
bli_auxinfo_set_is_a( istep_a, &aux ); \
\
/* Handle interior and edge cases separately. */ \
if ( m_cur == MR && n_cur == NR ) \

@@ -215,9 +215,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_b_num; \
inc_t ss_b_den; \
inc_t ps_b_cur; \
inc_t is_b_cur; \
auxinfo_t aux; \
@@ -263,29 +260,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = ( k % NR != 0 ? k + NR - ( k % NR ) : k ); \
\
/* Compute indexing scaling factor for for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_b ) || \
bli_is_3mi_packed( schema_b ) || \
bli_is_rih_packed( schema_b ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. Note that real-only, imag-only, and summed-only
packing formats are not applicable here since trsm is a two-
operand operation only (unlike trmm, which is capable of three-
operand). */ \
if ( bli_is_3mi_packed( schema_b ) ) { ss_b_num = 3; ss_b_den = 2; } \
else { ss_b_num = 1; ss_b_den = 1; } \
\
/* If there is a zero region above where the diagonal of B intersects
the left edge of the panel, adjust the pointer to A and treat this
@@ -297,7 +271,7 @@ void PASTEMAC(ch,varname) \
j = -diagoffb; \
k = k - j; \
diagoffb = 0; \
a_cast = a_cast + ( j * PACKMR ) / off_scl; \
a_cast = a_cast + j * PACKMR; \
} \
\
/* If there is a zero region to the right of where the diagonal
@@ -369,9 +343,6 @@ void PASTEMAC(ch,varname) \
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_b( istep_a, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
b1 = b_cast; \
c1 = c_cast; \
@@ -413,20 +384,14 @@ void PASTEMAC(ch,varname) \
\
/* Compute the addresses of the triangular block B11 and the
panel B21. */ \
b11 = b1; \
/* b21 = b1 + ( k_b11 * PACKNR ) / off_scl; */ \
b21 = bli_ptr_inc_by_frac( b1, sizeof( ctype ), k_b11 * PACKNR, off_scl ); \
b11 = b1; \
b21 = b1 + k_b11 * PACKNR; \
/*b21 = bli_ptr_inc_by_frac( b1, sizeof( ctype ), k_b11 * PACKNR, 1 );*/ \
\
/* Compute the panel stride for the current micro-panel. */ \
is_b_cur = k_b1121 * PACKNR; \
is_b_cur += ( bli_is_odd( is_b_cur ) ? 1 : 0 ); \
ps_b_cur = ( is_b_cur * ss_b_num ) / ss_b_den; \
\
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object.
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_a( is_b_cur, &aux ); \
ps_b_cur = is_b_cur; \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
@@ -440,8 +405,8 @@ void PASTEMAC(ch,varname) \
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
/* Compute the addresses of the A11 block and A12 panel. */ \
a11 = a1 + ( off_b11 * PACKMR ) / off_scl; \
a12 = a1 + ( off_b21 * PACKMR ) / off_scl; \
a11 = a1 + off_b11 * PACKMR; \
a12 = a1 + off_b21 * PACKMR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \
@@ -508,12 +473,6 @@ void PASTEMAC(ch,varname) \
} \
else if ( bli_is_strictly_below_diag_n( diagoffb_j, k, NR ) ) \
{ \
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object.
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_a( istep_b, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
{ \

@@ -214,9 +214,6 @@ void PASTEMAC(ch,varname) \
inc_t rstep_c, cstep_c; \
inc_t istep_a; \
inc_t istep_b; \
inc_t off_scl; \
inc_t ss_b_num; \
inc_t ss_b_den; \
inc_t ps_b_cur; \
inc_t is_b_cur; \
auxinfo_t aux; \
@@ -262,29 +259,6 @@ void PASTEMAC(ch,varname) \
matrix), which is used by 4m1/3m1 implementations, we need
this unreduced value of k. */ \
k_full = ( k % NR != 0 ? k + NR - ( k % NR ) : k ); \
\
/* Compute indexing scaling factor for for 4m or 3m. This is
needed because one of the packing register blocksizes (PACKMR
or PACKNR) is used to index into the micro-panels of the non-
triangular matrix when computing with a diagonal-intersecting
micro-panel of the triangular matrix. In the case of 4m or 3m,
real values are stored in both sub-panels, and so the indexing
needs to occur in units of real values. The value computed
here is divided into the complex pointer offset to cause the
pointer to be advanced by the correct value. */ \
if ( bli_is_4mi_packed( schema_b ) || \
bli_is_3mi_packed( schema_b ) || \
bli_is_rih_packed( schema_b ) ) off_scl = 2; \
else off_scl = 1; \
\
/* Compute the storage stride scaling. Usually this is just 1.
However, in the case of interleaved 3m, we need to scale the
offset by 3/2. Note that real-only, imag-only, and summed-only
packing formats are not applicable here since trsm is a two-
operand operation only (unlike trmm, which is capable of three-
operand). */ \
if ( bli_is_3mi_packed( schema_b ) ) { ss_b_num = 3; ss_b_den = 2; } \
else { ss_b_num = 1; ss_b_den = 1; } \
\
/* If there is a zero region to the left of where the diagonal of B
intersects the top edge of the panel, adjust the pointer to C and
@@ -364,9 +338,6 @@ void PASTEMAC(ch,varname) \
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_b( istep_a, &aux ); \
\
/* Save the desired output datatype (indicating no typecasting). */ \
/*bli_auxinfo_set_dt_on_output( dt, &aux );*/ \
\
b1 = b_cast; \
c1 = c_cast; \
@@ -406,20 +377,14 @@ void PASTEMAC(ch,varname) \
\
/* Compute the addresses of the panel B10 and the triangular
block B11. */ \
b01 = b1; \
/* b11 = b1 + ( k_b01 * PACKNR ) / off_scl; */ \
b11 = bli_ptr_inc_by_frac( b1, sizeof( ctype ), k_b01 * PACKNR, off_scl ); \
b01 = b1; \
b11 = b1 + k_b01 * PACKNR; \
/*b11 = bli_ptr_inc_by_frac( b1, sizeof( ctype ), k_b01 * PACKNR, 1 );*/ \
\
/* Compute the panel stride for the current micro-panel. */ \
is_b_cur = k_b0111 * PACKNR; \
is_b_cur += ( bli_is_odd( is_b_cur ) ? 1 : 0 ); \
ps_b_cur = ( is_b_cur * ss_b_num ) / ss_b_den; \
\
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object.
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_a( is_b_cur, &aux ); \
ps_b_cur = is_b_cur; \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
@@ -433,8 +398,8 @@ void PASTEMAC(ch,varname) \
m_cur = ( bli_is_not_edge_f( i, m_iter, m_left ) ? MR : m_left ); \
\
/* Compute the addresses of the A10 panel and A11 block. */ \
a10 = a1 + ( off_b01 * PACKMR ) / off_scl; \
a11 = a1 + ( off_b11 * PACKMR ) / off_scl; \
a10 = a1 + off_b01 * PACKMR; \
a11 = a1 + off_b11 * PACKMR; \
\
/* Compute the addresses of the next panels of A and B. */ \
a2 = a1; \
@@ -501,12 +466,6 @@ void PASTEMAC(ch,varname) \
} \
else if ( bli_is_strictly_above_diag_n( diagoffb_j, k, NR ) ) \
{ \
/* Save the 4m1/3m1 imaginary stride of B to the auxinfo_t
object.
NOTE: We swap the values for A and B since the triangular
"A" matrix is actually contained within B. */ \
bli_auxinfo_set_is_a( istep_b, &aux ); \
\
/* Loop over the m dimension (MR rows at a time). */ \
for ( i = 0; i < m_iter; ++i ) \
{ \

@@ -74,13 +74,6 @@ BLIS_INLINE inc_t bli_auxinfo_ps_b( auxinfo_t* ai )
return ai->ps_b;
}
#if 0
BLIS_INLINE inc_t bli_auxinfo_dt_on_output( auxinfo_t* ai )
{
return ai->dt_on_output;
}
#endif
// auxinfo_t field modification
@@ -125,12 +118,5 @@ BLIS_INLINE void bli_auxinfo_set_ps_b( inc_t ps, auxinfo_t* ai )
ai->ps_b = ps;
}
#if 0
BLIS_INLINE void bli_auxinfo_set_dt_on_output( num_t dt_on_output, auxinfo_t* ai )
{
ai->dt_on_output = dt_on_output;
}
#endif
#endif

@@ -224,12 +224,6 @@ void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... )
double msclr = msclrs[ i ];
blksz_t* blksz = blkszs[ i ];
// NOTE: This is a bug! We need to grab the actual blocksize
// multiple, which is not at blkszs[i], but rather somewhere else
// in the array. In order to fix this, you probably need to store
// the contents of blkszs (and all the other arrays) by bs_id
// rather than i in the first loop.
blksz_t* bmult = blkszs[ i ];
blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ];
@@ -248,20 +242,6 @@ void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... )
// blocksize object.
bli_blksz_scale_def( 1, ( dim_t )dsclr, BLIS_SCOMPLEX, cntx_blksz );
bli_blksz_scale_def( 1, ( dim_t )dsclr, BLIS_DCOMPLEX, cntx_blksz );
// Perform rounding to ensure the newly scaled values are still
// multiples of their register blocksize multiples. But only
// perform this rounding when the blocksize id is not equal to
// the blocksize multiple id (ie: we don't round down scaled
// register blocksizes since they are their own multiples).
// Also, we skip the rounding for 1m since it should never need
// such rounding.
if ( bs_id != bm_id && method != BLIS_1M )
{
// Round the newly-scaled blocksizes down to their multiple.
bli_blksz_reduce_def_to( BLIS_FLOAT, bmult, BLIS_SCOMPLEX, cntx_blksz );
bli_blksz_reduce_def_to( BLIS_DOUBLE, bmult, BLIS_DCOMPLEX, cntx_blksz );
}
}
// Similarly, if the maximum blocksize scalar is non-unit, we need
@@ -272,20 +252,6 @@ void bli_cntx_set_blkszs( ind_t method, dim_t n_bs, ... )
// blocksize object.
bli_blksz_scale_max( 1, ( dim_t )msclr, BLIS_SCOMPLEX, cntx_blksz );
bli_blksz_scale_max( 1, ( dim_t )msclr, BLIS_DCOMPLEX, cntx_blksz );
// Perform rounding to ensure the newly scaled values are still
// multiples of their register blocksize multiples. But only
// perform this rounding when the blocksize id is not equal to
// the blocksize multiple id (ie: we don't round down scaled
// register blocksizes since they are their own multiples).
// Also, we skip the rounding for 1m since it should never need
// such rounding.
if ( bs_id != bm_id && method != BLIS_1M )
{
// Round the newly-scaled blocksizes down to their multiple.
bli_blksz_reduce_max_to( BLIS_FLOAT, bmult, BLIS_SCOMPLEX, cntx_blksz );
bli_blksz_reduce_max_to( BLIS_DOUBLE, bmult, BLIS_DCOMPLEX, cntx_blksz );
}
}
// Copy the blocksize multiple id into the context.
@@ -422,14 +388,10 @@ void bli_cntx_set_ind_blkszs( ind_t method, num_t dt, dim_t n_bs, ... )
//blksz_t* cntx_blksz = &cntx_blkszs[ bs_id ];
// Query the blocksize multiple's blocksize id.
bszid_t bm_id = bli_cntx_get_bmult_id( bs_id, cntx );
// Query the context for the blksz_t object assoicated with the
// current blocksize id, and also query the object corresponding
// to the blocksize multiple.
blksz_t* cntx_blksz = bli_cntx_get_blksz( bs_id, cntx );
blksz_t* cntx_bmult = bli_cntx_get_bmult( bs_id, cntx );
// Copy the real domain value of the blksz_t object into the
// corresponding complex domain slot of the same object.
@@ -442,19 +404,6 @@ void bli_cntx_set_ind_blkszs( ind_t method, num_t dt, dim_t n_bs, ... )
// Scale the default blocksize value corresponding to the given
// datatype.
bli_blksz_scale_def( 1, ( dim_t )dsclr, dt, cntx_blksz );
// Perform rounding to ensure the newly scaled values are still
// multiples of their register blocksize multiples. But only
// perform this rounding when the blocksize id is not equal to
// the blocksize multiple id (ie: we don't round down scaled
// register blocksizes since they are their own multiples).
// Also, we skip the rounding for 1m since it should never need
// such rounding.
if ( bs_id != bm_id && method != BLIS_1M )
{
// Round the newly-scaled blocksize down to its multiple.
bli_blksz_reduce_def_to( dt_real, cntx_bmult, dt, cntx_blksz );
}
}
// Similarly, if the maximum blocksize scalar is non-unit, we need
@@ -464,19 +413,6 @@ void bli_cntx_set_ind_blkszs( ind_t method, num_t dt, dim_t n_bs, ... )
// Scale the maximum blocksize value corresponding to the given
// datatype.
bli_blksz_scale_max( 1, ( dim_t )msclr, dt, cntx_blksz );
// Perform rounding to ensure the newly scaled values are still
// multiples of their register blocksize multiples. But only
// perform this rounding when the blocksize id is not equal to
// the blocksize multiple id (ie: we don't round down scaled
// register blocksizes since they are their own multiples).
// Also, we skip the rounding for 1m since it should never need
// such rounding.
if ( bs_id != bm_id && method != BLIS_1M )
{
// Round the newly-scaled blocksize down to their multiple.
bli_blksz_reduce_max_to( dt_real, cntx_bmult, dt, cntx_blksz );
}
}
}
}
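
Note on the removed rounding blocks in the hunks above: a minimal sketch of the two steps involved, scaling a blocksize by 1/dsclr and then rounding the result down to a multiple of a register blocksize. The function names and dim_sketch_t are hypothetical, and the round-down rule is an assumption about what bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to() did, based only on the removed comments. It is not taken from the BLIS implementation.

```c
#include <assert.h>

/* Hypothetical stand-in for BLIS's dim_t. */
typedef long dim_sketch_t;

/* Scale a blocksize by num/den using integer arithmetic, mirroring the
   ( 1, dsclr ) argument pattern seen in the bli_blksz_scale_def() calls. */
static dim_sketch_t blksz_scale_sketch( dim_sketch_t bs,
                                        dim_sketch_t num,
                                        dim_sketch_t den )
{
    assert( den != 0 );
    return ( bs * num ) / den;
}

/* Assumed behavior of the removed rounding step: round bs down to the
   nearest multiple of mult. */
static dim_sketch_t blksz_round_down_sketch( dim_sketch_t bs,
                                             dim_sketch_t mult )
{
    assert( mult != 0 );
    return ( bs / mult ) * mult;
}
```

With a scalar of 2 and a blocksize that is an even multiple of the register blocksize, the scaled value is already a multiple and the rounding step leaves it unchanged. That matches the removed comment's remark that 1m should never need such rounding, although the scalars actually used for 1m are not shown in this diff.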

@@ -36,11 +36,6 @@
static char* bli_ind_impl_str[BLIS_NUM_IND_METHODS] =
{
/* 3mh */ "3mh",
/* 3m1 */ "3m1",
/* 4mh */ "4mh",
/* 4m1b */ "4m1b",
/* 4m1a */ "4m1a",
/* 1m */ "1m",
/* nat */ "native",
};
@@ -147,8 +142,9 @@ bool bli_ind_oper_is_impl( opid_t oper, ind_t method )
if ( bli_opid_is_level3( oper ) )
{
// Look up whether its func_t pointer in the table is NULL.
is_impl = ( bli_l3_ind_oper_get_func( oper, method ) != NULL );
// Look up whether the operation is implemented for the given induced
// method id.
is_impl = bli_l3_ind_oper_is_impl( oper, method );
}
else
{
@@ -162,39 +158,6 @@ bool bli_ind_oper_is_impl( opid_t oper, ind_t method )
return is_impl;
}
#if 0
bool bli_ind_oper_has_avail( opid_t oper, num_t dt )
{
ind_t method = bli_ind_oper_find_avail( oper, dt );
if ( method == BLIS_NAT ) return FALSE;
else return TRUE;
}
#endif
void_fp bli_ind_oper_get_avail( opid_t oper, num_t dt )
{
void_fp func_p;
if ( bli_opid_is_level3( oper ) )
{
ind_t method = bli_ind_oper_find_avail( oper, dt );
func_p = bli_l3_ind_oper_get_func( oper, method );
}
else
{
// Currently, any operation that is not level-3 does not
// have induced method implementations. (This should actually
// assign the pointer to be the native front-end, but for
// now there are no calls to bli_ind_oper_get_avail() in the
// context of level-2 operations.
func_p = NULL;
}
return func_p;
}
ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt )
{
ind_t method;

@@ -38,16 +38,6 @@
// level-3 induced method management
#include "bli_l3_ind.h"
// level-3 object APIs
#include "bli_l3_ind_oapi.h"
// level-3 typed APIs
#include "bli_l3_ind_tapi.h"
// level-3 cntx initialization
#include "bli_cntx_ind_stage.h"
void bli_ind_init( void );
void bli_ind_finalize( void );
@@ -62,8 +52,6 @@ BLIS_EXPORT_BLIS void bli_ind_disable_all_dt( num_t dt );
BLIS_EXPORT_BLIS void bli_ind_oper_enable_only( opid_t oper, ind_t method, num_t dt );
BLIS_EXPORT_BLIS bool bli_ind_oper_is_impl( opid_t oper, ind_t method );
//bool bli_ind_oper_has_avail( opid_t oper, num_t dt );
BLIS_EXPORT_BLIS void_fp bli_ind_oper_get_avail( opid_t oper, num_t dt );
BLIS_EXPORT_BLIS ind_t bli_ind_oper_find_avail( opid_t oper, num_t dt );
BLIS_EXPORT_BLIS char* bli_ind_oper_get_avail_impl_string( opid_t oper, num_t dt );

@@ -988,50 +988,6 @@ BLIS_INLINE bool bli_is_panel_packed( pack_t schema )
( schema & BLIS_PACK_PANEL_BIT );
}
BLIS_INLINE bool bli_is_4mi_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_4MI );
}
BLIS_INLINE bool bli_is_3mi_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_3MI );
}
BLIS_INLINE bool bli_is_3ms_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_3MS );
}
BLIS_INLINE bool bli_is_ro_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_RO );
}
BLIS_INLINE bool bli_is_io_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_IO );
}
BLIS_INLINE bool bli_is_rpi_packed( pack_t schema )
{
return ( bool )
( ( schema & BLIS_PACK_FORMAT_BITS ) == BLIS_BITVAL_RPI );
}
BLIS_INLINE bool bli_is_rih_packed( pack_t schema )
{
return ( bool )
( bli_is_ro_packed( schema ) ||
bli_is_io_packed( schema ) ||
bli_is_rpi_packed( schema ) );
}
BLIS_INLINE bool bli_is_1r_packed( pack_t schema )
{
return ( bool )
@@ -1070,20 +1026,6 @@ BLIS_INLINE guint_t bli_pack_schema_index( pack_t schema )
}
// pointer-related
// Increment a pointer by an integer fraction:
// p0 + (num/dem)
// where p0 is a pointer to a datatype of size sizeof_p0.
BLIS_INLINE void_fp bli_ptr_inc_by_frac( void_fp p0, siz_t sizeof_p0, dim_t num, dim_t den )
{
return ( void_fp )
( ( char* )p0 + ( ( num * ( dim_t )sizeof_p0 ) / den ) );
}
// Set dimensions, increments, effective uplo/diagoff, etc for ONE matrix
// argument.
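
Note on the removal of bli_ptr_inc_by_frac() above: a minimal sketch of why a byte-level fractional increment differs from typed pointer arithmetic, and why the two coincide when the denominator is 1. The _sketch names and the complex struct are hypothetical and exist only for illustration. The first function follows the shape of the removed definition but uses standard C types in place of BLIS's void_fp, siz_t, and dim_t.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type used only for this sketch. */
typedef struct { double real; double imag; } dcomplex_sketch_t;

/* Stand-in for the removed helper: advance p0 by ( num / den ) elements,
   each sizeof_p0 bytes wide. The division happens after scaling to bytes,
   so a half-element offset is not lost to integer truncation. */
static void* ptr_inc_by_frac_sketch( void* p0, size_t sizeof_p0,
                                     ptrdiff_t num, ptrdiff_t den )
{
    assert( den != 0 );
    return ( char* )p0 + ( num * ( ptrdiff_t )sizeof_p0 ) / den;
}

/* The replacement form used throughout the macrokernels after this
   commit: plain typed pointer arithmetic, equivalent to den == 1. */
static dcomplex_sketch_t* ptr_inc_plain_sketch( dcomplex_sketch_t* p0,
                                                ptrdiff_t num )
{
    return p0 + num;
}
```

With den == 2 and an odd num, the helper lands half a complex element (one real value) past where p0 + num / 2 would point. That case only arose for the 4m/3m packing formats, so with off_scl gone the plain form is sufficient.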

@@ -206,37 +206,6 @@
#include "bli_set0bbs_mxn.h"
// -- 3m-specific scalar macros --
#include "bli_copyri3s.h"
#include "bli_copyjri3s.h"
#include "bli_scal2ri3s.h"
#include "bli_scal2jri3s.h"
#include "bli_scal2ri3s_mxn.h"
// -- 4mh/3mh-specific scalar macros --
// ro
#include "bli_scal2ros.h"
#include "bli_scal2jros.h"
// io
#include "bli_scal2ios.h"
#include "bli_scal2jios.h"
// rpi
#include "bli_scal2rpis.h"
#include "bli_scal2jrpis.h"
#include "bli_scal2rihs_mxn.h"
#include "bli_scal2rihs_mxn_diag.h"
#include "bli_scal2rihs_mxn_uplo.h"
#include "bli_setrihs_mxn_diag.h"
// -- 1m-specific scalar macros --
// 1e

@@ -248,24 +248,10 @@ typedef void (*free_ft) ( void* p );
- 1 0000 01: packed by columns
- 1 0000 10: packed by row panels
- 1 0000 11: packed by column panels
- 1 0001 10: packed by 4m interleaved row panels
- 1 0001 11: packed by 4m interleaved column panels
- 1 0010 10: packed by 3m interleaved row panels
- 1 0010 11: packed by 3m interleaved column panels
- 1 0011 10: packed by 4m separated row panels (not used)
- 1 0011 11: packed by 4m separated column panels (not used)
- 1 0100 10: packed by 3m separated row panels
- 1 0100 11: packed by 3m separated column panels
- 1 0101 10: packed real-only row panels
- 1 0101 11: packed real-only column panels
- 1 0110 10: packed imag-only row panels
- 1 0110 11: packed imag-only column panels
- 1 0111 10: packed real+imag row panels
- 1 0111 11: packed real+imag column panels
- 1 1000 10: packed by 1m expanded row panels
- 1 1000 11: packed by 1m expanded column panels
- 1 1001 10: packed by 1m reordered row panels
- 1 1001 11: packed by 1m reordered column panels
- 1 0001 10: packed by 1m expanded row panels
- 1 0001 11: packed by 1m expanded column panels
- 1 0010 10: packed by 1m reordered row panels
- 1 0010 11: packed by 1m reordered column panels
23 Packed panel order if upper-stored
- 0 == forward order if upper
- 1 == reverse order if upper
@@ -403,34 +389,13 @@ typedef void (*free_ft) ( void* p );
#define BLIS_BITVAL_UNIT_DIAG BLIS_UNIT_DIAG_BIT
#define BLIS_BITVAL_INVERT_DIAG BLIS_INVERT_DIAG_BIT
#define BLIS_BITVAL_NOT_PACKED 0x0
#define BLIS_BITVAL_4MI ( 0x1 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_3MI ( 0x2 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_4MS ( 0x3 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_3MS ( 0x4 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_RO ( 0x5 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_IO ( 0x6 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_RPI ( 0x7 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_1E ( 0x8 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_1R ( 0x9 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_1E ( 0x1 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_1R ( 0x2 << BLIS_PACK_FORMAT_SHIFT )
#define BLIS_BITVAL_PACKED_UNSPEC ( BLIS_PACK_BIT )
#define BLIS_BITVAL_PACKED_ROWS ( BLIS_PACK_BIT )
#define BLIS_BITVAL_PACKED_COLUMNS ( BLIS_PACK_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS ( BLIS_PACK_BIT | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS ( BLIS_PACK_BIT | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_4MI ( BLIS_PACK_BIT | BLIS_BITVAL_4MI | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_4MI ( BLIS_PACK_BIT | BLIS_BITVAL_4MI | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_3MI ( BLIS_PACK_BIT | BLIS_BITVAL_3MI | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_3MI ( BLIS_PACK_BIT | BLIS_BITVAL_3MI | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_4MS ( BLIS_PACK_BIT | BLIS_BITVAL_4MS | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_4MS ( BLIS_PACK_BIT | BLIS_BITVAL_4MS | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_3MS ( BLIS_PACK_BIT | BLIS_BITVAL_3MS | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_3MS ( BLIS_PACK_BIT | BLIS_BITVAL_3MS | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_RO ( BLIS_PACK_BIT | BLIS_BITVAL_RO | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_RO ( BLIS_PACK_BIT | BLIS_BITVAL_RO | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_IO ( BLIS_PACK_BIT | BLIS_BITVAL_IO | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_IO ( BLIS_PACK_BIT | BLIS_BITVAL_IO | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_RPI ( BLIS_PACK_BIT | BLIS_BITVAL_RPI | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_RPI ( BLIS_PACK_BIT | BLIS_BITVAL_RPI | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_1E ( BLIS_PACK_BIT | BLIS_BITVAL_1E | BLIS_PACK_PANEL_BIT )
#define BLIS_BITVAL_PACKED_COL_PANELS_1E ( BLIS_PACK_BIT | BLIS_BITVAL_1E | BLIS_PACK_PANEL_BIT | BLIS_PACK_RC_BIT )
#define BLIS_BITVAL_PACKED_ROW_PANELS_1R ( BLIS_PACK_BIT | BLIS_BITVAL_1R | BLIS_PACK_PANEL_BIT )
@@ -542,20 +507,6 @@ typedef enum
BLIS_PACKED_COLUMNS = BLIS_BITVAL_PACKED_COLUMNS,
BLIS_PACKED_ROW_PANELS = BLIS_BITVAL_PACKED_ROW_PANELS,
BLIS_PACKED_COL_PANELS = BLIS_BITVAL_PACKED_COL_PANELS,
BLIS_PACKED_ROW_PANELS_4MI = BLIS_BITVAL_PACKED_ROW_PANELS_4MI,
BLIS_PACKED_COL_PANELS_4MI = BLIS_BITVAL_PACKED_COL_PANELS_4MI,
BLIS_PACKED_ROW_PANELS_3MI = BLIS_BITVAL_PACKED_ROW_PANELS_3MI,
BLIS_PACKED_COL_PANELS_3MI = BLIS_BITVAL_PACKED_COL_PANELS_3MI,
BLIS_PACKED_ROW_PANELS_4MS = BLIS_BITVAL_PACKED_ROW_PANELS_4MS,
BLIS_PACKED_COL_PANELS_4MS = BLIS_BITVAL_PACKED_COL_PANELS_4MS,
BLIS_PACKED_ROW_PANELS_3MS = BLIS_BITVAL_PACKED_ROW_PANELS_3MS,
BLIS_PACKED_COL_PANELS_3MS = BLIS_BITVAL_PACKED_COL_PANELS_3MS,
BLIS_PACKED_ROW_PANELS_RO = BLIS_BITVAL_PACKED_ROW_PANELS_RO,
BLIS_PACKED_COL_PANELS_RO = BLIS_BITVAL_PACKED_COL_PANELS_RO,
BLIS_PACKED_ROW_PANELS_IO = BLIS_BITVAL_PACKED_ROW_PANELS_IO,
BLIS_PACKED_COL_PANELS_IO = BLIS_BITVAL_PACKED_COL_PANELS_IO,
BLIS_PACKED_ROW_PANELS_RPI = BLIS_BITVAL_PACKED_ROW_PANELS_RPI,
BLIS_PACKED_COL_PANELS_RPI = BLIS_BITVAL_PACKED_COL_PANELS_RPI,
BLIS_PACKED_ROW_PANELS_1E = BLIS_BITVAL_PACKED_ROW_PANELS_1E,
BLIS_PACKED_COL_PANELS_1E = BLIS_BITVAL_PACKED_COL_PANELS_1E,
BLIS_PACKED_ROW_PANELS_1R = BLIS_BITVAL_PACKED_ROW_PANELS_1R,
@@ -563,10 +514,8 @@ typedef enum
} pack_t;
// We combine row and column packing into one "type", and we start
// with BLIS_PACKED_ROW_PANELS, _COLUMN_PANELS. We also count the
// schema pair for "4ms" (4m separated), because its bit value has
// been reserved, even though we don't use it.
#define BLIS_NUM_PACK_SCHEMA_TYPES 10
// with BLIS_PACKED_ROW_PANELS, _COLUMN_PANELS.
#define BLIS_NUM_PACK_SCHEMA_TYPES 3
// -- Pack order type --
@@ -659,12 +608,7 @@ typedef enum
typedef enum
{
BLIS_3MH = 0,
BLIS_3M1,
BLIS_4MH,
BLIS_4M1B,
BLIS_4M1A,
BLIS_1M,
BLIS_1M = 0,
BLIS_NAT,
BLIS_IND_FIRST = 0,
BLIS_IND_LAST = BLIS_NAT
@@ -672,13 +616,8 @@ typedef enum
#define BLIS_NUM_IND_METHODS (BLIS_NAT+1)
// These are used in bli_*_oapi.c to construct the ind_t values from
// These are used in bli_l3_*_oapi.c to construct the ind_t values from
// the induced method substrings that go into function names.
#define bli_3mh BLIS_3MH
#define bli_3m1 BLIS_3M1
#define bli_4mh BLIS_4MH
#define bli_4mb BLIS_4M1B
#define bli_4m1 BLIS_4M1A
#define bli_1m BLIS_1M
#define bli_nat BLIS_NAT
@@ -1204,9 +1143,6 @@ typedef struct
inc_t ps_a;
inc_t ps_b;
// The type to convert to on output.
//num_t dt_on_output;
} auxinfo_t;
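
The re-indexing of the 1e/1r pack format values in the hunks above can be sketched in isolation. This is an illustration only: the `DEMO_`/`demo_` names, the shift of 2, and the 4-bit field width are assumptions chosen for the example (the width is inferred from the old values reaching 0x9) and are not taken from bli_type_defs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* ASSUMPTION: shift and field width are hypothetical, not BLIS's values. */
#define DEMO_PACK_FORMAT_SHIFT 2u
#define DEMO_PACK_FORMAT_WIDTH 4u
#define DEMO_PACK_FORMAT_BITS \
    ( ( ( 1u << DEMO_PACK_FORMAT_WIDTH ) - 1u ) << DEMO_PACK_FORMAT_SHIFT )

/* Re-indexed format values, mirroring the new 0x1 / 0x2 encodings. */
#define DEMO_BITVAL_1E ( 0x1u << DEMO_PACK_FORMAT_SHIFT )
#define DEMO_BITVAL_1R ( 0x2u << DEMO_PACK_FORMAT_SHIFT )

/* Isolate the pack format portion of a schema value. */
static inline uint32_t demo_pack_format( uint32_t schema )
{
    return schema & DEMO_PACK_FORMAT_BITS;
}

static inline bool demo_is_1e_packed( uint32_t schema )
{
    return demo_pack_format( schema ) == DEMO_BITVAL_1E;
}

static inline bool demo_is_1r_packed( uint32_t schema )
{
    return demo_pack_format( schema ) == DEMO_BITVAL_1R;
}

/* Format bits that neither 1e nor 1r ever sets. */
static inline uint32_t demo_unused_format_bits( void )
{
    return DEMO_PACK_FORMAT_BITS & ~( DEMO_BITVAL_1E | DEMO_BITVAL_1R );
}
```

Under these assumptions the two high bits of the format field are never set, which matches the commit message's note that two bits are left unused.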


@@ -1,148 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
typedef void (*cntx_stage_ft)( dim_t stage, cntx_t* cntx );
static void_fp bli_cntx_ind_stage_fp[BLIS_NUM_IND_METHODS] =
{
/* 3mh */ bli_cntx_3mh_stage,
/* 3m1 */ bli_cntx_3m1_stage,
/* 4mh */ bli_cntx_4mh_stage,
/* 4mb */ bli_cntx_4mb_stage,
/* 4m1 */ bli_cntx_4m1_stage,
/* 1m */ bli_cntx_1m_stage,
/* nat */ bli_cntx_nat_stage
};
// -----------------------------------------------------------------------------
// Execute the context initialization/finalization function associated
// with a given induced method.
void bli_cntx_ind_stage( ind_t method, dim_t stage, cntx_t* cntx )
{
cntx_stage_ft func = bli_cntx_ind_stage_fp[ method ];
func( stage, cntx );
}
// -----------------------------------------------------------------------------
// These functions modify a context, if needed, for the particular "stage" of
// the induced method execution. Some induced methods do not make use of this
// feature. NOTE: ANY INDUCED METHOD THAT HAS A NON-EMPTY _stage() FUNCTION
// IS NOT THREAD-SAFE FOR APPLICATION-LEVEL THREADING.
// -----------------------------------------------------------------------------
void bli_cntx_3mh_stage( dim_t stage, cntx_t* cntx )
{
// Set the pack_t schemas as a function of the stage of execution.
if ( stage == 0 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_RO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_RO, cntx );
}
else if ( stage == 1 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_IO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_IO, cntx );
}
else // if ( stage == 2 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_RPI, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_RPI, cntx );
}
}
// -----------------------------------------------------------------------------
void bli_cntx_3m1_stage( dim_t stage, cntx_t* cntx )
{
}
// -----------------------------------------------------------------------------
void bli_cntx_4mh_stage( dim_t stage, cntx_t* cntx )
{
// Set the pack_t schemas as a function of the stage of execution.
if ( stage == 0 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_RO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_RO, cntx );
}
else if ( stage == 1 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_IO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_IO, cntx );
}
else if ( stage == 2 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_RO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_IO, cntx );
}
else // if ( stage == 3 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_IO, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_RO, cntx );
}
}
// -----------------------------------------------------------------------------
void bli_cntx_4mb_stage( dim_t stage, cntx_t* cntx )
{
}
// -----------------------------------------------------------------------------
void bli_cntx_4m1_stage( dim_t stage, cntx_t* cntx )
{
}
// -----------------------------------------------------------------------------
void bli_cntx_1m_stage( dim_t stage, cntx_t* cntx )
{
}
// -----------------------------------------------------------------------------
void bli_cntx_nat_stage( dim_t stage, cntx_t* cntx )
{
}
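
The removed file above dispatches per-method stage hooks through a table of function pointers indexed by `ind_t`. A minimal sketch of that pattern follows, combined with the local-copy precaution that the removed front ends took for methods whose hooks mutate the context. All `demo_` names and the two-field context are hypothetical stand-ins, not BLIS types.

```c
/* Hypothetical stand-ins for ind_t and cntx_t. */
typedef enum { DEMO_STAGED = 0, DEMO_NOOP, DEMO_NUM_METHODS } demo_ind_t;
typedef struct { int schema_a; int schema_b; } demo_cntx_t;

typedef void (*demo_stage_ft)( int stage, demo_cntx_t* cntx );

/* A hook that mutates the context per stage (not safe on shared data). */
static void demo_staged_stage( int stage, demo_cntx_t* cntx )
{
    cntx->schema_a = stage;
    cntx->schema_b = stage;
}

/* A hook that does nothing, like the empty _stage() functions above. */
static void demo_noop_stage( int stage, demo_cntx_t* cntx )
{
    (void)stage; (void)cntx;
}

/* Table indexed by method id; order must match the enum. */
static const demo_stage_ft demo_stage_fp[DEMO_NUM_METHODS] =
{
    /* staged */ demo_staged_stage,
    /* noop   */ demo_noop_stage
};

static void demo_ind_stage( demo_ind_t method, int stage, demo_cntx_t* cntx )
{
    demo_stage_fp[ method ]( stage, cntx );
}

/* Run every stage on a private copy so the shared context is never
   modified; returns the schema_a value left after the last stage. */
static int demo_run( demo_ind_t method, int nstage, const demo_cntx_t* shared )
{
    demo_cntx_t local = *shared;
    for ( int i = 0; i < nstage; ++i )
        demo_ind_stage( method, i, &local );
    return local.schema_a;
}
```

Copying the context before the loop keeps the shared instance read-only, which is the reason the removed code made a local `cntx_t` for 3mh and 4mh.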


@@ -1,44 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
void bli_cntx_ind_stage( ind_t method, dim_t stage, cntx_t* cntx );
void bli_cntx_3mh_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_3m1_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_4mh_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_4mb_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_4m1_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_1m_stage( dim_t stage, cntx_t* cntx );
void bli_cntx_nat_stage( dim_t stage, cntx_t* cntx );


@@ -1,443 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2018 - 2019, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// -- gemm/her2k/syr2k ---------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth, nstage ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
ind_t ind = PASTEMAC0(imeth); \
num_t dt = bli_obj_dt( c ); \
obj_t* beta_use = beta; \
\
dim_t i; \
\
/* If the objects are in the real domain, execute the native
implementation. */ \
if ( bli_obj_is_real( c ) ) \
{ \
PASTEMAC(opname,nat)( alpha, a, b, beta, c, cntx, rntm ); \
return; \
} \
\
/* A temporary hack to easily specify the 1m algorithm (block-panel or
panel-block). */ \
/*
if ( PASTEMAC(opname,imeth) == bli_gemm1m ) \
{ \
bli_gemm1mbp( alpha, a, b, beta, c ); \
return; \
} \
else if ( PASTEMAC(opname,imeth) == bli_gemm3m1 ) \
{ \
bli_gemm1mpb( alpha, a, b, beta, c ); \
return; \
} \
*/ \
\
/* Query a context for the current induced method. This context is
managed and cached by the gks and should not be freed by the caller.
Note that the datatype argument is needed because it will be passed
in when bli_gks_query_ind_cntx() eventually calls the induced method's
_cntx_init() function. */ \
cntx = bli_gks_query_ind_cntx( ind, dt ); \
\
/* 3mh and 4mh change the context for each stage, and so in order to
remain thread-safe, we must make a local copy of the context for
those induced methods. */ \
cntx_t cntx_l; \
if ( ind == BLIS_3MH || ind == BLIS_4MH ) { cntx_l = *cntx; cntx = &cntx_l; } \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Some induced methods execute in multiple "stages". */ \
for ( i = 0; i < nstage; ++i ) \
{ \
/* Prepare the context for the ith stage of computation. */ \
bli_cntx_ind_stage( ind, i, cntx ); \
\
/* For multi-stage methods, use BLIS_ONE as beta after the first
stage. */ \
if ( i > 0 ) beta_use = &BLIS_ONE; \
\
/* Invoke the operation's front end and request the default control
tree. */ \
PASTEMAC(opname,_front)( alpha, a, b, beta_use, c, cntx, rntm, NULL ); \
} \
}
// gemm
GENFRONT( gemm, gemm, 3mh, 3 )
GENFRONT( gemm, gemm, 3m1, 1 )
GENFRONT( gemm, gemm, 4mh, 4 )
GENFRONT( gemm, gemm, 4mb, 1 )
GENFRONT( gemm, gemm, 4m1, 1 )
GENFRONT( gemm, gemm, 1m, 1 )
// her2k
GENFRONT( her2k, gemm, 3mh, 3 )
GENFRONT( her2k, gemm, 3m1, 1 )
GENFRONT( her2k, gemm, 4mh, 4 )
//GENFRONT( her2k, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( her2k, gemm, 4m1, 1 )
GENFRONT( her2k, gemm, 1m, 1 )
// syr2k
GENFRONT( syr2k, gemm, 3mh, 3 )
GENFRONT( syr2k, gemm, 3m1, 1 )
GENFRONT( syr2k, gemm, 4mh, 4 )
//GENFRONT( syr2k, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( syr2k, gemm, 4m1, 1 )
GENFRONT( syr2k, gemm, 1m, 1 )
// -- hemm/symm/trmm3 ----------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth, nstage ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
ind_t ind = PASTEMAC0(imeth); \
num_t dt = bli_obj_dt( c ); \
obj_t* beta_use = beta; \
\
dim_t i; \
\
/* If the objects are in the real domain, execute the native
implementation. */ \
if ( bli_obj_is_real( c ) ) \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, beta, c, cntx, rntm ); \
return; \
} \
\
/* Query a context for the current induced method. This context is
managed and cached by the gks and should not be freed by the caller.
Note that the datatype argument is needed because it will be passed
in when bli_gks_query_ind_cntx() eventually calls the induced method's
_cntx_init() function. */ \
cntx = bli_gks_query_ind_cntx( ind, dt ); \
\
/* 3mh and 4mh change the context for each stage, and so in order to
remain thread-safe, we must make a local copy of the context for
those induced methods. */ \
cntx_t cntx_l; \
if ( ind == BLIS_3MH || ind == BLIS_4MH ) { cntx_l = *cntx; cntx = &cntx_l; } \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Some induced methods execute in multiple "stages". */ \
for ( i = 0; i < nstage; ++i ) \
{ \
/* Prepare the context for the ith stage of computation. */ \
bli_cntx_ind_stage( ind, i, cntx ); \
\
/* For multi-stage methods, use BLIS_ONE as beta after the first
stage. */ \
if ( i > 0 ) beta_use = &BLIS_ONE; \
\
/* Invoke the operation's front end and request the default control
tree. */ \
PASTEMAC(opname,_front)( side, alpha, a, b, beta_use, c, cntx, rntm, NULL ); \
} \
}
// hemm
GENFRONT( hemm, gemm, 3mh, 3 )
GENFRONT( hemm, gemm, 3m1, 1 )
GENFRONT( hemm, gemm, 4mh, 4 )
//GENFRONT( hemm, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( hemm, gemm, 4m1, 1 )
GENFRONT( hemm, gemm, 1m, 1 )
// symm
GENFRONT( symm, gemm, 3mh, 3 )
GENFRONT( symm, gemm, 3m1, 1 )
GENFRONT( symm, gemm, 4mh, 4 )
//GENFRONT( symm, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( symm, gemm, 4m1, 1 )
GENFRONT( symm, gemm, 1m, 1 )
// trmm3
GENFRONT( trmm3, gemm, 3mh, 3 )
GENFRONT( trmm3, gemm, 3m1, 1 )
GENFRONT( trmm3, gemm, 4mh, 4 )
//GENFRONT( trmm3, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( trmm3, gemm, 4m1, 1 )
GENFRONT( trmm3, gemm, 1m, 1 )
// -- herk/syrk ----------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth, nstage ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
ind_t ind = PASTEMAC0(imeth); \
num_t dt = bli_obj_dt( c ); \
obj_t* beta_use = beta; \
\
dim_t i; \
\
/* If the objects are in the real domain, execute the native
implementation. */ \
if ( bli_obj_is_real( c ) ) \
{ \
PASTEMAC(opname,nat)( alpha, a, beta, c, cntx, rntm ); \
return; \
} \
\
/* Query a context for the current induced method. This context is
managed and cached by the gks and should not be freed by the caller.
Note that the datatype argument is needed because it will be passed
in when bli_gks_query_ind_cntx() eventually calls the induced method's
_cntx_init() function. */ \
cntx = bli_gks_query_ind_cntx( ind, dt ); \
\
/* 3mh and 4mh change the context for each stage, and so in order to
remain thread-safe, we must make a local copy of the context for
those induced methods. */ \
cntx_t cntx_l; \
if ( ind == BLIS_3MH || ind == BLIS_4MH ) { cntx_l = *cntx; cntx = &cntx_l; } \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Some induced methods execute in multiple "stages". */ \
for ( i = 0; i < nstage; ++i ) \
{ \
/* Prepare the context for the ith stage of computation. */ \
bli_cntx_ind_stage( ind, i, cntx ); \
\
/* For multi-stage methods, use BLIS_ONE as beta after the first
stage. */ \
if ( i > 0 ) beta_use = &BLIS_ONE; \
\
/* Invoke the operation's front end and request the default control
tree. */ \
PASTEMAC(opname,_front)( alpha, a, beta_use, c, cntx, rntm, NULL ); \
} \
}
// herk
GENFRONT( herk, gemm, 3mh, 3 )
GENFRONT( herk, gemm, 3m1, 1 )
GENFRONT( herk, gemm, 4mh, 4 )
//GENFRONT( herk, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( herk, gemm, 4m1, 1 )
GENFRONT( herk, gemm, 1m, 1 )
// syrk
GENFRONT( syrk, gemm, 3mh, 3 )
GENFRONT( syrk, gemm, 3m1, 1 )
GENFRONT( syrk, gemm, 4mh, 4 )
//GENFRONT( syrk, gemm, 4mb, 1 ) // Not implemented.
GENFRONT( syrk, gemm, 4m1, 1 )
GENFRONT( syrk, gemm, 1m, 1 )
// -- trmm ---------------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth, nstage ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
ind_t ind = PASTEMAC0(imeth); \
num_t dt = bli_obj_dt( b ); \
\
dim_t i; \
\
/* If the objects are in the real domain, execute the native
implementation. */ \
if ( bli_obj_is_real( b ) ) \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx, rntm ); \
return; \
} \
\
/* Query a context for the current induced method. This context is
managed and cached by the gks and should not be freed by the caller.
Note that the datatype argument is needed because it will be passed
in when bli_gks_query_ind_cntx() eventually calls the induced method's
_cntx_init() function. */ \
cntx = bli_gks_query_ind_cntx( ind, dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Some induced methods execute in multiple "stages". */ \
for ( i = 0; i < nstage; ++i ) \
{ \
/* Prepare the context for the ith stage of computation. */ \
bli_cntx_ind_stage( ind, i, cntx ); \
\
/* Invoke the operation's front end and request the default control
tree. */ \
PASTEMAC(opname,_front)( side, alpha, a, b, cntx, rntm, NULL ); \
} \
}
// trmm
//GENFRONT( trmm, gemm, 3mh, 3 ) // Unimplementable.
GENFRONT( trmm, gemm, 3m1, 1 )
//GENFRONT( trmm, gemm, 4mh, 4 ) // Unimplementable.
//GENFRONT( trmm, gemm, 4mb, 1 ) // Unimplementable.
GENFRONT( trmm, gemm, 4m1, 1 )
GENFRONT( trmm, gemm, 1m, 1 )
// -- trsm ---------------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth, nstage ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
ind_t ind = PASTEMAC0(imeth); \
num_t dt = bli_obj_dt( b ); \
\
/* If the objects are in the real domain, execute the native
implementation. */ \
if ( bli_obj_is_real( b ) ) \
{ \
PASTEMAC(opname,nat)( side, alpha, a, b, cntx, rntm ); \
return; \
} \
\
/* Query a context for the current induced method. This context is
managed and cached by the gks and should not be freed by the caller.
Note that the datatype argument is needed because it will be passed
in when bli_gks_query_ind_cntx() eventually calls the induced method's
_cntx_init() function. */ \
cntx = bli_gks_query_ind_cntx( ind, dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
{ \
/* NOTE: trsm cannot be implemented via any induced method that
needs to execute in stages (e.g. 3mh, 4mh). */ \
\
/* Invoke the operation's front end and request the default control
tree. */ \
PASTEMAC(opname,_front)( side, alpha, a, b, cntx, rntm, NULL ); \
} \
}
// trsm
//GENFRONT( trmm, trsm, 3mh, 3 ) // Unimplementable.
GENFRONT( trsm, trsm, 3m1, 1 )
//GENFRONT( trmm, trsm, 4mh, 4 ) // Unimplementable.
//GENFRONT( trmm, trsm, 4mb, 1 ) // Unimplementable.
GENFRONT( trsm, trsm, 4m1, 1 )
GENFRONT( trsm, trsm, 1m, 1 )
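
The multi-stage loop in the removed GENFRONT macros applies the caller's beta only on the first stage and uses one afterwards, so that later stages accumulate into C instead of rescaling it. A scalar sketch of that rule follows; the `demo_` functions are hypothetical and model each stage as a single multiply-add rather than a real level-3 operation.

```c
/* One stage of the scalar model: c := beta * c + alpha * a * b. */
static void demo_stage_update( double alpha, double a, double b,
                               double beta, double* c )
{
    *c = beta * ( *c ) + alpha * a * b;
}

/* Run nstage stages; beta is applied once, then replaced by 1.0. */
static void demo_multistage( double alpha, const double* a, const double* b,
                             int nstage, double beta, double* c )
{
    double beta_use = beta;

    for ( int i = 0; i < nstage; ++i )
    {
        /* After the first stage, accumulate rather than rescale. */
        if ( i > 0 ) beta_use = 1.0;

        demo_stage_update( alpha, a[ i ], b[ i ], beta_use, c );
    }
}
```

If beta were reused on every stage, the contributions of earlier stages would be scaled again each time, which is why the loop switches to one after the first iteration.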


@@ -1,175 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2018 - 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// -- gemm/her2k/syr2k ---------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
num_t dt = bli_obj_dt( c ); \
PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
func( alpha, a, b, beta, c, cntx, rntm ); \
}
GENFRONT( gemm, ind )
GENFRONT( gemmt, ind )
GENFRONT( her2k, ind )
GENFRONT( syr2k, ind )
// -- hemm/symm/trmm3 ----------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
num_t dt = bli_obj_dt( c ); \
PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
func( side, alpha, a, b, beta, c, cntx, rntm ); \
}
GENFRONT( hemm, ind )
GENFRONT( symm, ind )
GENFRONT( trmm3, ind )
// -- herk/syrk ----------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
num_t dt = bli_obj_dt( c ); \
PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
func( alpha, a, beta, c, cntx, rntm ); \
}
GENFRONT( herk, ind )
GENFRONT( syrk, ind )
// -- trmm/trsm ----------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
num_t dt = bli_obj_dt( b ); \
PASTECH(opname,_oft) func = PASTEMAC(opname,ind_get_avail)( dt ); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
func( side, alpha, a, b, cntx, rntm ); \
}
GENFRONT( trmm, ind )
GENFRONT( trsm, ind )


@@ -1,98 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//
// Generate object-based prototypes for induced methods that work for
// trmm and trsm (ie: two-operand operations).
//
#undef GENPROT
#define GENPROT( imeth ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(gemmt,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(herk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(trmm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(trsm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, cntx_t* cntx, rntm_t* rntm );
GENPROT( nat )
GENPROT( ind )
GENPROT( 3m1 )
GENPROT( 4m1 )
GENPROT( 1m )
//
// Generate object-based prototypes for induced methods that do NOT work
// for trmm and trsm (ie: two-operand operations).
//
#undef GENPROT_NO2OP
#define GENPROT_NO2OP( imeth ) \
\
BLIS_EXPORT_BLIS void PASTEMAC(gemm,imeth) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(hemm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(herk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(her2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(symm,imeth) ( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(syrk,imeth) ( obj_t* alpha, obj_t* a, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(syr2k,imeth)( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm ); \
BLIS_EXPORT_BLIS void PASTEMAC(trmm3,imeth)( side_t side, obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c, cntx_t* cntx, rntm_t* rntm );
GENPROT_NO2OP( 3mh )
GENPROT_NO2OP( 4mh )
GENPROT_NO2OP( 4mb )
//
// Generate object-based prototypes for 1m methods that specify an algorithm
// (e.g., block-panel or panel-block).
//
/*
#undef GENPROT
#define GENPROT( imeth, alg ) \
\
BLIS_EXPORT_BLIS void PASTEMAC2(gemm,imeth,alg) ( obj_t* alpha, obj_t* a, obj_t* b, obj_t* beta, obj_t* c ); \
*/
//GENPROT( 1m, bp )
//GENPROT( 1m, pb )
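
The prototype blocks above appear to rely on token pasting: one macro takes an operation name and an induced-method suffix and joins them into a single identifier. The sketch below illustrates that technique only. The `DEMO_` macros and `demo_` function names are hypothetical and are not the BLIS definitions of `PASTEMAC` or `GENPROT`.

```c
#include <stdio.h>

/* Hypothetical stand-ins for illustration; not the BLIS macros. */
/* Two levels so that arguments are macro-expanded before pasting. */
#define DEMO_PASTE_(op, suffix) demo_ ## op ## suffix
#define DEMO_PASTE(op, suffix)  DEMO_PASTE_(op, suffix)

/* Generate one function per (operation, suffix) pair. Each generated
   function returns its own name so the expansion is easy to inspect. */
#define DEMO_GENFRONT(op, suffix) \
const char* DEMO_PASTE(op, suffix)(void) \
{ \
    return "demo_" #op #suffix; \
}

/* These two lines define demo_gemm1m() and demo_trsm1m(). */
DEMO_GENFRONT(gemm, 1m)
DEMO_GENFRONT(trsm, 1m)
```

With only one suffix left to instantiate, a generator macro like this produces a single function per operation, which is consistent with the commit's rationale for dropping the extra API layers.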

@@ -1,235 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Copyright (C) 2018 - 2020, Advanced Micro Devices, Inc.
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// NOTE: The function definitions in this file can be consolidated with the
// definitions for the other induced methods. The only advantage of keeping
// them separate is that it allows us to avoid the very small loop overhead
// of executing one iteration of a for loop, plus the overhead of calling a
// function that does nothing (ie: the _cntx_init_stage() function).
// -- gemm/her2k/syr2k/gemmt ---------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Obtain a valid (native) context from the gks if necessary. */ \
if ( cntx == NULL ) cntx = bli_gks_query_cntx(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Invoke the operation's front end. */ \
PASTEMAC(opname,_front) \
( \
alpha, a, b, beta, c, cntx, rntm, NULL \
); \
}
// If a sandbox was enabled, do not define bli_gemmnat() since it will be
// defined in the sandbox environment.
#ifndef BLIS_ENABLE_SANDBOX
GENFRONT( gemm, gemm, nat )
#endif
GENFRONT( gemmt, gemm, nat )
GENFRONT( her2k, gemm, nat )
GENFRONT( syr2k, gemm, nat )
// -- hemm/symm/trmm3 ----------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Obtain a valid (native) context from the gks if necessary. */ \
if ( cntx == NULL ) cntx = bli_gks_query_cntx(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Invoke the operation's front end. */ \
PASTEMAC(opname,_front) \
( \
side, alpha, a, b, beta, c, cntx, rntm, NULL \
); \
}
GENFRONT( hemm, gemm, nat )
GENFRONT( symm, gemm, nat )
GENFRONT( trmm3, gemm, nat )
// -- herk/syrk ----------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
obj_t* alpha, \
obj_t* a, \
obj_t* beta, \
obj_t* c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Obtain a valid (native) context from the gks if necessary. */ \
if ( cntx == NULL ) cntx = bli_gks_query_cntx(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Invoke the operation's front end. */ \
PASTEMAC(opname,_front) \
( \
alpha, a, beta, c, cntx, rntm, NULL \
); \
}
GENFRONT( herk, gemm, nat )
GENFRONT( syrk, gemm, nat )
// -- trmm ---------------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Obtain a valid (native) context from the gks if necessary. */ \
if ( cntx == NULL ) cntx = bli_gks_query_cntx(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Invoke the operation's front end. */ \
PASTEMAC(opname,_front) \
( \
side, alpha, a, b, cntx, rntm, NULL \
); \
}
GENFRONT( trmm, gemm, nat )
// -- trsm ---------------------------------------------------------------------
#undef GENFRONT
#define GENFRONT( opname, cname, imeth ) \
\
void PASTEMAC(opname,imeth) \
( \
side_t side, \
obj_t* alpha, \
obj_t* a, \
obj_t* b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
/* Obtain a valid (native) context from the gks if necessary. */ \
if ( cntx == NULL ) cntx = bli_gks_query_cntx(); \
\
/* Initialize a local runtime with global settings if necessary. Note
that in the case that a runtime is passed in, we make a local copy. */ \
rntm_t rntm_l; \
if ( rntm == NULL ) { bli_rntm_init_from_global( &rntm_l ); rntm = &rntm_l; } \
else { rntm_l = *rntm; rntm = &rntm_l; } \
\
/* Invoke the operation's front end. */ \
PASTEMAC(opname,_front) \
( \
side, alpha, a, b, cntx, rntm, NULL \
); \
}
GENFRONT( trsm, trsm, nat )
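
Each removed front-end wrapper above follows the same pattern: fall back to global settings when no runtime is passed in, and otherwise work on a local copy so the caller's object is never modified. A minimal sketch of that pattern, assuming a hypothetical `demo_rntm_t` with a single field (the real `rntm_t` and `bli_rntm_init_from_global()` are not reproduced here):

```c
#include <stddef.h>

/* Hypothetical runtime settings object; a stand-in for rntm_t. */
typedef struct { int num_threads; } demo_rntm_t;

/* Hypothetical global defaults used when the caller passes NULL. */
static const demo_rntm_t demo_global_rntm = { 4 };

/* Stand-in for an operation's front end. It mutates the runtime it is
   given, which is why the wrapper hands it a private copy. */
static int demo_front(demo_rntm_t* rntm)
{
    rntm->num_threads *= 2;
    return rntm->num_threads;
}

/* Wrapper: initialize a local runtime from the globals if none was
   supplied; otherwise copy the caller's runtime. Either way, only the
   local copy is passed on. */
int demo_op(const demo_rntm_t* rntm)
{
    demo_rntm_t rntm_l;

    if (rntm == NULL) rntm_l = demo_global_rntm;
    else              rntm_l = *rntm;

    return demo_front(&rntm_l);
}
```

Calling `demo_op()` with a caller-owned runtime leaves that runtime unchanged, because the front end only ever sees `rntm_l`.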

@@ -1,664 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
// -- gemm ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t n, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, k, n, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
PASTEMAC0(opname) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( gemm3mh )
INSERT_GENTFUNC_BASIC0( gemm3m1 )
INSERT_GENTFUNC_BASIC0( gemm4mh )
INSERT_GENTFUNC_BASIC0( gemm4mb )
INSERT_GENTFUNC_BASIC0( gemm4m1 )
INSERT_GENTFUNC_BASIC0( gemm1m )
// -- hemm ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
conj_t conja, \
trans_t transb, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_conj( conja, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &ao ); \
\
PASTEMAC0(opname) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( hemm3mh )
INSERT_GENTFUNC_BASIC0( hemm3m1 )
INSERT_GENTFUNC_BASIC0( hemm4mh )
INSERT_GENTFUNC_BASIC0( hemm4m1 )
INSERT_GENTFUNC_BASIC0( hemm1m )
// -- herk ---------------------------------------------------------------------
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype_r* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, betao, co; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_create_1x1_with_attached_buffer( dt_r, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt_r, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC0(opname) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNCR_BASIC0( herk3mh )
INSERT_GENTFUNCR_BASIC0( herk3m1 )
INSERT_GENTFUNCR_BASIC0( herk4mh )
INSERT_GENTFUNCR_BASIC0( herk4m1 )
INSERT_GENTFUNCR_BASIC0( herk1m )
// -- her2k --------------------------------------------------------------------
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype_r* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt_r = PASTEMAC(chr,type); \
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt_r, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_HERMITIAN, &co ); \
\
PASTEMAC0(opname) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNCR_BASIC0( her2k3mh )
INSERT_GENTFUNCR_BASIC0( her2k3m1 )
INSERT_GENTFUNCR_BASIC0( her2k4mh )
INSERT_GENTFUNCR_BASIC0( her2k4m1 )
INSERT_GENTFUNCR_BASIC0( her2k1m )
// -- symm ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
conj_t conja, \
trans_t transb, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_conj( conja, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &ao ); \
\
PASTEMAC0(opname) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( symm3mh )
INSERT_GENTFUNC_BASIC0( symm3m1 )
INSERT_GENTFUNC_BASIC0( symm4mh )
INSERT_GENTFUNC_BASIC0( symm4m1 )
INSERT_GENTFUNC_BASIC0( symm1m )
// -- syrk ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, betao, co; \
\
dim_t m_a, n_a; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC0(opname) \
( \
&alphao, \
&ao, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( syrk3mh )
INSERT_GENTFUNC_BASIC0( syrk3m1 )
INSERT_GENTFUNC_BASIC0( syrk4mh )
INSERT_GENTFUNC_BASIC0( syrk4m1 )
INSERT_GENTFUNC_BASIC0( syrk1m )
// -- syr2k --------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
uplo_t uploc, \
trans_t transa, \
trans_t transb, \
dim_t m, \
dim_t k, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t m_a, n_a; \
dim_t m_b, n_b; \
\
bli_set_dims_with_trans( transa, m, k, &m_a, &n_a ); \
bli_set_dims_with_trans( transb, m, k, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, m_a, n_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, m, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploc, &co ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_SYMMETRIC, &co ); \
\
PASTEMAC0(opname) \
( \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( syr2k3mh )
INSERT_GENTFUNC_BASIC0( syr2k3m1 )
INSERT_GENTFUNC_BASIC0( syr2k4mh )
INSERT_GENTFUNC_BASIC0( syr2k4m1 )
INSERT_GENTFUNC_BASIC0( syr2k1m )
// -- trmm3 --------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
trans_t transb, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
ctype* beta, \
ctype* c, inc_t rs_c, inc_t cs_c, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo, betao, co; \
\
dim_t mn_a; \
dim_t m_b, n_b; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
bli_set_dims_with_trans( transb, m, n, &m_b, &n_b ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
bli_obj_create_1x1_with_attached_buffer( dt, beta, &betao ); \
\
bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m_b, n_b, b, rs_b, cs_b, &bo ); \
bli_obj_create_with_attached_buffer( dt, m, n, c, rs_c, cs_c, &co ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
bli_obj_set_conjtrans( transb, &bo ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC0(opname) \
( \
side, \
&alphao, \
&ao, \
&bo, \
&betao, \
&co, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( trmm33mh )
INSERT_GENTFUNC_BASIC0( trmm33m1 )
INSERT_GENTFUNC_BASIC0( trmm34mh )
INSERT_GENTFUNC_BASIC0( trmm34m1 )
INSERT_GENTFUNC_BASIC0( trmm31m )
// -- trmm ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo; \
\
dim_t mn_a; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
\
bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m, n, b, rs_b, cs_b, &bo ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC0(opname) \
( \
side, \
&alphao, \
&ao, \
&bo, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( trmm3m1 )
INSERT_GENTFUNC_BASIC0( trmm4m1 )
INSERT_GENTFUNC_BASIC0( trmm1m )
// -- trsm ---------------------------------------------------------------------
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
side_t side, \
uplo_t uploa, \
trans_t transa, \
diag_t diaga, \
dim_t m, \
dim_t n, \
ctype* alpha, \
ctype* a, inc_t rs_a, inc_t cs_a, \
ctype* b, inc_t rs_b, inc_t cs_b, \
cntx_t* cntx, \
rntm_t* rntm \
) \
{ \
bli_init_once(); \
\
const num_t dt = PASTEMAC(ch,type); \
\
obj_t alphao, ao, bo; \
\
dim_t mn_a; \
\
bli_set_dim_with_side( side, m, n, &mn_a ); \
\
bli_obj_create_1x1_with_attached_buffer( dt, alpha, &alphao ); \
\
bli_obj_create_with_attached_buffer( dt, mn_a, mn_a, a, rs_a, cs_a, &ao ); \
bli_obj_create_with_attached_buffer( dt, m, n, b, rs_b, cs_b, &bo ); \
\
bli_obj_set_uplo( uploa, &ao ); \
bli_obj_set_diag( diaga, &ao ); \
bli_obj_set_conjtrans( transa, &ao ); \
\
bli_obj_set_struc( BLIS_TRIANGULAR, &ao ); \
\
PASTEMAC0(opname) \
( \
side, \
&alphao, \
&ao, \
&bo, \
cntx, \
rntm \
); \
}
INSERT_GENTFUNC_BASIC0( trsm3m1 )
INSERT_GENTFUNC_BASIC0( trsm4m1 )
INSERT_GENTFUNC_BASIC0( trsm1m )
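
The typed wrappers above all derive the stored dimensions of each operand before attaching buffers to objects: a transposed operand swaps its two dimensions, and a one-sided square operand takes its order from `m` or `n` depending on the side. The following sketch shows that bookkeeping with hypothetical `demo_` helpers. They mirror the calling pattern seen in the diff but are not the BLIS implementations of `bli_set_dims_with_trans()` or `bli_set_dim_with_side()`.

```c
#include <stdbool.h>

/* Hypothetical types for illustration; not the BLIS typedefs. */
typedef long demo_dim_t;
typedef enum { DEMO_LEFT, DEMO_RIGHT } demo_side_t;

/* Given the logical dimensions m x n of op(X), compute the dimensions
   of X as stored: unchanged without transposition, swapped with it. */
void demo_set_dims_with_trans(bool transposed,
                              demo_dim_t m, demo_dim_t n,
                              demo_dim_t* mt, demo_dim_t* nt)
{
    if (!transposed) { *mt = m; *nt = n; }
    else             { *mt = n; *nt = m; }
}

/* For a square operand A applied to an m x n matrix, the order of A
   is m when applied from the left and n when applied from the right. */
void demo_set_dim_with_side(demo_side_t side,
                            demo_dim_t m, demo_dim_t n,
                            demo_dim_t* dim)
{
    *dim = (side == DEMO_LEFT) ? m : n;
}
```

For a gemm-style call with m = 2, n = 3, k = 5 and a transposed A, the first helper reports that A is stored as 5 x 2, which is the shape the wrapper would attach to the buffer.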

File diff suppressed because it is too large (three files)

@@ -47,7 +47,7 @@
// -- Level-3 native micro-kernel prototype redefinitions ----------------------
// -- prototypes for completely generic level-3 microkernels --
// -- Prototypes for completely generic level-3 microkernels --
#undef gemm_ukr_name
#define gemm_ukr_name GENARNAME(gemm)
@@ -66,46 +66,7 @@
// -- Level-3 virtual micro-kernel prototype redefinitions ---------------------
// -- 3mh --
#undef gemm3mh_ukr_name
#define gemm3mh_ukr_name GENARNAME(gemm3mh)
// -- 3m1 --
#undef gemm3m1_ukr_name
#define gemm3m1_ukr_name GENARNAME(gemm3m1)
#undef gemmtrsm3m1_l_ukr_name
#define gemmtrsm3m1_l_ukr_name GENARNAME(gemmtrsm3m1_l)
#undef gemmtrsm3m1_u_ukr_name
#define gemmtrsm3m1_u_ukr_name GENARNAME(gemmtrsm3m1_u)
#undef trsm3m1_l_ukr_name
#define trsm3m1_l_ukr_name GENARNAME(trsm3m1_l)
#undef trsm3m1_u_ukr_name
#define trsm3m1_u_ukr_name GENARNAME(trsm3m1_u)
// -- 4mh --
#undef gemm4mh_ukr_name
#define gemm4mh_ukr_name GENARNAME(gemm4mh)
// -- 4mb --
#undef gemm4mb_ukr_name
#define gemm4mb_ukr_name GENARNAME(gemm4mb)
// -- 4m1 --
#undef gemm4m1_ukr_name
#define gemm4m1_ukr_name GENARNAME(gemm4m1)
#undef gemmtrsm4m1_l_ukr_name
#define gemmtrsm4m1_l_ukr_name GENARNAME(gemmtrsm4m1_l)
#undef gemmtrsm4m1_u_ukr_name
#define gemmtrsm4m1_u_ukr_name GENARNAME(gemmtrsm4m1_u)
#undef trsm4m1_l_ukr_name
#define trsm4m1_l_ukr_name GENARNAME(trsm4m1_l)
#undef trsm4m1_u_ukr_name
#define trsm4m1_u_ukr_name GENARNAME(trsm4m1_u)
// -- Prototypes for induced method level-3 microkernels --
// -- 1m --
@@ -184,59 +145,6 @@
#undef unpackm_16xk_ker_name
#define unpackm_16xk_ker_name GENARNAME(unpackm_16xk)
#undef packm_2xk_3mis_ker_name
#define packm_2xk_3mis_ker_name GENARNAME(packm_2xk_3mis)
#undef packm_4xk_3mis_ker_name
#define packm_4xk_3mis_ker_name GENARNAME(packm_4xk_3mis)
#undef packm_6xk_3mis_ker_name
#define packm_6xk_3mis_ker_name GENARNAME(packm_6xk_3mis)
#undef packm_8xk_3mis_ker_name
#define packm_8xk_3mis_ker_name GENARNAME(packm_8xk_3mis)
#undef packm_10xk_3mis_ker_name
#define packm_10xk_3mis_ker_name GENARNAME(packm_10xk_3mis)
#undef packm_12xk_3mis_ker_name
#define packm_12xk_3mis_ker_name GENARNAME(packm_12xk_3mis)
#undef packm_14xk_3mis_ker_name
#define packm_14xk_3mis_ker_name GENARNAME(packm_14xk_3mis)
#undef packm_16xk_3mis_ker_name
#define packm_16xk_3mis_ker_name GENARNAME(packm_16xk_3mis)
#undef packm_2xk_4mi_ker_name
#define packm_2xk_4mi_ker_name GENARNAME(packm_2xk_4mi)
#undef packm_3xk_4mi_ker_name
#define packm_3xk_4mi_ker_name GENARNAME(packm_3xk_4mi)
#undef packm_4xk_4mi_ker_name
#define packm_4xk_4mi_ker_name GENARNAME(packm_4xk_4mi)
#undef packm_6xk_4mi_ker_name
#define packm_6xk_4mi_ker_name GENARNAME(packm_6xk_4mi)
#undef packm_8xk_4mi_ker_name
#define packm_8xk_4mi_ker_name GENARNAME(packm_8xk_4mi)
#undef packm_10xk_4mi_ker_name
#define packm_10xk_4mi_ker_name GENARNAME(packm_10xk_4mi)
#undef packm_12xk_4mi_ker_name
#define packm_12xk_4mi_ker_name GENARNAME(packm_12xk_4mi)
#undef packm_14xk_4mi_ker_name
#define packm_14xk_4mi_ker_name GENARNAME(packm_14xk_4mi)
#undef packm_16xk_4mi_ker_name
#define packm_16xk_4mi_ker_name GENARNAME(packm_16xk_4mi)
#undef packm_2xk_rih_ker_name
#define packm_2xk_rih_ker_name GENARNAME(packm_2xk_rih)
#undef packm_4xk_rih_ker_name
#define packm_4xk_rih_ker_name GENARNAME(packm_4xk_rih)
#undef packm_6xk_rih_ker_name
#define packm_6xk_rih_ker_name GENARNAME(packm_6xk_rih)
#undef packm_8xk_rih_ker_name
#define packm_8xk_rih_ker_name GENARNAME(packm_8xk_rih)
#undef packm_10xk_rih_ker_name
#define packm_10xk_rih_ker_name GENARNAME(packm_10xk_rih)
#undef packm_12xk_rih_ker_name
#define packm_12xk_rih_ker_name GENARNAME(packm_12xk_rih)
#undef packm_14xk_rih_ker_name
#define packm_14xk_rih_ker_name GENARNAME(packm_14xk_rih)
#undef packm_16xk_rih_ker_name
#define packm_16xk_rih_ker_name GENARNAME(packm_16xk_rih)
#undef packm_2xk_1er_ker_name
#define packm_2xk_1er_ker_name GENARNAME(packm_2xk_1er)
#undef packm_4xk_1er_ker_name
@@ -405,8 +313,8 @@ void GENBARNAME(cntx_init)
// NOTE: We set the virtual micro-kernel slots to contain the addresses
// of the native micro-kernels. In general, the ukernels in the virtual
// ukernel slots are always called, and if the function called happens to
// be a virtual micro-kernel, it will then know to find its native
// ukernel in the native ukernel slots.
// be a virtual micro-kernel, it will then know to find its native ukernel
// (i.e., in the native ukernel slots).
gen_func_init( &funcs[ BLIS_GEMM_UKR ], gemm_ukr_name );
gen_func_init( &funcs[ BLIS_GEMMTRSM_L_UKR ], gemmtrsm_l_ukr_name );
gen_func_init( &funcs[ BLIS_GEMMTRSM_U_UKR ], gemmtrsm_u_ukr_name );
@@ -619,41 +527,7 @@ void GENBAINAME(cntx_init)
funcs = bli_cntx_l3_vir_ukrs_buf( cntx );
// 3mh, 4mh, and 4mb do not support trsm.
bli_func_init_null( &funcs[ BLIS_GEMMTRSM_L_UKR ] );
bli_func_init_null( &funcs[ BLIS_GEMMTRSM_U_UKR ] );
bli_func_init_null( &funcs[ BLIS_TRSM_L_UKR ] );
bli_func_init_null( &funcs[ BLIS_TRSM_U_UKR ] );
if ( method == BLIS_3MH )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm3mh_ukr_name );
}
else if ( method == BLIS_3M1 )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm3m1_ukr_name );
gen_func_init_co( &funcs[ BLIS_GEMMTRSM_L_UKR ], gemmtrsm3m1_l_ukr_name );
gen_func_init_co( &funcs[ BLIS_GEMMTRSM_U_UKR ], gemmtrsm3m1_u_ukr_name );
gen_func_init_co( &funcs[ BLIS_TRSM_L_UKR ], trsm3m1_l_ukr_name );
gen_func_init_co( &funcs[ BLIS_TRSM_U_UKR ], trsm3m1_u_ukr_name );
}
else if ( method == BLIS_4MH )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm4mh_ukr_name );
}
else if ( method == BLIS_4M1B )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm4mb_ukr_name );
}
else if ( method == BLIS_4M1A )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm4m1_ukr_name );
gen_func_init_co( &funcs[ BLIS_GEMMTRSM_L_UKR ], gemmtrsm4m1_l_ukr_name );
gen_func_init_co( &funcs[ BLIS_GEMMTRSM_U_UKR ], gemmtrsm4m1_u_ukr_name );
gen_func_init_co( &funcs[ BLIS_TRSM_L_UKR ], trsm4m1_l_ukr_name );
gen_func_init_co( &funcs[ BLIS_TRSM_U_UKR ], trsm4m1_u_ukr_name );
}
else if ( method == BLIS_1M )
if ( method == BLIS_1M )
{
gen_func_init_co( &funcs[ BLIS_GEMM_UKR ], gemm1m_ukr_name );
gen_func_init_co( &funcs[ BLIS_GEMMTRSM_L_UKR ], gemmtrsm1m_l_ukr_name );
@@ -672,7 +546,14 @@ void GENBAINAME(cntx_init)
// For 1m, we employ an optimization which requires that we copy the native
// real domain gemm ukernel function pointers to the corresponding real
// domain slots in the virtual gemm ukernel func_t.
// domain slots in the virtual gemm ukernel func_t. This optimization allows
// us to, under certain conditions, adjust various parameters within the gemm
// macrokernel so that the real-domain macrokernel (which will query and use
// the real-domain virtual gemm ukernel) can be called instead of calling the
// complex-domain macrokernel and the corresponding complex-domain virtual
// microkernel. The non-optimized code path would require an extra level of
// function call overhead, which can be avoided in most cases (i.e., when
// beta has a zero imaginary component and C is either row- or column-stored).
if ( method == BLIS_1M )
{
func_t* gemm_nat_ukrs = bli_cntx_get_l3_nat_ukrs( BLIS_GEMM_UKR, cntx );
@@ -693,40 +574,7 @@ void GENBAINAME(cntx_init)
bli_func_init_null( &funcs[ i ] );
}
if ( method == BLIS_3MH || method == BLIS_4MH )
{
gen_func_init_co( &funcs[ BLIS_PACKM_2XK_KER ], packm_2xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_4XK_KER ], packm_4xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_6XK_KER ], packm_6xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_8XK_KER ], packm_8xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_10XK_KER ], packm_10xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_12XK_KER ], packm_12xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_14XK_KER ], packm_14xk_rih_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_16XK_KER ], packm_16xk_rih_ker_name );
}
else if ( method == BLIS_3M1 )
{
gen_func_init_co( &funcs[ BLIS_PACKM_2XK_KER ], packm_2xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_4XK_KER ], packm_4xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_6XK_KER ], packm_6xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_8XK_KER ], packm_8xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_10XK_KER ], packm_10xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_12XK_KER ], packm_12xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_14XK_KER ], packm_14xk_3mis_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_16XK_KER ], packm_16xk_3mis_ker_name );
}
else if ( method == BLIS_4M1A || method == BLIS_4M1B )
{
gen_func_init_co( &funcs[ BLIS_PACKM_2XK_KER ], packm_2xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_4XK_KER ], packm_4xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_6XK_KER ], packm_6xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_8XK_KER ], packm_8xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_10XK_KER ], packm_10xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_12XK_KER ], packm_12xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_14XK_KER ], packm_14xk_4mi_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_16XK_KER ], packm_16xk_4mi_ker_name );
}
else if ( method == BLIS_1M )
if ( method == BLIS_1M )
{
gen_func_init_co( &funcs[ BLIS_PACKM_2XK_KER ], packm_2xk_1er_ker_name );
gen_func_init_co( &funcs[ BLIS_PACKM_4XK_KER ], packm_4xk_1er_ker_name );
@@ -756,77 +604,7 @@ void GENBAINAME(cntx_init)
// Modify the context with cache and register blocksizes (and multiples)
// appropriate for the current induced method.
if ( method == BLIS_3MH )
{
bli_cntx_set_ind_blkszs
(
method, 6,
BLIS_NC, 1.0, 1.0,
BLIS_KC, 1.0, 1.0,
BLIS_MC, 1.0, 1.0,
BLIS_NR, 1.0, 1.0,
BLIS_MR, 1.0, 1.0,
BLIS_KR, 1.0, 1.0,
cntx
);
}
else if ( method == BLIS_3M1 )
{
bli_cntx_set_ind_blkszs
(
method, 6,
BLIS_NC, 1.0, 1.0,
BLIS_KC, 3.0, 3.0,
BLIS_MC, 1.0, 1.0,
BLIS_NR, 1.0, 1.0,
BLIS_MR, 1.0, 1.0,
BLIS_KR, 1.0, 1.0,
cntx
);
}
else if ( method == BLIS_4MH )
{
bli_cntx_set_ind_blkszs
(
method, 6,
BLIS_NC, 1.0, 1.0,
BLIS_KC, 1.0, 1.0,
BLIS_MC, 1.0, 1.0,
BLIS_NR, 1.0, 1.0,
BLIS_MR, 1.0, 1.0,
BLIS_KR, 1.0, 1.0,
cntx
);
}
else if ( method == BLIS_4M1B )
{
bli_cntx_set_ind_blkszs
(
method, 6,
BLIS_NC, 2.0, 2.0,
BLIS_KC, 1.0, 1.0,
BLIS_MC, 2.0, 2.0,
BLIS_NR, 1.0, 1.0,
BLIS_MR, 1.0, 1.0,
BLIS_KR, 1.0, 1.0,
cntx
);
}
else if ( method == BLIS_4M1A )
{
bli_cntx_set_ind_blkszs
(
method, 6,
BLIS_NC, 1.0, 1.0,
BLIS_KC, 2.0, 2.0,
BLIS_MC, 1.0, 1.0,
BLIS_NR, 1.0, 1.0,
BLIS_MR, 1.0, 1.0,
BLIS_KR, 1.0, 1.0,
cntx
);
}
else if ( method == BLIS_1M )
if ( method == BLIS_1M )
{
//const bool is_pb = FALSE;
@@ -839,43 +617,6 @@ void GENBAINAME(cntx_init)
{
// No change in blocksizes needed for native execution.
}
// -- Set misc. other fields -----------------------------------------------
if ( method == BLIS_3MH )
{
// Schemas vary with _stage().
}
else if ( method == BLIS_3M1 )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_3MI, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_3MI, cntx );
}
else if ( method == BLIS_4MH )
{
// Schemas vary with _stage().
}
else if ( method == BLIS_4M1A || method == BLIS_4M1B )
{
//bli_cntx_set_schema_a_block( BLIS_PACKED_ROW_PANELS_4MI, cntx );
//bli_cntx_set_schema_b_panel( BLIS_PACKED_COL_PANELS_4MI, cntx );
}
else if ( method == BLIS_1M )
{
//const bool is_pb = FALSE;
// Set the anti-preference field to TRUE when executing a panel-block
// algorithm, and FALSE otherwise. This will cause higher-level generic
// code to establish (if needed) disagreement between the storage of C and
// the micro-kernel output preference so that the two will come back into
// agreement in the panel-block macro-kernel (which implemented in terms
// of the block-panel macro-kernel with some induced transpositions).
//bli_cntx_set_anti_pref( is_pb, cntx );
}
else // if ( method == BLIS_NAT )
{
}
}
// -----------------------------------------------------------------------------


@@ -1,336 +0,0 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name(s) of the copyright holder(s) nor the names of its
contributors may be used to endorse or promote products derived
from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
#undef GENTFUNCCO
#define GENTFUNCCO( ctype, ctype_r, ch, chr, opname, arch, suf ) \
\
void PASTEMAC3(ch,opname,arch,suf) \
( \
dim_t k, \
ctype* restrict alpha, \
ctype* restrict a, \
ctype* restrict b, \
ctype* restrict beta, \
ctype* restrict c, inc_t rs_c, inc_t cs_c, \
auxinfo_t* restrict data, \
cntx_t* restrict cntx \
) \
{ \
const num_t dt_r = PASTEMAC(chr,type); \
\
PASTECH(chr,gemm_ukr_ft) \
rgemm_ukr = bli_cntx_get_l3_nat_ukr_dt( dt_r, BLIS_GEMM_UKR, cntx ); \
\
const dim_t mr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_MR, cntx ); \
const dim_t nr = bli_cntx_get_blksz_def_dt( dt_r, BLIS_NR, cntx ); \
\
const dim_t m = mr; \
const dim_t n = nr; \
\
ctype_r ab_r[ BLIS_STACK_BUF_MAX_SIZE \
/ sizeof( ctype_r ) ] \
__attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \
ctype_r ab_i[ BLIS_STACK_BUF_MAX_SIZE \
/ sizeof( ctype_r ) ] \
__attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \
ctype_r ab_rpi[ BLIS_STACK_BUF_MAX_SIZE \
/ sizeof( ctype_r ) ] \
__attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE))); \
inc_t rs_ab; \
inc_t cs_ab; \
\
const inc_t is_a = bli_auxinfo_is_a( data ); \
const inc_t is_b = bli_auxinfo_is_b( data ); \
\
ctype_r* restrict a_r = ( ctype_r* )a; \
ctype_r* restrict a_i = ( ctype_r* )a + is_a; \
ctype_r* restrict a_rpi = ( ctype_r* )a + 2*is_a; \
\
ctype_r* restrict b_r = ( ctype_r* )b; \
ctype_r* restrict b_i = ( ctype_r* )b + is_b; \
ctype_r* restrict b_rpi = ( ctype_r* )b + 2*is_b; \
\
ctype_r* restrict zero_r = PASTEMAC(chr,0); \
\
ctype_r* restrict alpha_r = &PASTEMAC(ch,real)( *alpha ); \
ctype_r* restrict alpha_i = &PASTEMAC(ch,imag)( *alpha ); \
\
const ctype_r beta_r = PASTEMAC(ch,real)( *beta ); \
const ctype_r beta_i = PASTEMAC(ch,imag)( *beta ); \
\
void* a_next = bli_auxinfo_next_a( data ); \
void* b_next = bli_auxinfo_next_b( data ); \
\
dim_t n_iter; \
dim_t n_elem; \
\
inc_t incc, ldc; \
inc_t incab, ldab; \
\
dim_t i, j; \
\
\
/* SAFETY CHECK: The higher level implementation should never
allow an alpha with non-zero imaginary component to be passed
in, because it can't be applied properly using the 3m method.
If alpha is not real, then something is very wrong. */ \
if ( !PASTEMAC(chr,eq0)( *alpha_i ) ) \
bli_check_error_code( BLIS_NOT_YET_IMPLEMENTED ); \
\
\
/* An optimization: Set local strides and loop bounds based on the
strides of c, so that (a) the micro-kernel accesses ct the same
way it would if it were updating c directly, and (b) c is updated
contiguously. For c with general stride, we access ct the same way
we would as if it were column-stored. */ \
if ( bli_is_row_stored( rs_c, cs_c ) ) \
{ \
rs_ab = n; n_iter = m; incc = cs_c; \
cs_ab = 1; n_elem = n; ldc = rs_c; \
} \
else /* column-stored or general stride */ \
{ \
rs_ab = 1; n_iter = n; incc = rs_c; \
cs_ab = m; n_elem = m; ldc = cs_c; \
} \
incab = 1; \
ldab = n_elem; \
\
\
/* The following gemm micro-kernel calls implement all "phases" of the
3m method:
c = beta * c;
c_r += + a_r * b_r - a_i * b_i;
c_i += (a_r + a_i)(b_r + b_i) - a_r * b_r - a_i * b_i;
NOTE: Scaling by alpha_r is not shown above, but is implemented
below. */ \
\
\
bli_auxinfo_set_next_ab( a_i, b_i, data ); \
\
/* ab_r = alpha_r * a_r * b_r; */ \
rgemm_ukr \
( \
k, \
alpha_r, \
a_r, \
b_r, \
zero_r, \
ab_r, rs_ab, cs_ab, \
data, \
cntx \
); \
\
bli_auxinfo_set_next_ab( a_rpi, b_rpi, data ); \
\
/* ab_i = alpha_r * a_i * b_i; */ \
rgemm_ukr \
( \
k, \
alpha_r, \
a_i, \
b_i, \
zero_r, \
ab_i, rs_ab, cs_ab, \
data, \
cntx \
); \
\
bli_auxinfo_set_next_ab( a_next, b_next, data ); \
\
/* ct_i = alpha_r * a_ri * b_ri; */ \
rgemm_ukr \
( \
k, \
alpha_r, \
a_rpi, \
b_rpi, \
zero_r, \
ab_rpi, rs_ab, cs_ab, \
data, \
cntx \
); \
\
\
/* How we accumulate the intermediate matrix products stored in ab_r,
ab_i, and ab_rpi depends on the value of beta. */ \
if ( !PASTEMAC(chr,eq0)( beta_i ) ) \
{ \
/* c = beta * c;
c_r = c_r + ab_r - ab_i;
c_i = c_i + ab_rpi - ab_r - ab_i; */ \
for ( j = 0; j < n_iter; ++j ) \
for ( i = 0; i < n_elem; ++i ) \
{ \
const ctype_r alphabeta11_r = *(ab_r + i*incab + j*ldab); \
const ctype_r alphabeta11_i = *(ab_i + i*incab + j*ldab); \
const ctype_r alphabeta11_rpi = *(ab_rpi + i*incab + j*ldab); \
ctype* restrict gamma11 = c + i*incc + j*ldc ; \
ctype_r* restrict gamma11_r = &PASTEMAC(ch,real)( *gamma11 ); \
ctype_r* restrict gamma11_i = &PASTEMAC(ch,imag)( *gamma11 ); \
ctype_r gamma11t_r; \
ctype_r gamma11t_i; \
\
PASTEMAC(ch,copyris)( alphabeta11_r, \
-alphabeta11_r, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(ch,subris)( alphabeta11_i, \
alphabeta11_i, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(chr,adds)( alphabeta11_rpi, \
gamma11t_i ); \
\
PASTEMAC(ch,xpbyris)( gamma11t_r, \
gamma11t_i, \
beta_r, \
beta_i, \
*gamma11_r, \
*gamma11_i ); \
} \
} \
else if ( PASTEMAC(chr,eq1)( beta_r ) ) \
{ \
/* c_r = c_r + ab_r - ab_i;
c_i = c_i + ab_rpi - ab_r - ab_i; */ \
for ( j = 0; j < n_iter; ++j ) \
for ( i = 0; i < n_elem; ++i ) \
{ \
const ctype_r alphabeta11_r = *(ab_r + i*incab + j*ldab); \
const ctype_r alphabeta11_i = *(ab_i + i*incab + j*ldab); \
const ctype_r alphabeta11_rpi = *(ab_rpi + i*incab + j*ldab); \
ctype* restrict gamma11 = c + i*incc + j*ldc ; \
ctype_r* restrict gamma11_r = &PASTEMAC(ch,real)( *gamma11 ); \
ctype_r* restrict gamma11_i = &PASTEMAC(ch,imag)( *gamma11 ); \
ctype_r gamma11t_r; \
ctype_r gamma11t_i; \
\
PASTEMAC(ch,copyris)( alphabeta11_r, \
-alphabeta11_r, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(ch,subris)( alphabeta11_i, \
alphabeta11_i, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(chr,adds)( alphabeta11_rpi, \
gamma11t_i ); \
\
PASTEMAC(ch,addris)( gamma11t_r, \
gamma11t_i, \
*gamma11_r, \
*gamma11_i ); \
} \
} \
else if ( !PASTEMAC(chr,eq0)( beta_r ) ) \
{ \
/* c_r = beta_r * c_r + ab_r - ab_i;
c_i = beta_r * c_i + ab_rpi - ab_r - ab_i; */ \
for ( j = 0; j < n_iter; ++j ) \
for ( i = 0; i < n_elem; ++i ) \
{ \
const ctype_r alphabeta11_r = *(ab_r + i*incab + j*ldab); \
const ctype_r alphabeta11_i = *(ab_i + i*incab + j*ldab); \
const ctype_r alphabeta11_rpi = *(ab_rpi + i*incab + j*ldab); \
ctype* restrict gamma11 = c + i*incc + j*ldc ; \
ctype_r* restrict gamma11_r = &PASTEMAC(ch,real)( *gamma11 ); \
ctype_r* restrict gamma11_i = &PASTEMAC(ch,imag)( *gamma11 ); \
ctype_r gamma11t_r; \
ctype_r gamma11t_i; \
\
PASTEMAC(ch,copyris)( alphabeta11_r, \
-alphabeta11_r, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(ch,subris)( alphabeta11_i, \
alphabeta11_i, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(chr,adds)( alphabeta11_rpi, \
gamma11t_i ); \
\
PASTEMAC(chr,xpbys)( gamma11t_r, beta_r, *gamma11_r ); \
PASTEMAC(chr,xpbys)( gamma11t_i, beta_r, *gamma11_i ); \
} \
} \
else /* if ( PASTEMAC(chr,eq0)( beta_r ) ) */ \
{ \
/* c_r = ab_r - ab_i;
c_i = ab_rpi - ab_r - ab_i; */ \
for ( j = 0; j < n_iter; ++j ) \
for ( i = 0; i < n_elem; ++i ) \
{ \
const ctype_r alphabeta11_r = *(ab_r + i*incab + j*ldab); \
const ctype_r alphabeta11_i = *(ab_i + i*incab + j*ldab); \
const ctype_r alphabeta11_rpi = *(ab_rpi + i*incab + j*ldab); \
ctype* restrict gamma11 = c + i*incc + j*ldc ; \
ctype_r* restrict gamma11_r = &PASTEMAC(ch,real)( *gamma11 ); \
ctype_r* restrict gamma11_i = &PASTEMAC(ch,imag)( *gamma11 ); \
ctype_r gamma11t_r; \
ctype_r gamma11t_i; \
\
PASTEMAC(ch,copyris)( alphabeta11_r, \
-alphabeta11_r, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(ch,subris)( alphabeta11_i, \
alphabeta11_i, \
gamma11t_r, \
gamma11t_i ); \
\
PASTEMAC(chr,adds)( alphabeta11_rpi, \
gamma11t_i ); \
\
PASTEMAC(ch,copyris)( gamma11t_r, \
gamma11t_i, \
*gamma11_r, \
*gamma11_i ); \
} \
} \
}
INSERT_GENTFUNCCO_BASIC2( gemm3m1, BLIS_CNAME_INFIX, BLIS_REF_SUFFIX )
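
Note on the removed kernel: the comments in the deleted gemm3m1 reference micro-kernel describe the 3m method as three real products (ab_r, ab_i, ab_rpi) that are combined into the real and imaginary parts of the result. The scalar sketch below illustrates only that arithmetic for a single complex multiply. It is not BLIS code: the cplx_t type and the mul_3m() function are hypothetical names introduced for illustration, and alpha/beta scaling is omitted.

```c
#include <stddef.h>

/* Hypothetical complex type used only for this illustration. */
typedef struct { double r; double i; } cplx_t;

/* Multiply two complex scalars using three real multiplications,
   mirroring the naming in the removed kernel's comments:
     ab_r   = a_r * b_r
     ab_i   = a_i * b_i
     ab_rpi = (a_r + a_i) * (b_r + b_i)
     c_r    = ab_r - ab_i
     c_i    = ab_rpi - ab_r - ab_i                                  */
static cplx_t mul_3m( cplx_t a, cplx_t b )
{
    const double ab_r   = a.r * b.r;
    const double ab_i   = a.i * b.i;
    const double ab_rpi = ( a.r + a.i ) * ( b.r + b.i );

    cplx_t c;
    c.r = ab_r - ab_i;
    c.i = ab_rpi - ab_r - ab_i;
    return c;
}
```

For example, with a = 1 + 2i and b = 3 + 4i the three products are 3, 8, and 21, giving c = -5 + 10i, which matches the conventional four-multiply result. In the removed kernel the same combination was applied elementwise to the MR x NR outputs of three real gemm micro-kernel calls.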

Some files were not shown because too many files have changed in this diff