mirror of
https://github.com/amd/blis.git
synced 2026-05-11 09:39:59 +00:00
Merge remote-tracking branch 'origin/master' into knl
766
CHANGELOG
@@ -1,10 +1,772 @@
commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (HEAD -> master, tag: 0.1.8)
commit 898614a555ea0aa7de4ca07bb3cb8f5708b6a002 (HEAD -> master, tag: 0.2.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:09 2016 -0500

    Version file update (0.2.0)

commit 537a1f4f85ce1aa008901857cb3182e6b4546d7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:21:28 2016 -0500

    Implemented runtime contexts and reorganized code.

    Details:
    - Retrofitted a new data structure, known as a context, into virtually
      all internal APIs for computational operations in BLIS. The structure
      is now present within the type-aware APIs, as well as many supporting
      utility functions that require information stored in the context.
      User-level object APIs were unaffected and continue to be
      "context-free"; however, these APIs were duplicated/mirrored so that
      "context-aware" APIs now also exist, differentiated with an "_ex"
      suffix (for "expert"). These new context-aware object APIs (along with
      the lower-level, type-aware, BLAS-like APIs) contain the address of a
      context as a last parameter, after all other operands. Contexts, or
      specifically, cntx_t object pointers, are passed all the way down the
      function stack into the kernels and allow the code at any level to
      query information about the runtime, such as kernel addresses and
      blocksizes, in a thread-friendly manner (that is, one that allows
      thread safety even if the original source of the information stored in
      the context changes at run-time; see next bullet for more on this
      "original source" of info).
      (Special thanks go to Lee Killough for suggesting the use of this kind
      of data structure in discussions that transpired during the early
      planning stages of BLIS, and also for suggesting such a perfectly
      appropriate name.)
    - Added a new API, in frame/base/bli_gks.c, to define a "global kernel
      structure" (gks). This data structure and API will allow the caller to
      initialize a context with the kernel addresses, blocksizes, and other
      information associated with the currently active kernel configuration.
      The currently active kernel configuration within the gks cannot be
      changed (for now), and is initialized with the traditional cpp macros
      that define kernel function names, blocksizes, and the like. However,
      in the future, the gks API will be expanded to allow runtime management
      of kernels and runtime parameters. The most obvious application of this
      new infrastructure is the runtime detection of hardware (and the
      implied selection of appropriate kernels). With contexts in place,
      kernels may even be "hot swapped" at runtime within the gks. Once
      execution enters a level-3 _front() function, the memory allocator will
      be reinitialized on-the-fly, if necessary, to accommodate the new
      kernels' blocksizes. If another application thread is executing with
      another (previously loaded) kernel, it will finish in a deterministic
      fashion because its kernel information was loaded into its context
      before computation began, and also because the blocks it checked out
      from the internal memory pools will be unaffected by the newer threads'
      reinitialization of the allocator.
    - Reorganized and streamlined the 'ind' directory, which contains much of
      the code enabling use of induced methods for complex domain matrix
      multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
      those APIs' functionality is now mostly subsumed within the global
      kernel structure.
    - Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
      that will reinitialize a memory pool if the necessary pool block size
      has increased.
    - Updated bli_mem.c to use bli_pool_reinit_if() instead of
      bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
      usage of contexts where appropriate to communicate cache and register
      blocksizes to bli_mem_compute_pool_block_sizes().
    - Simplified control trees now that much of the information resides in
      the context and/or the global kernel structure:
      - Removed blocksize object pointer (blksz_t*) fields from all control
        tree node definitions and replaced them with blocksize id (bszid_t)
        values, which may be passed into a context query routine in order
        to extract the corresponding blocksize from the given context.
      - Removed micro-kernel function pointer (func_t*) fields from all
        control tree node definitions. Now, any code that needs these
        function pointers can query them from the local context, as
        identified by a level-3 micro-kernel id (l3ukr_t), level-1f kernel
        id (l1fkr_t), or level-1v kernel id (l1vkr_t).
      - Removed blksz_t object creation and initialization, as well as
        kernel function object creation and initialization, from all
        operation-specific control tree initialization files
        (bli_*_cntl.c), since this information will now live in the gks
        and, secondarily, in the context.
    - Removed blocksize multiples from blksz_t objects. Now, we track
      blocksize multiples for each blocksize id (bszid_t) in the context
      object.
    - Removed the bool_t's that were required when a func_t was initialized.
      These bools were meant to allow one to track the micro-kernel's storage
      preferences (by rows or columns). This preference is now tracked
      separately within the gks and contexts.
    - Merged and reorganized many separate-but-related functions into single
      files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
      util directories, but has the most obvious effect of allowing BLIS
      to compile noticeably faster.
    - Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
      in an attempt to reduce overhead for memory-bound operations. This
      includes removal of default use of object-based variants for level-2
      operations. Now, by default, level-2 operations will directly call a
      low-level (non-object based) loop over a level-1v or -1f kernel.
    - Converted many common query functions in bli_blksz.c (renamed from
      bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
      respective header files.
    - Defined bli_mbool.c API to create and query "multi-bools", or
      heterogeneous bool_t's (one for each floating-point datatype), in the
      same spirit as blksz_t and func_t.
    - Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
      and BLIS_SIMD_SIZE. These values are needed in order to compute a third
      new parameter, which may be set indirectly via the aforementioned
      macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
      statically allocate memory in macro-kernels and the induced methods'
      virtual kernels to be used as temporary space to hold a single
      micro-tile. These values are now output by the testsuite. The default
      value of BLIS_STACK_BUF_MAX_SIZE is computed as
      "2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
    - Cleaned up top-level 'kernels' directory (for example, renaming the
      embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
      and "haswell," respectively) and gave more consistent and meaningful
      names to many kernel files (as well as updating their interfaces to
      conform to the new context-aware kernel APIs).
    - Updated the testsuite to query blocksizes from a locally-initialized
      context for test modules that need those values: axpyf, dotxf,
      dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
    - Reformatted many function signatures into a standard format that will
      more easily facilitate future API-wide changes.
    - Updated many "mxn" level-0 macros (i.e., those used to inline double
      loops for level-1m-like operations on small matrices) in
      frame/include/level0 to use more obscure local variable names in an
      effort to avoid variable shadowing. (Thanks to Devin Matthews for
      pointing out these gcc warnings, which are only output using -Wshadow.)
    - Added a conj argument to setm, so that its interface now mirrors that
      of scalm. The semantic meaning of the conj argument is to optionally
      allow implicit conjugation of the scalar prior to being populated into
      the object.
    - Deprecated all type-aware mixed domain and mixed precision APIs. Note
      that this does not preclude supporting mixed types via the object APIs,
      where it produces absolutely zero API code bloat.

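The context idea in commit 537a1f4 can be illustrated with a minimal sketch. Every name below (toy_cntx_t, toy_bszid_t, toy_active_blksz, toy_cntx_init, toy_cntx_get_blksz, toy_num_row_panels_ex) is hypothetical and is not part of the BLIS API; the sketch only assumes what the commit message states: a context is filled from a global source before computation begins, is passed as the last parameter, and is queried by id.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocksize ids (stand-ins for the bszid_t idea in the text). */
typedef enum { TOY_MR = 0, TOY_NR = 1, TOY_NUM_BLKSZS = 2 } toy_bszid_t;

/* Hypothetical context: a caller-owned snapshot of runtime parameters. */
typedef struct {
    size_t blksz[TOY_NUM_BLKSZS];
} toy_cntx_t;

/* Stand-in for the "original source" of the information (the gks role). */
size_t toy_active_blksz[TOY_NUM_BLKSZS] = { 8, 4 };

/* Copy the currently active configuration into the context. */
void toy_cntx_init(toy_cntx_t *cntx)
{
    for (int i = 0; i < TOY_NUM_BLKSZS; ++i)
        cntx->blksz[i] = toy_active_blksz[i];
}

/* Query a blocksize from the context by id. */
size_t toy_cntx_get_blksz(toy_bszid_t id, const toy_cntx_t *cntx)
{
    assert((int)id >= 0 && (int)id < TOY_NUM_BLKSZS);
    return cntx->blksz[id];
}

/* "Expert"-style routine: the context is the last parameter.
   Returns how many MR-row panels are needed to cover m rows. */
size_t toy_num_row_panels_ex(size_t m, const toy_cntx_t *cntx)
{
    size_t mr = toy_cntx_get_blksz(TOY_MR, cntx);
    return (m + mr - 1) / mr;
}
```

Because the routine reads only its own context, a later change to toy_active_blksz does not affect a computation whose context was initialized earlier, which is the thread-friendly property the commit message describes.
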
commit d1f8e5d9b2ecd054ed103f4d642d748db2d4f173 (origin/master)
Merge: 20af937 c11d28e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 5 12:21:27 2016 -0500

    Merge pull request #60 from esauvage/master

    sgemm µkernel for bulldozer : bug correction for k%4 != 0

commit c11d28eed89d65494bc4019f04d046520866c0ff
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Sat Apr 2 21:15:48 2016 +0200

    cgemm µkernel for bulldozer : bug correction for k%4 != 0

commit 20af937b57f82bb3acb09418d5c0206e1b24f2c7
Merge: 36c3abb fc61a11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 14:37:30 2016 -0500

    Merge pull request #59 from devinamatthews/fix_testsuite_makefile

    Fix testsuite makefile

commit fc61a1143edeba4946d4b9915f1775bb08e643fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:53:01 2016 -0500

    Fix formatting in configure.

commit 26379b14de630e3a6c6eef5dfe87ff001558a8a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:45:48 2016 -0500

    Adjust paths in common.mk to support building from testsuite dir.

commit 36c3abb05fecb02d4a9ab13b2b69d133adf34583
Merge: 64b41fa 917ce75
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 10:26:17 2016 -0500

    Merge pull request #58 from esauvage/master

    cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…

commit 356d854fc9e34642cc46e0e02a8ceb56114878af
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:33:15 2016 -0500

    Make symlink to common.mk in build directory.

commit edbb8470044f82ef959583ee09613a5a985292b5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:27:11 2016 -0500

    Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile.

commit 917ce75482a543fef46553efff6c246939761e59
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Wed Mar 30 22:03:09 2016 +0200

    cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel

commit 64b41fa554dff44b2f9ad48901b67c63836407a8
Merge: 1b09e34 0171ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 15:19:41 2016 -0500

    Merge pull request #54 from devinamatthews/more_config_opts

    More config opts

commit 1b09e343dfe5b48b4842e2cb96f41c8cc249bad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 12:55:28 2016 -0500

    Updated gcc version from 4.8 to 4.9 in .travis.yml.

commit 0171ad58997b3a5a9b76301511dbe0751fffc940
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Mar 28 13:55:06 2016 -0500

    Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW.

commit 3090fff64cc87ff2519a09f38e6b8699cf3cba11
Merge: 8624e36 4ca5d5b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 28 12:36:25 2016 -0500

    Merge pull request #44 from esauvage/master

    sgemm micro-kernel for FMA4 instruction set

commit e6e566426ac3ded7ef87cd8ff9be98accfdc4acc
Merge: 469429e 8624e36
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 26 14:10:15 2016 -0500

    Merge branch 'master' into more_config_opts

commit 8624e36543160739d954c4dbcc5a5594458f3a12
Merge: a315833 2bd036f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 26 13:56:28 2016 -0500

    Merge pull request #50 from devinamatthews/fix_noopt_avx

    Fix configuration issue where instruction set flags are not specified for debug builds.

commit 469429ec34e5b1a172ce35596f9c7afdaacac131
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:45:41 2016 -0500

    Fix LD_FLAGS -> LDFLAGS.

commit 8442d65c9ead0376fc5f2dfad62fd4862ab9b2b3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:06:48 2016 -0500

    Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures.

commit 76099f20be1b49ac960f7e3c5a8296bbf4e1782d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 17:22:58 2016 -0500

    Add threading option to configure.

commit ad43eab4c7899d56d8d7caa6e2d92bc0581ea5a5
Merge: 9452bdb 2bd036f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 15:00:02 2016 -0500

    Merge branch 'fix_noopt_avx' into more_config_opts

commit 9452bdb3afbf2d7f898134a091d7790817e7be9c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 14:59:50 2016 -0500

    Add options for verbose make output and static/shared linking to configure.

commit 2bd036f1f9ce1ee0864365557f66d9415dd42de3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 12:16:49 2016 -0500

    Fix configuration issue where instruction set flags are not specified for debug builds.

commit a315833f067944fb0bc14cf60f0c7dcb5dc897b6
Merge: 1d1a426 af92773
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 24 12:30:21 2016 -0500

    Merge pull request #48 from figual/master

    Updated and improved ARMv8 micro-kernels.

commit af92773f4f85a2441fe0c6e3a52c31b07253d08e
Author: figual <figual@ucm.es>
Date: Wed Mar 23 22:07:02 2016 +0100

    Updated and improved ARMv8 micro-kernels.

commit 1d1a426d18ec03754021456862a1f4d1dfec1fbf
Merge: 5a978ff d226dfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 7 15:17:53 2016 -0600

    Merge pull request #46 from devinamatthews/new-config-opts

    Add several changes to the build system.

commit d226dfa05190eb477b33563b1edccf8603973336
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 5 16:18:14 2016 -0600

    Add several changes to the build system.

    1) Add -- options.
    2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
    3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
    4) Add make V=[0,1] option to control build verbosity.

commit 5a978fffdb8f09a81c89541d541d4a6830cd70a4
Merge: adb2b4e 63e2642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 4 17:26:58 2016 -0600

    Merge pull request #45 from devinamatthews/high_prec_timers

    Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday

commit 63e264239053b913164a849dd8a45829087eaddc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 13:17:50 2016 -0600

    Make sure that -lrt is linked on Linux.

commit 44fddd48dc1708a956803d1948f04429ec0d8700
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 12:36:38 2016 -0600

    Add missing \.

commit 7cabd2131f953de23e7015d760b0ddfda51b1251
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 3 11:43:07 2016 -0600

    Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday.

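The timer change in commits 7cabd21 and 63e2642 might look roughly like the sketch below. The helper name toy_clock_seconds is hypothetical and this is not the code from the BLIS tree; it only shows the two system calls named in the commit titles, clock_gettime(CLOCK_MONOTONIC) on POSIX systems and mach_absolute_time on macOS.

```c
/* Expose clock_gettime when compiling in a strict ISO C mode on POSIX. */
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L
#endif

#include <assert.h>
#if defined(__APPLE__)
#include <mach/mach_time.h>
#else
#include <time.h>
#endif

/* Hypothetical helper: seconds read from a monotonic clock. Unlike
   gettimeofday, the value is not affected by wall-clock adjustments. */
double toy_clock_seconds(void)
{
#if defined(__APPLE__)
    mach_timebase_info_data_t tb;
    mach_timebase_info(&tb);   /* numer/denom convert ticks to nanoseconds */
    return (double)mach_absolute_time() * tb.numer / tb.denom * 1.0e-9;
#else
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);
    assert(rc == 0);
    (void)rc;                  /* silence unused warning under NDEBUG */
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
#endif
}
```

An interval is measured by calling the helper before and after the timed region and subtracting. On older glibc versions clock_gettime lives in librt, which is why the follow-up commit links -lrt on Linux.
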
commit adb2b4e096c78e8b2f85fd372cf0d5eb04af5be8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 2 14:48:12 2016 -0600

    Fixing guard for non implemented partitioning through packed matrices

commit 4ca5d5b1fd6f2e4a8b2e139c5405475239581e51
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Tue Mar 1 21:33:01 2016 +0100

    sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel

commit 627d59b5ba06866b26f46e4434a0435b600925e3
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Mon Feb 29 21:53:12 2016 +0100

    symbolic link for bulldozer configuration to kernels

commit 2dc5c0ae038ed175fab85751803ada05734d1ba1
Merge: f2809fc 3d0fae8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 29 12:22:51 2016 -0600

    Merge pull request #40 from tkelman/bulldozer-symlink

    Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer

commit f2809fc5f74466c755da6a5b4632853e634060b5
Merge: f86b94f 8624a33
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Feb 27 13:06:03 2016 -0600

    Merge pull request #39 from devinamatthews/fix_f2c_conflicts

    Devin's f2c type namespace update.

    Details:
    - Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
    - Removed most of the body of bli_f2c.h, which was unused.

commit 3d0fae810d942085d8f2d389820b4e0027577db8
Author: Tony Kelman <tony@kelman.net>
Date: Thu Feb 25 23:24:03 2016 -0800

    Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer

    to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI

commit 8624a33ccc12dff6f6c4f92992ca5636af1576a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 13:51:26 2016 -0600

    Fix remaining f2c conflicts.

commit 372eef0b6c0a535bf88d4b46b72f61266e8491ba
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 12:01:58 2016 -0600

    Fixed most conflicts after hack-n-slash of bli_f2c.h, cleanup in
    progress.

commit f86b94f206e2e09fa3221cc55c3dc5b05ca4775a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 23 18:12:34 2016 -0600

    Included missing blas2blis integer def to CBLAS.

    Details:
    - Added #include "bli_config_macro_defs" to all cblas_*.c files in
      compat/cblas/src. This has the effect of defining
      BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
      not define it. Thanks to Tony Kelman for reporting this bug.
    - In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
      to 'f77_int'. This eliminates a compiler warning and a potential
      runtime bug and/or crash when the size of an int differs from the size
      of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).

commit 0b126de1342c11c65623bcb38e258e21e9244e3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 16:29:12 2015 -0600

    Consolidated packm_blk_var1 and packm_blk_var2.

    Details:
    - Consolidated the two blocked variants for packm into a single
      implementation (packm_blk_var1) and removed the other variant.
    - Updated all induced method _cntl_init() functions in frame/cntl/ind/
      to use the new blocked variant 1.
    - Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
      to detect pack_t schemas for induced methods and native execution,
      respectively.

commit 30e5eb29e060b97752f702d2ea5d101d950f53b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 12:14:19 2015 -0600

    Minor changes to treatment of rs, cs in bli_obj.c.

    Details:
    - Applied a patch submitted by Devin Matthews that:
      - implements subtle changes to handling of somewhat unusual cases of
        row and column strides to accommodate certain tensor cases, which
        includes adding dimension parameters to _is_col_tilted() and
        _is_row_tilted() macros,
      - simplifies how buffers are sized when requesting BLIS-allocated
        objects,
      - re-consolidates bli_adjust_strides_*() into one function, and
      - defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
        environments.

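The last item of commit 30e5eb2, defining 'restrict' as a "nothing" macro, can be sketched as follows. The exact preprocessor test used in BLIS is not shown in this log, so the condition below is an assumption based on the standard __cplusplus and __STDC_VERSION__ macros, and toy_copy is a hypothetical routine used only to exercise the keyword.

```c
#include <assert.h>
#include <stddef.h>

/* Make 'restrict' expand to nothing where the keyword does not exist:
   C++ and pre-C99 C. Defined after the system headers on purpose, so the
   macro cannot interfere with their contents. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || (__STDC_VERSION__ < 199901L)
#define restrict
#endif

/* Hypothetical routine: copies n doubles from x to y; the two buffers
   are promised not to overlap. */
void toy_copy(size_t n, const double *restrict x, double *restrict y)
{
    size_t i;
    for (i = 0; i < n; ++i)
        y[i] = x[i];
}
```

With this guard the same source compiles under C89, C99 and later, and C++; only C99-and-later builds actually pass the no-aliasing hint to the compiler.
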
commit f0a4f41b5acf55b41707ec821c4c5f9076dfbc24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 15:22:50 2015 -0600

    Fixed unimplemented case in core2 sgemm ukernel.

    Details:
    - Implemented the "beta == 0" case for general stride output for the
      dunnington sgemm micro-kernel. This case had been, up until now,
      identical to the "beta != 0" case, which does not work when the
      output matrix has nan's and inf's. It had manifested as nan residuals
      in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
      Matthews for reporting this bug.

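Why "beta == 0" needs its own branch can be seen in a scalar sketch. The function toy_update is hypothetical and stands in for one element of the micro-kernel's output update c := beta * c + ab; it is not the assembly from the dunnington kernel.

```c
#include <assert.h>
#include <math.h>

/* Hypothetical one-element output update: c := beta * c + ab. */
void toy_update(float beta, float ab, float *c)
{
    if (beta == 0.0f)
        *c = ab;                 /* overwrite: the old value is never read */
    else
        *c = beta * (*c) + ab;   /* general case: scale, then accumulate */
}
```

Under IEEE 754 arithmetic, 0 * NaN is NaN and 0 * inf is NaN, so computing beta * c + ab with beta equal to zero propagates whatever garbage the output buffer held. The explicit branch treats the output as write-only in that case.
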
commit 42810bbfa0b8f006ecc5128d903909ec13ea63f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 12:07:46 2015 -0600

    Fixed minor bugs for uncommon obj_create cases.

    Details:
    - Separated bli_adjust_strides() into _alloc() and _attach() flavors so
      that the latter can avoid a test performed by the former, in which the
      rs and cs are overridden and set to zero if either matrix dimension is
      zero. Actually, we also disable this overriding behavior, even for the
      _alloc() case, since keeping the original strides (probably) does not
      hurt anything. The original code has been kept commented-out, though,
      in case an unintended consequence is later discovered.
    - Fixed a typo in an error check for general stride cases where rs == cs.

commit 3e6dd11467643fbc2cb45c13cec8dd6024232833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 3 10:30:08 2015 -0600

    Minor re-expression in quadratic partitioning code.

    Details:
    - Minor change to quadratic equation solution code that avoids
      recomputation of the sqrt() parameter when the compiler is not
      smart enough to perform this optimization automatically.

commit 0694b722f7e4df00efb32639095a2aca80e67f52
Merge: 3e116f0 33557ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:24:25 2015 -0600

    Merge branch 'master' of github.com:flame/blis

commit 3e116f0a2953f50b3c068759a775ad7ffae04e49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:18:23 2015 -0600

    Fixed imaginary bug in quadratic partitioning code.

    Details:
    - Fixed a bug in the relatively new quadratic partitioning code that,
      under the right conditions, would perform sqrt() on a negative value.
      If the solution is imaginary, we discard it and use an alternate
      partition width that assumes no diagonal intersection. That alternate
      width is actually already computed, so, the fix was quite simple.
      Thanks to Devangi Parikh for reporting this bug.

commit 33557ecccaf49b2569b7f3d7bcea52c2aab94c68
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Mon Nov 2 12:18:43 2015 -0800

    add Travis CI build status icon to the README

commit 4a502fbe77bd0f701108baaa559d9cfb483f88de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 13:28:34 2015 -0600

    Laid groundwork for runtime memory pool resizing.

    Details:
    - Changed bli_pool_finalize() so that the freeing begins with the block
      at top_index instead of block 0. This allows us to use the function
      for terminal finalization as well as temporary cleanup prior to
      reinitialization. Also, clear the pool_t struct upon _pool_finalize()
      in case it is called in the terminal case with some blocks still
      checked out to threads (in which case the threads will see the new
      block size as 0 and thus release the block as intended).
    - Added bli_pool_reinit(), which calls _pool_finalize() followed by
      _pool_init() with new parameters.
    - Added bli_mem_reinit(), which is based on bli_pool_reinit().
    - Added new wrapper, _mem_compute_pool_block_sizes(), which calls
      _mem_compute_pool_block_sizes_dt().
    - Updated bli_mem_release() so that the pblk_t is freed, via
      _pool_free_block(), if the block size recorded in the mem_t at the
      time the pblk_t was acquired is now different from the value in the
      pool_t.

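The release rule in the last item of commit 4a502fb can be modeled with a deliberately tiny pool. All names here (toy_pool_t, toy_mem_t, toy_pool_reinit, toy_checkout, toy_release) are hypothetical, and the pool caches at most one idle block to keep the sketch short; it is not the bli_pool.c implementation.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical pool that caches at most one idle block. */
typedef struct {
    size_t block_size;   /* current block size; changes on reinit */
    void  *cached;       /* idle block, or NULL */
} toy_pool_t;

/* Hypothetical handle for a checked-out block. */
typedef struct {
    void  *buf;
    size_t block_size;   /* block size recorded at checkout time */
} toy_mem_t;

/* Drop any idle block and adopt a new block size. */
void toy_pool_reinit(toy_pool_t *pool, size_t new_size)
{
    free(pool->cached);
    pool->cached = NULL;
    pool->block_size = new_size;
}

/* Check out a block, reusing the idle one when available. */
void toy_checkout(toy_pool_t *pool, toy_mem_t *mem)
{
    if (pool->cached != NULL) {
        mem->buf = pool->cached;
        pool->cached = NULL;
    } else {
        mem->buf = malloc(pool->block_size);
        assert(mem->buf != NULL);
    }
    mem->block_size = pool->block_size;
}

/* Return 1 if the block went back into the pool, 0 if it was freed
   because its recorded size no longer matches the pool (or no slot). */
int toy_release(toy_pool_t *pool, toy_mem_t *mem)
{
    int reusable = (mem->block_size == pool->block_size) && (pool->cached == NULL);
    if (reusable)
        pool->cached = mem->buf;
    else
        free(mem->buf);
    mem->buf = NULL;
    return reusable;
}
```

A block checked out before a reinitialization carries its old size in the handle, so on release it is freed instead of being mixed into a pool whose blocks now have a different size.
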
commit 37e55ca39bdbddaec03ad30d43e8ad2b3e549c96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 30 18:25:04 2015 -0500

    Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.

    Details:
    - Fixed a family of bugs in the triangular level-3 operations for
      certain complex implementations (3m1 and 4m1a) that only manifest if
      one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
      - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
        for the triangular case.
      - Fixed the incorrect computation of imaginary stride, as stored in
        the auxinfo_t struct in trmm and trsm macro-kernels.
      - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
        cases where the register blocksize for the triangular matrix is
        odd. Introduced a new byte-granular pointer arithmetic macro,
        bli_ptr_add(), that computes the correct value.
    - Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
      terms of __typeof__, which is used by bli_ptr_add() macro.
    - Disabled the row- vs. column-storage optimization in bli_trmm_front()
      for singleton problems because the inherent ambiguity of whether a
      scalar is row-stored or column-stored causes the wrong parameter
      combination code to be executed (by dumb luck of our checking for
      row storage first).
    - Added commented-out debugging lines to 3m1/4m1a and reference
      micro-kernels, and trsm_ll macro-kernel.

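A byte-granular pointer-advance macro in the spirit of the bli_ptr_add() mentioned in commit 37e55ca might look like the sketch below. The macro name toy_ptr_add_bytes, its argument order, and the toy_dcomplex type are hypothetical; the real bli_ptr_add() signature is not shown in this log. The sketch relies on the GNU __typeof__ extension, which the commit message says the BLIS macro also uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex element type: two doubles. */
typedef struct { double real, imag; } toy_dcomplex;

/* Advance p by nbytes bytes and keep p's original pointer type.
   Ordinary pointer arithmetic on a toy_dcomplex* can only move in whole
   elements; casting through char* allows any byte offset. */
#define toy_ptr_add_bytes(p, nbytes) \
    ((__typeof__(p))((char *)(p) + (nbytes)))
```

For example, advancing a toy_dcomplex pointer by three doubles (one and a half complex elements) cannot be written as p + k for any integer k, but it can be expressed as toy_ptr_add_bytes(p, 3 * sizeof(double)). Offsets of this kind are what arise when a register blocksize is odd.
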
commit 46294d80e5a79c598e200e1c8ec2a642ff839971
Merge: d3159c5 a0a7b85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 27 12:41:23 2015 -0500

    Merge pull request #35 from figual/master

    Fixed incomplete code in the double precision ARMv8 microkernel.

commit a0a7b85ac3e157af53cff8db0e008f4a3f90372c
Author: Francisco Igual <figual@ucm.es>
Date: Tue Oct 27 08:59:15 2015 +0000

    Fixed incomplete code in the double precision ARMv8 microkernel.

commit d3159c5740c9ee7f8c0b661003aab6f00646ad6f
Merge: b489152 7e03e45
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:54:00 2015 -0500

    Merge branch 'master' of github.com:flame/blis

commit b489152e112644ec3b6d19e687231a9607f7694f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:53:17 2015 -0500

    Use vzeroall in haswell micro-kernels.

commit 7e03e45bfe6c27c4fdbf06b1caa7f49e9a5fef49
Merge: 77ddb0b 4f88c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 14 13:26:07 2015 -0500

    Merge pull request #33 from xianyi/master

    Enable Travis CI

commit 4f88c29f9e634cbb6fb22d8c88931f0ec78ad7db
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:57:50 2015 -0500

    Detect Intel Broadwell (using Haswell config).

commit 4b0ac1a9984a93f7ad4369b10fca63991107d9f5
Merge: fe3e355 77ddb0b
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:51:05 2015 -0500

    Merge branch 'upstream_master'

commit 77ddb0b1d31ada111dadf392766ba6d9210ed9fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 13 12:53:06 2015 -0500

    Removed flop-counting mechanism.

    Details:
    - Removed the optional flop-counting feature introduced in commit
      7574c994.

commit 276da366187460a4c8e6e0910e79cb39ce780bfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:43:03 2015 -0500

    Minor formatting change to README.md.

commit d17057446f5404824478e8a6cd08f242ab75544a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:39:49 2015 -0500

    Added "Getting Started" section to README.md.

    Details:
    - Added section to README.md file containing links to wikis with brief
      descriptions.

commit e7e1f2f7b601b21b50e3cdad8972cb3fe11018d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 2 16:51:52 2015 -0500

    Minor updates to CREDITS, README files.

commit 55329906ecd7ce1ab910e4d30a29354a9172e7ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 26 20:47:19 2015 -0500

    Minor edits to README.md, testsuite.

    Details:
    - Fixed typos in README.md.
    - Fixed column heading alignment for testsuite when matlab output is
      enabled.
    - Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.

commit bbebdb5793a8fd6aaf257012ab0272beaa04a0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 25 14:47:27 2015 -0500

    Replaced README with README.md.

    Details:
    - Replaced the old (and short) README file with a much more comprehensive
      version written in github-flavored markdown. The new file is based on
      content taken from the old Google Code homepage.

commit e2e9d64a63485461192d9c2a6dd0183a8b71013c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 24 12:14:03 2015 -0500

    Load balance thread ranges for arbitrary diagonals.

    Details:
    - Expanded/updated interface for bli_get_range_weighted() and
      bli_get_range() so that the direction of movement is specified in the
      function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
      and also so that the object being partitioned is passed instead of an
      uplo parameter. Updated invocations in level-3 blocked variants, as
      appropriate.
    - (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
      carefully take into account the location of the diagonal when computing
      ranges so that the area of each subpartition (which, in all present
      level-3 operations, is proportional to the amount of computation
      engendered) is as equal as possible.
    - Added calls to a new class of routines to all non-gemm level-3 blocked
      variants:
        bli_<oper>_prune_unref_mparts_[mnk]()
      where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
      dimension is being partitioned. These routines call a more basic
      routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
      regions from matrices and simultaneously adjust other matrices which
      share the same dimension accordingly.
    - Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of the
      new pruning routines.
    - Fixed incorrect blocking factors passed into bli_get_range_*() in
      bli_trsm_blk_var[12][fb].c.
    - Added a new test driver in test/thread_ranges that can exercise the new
      bli_get_range_*() and bli_get_range_weighted_*() under a range of
      conditions.
    - Reimplemented m and n fields of obj_t as elements in a "dim"
      array field so that dimensions could be queried via index constant
      (e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
      macros accordingly.
    - Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
    - Added bli_round() macro, which calls C math library function round(),
      and bli_round_to_mult(), which rounds a value to the nearest multiple
      of some other value.
    - Added miscellaneous pruning- and mdim_t-related macros.
|
||||
- Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
|
||||
bli_obj_row_off(), bli_obj_col_off().
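
For illustration only, the "round to the nearest multiple" helper mentioned in the entry above could be sketched in C as follows. The name round_to_mult_sketch, the restriction to non-negative values, and the tie-breaking rule (exact halves round up) are assumptions made for this sketch; they are not taken from the BLIS source, where bli_round_to_mult() may be defined differently.

```c
#include <stddef.h>

/* Hypothetical sketch (not the BLIS macro): round val to the nearest
   multiple of mult using integer arithmetic only. Assumes mult > 0.
   When val lies exactly halfway between two multiples (possible only
   when mult is even), the larger multiple is returned. */
static size_t round_to_mult_sketch( size_t val, size_t mult )
{
    return ( ( val + mult / 2 ) / mult ) * mult;
}
```

A helper of this kind is one way a computed range boundary could be snapped to a multiple of a blocking factor; for example, round_to_mult_sketch( 10, 4 ) yields 12 and round_to_mult_sketch( 9, 4 ) yields 8.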

commit fe3e355c9c5a6f65b8736b009e2d501b62a83ea1
Merge: efa641e 4dd9dd3
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Fri Aug 21 14:38:36 2015 -0500

Merge branch 'upstream_master'

commit efa641e36b73abee34166a252e90e28a6281d92d
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 03:15:50 2015 +0800

Try to fix the compiling bug on travis.

commit 4dd9dd3e1de626b51bfe85d9ee65f193d60e8d38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 21 11:52:37 2015 -0500

Fixed minor alignment ambiguity bug in bli_pool.c.

Details:
- Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
  pointer arithmetic was performed on a void* as if it were a byte
  pointer (such as char*). Some compilers may have already been
  interpreting this situation as intended, despite the sloppiness.
  Thanks to Aleksei Rechinskii for reporting this issue.
- Redefined pointer alignment macros to typecast to uintptr_t instead of
  siz_t.
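
The two fixes in the entry above can be illustrated with a minimal C sketch. Standard C does not define arithmetic on void*, although some compilers accept it as an extension and treat it like char* arithmetic; casting to char* first removes the ambiguity. The helper names offset_bytes_sketch and align_up_sketch are hypothetical and are not the BLIS functions or macros.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not from BLIS): advance a generic pointer by
   nbytes. Casting to char* first makes the byte-wise arithmetic well
   defined instead of relying on a compiler extension. */
static void* offset_bytes_sketch( void* p, size_t nbytes )
{
    return ( char* )p + nbytes;
}

/* Hypothetical helper (not from BLIS): round a pointer up to the next
   multiple of align, which must be nonzero. uintptr_t is the integer
   type intended to hold a converted pointer value, which is why it is
   used here rather than a general size type. */
static void* align_up_sketch( void* p, size_t align )
{
    uintptr_t addr = ( uintptr_t )p;
    uintptr_t rem  = addr % align;

    if ( rem != 0 ) addr += align - rem;

    return ( void* )addr;
}
```

The alignment helper never moves the pointer backward and advances it by fewer than align bytes, so a caller that over-allocates by align - 1 bytes can always find an aligned address inside its buffer.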

commit 12ffd568b04feda57147c13b67717416a01c82f8
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 00:24:28 2015 +0800

Add Travis CI.

commit ecc3ebb749e0861c27deda52b5f87236ede4901b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:12 2015 -0500

CHANGELOG update (0.1.8)

commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (tag: 0.1.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500

Version file update (0.1.8)

commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e (origin/master)
commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f d4b8913
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500

65
Makefile
@@ -154,22 +154,13 @@ BLIS_DLL_NAME := $(BLIS_LIB_BASE_NAME).so
|
||||
# --- BLIS framework source and object variable names ---
|
||||
|
||||
# These are the makefile variables that source code files will be accumulated
|
||||
# into by the makefile fragments. Notice that we include separate variables
|
||||
# for regular and "special" source.
|
||||
# into by the makefile fragments.
|
||||
MK_FRAME_SRC :=
|
||||
MK_FRAME_NOOPT_SRC :=
|
||||
MK_FRAME_KERNELS_SRC :=
|
||||
MK_CONFIG_SRC :=
|
||||
MK_CONFIG_NOOPT_SRC :=
|
||||
MK_CONFIG_KERNELS_SRC :=
|
||||
|
||||
# These hold object filenames corresponding to above.
|
||||
MK_FRAME_OBJS :=
|
||||
MK_FRAME_NOOPT_OBJS :=
|
||||
MK_FRAME_KERNELS_OBJS :=
|
||||
MK_CONFIG_OBJS :=
|
||||
MK_CONFIG_NOOPT_OBJS :=
|
||||
MK_CONFIG_KERNELS_OBJS :=
|
||||
|
||||
# Append the base library path to the library names.
|
||||
MK_ALL_BLIS_LIB := $(BASE_LIB_PATH)/$(BLIS_LIB_NAME)
|
||||
@@ -309,41 +300,17 @@ CFLAGS_KERNELS := $(CFLAGS_KERNELS) $(VERS_DEF)
|
||||
# Convert source file paths to object file paths by replacing the base source
|
||||
# directories with the base object directories, and also replacing the source
|
||||
# file suffix (eg: '.c') with '.o'.
|
||||
MK_BLIS_FRAME_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_FRAME_SRC)))
|
||||
MK_BLIS_FRAME_NOOPT_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_FRAME_NOOPT_SRC)))
|
||||
MK_BLIS_FRAME_KERNELS_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_FRAME_KERNELS_SRC)))
|
||||
MK_BLIS_FRAME_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_FRAME_SRC)))
|
||||
|
||||
MK_BLIS_CONFIG_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.S, $(MK_CONFIG_SRC)))
|
||||
MK_BLIS_CONFIG_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_CONFIG_SRC)))
|
||||
|
||||
MK_BLIS_CONFIG_NOOPT_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.S, $(MK_CONFIG_NOOPT_SRC)))
|
||||
MK_BLIS_CONFIG_NOOPT_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_CONFIG_NOOPT_SRC)))
|
||||
|
||||
MK_BLIS_CONFIG_KERNELS_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.S, $(MK_CONFIG_KERNELS_SRC)))
|
||||
MK_BLIS_CONFIG_KERNELS_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_CONFIG_KERNELS_SRC)))
|
||||
MK_BLIS_CONFIG_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.S, $(MK_CONFIG_SRC)))
|
||||
MK_BLIS_CONFIG_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
|
||||
$(filter %.c, $(MK_CONFIG_SRC)))
|
||||
|
||||
# Combine all of the object files into some readily-accessible variables.
|
||||
MK_ALL_BLIS_OPT_OBJS := $(MK_BLIS_CONFIG_OBJS) \
|
||||
$(MK_BLIS_FRAME_OBJS)
|
||||
|
||||
MK_ALL_BLIS_NOOPT_OBJS := $(MK_BLIS_CONFIG_NOOPT_OBJS) \
|
||||
$(MK_BLIS_FRAME_NOOPT_OBJS)
|
||||
|
||||
MK_ALL_BLIS_KERNELS_OBJS := $(MK_BLIS_CONFIG_KERNELS_OBJS) \
|
||||
$(MK_BLIS_FRAME_KERNELS_OBJS)
|
||||
|
||||
MK_ALL_BLIS_OBJS := $(MK_ALL_BLIS_OPT_OBJS) \
|
||||
$(MK_ALL_BLIS_NOOPT_OBJS) \
|
||||
$(MK_ALL_BLIS_KERNELS_OBJS)
|
||||
MK_ALL_BLIS_OBJS := $(MK_BLIS_CONFIG_OBJS) \
|
||||
$(MK_BLIS_FRAME_OBJS)
|
||||
|
||||
|
||||
|
||||
@@ -424,15 +391,15 @@ clean: cleanlib cleantest
|
||||
|
||||
# Define two functions, each of which takes one argument (an object file
|
||||
# path). The functions determine which CFLAGS and text string are needed to
|
||||
# compile the object file. Note that we match with a preceding forward slash,
|
||||
# so the directory name must begin with the special directory name, but it
|
||||
# can have trailing characters (e.g. 'kernels_x86').
|
||||
get_cflags_for_obj = $(if $(findstring /$(NOOPT_DIR),$1),$(CFLAGS_NOOPT),\
|
||||
$(if $(findstring /$(KERNELS_DIR),$1),$(CFLAGS_KERNELS),\
|
||||
# compile the object file. Note that we match without a preceding forward slash,
|
||||
# so the directory name may have 'kernels' as a substring (e.g. 'ukernels' or
|
||||
# 'kernels_opt').
|
||||
get_cflags_for_obj = $(if $(findstring $(NOOPT_DIR),$1),$(CFLAGS_NOOPT),\
|
||||
$(if $(findstring $(KERNELS_DIR),$1),$(CFLAGS_KERNELS),\
|
||||
$(CFLAGS)))
|
||||
|
||||
get_ctext_for_obj = $(if $(findstring /$(NOOPT_DIR),$1),$(NOOPT_TEXT),\
|
||||
$(if $(findstring /$(KERNELS_DIR),$1),$(KERNELS_TEXT),))
|
||||
get_ctext_for_obj = $(if $(findstring $(NOOPT_DIR),$1),$(NOOPT_TEXT),\
|
||||
$(if $(findstring $(KERNELS_DIR),$1),$(KERNELS_TEXT),))
|
||||
|
||||
$(BASE_OBJ_FRAME_PATH)/%.o: $(FRAME_PATH)/%.c $(MK_HEADER_FILES) $(MAKE_DEFS_MK_PATH)
|
||||
ifeq ($(BLIS_ENABLE_VERBOSE_MAKE_OUTPUT),yes)
|
||||
|
||||
@@ -254,7 +254,9 @@ gen_mkfiles()
|
||||
|
||||
|
||||
# Append a relevant suffix to the makefile variable name, if necessary
|
||||
all_add_src_var_name "$cur_dir"
|
||||
# NOTE: This step is disabled because special directories are presently
|
||||
# ignored when generating makefile variable names.
|
||||
#all_add_src_var_name "$cur_dir"
|
||||
|
||||
|
||||
# Be verbose if level 2 was requested
|
||||
@@ -286,7 +288,9 @@ gen_mkfiles()
|
||||
|
||||
|
||||
# Remove a relevant suffix from the makefile variable name, if necessary
|
||||
all_del_src_var_name "$cur_dir"
|
||||
# NOTE: This step is disabled because special directories are presently
|
||||
# ignored when generating makefile variable names.
|
||||
#all_del_src_var_name "$cur_dir"
|
||||
|
||||
|
||||
# Return peacefully
|
||||
@@ -295,42 +299,44 @@ gen_mkfiles()
|
||||
|
||||
|
||||
|
||||
update_src_var_name_special()
|
||||
{
|
||||
local dir act i name var_suffix
|
||||
|
||||
# Extract arguments.
|
||||
act="$1"
|
||||
dir="$2"
|
||||
|
||||
# Strip / from end of directory path, if there is one, and then strip
|
||||
# path from directory name.
|
||||
dir=${dir%/}
|
||||
dir=${dir##*/}
|
||||
|
||||
# Run through our list.
|
||||
for specdir in "${special_dirs}"; do
|
||||
|
||||
# If the current item matches sdir, then we'll have
|
||||
# to make a modification of some form.
|
||||
if [ "$dir" = "$specdir" ]; then
|
||||
|
||||
# Convert the directory name to uppercase.
|
||||
var_suffix=$(echo "$dir" | tr '[:lower:]' '[:upper:]')
|
||||
|
||||
# Either add or remove the suffix, and also update the
|
||||
# source file suffix variable.
|
||||
if [ "$act" == "+" ]; then
|
||||
src_var_name=${src_var_name}_$var_suffix
|
||||
else
|
||||
src_var_name=${src_var_name%_$var_suffix}
|
||||
fi
|
||||
|
||||
# No need to continue iterating.
|
||||
break;
|
||||
fi
|
||||
done
|
||||
}
|
||||
#update_src_var_name_special()
|
||||
#{
|
||||
# local dir act i name var_suffix
|
||||
#
|
||||
# # Extract arguments.
|
||||
# act="$1"
|
||||
# dir="$2"
|
||||
#
|
||||
# # Strip / from end of directory path, if there is one, and then strip
|
||||
# # path from directory name.
|
||||
# dir=${dir%/}
|
||||
# dir=${dir##*/}
|
||||
#
|
||||
# # Run through our list.
|
||||
# # NOTE: CURRENTLY, SPECIAL DIRECTORY NAMES ARE IGNORED. In order to
|
||||
# # re-enable them, remove the quotes from "${special_dirs}".
|
||||
# for specdir in "${special_dirs}"; do
|
||||
#
|
||||
# # If the current item matches sdir, then we'll have
|
||||
# # to make a modification of some form.
|
||||
# if [ "$dir" = "$specdir" ]; then
|
||||
#
|
||||
# # Convert the directory name to uppercase.
|
||||
# var_suffix=$(echo "$dir" | tr '[:lower:]' '[:upper:]')
|
||||
#
|
||||
# # Either add or remove the suffix, and also update the
|
||||
# # source file suffix variable.
|
||||
# if [ "$act" == "+" ]; then
|
||||
# src_var_name=${src_var_name}_$var_suffix
|
||||
# else
|
||||
# src_var_name=${src_var_name%_$var_suffix}
|
||||
# fi
|
||||
#
|
||||
# # No need to continue iterating.
|
||||
# break;
|
||||
# fi
|
||||
# done
|
||||
#}
|
||||
|
||||
#init_src_var_name()
|
||||
#{
|
||||
@@ -351,20 +357,20 @@ update_src_var_name_special()
|
||||
# done
|
||||
#}
|
||||
|
||||
all_add_src_var_name()
|
||||
{
|
||||
local dir="$1"
|
||||
|
||||
update_src_var_name_special "+" "$dir"
|
||||
#all_add_src_var_name()
|
||||
#{
|
||||
# local dir="$1"
|
||||
#
|
||||
# update_src_var_name_special "+" "$dir"
|
||||
#
|
||||
#}
|
||||
|
||||
}
|
||||
|
||||
all_del_src_var_name()
|
||||
{
|
||||
local dir="$1"
|
||||
|
||||
update_src_var_name_special "-" "$dir"
|
||||
}
|
||||
#all_del_src_var_name()
|
||||
#{
|
||||
# local dir="$1"
|
||||
#
|
||||
# update_src_var_name_special "-" "$dir"
|
||||
#}
|
||||
|
||||
read_mkfile_config()
|
||||
{
|
||||
|
||||
@@ -161,7 +161,7 @@ LDFLAGS += -fopenmp
|
||||
endif
|
||||
ifeq ($(THREADING_MODEL),pthreads)
|
||||
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
|
||||
LDFLAGS += -pthread
|
||||
LDFLAGS += -lpthread
|
||||
endif
|
||||
endif
|
||||
|
||||
@@ -175,7 +175,7 @@ LDFLAGS += -openmp
|
||||
endif
|
||||
ifeq ($(THREADING_MODEL),pthreads)
|
||||
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
|
||||
LDFLAGS += -pthread
|
||||
LDFLAGS += -lpthread
|
||||
endif
|
||||
endif
|
||||
|
||||
@@ -188,7 +188,7 @@ $(error OpenMP is not supported with Clang.)
|
||||
endif
|
||||
ifeq ($(THREADING_MODEL),pthreads)
|
||||
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
|
||||
LDFLAGS += -pthread
|
||||
LDFLAGS += -lpthread
|
||||
endif
|
||||
endif
|
||||
|
||||
|
||||
@@ -144,25 +144,7 @@
|
||||
|
||||
// -- Default fusing factors for level-1f operations --
|
||||
|
||||
#define BLIS_L1F_FUSE_FAC_S 8
|
||||
#define BLIS_L1F_FUSE_FAC_D 8
|
||||
#define BLIS_L1F_FUSE_FAC_C 4
|
||||
#define BLIS_L1F_FUSE_FAC_Z 2
|
||||
|
||||
#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
|
||||
#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
|
||||
#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
#define BLIS_DEFAULT_AF_D 8
|
||||
|
||||
|
||||
|
||||
@@ -171,10 +153,8 @@
|
||||
|
||||
// -- gemm --
|
||||
|
||||
#include "bli_gemm_8x8.h"
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_8x8
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_8x8
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_int_8x8
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_int_8x8
|
||||
|
||||
// -- trsm-related --
|
||||
|
||||
|
||||
@@ -51,87 +51,6 @@
|
||||
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
|
||||
//
|
||||
|
||||
// #define BLIS_DEFAULT_MC_S 128
|
||||
// #define BLIS_DEFAULT_KC_S 384
|
||||
// #define BLIS_DEFAULT_NC_S 4096
|
||||
|
||||
#define BLIS_DEFAULT_MC_D 1080
|
||||
#define BLIS_DEFAULT_KC_D 120
|
||||
#define BLIS_DEFAULT_NC_D 8400
|
||||
|
||||
// #define BLIS_DEFAULT_MC_C 128
|
||||
// #define BLIS_DEFAULT_KC_C 256
|
||||
// #define BLIS_DEFAULT_NC_C 4096
|
||||
//
|
||||
// #define BLIS_DEFAULT_MC_Z 64
|
||||
// #define BLIS_DEFAULT_KC_Z 256
|
||||
// #define BLIS_DEFAULT_NC_Z 2048
|
||||
|
||||
// -- Register blocksizes --
|
||||
|
||||
// #define BLIS_DEFAULT_MR_S 8
|
||||
// #define BLIS_DEFAULT_NR_S 8
|
||||
|
||||
#define BLIS_DEFAULT_MR_D 4
|
||||
#define BLIS_DEFAULT_NR_D 6
|
||||
|
||||
// #define BLIS_DEFAULT_MR_C 8
|
||||
// #define BLIS_DEFAULT_NR_C 4
|
||||
//
|
||||
// #define BLIS_DEFAULT_MR_Z 8
|
||||
// #define BLIS_DEFAULT_NR_Z 4
|
||||
|
||||
// NOTE: If the micro-kernel, which is typically unrolled to a factor
|
||||
// of f, handles leftover edge cases (ie: when k % f > 0) then these
|
||||
// register blocksizes in the k dimension can be defined to 1.
|
||||
|
||||
//#define BLIS_DEFAULT_KR_S 1
|
||||
//#define BLIS_DEFAULT_KR_D 1
|
||||
//#define BLIS_DEFAULT_KR_C 1
|
||||
//#define BLIS_DEFAULT_KR_Z 1
|
||||
|
||||
// -- Maximum cache blocksizes (for optimizing edge cases) --
|
||||
|
||||
// NOTE: These cache blocksize "extensions" have the same constraints as
|
||||
// the corresponding default blocksizes above. When these values are
|
||||
// larger than the default blocksizes, blocksizes used at edge cases are
|
||||
// enlarged if such an extension would encompass the remaining portion of
|
||||
// the matrix dimension.
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
|
||||
//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
|
||||
//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
|
||||
//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
|
||||
//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
|
||||
//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
|
||||
//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
|
||||
//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
|
||||
//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
|
||||
|
||||
// -- Packing register blocksize (for packed micro-panels) --
|
||||
|
||||
// NOTE: These register blocksize "extensions" determine whether the
|
||||
// leading dimensions used within the packed micro-panels are equal to
|
||||
// or greater than their corresponding register blocksizes above.
|
||||
|
||||
//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
|
||||
//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
|
||||
//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
|
||||
//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
|
||||
//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -149,23 +68,28 @@
|
||||
|
||||
// -- gemm --
|
||||
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_8x8_FMA4
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x8_fma4
|
||||
#define BLIS_DEFAULT_MC_S 128
|
||||
#define BLIS_DEFAULT_KC_S 384
|
||||
#define BLIS_DEFAULT_NC_S 4096
|
||||
#define BLIS_DEFAULT_MR_S 8
|
||||
#define BLIS_DEFAULT_NR_S 8
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_4x6_FMA4
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x6_fma4
|
||||
#define BLIS_DEFAULT_MC_D 1080
|
||||
#define BLIS_DEFAULT_KC_D 120
|
||||
#define BLIS_DEFAULT_NC_D 8400
|
||||
#define BLIS_DEFAULT_MR_D 4
|
||||
#define BLIS_DEFAULT_NR_D 6
|
||||
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_8x4_FMA4
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4_fma4
|
||||
#define BLIS_DEFAULT_MC_C 96
|
||||
#define BLIS_DEFAULT_KC_C 256
|
||||
#define BLIS_DEFAULT_NC_C 4096
|
||||
#define BLIS_DEFAULT_MR_C 8
|
||||
#define BLIS_DEFAULT_NR_C 4
|
||||
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_4x4_FMA4
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4_fma4
|
||||
#define BLIS_DEFAULT_MC_Z 64
|
||||
#define BLIS_DEFAULT_KC_Z 192
|
||||
#define BLIS_DEFAULT_NC_Z 4096
|
||||
|
||||
@@ -51,28 +51,28 @@
|
||||
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
|
||||
//
|
||||
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
|
||||
#define BLIS_DEFAULT_MC_S 528
|
||||
#define BLIS_DEFAULT_KC_S 256
|
||||
#define BLIS_DEFAULT_NC_S 8400
|
||||
#define BLIS_DEFAULT_MR_S 16
|
||||
#define BLIS_DEFAULT_NR_S 3
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
|
||||
#define BLIS_DEFAULT_MC_D 264
|
||||
#define BLIS_DEFAULT_KC_D 256
|
||||
#define BLIS_DEFAULT_NC_D 8400
|
||||
#define BLIS_DEFAULT_MR_D 8
|
||||
#define BLIS_DEFAULT_NR_D 3
|
||||
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
|
||||
#define BLIS_DEFAULT_MC_C 264
|
||||
#define BLIS_DEFAULT_KC_C 256
|
||||
#define BLIS_DEFAULT_NC_C 8400
|
||||
#define BLIS_DEFAULT_MR_C 4
|
||||
#define BLIS_DEFAULT_NR_C 2
|
||||
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
|
||||
#define BLIS_DEFAULT_MC_Z 100
|
||||
#define BLIS_DEFAULT_KC_Z 320
|
||||
#define BLIS_DEFAULT_NC_Z 8400
|
||||
|
||||
@@ -1 +1 @@
|
||||
../../kernels/arm/neon
|
||||
../../kernels/arm
|
||||
@@ -1 +1 @@
|
||||
../../kernels/arm/neon
|
||||
../../kernels/arm
|
||||
@@ -67,26 +67,6 @@
|
||||
//#define BLIS_DEFAULT_KC_Z 384
|
||||
//#define BLIS_DEFAULT_NC_Z 4096
|
||||
|
||||
// NOTE: If 4m blocksizes are not defined here, they will be determined
|
||||
// from the corresponding real domain blocksizes.
|
||||
#define BLIS_DEFAULT_4M_MC_C 384
|
||||
#define BLIS_DEFAULT_4M_KC_C 512
|
||||
#define BLIS_DEFAULT_4M_NC_C 4096
|
||||
|
||||
#define BLIS_DEFAULT_4M_MC_Z 192
|
||||
#define BLIS_DEFAULT_4M_KC_Z 256
|
||||
#define BLIS_DEFAULT_4M_NC_Z 4096
|
||||
|
||||
// NOTE: If 3m blocksizes are not defined here, they will be determined
|
||||
// from the corresponding real domain blocksizes.
|
||||
#define BLIS_DEFAULT_3M_MC_C 384
|
||||
#define BLIS_DEFAULT_3M_KC_C 512
|
||||
#define BLIS_DEFAULT_3M_NC_C 4096
|
||||
|
||||
#define BLIS_DEFAULT_3M_MC_Z 192
|
||||
#define BLIS_DEFAULT_3M_KC_Z 256
|
||||
#define BLIS_DEFAULT_3M_NC_Z 4096
|
||||
|
||||
// -- Register blocksizes --
|
||||
|
||||
#define BLIS_DEFAULT_MR_S 8
|
||||
@@ -101,56 +81,6 @@
|
||||
#define BLIS_DEFAULT_MR_Z 2
|
||||
#define BLIS_DEFAULT_NR_Z 2
|
||||
|
||||
// NOTE: If the micro-kernel, which is typically unrolled to a factor
|
||||
// of f, handles leftover edge cases (ie: when k % f > 0) then these
|
||||
// register blocksizes in the k dimension can be defined to 1.
|
||||
|
||||
//#define BLIS_DEFAULT_KR_S 1
|
||||
//#define BLIS_DEFAULT_KR_D 1
|
||||
//#define BLIS_DEFAULT_KR_C 1
|
||||
//#define BLIS_DEFAULT_KR_Z 1
|
||||
|
||||
// -- Maximum cache blocksizes (for optimizing edge cases) --
|
||||
|
||||
// NOTE: These cache blocksize "extensions" have the same constraints as
|
||||
// the corresponding default blocksizes above. When these values are
|
||||
// larger than the default blocksizes, blocksizes used at edge cases are
|
||||
// enlarged if such an extension would encompass the remaining portion of
|
||||
// the matrix dimension.
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
|
||||
//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
|
||||
//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
|
||||
//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
|
||||
//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
|
||||
//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
|
||||
//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
|
||||
|
||||
//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
|
||||
//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
|
||||
//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
|
||||
|
||||
// -- Packing register blocksize (for packed micro-panels) --
|
||||
|
||||
// NOTE: These register blocksize "extensions" determine whether the
|
||||
// leading dimensions used within the packed micro-panels are equal to
|
||||
// or greater than their corresponding register blocksizes above.
|
||||
|
||||
//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
|
||||
//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
|
||||
//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
|
||||
//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
|
||||
|
||||
//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
|
||||
//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
|
||||
|
||||
|
||||
|
||||
@@ -169,13 +99,13 @@
|
||||
|
||||
// -- gemm --
|
||||
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_8x4
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x4
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x4
|
||||
|
||||
// -- trsm-related --
|
||||
|
||||
#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_opt_4x4
|
||||
#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_opt_4x4
|
||||
#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_asm_4x4
|
||||
#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_asm_4x4
|
||||
|
||||
|
||||
|
||||
@@ -184,23 +114,23 @@
|
||||
|
||||
// -- axpy2v --
|
||||
|
||||
#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_opt_var1
|
||||
#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_int_var1
|
||||
|
||||
// -- dotaxpyv --
|
||||
|
||||
#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_opt_var1
|
||||
#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_int_var1
|
||||
|
||||
// -- axpyf --
|
||||
|
||||
#define BLIS_DAXPYF_KERNEL bli_daxpyf_opt_var1
|
||||
#define BLIS_DAXPYF_KERNEL bli_daxpyf_int_var1
|
||||
|
||||
// -- dotxf --
|
||||
|
||||
#define BLIS_DDOTXF_KERNEL bli_ddotxf_opt_var1
|
||||
#define BLIS_DDOTXF_KERNEL bli_ddotxf_int_var1
|
||||
|
||||
// -- dotxaxpyf --
|
||||
|
||||
#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_opt_var1
|
||||
#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_int_var1
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -1 +1 @@
|
||||
../../kernels/x86_64/core2-sse3
|
||||
../../kernels/x86_64/penryn
|
||||
@@ -89,21 +89,6 @@
|
||||
|
||||
#endif
|
||||
|
||||
/*
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4
|
||||
#define BLIS_DEFAULT_MC_C 96
|
||||
#define BLIS_DEFAULT_KC_C 256
|
||||
#define BLIS_DEFAULT_NC_C 4096
|
||||
#define BLIS_DEFAULT_MR_C 8
|
||||
#define BLIS_DEFAULT_NR_C 4
|
||||
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4
|
||||
#define BLIS_DEFAULT_MC_Z 64
|
||||
#define BLIS_DEFAULT_KC_Z 192
|
||||
#define BLIS_DEFAULT_NC_Z 4096
|
||||
#define BLIS_DEFAULT_MR_Z 4
|
||||
#define BLIS_DEFAULT_NR_Z 4
|
||||
*/
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -1 +1 @@
|
||||
../../kernels/x86_64/avx2
|
||||
../../kernels/x86_64/haswell
|
||||
@@ -149,7 +149,7 @@
|
||||
|
||||
// -- gemm --
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_d4x4
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
|
||||
|
||||
// -- trsm-related --
|
||||
|
||||
|
||||
@@ -42,6 +42,9 @@
|
||||
|
||||
#define BLIS_SIMD_ALIGN_SIZE 32
|
||||
|
||||
#define BLIS_SIMD_SIZE 64
|
||||
#define BLIS_SIMD_NUM_REGISTERS 32
|
||||
|
||||
|
||||
|
||||
#endif
|
||||
|
||||
@@ -153,8 +153,8 @@
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_30x8
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_30x16
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_30x16
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_30x8
|
||||
|
||||
// -- trsm-related --
|
||||
|
||||
|
||||
@@ -51,7 +51,7 @@
|
||||
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
|
||||
//
|
||||
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
|
||||
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
|
||||
#define BLIS_DEFAULT_MC_S 2016
|
||||
#define BLIS_DEFAULT_KC_S 128
|
||||
#define BLIS_DEFAULT_NC_S 8400
|
||||
@@ -59,7 +59,7 @@
|
||||
#define BLIS_DEFAULT_NR_S 3
|
||||
//#define BLIS_UPANEL_B_ALIGN_SIZE_S 4096
|
||||
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
|
||||
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
|
||||
//#define BLIS_DEFAULT_MC_D 768
|
||||
//#define BLIS_DEFAULT_KC_D 168
|
||||
#define BLIS_DEFAULT_MC_D 1008
|
||||
@@ -69,14 +69,14 @@
|
||||
#define BLIS_DEFAULT_NR_D 3
|
||||
//#define BLIS_UPANEL_B_ALIGN_SIZE_D 4096
|
||||
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
|
||||
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
|
||||
#define BLIS_DEFAULT_MC_C 512
|
||||
#define BLIS_DEFAULT_KC_C 256
|
||||
#define BLIS_DEFAULT_NC_C 8400
|
||||
#define BLIS_DEFAULT_MR_C 4
|
||||
#define BLIS_DEFAULT_NR_C 2
|
||||
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
|
||||
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
|
||||
#define BLIS_DEFAULT_MC_Z 400
|
||||
#define BLIS_DEFAULT_KC_Z 160
|
||||
#define BLIS_DEFAULT_NC_Z 8400
|
||||
|
||||
@@ -1 +1 @@
|
||||
../../kernels/x86_64/avx
|
||||
../../kernels/x86_64/sandybridge
|
||||
@@ -177,17 +177,17 @@
|
||||
// be packed here, but this tends to be much too expensive in practice to
|
||||
// actually employ.)
|
||||
|
||||
//#define BLIS_DEFAULT_L2_MC_S 1000
|
||||
//#define BLIS_DEFAULT_L2_NC_S 1000
|
||||
//#define BLIS_DEFAULT_M2_S 1000
|
||||
//#define BLIS_DEFAULT_N2_S 1000
|
||||
|
||||
//#define BLIS_DEFAULT_L2_MC_D 1000
|
||||
//#define BLIS_DEFAULT_L2_NC_D 1000
|
||||
//#define BLIS_DEFAULT_M2_D 1000
|
||||
//#define BLIS_DEFAULT_N2_D 1000
|
||||
|
||||
//#define BLIS_DEFAULT_L2_MC_C 1000
|
||||
//#define BLIS_DEFAULT_L2_NC_C 1000
|
||||
//#define BLIS_DEFAULT_M2_C 1000
|
||||
//#define BLIS_DEFAULT_N2_C 1000
|
||||
|
||||
//#define BLIS_DEFAULT_L2_MC_Z 1000
|
||||
//#define BLIS_DEFAULT_L2_NC_Z 1000
|
||||
//#define BLIS_DEFAULT_M2_Z 1000
|
||||
//#define BLIS_DEFAULT_N2_Z 1000
|
||||
|
||||
|
||||
|
||||
@@ -196,25 +196,25 @@
|
||||
|
||||
// -- Default fusing factors for level-1f operations --
|
||||
|
||||
//#define BLIS_L1F_FUSE_FAC_S 8
|
||||
//#define BLIS_L1F_FUSE_FAC_D 4
|
||||
//#define BLIS_L1F_FUSE_FAC_C 4
|
||||
//#define BLIS_L1F_FUSE_FAC_Z 2
|
||||
//#define BLIS_DEFAULT_1F_S 8
|
||||
//#define BLIS_DEFAULT_1F_D 4
|
||||
//#define BLIS_DEFAULT_1F_C 4
|
||||
//#define BLIS_DEFAULT_1F_Z 2
|
||||
|
||||
//#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
//#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
//#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
//#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
//#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
|
||||
//#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
|
||||
//#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
|
||||
//#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z
|
||||
|
||||
//#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
//#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
//#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
//#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
//#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
|
||||
//#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
|
||||
//#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
|
||||
//#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z
|
||||
|
||||
//#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
|
||||
//#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
|
||||
//#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
|
||||
//#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
|
||||
//#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
|
||||
//#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
|
||||
//#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
|
||||
//#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z
|
||||
|
||||
|
||||
|
||||
|
||||
@@ -36,59 +36,87 @@

void bli_saxpyv_opt_var1( conj_t conjx,
dim_t n,
float* restrict alpha,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy )
void bli_saxpyv_opt_var1
(
conj_t conjx,
dim_t n,
float* alpha,
float* x, inc_t incx,
float* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_SAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}

void bli_daxpyv_opt_var1( conj_t conjx,
dim_t n,
double* restrict alpha,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy )
void bli_daxpyv_opt_var1
(
conj_t conjx,
dim_t n,
double* alpha,
double* x, inc_t incx,
double* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_DAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}

void bli_caxpyv_opt_var1( conj_t conjx,
dim_t n,
scomplex* restrict alpha,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy )
void bli_caxpyv_opt_var1
(
conj_t conjx,
dim_t n,
scomplex* alpha,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_CAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}

void bli_zaxpyv_opt_var1( conj_t conjx,
dim_t n,
dcomplex* restrict alpha,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy )
void bli_zaxpyv_opt_var1
(
conj_t conjx,
dim_t n,
dcomplex* alpha,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
cntx_t* cntx
)
{
/*
Template axpyv kernel implementation
@@ -193,11 +221,15 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_ZAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
return;
}

@@ -219,7 +251,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );

xp += 1; yp += 1;
}
@@ -228,7 +260,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -237,7 +269,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );

xp += 1; yp += 1;
}
@@ -247,7 +279,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );

xp += 1; yp += 1;
}
@@ -256,7 +288,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -265,7 +297,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );

xp += 1; yp += 1;
}

@@ -36,66 +36,94 @@

void bli_sdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict rho )
void bli_sdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
float* x, inc_t incx,
float* y, inc_t incy,
float* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_SDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}

void bli_ddotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict rho )
void bli_ddotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
double* x, inc_t incx,
double* y, inc_t incy,
double* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_DDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}

void bli_cdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict rho )
void bli_cdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_CDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}

void bli_zdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict rho )
void bli_zdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* rho,
cntx_t* cntx
)
{
/*
Template dotv kernel implementation
@@ -210,12 +238,16 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_ZDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
return;
}

@@ -250,7 +282,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );

xp += 1; yp += 1;
}
@@ -259,7 +291,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -268,7 +300,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );

xp += 1; yp += 1;
}
@@ -278,7 +310,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );

xp += 1; yp += 1;
}
@@ -287,7 +319,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -296,7 +328,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );

xp += 1; yp += 1;
}
@@ -307,6 +339,6 @@ void bli_zdotv_opt_var1( conj_t conjx,
if ( bli_is_conj( conjy ) )
bli_zconjs( dotxy );

bli_zzcopys( dotxy, *rho );
bli_zcopys( dotxy, *rho );
}

@@ -36,88 +36,108 @@

void bli_saxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict alpha1,
float* restrict alpha2,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict z, inc_t incz
)
void bli_saxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
float* alpha1,
float* alpha2,
float* x, inc_t incx,
float* y, inc_t incy,
float* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_SAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}

void bli_daxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict alpha1,
double* restrict alpha2,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict z, inc_t incz
)
void bli_daxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
double* alpha1,
double* alpha2,
double* x, inc_t incx,
double* y, inc_t incy,
double* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_DAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}

void bli_caxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict alpha1,
scomplex* restrict alpha2,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict z, inc_t incz
)
void bli_caxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* alpha1,
scomplex* alpha2,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_CAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}

void bli_zaxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict alpha1,
dcomplex* restrict alpha2,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict z, inc_t incz
)
void bli_zaxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* alpha1,
dcomplex* alpha2,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* z, inc_t incz,
cntx_t* cntx
)
{
/*
Template axpy2v kernel implementation
@@ -229,14 +249,18 @@ void bli_zaxpy2v_opt_var1(
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_ZAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
return;
}

@@ -259,8 +283,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -272,8 +296,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -283,8 +307,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -294,8 +318,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -307,8 +331,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -318,8 +342,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -329,8 +353,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -342,8 +366,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -353,8 +377,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -364,8 +388,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -377,8 +401,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -388,8 +412,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );

xp += 1; yp += 1; zp += 1;
}

@@ -36,87 +36,107 @@

void bli_saxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
float* restrict alpha,
float* restrict a, inc_t inca, inc_t lda,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy
)
void bli_saxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
float* alpha,
float* a, inc_t inca, inc_t lda,
float* x, inc_t incx,
float* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_SAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}

void bli_daxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
double* restrict alpha,
double* restrict a, inc_t inca, inc_t lda,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy
)
void bli_daxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
double* alpha,
double* a, inc_t inca, inc_t lda,
double* x, inc_t incx,
double* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_DAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}

void bli_caxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* restrict alpha,
scomplex* restrict a, inc_t inca, inc_t lda,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy
)
void bli_caxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* alpha,
scomplex* a, inc_t inca, inc_t lda,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_CAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}

void bli_zaxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* restrict alpha,
dcomplex* restrict a, inc_t inca, inc_t lda,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy
)
void bli_zaxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* alpha,
dcomplex* a, inc_t inca, inc_t lda,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
cntx_t* cntx
)
{
/*
Template axpyf kernel implementation
@@ -243,14 +263,18 @@ void bli_zaxpyf_opt_var1(
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_ZAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
return;
}

@@ -274,16 +298,16 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopys( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopys( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}
else // if ( bli_is_conj( conjx ) )
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}

@@ -296,7 +320,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += 1;
}
@@ -312,7 +336,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += n_elem_per_iter;
}
@@ -324,7 +348,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += 1;
}
@@ -338,7 +362,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += 1;
}
@@ -354,7 +378,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += n_elem_per_iter;
}
@@ -366,7 +390,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );

ap[ j ] += 1;
}

@@ -36,87 +36,115 @@

void bli_sdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict alpha,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict rho,
float* restrict z, inc_t incz )
void bli_sdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
float* alpha,
float* x, inc_t incx,
float* y, inc_t incy,
float* rho,
float* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_SDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}

void bli_ddotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict alpha,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict rho,
double* restrict z, inc_t incz )
void bli_ddotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
double* alpha,
double* x, inc_t incx,
double* y, inc_t incy,
double* rho,
double* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_DDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}

void bli_cdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict alpha,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict rho,
scomplex* restrict z, inc_t incz )
void bli_cdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* alpha,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* rho,
scomplex* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_CDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}

void bli_zdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict alpha,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict rho,
dcomplex* restrict z, inc_t incz )
void bli_zdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* alpha,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* rho,
dcomplex* z, inc_t incz,
cntx_t* cntx
)
{
/*
Template dotaxpyv kernel implementation
@@ -240,15 +268,19 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_ZDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
return;
}

@@ -285,8 +317,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -298,8 +330,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -309,8 +341,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -320,8 +352,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -333,8 +365,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -344,8 +376,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );

xp += 1; yp += 1; zp += 1;
}
@@ -355,8 +387,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// Compute front edge cases if x, y, and z were unaligned.
|
||||
for ( i = 0; i < n_pre; ++i )
|
||||
{
|
||||
bli_zzzdots( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdots( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += 1; yp += 1; zp += 1;
|
||||
}
|
||||
@@ -368,8 +400,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
|
||||
for ( i = 0; i < n_iter; ++i )
|
||||
{
|
||||
bli_zzzdots( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdots( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += n_elem_per_iter;
|
||||
yp += n_elem_per_iter;
|
||||
@@ -379,8 +411,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// Compute tail edge cases, if applicable.
|
||||
for ( i = 0; i < n_left; ++i )
|
||||
{
|
||||
bli_zzzdots( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdots( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += 1; yp += 1; zp += 1;
|
||||
}
|
||||
@@ -390,8 +422,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// Compute front edge cases if x, y, and z were unaligned.
|
||||
for ( i = 0; i < n_pre; ++i )
|
||||
{
|
||||
bli_zzzdotjs( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdotjs( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += 1; yp += 1; zp += 1;
|
||||
}
|
||||
@@ -403,8 +435,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
|
||||
for ( i = 0; i < n_iter; ++i )
|
||||
{
|
||||
bli_zzzdotjs( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdotjs( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += n_elem_per_iter;
|
||||
yp += n_elem_per_iter;
|
||||
@@ -414,8 +446,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
// Compute tail edge cases, if applicable.
|
||||
for ( i = 0; i < n_left; ++i )
|
||||
{
|
||||
bli_zzzdotjs( *xp, *yp, dotxy );
|
||||
bli_zzzaxpyjs( *alpha, *xp, *zp );
|
||||
bli_zdotjs( *xp, *yp, dotxy );
|
||||
bli_zaxpyjs( *alpha, *xp, *zp );
|
||||
|
||||
xp += 1; yp += 1; zp += 1;
|
||||
}
|
||||
@@ -426,6 +458,6 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
|
||||
if ( bli_is_conj( conjy ) )
|
||||
bli_zconjs( dotxy );
|
||||
|
||||
bli_zzcopys( dotxy, *rho );
|
||||
bli_zcopys( dotxy, *rho );
|
||||
}
|
||||
|
||||
|
||||
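The loops in the hunks above fuse a dot product with an axpy update in a single pass over x. A minimal sketch of that fused operation follows, assuming double-precision complex data and unit strides. The function name `dotaxpyv_sketch` and its flat argument list are hypothetical and are not the BLIS signature; the conjy handling mirrors the final `bli_is_conj( conjy )` step visible in the diff, where the dot product is conjugated once at the end instead of conjugating every element of y.

```c
#include <assert.h>
#include <complex.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical sketch (not the BLIS API):
     rho := conjxt(x)^T * conjy(y)
     z   := z + alpha * conjx(x)
   computed in one pass so that each x[i] is loaded only once. */
void dotaxpyv_sketch(bool conjxt, bool conjx, bool conjy, size_t n,
                     const double complex *alpha,
                     const double complex *x,
                     const double complex *y,
                     double complex *rho,
                     double complex *z)
{
    assert(alpha != NULL && rho != NULL);

    /* conj(a) * conj(b) == conj(a * b), so a conjugated y can be
       handled by toggling conjxt and conjugating the final result. */
    bool conjxt_use = conjy ? !conjxt : conjxt;
    double complex dotxy = 0.0;

    for (size_t i = 0; i < n; ++i) {
        double complex x_dot  = conjxt_use ? conj(x[i]) : x[i];
        double complex x_axpy = conjx      ? conj(x[i]) : x[i];

        dotxy += x_dot * y[i];        /* dot product contribution */
        z[i]  += (*alpha) * x_axpy;   /* axpy update              */
    }

    if (conjy) dotxy = conj(dotxy);
    *rho = dotxy;                     /* rho is overwritten, not accumulated */
}
```

An optimized kernel would split the loop into the front-edge, aligned main, and tail sections shown in the diff; the sketch keeps a single loop because the arithmetic is the same in all three.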
@@ -36,115 +36,143 @@
|
||||
|
||||
|
||||
|
||||
void bli_sdotxaxpyf_opt_var1( conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* restrict alpha,
|
||||
float* restrict a, inc_t inca, inc_t lda,
|
||||
float* restrict w, inc_t incw,
|
||||
float* restrict x, inc_t incx,
|
||||
float* restrict beta,
|
||||
float* restrict y, inc_t incy,
|
||||
float* restrict z, inc_t incz )
|
||||
void bli_sdotxaxpyf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* alpha,
|
||||
float* a, inc_t inca, inc_t lda,
|
||||
float* w, inc_t incw,
|
||||
float* x, inc_t incx,
|
||||
float* beta,
|
||||
float* y, inc_t incy,
|
||||
float* z, inc_t incz,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_SDOTXAXPYF_KERNEL_REF( conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz );
|
||||
BLIS_SDOTXAXPYF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ddotxaxpyf_opt_var1( conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* restrict alpha,
|
||||
double* restrict a, inc_t inca, inc_t lda,
|
||||
double* restrict w, inc_t incw,
|
||||
double* restrict x, inc_t incx,
|
||||
double* restrict beta,
|
||||
double* restrict y, inc_t incy,
|
||||
double* restrict z, inc_t incz )
|
||||
void bli_ddotxaxpyf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* alpha,
|
||||
double* a, inc_t inca, inc_t lda,
|
||||
double* w, inc_t incw,
|
||||
double* x, inc_t incx,
|
||||
double* beta,
|
||||
double* y, inc_t incy,
|
||||
double* z, inc_t incz,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_DDOTXAXPYF_KERNEL_REF( conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz );
|
||||
BLIS_DDOTXAXPYF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_cdotxaxpyf_opt_var1( conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a, inc_t inca, inc_t lda,
|
||||
scomplex* restrict w, inc_t incw,
|
||||
scomplex* restrict x, inc_t incx,
|
||||
scomplex* restrict beta,
|
||||
scomplex* restrict y, inc_t incy,
|
||||
scomplex* restrict z, inc_t incz )
|
||||
void bli_cdotxaxpyf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* alpha,
|
||||
scomplex* a, inc_t inca, inc_t lda,
|
||||
scomplex* w, inc_t incw,
|
||||
scomplex* x, inc_t incx,
|
||||
scomplex* beta,
|
||||
scomplex* y, inc_t incy,
|
||||
scomplex* z, inc_t incz,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CDOTXAXPYF_KERNEL_REF( conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz );
|
||||
BLIS_CDOTXAXPYF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a, inc_t inca, inc_t lda,
|
||||
dcomplex* restrict w, inc_t incw,
|
||||
dcomplex* restrict x, inc_t incx,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict y, inc_t incy,
|
||||
dcomplex* restrict z, inc_t incz )
|
||||
void bli_zdotxaxpyf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conja,
|
||||
conj_t conjw,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* alpha,
|
||||
dcomplex* a, inc_t inca, inc_t lda,
|
||||
dcomplex* w, inc_t incw,
|
||||
dcomplex* x, inc_t incx,
|
||||
dcomplex* beta,
|
||||
dcomplex* y, inc_t incy,
|
||||
dcomplex* z, inc_t incz,
|
||||
cntx_t* cntx
|
||||
)
|
||||
|
||||
{
|
||||
/*
|
||||
@@ -289,19 +317,23 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
// Call the reference implementation if needed.
|
||||
if ( use_ref == TRUE )
|
||||
{
|
||||
BLIS_ZDOTXAXPYF_KERNEL_REF( conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz );
|
||||
BLIS_ZDOTXAXPYF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conja,
|
||||
conjw,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
w, incw,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
z, incz,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -326,16 +358,16 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzcopys( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zzscals( *alpha, alpha_x[ j ] );
|
||||
bli_zcopys( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zscals( *alpha, alpha_x[ j ] );
|
||||
}
|
||||
}
|
||||
else // if ( bli_is_conj( conjx ) )
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zzscals( *alpha, alpha_x[ j ] );
|
||||
bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
|
||||
bli_zscals( *alpha, alpha_x[ j ] );
|
||||
}
|
||||
}
|
||||
|
||||
@@ -366,8 +398,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -383,8 +415,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -396,8 +428,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -411,8 +443,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -428,8 +460,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -441,8 +473,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -456,8 +488,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -473,8 +505,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -486,8 +518,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -501,8 +533,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -518,8 +550,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += n_elem_per_iter;
|
||||
}
|
||||
@@ -531,8 +563,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
{
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
|
||||
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
|
||||
|
||||
ap[ j ] += 1;
|
||||
}
|
||||
@@ -555,8 +587,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
|
||||
// scaling by beta.
|
||||
for ( j = 0; j < b_n; ++j )
|
||||
{
|
||||
bli_zzscals( *beta, *yp[ j ] );
|
||||
bli_zzzaxpys( *alpha, At_w[ j ], *yp[ j ] );
|
||||
bli_zscals( *beta, *yp[ j ] );
|
||||
bli_zaxpys( *alpha, At_w[ j ], *yp[ j ] );
|
||||
}
|
||||
}
|
||||
|
||||
|
||||
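The dotxaxpyf hunks above first form alpha * x, then accumulate A^T w and A (alpha x) in the same sweep over the columns of A, and finally apply beta and alpha to y. A real-valued sketch of that data flow follows. The name `dotxaxpyf_sketch`, the `SK_MAX_FUSE` bound, and the unit strides for w, x, y, and z are assumptions made for illustration, not part of BLIS.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_FUSE 8   /* hypothetical upper bound on b_n */

/* Hypothetical sketch (not the BLIS API), real double precision:
     y := beta * y + alpha * A^T * w
     z := z + alpha * A * x
   where A is m x b_n with row stride inca and column stride lda. */
void dotxaxpyf_sketch(size_t m, size_t b_n, double alpha,
                      const double *a, size_t inca, size_t lda,
                      const double *w, const double *x,
                      double beta, double *y, double *z)
{
    double alpha_x[SK_MAX_FUSE];
    double At_w[SK_MAX_FUSE];

    assert(b_n <= SK_MAX_FUSE);

    /* Pre-scale x by alpha and clear the dot product accumulators. */
    for (size_t j = 0; j < b_n; ++j) {
        alpha_x[j] = alpha * x[j];
        At_w[j]    = 0.0;
    }

    /* One sweep over A serves both operations. */
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < b_n; ++j) {
            double aij = a[i * inca + j * lda];
            At_w[j] += aij * w[i];         /* dot part  */
            z[i]    += aij * alpha_x[j];   /* axpy part */
        }
    }

    /* Scale y by beta, then add alpha times the accumulated dots. */
    for (size_t j = 0; j < b_n; ++j)
        y[j] = beta * y[j] + alpha * At_w[j];
}
```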
@@ -36,95 +36,115 @@
|
||||
|
||||
|
||||
|
||||
void bli_sdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* restrict alpha,
|
||||
float* restrict a, inc_t inca, inc_t lda,
|
||||
float* restrict x, inc_t incx,
|
||||
float* restrict beta,
|
||||
float* restrict y, inc_t incy
|
||||
)
|
||||
void bli_sdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
float* alpha,
|
||||
float* a, inc_t inca, inc_t lda,
|
||||
float* x, inc_t incx,
|
||||
float* beta,
|
||||
float* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_SDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_SDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ddotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* restrict alpha,
|
||||
double* restrict a, inc_t inca, inc_t lda,
|
||||
double* restrict x, inc_t incx,
|
||||
double* restrict beta,
|
||||
double* restrict y, inc_t incy
|
||||
)
|
||||
void bli_ddotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
double* alpha,
|
||||
double* a, inc_t inca, inc_t lda,
|
||||
double* x, inc_t incx,
|
||||
double* beta,
|
||||
double* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_DDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_DDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_cdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a, inc_t inca, inc_t lda,
|
||||
scomplex* restrict x, inc_t incx,
|
||||
scomplex* restrict beta,
|
||||
scomplex* restrict y, inc_t incy
|
||||
)
|
||||
void bli_cdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
scomplex* alpha,
|
||||
scomplex* a, inc_t inca, inc_t lda,
|
||||
scomplex* x, inc_t incx,
|
||||
scomplex* beta,
|
||||
scomplex* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_CDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_zdotxf_opt_var1(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a, inc_t inca, inc_t lda,
|
||||
dcomplex* restrict x, inc_t incx,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict y, inc_t incy
|
||||
)
|
||||
void bli_zdotxf_opt_var1
|
||||
(
|
||||
conj_t conjat,
|
||||
conj_t conjx,
|
||||
dim_t m,
|
||||
dim_t b_n,
|
||||
dcomplex* alpha,
|
||||
dcomplex* a, inc_t inca, inc_t lda,
|
||||
dcomplex* x, inc_t incx,
|
||||
dcomplex* beta,
|
||||
dcomplex* y, inc_t incy,
|
||||
cntx_t* cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template dotxf kernel implementation
|
||||
@@ -225,10 +245,14 @@ void bli_zdotxf_opt_var1(
|
||||
// If the vector lengths are zero, scale y by beta and return.
|
||||
if ( bli_zero_dim1( m ) )
|
||||
{
|
||||
bli_zzscalv( BLIS_NO_CONJUGATE,
|
||||
b_n,
|
||||
beta,
|
||||
y, incy );
|
||||
bli_zscalv_ex
|
||||
(
|
||||
BLIS_NO_CONJUGATE,
|
||||
b_n,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
@@ -265,15 +289,19 @@ void bli_zdotxf_opt_var1(
|
||||
// Call the reference implementation if needed.
|
||||
if ( use_ref == TRUE )
|
||||
{
|
||||
BLIS_ZDOTXF_KERNEL_REF( conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy );
|
||||
BLIS_ZDOTXF_KERNEL_REF
|
||||
(
|
||||
conjat,
|
||||
conjx,
|
||||
m,
|
||||
b_n,
|
||||
alpha,
|
||||
a, inca, lda,
|
||||
x, incx,
|
||||
beta,
|
||||
y, incy,
|
||||
cntx
|
||||
);
|
||||
return;
|
||||
}
|
||||
|
||||
|
||||
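The dotxf hunks above keep an early exit for zero-length vectors: when m is zero there is nothing to accumulate, so y is only scaled by beta. A small real-valued sketch of that control flow follows; `dotxf_sketch` and its argument list are hypothetical, and the context parameter added by the commit is omitted here for brevity.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the BLIS API), real double precision:
     y := beta * y + alpha * A^T * x
   where A is m x b_n with row stride inca and column stride lda. */
void dotxf_sketch(size_t m, size_t b_n, double alpha,
                  const double *a, size_t inca, size_t lda,
                  const double *x, size_t incx,
                  double beta, double *y, size_t incy)
{
    assert(y != NULL);

    /* Zero-length vectors: only the scaling by beta remains. */
    if (m == 0) {
        for (size_t j = 0; j < b_n; ++j)
            y[j * incy] *= beta;
        return;
    }

    /* One dot product per column of A. */
    for (size_t j = 0; j < b_n; ++j) {
        double rho = 0.0;
        for (size_t i = 0; i < m; ++i)
            rho += a[i * inca + j * lda] * x[i * incx];
        y[j * incy] = beta * y[j * incy] + alpha * rho;
    }
}
```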
@@ -36,37 +36,45 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemm_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a1,
|
||||
float* restrict b1,
|
||||
float* restrict beta,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a1,
|
||||
float* restrict b1,
|
||||
float* restrict beta,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_SGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_SGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_dgemm_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a1,
|
||||
double* restrict b1,
|
||||
double* restrict beta,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a1,
|
||||
double* restrict b1,
|
||||
double* restrict beta,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemm micro-kernel implementation
|
||||
@@ -85,133 +93,27 @@ void bli_dgemm_opt_mxn(
|
||||
where A1 is MR x k, B1 is k x NR, C11 is MR x NR, and alpha and beta are
|
||||
scalars.
|
||||
|
||||
Parameters:
|
||||
For more info, please refer to the BLIS website's wiki on kernels:
|
||||
|
||||
- k: The number of columns of A1 and rows of B1.
|
||||
- alpha: The address of a scalar to be applied to the A1 * B1 product.
|
||||
- a1: The address of a micro-panel of matrix A of dimension MR x k,
|
||||
stored by columns with leading dimension PACKMR, where
|
||||
typically PACKMR = MR.
|
||||
- b1: The address of a micro-panel of matrix B of dimension k x NR,
|
||||
stored by rows with leading dimension PACKNR, where typically
|
||||
PACKNR = NR.
|
||||
- beta: The address of a scalar to be applied to the input value of matrix C11.
|
||||
- c11: The address of a submatrix C11 of dimension MR x NR, stored
|
||||
according to rs_c and cs_c.
|
||||
- rs_c: The row stride of matrix C11 (ie: the distance to the next row,
|
||||
in units of matrix elements).
|
||||
- cs_c: The column stride of matrix C11 (ie: the distance to the next
|
||||
column, in units of matrix elements).
|
||||
- data: The address of an auxinfo_t object that contains auxiliary
|
||||
information that may be useful when optimizing the gemm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
https://github.com/flame/blis/wiki/KernelsHowTo
|
||||
|
||||
Diagram for gemm
|
||||
|
||||
The diagram below shows the packed micro-panel operands and how elements
|
||||
of each would be stored when MR = NR = 4. The hex digits indicate the
|
||||
layout and order (but NOT the numeric contents) of the elements in
|
||||
memory. Note that the storage of C11 is not shown since it is determined
|
||||
by the row and column strides of C11.
|
||||
|
||||
         c11:           a1:                        b1:
         _______        ______________________     _______
        |       |      |0 4 8 C               |   |0 1 2 3|
    MR  |       |      |1 5 9 D . . .         |   |4 5 6 7|
        |       |  +=  |2 6 A E               |   |8 9 A B|
        |_______|      |3_7_B_F_______________|   |C D E F|
                                                  |   .   |
            NR                    k               |   .   | k
                                                  |   .   |
                                                  |       |
                                                  |       |
                                                  |_______|

                                                      NR
|
||||
Implementation Notes for gemm
|
||||
|
||||
- Register blocksizes. The C preprocessor macros bli_?mr and bli_?nr
|
||||
evaluate to the MR and NR register blocksizes for the datatype
|
||||
corresponding to the '?' character. These values are abbreviations
|
||||
of the macro constants BLIS_DEFAULT_MR_? and BLIS_DEFAULT_NR_?,
|
||||
which are defined in the bli_kernel.h header file of the BLIS
|
||||
configuration.
|
||||
- Leading dimensions of a1 and b1: PACKMR and PACKNR. The packed
|
||||
micro-panels a1 and b1 are simply stored in column-major and row-major
|
||||
order, respectively. Usually, the width of either micro-panel (ie:
|
||||
the number of rows of A1, or MR, and the number of columns of B1, or
|
||||
NR) is equal to that micro-panel's so-called "leading dimension."
|
||||
Sometimes, it may be beneficial to specify a leading dimension that
|
||||
is larger than the panel width. This may be desirable because it
|
||||
allows each column of A1 or row of B1 to maintain a certain alignment
|
||||
in memory that would not otherwise be maintained by MR and/or NR. In
|
||||
this case, you should index through a1 and b1 using the values PACKMR
|
||||
and PACKNR, respectively, as defined by bli_?packmr and bli_?packnr.
|
||||
These values are defined as BLIS_PACKDIM_MR_? and BLIS_PACKDIM_NR_?,
|
||||
respectively, in the bli_kernel.h header file of the BLIS
|
||||
configuration.
|
||||
- Storage preference of c11: Sometimes, an optimized micro-kernel will
|
||||
have a preferred storage format for C11--typically either contiguous
|
||||
row-storage or contiguous column-storage. This preference comes from
|
||||
how the micro-kernel is most efficiently able to load/store elements
|
||||
of C11 from/to memory. Most micro-kernels use vector instructions to
|
||||
load and store contiguous columns (or column segments) of C11. However,
|
||||
the developer may decide that loading contiguous rows (or row
|
||||
segments) is desirable. If this is the case, this preference should be
|
||||
noted in bli_kernel.h by defining the macro
|
||||
BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS. Leaving the macro undefined
|
||||
leaves the default assumption (contiguous column preference) in
|
||||
place. Setting this macro allows the framework to perform a minor
|
||||
optimization at run-time that will ensure the micro-kernel preference
|
||||
is honored, if at all possible.
|
||||
- Edge cases in MR, NR dimensions. Sometimes the micro-kernel will be
|
||||
called with micro-panels a1 and b1 that correspond to edge cases,
|
||||
where only partial results are needed. Zero-padding is handled
|
||||
automatically by the packing function to facilitate reuse of the same
|
||||
micro-kernel. Similarly, the logic for computing to temporary storage
|
||||
and then saving only the elements that correspond to elements of C11
|
||||
that exist (at the edges) is handled automatically within the
|
||||
macro-kernel.
|
||||
- Alignment of a1 and b1. By default, the addresses a1 and
|
||||
b1 are aligned only to sizeof(type). If BLIS_CONTIG_ADDR_ALIGN_SIZE is
|
||||
set to some larger multiple of sizeof(type), such as the page size,
|
||||
then a1 and b1 will be aligned to PACKMR * sizeof(type) and PACKNR *
|
||||
sizeof(type), respectively. Alignment of a1 and b1 is also affected
|
||||
by BLIS_UPANEL_A_ALIGN_SIZE_? and BLIS_UPANEL_B_ALIGN_SIZE_?, which
|
||||
align the distance (stride) between subsequent micro-panels. (By
|
||||
default, those values are simply sizeof(type), in which case they have
|
||||
no effect.)
|
||||
- Unrolling loops. As a general rule of thumb, the loop over k is
|
||||
sometimes moderately unrolled; for example, in our experience, an
|
||||
unrolling factor of u = 4 is fairly common. If unrolling is applied
|
||||
in the k dimension, edge cases must be handled to support values of k
|
||||
that are not multiples of u. It is nearly universally true that there
|
||||
should be no loops in the MR or NR directions; in other words,
|
||||
iteration over these dimensions should always be fully unrolled
|
||||
(within the loop over k).
|
||||
- Zero beta. If beta = 0.0 (or 0.0 + 0.0i for complex datatypes), then
|
||||
the micro-kernel should NOT use it explicitly, as C11 may contain
|
||||
uninitialized memory (including NaNs). This case should be detected
|
||||
and handled separately, preferably by simply overwriting C11 with the
|
||||
alpha * A1 * B1 product. An example of how to perform this "beta equals
|
||||
zero" handling is included in the gemm micro-kernel associated with
|
||||
the template configuration.
|
||||
|
||||
For more info, please refer to the BLIS website and/or contact the
|
||||
blis-devel mailing list.
|
||||
and/or contact the blis-devel mailing list.
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const dim_t mr = bli_dmr;
|
||||
const dim_t nr = bli_dnr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t cs_a = bli_dpackmr;
|
||||
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
|
||||
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_ab = 1;
|
||||
const inc_t cs_ab = bli_dmr;
|
||||
const inc_t cs_a = packmr;
|
||||
const inc_t rs_b = packnr;
|
||||
|
||||
const inc_t rs_ab = 1;
|
||||
const inc_t cs_ab = mr;
|
||||
|
||||
dim_t l, j, i;
|
||||
|
||||
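The hunk above replaces compile-time macros such as bli_dmr and bli_dpackmr with run-time queries on the context (bli_cntx_get_blksz_def_dt for MR and NR, bli_cntx_get_blksz_max_dt for PACKMR and PACKNR). The sketch below imitates that lookup pattern with a deliberately tiny stand-in structure. Every `sk_` name is hypothetical; the real cntx_t layout and accessors live in BLIS and are not reproduced here.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for the datatype and blocksize identifiers. */
typedef enum { SK_FLOAT = 0, SK_DOUBLE = 1, SK_NUM_DTS } sk_num_t;
typedef enum { SK_MR = 0, SK_NR = 1, SK_NUM_BSZS } sk_bszid_t;

/* Each blocksize carries a default and a maximum value per datatype.
   In the diff, the maximum plays the role of the packing dimension. */
typedef struct {
    long def[SK_NUM_DTS];
    long max[SK_NUM_DTS];
} sk_blksz_t;

/* A toy context: just the blocksize table. */
typedef struct {
    sk_blksz_t blkszs[SK_NUM_BSZS];
} sk_cntx_t;

static long sk_cntx_get_blksz_def_dt(sk_num_t dt, sk_bszid_t id,
                                     const sk_cntx_t *cntx)
{
    assert(cntx != NULL);
    return cntx->blkszs[id].def[dt];
}

static long sk_cntx_get_blksz_max_dt(sk_num_t dt, sk_bszid_t id,
                                     const sk_cntx_t *cntx)
{
    assert(cntx != NULL);
    return cntx->blkszs[id].max[dt];
}

/* Strides a micro-kernel derives from the context, following the
   assignments added in the hunk above. */
typedef struct { long cs_a, rs_b, rs_ab, cs_ab; } sk_strides_t;

sk_strides_t sk_ukernel_strides(sk_num_t dt, const sk_cntx_t *cntx)
{
    const long mr     = sk_cntx_get_blksz_def_dt(dt, SK_MR, cntx);
    const long packmr = sk_cntx_get_blksz_max_dt(dt, SK_MR, cntx);
    const long packnr = sk_cntx_get_blksz_max_dt(dt, SK_NR, cntx);

    sk_strides_t s;
    s.cs_a  = packmr;  /* column stride of the packed A micro-panel */
    s.rs_b  = packnr;  /* row stride of the packed B micro-panel    */
    s.rs_ab = 1;       /* temporary A*B product stored by columns   */
    s.cs_ab = mr;
    return s;
}
```

Because the values come from an object passed down the call stack rather than from global macros, two threads could in principle run the same kernel with different blocksizes, which matches the thread-friendliness motivation stated in the commit message.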
@@ -291,36 +193,56 @@ void bli_cgemm_opt_mxn(
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a1,
|
||||
scomplex* restrict b1,
|
||||
scomplex* restrict beta,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_CGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_zgemm_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a1,
|
||||
dcomplex* restrict b1,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemm_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a1,
|
||||
dcomplex* restrict b1,
|
||||
dcomplex* restrict beta,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_ZGEMM_UKERNEL_REF( k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_ZGEMM_UKERNEL_REF
|
||||
(
|
||||
k,
|
||||
alpha,
|
||||
a1,
|
||||
b1,
|
||||
beta,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
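The template comment removed in the dgemm hunk describes three rules for a gemm micro-kernel: a1 is stored by columns with leading dimension PACKMR, b1 is stored by rows with leading dimension PACKNR, and a zero beta must overwrite C11 rather than scale it, since C11 may hold uninitialized values or NaNs. The following unoptimized sketch applies those rules. The name `gemm_ukr_sketch`, the explicit blocksize arguments, and the `SK_MAX_*` bounds are assumptions for illustration only.

```c
#include <assert.h>
#include <stddef.h>

#define SK_MAX_MR 16   /* hypothetical bounds for the temporary tile */
#define SK_MAX_NR 16

/* Hypothetical sketch (not the BLIS API):
     C11 := beta * C11 + alpha * A1 * B1
   A1 is mr x k, stored by columns with leading dimension packmr.
   B1 is k x nr, stored by rows with leading dimension packnr. */
void gemm_ukr_sketch(size_t mr, size_t nr, size_t packmr, size_t packnr,
                     size_t k, double alpha,
                     const double *a1, const double *b1,
                     double beta,
                     double *c11, size_t rs_c, size_t cs_c)
{
    double ab[SK_MAX_MR * SK_MAX_NR];

    assert(mr <= SK_MAX_MR && nr <= SK_MAX_NR);
    assert(mr <= packmr && nr <= packnr);

    for (size_t p = 0; p < mr * nr; ++p) ab[p] = 0.0;

    /* Rank-1 updates: step through a1 and b1 with the packing
       dimensions, not with mr and nr. */
    for (size_t l = 0; l < k; ++l)
        for (size_t j = 0; j < nr; ++j)
            for (size_t i = 0; i < mr; ++i)
                ab[i + j * mr] += a1[i + l * packmr] * b1[j + l * packnr];

    for (size_t j = 0; j < nr; ++j) {
        for (size_t i = 0; i < mr; ++i) {
            double *cij = c11 + i * rs_c + j * cs_c;
            double  t   = alpha * ab[i + j * mr];

            if (beta == 0.0)
                *cij = t;                 /* never read C11 when beta is zero */
            else
                *cij = beta * (*cij) + t;
        }
    }
}
```

A tuned kernel would fully unroll the i and j loops and keep the tile in registers; the explicit branch on beta is the part that carries over unchanged.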
@@ -36,18 +36,24 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a10,
|
||||
float* restrict a11,
|
||||
float* restrict b01,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a10,
|
||||
float* restrict a11,
|
||||
float* restrict b01,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_spacknr;
|
||||
const num_t dt = BLIS_FLOAT;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
float* restrict minus_one = bli_sm1;
|
||||
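The sgemmtrsm_l hunk above sets rs_b = packnr, cs_b = 1 and a minus_one scalar, which suggests a gemm update of B11 followed by a triangular solve. The sketch below assumes the operation is B11 := inv(A11) * (alpha * B11 - A10 * B01) followed by C11 := B11; that formula is only partly visible in this excerpt, so treat it as an assumption. The function `gemmtrsm_l_sketch` is hypothetical, and it divides by the stored diagonal of A11, whereas a real packed panel may store the diagonal differently.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch (not the BLIS API), lower triangular case.
   a10: mr x k,  by columns, leading dimension packmr
   a11: mr x mr, by columns, leading dimension packmr (lower triangle used)
   b01: k x nr,  by rows,    leading dimension packnr
   b11: mr x nr, by rows,    leading dimension packnr (updated in place) */
void gemmtrsm_l_sketch(size_t mr, size_t nr, size_t packmr, size_t packnr,
                       size_t k, double alpha,
                       const double *a10, const double *a11,
                       const double *b01, double *b11,
                       double *c11, size_t rs_c, size_t cs_c)
{
    assert(mr <= packmr && nr <= packnr);

    /* Step 1 (gemm part): B11 := alpha * B11 - A10 * B01. */
    for (size_t i = 0; i < mr; ++i) {
        for (size_t j = 0; j < nr; ++j) {
            double acc = 0.0;
            for (size_t l = 0; l < k; ++l)
                acc += a10[i + l * packmr] * b01[l * packnr + j];
            b11[i * packnr + j] = alpha * b11[i * packnr + j] - acc;
        }
    }

    /* Step 2 (trsm part): forward substitution with lower triangular
       A11. Elements above the diagonal are never referenced. */
    for (size_t i = 0; i < mr; ++i) {
        for (size_t j = 0; j < nr; ++j) {
            double s = b11[i * packnr + j];
            for (size_t p = 0; p < i; ++p)
                s -= a11[i + p * packmr] * b11[p * packnr + j];
            s /= a11[i + i * packmr];

            b11[i * packnr + j]       = s;  /* keep packed B11 current */
            c11[i * rs_c + j * cs_c]  = s;  /* write result back to C  */
        }
    }
}
```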
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_dgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a10,
|
||||
double* restrict a11,
|
||||
double* restrict b01,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a10,
|
||||
double* restrict a11,
|
||||
double* restrict b01,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemmtrsm_l micro-kernel implementation
|
||||
@@ -96,114 +104,19 @@ void bli_dgemmtrsm_l_opt_mxn(
|
||||
B11 is MR x NR, and alpha is a scalar. Here, inv() denotes matrix
|
||||
inverse.
|
||||
|
||||
Parameters:
|
||||
For more info, please refer to the BLIS website's wiki on kernels:
|
||||
|
||||
- k: The number of columns of A10 and rows of B01.
|
||||
- alpha: The address of a scalar to be applied to B11.
|
||||
- a10: The address of A10, which is the MR x k submatrix of the packed
|
||||
micro-panel of A that is situated to the left of the MR x MR
|
||||
triangular submatrix A11. A10 is stored by columns with leading
|
||||
dimension PACKMR, where typically PACKMR = MR.
|
||||
- a11: The address of A11, which is the MR x MR lower triangular
|
||||
submatrix within the packed micro-panel of matrix A that is
|
||||
situated to the right of A10. A11 is stored by columns with
|
||||
leading dimension PACKMR, where typically PACKMR = MR. Note
|
||||
that A11 contains elements in both triangles, though elements
|
||||
in the unstored triangle are not guaranteed to be zero and
|
||||
thus should not be referenced.
|
||||
- b01: The address of B01, which is the k x NR submatrix of the packed
|
||||
micro-panel of B that is situated above the MR x NR submatrix
|
||||
B11. B01 is stored by rows with leading dimension PACKNR, where
|
||||
typically PACKNR = NR.
|
||||
- b11: The address of B11, which is the MR x NR submatrix of the packed
|
||||
micro-panel of B, situated below B01. B11 is stored by rows
|
||||
with leading dimension PACKNR, where typically PACKNR = NR.
|
||||
- c11: The address of C11, which is the MR x NR submatrix of matrix
|
||||
C, stored according to rs_c and cs_c. C11 is the submatrix
|
||||
within C that corresponds to the elements which were packed
|
||||
into B11. Thus, C is the original input matrix B to the overall
|
||||
trsm operation.
|
||||
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
|
||||
in units of matrix elements).
|
||||
- cs_c: The column stride of C11 (ie: the distance to the next column of
|
||||
C11, in units of matrix elements).
|
||||
- data: The address of an auxinfo_t object that contains auxiliary
|
||||
information that may be useful when optimizing the gemmtrsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
https://github.com/flame/blis/wiki/KernelsHowTo
|
||||
|
||||
Diagram for gemmtrsm_l
|
||||
|
||||
The diagram below shows the packed micro-panel operands for trsm_l and
|
||||
how elements of each would be stored when MR = NR = 4. The hex digits
|
||||
indicate the layout and order (but NOT the numeric contents) in memory.
|
||||
Here, matrix A11 (referenced by a11) is lower triangular. Matrix A11
|
||||
does contain elements corresponding to the strictly upper triangle,
|
||||
however, they are not guaranteed to contain zeros and thus these elements
|
||||
should not be referenced.
|
||||
|
||||
NR
|
||||
_______
|
||||
b01:|0 1 2 3|
|
||||
|4 5 6 7|
|
||||
|8 9 A B|
|
||||
|C D E F|
|
||||
k | . |
|
||||
| . |
|
||||
a10: a11: | . |
|
||||
___________________ _______ |_______|
|
||||
|0 4 8 C |`. | b11:| |
|
||||
MR |1 5 9 D . . . | `. | | |
|
||||
|2 6 A E | `. | MR | |
|
||||
|3_7_B_F____________|______`.| |_______|
|
||||
|
||||
k MR
|
||||
|
||||
|
||||
Implementation Notes for gemmtrsm
|
||||
|
||||
- Register blocksizes. See Implementation Notes for gemm.
|
||||
- Leading dimensions of a1 and b1: PACKMR and PACKNR. See Implementation
|
||||
Notes for gemm.
|
||||
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
|
||||
- Alignment of a1 and b1. The addresses a1 and b1 are aligned according
|
||||
to PACKMR*sizeof(type) and PACKNR*sizeof(type), respectively.
|
||||
- Unrolling loops. Most optimized implementations should unroll all
|
||||
three loops within the trsm subproblem of gemmtrsm. See Implementation
|
||||
Notes for gemm for remarks on unrolling the gemm subproblem.
|
||||
- Prefetching next micro-panels of A and B. When invoked from within a
|
||||
gemmtrsm_l micro-kernel, the addresses accessible via
|
||||
bli_auxinfo_next_a() and bli_auxinfo_next_b() refer to the next
|
||||
invocation's a10 and b01, respectively, while in gemmtrsm_u, the
|
||||
_next_a() and _next_b() macros return the addresses of the next
|
||||
invocation's a11 and b11 (since those submatrices precede a12 and b21).
|
||||
(See BLIS KernelsHowTo wiki for more info.)
|
||||
- Zero alpha. The micro-kernel can safely assume that alpha is non-zero;
|
||||
"alpha equals zero" handling is performed at a much higher level,
|
||||
which means that, in such a scenario, the micro-kernel will never get
|
||||
called.
|
||||
- Diagonal elements of A11. See Implementation Notes for trsm.
|
||||
- Zero elements of A11. See Implementation Notes for trsm.
|
||||
- Output. See Implementation Notes for trsm.
|
||||
- Optimization. Let's assume that the gemm micro-kernel has already been
|
||||
optimized. You have two options with regard to optimizing the fused
|
||||
gemmtrsm micro-kernels:
|
||||
(1) Optimize only the trsm micro-kernels. This will result in the gemm
|
||||
and trsm_l micro-kernels being called in sequence. (Likewise for
|
||||
gemm and trsm_u.)
|
||||
(2) Fuse the implementation of the gemm micro-kernel with that of the
|
||||
trsm micro-kernels by inlining both into the gemmtrsm_l and
|
||||
gemmtrsm_u micro-kernel definitions. This option is more labor-
|
||||
intensive, but also more likely to yield higher performance because
|
||||
it avoids redundant memory operations on the packed MR x NR
|
||||
submatrix B11.
|
||||
|
||||
For more info, please refer to the BLIS website and/or contact the
|
||||
blis-devel mailing list.
|
||||
and/or contact the blis-devel mailing list.
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
double* restrict minus_one = bli_dm1;
|
||||
@@ -227,18 +140,24 @@ void bli_dgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_cgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a10,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b01,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_cgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a10,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b01,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_cpacknr;
|
||||
const num_t dt = BLIS_SCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
scomplex* restrict minus_one = bli_cm1;
|
||||
@@ -260,18 +179,24 @@ void bli_cgemmtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_zgemmtrsm_l_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a10,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b01,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemmtrsm_l_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a10,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b01,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_zpacknr;
|
||||
const num_t dt = BLIS_DCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dcomplex* restrict minus_one = bli_zm1;
|
||||
|
||||
@@ -36,18 +36,24 @@
|
||||
|
||||
|
||||
|
||||
void bli_sgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a12,
|
||||
float* restrict a11,
|
||||
float* restrict b21,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_sgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
float* restrict alpha,
|
||||
float* restrict a10,
|
||||
float* restrict a11,
|
||||
float* restrict b01,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_spacknr;
|
||||
const num_t dt = BLIS_FLOAT;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
float* restrict minus_one = bli_sm1;
|
||||
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_dgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a12,
|
||||
double* restrict a11,
|
||||
double* restrict b21,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
double* restrict alpha,
|
||||
double* restrict a10,
|
||||
double* restrict a11,
|
||||
double* restrict b01,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template gemmtrsm_u micro-kernel implementation
|
||||
@@ -96,111 +104,19 @@ void bli_dgemmtrsm_u_opt_mxn(
|
||||
B11 is MR x NR, and alpha is a scalar. Here, inv() denotes matrix
|
||||
inverse.
|
||||
|
||||
Parameters:
|
||||
For more info, please refer to the BLIS website's wiki on kernels:
|
||||
|
||||
- k: The number of columns of A12 and rows of B21.
|
||||
- alpha: The address of a scalar to be applied to B11.
|
||||
- a12: The address of A12, which is the MR x k submatrix of the packed
|
||||
micro-panel of A that is situated to the right of the MR x MR
|
||||
triangular submatrix A11. A12 is stored by columns with leading
|
||||
dimension PACKMR, where typically PACKMR = MR.
|
||||
- a11: The address of A11, which is the MR x MR upper triangular
|
||||
submatrix within the packed micro-panel of matrix A that is
|
||||
situated to the left of A12. A11 is stored by columns with
|
||||
leading dimension PACKMR, where typically PACKMR = MR. Note
|
||||
that A11 contains elements in both triangles, though elements
|
||||
in the unstored triangle are not guaranteed to be zero and
|
||||
thus should not be referenced.
|
||||
- b21: The address of B21, which is the k x NR submatrix of the packed
|
||||
micro-panel of B that is situated below the MR x NR submatrix
|
||||
B11. B21 is stored by rows with leading dimension PACKNR, where
|
||||
typically PACKNR = NR.
|
||||
- b11: The address of B11, which is the MR x NR submatrix of the packed
|
||||
micro-panel of B, situated above B21. B11 is stored by rows
|
||||
with leading dimension PACKNR, where typically PACKNR = NR.
|
||||
- c11: The address of C11, which is the MR x NR submatrix of matrix
|
||||
C, stored according to rs_c and cs_c. C11 is the submatrix
|
||||
within C that corresponds to the elements which were packed
|
||||
into B11. Thus, C is the original input matrix B to the overall
|
||||
trsm operation.
|
||||
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
|
||||
in units of matrix elements).
|
||||
- cs_c: The column stride of C11 (ie: the distance to the next column of
|
||||
C11, in units of matrix elements).
|
||||
- data: The address of an auxinfo_t object that contains auxiliary
|
||||
information that may be useful when optimizing the gemmtrsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
https://github.com/flame/blis/wiki/KernelsHowTo
|
||||
|
||||
Diagram for gemmtrsm_u
|
||||
|
||||
The diagram below shows the packed micro-panel operands for trsm_u and
|
||||
how elements of each would be stored when MR = NR = 4. The hex digits
|
||||
indicate the layout and order (but NOT the numeric contents) in memory.
|
||||
Here, matrix A11 (referenced by a11) is upper triangular. Matrix A11
|
||||
does contain elements corresponding to the strictly lower triangle,
|
||||
however, they are not guaranteed to contain zeros and thus these elements
|
||||
should not be referenced.
|
||||
|
||||
a11: a12: NR
|
||||
________ ___________________ _______
|
||||
|`. |0 4 8 | b11:|0 1 2 3|
|
||||
MR | `. |1 5 9 . . . | |4 5 6 7|
|
||||
| `. |2 6 A | MR |8 9 A B|
|
||||
|______`.|3_7_B______________| |___.___|
|
||||
b21:| . |
|
||||
MR k | . |
|
||||
| |
|
||||
| |
|
||||
NOTE: Storage digits are shown k | |
|
||||
starting with a12 to avoid | |
|
||||
obscuring triangular structure | |
|
||||
of a11. |_______|
|
||||
|
||||
|
||||
Implementation Notes for gemmtrsm
|
||||
|
||||
- Register blocksizes. See Implementation Notes for gemm.
|
||||
- Leading dimensions of a1 and b1: PACKMR and PACKNR. See Implementation
|
||||
Notes for gemm.
|
||||
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
|
||||
- Alignment of a1 and b1. The addresses a1 and b1 are aligned according
|
||||
to PACKMR*sizeof(type) and PACKNR*sizeof(type), respectively.
|
||||
- Unrolling loops. Most optimized implementations should unroll all
|
||||
three loops within the trsm subproblem of gemmtrsm. See Implementation
|
||||
Notes for gemm for remarks on unrolling the gemm subproblem.
|
||||
- Prefetching next micro-panels of A and B. When invoked from within a
|
||||
gemmtrsm_l micro-kernel, the addresses accessible via
|
||||
bli_auxinfo_next_a() and bli_auxinfo_next_b() refer to the next
|
||||
invocation's a10 and b01, respectively, while in gemmtrsm_u, the
|
||||
_next_a() and _next_b() macros return the addresses of the next
|
||||
invocation's a11 and b11 (since those submatrices precede a12 and b21).
|
||||
(See BLIS KernelsHowTo wiki for more info.)
|
||||
- Zero alpha. The micro-kernel can safely assume that alpha is non-zero;
|
||||
"alpha equals zero" handling is performed at a much higher level,
|
||||
which means that, in such a scenario, the micro-kernel will never get
|
||||
called.
|
||||
- Diagonal elements of A11. See Implementation Notes for trsm.
|
||||
- Zero elements of A11. See Implementation Notes for trsm.
|
||||
- Output. See Implementation Notes for trsm.
|
||||
- Optimization. Let's assume that the gemm micro-kernel has already been
|
||||
optimized. You have two options with regard to optimizing the fused
|
||||
gemmtrsm micro-kernels:
|
||||
(1) Optimize only the trsm micro-kernels. This will result in the gemm
|
||||
and trsm_l micro-kernels being called in sequence. (Likewise for
|
||||
gemm and trsm_u.)
|
||||
(2) Fuse the implementation of the gemm micro-kernel with that of the
|
||||
trsm micro-kernels by inlining both into the gemmtrsm_l and
|
||||
gemmtrsm_u micro-kernel definitions. This option is more labor-
|
||||
intensive, but also more likely to yield higher performance because
|
||||
it avoids redundant memory operations on the packed MR x NR
|
||||
submatrix B11.
|
||||
|
||||
For more info, please refer to the BLIS website and/or contact the
|
||||
blis-devel mailing list.
|
||||
and/or contact the blis-devel mailing list.
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const num_t dt = BLIS_DOUBLE;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
double* restrict minus_one = bli_dm1;
|
||||
@@ -224,18 +140,24 @@ void bli_dgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_cgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a12,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b21,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_cgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
scomplex* restrict alpha,
|
||||
scomplex* restrict a10,
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b01,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_cpacknr;
|
||||
const num_t dt = BLIS_SCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
scomplex* restrict minus_one = bli_cm1;
|
||||
@@ -257,18 +179,24 @@ void bli_cgemmtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_zgemmtrsm_u_opt_mxn(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a12,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b21,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_zgemmtrsm_u_opt_mxn
|
||||
(
|
||||
dim_t k,
|
||||
dcomplex* restrict alpha,
|
||||
dcomplex* restrict a10,
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b01,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
const inc_t rs_b = bli_zpacknr;
|
||||
const num_t dt = BLIS_DCOMPLEX;
|
||||
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dcomplex* restrict minus_one = bli_zm1;
|
||||
|
||||
@@ -36,28 +36,36 @@
|
||||
|
||||
|
||||
|
||||
void bli_strsm_l_opt_mxn(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_strsm_l_opt_mxn
|
||||
(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_STRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_STRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_dtrsm_l_opt_mxn(
|
||||
double* restrict a11,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_dtrsm_l_opt_mxn
|
||||
(
|
||||
double* restrict a11,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template trsm_l micro-kernel implementation
|
||||
@@ -76,80 +84,28 @@ void bli_dtrsm_l_opt_mxn(
|
||||
where A11 is MR x MR and lower triangular, B11 is MR x NR, and C11 is
|
||||
MR x NR.
|
||||
|
||||
Parameters:
|
||||
For more info, please refer to the BLIS website's wiki on kernels:
|
||||
|
||||
- a11: The address of A11, which is the MR x MR lower triangular
|
||||
submatrix within the packed micro-panel of matrix A. A11 is
|
||||
stored by columns with leading dimension PACKMR, where
|
||||
typically PACKMR = MR. Note that A11 contains elements in both
|
||||
triangles, though elements in the unstored triangle are not
|
||||
guaranteed to be zero and thus should not be referenced.
|
||||
- b11: The address of B11, which is an MR x NR submatrix of the
|
||||
packed micro-panel of B. B11 is stored by rows with leading
|
||||
dimension PACKNR, where typically PACKNR = NR.
|
||||
- c11: The address of C11, which is an MR x NR submatrix of matrix C,
|
||||
stored according to rs_c and cs_c. C11 is the submatrix within
|
||||
C that corresponds to the elements which were packed into B11.
|
||||
Thus, C is the original input matrix B to the overall trsm
|
||||
operation.
|
||||
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
|
||||
in units of matrix elements).
|
||||
- cs_c: The column stride of C11 (ie: the distance to the next column of
|
||||
C11, in units of matrix elements).
|
||||
- data: The address of an auxinfo_t object that contains auxiliary
|
||||
information that may be useful when optimizing the trsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
https://github.com/flame/blis/wiki/KernelsHowTo
|
||||
|
||||
Diagrams for trsm
|
||||
|
||||
Please see the diagram for gemmtrsm_l for a depiction of trsm_l and
|
||||
where it fits in with its preceding gemm subproblem.
|
||||
|
||||
Implementation Notes for trsm
|
||||
|
||||
- Register blocksizes. See Implementation Notes for gemm.
|
||||
- Leading dimensions of a11 and b11: PACKMR and PACKNR. See
|
||||
Implementation Notes for gemm.
|
||||
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
|
||||
- Alignment of a11 and b11. See Implementation Notes for gemmtrsm.
|
||||
- Unrolling loops. Most optimized implementations should unroll all
|
||||
three loops within the trsm micro-kernel.
|
||||
- Prefetching next micro-panels of A and B. We advise against using
|
||||
the bli_auxinfo_next_a() and bli_auxinfo_next_b() macros from within
|
||||
the trsm_l and trsm_u micro-kernels, since the values returned usually
|
||||
only make sense in the context of the overall gemmtrsm subproblem.
|
||||
- Diagonal elements of A11. At the time this micro-kernel is called,
|
||||
the diagonal entries of triangular matrix A11 contain the inverse of
|
||||
the original elements. This inversion is done during packing so that
|
||||
we can avoid expensive division instructions within the micro-kernel
|
||||
itself. If the diag parameter to the higher level trsm operation was
|
||||
equal to BLIS_UNIT_DIAG, the diagonal elements will be explicitly
|
||||
unit.
|
||||
- Zero elements of A11. Since A11 is lower triangular (for trsm_l), the
|
||||
strictly upper triangle implicitly contains zeros. Similarly, the
|
||||
strictly lower triangle of A11 implicitly contains zeros when A11 is
|
||||
upper triangular (for trsm_u). However, the packing function may or
|
||||
may not actually write zeros to this region. Thus, while the
|
||||
implementation may reference these elements, it should not use them
|
||||
in any computation.
|
||||
- Output. This micro-kernel must write its result to two places: the
|
||||
submatrix B11 of the current packed micro-panel of B and the submatrix
|
||||
C11 of the output matrix C.
|
||||
|
||||
For more info, please refer to the BLIS website and/or contact the
|
||||
blis-devel mailing list.
|
||||
and/or contact the blis-devel mailing list.
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const dim_t m = bli_dmr;
|
||||
const dim_t n = bli_dnr;
|
||||
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
|
||||
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = bli_dpackmr;
|
||||
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const inc_t cs_b = 1;
|
||||
const dim_t m = mr;
|
||||
const dim_t n = nr;
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = packmr;
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dim_t iter, i, j, l;
|
||||
dim_t n_behind;
|
||||
@@ -208,33 +164,45 @@ void bli_dtrsm_l_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_ctrsm_l_opt_mxn(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ctrsm_l_opt_mxn
|
||||
(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CTRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_CTRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ztrsm_l_opt_mxn(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ztrsm_l_opt_mxn
|
||||
(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_ZTRSM_L_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_ZTRSM_L_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -36,18 +36,24 @@
|
||||
|
||||
|
||||
|
||||
void bli_strsm_u_opt_mxn(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_strsm_u_opt_mxn
|
||||
(
|
||||
float* restrict a11,
|
||||
float* restrict b11,
|
||||
float* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_STRSM_U_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_STRSM_U_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -58,6 +64,13 @@ void bli_dtrsm_u_opt_mxn(
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
(
|
||||
double* restrict a11,
|
||||
double* restrict b11,
|
||||
double* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/*
|
||||
Template trsm_u micro-kernel implementation
|
||||
@@ -76,79 +89,28 @@ void bli_dtrsm_u_opt_mxn(
|
||||
where A11 is MR x MR and upper triangular, B11 is MR x NR, and C11 is
|
||||
MR x NR.
|
||||
|
||||
Parameters:
|
||||
For more info, please refer to the BLIS website's wiki on kernels:
|
||||
|
||||
- a11: The address of A11, which is the MR x MR upper triangular
|
||||
submatrix within the packed micro-panel of matrix A. A11 is
|
||||
stored by columns with leading dimension PACKMR, where
|
||||
typically PACKMR = MR. Note that A11 contains elements in both
|
||||
triangles, though elements in the unstored triangle are not
|
||||
guaranteed to be zero and thus should not be referenced.
|
||||
- b11: The address of B11, which is an MR x NR submatrix of the
|
||||
packed micro-panel of B. B11 is stored by rows with leading
|
||||
dimension PACKNR, where typically PACKNR = NR.
|
||||
- c11: The address of C11, which is an MR x NR submatrix of matrix C,
|
||||
stored according to rs_c and cs_c. C11 is the submatrix within
|
||||
C that corresponds to the elements which were packed into B11.
|
||||
Thus, C is the original input matrix B to the overall trsm
|
||||
operation.
|
||||
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
|
||||
in units of matrix elements).
|
||||
- cs_c: The column stride of C11 (ie: the distance to the next column of
|
||||
C11, in units of matrix elements).
|
||||
- data: The address of an auxinfo_t object that contains auxiliary
|
||||
information that may be useful when optimizing the trsm
|
||||
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
|
||||
more info.)
|
||||
https://github.com/flame/blis/wiki/KernelsHowTo
|
||||
|
||||
Diagrams for trsm
|
||||
|
||||
Please see the diagram for gemmtrsm_u for a depiction of trsm_u and
|
||||
where it fits in with its preceding gemm subproblem.
|
||||
|
||||
Implementation Notes for trsm
|
||||
|
||||
- Register blocksizes. See Implementation Notes for gemm.
|
||||
- Leading dimensions of a11 and b11: PACKMR and PACKNR. See
|
||||
Implementation Notes for gemm.
|
||||
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
|
||||
- Alignment of a11 and b11. See Implementation Notes for gemmtrsm.
|
||||
- Unrolling loops. Most optimized implementations should unroll all
|
||||
three loops within the trsm micro-kernel.
|
||||
- Prefetching next micro-panels of A and B. We advise against using
|
||||
the bli_auxinfo_next_a() and bli_auxinfo_next_b() macros from within
|
||||
the trsm_l and trsm_u micro-kernels, since the values returned usually
|
||||
only make sense in the context of the overall gemmtrsm subproblem.
|
||||
- Diagonal elements of A11. At the time this micro-kernel is called,
|
||||
the diagonal entries of triangular matrix A11 contain the inverse of
|
||||
the original elements. This inversion is done during packing so that
|
||||
we can avoid expensive division instructions within the micro-kernel
|
||||
itself. If the diag parameter to the higher level trsm operation was
|
||||
equal to BLIS_UNIT_DIAG, the diagonal elements will be explicitly
|
||||
unit.
|
||||
- Zero elements of A11. Since A11 is lower triangular (for trsm_l), the
|
||||
strictly upper triangle implicitly contains zeros. Similarly, the
|
||||
strictly lower triangle of A11 implicitly contains zeros when A11 is
|
||||
upper triangular (for trsm_u). However, the packing function may or
|
||||
may not actually write zeros to this region. Thus, the implementation
|
||||
should not reference these elements.
|
||||
- Output. This micro-kernel must write its result to two places: the
|
||||
submatrix B11 of the current packed micro-panel of B and the submatrix
|
||||
C11 of the output matrix C.
|
||||
|
||||
For more info, please refer to the BLIS website and/or contact the
|
||||
blis-devel mailing list.
|
||||
and/or contact the blis-devel mailing list.
|
||||
|
||||
-FGVZ
|
||||
*/
|
||||
const dim_t m = bli_dmr;
|
||||
const dim_t n = bli_dnr;
|
||||
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
|
||||
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = bli_dpackmr;
|
||||
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
|
||||
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
|
||||
|
||||
const inc_t rs_b = bli_dpacknr;
|
||||
const inc_t cs_b = 1;
|
||||
const dim_t m = mr;
|
||||
const dim_t n = nr;
|
||||
|
||||
const inc_t rs_a = 1;
|
||||
const inc_t cs_a = packmr;
|
||||
|
||||
const inc_t rs_b = packnr;
|
||||
const inc_t cs_b = 1;
|
||||
|
||||
dim_t iter, i, j, l;
|
||||
dim_t n_behind;
|
||||
@@ -207,33 +169,45 @@ void bli_dtrsm_u_opt_mxn(
|
||||
|
||||
|
||||
|
||||
void bli_ctrsm_u_opt_mxn(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ctrsm_u_opt_mxn
|
||||
(
|
||||
scomplex* restrict a11,
|
||||
scomplex* restrict b11,
|
||||
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_CTRSM_U_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_CTRSM_U_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
|
||||
void bli_ztrsm_u_opt_mxn(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* data
|
||||
)
|
||||
void bli_ztrsm_u_opt_mxn
|
||||
(
|
||||
dcomplex* restrict a11,
|
||||
dcomplex* restrict b11,
|
||||
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
|
||||
auxinfo_t* restrict data,
|
||||
cntx_t* restrict cntx
|
||||
)
|
||||
{
|
||||
/* Just call the reference implementation. */
|
||||
BLIS_ZTRSM_U_UKERNEL_REF( a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data );
|
||||
BLIS_ZTRSM_U_UKERNEL_REF
|
||||
(
|
||||
a11,
|
||||
b11,
|
||||
c11, rs_c, cs_c,
|
||||
data,
|
||||
cntx
|
||||
);
|
||||
}
|
||||
|
||||
|
||||
@@ -32,8 +32,10 @@
|
||||
|
||||
*/
|
||||
|
||||
//
|
||||
// Prototype object-based fusing factor query routine.
|
||||
//
|
||||
dim_t bli_dotxaxpyf_fusefac( num_t dt );
|
||||
#include "bli_l0_check.h"
|
||||
|
||||
#include "bli_l0_oapi.h"
|
||||
#include "bli_l0_tapi.h"
|
||||
|
||||
// copysc
|
||||
#include "bli_copysc.h"
|
||||
314
frame/0/bli_l0_check.c
Normal file
@@ -0,0 +1,314 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
//
|
||||
// Define object-based check functions.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
) \
|
||||
{ \
|
||||
bli_l0_xxsc_check( chi, psi ); \
|
||||
}
|
||||
|
||||
GENFRONT( addsc )
|
||||
GENFRONT( copysc )
|
||||
GENFRONT( divsc )
|
||||
GENFRONT( mulsc )
|
||||
GENFRONT( sqrtsc )
|
||||
GENFRONT( subsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* norm \
|
||||
) \
|
||||
{ \
|
||||
bli_l0_xx2sc_check( chi, norm ); \
|
||||
}
|
||||
|
||||
GENFRONT( absqsc )
|
||||
GENFRONT( normfsc )
|
||||
|
||||
|
||||
void bli_getsc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
double* zeta_r,
|
||||
double* zeta_i
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_noninteger_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
|
||||
void bli_setsc_check
|
||||
(
|
||||
double zeta_r,
|
||||
double zeta_i,
|
||||
obj_t* chi
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_floating_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
|
||||
void bli_unzipsc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
obj_t* zeta_r,
|
||||
obj_t* zeta_i
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_noninteger_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_real_object( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_real_object( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_nonconstant_object( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_nonconstant_object( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_real_proj_of( chi, zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_real_proj_of( chi, zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
|
||||
void bli_zipsc_check
|
||||
(
|
||||
obj_t* zeta_r,
|
||||
obj_t* zeta_i,
|
||||
obj_t* chi
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_real_object( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_real_object( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_noninteger_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_nonconstant_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_real_proj_of( chi, zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_real_proj_of( chi, zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( zeta_r );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( zeta_i );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
|
||||
void bli_l0_xxsc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
obj_t* psi
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_noninteger_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_noninteger_object( psi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_nonconstant_object( psi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( psi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( psi );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
void bli_l0_xx2sc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
obj_t* absq
|
||||
)
|
||||
{
|
||||
err_t e_val;
|
||||
|
||||
// Check object datatypes.
|
||||
|
||||
e_val = bli_check_noninteger_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_nonconstant_object( absq );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_real_object( absq );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_real_proj_of( chi, absq );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object dimensions.
|
||||
|
||||
e_val = bli_check_scalar_object( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_scalar_object( absq );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
// Check object buffers (for non-NULLness).
|
||||
|
||||
e_val = bli_check_object_buffer( chi );
|
||||
bli_check_error_code( e_val );
|
||||
|
||||
e_val = bli_check_object_buffer( absq );
|
||||
bli_check_error_code( e_val );
|
||||
}
|
||||
|
||||
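Every check function in this file follows the same shape: run one predicate, hand its error code to `bli_check_error_code`, then move on to the next predicate. The sketch below illustrates that shape with hypothetical stand-ins (`chk_err_t`, `chk_obj_t` and the `chk_*` functions are not BLIS names). The diff does not show what `bli_check_error_code` does on failure, so this sketch assumes nothing about it and simply returns the first failing code, which also makes it easy to test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes and object type, for illustration only. */
typedef enum
{
    CHK_SUCCESS     =  0,
    CHK_NOT_SCALAR  = -1,
    CHK_NULL_BUFFER = -2
} chk_err_t;

typedef struct
{
    size_t m;      /* number of rows    */
    size_t n;      /* number of columns */
    void*  buffer; /* underlying data   */
} chk_obj_t;

/* Predicate: the object must be 1x1. */
static chk_err_t chk_scalar_object( const chk_obj_t* a )
{
    return ( a->m == 1 && a->n == 1 ) ? CHK_SUCCESS : CHK_NOT_SCALAR;
}

/* Predicate: the object must have a non-NULL buffer. */
static chk_err_t chk_object_buffer( const chk_obj_t* a )
{
    return ( a->buffer != NULL ) ? CHK_SUCCESS : CHK_NULL_BUFFER;
}

/* Runs the predicates in the same order as the functions above:
   dimensions first, then buffers. Returns the first failure. */
chk_err_t chk_xxsc_check( const chk_obj_t* chi, const chk_obj_t* psi )
{
    chk_err_t e_val;

    assert( chi != NULL && psi != NULL );

    /* Check object dimensions. */
    e_val = chk_scalar_object( chi );
    if ( e_val != CHK_SUCCESS ) return e_val;

    e_val = chk_scalar_object( psi );
    if ( e_val != CHK_SUCCESS ) return e_val;

    /* Check object buffers (for non-NULLness). */
    e_val = chk_object_buffer( chi );
    if ( e_val != CHK_SUCCESS ) return e_val;

    e_val = chk_object_buffer( psi );
    return e_val;
}
```

Keeping each predicate tiny and ordering them from cheapest property to most specific means the first reported error is usually the most informative one.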
134
frame/0/bli_l0_check.h
Normal file
@@ -0,0 +1,134 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based check functions.
|
||||
//
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
);
|
||||
|
||||
GENTPROT( addsc )
|
||||
GENTPROT( copysc )
|
||||
GENTPROT( divsc )
|
||||
GENTPROT( mulsc )
|
||||
GENTPROT( sqrtsc )
|
||||
GENTPROT( subsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* absq \
|
||||
);
|
||||
|
||||
GENTPROT( absqsc )
|
||||
GENTPROT( normfsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
GENTPROT( getsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
|
||||
GENTPROT( setsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i \
|
||||
);
|
||||
|
||||
GENTPROT( unzipsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC(opname,_check) \
|
||||
( \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
|
||||
GENTPROT( zipsc )
|
||||
|
||||
|
||||
// -----------------------------------------------------------------------------
|
||||
|
||||
void bli_l0_xxsc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
obj_t* psi
|
||||
);
|
||||
|
||||
void bli_l0_xx2sc_check
|
||||
(
|
||||
obj_t* chi,
|
||||
obj_t* norm
|
||||
);
|
||||
288
frame/0/bli_l0_oapi.c
Normal file
@@ -0,0 +1,288 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
//
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* absq \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi; \
|
||||
num_t dt_absq_c = bli_obj_datatype_proj_to_complex( *absq ); \
|
||||
\
|
||||
void* buf_chi; \
|
||||
void* buf_absq = bli_obj_buffer_at_off( *absq ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, absq ); \
|
||||
\
|
||||
/* If chi is a scalar constant, use dt_absq_c to extract the address of the
|
||||
corresponding constant value; otherwise, use the datatype encoded
|
||||
within the chi object and extract the buffer at the chi offset. */ \
|
||||
bli_set_scalar_dt_buffer( chi, dt_absq_c, dt_chi, buf_chi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_2 \
|
||||
( \
|
||||
dt_chi, \
|
||||
opname, \
|
||||
buf_chi, \
|
||||
buf_absq \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( absqsc )
|
||||
GENFRONT( normfsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt = bli_obj_datatype( *psi ); \
|
||||
\
|
||||
conj_t conjchi = bli_obj_conj_status( *chi ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
|
||||
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, psi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_3 \
|
||||
( \
|
||||
dt, \
|
||||
opname, \
|
||||
conjchi, \
|
||||
buf_chi, \
|
||||
buf_psi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( addsc )
|
||||
GENFRONT( divsc )
|
||||
GENFRONT( mulsc )
|
||||
GENFRONT( subsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt = bli_obj_datatype( *psi ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
|
||||
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, psi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_2 \
|
||||
( \
|
||||
dt, \
|
||||
opname, \
|
||||
buf_chi, \
|
||||
buf_psi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( sqrtsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
num_t dt_def = BLIS_DCOMPLEX; \
|
||||
num_t dt_use; \
|
||||
\
|
||||
/* If chi is a constant object, default to using the dcomplex
|
||||
value to maximize precision, and since we don't know if the
|
||||
caller needs just the real or the real and imaginary parts. */ \
|
||||
void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
|
||||
\
|
||||
/* The _check() routine prevents integer types, so we know that chi
|
||||
is either a constant or an actual floating-point type. */ \
|
||||
if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
|
||||
else dt_use = dt_chi; \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_3 \
|
||||
( \
|
||||
dt_use, \
|
||||
opname, \
|
||||
buf_chi, \
|
||||
zeta_r, \
|
||||
zeta_i \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( getsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
obj_t* chi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_3 \
|
||||
( \
|
||||
dt_chi, \
|
||||
opname, \
|
||||
zeta_r, \
|
||||
zeta_i, \
|
||||
buf_chi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( setsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi; \
|
||||
num_t dt_zeta_c = bli_obj_datatype_proj_to_complex( *zeta_r ); \
|
||||
\
|
||||
void* buf_chi; \
|
||||
\
|
||||
void* buf_zeta_r = bli_obj_buffer_at_off( *zeta_r ); \
|
||||
void* buf_zeta_i = bli_obj_buffer_at_off( *zeta_i ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
|
||||
\
|
||||
/* If chi is a scalar constant, use dt_zeta_c to extract the address of the
|
||||
corresponding constant value; otherwise, use the datatype encoded
|
||||
within the chi object and extract the buffer at the chi offset. */ \
|
||||
bli_set_scalar_dt_buffer( chi, dt_zeta_c, dt_chi, buf_chi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_3 \
|
||||
( \
|
||||
dt_chi, \
|
||||
opname, \
|
||||
buf_chi, \
|
||||
buf_zeta_r, \
|
||||
buf_zeta_i \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( unzipsc )
|
||||
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i, \
|
||||
obj_t* chi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
\
|
||||
void* buf_zeta_r = bli_obj_buffer_for_1x1( dt_chi, *zeta_r ); \
|
||||
void* buf_zeta_i = bli_obj_buffer_for_1x1( dt_chi, *zeta_i ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
|
||||
\
|
||||
/* Invoke the typed function. */ \
|
||||
bli_call_ft_3 \
|
||||
( \
|
||||
dt_chi, \
|
||||
opname, \
|
||||
buf_zeta_r, \
|
||||
buf_zeta_i, \
|
||||
buf_chi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( zipsc )
|
||||
|
||||
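The object-based front-ends in this file all end the same way: they read a datatype from one of the objects, extract raw buffers, and hand both to a typed function selected by that datatype. The diff does not show how `bli_call_ft_2` and `bli_call_ft_3` perform the selection, so the following is only a sketch of one common way to do it, a function-pointer table indexed by datatype. All `disp_*` names are hypothetical, only two real datatypes are modeled, and `chi` is assumed to already be stored in the same type as `psi` (the real code handles constants through `bli_obj_buffer_for_1x1`).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype tag and object, for illustration only. */
typedef enum { DISP_FLOAT = 0, DISP_DOUBLE = 1, DISP_NUM_TYPES } disp_num_t;

typedef struct
{
    disp_num_t dt;     /* datatype of the stored scalar */
    void*      buffer; /* address of the scalar         */
} disp_obj_t;

/* Typed implementations: psi := psi + chi. */
static void disp_saddsc( const void* chi, void* psi )
{
    *( float* )psi += *( const float* )chi;
}

static void disp_daddsc( const void* chi, void* psi )
{
    *( double* )psi += *( const double* )chi;
}

/* One table entry per datatype, in the same order as disp_num_t. */
typedef void (*disp_addsc_ft)( const void* chi, void* psi );

static const disp_addsc_ft disp_addsc_fpa[ DISP_NUM_TYPES ] =
{
    disp_saddsc,
    disp_daddsc
};

/* Object-based front-end: query the datatype of psi, then invoke the
   typed function stored at that index. */
void disp_addsc( const disp_obj_t* chi, disp_obj_t* psi )
{
    assert( chi != NULL && psi != NULL );
    assert( psi->dt < DISP_NUM_TYPES );

    disp_addsc_fpa[ psi->dt ]( chi->buffer, psi->buffer );
}
```

A table lookup keeps the front-end free of per-type branches, so adding a datatype only means adding one typed function and one table entry.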
125
frame/0/bli_l0_oapi.h
Normal file
@@ -0,0 +1,125 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* absq \
|
||||
);
|
||||
|
||||
GENPROT( absqsc )
|
||||
GENPROT( normfsc )
|
||||
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
);
|
||||
|
||||
GENPROT( addsc )
|
||||
GENPROT( divsc )
|
||||
GENPROT( mulsc )
|
||||
GENPROT( sqrtsc )
|
||||
GENPROT( subsc )
|
||||
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
GENPROT( getsc )
|
||||
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
|
||||
GENPROT( setsc )
|
||||
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i \
|
||||
);
|
||||
|
||||
GENPROT( unzipsc )
|
||||
|
||||
|
||||
#undef GENPROT
|
||||
#define GENPROT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* zeta_r, \
|
||||
obj_t* zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
|
||||
GENPROT( zipsc )
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
210
frame/0/bli_l0_tapi.c
Normal file
@@ -0,0 +1,210 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, kername ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
{ \
|
||||
ctype chi_conj; \
|
||||
\
|
||||
PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
|
||||
PASTEMAC(ch,kername)( chi_conj, *psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( addsc, adds )
|
||||
INSERT_GENTFUNC_BASIC( divsc, invscals )
|
||||
INSERT_GENTFUNC_BASIC( subsc, subs )
|
||||
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, kername ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
{ \
|
||||
if ( PASTEMAC(ch,eq0)( *chi ) ) \
|
||||
{ \
|
||||
/* Overwrite potential Infs and NaNs. */ \
|
||||
PASTEMAC(ch,set0s)( *psi ); \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
ctype chi_conj; \
|
||||
\
|
||||
PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
|
||||
PASTEMAC(ch,kername)( chi_conj, *psi ); \
|
||||
} \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( mulsc, scals )
|
||||
|
||||
|
||||
#undef GENTFUNCR
|
||||
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* absq \
|
||||
) \
|
||||
{ \
|
||||
ctype_r chi_r; \
|
||||
ctype_r chi_i; \
|
||||
ctype_r absq_i; \
|
||||
\
|
||||
( void )absq_i; \
|
||||
\
|
||||
PASTEMAC2(ch,chr,gets)( *chi, chi_r, chi_i ); \
|
||||
\
|
||||
/* absq = chi_r * chi_r + chi_i * chi_i; \
|
||||
absq_i = 0.0; (thrown away) */ \
|
||||
PASTEMAC(ch,absq2ris)( chi_r, chi_i, *absq, absq_i ); \
|
||||
\
|
||||
( void )chi_i; \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNCR_BASIC0( absqsc )
|
||||
|
||||
|
||||
#undef GENTFUNCR
|
||||
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* norm \
|
||||
) \
|
||||
{ \
|
||||
/* norm = sqrt( chi_r * chi_r + chi_i * chi_i ); */ \
|
||||
PASTEMAC2(ch,chr,abval2s)( *chi, *norm ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNCR_BASIC0( normfsc )
|
||||
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
{ \
|
||||
/* NOTE: sqrtsc/sqrt2s differs from normfsc/abval2s in the complex domain. */ \
|
||||
PASTEMAC(ch,sqrt2s)( *chi, *psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC0( sqrtsc )
|
||||
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,d,gets)( *chi, *zeta_r, *zeta_i ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC0( getsc )
|
||||
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
ctype* chi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(d,ch,sets)( zeta_r, zeta_i, *chi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC0( setsc )
|
||||
|
||||
|
||||
#undef GENTFUNCR
|
||||
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,chr,gets)( *chi, *zeta_r, *zeta_i ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNCR_BASIC0( unzipsc )
|
||||
|
||||
|
||||
#undef GENTFUNCR
|
||||
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i, \
|
||||
ctype* chi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chr,ch,sets)( *zeta_r, *zeta_i, *chi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNCR_BASIC0( zipsc )
|
||||
|
||||
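The typed functions in this file are stamped out once per datatype from a single macro body. The sketch below shows that technique on the `mulsc` case, including its guard that zeroes `psi` when `chi` is zero instead of multiplying. It is a simplified illustration under stated assumptions: `TMPL_GENTFUNC_MULSC` and the `tmpl_*` functions are hypothetical names, only the real `float` and `double` types are generated, and conjugation is left out because it is a no-op in the real domain.

```c
#include <assert.h>
#include <math.h>   /* INFINITY, used in the usage note below */
#include <stddef.h>

/* Hypothetical template macro: defines tmpl_<ch>mulsc for one C type.
   The token-pasting operator builds the function name from the
   type character, much as PASTEMAC does in the real code. */
#define TMPL_GENTFUNC_MULSC( ctype, ch ) \
\
void tmpl_ ## ch ## mulsc( const ctype* chi, ctype* psi ) \
{ \
    assert( chi != NULL && psi != NULL ); \
\
    if ( *chi == 0 ) \
    { \
        /* Overwrite potential Infs and NaNs. */ \
        *psi = 0; \
    } \
    else \
    { \
        *psi *= *chi; \
    } \
}

/* Instantiate the template for each supported type. */
TMPL_GENTFUNC_MULSC( float,  s )
TMPL_GENTFUNC_MULSC( double, d )
```

For example, calling `tmpl_dmulsc` with `chi` equal to 0 and `psi` equal to `INFINITY` leaves `psi` equal to 0, whereas a plain IEEE 754 multiplication of infinity by zero would produce NaN.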
131
frame/0/bli_l0_tapi.h
Normal file
@@ -0,0 +1,131 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( addsc )
|
||||
INSERT_GENTPROT_BASIC( divsc )
|
||||
INSERT_GENTPROT_BASIC( mulsc )
|
||||
INSERT_GENTPROT_BASIC( subsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* absq \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( absqsc )
|
||||
INSERT_GENTPROTR_BASIC( normfsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( sqrtsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( getsc )
|
||||
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
double zeta_r, \
|
||||
double zeta_i, \
|
||||
ctype* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype* chi, \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( unzipsc )
|
||||
|
||||
|
||||
#undef GENTPROTR
|
||||
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname) \
|
||||
( \
|
||||
ctype_r* zeta_r, \
|
||||
ctype_r* zeta_i, \
|
||||
ctype* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROTR_BASIC( zipsc )
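Reviewer note: the prototypes in this header are stamped out once per datatype by redefining a generator macro (GENTPROT, GENTPROTR) and then invoking an INSERT_* macro with the operation name. The following is a minimal, standalone sketch of that token-pasting technique, not BLIS code. The names `PASTE`, `GENTFUNC_DEMO`, `INSERT_GENTFUNC_DEMO_REAL`, and the `demo_` prefix are hypothetical, and only the two real datatypes are generated.

```c
/* Hypothetical stand-in for PASTEMAC: glue a type character and an
   operation name into one identifier, e.g. (s, addsc) -> demo_saddsc. */
#define PASTE( ch, opname ) demo_ ## ch ## opname

/* Generator: defines one typed function per invocation.
   The body (psi := psi + chi) is only an illustration. */
#define GENTFUNC_DEMO( ctype, ch, opname ) \
\
void PASTE(ch,opname) \
     ( \
       const ctype* chi, \
       ctype*       psi  \
     ) \
{ \
	*psi += *chi; \
}

/* Inserter: invokes the generator once per supported datatype. */
#define INSERT_GENTFUNC_DEMO_REAL( opname ) \
\
GENTFUNC_DEMO( float,  s, opname ) \
GENTFUNC_DEMO( double, d, opname )

/* Expands to definitions of demo_saddsc() and demo_daddsc(). */
INSERT_GENTFUNC_DEMO_REAL( addsc )
```

Redefining the generator before each INSERT_* line, as this header does with `#undef GENTPROT`, lets a single inserter serve several different signatures.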
|
||||
|
||||
@@ -34,66 +34,93 @@
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
// NOTE: This is one of the few functions in BLIS that is defined
|
||||
// with heterogeneous type support. This is done so that we have
|
||||
// an operation that can be used to typecast (copy-cast) a scalar
|
||||
// of one datatype to a scalar of another datatype.
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
conj_t conjchi,
|
||||
void* chi,
|
||||
void* psi
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY2_ALL(ftypes,copysc);
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi )
|
||||
{
|
||||
if ( bli_error_checking_is_enabled() )
|
||||
bli_copysc_check( chi, psi );
|
||||
|
||||
bli_copysc_unb_var1( chi, psi );
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with homogeneous-typed operands.
|
||||
//
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
conj_t conjchi = bli_obj_conj_status( *chi ); \
|
||||
\
|
||||
num_t dt_psi = bli_obj_datatype( *psi ); \
|
||||
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
|
||||
\
|
||||
num_t dt_chi; \
|
||||
void* buf_chi; \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, psi ); \
|
||||
\
|
||||
/* If chi is a scalar constant, use dt_psi to extract the address of the
|
||||
corresponding constant value; otherwise, use the datatype encoded
|
||||
within the chi object and extract the buffer at the chi offset. */ \
|
||||
bli_set_scalar_dt_buffer( chi, dt_psi, dt_chi, buf_chi ); \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_chi][dt_psi]; \
|
||||
\
|
||||
/* Invoke the void pointer-based function. */ \
|
||||
f( \
|
||||
conjchi, \
|
||||
buf_chi, \
|
||||
buf_psi \
|
||||
); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
|
||||
GENFRONT( copysc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC2
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
) \
|
||||
void PASTEMAC2(chx,chy,varname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
void* chi, \
|
||||
void* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
ctype_x* chi_cast = chi; \
|
||||
ctype_y* psi_cast = psi; \
|
||||
\
|
||||
if ( bli_is_conj( conjchi ) ) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,copyjs)( *chi_cast, *psi_cast ); \
|
||||
} \
|
||||
else \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,copys)( *chi_cast, *psi_cast ); \
|
||||
} \
|
||||
}
|
||||
|
||||
// Define the basic set of functions unconditionally, and then also some
|
||||
// mixed datatype functions if requested.
|
||||
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
INSERT_GENTFUNC2_BASIC0( copysc )
|
||||
INSERT_GENTFUNC2_MIX_D0( copysc )
|
||||
INSERT_GENTFUNC2_MIX_P0( copysc )
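Reviewer note: the new object-based copysc front-end selects a `void*`-based function from a two-dimensional table indexed by the source and destination datatypes (`ftypes[dt_chi][dt_psi]`), which is what makes copy-casting between types possible. The following is a minimal, standalone sketch of that dispatch pattern under simplifying assumptions: only two real datatypes, no conjugation argument, and hypothetical names (`demo_dt_t`, `demo_copysc`, and so on) that do not exist in BLIS.

```c
/* Hypothetical datatype tags; BLIS's num_t covers more types. */
typedef enum { DEMO_FLOAT = 0, DEMO_DOUBLE = 1, DEMO_DT_COUNT = 2 } demo_dt_t;

/* All variants share one void*-based signature so that they can be
   stored in a single table of function pointers. */
typedef void (*demo_copysc_fp)( const void* chi, void* psi );

/* One variant per (source, destination) type pair. */
static void demo_copysc_ss( const void* chi, void* psi ) { *(float*)psi  = *(const float*)chi; }
static void demo_copysc_sd( const void* chi, void* psi ) { *(double*)psi = (double)*(const float*)chi; }
static void demo_copysc_ds( const void* chi, void* psi ) { *(float*)psi  = (float)*(const double*)chi; }
static void demo_copysc_dd( const void* chi, void* psi ) { *(double*)psi = *(const double*)chi; }

/* Rows are indexed by the source type, columns by the destination type. */
static const demo_copysc_fp demo_ftypes[DEMO_DT_COUNT][DEMO_DT_COUNT] =
{
	{ demo_copysc_ss, demo_copysc_sd },
	{ demo_copysc_ds, demo_copysc_dd },
};

/* Front-end: look up the variant for this type pair and invoke it. */
void demo_copysc( demo_dt_t dt_chi, const void* chi,
                  demo_dt_t dt_psi, void*       psi )
{
	demo_copysc_fp f = demo_ftypes[dt_chi][dt_psi];

	f( chi, psi );
}
```

For example, `demo_copysc( DEMO_FLOAT, &x, DEMO_DOUBLE, &y )` widens a `float x` into a `double y`.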
|
||||
|
||||
|
||||
@@ -32,51 +32,37 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "bli_copysc_check.h"
|
||||
#include "bli_copysc_unb_var1.h"
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interface.
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi );
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with homogeneous-typed operands.
|
||||
//
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( copysc )
|
||||
void PASTEMAC0(opname) \
|
||||
( \
|
||||
obj_t* chi, \
|
||||
obj_t* psi \
|
||||
);
|
||||
GENFRONT( copysc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT2
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
);
|
||||
void PASTEMAC2(chx,chy,varname) \
|
||||
( \
|
||||
conj_t conjchi, \
|
||||
void* chi, \
|
||||
void* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT2_BASIC( copysc )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_D( copysc )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_P( copysc )
|
||||
#endif
|
||||
|
||||
|
||||
@@ -34,76 +34,78 @@
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
void* chi,
|
||||
double* zeta_r,
|
||||
double* zeta_i
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY(ftypes,getsc);
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x, \
|
||||
obj_t* y \
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x, y ); \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
num_t dt_def = BLIS_DCOMPLEX; \
|
||||
num_t dt_use; \
|
||||
\
|
||||
PASTEMAC0(varname)( x, \
|
||||
y ); \
|
||||
/* If chi is a constant object, default to using the dcomplex
|
||||
value to maximize precision, and since we don't know if the
|
||||
caller needs just the real or the real and imaginary parts. */ \
|
||||
void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
|
||||
\
|
||||
/* The _check() routine prevents integer types, so we know that chi
|
||||
is either a constant or an actual floating-point type. */ \
|
||||
if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
|
||||
else dt_use = dt_chi; \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_use]; \
|
||||
\
|
||||
/* Invoke the function. */ \
|
||||
f( \
|
||||
buf_chi, \
|
||||
zeta_r, \
|
||||
zeta_i \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( addv, addv_kernel )
|
||||
GENFRONT( getsc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with homogeneous-typed operands.
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
conj_t conjx, \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx, \
|
||||
ctype* y, inc_t incy \
|
||||
void* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( conjx, \
|
||||
n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( addv, ADDV_KERNEL )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
#undef GENTFUNC2
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
ctype* chi_cast = chi; \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjx, \
|
||||
dim_t n, \
|
||||
ctype_x* x, inc_t incx, \
|
||||
ctype_y* y, inc_t incy \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( conjx, \
|
||||
n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(ch,d,gets)( *chi_cast, *zeta_r, *zeta_i ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC2_BASIC( addv, ADDV_KERNEL )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( addv, ADDV_KERNEL )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( addv, ADDV_KERNEL )
|
||||
#endif
|
||||
INSERT_GENTFUNC_BASIC( getsc )
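Reviewer note: the getsc front-end above falls back to the dcomplex representation when chi is a constant object, since the caller may want either just the real part or both parts at full precision. The following is a minimal, standalone sketch of that selection logic. The `demo_scalar_t` layout and every `demo_`/`DEMO_` name are hypothetical simplifications; in particular, this sketch assumes a constant stores its value only in the `z` member.

```c
/* Hypothetical datatype tags, including a marker for constants. */
typedef enum { DEMO_FLOAT, DEMO_DOUBLE, DEMO_DCOMPLEX, DEMO_CONSTANT } demo_dt_t;

typedef struct { double real; double imag; } demo_dcomplex_t;

/* Hypothetical tagged scalar. A DEMO_CONSTANT scalar is assumed to
   keep its value in the z member. */
typedef struct
{
	demo_dt_t dt;
	union { float s; double d; demo_dcomplex_t z; } val;
} demo_scalar_t;

/* Extract the real and imaginary parts of chi as doubles. */
void demo_getsc( const demo_scalar_t* chi, double* zeta_r, double* zeta_i )
{
	/* Constants default to the double-complex representation. */
	demo_dt_t dt_use = ( chi->dt == DEMO_CONSTANT ) ? DEMO_DCOMPLEX : chi->dt;

	switch ( dt_use )
	{
		case DEMO_FLOAT:
			*zeta_r = (double)chi->val.s; *zeta_i = 0.0;
			break;
		case DEMO_DOUBLE:
			*zeta_r = chi->val.d; *zeta_i = 0.0;
			break;
		default: /* DEMO_DCOMPLEX */
			*zeta_r = chi->val.z.real; *zeta_i = chi->val.z.imag;
			break;
	}
}
```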
|
||||
|
||||
@@ -32,42 +32,33 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
|
||||
//
|
||||
// Define object-based interface.
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x ); \
|
||||
\
|
||||
PASTEMAC0(varname)( x ); \
|
||||
}
|
||||
|
||||
GENFRONT( invertv, invertv_kernel )
|
||||
obj_t* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
GENFRONT( getsc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces.
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC(ch,varname)( n, \
|
||||
x, incx ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( invertv, INVERTV_KERNEL )
|
||||
void* chi, \
|
||||
double* zeta_r, \
|
||||
double* zeta_i \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( getsc )
|
||||
101
frame/0/old/bli_setsc.c
Normal file
@@ -0,0 +1,101 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
#include "blis.h"
|
||||
|
||||
typedef void (*FUNCPTR_T)(
|
||||
double* zeta_r,
|
||||
double* zeta_i,
|
||||
void* chi
|
||||
);
|
||||
|
||||
static FUNCPTR_T GENARRAY(ftypes,setsc);
|
||||
|
||||
//
|
||||
// Define object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
obj_t* chi \
|
||||
) \
|
||||
{ \
|
||||
num_t dt_chi = bli_obj_datatype( *chi ); \
|
||||
\
|
||||
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
|
||||
\
|
||||
FUNCPTR_T f; \
|
||||
\
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
|
||||
\
|
||||
/* Index into the type combination array to extract the correct
|
||||
function pointer. */ \
|
||||
f = ftypes[dt_chi]; \
|
||||
\
|
||||
/* Invoke the function. */ \
|
||||
f( \
|
||||
zeta_r, \
|
||||
zeta_i, \
|
||||
buf_chi \
|
||||
); \
|
||||
}
|
||||
|
||||
GENFRONT( setsc )
|
||||
|
||||
|
||||
//
|
||||
// Define BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTFUNC
|
||||
#define GENTFUNC( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
void* chi \
|
||||
) \
|
||||
{ \
|
||||
ctype* chi_cast = chi; \
|
||||
\
|
||||
PASTEMAC2(d,ch,sets)( *zeta_r, *zeta_i, *chi_cast ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( setsc )
|
||||
|
||||
64
frame/0/old/bli_setsc.h
Normal file
@@ -0,0 +1,64 @@
|
||||
/*
|
||||
|
||||
BLIS
|
||||
An object-based framework for developing high-performance BLAS-like
|
||||
libraries.
|
||||
|
||||
Copyright (C) 2014, The University of Texas at Austin
|
||||
|
||||
Redistribution and use in source and binary forms, with or without
|
||||
modification, are permitted provided that the following conditions are
|
||||
met:
|
||||
- Redistributions of source code must retain the above copyright
|
||||
notice, this list of conditions and the following disclaimer.
|
||||
- Redistributions in binary form must reproduce the above copyright
|
||||
notice, this list of conditions and the following disclaimer in the
|
||||
documentation and/or other materials provided with the distribution.
|
||||
- Neither the name of The University of Texas at Austin nor the names
|
||||
of its contributors may be used to endorse or promote products
|
||||
derived from this software without specific prior written permission.
|
||||
|
||||
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
|
||||
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
|
||||
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
|
||||
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
|
||||
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
|
||||
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
|
||||
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
|
||||
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
|
||||
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
|
||||
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
|
||||
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
|
||||
|
||||
*/
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interfaces.
|
||||
//
|
||||
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
obj_t* chi \
|
||||
);
|
||||
GENFRONT( setsc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with typed operands.
|
||||
//
|
||||
|
||||
#undef GENTPROT
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
double* zeta_r, \
|
||||
double* zeta_i, \
|
||||
void* chi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setsc )
|
||||
@@ -38,22 +38,14 @@
|
||||
//
|
||||
// Define object-based interface.
|
||||
//
|
||||
#undef GENFRONT
|
||||
#define GENFRONT( opname, varname ) \
|
||||
\
|
||||
void PASTEMAC0(opname)( \
|
||||
obj_t* x, \
|
||||
obj_t* y \
|
||||
) \
|
||||
{ \
|
||||
if ( bli_error_checking_is_enabled() ) \
|
||||
PASTEMAC(opname,_check)( x, y ); \
|
||||
\
|
||||
PASTEMAC0(varname)( x, \
|
||||
y ); \
|
||||
}
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi )
|
||||
{
|
||||
if ( bli_error_checking_is_enabled() )
|
||||
bli_copysc_check( chi, psi );
|
||||
|
||||
GENFRONT( swapv, swapv_kernel )
|
||||
bli_copysc_unb_var1( chi, psi );
|
||||
}
|
||||
|
||||
|
||||
//
|
||||
@@ -63,17 +55,17 @@ GENFRONT( swapv, swapv_kernel )
|
||||
#define GENTFUNC( ctype, ch, opname, varname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* x, inc_t incx, \
|
||||
ctype* y, inc_t incy \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(ch,ch,varname)( n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(ch,ch,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
|
||||
//
|
||||
@@ -83,23 +75,25 @@ INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
|
||||
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
|
||||
\
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
dim_t n, \
|
||||
ctype_x* x, inc_t incx, \
|
||||
ctype_y* y, inc_t incy \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
) \
|
||||
{ \
|
||||
PASTEMAC2(chx,chy,varname)( n, \
|
||||
x, incx, \
|
||||
y, incy ); \
|
||||
PASTEMAC2(chx,chy,varname)( conjchi, \
|
||||
chi, \
|
||||
psi ); \
|
||||
}
|
||||
|
||||
INSERT_GENTFUNC2_BASIC( swapv, SWAPV_KERNEL )
|
||||
// Define the basic set of functions unconditionally, and then also some
|
||||
// mixed datatype functions if requested.
|
||||
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_D( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTFUNC2_MIX_P( swapv, SWAPV_KERNEL )
|
||||
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
|
||||
#endif
|
||||
|
||||
@@ -32,17 +32,15 @@
|
||||
|
||||
*/
|
||||
|
||||
#include "bli_setv_check.h"
|
||||
|
||||
#include "bli_setv_kernel.h"
|
||||
#include "bli_setv_ref.h"
|
||||
#include "bli_copysc_check.h"
|
||||
#include "bli_copysc_unb_var1.h"
|
||||
|
||||
|
||||
//
|
||||
// Prototype object-based interface.
|
||||
//
|
||||
void bli_setv( obj_t* beta,
|
||||
obj_t* x );
|
||||
void bli_copysc( obj_t* chi,
|
||||
obj_t* psi );
|
||||
|
||||
|
||||
//
|
||||
@@ -52,33 +50,33 @@ void bli_setv( obj_t* beta,
|
||||
#define GENTPROT( ctype, ch, opname ) \
|
||||
\
|
||||
void PASTEMAC(ch,opname)( \
|
||||
dim_t n, \
|
||||
ctype* beta, \
|
||||
ctype* x, inc_t incx \
|
||||
conj_t conjchi, \
|
||||
ctype* chi, \
|
||||
ctype* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT_BASIC( setv )
|
||||
INSERT_GENTPROT_BASIC( copysc )
|
||||
|
||||
|
||||
//
|
||||
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
|
||||
//
|
||||
#undef GENTPROT2
|
||||
#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \
|
||||
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
|
||||
\
|
||||
void PASTEMAC2(chb,chx,opname)( \
|
||||
dim_t n, \
|
||||
ctype_b* beta, \
|
||||
ctype_x* x, inc_t incx \
|
||||
void PASTEMAC2(chx,chy,opname)( \
|
||||
conj_t conjchi, \
|
||||
ctype_x* chi, \
|
||||
ctype_y* psi \
|
||||
);
|
||||
|
||||
INSERT_GENTPROT2_BASIC( setv )
|
||||
INSERT_GENTPROT2_BASIC( copysc )
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_D( setv )
|
||||
INSERT_GENTPROT2_MIX_D( copysc )
|
||||
#endif
|
||||
|
||||
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
|
||||
INSERT_GENTPROT2_MIX_P( setv )
|
||||
INSERT_GENTPROT2_MIX_P( copysc )
|
||||
#endif
|
||||
|
||||
Some files were not shown because too many files have changed in this diff