Merge remote-tracking branch 'origin/master' into knl

Devin Matthews
2016-04-18 10:21:35 -05:00
1343 changed files with 54329 additions and 31224 deletions

CHANGELOG

@@ -1,10 +1,772 @@
commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (HEAD -> master, tag: 0.1.8)
commit 898614a555ea0aa7de4ca07bb3cb8f5708b6a002 (HEAD -> master, tag: 0.2.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:09 2016 -0500
Version file update (0.2.0)
commit 537a1f4f85ce1aa008901857cb3182e6b4546d7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:21:28 2016 -0500
Implemented runtime contexts and reorganized code.
Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context. User-
level object APIs were unaffected and continue to be "context-free,"
however, these APIs were duplicated/mirrored so that "context-aware"
APIs now also exist, differentiated with an "_ex" suffix (for "expert").
These new context-aware object APIs (along with the lower-level, type-
aware, BLAS-like APIs) contain the address of a context as a last
parameter, after all other operands. Contexts, or specifically, cntx_t
object pointers, are passed all the way down the function stack into
the kernels and allow the code at any level to query information about
the runtime, such as kernel addresses and blocksizes, in a thread-
friendly manner--that is, one that allows thread-safety, even if the
original source of the information stored in the context changes at
run-time (see next bullet for more on this "original source" of info).
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
usage of contexts where appropriate to communicate cache and register
blocksizes to bli_mem_compute_pool_block_sizes().
- Simplified control trees now that much of the information resides in
the context and/or the global kernel structure:
- Removed blocksize object pointer (blksz_t*) fields from all control
tree node definitions and replaced them with blocksize id (bszid_t)
values instead, which may be passed into a context query routine in
order to extract the corresponding blocksize from the given context.
- Removed micro-kernel function pointer (func_t*) fields from all
control tree node definitions. Now, any code that needs these function
pointers can query them from the local context, as identified by a
level-3 micro-kernel id (l3ukr_t), level-1f kernel id (l1fkr_t), or
level-1v kernel id (l1vkr_t).
- Removed blksz_t object creation and initialization, as well as kernel
function object creation and initialization, from all operation-
specific control tree initialization files (bli_*_cntl.c), since this
information will now live in the gks and, secondarily, in the context.
- Removed blocksize multiples from blksz_t objects. Now, we track
blocksize multiples for each blocksize id (bszid_t) in the context
object.
- Removed the bool_t's that were required when a func_t was initialized.
These bools are meant to allow one to track the micro-kernel's storage
preferences (by rows or columns). This preference is now tracked
separately within the gks and contexts.
- Merged and reorganized many separate-but-related functions into single
files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
util directories, but has the most obvious effect of allowing BLIS
to compile noticeably faster.
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
in an attempt to reduce overhead for memory-bound operations. This
includes removal of default use of object-based variants for level-2
operations. Now, by default, level-2 operations will directly call a
low-level (non-object based) loop over a level-1v or -1f kernel.
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
- Defined bli_mbool.c API to create and query "multi-bools", or
heterogeneous bool_t's (one for each floating-point datatype), in the
same spirit as blksz_t and func_t.
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
new parameter, which may be set indirectly via the aforementioned
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
statically allocate memory in macro-kernels and the induced methods'
virtual kernels to be used as temporary space to hold a single
micro-tile. These values are now output by the testsuite. The default
value of BLIS_STACK_BUF_MAX_SIZE is computed as
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
- Cleaned up top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
- Updated the testsuite to query blocksizes from a locally-initialized
context for test modules that need those values: axpyf, dotxf,
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
- Reformatted many function signatures into a standard format that will
more easily facilitate future API-wide changes.
- Updated many "mxn" level-0 macros (ie: those used to inline double loops
for level-1m-like operations on small matrices) in frame/include/level0
to use more obscure local variable names in an effort to avoid variable
shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
which are only output using -Wshadow.)
- Added a conj argument to setm, so that its interface now mirrors that
of scalm. The semantic meaning of the conj argument is to optionally
allow implicit conjugation of the scalar prior to being populated into
the object.
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
that this does not preclude supporting mixed types via the object APIs,
where it produces absolutely zero API code bloat.
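The context mechanism described in this commit can be illustrated with a minimal sketch. Every name below (kernel_info_t, context_t, global_info_set, context_init, dot_update, ref_ukr) is hypothetical and chosen for illustration; none of it is the actual BLIS cntx_t or gks API. The sketch only shows the idea of copying kernel information into a private structure before computation begins and passing that structure as the last parameter, and it ignores synchronization of the global structure itself.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified stand-ins; these are not the real BLIS types. */
typedef void (*ukr_fn)(size_t k, const double* a, const double* b, double* c);

typedef struct
{
    size_t mr, nr;    /* register blocksizes */
    ukr_fn gemm_ukr;  /* micro-kernel address */
} kernel_info_t;

/* Plays the role of the global kernel structure in this sketch. */
static kernel_info_t global_info;

void global_info_set(size_t mr, size_t nr, ukr_fn ukr)
{
    global_info.mr       = mr;
    global_info.nr       = nr;
    global_info.gemm_ukr = ukr;
}

/* A context holds a private copy of the kernel information. */
typedef struct
{
    kernel_info_t info;
} context_t;

/* Snapshot the currently active kernel information into the context. */
void context_init(context_t* cntx)
{
    assert(cntx != NULL);
    cntx->info = global_info;
}

size_t context_get_mr(const context_t* cntx)
{
    return cntx->info.mr;
}

/* Toy 1x1 "micro-kernel": c += dot(a, b) over k elements. */
void ref_ukr(size_t k, const double* a, const double* b, double* c)
{
    for (size_t p = 0; p < k; ++p)
        *c += a[p] * b[p];
}

/* The context is the last parameter, after all other operands. The kernel
   is looked up in the context rather than in the global structure. */
void dot_update(size_t k, const double* a, const double* b, double* c,
                const context_t* cntx)
{
    assert(cntx != NULL && cntx->info.gemm_ukr != NULL);
    cntx->info.gemm_ukr(k, a, b, c);
}
```

Because the context is a copy, a later call to global_info_set() does not change what an already-initialized context reports, which is the property the commit message relies on for in-flight computations.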
commit d1f8e5d9b2ecd054ed103f4d642d748db2d4f173 (origin/master)
Merge: 20af937 c11d28e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 5 12:21:27 2016 -0500
Merge pull request #60 from esauvage/master
sgemm µkernel for bulldozer : bug correction for k%4 != 0
commit c11d28eed89d65494bc4019f04d046520866c0ff
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Sat Apr 2 21:15:48 2016 +0200
cgemm µkernel for bulldozer : bug correction for k%4 != 0
commit 20af937b57f82bb3acb09418d5c0206e1b24f2c7
Merge: 36c3abb fc61a11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 14:37:30 2016 -0500
Merge pull request #59 from devinamatthews/fix_testsuite_makefile
Fix testsuite makefile
commit fc61a1143edeba4946d4b9915f1775bb08e643fc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:53:01 2016 -0500
Fix formatting in configure.
commit 26379b14de630e3a6c6eef5dfe87ff001558a8a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 31 10:45:48 2016 -0500
Adjust paths in common.mk to support building from testsuite dir.
commit 36c3abb05fecb02d4a9ab13b2b69d133adf34583
Merge: 64b41fa 917ce75
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 31 10:26:17 2016 -0500
Merge pull request #58 from esauvage/master
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…
commit 356d854fc9e34642cc46e0e02a8ceb56114878af
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:33:15 2016 -0500
Make symlink to common.mk in build directory.
commit edbb8470044f82ef959583ee09613a5a985292b5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 30 16:27:11 2016 -0500
Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile.
commit 917ce75482a543fef46553efff6c246939761e59
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Wed Mar 30 22:03:09 2016 +0200
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
commit 64b41fa554dff44b2f9ad48901b67c63836407a8
Merge: 1b09e34 0171ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 15:19:41 2016 -0500
Merge pull request #54 from devinamatthews/more_config_opts
More config opts
commit 1b09e343dfe5b48b4842e2cb96f41c8cc249bad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 29 12:55:28 2016 -0500
Updated gcc version from 4.8 to 4.9 in .travis.yml.
commit 0171ad58997b3a5a9b76301511dbe0751fffc940
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Mar 28 13:55:06 2016 -0500
Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW.
commit 3090fff64cc87ff2519a09f38e6b8699cf3cba11
Merge: 8624e36 4ca5d5b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 28 12:36:25 2016 -0500
Merge pull request #44 from esauvage/master
sgemm micro-kernel for FMA4 instruction set
commit e6e566426ac3ded7ef87cd8ff9be98accfdc4acc
Merge: 469429e 8624e36
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 26 14:10:15 2016 -0500
Merge branch 'master' into more_config_opts
commit 8624e36543160739d954c4dbcc5a5594458f3a12
Merge: a315833 2bd036f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 26 13:56:28 2016 -0500
Merge pull request #50 from devinamatthews/fix_noopt_avx
Fix configuration issue where instruction set flags are not specified for debug builds.
commit 469429ec34e5b1a172ce35596f9c7afdaacac131
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:45:41 2016 -0500
Fix LD_FLAGS -> LDFLAGS.
commit 8442d65c9ead0376fc5f2dfad62fd4862ab9b2b3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 20:06:48 2016 -0500
Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures.
commit 76099f20be1b49ac960f7e3c5a8296bbf4e1782d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 17:22:58 2016 -0500
Add threading option to configure.
commit ad43eab4c7899d56d8d7caa6e2d92bc0581ea5a5
Merge: 9452bdb 2bd036f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 15:00:02 2016 -0500
Merge branch 'fix_noopt_avx' into more_config_opts
commit 9452bdb3afbf2d7f898134a091d7790817e7be9c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 14:59:50 2016 -0500
Add options for verbose make output and static/shared linking to configure.
commit 2bd036f1f9ce1ee0864365557f66d9415dd42de3
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 25 12:16:49 2016 -0500
Fix configuration issue where instruction set flags are not specified for debug builds.
commit a315833f067944fb0bc14cf60f0c7dcb5dc897b6
Merge: 1d1a426 af92773
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 24 12:30:21 2016 -0500
Merge pull request #48 from figual/master
Updated and improved ARMv8 micro-kernels.
commit af92773f4f85a2441fe0c6e3a52c31b07253d08e
Author: figual <figual@ucm.es>
Date: Wed Mar 23 22:07:02 2016 +0100
Updated and improved ARMv8 micro-kernels.
commit 1d1a426d18ec03754021456862a1f4d1dfec1fbf
Merge: 5a978ff d226dfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 7 15:17:53 2016 -0600
Merge pull request #46 from devinamatthews/new-config-opts
Add several changes to the build system.
commit d226dfa05190eb477b33563b1edccf8603973336
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Mar 5 16:18:14 2016 -0600
Add several changes to the build system.
1) Add -- options.
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
4) Add make V=[0,1] option to control build verbosity.
commit 5a978fffdb8f09a81c89541d541d4a6830cd70a4
Merge: adb2b4e 63e2642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 4 17:26:58 2016 -0600
Merge pull request #45 from devinamatthews/high_prec_timers
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday
commit 63e264239053b913164a849dd8a45829087eaddc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 13:17:50 2016 -0600
Make sure that -lrt is linked on Linux.
commit 44fddd48dc1708a956803d1948f04429ec0d8700
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Mar 4 12:36:38 2016 -0600
Add missing \.
commit 7cabd2131f953de23e7015d760b0ddfda51b1251
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Mar 3 11:43:07 2016 -0600
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday.
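A minimal sketch of a wall-clock timer built on clock_gettime(CLOCK_MONOTONIC) follows. The function name monotonic_seconds is an assumption made for illustration and is not taken from the BLIS source; the macOS path via mach_absolute_time is omitted. On older Linux systems this may require linking with -lrt, as the related commit notes.

```c
#define _POSIX_C_SOURCE 199309L
#include <assert.h>
#include <time.h>

/* Returns elapsed seconds since an arbitrary, fixed starting point.
   Unlike gettimeofday(), CLOCK_MONOTONIC is not affected by changes
   to the system's wall-clock time. */
double monotonic_seconds(void)
{
    struct timespec ts;
    int rc = clock_gettime(CLOCK_MONOTONIC, &ts);

    assert(rc == 0);
    (void)rc;

    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
}
```

Timing a region then amounts to taking the difference of two calls, which is non-negative by construction of the monotonic clock.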
commit adb2b4e096c78e8b2f85fd372cf0d5eb04af5be8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 2 14:48:12 2016 -0600
Fixing guard for non implemented partitioning through packed matrices
commit 4ca5d5b1fd6f2e4a8b2e139c5405475239581e51
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Tue Mar 1 21:33:01 2016 +0100
sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
commit 627d59b5ba06866b26f46e4434a0435b600925e3
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
Date: Mon Feb 29 21:53:12 2016 +0100
symbolic link for bulldozer configuration to kernels
commit 2dc5c0ae038ed175fab85751803ada05734d1ba1
Merge: f2809fc 3d0fae8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 29 12:22:51 2016 -0600
Merge pull request #40 from tkelman/bulldozer-symlink
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
commit f2809fc5f74466c755da6a5b4632853e634060b5
Merge: f86b94f 8624a33
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Feb 27 13:06:03 2016 -0600
Merge pull request #39 from devinamatthews/fix_f2c_conflicts
Devin's f2c type namespace update.
Details:
- Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
- Removed most of the body of bli_f2c.h, which was unused.
commit 3d0fae810d942085d8f2d389820b4e0027577db8
Author: Tony Kelman <tony@kelman.net>
Date: Thu Feb 25 23:24:03 2016 -0800
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI
commit 8624a33ccc12dff6f6c4f92992ca5636af1576a6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 13:51:26 2016 -0600
Fix remaining f2c conflicts.
commit 372eef0b6c0a535bf88d4b46b72f61266e8491ba
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Feb 25 12:01:58 2016 -0600
Fixed most conflicts after hack-n-slash of bli_f2c.h, cleanup in
progress.
commit f86b94f206e2e09fa3221cc55c3dc5b05ca4775a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 23 18:12:34 2016 -0600
Included missing blas2blis integer def to CBLAS.
Details:
- Added #include "bli_config_macro_defs" to all cblas_*.c files in
compat/cblas/src. This has the effect of defining
BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
not define it. Thanks to Tony Kelman for reporting this bug.
- In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
to 'f77_int'. This eliminates a compiler warning and a potential
runtime bug and/or crash when the size of an int differs from the size
of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).
commit 0b126de1342c11c65623bcb38e258e21e9244e3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 16:29:12 2015 -0600
Consolidated packm_blk_var1 and packm_blk_var2.
Details:
- Consolidated the two blocked variants for packm into a single
implementation (packm_blk_var1) and removed the other variant.
- Updated all induced method _cntl_init() functions in frame/cntl/ind/
to use the new blocked variant 1.
- Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
to detect pack_t schemas for induced methods and native execution,
respectively.
commit 30e5eb29e060b97752f702d2ea5d101d950f53b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 13 12:14:19 2015 -0600
Minor changes to treatment of rs, cs in bli_obj.c.
Details:
- Applied a patch submitted by Devin Matthews that:
- implements subtle changes to handling of somewhat unusual cases of
row and column strides to accommodate certain tensor cases, which
includes adding dimension parameters to _is_col_tilted() and
_is_row_tilted() macros,
- simplifies how buffers are sized when requesting BLIS-allocated
objects,
- re-consolidates bli_adjust_strides_*() into one function, and
- defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
environments.
commit f0a4f41b5acf55b41707ec821c4c5f9076dfbc24
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 15:22:50 2015 -0600
Fixed unimplemented case in core2 sgemm ukernel.
Details:
- Implemented the "beta == 0" case for general stride output for the
dunnington sgemm micro-kernel. This case had been, up until now,
identical to the "beta != 0" case, which does not work when the
output matrix has nan's and inf's. It had manifested as nan residuals
in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
Matthews for reporting this bug.
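Why the "beta == 0" case needs its own branch can be shown with a small sketch. The helper update_c below is hypothetical (it is not the dunnington micro-kernel, which is written differently); it only illustrates that multiplying an output element by zero does not clear a nan or inf already stored there, so the output must be overwritten instead of scaled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: c := beta * c + ab for n elements of c stored
   with general stride inc_c. */
void update_c(size_t n, float beta, float* c, size_t inc_c, const float* ab)
{
    assert(c != NULL && ab != NULL);

    if (beta == 0.0f)
    {
        /* Overwrite: 0 * nan is nan, so c must not be read at all. */
        for (size_t i = 0; i < n; ++i)
            c[i * inc_c] = ab[i];
    }
    else
    {
        for (size_t i = 0; i < n; ++i)
            c[i * inc_c] = beta * c[i * inc_c] + ab[i];
    }
}
```

With the two branches merged into the "beta != 0" form, any nan or inf in the output buffer would propagate into the result, which matches the nan residuals described above.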
commit 42810bbfa0b8f006ecc5128d903909ec13ea63f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 12 12:07:46 2015 -0600
Fixed minor bugs for uncommon obj_create cases.
Details:
- Separated bli_adjust_strides() into _alloc() and _attach() flavors so
that the latter can avoid a test performed by the former, in which the
rs and cs are overridden and set to zero if either matrix dimension is
zero. Actually, we also disable this overriding behavior, even for the
_alloc() case, since keeping the original strides (probably) does not
hurt anything. The original code has been kept commented-out, though,
in case an unintended consequence is later discovered.
- Fixed a typo in an error check for general stride cases where rs == cs.
commit 3e6dd11467643fbc2cb45c13cec8dd6024232833
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 3 10:30:08 2015 -0600
Minor re-expression in quadratic partitioning code.
Details:
- Minor change to quadratic equation solution code that avoids
recomputation of the sqrt() parameter when the compiler is not
smart enough to perform this optimization automatically.
commit 0694b722f7e4df00efb32639095a2aca80e67f52
Merge: 3e116f0 33557ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:24:25 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit 3e116f0a2953f50b3c068759a775ad7ffae04e49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 17:18:23 2015 -0600
Fixed imaginary bug in quadratic partitioning code.
Details:
- Fixed a bug in the relatively new quadratic partitioning code that,
under the right conditions, would perform sqrt() on a negative value.
If the solution is imaginary, we discard it and use an alternate
partition width that assumes no diagonal intersection. That alternate
width is actually already computed, so, the fix was quite simple.
Thanks to Devangi Parikh for reporting this bug.
commit 33557ecccaf49b2569b7f3d7bcea52c2aab94c68
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Mon Nov 2 12:18:43 2015 -0800
add Travis CI build status icon to the README
commit 4a502fbe77bd0f701108baaa559d9cfb483f88de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 2 13:28:34 2015 -0600
Laid groundwork for runtime memory pool resizing.
Details:
- Changed bli_pool_finalize() so that the freeing begins with the block
at top_index instead of block 0. This allows us to use the function
for terminal finalization as well as temporary cleanup prior to
reinitialization. Also, clear the pool_t struct upon _pool_finalize()
in case it is called in the terminal case with some blocks still
checked out to threads (in which case the threads will see the new
block size as 0 and thus release the block as intended).
- Added bli_pool_reinit(), which calls _pool_finalize() followed by
_pool_init() with new parameters.
- Added bli_mem_reinit(), which is based on bli_pool_reinit().
- Added new wrapper, _mem_compute_pool_block_sizes(), which calls
_mem_compute_pool_block_sizes_dt().
- Updated bli_mem_release() so that the pblk_t is freed, via
_pool_free_block(), if the block size recorded in the mem_t at the
time the pblk_t was acquired is now different from the value in the
pool_t.
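The interplay between finalization from top_index, reinitialization, and release of stale blocks can be sketched as follows. This is a simplified model under stated assumptions: the types pool_t and block_t, the fixed capacity POOL_CAP, and all function names are hypothetical and do not reproduce the bli_pool.c or bli_mem.c implementations, and no locking is shown.

```c
#include <assert.h>
#include <stdlib.h>

#define POOL_CAP 8  /* hypothetical fixed capacity for this sketch */

typedef struct { void* buf; size_t size; } block_t;

typedef struct
{
    block_t blocks[POOL_CAP];
    size_t  top;         /* index of the next block available for checkout */
    size_t  num_blocks;
    size_t  block_size;
} pool_t;

void pool_init(pool_t* pool, size_t num_blocks, size_t block_size)
{
    assert(num_blocks <= POOL_CAP);
    pool->top        = 0;
    pool->num_blocks = num_blocks;
    pool->block_size = block_size;
    for (size_t i = 0; i < num_blocks; ++i)
    {
        pool->blocks[i].buf = malloc(block_size);
        assert(pool->blocks[i].buf != NULL);
        pool->blocks[i].size = block_size;
    }
}

/* Free only the blocks still held by the pool (top .. num_blocks-1), then
   clear the struct so checked-out blocks see a block size of 0. */
void pool_finalize(pool_t* pool)
{
    for (size_t i = pool->top; i < pool->num_blocks; ++i)
        free(pool->blocks[i].buf);
    pool->top = 0;
    pool->num_blocks = 0;
    pool->block_size = 0;
}

void pool_reinit(pool_t* pool, size_t num_blocks, size_t block_size)
{
    pool_finalize(pool);
    pool_init(pool, num_blocks, block_size);
}

block_t pool_checkout(pool_t* pool)
{
    assert(pool->top < pool->num_blocks);
    return pool->blocks[pool->top++];
}

/* Returns 1 if the block went back into the pool, or 0 if it was freed
   because its recorded size no longer matches the pool's block size. */
int pool_release(pool_t* pool, block_t blk)
{
    if (blk.size != pool->block_size)
    {
        free(blk.buf);
        return 0;
    }
    assert(pool->top > 0);
    pool->blocks[--pool->top] = blk;
    return 1;
}
```

A block checked out before a reinitialization is never touched by pool_finalize(), so its holder can keep using it and it is simply freed on release.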
commit 37e55ca39bdbddaec03ad30d43e8ad2b3e549c96
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 30 18:25:04 2015 -0500
Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
Details:
- Fixed a family of bugs in the triangular level-3 operations for
certain complex implementations (3m1 and 4m1a) that only manifest if
one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
- Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
for the triangular case.
- Fixed the incorrect computation of imaginary stride, as stored in
the auxinfo_t struct in trmm and trsm macro-kernels.
- Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
cases where the register blocksize for the triangular matrix is
odd. Introduced a new byte-granular pointer arithmetic macro,
bli_ptr_add(), that computes the correct value.
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
terms of __typeof__, which is used by bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
for singleton problems because the inherent ambiguity of whether a
scalar is row-stored or column-stored causes the wrong parameter
combination code to be executed (by dumb luck of our checking for
row storage first).
- Added commented-out debugging lines to 3m1/4m1a and reference
micro-kernels, and trsm_ll macro-kernel.
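Byte-granular pointer arithmetic of the kind mentioned above can be sketched with a small macro. The names PTR_ADD_BYTES, dcomplex_t, and advance_by_reals are hypothetical and are not the actual bli_ptr_add() definition; the sketch assumes a compiler that provides the __typeof__ extension (such as gcc or clang).

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical complex type made of two real elements. */
typedef struct { double real; double imag; } dcomplex_t;

/* Advance pointer p by n bytes while preserving its type. The argument p
   must be a pointer variable (not an array name). */
#define PTR_ADD_BYTES(p, n) ((__typeof__(p))((char*)(p) + (n)))

/* Move a complex pointer forward by a count of real elements. When
   n_reals is odd, the offset is not a whole number of complex elements,
   so ordinary pointer arithmetic on dcomplex_t* cannot express it. */
dcomplex_t* advance_by_reals(dcomplex_t* base, size_t n_reals)
{
    assert(base != NULL);
    return PTR_ADD_BYTES(base, n_reals * sizeof(double));
}
```

For an offset of three real elements, typed arithmetic such as base + 3 / 2 would truncate to one complex element, which is the class of error the odd-blocksize fix addresses.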
commit 46294d80e5a79c598e200e1c8ec2a642ff839971
Merge: d3159c5 a0a7b85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 27 12:41:23 2015 -0500
Merge pull request #35 from figual/master
Fixed incomplete code in the double precision ARMv8 microkernel.
commit a0a7b85ac3e157af53cff8db0e008f4a3f90372c
Author: Francisco Igual <figual@ucm.es>
Date: Tue Oct 27 08:59:15 2015 +0000
Fixed incomplete code in the double precision ARMv8 microkernel.
commit d3159c5740c9ee7f8c0b661003aab6f00646ad6f
Merge: b489152 7e03e45
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:54:00 2015 -0500
Merge branch 'master' of github.com:flame/blis
commit b489152e112644ec3b6d19e687231a9607f7694f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 21 14:53:17 2015 -0500
Use vzeroall in haswell micro-kernels.
commit 7e03e45bfe6c27c4fdbf06b1caa7f49e9a5fef49
Merge: 77ddb0b 4f88c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 14 13:26:07 2015 -0500
Merge pull request #33 from xianyi/master
Enable Travis CI
commit 4f88c29f9e634cbb6fb22d8c88931f0ec78ad7db
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:57:50 2015 -0500
Detect Intel Broadwell (using Haswell config).
commit 4b0ac1a9984a93f7ad4369b10fca63991107d9f5
Merge: fe3e355 77ddb0b
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Oct 14 12:51:05 2015 -0500
Merge branch 'upstream_master'
commit 77ddb0b1d31ada111dadf392766ba6d9210ed9fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 13 12:53:06 2015 -0500
Removed flop-counting mechanism.
Details:
- Removed the optional flop-counting feature introduced in commit
7574c994.
commit 276da366187460a4c8e6e0910e79cb39ce780bfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:43:03 2015 -0500
Minor formatting change to README.md.
commit d17057446f5404824478e8a6cd08f242ab75544a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 12 11:39:49 2015 -0500
Added "Getting Started" section to README.md.
Details:
- Added section to README.md file containing links to wikis with brief
descriptions.
commit e7e1f2f7b601b21b50e3cdad8972cb3fe11018d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 2 16:51:52 2015 -0500
Minor updates to CREDITS, README files.
commit 55329906ecd7ce1ab910e4d30a29354a9172e7ea
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Sep 26 20:47:19 2015 -0500
Minor edits to README.md, testsuite.
Details:
- Fixed typos in README.md.
- Fixed column heading alignment for testsuite when matlab output is
enabled.
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
commit bbebdb5793a8fd6aaf257012ab0272beaa04a0de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 25 14:47:27 2015 -0500
Replaced README with README.md.
Details:
- Replaced the old (and short) README file with a much more comprehensive
version written in github-flavored markdown. The new file is based on
content taken from the old Google Code homepage.
commit e2e9d64a63485461192d9c2a6dd0183a8b71013c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 24 12:14:03 2015 -0500
Load balance thread ranges for arbitrary diagonals.
Details:
- Expanded/updated interface for bli_get_range_weighted() and
bli_get_range() so that the direction of movement is specified in the
function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
and also so that the object being partitioned is passed instead of an
uplo parameter. Updated invocations in level-3 blocked variants, as
appropriate.
- (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
carefully take into account the location of the diagonal when computing
ranges so that the area of each subpartition (which, in all present
level-3 operations, is proportional to the amount of computation
engendered) is as equal as possible.
- Added calls to a new class of routines to all non-gemm level-3 blocked
variants:
bli_<oper>_prune_unref_mparts_[mnk]()
where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
dimension is being partitioned. These routines call a more basic
routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
regions from matrices and simultaneously adjust other matrices which
share the same dimension accordingly.
- Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of the new
pruning routines.
- Fixed incorrect blocking factors passed into bli_get_range_*() in
bli_trsm_blk_var[12][fb].c
- Added a new test driver in test/thread_ranges that can exercise the new
bli_get_range_*() and bli_get_range_weighted_*() under a range of
conditions.
- Reimplemented m and n fields of obj_t as elements in a "dim"
array field so that dimensions could be queried via index constant
(e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
macros accordingly.
- Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
- Added bli_round() macro, which calls C math library function round(),
and bli_round_to_mult(), which rounds a value to the nearest multiple
of some other value.
- Added miscellaneous pruning- and mdim_t-related macros.
- Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
bli_obj_row_off(), bli_obj_col_off().
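Rounding a value to the nearest multiple of another value can be sketched as below. This is an integer-only illustration under the assumption of non-negative inputs; the function name round_to_mult is hypothetical, and the actual bli_round_to_mult() macro may be implemented differently (for example, in terms of round()).

```c
#include <assert.h>
#include <stddef.h>

/* Round val to the nearest multiple of mult; ties round upward. */
size_t round_to_mult(size_t val, size_t mult)
{
    assert(mult > 0);
    return ((val + mult / 2) / mult) * mult;
}
```

Such a helper is useful when a computed partition width must land on a blocking-factor boundary.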
commit fe3e355c9c5a6f65b8736b009e2d501b62a83ea1
Merge: efa641e 4dd9dd3
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Fri Aug 21 14:38:36 2015 -0500
Merge branch 'upstream_master'
commit efa641e36b73abee34166a252e90e28a6281d92d
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 03:15:50 2015 +0800
Try to fix the compiling bug on travis.
commit 4dd9dd3e1de626b51bfe85d9ee65f193d60e8d38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 21 11:52:37 2015 -0500
Fixed minor alignment ambiguity bug in bli_pool.c.
Details:
- Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
pointer arithmetic was performed on a void* as if it were a byte
pointer (such as char*). Some compilers may have already been
interpreting this situation as intended, despite the sloppiness.
Thanks to Aleksei Rechinskii for reporting this issue.
- Redefined pointer alignment macros to typecast to uintptr_t instead of
siz_t.
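The alignment fix can be illustrated with a short sketch. The function name align_ptr_up is hypothetical and this is not the code from bli_pool_alloc_block(); it only shows the two points made above: the address is inspected as a uintptr_t, and the byte offset is applied to a char* instead of a void*.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Return the first address at or after p that is a multiple of align,
   where align is assumed to be a power of two. */
void* align_ptr_up(void* p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);

    uintptr_t addr = (uintptr_t)p;
    uintptr_t rem  = addr & (uintptr_t)(align - 1);

    if (rem != 0)
    {
        /* Arithmetic on void* is not defined by the C standard, so the
           offset is added through a byte pointer. */
        p = (char*)p + (align - rem);
    }
    return p;
}
```

The caller is responsible for having allocated enough extra space (up to align - 1 bytes) for the adjusted pointer to remain inside the buffer.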
commit 12ffd568b04feda57147c13b67717416a01c82f8
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 00:24:28 2015 +0800
Add Travis CI.
commit ecc3ebb749e0861c27deda52b5f87236ede4901b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:12 2015 -0500
CHANGELOG update (0.1.8)
commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (tag: 0.1.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500
Version file update (0.1.8)
commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e (origin/master)
commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f d4b8913
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500


@@ -154,22 +154,13 @@ BLIS_DLL_NAME := $(BLIS_LIB_BASE_NAME).so
# --- BLIS framework source and object variable names ---
# These are the makefile variables that source code files will be accumulated
# into by the makefile fragments. Notice that we include separate variables
# for regular and "special" source.
# into by the makefile fragments.
MK_FRAME_SRC :=
MK_FRAME_NOOPT_SRC :=
MK_FRAME_KERNELS_SRC :=
MK_CONFIG_SRC :=
MK_CONFIG_NOOPT_SRC :=
MK_CONFIG_KERNELS_SRC :=
# These hold object filenames corresponding to above.
MK_FRAME_OBJS :=
MK_FRAME_NOOPT_OBJS :=
MK_FRAME_KERNELS_OBJS :=
MK_CONFIG_OBJS :=
MK_CONFIG_NOOPT_OBJS :=
MK_CONFIG_KERNELS_OBJS :=
# Append the base library path to the library names.
MK_ALL_BLIS_LIB := $(BASE_LIB_PATH)/$(BLIS_LIB_NAME)
@@ -309,41 +300,17 @@ CFLAGS_KERNELS := $(CFLAGS_KERNELS) $(VERS_DEF)
# Convert source file paths to object file paths by replacing the base source
# directories with the base object directories, and also replacing the source
# file suffix (eg: '.c') with '.o'.
MK_BLIS_FRAME_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
$(filter %.c, $(MK_FRAME_SRC)))
MK_BLIS_FRAME_NOOPT_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
$(filter %.c, $(MK_FRAME_NOOPT_SRC)))
MK_BLIS_FRAME_KERNELS_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
$(filter %.c, $(MK_FRAME_KERNELS_SRC)))
MK_BLIS_FRAME_OBJS := $(patsubst $(FRAME_PATH)/%.c, $(BASE_OBJ_FRAME_PATH)/%.o, \
$(filter %.c, $(MK_FRAME_SRC)))
MK_BLIS_CONFIG_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.S, $(MK_CONFIG_SRC)))
MK_BLIS_CONFIG_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.c, $(MK_CONFIG_SRC)))
MK_BLIS_CONFIG_NOOPT_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.S, $(MK_CONFIG_NOOPT_SRC)))
MK_BLIS_CONFIG_NOOPT_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.c, $(MK_CONFIG_NOOPT_SRC)))
MK_BLIS_CONFIG_KERNELS_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.S, $(MK_CONFIG_KERNELS_SRC)))
MK_BLIS_CONFIG_KERNELS_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.c, $(MK_CONFIG_KERNELS_SRC)))
MK_BLIS_CONFIG_OBJS := $(patsubst $(CONFIG_PATH)/%.S, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.S, $(MK_CONFIG_SRC)))
MK_BLIS_CONFIG_OBJS += $(patsubst $(CONFIG_PATH)/%.c, $(BASE_OBJ_CONFIG_PATH)/%.o, \
$(filter %.c, $(MK_CONFIG_SRC)))
# Combine all of the object files into some readily-accessible variables.
MK_ALL_BLIS_OPT_OBJS := $(MK_BLIS_CONFIG_OBJS) \
$(MK_BLIS_FRAME_OBJS)
MK_ALL_BLIS_NOOPT_OBJS := $(MK_BLIS_CONFIG_NOOPT_OBJS) \
$(MK_BLIS_FRAME_NOOPT_OBJS)
MK_ALL_BLIS_KERNELS_OBJS := $(MK_BLIS_CONFIG_KERNELS_OBJS) \
$(MK_BLIS_FRAME_KERNELS_OBJS)
MK_ALL_BLIS_OBJS := $(MK_ALL_BLIS_OPT_OBJS) \
$(MK_ALL_BLIS_NOOPT_OBJS) \
$(MK_ALL_BLIS_KERNELS_OBJS)
MK_ALL_BLIS_OBJS := $(MK_BLIS_CONFIG_OBJS) \
$(MK_BLIS_FRAME_OBJS)
@@ -424,15 +391,15 @@ clean: cleanlib cleantest
# Define two functions, each of which takes one argument (an object file
# path). The functions determine which CFLAGS and text string are needed to
# compile the object file. Note that we match with a preceding forward slash,
# so the directory name must begin with the special directory name, but it
# can have trailing characters (e.g. 'kernels_x86').
get_cflags_for_obj = $(if $(findstring /$(NOOPT_DIR),$1),$(CFLAGS_NOOPT),\
$(if $(findstring /$(KERNELS_DIR),$1),$(CFLAGS_KERNELS),\
# compile the object file. Note that we match without a preceding forward slash,
# so the directory name may have 'kernels' as a substring (e.g. 'ukernels' or
# 'kernels_opt').
get_cflags_for_obj = $(if $(findstring $(NOOPT_DIR),$1),$(CFLAGS_NOOPT),\
$(if $(findstring $(KERNELS_DIR),$1),$(CFLAGS_KERNELS),\
$(CFLAGS)))
get_ctext_for_obj = $(if $(findstring /$(NOOPT_DIR),$1),$(NOOPT_TEXT),\
$(if $(findstring /$(KERNELS_DIR),$1),$(KERNELS_TEXT),))
get_ctext_for_obj = $(if $(findstring $(NOOPT_DIR),$1),$(NOOPT_TEXT),\
$(if $(findstring $(KERNELS_DIR),$1),$(KERNELS_TEXT),))
$(BASE_OBJ_FRAME_PATH)/%.o: $(FRAME_PATH)/%.c $(MK_HEADER_FILES) $(MAKE_DEFS_MK_PATH)
ifeq ($(BLIS_ENABLE_VERBOSE_MAKE_OUTPUT),yes)
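
Note: the revised comment on get_cflags_for_obj says the special directory names are now matched without a preceding forward slash, so a directory such as 'ukernels' also selects the kernel flags. The sketch below models that difference in C, with strstr() standing in for $(findstring ...). The values "noopt" and "kernels" for NOOPT_DIR and KERNELS_DIR, the function name cflags_for_obj, and the object paths mentioned afterwards are all assumptions made for illustration; none of them are taken from the makefile.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical stand-ins for the makefile variables NOOPT_DIR and
   KERNELS_DIR; the real values are not shown in this diff. */
#define NOOPT_DIR   "noopt"
#define KERNELS_DIR "kernels"

/* Mimics get_cflags_for_obj: returns the name of the flag set chosen for
   an object path. With anchored != 0 the special directory name must
   directly follow a '/' (old rule); with anchored == 0 any substring
   match counts (new rule). */
static const char* cflags_for_obj( const char* path, int anchored )
{
    assert( path != NULL );

    const char* noopt   = anchored ? "/" NOOPT_DIR   : NOOPT_DIR;
    const char* kernels = anchored ? "/" KERNELS_DIR : KERNELS_DIR;

    /* strstr() plays the role of $(findstring ...): non-NULL means found. */
    if ( strstr( path, noopt   ) != NULL ) return "CFLAGS_NOOPT";
    if ( strstr( path, kernels ) != NULL ) return "CFLAGS_KERNELS";

    return "CFLAGS";
}
```

Under these assumptions, a made-up path such as "obj/config/ref/ukernels/gemm.o" falls through to CFLAGS with the anchored rule but selects CFLAGS_KERNELS with the unanchored one, while "obj/config/ref/kernels_x86/gemm.o" selects CFLAGS_KERNELS either way.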


@@ -254,7 +254,9 @@ gen_mkfiles()
# Append a relevant suffix to the makefile variable name, if necesary
all_add_src_var_name "$cur_dir"
# NOTE: This step is disabled because special directories are presently
# ignored when generating makefile variable names.
#all_add_src_var_name "$cur_dir"
# Be verbose if level 2 was requested
@@ -286,7 +288,9 @@ gen_mkfiles()
# Remove a relevant suffix from the makefile variable name, if necesary
all_del_src_var_name "$cur_dir"
# NOTE: This step is disabled because special directories are presently
# ignored when generating makefile variable names.
#all_del_src_var_name "$cur_dir"
# Return peacefully
@@ -295,42 +299,44 @@ gen_mkfiles()
update_src_var_name_special()
{
local dir act i name var_suffix
# Extract arguments.
act="$1"
dir="$2"
# Strip / from end of directory path, if there is one, and then strip
# path from directory name.
dir=${dir%/}
dir=${dir##*/}
# Run through our list.
for specdir in "${special_dirs}"; do
# If the current item matches sdir, then we'll have
# to make a modification of some form.
if [ "$dir" = "$specdir" ]; then
# Convert the directory name to uppercase.
var_suffix=$(echo "$dir" | tr '[:lower:]' '[:upper:]')
# Either add or remove the suffix, and also update the
# source file suffix variable.
if [ "$act" == "+" ]; then
src_var_name=${src_var_name}_$var_suffix
else
src_var_name=${src_var_name%_$var_suffix}
fi
# No need to continue iterating.
break;
fi
done
}
#update_src_var_name_special()
#{
# local dir act i name var_suffix
#
# # Extract arguments.
# act="$1"
# dir="$2"
#
# # Strip / from end of directory path, if there is one, and then strip
# # path from directory name.
# dir=${dir%/}
# dir=${dir##*/}
#
# # Run through our list.
# # NOTE: CURRENTLY, SPECIAL DIRECTORY NAMES ARE IGNORED. In order to
# # re-enable them, remove the quotes from "${special_dirs}".
# for specdir in "${special_dirs}"; do
#
# # If the current item matches sdir, then we'll have
# # to make a modification of some form.
# if [ "$dir" = "$specdir" ]; then
#
# # Convert the directory name to uppercase.
# var_suffix=$(echo "$dir" | tr '[:lower:]' '[:upper:]')
#
# # Either add or remove the suffix, and also update the
# # source file suffix variable.
# if [ "$act" == "+" ]; then
# src_var_name=${src_var_name}_$var_suffix
# else
# src_var_name=${src_var_name%_$var_suffix}
# fi
#
# # No need to continue iterating.
# break;
# fi
# done
#}
#init_src_var_name()
#{
@@ -351,20 +357,20 @@ update_src_var_name_special()
# done
#}
all_add_src_var_name()
{
local dir="$1"
update_src_var_name_special "+" "$dir"
#all_add_src_var_name()
#{
# local dir="$1"
#
# update_src_var_name_special "+" "$dir"
#
#}
}
all_del_src_var_name()
{
local dir="$1"
update_src_var_name_special "-" "$dir"
}
#all_del_src_var_name()
#{
# local dir="$1"
#
# update_src_var_name_special "-" "$dir"
#}
read_mkfile_config()
{


@@ -161,7 +161,7 @@ LDFLAGS += -fopenmp
endif
ifeq ($(THREADING_MODEL),pthreads)
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
LDFLAGS += -pthread
LDFLAGS += -lpthread
endif
endif
@@ -175,7 +175,7 @@ LDFLAGS += -openmp
endif
ifeq ($(THREADING_MODEL),pthreads)
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
LDFLAGS += -pthread
LDFLAGS += -lpthread
endif
endif
@@ -188,7 +188,7 @@ $(error OpenMP is not supported with Clang.)
endif
ifeq ($(THREADING_MODEL),pthreads)
CTHREADFLAGS := -pthread -DBLIS_ENABLE_PTHREADS
LDFLAGS += -pthread
LDFLAGS += -lpthread
endif
endif


@@ -144,25 +144,7 @@
// -- Default fusing factors for level-1f operations --
#define BLIS_L1F_FUSE_FAC_S 8
#define BLIS_L1F_FUSE_FAC_D 8
#define BLIS_L1F_FUSE_FAC_C 4
#define BLIS_L1F_FUSE_FAC_Z 2
#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
#define BLIS_DEFAULT_AF_D 8
@@ -171,10 +153,8 @@
// -- gemm --
#include "bli_gemm_8x8.h"
#define BLIS_DGEMM_UKERNEL bli_dgemm_8x8
#define BLIS_ZGEMM_UKERNEL bli_zgemm_8x8
#define BLIS_DGEMM_UKERNEL bli_dgemm_int_8x8
#define BLIS_ZGEMM_UKERNEL bli_zgemm_int_8x8
// -- trsm-related --


@@ -51,87 +51,6 @@
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
//
// #define BLIS_DEFAULT_MC_S 128
// #define BLIS_DEFAULT_KC_S 384
// #define BLIS_DEFAULT_NC_S 4096
#define BLIS_DEFAULT_MC_D 1080
#define BLIS_DEFAULT_KC_D 120
#define BLIS_DEFAULT_NC_D 8400
// #define BLIS_DEFAULT_MC_C 128
// #define BLIS_DEFAULT_KC_C 256
// #define BLIS_DEFAULT_NC_C 4096
//
// #define BLIS_DEFAULT_MC_Z 64
// #define BLIS_DEFAULT_KC_Z 256
// #define BLIS_DEFAULT_NC_Z 2048
// -- Register blocksizes --
// #define BLIS_DEFAULT_MR_S 8
// #define BLIS_DEFAULT_NR_S 8
#define BLIS_DEFAULT_MR_D 4
#define BLIS_DEFAULT_NR_D 6
// #define BLIS_DEFAULT_MR_C 8
// #define BLIS_DEFAULT_NR_C 4
//
// #define BLIS_DEFAULT_MR_Z 8
// #define BLIS_DEFAULT_NR_Z 4
// NOTE: If the micro-kernel, which is typically unrolled to a factor
// of f, handles leftover edge cases (ie: when k % f > 0) then these
// register blocksizes in the k dimension can be defined to 1.
//#define BLIS_DEFAULT_KR_S 1
//#define BLIS_DEFAULT_KR_D 1
//#define BLIS_DEFAULT_KR_C 1
//#define BLIS_DEFAULT_KR_Z 1
// -- Maximum cache blocksizes (for optimizing edge cases) --
// NOTE: These cache blocksize "extensions" have the same constraints as
// the corresponding default blocksizes above. When these values are
// larger than the default blocksizes, blocksizes used at edge cases are
// enlarged if such an extension would encompass the remaining portion of
// the matrix dimension.
//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
// -- Packing register blocksize (for packed micro-panels) --
// NOTE: These register blocksize "extensions" determine whether the
// leading dimensions used within the packed micro-panels are equal to
// or greater than their corresponding register blocksizes above.
//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
@@ -149,23 +68,28 @@
// -- gemm --
#define BLIS_SGEMM_UKERNEL bli_sgemm_8x8_FMA4
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x8_fma4
#define BLIS_DEFAULT_MC_S 128
#define BLIS_DEFAULT_KC_S 384
#define BLIS_DEFAULT_NC_S 4096
#define BLIS_DEFAULT_MR_S 8
#define BLIS_DEFAULT_NR_S 8
#define BLIS_DGEMM_UKERNEL bli_dgemm_4x6_FMA4
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x6_fma4
#define BLIS_DEFAULT_MC_D 1080
#define BLIS_DEFAULT_KC_D 120
#define BLIS_DEFAULT_NC_D 8400
#define BLIS_DEFAULT_MR_D 4
#define BLIS_DEFAULT_NR_D 6
#define BLIS_CGEMM_UKERNEL bli_cgemm_8x4_FMA4
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4_fma4
#define BLIS_DEFAULT_MC_C 96
#define BLIS_DEFAULT_KC_C 256
#define BLIS_DEFAULT_NC_C 4096
#define BLIS_DEFAULT_MR_C 8
#define BLIS_DEFAULT_NR_C 4
#define BLIS_ZGEMM_UKERNEL bli_zgemm_4x4_FMA4
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4_fma4
#define BLIS_DEFAULT_MC_Z 64
#define BLIS_DEFAULT_KC_Z 192
#define BLIS_DEFAULT_NC_Z 4096
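
Note: the comment block on maximum cache blocksizes in this header says that, when a maximum is larger than the default, the blocksize used at an edge case is enlarged if the extension would cover the rest of the matrix dimension. The following is a minimal sketch of that rule, not the BLIS implementation. The names next_blocksize and count_blocks are hypothetical, and the sizes mentioned afterwards reuse BLIS_DEFAULT_MC_D (1080) with the commented-out "+ 1/4" extension purely as an example.

```c
#include <assert.h>
#include <stddef.h>

/* Pick the blocksize for the next step along one matrix dimension.
     dim_left: elements not yet processed
     b_def:    default cache blocksize
     b_max:    maximum ("extended") cache blocksize, b_max >= b_def */
static size_t next_blocksize( size_t dim_left, size_t b_def, size_t b_max )
{
    assert( b_def > 0 && b_max >= b_def );

    /* If the extended blocksize covers everything that remains, take it
       all in one step instead of a full block followed by a sliver. */
    if ( dim_left <= b_max ) return dim_left;

    return b_def;
}

/* Count how many blocks are needed to sweep a dimension of size dim. */
static size_t count_blocks( size_t dim, size_t b_def, size_t b_max )
{
    size_t n_blocks = 0;

    while ( dim > 0 )
    {
        dim -= next_blocksize( dim, b_def, b_max );
        ++n_blocks;
    }

    return n_blocks;
}
```

With a default of 1080 and a maximum of 1080 + 1080/4 = 1350, a dimension of 1300 is swept in one block; with the maximum equal to the default it takes two (1080, then 220).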


@@ -51,28 +51,28 @@
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
//
#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
#define BLIS_DEFAULT_MC_S 528
#define BLIS_DEFAULT_KC_S 256
#define BLIS_DEFAULT_NC_S 8400
#define BLIS_DEFAULT_MR_S 16
#define BLIS_DEFAULT_NR_S 3
#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
#define BLIS_DEFAULT_MC_D 264
#define BLIS_DEFAULT_KC_D 256
#define BLIS_DEFAULT_NC_D 8400
#define BLIS_DEFAULT_MR_D 8
#define BLIS_DEFAULT_NR_D 3
#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
#define BLIS_DEFAULT_MC_C 264
#define BLIS_DEFAULT_KC_C 256
#define BLIS_DEFAULT_NC_C 8400
#define BLIS_DEFAULT_MR_C 4
#define BLIS_DEFAULT_NR_C 2
#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
#define BLIS_DEFAULT_MC_Z 100
#define BLIS_DEFAULT_KC_Z 320
#define BLIS_DEFAULT_NC_Z 8400


@@ -1 +1 @@
../../kernels/arm/neon
../../kernels/arm


@@ -1 +1 @@
../../kernels/arm/neon
../../kernels/arm


@@ -67,26 +67,6 @@
//#define BLIS_DEFAULT_KC_Z 384
//#define BLIS_DEFAULT_NC_Z 4096
// NOTE: If 4m blocksizes are not defined here, they will be determined
// from the corresponding real domain blocksizes.
#define BLIS_DEFAULT_4M_MC_C 384
#define BLIS_DEFAULT_4M_KC_C 512
#define BLIS_DEFAULT_4M_NC_C 4096
#define BLIS_DEFAULT_4M_MC_Z 192
#define BLIS_DEFAULT_4M_KC_Z 256
#define BLIS_DEFAULT_4M_NC_Z 4096
// NOTE: If 3m blocksizes are not defined here, they will be determined
// from the corresponding real domain blocksizes.
#define BLIS_DEFAULT_3M_MC_C 384
#define BLIS_DEFAULT_3M_KC_C 512
#define BLIS_DEFAULT_3M_NC_C 4096
#define BLIS_DEFAULT_3M_MC_Z 192
#define BLIS_DEFAULT_3M_KC_Z 256
#define BLIS_DEFAULT_3M_NC_Z 4096
// -- Register blocksizes --
#define BLIS_DEFAULT_MR_S 8
@@ -101,56 +81,6 @@
#define BLIS_DEFAULT_MR_Z 2
#define BLIS_DEFAULT_NR_Z 2
// NOTE: If the micro-kernel, which is typically unrolled to a factor
// of f, handles leftover edge cases (ie: when k % f > 0) then these
// register blocksizes in the k dimension can be defined to 1.
//#define BLIS_DEFAULT_KR_S 1
//#define BLIS_DEFAULT_KR_D 1
//#define BLIS_DEFAULT_KR_C 1
//#define BLIS_DEFAULT_KR_Z 1
// -- Maximum cache blocksizes (for optimizing edge cases) --
// NOTE: These cache blocksize "extensions" have the same constraints as
// the corresponding default blocksizes above. When these values are
// larger than the default blocksizes, blocksizes used at edge cases are
// enlarged if such an extension would encompass the remaining portion of
// the matrix dimension.
//#define BLIS_MAXIMUM_MC_S (BLIS_DEFAULT_MC_S + BLIS_DEFAULT_MC_S/4)
//#define BLIS_MAXIMUM_KC_S (BLIS_DEFAULT_KC_S + BLIS_DEFAULT_KC_S/4)
//#define BLIS_MAXIMUM_NC_S (BLIS_DEFAULT_NC_S + BLIS_DEFAULT_NC_S/4)
//#define BLIS_MAXIMUM_MC_D (BLIS_DEFAULT_MC_D + BLIS_DEFAULT_MC_D/4)
//#define BLIS_MAXIMUM_KC_D (BLIS_DEFAULT_KC_D + BLIS_DEFAULT_KC_D/4)
//#define BLIS_MAXIMUM_NC_D (BLIS_DEFAULT_NC_D + BLIS_DEFAULT_NC_D/4)
//#define BLIS_MAXIMUM_MC_C (BLIS_DEFAULT_MC_C + BLIS_DEFAULT_MC_C/4)
//#define BLIS_MAXIMUM_KC_C (BLIS_DEFAULT_KC_C + BLIS_DEFAULT_KC_C/4)
//#define BLIS_MAXIMUM_NC_C (BLIS_DEFAULT_NC_C + BLIS_DEFAULT_NC_C/4)
//#define BLIS_MAXIMUM_MC_Z (BLIS_DEFAULT_MC_Z + BLIS_DEFAULT_MC_Z/4)
//#define BLIS_MAXIMUM_KC_Z (BLIS_DEFAULT_KC_Z + BLIS_DEFAULT_KC_Z/4)
//#define BLIS_MAXIMUM_NC_Z (BLIS_DEFAULT_NC_Z + BLIS_DEFAULT_NC_Z/4)
// -- Packing register blocksize (for packed micro-panels) --
// NOTE: These register blocksize "extensions" determine whether the
// leading dimensions used within the packed micro-panels are equal to
// or greater than their corresponding register blocksizes above.
//#define BLIS_PACKDIM_MR_S (BLIS_DEFAULT_MR_S + ...)
//#define BLIS_PACKDIM_NR_S (BLIS_DEFAULT_NR_S + ...)
//#define BLIS_PACKDIM_MR_D (BLIS_DEFAULT_MR_D + ...)
//#define BLIS_PACKDIM_NR_D (BLIS_DEFAULT_NR_D + ...)
//#define BLIS_PACKDIM_MR_C (BLIS_DEFAULT_MR_C + ...)
//#define BLIS_PACKDIM_NR_C (BLIS_DEFAULT_NR_C + ...)
//#define BLIS_PACKDIM_MR_Z (BLIS_DEFAULT_MR_Z + ...)
//#define BLIS_PACKDIM_NR_Z (BLIS_DEFAULT_NR_Z + ...)
@@ -169,13 +99,13 @@
// -- gemm --
#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_8x4
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_8x4
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_4x4
// -- trsm-related --
#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_opt_4x4
#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_opt_4x4
#define BLIS_DGEMMTRSM_L_UKERNEL bli_dgemmtrsm_l_asm_4x4
#define BLIS_DGEMMTRSM_U_UKERNEL bli_dgemmtrsm_u_asm_4x4
@@ -184,23 +114,23 @@
// -- axpy2v --
#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_opt_var1
#define BLIS_DAXPY2V_KERNEL bli_daxpy2v_int_var1
// -- dotaxpyv --
#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_opt_var1
#define BLIS_DDOTAXPYV_KERNEL bli_ddotaxpyv_int_var1
// -- axpyf --
#define BLIS_DAXPYF_KERNEL bli_daxpyf_opt_var1
#define BLIS_DAXPYF_KERNEL bli_daxpyf_int_var1
// -- dotxf --
#define BLIS_DDOTXF_KERNEL bli_ddotxf_opt_var1
#define BLIS_DDOTXF_KERNEL bli_ddotxf_int_var1
// -- dotxaxpyf --
#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_opt_var1
#define BLIS_DDOTXAXPYF_KERNEL bli_ddotxaxpyf_int_var1


@@ -1 +1 @@
../../kernels/x86_64/core2-sse3
../../kernels/x86_64/penryn


@@ -89,21 +89,6 @@
#endif
/*
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_8x4
#define BLIS_DEFAULT_MC_C 96
#define BLIS_DEFAULT_KC_C 256
#define BLIS_DEFAULT_NC_C 4096
#define BLIS_DEFAULT_MR_C 8
#define BLIS_DEFAULT_NR_C 4
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_4x4
#define BLIS_DEFAULT_MC_Z 64
#define BLIS_DEFAULT_KC_Z 192
#define BLIS_DEFAULT_NC_Z 4096
#define BLIS_DEFAULT_MR_Z 4
#define BLIS_DEFAULT_NR_Z 4
*/


@@ -1 +1 @@
../../kernels/x86_64/avx2
../../kernels/x86_64/haswell


@@ -149,7 +149,7 @@
// -- gemm --
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_d4x4
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_4x4
// -- trsm-related --


@@ -42,6 +42,9 @@
#define BLIS_SIMD_ALIGN_SIZE 32
#define BLIS_SIMD_SIZE 64
#define BLIS_SIMD_NUM_REGISTERS 32
#endif


@@ -153,8 +153,8 @@
#define BLIS_DGEMM_UKERNEL_PREFERS_CONTIG_ROWS
#define BLIS_DGEMM_UKERNEL bli_dgemm_opt_30x8
#define BLIS_SGEMM_UKERNEL bli_sgemm_opt_30x16
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_30x16
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_30x8
// -- trsm-related --


@@ -51,7 +51,7 @@
// (b) MR (for zero-padding purposes when MR and NR are "swapped")
//
#define BLIS_SGEMM_UKERNEL bli_sgemm_new_16x3
#define BLIS_SGEMM_UKERNEL bli_sgemm_asm_16x3
#define BLIS_DEFAULT_MC_S 2016
#define BLIS_DEFAULT_KC_S 128
#define BLIS_DEFAULT_NC_S 8400
@@ -59,7 +59,7 @@
#define BLIS_DEFAULT_NR_S 3
//#define BLIS_UPANEL_B_ALIGN_SIZE_S 4096
#define BLIS_DGEMM_UKERNEL bli_dgemm_new_8x3
#define BLIS_DGEMM_UKERNEL bli_dgemm_asm_8x3
//#define BLIS_DEFAULT_MC_D 768
//#define BLIS_DEFAULT_KC_D 168
#define BLIS_DEFAULT_MC_D 1008
@@ -69,14 +69,14 @@
#define BLIS_DEFAULT_NR_D 3
//#define BLIS_UPANEL_B_ALIGN_SIZE_D 4096
#define BLIS_CGEMM_UKERNEL bli_cgemm_new_4x2
#define BLIS_CGEMM_UKERNEL bli_cgemm_asm_4x2
#define BLIS_DEFAULT_MC_C 512
#define BLIS_DEFAULT_KC_C 256
#define BLIS_DEFAULT_NC_C 8400
#define BLIS_DEFAULT_MR_C 4
#define BLIS_DEFAULT_NR_C 2
#define BLIS_ZGEMM_UKERNEL bli_zgemm_new_2x2
#define BLIS_ZGEMM_UKERNEL bli_zgemm_asm_2x2
#define BLIS_DEFAULT_MC_Z 400
#define BLIS_DEFAULT_KC_Z 160
#define BLIS_DEFAULT_NC_Z 8400


@@ -1 +1 @@
../../kernels/x86_64/avx
../../kernels/x86_64/sandybridge


@@ -177,17 +177,17 @@
// be packed here, but this tends to be much too expensive in practice to
// actually employ.)
//#define BLIS_DEFAULT_L2_MC_S 1000
//#define BLIS_DEFAULT_L2_NC_S 1000
//#define BLIS_DEFAULT_M2_S 1000
//#define BLIS_DEFAULT_N2_S 1000
//#define BLIS_DEFAULT_L2_MC_D 1000
//#define BLIS_DEFAULT_L2_NC_D 1000
//#define BLIS_DEFAULT_M2_D 1000
//#define BLIS_DEFAULT_N2_D 1000
//#define BLIS_DEFAULT_L2_MC_C 1000
//#define BLIS_DEFAULT_L2_NC_C 1000
//#define BLIS_DEFAULT_M2_C 1000
//#define BLIS_DEFAULT_N2_C 1000
//#define BLIS_DEFAULT_L2_MC_Z 1000
//#define BLIS_DEFAULT_L2_NC_Z 1000
//#define BLIS_DEFAULT_M2_Z 1000
//#define BLIS_DEFAULT_N2_Z 1000
@@ -196,25 +196,25 @@
// -- Default fusing factors for level-1f operations --
//#define BLIS_L1F_FUSE_FAC_S 8
//#define BLIS_L1F_FUSE_FAC_D 4
//#define BLIS_L1F_FUSE_FAC_C 4
//#define BLIS_L1F_FUSE_FAC_Z 2
//#define BLIS_DEFAULT_1F_S 8
//#define BLIS_DEFAULT_1F_D 4
//#define BLIS_DEFAULT_1F_C 4
//#define BLIS_DEFAULT_1F_Z 2
//#define BLIS_AXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
//#define BLIS_AXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
//#define BLIS_AXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
//#define BLIS_AXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
//#define BLIS_DEFAULT_AF_S BLIS_DEFAULT_1F_S
//#define BLIS_DEFAULT_AF_D BLIS_DEFAULT_1F_D
//#define BLIS_DEFAULT_AF_C BLIS_DEFAULT_1F_C
//#define BLIS_DEFAULT_AF_Z BLIS_DEFAULT_1F_Z
//#define BLIS_DOTXF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
//#define BLIS_DOTXF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
//#define BLIS_DOTXF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
//#define BLIS_DOTXF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
//#define BLIS_DEFAULT_DF_S BLIS_DEFAULT_1F_S
//#define BLIS_DEFAULT_DF_D BLIS_DEFAULT_1F_D
//#define BLIS_DEFAULT_DF_C BLIS_DEFAULT_1F_C
//#define BLIS_DEFAULT_DF_Z BLIS_DEFAULT_1F_Z
//#define BLIS_DOTXAXPYF_FUSE_FAC_S BLIS_L1F_FUSE_FAC_S
//#define BLIS_DOTXAXPYF_FUSE_FAC_D BLIS_L1F_FUSE_FAC_D
//#define BLIS_DOTXAXPYF_FUSE_FAC_C BLIS_L1F_FUSE_FAC_C
//#define BLIS_DOTXAXPYF_FUSE_FAC_Z BLIS_L1F_FUSE_FAC_Z
//#define BLIS_DEFAULT_XF_S BLIS_DEFAULT_1F_S
//#define BLIS_DEFAULT_XF_D BLIS_DEFAULT_1F_D
//#define BLIS_DEFAULT_XF_C BLIS_DEFAULT_1F_C
//#define BLIS_DEFAULT_XF_Z BLIS_DEFAULT_1F_Z


@@ -36,59 +36,87 @@
void bli_saxpyv_opt_var1( conj_t conjx,
dim_t n,
float* restrict alpha,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy )
void bli_saxpyv_opt_var1
(
conj_t conjx,
dim_t n,
float* alpha,
float* x, inc_t incx,
float* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_SAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}
void bli_daxpyv_opt_var1( conj_t conjx,
dim_t n,
double* restrict alpha,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy )
void bli_daxpyv_opt_var1
(
conj_t conjx,
dim_t n,
double* alpha,
double* x, inc_t incx,
double* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_DAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}
void bli_caxpyv_opt_var1( conj_t conjx,
dim_t n,
scomplex* restrict alpha,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy )
void bli_caxpyv_opt_var1
(
conj_t conjx,
dim_t n,
scomplex* alpha,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_CAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
}
void bli_zaxpyv_opt_var1( conj_t conjx,
dim_t n,
dcomplex* restrict alpha,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy )
void bli_zaxpyv_opt_var1
(
conj_t conjx,
dim_t n,
dcomplex* alpha,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
cntx_t* cntx
)
{
/*
Template axpyv kernel implementation
@@ -193,11 +221,15 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPYV_KERNEL_REF( conjx,
n,
alpha,
x, incx,
y, incy );
BLIS_ZAXPYV_KERNEL_REF
(
conjx,
n,
alpha,
x, incx,
y, incy,
cntx
);
return;
}
@@ -219,7 +251,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );
xp += 1; yp += 1;
}
@@ -228,7 +260,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -237,7 +269,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha, *xp, *yp );
bli_zaxpys( *alpha, *xp, *yp );
xp += 1; yp += 1;
}
@@ -247,7 +279,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );
xp += 1; yp += 1;
}
@@ -256,7 +288,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -265,7 +297,7 @@ void bli_zaxpyv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha, *xp, *yp );
bli_zaxpyjs( *alpha, *xp, *yp );
xp += 1; yp += 1;
}
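
Note: the hunks above touch three loops in the template axpyv kernel: front edge cases (n_pre), a main loop over aligned data (n_iter), and tail edge cases (n_left). The sketch below shows one way such a split can be computed for real, contiguous vectors, casting pointers to uintptr_t to inspect their alignment. It is an illustration under stated assumptions, not the template's actual code: ALIGN_SIZE, ELEMS_PER_ITER, peel_t and axpy_peeled are hypothetical names, and the inner scalar loop merely stands in for vector instructions.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define ALIGN_SIZE     32  /* hypothetical stand-in for BLIS_SIMD_ALIGN_SIZE */
#define ELEMS_PER_ITER 4   /* doubles covered by one main-loop iteration */

typedef struct { size_t n_pre, n_iter, n_left; } peel_t;

/* y := y + alpha * x for contiguous x and y, split into front, main, tail. */
static peel_t axpy_peeled( size_t n, double alpha, const double* x, double* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    peel_t p = { 0, 0, n };  /* default: everything goes through the tail loop */
    size_t i, j;

    /* Byte offsets of x and y from the previous aligned address. */
    uintptr_t off_x = ( uintptr_t )x % ALIGN_SIZE;
    uintptr_t off_y = ( uintptr_t )y % ALIGN_SIZE;

    /* Peeling can align both vectors only if they are misaligned by the
       same whole number of elements. */
    if ( off_x == off_y && off_x % sizeof( double ) == 0 )
    {
        p.n_pre = ( off_x == 0 ) ? 0 : ( ALIGN_SIZE - off_x ) / sizeof( double );
        if ( p.n_pre > n ) p.n_pre = n;

        p.n_iter = ( n - p.n_pre ) / ELEMS_PER_ITER;
        p.n_left = ( n - p.n_pre ) % ELEMS_PER_ITER;
    }

    /* Front edge cases: scalar steps until x and y are aligned. */
    for ( i = 0; i < p.n_pre; ++i ) { *y += alpha * *x; x += 1; y += 1; }

    /* Main loop: x and y are aligned here when peeling was possible. */
    for ( i = 0; i < p.n_iter; ++i )
    {
        for ( j = 0; j < ELEMS_PER_ITER; ++j ) y[j] += alpha * x[j];
        x += ELEMS_PER_ITER;
        y += ELEMS_PER_ITER;
    }

    /* Tail edge cases: whatever did not fill a whole iteration. */
    for ( i = 0; i < p.n_left; ++i ) { *y += alpha * *x; x += 1; y += 1; }

    return p;
}
```

Whatever the alignment of the inputs, n_pre + n_iter * ELEMS_PER_ITER + n_left equals n, so every element is updated exactly once.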


@@ -36,66 +36,94 @@
void bli_sdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict rho )
void bli_sdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
float* x, inc_t incx,
float* y, inc_t incy,
float* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_SDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}
void bli_ddotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict rho )
void bli_ddotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
double* x, inc_t incx,
double* y, inc_t incy,
double* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_DDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}
void bli_cdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict rho )
void bli_cdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* rho,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_CDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
}
void bli_zdotv_opt_var1( conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict rho )
void bli_zdotv_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* rho,
cntx_t* cntx
)
{
/*
Template dotv kernel implementation
@@ -210,12 +238,16 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTV_KERNEL_REF( conjx,
conjy,
n,
x, incx,
y, incy,
rho );
BLIS_ZDOTV_KERNEL_REF
(
conjx,
conjy,
n,
x, incx,
y, incy,
rho,
cntx
);
return;
}
@@ -250,7 +282,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );
xp += 1; yp += 1;
}
@@ -259,7 +291,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -268,7 +300,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zdots( *xp, *yp, dotxy );
xp += 1; yp += 1;
}
@@ -278,7 +310,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute front edge cases if x and y were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );
xp += 1; yp += 1;
}
@@ -287,7 +319,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// yp are guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -296,7 +328,7 @@ void bli_zdotv_opt_var1( conj_t conjx,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zdotjs( *xp, *yp, dotxy );
xp += 1; yp += 1;
}
@@ -307,6 +339,6 @@ void bli_zdotv_opt_var1( conj_t conjx,
if ( bli_is_conj( conjy ) )
bli_zconjs( dotxy );
bli_zzcopys( dotxy, *rho );
bli_zcopys( dotxy, *rho );
}
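
Note: the last hunk above conjugates the accumulated dot product when conjy is set, while the loops use either the plain (dots) or the conjugating (dotjs) accumulation. One plausible reading, sketched below, is that conjugation of y is folded into the choice of loop plus a final conjugation, using the identity x * conj(y) = conj( conj(x) * y ). This is an assumption about the technique, not a copy of the template: zval_t, dots, dotjs and dotv_sketch are hypothetical stand-ins for dcomplex and the bli_z* macros, and the context parameter is omitted.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { double real, imag; } zval_t;  /* stand-in for dcomplex */

/* a += x * y */
static void dots( zval_t x, zval_t y, zval_t* a )
{
    a->real += x.real * y.real - x.imag * y.imag;
    a->imag += x.real * y.imag + x.imag * y.real;
}

/* a += conj(x) * y */
static void dotjs( zval_t x, zval_t y, zval_t* a )
{
    a->real += x.real * y.real + x.imag * y.imag;
    a->imag += x.real * y.imag - x.imag * y.real;
}

/* rho := sum_i op(x_i) * op(y_i), where op is conj or identity per flag. */
static zval_t dotv_sketch( bool conjx, bool conjy, size_t n,
                           const zval_t* x, const zval_t* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    zval_t dotxy = { 0.0, 0.0 };
    size_t i;

    /* Fold conjy into conjx: toggling conjx and conjugating the final
       result has the same effect as conjugating y. */
    bool conjx_use = conjy ? !conjx : conjx;

    if ( conjx_use ) for ( i = 0; i < n; ++i ) dotjs( x[i], y[i], &dotxy );
    else             for ( i = 0; i < n; ++i ) dots( x[i], y[i], &dotxy );

    if ( conjy ) dotxy.imag = -dotxy.imag;

    return dotxy;
}
```

Only two loop bodies are needed to cover all four conjx/conjy combinations, which matches the two loop variants visible in the diff.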


@@ -36,88 +36,108 @@
void bli_saxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict alpha1,
float* restrict alpha2,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict z, inc_t incz
)
void bli_saxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
float* alpha1,
float* alpha2,
float* x, inc_t incx,
float* y, inc_t incy,
float* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_SAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}
void bli_daxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict alpha1,
double* restrict alpha2,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict z, inc_t incz
)
void bli_daxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
double* alpha1,
double* alpha2,
double* x, inc_t incx,
double* y, inc_t incy,
double* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_DAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}
void bli_caxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict alpha1,
scomplex* restrict alpha2,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict z, inc_t incz
)
void bli_caxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* alpha1,
scomplex* alpha2,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_CAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
}
void bli_zaxpy2v_opt_var1(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict alpha1,
dcomplex* restrict alpha2,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict z, inc_t incz
)
void bli_zaxpy2v_opt_var1
(
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* alpha1,
dcomplex* alpha2,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* z, inc_t incz,
cntx_t* cntx
)
{
/*
Template axpy2v kernel implementation
@@ -229,14 +249,18 @@ void bli_zaxpy2v_opt_var1(
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPY2V_KERNEL_REF( conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz );
BLIS_ZAXPY2V_KERNEL_REF
(
conjx,
conjy,
n,
alpha1,
alpha2,
x, incx,
y, incy,
z, incz,
cntx
);
return;
}
@@ -259,8 +283,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -272,8 +296,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -283,8 +307,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -294,8 +318,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -307,8 +331,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -318,8 +342,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpys( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpys( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -329,8 +353,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -342,8 +366,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -353,8 +377,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpys( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpys( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -364,8 +388,8 @@ void bli_zaxpy2v_opt_var1(
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -377,8 +401,8 @@ void bli_zaxpy2v_opt_var1(
// to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -388,8 +412,8 @@ void bli_zaxpy2v_opt_var1(
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzaxpyjs( *alpha1, *xp, *zp );
bli_zzzaxpyjs( *alpha2, *yp, *zp );
bli_zaxpyjs( *alpha1, *xp, *zp );
bli_zaxpyjs( *alpha2, *yp, *zp );
xp += 1; yp += 1; zp += 1;
}
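
Annotation: the hunks above repeat one pattern for each of the four conjugation cases, two scalar axpy updates into z per element. The sketch below condenses that pattern into one loop, reading the operation as z := z + alpha1 * conjx(x) + alpha2 * conjy(y). It is an illustration only: `axpy_elem` and `axpy2v_sketch` are hypothetical names, C99 `double complex` stands in for `dcomplex`, unit strides are assumed, and the alignment-driven pre/main/tail split is omitted.

```c
#include <complex.h>
#include <stddef.h>

/* Hypothetical stand-in for the scalar axpy macros in the diff:
   *z += alpha * x, optionally conjugating x first. */
static void axpy_elem(double complex alpha, double complex x,
                      int conj_x, double complex *z)
{
    *z += alpha * (conj_x ? conj(x) : x);
}

/* Sketch: z := z + alpha1 * conjx(x) + alpha2 * conjy(y), unit stride.
   conjx / conjy are plain flags here (nonzero means "conjugate"). */
void axpy2v_sketch(int conjx, int conjy, size_t n,
                   double complex alpha1, double complex alpha2,
                   const double complex *x,
                   const double complex *y,
                   double complex *z)
{
    for (size_t i = 0; i < n; ++i) {
        axpy_elem(alpha1, x[i], conjx, &z[i]);
        axpy_elem(alpha2, y[i], conjy, &z[i]);
    }
}
```

Passing the conjugation flag down to a per-element helper trades the four unrolled branches of the template for one loop, which is easier to read but leaves less room for vectorization.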

@@ -36,87 +36,107 @@
void bli_saxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
float* restrict alpha,
float* restrict a, inc_t inca, inc_t lda,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy
)
void bli_saxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
float* alpha,
float* a, inc_t inca, inc_t lda,
float* x, inc_t incx,
float* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_SAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}
void bli_daxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
double* restrict alpha,
double* restrict a, inc_t inca, inc_t lda,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy
)
void bli_daxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
double* alpha,
double* a, inc_t inca, inc_t lda,
double* x, inc_t incx,
double* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_DAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}
void bli_caxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* restrict alpha,
scomplex* restrict a, inc_t inca, inc_t lda,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy
)
void bli_caxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* alpha,
scomplex* a, inc_t inca, inc_t lda,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_CAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
}
void bli_zaxpyf_opt_var1(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* restrict alpha,
dcomplex* restrict a, inc_t inca, inc_t lda,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy
)
void bli_zaxpyf_opt_var1
(
conj_t conja,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* alpha,
dcomplex* a, inc_t inca, inc_t lda,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
cntx_t* cntx
)
{
/*
Template axpyf kernel implementation
@@ -243,14 +263,18 @@ void bli_zaxpyf_opt_var1(
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZAXPYF_KERNEL_REF( conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy );
BLIS_ZAXPYF_KERNEL_REF
(
conja,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
y, incy,
cntx
);
return;
}
@@ -274,16 +298,16 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopys( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopys( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}
else // if ( bli_is_conj( conjx ) )
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}
@@ -296,7 +320,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += 1;
}
@@ -312,7 +336,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += n_elem_per_iter;
}
@@ -324,7 +348,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpys( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpys( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += 1;
}
@@ -338,7 +362,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += 1;
}
@@ -354,7 +378,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += n_elem_per_iter;
}
@@ -366,7 +390,7 @@ void bli_zaxpyf_opt_var1(
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
bli_zaxpyjs( alpha_x[ j ], *ap[ j ], *yp );
ap[ j ] += 1;
}
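
Annotation: the axpyf hunks first form alpha_x[j] = alpha * conjx(x[j]) and then accumulate alpha_x[j] times column j of A into y. A minimal sketch of that two-step structure follows, assuming the operation is y := y + alpha * conja(A) * conjx(x) with A of size m x b_n. `axpyf_sketch` and `MAX_FUSE` are hypothetical names, and C99 `double complex` replaces `dcomplex`.

```c
#include <complex.h>
#include <stddef.h>

#define MAX_FUSE 8 /* assumed upper bound on b_n for this sketch only */

/* Sketch: y := y + alpha * conja(A) * conjx(x), A is m x b_n.
   Returns 0 on success, -1 if b_n exceeds MAX_FUSE. */
int axpyf_sketch(int conja, int conjx, size_t m, size_t b_n,
                 double complex alpha,
                 const double complex *a, size_t inca, size_t lda,
                 const double complex *x, size_t incx,
                 double complex *y, size_t incy)
{
    double complex alpha_x[MAX_FUSE];

    if (b_n > MAX_FUSE) return -1;

    /* Step 1: scale (and optionally conjugate) x once, up front. */
    for (size_t j = 0; j < b_n; ++j) {
        double complex xj = x[j * incx];
        alpha_x[j] = alpha * (conjx ? conj(xj) : xj);
    }

    /* Step 2: one pass over y, fusing the b_n column updates. */
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < b_n; ++j) {
            double complex aij = a[i * inca + j * lda];
            y[i * incy] += alpha_x[j] * (conja ? conj(aij) : aij);
        }
    }
    return 0;
}
```

Precomputing alpha_x keeps the scaling out of the inner loop, so each element of y is loaded and stored once for all b_n columns.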

@@ -36,87 +36,115 @@
void bli_sdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
float* restrict alpha,
float* restrict x, inc_t incx,
float* restrict y, inc_t incy,
float* restrict rho,
float* restrict z, inc_t incz )
void bli_sdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
float* alpha,
float* x, inc_t incx,
float* y, inc_t incy,
float* rho,
float* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_SDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}
void bli_ddotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
double* restrict alpha,
double* restrict x, inc_t incx,
double* restrict y, inc_t incy,
double* restrict rho,
double* restrict z, inc_t incz )
void bli_ddotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
double* alpha,
double* x, inc_t incx,
double* y, inc_t incy,
double* rho,
double* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_DDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}
void bli_cdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* restrict alpha,
scomplex* restrict x, inc_t incx,
scomplex* restrict y, inc_t incy,
scomplex* restrict rho,
scomplex* restrict z, inc_t incz )
void bli_cdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
scomplex* alpha,
scomplex* x, inc_t incx,
scomplex* y, inc_t incy,
scomplex* rho,
scomplex* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_CDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
}
void bli_zdotaxpyv_opt_var1( conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* restrict alpha,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict rho,
dcomplex* restrict z, inc_t incz )
void bli_zdotaxpyv_opt_var1
(
conj_t conjxt,
conj_t conjx,
conj_t conjy,
dim_t n,
dcomplex* alpha,
dcomplex* x, inc_t incx,
dcomplex* y, inc_t incy,
dcomplex* rho,
dcomplex* z, inc_t incz,
cntx_t* cntx
)
{
/*
Template dotaxpyv kernel implementation
@@ -240,15 +268,19 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTAXPYV_KERNEL_REF( conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz );
BLIS_ZDOTAXPYV_KERNEL_REF
(
conjxt,
conjx,
conjy,
n,
alpha,
x, incx,
y, incy,
rho,
z, incz,
cntx
);
return;
}
@@ -285,8 +317,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -298,8 +330,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -309,8 +341,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -320,8 +352,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -333,8 +365,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -344,8 +376,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpys( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpys( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -355,8 +387,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -368,8 +400,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -379,8 +411,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdots( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdots( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -390,8 +422,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute front edge cases if x, y, and z were unaligned.
for ( i = 0; i < n_pre; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -403,8 +435,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// guaranteed to be aligned to BLIS_SIMD_ALIGN_SIZE.
for ( i = 0; i < n_iter; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += n_elem_per_iter;
yp += n_elem_per_iter;
@@ -414,8 +446,8 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
// Compute tail edge cases, if applicable.
for ( i = 0; i < n_left; ++i )
{
bli_zzzdotjs( *xp, *yp, dotxy );
bli_zzzaxpyjs( *alpha, *xp, *zp );
bli_zdotjs( *xp, *yp, dotxy );
bli_zaxpyjs( *alpha, *xp, *zp );
xp += 1; yp += 1; zp += 1;
}
@@ -426,6 +458,6 @@ void bli_zdotaxpyv_opt_var1( conj_t conjxt,
if ( bli_is_conj( conjy ) )
bli_zconjs( dotxy );
bli_zzcopys( dotxy, *rho );
bli_zcopys( dotxy, *rho );
}
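
Annotation: every loop body in the dotaxpyv hunks pairs a dot-product accumulation with an axpy update, so x is read once for both results. The sketch below shows that fusion under the assumption that the operation is rho := conjxt(x)^T conjy(y) and z := z + alpha * conjx(x). `dotaxpyv_sketch` is a hypothetical name, strides are fixed at 1, and conjugation of y is applied per element rather than folded into a final conjugation of the accumulator as the template appears to do.

```c
#include <complex.h>
#include <stddef.h>

/* Sketch: rho := conjxt(x)^T conjy(y);  z := z + alpha * conjx(x).
   The conj* arguments are plain flags (nonzero means "conjugate"). */
void dotaxpyv_sketch(int conjxt, int conjx, int conjy, size_t n,
                     double complex alpha,
                     const double complex *x,
                     const double complex *y,
                     double complex *rho,
                     double complex *z)
{
    double complex dotxy = 0.0;

    for (size_t i = 0; i < n; ++i) {
        double complex xi = x[i];
        double complex yi = conjy ? conj(y[i]) : y[i];

        /* Dot-product half: accumulate into a local scalar. */
        dotxy += (conjxt ? conj(xi) : xi) * yi;

        /* Axpy half: reuse the xi that was just loaded. */
        z[i] += alpha * (conjx ? conj(xi) : xi);
    }

    /* Write the result once, mirroring the final copy into *rho. */
    *rho = dotxy;
}
```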

@@ -36,115 +36,143 @@
void bli_sdotxaxpyf_opt_var1( conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
float* restrict alpha,
float* restrict a, inc_t inca, inc_t lda,
float* restrict w, inc_t incw,
float* restrict x, inc_t incx,
float* restrict beta,
float* restrict y, inc_t incy,
float* restrict z, inc_t incz )
void bli_sdotxaxpyf_opt_var1
(
conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
float* alpha,
float* a, inc_t inca, inc_t lda,
float* w, inc_t incw,
float* x, inc_t incx,
float* beta,
float* y, inc_t incy,
float* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTXAXPYF_KERNEL_REF( conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz );
BLIS_SDOTXAXPYF_KERNEL_REF
(
conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz,
cntx
);
}
void bli_ddotxaxpyf_opt_var1( conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
double* restrict alpha,
double* restrict a, inc_t inca, inc_t lda,
double* restrict w, inc_t incw,
double* restrict x, inc_t incx,
double* restrict beta,
double* restrict y, inc_t incy,
double* restrict z, inc_t incz )
void bli_ddotxaxpyf_opt_var1
(
conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
double* alpha,
double* a, inc_t inca, inc_t lda,
double* w, inc_t incw,
double* x, inc_t incx,
double* beta,
double* y, inc_t incy,
double* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTXAXPYF_KERNEL_REF( conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz );
BLIS_DDOTXAXPYF_KERNEL_REF
(
conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz,
cntx
);
}
void bli_cdotxaxpyf_opt_var1( conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* restrict alpha,
scomplex* restrict a, inc_t inca, inc_t lda,
scomplex* restrict w, inc_t incw,
scomplex* restrict x, inc_t incx,
scomplex* restrict beta,
scomplex* restrict y, inc_t incy,
scomplex* restrict z, inc_t incz )
void bli_cdotxaxpyf_opt_var1
(
conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* alpha,
scomplex* a, inc_t inca, inc_t lda,
scomplex* w, inc_t incw,
scomplex* x, inc_t incx,
scomplex* beta,
scomplex* y, inc_t incy,
scomplex* z, inc_t incz,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTXAXPYF_KERNEL_REF( conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz );
BLIS_CDOTXAXPYF_KERNEL_REF
(
conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz,
cntx
);
}
void bli_zdotxaxpyf_opt_var1( conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* restrict alpha,
dcomplex* restrict a, inc_t inca, inc_t lda,
dcomplex* restrict w, inc_t incw,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict beta,
dcomplex* restrict y, inc_t incy,
dcomplex* restrict z, inc_t incz )
void bli_zdotxaxpyf_opt_var1
(
conj_t conjat,
conj_t conja,
conj_t conjw,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* alpha,
dcomplex* a, inc_t inca, inc_t lda,
dcomplex* w, inc_t incw,
dcomplex* x, inc_t incx,
dcomplex* beta,
dcomplex* y, inc_t incy,
dcomplex* z, inc_t incz,
cntx_t* cntx
)
{
/*
@@ -289,19 +317,23 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTXAXPYF_KERNEL_REF( conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz );
BLIS_ZDOTXAXPYF_KERNEL_REF
(
conjat,
conja,
conjw,
conjx,
m,
b_n,
alpha,
a, inca, lda,
w, incw,
x, incx,
beta,
y, incy,
z, incz,
cntx
);
return;
}
@@ -326,16 +358,16 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopys( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopys( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}
else // if ( bli_is_conj( conjx ) )
{
for ( j = 0; j < b_n; ++j )
{
bli_zzcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zzscals( *alpha, alpha_x[ j ] );
bli_zcopyjs( *xp[ j ], alpha_x[ j ] );
bli_zscals( *alpha, alpha_x[ j ] );
}
}
@@ -366,8 +398,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -383,8 +415,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += n_elem_per_iter;
}
@@ -396,8 +428,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -411,8 +443,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -428,8 +460,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += n_elem_per_iter;
}
@@ -441,8 +473,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdots( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdots( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -456,8 +488,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -473,8 +505,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += n_elem_per_iter;
}
@@ -486,8 +518,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdots( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdots( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -501,8 +533,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -518,8 +550,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += n_elem_per_iter;
}
@@ -531,8 +563,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
{
for ( j = 0; j < b_n; ++j )
{
bli_zzzdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zzzdotjs( *ap[ j ], alpha_x[ j ], *zp );
bli_zdotjs( *ap[ j ], *wp, At_w[ j ] );
bli_zdotjs( *ap[ j ], alpha_x[ j ], *zp );
ap[ j ] += 1;
}
@@ -555,8 +587,8 @@ void bli_zdotxaxpyf_opt_var1( conj_t conjat,
// scaling by beta.
for ( j = 0; j < b_n; ++j )
{
bli_zzscals( *beta, *yp[ j ] );
bli_zzzaxpys( *alpha, At_w[ j ], *yp[ j ] );
bli_zscals( *beta, *yp[ j ] );
bli_zaxpys( *alpha, At_w[ j ], *yp[ j ] );
}
}
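
Annotation: the dotxaxpyf hunks accumulate At_w[j] and update z in the same sweep over A, then finish with y := beta * y + alpha * At_w. The real-valued sketch below follows that shape, assuming the operation is y := beta * y + alpha * A^T w and z := z + alpha * A x. Conjugation flags are dropped because they are no-ops for real data; `dotxaxpyf_sketch` and `MAX_FUSE` are hypothetical names and the vectors use unit stride.

```c
#include <stddef.h>

#define MAX_FUSE 8 /* assumed upper bound on b_n for this sketch only */

/* Sketch (real case): y := beta*y + alpha * A^T w;  z := z + alpha * A x.
   A is m x b_n with strides inca/lda; w, z have length m; x, y length b_n.
   Returns 0 on success, -1 if b_n exceeds MAX_FUSE. */
int dotxaxpyf_sketch(size_t m, size_t b_n, double alpha,
                     const double *a, size_t inca, size_t lda,
                     const double *w, const double *x,
                     double beta, double *y, double *z)
{
    double alpha_x[MAX_FUSE];
    double at_w[MAX_FUSE];

    if (b_n > MAX_FUSE) return -1;

    for (size_t j = 0; j < b_n; ++j) {
        alpha_x[j] = alpha * x[j]; /* pre-scale x by alpha */
        at_w[j] = 0.0;             /* dot-product accumulators */
    }

    /* Single sweep over A: each a(i,j) feeds both results. */
    for (size_t i = 0; i < m; ++i) {
        for (size_t j = 0; j < b_n; ++j) {
            double aij = a[i * inca + j * lda];
            at_w[j] += aij * w[i];
            z[i] += aij * alpha_x[j];
        }
    }

    /* Apply beta and alpha to y only after the sweep. */
    for (size_t j = 0; j < b_n; ++j)
        y[j] = beta * y[j] + alpha * at_w[j];

    return 0;
}
```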

@@ -36,95 +36,115 @@
void bli_sdotxf_opt_var1(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
float* restrict alpha,
float* restrict a, inc_t inca, inc_t lda,
float* restrict x, inc_t incx,
float* restrict beta,
float* restrict y, inc_t incy
)
void bli_sdotxf_opt_var1
(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
float* alpha,
float* a, inc_t inca, inc_t lda,
float* x, inc_t incx,
float* beta,
float* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_SDOTXF_KERNEL_REF( conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy );
BLIS_SDOTXF_KERNEL_REF
(
conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy,
cntx
);
}
void bli_ddotxf_opt_var1(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
double* restrict alpha,
double* restrict a, inc_t inca, inc_t lda,
double* restrict x, inc_t incx,
double* restrict beta,
double* restrict y, inc_t incy
)
void bli_ddotxf_opt_var1
(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
double* alpha,
double* a, inc_t inca, inc_t lda,
double* x, inc_t incx,
double* beta,
double* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_DDOTXF_KERNEL_REF( conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy );
BLIS_DDOTXF_KERNEL_REF
(
conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy,
cntx
);
}
void bli_cdotxf_opt_var1(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* restrict alpha,
scomplex* restrict a, inc_t inca, inc_t lda,
scomplex* restrict x, inc_t incx,
scomplex* restrict beta,
scomplex* restrict y, inc_t incy
)
void bli_cdotxf_opt_var1
(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
scomplex* alpha,
scomplex* a, inc_t inca, inc_t lda,
scomplex* x, inc_t incx,
scomplex* beta,
scomplex* y, inc_t incy,
cntx_t* cntx
)
{
/* Just call the reference implementation. */
BLIS_CDOTXF_KERNEL_REF( conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy );
BLIS_CDOTXF_KERNEL_REF
(
conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy,
cntx
);
}
void bli_zdotxf_opt_var1(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* restrict alpha,
dcomplex* restrict a, inc_t inca, inc_t lda,
dcomplex* restrict x, inc_t incx,
dcomplex* restrict beta,
dcomplex* restrict y, inc_t incy
)
void bli_zdotxf_opt_var1
(
conj_t conjat,
conj_t conjx,
dim_t m,
dim_t b_n,
dcomplex* alpha,
dcomplex* a, inc_t inca, inc_t lda,
dcomplex* x, inc_t incx,
dcomplex* beta,
dcomplex* y, inc_t incy,
cntx_t* cntx
)
{
/*
Template dotxf kernel implementation
@@ -225,10 +245,14 @@ void bli_zdotxf_opt_var1(
// If the vector lengths are zero, scale r by beta and return.
if ( bli_zero_dim1( m ) )
{
bli_zzscalv( BLIS_NO_CONJUGATE,
b_n,
beta,
y, incy );
bli_zscalv_ex
(
BLIS_NO_CONJUGATE,
b_n,
beta,
y, incy,
cntx
);
return;
}
@@ -265,15 +289,19 @@ void bli_zdotxf_opt_var1(
// Call the reference implementation if needed.
if ( use_ref == TRUE )
{
BLIS_ZDOTXF_KERNEL_REF( conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy );
BLIS_ZDOTXF_KERNEL_REF
(
conjat,
conjx,
m,
b_n,
alpha,
a, inca, lda,
x, incx,
beta,
y, incy,
cntx
);
return;
}
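
Annotation: the dotxf hunks show two early exits, scaling y by beta when m is zero and deferring to the reference kernel when needed. A real-valued sketch of the zero-length case together with the main computation is given below, assuming the operation is y := beta * y + alpha * A^T x. `dotxf_sketch` is a hypothetical name, conjugation flags are omitted as no-ops for real data, and x and y use unit stride.

```c
#include <stddef.h>

/* Sketch (real case): y := beta*y + alpha * A^T x.
   A is m x b_n with strides inca/lda; x has length m, y length b_n. */
void dotxf_sketch(size_t m, size_t b_n, double alpha,
                  const double *a, size_t inca, size_t lda,
                  const double *x,
                  double beta, double *y)
{
    /* With no rows to sum over, only the beta scaling remains. */
    if (m == 0) {
        for (size_t j = 0; j < b_n; ++j)
            y[j] *= beta;
        return;
    }

    /* One dot product per column of A. */
    for (size_t j = 0; j < b_n; ++j) {
        double rho = 0.0;
        for (size_t i = 0; i < m; ++i)
            rho += a[i * inca + j * lda] * x[i];
        y[j] = beta * y[j] + alpha * rho;
    }
}
```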

@@ -36,37 +36,45 @@
void bli_sgemm_opt_mxn(
dim_t k,
float* restrict alpha,
float* restrict a1,
float* restrict b1,
float* restrict beta,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_sgemm_opt_mxn
(
dim_t k,
float* restrict alpha,
float* restrict a1,
float* restrict b1,
float* restrict beta,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_SGEMM_UKERNEL_REF( k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data );
BLIS_SGEMM_UKERNEL_REF
(
k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data,
cntx
);
}
void bli_dgemm_opt_mxn(
dim_t k,
double* restrict alpha,
double* restrict a1,
double* restrict b1,
double* restrict beta,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_dgemm_opt_mxn
(
dim_t k,
double* restrict alpha,
double* restrict a1,
double* restrict b1,
double* restrict beta,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/*
Template gemm micro-kernel implementation
@@ -85,133 +93,27 @@ void bli_dgemm_opt_mxn(
where A1 is MR x k, B1 is k x NR, C11 is MR x NR, and alpha and beta are
scalars.
Parameters:
For more info, please refer to the BLIS website's wiki on kernels:
- k: The number of columns of A1 and rows of B1.
- alpha: The address of a scalar to the A1 * B1 product.
- a1: The address of a micro-panel of matrix A of dimension MR x k,
stored by columns with leading dimension PACKMR, where
typically PACKMR = MR.
- b1: The address of a micro-panel of matrix B of dimension k x NR,
stored by rows with leading dimension PACKNR, where typically
PACKNR = NR.
- beta: The address of a scalar to the input value of matrix C11.
- c11: The address of a submatrix C11 of dimension MR x NR, stored
according to rs_c and cs_c.
- rs_c: The row stride of matrix C11 (ie: the distance to the next row,
in units of matrix elements).
- cs_c: The column stride of matrix C11 (ie: the distance to the next
column, in units of matrix elements).
- data: The address of an auxinfo_t object that contains auxiliary
information that may be useful when optimizing the gemm
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
more info.)
https://github.com/flame/blis/wiki/KernelsHowTo
Diagram for gemm
The diagram below shows the packed micro-panel operands and how elements
of each would be stored when MR = NR = 4. The hex digits indicate the
layout and order (but NOT the numeric contents) of the elements in
memory. Note that the storage of C11 is not shown since it is determined
by the row and column strides of C11.
c11: a1: b1:
_______ ______________________ _______
| | |0 4 8 C | |0 1 2 3|
MR | | |1 5 9 D . . . | |4 5 6 7|
| | += |2 6 A E | |8 9 A B|
|_______| |3_7_B_F_______________| |C D E F|
| . |
NR k | . | k
| . |
| |
| |
|_______|
NR
Implementation Notes for gemm
- Register blocksizes. The C preprocessor macros bli_?mr and bli_?nr
evaluate to the MR and NR register blocksizes for the datatype
corresponding to the '?' character. These values are abbreviations
of the macro constants BLIS_DEFAULT_MR_? and BLIS_DEFAULT_NR_?,
which are defined in the bli_kernel.h header file of the BLIS
configuration.
- Leading dimensions of a1 and b1: PACKMR and PACKNR. The packed
micro-panels a1 and b1 are simply stored in column-major and row-major
order, respectively. Usually, the width of either micro-panel (ie:
the number of rows of A1, or MR, and the number of columns of B1, or
NR) is equal to that micro-panel's so-called "leading dimension."
Sometimes, it may be beneficial to specify a leading dimension that
is larger than the panel width. This may be desirable because it
allows each column of A1 or row of B1 to maintain a certain alignment
in memory that would not otherwise be maintained by MR and/or NR. In
this case, you should index through a1 and b1 using the values PACKMR
and PACKNR, respectively, as defined by bli_?packmr and bli_?packnr.
These values are defined as BLIS_PACKDIM_MR_? and BLIS_PACKDIM_NR_?,
respectively, in the bli_kernel.h header file of the BLIS
configuration.
- Storage preference of c11: Sometimes, an optimized micro-kernel will
have a preferred storage format for C11--typically either contiguous
row-storage or contiguous column-storage. This preference comes from
how the micro-kernel is most efficiently able to load/store elements
of C11 from/to memory. Most micro-kernels use vector instructions to
load and store contiguous columns (or column segments) of C11. However,
the developer may decide that loading contiguous rows (or row
segments) is desirable. If this is the case, this preference should be
noted in bli_kernel.h by defining the macro
BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS. Leaving the macro undefined
leaves the default assumption (contiguous column preference) in
place. Setting this macro allows the framework to perform a minor
optimization at run-time that will ensure the micro-kernel preference
is honored, if at all possible.
- Edge cases in MR, NR dimensions. Sometimes the micro-kernel will be
called with micro-panels a1 and b1 that correspond to edge cases,
where only partial results are needed. Zero-padding is handled
automatically by the packing function to facilitate reuse of the same
micro-kernel. Similarly, the logic for computing to temporary storage
and then saving only the elements that correspond to elements of C11
that exist (at the edges) is handled automatically within the
macro-kernel.
- Alignment of a1 and b1. By default, the addresses a1 and b1 are
aligned only to sizeof(type). If BLIS_CONTIG_ADDR_ALIGN_SIZE is
set to some larger multiple of sizeof(type), such as the page size,
then a1 and b1 will be aligned to PACKMR * sizeof(type) and PACKNR *
sizeof(type), respectively. Alignment of a1 and b1 is also affected
by BLIS_UPANEL_A_ALIGN_SIZE_? and BLIS_UPANEL_B_ALIGN_SIZE_?, which
align the distance (stride) between subsequent micro-panels. (By
default, those values are simply sizeof(type), in which case they have
no effect.)
- Unrolling loops. As a general rule of thumb, the loop over k is
sometimes moderately unrolled; for example, in our experience, an
unrolling factor of u = 4 is fairly common. If unrolling is applied
in the k dimension, edge cases must be handled to support values of k
that are not multiples of u. It is nearly universally true that there
should be no loops in the MR or NR directions; in other words,
iteration over these dimensions should always be fully unrolled
(within the loop over k).
- Zero beta. If beta = 0.0 (or 0.0 + 0.0i for complex datatypes), then
the micro-kernel should NOT use it explicitly, as C11 may contain
uninitialized memory (including NaNs). This case should be detected
and handled separately, preferably by simply overwriting C11 with the
alpha * A1 * B1 product. An example of how to perform this "beta equals
zero" handling is included in the gemm micro-kernel associated with
the template configuration.
For more info, please refer to the BLIS website and/or contact the
blis-devel mailing list.
and/or contact the blis-devel mailing list.
-FGVZ
*/
const dim_t mr = bli_dmr;
const dim_t nr = bli_dnr;
const num_t dt = BLIS_DOUBLE;
const inc_t cs_a = bli_dpackmr;
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = bli_dpacknr;
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_ab = 1;
const inc_t cs_ab = bli_dmr;
const inc_t cs_a = packmr;
const inc_t rs_b = packnr;
const inc_t rs_ab = 1;
const inc_t cs_ab = mr;
dim_t l, j, i;
@@ -291,36 +193,56 @@ void bli_cgemm_opt_mxn(
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
(
dim_t k,
scomplex* restrict alpha,
scomplex* restrict a1,
scomplex* restrict b1,
scomplex* restrict beta,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_CGEMM_UKERNEL_REF( k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data );
BLIS_CGEMM_UKERNEL_REF
(
k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data,
cntx
);
}
void bli_zgemm_opt_mxn(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a1,
dcomplex* restrict b1,
dcomplex* restrict beta,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_zgemm_opt_mxn
(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a1,
dcomplex* restrict b1,
dcomplex* restrict beta,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_ZGEMM_UKERNEL_REF( k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data );
BLIS_ZGEMM_UKERNEL_REF
(
k,
alpha,
a1,
b1,
beta,
c11, rs_c, cs_c,
data,
cntx
);
}
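
The "Unrolling loops" and "Zero beta" notes in the gemm template comment above can be sketched roughly as follows. This is a minimal illustration, not the BLIS template itself: the name dgemm_ukr_sketch, the helper rank1_update, and the fixed MR/NR/PACKMR/PACKNR values are assumptions made for this example (real kernels query the blocksizes from the context), and the MR/NR loops are left as plain loops where an optimized kernel would fully unroll them.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical blocksizes for this sketch only; real kernels obtain
   them from the context (cntx). */
enum { MR = 4, NR = 4, PACKMR = 4, PACKNR = 4 };

/* One rank-1 update: ab += a(:,l) * b(l,:). An optimized kernel would
   fully unroll these MR and NR loops (or use vector registers). */
static void rank1_update( double* ab, const double* a, const double* b )
{
    for ( size_t j = 0; j < NR; ++j )
        for ( size_t i = 0; i < MR; ++i )
            ab[ i + j*MR ] += a[ i ] * b[ j ];
}

/* C11 := beta * C11 + alpha * A1 * B1, where a1 is MR x k stored by
   columns (leading dimension PACKMR) and b1 is k x NR stored by rows
   (leading dimension PACKNR). */
void dgemm_ukr_sketch( size_t k, const double* alpha,
                       const double* a1, const double* b1,
                       const double* beta,
                       double* c11, size_t rs_c, size_t cs_c )
{
    double       ab[ MR * NR ] = { 0.0 };  /* rs_ab = 1, cs_ab = MR */
    const size_t k_iter = k / 4;           /* unrolling factor u = 4 */
    const size_t k_left = k % 4;           /* k not a multiple of u */

    assert( alpha != NULL && beta != NULL && c11 != NULL );

    /* Main loop over k, unrolled by four. */
    for ( size_t l = 0; l < k_iter; ++l )
    {
        rank1_update( ab, a1 + 0*PACKMR, b1 + 0*PACKNR );
        rank1_update( ab, a1 + 1*PACKMR, b1 + 1*PACKNR );
        rank1_update( ab, a1 + 2*PACKMR, b1 + 2*PACKNR );
        rank1_update( ab, a1 + 3*PACKMR, b1 + 3*PACKNR );
        a1 += 4*PACKMR;
        b1 += 4*PACKNR;
    }

    /* Edge-case loop for the remaining k % 4 iterations. */
    for ( size_t l = 0; l < k_left; ++l )
    {
        rank1_update( ab, a1, b1 );
        a1 += PACKMR;
        b1 += PACKNR;
    }

    for ( size_t j = 0; j < NR; ++j )
        for ( size_t i = 0; i < MR; ++i )
        {
            double*      gamma = c11 + i*rs_c + j*cs_c;
            const double t     = *alpha * ab[ i + j*MR ];

            /* beta == 0: overwrite without reading C11, which may hold
               uninitialized data or NaNs. */
            if ( *beta == 0.0 ) *gamma = t;
            else                *gamma = ( *beta ) * ( *gamma ) + t;
        }
}
```

Accumulating into the temporary ab and touching C11 only once at the end keeps the beta test out of the loop over k, which is one plausible reason the template stores the product separately before scaling.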


@@ -36,18 +36,24 @@
void bli_sgemmtrsm_l_opt_mxn(
dim_t k,
float* restrict alpha,
float* restrict a10,
float* restrict a11,
float* restrict b01,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_sgemmtrsm_l_opt_mxn
(
dim_t k,
float* restrict alpha,
float* restrict a10,
float* restrict a11,
float* restrict b01,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_spacknr;
const num_t dt = BLIS_FLOAT;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
float* restrict minus_one = bli_sm1;
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_l_opt_mxn(
void bli_dgemmtrsm_l_opt_mxn(
dim_t k,
double* restrict alpha,
double* restrict a10,
double* restrict a11,
double* restrict b01,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_dgemmtrsm_l_opt_mxn
(
dim_t k,
double* restrict alpha,
double* restrict a10,
double* restrict a11,
double* restrict b01,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/*
Template gemmtrsm_l micro-kernel implementation
@@ -96,114 +104,19 @@ void bli_dgemmtrsm_l_opt_mxn(
B11 is MR x NR, and alpha is a scalar. Here, inv() denotes matrix
inverse.
Parameters:
For more info, please refer to the BLIS website's wiki on kernels:
- k: The number of columns of A10 and rows of B01.
- alpha: The address of a scalar to be applied to B11.
- a10: The address of A10, which is the MR x k submatrix of the packed
micro-panel of A that is situated to the left of the MR x MR
triangular submatrix A11. A10 is stored by columns with leading
dimension PACKMR, where typically PACKMR = MR.
- a11: The address of A11, which is the MR x MR lower triangular
submatrix within the packed micro-panel of matrix A that is
situated to the right of A10. A11 is stored by columns with
leading dimension PACKMR, where typically PACKMR = MR. Note
that A11 contains elements in both triangles, though elements
in the unstored triangle are not guaranteed to be zero and
thus should not be referenced.
- b01: The address of B01, which is the k x NR submatrix of the packed
micro-panel of B that is situated above the MR x NR submatrix
B11. B01 is stored by rows with leading dimension PACKNR, where
typically PACKNR = NR.
- b11: The address of B11, which is the MR x NR submatrix of the packed
micro-panel of B, situated below B01. B11 is stored by rows
with leading dimension PACKNR, where typically PACKNR = NR.
- c11: The address of C11, which is the MR x NR submatrix of matrix
C, stored according to rs_c and cs_c. C11 is the submatrix
within C that corresponds to the elements which were packed
into B11. Thus, C is the original input matrix B to the overall
trsm operation.
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
in units of matrix elements).
- cs_c: The column stride of C11 (ie: the distance to the next column of
C11, in units of matrix elements).
- data: The address of an auxinfo_t object that contains auxiliary
information that may be useful when optimizing the gemmtrsm
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
more info.)
https://github.com/flame/blis/wiki/KernelsHowTo
Diagram for gemmtrsm_l
The diagram below shows the packed micro-panel operands for trsm_l and
how elements of each would be stored when MR = NR = 4. (The hex digits
indicate the layout and order (but NOT the numeric contents) in memory.)
Here, matrix A11 (referenced by a11) is lower triangular. Matrix A11
does contain elements corresponding to the strictly upper triangle,
however, they are not guaranteed to contain zeros and thus these elements
should not be referenced.
NR
_______
b01:|0 1 2 3|
|4 5 6 7|
|8 9 A B|
|C D E F|
k | . |
| . |
a10: a11: | . |
___________________ _______ |_______|
|0 4 8 C |`. | b11:| |
MR |1 5 9 D . . . | `. | | |
|2 6 A E | `. | MR | |
|3_7_B_F____________|______`.| |_______|
k MR
Implementation Notes for gemmtrsm
- Register blocksizes. See Implementation Notes for gemm.
- Leading dimensions of a1 and b1: PACKMR and PACKNR. See Implementation
Notes for gemm.
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
- Alignment of a1 and b1. The addresses a1 and b1 are aligned according
to PACKMR*sizeof(type) and PACKNR*sizeof(type), respectively.
- Unrolling loops. Most optimized implementations should unroll all
three loops within the trsm subproblem of gemmtrsm. See Implementation
Notes for gemm for remarks on unrolling the gemm subproblem.
- Prefetching next micro-panels of A and B. When invoked from within a
gemmtrsm_l micro-kernel, the addresses accessible via
bli_auxinfo_next_a() and bli_auxinfo_next_b() refer to the next
invocation's a10 and b01, respectively, while in gemmtrsm_u, the
_next_a() and _next_b() macros return the addresses of the next
invocation's a11 and b11 (since those submatrices precede a12 and b21).
(See BLIS KernelsHowTo wiki for more info.)
- Zero alpha. The micro-kernel can safely assume that alpha is non-zero;
"alpha equals zero" handling is performed at a much higher level,
which means that, in such a scenario, the micro-kernel will never get
called.
- Diagonal elements of A11. See Implementation Notes for trsm.
- Zero elements of A11. See Implementation Notes for trsm.
- Output. See Implementation Notes for trsm.
- Optimization. Let's assume that the gemm micro-kernel has already been
optimized. You have two options with regard to optimizing the fused
gemmtrsm micro-kernels:
(1) Optimize only the trsm micro-kernels. This will result in the gemm
and trsm_l micro-kernels being called in sequence. (Likewise for
gemm and trsm_u.)
(2) Fuse the implementation of the gemm micro-kernel with that of the
trsm micro-kernels by inlining both into the gemmtrsm_l and
gemmtrsm_u micro-kernel definitions. This option is more labor-
intensive, but also more likely to yield higher performance because
it avoids redundant memory operations on the packed MR x NR
submatrix B11.
For more info, please refer to the BLIS website and/or contact the
blis-devel mailing list.
and/or contact the blis-devel mailing list.
-FGVZ
*/
const inc_t rs_b = bli_dpacknr;
const num_t dt = BLIS_DOUBLE;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
double* restrict minus_one = bli_dm1;
@@ -227,18 +140,24 @@ void bli_dgemmtrsm_l_opt_mxn(
void bli_cgemmtrsm_l_opt_mxn(
dim_t k,
scomplex* restrict alpha,
scomplex* restrict a10,
scomplex* restrict a11,
scomplex* restrict b01,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_cgemmtrsm_l_opt_mxn
(
dim_t k,
scomplex* restrict alpha,
scomplex* restrict a10,
scomplex* restrict a11,
scomplex* restrict b01,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_cpacknr;
const num_t dt = BLIS_SCOMPLEX;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
scomplex* restrict minus_one = bli_cm1;
@@ -260,18 +179,24 @@ void bli_cgemmtrsm_l_opt_mxn(
void bli_zgemmtrsm_l_opt_mxn(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a10,
dcomplex* restrict a11,
dcomplex* restrict b01,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_zgemmtrsm_l_opt_mxn
(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a10,
dcomplex* restrict a11,
dcomplex* restrict b01,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_zpacknr;
const num_t dt = BLIS_DCOMPLEX;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
dcomplex* restrict minus_one = bli_zm1;
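
Option (1) from the "Optimization" note in the gemmtrsm comment above, where the gemm and trsm_l subproblems run in sequence, might look roughly like the sketch below. The function names, the explicit m, n, cs_a and rs_b parameters, and passing alpha by value are all assumptions made to keep the example self-contained; they do not mirror the BLIS micro-kernel signatures. The sketch assumes, as the trsm notes state, that the diagonal of A11 was inverted during packing.

```c
#include <assert.h>
#include <stddef.h>

/* Step 1 (gemm subproblem): B11 := alpha * B11 - A10 * B01.
   a10 is m x k, stored by columns with leading dimension cs_a;
   b01 is k x n and b11 is m x n, both stored by rows with leading
   dimension rs_b. */
static void gemm_step( size_t k, size_t m, size_t n, double alpha,
                       const double* a10, size_t cs_a,
                       const double* b01, double* b11, size_t rs_b )
{
    for ( size_t i = 0; i < m; ++i )
        for ( size_t j = 0; j < n; ++j )
        {
            double ab = 0.0;
            for ( size_t l = 0; l < k; ++l )
                ab += a10[ i + l*cs_a ] * b01[ l*rs_b + j ];
            b11[ i*rs_b + j ] = alpha * b11[ i*rs_b + j ] - ab;
        }
}

/* Step 2 (trsm subproblem): B11 := inv(A11) * B11; C11 := B11.
   a11 is m x m lower triangular. Its diagonal is assumed to hold the
   inverses already, and its strictly upper triangle is never read. */
static void trsm_l_step( size_t m, size_t n,
                         const double* a11, size_t cs_a,
                         double* b11, size_t rs_b,
                         double* c11, size_t rs_c, size_t cs_c )
{
    for ( size_t i = 0; i < m; ++i )          /* forward substitution */
        for ( size_t j = 0; j < n; ++j )
        {
            double rho = 0.0;
            for ( size_t l = 0; l < i; ++l )
                rho += a11[ i + l*cs_a ] * b11[ l*rs_b + j ];

            /* Multiply by the stored inverse instead of dividing. */
            const double x = ( b11[ i*rs_b + j ] - rho ) * a11[ i + i*cs_a ];

            b11[ i*rs_b + j ]      = x;  /* output 1: packed B11 */
            c11[ i*rs_c + j*cs_c ] = x;  /* output 2: matrix C   */
        }
}

/* Unfused gemmtrsm_l: the two subproblems called back to back. */
void dgemmtrsm_l_sketch( size_t k, size_t m, size_t n, double alpha,
                         const double* a10, const double* a11, size_t cs_a,
                         const double* b01, double* b11, size_t rs_b,
                         double* c11, size_t rs_c, size_t cs_c )
{
    assert( a11 != NULL && b11 != NULL && c11 != NULL );
    gemm_step( k, m, n, alpha, a10, cs_a, b01, b11, rs_b );
    trsm_l_step( m, n, a11, cs_a, b11, rs_b, c11, rs_c, cs_c );
}
```

In this unfused form B11 is written by the first step and read back by the second. Those are the redundant memory operations that option (2), fusing both steps into one kernel, is meant to avoid.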


@@ -36,18 +36,24 @@
void bli_sgemmtrsm_u_opt_mxn(
dim_t k,
float* restrict alpha,
float* restrict a12,
float* restrict a11,
float* restrict b21,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_sgemmtrsm_u_opt_mxn
(
dim_t k,
float* restrict alpha,
float* restrict a10,
float* restrict a11,
float* restrict b01,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_spacknr;
const num_t dt = BLIS_FLOAT;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
float* restrict minus_one = bli_sm1;
@@ -69,16 +75,18 @@ void bli_sgemmtrsm_u_opt_mxn(
void bli_dgemmtrsm_u_opt_mxn(
dim_t k,
double* restrict alpha,
double* restrict a12,
double* restrict a11,
double* restrict b21,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_dgemmtrsm_u_opt_mxn
(
dim_t k,
double* restrict alpha,
double* restrict a10,
double* restrict a11,
double* restrict b01,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/*
Template gemmtrsm_u micro-kernel implementation
@@ -96,111 +104,19 @@ void bli_dgemmtrsm_u_opt_mxn(
B11 is MR x NR, and alpha is a scalar. Here, inv() denotes matrix
inverse.
Parameters:
For more info, please refer to the BLIS website's wiki on kernels:
- k: The number of columns of A12 and rows of B21.
- alpha: The address of a scalar to be applied to B11.
- a12: The address of A12, which is the MR x k submatrix of the packed
micro-panel of A that is situated to the right of the MR x MR
triangular submatrix A11. A12 is stored by columns with leading
dimension PACKMR, where typically PACKMR = MR.
- a11: The address of A11, which is the MR x MR upper triangular
submatrix within the packed micro-panel of matrix A that is
situated to the left of A12. A11 is stored by columns with
leading dimension PACKMR, where typically PACKMR = MR. Note
that A11 contains elements in both triangles, though elements
in the unstored triangle are not guaranteed to be zero and
thus should not be referenced.
- b21: The address of B21, which is the k x NR submatrix of the packed
micro-panel of B that is situated below the MR x NR submatrix
B11. B21 is stored by rows with leading dimension PACKNR, where
typically PACKNR = NR.
- b11: The address of B11, which is the MR x NR submatrix of the packed
micro-panel of B, situated above B21. B11 is stored by rows
with leading dimension PACKNR, where typically PACKNR = NR.
- c11: The address of C11, which is the MR x NR submatrix of matrix
C, stored according to rs_c and cs_c. C11 is the submatrix
within C that corresponds to the elements which were packed
into B11. Thus, C is the original input matrix B to the overall
trsm operation.
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
in units of matrix elements).
- cs_c: The column stride of C11 (ie: the distance to the next column of
C11, in units of matrix elements).
- data: The address of an auxinfo_t object that contains auxiliary
information that may be useful when optimizing the gemmtrsm
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
more info.)
https://github.com/flame/blis/wiki/KernelsHowTo
Diagram for gemmtrsm_u
The diagram below shows the packed micro-panel operands for trsm_u and
how elements of each would be stored when MR = NR = 4. (The hex digits
indicate the layout and order (but NOT the numeric contents) in memory.)
Here, matrix A11 (referenced by a11) is upper triangular. Matrix A11
does contain elements corresponding to the strictly lower triangle,
however, they are not guaranteed to contain zeros and thus these elements
should not be referenced.
a11: a12: NR
________ ___________________ _______
|`. |0 4 8 | b11:|0 1 2 3|
MR | `. |1 5 9 . . . | |4 5 6 7|
| `. |2 6 A | MR |8 9 A B|
|______`.|3_7_B______________| |___.___|
b21:| . |
MR k | . |
| |
| |
NOTE: Storage digits are shown k | |
starting with a12 to avoid | |
obscuring triangular structure | |
of a11. |_______|
Implementation Notes for gemmtrsm
- Register blocksizes. See Implementation Notes for gemm.
- Leading dimensions of a1 and b1: PACKMR and PACKNR. See Implementation
Notes for gemm.
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
- Alignment of a1 and b1. The addresses a1 and b1 are aligned according
to PACKMR*sizeof(type) and PACKNR*sizeof(type), respectively.
- Unrolling loops. Most optimized implementations should unroll all
three loops within the trsm subproblem of gemmtrsm. See Implementation
Notes for gemm for remarks on unrolling the gemm subproblem.
- Prefetching next micro-panels of A and B. When invoked from within a
gemmtrsm_l micro-kernel, the addresses accessible via
bli_auxinfo_next_a() and bli_auxinfo_next_b() refer to the next
invocation's a10 and b01, respectively, while in gemmtrsm_u, the
_next_a() and _next_b() macros return the addresses of the next
invocation's a11 and b11 (since those submatrices precede a12 and b21).
(See BLIS KernelsHowTo wiki for more info.)
- Zero alpha. The micro-kernel can safely assume that alpha is non-zero;
"alpha equals zero" handling is performed at a much higher level,
which means that, in such a scenario, the micro-kernel will never get
called.
- Diagonal elements of A11. See Implementation Notes for trsm.
- Zero elements of A11. See Implementation Notes for trsm.
- Output. See Implementation Notes for trsm.
- Optimization. Let's assume that the gemm micro-kernel has already been
optimized. You have two options with regard to optimizing the fused
gemmtrsm micro-kernels:
(1) Optimize only the trsm micro-kernels. This will result in the gemm
and trsm_l micro-kernels being called in sequence. (Likewise for
gemm and trsm_u.)
(2) Fuse the implementation of the gemm micro-kernel with that of the
trsm micro-kernels by inlining both into the gemmtrsm_l and
gemmtrsm_u micro-kernel definitions. This option is more labor-
intensive, but also more likely to yield higher performance because
it avoids redundant memory operations on the packed MR x NR
submatrix B11.
For more info, please refer to the BLIS website and/or contact the
blis-devel mailing list.
and/or contact the blis-devel mailing list.
-FGVZ
*/
const inc_t rs_b = bli_dpacknr;
const num_t dt = BLIS_DOUBLE;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
double* restrict minus_one = bli_dm1;
@@ -224,18 +140,24 @@ void bli_dgemmtrsm_u_opt_mxn(
void bli_cgemmtrsm_u_opt_mxn(
dim_t k,
scomplex* restrict alpha,
scomplex* restrict a12,
scomplex* restrict a11,
scomplex* restrict b21,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_cgemmtrsm_u_opt_mxn
(
dim_t k,
scomplex* restrict alpha,
scomplex* restrict a10,
scomplex* restrict a11,
scomplex* restrict b01,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_cpacknr;
const num_t dt = BLIS_SCOMPLEX;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
scomplex* restrict minus_one = bli_cm1;
@@ -257,18 +179,24 @@ void bli_cgemmtrsm_u_opt_mxn(
void bli_zgemmtrsm_u_opt_mxn(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a12,
dcomplex* restrict a11,
dcomplex* restrict b21,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_zgemmtrsm_u_opt_mxn
(
dim_t k,
dcomplex* restrict alpha,
dcomplex* restrict a10,
dcomplex* restrict a11,
dcomplex* restrict b01,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
const inc_t rs_b = bli_zpacknr;
const num_t dt = BLIS_DCOMPLEX;
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
dcomplex* restrict minus_one = bli_zm1;


@@ -36,28 +36,36 @@
void bli_strsm_l_opt_mxn(
float* restrict a11,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_strsm_l_opt_mxn
(
float* restrict a11,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_STRSM_L_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_STRSM_L_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}
void bli_dtrsm_l_opt_mxn(
double* restrict a11,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_dtrsm_l_opt_mxn
(
double* restrict a11,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/*
Template trsm_l micro-kernel implementation
@@ -76,80 +84,28 @@ void bli_dtrsm_l_opt_mxn(
where A11 is MR x MR and lower triangular, B11 is MR x NR, and C11 is
MR x NR.
Parameters:
For more info, please refer to the BLIS website's wiki on kernels:
- a11: The address of A11, which is the MR x MR lower triangular
submatrix within the packed micro-panel of matrix A. A11 is
stored by columns with leading dimension PACKMR, where
typically PACKMR = MR. Note that A11 contains elements in both
triangles, though elements in the unstored triangle are not
guaranteed to be zero and thus should not be referenced.
- b11: The address of B11, which is an MR x NR submatrix of the
packed micro-panel of B. B11 is stored by rows with leading
dimension PACKNR, where typically PACKNR = NR.
- c11: The address of C11, which is an MR x NR submatrix of matrix C,
stored according to rs_c and cs_c. C11 is the submatrix within
C that corresponds to the elements which were packed into B11.
Thus, C is the original input matrix B to the overall trsm
operation.
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
in units of matrix elements).
- cs_c: The column stride of C11 (ie: the distance to the next column of
C11, in units of matrix elements).
- data: The address of an auxinfo_t object that contains auxiliary
information that may be useful when optimizing the trsm
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
more info.)
https://github.com/flame/blis/wiki/KernelsHowTo
Diagrams for trsm
Please see the diagram for gemmtrsm_l for a depiction of the trsm_l
subproblem and where it fits in with its preceding gemm subproblem.
Implementation Notes for trsm
- Register blocksizes. See Implementation Notes for gemm.
- Leading dimensions of a11 and b11: PACKMR and PACKNR. See
Implementation Notes for gemm.
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
- Alignment of a11 and b11. See Implementation Notes for gemmtrsm.
- Unrolling loops. Most optimized implementations should unroll all
three loops within the trsm micro-kernel.
- Prefetching next micro-panels of A and B. We advise against using
the bli_auxinfo_next_a() and bli_auxinfo_next_b() macros from within
the trsm_l and trsm_u micro-kernels, since the values returned usually
only make sense in the context of the overall gemmtrsm subproblem.
- Diagonal elements of A11. At the time this micro-kernel is called,
the diagonal entries of triangular matrix A11 contain the inverse of
the original elements. This inversion is done during packing so that
we can avoid expensive division instructions within the micro-kernel
itself. If the diag parameter to the higher level trsm operation was
equal to BLIS_UNIT_DIAG, the diagonal elements will be explicitly
unit.
- Zero elements of A11. Since A11 is lower triangular (for trsm_l), the
strictly upper triangle implicitly contains zeros. Similarly, the
strictly lower triangle of A11 implicitly contains zeros when A11 is
upper triangular (for trsm_u). However, the packing function may or
may not actually write zeros to this region. Thus, while the
implementation may reference these elements, it should not use them
in any computation.
- Output. This micro-kernel must write its result to two places: the
submatrix B11 of the current packed micro-panel of B and the submatrix
C11 of the output matrix C.
For more info, please refer to the BLIS website and/or contact the
blis-devel mailing list.
and/or contact the blis-devel mailing list.
-FGVZ
*/
const dim_t m = bli_dmr;
const dim_t n = bli_dnr;
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
const inc_t rs_a = 1;
const inc_t cs_a = bli_dpackmr;
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = bli_dpacknr;
const inc_t cs_b = 1;
const dim_t m = mr;
const dim_t n = nr;
const inc_t rs_a = 1;
const inc_t cs_a = packmr;
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
dim_t iter, i, j, l;
dim_t n_behind;
@@ -208,33 +164,45 @@ void bli_dtrsm_l_opt_mxn(
void bli_ctrsm_l_opt_mxn(
scomplex* restrict a11,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_ctrsm_l_opt_mxn
(
scomplex* restrict a11,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_CTRSM_L_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_CTRSM_L_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}
void bli_ztrsm_l_opt_mxn(
dcomplex* restrict a11,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_ztrsm_l_opt_mxn
(
dcomplex* restrict a11,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_ZTRSM_L_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_ZTRSM_L_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}


@@ -36,18 +36,24 @@
void bli_strsm_u_opt_mxn(
float* restrict a11,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_strsm_u_opt_mxn
(
float* restrict a11,
float* restrict b11,
float* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_STRSM_U_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_STRSM_U_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}
@@ -58,6 +64,13 @@ void bli_dtrsm_u_opt_mxn(
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
(
double* restrict a11,
double* restrict b11,
double* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/*
Template trsm_u micro-kernel implementation
@@ -76,79 +89,28 @@ void bli_dtrsm_u_opt_mxn(
where A11 is MR x MR and upper triangular, B11 is MR x NR, and C11 is
MR x NR.
Parameters:
For more info, please refer to the BLIS website's wiki on kernels:
- a11: The address of A11, which is the MR x MR upper triangular
submatrix within the packed micro-panel of matrix A. A11 is
stored by columns with leading dimension PACKMR, where
typically PACKMR = MR. Note that A11 contains elements in both
triangles, though elements in the unstored triangle are not
guaranteed to be zero and thus should not be referenced.
- b11: The address of B11, which is an MR x NR submatrix of the
packed micro-panel of B. B11 is stored by rows with leading
dimension PACKNR, where typically PACKNR = NR.
- c11: The address of C11, which is an MR x NR submatrix of matrix C,
stored according to rs_c and cs_c. C11 is the submatrix within
C that corresponds to the elements which were packed into B11.
Thus, C is the original input matrix B to the overall trsm
operation.
- rs_c: The row stride of C11 (ie: the distance to the next row of C11,
in units of matrix elements).
- cs_c: The column stride of C11 (ie: the distance to the next column of
C11, in units of matrix elements).
- data: The address of an auxinfo_t object that contains auxiliary
information that may be useful when optimizing the trsm
micro-kernel implementation. (See BLIS KernelsHowTo wiki for
more info.)
https://github.com/flame/blis/wiki/KernelsHowTo
Diagrams for trsm
Please see the diagram for gemmtrsm_u for a depiction of the trsm_u
subproblem and where it fits in with its preceding gemm subproblem.
Implementation Notes for trsm
- Register blocksizes. See Implementation Notes for gemm.
- Leading dimensions of a11 and b11: PACKMR and PACKNR. See
Implementation Notes for gemm.
- Edge cases in MR, NR dimensions. See Implementation Notes for gemm.
- Alignment of a11 and b11. See Implementation Notes for gemmtrsm.
- Unrolling loops. Most optimized implementations should unroll all
three loops within the trsm micro-kernel.
- Prefetching next micro-panels of A and B. We advise against using
the bli_auxinfo_next_a() and bli_auxinfo_next_b() macros from within
the trsm_l and trsm_u micro-kernels, since the values returned usually
only make sense in the context of the overall gemmtrsm subproblem.
- Diagonal elements of A11. At the time this micro-kernel is called,
the diagonal entries of triangular matrix A11 contain the inverse of
the original elements. This inversion is done during packing so that
we can avoid expensive division instructions within the micro-kernel
itself. If the diag parameter to the higher level trsm operation was
equal to BLIS_UNIT_DIAG, the diagonal elements will be explicitly
unit.
- Zero elements of A11. Since A11 is lower triangular (for trsm_l), the
strictly upper triangle implicitly contains zeros. Similarly, the
strictly lower triangle of A11 implicitly contains zeros when A11 is
upper triangular (for trsm_u). However, the packing function may or
may not actually write zeros to this region. Thus, the implementation
should not reference these elements.
- Output. This micro-kernel must write its result to two places: the
submatrix B11 of the current packed micro-panel of B and the submatrix
C11 of the output matrix C.
For more info, please refer to the BLIS website and/or contact the
blis-devel mailing list.
and/or contact the blis-devel mailing list.
-FGVZ
*/
const dim_t m = bli_dmr;
const dim_t n = bli_dnr;
const dim_t mr = bli_cntx_get_blksz_def_dt( dt, BLIS_MR, cntx );
const dim_t nr = bli_cntx_get_blksz_def_dt( dt, BLIS_NR, cntx );
const inc_t rs_a = 1;
const inc_t cs_a = bli_dpackmr;
const inc_t packmr = bli_cntx_get_blksz_max_dt( dt, BLIS_MR, cntx );
const inc_t packnr = bli_cntx_get_blksz_max_dt( dt, BLIS_NR, cntx );
const inc_t rs_b = bli_dpacknr;
const inc_t cs_b = 1;
const dim_t m = mr;
const dim_t n = nr;
const inc_t rs_a = 1;
const inc_t cs_a = packmr;
const inc_t rs_b = packnr;
const inc_t cs_b = 1;
dim_t iter, i, j, l;
dim_t n_behind;
@@ -207,33 +169,45 @@ void bli_dtrsm_u_opt_mxn(
void bli_ctrsm_u_opt_mxn(
scomplex* restrict a11,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_ctrsm_u_opt_mxn
(
scomplex* restrict a11,
scomplex* restrict b11,
scomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_CTRSM_U_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_CTRSM_U_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}
void bli_ztrsm_u_opt_mxn(
dcomplex* restrict a11,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* data
)
void bli_ztrsm_u_opt_mxn
(
dcomplex* restrict a11,
dcomplex* restrict b11,
dcomplex* restrict c11, inc_t rs_c, inc_t cs_c,
auxinfo_t* restrict data,
cntx_t* restrict cntx
)
{
/* Just call the reference implementation. */
BLIS_ZTRSM_U_UKERNEL_REF( a11,
b11,
c11, rs_c, cs_c,
data );
BLIS_ZTRSM_U_UKERNEL_REF
(
a11,
b11,
c11, rs_c, cs_c,
data,
cntx
);
}
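
The trsm_u notes above (pre-inverted diagonal, an unstored strictly lower triangle, and output to both B11 and C11) can be illustrated with a small backward-substitution sketch. The name dtrsm_u_sketch and its explicit m, n, cs_a and rs_b parameters are assumptions made for this example. The template kernels instead derive those values from the context and take only a11, b11, c11, the strides of C11, data and cntx.

```c
#include <assert.h>
#include <stddef.h>

/* B11 := inv(A11) * B11; C11 := B11, with A11 upper triangular.
   a11 is m x m, stored by columns with leading dimension cs_a, and its
   diagonal is assumed to already hold the inverted elements. b11 is
   m x n, stored by rows with leading dimension rs_b. */
void dtrsm_u_sketch( size_t m, size_t n,
                     const double* a11, size_t cs_a,
                     double* b11, size_t rs_b,
                     double* c11, size_t rs_c, size_t cs_c )
{
    assert( a11 != NULL && b11 != NULL && c11 != NULL );

    for ( size_t iter = 0; iter < m; ++iter )
    {
        /* Walk the rows from the bottom up (backward substitution). */
        const size_t i = m - 1 - iter;

        for ( size_t j = 0; j < n; ++j )
        {
            double rho = 0.0;

            /* Only the strictly upper part of row i is referenced; the
               strictly lower triangle may hold arbitrary values. */
            for ( size_t l = i + 1; l < m; ++l )
                rho += a11[ i + l*cs_a ] * b11[ l*rs_b + j ];

            /* Multiply by the stored inverse instead of dividing. */
            const double x = ( b11[ i*rs_b + j ] - rho ) * a11[ i + i*cs_a ];

            b11[ i*rs_b + j ]      = x;  /* output 1: packed B11 */
            c11[ i*rs_c + j*cs_c ] = x;  /* output 2: matrix C   */
        }
    }
}
```

Writing the solved row back to B11 is what lets later rows reuse it through rho, while the copy to C11 delivers the result to the caller's matrix.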


@@ -32,8 +32,10 @@
*/
//
// Prototype object-based fusing factor query routine.
//
dim_t bli_dotxaxpyf_fusefac( num_t dt );
#include "bli_l0_check.h"
#include "bli_l0_oapi.h"
#include "bli_l0_tapi.h"
// copysc
#include "bli_copysc.h"

frame/0/bli_l0_check.c (new file, 314 lines)

@@ -0,0 +1,314 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
//
// Define object-based check functions.
//
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
obj_t* psi \
) \
{ \
bli_l0_xxsc_check( chi, psi ); \
}
GENFRONT( addsc )
GENFRONT( copysc )
GENFRONT( divsc )
GENFRONT( mulsc )
GENFRONT( sqrtsc )
GENFRONT( subsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
obj_t* norm \
) \
{ \
bli_l0_xx2sc_check( chi, norm ); \
}
GENFRONT( absqsc )
GENFRONT( normfsc )
void bli_getsc_check
(
obj_t* chi,
double* zeta_r,
double* zeta_i
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_noninteger_object( chi );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
}
void bli_setsc_check
(
double zeta_r,
double zeta_i,
obj_t* chi
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_floating_object( chi );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
}
void bli_unzipsc_check
(
obj_t* chi,
obj_t* zeta_r,
obj_t* zeta_i
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_noninteger_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_real_object( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_real_object( zeta_i );
bli_check_error_code( e_val );
e_val = bli_check_nonconstant_object( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_nonconstant_object( zeta_i );
bli_check_error_code( e_val );
e_val = bli_check_object_real_proj_of( chi, zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_object_real_proj_of( chi, zeta_i );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( zeta_i );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( zeta_i );
bli_check_error_code( e_val );
}
void bli_zipsc_check
(
obj_t* zeta_r,
obj_t* zeta_i,
obj_t* chi
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_real_object( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_real_object( zeta_i );
bli_check_error_code( e_val );
e_val = bli_check_noninteger_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_nonconstant_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_object_real_proj_of( chi, zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_object_real_proj_of( chi, zeta_i );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( zeta_i );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( zeta_r );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( zeta_i );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
}
// -----------------------------------------------------------------------------
void bli_l0_xxsc_check
(
obj_t* chi,
obj_t* psi
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_noninteger_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_noninteger_object( psi );
bli_check_error_code( e_val );
e_val = bli_check_nonconstant_object( psi );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( psi );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( psi );
bli_check_error_code( e_val );
}
void bli_l0_xx2sc_check
(
obj_t* chi,
obj_t* absq
)
{
err_t e_val;
// Check object datatypes.
e_val = bli_check_noninteger_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_nonconstant_object( absq );
bli_check_error_code( e_val );
e_val = bli_check_real_object( absq );
bli_check_error_code( e_val );
e_val = bli_check_object_real_proj_of( chi, absq );
bli_check_error_code( e_val );
// Check object dimensions.
e_val = bli_check_scalar_object( chi );
bli_check_error_code( e_val );
e_val = bli_check_scalar_object( absq );
bli_check_error_code( e_val );
// Check object buffers (for non-NULLness).
e_val = bli_check_object_buffer( chi );
bli_check_error_code( e_val );
e_val = bli_check_object_buffer( absq );
bli_check_error_code( e_val );
}
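
The GENFRONT blocks in this file stamp out one `_check` wrapper per operation name by token pasting, and each wrapper forwards to a shared checker. The pattern can be sketched in isolation as below. Everything prefixed `sk_`/`SK_` is hypothetical, as are the `double*` operands and the integer return code; the real wrappers take `obj_t*` operands and report failures through `bli_check_error_code()`.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical paste helper playing the role PASTEMAC plays above:
   it builds the identifier sk_<op><suffix>. */
#define SK_PASTEMAC(op, suffix) sk_ ## op ## suffix

/* Shared checker that every generated wrapper forwards to. Returns 0 on
   success and -1 if either operand is NULL (an assumed convention). */
int sk_l0_xxsc_check(const double *chi, const double *psi)
{
    return (chi != NULL && psi != NULL) ? 0 : -1;
}

/* Generator macro: each invocation defines one wrapper function. */
#undef SK_GENFRONT
#define SK_GENFRONT(opname) \
\
int SK_PASTEMAC(opname, _check) \
    ( \
      const double *chi, \
      const double *psi \
    ) \
{ \
    return sk_l0_xxsc_check(chi, psi); \
}

SK_GENFRONT(addsc)   /* defines sk_addsc_check() */
SK_GENFRONT(subsc)   /* defines sk_subsc_check() */

/* Small self-check exercising the generated wrappers. */
void sk_demo(void)
{
    double a = 1.0, b = 2.0;
    assert(sk_addsc_check(&a, &b) == 0);
    assert(sk_subsc_check(NULL, &b) == -1);
}
```

Redefining the generator with `#undef` before each `#define` lets one file reuse the same macro name for several wrapper signatures, which is what the file above does for its two-operand groups.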

134
frame/0/bli_l0_check.h Normal file

@@ -0,0 +1,134 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//
// Prototype object-based check functions.
//
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
obj_t* psi \
);
GENTPROT( addsc )
GENTPROT( copysc )
GENTPROT( divsc )
GENTPROT( mulsc )
GENTPROT( sqrtsc )
GENTPROT( subsc )
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
obj_t* absq \
);
GENTPROT( absqsc )
GENTPROT( normfsc )
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
double* zeta_r, \
double* zeta_i \
);
GENTPROT( getsc )
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
double zeta_r, \
double zeta_i, \
obj_t* chi \
);
GENTPROT( setsc )
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* chi, \
obj_t* zeta_r, \
obj_t* zeta_i \
);
GENTPROT( unzipsc )
#undef GENTPROT
#define GENTPROT( opname ) \
\
void PASTEMAC(opname,_check) \
( \
obj_t* zeta_r, \
obj_t* zeta_i, \
obj_t* chi \
);
GENTPROT( zipsc )
// -----------------------------------------------------------------------------
void bli_l0_xxsc_check
(
obj_t* chi,
obj_t* psi
);
void bli_l0_xx2sc_check
(
obj_t* chi,
obj_t* norm
);

288
frame/0/bli_l0_oapi.c Normal file

@@ -0,0 +1,288 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
//
// Define object-based interfaces.
//
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* absq \
) \
{ \
num_t dt_chi; \
num_t dt_absq_c = bli_obj_datatype_proj_to_complex( *absq ); \
\
void* buf_chi; \
void* buf_absq = bli_obj_buffer_at_off( *absq ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, absq ); \
\
/* If chi is a scalar constant, use dt_absq_c to extract the address of the
corresponding constant value; otherwise, use the datatype encoded
within the chi object and extract the buffer at the chi offset. */ \
bli_set_scalar_dt_buffer( chi, dt_absq_c, dt_chi, buf_chi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_2 \
( \
dt_chi, \
opname, \
buf_chi, \
buf_absq \
); \
}
GENFRONT( absqsc )
GENFRONT( normfsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* psi \
) \
{ \
num_t dt = bli_obj_datatype( *psi ); \
\
conj_t conjchi = bli_obj_conj_status( *chi ); \
\
void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, psi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_3 \
( \
dt, \
opname, \
conjchi, \
buf_chi, \
buf_psi \
); \
}
GENFRONT( addsc )
GENFRONT( divsc )
GENFRONT( mulsc )
GENFRONT( subsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* psi \
) \
{ \
num_t dt = bli_obj_datatype( *psi ); \
\
void* buf_chi = bli_obj_buffer_for_1x1( dt, *chi ); \
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, psi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_2 \
( \
dt, \
opname, \
buf_chi, \
buf_psi \
); \
}
GENFRONT( sqrtsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
double* zeta_r, \
double* zeta_i \
) \
{ \
num_t dt_chi = bli_obj_datatype( *chi ); \
num_t dt_def = BLIS_DCOMPLEX; \
num_t dt_use; \
\
/* If chi is a constant object, default to the dcomplex value, both
to maximize precision and because we don't know whether the caller
needs only the real part or both the real and imaginary parts. */ \
void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
\
/* The _check() routine prevents integer types, so we know that chi
is either a constant or an actual floating-point type. */ \
if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
else dt_use = dt_chi; \
\
/* Invoke the typed function. */ \
bli_call_ft_3 \
( \
dt_use, \
opname, \
buf_chi, \
zeta_r, \
zeta_i \
); \
}
GENFRONT( getsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
double zeta_r, \
double zeta_i, \
obj_t* chi \
) \
{ \
num_t dt_chi = bli_obj_datatype( *chi ); \
\
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_3 \
( \
dt_chi, \
opname, \
zeta_r, \
zeta_i, \
buf_chi \
); \
}
GENFRONT( setsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* zeta_r, \
obj_t* zeta_i \
) \
{ \
num_t dt_chi; \
num_t dt_zeta_c = bli_obj_datatype_proj_to_complex( *zeta_r ); \
\
void* buf_chi; \
\
void* buf_zeta_r = bli_obj_buffer_at_off( *zeta_r ); \
void* buf_zeta_i = bli_obj_buffer_at_off( *zeta_i ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
\
/* If chi is a scalar constant, use dt_zeta_c to extract the address of the
corresponding constant value; otherwise, use the datatype encoded
within the chi object and extract the buffer at the chi offset. */ \
bli_set_scalar_dt_buffer( chi, dt_zeta_c, dt_chi, buf_chi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_3 \
( \
dt_chi, \
opname, \
buf_chi, \
buf_zeta_r, \
buf_zeta_i \
); \
}
GENFRONT( unzipsc )
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* zeta_r, \
obj_t* zeta_i, \
obj_t* chi \
) \
{ \
num_t dt_chi = bli_obj_datatype( *chi ); \
\
void* buf_zeta_r = bli_obj_buffer_for_1x1( dt_chi, *zeta_r ); \
void* buf_zeta_i = bli_obj_buffer_for_1x1( dt_chi, *zeta_i ); \
\
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
\
/* Invoke the typed function. */ \
bli_call_ft_3 \
( \
dt_chi, \
opname, \
buf_zeta_r, \
buf_zeta_i, \
buf_chi \
); \
}
GENFRONT( zipsc )
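
Each object-based front end above reads the datatype and buffer address out of its `obj_t` operands and then hands off to the typed function for that datatype. One way to picture that hand-off is a table of function pointers indexed by datatype tag, which is what the older `bli_getsc.c` shown later in this diff does with `f = ftypes[dt_use]`. How `bli_call_ft_3` expands is not shown here, so the sketch below is an illustration only; every `sk_`/`SK_` name and the two-field object struct are assumptions.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype tags; the real num_t has more members. */
typedef enum { SK_FLOAT, SK_DOUBLE, SK_SCOMPLEX, SK_DCOMPLEX, SK_NUM_TYPES } sk_num_t;

/* Hypothetical complex types stored as (real, imag) pairs. */
typedef struct { float  real, imag; } sk_scomplex;
typedef struct { double real, imag; } sk_dcomplex;

/* Hypothetical scalar object: a datatype tag plus the scalar's address. */
typedef struct
{
    sk_num_t dt;
    void    *buf;
} sk_obj_t;

/* Signature shared by every typed getsc-style function. */
typedef void (*sk_getsc_ft)(const void *chi, double *zeta_r, double *zeta_i);

void sk_sgetsc(const void *chi, double *zeta_r, double *zeta_i)
{
    *zeta_r = *(const float *)chi;
    *zeta_i = 0.0;
}

void sk_dgetsc(const void *chi, double *zeta_r, double *zeta_i)
{
    *zeta_r = *(const double *)chi;
    *zeta_i = 0.0;
}

void sk_cgetsc(const void *chi, double *zeta_r, double *zeta_i)
{
    const sk_scomplex *c = chi;
    *zeta_r = c->real;
    *zeta_i = c->imag;
}

void sk_zgetsc(const void *chi, double *zeta_r, double *zeta_i)
{
    const sk_dcomplex *z = chi;
    *zeta_r = z->real;
    *zeta_i = z->imag;
}

/* Dispatch table indexed by datatype tag. */
static const sk_getsc_ft sk_getsc_fts[SK_NUM_TYPES] =
{
    [SK_FLOAT]    = sk_sgetsc,
    [SK_DOUBLE]   = sk_dgetsc,
    [SK_SCOMPLEX] = sk_cgetsc,
    [SK_DCOMPLEX] = sk_zgetsc,
};

/* Object-based front end: validate, look up the typed function, call it. */
void sk_getsc(const sk_obj_t *chi, double *zeta_r, double *zeta_i)
{
    assert(chi != NULL && chi->buf != NULL);
    assert(zeta_r != NULL && zeta_i != NULL);
    assert((unsigned)chi->dt < (unsigned)SK_NUM_TYPES);

    sk_getsc_fts[chi->dt](chi->buf, zeta_r, zeta_i);
}
```

The caller always receives `double` real and imaginary parts regardless of the stored type, which mirrors the `double*` outputs of `getsc` above.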

125
frame/0/bli_l0_oapi.h Normal file

@@ -0,0 +1,125 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//
// Prototype object-based interfaces.
//
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* absq \
);
GENPROT( absqsc )
GENPROT( normfsc )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* psi \
);
GENPROT( addsc )
GENPROT( divsc )
GENPROT( mulsc )
GENPROT( sqrtsc )
GENPROT( subsc )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
double* zeta_r, \
double* zeta_i \
);
GENPROT( getsc )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
double zeta_r, \
double zeta_i, \
obj_t* chi \
);
GENPROT( setsc )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* zeta_r, \
obj_t* zeta_i \
);
GENPROT( unzipsc )
#undef GENPROT
#define GENPROT( opname ) \
\
void PASTEMAC0(opname) \
( \
obj_t* zeta_r, \
obj_t* zeta_i, \
obj_t* chi \
);
GENPROT( zipsc )

210
frame/0/bli_l0_tapi.c Normal file

@@ -0,0 +1,210 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
//
// Define BLAS-like interfaces with typed operands.
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, kername ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
) \
{ \
ctype chi_conj; \
\
PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
PASTEMAC(ch,kername)( chi_conj, *psi ); \
}
INSERT_GENTFUNC_BASIC( addsc, adds )
INSERT_GENTFUNC_BASIC( divsc, invscals )
INSERT_GENTFUNC_BASIC( subsc, subs )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, kername ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
) \
{ \
if ( PASTEMAC(ch,eq0)( *chi ) ) \
{ \
/* Overwrite potential Infs and NaNs. */ \
PASTEMAC(ch,set0s)( *psi ); \
} \
else \
{ \
ctype chi_conj; \
\
PASTEMAC(ch,copycjs)( conjchi, *chi, chi_conj ); \
PASTEMAC(ch,kername)( chi_conj, *psi ); \
} \
}
INSERT_GENTFUNC_BASIC( mulsc, scals )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype_r* absq \
) \
{ \
ctype_r chi_r; \
ctype_r chi_i; \
ctype_r absq_i; \
\
( void )absq_i; \
\
PASTEMAC2(ch,chr,gets)( *chi, chi_r, chi_i ); \
\
/* absq = chi_r * chi_r + chi_i * chi_i; \
absq_i = 0.0; (thrown away) */ \
PASTEMAC(ch,absq2ris)( chi_r, chi_i, *absq, absq_i ); \
\
( void )chi_i; \
}
INSERT_GENTFUNCR_BASIC0( absqsc )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype_r* norm \
) \
{ \
/* norm = sqrt( chi_r * chi_r + chi_i * chi_i ); */ \
PASTEMAC2(ch,chr,abval2s)( *chi, *norm ); \
}
INSERT_GENTFUNCR_BASIC0( normfsc )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype* psi \
) \
{ \
/* NOTE: sqrtsc/sqrt2s differs from normfsc/abval2s in the complex domain. */ \
PASTEMAC(ch,sqrt2s)( *chi, *psi ); \
}
INSERT_GENTFUNC_BASIC0( sqrtsc )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
double* zeta_r, \
double* zeta_i \
) \
{ \
PASTEMAC2(ch,d,gets)( *chi, *zeta_r, *zeta_i ); \
}
INSERT_GENTFUNC_BASIC0( getsc )
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
double zeta_r, \
double zeta_i, \
ctype* chi \
) \
{ \
PASTEMAC2(d,ch,sets)( zeta_r, zeta_i, *chi ); \
}
INSERT_GENTFUNC_BASIC0( setsc )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype_r* zeta_r, \
ctype_r* zeta_i \
) \
{ \
PASTEMAC2(ch,chr,gets)( *chi, *zeta_r, *zeta_i ); \
}
INSERT_GENTFUNCR_BASIC0( unzipsc )
#undef GENTFUNCR
#define GENTFUNCR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype_r* zeta_r, \
ctype_r* zeta_i, \
ctype* chi \
) \
{ \
PASTEMAC2(chr,ch,sets)( *zeta_r, *zeta_i, *chi ); \
}
INSERT_GENTFUNCR_BASIC0( zipsc )
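
Two of the typed routines above have semantics worth seeing with concrete numbers: `mulsc` writes an exact zero when `chi` is zero, so an Inf or NaN already stored in `psi` is overwritten rather than propagated, and `absqsc` returns the squared magnitude as a real value. The sketch below reproduces both for a double-precision complex scalar. The `sk_` names and the `sk_dcomplex` struct are hypothetical stand-ins, not the functions that the macros above generate.

```c
#include <assert.h>
#include <math.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical complex type stored as a (real, imag) pair. */
typedef struct { double real, imag; } sk_dcomplex;

/* psi := conj?(chi) * psi. When chi is exactly zero, psi is set to zero
   outright, so an Inf or NaN in psi cannot turn the result into NaN. */
void sk_zmulsc(bool conjchi, const sk_dcomplex *chi, sk_dcomplex *psi)
{
    assert(chi != NULL && psi != NULL);

    if (chi->real == 0.0 && chi->imag == 0.0)
    {
        psi->real = 0.0;
        psi->imag = 0.0;
    }
    else
    {
        /* Apply the optional conjugation to a local copy of chi. */
        double chi_r = chi->real;
        double chi_i = conjchi ? -chi->imag : chi->imag;
        double psi_r = psi->real;
        double psi_i = psi->imag;

        psi->real = chi_r * psi_r - chi_i * psi_i;
        psi->imag = chi_r * psi_i + chi_i * psi_r;
    }
}

/* absq := chi_r * chi_r + chi_i * chi_i, returned as a real value. */
void sk_zabsqsc(const sk_dcomplex *chi, double *absq)
{
    assert(chi != NULL && absq != NULL);
    *absq = chi->real * chi->real + chi->imag * chi->imag;
}
```

Without the zero branch, scaling `psi = Inf` by `chi = 0` would follow IEEE arithmetic and yield NaN, which is the case the "Overwrite potential Infs and NaNs" comment guards against.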

131
frame/0/bli_l0_tapi.h Normal file

@@ -0,0 +1,131 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//
// Prototype BLAS-like interfaces with typed operands.
//
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
);
INSERT_GENTPROT_BASIC( addsc )
INSERT_GENTPROT_BASIC( divsc )
INSERT_GENTPROT_BASIC( mulsc )
INSERT_GENTPROT_BASIC( subsc )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype_r* absq \
);
INSERT_GENTPROTR_BASIC( absqsc )
INSERT_GENTPROTR_BASIC( normfsc )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype* psi \
);
INSERT_GENTPROT_BASIC( sqrtsc )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
double* zeta_r, \
double* zeta_i \
);
INSERT_GENTPROT_BASIC( getsc )
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname) \
( \
double zeta_r, \
double zeta_i, \
ctype* chi \
);
INSERT_GENTPROT_BASIC( setsc )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype* chi, \
ctype_r* zeta_r, \
ctype_r* zeta_i \
);
INSERT_GENTPROTR_BASIC( unzipsc )
#undef GENTPROTR
#define GENTPROTR( ctype, ctype_r, ch, chr, opname ) \
\
void PASTEMAC(ch,opname) \
( \
ctype_r* zeta_r, \
ctype_r* zeta_i, \
ctype* chi \
);
INSERT_GENTPROTR_BASIC( zipsc )


@@ -34,66 +34,93 @@
#include "blis.h"
// NOTE: This is one of the few functions in BLIS that is defined
// with heterogeneous type support. This is done so that we have
// an operation that can be used to typecast (copy-cast) a scalar
// of one datatype to a scalar of another datatype.
typedef void (*FUNCPTR_T)(
conj_t conjchi,
void* chi,
void* psi
);
static FUNCPTR_T GENARRAY2_ALL(ftypes,copysc);
//
// Define object-based interface.
// Define object-based interfaces.
//
void bli_copysc( obj_t* chi,
obj_t* psi )
{
if ( bli_error_checking_is_enabled() )
bli_copysc_check( chi, psi );
bli_copysc_unb_var1( chi, psi );
}
//
// Define BLAS-like interfaces with homogeneous-typed operands.
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, varname ) \
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(ch,opname)( \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
) \
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* psi \
) \
{ \
PASTEMAC2(ch,ch,varname)( conjchi, \
chi, \
psi ); \
conj_t conjchi = bli_obj_conj_status( *chi ); \
\
num_t dt_psi = bli_obj_datatype( *psi ); \
void* buf_psi = bli_obj_buffer_at_off( *psi ); \
\
num_t dt_chi; \
void* buf_chi; \
\
FUNCPTR_T f; \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, psi ); \
\
/* If chi is a scalar constant, use dt_psi to extract the address of the
corresponding constant value; otherwise, use the datatype encoded
within the chi object and extract the buffer at the chi offset. */ \
bli_set_scalar_dt_buffer( chi, dt_psi, dt_chi, buf_chi ); \
\
/* Index into the type combination array to extract the correct
function pointer. */ \
f = ftypes[dt_chi][dt_psi]; \
\
/* Invoke the void pointer-based function. */ \
f( \
conjchi, \
buf_chi, \
buf_psi \
); \
}
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
GENFRONT( copysc )
//
// Define BLAS-like interfaces with heterogeneous-typed operands.
// Define BLAS-like interfaces with typed operands.
//
#undef GENTFUNC2
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, varname ) \
\
void PASTEMAC2(chx,chy,opname)( \
conj_t conjchi, \
ctype_x* chi, \
ctype_y* psi \
) \
void PASTEMAC2(chx,chy,varname) \
( \
conj_t conjchi, \
void* chi, \
void* psi \
) \
{ \
PASTEMAC2(chx,chy,varname)( conjchi, \
chi, \
psi ); \
ctype_x* chi_cast = chi; \
ctype_y* psi_cast = psi; \
\
if ( bli_is_conj( conjchi ) ) \
{ \
PASTEMAC2(chx,chy,copyjs)( *chi_cast, *psi_cast ); \
} \
else \
{ \
PASTEMAC2(chx,chy,copys)( *chi_cast, *psi_cast ); \
} \
}
// Define the basic set of functions unconditionally, and then also some
// mixed datatype functions if requested.
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
#endif
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
#endif
INSERT_GENTFUNC2_BASIC0( copysc )
INSERT_GENTFUNC2_MIX_D0( copysc )
INSERT_GENTFUNC2_MIX_P0( copysc )


@@ -32,51 +32,37 @@
*/
#include "bli_copysc_check.h"
#include "bli_copysc_unb_var1.h"
//
// Prototype object-based interface.
// Prototype object-based interfaces.
//
void bli_copysc( obj_t* chi,
obj_t* psi );
//
// Prototype BLAS-like interfaces with homogeneous-typed operands.
//
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC(ch,opname)( \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
);
INSERT_GENTPROT_BASIC( copysc )
void PASTEMAC0(opname) \
( \
obj_t* chi, \
obj_t* psi \
);
GENFRONT( copysc )
//
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
// Define BLAS-like interfaces with heterogeneous-typed operands.
//
#undef GENTPROT2
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
#define GENTPROT2( ctype_x, ctype_y, chx, chy, varname ) \
\
void PASTEMAC2(chx,chy,opname)( \
conj_t conjchi, \
ctype_x* chi, \
ctype_y* psi \
);
void PASTEMAC2(chx,chy,varname) \
( \
conj_t conjchi, \
void* chi, \
void* psi \
);
INSERT_GENTPROT2_BASIC( copysc )
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
INSERT_GENTPROT2_MIX_D( copysc )
#endif
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
INSERT_GENTPROT2_MIX_P( copysc )
#endif


@@ -34,76 +34,78 @@
#include "blis.h"
typedef void (*FUNCPTR_T)(
void* chi,
double* zeta_r,
double* zeta_i
);
static FUNCPTR_T GENARRAY(ftypes,getsc);
//
// Define object-based interface.
// Define object-based interfaces.
//
#undef GENFRONT
#define GENFRONT( opname, varname ) \
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname)( \
obj_t* x, \
obj_t* y \
obj_t* chi, \
double* zeta_r, \
double* zeta_i \
) \
{ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( x, y ); \
num_t dt_chi = bli_obj_datatype( *chi ); \
num_t dt_def = BLIS_DCOMPLEX; \
num_t dt_use; \
\
PASTEMAC0(varname)( x, \
y ); \
/* If chi is a constant object, default to the dcomplex value, both
to maximize precision and because we don't know whether the caller
needs only the real part or both the real and imaginary parts. */ \
void* buf_chi = bli_obj_buffer_for_1x1( dt_def, *chi ); \
\
FUNCPTR_T f; \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( chi, zeta_r, zeta_i ); \
\
/* The _check() routine prevents integer types, so we know that chi
is either a constant or an actual floating-point type. */ \
if ( bli_is_constant( dt_chi ) ) dt_use = dt_def; \
else dt_use = dt_chi; \
\
/* Index into the type combination array to extract the correct
function pointer. */ \
f = ftypes[dt_use]; \
\
/* Invoke the function. */ \
f( \
buf_chi, \
zeta_r, \
zeta_i \
); \
}
GENFRONT( addv, addv_kernel )
GENFRONT( getsc )
//
// Define BLAS-like interfaces with homogeneous-typed operands.
// Define BLAS-like interfaces with typed operands.
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, varname ) \
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname)( \
conj_t conjx, \
dim_t n, \
ctype* x, inc_t incx, \
ctype* y, inc_t incy \
void* chi, \
double* zeta_r, \
double* zeta_i \
) \
{ \
PASTEMAC2(ch,ch,varname)( conjx, \
n, \
x, incx, \
y, incy ); \
}
INSERT_GENTFUNC_BASIC( addv, ADDV_KERNEL )
//
// Define BLAS-like interfaces with heterogeneous-typed operands.
//
#undef GENTFUNC2
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
ctype* chi_cast = chi; \
\
void PASTEMAC2(chx,chy,opname)( \
conj_t conjx, \
dim_t n, \
ctype_x* x, inc_t incx, \
ctype_y* y, inc_t incy \
) \
{ \
PASTEMAC2(chx,chy,varname)( conjx, \
n, \
x, incx, \
y, incy ); \
PASTEMAC2(ch,d,gets)( *chi_cast, *zeta_r, *zeta_i ); \
}
INSERT_GENTFUNC2_BASIC( addv, ADDV_KERNEL )
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
INSERT_GENTFUNC2_MIX_D( addv, ADDV_KERNEL )
#endif
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
INSERT_GENTFUNC2_MIX_P( addv, ADDV_KERNEL )
#endif
INSERT_GENTFUNC_BASIC( getsc )


@@ -32,42 +32,33 @@
*/
#include "blis.h"
//
// Define object-based interface.
// Prototype object-based interfaces.
//
#undef GENFRONT
#define GENFRONT( opname, varname ) \
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname)( \
obj_t* x \
) \
{ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( x ); \
\
PASTEMAC0(varname)( x ); \
}
GENFRONT( invertv, invertv_kernel )
obj_t* chi, \
double* zeta_r, \
double* zeta_i \
);
GENFRONT( getsc )
//
// Define BLAS-like interfaces.
// Prototype BLAS-like interfaces with typed operands.
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname, varname ) \
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname)( \
dim_t n, \
ctype* x, inc_t incx \
) \
{ \
PASTEMAC(ch,varname)( n, \
x, incx ); \
}
INSERT_GENTFUNC_BASIC( invertv, INVERTV_KERNEL )
void* chi, \
double* zeta_r, \
double* zeta_i \
);
INSERT_GENTPROT_BASIC( getsc )

101
frame/0/old/bli_setsc.c Normal file

@@ -0,0 +1,101 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
#include "blis.h"
typedef void (*FUNCPTR_T)(
double* zeta_r,
double* zeta_i,
void* chi
);
static FUNCPTR_T GENARRAY(ftypes,setsc);
//
// Define object-based interfaces.
//
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname)( \
double* zeta_r, \
double* zeta_i, \
obj_t* chi \
) \
{ \
num_t dt_chi = bli_obj_datatype( *chi ); \
\
void* buf_chi = bli_obj_buffer_at_off( *chi ); \
\
FUNCPTR_T f; \
\
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( zeta_r, zeta_i, chi ); \
\
/* Index into the type combination array to extract the correct
function pointer. */ \
f = ftypes[dt_chi]; \
\
/* Invoke the function. */ \
f( \
zeta_r, \
zeta_i, \
buf_chi \
); \
}
GENFRONT( setsc )
//
// Define BLAS-like interfaces with typed operands.
//
#undef GENTFUNC
#define GENTFUNC( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname)( \
double* zeta_r, \
double* zeta_i, \
void* chi \
) \
{ \
ctype* chi_cast = chi; \
\
PASTEMAC2(d,ch,sets)( *zeta_r, *zeta_i, *chi_cast ); \
}
INSERT_GENTFUNC_BASIC( setsc )

64
frame/0/old/bli_setsc.h Normal file

@@ -0,0 +1,64 @@
/*
BLIS
An object-based framework for developing high-performance BLAS-like
libraries.
Copyright (C) 2014, The University of Texas at Austin
Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:
- Redistributions of source code must retain the above copyright
notice, this list of conditions and the following disclaimer.
- Redistributions in binary form must reproduce the above copyright
notice, this list of conditions and the following disclaimer in the
documentation and/or other materials provided with the distribution.
- Neither the name of The University of Texas at Austin nor the names
of its contributors may be used to endorse or promote products
derived from this software without specific prior written permission.
THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT
HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL,
SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT
LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE,
DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY
THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT
(INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE
OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.
*/
//
// Prototype object-based interfaces.
//
#undef GENFRONT
#define GENFRONT( opname ) \
\
void PASTEMAC0(opname)( \
double* zeta_r, \
double* zeta_i, \
obj_t* chi \
);
GENFRONT( setsc )
//
// Prototype BLAS-like interfaces with typed operands.
//
#undef GENTPROT
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname)( \
double* zeta_r, \
double* zeta_i, \
void* chi \
);
INSERT_GENTPROT_BASIC( setsc )


@@ -38,22 +38,14 @@
//
// Define object-based interface.
//
#undef GENFRONT
#define GENFRONT( opname, varname ) \
\
void PASTEMAC0(opname)( \
obj_t* x, \
obj_t* y \
) \
{ \
if ( bli_error_checking_is_enabled() ) \
PASTEMAC(opname,_check)( x, y ); \
\
PASTEMAC0(varname)( x, \
y ); \
}
void bli_copysc( obj_t* chi,
obj_t* psi )
{
if ( bli_error_checking_is_enabled() )
bli_copysc_check( chi, psi );
GENFRONT( swapv, swapv_kernel )
bli_copysc_unb_var1( chi, psi );
}
//
@@ -63,17 +55,17 @@ GENFRONT( swapv, swapv_kernel )
#define GENTFUNC( ctype, ch, opname, varname ) \
\
void PASTEMAC(ch,opname)( \
dim_t n, \
ctype* x, inc_t incx, \
ctype* y, inc_t incy \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
) \
{ \
PASTEMAC2(ch,ch,varname)( n, \
x, incx, \
y, incy ); \
PASTEMAC2(ch,ch,varname)( conjchi, \
chi, \
psi ); \
}
INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
INSERT_GENTFUNC_BASIC( copysc, copysc_unb_var1 )
//
@@ -83,23 +75,25 @@ INSERT_GENTFUNC_BASIC( swapv, SWAPV_KERNEL )
#define GENTFUNC2( ctype_x, ctype_y, chx, chy, opname, varname ) \
\
void PASTEMAC2(chx,chy,opname)( \
dim_t n, \
ctype_x* x, inc_t incx, \
ctype_y* y, inc_t incy \
conj_t conjchi, \
ctype_x* chi, \
ctype_y* psi \
) \
{ \
PASTEMAC2(chx,chy,varname)( n, \
x, incx, \
y, incy ); \
PASTEMAC2(chx,chy,varname)( conjchi, \
chi, \
psi ); \
}
INSERT_GENTFUNC2_BASIC( swapv, SWAPV_KERNEL )
// Define the basic set of functions unconditionally, and then also some
// mixed datatype functions if requested.
INSERT_GENTFUNC2_BASIC( copysc, copysc_unb_var1 )
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
INSERT_GENTFUNC2_MIX_D( swapv, SWAPV_KERNEL )
INSERT_GENTFUNC2_MIX_D( copysc, copysc_unb_var1 )
#endif
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
INSERT_GENTFUNC2_MIX_P( swapv, SWAPV_KERNEL )
INSERT_GENTFUNC2_MIX_P( copysc, copysc_unb_var1 )
#endif


@@ -32,17 +32,15 @@
*/
#include "bli_setv_check.h"
#include "bli_setv_kernel.h"
#include "bli_setv_ref.h"
#include "bli_copysc_check.h"
#include "bli_copysc_unb_var1.h"
//
// Prototype object-based interface.
//
void bli_setv( obj_t* beta,
obj_t* x );
void bli_copysc( obj_t* chi,
obj_t* psi );
//
@@ -52,33 +50,33 @@ void bli_setv( obj_t* beta,
#define GENTPROT( ctype, ch, opname ) \
\
void PASTEMAC(ch,opname)( \
dim_t n, \
ctype* beta, \
ctype* x, inc_t incx \
conj_t conjchi, \
ctype* chi, \
ctype* psi \
);
INSERT_GENTPROT_BASIC( setv )
INSERT_GENTPROT_BASIC( copysc )
//
// Prototype BLAS-like interfaces with heterogeneous-typed operands.
//
#undef GENTPROT2
#define GENTPROT2( ctype_b, ctype_x, chb, chx, opname ) \
#define GENTPROT2( ctype_x, ctype_y, chx, chy, opname ) \
\
void PASTEMAC2(chb,chx,opname)( \
dim_t n, \
ctype_b* beta, \
ctype_x* x, inc_t incx \
void PASTEMAC2(chx,chy,opname)( \
conj_t conjchi, \
ctype_x* chi, \
ctype_y* psi \
);
INSERT_GENTPROT2_BASIC( setv )
INSERT_GENTPROT2_BASIC( copysc )
#ifdef BLIS_ENABLE_MIXED_DOMAIN_SUPPORT
INSERT_GENTPROT2_MIX_D( setv )
INSERT_GENTPROT2_MIX_D( copysc )
#endif
#ifdef BLIS_ENABLE_MIXED_PRECISION_SUPPORT
INSERT_GENTPROT2_MIX_P( setv )
INSERT_GENTPROT2_MIX_P( copysc )
#endif

Some files were not shown because too many files have changed in this diff