Files
blis/CHANGELOG
2015-07-29 13:31:12 -05:00

6655 lines
267 KiB
Plaintext
Raw Blame History

This file contains ambiguous Unicode characters
This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.
commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (HEAD -> master, tag: 0.1.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500
Version file update (0.1.8)
commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e (origin/master)
Merge: fdfe14f d4b8913
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500
Merge branch 'master' of github.com:flame/blis
commit fdfe14f1e17ba5a2f8dfa0bdb799c6b0e730211b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:52:39 2015 -0500
Added support for Intel Haswell/Broadwell.
Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
and FMA instructions. (Complex support is currently provided by default
induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite to -1.0 to 0.9.
commit d4b891369c1eb0879ade662ff896a5b9a7fca207
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 7 10:06:53 2015 -0500
Added 'carrizo' configuration.
Details:
- Added a new configuration for AMD Excavator-based hardware also known
as Carrizo when referring to the entire APU. This configuration uses
the same micro-kernels as the piledriver, but with different
cache blocksizes.
commit 0b7255a642d56723f02d7ca1f8f21809967b8515
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:50 2015 -0500
CHANGELOG update (0.1.7)
commit 267253de8a7be546ce87626443ee38701c1d411f (tag: 0.1.7)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:49 2015 -0500
Version file update (0.1.7)
commit 7cd01b71b5e757a6774625b3c9f427f5e7664a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 11:31:53 2015 -0500
Implemented dynamic allocation for packing buffers.
Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (ie: malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by #defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.
commit 9848f255a3bab17d1139c391cca13ff3f1ffe6ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 19:14:22 2015 -0500
Added early return to API-level _init() routines.
Details:
- Added conditional code that returns early from the API-level _init()
routines if the API is already initialized. Actually meant for this to
be included in 5f93cbe8.
commit 5f93cbe870f3478870e15581e7fd450dad5bba1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 18:52:12 2015 -0500
Introduced API-level initialization.
Details:
- Added API-level initialization state to _const, _error, _mem, _thread,
_ind, and _cntl APIs. While this functionality will mostly go unused,
adding miniscule overhead at init-time, there will be at least once
instance in the near future where, in order to avoid an infinite loop,
a certain portion of the initialization will call a query function that
itself attempts to call bli_init(). API-level initialization will allow
this later stage to verify that an earlier stage of initialization has
completed, even if the overall call to bli_init() has not yet returned.
- Added _is_initialized() functions for each API, setting the underlying
bool_t during _init() and unsetting it during _finalize().
- Comment, whitespace changes.
commit ee129c6b028bc5ac88da7c74fde72c49803742ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 10 12:53:28 2015 -0500
Fixed bugs in _get_range(), _get_range_weighted().
Details:
- Fixed some bugs that only manifested in multithreaded instances of
some (non-gemm) level-3 operations. The bugs were related to invalid
allocation of "edge" cases to thread subpartitions. (Here, we define
an "edge" case to be one where the dimension being partitioned for
parallelism is not a whole multiple of whatever register blocksize
is needed in that dimension.) In BLIS, we always require edge cases
to be part of the bottom, right, or bottom-right subpartitions.
(This is so that zero-padding only has to happen at the bottom, right,
or bottom-right edges of micro-panels.) The previous implementations
of bli_get_range() and _get_range_weighted() did not adhere to this
implicit policy and thus produced bad ranges for some combinations of
operation, parameter cases, problem sizes, and n-way parallelism.
- As part of the above fix, the functions bli_get_range() and
_get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
and _b2t suffixes, similar to the partitioning functions. This is
an easy way to make sure that the variants are calling the right
version of each function. The function signatures have also been
changed slightly.
- Comment/whitespace updates.
- Removed unnecessary '/' from macros in bli_obj_macro_defs.h.
commit 9135dfd69d39f3bbd75034f479f27a78dbfebcce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 5 13:37:44 2015 -0500
Minor updates to test/3m4m files.
commit d62ceece943b20537ec4dd99f25136b9ba2ae340
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:56:45 2015 -0500
Minor update to test/3m4m/runme.sh.
Details:
- Removed some stale script code that should have been removed
during 590bb3b8c.
commit b6ee82a3d421c9c4f1eb6848c7c6e37aa46de799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:14:23 2015 -0500
Minor cleanup to bli_init() and friends.
Details:
- Spun-off initialization of global scalar constants to bli_const_init()
and of threading stuff to bli_thread_init().
- Added some missing _finalize() functions, even when there is nothing
to do.
commit 1213f5cebabc1637ce9dd45c4bfa87bb93677c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 2 13:27:47 2015 -0500
POSIX thread bugfixes/edits to bli_init.c, _mem.c.
Details:
- Fixed a sort-of bug in bli_init.c whereby the wrong pthread mutex
was used to lock access to initialization/finalization actions.
But everything worked out okay as long as bli_init() was called by
single-threaded code.
- Changed to static initialization for memory allocator mutex in
bli_mem.c, and moved mutex to that file (from bli_init.c).
- Fixed some type mismatches in bli_threading_pthreads.c that resulted
in compiler warnings.
- Fixed a small memory leak with allocated-but-never-freed (and unused)
pthread_attr_t objects.
- Whitespace changes to bli_init.c and bli_mem.c.
commit 590bb3b8c5c0389159c5a9451b6c156c5f237e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun May 24 16:02:53 2015 -0500
Backed-out adjusted dim changes to test/3m4m.
Details:
- Reverted most changes applied during commit ec25807b.
commit ec25807b26da943868f0d0517c3720e50181b8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 10 13:23:50 2015 -0500
Tweaks to test/3m4m to test with adjusted dims.
Details:
- Updated test/3m4m driver files to build test drivers that allow
comparision of real "asm_blis" results to complex "asm_blis" results,
except with the latter's problem sizes adjusted so that problems are
generated with equal flop counts.
commit 426b6488580a92bf071a62dc319a9c837ce39821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 8 15:12:21 2015 -0500
Fixed a packing bug that manifested in trsm_r.
Details:
- Fixed a bug that caused a memory leak in the contiguous memory
allocator. Because packm_init() was using simple aliasing when
a subpartition object was marked as zeros by bli_acquire_mpart_*(),
the "destination" pack object's mem_t entry was being overwritten
by the corresponding field of the "source" object (which was likely
NULL). This prevented the block from being released back to the
memory allocator. But this bug only manifested when changing the
location of packing B from outside the var1 loop to inside the
var3 loop, and only for trsm with triangular B (side = right). The
bug was fixed by changing the type of alias used in packm_init()
when handling zero partition cases. Specifically, we now use
bli_obj_alias_for_packing(), which does not clobber the destination
(pack) object's mem_t field. Thanks to Devangi Parikh for this bug
report.
commit c84286d5cef48f16d83831baac1f46b9856b9a36
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 4 15:39:14 2015 -0500
More minor tweaks to test/3m4m.
Details:
- Added a line of output that forces matlab to allocate the entire array
up-front.
- Re-enabled real domain benchmarks in runme.sh, which were temporarily
disabled.
commit 309717c8ebf4ef1369f15cf41340e13c25b41573
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 19:28:49 2015 -0500
More tweaks to test/3m4m, configurations.
Details:
- Fixed incorrect number of mc_x_kc memory blocks in
sandybridge/bli_config.h.
- Enabled OpenMP multithreding in piledriver/bli_config.h.
- More updates to test/3m4m driver files.
commit 4baf3b9c69b2f648be9e46e07ccc9859dd675828
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 16:44:32 2015 -0500
Tweaked test/3m4m driver, including acml support.
Details:
- Added ACML support to test/3m4m driver Makefile and runme.sh script.
commit a32f7c49ca4ea869d2a6c66818780f4321743d67
Merge: 349e075 4bfd1ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 08:28:11 2015 -0500
Merge pull request #23 from xianyi/master
Add auto-detecting CPU on configure stage.
commit 349e075ad6a8e2a1211d94f36d24828c9d44b052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 2 18:12:28 2015 -0500
Tweaks to sandybridge config, test/3m4m driver.
Details:
- Enable OpenMP support by default in sandybridge's bli_config.h.
- Reorganized sandybridge's bli_kernel.h.
- Updated 3m4m Makefile, runme.sh to also test MKL implementation.
commit 4bfd1ce8ca93f93d170dd2715f0a32027b417b46
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:40:21 2015 -0500
Detect NEON for cortex-a9 and cortex-a15.
commit aa6eec4f43137057276fe6119bdbfb5c52682527
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:03:44 2015 -0500
Detect the CPU architecture. Support ARM cores.
Detect the CPU architecture by compiler's predefined macros.
Then, detect the CPU cores.
Support detecting x86 and ARM architectures.
commit 2947cfb749c937b0f62fac36cc92f123bd45b53c
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Apr 1 12:24:00 2015 -0500
Add auto-detecting CPU on configure stage.
e.g. /Path_to_BLIS/configure auto
Now, it only support detecting x86 CPUs.
commit 26a4b8f6f985597f80e0174990bf541f1d9bafac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 1 10:44:54 2015 -0500
Implemented 3m2, 3m3 induced algorithms (gemm only).
Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
as an argument instead of computing it locally. Exception: for trmm,
is_p must be computed locally, since it changes for triangular
packed matrices. Also exposed is_p in interface to dt-specific
packm_blk_var2 (and _var1, even though it does not use imaginary
stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
are obtained from the func_t objects rather than the cpp macros that
define the function names.
- Updated test/3m4m driver, Makefile, and run script.
commit ddf62ba7d2da08225b201585b85e06c967767dea
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:27:51 2015 -0500
Refuse to free the packm thread info if it uses the single threaded version
commit 016fc587584d958a0e430a56a5e2c05022ac2f17
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:23:02 2015 -0500
Don't free packm thread info if it is null
commit 00a443c529a60862a57b93e303a0b3212c9b1df4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:11:07 2015 -0500
Use bli_malloc instead of malloc for the thread info paths
commit f1a6b7d02861ccebdc500ea98778cc0f6cddad17
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 15:37:10 2015 -0500
Reorganized code for induced complex methods.
Details:
- Consolidated most of the code relating to induced complex methods
(e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
are now enabled on a per-operation basis. The current "available"
(enabled and implemented) implementation can then be queried on
an operation basis. Micro-kernel func_t objects as well as blksz_t
objects can also be queried in a similar maner.
- Redefined several micro-kernel and operation-related functions in
bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
and nr blksz_t objects for each cache blocksize (and are NULL for
register blocksizes). Renamed the sub-blocksize field "sub" to
"mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
identifies an operation. For now, only level-3 id values are defined,
along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
are tested (one at a time) rather than only testing the first
available method.
- Reformated summary at the beginning of testsuite output so that
blocksize and micro-kernel info is shown for each induced method
that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
testsuite output (from approx. 90 to 80).
commit 8d5169ccda954e5f72944308a036dcb7ebfc9097
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 11:38:08 2015 -0500
Fixed bug in release of mem_t buffer.
Details:
- Fixed a bug that affects all level-2 and level-3 blocked variants. The
bug only manifested, however, if the packing of operands (A and B in
gemm, for example) spanned multiple nodes in the control tree. Until
recently, the main consumers of packm were level-3 operations, all of
which packed both input operands from blocked variant 1 (B outside of
the loop, and A within the loop). This particular usage masked a flaw
in the code whereby bli_obj_release_pack() would always release the
underlying mem_t buffer (provided it was allocated), even if the buffer
was not allocated in the current variant. This has been fixed by
replacing all calls to bli_obj_release_pack() with calls to a new
function, bli_packm_release(), which takes the same control tree node
argument passed into the object's corresponding call to packm_init()
or packv_init(). bli_packm_release() then proceeds to invoke
bli_obj_release_pack() only if the control tree node indicates that
packing was requested. Thanks to Devangi Parikh for identifying this
bug.
commit c0acca0f5182ba96fd39c9d10b34a896a6e74206
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 3 10:56:22 2015 -0600
Clarified comments in testsuite input.operations.
commit 03ba9a6b17861d9e1adc0cf924439c4d7e860d19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 24 10:33:28 2015 -0600
Removed some 'old' directories.
commit a86db60ee270cdeb745ae7cf68f9e0becc9f522d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 23 18:42:39 2015 -0600
Extensive renaming of 3m/4m-related files, symbols.
Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.
commit 8cf8da291a0fb2f491f410969a76ec0fbda47faf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 15:24:27 2015 -0600
Minor updates to induced complex mode management.
Details:
- Relocated bli_4mh.c, bli_4mb.c, bli_4m.c, bli_3mh.c, bli_3m.c (and
associated headers) from frame/base to frame/base/induced.
- Added bli_xm.? to frame/base/induced, which implements
bli_xm_is_enabled(), which detects whether ANY induced complex method
is currently enabled.
- The new function bli_xm_is_enabled() is now used in bli_info.c to
detect when an induced complex method is used, so we know when to
return blocksizes from one of the induced methods' blocksize objects.
commit 411e637ee7d1083a84f58f08938d51e63d7c3c9a
Merge: c2569b8 fc0b771
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:39:25 2015 -0600
Merge branch 'master' of http://github.com/flame/blis
commit c2569b8803d4ccc1d7b6f391713461b51443601d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:38:19 2015 -0600
Fixed a memory leak in freeing the thread infos
commit fc0b771227abf86d81f505b324f69f6e83db1d8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 11:47:44 2015 -0600
Added max(mr,nr) to kc in static mem pools.
Details:
- Changed the static memory definitions to compute the maximum register
blocksize for each datatype and add it to kc when computing the size
of blocks of A and B. This formally accounts for the nudging of kc
up to a multiple of mr or nr at runtime for triangular operations
(e.g. trmm).
commit af32e3a608631953ef770341df10a14a991bf290
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Thu Feb 19 22:51:11 2015 -0600
Fixed a bug with get_range_weighted would return end = 0 for small problem sizes
commit 441d47542a64e131578d00da7404c1ed387a721c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 17:06:10 2015 -0600
Renamed 3m and 4m symbols/macros to 3mi and 4mi.
Details:
- Renamed several variables and macros from 3m/4m to 3mi/4mi. This is
because those packing schemas were always implicitly "interleaved".
This new naming scheme will make way for new schemas that separate
instead of interleve the real and imaginary (and summed) parts.
- Expanded the pack format sub-field of the pack schema field of the
info_t to 4 bits (from 3). This will allow for more schema types
going forward.
- Removed old _cntl.c files for herk3m, herk4m, trmm3m, trmm4m.
commit 518a1756ccf02122b96fc437b538604a597df42a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 14:27:09 2015 -0600
Fixed indexing bug for trmm3 via 3mh, 4mh.
Details:
- Fixed a bug that only affected trmm3 when performed via 3mh or 4mh,
whereby micro-panels of the triangular matrix were packed with "dead
space" between them due to failing to adjust for the fact that pointer
arithmetic was occurring in units of complex elements while the data
being packed consisted of real elements. It turns out that the macro-
kernel suffered from the same bug, meaning the panels were actually
being packed and read consistently. The only way I was able to
discover the bug in the first place was because the packed block of A
was overflowing into the beginning of the packed row panel of B using
the sandybridge configuration.
commit 493087d730f01d5169434f461644e5633f48a42f
Merge: 650d2a6 2502129
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 18 09:45:51 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit 25021299b670775df8ca9c87910c63d7e74ed946
Merge: fe2b8d3 f05a576
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 20:03:21 2015 -0600
Merge branch 'master' of github.com:flame/blis
commit fe2b8d39a445ac848686e78c7540fd046cb95492
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 19:33:10 2015 -0600
Fixed an obscure bug in 3mh/3m/4mh/4m packing.
Details:
- Modified bli_packm_blk_var1.c and _var2.c to increase the triangular
case's panel increment by 1 if it would otherwise be odd. This is
particularly necessary in _var2.c when handling the interleaved 3m
or ro/io/rpi pack schemas, since division of an odd number by 2 can
happen if both the panel length and the panel packing dimension
(register packing blocksize) are odd, thus making their product odd.
- Modified bli_packm_init.c so that panel strides are increased by 1
if they would otherwise be odd, even for non-3m related packing.
- Modified the trmm and trsm macro-kernels so that triangular packed
micro-panels are traversed with this new "increment by 1 if odd"
policy.
- Added sanity checks in trmm and trsm macro-kernels that would result
in an abort() if the conditions that would lead to a "divide odd
integer by 2" scenario ever manifest.
- Defined bli_is_odd(), _is_even() macros in bli_scalar_macro_defs.h.
commit 650d2a6ff2e593151a296ca86b5214afcc747afc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 9 14:59:20 2015 -0600
Added initial support for imaginary stride.
Details:
- Added an imaginary stride field ("is") to obj_t.
- Renamed bli_obj_set_incs() macro to bli_obj_set_strides().
- Defined bli_obj_imag_stride() and bli_obj_set_imag_stride() and
added invocations in key locations.
- Added some basic error-checking related to imaginary stride.
- For now, imaginary stride will not be exposed into the most-used
BLIS APIs such as bli_obj_create(), and certainly not the
computational APIs such as bli_dgemm().
commit f05a57634a7c8e3864b25b3335d1194c1ea1aeb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 8 19:40:34 2015 -0600
Defined gemm cntl function to query ukrs func_t.
Details:
- Added a new function, bli_gemm_cntl_ukrs(), that returns the func_t*
for the gemm micro-kernels from the leaf node of the control tree.
This allows all the func_t* fields from higher-level nodes in the tree
to be NULL, which makes the function that builds the control trees
slightly easier to read.
- Call bli_gemm_cntl_ukrs() instead of the cntl_gemm_ukrs() macro in
all bli_*_front() functions (which is needed to apply the row/column
preference optimization).
- In all level-3 bli_*_cntl_init() functions, changed the _obj_create()
function arguments corresponding to the gemm_ukrs fields in higher-
level cntl tree nodes to NULL.
- Removed some old her2k macro-kernels.
commit cefd3d5d2001264de17cf63dae541f890cb9daaf
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 5 11:09:12 2015 -0600
A couple of functions were incorrectly ifdeffed away on Xeon Phi. Fixed this
commit 7574c9947d57a19f613880e3b9f62f8c8f6df4ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 4 12:11:55 2015 -0600
Added basic flop-counting mechanism (level-3 only).
Details:
- Added optional flop counting to all level-3 front-ends, which is
enabled via BLIS_ENABLE_FLOP_COUNT. The flop count can be
reset at any time via bli_flop_count_reset() and queried via
bli_flop_count(). Caveats:
- flop counts are approximate for her[2]k, syr[2]k, trmm, and
trsm operations;
- flop counts ignore extra flops due to non-unit alpha;
- flop counts do not account for situations where beta is zero.
commit ceda4f27d1f1bcf19320e09848e0f2e3b9941e6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 29 13:22:54 2015 -0600
Implemented bli_obj_imag_equals().
Details:
- Implemented a new function, bli_obj_imag_equals(), which compares the
imaginary part of the first argument to the second argument, which may
be a BLIS_CONSTANT or of a regular real datatype.
commit 81114824a05a9053229efd577a8a94a856deda93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 6 12:15:21 2015 -0600
Minor 4m/3m consolidation to mem_pool_macro_defs.h.
Details:
- Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
reduce code and improve readability.
commit 36a9b7b7436d9423ba4de2a9f85cfcd43577b783
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Dec 17 21:53:50 2014 +0000
reduced the default number of MC by KC blocks for bgq
commit c60619c7c3568f044a849abbab60209aa7455423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 17:08:22 2014 -0600
Minor tweaks for 3m4m test drivers.
Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
cases.
commit c6929ba6a5e6f633a7295e979a2b8df8c7ecdb1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 11:27:50 2014 -0600
Added 4m_1b to test/3m4m test driver and script.
commit 785d480805fc0d6f4251b5499933515740b6b2a7
Merge: 9456f33 4156c08
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:34:19 2014 -0600
Merge branch 'master' of github.com:flame/blis
commit 9456f330af4617f9ee32972d51f974aa2d84f97b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:31:57 2014 -0600
Added 4m_1b implementation for gemm.
Details:
- Added yet another 4m-based implementation for complex domain level-3
operations. This method, which the 3m/4m paper identifies as Algorithm
"4m_1b" fissures the first loop around the micro-kernel so that the
real sub-panel of the current micro-panel of B is multiplied against
(both sub-panels of) all micro-panels of A, before doing the same for
the imaginary sub-panel of the micro-panel of B. For now, only gemm is
supported, and 4m_1b (labeled "4mb" within the framework) is not yet
integrated into the test suite.
commit 4156c0880d9aea4ff04a9c4fa139ba8c437d8bfb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 9 16:03:14 2014 -0600
Fixed obscure level-2 packing / general stride bug.
Details:
- Fixed a bug in certain structured level-2 operations that manifested
only when the structured matrix was provided to BLIS as matrix stored
with general stride. The bug was introduced in c472993b when the
densify field was removed from the packm control tree node and
associated APIs. Since then, the packed object was unconditionally
marked with an uplo field of BLIS_DENSE. This is fine for level-3
operations where micro-panels are always densified, but in level-2
contexts, the underlying unblocked variant (fused or unfused) of
structured operations (e.g. trmv) still needs to know whether to
execute its "lower" or "upper" branches of code. Since this field
was unconditionally being set to BLIS_DENSE, the unblocked variants
were always executed the "else" branch, which happened to be the
"lower" case code. Thus, running an upper case produced the wrong
answer. This most obviously manifested in the form of failures for
trmm, trmm3, and trsm in the test suite.
The bug was fixed by setting the packed object's uplo field to
BLIS_DENSE only if the schema indicated that micro-panels were to be
packed. Otherwise, we can assume we are packing to regular row or
column storage, as is the case with level-2 packing. Thanks to
Francisco Igual for reporting the testsuite failures and ultimately
leading us to this bug.
commit 689f60a578b461119e9ea90c74f642b9eb79addb
Merge: bef24e6 483e4d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 7 14:03:30 2014 -0600
Merge pull request #21 from figual/master
Adding armv8a configuration and micro-kernels.
commit 483e4d6a3fdbef9d9ab47fb674c9476c70ca9f0f
Author: Francisco D. Igual <figual@ucm.es>
Date: Sun Dec 7 20:27:49 2014 +0100
Adding armv8a configuration and micro-kernels.
Only sgemm micro-kernel is fully functional at this point.
commit bef24e67e0f93579c2a80315348dc2e227f72a72
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 18:00:56 2014 -0600
Fixed a type of race condition exposed by pthreads implementation.
Lead thread of the inner thread communicator could exit subproblem, move on the next iteration of the loop and modify a1_pack, b1_pack, or c1_pack while other threads were still using those.
Barriers were inserted to fix this.
commit 76bde44411f0e34266bab9d666a54ef22be97320
Merge: e56e614 f3d729e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:25:24 2014 -0600
Merge branch 'master' of github.com:flame/blis
commit f3d729e504ec012e7dc7e02b2ecd42e004c6894d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 22:25:24 2014 -0600
Added static mutex to bli_init and bli_finalize
commit d71cc797866ff502ad1127527016f463267eef80
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 21:35:39 2014 -0600
Refactored bli_threading files and added support for pthreads
commit e56e61438ff7fcf25a48c0b7603f18df782b50b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:20:35 2014 -0600
Minor cleanups to bli_threading.h and friends.
Details:
- No longer need to define BLIS_ENABLE_MULTITHREADING manually in
bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
BLIS_ENABLE_PTHREADS is defined.
- Added sanity check to prevent both BLIS__ENABLE_OPENMP and
BLIS_ENABLE_PTHREADS from being enabled simultaneously.
- Reorganization of bli_threading*.h header files, which led to
simplification of threading-related part of blis.h.
- added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
file.
commit 3be2744cbe2c56d38c23fd818aa5c1f10cc7ea51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 21 12:28:08 2014 -0600
Update to template gemm ukernel comments.
Details:
- Updated comments on alignment of a1 and b1 to match wiki.
commit 994429c6881b2ade92d9d7949bcaebfbf2cc65eb
Merge: 58796ab 694029d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 20 13:55:35 2014 -0600
Merge pull request #20 from TimmyLiu/master
#define PASTEF773 required by cblas compatibility layer
commit 694029d9d7db857d642ab536955c0621791108c8
Author: Timmy <timmy.liu@amd.com>
Date: Wed Nov 19 15:25:14 2014 -0600
#define PASTEF773 required by cblas compatiility layer
commit 58796abda66b133346f8d523b39178afc336351f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 6 14:31:52 2014 -0600
Removed KC constraint comments from _kernel.h files.
Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
NR have been relaxed, and thus it was time to remove the comments
from the top of the bli_kernel.h files of all configurations.
commit 7bbc95a54f706d43c7f7951f0e5995f86130cd52
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 29 10:52:23 2014 -0500
Added new piledriver micro-kernels.
Details:
- Added new micro-kernels for the AMD piledriver architecture (one
for each datatype).
- Updates and tweaks to piledriver configuration.
- Added 3xk packm micro-kernel support.
- Explicitly unrolled some of the smaller packm micro-kernels.
- Added notes to avx/sandybridge and piledriver micro-kernel files
acknowledging the influence of the corresponding kernel code in
OpenBLAS.
commit 59613f1d5500f6279963327db2fbc84bc9135183
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 17:21:37 2014 -0500
Added separeate micro-panel alignment for A and B.
Details:
- Changed the recently-added micro-panel alignment macros so that we now
have two sets--one for micro-panels of matrix A and one for micro-
panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.
commit a8e12884ee1fddd3fd77ca5a68aa0cb857f3af57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:48 2014 -0500
CHANGELOG update (0.1.6)
commit 38ea5022e4ed846112198c4e1672fcdaeb90dc71 (tag: 0.1.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:45 2014 -0500
Version file update (0.1.6)
commit a3e6341bdb0e28411f935d6b4708a6389663e004
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:13:28 2014 -0500
Factored common code from blocksize functions.
Details:
- Split bli_determine_blocksize_[fb]() into two functions each, the
newer ones ending with the _sub suffix. These new sub-functions are
now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
eliminates redundant code and will allow any future tweaks to the
core sub-functions to automatically be inherited by the operation-
specific versions.
commit 4674ca8cffb58331ff7edf23bbe0e3f6a7558489
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 10:50:59 2014 -0500
Extended newly relaxed KC to hemm, symm.
Details:
- These changes were intended for the previous commit.
- Defined bli_gemm_determine_kc_[fb]() and bli_gemm_determine_kc_[fb](),
which determine blocksizes for gemm-based operations, taking special
care to "nudge" the kc dimension up to a multiple of MR or NR for
hemm and symm operations, as needed.
- Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f().
instead of bli_determine_blocksize_f().
- Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.
commit ab954ba6f874eaca7b001804491f866ef6b9b327
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 22 17:21:58 2014 -0500
Relaxed constraint that KC be multiple of MR, NR.
Details:
- Relaxed a long-held requirement in register blocksizes that required
the kernel programmer to choose a KC that was divisible by both MR
and NR. This was very constraining on some architectures that did not
use register blocksizes that were powers of two. The constraint is
now enforced only for trmm and trsm, where it is needed, and it is
now handled by "nudging" kc upward at runtime, if necessary, to be a
multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
which determine blocksizes for trmm and trsm, taking special care to
"nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension
unmodified if the dimension multiple is zero (to avoid division by
zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.
commit 95cdae65d6b88e043ee14bcd53cd2e800d7aecb4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 22 16:30:16 2014 -0500
Fixed bug in KNC microkernel where k=0 and beta != 1
commit e64dba5633fc49b768b5edc7762f2b5d8a4d0588
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 20 19:23:06 2014 -0500
Re-implemented micro-panel alignment.
Details:
- This commit re-implements a feature that was removed in commit
c2b2ab62. It was removed because, at the time, I wasn't sure how the
micro-panel alignment feature would interact with the 4m method (when
applied at the micro-kernrel level), and so it seemed safer to disable
the feature entirely rather than allow possible breakage. This commit
revisits the issue and safely re-implements the feature in a way that
is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
space.
- Modified packm_init and blocked variants to align whole micro-panels
by a datatype-specific alignment value that may be set by the
configuration. (If it is not set by the configuration, it will default
to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
- storage stride is handled properly given the new micro-panel
alignment behavior;
- indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
trsm, is more robust (e.g. will work if the applicable packing
register blocksize is odd);
- imaginary strides are computed and stored within auxinfo_t structs,
which allows the virtual micro-kernels to more easily determine how
to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.
commit add16b0e5402924301e7078e4ca5e3ef725bff0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:49:24 2014 -0500
Added 3m4m test driver subdir of 'test'.
Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
as well as assembly-based and OpenBLAS implementations of gemm
in single and multithreaded modes.
commit e171504a72406c61a173241d8bccf0a5ceb10582
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:25:59 2014 -0500
Use correct definition of bli_is_last_iter().
Details:
- As intended for previous commit, the new definition of
bli_is_last_iter() is now disabled in favor of the old
definition.
commit 0d954087b2b55d2f5f3c5e57d702b318ca2300f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:19:34 2014 -0500
Minor changes and fixes.
Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
arguments, which allows the macro to correctly compute whether a
given iteration is the last that the thread will compute in that
particular loop. The new definition, however, remains disabled
(commented out) until someone can look at this more closely, as
the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
gigaflops does not disrupt the column alignment of the output.
commit d1e86e1876e433f54b501ec5a005b4ba7c5ce4e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 13:43:47 2014 -0500
More minor tweaks to sandybridge/avx micro-kernel.
Details:
- Re-enabled use of b_next for dgemm and cgemm micro-kernels.
commit 7b6fe4cae57cb22c09c1a97595e1a201a02cbcd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 12:01:51 2014 -0500
Minor tweaks to sandybridge/avx micro-kernels.
Details:
- Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
- Removed usage of b_next in all x86_64/avx gemm micro-kernels.
commit a6a156e9feec47154e7a0fd43bcc006b1fc04aba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 14:26:41 2014 -0500
Added cgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.
commit 6f8575ab2580e167a022293b76ddf0514f71b613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 10:01:45 2014 -0500
Added zgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
assembly syntax).
- Updated sandybridge configuration accordingly.
commit 23ce7ee542a12ca40b4b6090ad2558d180e16d37
Merge: 99fd9a3 7a8ad47
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:41:22 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 99fd9a39718cb7281f6fb23f9fef7cca4fe514f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:38:04 2014 -0500
Fixed two minor bugs.
Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
modules whereby the uplo bits of some packed matrix objects were not
being set properly, resulting in false FAILURE results for those
tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
"not yet implemented" abort() when creating a 1x1 object with non-unit
strides.
commit 7a8ad47fb2d100a9da93aa8cab774fcceeaab733
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 8 15:52:13 2014 -0500
Minor changes to knc configuration, including preference row major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0
commit 76b7c34af0c09f47d9615b18857a356acddc788a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 14:15:38 2014 -0500
Fixed a bug in the pack schema-related bit macros.
Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
include all six bits presently used in the pack schema bitfield of
the info field of obj_t structs. Prior to this commit, the macro
constant only included the lowest five bits, which excluded the
"is or is not packed" bit. This manifested as a strange bug in
probably many level-2 codes that invoked packing, though we only
observed it in ger before fixing. Thanks to Devin Matthews for
finding and reporting this bug.
commit a5763e332226598d70c47dfa9cad4578e15ef5f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 13:28:17 2014 -0500
Added extra output to bli_obj_print().
Details:
- Print extra values from info field of obj_t struct within
bli_obj_print().
commit 9bba209fc44fbfce943ba6a51cd8278a0cb6b159
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 29 14:56:36 2014 -0500
Fixed bug when packing anywhere besides in blk_var_1 for gemm.
commit 614a4afc9272adb47e5a8b83b39d56c2804d95d6
Merge: b541b66 4a7df04
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Sep 26 10:49:57 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
commit 4a7df04e8a4ffdb9561d26426afd35e4fe15b013
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:06:15 2014 -0500
Added 30xk support for packm ukernels.
Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
included 30xk kernels.
- Addex 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.
commit b6d4bd792e0d44ce4b28afef343f5ff3ba89c285
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:02:37 2014 -0500
Fixed missing tabs from Makefile patch.
commit 32630f9b6f0d5ba28d5b56dae4c7288a37158743
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 17:18:20 2014 -0500
Comment update to virtual micro-kernels.
commit 13447cffead7c6d137a7a3ccbf9e552ed0477467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 13:00:48 2014 -0500
Minor bugfix to top-level Makefile.
Details:
- Applied a patch that allows the top-level Makefile to work on certain
systems. The patch simply separates out the source-to-object code
generation rules for .c and .S files into two separate rules. Thanks
to Devin Matthews for submitting this patch.
commit e80a4537846416719c067ae08a53aeda978c572d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:24:20 2014 -0500
Fixed bug introduced by bugfix in 25b258d.
Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
a+lda because in the latter case, alignment could cancel out and
still allow the optimized code to run when it shouldn't. Thanks
to Devin for pointing this out.
commit 25b258d61f9c8cee64e922f4131784b6edb196dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:10:49 2014 -0500
Fixed a non-fatal problem with bugfix in a68b316c.
Details:
- The bugfix in a68b316c was inadvertantly checkin alignment of the
leading dimension itself, rather than the byte size of the leading
dimension. Now, we simply check alignment of a+lda.
commit 96302d4fc81363410e41c3a3c43a65df44d97ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 09:43:40 2014 -0500
Renamed bli_info_get_*_ukr_type() functions.
Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
This makes them consistent with the bli_info_get_*_impl_string()
functions.
commit a68b316ca4852509f84ed50e01afac486bf70f58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 17 11:10:07 2014 -0500
Fixed alignment bugs in level-1f kernels.
Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
were attempting to compute problems with unaligned leading dimensions
with optimized code, rather than (correctly) using the reference
implementations. Thanks to Devin Matthews for reporting this bug.
commit 870761eb902e4866090d1d3446a345df3d6d4599
Merge: e9899be a2b59a3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:20:49 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit e9899be09044829e23386bd73e394f1dd7778210
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:19:32 2014 -0500
Added high-level implementations of 4m, 3m.
Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
high levels, respectively. APIs for trmm and trsm were NOT added due
to the fact that these approaches are inherently incompatible with
implementing 4m or 3m at high levels (because the input right-hand
side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
that order) and execute the first one that is enabled, or the native
implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
that the user can query a string that describes the implementation
that is currently enabled.
- Updated test suite to output implementation types for reach level-3
operation, as well as micro-kernel types for each of the five micro-
kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
type) whereby the diagonal elements of the packed micro-panels could
get tainted if the source matrix's imaginary diagonal part contained
garbage.
commit a2b59a37f166f70a6dd5793db2530823ef590c2b
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:44:44 2014 -0500
Fixed make defs so that they actually compile for bulldozer
commit 86fc7e40764f78ec217f50216ef4fa5b57dbfbc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:35:46 2014 -0500
Added bulldozer configuration and updated piledriver micro-kernel
commit 0644e61a79a57f136be5f4c47b9099cff2af06e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:55:34 2014 -0500
Minor updates to bli_packm_init.c.
commit 9dc9b44a057a08e20ad4d423344f0ecad54c1eb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:03:28 2014 -0500
Renamed bli_obj_pack_status() to _pack_schema().
Details:
- Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
order to help avoid confusion as to what the macro returns.
commit cf5efdde0588a0d5b6ea57fe7d7be5000be06f8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 11:47:56 2014 -0500
Pass pack_t schemas into ukernels via auxinfo_t.
Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
A and B into the datatype-specific functions, where they are now
inserted into a newly-expanded auxinfo_t struct. This gives gives the
micro-kernels access to the pack_t schema values embedded in the
control trees, which determine the precise format into which the
matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
remove densify argument. Meant to include this in commit c472993b.
commit cc8d2b82775cca3c2d51bf427f4e77c8024a6d15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:48:22 2014 -0500
Updated old test drivers in 'test'.
commit c472993bbccb69e9ffc409c79b742426c8ad2ad4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:42:04 2014 -0500
Removed densify argument to packm_cntl_obj_create().
Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
This argument was inserted very early in BLIS's development, when it
was anticipated that the developer may sometimes wish to pack a
Hermitian, symmetric, or triangular matrix without making it dense.
But as it turns out, if we are packing a matrix, we always want to
make it dense in some way or another due to the fact that the micro-
kernel only multiplies dense micro-panels. Thus, unless/until there
is a real need for the feature, it seems reasonable to remove it from
the packm_cntl API.
commit 5c43ee387146cd76dc59b730dac6683a8446b834
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 15:19:29 2014 -0500
Moved trmm4m/3m_cntl files to 'old' directory.
Details:
- Meant to include this in previous commit.
commit 7b2f469d5465ed73b1ca88124bc9a1987388aa27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 14:49:50 2014 -0500
Retired trmm_t control tree definitions, usage.
Details:
- Replaced all trmm_t control tree instances and usage with that of
gemm_t. This change is similar to the recent retirement of the herk_t
control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
assume that k is a multiple of MR (when A is triangular) or NR (when
B is triangular). This means that bottom-right micro-panels packed for
trmm will have different zero-padding when k is not already a multiple
of the relevant register blocksize. While this creates a seemingly
arbitrary and unnecessary distinction between trmm and trsm packing,
it actually allows trmm to be handled with one control tree, instead
of one for left and one for right side cases. Furthermore, since only
one tree is required, it can now be handled by the gemm tree, and thus
the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
of which are to facilitate above-mentioned changes whereby k is no
longer required to be a multiple of register blocksize when packing
triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.
commit 576e9e9255a79dba9cd3c804267f51e0b4aa6e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Sep 7 16:12:52 2014 -0500
Retired herk_t control tree definitions, usage.
Details:
- Replaced all herk_t control tree instances and usage with that of
gemm_t, since the two types presently have the same fields. This means
that herk, her2k, syrk, and syr2k can simply use the gemm control tree
as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.
commit b2fed052c9a23d858ef0afbe220b342bce9aa7f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 17:07:25 2014 -0500
Minor code cleanup to bli_packm_struc_cxk*.c
Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
equal to rs_p and cs_p.
commit 023ce770966b3b5a98bba729c5af1f45e15ebb97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 10:47:53 2014 -0500
Minor update to packm_cxk kernels.
Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
functions. This makes the code a little easier to read since "m" and
"n" have connotations that are not applicable here.
- Comment updates.
commit 189def3667d9218adbeec45e2801fd074341a679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 16:23:17 2014 -0500
Retired portions of bli_kernel_3m/4m_macro_defs.h.
Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
4m/3m-specific blocksizes after realizing that this can be done in
bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
are used.
- The maximum cache values for 4m/3m are stll needed when computing mem
pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
terms of the regular register blocksizes are now in place.
commit af521ee6f2a77d61c98b833e85c09969987bc00d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 14:06:46 2014 -0500
Changed semantics of blocksize extensions.
Details:
- Changed semantics of cache and register blocksize extensions so that
the extended values are tracked, rather than just the marginal
extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
that these "max" query routines grab the maximum value for cache
blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
commit 07f23aefd52f5ba4960dbd46e59b180a2136b8e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 31 11:58:50 2014 -0500
Pass pack schema into packm_struc_cxk*().
Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
the pack_t schema. This allows the implementation to more easily
determine how the micro-panel is stored (row-stored column panel
or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.
commit f032ba9b1186cb02184574d339565f53d733aa42
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 30 16:21:20 2014 -0500
Reorganized packm implementation.
Details:
- Reorganized packm variants and structure-aware kernels so that all
routines for a given pack format (4m, 3m, regular) reside in a single
file.
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work
for
both 4m and 3m, and adjusted 4m/3m _cntl_init() functions accordingly.
- Added a new packm_ker_t function pointer type to
bli_kernel_type_defs.h
to facilitate function pointer typecasting in the datatype-specific
packm_blk_var2() functions.
- Deprecated _blk_var3.
- Fixed a bug in the triangular micro-panel packing facility that
affected trmm and trmm3 with unit diagonals.
commit c6793cecb70788bdf2c76ab8102504ea97be9d2a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 17:14:48 2014 -0500
Reorganized #includes for scalar macro headers.
Details:
- Reordered the #include statements in bli_scalar_macro_defs.h so that
conventional, ri-, and ri3-based macros are grouped together.
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.
commit b4da8907284345be4374f87a88679c4886ab866e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 14:10:32 2014 -0500
Whitespace, comments updates on packm_blk_var?.c.
commit 46e46a1d83da586c3dd9fd7a01eb16067abbaee1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 12:05:45 2014 -0500
Minor updates to packm blocked, cxk_3m/4m code.
Details:
- Added 'const' qualifier to inlined packing code that handles
micro-panel packing that is too large for an existing packm ukernel.
- Comment updates.
commit 908dc688b5979995eaacb3aa937f241551a8df00
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 28 11:55:12 2014 -0500
Pass pack schema into blocked packm routines.
Details:
- Rather than passing the packm blocked routines a boolean value that
represents whether the matrix is being packed to row or column storage,
we now pass in the pack schema itself.
commit a0ff6066e06075ab5f92b19247b39b92ed15f1bf
Merge: c4c99c4 d40b32b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 15:56:21 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit c4c99c4813bf9817592a7899c5d33412fe22313f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 15:52:22 2014 -0500
Renamed packm scalar from beta to kappa.
Details:
- The packm implementation (i.e. sources files in frame/1m/packm and
frame/1m/packm/ukernels), interchangeably used the names "beta" and
"kappa" to refer to the optional scalar to be applied during packing.
This commit renames all uses of "beta" to be "kappa", since "beta"
sometimes evokes the scalar specifically on the output matrix of a
level-2 or level-3 operation.
commit d40b32bc24ffbae24123e054307b3138969bb095
Merge: 9331f79 6c25c37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 13:46:36 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 6c25c379fadb50834146e1614f7b80c093c2aad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 13:44:10 2014 -0500
Consolidated unpackm ukernels into single file.
Details:
- Reorganized unpackm ukernels into a single file,
bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
ukernels in commit 4cc2b46.
commit 9331f79443223fe267676ee54c439e1ed320380c
Merge: 7fc48a7 670b639
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 10:54:21 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 670b63926a7f4fc694abc5b1582ef8a4f367f5a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 24 10:46:27 2014 -0500
Added whitespace to bli_obj_scalar_ routine calls.
Details:
- Added extra spaces to align arguments of
bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
the fact that the function was previously named
bli_obj_init_scalar_copy_of() and the name change, performed in
b444489f, was done via recursive sed commands which left subsequent
lines untouched.
commit 7fc48a7d920e07fd8e9528ab2565123f8f4e67f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 23 16:50:58 2014 -0500
Combined 4m/3m bits into an expanded bitfield.
Details:
- Combined the 4m/3m bits into an expanded bitfield, which will encode
the packing "format" of the micro-panels. This will allow for more
easily and compactly encoding additional formats.
- Other minor comment/whitespace updates to bli_type_defs.h.
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
format bitfield.
- Comment update to bli_kernel_post_macro_defs.h.
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.
commit ef0143cc1417e4815e4cafd5a464cc83fe7a1e86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Aug 23 14:02:27 2014 -0500
Renamed _ri, _ri3 packm ukernels to _4m, _3m.
Details:
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
- Updated names of cpp macros that correspond to packm ukernels.
commit b0ccac116158b5ed3316d34798748ba0c6d78672
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 21 19:21:52 2014 -0500
Cleaned up front-end layering for 4m/3m.
Details:
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
and bli_gemm4m_entry()) to hide the control trees from the code that
decides whether to execute native or 4m-based implementations. The
layering was also applied to 3m.
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
the groundwork for users to be able to change at runtime which
implementation is called by the main front-ends (e.g. bli_gemm()).
- Retired some experimental gemm code that hadn't been touched in
months.
commit bedec95451cabfa7a8906b51018a5e0572998a5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 21 18:25:48 2014 -0500
Added bli_4m API for querying 4m enabled state.
Details:
- Added bli_4m.c (and header), which defines a simple API that can be
used to query, enable, and disable 4m-based complex support in BLIS.
The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
the variable that determines the state (enabled or disabled).
- Changed bli_info*() API so that all cache and register blocksize-
related query routines return the blksz_t objects' values as they
exist at runtime, rather than return the values as determined by the
configuration system (e.g. bli_kernel.h, or defaults for those values
not specified). This sets the foundation for being able to change
those blocksizes at runtime.
commit b541b667cabfa6d41b50ad1e49209651ee6812cc
Merge: 699a815 dd61307
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Aug 20 14:44:51 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
Conflicts:
frame/3/trsm/bli_trsm_blk_var2b.c
frame/3/trsm/bli_trsm_blk_var2f.c
commit 699a8151ca3d5021e834a1784ef45dcc3a3d17cd
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Aug 20 14:43:17 2014 -0500
Some improvements to trsm parallelism
commit dd61307f55bb6bc762fe0ef0446479d6c0536723
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 20 09:52:16 2014 -0500
Minor update to sandybridge MC_S, KC_S.
Details:
- Changed sandybridge MC and KC for single-precision real to 128 and 384,
respectively.
- Updated comments in template configuration's gemm micro-kernel file
to document the new "contiguous row preference" macro.
commit d0eec4bddd740ce360d0f655362c551287cf925b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 19 15:49:19 2014 -0500
Added optional row preference to ukernel config.
Details:
- Added the ability for the kernel developer to indicate the gemm micro-
kernel as having a preference for accessing the micro-tile of C via
contiguous rows (as opposed to contiguous columns). This property may
be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
which may be defined or left undefined. Leaving it undefined leads to
the default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition
of the operation so that the transposition is induced only if there
is disagreement between the storage of C and the preference of the
micro-kernel. Previously, the only conditional that needed to be met
was that C was row-stored, which is to say that we assumed the micro-
kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated
calls to bli_func_obj_create() in _cntl.c files in order to support
the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because
it is actually ineffective. This is because the right-side case of
trsm flips the A and B micro-panel operands (since BLIS only requires
left-side gemmtrsm/trsm kernels), meaning any transposition done
at the high level is then undone at the low level.
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
invocation of the bli_obj_swap() macro.
commit 4cc2b464f29cafbfef9295b073b857fe0752f710
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 15 11:49:15 2014 -0500
Reorganized packm ukernels.
Details:
- Previously, packm micro-kernels were organized by the implied register
blocksize (panel dimension) assumed by the kernel, meaning conventional,
ri, and ri3 variations of some micro-kernel size were housed in the same
file. This commit reorganizes the micro-kernels so that all sizes reside
in the same file for each format type (conventional, ri, and ri3).
commit fcc10054a11b6fc3976986f57feccf741596cbf6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 13 12:32:06 2014 -0500
Tweaks to gemm4m, gemm3m virtual ukernels.
Details:
- Fixed a potential, but as-yet unobserved bug in gemm3m that would
allow undesirable inf/NaN propogation, since C was being scaled by
beta even if it was equal to zero.
- In gemm3m micro-kernel, we now avoid copying C to the temporary
micro-tile if beta is zero.
- Rearranged computation in gemm4m so that the temporary C micro-tile
is accessed less, and C is accessed only after the micro-kernel
calls. This improves performance marginally in most situations.
- Comment updates to both gemm4m and gemm3m micro-kernels.
commit cdcbacc2fa871317c8e7ef961ecc6d70ab22dc34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:45:38 2014 -0500
Removed redundant redef of packm ukr prototypes.
Details:
- Removed redundant macro code that redefined packm ukernel prototypes
when the previous macro was already sufficient. This helps de-clutter
the packm ukernel prototyping headers a little bit.
commit 82dac98d9032ccb598068a55ddf23d7898491e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:36:25 2014 -0500
Relocated packm ukernel #includes.
Details:
- Consolidated the #include statements for packm ukernel headers from
bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
bli_packm.h.
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.
commit 7f77856e25aad5fc6f172ed3e57b6351804e31a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 12 12:20:15 2014 -0500
Removed unused 4m/3m-related packm macro defs.
Details:
- Removed unused and unneeded s- and d-flavored macro definitions for
packm ukernels related to the complex 4m and 3m methods, as
implemented in BLIS.
commit bc1d86b2d4d436b1dfba2d0098501aaca9cbb8b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 19:01:20 2014 -0500
Sandy Bridge configuration, micro-kernel update.
Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
gemm micro-kernels for single- and double-precision real.
commit 98ec95877a95242e159b2bf0c879115a59e4c6e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 18:28:32 2014 -0500
Corrected comment for _obj_is_[row|col]_stored().
Details:
- Fixed a mistake in the comments introduced in the previous commit for
bli_obj_is_row_stored() and bli_obj_is_col_stored().
commit 43d5e419e1b424d2143817103dbee8ead797e8aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 18:20:40 2014 -0500
Reverted _obj_is_[row|col]_stored() macros.
Details:
- Rolled back recent changes to bli_obj_is_row_stored() and
bli_obj_is_col_stored() so that those macros now only inspect the
strides (row or column). It turns out that the more sophisticated
definitions introduced in a51e32e are not necessary, because these
"obj" macros are virtually never used on packed matrices, and when
they are, they can use bli_obj_is_[row|col}_packed() macros, which
inspect the info bitfield.
commit 45692e3ad4b7e1d05ac4302398df4efce04b4284
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 7 13:21:15 2014 -0500
Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
previous commit (9526ce98). Thanks to Tony Kelman for pointing
this out. (Note: a few select changes were not reverted.)
commit 9526ce98812be908bc4915f2849b657fb6ce1b49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 6 14:13:46 2014 -0500
Updated copyright headers of emscripten configuration files.
commit 30833ed71d56f231ddba21e632bcbbc90b12a97c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 6 12:12:03 2014 -0500
Minor edits to configurations' make_defs.mk files.
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
is defined first and then the other two are defined in terms of
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
little easier to read.
commit 9d61afeae2ba70fe1df07e7546f6954ea83aed12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 16:01:59 2014 -0500
CHANGELOG update (0.1.5)
commit bde56d0ecfd0ec20330fac290b91a6dca0cf94e9 (tag: 0.1.5)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 16:01:58 2014 -0500
Version file update (0.1.5)
commit 4c6ceea4be35d089630986eb5b959b9e97214077
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 4 15:49:59 2014 -0500
Added CBLAS compatibility layer.
Details:
- Added a new section in bli_config.h files of all configurations for
enabling CBLAS support. (Currently, the default is for the CBLAS layer
to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
subroutines found in CBLAS that allow calling some BLAS routines with
the return value passed as the last argument rather than as an actual
(function) return value. This was probably intended to allow CBLAS to
avoid the whole f2c debacle altogether. However, since BLIS does not
assume the presence of a Fortran compiler, we had to provide similar
routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
commit caab62dac0fb0bd0d674118f409c81680db94d29
Merge: 383631b db97ce9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 3 14:36:18 2014 -0500
Merge pull request #19 from kevinoid/fix-install-perms-error
Fix permissions error installing to non-owned directory
commit db97ce979b88c051922c2f946ce52d523c7a12c6
Author: Kevin Locke <kevin@kevinlocke.name>
Date: Sun Aug 3 12:48:04 2014 -0600
Fix permissions error installing to non-owned directory
When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:
Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of /usr/local/lib: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1
In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.
Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
commit 383631b514c3d42b724640f57644eea276cc418c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 31 14:51:48 2014 -0500
Redefined bit field macros with bitshift operator.
Details:
- Redefined many of the macros that define bit fields and bit values in
the obj_t info field using the bitshift operator (<<). This makes it
easier to reorder bit fields, or expand existing bit fields, or add
new fields. The bitshifting should be evaluated by the compiler at
compile-time.
commit 137143345dc93cc9a83da5ba88b25bac7502de86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 31 12:12:45 2014 -0500
Reimplemented unit blocksize fix in prev commit.
Details:
- Instead of inferring the storage format of the micro-panels from within
the packm variants, we now pass in a bool_t value that denotes whether
the packed matrix contains row-stored column panels or column-stored
row panels. This value can then be tested more easily inside the main
packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
now five bits, each with different meaning:
- 4: packed or not packed?
- 3: packed for 3m?
- 2: packed for 4m?
- 1: packed to panels?
- 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
subfield, and renamed some existing macros related to 4m/3m.
commit a51e32ec061941cd10119ea80115c82a40b1673f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 30 10:41:48 2014 -0500
Fixed unit register blocksize brokenness.
Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
and column-stored micro-panels when MR or NR is unit. When either
register blocksize (or both) is equal to one, inspecting the strides of
the affected packed micro-panel is no longer sufficient to determine
whether the micro-panel is a row-stored column panel or a column-stored
row panel (because both strides are unit). At that point, dimension
information is necessary when invoking the bli_is_row_stored_f() and
bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
packm_init() and then passed into the blocked variants to support the
aforementioned update.
commit c2732272f0ac680a0ad19fa9db5d587398a1479a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 29 16:37:18 2014 -0500
Removed old/unused packm variants.
commit b97fa9a5a70fe0123e5eebd999b947461d38445f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:54:09 2014 -0500
Minor usage update to build/bump-version.sh.
commit b18ba5f62d98629cdd519ff4c96fc67ec1a62fb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:52:05 2014 -0500
Added missing 'bla_' prefix to r_imag(), d_imag().
Details:
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
Ali for pointing the mis-named functions.
commit af7a8e6c042cade452130a6729377f1a3ef4e19e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:20:13 2014 -0500
CHANGELOG update (0.1.4)
commit a7537071b152ecff671f8716595d37dc09e4fd51 (tag: 0.1.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 27 18:20:12 2014 -0500
Version file update (0.1.4)
commit acff74041bf02c7b9fdfa24b507bca782a4c5fce
Merge: cdb9413 47b243e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 15:07:30 2014 -0500
Merge branch 'master' of https://github.com/flame/blis
commit cdb9413e140f8a198666250ec88fa34b5425a9c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 15:05:15 2014 -0500
Enabled threading for a couple more loops in TRSM
JC loop is now enabled for the left-sided case
IC loop is now enabled for the right-sided case
commit 47b243ef08f4101de3d936f2373343e67eaa4dd5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 23 13:41:13 2014 -0500
Call setid for early return from herk/her2k.
Details:
- Added setid call (to zero imaginary parts of diagonal elements) to
early return branches of herk_front() and her2k_front() for cases
where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
- Comment update.
commit 3e7b0db5b0e24f5fd66c60bacabc019885ddbec5
Merge: 2f8a357 ed3e33d
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 13:40:44 2014 -0500
Merge branch 'master' of https://github.com/flame/blis
commit 2f8a357de5fb55163a969d888cf059f24b78125c
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 23 13:40:12 2014 -0500
Some TRSM threading fixes/additions
commit ed3e33d548047be3283ff41268fdf716563bc542
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:40:43 2014 -0500
Tweaked behavior of herk, her2k for BLAS compat.
Details:
- Updated herk_front() and her2k_front() to explicitly set the imaginary
components of the diagonal entries of C to zero after the computation
is complete. This is needed in case downstream applications read the
full diagonal entries (i.e., including imaginary part), which could, in
the absence of this modification, accumulate numerical error from
subsequent rank-k/rank-2k updates.
- Updated BLAS compatibility wrappers for herk and her2k to return early
if:
n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
This also results in the imaginary components of diagonal entries NOT
being set to zero (see above), which is consistent with BLAS.
- Updated mkherm to use setid instead of an inlined loop over the
diagonal.
commit ea59a5c93cde1467a3715abc53dda4aecf961873
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:36:02 2014 -0500
Added new level-1d operation: setid.
Details:
- Defined a new level-1d operation, setid, which sets the imaginary
elements of an object's diagonal to a single scalar. This can be
useful, for example, when trying to make the diagonal of a Hermitian
matrix real-valued.
commit 8965a965931318619ceaebd7c32edccf3022d0c7
Merge: 1785efb 5b73e80
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:34:32 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 1785efb5420bc7b9c850a068cb5d99837071e877
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 22 14:33:01 2014 -0500
Minor improvements to invertd and setd.
Details:
- Added missing call to invertd_check() from front-end.
- Changed setd front-end call of scald_check() to setd_check().
commit 5b73e80b71c054c1945a06aff044ef629bc1a9a0
Merge: a41e68e 20690fe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 18 12:21:20 2014 -0500
Merge pull request #16 from Maratyszcza/emscripten
Emscripten port
commit a41e68e09e73b999fab0bb430a43dccfc63aab45
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 17 13:25:56 2014 -0500
Reimplemented BLIS initialization/finalization.
Details:
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
for thread-safety. Also added lots of explanatory comments.
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
suffix, and reimplemented for simplicity. Updated all invocations
in BLAS compatibility layer to use _auto() suffix.
commit 36358948ea75074bda32a9f8c008f835b87d21db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 17 10:58:10 2014 -0500
Retired frame/3/gemm/other directory.
Details:
- Removed frame/3/gemm/other directory, which contained some outdated
and/or experimental variants.
commit c73261f17edf589e76bdbe297702a1fbbd69275f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:23:51 2014 -0500
More minor cleanups post-copyright update.
commit 2a09d24463d358be6243b24f112fad057c2aefe0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:17:09 2014 -0500
Reverted power7 symlinks destroyed by sed script.
Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
after recursive-sed.sh mistakenly replaced them with copies of the
actual files to which they referred. Meant to include this in previous
commit.
commit 7ed415824d3b2e78541b6f64e404ca5347c06d3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:14:33 2014 -0500
Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
Meant to include this change in previous commit.
commit 5c2c6c85616834ff2716ece083118201d9df6dde
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 16:05:03 2014 -0500
Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
2012).
commit fcec68cda3f6e90ae055e7304e6674c1c5c8d010
Merge: 94c0df7 4a20ed1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 11:35:34 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit 94c0df797eda377931f29a41ba6a89c0ed58daca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 14 11:24:36 2014 -0500
Changed order of zero dim / error checking.
Details:
- Updated level-2 and level-3 internal back-ends so that the operation's
_check() function is called BEFORE any attempt to return early due to
the presence of zero dimensions. This ordering makes more sense because
(for example) object dimensions should match even if one of them is
zero. Previously, a dimension mismatch could result in an early return
with no error message.
- Updated bli_check_object_buffer() so that NULL buffers result in an
error only if the object is dimensionally non-empty (i.e., only if both
of the object's dimensions are non-zero). This allows BLIS operations
to be performed on dimensionally empty objects (i.e., where at least one
dimension is zero).
- Updated the error message associated with bli_check_object_buffer()
to mention the newly relaxed constraint mentioned above, vis-a-vis
non-zero dimensions.
commit 20690fe3018ce17c8df61ce0bffecaa7911dc3a5
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jul 13 22:50:56 2014 -0700
Emscripten port
commit 4a20ed1a3f5e9e5232df30aa0e568e6c00c56ce1
Merge: 6a515e9 8ccdfae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:45:01 2014 -0500
Merge pull request #14 from Maratyszcza/master
Support "make test" for PNaCl configuration
commit 6a515e988f2ae1628258a6dec2c0e9cf2d04790f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:38:33 2014 -0500
Implemented dsdot() and sdsdot() in compat layer.
Details:
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
with actual implementations. (These routines are so rarely used that
this log message will probably lead to some people learning of their
existence for the first time.)
commit 255668ddd1004552c6cc65035ec6486671ce99bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:30:44 2014 -0500
Inserted gemv beta-scaling bug into compat layer.
Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
y of non-zero length and a vector x of zero length results in no action.
Given that the operation is y := beta*y + A*x, many (most?) individuals
would expect vector y to still be scaled by beta. BLIS, when called
natively, handles these cases intuitively (with beta scaling).
Unfortunately, many BLAS test suites actually check for the way this
situation is handled. Therefore, we have decided to implement this "bug"
in the compatibility layer so as to provide "bug-for-bug" compatibility
with BLAS.
commit 570a154581bdb353fa13a219c7cb3c81d3dceffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 12 17:51:05 2014 -0500
Comment/formatting updates to build scripts.
Details:
- Minor updates to comments and formatting in bump-version.sh and
update-version-file.sh scripts.
commit 26cd81990631ff799791629206e068126ff9e3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 13:16:07 2014 -0500
Added bli_info_*() query functions.
Details:
- Added a new API family, bli_info_*(), which can be used to query
information about how BLIS was configured. Most of these values are
returned as gint_t, with the exception of the version string which
is char*.
- Changed how the testsuite driver queries information about how BLIS
was configured (from using macro constants directly to using the
new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
an actual naming conflict, but because the name 'info_t' would now be
somewhat misleading in the presence of the new bli_info API, as the
two are unrelated).
commit 970b43141697d8c31a033f59513bb59d7cc78ab0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 09:30:00 2014 -0500
Minor bugfixes to BLAS compatibility layer.
Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
commit 8ccdfaef4c42ad8957af8607a1a9ee29b9277d4b
Author: Marat Dukhan <maratek@gmail.com>
Date: Tue Jul 8 23:14:36 2014 -0700
Replicated logic from testsuite/Makefile in top-level Makefile to support make test
commit caa6507ff3724c80d60987f309b8bbc5b50a9841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:25:27 2014 -0500
Minor cleanup to standalone test drivers.
Details:
- Very minor code changes to standalone test drivers in 'test' directory.
- Added *.so files to '.gitignore'.
commit 6c65e9a58fe55990ebb99ec3986443e18af35338
Merge: cb12e45 daca500
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:13:49 2014 -0500
Merge branch 'master' of github.com:flame/blis
commit cb12e456f94c196c093e52f02a7cbca0032fc86e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:07:46 2014 -0500
Fixed possible level-3 inf/NaN issue when beta=0.
Details:
- Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
(instead of scaling by beta) when beta is zero. This will stamp out
any possible infs or NaNs in the output matrix, if it happens to be
uninitialized. Thanks to Tony Kelman for isolating this bug.
commit daca500db5e2448ba0da8047b75eb0f88d9f40e3
Merge: ab3bc91 4702350
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 12:52:52 2014 -0500
Merge branch 'master' of http://github.com/flame/blis
commit 4702350278af31f662b458127777dd4d85a3192f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 3 11:48:23 2014 -0500
Defined _ukernel_void() wrappers to micro-kernels.
Details:
- Added wrappers for micro-kernels so that users may invoke the
micro-kernels without knowing what the function names actually are.
This is useful when an application wishes to call the micro-kernel
from a shared library instance of BLIS, where the application may not
necessarily have the luxury of grabbing the micro-kernel name(s) from
C preprocessor macros at compile-time. Also, since the wrappers use
void* pointers, one's environment does not need to be aware of some
BLIS types such as scomplex and dcomplex. These wrappers now join the
level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
test suite modules, and replaced calls to them with calls to the new
wrappers mentioned above.
commit ab3bc9153b914fbaf259e15b66c91d628e7c8661
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 11:19:43 2014 -0500
Fixed a bug for TRSM when BLIS_ENABLE_MULTITHREADING is not set but the multithreading environment variables are turned on
commit b8134b720b985783ee6a582a3eb5d6c51f00d051
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 2 16:02:39 2014 -0500
Quick and dirty multithreading for TRSM
Should work fine for small number of threads (up to 8 or maybe even 16).
However, performance is yet untested.
This parallelizes the "JR" loop for the left sided cases
and the "IR" loop for the right sided cases.
Future work is to parallelize the outer loops as well.
commit e8ef69692831db07ddbe9485a5e504ac3f03e496
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 2 14:59:27 2014 -0500
Added shared library support to build system.
Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.
commit b80df0f2cffb015da02e70a82b8512da9891ab67
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:52:39 2014 -0500
Added bump-version.sh script to 'build' directory.
Details:
- Added a bash script, bump-version.sh, to aid in incrementing the BLIS
version string.
commit 9ef1f1e21d083697fc730e48d7d9169c201f3da2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500
CHANGELOG update (0.1.3)
commit 036cc634918463b1caa0fd89c9a211f2f5639af7 (tag: 0.1.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500
Version file update (0.1.3)
commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:43:26 2014 -0500
Reverting version file to test new version script.
Details:
- Changed version file contents to 0.1.2 so that I can test out a new
version file bumping script.
commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 11:22:50 2014 -0500
Added 'version' file.
commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:42:29 2014 -0500
Removed 'version' from .gitignore file.
commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25
Merge: 7101a8e b693b0c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:39:05 2014 -0500
Merge pull request #11 from Maratyszcza/stable
[sc]axpy kernels for PNaCl
commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 22 13:44:25 2014 -0700
[SC]AXPY kernels for PNaCl
commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca 020a831
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 19 21:46:50 2014 -0500
Merge pull request #10 from Maratyszcza/stable
Portable Native Client port
commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:58:26 2014 -0700
Code clean-up in PNaCl port
commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:45:44 2014 -0700
Optimized dot product kernels for PNaCl
commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:43:25 2014 -0700
Use AR rcs flags for PNaCl target to avoid warning
commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:11:34 2014 -0700
PNaCl configuration: use pnacl-ar instead or ar (fixes build issue on Mac)
commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:10:25 2014 -0700
Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features
commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:08:46 2014 -0700
Fix inconsistent VERBOSE macro in Makefile
commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 18:41:30 2014 -0400
Reformatted PNaCl GEMM kernels
commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 08:44:31 2014 -0400
CGEMM and ZGEMM kernels for PNaCl
commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 06:27:37 2014 -0400
SGEMM and DGEMM kernels for PNaCl
commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b679 7118f87
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 14 15:10:13 2014 -0500
Merge pull request #9 from tkelman/memalign_windows
Use _aligned_malloc instead of posix_memalign on Windows
commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tony@kelman.net>
Date: Sat Jun 14 06:53:20 2014 -0700
Use _aligned_malloc instead of posix_memalign on Windows
commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Jun 6 12:41:55 2014 -0500
Only include omp.h if BLIS_ENABLE_OPENMP is set
commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 5 10:54:16 2014 -0500
CHANGELOG update (for 0.1.2).
commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd (tag: 0.1.2)
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Jun 2 13:40:57 2014 -0500
Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi
commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 21 11:34:42 2014 -0500
Fixed ldim alignment bug in core2 gemm ukernel.
Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
a segmentation fault if a column-stored matrix's starting address was
aligned, but its leading dimension was such that its second column was
unaligned. Basically, the micro-kernel was assuming that aligned load
instructions were safe when they actually were not. An extra condition
that checks the alignment of cs_c (ie: the leading dimension in the
column storage case) has now been added. Thanks to Michael Lehn for
reporting this bug.
commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d607 21fb089
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 20 09:53:19 2014 -0500
Merge pull request #8 from tlrmchlsmth/master
Added multithreading to most level-3 operations.
commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 19 20:38:55 2014 -0700
Reverting changes dunnington and reference configs
Now they are unchanged from the main branch of BLIS
commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 13:44:14 2014 -0500
Fixed rounding error in bli_get_range_weighted
commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 12:23:37 2014 -0500
Fixed bug with disabling JC loop threading for right sided trmm
commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 16:20:06 2014 -0500
Disabled parallelism for right-sided TRMM JC loop
The loop has dependent iterations.
commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 14:59:04 2014 -0500
Fixed bug with bli_get_range_weighted
commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue May 13 17:14:46 2014 -0500
Allowed threading to be turned off
No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP
Also fixes a bug with bli_get_range_weighted
commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 12 17:26:19 2014 -0500
Disabled multithreading of the kc loop
commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 12:28:00 2014 -0500
Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity
commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065 8c5d607
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 11:46:35 2014 -0500
Merge http://github.com/flame/blis
commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 29 12:26:12 2014 -0500
Added _check() routines for fprint[mv], rand[mv].
Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
front-ends.
commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 28 16:48:25 2014 -0500
Changed treatment of NULL object buffers.
Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
the buffer address being attached to be non-NULL. This is acceptable
because the user was already able to create and use objects with NULL
buffers (via bli_obj_create_without_buffer(), which initializes the
buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
into nearly all operations' _check() or _int_check() functions. This
allows BLIS to abort peacefully if a computational routine is called
with an object containing a NULL buffer. By contrast, under such
conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
so that zero dimensions are checked first (and if found, execution
returns with trivial or no computation). This resolves issue #7. Thanks
to Jack Poulson for reporting this bug.
commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e2443 7c61959
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 23 12:30:19 2014 -0500
Merge http://github.com/flame/blis
commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 17:18:36 2014 -0500
Can now query register blocksizes from blk algs.
Details:
- Added a new field to blksz_t objects that allows one to attach a
sub-object. Doing this allows us to associate a register blocksize with
any given cache blocksize. That way, the register blocksize can be
queried wherever the cache blocksize would normally be accessible
(e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
blocksizes are attached to the cache blocksizes after they are created.
commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 15:35:30 2014 -0500
Minor cleanups to level-2 _cntl.c files.
Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
4m/3m variants).
- Removed test/old/test_blis2.c.
commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Tue Apr 8 17:50:44 2014 +0000
Some fixes for the bgq kernels
commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:43:44 2014 -0500
Add -openmp to ldflags as well
commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:37:50 2014 -0500
Added -openmp flag to Xeon Phi build for convenience
commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:31:15 2014 -0500
Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC
commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:29:10 2014 -0500
Fix for tree barrier freeing bug
commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 15:09:10 2014 -0500
Bunch of minor fixes
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)
Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 12:13:29 2014 -0500
Changed default blocking factor to default double precision MR and NR
commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 11:38:11 2014 -0500
Added faster tree barriers necessary for performance for Xeon Phi
Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set
commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 10:22:48 2014 -0500
Freeing thread info paths.
Also made herk IC and JC loops do weighted partitioning
commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39 21a0efb
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 09:54:54 2014 -0500
Merge http://github.com/flame/blis
Conflicts:
kernels/bgq/1/bli_axpyv_opt_var1.c
kernels/bgq/1/bli_dotv_opt_var1.c
commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Fri Apr 4 14:50:03 2014 +0000
Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:38:44 2014 -0500
Fixed follow-up to issue #6.
commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:24:34 2014 -0500
Fixed issue #6 (incorrect 'restrict' usage).
Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
(However, there may be other instances of similar misuse elsewhere in
BLIS.) Thanks to Jeff Hammond for reporting this issue.
commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 12:25:45 2014 -0500
Added #include "arm_neon.h" to ARM gemm ukernel.
Details:
- Inserted #include "arm_neon.h" into gemm ukernel source file for
arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.
commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Apr 3 10:30:03 2014 -0500
Added barriers needed prior to doing scalar reset for rank-k updates.
commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 14:34:31 2014 -0500
Attempted to fix uninitialized variable warnings.
Details:
- Added initialization statements to various macros used in level 1m and
1m-like operations. I wasn't able to reproduce the reported behavior,
so hopefully this takes care of it. Thanks to Jeff Hammond for the
report.
commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 12:57:24 2014 -0500
Use generic paths for toolchain in POWER7.
Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 28 15:15:48 2014 -0500
Fixed race condition involving scalar reset
commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 17:06:45 2014 -0500
Made barrier after packing implicit.
This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not the outer packing routines.
This allowed, for instance, the block of B to not be finished being packed before computation to occur.
commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 14:18:46 2014 -0500
Some fixes for the internal functions,
was innappropriately only having thread chief do some things.
commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 17:19:46 2014 +0000
Added test drivers for level 3 BLAS that run tests in parallel using MPI
commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 15:39:05 2014 +0000
Some fixes for the bgq configuration
commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 24 15:21:42 2014 -0500
Initial commit to enable threading in TRSM,
Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm
commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2e fd3e32a
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:54:35 2014 -0500
Merge https://github.com/flame/blis
commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:43:36 2014 -0500
Parallelized trmm and trmm3
Also fixed bugs in packm
commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 20 13:59:48 2014 -0500
Refined INSERT_GENTFUNC macro usage.
Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
exactly the number of arguments needed for the particular operation or
variant being defined. Many operations were using INSERT_GENTFUNC
macros that expected one auxiliary argument even though none were
needed. Those instances have now been updated. Most of these instances
were in the level-0 and -1v operations, as well as some operations
defined in frame/util.
commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 15:47:54 2014 -0500
Minor simplifications to trmm, trsm macro-kernels.
Details:
- Simplified some code that would have allowed the diagonal of a trmm
or trsm triangular matrix to intersect the short end of a micro-panel.
This is disallowed via higher-level constraints on cache blocksizes, so
this code was never needed and only served to obfuscate.
- Updated some comments in trmm, trsm macro-kernels.
commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 12:35:17 2014 -0500
Reorganized norm operations.
Details:
- Completely reoganized norm operations:
- Renames:
- fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
- absumv -> norm1v (vector 1-norm)
- New operations:
- norm1m (matrix 1-norm)
- normiv, normim (infinity-norm)
- amaxv (BLAS-like absolute maximum value index)
- asumv (BLAS-like absolute sum)
- Deprecated absumm, as it did not correspond to any actual norm.
(However, an inlined version now exists in the testsuite module for
randm.)
commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 19 11:21:16 2014 -0500
Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state
Now just performed by the master thread.
commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 16:37:28 2014 -0500
Fixed a barrier bug and a thread decorator bug
commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 15:23:09 2014 -0500
Fixing function pointer issues with thread decorator
commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 14:35:37 2014 -0500
Enabled threading for packm blocked variants 3 and 4
commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 13:26:27 2014 -0500
Added decorator for calling parallelized intermal functions
Will allow for easy support for different threading models
commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 17:15:35 2014 -0500
Fixing some bugs with herk parallelization
commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 15:00:47 2014 -0500
Initial multithreading support for HERK
commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 11:39:32 2014 -0500
Switched to using environment variables to control threading.
The environment variables all follow the format BLIS_X_NT,
where X is the index of the loop as described in our paper
Anatomy of High Performance Many-Threaded Matrix Multiplication.
These indices are IR, JR, IC, KC, and JC.
Also enabled parallelism for hemm and symm, but these are currently untested.
commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 14:16:08 2014 -0500
Some fixes to gemm thread info tree creation,
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED
commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 12:08:17 2014 -0500
Added files specific to threading for gemm and packm operations
commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:47:28 2014 -0500
Added single threaded thread info data structures specifically for gemm and packm
commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a0 b3bff63
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:16:21 2014 -0500
Merge branch 'master' of https://github.com/tlrmchlsmth/blis
commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:14:33 2014 -0500
Modifying the thread info data structures
This change makes each operation have its own thread info type,
allowing more fine control of threading in operations that have different types of suboperations
commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 3 14:31:44 2014 -0600
Minor fixes to sumsqv, abmaxv.
Details:
- Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
the vector (or matrix) contains a NaN.
- Minor change to bli_abmaxv_unb_var1() to more closely mimic the
behavior of netlib BLAS's izamax(). There, a "less than or equal to"
operator is used in the search instead of "less than", which would
change the element index returned if there were multiple maximum values.
- Added macro function definitions for bli_isinf() and bli_isnan(), which
are currently implemented in terms of isinf() and isnan() from math.h.
commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb e8757b0
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:53:24 2014 -0600
Merge https://github.com/flame/blis
commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c4 c2b2ab6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:46:23 2014 -0600
Merge https://github.com/flame/blis
Conflicts:
frame/1m/packm/bli_packm_blk_var1.c
commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:40:07 2014 -0600
Use "%ld" as int format specifier in fprintm.
Details:
- Changed "%d" to "%ld" when printing integers via bli_fprintm().
- Meant to include this in previous commit.
commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:32:57 2014 -0600
Fixed various bugs when C99 complex is enabled.
Details:
- Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
elsewhere in the framework that were not yet set up to work properly
when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
- Extensive changes to f2c-derived files in frame/compat/f2c to allow
C99 complex storage. Most of these changes center around accessing
real and imaginary components via bli_?real()/bli_?imag() accessor
macros, and setting of values via bli_?sets() assignment macros.
(Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
was broken.)
commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:29:46 2014 -0600
Added support for parallelism in gemm micro-kernel
commit bfe214b633765ed40b57b330fbb84c332663aa40
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 15:53:10 2014 -0600
Fixed bug with parallel packing, and bug with allocating an array of thread infos
In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
This dependeny was removed, allowing each iteration to be executed in parallel.
Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
commit 6193d9ceea552e67170dba45abde04c64271c705
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 14:09:19 2014 -0600
Fixed bug in thread trees
commit ac5a2de1d17ffd460b00fee9757898525a09abae
Merge: 01b125e bd3c7ec
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 11:59:33 2014 -0600
Merge branch 'master' of https://github.com/tlrmchlsmth/blis
commit 01b125e815f19410e8e0611d088b84570e499e93
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 11:55:45 2014 -0600
First pass at adding parallelism to BLIS.
Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
Currently, gemm blocked variants 1f and 2f, and packm variant blocked variant 1 is parallelized.
commit c2b2ab62707e4174892aff3ce65f36f54878fae5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 26 12:46:45 2014 -0600
Deprecated panel stride alignment in bli_config.h.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
configurations. It was already going unused in packm_init() since the
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
micro-kernels.
commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 25 17:58:42 2014 -0600
CHANGELOG update (for 0.1.1).
commit fde5f1fdece19881f50b142e8611b772a647e6d2 (tag: 0.1.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 25 13:34:56 2014 -0600
Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
macro constants. Examples:
BLIS_SAXPYV_KERNEL_REF
BLIS_DDOTXF_KERNEL_REF
BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
with a common base name; [sdcz] datatype flavors of each kernel or
micro-kernel (level-1v, -1f, or 3) may now be named independently.
This means you can now, if you wish, encode the datatype-specific
register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
undefined in bli_kernel.h will default to the corresponding reference
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
datatype chars to match the number of types the kernel WOULD take in
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
your level-1v/-1f kernels. The framework now prvides a _kernel()
function which serves as the obj_t wrapper for whatever kernels are
specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
longer need to include any prototyping headers from within
bli_kernel.h. The framework now generates kernel prototypes, with the
proper type signature, based on the kernel names defined (or defaulted
to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
kernel is left undefined by bli_kernel.h, but its same-precision real
domain equivalent IS defined, BLIS will use a 4m-based implementation
for the datatype x implementations of all level-3 operations, using
only the real gemm micro-kernel.
commit 15b51e990f1d21333b5f7af97c211756247336e5
Merge: 6363a9f fc04b5e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 21 09:04:32 2014 -0600
Merge branch 'master' of github.com:fgvanzee/blis
commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
Merge: b29e1c2 d1813c9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 21 09:04:13 2014 -0600
Merge pull request #3 from figual/master
New ARM armv7a kernels and Assembly file consideration in Makefile
commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
Author: Francisco Igual <figual@pandaboard.(none)>
Date: Fri Feb 21 15:14:31 2014 +0100
Added new armv7a micro-kernels and configuration files from Werner Saar.
commit 0cd098c03a000ed9426a7e9135190696da8cadbc
Author: Francisco Igual <figual@pandaboard.(none)>
Date: Fri Feb 21 15:12:30 2014 +0100
o Modified Makefile to consider .S assembly microkernels.
commit 6363a9f658257fe3d814a3dce5308f807adb54a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 19 17:00:52 2014 -0600
Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
virtual complex micro-kernels which are implemented via only real
domain micro-kernels. Two new implementations are provided: 4m and 3m.
4m implements complex matrix multiplication in terms of four real
matrix multiplications, where as 3m uses only three and thus is
capable of even higher (than peak) performance. However, the 3m method
has somewhat weaker numerical properties, making it less desirable
in general.
- Further refined packing routines, which were recently revamped, and
added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
commit b29e1c2b278c177e104c84ba462820ee8296df6c
Merge: ee60377 bd3c7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 14 14:11:54 2014 -0600
Merge pull request #2 from tlrmchlsmth/master
Fixes and improvements to xeon phi implementation.
commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 14:05:57 2014 -0600
Removing changes to input.general and input.operations
commit ce066863683cb4e910270cf8ab8e138b01ff3358
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 13:40:24 2014 -0600
Fixed more Xeon Phi bugs, especially with scattered update
commit 31134b5c7076423aee1b4f494e925f27171d97e6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Feb 14 11:19:44 2014 -0600
Some fixes, changes, and improvements to the microkernel to the Xeon Phi
commit ee60377e467862b9d8a7205c45dce5cf66c78c46
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 14:03:31 2014 -0600
Shifted some fields in info_t.
Details:
- Shifted the pack order, pack buffer type, and structure type fields
to make room for an extra bit in the pack type/status field.
commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 09:29:55 2014 -0600
Minor fixes to trsm consistent with prev on trmm.
Details:
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trsm_ll and trsm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.
commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 13 09:19:56 2014 -0600
Fixed obscure bug in trmm_ll, trmm_lu.
Details:
- Fixed an obscure bug in left-hand trmm that would only manifest when
non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
are used.
- Removed use of bli_min() and bli_max() that were only being used to
try to support situations where the diagonal would intersect the
short end of some micro-panels, which is situation that is disallowed
at a higher level by various constraints on the register and cache
blocksize. This only affected trmm_ll and trmm_lu.
- Use panel stride as passed into the macro-kernel rather than compute
it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.
commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 11 10:54:19 2014 -0600
Fixed an obscure bug in packm_cxk().
Details:
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
from ldp, which is always equal to PACKMR or PACKNR. The problem with
this is that the pack ukernels were implicitly assuming that the
panel dimension of the panel being packed was equal to ldp, which
is not the case when the register blocksizes extensions are non-zero
(ie: when PACKMR > MR or PACKNR > NR, whichever is applicable). This
problem has been fixed by passing ldp into the pack ukernels, which
now walk through the packed micro-panel region by incrementing by this
value, rather than incrementing by the inherent panel dimension value
assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
- Also fixed a very minor edge case inefficiency whereby pack ukernels
smaller than the default were not being used in edge cases, and instead
those situations were being handled by scal2m. This is related to the
issue above, because the pack ukernel itself was being chosen based on
ldp instead of the panel dimension.
commit b7da57b282c5a5e2208946e60309d2352f55351d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 11 10:28:23 2014 -0600
Updated calls to packm_blk_var2() in testsuite.
Details:
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
_var1(). Meant to include this in previous commit.
commit c255a293e25b2223c88e8800267cd06ad2a90041
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 10 14:31:24 2014 -0600
Consolidated packm_blk_var2 and var3.
Details:
- Consolidated the functionality previously supported by packm_blk_var2()
and packm_blk_var3() into a new variant, packm_blk_var1().
- Updates to packm_gen_cxk(), packm_herm_cxk.c(), and packm_tri_cxk()
to accommodate above changes.
- Removed packm_blk_var3() and retired packm_blk_var2() to
frame/1m/packm/old.
- Updated all level-3 _cntl_init() functions so that the new, more
versatile packm_blk_var1 is used for all level-3 matrix packing.
commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 9 10:07:37 2014 -0600
Refactored packm variants.
Details:
- Revised packm_blk_var2() and _var3() by encapsulating the general,
hermitian/symmetric, and triangular panel-packing subproblems into
separate functions: packm_gen_cxk(), packm_herm_cxk(), and
packm_tri_cxk(), respectively. Also, homogenized the packm code as
well as the new specialized packm_*_cxk() code to further improve
readability.
commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 7 11:27:15 2014 -0600
Renamed enumerated type in testsuite and modules.
Details:
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
renamed all corresponding "impl" variables to "iface".
commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 18:26:35 2014 -0600
Employ simpler INSERT_ macro for ref ukernels.
Details:
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
argument--the base name of the function--and employed this macro
in the reference micro-kernel files instead of the _BASIC macro,
which takes one auxiliary argument. That argument was not being
used and probably just acted to unnecessarily obfuscate.
commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 18:06:42 2014 -0600
Fixed some instances of sloppy 'restrict' usage.
Details:
- Fixed some technical incorrectness with some usage of the 'restrict'
keyword in the reference trsm micro-kernels.
- Tweak to testsuite/Makefile that causes rebuild if libblis was
touched.
commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 17:31:19 2014 -0600
Updated comments in macro-kernels.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
NR = 4. It's always good for MR != NR in the reference configuration
since it may help uncover bugs related to non-square micro-kernels.
commit 8fd292aa78950bcdf556605718f09d13f9575abc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 6 14:32:21 2014 -0600
Pass panel dimensions into macro-kernels.
Details:
- Modified the interfaces to the datatype-specific macro-kernels so that:
- pd_a and pd_b are passed in (which contain the panel dimensions of
packed panels of a and b).
- rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
- Modified implementations of datatype-specific macro-kernels so pd_a,
pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
and PACKNR, respectively.
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
header file bli_kernel_post_macro_defs.h.
commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 5 11:19:10 2014 -0600
Deprecated incremental blocksize macro const defs.
Details:
- Removed macro constant definitions related to incremental blocksizes
from all configurations' bli_kernel.h files. This change is minor and
is mostly a cleanup related to a previous commit.
commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 4 20:15:19 2014 -0600
Comment updates (removed vestiges of "bd").
commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 4 09:15:19 2014 -0600
Added early returns for "object is zeros" case.
Details:
- Added some logic to packm_init(), pack_int() and gemm_int() so that
(a) objects marked as BLIS_ZEROS are not packed, and (b) those
objects are not computed with. This functionality is not currently
needed by any existing implementations, but may be used in the
future.
commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 3 13:15:25 2014 -0600
Added 'f' on some gemm and trmm blocked variants.
Details:
- Added 'f' to some block variant files/functions to be consistent with
other file/functions' naming convention. Here, the f indicates
partitioning in the "forward" direction.
commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 3 11:07:01 2014 -0600
Removed redundant non-gemm blksz_t creation.
Details:
- Removed code that creates duplicate blksz_t objects for herk, trmm,
and trsm. Instead, the gemm blksz_t objects are accessed via extern
and used directly. This reduces the amount of code associated with
each of the three _cntl_init() and _cntl_finalize() function.
commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 29 14:02:08 2014 -0600
Introduced new level-3 front-end layer.
Details:
- Added new _front() functions for each level-3 operation. This is done
so that the choosing of the control tree (and *only* the choosing of
the control tree) happens in what was previously the "front end"
(e.g. bli_gemm()). That control tree is then passed into the _front()
function, which then performs up-front tasks such as parameter
checking.
commit 251c5d112196d37b183e554bc9d406104aed65fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 28 19:40:29 2014 -0600
Removed redundant hemm, her2k control trees.
Details:
- Removed code that generated a control tree specifically for hemm and
symm. Instead, the gemm control tree is now configured so that it
works for gemm, hemm, or symm.
- Retired most her2k code, as it was not being used. (Currently, her2k is
implemented as two invocations of herk.) I couldn't think of many
situations where her2k variants were needed.
- Removed some older her2k code.
commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 27 11:13:00 2014 -0600
Embed func_t microkernel objects in control trees.
Details:
- Modified all control tree node definitions to include a new field of
type func_t*, which is similar to a blksz_t except that it contains
one function pointer (each typed simply as void*) for each datatype.
We use the func_t* to embed pointers to the micro-kernels to use for
the leaf-level nodes of each control tree. This change is a natural
extension of control trees and will allow more flexibility in the
future.
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
from the incomming (previously ignored) control tree node and then pass
the queried pointer into the datatype-specific macro-kernel code, which
then casts the pointer to the appropriate type (new typedefs residing
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
is, determined when the datatype-specific macro-kernel functions are
instantiated by the C preprocessor).
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
base names if they do not exist already, and then uses those to build
datatype-specific micro-kernel function names. This will allow
developers extra flexibility if they wanted to, for example, name each
of their datatype-specific micro-kernels differently (e.g. double
real might be named bli_dgemm_opt_4x4() while double complex might be
named bli_zgemm_opt_2x2()).
- Inserted appropriate code into _cntl_init() functions that allocates
and initializes a func_t object for the corresponding micro-kernels.
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
and then reused via extern wherever possible.
commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 24 10:38:29 2014 -0600
Removed commented mixed domain macro-kernel code.
Details:
- Removed commented-out code from macro-kernels that was supposed to
facilitate implementing mixed domain (complex times real) matrix
multiplication. This functionality is still (probably possible),
but I'm getting tired of looking at the code every time I edit
a macro-kernel. Plus, there are probably ways of doing it at a
higher level, via control trees.
commit 29778be1119f1a884330d7f8dc424a2df4101d58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 22 16:03:11 2014 -0600
Removed b_aux field from cntl nodes.
Details:
- Removed b_aux field from all control tree node definitions. This field
was being used in certain optimizations (incremental blocking) that were
not actually being employed within BLIS, and are probably not employed
by others.
- Updated all _cntl_obj_create() function definitions and invocations
according to above change.
- Retired bli_gemm_blk_var4.c, which was one such function that employed
incremental blocking, but which was never called by BLIS itself.
commit 06ac727a42ec9e832c7832745036702014638f99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 15 16:44:52 2014 -0600
Updated some comments in level-3 front ends.
commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 15 11:40:12 2014 -0600
Consolidated pack_t enums; retired VECTOR value.
Details:
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
makes room in the three pack_t bits of the info field of obj_t so that
two values are now unused, and may be used for other future purposes.
- Updated sloppy terminology usage in comments in level-2 front-ends.
(Replaced "is contiguous" with more accurate "has unit stride".)
commit ddc8c1c379b4787be5954802906593d7ea144452
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 13 14:55:43 2014 -0600
Suppress warning in Makefile (UNINSTALL_LIBS).
Details:
- Redirect errors to /dev/null when using 'find' to locate libraries that
would be uninstalled upon executing "make uninstall-old". Before, if the
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
or directory" message was emitted. This message was harmless, but is now
suppressed in this situation.
commit f8f67d7251bffc05020e20527c100c8115fd5e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 10 09:06:11 2014 -0600
Typecast bli_getopt() return value in testsuite.
Details:
- In the test suite driver, inserted an explicit typecast of the return
value of bli_getopt() prior parsing. The lack of typecast caused a
problem on at least one system whereby a return value of -1 was
interpreted as garbage character. Thanks to Francisco Igual for finding
and submitting this fix.
commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 10 08:48:07 2014 -0600
Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
double precision real gemm microkernel in kernels/arm/neon/3.
commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 9 12:08:37 2014 -0600
Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
a user can build a BLIS library outside of the top-level directory of
the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
which will compile, link, and run the testsuite binary. This works even
if the build directory is externally located, thanks to the test suite
binary's new -g and -o command-line options. Also, when creating the
test suite via the top-level Makefile, the linking is against the
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 8 16:09:26 2014 -0600
Implemented bli_getopt().
Details:
- Added bli_getopt.c and .h files to frame/base. These files implement
a custom version of getopt(), which may be used to parse command line
options passed into a program via argc/argv. I am implementing this
function myself, as opposed to using the version available via unistd.h,
for portability reasons, as the only requirements are string.h (which
is available via the standard C library).
- Modified test suite to allow the user to specify the file name (and/or
path) to the parameters and operations input files: -g may be used to
specify the general input file and -o to specify the operations input
file). If -g or -o or both are not given, default filenames are assumed
(as well as their existence in the current directory).
commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 13:28:36 2014 -0600
Updated template micro-kernels to use auxinfo_t.
Details:
- Updated template micro-kernel implementations (located in
config/template/kernels), to adhere to the new auxinfo_t interface.
Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
and the BLAS compatibility layer).
commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jan 6 12:13:26 2014 -0600
Removed error checks in netlib->BLIS param mapping
Details:
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
If the char value input to these functions was not one of the defined
values, bli_check_error_code() with the appropriate error code value
would be called, resulting in an abort(). This was unnecessary and
redundant since these routines are currently only used within the
BLAS compatibility layer, and they are only called AFTER parameter
checking has already been performed on the original BLAS char values.
If the application tried to override xerbla() to prevent an abort()
from being called, this error checking would still get in the way.
Thus, instead of reporting the error situation to the framework (ie:
calling abort()), an arbitrary BLIS parameter value is now chosen and
the function returns normally. Thanks to Jeff Hammond for finding and
reporting this issue.
commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 3 12:29:13 2014 -0600
Updated year in copyright headers to 2014.
commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 20 14:10:26 2013 -0600
Store variable panel strides in trmm/trsm auxinfo.
Details:
- Changed the value being stored into the auxinfo_t structure in trmm
and trsm macro-kernels. Whereas before we stored whatever value was
provided to the macro-kernel implementation via ps_a/ps_b, now we
store the stride that will advance to the next variable-length
micro-panel of the triangular matrix A (left) or B (right).
- Whitespace changes to the files affected above.
commit e3a6c7e77667fd749248df3f75f880266c3136ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 19 16:29:31 2013 -0600
Macroized conditionals for a2/b2 in macro-kernels.
Details:
- Replaced conditional expressions in macro-kernels related to computing
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
invocation, bli_is_last_iter(), that tests the same condition.
- Updated gemm_ukr module to use auxinfo_t argument.
- Whitespace changes in test suite ukr modules.
commit a0331fb10a50393e31d16339053b75b944132da1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 19 14:50:11 2013 -0600
Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
with a pointer to a new datatype, auxinfo_t, which is simply a struct
that holds a_next and b_next. The struct may hold other auxiliary
information that may be useful to a micro-kernel, such as micro-panel
stride. Micro-kernels may access struct fields via accessor macros
defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
as well as macro-kernels (for declaring and initializing the structs)
according to above change.
commit 392428dea4001fe4384efe29f6cde32f8abeeb35
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 12 19:01:47 2013 -0600
Added "ri" scalar macros.
Details:
- Added set of basic scalar macros that take arguments' real and
imaginary components separately, named like the previous set except
with the "ris" (instead of "s") suffix.
- Redefined the previous set of scalar macros (those that take arguments
"whole") in terms of the new "ri" set.
- Renamed setris and getris macros to sets and gets.
- Renamed setimag0 macros to seti0s.
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
commit f60c8adc2f61eaba06b892f4e73000159de93056
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 10 14:39:56 2013 -0600
Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
commit 4ef20150492db254b5baf2368add62e19b0ac11b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 9 18:53:03 2013 -0600
Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
double-precision real).
commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 9 18:30:49 2013 -0600
Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
for x86_64/core2 were calling the wrong reference code (l instead
of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 5 10:56:13 2013 -0600
Whitespace changes to level-2 blocked variants.
Details:
- Joined some lines in level-2 blocked variants to match formatting used
in level-3 blocked variants.
- Streamlined implementation of bli_obj_equals() in bli_query.c.
commit b444489f100d218bc8ef29b01ff8489c358559f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 3 16:08:30 2013 -0600
Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
every object contains an internal scalar that defaults to 1.0. This
facilitates passing scalars around without having to house them in
separate objects. These "attached" scalars are stored in the internal
atom_t field of the obj_t struct, and are always stored to be the same
datatype as the object to which they are attached. Level-3 variants no
longer take scalar arguments, however, level-3 internal back-ends stll
do; this is so that the calling function can perform subproblems such
as C := C - alpha * A * B on-the-fly without needing to change either
of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):
bli_obj_init_scalar_copy_of()
-> bli_obj_scalar_init_detached_copy_of()
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
bli_obj_create_scalar_with_attached_buffer()
-> bli_obj_create_1x1_with_attached_buffer()
bli_obj_scalar_equals() -> bli_obj_equals()
- Defined new functions:
bli_obj_scalar_detach()
bli_obj_scalar_attach()
bli_obj_scalar_apply_scalar()
bli_obj_scalar_reset()
bli_obj_scalar_has_nonzero_imag()
bli_obj_scalar_equals()
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
bli_obj_is_scalar() -> bli_obj_is_1x1()
- Defined new macros to set and copy internal scalars between objects:
bli_obj_set_internal_scalar()
bli_obj_copy_internal_scalar()
- In level-3 internal back-ends, added conditional blocks where alpha and
beta are checked for non-unit-ness. Those values for alpha and beta are
applied to the scalars attached to aliases of A/B/C, as appropriate,
before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
attached to A and B are multiplied together to obtain alpha, while beta
is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
future support for mixed domain/precision. These can be added back later
once that functionality is given proper treatment. Also, removed the
creating of copy-casts of alpha and beta since typecasting of scalars
is now implicitly handled in the internal back-ends when alpha and
beta are applied to the attached scalars.
commit 992de486d6f23e69a623abd15ae77d7881d13871
Merge: 9552e6e fd4ac63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 2 13:58:46 2013 -0600
Unimplemented kernels now call reference.
Details:
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
datatypes call the corresponding reference kernel. Previously, these
kernel functions called abort() with a "not yet implemented" error
message.
commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 2 13:50:36 2013 -0600
Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
unimplemented kernel functions simply call the corresponding reference
implementation. (Previously, these unimplemented functions would
abort() with a "not yet implemented" message.)
commit 9552e6ee824d4345d5e908e869e071d19829819a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 24 11:40:31 2013 -0600
Removed optional scaling from packm control tree.
Details:
- Removed does_scale field from packm control tree node and
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
_cntl_obj_create() accordingly.
- Redefined/renamted macros that are used in aliasing so that now,
bli_obj_alias_to() does a full alias (shallow copy) while
bli_obj_alias_for_packing() does a partial alias that preserves the
pack_mem-related fields of the aliasing (destination) object.
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
will work just fine for bli_trmm3().
- Removed some commented vestiges of the typecasting functionality needed
to support heterogeneous datatypes.
commit e65c476284db9ef64b23191a21c2584b1083342f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 19 10:05:35 2013 -0600
Minor updates to packm_blk_var2.c and _blk_var3.c.
Details:
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
instead of setm(), scal2m().
commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 18:11:07 2013 -0600
Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
that already existed in kernels/x86_64/core2-sse3/3.
commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
Merge: 67761e2 7072005
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 12:02:00 2013 -0600
Merge branch 'master'. Forgot to git-pull.
commit 67761e224c92500eecf9c1540cc72bdd2fb27679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:57:40 2013 -0600
Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
are causing problems for xlc only in those two files and no other
macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
commit 707200541d344f98cf34c9801954dbb36fbe0447
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:17:31 2013 -0600
Syntax error fix in x86_64/core2 gemmtrsm_u ukr.
commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 11:11:06 2013 -0600
Updated Makefile in test, testsuite.
Details:
- Updated Makefiles in test and testsuite directories to use the new
BLIS header installation directory scheme, which is to compile with
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
commit 9bd7fcfd436625ca2108128086671319362f4d92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 18 10:58:09 2013 -0600
Outer-to-inner 'restrict' fix in macro-kernels.
Details:
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
macro-kernels. Previously, all restricted pointers were being declared
at the outer-most function scope level. While this violates the C99
standard, very few of the compilers used with BLIS so far have seemed
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
for identifying this bug (and suggesting the fix).
commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Nov 17 18:31:27 2013 -0600
Changed header install directory to include/blis.
Details:
- Changed top-level Makefile so that headers are installed to
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 16 17:34:25 2013 -0600
Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
Thanks to Francisco Igual for contributing these kernels and
configurations.
commit d37c2cff62089c86983c2f79762f4b5329037373
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 13 10:47:11 2013 -0600
Minor comment and Makefile changes.
Details:
- Added missing 'check-config' and 'check-make-defs' targets to
testsuite/Makefile.
- Removed unused 'test' target from top-level Makefile.
- Comment changes to testsuite input files.
commit 19885f893a17b91ee79bead0620d0f913392d4c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 11 12:09:21 2013 -0600
Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
instead of libflame.
commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 11 10:15:40 2013 -0600
CHANGELOG update (for 0.1.0).
commit 089048d5895a30221b6b1976c9be93ad6443420d (tag: 0.1.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 9 17:18:00 2013 -0600
Added object wrappers to 1f test suite modules.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
only apparent if you were configuring with something other than the
reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
working as intended the reference configuration was selected, because
most kernel sets, such as those in the template set, do not have object
wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
commit 9ef3752079de10124bed906b5d28479d04aa8187
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 8 17:20:47 2013 -0600
Updated template kernels wrt KernelsHowTo wiki.
Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
located in config/template/kernels/3.
commit 376bbb59c8944e29c5c1ff6637920d8451370afa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 8 11:17:34 2013 -0600
Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
and all framework code.
- Updated test suite modules according to above changes.
commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 7 11:36:11 2013 -0600
Added comments to testsuite/input.operations.
Details:
- Added extensive comments to the top of testsuite/input.operations,
which describe how to edit the file.
- Removed input.operations.0 and input.operations.1.
- Changed input.general to test all datatypes ("sdcz") by default.
commit a98f78b715fb256a519870071bb5266130d70b21
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 6 15:32:47 2013 -0600
Changed dim_t and inc_t to be signed integers.
Details:
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
This will facilitate interoperability with Fortran in the future.
(Fortran does not support unsigned integers.)
- Redefined many instances of stride-related macros so that they return
or use the absolute value of the strides, rather than the raw strides
which may now be signed. Added new macros bli_is_row_stored_f() and
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
and changed the packm_blk_var[23] variants to use these macros instead
of the existing bli_is_row_stored(), bli_is_col_stored().
- Added/adjusted typecasting to to various functions/macros, including
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
related macros in bli_param_macro_defs.h.
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
layer properly handles situations where vector increments are negative.
Thanks to Vladimir Sukharev for pointing out this issue.
- Changed type of increment parameters in bli_adjust_strides() from dim_t
to inc_t. Likewise in bli_check_matrix_strides().
- Defined bli_check_matrix_object(), which checks for negative strides.
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
that they also check for negative stride.
- Added instances of bli_check_matrix_object() to various operations'
_check routines.
commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 6 10:09:10 2013 -0600
Minor comment update to BLAS compat files.
commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 15:50:00 2013 -0600
Fixed bugs in scalv and setv.
Details:
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
a segmentation fault may occur if beta is not the same type as
the vector operand for scalv and setv.
- Changed axpyv and scal2v front-ends in a similar fashion.
commit f5953259a1842ee48e5833c22ac86e68a337bfe1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 4 14:43:55 2013 -0600
Fixed a bug related to Hermitian matrix diagonals.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
diagonal elements of Hermitian matrices were already zero. This property
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 2 17:19:40 2013 -0500
Added scaling to abval2s, sqrt2s macros.
Details:
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
and overflow from squaring the real and imaginary components. (This is
the same technique used to fix recent bugs in invscals/invscaljs and
inverts.)
commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 1 10:28:04 2013 -0500
Added new dotxaxpyf variant 2.
Details:
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
kernels. By default, this variant is not used by any other operation.
commit 97f89fbcf202d72fc440b614708e352ea31633e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 1 10:16:39 2013 -0500
Fixed bug in complex invscals.
Details:
- Fixed complex inversion in invscals and invscaljs whereby the
imaginary component was being computed incorrectly.
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
in inverts, invscals, and invscaljs.
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
operator instead of "<".
commit eda42a21d17a2742eab69ab801ed530b82488c8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 31 18:00:44 2013 -0500
Defined missing symbols in bla_rotg.c
Details:
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
these bugs.
commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 30 14:39:01 2013 -0500
Fixed bugs in scalm and setm.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
beta is not the same type as the matrix operand. Thanks to Vladimir
Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
and setm; namely, the alpha scalar is copy-cast the type of the first
matrix operand.
- Changed the template and reference configurations' bli_config.h files
so that the number of memory allocator blocks of A and B are set based
on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 24 14:32:20 2013 -0500
Fixed over/under-flow in complex inversion.
Details:
- Fixed the complex bli_?inverts() macros, which were inverting elements
in an "unsafe" manner, such that very large and very small values were
unnecessarily over/under-flowing. Thanks for Vladimir Sukharev for
reporting this bug.
- Comment update to bli_sumsqv_unb_var1.c.
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
- Changed 1.0F to 1.0 for bli_drands() macro.
commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 23 12:15:25 2013 -0500
Fixed parameter checking issue in BLAS syr[2]k.
Details:
- Fixed a minor parameter checking bug in the BLAS compatibility layer
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
trans parameter of either operation, it is (a) allowed, and (b) treated
as 'T' (whereas previously it was disallowed). Thanks for Vladimir
Sukharev for finding and reporting this bug.
commit a091a219bda55e56817acd4930c2aa4472e53ba5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 14 10:11:29 2013 -0500
Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 11:37:19 2013 -0500
Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
the Sandy Bridge architecture, including a dgemm ukernel coded with
AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
commit 03106d650e4030d4c9831683448376f92fc52d41
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 11 10:40:38 2013 -0500
Fixed minor perf bug in gemm_ker_var2.
Details:
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
computed correctly (ie: do not wraparound) at the edge cases. Thanks to
Tze Meng for helping me identify this bug.
commit b053337387dbdef9035be03538222670a21707ca
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 18:26:55 2013 -0500
Added fusing factors, MR/NR to test suite output.
Details:
- Updated the test suite driver (and modules where appropriate) so that
the level-1f fusing factors are output along with the variable dimension.
While this is not strictly necessary, since the fusing factors are output
in the initial parameter summary, it allows extra reassurance to the user
since the fusing factors appear alongside the variable dimension, which
together give a complete picture of the problem size. Similar changes were
made for outputting the register blocksizes when reporting results for the
micro-kernel test modules.
commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 14:20:06 2013 -0500
Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
facilitate the level-1f test modules. Also added front-end for dupl
operation.
- Added obj_t-based check routines for level-1f operations, which are
called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
factors as a function of datatype, which is needed by their respective
test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
commit 680188d46bb15b9a1a2867638104939dc77ca2a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 13:23:37 2013 -0500
Cleaned up old test drivers.
Details:
- Minor updates to old test drivers in preparation for our participation
in ACM TOMS's replicated results initiative.
commit 3690bdd4f95769c935c410414112102cc3e108b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 11:45:33 2013 -0500
More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
include this in previous commit.
commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 10 11:27:27 2013 -0500
Fixed outdated fusing factor macros in 1f kernels.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
this out.
commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 1 17:01:18 2013 -0500
Added section overrides to test suite.
Details:
- Added new lines of input to the test suite's input.operations file, which
allows the user to disable entire sections (levels) of tests. Before this
change, the user had to manually disable each operation tests's "master
switch". (This is why input.operations.0 existed: to allow a more
convenient starting point for someone who only wanted to test one or a
few operations.)
commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 30 12:58:18 2013 -0500
Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
dotxaxpyf, as well as the default fusing factor (which are all equal
in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
reference variants were implemented in terms of front-end routines rather
that directly in terms of the kernels. (For example, axpy2v was implemented
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
commit 97aaf220a847363b4da35935eca17790c0ef71f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 17 10:51:36 2013 -0500
Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
Intel MIC
IBM BG/Q
IBM Power7
AMD Piledriver
Loogson 3A
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
"clarksville" configuration to "dunnington".
commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 13 14:31:53 2013 -0500
Removed default configuration behavior.
Details:
- Changed the configure script so that it no longer defaults to the
reference configuration. This change is being made so that the
developer has a firm awareness of which configuration is being used
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
suggested change.
commit da77e9614f54f92f703f01e3b9bd67a83280150c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 13 12:00:37 2013 -0500
Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
functionality includes computing the pool size for each datatype (using
that datatype's cache blocksizes) and using the maximum to size the
actual pool array. This addresses the somewhat common pitfall whereby a
developer updates cache blocksizes in bli_kernel.h for only one datatype
(say, single-precision real), while the memory pools are sized using the
double-precision real values. Then, when the developer attempts to link
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
a message saying the static memory pool was exhausted. Clearly, this
message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
meant to check for size consistency among the various cache blocksizes.
(Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
reasonable place to put these constants, rather than further crowd up
bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
I hadn't used it in months. We can always re-create it later.
commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 10 17:17:28 2013 -0500
Added ESSL and Accelerate targets to test drivers.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
/ providing this patch.
commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 10 16:35:12 2013 -0500
Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
assigned values of 32, 64, or some other value. The former two result in
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
with format specifiers not matching the types of their arguments to printf()
and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
inclusion in d141f9eeb6d1).
- Added explicit typecasting of dim_t and inc_t to macros in
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
Windows-style CRLF newlines for certain files.
commit 068437736b41d51a1f5ec47839f059bf58a20413
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 14:07:58 2013 -0500
Fixed set-but-not-used compiler (gcc) warnings.
Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
compilers) when such variables are only used in the complex instances of
the functions. Special thanks to Karl Rupp for suggesting a portable fix
for these warnings.
commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 13:48:52 2013 -0500
Small fix to Windows defs.mk makefile fragment.
Details:
- Commented out a !include statement that was attempting to include a
version file that does not yet exist. For now, the version string is
hard-coded into defs.mk.
commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 13:09:16 2013 -0500
Added Windows build system.
Details:
- Added a 'windows' directory, which contains a Windows build system
similar to that of libflame's. Thanks to Martin for getting this up
and running.
- Spun off system header #includes into bli_system.h, which is included
in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 9 11:04:46 2013 -0500
Edited bli_?lamch.c to avoid Windows keyword.
Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
by the same name. This change is needed in advance of the upcoming Windows
build system.
commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 4 13:36:07 2013 -0500
Switched integer typedefs (again) to C types.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
operation tests.
commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 4 12:09:11 2013 -0500
Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
I've changed the default typedef for gint_t and guint_t from int64_t and
uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
to introductory output of the testsuite.
commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 3 21:58:07 2013 -0500
Applied temp fix to typecasting bug in testsuite.
Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
The fix involves casting both numerator and denominator to unsigned long.
This fix is more voodoo than science, as I can't be sure why it even
works.
commit 9ee6e125373869c4213c017ce772c38ecefba103
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 3 21:53:27 2013 -0500
Changed dimension spec for gemm in testsuite.
Details:
- Encounted a bizarre typecasting bug whereby the test suite was not
computing the proper dimension from the problem size and dimension
specification when the latter was set to -3. Will investigate.
Thanks to Fran for finding this "bug".
commit e8be081e68c385ab44d0fea8dade21d40c200b79
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 28 15:52:34 2013 -0500
Generalized matlab and file output in testsuite.
Details:
- Added a new option in input.general that allows outputting in
matlab/octave format so that one can output in matlab format
independently from outputting to files.
- Adjusted input.operations according to above.
- Added input.operations.0 and input.operations.1 with all options
disabled and enabled, respectively.
commit d352c746e5683037d41b5061dfb5ce08e1d0843b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 27 13:41:46 2013 -0500
Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
bli_packm_blk_var3.c
commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Aug 19 12:07:41 2013 -0500
Fixed bug in bli_acquire_mpart_t2b(), _l2r().
Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
that cause incorrect partitioning when SUBPART0 was requested. This
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimizaition code in
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 8 14:39:35 2013 -0500
Moved init_safe(), finalize_safe() to BLAS compat.
Details:
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
initializers in the BLIS layer wasn't buying us anything because the user
could still call the library with uninitialized global scalar constants,
for example. Thus, we will just have to live with the constraint that
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
- Added the missing _init_safe() and finalize_safe() calls to the level-1
BLAS compatibility wrappers.
commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 8 13:30:19 2013 -0500
Miscellaneous updates.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
kernels, in that the interior and edge-case handling is expressed once
inside the loops in the n and m dimensions, rather than the edge-case
handling being "unrolled" and expressed as distinct code regions. The
previous macro-kernel now lives in retired form in the subdirectory
other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
applied to optimize the macro-kernel accesses pattern on C when C is
row-stored.
- Various updates inside of test/exec_sizes.
commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 7 12:27:04 2013 -0500
Fixed bug in interface of bla_ger_check().
Details:
- Fixed the misplaced lda parameter in the function signature of
bla_ger_check(). Thanks to Tyler for finding this bug.
commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 6 12:25:51 2013 -0500
Fixed cpp guard typos in frame/compat/check files.
Details:
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
- Fixed various syntax errors in the code that had yet to be compiled
due to the aforementioned bug.
commit f4ec28e723d28d998f1038f82da6986e44320ef6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Aug 1 11:24:23 2013 -0500
Added basic OpenMP-based gemm and packm files.
Details:
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
into the following auxiliary files
frame/1m/packm/other/bli_packm_blk_var2.c
frame/3/gemm/other/bli_gemm_ker_var2.c
The routine in the first file uses a basic OpenMP parallel region to
parallelize the packing of blocks of A and panels of B, while the
second uses a similar parallel region to parallelize along the n
dimension of the gemm macro-kernel.
commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
Merge: 67a8b94 6e7e452
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 26 11:14:27 2013 -0500
Merge branch 'master' of https://code.google.com/p/blis
commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 26 11:12:37 2013 -0500
Added missing cpp kernel blocksize constraints.
Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
constraints on the register blocksizes relative to the cache blocksizes.
Thanks to Tyler for helping me stumble across this issue.
commit 6e7e452343014e8f86640874dc1dbadca4a642a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 22 14:50:57 2013 -0500
Fixed minor warnings and misc issues.
Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
set-but-not-used variables and addressing some instances of typecasting
of pointer types to integer types of different sizes.
commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 22 12:54:32 2013 -0500
Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
the "special" bit is taken into account so that BLIS_INT is differentiated
from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
being used.
commit b33e2f4443b9043b554963320280ff7783773652
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jul 19 17:15:03 2013 -0500
CHANGELOG update (for 0.0.9).
commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 18:04:34 2013 -0500
Added BLAS error checking to compatibility layer.
Details:
- Added frame/compat/check directory, which now houses companion _check()
routines for each of the BLAS wrappers in frame/compat. These _check()
routines are called from the compatibility wrappers and mimic the
error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
int64_t by default (and the prototypes that were present in the files
used int).
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.
commit 4e80ad28c97273db3366428ec44020da7944964d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 18 17:53:31 2013 -0500
Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
complex arithmetic to the scalar-level macros in include/level0. This
includes a somewhat substantial reorganization and re-layering of much
of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
commented-out by default, which optionally enables the use of built-in
C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed macro definitions from bli_param_macro_defs.h which was not being
used (bli_proj_dt_to_real_if_imag_eq0).
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 17 12:27:45 2013 -0500
Fixed bugs in trsm, trmm macro-kernels.
Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
diagoffb when recomputing k to skip a zero region below where the
diagonal intersects the right side of the block. The corresponding
trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the the adjustment of k (by NR)
needed to be placed AFTER the block that recomputes k to skip the zero
region (if present). The other three trsm macro-kernels, as well as the
trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
being updated to skip a zero region to the left of where the diagonal
of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.
commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 14:53:59 2013 -0500
Added f2c'ed Givens rotation wrappers.
Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
rotm, and rotmg operations.
commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:40:12 2013 -0500
Removed copynz defs from bli_kernel.h files.
Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
configuration. (Meant to include this in previous commit.)
commit aec12d90f596e8c04b1ad178258a1cd38108f59d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 10 13:33:30 2013 -0500
Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
implemented a variation of copyv/m that, in the case of real source
and complex destination operands, leaves the imaginary component
untouched (rather than setting it to zero). I realize now that the
special case(s) (e.g. gemm with real A and B but complex C) that I
thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
commit b0a0a0f274a761788531b5d281cc3b411b7124ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 9 17:15:38 2013 -0500
Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
manually typedefs the types we need (which, for now, are unconditionally
int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
nothing for C++ and non-C99.
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 15:20:34 2013 -0500
Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
of char.
- Updated bla_amax() wrappers so that the return type is defined directly
as f77_int, rather than letting the prototype-generating macro decide
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
so I removed them. Also, changed the body of the wrapper so that a
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
doublecomplex types so that they use .real and .imag instead (now that
we are using scomplex and dcomplex).
commit 372501398564fdba3d5a3db86c30bc1039b185ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 8 11:24:18 2013 -0500
Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel incrementally
packs one micro-panel of B at a time. This is useful for certain
special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
commit 9915d667a79f23e3a2a2516247c560e9063a1646
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 7 13:28:39 2013 -0500
Defined "total" blocksize query functions.
Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
the default blocksize plus blocksize extension (using the type or the type
of an object).
- Comment update in bli_packm_cxk.c.
commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 27 13:19:56 2013 -0500
Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
renamed the resulting variants, according to the same changes recently
made to trmm and trsm.
- Implemented support for four new subpartitions types:
BLIS_SUBPART1T
BLIS_SUBPART1B
BLIS_SUBPART1L
BLIS_SUBPART1R
which correspond to "merged" partitions that include the middle "1"
partition as well as either the neighboring "0" or "2" partition. This is
used to clean up code in herk/her2k var2 that attempts to partition away
the strictly zero region above or below the diagonal of a matrix operand
that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
zero region in the panel of C that is passed in. This is now needed given
that herk/her2k var1 no longer partitions off this zero region before
calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
commit 02002ef6f3d2746665982793db36714bd69bccc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 24 17:08:14 2013 -0500
Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
side case is now handled explicitly, rather than induced indirectly by
transposing and swapping strides on operands. This allows us to walk through
the output matrix with favorable access patterns no matter how it is stored,
for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
lower/upper distinction. Instead, we simply label the variants by which
dimension is partitioned and whether the variant marches forwards or
backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
(as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
blocksize extensions (if non-zero) were not being used to appropriately size
the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
packing of A (left-hand operand) and B (right-hand operand) is done with
NR and MR, respectively (instead of MR and NR).
commit d1e81ddc848ee47bc188735883d14582bdd0cabc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 13 11:14:21 2013 -0500
Minor generalizing tweaks to trmm blk var1, var2.
commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:40:04 2013 -0500
CHANGELOG update.
commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50 (tag: 0.0.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 12 16:02:12 2013 -0500
Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
compiler flags. This allows the developer to specify a separate set of
flags (e.g. optimization flags) for compiling kernels while maintaining a
standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
according to above changes.
commit 08475e7c7653ba598665071a617d10f0d8f763c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 11 12:18:39 2013 -0500
Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
packing from a lower or upper-stored symmetric/Hermitian matrix to column
panels (which are row-stored). Previously one could only pack to row panels
(which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
execution datatypes.
commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 7 11:04:10 2013 -0500
Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
schemes?" switch is disabled, which would result in redundant tests being
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 6 13:36:06 2013 -0500
Whitespace changes to old test drivers.
Details:
- Replaced tabs with four spaces in places where indention was already
in place.
commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 4 14:57:46 2013 -0500
Fixed unaligned handling in axpyf, dotxaxpyf.
Details:
- Fixed over-cautious handling of unaligned operands in vector instrinsic
implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
intrinsic implementation of dotxaxpyf kernel.
commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 3 16:54:52 2013 -0500
Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
commit 0288c827d3659bb225ac9c10f168b623ed0106a2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 1 08:02:23 2013 -0500
Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
recently).
commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 6 11:05:08 2013 -0500
Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
sizeof(dcomplex).
commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 24 16:28:10 2013 -0500
Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
unaligned subpartitions. We were already going out of our way a bit to
handle edge cases in the first iteration for blocked variants, and this
was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
into account how the choice of variant needed to be altered for
upper-stored matrices (given that only lower-stored algorithms are
explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
consistency with that of the bugfix for trmv/trsv (both of which now
use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
was invalid only because the code was expecting 1 (for purposes of
performing contiguous vector loads) but got a value greater than 1 because
the column stride of the object (e.g. rho) was inflated for alignment
purposes (albeit unnecessarily since there is only one element in the
object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
internal back-ends to correctly handle cases where output operand still
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:35:32 2013 -0500
Renamed _trans_status() macro.
Details:
- Mistakenly forgot to rename the _trans_status() macro and instances in
previous commit.
commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 3 17:24:58 2013 -0500
Renamed _set_trans(), _trans_status() macros.
Details:
- Renamed the following macros:
bli_obj_set_trans() -> bli_obj_set_onlytrans()
bli_obj_trans_status() -> bli_obj_onlytrans_status()
to remove ambiguity as to which bits are read/updated.
commit 2f8174509ea9f844db11ebd9389de5168e85b132
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:06:30 2013 -0500
Unconditionally check memory pool(s) for errors.
Details:
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
memory pool is exhausted before checking out and returning a block, even
if BLIS error checking has been disabled. These errors are useful because
they likely indicate that BLIS was improperly configured for the code
being run.
commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 1 15:00:30 2013 -0500
CHANGELOG update.
commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d (tag: 0.0.7)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 30 19:35:54 2013 -0500
Absorbed blocksize extensions into main objects.
Details:
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
fields to the blksz_t object rather than have them as separate structs.
- Updated all packm interfaces/invocations according to above change.
- Generalized bli_determine_blocksize_?() so that edge case optimization
happens if and only if cache blocksizes are created with non-zero
extensions.
- Updated comments in bli_kernel.h files to indicate that the edge case
blocksize extension mechanism is now available for use.
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 17:16:59 2013 -0500
Added option to disable err checking in testsuite.
Details:
- Added a new line to input.general that allows one to specify the error-
checking level to use for each BLIS experiment. The only two levels
supported for now are "no error checking" and "full error checking".
commit 096b366ddcfe386f44419ef84d8df8be13825f86
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500
Use cntl trees that block in n dimension.
Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
algorithms that first paritition in the n dimension with a blocksize
of NC. Typically this is not an issue since only very large problems
exceed that of NC. But developers often run very large problems, and
so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
bli_param_macro_defs.h.
commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500
Use PASTEMAC in macro-kernels (over MAC2 or MAC3).
Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.
commit df80acf517dde180ddcc5835c6136b2fa7556d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500
Fixed computation of b_next in L3 macro-kernels.
Details:
- Restructured herk_l and herk_u macro-kernels in the imagine of trmm
and trsm, in that the edge cases are captured by the main loop, rather
than trying to have "cleanup" sections that result in four distinct
parts (interior, bottom edge, right edge, bottom-right edge) of the
code.
- Fixed the way b_next was being computed in the non-gemm level-3
macro-kernels (herk, trmm, trsm). The way they are computed now matches
that of gemm.
commit 3671528cf8efe4b445d196665143a5c50c2c6048
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500
Fixed minor bug in computing b_next in gemm.
commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500
Fixed rare edge case bug in herk_l macro-kernel.
Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
chosen to be much larger than NR, then one could encounter edge cases
in the the MC dimension that fall entirely below the diagonal, which
the previous implementation of the herk_l macro-kernel was not allowing
for.
commit 1dab11e37d1cb403cbe75b73a644c00de534f104
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500
Updated x86 gemmtrsm ukernels to use alpha.
commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500
Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
addresses of the next micro-panels of A and B. By passing these
pointers into the micro-kernel, we allow the micro-kernel author to
prefetch micro-panels of A and B as necessary (though this is
completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
a_next and b_next. Note that ONLY the gemm macro-kernel computes
a_next and b_next with the precise semantics we want. I will go back
and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500
Added code for backward edge-case blocking.
Disabled:
- Edited bli_determine_blocksize_b() to include experimental (and
currently disabled) code that computes extended blocks.
- Updated commnts relate to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.
commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500
Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
explicitly so navigate b1 so that situations where PACKNR > NR are
supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500
Disabled blocksize checks for memory pools.
Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
by the contiguous memory allocator for all types, given that the values for
double precision real are the ones used to allocate the space. These checks
can easily go awry in certain situations, especially if you are developing for
only one datatype. So for now, they are probably more trouble than they are
worth.
commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500
Allow ldim of packed micro-panels != MR, NR.
Details:
- Made substantial changes throughout the framework to decouple the leading
dimension (row or column stride) used within each packed micro-panel from
the corresponding register blocksize. It appears advantageous on some
systems to use, for example, packed micro-panels of A where the column
stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
- Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
to use when packing micro-panels of A and B.
- Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
where appropriate, instead of MR and NR.
- Added pd field (panel dimension) to obj_t.
- New interface to bli_packm_cntl_obj_create().
- Renamed bli_obj_packed_length()/_width() macros to
bli_obj_padded_length()/_width().
- Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
- Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
blocksize for edge cases, which can improve performance at the margins.
commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500
Fixed bug in compatibility layer (her2k/syr2k).
Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
interface caller requests the [conjugate-]transpose case. Thanks to Bryan
Marker for reporting the behavior that led to this bug.
commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500
Changed old level3 test drivers to call front-ends.
Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
front-end object API instead of the internal back-end with the locally
defined control tree.
commit 83e45de23e565138b8fde06fb11cfedc973b7246
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500
Allow packm_init() to reacquire a too-small mem_t.
Details:
- Changed bli_packm_init() to react differently to a situation where a pack
obj_t has an already-allocated mem_t entry that has a buffer that is smaller
than what will be needed to hold the block/panel that now needs to be
packed. Previously, this situation was treated with an abort() since I
assumed something was horribly wrong. I have changed the code so that it now
reacts by releasing the previous mem_t and re-acquires a new mem_t with the
new information. (This change was done at the request of Bryan Marker to
facilitate code generation via DxT.)
commit a6990434173b0cf651f8521194f3aef738deb7d2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500
Fixed bug in packing block of A for hemm/symm.
Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
symmetric matrix where the block of A being packed intersects the diagonal,
but some of its micro-panels do not intersect the diagonal and lie completely
in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.
commit c92e7590e1934f830814ab614c794215ebe0c415
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500
Activated bli_packm_acquire_mpart_t2b().
Details:
- Removed the overly-paranoid bli_abort() from the end of
bli_packm_acquire_mpart_t2b(), to allow others to experiment with
partitioning through packed blocks of A. Also, and more importantly,
changed an earlier check that was causing an erroneous (but
coincidentally redundant) abort(). Also, updated some of the comments
in bli_packm_part.c.
commit bea579e9f009a44e08008eb14d09f38748ab2b53
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500
Allow creation of "empty" objects.
Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.
commit 7904e20f2e6908571ee5008da2a08084198eefae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500
Fixed "root" object bug in bli_her[2]k/syr[2]k.
Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
that manifested as the incorrect triangle being updated. It occurred when
the user would pass in a matrix object that was correctly marked as
symmetric/Hermitian and lower-stored, but whose root object was never marked
as lower (or upper). We now alias and re-assign root status for matrix C
within the front-ends. Note that trmm and trsm were already doing this,
albeit for a slightly different reason (to allow the internal back-end to
choose which algorithm to run--lower or upper--based on the uplo of the root
object for both left and right side cases). Thanks to Bryan Marker for
leading me to this bug.
commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500
Fixed overzealous type-checking in bli_getsc().
Details:
- Relaxed type checking in getsc so that the input object could be a constant
and not just a proper floating-point type. (If it is a constant, default to
extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h
- Comment updates to various level-0 scalar routines.
commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500
Fixed bug in bli_obj_is_packed() and renamed.
Details:
- This macro is used to determine whether the partitioning routines should
call a corresponding packm_part routine instead. However, it was
unintentionally catching matrices that were marked as "packed" by virtue
of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
The macro has now been renamed to bli_obj_is_panel_packed(), and now only
checks for row or column panel packing. (Note that I first attempted to
fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
erroneous behavior that led me to this bug.
commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500
Removed local reference ukernel blocksize macros.
Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
reference microkernel definition and header. Meant to include this in
a recent/previous commit (0020ef7c8271).
commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500
Formatting change to mods in previous commit.
commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500
Set structure of objects in level-2 BLIS APIs.
Details:
- Added missing statement to set structure field of local objects in
top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
reporting this bug.
commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500
Tweak to test suite function string construction.
Details:
- Fixed a minor bug in the way that the test suite would construct function
name strings when the user anchored all parameters in input.operations.
In this case, the test driver would mistake this situation for one where
the operation simply had no parameters to begin with, and thus would not
include the parameter string in the function string that is output for
every result.
commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500
Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500
Modified bli_kernel.h include order in blis.h.
Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
_kernel.h includes an optimized microkernel header, which uses BLIS types
such as dim_t and inc_t, which would precede the definition of those types
in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
(immediately after that of bli_kernel.h).
commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500
CHANGELOG update.
commit ec16c52f2ecf419c749175ce0a297441c10f1c68 (tag: 0.0.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500
Updated INSTALL file (now redirects to website).
commit 0020ef7c82711a7ebf08e5174f939bee2563184c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500
Removed gemmtrsm-, trsm-specific blocksize macros.
Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
micro-kernel header files.
(Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.
commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500
Added/renamed alignment constants to _config.h.
Details:
- Added new memory alignment constants:
BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
and renamed existing ones
BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
to better convey what the alignment factor is used for (and what it is
not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
use bli_align_dim_to_size(), which takes a third argument (the desired
alignment).
commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500
Fixed an bug in axpyv/axpym when alpha is unit.
Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
this bug.
commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500
Moved _POSIX_C_SOURCE def to compiler cmd line.
Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
the compiler command line arguments in make_defs.mk (for both configs).
Thanks to Devin Matthews for suggesting this change.
commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500
Appended 'f2c_' to abs, min, max macros in f2c.h.
Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
would not conflict with anything defined by the user (or the language).
Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500
Added new kernel blocksize macro aliases.
Details:
- Added new macros that alias level-3 cache and register blocksize macros
to names that can be constructed via the PASTEMAC macro. These aliased
macro definitions live inside bli_kernel_macro_defs.h, which is now
#included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
(found in macro-kernel header files).
commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500
Updated CREDITS file.
commit 79328c15410215737f3f14cd069328cf52aa11fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500
Reverted testsuite object files' home to 'obj'.
Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick to tracking
empty directories in git.
commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500
Renamed/moved object scalar constant macros.
Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
simplier macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.
commit 357893f5be5c56ab7b062874005e77e614b23f06
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500
Applied fix from prev commit to gemmtrsm_?_ref_4x4
Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
bli_gemmtrsm_u_ref_4x4.c.
commit 54988e8dca44475610bcaee5a7bc1c40e8921402
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500
Fixed a performance bug in trsm.
Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
(bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
reference gemm microkernel was hard-coded, and thus always called, even
when GEMM_UKERNEL was defined to point to an optimzied microkernel. This
manifested as artificially low trsm performance for all problem sizes, but
especially for small problem sizes as it only affected blocks of A that
intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
find this bug.
commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500
Generate testsuite objects 'src'.
Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
rather than 'obj', since (a) the top-level .gitignore dictates that
obj directories are to be ignored, and (b) since git has problems
tracking empty directories. Now, users do not need to create their own
obj directories within their own local clones of BLIS.
commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500
Minor formatting changes.
commit a571af816d72727e16cad37007e7043b9d6fa362
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500
Fixed definition of bli_is_packed_object() macro.
Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
value of the pack schema bits in the info field of obj_t, rather than
comparing the obj_t buffer with that of the mem_t entry. This was the cause
of a very low probability bug whereby uninitialized memory caused the macro
to evaluate to TRUE even though the object in question was not packed.
Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.
commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500
Updated information in testsuite output header.
Details:
- Added to the information that is echoed at the beginning of the test suite's
output, and also re-labeled some existing information.
commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500
Fixed edge case handling bug in herk macrokernels.
Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
only manifests when BLIS is configured such that MR != NR. The bug involves
incorrectly detecting edge cases, which resulted in some parts of matrix C
potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
8 and 4, respectively, so that I can better stress the framework on a
day-to-day basis. (The fact that they were both equal to 4 for so long is
why I did not stumble upon this bug much sooner.)
commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500
Added reference microkernels for arbitrary MR, NR.
Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
contain explicit loops over MR and NR, thus allowing them to be used
unmodified by developers who want to build a reference library with
custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macros files so that
single-char macros are also defined.
commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500
Implemented amax operation and related changes.
Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to reminder reader what sdsdot()/dsdot() are
used for.
commit fb68087f8727cd5fd656a742a110e54fb1c91db9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500
More memory alignment-related tweaks.
Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
contiguous memory alignment (rather than the system/malloc alignment).
commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500
Always define memory alignment size cpp constant.
Details:
- Removed guard around #define for memory alignment size constant.
Memory alignment should always be enabled, and so this value should
always be defined.
commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500
Renamed memory alignment macro constant.
Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
BLIS_MEMORY_ALIGNMENT_SIZE.
commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500
Align packed panel strides with system alignment.
Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
subsequent packed panel of A and B begins at an aligned address. (The
first panel is presumably aligned to system alignment because it is
aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
is used to allocate enough space for each block no matter what kind of
register blocking is used (even if register blocksize is unit and every
row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
and KC*NC result in identical footprints for all datatypes.
commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500
CHANGELOG update.
commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (tag: 0.0.5)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500
Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500
Removed several 'old' directories and files.
Details:
- Removed most of the 'old' directories scattered throughout the framework,
which includes alternate/half-baked/broken implementations.
commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500
Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
header files throughout the framework. Given that these low-level headers
are included within #blis2.h in a very specific order, #include'ing blis2.h
within them directly is unnecessary.
commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500
Added cpp guards to conflicting libflame typedefs.
Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
This is a temporary hack to allow interoperability with libflame. (Similarly
temporary changes are being made to libflame's type definitions file.)
commit f469907503fcdc24dff0174c569170e6e756e045
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500
Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.
Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
(e.g. "prefetch" instructions, which are different than the particular
kind of prefetching/preloading referred to by this constant).
commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500
Removed build/old directory.
commit 718888849c48d99f83eea6b8f83bc1998cffef7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500
Deprecated 'flame' configuration.
Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.
commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 19 18:07:40 2013 -0500
Added missing conjbeta argument to scald.
commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 15:37:20 2013 -0500
Relocated packed mem_t dimension fields to obj_t.
Details:
- Removed the m and n (and elem_size) fields from the mem_t object, and added
m_packed and n_packed fields to obj_t. These new fields track the same as
the old ones. From an abstraction standpoint, it seemed awkward to store
those dimensions inside the mem_t.
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
is passed in, instead of m, n, and elem_size.
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
respectively.
- Updated packm variants to access the packed length and width fields from
their new locations.
commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 18 10:37:03 2013 -0500
CHANGELOG update.
commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500
Re-implemented contiguous memory allocator.
Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
allocator instantiates and initializes three separate memory pool objects,
each one associated with a separate array of contiguous memory blocks, each
block of fixed and uniform size. (The three pools are for allocating mc-by-kc
blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
objects use a stack structure internally to track which blocks in the region
have been "checked out" to a thread and which are still available. Critical
regions are now clearly marked and adaptable to parallel environments (e.g.
OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
of packed buffer is being allocated. The enumerated type for this argument
is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
bl2_align_dim_to_mult(). Turns out that typically we don't need to align
a dimension to the system alignment, since that value has to do with
starting addresses, whereas the values we are dealing with are unitless
dimensions.
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500
Perform her2k var1 loops in sequence.
Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
and accumulated in sequence rather than fused into one loop. This is
necessary if BLIS is to be configured to provide only enough contiguous
memory for one panel of B.
commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600
Enhanced tracking of dimensions for mem_t objects.
Details:
- Added new fields to mem_t struct definition to track the allocated (as
opposed to the currently used) dimensions of the memory region. This
allows packm_init() to be more robust in situations where memory is
already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
in packm_init(), to update the "currently used" dimensions of the mem_t
object if the requested dimensions are smaller than the allocated
dimensions.
commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600
Fixed test suite flop formulas for ops with side.
Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
trmm3, and trsm.
- Comment updates in herk macro-kernels.
commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600
Added "version" to .gitignore.
Details:
- Added "version" to .gitignore file so that the file does not show up when
running 'git status', or accidentally get pulled into the index when
running 'git add' or 'git add --all'.
commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600
Removed version file from version control.
Details:
- Removed version file from version control to prevent git errors that occur
when trying to pull new commits.
commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600
Updated behavior of bl2_obj_induce_trans() macro.
Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
updated as part of the macro. All current uses of the macro have been
coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.
commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600
Replaced banded/packed BLAS2 stubs with f2c code.
Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
implemented" message. This includes all of the level-2 banded and packed
routines.
- Replaced the aforementioned with the corresponding netlib implementations
having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.
commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600
Moved Fortran name-mangling macro to bl2_config.h.
Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
configuration directory (bl2_config.h, specifically) given that it can be
expected to be tweaked by some developers.
commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600
Implemented blas2blis compatibility layer.
Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
includes virtually all of the BLAS, including banded and packed level-2
operations.
- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
initialization, which stores the "exit status" in an err_t, which is then
read by the latter function to determine whether finalization should actually
take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
it automatically initializes itself (via bl2_init_safe()), until/unless the
application code explicitly calls bl2_finalize().
- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
wrappers.
commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600
Updated version file. (Forgot to in prev commit).
commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600
Fixed some scalar types in BLAS-like Herm APIs.
Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
alpha and beta in herk, and beta in her2k, need to be real. These
arguments were typed incorrectly as the complex types. This has been
fixed. Note the issue was only present in the BLAS-like APIs for
these operations (not the native object-based interfaces).
commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600
Updated version file. (Forgot to in prev commit).
commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600
Changed API of packm_init_pack() to use blksz_t.
Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
are passed in as type blksz_t* instead of dim_t.
- Make similar change for packv_init_pack().
commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600
Minor changes to lower levels of scalm and setm.
Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
in only the strictly lower or upper triangle of the matrix being modified.
commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600
Updated beta == zero semantics of mulsc.
Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
operation that needed updating.
- Added Devin to CREDITS file.
commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600
Removed some calls to setv() in test modules.
Details:
- Removed calls to setv() in test modules whose sole purpose was to
initialize vectors to zero to ensure that nan's and inf's would not
taint the computation. Now that beta == zero semantics have been
updated to clear the output operand (when beta is zero), rather than
multiply against it, these setv() calls are no longer needed.
commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600
Properly implemented beta == 0 semantics.
Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
respectively.
- Added code to the following operations that sets the output operand to
zero if the corresponding scalar is zero (rather than performing the
floating-point multiply, or in the case of setv, copying the value).
This will prevent nan's and inf's from creeping into results from
uninitialized memory.
- axpy
- dotxv
- scalv
- scal2v
- setv
- gemv
- ger
- hemv
- her
- her2
- gemm reference ukernels
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600
Fixed stale interface to packm_unb_var1().
Details:
- Removed the control tree from the interface to packm_unb_var1(), which
I meant to do when it was un-deprecated.
commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600
Un-deprecated packm_unb_var1.c (needed by l2 ops).
Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
operations still need this routine for packing matrices. Now, whether
level-2 operations should be packing matrices to begin with is another
matter. But this fixes the segmentation fault one would have gotten when
running bl2_gemv() on a general stride matrix.
commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600
Removed cntl tree usage from packm implementation.
Details:
- Added new fields to obj_t info field:
- invert_diag
- pack_order_if_upper
- pack_order_if_lower
These fields allow packm_init() to embed information that begins
in the control tree into the object so that the packm implementation
does not need to use control trees at all. This is being done to aid
Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
These were part of prototype implementations and are no longer needed.
commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600
Replaced bl2_abs() with _fabs() where appropriate.
commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600
Removed level-0 macros projrs, grabis.
Details:
- Replaced instances of projrs and grabis macros with newer,
more general-purpose getris.
commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600
Restored executable permissions to scripts.
Details:
- Restored executable (0755) permissions to scripts that were touched by
the recursive sed script that updated the copyright headers in the
previous commit.
commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600
Updated copyright headers from 2012 to 2013.
commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600
CHANGELOG update.
commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600
Added unified test suite, and many fixes.
Details:
- Added a highly configurable, unified test suite.
- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
header files. Now, instead, DUPB is computed as (NDUP != 1) within each
macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
By encoding both pieces of information into one constant in _kernel.h,
it seems somewhat less likely others will encounter this bug in the
future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
and defined blocksizes in _cntl.c files to these default values.
- Changed semantics of her2k and syr2k such that these operations no longer
expect the B matrix to already be conjugate-transposed (or just transposed
for syr2k). However, these semantics are preserved for the internal
mechanics of the implementations, including the internal back-end and all
blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
respectively.
- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
replaced by selecting either the real part or both parts within the unblocked
algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
verify that alpha has an imaginary component equal to zero (for her, but
not syr).
- Changed control tree for her to forgo packing.
- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
(two-operand add and subtract).
- Added new scalar macros:
- getris: acquire real and imaginary components.
- setris: set real and imaginary components.
- addjs: addition with conjugated x.
- subjs: subtraction with conjugated x.
- Defined new utility operations:
- absumv: element-wise sum of absolute values for vector elements.
- absumm: element-wise sum of absolute values for matrix elements.
- mkherm: convert existing matrix to Hermitian.
- mksymm: convert existing matrix to symmetric.
- mktrim: convert existing matrix to triangular.
- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.
- Fixed a bug in the her2k macro-kernels (which currently are simply
implemented in terms of two invocations of herk) whereby beta was being
applied to both the first and second rank-k updates, rather than only
the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
properly implemented due to erroneous assumptions regarding aliasing and
root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
value was typecast to float before inversion. This affected non-unit cases
of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
now mimics the rank-k strategy of gemm, whereby alpah is applied during
the first iteration of variant 3, with BLIS_ONE passed in instead for
subsequent iterations. This also required passing alpha into the macro-
kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
called for blocks strictly above the diagonal. While this sounds good in
theory, this cannot be done because gemm_ker_var2 expects row panels of
A to be packed from top to bottom, while for trsm_u, A is actually packed
from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
dimensions were mishandled due to incorrect arguments to the copyv kernel.
Also changed the copyv kernel invocation to scal2v so that these edge
cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
a potential future bug whereby a mem_t object that is actually no longer
"allocated" from the static pool is mistaken for being allocated due to
failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whreby the uplo field was mistakenly
toggled when the requested subpartition needed to be "reflected" due to it
residing in an unstored region.
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600
Added missing 'd' to fused gemmtrsm function name.
commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600
Added debug statements to bl2_mm_acquire_m().
Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
configuration's bl2_kernel.h
commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600
Defined Frobenius norm operations.
Details:
- Added level-0 grabis macro operation to grab imaginary component of one
variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
of the elements of a vector. This implementation is modeled after ?lassq
in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
vectors and matrices, respectively. These operations are treated as one-
operand operations where the output norm value is the real projection of
the datatype of the input operand. Both operations are implemented in terms
of sumsqv.
commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600
Added GENT*R macros; tweaked bl2_machval defs.
Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
GENTPROTR, which are one-operand macros with auxiliary real projection
types.
- Tweaked bl2_machval files to use new macros.
commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600
Fixed harmless macro bug in level-1m operations.
Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
despite the bug, which is why I had not discovered it until now.
commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600
Renamed x86,x86_64 kernels to indicate 'd' fusing.
Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
to emphasize that the fusing shape is not for all datatype instances, but
rather just for one (that of double-precision real). Other fusing shapes
would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.
commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600
More tweaks to _config.h, _kernel.h; smem tweaks.
Details:
- Moved kernel-related definitions form bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
accomplishes the same thing (enabling posix_memalign()) without enabling
all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
as well as two new constants that determine how many MCxKC blocks and
how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
bl2_abort() with a specific error code call.
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600
Minor reordering of bl2_config.h definitions.
commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600
Consolidated configuration headers.
Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.
commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600
Fixed bug in reference gemm ukernels.
Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
was not correctly accumulated and scaled (by alpha) into the output matrix
C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.
commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600
Expanded reference packm/unpackm kernel set to 16.
Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.
commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600
Minor updates towards to 0.0.1.
commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600
Tweaks to get BLIS compiling again on clarksville.
Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.
commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600
Added template fragment.mk; updated .gitignore.
commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600
Added 'changelog' make target; other tweaks.
Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
overwrites CHANGELOG with the output.
- Other trivial changes.
commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600
Define static memory pool size in bl2_config.h.
commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600
Refined INSTALL text; added 'showconfig' target.
Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.
commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600
Added auto-detection of version string (via git).
Details:
- Added build/update-version-file.sh script for auto-detecting "version"
string and updating 'version' file accordingly. (If .git directory is
not present, then it is assumed this copy of BLIS is a downloaded
release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.
commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600
Wrote first draft of INSTALL file.
commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600
Updated standalone test Makefile and other fixes.
Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
- frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
- frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c
commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600
Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
(by the user in bl2_kernels.h).
commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600
Initial commit.