commit 2fb440876690bdcec0c11a30e2b33ad100bab529 (HEAD -> master, tag: 0.3.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:07:31 2018 -0500

    Version file update (0.3.2)

commit cdf041ddadd8725e578e2f59f37ae341f26655af (origin/master, origin/HEAD)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:05:00 2018 -0500

    Use config.mk instead of common.mk in bump-version.sh.

    Details:
    - Fixed inadvertent targeting of common.mk when testing whether configure
      had already been run, rather than config.mk.

commit 6ded8f9f0364b3c07255e2532ada3eeb2ed2a715
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 14:01:29 2018 -0500

    Account for recent 'make distclean' in bump-version.sh.

    Details:
    - Added logic to build/bump-version.sh that will run './configure auto'
      if 'common.mk' is not present (usually because 'make distclean' was run
      recently).

commit 7c16fdce433f5dea0e83d5047553c955d8e46fd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 13:50:55 2018 -0500

    Fixed typo in RELEASING file.

commit 5e5ca4984fcf6d72d3036c338bb9cdc64520a325
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Sat Apr 28 13:48:01 2018 -0500

    README updates.

    Details:
    - Updates to the top-level README files in the top-level directory as
      well as the 'examples/oapi' directory.

commit 627b045e301defea6770dc5b64e1110cbec25153 (origin/dev, origin/amd, dev, amd)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 18:11:19 2018 -0500

    Added an example of using transposition with gemm.

    Details:
    - Added an example to examples/oapi/8level3.c to show how to indicate
      transposition when performing a gemm operation.

commit 13a0eadc69d72933e322901f5b44944834e3c787
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 18:00:07 2018 -0500

    Added more transposition/conjugation examples.

    Details:
    - Added code to examples/oapi/5level1m.c that demonstrates transposing
      (and conjugate-transposing) unstructured matrices.
    - Comment updates to 6level1m_diag.c to maintain consistency with new
      examples in 5level1m.c.

commit 5606cd8881e75264a96af45dc8ea1905bab054f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 17:13:10 2018 -0500

    Added utility module to examples/oapi.

    Details:
    - Added a new code example file to examples/oapi demonstrating how to use
      various utility operations.
    - Comment updates to other example files.
    - README updates.

commit ff26c94c6486374c709f93c6965ea18903bd6a18
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 27 12:31:34 2018 -0500

    Added missing gcc version constraint for knl.

    Details:
    - Previously forgot to add explicit enforcement of a minimum gcc version
      in configure script when 'knl' sub-configuration is requested.
    - Comment updates to configure.

commit 4d97574e477b3e55ddbb6044b0542a92cd9bab30
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 24 18:48:09 2018 -0500

    Added object API example code.

    Details:
    - Added an 'examples' directory at the top level.
    - Added an 'oapi' subdirectory in 'examples' that contains a tutorial-like
      sequence of example code demonstrating the core functionality of BLIS's
      object-based API, along with a Makefile and README. Thanks to Victor
      Eijkhout for being the first to suggest including such code in BLIS.

commit d6ab25a3232aa52b9b855088fb4b0b46ff2c00c8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 24 18:43:03 2018 -0500

    Add setijm, getijm operations.

    Details:
    - Added bli_setgetijm.c, which defines bli_setijm(), bli_getijm(), and
      related functions that can be used to read and write individual
      elements of an obj_t.
    - Defined a new function, bli_obj_create_conf_to(), in bli_obj.c that will
      create a new object with dimensions conformal to an existing object.
      Transposition and conjugation states on the existing object are ignored,
      as are structure and uplo fields.
    - Defined a new function, bli_datatype_string(), in bli_obj.c that returns
      a char* to a string representation of the name of each num_t datatype.
      For example, BLIS_DOUBLE is "double" and BLIS_DCOMPLEX is "dcomplex".
      BLIS_INT is included (as "int"), but BLIS_CONSTANT is not, and thus is
      not a valid input argument to bli_datatype_string().
    - Added calls to bli_init_once() to various functions in bli_obj.c, the
      most important of which was bli_obj_create_without_buffer().
    - Removed unintended/extra newline from the end of printv output.
    - Whitespace changes to
      - frame/base/bli_machval.c
      - frame/base/bli_machval.h
      - frame/0/copysc/bli_copysc.c
    - Trivial changes to README.md and common.mk.

commit a731a428f7fc02fd6ab4f953ead828c1d06fb5a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 16:44:55 2018 -0500

    Another README.md update.

commit c734ee928a824b27d280a9a67b1b4bc8423d5795
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 16:40:05 2018 -0500

    README.md update.

commit 03ecad372d8eb603ee905a7b944d0544a813460a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 17 14:16:59 2018 -0500

    Added RELEASING file.

    Details:
    - Added a file named 'RELEASING' that contains basic notes on how to
      create a new version/release of BLIS. This is mostly just a reminder
      to myself, but also may become useful if/when others take over
      development and administration of the project.

commit 24b3c3149ce66546b9a1afc2cc794a637a86aa60
Merge: 60366a3f 817b67c0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 18:49:38 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 60366a3faba4e60cee85c3b87a3f69625f4b9026
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 18:46:21 2018 -0500

    Updates to knl kernels and related code.

    Details:
    - Imported the 24x16 knl sgemm microkernel (and its corresponding spackm
      kernel) from TBLIS and enabled its use in the knl sub-config. Also
      added sgemm microkernel prototype to bli_kernels_knl.h.
    - Updated dgemm and dpackm microkernels from TBLIS, which included an
      important change regarding the offsets array (changed from extern
      declaration to static declaration/definition).
    - Activated use of level-1v and -1f zen kernels in skx and knl
      sub-configs.
    - Removed some old macros no longer needed in bli_family_skx.h now that
      libmemkind support exists in configure.
    - Moved bli_avx512_macros.h to frame/include and adjusted #includes in
      skx and knl kernels accordingly.
    - Moved unused kernels in kernels/knl/3 to kernels/knl/3/other
      directory.
    - Fixed a minor bug in the 'make' output per compile when verboseness
      is not turned on. The rule-generating function 'make-kernel-rule' was
      previously passing in the name of the config, rather than the name of
      the kernel set returned by get-config-for-kset, which could give
      misleading information to the user when the kconfig_map mapped a
      kernel set to a sub-configuration that did not share the same name.
      (This didn't affect the CFLAGS that were actually used.)
    - Updated test/3m4m/Makefile, removing acml targets and renaming the
      remaining targets.

commit 817b67c01752e0ca8fe230bb8ad23afc7bd0f64e
Merge: 67c9c2f8 2b7108a8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 14:06:26 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 67c9c2f86d5ef2accc439b21581d73d82754a2e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 14:03:12 2018 -0500

    Retired haswell gemm microkernels.

    Details:
    - Moved microkernels in kernels/haswell/3 to kernels/haswell/3/old. These
      microkernels were no longer being used and only sowed confusion to
      anyone inspecting the repository without being fully cognizant of the
      build system and how it works (and sometimes even to those who wrote
      the build system). Note that the haswell configuration currently
      employs the zen microkernels.

commit 2b7108a8ef8ce958b3acad028ff07c85ff97fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 16 12:35:53 2018 -0500

    Minor updates to test driver makefiles.

    Details:
    - Cleaned up and homogenized the various test driver Makefiles in
      testsuite and test directories.
    - Very minor updates to test driver code.

commit 9f56df95570a24587b910b169f342bd356ccbfb6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 11 14:51:36 2018 -0500

    Trivial tweaks to configure blacklisting output.

    Details:
    - Updated output of information vis-a-vis configuration blacklisting.

commit f56481efebd9a7785c0618f3a12c0bec36f46333
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 10 19:02:21 2018 -0500

    Cleaned up assembler version query on OS X.

    Details:
    - Switched from querying version of 'objdump' to 'as' (i.e. the
      assembler).
    - Fixed the outputting of the version of 'as' on OS X, which required
      this beauty:
        ...=$(as -v /dev/null -o /dev/null 2>&1)
    - Only add sub-configs to blacklist if the sub-config hasn't already
      been added.

commit 088c474e629535affbe111f141f895af50d109be
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Apr 10 18:09:56 2018 -0500

    Added support for blacklisting via the assembler.

    Details:
    - Added logic to configure that attempts to assemble various small files
      containing select instructions designed to reveal whether binutils
      (specifically, the assembler) supports emitting those instruction sets.
      This information provides additional opportunities to blacklist sub-
      configurations that are unsupported by the environment. Thanks to Devin
      Matthews for pointing me towards a similar solution in TBLIS as an
      example.
    - Various other cleanups in configure.
    - Reorganized the detection code in the 'build' directory, bringing the
      "auto-detect" configuration detection, libmemkind detection, and new
      instruction set detection codes into a single new subdirectory named
      'detect'.

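The assembler probe described in the commit above can be sketched roughly as follows. This is an illustration in Python, not the shell logic that configure actually uses; the function name `assembler_accepts`, its parameters, and the probe instruction are all hypothetical.

```python
import os
import shutil
import subprocess
import tempfile


def assembler_accepts(asm_source, assembler="as"):
    """Try to assemble a small source snippet.

    Returns True if the assembler exits successfully, False if it rejects
    the snippet, and None if the assembler cannot be found at all.
    (Hypothetical helper; configure's real probe is shell code.)
    """
    if shutil.which(assembler) is None:
        return None
    with tempfile.TemporaryDirectory() as tmp:
        src = os.path.join(tmp, "probe.s")
        obj = os.path.join(tmp, "probe.o")
        with open(src, "w") as f:
            f.write(asm_source)
        result = subprocess.run(
            [assembler, src, "-o", obj],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
        )
        return result.returncode == 0


if __name__ == "__main__":
    # An AVX-512 instruction in AT&T syntax, chosen only as an example probe.
    supported = assembler_accepts("vpaddd %zmm0, %zmm1, %zmm2\n")
    print("AVX-512 probe:", supported)
```

A sub-configuration whose instruction set fails such a probe would then be a candidate for the blacklist.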
commit 78a24e7dada52a3582f8488795bd1a44993989d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 17:02:13 2018 -0500

    Updated bli_avx512_macros.h in knl and skx configs.

    Details:
    - Downloaded updated version of bli_avx512_macros.h from TBLIS [1] in
      attempt to address issue #192.

    [1] https://github.com/devinamatthews/tblis/

commit 388f64d6ade14caa4a6c286845ad2d565378b2bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 15:33:10 2018 -0500

    Fixed failure to honor CC= argument to configure.

    Details:
    - Fixed a failure to observe the value of CC when selecting the compiler
      in configure. Thanks to Devangi Parikh for reporting this bug.
    - The semantics now also work for the CC environment variable. That is,
      if CC is set prior to running configure, that value is used, but will
      be overridden by specifying the CC= argument to configure. If the CC
      environment variable is not set, the CC= value is used. If neither the
      environment variable nor CC= are specified, then the choice is made
      internally to configure: first attempting to find gcc, then clang, and
      then cc.

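The compiler-selection precedence described in the commit above can be summarized with a small sketch. It is written in Python for illustration only (configure itself is a shell script), and the function name `choose_compiler` and its parameters are hypothetical.

```python
def choose_compiler(cc_arg=None, cc_env=None, found=()):
    """Pick a compiler following the precedence described above.

    cc_arg: value of the CC= argument passed to configure, if any.
    cc_env: value of the CC environment variable, if any.
    found:  names of compilers that could be located on the system.
    """
    # The CC= argument overrides everything else.
    if cc_arg:
        return cc_arg
    # Otherwise fall back to the CC environment variable.
    if cc_env:
        return cc_env
    # Otherwise search in the documented order: gcc, clang, cc.
    for candidate in ("gcc", "clang", "cc"):
        if candidate in found:
            return candidate
    return None
```

For instance, `choose_compiler(cc_arg="icc", cc_env="clang")` selects `icc`, while `choose_compiler(found=("cc", "clang"))` selects `clang`.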
commit 45fbe66b3e2ab92f0b4fdf437d57c5d06603803d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 9 14:01:08 2018 -0500

    Fixed libmemkind dependency for x86_64.

    Details:
    - Removed some old conditional code in config/knl/make_defs.mk that
      added -lmemkind to LDFLAGS if DEBUG_TYPE was not 'sde' and inserted
      code into common.mk that affirmatively filters out -lmemkind from
      LDFLAGS if DEBUG_TYPE is 'sde'. (Thanks to Dave Love for reporting
      this issue.) Other minor cleanups to neighboring code in common.mk.
    - Updated CRVECFLAGS in knl/make_defs.mk to be based on -march=knl,
      and then AVX-512 functionality is manually removed via various
      -mno-avx512* flags. Also, make the setting of CRVECFLAGS conditional
      on CC_VENDOR. Similar change to skx/make_defs.mk.
    - Comment/whitespace updates.

commit ca982148b3b419db063cad2fa74376ec383a5c80
Author: dnp <devangiparikh@gmail.com>
Date:   Sun Apr 8 21:27:10 2018 -0500

    Fixed bug in SKX sgemm microkernel. Modified SKX dgemm microkernel to be consistent with the sgemm microkernel

commit bd0276752ccdd56ff897b1a5ae022f2ffe6e0b38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 18:51:43 2018 -0500

    Track separate ref kernel flags for each sub-config.

    Details:
    - Renamed CVECFLAGS variables in sub-configurations' make_defs.mk files
      to CKVECFLAGS.
    - Added default definitions of two new make variables to most sub-
      configurations' make_defs.mk files--CROPTFLAGS and CRVECFLAGS--
      which correspond to reference kernel analogues of the CKOPTFLAGS
      and CKVECFLAGS, which track optimization and vectorization flags for
      optimized kernels. Currently, two sub-configurations (knl and skx)
      explicitly set CRVECFLAGS to non-default values (using AVX2 instead of
      AVX-512 for reference kernels). Thanks to Jeff Hammond, whose feedback
      prompted me to make this change (issue #187).
    - Changed common.mk so that the get-refkern-cflags-for function returns
      the flags associated with the given sub-configuration's CROPTFLAGS
      and CRVECFLAGS (instead of CKOPTFLAGS and CKVECFLAGS).

commit b9aebce19480448817373e2df2b36bd090eae41a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 18:37:33 2018 -0500

    De-verbosify makefile fragment generation.

    Details:
    - Changed from -v1 to -v0 when calling gen-make-frag.sh from configure.
      The directory-by-directory recursive output didn't add much value to
      the user, so now we just echo a line for each top-level directory into
      which we will recurse (e.g. 'config', 'ref_kernels', 'frame', etc.).
      This also helps keep more interesting information (from earlier in the
      execution of configure) from scrolling out of the terminal window.

commit b549b91f26948991e13364f1f26a878da0f43aa0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Apr 6 16:31:33 2018 -0500

    Added 64-bit integer support to BLAS test drivers.

    Details:
    - Updated the build system and BLAS test drivers to use 64-bit integers
      when BLIS is configured for 64-bit integers in the BLAS layer. Also
      updated blastest/Makefile accordingly. Thanks to Dave Love for
      reporting the need for this feature.
    - Added a 'check' target to blastest/Makefile so that the user can see
      a summary of the tests.
    - Commented out the initial definition of INCLUDE_PATHS in common.mk,
      which was used pre-monolithic header, back when BLIS needed paths to
      *all* headers, rather than just a select few. This line is no longer
      needed since the value of INCLUDE_PATHS is overwritten by a later
      definition limited to only the header paths that are needed now.

commit d39fa1c04265869bdf8b6f453076359eec2f3c59
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 5 19:38:35 2018 -0500

    Adjusted CFLAGS used to compile bli_cntx_ref.c.

    Details:
    - Removed CKOPTFLAGS and CVECFLAGS from the set of CFLAGS used to
      compile bli_cntx_ref.c for each configuration. This is necessary
      because the file defines functions like bli_cntx_init_skx_ref(),
      which are called during BLIS's initialization of the global kernel
      structure, potentially being executed by an architecture that lacks
      the instruction set used to compile the kernels for, in this example,
      skx, which would lead to an illegal instruction error. Thanks to
      Dave Love for reporting this issue.
    - Further adjusted CFLAGS used when compiling code in the 'config'
      directory (e.g. bli_cntx_init_skx.c) as well as code in 'frame' so
      as to avoid the aforementioned issue.

commit 08b123084d35680beab379012f8f5a5a8b44a443
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Apr 5 14:25:39 2018 -0500

    Added color-coding to 'make check' output.

    Details:
    - Added color coding to output of check-blistest.sh, check-blastest.sh
      scripts. Success messages are coded green and failure are coded red.
      This helps draw the eye toward those messages as the 'make checkblis',
      'make checkblis-fast', and 'make checkblas' targets are executed.
    - Changed top-level Makefile so that execution will not halt if
      'checkblis', 'checkblis-fast', or 'checkblas' targets fail, which
      means that the second of the two tests (BLIS and BLAS) run by
      'make check' will run even if the first test fails.

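The behavior described in the commit above (color-coded results, and a second check that still runs when the first one fails) can be illustrated with a short Python sketch. The real logic lives in the check-*.sh scripts and the top-level Makefile; the function name `run_all_checks` and the specific ANSI escape codes used here are assumptions.

```python
GREEN = "\033[0;32m"  # assumed ANSI code for green
RED = "\033[0;31m"    # assumed ANSI code for red
RESET = "\033[0m"


def run_all_checks(checks):
    """Run every (name, callable) pair, even if an earlier one fails.

    Each callable returns True on success. Results are printed in green
    (success) or red (failure) and returned as a dict keyed by name.
    """
    results = {}
    for name, check in checks:
        passed = bool(check())
        color = GREEN if passed else RED
        status = "passed" if passed else "FAILED"
        print(f"{color}{name}: {status}{RESET}")
        results[name] = passed
    return results
```

Calling `run_all_checks([("checkblis-fast", lambda: False), ("checkblas", lambda: True)])` reports both results rather than stopping at the first failure.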
commit c9e4d7db7410b03c1ffe8c9727e9f1b2ba7fecfe
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 17:13:15 2018 -0500

    CHANGELOG update (0.3.1)

commit 1f28d7c86e17730f05bd239c8e8d67e3e7510a4f (tag: 0.3.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 17:13:15 2018 -0500

    Version file update (0.3.1)

commit e6cc9ee26bcf0450f1120d5d12985b04d9fb8516
Merge: 786d15c5 3c91c7ae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 16:08:18 2018 -0500

    Merge branch 'dev' of github.com:flame/blis into dev

commit 786d15c5ef09f1f647b126b63d57e76d5810c58e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Apr 4 16:06:47 2018 -0500

    Added skx, knl to x86_64 configuration family.

    Details:
    - Added 'skx' and 'knl' sub-configurations to the 'x86_64' configuration
      family in the config_registry file.
    - Added logic to configure that avoids committing certain sub-configs to
      the configuration/kernel registries if those sub-configs cannot be
      handled properly by the chosen compiler. (This was modeled after
      similar logic in TBLIS's configure; thanks to Devin Matthews for
      pointing this out.) First, the compiler and its version are inspected
      and, based on the results, certain configurations are added to a
      "blacklist". Then, as the configuration registries are being created,
      configurations and/or kernels that match items in the blacklist are
      skipped over and not committed to the registries. Under certain
      circumstances, omitting a blacklisted configuration will indirectly
      invalidate other configurations due to the loss of availability of
      the original blacklisted configuration's kernel set. This additional
      indirect blacklist is also accounted for.
    - Added output to the beginning of configure that echoes information
      about the chosen compiler as well as the configurations that are
      blacklisted and must be stripped from the registries.
    - Various other cleanups in configure, especially with respect to
      explicitly declaring local variables in functions.
    - Comment updates to config/zen/make_defs.mk regarding choice of -march
      flags based on compiler version.

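The direct and indirect blacklisting described in the commit above can be sketched as follows. This is a simplified Python model rather than configure's actual shell code; the function name `prune_registry`, the dictionary layout, and the example registry contents are assumptions made for illustration.

```python
def prune_registry(config_registry, kconfig_map, blacklist):
    """Strip blacklisted sub-configs from a configuration registry.

    config_registry: family name -> list of sub-configuration names.
    kconfig_map:     sub-configuration name -> kernel sets it relies on.
    blacklist:       sub-configurations the compiler cannot handle.

    Returns (pruned_registry, full_blacklist), where full_blacklist also
    contains sub-configs invalidated indirectly because a kernel set they
    rely on belongs to a blacklisted sub-configuration.
    """
    direct = set(blacklist)
    indirect = {
        cfg
        for cfg, kernel_sets in kconfig_map.items()
        if any(kset in direct for kset in kernel_sets)
    }
    full_blacklist = direct | indirect
    pruned = {
        family: [cfg for cfg in subs if cfg not in full_blacklist]
        for family, subs in config_registry.items()
    }
    return pruned, full_blacklist
```

For example, if a hypothetical registry maps `haswell` onto the `zen` kernel set, then blacklisting `zen` also removes `haswell`, since its kernels would no longer be available.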
commit 3c91c7aebafb446a2582267beb3b22c8bb475b3b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Apr 2 12:40:25 2018 -0500

    Fixed 64b type mismatch warning in cblas_xerbla.c.

    Details:
    - Fixed a compiler warning concerning a type mismatch between the
      format specifier of the printf() call in cblas_xerbla.c and its
      corresponding (info) argument. The warning manifested when the CBLAS
      layer was enabled and the BLAS/CBLAS integer type size was set to 64
      (the default is 32). The warning was fixed by changing the specifier
      from %d to %jd and typecasting the argument to intmax_t. Thanks to
      Dave Love for reporting this issue and submitting the patch.

commit 71eaf449a812fe2bd640d21513ec83974b2edb45
Merge: 6a628184 ae9a5be5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 27 17:21:43 2018 -0500

    Merge branch 'dev'

commit ae9a5be56d6f9b87278d6032154d2dcf3fb7d54f
Author: dnp <devangiparikh@gmail.com>
Date:   Tue Mar 27 17:01:23 2018 -0500

    Fixed bug in skx sgemm microkernel

commit 3f02af0905b1e2e2e065862f8afe5e9a52f282b2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 17:40:04 2018 -0500

    Row storage optimizations to zen dotxf kernels.

    Details:
    - Split the main loop bodies of zen's [sd]dotxf kernels into two cases:
      one to handle a column-stored matrix A and one to handle a row-stored
      matrix A. This allows vector instructions to be employed even if A is
      stored by rows (and A^T appears stored as columns). Both storage cases
      use a common edge case loop. Thanks to Devin Matthews for this idea
      and for prototyping the change needed for sdotxf kernel.

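To illustrate why the two storage cases in the commit above call for different loop structures, here is a scalar Python sketch of a dotxf-style operation, y := beta*y + alpha * A^T x. It is not the zen kernel itself; the function name, argument order, and loop organization are assumptions meant only to show which index is contiguous in each case.

```python
def dotxf_sketch(alpha, a, rs, cs, m, b, x, beta, y):
    """Compute y := beta*y + alpha * A^T x for an m-by-b matrix A.

    A is held in the flat list `a`; element (i, j) lives at a[i*rs + j*cs].
    """
    acc = [0.0] * b
    if rs == 1:
        # Column storage: each column is contiguous, so each of the b dot
        # products walks unit-stride memory along the m dimension.
        for j in range(b):
            col = j * cs
            for i in range(m):
                acc[j] += a[col + i] * x[i]
    elif cs == 1:
        # Row storage: the b elements of a row are contiguous, so all b
        # accumulators are updated together for each row.
        for i in range(m):
            row = i * rs
            for j in range(b):
                acc[j] += a[row + j] * x[i]
    else:
        # General strides: fall back to plain indexed access.
        for j in range(b):
            for i in range(m):
                acc[j] += a[i * rs + j * cs] * x[i]
    return [beta * y[j] + alpha * acc[j] for j in range(b)]
```

Both branches produce the same result; only the memory access pattern differs, which is what makes each case amenable to vector loads.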
commit 679dcc331dd870ec680e135a3fb65ffa6e3a91c2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 15:35:17 2018 -0500

    Make k_iter/k_left uint64_t in bulldozer fma ukrs.

    Details:
    - Changed the declaration of k_iter and k_left for d, c, z microkernels
      from dim_t to uint64_t. This is needed to ensure compatibility with
      the movq instruction used to load the value into registers. This
      change should have been made a long time ago, but for some reason
      only recently began showing up via Travis CI.

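The reason an 8-byte load wants an 8-byte variable can be shown with a small Python sketch. It assumes, for illustration, a build in which dim_t is only 32 bits wide, and models a 64-bit little-endian load that starts at a 32-bit value; the function name `eight_byte_load` is hypothetical.

```python
import struct


def eight_byte_load(k_iter, neighbor):
    """Model an 8-byte little-endian load that begins at a 4-byte integer.

    `k_iter` is stored as a 32-bit value, immediately followed in memory by
    another 32-bit value `neighbor`. The returned integer is what a 64-bit
    load from the address of `k_iter` would observe.
    """
    memory = struct.pack("<ii", k_iter, neighbor)
    (wide,) = struct.unpack("<Q", memory)
    return wide
```

When the neighboring bytes happen to be zero the load looks correct, but any nonzero neighbor corrupts the upper half of the loaded value. Declaring the variable as a 64-bit type avoids the problem.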
commit 6a628184f6938673440e4cdd4fed0208c51fd1f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Mon Mar 26 14:48:16 2018 -0500

    Fixed a memkind-related compile-time bug on knl.

    Details:
    - Fixed a compile-time error that occurred due to the fact that
      BLIS_ENABLE_MEMKIND, defined in bli_config.h, was not being defined
      soon enough to be used in bli_system.h where it is needed to determine
      whether hbwmalloc.h should be #included. bli_system.h is now included
      after bli_config.h (and bli_config_macro_defs.h). Thanks to Dave Love
      for reporting this issue.
    - Tweaked the language used by configure to echo the status of the
      --with[out]-memkind option.

commit e2192a8fd58ec3657434ddd407033e097edad8f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Fri Mar 23 12:53:48 2018 -0500

    Removed vzeroupper intrinsics from zen kernels.

    Details:
    - Fixed a bug in the zen (also used by haswell) dotxf kernels whereby a
      vzeroupper instruction destroyed part of the intermediate result
      stored by the vdpps instructions that came right before. (The
      vzeroupper intrinsic was removed.)
    - Removed remaining vzeroupper intrinsics from other zen kernels.
      Previously, the vzeroupper instructions were included because BLIS is
      typically compiled with -mfpmath=sse. But it was brought to my
      attention that inserting these vzeroupper instructions is unnecessary
      for our purposes, since (a) -mfpmath=sse results in VEX-encoded scalar
      code rather than literal SSE instructions, and (b) compilers already
      (likely) insert vzeroupper instructions where necessary. Thanks to
      Devin Matthews for zeroing in on the dotxf bug.
    - Removed -malign-double from bulldozer make_defs.mk. This alignment
      was already happening by default since bulldozer is an x86_64 system.

commit 22289ad23cd10b81451ce82f60d84b5f97e7fd85
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Thu Mar 22 18:21:30 2018 -0500

    Added build system support for libmemkind.

    Details:
    - Added support for libmemkind to configure. configure attempts to
      detect the presence of libmemkind by compiling a small program
      containing #include <hbwmalloc.h> and a call to hbw_malloc(). If
      successful, it is assumed that libmemkind is present and available.
      If present, use of libmemkind is enabled by default, and otherwise
      use is disabled by default. If libmemkind is present, the user may
      explicitly disable use of the library by running configure with the
      --without-memkind option. Furthermore, a configuration may disable
      libmemkind, perhaps conditional on some aspect of the build system,
      by including -DBLIS_DISABLE_MEMKIND in the configuration's CPPROCFLAGS
      make variable and setting the BLIS_ENABLE_MEMKIND makefile variable,
      set in config.mk, to 'no'. (The knl configuration makes use of this
      latter feature; see below.)
    - If enabled at configure-time, bli_system.h will #include <hbwmalloc.h>
      and bli_kernel_macro_defs.h will define BLIS_MALLOC_POOL and
      BLIS_FREE_POOL to use hbw_malloc() and hbw_free(), respectively.
    - Deprecated explicit use of BLIS_NO_HBWMALLOC in
      config/knl/bli_family.knl.h and replaced use of -DBLIS_NO_HBWMALLOC in
      config/knl/make_defs.mk with -DBLIS_DISABLE_MEMKIND, which overrides
      (#undefs) the definition of BLIS_ENABLE_MEMKIND in bli_system.h, if it
      would otherwise be defined. Also, set the BLIS_ENABLE_MEMKIND makefile
      variable to 'no'.
    - common.mk now adds libmemkind to LDFLAGS if libmemkind is enabled.

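The decision logic described in the commit above (detection, the default, the user's --without-memkind option, and a per-configuration override) can be condensed into a short Python sketch. The function and parameter names are hypothetical, and the sketch deliberately models only the cases the commit message states.

```python
def memkind_enabled(detected, without_memkind=False, config_disables=False):
    """Decide whether libmemkind is used, per the rules described above.

    detected:        True if the small hbw_malloc() test program compiled.
    without_memkind: True if configure was run with --without-memkind.
    config_disables: True if the configuration itself disables memkind
                     (e.g. via -DBLIS_DISABLE_MEMKIND).
    """
    # Not present: disabled by default.
    if not detected:
        return False
    # Present, but the user opted out.
    if without_memkind:
        return False
    # Present, but the configuration overrides the default.
    if config_disables:
        return False
    # Present and nothing disables it: enabled by default.
    return True
```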
commit 7dc40eafdd9af3e8c4519a8d1b04d25830b4ca7a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 21 18:39:16 2018 -0500

    Updates to top-level and test driver Makefiles.

    Details:
    - Added logic to common.mk that will choose a BLIS library against which
      to link (LIBBLIS_LINK). The default choice is the static (.a) library;
      the shared (.so) library is chosen only if the shared library build was
      enabled and the static one was disabled.
    - Updated the various test driver Makefiles to reference this common,
      pre-chosen library against which to link. (Previously, these drivers
      unconditionally linked against the static library and would have
      failed if the static library build was disabled at configure-time.)
    - Renamed many of the variables in common.mk and the top-level Makefile
      so that variables relating to the libblis.[a|so] files, including
      paths to those files, begin with "LIBBLIS".
    - Shuffled around some of the library definitions from the top-level
      Makefile to common.mk.
    - Renamed BLIS_ENABLE_DYNAMIC_BUILD to BLIS_ENABLE_SHARED_BUILD, and
      the @enable_dynamic@ anchor to @enable_shared@ in build/config.mk.in
      and in configure.
    - A few other cleanups in the top-level Makefile.

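The LIBBLIS_LINK selection rule from the commit above is simple enough to state as a sketch. This Python function is illustrative only (the real logic is written in make syntax in common.mk), and its name and parameters are hypothetical.

```python
def choose_link_library(static_enabled, shared_enabled):
    """Return the library file the test drivers would link against.

    The static library is the default; the shared library is chosen only
    when the shared build is enabled and the static build is disabled.
    """
    if shared_enabled and not static_enabled:
        return "libblis.so"
    return "libblis.a"
```

With both builds enabled the static library is still chosen, which matches the stated default.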
commit 97e1eeade3c51df1bae574a9bc1da34b05bf2bd3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Wed Mar 21 15:47:11 2018 -0500

    Added input.operations.fast file for 'make check'.

    Details:
    - Added an 'input.operations.fast' file to testsuite directory to go
      along with the 'input.general.fast' file used by the 'make check'
      target in the top-level Makefile. This will allow the "fast" check
      to prune operations and/or parameter combinations from the test
      space in order to save time.
    - Currently, input.operations.fast prunes trmm3 and all transposition
      and conjugation parameters from the level-3 test space.
    - Reduced problem size tested in input.general.fast to 100 and disabled
      testing of 1m method.

commit c441caa95aabe69f54e2160eb67bf4ca76a66c34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 17:56:02 2018 -0500

    README update.

    Details:
    - Minor updates to README.md.
    - Minor change to blastest/Makefile.

commit 6fe018eb4ac8c16f2edc916c24f5994848017b7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 15:35:45 2018 -0500

    Added .gitkeep file to blastest/obj.

    Details:
    - Added an empty file named '.gitkeep' to blastest/obj/ so that git will
      track the otherwise empty directory. (This is already done for the BLIS
      testsuite in testsuite/obj.)

commit 0e6d000db9291342913dc5f8590a28c67bbcbc95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 15:08:43 2018 -0500

    Updated .gitignore to ignore BLAS test out.* files.

commit 40c040a31d96fbadff11f761d0cad1ef03ef2cc5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 14:33:50 2018 -0500

    Fixes to .travis.yml.

    Details:
    - Invoke the full BLIS testsuite via 'make testblis' instead of the fast
      version via 'blistest-fast' (which was wrong anyway, since the correct
      fast target is 'testblis-fast').
    - Invoke the BLAS tests via 'make testblas' instead of 'blastest'.

commit 664ec4813d8b53121cce7a68bef47da656ece9cb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date:   Tue Mar 20 13:54:58 2018 -0500

    Integrated f2c'ed netlib BLAS test suite.

    Details:
    - Created a new test suite that exercises only the BLAS compatibility
      found in BLIS. The test suite is a straightforward port of code
      obtained from netlib LAPACK, run through f2c and linked to a stripped-
      down version of libf2c that is compiled along with the test drivers
      (to prevent any obvious ABI issues). The new BLAS test suite can be
      run from within its new local directory, 'blastest' (through its local
      'make ; make run' targets) or from the top-level Makefile (via the
      'make testblas' target). Output files are created in whatever directory
      the test drivers are run, whether it be the 'blastest' directory, the
      top-level source distribution directory, or the out-of-tree directory
      in which 'configure' was run. Also, the results of the BLAS test suite
      can be checked via 'make checkblas', which summarizes the presence or
      absence of test failures in a single line printed to stdout.
    - Updated the 'test' target to run both 'testblis' and 'testblas'.
    - Added a new 'testblis-fast' target that runs the BLIS testsuite with
      smaller problem sizes, allowing it to finish more quickly.
    - Added a 'make check' target, which runs 'checkblis-fast' and
      'checkblas'.
    - Changed .travis.yml so that Travis CI runs 'testblis-fast' instead of
      'testblis' before (calling the check-blistest.sh script to check the
      result manually).
    - Renamed some targets in the top-level Makefile to be consistent between
      BLAS and BLIS.

commit 40fa10396c0a3f9601cf49f6b6cd9922185c932e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Mar 19 18:19:43 2018 -0500
|
||
|
||
Fixed a few obscure bugs in the BLAS API.
|
||
|
||
Details:
|
||
- Fixed a missing parameter in the definition of sdsdot_(). The 'sb'
|
||
argument was missing. Strangely, the argument is omitted from dsdot_()
|
||
in the BLAS API.
|
||
- Fixed the missing 'c' or 'u' in the "?gerc" or "?geru" operation string
|
||
passed to xerbla_() by the bla_ger_check() macro.
|
||
- For bla_syrk_check() and bla_syr2k_check() macros, only allow
|
||
conjugate-transpose (trans='c') as a valid argument for the real
|
||
domain functions [sd]syrk_() and [sd]syr2k_(). (Previously, the
|
||
argument was allowed even for the complex domain equivalents, which
|
||
was inconsistent with the BLAS API.)

commit fe7d7f1e43e4c26249eed83d4188beee1ba96202
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 18 19:43:06 2018 -0500

    Fixed cpp macro parameter "ch" typo in bla_ger.c.

    Details:
    - Previously, the BLAS routine-generating macro in bla_ger.c was
      incorrectly passing MKSTR(ch) into the _check() macro when it
      should have been passing in the char that was available, chxy.
      I've instead changed the name of the macro parameter from chxy
      to ch. A similar change was made to bla_ger.h for consistency.
      Thanks to Dave Love for helping track this down. (NOTE: This is
      actually the root cause of the bug that was first patched by
      increasing the length of the operation name strings passed into
      xerbla_(), as defined by the constant BLIS_MAX_BLAS_FUNC_STR_LENGTH,
      in 3d1a5a7. In theory, that change could be backed out now.)
    - Applied aforementioned chxy->ch change to bla_dot.[ch], as well as
      frame/compat/cblas/f77_sub/f77_dot_sub.[ch] (not because it needed
      to happen, but for naming consistency).
    - Reformatted function signatures/prototypes of CBLAS functions and
      function calls to BLAS in frame/compat/cblas/f77_sub/*.c.

commit cb7ed90752d1ddbac11368c4510641ca4f3a02eb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 13:05:56 2018 -0500

    Convert op names to uppercase before calling xerbla_().

    Details:
    - Defined a new function, bli_string_mkupper(), that calls toupper() on
      every non-NULL character in a string.
    - Call bli_string_mkupper() prior to calling xerbla_() in the level-2/-3
      BLAS _check() macros. This prevents the BLAS testsuite from complaining
      that the operation name (e.g. "dgemm") does not match the expected
      value (e.g. "DGEMM"). Thanks to Dave Love for reporting this issue.

commit 3d1a5a7c08fed3ba29f060fe1db2b0dc42dde223
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 16 12:24:07 2018 -0500

    Fixed printf() format overflow.

    Details:
    - Increased the length of operation name strings passed to xerbla_() in
      the level-2 and level-3 operation _check() functions, found in
      frame/compat/check. This avoids a format specifier overflow warning by
      gcc 7. Thanks to Dave Love for reporting this issue and suggesting the
      fix.

commit c73055f028684d998e03b2392093c393782bbfe7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 15 16:08:21 2018 -0500

    Return after non-zero info in BLAS checks.

    Details:
    - Previously, when calling the BLAS compatibility layer, discovering a
      parameter check failure would result in the proper setting of the
      info parameter (printed by xerbla_()), but would also come with an
      immediate abort() rather than a return. This was incorrect behavior
      for two overlapping reasons.
      (1) BLAS should return gracefully to the caller in the event of a
          bad set of parameters, not abort().
      (2) When BLIS was being tested via the BLAS testsuite, BLIS's
          xerbla_() would correctly get preempted/overridden by the
          xerbla_() in the BLAS testsuite, but execution would then
          erroneously continue on to the BLIS implementation with bad
          parameter values.
    - The previous issue was addressed by disabling the abort() in BLIS's
      xerbla_(), changing all of the BLAS _check() functions to cpp macros,
      and adding a return statement to the end of each _check() macro's
      "if ( info != 0 )" conditional.
      Thanks to Dave Love for reporting this issue.
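
The control flow described in this commit can be sketched as follows. This is a minimal model in Python rather than the library's cpp macros; the names xerbla, gemm_check, and dgemm, the argument positions used as info codes, and the placeholder "computation" are illustrative assumptions, not the BLIS definitions.

```python
errors = []  # stands in for whatever an xerbla_() implementation would print


def xerbla(name, info):
    # Report the bad argument and return; the handler itself does not abort.
    # The upper-casing mirrors the later commit cb7ed90.
    errors.append((name.upper(), info))


def gemm_check(m, n, k):
    # Return the (1-based) position of the first bad argument, or 0 if none.
    # The positions are illustrative only.
    if m < 0:
        return 3
    if n < 0:
        return 4
    if k < 0:
        return 5
    return 0


def dgemm(m, n, k):
    info = gemm_check(m, n, k)
    if info != 0:
        xerbla("dgemm", info)
        return None       # return to the caller instead of continuing
    return m * n * k      # placeholder for the real computation
```

With this shape, a caller that passes a negative dimension gets control back after the error is reported, and the computation is never reached with bad parameter values.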

commit c4f1d18b97a6a8c3ea0366aa759db597a664062a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 19:10:09 2018 -0500

    Minor typo fix to printing arch in testsuite.

    Details:
    - Mistakenly was calling bli_cpuid_query_id() instead of
      bli_arch_query_id() in the recent addition to the testsuite output
      that prints the active sub-configuration. The former function is
      only used for multi-architecture builds, whereas the latter is the
      more general option that also works for single configuration
      (including 'configure auto') builds.

commit 8f2fabec800a720b3e94b33c0048cc8c4ead436d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 17:43:42 2018 -0500

    Make arm32 and arm64 families work. (#176)

commit fc6a1842518a0820c6708c285611346d5a1419da
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 15:31:17 2018 -0500

    Print sub-configuration name in testsuite output.

    Details:
    - Added a line to the testsuite output that prints the name of the
      current/active sub-configuration. This is useful when linking the
      testsuite against multi-configuration builds because it confirms
      the sub-configuration that is actually being employed at runtime.
      Thanks to Devin Matthews for suggesting this feature.

commit 9943a899d64bf7ec4a24106f6f4c70629bbe1f6e
Merge: 290dd4a9 b1a15ae6
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:27:44 2018 -0500

    Merge pull request #173 from devinamatthews/dev

    Fix Cortex-A9 and Cortex-A15 configs.

commit b1a15ae6ee0f46c9a95cf59f9555925e0e8e21ff
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:26:44 2018 -0500

    Use BLIS_H_FLAT

commit 290dd4a9feee447e69b40ad108954af78e196f7e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 14 13:15:37 2018 -0500

    Allow arbitrarily deep configuration families.

    Details:
    - Updated configure so that configuration families specified in the
      config_registry are no longer constrained as being only one level
      deep. For example, previously the x86_64 family could not be defined
      concisely in terms of, say, intel64 and amd64 families, and instead
      had to be defined as containing "haswell, sandybridge, penryn, zen,
      etc." In other words, families were constrained to only having
      singleton configurations as their members. That constraint is now
      lifted.
    - Redefined x86_64 family in config_registry in terms of intel64 and
      amd64.
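
A hedged sketch of what expanding nested families amounts to: a family is resolved recursively until only singleton sub-configurations remain. The registry below is a hypothetical, abbreviated stand-in for config_registry (the membership lists are not the real ones), and configure itself is a shell script, so this Python function only models the idea.

```python
# Hypothetical, abbreviated registry: family name -> list of members.
registry = {
    "x86_64": ["intel64", "amd64"],
    "intel64": ["haswell", "sandybridge", "penryn"],
    "amd64": ["zen"],
}


def expand(name, registry):
    """Return the singleton sub-configurations reachable from `name`."""
    if name not in registry:
        # Not a family, so it is a singleton sub-configuration.
        return [name]
    result = []
    for member in registry[name]:
        # A member may itself be a family; recurse to any depth.
        for sub in expand(member, registry):
            if sub not in result:
                result.append(sub)
    return result
```

Under these assumptions, expanding "x86_64" yields the same flat list that previously had to be written out by hand.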

commit 9cee78e006d56543ac02fc9c488905c0434e60ae
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Mar 14 13:09:48 2018 -0500

    Fix Cortex-A9 and Cortex-A15 configs.

    Tested with QEMU.

commit 1a3031740f7fcbbcc2c99d5c4cb50d0413407455
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 13 16:04:40 2018 -0500

    Updates to ARM hardware detection support.

    Details:
    - Updated/clarified the ARM preprocessor macro branch of bli_cpuid.c.
      Going forward, cortexa57 (64-bit), cortexa15, and cortexa9 (32-bit)
      sub-configurations are supported. However, the functions that detect
      features specific to a15 and a9 are identical, and since a15 is tested
      first, it will always be chosen for arm32 hardware (even if both
      sub-configurations were enabled at configure-time and the library is
      linked and run on an a9). Thus, more work needs to be done to
      distinguish these two.
    - Added cpp guard around x86_64 portions of bli_cpuid.c. Now, either
      the x86_64 or ARM code will be compiled (or neither, if neither
      environment is detected).
    - In bli_arch_query_id(), call bli_cpuid_query_id() when the
      BLIS_FAMILY_ARM64 or BLIS_FAMILY_ARM32 macros are defined.
    - Added arm64 and arm32 configuration families to config_registry.
    - Added a note to the arch_t typedef enum in bli_type_defs.h reminding
      the developer to update the string array in bli_arch.c whenever new
      enum values are added or existing values are reordered.

commit 1442d06886ebdc34d8f1cb620229ddc6062c2ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 11 16:59:50 2018 -0500

    Fixed misnamed kernels in _cntx_init_cortexa57.c.

    Details:
    - Changed incorrect kernel function names in bli_cntx_init_cortexa57.c:
        bli_sgemm_cortexa57_asm_8x12 -> bli_sgemm_armv8a_asm_8x12
        bli_dgemm_cortexa57_asm_6x8 -> bli_dgemm_armv8a_asm_6x8
      Thanks to Jacob Gorm Hansen for reporting this issue.

commit 48da9f5805f0a49f6ad181ae2bf57b4fde8e1b0a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 7 12:54:06 2018 -0600

    Tweaked common.mk, Makefile, skx/knl make_defs.mk.

    Details:
    - Reorganized linker-related section of common.mk so that LDFLAGS set
      in a sub-configuration's make_defs.mk file will not be immediately
      (and erroneously) overridden by the default values.
    - Re-enabled redirected (to file) output of the testsuite when run from
      the top-level Makefile via 'make test'. (For some reason, it was
      commented-out for the non-verbose case.)
    - Removed old/unnecessary code from the make_defs.mk files of skx and
      knl sub-configurations.

commit 8b0475a87daa177916e2caac0e530c6a57fa07cf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 6 06:39:44 2018 -0600

    Fixed typo in attempted fix in 1a8350f7.

    Details:
    - Mistakenly entered 148 as knl mc blocksize for double real when the
      value should have been 144. Thanks to Dave Love for reporting this.

commit 8912e6886b97eabb4ce0c35a3609a0fd994d347b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 18:00:45 2018 -0600

    Fixed missing flags during shared object build.

    Details:
    - Fixed a bug in common.mk that caused warning, position-independent
      code, miscellaneous, and general preprocessor flags to be omitted
      from the configuration family-specific variables that hold those
      values, as registered by the family's make_defs.mk file. This would
      most obviously manifest when targeting a configuration family such as
      'intel64' while simultaneously configuring for a shared object build,
      as the key '-fPIC' flag would be omitted at compile-time and prevent
      successful linking. Thanks to Dave Love for reporting this bug.
    - Other cleanups to common.mk for readability and clarity.

commit 1a8350f70557fc53ca0c2eadf2076710dd0d9bc9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 5 13:32:00 2018 -0600

    Fixed cache blocksize bug in knl configuration.

    Details:
    - Changed the mc blocksize for double real execution in the knl sub-
      configuration from 160 to 148. The old value was not a multiple of
      mr (which is 24), and thus the safeguards in bli_gks_register_cntx()
      were tripping. Thanks to Dave Love for reporting this issue.
    - Switch knl sub-configuration to use default blocksizes for datatypes
      not supported by native kernels.
    - Fixed typos in bli_error.c that prevented certain error strings
      (which report maximum cache blocksizes not being multiples of their
      corresponding register blocksize) from properly initializing.
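
The divisibility constraint behind this fix is easy to check by hand. The sketch below is only the arithmetic test, not the safeguard in bli_gks_register_cntx(); the function name is hypothetical. It uses mr = 24 from this commit and the three mc values that appear in this commit and in the follow-up 8b0475a.

```python
def is_multiple_of_mr(mc, mr):
    # A cache blocksize passes when it divides evenly by the register
    # blocksize.
    return mc % mr == 0


mr = 24
remainders = {mc: mc % mr for mc in (160, 148, 144)}
# 160 leaves 16 and 148 leaves 4; only 144 (= 6 * 24) divides evenly.
```

This is why 148 still tripped the check and had to be corrected to 144.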

commit c09fffa827fe6241dc20193a1c404496664220de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 3 13:13:39 2018 -0600

    Added missing cntx_t* arg in knl packm kernels.

    Details:
    - Added the missing cntx_t* argument to the function signature of packm
      kernels in kernels/knl/1m/. Thanks to Dave Love for reporting this
      issue.

commit 1ef9360b1fd0209fbeb5766f7a35402fbd080fcb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 1 14:36:39 2018 -0600

    Enable non-unit vector stride tests by default.

    Details:
    - Change "vector storage schemes to test" parameter in testsuite's
      input.general file to "cj". This means that both unit stride column
      vectors and non-unit stride column vectors will be tested in
      operations with vector operands (e.g. level-1v, level-1f, level-2).
    - Very minor comment (typo) changes to input.operations.

commit 8c4e55a1a1ead9a5e970200fee027ffd2c7e8454
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 17:01:47 2018 -0600

    Added individual operation overrides in testsuite.

    Details:
    - Updated the testsuite driver so that setting one or more individual
      operation test switches to "2" in input.operations will enable ONLY
      those operations and disable all others, regardless of the values of
      the section overrides and other operation switches. This makes it
      very easy to quickly test only one or two operations, and equally
      easy to revert back to the previous combination of operation tests.
    - Added more comments to input.operations describing the use of
      individual "enable only" overrides.
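
The "enable only" rule can be sketched as a small resolution function. This is a simplified model, assuming switch values of 0 (disabled), 1 (enabled), and 2 (enable only); section overrides are not modeled, and the function name and dictionary layout are hypothetical rather than taken from the testsuite driver.

```python
def enabled_ops(switches):
    """Resolve which operations run, given op name -> switch value."""
    only = [op for op, value in switches.items() if value == 2]
    if only:
        # Any "2" wins: run ONLY those operations and ignore the rest.
        return only
    # Otherwise fall back to the ordinary on/off switches.
    return [op for op, value in switches.items() if value == 1]
```

Changing a single switch from 1 to 2 narrows the run to that operation, and changing it back restores the previous combination without touching any other line.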

commit 34862aed89e5d5a8f35aeecd49f3052ada1f337b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 28 15:30:14 2018 -0600

    Use zen kernels in haswell sub-configuration.

    Details:
    - Register use of level-1v zen intrinsic kernels for amaxv, axpyv, dotv,
      dotxv, and scalv, as well as level-1f zen intrinsic kernels for axpyf
      and dotxf. This works because these kernels simply target AVX/AVX2,
      and therefore work without modification on haswell hardware.
    - Switch to use of zen microkernels in bli_cntx_init_haswell.c. The zen
      kernels are essentially identical to those used by haswell, except that
      now zen kernels are a bit more up-to-date. In the future, I may
      continue to maintain duplicates, or I may keep the kernels named after
      one architecture (zen or haswell) but used by both sub-configurations.
    - In config_registry, enable use of both haswell and zen kernels for the
      haswell sub-configuration. This is necessary in order to make zen
      kernels visible when registering kernels in bli_cntx_init_haswell.c.
    - Enable use of assembly-based complex gemm microkernels for zen,
      bli_cgemm_zen_asm_3x8() and bli_zgemm_zen_asm_3x4(), in
      bli_cntx_init_zen.c. This was actually intended for 1681333.

commit d9079655c9cbb903c6761d79194a21b7c0a322bc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

    CHANGELOG update (0.3.0)

commit 709f8361ebc90b96b02ebe5c5ffb6fc3b1b25e58 (tag: 0.3.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:42:48 2018 -0600

    Version file update (0.3.0)

commit 3defc7265c12cf85e9de2d7a1f243c5e090a6f9d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 17:38:19 2018 -0600

    Applied 34b72a3 to non-active/unused microkernels.

    Details:
    - Applied the read-beyond-bounds bugfix in 34b72a3 to other haswell and
      zen kernels (ie: other microtile shapes) which are not used by default.
      This was done mostly in case someone decided to pick up these kernels
      and start using them, not because it affects BLIS's behavior
      out-of-the-box.

commit 34b72a351745aa0d47bb0b74ebcd0f0a616d613d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 16:33:32 2018 -0600

    Fixed obscure read-beyond-bounds bug in sgemm ukrs.

    Details:
    - Fixed an obscure bug in the bli_sgemm_haswell_asm_6x16 and
      bli_sgemm_zen_asm_6x16 microkernels when the input/output matrix C
      is stored with general stride (ie: both rs and cs are non-unit). The
      bug was rooted in the way those microkernels read from matrix C--
      namely, they used vmovlps/vmovhps instead of movss. By loading two
      floats at a time, even if one of them was treated as junk, the
      assembly code could be written in a more concise manner. However,
      under certain conditions--if m % mr == 0 and n % nr == 0 and the
      underlying matrix is not an internal "view" into a larger matrix--
      this could result in the very last vmovhps of the last (bottom-right)
      microkernel invocation reading beyond valid memory. Specifically, the
      low 32 bits read would always be valid, but the high 32 bits could
      reside beyond the bounds of the array in which the output C matrix is
      contained. To remedy this situation, we now selectively use movss to
      load any element that could be the last element in the matrix.
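
The out-of-bounds condition can be illustrated with index arithmetic alone. The sketch below is a hypothetical model, not the assembly: it treats matrix C as a flat array of floats with row stride rs and column stride cs, and treats a two-float load as touching the element's offset plus the next one. The function name and the example dimensions are assumptions chosen for illustration.

```python
def two_float_load_in_bounds(i, j, rs, cs, length):
    # A 64-bit load at element (i, j) touches offsets off and off + 1.
    off = i * rs + j * cs
    return off + 1 < length


# Example general-stride layout: a 6 x 16 matrix with rs = 2 and cs = 12,
# stored in an array that ends exactly at the last element.
m, n, rs, cs = 6, 16, 2, 12
length = (m - 1) * rs + (n - 1) * cs + 1
```

In this model every element except the bottom-right one can be read two floats at a time; only the load at (m - 1, n - 1) reaches one float past the end of the array, which is the element the fix loads with a single-float instruction instead.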

commit 5112e1859e7f8888f5555eb7bc02bd9fab9b4442 (origin/rt)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 23 14:31:26 2018 -0600

    Added missing 'restrict' to some kernels' cntx_t*.

    Details:
    - Added missing 'restrict' keyword to cntx_t* argument of function
      signatures corresponding to level-1v, level-1f, and level-1m kernels.
      This affected bli_l1v_ker_prot.h, bli_l1f_ker_prot.h, and
      bli_l1m_ker_prot.h. (The 'restrict' was already being used to
      qualify cntx_t* arguments for kernels defined in bli_l3_ker_prot.h.)
    - Added comments to bli_l1v_ker.h, bli_l1f_ker.h, bli_l1m_ker.h, and
      bli_l3_ukr.h that help explain how those headers function to produce
      kernel prototypes using the prototype macros defined in the files
      mentioned above.

commit 1fa8af95d807168e0849adb668492601e7009be0
Merge: c084b03b 16813335
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:54:02 2018 -0600

    Merge branch 'rt'

commit c084b03b31d84427a120e391963db5419f1911ee
Merge: 5d03b6e6 fa74af4e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:52:17 2018 -0600

    Merge branch 'rt'

commit 16813335bdb5978bc9a26cd00a32bd5a130130c4
Merge: fa74af4e 5a7005dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 21 17:43:32 2018 -0600

    Merge branch 'amd' into rt

    Details:
    - Merged contributions made by AMD via 'amd' branch (see summary below).
      Special thanks to AMD for their contributions to-date, especially with
      regard to intrinsic- and assembly-based kernels.
    - Added column storage output cases to microkernels in
      bli_gemm_zen_asm_d6x8.c and bli_gemmtrsm_l_zen_asm_d6x8.c. Even with
      the extra cost of transposing the microtile in registers, this is
      much faster than using the general storage case when the underlying
      matrix is column-stored.
    - Added s and d assembly-based zen gemmtrsm_u microkernel (including
      column storage optimization mentioned above).
    - Updated zen sub-configuration to reflect presence of new native
      kernels.
    - Temporarily reverted zen sub-configuration's level-3 cache blocksizes
      to smaller haswell values.
    - Temporarily disabled small matrix handling for zen configuration
      family in config/zen/bli_family_zen.h.
    - Updated zen CFLAGS according to changes in 1e4365b.
    - Updated haswell microkernels such that:
      - only one vzeroupper instruction is called prior to returning
      - movapd/movupd are used in lieu of movaps/movups for double-real
        microkernels. (Note that single-real microkernels still use
        movaps/movups.)
    - Added kernel prototypes to kernels/zen/bli_kernels_zen.h, which is
      now included via frame/include/bli_arch_config.h.
    - Minor updates to bli_amaxv_ref.c (and to inlined "test" implementation
      in testsuite/src/test_amaxv.c).
    - Added early return for alpha == 0 in bli_dotxv_ref.c.
    - Integrated changes from f07b176, including a fix for undefined
      behavior when executing the 1m method under certain conditions.
    - Updated config_registry; no longer need haswell kernels for zen
      sub-configuration.
    - Tweaked marginal and pass thresholds for dotxf.
    - Reformatted level-1v, -1f, and -3 amd kernels and inserted additional
      comments.
    - Updated LICENSE file to explicitly mention that parts are copyright
      UT-Austin and AMD.
    - Added AMD copyright to header templates in build/templates.

    Summary of previous changes from 'amd' branch.
    - Added s and d assembly-based zen gemm microkernels (d6x8 and d8x6) and
      s and d assembly-based zen gemmtrsm_l microkernels (d6x8).
    - Added s and d intrinsics-based zen kernels for amaxv, axpyv, dotv, dotxv,
      and scalv, with extra-unrolling variants for axpyv and scalv.
    - Added a small matrix handler to bli_gemm_front(), with the handler
      implemented in kernels/zen/3/bli_gemm_small_matrix.c.
    - Added additional logic to sumsqv that first attempts to compute the
      sum of the squares via dotv(). If there is a floating-point exception
      (FE_OVERFLOW), then the previous (numerically conservative) code is
      used; otherwise, the result of dotv() is square-rooted and stored as
      the result. This new implementation is only enabled when FE_OVERFLOW
      is #defined. If the macro is not #defined, then the previous
      implementation is used.
    - Added axpyv and dotv standalone test drivers to test directory.
    - Added zen support to old cpuid_x86.c driver in build/auto-detect/old.
    - Added thread-local and __attribute__-related macros to bli_macro_defs.h.
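
The sumsqv change summarized above follows a "fast path first, conservative path on overflow" pattern. The sketch below models that pattern in Python under stated assumptions: an infinite intermediate result stands in for the FE_OVERFLOW floating-point exception flag, the scaled fallback is a generic overflow-avoiding formulation rather than the BLIS code, and the function name is hypothetical.

```python
import math


def norm_fast_then_safe(x):
    # Fast path: plain sum of squares, i.e. what dotv(x, x) would compute.
    ssq = sum(v * v for v in x)
    if not math.isinf(ssq):
        return math.sqrt(ssq)
    # Conservative path: scale by the largest magnitude so that no
    # intermediate square can overflow.
    scale = max(abs(v) for v in x)
    return scale * math.sqrt(sum((v / scale) ** 2 for v in x))
```

For well-scaled data the cheap path is taken; for entries whose squares would overflow, the scaled path still returns a finite result.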

commit 5d03b6e6e19d5a07f0cccf1a158f02fbd62dfd99
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 19 11:31:30 2018 -0600

    Fix asm macro include line for KNL. Fixes #167.

commit f07b176c84dc9ca38fb0d68805c28b69287c938a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 15 18:36:54 2018 -0600

    Fixed an obscure bug in the 1m implementation.

    Details:
    - Fixed a bug in the way the bli_gemm1m_cntx_ref() function (defined in
      ref_kernels/bli_cntx_ref.c) initializes its context for 1m execution.
      Previously, the function probed the context that was in the process of
      being updated for use with 1m--this context being previously
      initialized/copied from a native context--for its storage preference
      to determine which "variant" (row- or column-oriented) of 1m would be
      needed. However, the _cntx_ref() function was not updating the method
      field of the context until AFTER this query, and the conditional which
      depended on it, had taken place, meaning the storage preference query
      function would mistakenly think the context was for native execution,
      since the context's method field would still be set to BLIS_NAT. This
      would lead it to incorrectly grab the storage preference of the complex
      domain microkernel rather than the corresponding real domain
      microkernel, which could cause the storage preference predicate to
      evaluate to the wrong value, which would lead to the _cntx_ref()
      function choosing the wrong variant. This could lead to undefined
      behavior at runtime. The method is now explicitly set within the
      context prior to calling the storage preference query function.
    - Updated comments in frame/ind/oapi/bli_l3_3m4m1m_oapi.c.
    - Fixed a typo in the commented-out CFLAGS in config/zen/make_defs.mk,
      which are appropriate for gcc 6.x and newer. (Mistakenly used
      -march=bdver4 instead of -march=znver1.)

commit 1f94bb7b96eb2b67257e6c4df89e29c73e9ab386
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 19 12:46:53 2018 -0600

    Document how to enable zen-specific instructions.

    Details:
    - Added as a comment in config/zen/make_defs.mk the list of compiler flags
      that could be added to manually enable the instructions provided by the
      Zen microarchitecture that are not already implied by -march=bdver4.
      This information, along with the previous commit's flags to selectively
      disable Bulldozer instructions no longer present in Zen, was gathered
      from [1]. I hesitate to enable use of these instructions since I don't
      have any Zen hardware to test on yet.
      [1] https://wiki.gentoo.org/wiki/Ryzen

commit 1e4365b21bafa02bd108c5ac4705a25671fb9441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 18 12:03:51 2018 -0600

    Augment zen CFLAGS to prevent illegal instruction.

    Details:
    - Added various compiler flags (-mno-fma4 -mno-tbm -mno-xop -mno-lwp) so
      that compiling with -march=bdver4 on zen-based architectures does not
      result in an illegal instruction error at runtime. Note: This fix is
      only needed for gcc 5.4; gcc 6.3 or later supports the use of
      -march=znver1, which can be used in lieu of the augmented set of flags
      based on bdver4. Thanks to Nisanth Padinharepatt for reporting this
      error.

commit fa74af4e1fa7385ac3f3089fe1ea7bb88c906029
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 9 13:43:15 2018 -0600

    Minor labeling update for './configure -c' output.

    Details:
    - Print the name of the configuration in the output of the
      kernel-to-config map (and chosen pairs list) as a subtle way to remind
      the user that these only apply to the targeted configuration (whereas
      the config list and kernel list are printed without regard to which
      configuration was actually targeted).

commit 5cdea756c7391e2c6cbfb38436ef9a205f860237
Merge: 9d8858b5 1e7a4896
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jan 7 19:45:20 2018 -0600

    Merge branch 'rt'

commit 9d8858b5cff4a4b078b87872847a5710073fff0a
Merge: 0b3ca3cf f7df64da
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jan 7 10:03:25 2018 -0600

    Merge pull request #164 from devinamatthews/master

    Don't use memkind for skx configuration.

commit f7df64daf6bbe6431effada6e13d8d1fab5aa221
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Jan 7 09:37:25 2018 -0600

    Don't use memkind for skx configuration. Fixes #163.

commit 1e7a4896e0cbe73c4685fa956278e3f28273cdf9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 5 12:33:48 2018 -0600

    Minor error handling in update-version-file.sh.

    Details:
    - Added explicit handling of situations when 'git describe --tags'
      returns an error. This command is used by update-version-file.sh
      when deciding whether or not to update the version file prior to
      configuration.
    - Removed bli_packm.c and bli_unpackm.c, as they contained no source
      code.

commit 0b3ca3cfb682715a3686fd93ebb10d4a695d1162
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 4 20:51:35 2018 -0600

    Intelligently select compiler for auto-detection.

    Details:
    - Rewrote code that selects the compiler for the purposes of compiling
      the auto-detection executable. CC (if specified) is tried first. Then
      gcc. Then clang. The absolute fallback is cc. The previous code was
      sort of broken, and seemed to unintentionally always use gcc.
    - Moved various configuration-agnostic flags from config/*/make_defs.mk
      files to common.mk. The new mechanism appends the configuration-
      agnostic flags to the various compiler flag variables initialized in
      make_defs.mk. Flags specific to the sub-configuration are still set
      in make_defs.mk.
    - Added -Wno-tautological-compare to CMISCFLAGS when clang is in use.
      Also added the flag to the compiler instantiation during configure-
      time hardware detection (when clang is selected).
    - Added some missing (but mostly-optional) quotes to configure script.
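
The compiler selection order described in this commit (CC if specified, then gcc, then clang, with cc as the absolute fallback) can be sketched as follows. This is a Python model of the ordering only; the real logic lives in the configure shell script, and the function name and the injectable `which` parameter are hypothetical conveniences for illustration.

```python
import shutil


def select_compiler(cc=None, which=shutil.which):
    """Pick the compiler used to build the auto-detection executable."""
    candidates = []
    if cc:
        candidates.append(cc)       # a user-specified CC is tried first
    candidates += ["gcc", "clang"]
    for name in candidates:
        if which(name):             # found on the search path
            return name
    return "cc"                     # absolute fallback, used unconditionally
```

Passing `which` as a parameter makes the ordering easy to exercise without depending on which compilers happen to be installed.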

commit 5a7005dd44ed3174abbe360981e367fd41c99b4b
Merge: 7be88705 3bc99a96
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Jan 3 12:05:12 2018 +0530

    Merge changes in AMD beta release 0.95 into amd branch

commit 0b9c5127e91508c115228ca604ee2dac8de8f477
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 23 15:53:44 2017 -0600

    Enabled C99, added stdint.h to auto-detect build.

    Details:
    - Added "-std=c99" to compiler arguments when building auto-detection
      driver in configure script.
    - Added #include <stdint.h> to all three source files needed by auto-
      detection program.

commit 0ce5e19c318e04909d3e664d69accb3a0fc6b988
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Dec 23 15:32:03 2017 -0600

    Reimplemented configure-time hardware detection.

    Details:
    - Reimplemented the hardware detection functionality invoked when running
      "./configure auto". Previously, a standalone script in build/auto-detect
      that used CPUID was used. However, the script attempted to enumerate all
      models for each microarchitecture supported. The new approach recycles
      the same code used for runtime hardware detection introduced in 2c51356.
      This has two immediate benefits. First, it reduces and consolidates the
      code required to detect microarchitectures via the CPUID instruction.
      Second, it provides an indirect way of testing at configure-time the
      code that is used to detect hardware at runtime. This code is (a) only
      activated when targeting a configuration family (such as intel64 or
      amd64) at configure-time and (b) somewhat difficult to test in
      practice, since it relies on having access to older microarchitectures.
    - The above change required placing conditional cpp macro blocks in
      bli_arch.c and bli_cpuid.c which either #include "blis.h" or #include
      a bare-bones set of headers that does not rely on the presence of a
      bli_config.h header. This is needed because bli_config.h has not been
      created yet when configure-time auto-detection takes place.
    - Defined a new function in bli_arch.c, bli_arch_string(), which takes
      an arch_t id and returns a pointer to a string that contains the
      lowercase name of the corresponding microarchitecture. This function
      is used by the auto-detection script to printf() the name of the
      sub-configuration corresponding to the detected hardware.
|
||
|
||
commit 9804adfd405056ec332bb8e13d68c7b52bd3a6c1 (origin/selfinit)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 21 19:22:57 2017 -0600
|
||
|
||
Added option to disable pack buffer memory pools.
|
||
|
||
Details:
|
||
- Added a new configure option, --[en|dis]able-packbuf-pools, which will
|
||
enable or disable the use of internal memory pools for managing buffers
|
||
used for packing. When disabled, the function specified by the cpp
|
||
macro BLIS_MALLOC_POOL is called whenever a packing buffer is needed
|
||
(and BLIS_FREE_POOL is called when the buffer is ready to be released,
|
||
usually at the end of a loop). When enabled, which was the status quo
|
||
prior to this commit, a memory pool data structure is created and
|
||
managed to provide threads with packing buffers. The memory pool
|
||
minimizes calls to bli_malloc_pool() (i.e., the wrapper that calls
|
||
BLIS_MALLOC_POOL), but does so through a somewhat more complex
|
||
mechanism that may incur additional overhead in some (but not all)
|
||
situations. The new option defaults to --enable-packbuf-pools.
|
||
- Removed the reinitialization of the memory pools from the level-3
|
||
front-ends and replaced it with automatic reinitialization within the
|
||
pool API's implementation. This required an extra argument to
|
||
bli_pool_checkout_block() in the form of a requested size, but hides
|
||
the complexity entirely from BLIS. And since bli_pool_checkout_block()
|
||
is only ever called within a critical section, this change fixes a
|
||
potential race condition in which threads using contexts with different
|
||
cache blocksizes--most likely a heterogeneous environment--can check
|
||
out pool blocks that are too small for the submatrices they wish to
|
||
pack. Thanks to Nisanth Padinharepatt for reporting this potential
|
||
issue.
|
||
- Removed several functions in light of the relocation of pool reinit,
|
||
including bli_membrk_reinit_pools(), bli_memsys_reinit(),
|
||
bli_pool_reinit_if(), and bli_check_requested_block_size_for_pool().
|
||
- Updated the testsuite to print whether the memory pools are enabled or
|
||
disabled.
|
||
|
||
commit 107801aaae180c00022f1b990bc59038c14949d2
|
||
Merge: d9c05745 0084531d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 18 16:29:28 2017 -0600
|
||
|
||
Merge branch 'master' into selfinit
|
||
|
||
commit 0084531d3eea730a319ecd7018428148c81bbba7
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Dec 17 18:58:25 2017 -0600
|
||
|
||
Updated flatten-headers.py for python3.
|
||
|
||
Details:
|
||
- Modified flatten-headers.py to work with python 3.x. This mostly
|
||
amounted to removing print statements (which I replaced with calls
|
||
to my_print(), a wrapper to sys.stdout.write()). Thanks to Stefan
|
||
Husmann for pointing out the script's incompatibility with python 3.
|
||
- Other minor changes/cleanups.
|
||
|
||
commit 90b11b79c302f208791bdfb1ed754873103c7ce5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Dec 17 17:34:32 2017 -0600
|
||
|
||
Modest performance boost to flatten-headers.py.
|
||
|
||
Details:
|
||
- Updated flatten-headers.py to pre-compile the main regular expression
|
||
used to isolate #include directives and the header filenames they
|
||
reference. The compiled regex object is then used over and over on
|
||
each header file in the tree of referenced headers. This appears to
|
||
have provided a 1.7-2x performance increase in the best case.
|
||
- Other minor tweaks, such as renaming the main recursive function from
|
||
replace_pass() to flatten_header().
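
Illustration (not taken from the commit): the compile-once, match-many idea, sketched in C with POSIX regex.h rather than in python. The pattern and all demo_* names are assumptions; the actual expression used by flatten-headers.py is not shown in this log.

```c
#include <regex.h>
#include <stddef.h>
#include <string.h>

/* Compiled once, then reused for every line of every header. */
static regex_t demo_include_re;

/* Compile the pattern; returns 0 on success, as regcomp() does. */
int demo_include_re_init( void )
{
    return regcomp( &demo_include_re,
                    "^[[:space:]]*#[[:space:]]*include[[:space:]]*\"([^\"]+)\"",
                    REG_EXTENDED );
}

/* If line is an #include "file" directive, copy the filename into out
   (truncating if needed) and return 1; otherwise return 0. */
int demo_match_include( const char* line, char* out, size_t out_size )
{
    regmatch_t m[ 2 ];
    size_t     len;

    if ( out_size == 0 ) return 0;
    if ( regexec( &demo_include_re, line, 2, m, 0 ) != 0 ) return 0;

    len = ( size_t )( m[ 1 ].rm_eo - m[ 1 ].rm_so );
    if ( len >= out_size ) len = out_size - 1;

    memcpy( out, line + m[ 1 ].rm_so, len );
    out[ len ] = '\0';
    return 1;
}

/* Release the compiled pattern when the whole tree has been processed. */
void demo_include_re_free( void )
{
    regfree( &demo_include_re );
}
```

The cost of regcomp() is paid once in demo_include_re_init(), while demo_match_include() only runs the cheaper regexec() step per line.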
|
||
|
||
commit 99dee87f30b4d437fa6b5e4ba862526d07b9f08b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Dec 17 16:47:27 2017 -0600
|
||
|
||
Reimplemented flatten-headers.sh in python.
|
||
|
||
Details:
|
||
- Added flatten-headers.py, a python implementation of the bash script
|
||
flatten-headers.sh. The new script appears to be 25-100x faster,
|
||
depending on the operating system, filesystem, etc. The python script
|
||
abides by the same command line interface as its predecessor and
|
||
targets python 2.7 or later. (Thanks to Devin Matthews for suggesting
|
||
that I look into a python replacement for higher performance.)
|
||
- Activated use of flatten-headers.py in common.mk via the FLATTEN_H
|
||
variable.
|
||
- Made minor tweaks to flatten-headers.sh such as spelling corrections
|
||
in comments.
|
||
|
||
commit d9c0574599c3f97c0f9b6c334a077bab9452e1f4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 14 17:13:42 2017 -0600
|
||
|
||
Allow travis failures of OS X builds that run testsuite.
|
||
|
||
Details:
|
||
- Added an allowance for OS X builds that run the testsuite to fail.
|
||
There seems to be an issue with 1m when running in Travis CI under
|
||
OS X and clang, but only in double-precision. Haven't been able to
|
||
reproduce the error on my own, and thus, I can't debug it. (Hopefully
|
||
it is simply a version-specific compiler bug.)
|
||
|
||
commit 86cd23b7379b00a42b4ecc04fa668f1e3f9b54ee
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 14 15:47:41 2017 -0600
|
||
|
||
Fixed testsuite Makefile brokenness from 9091a207.
|
||
|
||
Details:
|
||
- Fixed a makefile error encountered when building the testsuite directly
|
||
in its directory (as opposed to indirectly via 'make test'). The fix
|
||
involves introducing a new variable, BUILD_PATH, alongside the existing
|
||
DIST_PATH variable. By default, BUILD_PATH is set to the current
|
||
directory, and is overridden by other Makefiles used by, for example,
|
||
the testsuite and standalone test drivers in testsuite or test,
|
||
respectively.
|
||
- Some files/directories in common.mk were redefined in terms of
|
||
BUILD_PATH, such as the locations of the config.mk file and the intermediate
|
||
include directory.
|
||
|
||
commit 6a3a8924c04d25507fc4aa593df30c56c7dc12f7
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 14 13:20:02 2017 -0600
|
||
|
||
Temporarily show Makefile's testsuite output.
|
||
|
||
Details:
|
||
- Disabled redirection of testsuite output for 'test' target. This is
|
||
part of an attempt to debug a segmentation fault on OS X via Travis.
|
||
|
||
commit 9a01080dd426915bed18229f70401bfa639dc283
|
||
Merge: 83316485 a32e8a47
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 14 11:27:19 2017 -0600
|
||
|
||
Merge branch 'master' into selfinit
|
||
|
||
commit a32e8a47c022b6071302b2956af5728976c83ca9 (origin/travis)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 16:31:36 2017 -0600
|
||
|
||
Added an exclusion to .travis.yml.
|
||
|
||
Details:
|
||
- Added exclusion for out-of-tree builds on OS X (clang).
|
||
|
||
commit b9f7d987df548965c86e16e0ba94d5cad0d9b399
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 16:22:09 2017 -0600
|
||
|
||
Cleaned up after previous travis oot debugging.
|
||
|
||
Details:
|
||
- Removed debugging output from common.mk related to Travis CI
|
||
out-of-tree builds.
|
||
- Other minor cleanups to common.mk.
|
||
|
||
commit 9091a207aa8c49e279676ea02be533480b3b0d5a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 16:12:34 2017 -0600
|
||
|
||
Attempted fix to travis oot build failure.
|
||
|
||
Details:
|
||
- Found the likely cause of the Travis CI out-of-tree build failures:
|
||
config.mk was being read from DIST_PATH, rather than the current
|
||
directory.
|
||
|
||
commit c01c71c33e236e6c91f5ddd3ec1e3faec89368c1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 15:58:50 2017 -0600
|
||
|
||
Added debugging output to Makefile.
|
||
|
||
Details:
|
||
- Added $(info ...) statements in key locations in an attempt to reveal
|
||
why Travis CI doesn't like building BLIS out-of-tree.
|
||
|
||
commit 784289d69dd6b3692444d3b3e290f6a014465b72
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 15:31:27 2017 -0600
|
||
|
||
Updated SHELL in common.mk from /bin/bash to bash.
|
||
|
||
commit d9bb1d1d4ebc89ea75d9d927d09882162a914f77
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 15:27:54 2017 -0600
|
||
|
||
Defined SHELL in common.mk so "echo -n" works.
|
||
|
||
Details:
|
||
- Defined the SHELL variable in common.mk as "/bin/bash" so that the
|
||
-n option can be used with echo in the Makefile rule for flattening
|
||
blis.h. Thanks to Devin Matthews for suggesting this fix.
|
||
|
||
commit 9289a08667df2044f3a37af54d893efe2b56d555
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 15:14:27 2017 -0600
|
||
|
||
Attempt 3 on .travis.yml.
|
||
|
||
commit 720bfcf0ef54fdc41df0dcaa94503edb0d5c8972
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 14:52:28 2017 -0600
|
||
|
||
More fixes to .travis.yml.
|
||
|
||
Details:
|
||
- Fixed a mistake (hopefully) in d0c4dd0 that resulted in many more
|
||
osx/clang sub-tests than intended.
|
||
- Shortened the variable names in an effort to make them more readable
|
||
via the Travis CI web interface.
|
||
|
||
commit 8717c9c97fe9b1ecd3b3192049a73976f8390ca7
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 14:36:37 2017 -0600
|
||
|
||
Added 'pwd' commands to .travis.yml for debugging.
|
||
|
||
Details:
|
||
- Added 'pwd' commands to the script portion of the .travis.yml file in
|
||
an attempt to uncover the problem with the recent out-of-tree build
|
||
testing changes made in d0c4dd0.
|
||
|
||
commit 83316485ce10f6fcafe92a1c146282de0dd8068a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Dec 13 14:14:50 2017 -0600
|
||
|
||
Simplified/fixed self-initialization.
|
||
|
||
Details:
|
||
- Fixed a race condition in self-initialization whereby the bli_is_init
|
||
static variable could be erroneously read as TRUE by thread 1 while
|
||
thread 0 is still executing bli_init_apis(), thus allowing thread 1 to
|
||
use the library before it is actually ready. Thanks to Minh Quan Ho
|
||
and Devin Matthews for pointing out this issue.
|
||
- Part of the solution to the aforementioned race condition involved
|
||
replacing the runtime initialization of the global scalar constants
|
||
(e.g., BLIS_ONE, BLIS_ZERO, etc.) in bli_const.c with a static
|
||
initialization of those same constants. This eliminates the need for
|
||
bli_const_init() altogether. (The static initialization is made concise
|
||
via preprocessor macros.)
|
||
- Defined bli_gks_query_cntx_noinit(), which behaves just like
|
||
bli_gks_query_cntx(), except that it does not call bli_init_once(). This
|
||
function is called in lieu of bli_gks_query_cntx() in bli_ind_init() and
|
||
bli_memsys_init() so as to not result in any recursion into
|
||
bli_init_once().
|
||
- Removed BLIS_ONE_HALF, BLIS_MINUS_ONE_HALF global scalar constants.
|
||
They have no use in BLIS or its test products, and we have little reason
|
||
to believe they are used by others.
|
||
- Removed testsuite/out file, which was accidentally committed as part
|
||
of 70640a3.
|
||
|
||
commit 6526d1d4ae6dbfa854ca8d1e5f224cd6ab3fa958
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 12 13:50:43 2017 -0600
|
||
|
||
Added temp_dir argument to flatten-headers.sh.
|
||
|
||
Details:
|
||
- Added "temp_dir" argument to flatten-headers.sh so that the caller can
|
||
specify where intermediate files should be created as the script runs.
|
||
- Updated flatten-headers.sh to create intermediate files in temp_dir
|
||
instead of alongside the corresponding source files. This should now
|
||
(once again) allow out-of-tree builds where the BLIS distribution is
|
||
read-only, or where the out-of-tree build is running concurrently with
|
||
another out-of-tree build. (Thanks to Devin Matthews for pointing out
|
||
the possibility of simultaneous out-of-tree builds.)
|
||
|
||
commit 94755017c967630daf2e31c1f63ed5e88ab0d6ab
|
||
Merge: d0c4dd00 5cf7b0c4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 12 12:50:41 2017 -0600
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit d0c4dd000ff38acc249e8acf7e0655a523991695
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 12 12:47:53 2017 -0600
|
||
|
||
Added out-of-tree build test to .travis.yml file.
|
||
|
||
Details:
|
||
- Modified .travis.yml file to include an out-of-tree build test (using
|
||
the "auto" configure target). Thanks to Devin Matthews for this
|
||
suggestion.
|
||
|
||
commit 5cf7b0c4e52922069183a87dc2aa177419644e04
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Dec 12 12:38:48 2017 -0600
|
||
|
||
Ignore blis.h.interm [ci skip]
|
||
|
||
commit 8d8ff74d15b4a584929cec36034ba6d3c53f7d27
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 12 12:32:50 2017 -0600
|
||
|
||
Further attempt to fix out-of-tree builds.
|
||
|
||
Details:
|
||
- Fix applied in 87978f6 was necessary but not sufficient to fix
|
||
out-of-tree builds. It turns out that using a source tree that had
|
||
already built the target erroneously gave the impression that
|
||
out-of-tree builds were working again, when in fact they were still
|
||
broken. The additional changes in this commit should complete the
|
||
fix that was started in the aforementioned commit. Thanks to Devin
|
||
Matthews and Shaden Smith for their help in isolating this issue.
|
||
|
||
commit 70640a37109290b57c344083c00624e13c496e30
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 11 17:18:43 2017 -0600
|
||
|
||
Implemented library self-initialization.
|
||
|
||
Details:
|
||
- Defined two new functions in bli_init.c: bli_init_once() and
|
||
bli_finalize_once(). Each is implemented with pthread_once(), which
|
||
guarantees that, among the threads that pass in the same pthread_once_t
|
||
data structure, exactly one thread will execute a user-defined function.
|
||
(Thus, there is now a runtime dependency against libpthread even when
|
||
multithreading is not enabled at configure-time.)
|
||
- Added calls to bli_init_once() to top-level user APIs for all
|
||
computational operations as well as many other functions in BLIS to
|
||
all but guarantee that BLIS will self-initialize through the normal
|
||
use of its functions.
|
||
- Rewrote and simplified bli_init() and bli_finalize() and related
|
||
functions.
|
||
- Added -lpthread to LDFLAGS in common.mk.
|
||
- Modified the bli_init_auto()/_finalize_auto() functions used by the
|
||
BLAS compatibility layer to take and return no arguments. (The
|
||
previous API that tracked whether BLIS was initialized, and then
|
||
only finalized if it was initialized in the same function, was too
|
||
cute by half and borderline useless because by default BLIS stays
|
||
initialized when auto-initialized via the compatibility layer.)
|
||
- Removed static variables that track initialization of the sub-APIs in
|
||
bli_const.c, bli_error.c, bli_init.c, bli_memsys.c, bli_thread, and
|
||
bli_ind.c. We don't need to track initialization at the sub-API level,
|
||
especially now that BLIS can self-initialize.
|
||
- Added a critical section around the changing of the error checking
|
||
level in bli_error.c.
|
||
- Deprecated bli_ind_oper_has_avail() as well as all functions
|
||
bli_<opname>_ind_get_avail(), where <opname> is a level-3 operation
|
||
name. These functions had no use cases within BLIS and likely none
|
||
outside of BLIS.
|
||
- Commented out calls to bli_init() and bli_finalize() in testsuite's
|
||
main() function, and likewise for standalone test drivers in 'test'
|
||
directory, so that self-initialization is exercised by default.
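
Illustration (not taken from the commit): a minimal sketch of the pthread_once() pattern described above. The demo_* names and the counter are hypothetical; only pthread_once(), pthread_once_t, and PTHREAD_ONCE_INIT are real POSIX names.

```c
#include <pthread.h>

/* Control object shared by all callers of demo_init_once(). */
static pthread_once_t demo_once_ctl = PTHREAD_ONCE_INIT;

/* Counts how many times initialization actually ran (for demonstration). */
static int demo_init_count = 0;

/* Stand-in for the real initialization work. */
static void demo_init_apis( void )
{
    demo_init_count += 1;
}

/* Safe to call from any thread, any number of times: pthread_once()
   runs demo_init_apis() exactly once for this control object. */
void demo_init_once( void )
{
    pthread_once( &demo_once_ctl, demo_init_apis );
}

/* A user-facing operation that self-initializes before doing work. */
int demo_compute( int x )
{
    demo_init_once();
    return 2 * x;
}

int demo_get_init_count( void )
{
    return demo_init_count;
}
```

Because every entry point calls demo_init_once() first, the caller never needs an explicit init call, which is the behavior the testsuite change above is meant to exercise.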
|
||
|
||
commit 70a64432ee5a7adbee10fb7ff6d7b608c1940a7a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 11 13:14:20 2017 -0600
|
||
|
||
Fixed off-by-one indexing in bli_cpuid.c.
|
||
|
||
Details:
|
||
- In bli_cpuid.c, fixed an off-by-one indexing statement in vpu_count()
|
||
whereby a string-terminating NUL character, '\0', is written beyond
|
||
the bounds of the model_num string.
|
||
- Minor whitespace and formatting edits to bli_cpuid.c.
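
Illustration (not taken from the commit): a sketch of keeping the terminating '\0' inside the destination buffer. demo_copy_model_num() is a hypothetical helper, not the code in vpu_count().

```c
#include <stddef.h>
#include <string.h>

/* Copy up to n characters of src into dst, which holds dst_size bytes,
   and always write the terminator at an index no greater than
   dst_size - 1. Returns the number of characters copied. */
size_t demo_copy_model_num( char* dst, size_t dst_size,
                            const char* src, size_t n )
{
    size_t len;

    if ( dst_size == 0 ) return 0;

    len = strlen( src );
    if ( n < len ) len = n;

    /* Reserve one byte for the terminator. */
    if ( len > dst_size - 1 ) len = dst_size - 1;

    memcpy( dst, src, len );
    dst[ len ] = '\0';   /* last valid index is dst_size - 1, not dst_size */

    return len;
}
```

Writing dst[ dst_size ] instead of dst[ dst_size - 1 ] in the truncating case is the kind of off-by-one this commit describes.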
|
||
|
||
commit 87978f6261a080d261d01f9acf4e9cc18855c833
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 11 12:49:03 2017 -0600
|
||
|
||
Fixed broken out-of-tree builds since 52f9e6f.
|
||
|
||
Details:
|
||
- Added missing $(DIST_PATH)/ prefix to relative path to flatten-headers.sh
|
||
script in common.mk so that the script could be found during out-of-tree
|
||
builds. Thanks to Devin Matthews for reporting this bug.
|
||
|
||
commit 513ef4d040f89a18dda5154e8c4cf1aaf7463999
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 11 12:35:59 2017 -0600
|
||
|
||
Various typecasting fixes, mis-typed enums, etc.
|
||
|
||
Details:
|
||
- Fixed implicit typecasting of conj_t to trans_t in bli_[un]packm_cxk.c.
|
||
- Properly typecast integer arguments to match format specifier in various
|
||
calls to printf() in bli_l3_thrinfo.c, bli_cntx.c, bli_pool.c, and
|
||
bli_util_oapi.c.
|
||
- Fixed "unsigned less-than-comparison with zero" checks in bli_check.c,
|
||
bli_cntx.h.
|
||
- Fixed mis-typed enums in bli_cntx.c (e.g., l1mkr_t that should have been
|
||
l1fkr_t or l1vkr_t).
|
||
- Fixed instances of opid_t value BLIS_GEMM that should have been l3ukr_t
|
||
value BLIS_GEMM_UKR in bli_cntx_ref.c.
|
||
- NOTE: These issues were identified via compiler warnings when building
|
||
BLIS with clang on a rather old installation of OS X:
|
||
$ clang --version
|
||
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
|
||
Target: x86_64-apple-darwin15.2.0
|
||
Thread model: posix
|
||
|
||
commit 3bc99a96a3648f51b9acdc8a8c7e1cf4eb815459
|
||
Merge: 3a441183 78199c53
|
||
Author: prangana <pradeep.rao@amd.com>
|
||
Date: Mon Dec 11 12:53:03 2017 +0530
|
||
|
||
Fix merge conflicts after rebase with release branch
|
||
|
||
Change-Id: I581b26c6d515f717ff0dce91c7c0c92553aa2630
|
||
|
||
commit 3a44118398955d6f872e01f73ae5bb4a4f8500f7
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Wed Nov 15 11:11:17 2017 +0530
|
||
|
||
Added AMD copyright line to the changed files in last 3 commits
|
||
|
||
Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66
|
||
|
||
commit 268a56c06e94d1c388766dbfe81d54efbe432809
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 1 11:51:41 2017 -0500
|
||
|
||
Revert to default SIMD alignment for bulldozer.
|
||
|
||
Details:
|
||
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
|
||
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
|
||
it would seem to allow for insufficient starting address alignment for
|
||
any matrices created via bli_malloc_user(), such as via
|
||
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
|
||
led us to this bug.
|
||
- This commit is a manual patch of the same fix made to the 'rt' branch
|
||
in 8f150f2.
|
||
|
||
commit 510a6863e28277f9446abfb77f1aea9f01d37e7a
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Oct 30 10:04:42 2017 -0500
|
||
|
||
Fix CVECFLAGS for bulldozer config.
|
||
|
||
commit c669716790bdda5d2b11ea0a026cbc121b228842
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Tue Oct 24 16:36:36 2017 +0530
|
||
|
||
Adding __attribute__((constructor/destructor)) for CLANG case.
|
||
|
||
CLANG supports __attribute__, but its documentation doesn't
|
||
mention support for constructor/destructor. Compiling with
|
||
clang and testing shows that it does support this.
|
||
|
||
Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
|
||
|
||
commit 24e64a9d0877d788357fc63d4b947e977f8697f7
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 18 13:41:25 2017 -0500
|
||
|
||
Removed a duplicate bli_avx512_macros.h header.
|
||
|
||
Details:
|
||
- Removed a duplicate header file that was causing problems during
|
||
installation for the 'knl' configuration. Thanks to Victor Eijkhout
|
||
for reporting this issue.
|
||
|
||
commit 9c0a3c4c0260cbfefb9f11532f46508b4fd19ec2
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Mon Oct 16 22:06:57 2017 +0530
|
||
|
||
Thread Safety: Move bli_init() before and bli_finalize() after main()
|
||
|
||
BLIS provides APIs to initialize and finalize its global context.
|
||
One application thread can finalize BLIS, while other threads
|
||
in the application are still using BLIS.
|
||
|
||
This issue can be solved by removing bli_finalize() from the API.
|
||
One way to do this is by getting bli_finalize() to execute by default
|
||
after the application exits from main().
|
||
|
||
GCC supports this behaviour with the help of __attribute__((destructor))
|
||
added to the function that needs to be executed after main exits.
|
||
|
||
Similarly bli_init() can be made to run before application enters main()
|
||
so that the application need not call it.
|
||
|
||
Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
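
Illustration (not taken from the commit): a sketch of the GCC/clang constructor and destructor attributes. The demo_* names are hypothetical; the real change applies the attributes to the library's own init and finalize routines.

```c
/* Set by the constructor before main() starts; cleared after it exits. */
static int demo_lib_ready = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void demo_lib_init( void )
{
    demo_lib_ready = 1;
}

/* Runs automatically after main() returns or exit() is called. */
__attribute__((destructor))
static void demo_lib_finalize( void )
{
    demo_lib_ready = 0;
}

/* Lets application code observe the state without touching it. */
int demo_lib_is_ready( void )
{
    return demo_lib_ready;
}
```

With this arrangement no application thread can finalize the library while others are still using it, since finalization happens only at process exit.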
|
||
|
||
commit 83f31253eb21c5ecd8a5907835e57720daae0b8b
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Mon Oct 16 21:07:50 2017 +0530
|
||
|
||
Thread safety: Make the global induced method status array local to thread
|
||
|
||
BLIS retains a global status array for induced methods, and provides
|
||
APIs to modify this state during runtime. So, one application thread
|
||
can modify the state, before another starts the corresponding
|
||
BLIS operation.
|
||
|
||
This patch solves this issue by making the induced method status array
|
||
local to threads.
|
||
|
||
Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
|
||
|
||
commit e923402e68029be379a4297de3ac6fb155ffd928
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Thu Sep 28 12:15:36 2017 +0530
|
||
|
||
Inner loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
|
||
|
||
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
|
||
|
||
commit a64c15de19327c7595376d699be676c7003e850e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 26 19:02:53 2017 -0500
|
||
|
||
Fixed a pthread typo in previous commit.
|
||
|
||
Details:
|
||
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
|
||
|
||
commit 42dcd589c37e1a2473ab2e1539207da97aebc07f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 26 17:00:04 2017 -0500
|
||
|
||
Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
|
||
|
||
Details:
|
||
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
|
||
into a k x k triangular matrix for the purposes of obtaining an mr x k
|
||
micropanel of A with which to test.
|
||
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
|
||
very large k (depending on the product of mr x kc on that architecture).
|
||
The bug arose from the fact that the test module was triggering the
|
||
allocation of blocks from the internal memory pools, which are limited in
|
||
size. This allocation imposes an implicit assumption that the micropanel
|
||
being tested with will fit inside, and this assumption is violated
|
||
for large values of k. Arbitrarily large k may now be tested for both
|
||
operation tests.
|
||
- Added OpenMP/pthread critical sections around the setting or getting of
|
||
statuses from the induced method operation lookup table in bli_l3_ind.c.
|
||
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
|
||
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
|
||
issues.
|
||
|
||
commit 206beb68ff73b75f5c382413967aacbb8a0aac3a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Sep 9 14:10:15 2017 -0500
|
||
|
||
Updated bibtex info for BLIS5 (3m4m) article.
|
||
|
||
commit 0c8c0363aeb1f4aa88f7ec2d02403dab05a6e014
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Mon Aug 28 16:44:42 2017 +0530
|
||
|
||
Bug fix for the testsuite build failing
|
||
|
||
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
|
||
|
||
commit 63d1c84465b50f64787808dd3e8494e683c16821
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Wed Aug 23 13:01:14 2017 +0530
|
||
|
||
Adding auto hardware detection for Zen
|
||
|
||
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
|
||
|
||
commit 537fb2a895b09be94b11947696fd2da629be24dd
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Aug 15 10:02:25 2017 -0500
|
||
|
||
Add vzeroupper to Intel AVX kernels.
|
||
|
||
commit 7628de3f76f78a44788807605a4601ddda445854
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 10 16:24:28 2017 -0500
|
||
|
||
Removed trailing enum commas from bli_type_defs.h.
|
||
|
||
Details:
|
||
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
|
||
Erling Andersen for pointing out this inconsistency and suggesting
|
||
the change.
|
||
|
||
commit a666fd4e267ffae3d4b21f38d569c61ff56adc9e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Aug 5 13:04:31 2017 -0500
|
||
|
||
Added edge handling to _determine_blocksize_b().
|
||
|
||
Details:
|
||
- Added explicit handling of situations where i == dim to
|
||
bli_determine_blocksize_b_sub(). This isn't actually needed by any
|
||
current use case within BLIS, but handling the situation is nonetheless
|
||
prudent. Thanks to Minh Quan for reporting this issue and requesting
|
||
the fix.
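
Illustration (not taken from the commit): a simplified sketch of backward blocksize selection with the i == dim case handled explicitly. It assumes that backward partitioning processes the edge (remainder) block first, ignores any maximum blocksize, and requires b_alg > 0; demo_determine_blocksize_b() is a hypothetical name.

```c
/* Return the size of the next block when dim elements are traversed
   backward in steps of b_alg, with i elements already processed. */
long demo_determine_blocksize_b( long i, long dim, long b_alg )
{
    long left = dim - i;
    long edge;

    /* i == dim (or beyond): nothing is left, so the block is empty.
       Without this guard, left % b_alg would be 0 and the function
       would wrongly report a full block of b_alg. */
    if ( left <= 0 ) return 0;

    /* A nonzero remainder is handled first as the edge block. */
    edge = left % b_alg;

    return ( edge == 0 ) ? b_alg : edge;
}
```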
|
||
|
||
commit 0c8afa546d7f33760415519ba328d7c49eb7aa06
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Aug 4 14:17:44 2017 -0500
|
||
|
||
Fixed a minor bug in level-3 packm management.
|
||
|
||
Details:
|
||
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
|
||
entries to be released and then re-acquired unnecessarily. (In essence,
|
||
the "<" operands in the conditional that guards the
|
||
release-and-reacquire code block simply needed to be swapped.) The bug
|
||
should have only affected performance (rather than the computed result).
|
||
Thanks to Minh Quan for identifying and reporting the bug.
|
||
|
||
commit 6cf68a185d83fa46d438fcef65258ace78e24b13
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 31 15:19:51 2017 -0500
|
||
|
||
Change lsame_ signature to match lapacke.
|
||
|
||
commit 6a9bd97295cc4fb1cbcd28f69824a43c073c9a76
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jul 29 20:17:05 2017 -0500
|
||
|
||
Fixed pthreads compile bug with previous commit.
|
||
|
||
Details:
|
||
- Erroneously passed family parameter into l3int_t function despite
|
||
that function not taking the parameter. Oops.
|
||
|
||
commit 95adc43d800431dc0a02ca83a51426dbef641ad6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jul 29 14:53:39 2017 -0500
|
||
|
||
Moved 'family' field from cntx_t to cntl_t.
|
||
|
||
Details:
|
||
- Removed the family field inside the cntx_t struct and re-added it to the
|
||
cntl_t struct. Updated all accessor functions/macros accordingly, as well
|
||
as all consumers and intermediaries of the family parameter (such as
|
||
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
|
||
change was motivated by the desire to keep the context limited, as much
|
||
as possible, to information about the computing environment. (The family
|
||
field, by contrast, is a descriptor about the operation being executed.)
|
||
- Added additional functions to bli_blksz_*() API.
|
||
- Added additional functions to bli_cntx_*() API.
|
||
- Minor updates to bli_func.c, bli_mbool.c.
|
||
- Removed 'obj' from bli_blksz_*() API names.
|
||
- Removed 'obj' from bli_cntx_*() API names.
|
||
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
|
||
that operate only on a single struct to contain the "_node" suffix to
|
||
differentiate with those routines that operate on the entire tree.
|
||
- Added enums for packm and unpackm kernels to bli_type_defs.h.
|
||
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
|
||
They weren't being used and probably never will be.
|
||
|
||
commit a98e4aa547f61ab09dd91d11478c2a2ef9882e11
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 14:50:13 2017 -0500
|
||
|
||
Clang can't make up its mind what to support.
|
||
|
||
commit 32eb36c3e8c2add2528514272044de16faed0c8f
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 12:54:58 2017 -0500
|
||
|
||
Add default #define for __has_extension.
|
||
|
||
commit 2a9aa134f7c29d3d4fdc160022ff257e61885a95
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 10:04:34 2017 -0500
|
||
|
||
Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
|
||
|
||
commit 6f07a034d575e1e9e30bb6417b8fcb77cf301297
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 19 15:40:48 2017 -0500
|
||
|
||
Updated ar option list used by all configurations.
|
||
|
||
Details:
|
||
- Dropped 'u' from the list of modifiers passed into the library archiver
|
||
ar. Previously, "cru" was used, while now we employ only "cr". This
|
||
change was prompted by a warning observed on Ubuntu 16.04:
|
||
|
||
ar: `u' modifier ignored since `D' is the default (see `U')
|
||
|
||
This caused me to realize that the default mode causes timestamps to be
|
||
zero, and thus the 'u' option, which causes only changed object files to
|
||
be inserted, is not applicable.
|
||
|
||
commit 32bc03f9eed8795cfd2f2615d1c9f8673e039c57
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 19 13:51:53 2017 -0500
|
||
|
||
Added --force-version=STRING option to configure.
|
||
|
||
Details:
|
||
- Added an option to configure that allows the user to force an arbitrary
|
||
version string at configure-time. The help text also now describes the
|
||
usage information.
|
||
- Changed the way the version string is communicated to the Makefile.
|
||
Previously, it was read into the VERSION variable from the 'version' file
|
||
via $(shell cat ...). Now, the VERSION variable is instead set in
|
||
config.mk (via a configure-substituted anchor from config.mk.in).
|
||
|
||
commit befaee6dd8b2a72de9e0461fe2ec1f36e9f88f3c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 18 17:56:00 2017 -0500
|
||
|
||
Updated openmp/pthread barriers with GNU atomics.
|
||
|
||
Details:
|
||
- Updated the non-tree openmp and pthreads barriers defined in
|
||
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
|
||
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
|
||
implementation goes through the same motions as the previous codes, but
|
||
protects its loads and increments with GNU atomic built-ins. These atomic
|
||
statements take memory ordering parameters that allow us to specify just
|
||
enough constraints for the barrier to work as intended on weakly-ordered
|
||
hardware. The prior implementation was only guaranteed to work on systems
|
||
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
|
||
this change and his crash-course in atomics and memory ordering.)
|
||
- Removed 'volatile' from structs' barrier field declarations in
|
||
bli_thrcomm_*.h.
|
||
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
|
||
consistent with that of the _openmp.? files.
|
||
- Updated other bli_thrcomm_* files to rename "communicator" variables to
|
||
simply "comm".
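
Illustration (not taken from the commit): a sketch of a sense-reversing barrier protected by GNU __atomic built-ins, in the spirit of the description above. The struct layout and demo_* names are assumptions, and the memory orderings shown are one plausible choice rather than a statement of what bli_thrcomm_barrier_atomic() uses.

```c
typedef struct
{
    int n_threads;   /* number of participating threads */
    int arrived;     /* threads that have reached the barrier */
    int sense;       /* flips each time the barrier completes */
} demo_barrier_t;

void demo_barrier_init( demo_barrier_t* b, int n_threads )
{
    b->n_threads = n_threads;
    b->arrived   = 0;
    b->sense     = 0;
}

void demo_barrier( demo_barrier_t* b )
{
    /* Remember the sense at entry; it changes when everyone has arrived. */
    int my_sense   = __atomic_load_n( &b->sense, __ATOMIC_RELAXED );
    int my_arrival = __atomic_add_fetch( &b->arrived, 1, __ATOMIC_ACQ_REL );

    if ( my_arrival == b->n_threads )
    {
        /* Last thread in: reset the counter, then release the others
           by flipping the sense with release ordering. */
        __atomic_store_n( &b->arrived, 0, __ATOMIC_RELAXED );
        __atomic_fetch_xor( &b->sense, 1, __ATOMIC_RELEASE );
    }
    else
    {
        /* Spin until the last thread flips the sense. */
        while ( __atomic_load_n( &b->sense, __ATOMIC_ACQUIRE ) == my_sense )
            ;
    }
}
```

The acquire load in the spin loop pairs with the release flip, which is what lets the barrier work on weakly-ordered hardware without relying on 'volatile'.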
|
||
|
||
commit 8f739cc847fcff2ddeeb336f8b2b9d080eb16f6c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 17 19:03:22 2017 -0500
|
||
|
||
Added API to set mt environment variables.
|
||
|
||
Details:
|
||
- Renamed bli_env_get_nway() -> bli_thread_get_env().
|
||
- Added bli_thread_set_env() to allow setting environment variables
|
||
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
|
||
- Added the following convenience wrapper routines:
|
||
bli_thread_get_jc_nt()
|
||
bli_thread_get_ic_nt()
|
||
bli_thread_get_jr_nt()
|
||
bli_thread_get_ir_nt()
|
||
bli_thread_get_num_threads()
|
||
bli_thread_set_jc_nt()
|
||
bli_thread_set_ic_nt()
|
||
bli_thread_set_jr_nt()
|
||
bli_thread_set_ir_nt()
|
||
bli_thread_set_num_threads()
|
||
- Added #include "errno.h" to bli_system.h.
|
||
- This commit addresses issue #140.
|
||
- Thanks to Chris Goodyer for inspiring these updates.
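
Illustration (not taken from the commit): a sketch of get/set wrappers around environment variables such as BLIS_JC_NT. The demo_* names and signatures are hypothetical and do not describe the actual bli_thread_get_env() or bli_thread_set_env() interfaces; setenv() is a POSIX function.

```c
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L   /* for setenv() under strict ISO C modes */
#endif

#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Read an integer-valued environment variable; return fallback if the
   variable is unset or does not parse as a base-10 integer. */
long demo_thread_get_env( const char* name, long fallback )
{
    const char* str = getenv( name );
    char*       end;
    long        val;

    if ( str == NULL ) return fallback;

    errno = 0;
    val   = strtol( str, &end, 10 );

    if ( errno != 0 || end == str ) return fallback;

    return val;
}

/* Set an integer-valued environment variable, overwriting any previous
   value. Returns 0 on success, as setenv() does. */
int demo_thread_set_env( const char* name, long value )
{
    char buf[ 32 ];

    snprintf( buf, sizeof( buf ), "%ld", value );

    return setenv( name, buf, 1 );
}
```

Convenience wrappers such as a jc_nt getter or setter would then be one-line calls that pass the corresponding variable name.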
|
||
|
||
commit 10163833075fd42be5b5b503acc855f91a484cfd
Author: Marat Dukhan <marat@fb.com>
Date: Thu Jul 13 21:39:24 2017 -0700

Fix Emscripten builds

commit c09b30d115eade72f44f37bf90aa848c9c0e79af
Author: Minh Quan HO <mqho@kalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200

set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init

commit 997628ed9793c72e9ef576dd8d715cfec27c4862
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Jun 30 12:23:19 2017 +0530

Reducing the framework overhead of GEMV routines

Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit ee869066168239b710ad9938bb0e1ae454883f3a
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Jul 4 12:57:32 2017 +0530

Improved efficiency of dGEMM for large matrices by reducing TLB load misses and, chiefly, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.

Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 7b933b90b1859c96de49a402d48de82909bc73e5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500

Add new SSI acknowledgment

commit 3485abba4b426fbf42b146a9611a0841f6d236c6
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed May 24 11:48:16 2017 +0530

Checked in the small matrix code to compute GEMM called with A transpose case

Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit de16beb83b29b4b9748f70db985b0fe04db85f7d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 14:49:31 2017 -0400

PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit 25d0e618544b6eea7d3f13c7aec513ac0139801d
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400

Revert "Change PACKDIM_MR (double) for haswell to 8."

This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit c5bdd84b35bc2a8ebf55b7763fb56c0c945be0cb
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 12:28:09 2017 -0500

Change PACKDIM_MR (double) for haswell to 8.

commit 172789d562001293b973bbdd8015bd27d37292e8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500

Restored deleted lines from makefile fragments.

commit 3ea9bd2c8e90dbd35655fa6a5b953dfea1f308fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:29:44 2017 -0500

Change to /bin/sh.

All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 49438409eedb98d3f0ebf00b8d1eee0ae45f4f8c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:27:14 2017 -0500

Remove shebangs from makefiles.

commit 497e2640474c016d576dce3530fa6a66891642a0
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 23:11:22 2017 -0400

Fix if/else structure. Thanks to TravisCI.

commit 835035c56a8de36ad25bb8d1375db170d489ef57
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:23:27 2017 -0400

Mark piledriver compilable w/ clang.

commit 6cdb533472ee61af297c1f948307abbf45828887
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:12:12 2017 -0400

Mark bulldozer compilable w/ clang.

commit a85697d62272da06d28cd1c947f6cf1098df6467
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:06:59 2017 -0400

Correct error message.

commit e0c64cad271058688a2b999caf8c2767dc3aef7e
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:03:23 2017 -0400

Indeed one can compile for carrizo also using clang.

commit 4aafe0505d3f0954d095ded5459a76976e5093b4
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 21:50:49 2017 -0400

A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit abaeaa68ea11e84be1810f564d6f38d506cbeb6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500

Fixed a bug in norm1v, norm1m.

Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)

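Illustration (not part of the original commit): the commit message does
not show the faulty macros, so the sketch below only assumes the simplest
way a "real-only" definition can go wrong when applied to a complex
value, namely by ignoring the imaginary part. The type and function names
are hypothetical and are not the BLIS level-0 macros.

```c
/* Hypothetical complex type for this sketch. */
typedef struct { double real; double imag; } zcomplex;

/* Real-only formula: fine for real numbers, but wrong for a complex
   value because the imaginary component never contributes. */
double absq_real_only(zcomplex x)
{
    return x.real * x.real;
}

/* Complex formula for the absolute square: |x|^2 = real^2 + imag^2. */
double absq_complex(zcomplex x)
{
    return x.real * x.real + x.imag * x.imag;
}
```

For x = 3 + 4i the real-only formula yields 9 while the complex formula
yields 25, so any norm built on the former is too small whenever
imaginary parts are nonzero.
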
commit cc3107ae1c2074f72b724aa748d2e5b4cb290ed5
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:35:22 2017 -0500

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.

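Illustration (not part of the original commit): the override rule stated
above can be modeled as follows. This is a sketch under stated
assumptions, not BLIS code: the struct, the function name, and the use of
-1 for "not set" are all hypothetical.

```c
/* Hypothetical per-loop thread settings; a value <= 0 means "not set". */
typedef struct { long jc, ic, jr, ir; } loop_ways_t;

static long or_one(long v) { return v > 0 ? v : 1; }

/* If any per-loop value is set, the per-loop values win: unset ones
   default to 1 and num_threads is ignored. Otherwise num_threads (or 1
   if that is unset too) is used. Returns the total thread count. */
long resolve_threads(long num_threads, loop_ways_t* w)
{
    int any_set = w->jc > 0 || w->ic > 0 || w->jr > 0 || w->ir > 0;

    if (any_set) {
        w->jc = or_one(w->jc);
        w->ic = or_one(w->ic);
        w->jr = or_one(w->jr);
        w->ir = or_one(w->ir);
        return w->jc * w->ic * w->jr * w->ir;
    }

    return num_threads > 0 ? num_threads : 1;
}
```

With only the ic value set to 4 and a global request of 16 threads, this
model resolves to 4 threads in total, because the single per-loop
setting overrides the global one.
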
commit c8ab91f70d399ee14edd30a3a5c46b24c5d2f910
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500

Disable complex 3m/4m in testsuite by default.

Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.

commit 9700f0e5785007ddafb72a5ca83800dee61fd35c
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue May 2 19:25:21 2017 -0700

allow KNL build without hbwmalloc.h (i.e. emulated)

we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.

commit 17dcd5a33ff91967f67e7c0ba09b4f18754609a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500

Fixed stray parentheses in README citations.

commit 2910d44ff9e1d951d3249313f4ab39d18ea1b48d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500

CHANGELOG update (0.2.2)

commit 5ca3863220e07972fcefc6682ddd3f6e54fe4a94
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500

Fixed a trsm1m bug that affected right-side cases.

Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.

commit 1af0b09f5c275ee7bac896cc6f36f42af721d9b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500

README.md update.

Details:
- Updated bibtex entries for 4th BLIS paper, and added entries for 5th
and 6th BLIS papers.

commit db4a0bb8ba7cd697d68be8e5632371ee3e59fd63
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500

Whitespace reformatting to armv8a kernels file.

Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit e3eb01f6b990e205b15edcbaffd3d54b3ddd1ca4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600

Disabled experiment-related 1m code.

Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.

commit 4f61528d56eed6a139eeac9db0c44e56f2d2d136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600

Added 1m-specific APIs for bp, pb gemm algorithms.

Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.

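Illustration (not part of the original commit): the "anti-preference"
toggle described above amounts to inverting the result of the ordinary
preference query. The sketch below models that with a deliberately
simplified context; the struct and function names are hypothetical
stand-ins and are not the BLIS cntx_t or its query routines.

```c
#include <stdbool.h>

/* Hypothetical, simplified context for this sketch. */
typedef struct {
    bool ukr_prefers_rows; /* micro-kernel output storage preference */
    bool anti_pref;        /* when true, invert the preference result */
} sketch_cntx_t;

/* Ordinary query: does the micro-kernel prefer the storage of C? */
bool prefers_storage_of(const sketch_cntx_t* cntx, bool c_is_row_stored)
{
    return cntx->ukr_prefers_rows == c_is_row_stored;
}

/* "Effective" query: same answer, toggled when anti_pref is set. */
bool eff_prefers_storage_of(const sketch_cntx_t* cntx, bool c_is_row_stored)
{
    bool prefers = prefers_storage_of(cntx, c_is_row_stored);
    return cntx->anti_pref ? !prefers : prefers;
}
```

A caller that transposes the operation whenever the effective query
returns false would therefore aim for agreement in the block-panel case
and for disagreement in the panel-block case.
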
commit 1d728ccb2394e77365e7c42683db6579c5fba014
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600

Implemented the 1m method.

Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.

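Illustration (not part of the original commit): the idea behind inducing
complex multiplication from real arithmetic can be shown on a single
scalar product. The sketch assumes that a "1e"-style layout expands a
complex element into a 2x2 real block and that a "1r"-style layout
stores the real and imaginary parts as separate real entries; the names
below are hypothetical and this is not the BLIS packing code.

```c
/* Hypothetical complex type for this sketch. */
typedef struct { double real; double imag; } zcomplex;

/* Compute a*b using only a real 2x2 matrix-vector product. */
zcomplex mul_via_real(zcomplex a, zcomplex b)
{
    /* 1e-style expansion of a: [ ar -ai ; ai ar ]. */
    double a1e[2][2] = { { a.real, -a.imag },
                         { a.imag,  a.real } };

    /* 1r-style storage of b: real part above imaginary part. */
    double b1r[2] = { b.real, b.imag };

    /* Real-domain product; c[0] is the real part, c[1] the imaginary. */
    double c[2] = { 0.0, 0.0 };
    for (int i = 0; i < 2; ++i)
        for (int p = 0; p < 2; ++p)
            c[i] += a1e[i][p] * b1r[p];

    zcomplex result = { c[0], c[1] };
    return result;
}
```

For a = 1 + 2i and b = 3 + 4i the real product gives (-5, 10), which
matches (1 + 2i)(3 + 4i) = -5 + 10i. Applying the same expansion to
whole matrices is what lets a real domain gemm microkernel carry out a
complex multiplication.
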
commit 0d1b90286e29aa8b768e280b5286d92c02ad87a1
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700

never use libm with Intel compilers

Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.

yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.

commit b150870397e7aee558e61d1bd72a0c0d1d99bee8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 8 16:08:41 2017 -0600

Removed most "old" directories.

Details:
- Removed the vast majority of directories named "old", which contained
deprecated code that I wasn't quite ready to jettison from the source
tree.

commit 270c65985df849297ba1951aa3b56c03948d7775
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 8 15:21:18 2017 -0600

Modified bli_getopt() for thread-safety.

Details:
- Changed the interface of bli_getopt() to take a new argument, a getopt_t
struct, that stores the values of optarg, optind, opterr, and optopt,
and updated the implementation accordingly. (Previously, these
variables were assumed to be global.)
- Added a function for initializing a getopt_t struct.
- Changed test_libblis.c--currently the only consumer of bli_getopt()--to
utilize the new getopt_t state object.

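Illustration (not part of the original commit): moving the parser state
out of globals and into a caller-owned struct is what makes a
getopt-style routine safe to call from several threads at once. The
sketch below is a minimal parser written for this note; the names are
hypothetical, it handles only one option per argument (no "-ab"
clustering), and it is not the bli_getopt() implementation.

```c
#include <stddef.h>
#include <string.h>

/* Caller-owned parser state, replacing the traditional globals. */
typedef struct {
    const char* optarg;
    int         optind;
    int         opterr;
    int         optopt;
} sketch_getopt_t;

void sketch_getopt_init(sketch_getopt_t* st)
{
    st->optarg = NULL;
    st->optind = 1;
    st->opterr = 0;
    st->optopt = 0;
}

/* Returns the option character, '?' for an unknown option or a missing
   argument, or -1 when there are no more options. */
int sketch_getopt(int argc, char* const argv[], const char* optstring,
                  sketch_getopt_t* st)
{
    if (st->optind >= argc) return -1;

    const char* arg = argv[st->optind];
    if (arg[0] != '-' || arg[1] == '\0') return -1;

    char        c    = arg[1];
    const char* spec = strchr(optstring, c);
    st->optind++;

    if (spec == NULL || c == ':') { st->optopt = c; return '?'; }

    if (spec[1] == ':') {
        /* Option takes an argument: "-n4" or "-n 4". */
        if (arg[2] != '\0')             st->optarg = arg + 2;
        else if (st->optind < argc)     st->optarg = argv[st->optind++];
        else { st->optopt = c; return '?'; }
    } else {
        st->optarg = NULL;
    }
    return c;
}
```

Because every call reads and writes only the struct it is handed, two
threads can parse two argument vectors independently by giving each its
own initialized state object.
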
commit ce4d8fabc2e39371f89c12192fb707be82ae021a
Merge: 39be59f2 e05a8dfa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 7 17:36:44 2017 -0600

Merge branch 'master' of github.com:flame/blis

commit 39be59f2a8470f40475907d9dd52639b8a911a92
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 7 17:35:20 2017 -0600

Replaced several macros with static function APIs.

Details:
- Reimplemented several sets of get/set-style preprocessor macros with
static functions, including those in the following frame/base headers:
auxinfo, cntl, mbool, mem, membrk, opid, and pool. A few headers in
frame/thread were touched as well: mutex_*, thrcomm, and thrinfo.

commit e05a8dfa7cc7df41e966c1ad04e51c482b308b23
Merge: 79507337 4423e33d
Author: dnp <devangiparikh@gmail.com>
Date: Wed Dec 6 16:45:24 2017 -0600

Merge branch 'rt'

commit 4423e33dc593115cda92c5763d756d7ad1298aa9
Author: dnp <devangiparikh@gmail.com>
Date: Wed Dec 6 16:35:03 2017 -0600

Adding SKX kernels and configuration.

commit 79507337e140daec7639f6eb3ed9cfe6e123d342
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Dec 6 16:21:35 2017 -0600

Various checks to ensure that arch_t id is in range.

Details:
- Expanded checking of the arch_t id in bli_gks.c--either passed in from
the caller or as returned from bli_arch_query_id()--against the expected
range of id values. Thanks to Devangi Parikh for suggesting these
additional sanity checks.

commit fde7c1126c58373ecde83471890b257399144876
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 4 16:11:01 2017 -0600

Added 'uninstall-old-headers' target to Makefile.

Details:
- Defined a new 'uninstall-old-headers' target that allows users of BLIS to
uninstall no-longer-needed headers left over from previous installations.
- Fixed the 'uninstall-old' target so that it will uninstall both .a and .so
libraries.
- Renamed 'uninstall-old' to 'uninstall-old-libs'.
- Added 'uninstall-old' target (different from previous 'uninstall-old'
target) that combines 'uninstall-old-libs' and 'uninstall-old-headers'.

commit d4ee770bde213a87aa6049245145318324dc6b51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 4 14:53:43 2017 -0600

Create/install monolithic cblas.h.

Details:
- When CBLAS is enabled at configure-time, BLIS now creates a monolithic
cblas.h using the same flatten-header.sh script that was recently
introduced for creating monolithic blis.h header files. The top-level
Makefile will also install this cblas.h file into the install prefix
alongside blis.h when the 'install' target is invoked. The two header
files are compatible with one another. Regardless of whether the user's
source #includes cblas.h, both blis.h and cblas.h, or just blis.h,
the user will get the CBLAS function prototypes and enums, as expected.

commit 52f9e6f1b6468785af8947317656445d4729fc8b
Merge: ab57b979 21360dd8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 1 12:28:09 2017 -0600

Merge branch 'rt'

commit 21360dd8e2c7287100645e109acaabcc6ba1140c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 29 14:11:34 2017 -0600

Fixed cntx_t packm query when ker_id > _NUM_PACKM_KERS.

Details:
- Fixed a subtle bug in bli_cntx_get_[un]packm_ker_dt() in which the
function fails to return NULL when passed a kernel id argument that is
equal to or beyond BLIS_NUM_[UN]PACKM_KERS. Instead, the function was
attempting to index into the cntx_t's packm kernel array, which resulted
in undefined behavior. Thanks to Devangi Parikh for finding this bug.

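Illustration (not part of the original commit): the fix described above
is a bounds check performed before the kernel array is indexed. The
sketch below models it with a small table of function pointers; the
type names, the array size, and the query function are hypothetical and
are not the BLIS definitions.

```c
#include <stddef.h>

/* Hypothetical number of registered packm kernel slots. */
#define SKETCH_NUM_PACKM_KERS 4

typedef void (*packm_ker_fp)(void);

/* Hypothetical, simplified context holding only the kernel table. */
typedef struct {
    packm_ker_fp packm_kers[SKETCH_NUM_PACKM_KERS];
} sketch_cntx_t;

/* Placeholder kernel used to populate the table in examples. */
static void dummy_ker(void) { }

/* Return the kernel for ker_id, or NULL if ker_id is out of range.
   Without the check, an id at or beyond the table size would read past
   the end of the array, which is undefined behavior. */
packm_ker_fp sketch_get_packm_ker(const sketch_cntx_t* cntx,
                                  unsigned int ker_id)
{
    if (ker_id >= SKETCH_NUM_PACKM_KERS) return NULL;
    return cntx->packm_kers[ker_id];
}
```

Callers can then treat a NULL result as "no kernel registered for this
id" and fall back to a reference implementation.
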
commit 244a6f4e66e8ff091e995f8090ce779c1928aa8b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 28 17:48:48 2017 -0600

Fixed POSIX sed non-compliance in flatten-header.sh.

Details:
- Changed GNU usage of 'i' and 'a' sed commands used in flatten-header.sh
to POSIX-compliant usage that will work on OS X's sed.

commit 45078621676833e53a2878af8f89479c4f93b8ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 28 15:16:22 2017 -0600

Generate/compile with/install monolithic blis.h.

Details:
- Rewrote monolithify-header.sh (and renamed to flatten-header.sh) so that
headers are inserted recursively. This improves performance by a factor
of 3-4x.
- Modified configure to create an 'include/<configname>' directory in which
make can create a monolithic header.
- Modified the top-level Makefile so that a monolithic header is generated
unconditionally prior to compilation (stored in include/<configname>) and
so that the single header is installed instead of the 450 or so header
files that reside throughout the framework source tree.
- Added "include/*/*.h" to .gitignore file.
- Removed some pnacl/emscripten leftovers that I intended to include in
a1caeba (mostly in testsuite/Makefile).
- Trivial comment changes to frame/include/bli_f2c.h.

commit 1f30b1301bf6d6047ec29e57a5fde8eb1072a0ee
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 25 16:54:26 2017 -0600

Added missing framework support for x86_64 family.

Details:
- Added support for the x86_64 configuration family to bli_arch.c and
bli_arch_config.h. Thanks to Johannes Dieterich for reporting this
issue.
- Bumped the default value for BLIS_SIMD_NUM_REGISTERS from 16 to 32 and
the default value for BLIS_SIMD_SIZE from 32 to 64. This will support
configuration families that include Skylake and newer processors without
any support needed in the bli_family_*.h file. The semantics of these
values have always been "maximum" and not exact values; comments in
bli_kernel_macro_defs.h and the github wiki have been adjusted
accordingly.

commit 9f39806c4ed484c9ed13edf96005838d977722a9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 16:03:56 2017 -0600

Fixed a bug in e31f0b3/b131b9a.

Details:
- Erroneously placed the "don't overwrite existing blocksize" logic in
bli_blksz_init*() rather than in bli_cntx_set_blkszs(). It belongs in
the latter because that function copies blocksizes as-is from the
blksz_t function argument to the appropriate field in the cntx_t. If
the blksz_t was previously initialized selectively, based on the sign
of the blocksize value passed into bli_blksz_init*(), that just leaves
some fields possibly uninitialized (with garbage values), which
definitely will not work.
- The aforementioned logic has been moved to bli_cntx_set_blkszs() via
a new function bli_blksz_copy_if_pos(), which selectively copies only
the blocksizes that are greater than zero.

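Illustration (not part of the original commit): the selective copy
described above can be sketched as a loop that overwrites a destination
entry only when the source entry is positive. The struct below keeps a
single value per datatype for brevity; that simplification, and all of
the names, are assumptions made for this example rather than the BLIS
blksz_t definition.

```c
/* Hypothetical number of datatypes (e.g. s, d, c, z). */
#define SKETCH_NUM_DT 4

/* Hypothetical, simplified blocksize object: one value per datatype. */
typedef struct { long v[SKETCH_NUM_DT]; } sketch_blksz_t;

/* Copy only the positive entries of src into dst, leaving every other
   entry of dst (for example, a reference default) untouched. */
void sketch_blksz_copy_if_pos(const sketch_blksz_t* src,
                              sketch_blksz_t*       dst)
{
    for (int dt = 0; dt < SKETCH_NUM_DT; ++dt) {
        if (src->v[dt] > 0) dst->v[dt] = src->v[dt];
    }
}
```

Placing the check in the copy step means the source object is always
fully initialized, with non-positive values acting purely as "keep the
existing value" markers.
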
commit b131b9a025c15f548d4c2952a9ec85eee3d139b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:30:26 2017 -0600

Updated configs to omit setting some blocksizes.

Details:
- Employ the new semantics of bli_blksz_init*() in e31f0b3 in various
sub-configurations' bli_cntx_init_*() functions by passing in 0 for
register and cache blocksizes that correspond to gemm microkernel
datatypes that were not registered, allowing the default values
set by the bli_cntx_init_*_ref() function call to remain.

commit 499a4c002f895744ecaf81ef7f62d2d6d0d7d594
Merge: e31f0b3e 6c3ba502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:25:08 2017 -0600

Merge branch 'rt' of github.com:flame/blis into rt

commit e31f0b3e2dba19ca8a2946bc21beb136a42d0f57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 14:21:25 2017 -0600

Subtle update to bli_blksz_init*() API.

Details:
- Updated the semantics of bli_blksz_init() and bli_blksz_init_ed() so
that non-positive blocksize values are ignored entirely. This provides
an easy way to indicate that certain existing values should not be
touched by the update. Thanks to Devangi Parikh for feedback that led
to these changes.

commit 6c3ba502a11f87bc67555d26154cfd39d0af1bac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 13:50:53 2017 -0600

Added 'x86_64' sub-config directory.

Details:
- Added missing x86_64 configuration directory, which was intended to be
part of b7ca580.
- Added -Wfatal-errors compiler warning flag to all configurations so that
compilation stops after the first error.
- Changed the vectorization flags for intel64 configuration to be compatible
with 'penryn', the oldest sub-config included in that family.
- Changed the vectorization flags for penryn to target the 'core2'
microarchitecture and ssse3.

commit 25eee3cc49b0631812485d4d5ceef0c23ed1b6dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 21 12:34:20 2017 -0600

Added a dummy file to kernels/generic.

Details:
- Added a dummy file to kernels/generic, which was previously empty, so
that git would begin tracking the otherwise-empty directory. This
directory's existence is necessary for proper execution of configure
for any configuration family that contains the 'generic'
sub-configuration. Thanks to Johannes Dieterich for reporting the
issue that led to this fix.

commit ef024ce4cafa217669eaabb31ff8ab6df93cca05
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 18:08:29 2017 -0600

More tweaks to monolithify-header.sh

Details:
- Further fixes to the monolithify-header.sh script.
- Removed unnecessary #include "blis.h" from frame/3/bli_l3_packm.h.

commit 5028e7dec269b62895511453272585da36e591b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 17:00:37 2017 -0600

Second attempt to implement travis_wait.

Details:
- Corrected accidental misplacement of the travis_wait prefix (on the
wrong line of the .travis.yml file) in commit 13e5d91.

commit 13e5d9107b3763cba46fb1bae87476852601b47c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 15:57:06 2017 -0600

Added travis_wait prefix to testsuite via Travis.

Details:
- It appears that Travis CI has implemented a new policy that results in
a test failing if it does not produce any output for more than 10
minutes. (Two test instances are now failing in Travis despite the most
recent commit not affecting the library or testsuite.) This issue can
be worked around by executing the test run via travis_wait, which takes
an optional time parameter. This commit attempts to use 'travis_wait 30'
in the .travis.yml file to prevent the early failure at 10 minutes.

commit a1caeba0ea79c8fecb1abadca1f91c6367ab3afb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 20 13:31:20 2017 -0600

Removed pnacl, emscripten support from Makefile.

commit 78199c539beaa50f37893add220261ce0dcb921a
Merge: b3d8ab2e ab57b979
Author: praveeng <praveen.g@amd.com>
Date: Mon Nov 20 15:51:20 2017 +0530

Merge master code till 01-Nov-2017 to amd-staging

Change-Id: I40b53f876db84c8b947b3f2385c9b882245c6603

commit 9df6dda9ec51a0d40166169d2d8a2f84b42266e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 19:03:26 2017 -0600

Improvements, bugfixes to monolithify-header.sh.

commit 21d26201f90b884eb8d5de279ed74bbd244ffcb5
Merge: 43baa3b3 b7ca5806
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 14:16:53 2017 -0600

Merge branch 'rt' of github.com:flame/blis into rt

commit 43baa3b327d5ae1e2ba619432687b4dd849b05e3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Nov 18 14:14:44 2017 -0600

Removed unnecessary flags for generic config.

Details:
- Removed -D_POSIX_C_SOURCE=200112L and -m64 flags from make_defs.mk file
of generic sub-configuration. These flags are generally not necessary,
and particularly not desirable for the generic configuration since they
unnecessarily restrict the environments in which the configuration can
be built.

commit b7ca580618f9382b7982168fd035ed058f83e4c2
Author: iotamudelta <dieterich@ogolem.org>
Date: Sat Nov 18 14:56:05 2017 -0500

[WIP] Add x86 and x86_64 processor families. (#154)

* Add x86 and x86_64 processor families.
* Use generic config as fallback for more families.

After discussion with fgvanzee, a) it's "generic" and b) use it for all the families as a fallback. Goal is that if a specific CPU is not yet supported by a family (say a new Intel microarchitecture on x86_64), it'll fall through to still work with the slower "generic" kernels.

commit 870597d1663aaba1b74d7654b1d4946280aa0d3f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 17:06:42 2017 -0600

Added bash script for creating monolithic headers.

Details:
- Added a new script, monolithify-header.sh, to the 'build' directory.
This script recursively replaces all #include directives in a selected
file with the contents of the header files referenced by each directive.
The idea is to "flatten" a tree of .h files into a single file, with
the script acting as a C preprocessor that only processes #include
directives.

commit c76f77f4cc1e71988251c5e63cf6ef137477bf9c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 15:10:52 2017 -0600

Removed unnecessary #include "blis.h" from header.

Details:
- Removed an errant #include "blis.h" directive from bli_cntx_ind_stage.h.
The general policy is that no header file in BLIS should include
blis.h. This will be important in the near future when using a tool to
recursively create a monolithic blis.h file from its constituent
headers.

commit 2bb9bc6e9536fa239fbc19a7efaaf151116e15b4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 17 13:50:14 2017 -0600

Miscellaneous tweaks to gks, rt functionality.

Details:
- Updated bli_cpuid_query_id() so that BLIS_ARCH_GENERIC is always returned
if the hardware fails to test positive for any supported sub-configuration.
- Defined bli_gks_init_ref_cntx(), which will call the context initialization
function bli_cntx_init_configname() for the sub-configuration 'configname'
associated with the arch_t id returned by bli_arch_query_id(). This makes
initializing a reference context easy for experts who wish to construct
those contexts.

commit b3d8ab2ea02c127ab241532abc214624f35bfaab
Merge: 189ffbb0 fe71c06e
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Wed Nov 15 01:33:12 2017 -0500

Merge "Added AMD copyright line to the changed files in last 3 commits" into amd-staging

commit fe71c06e42b072407c83112779055b0afb67173d
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
Date: Wed Nov 15 11:11:17 2017 +0530

Added AMD copyright line to the changed files in last 3 commits

Change-Id: I37d5dbbbe1b199e07529610a5e9cc9e49d067c66

commit d5bf79e50bf97072bbe7117c86b7c45e6e707ea0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Nov 13 14:24:29 2017 -0600

Miscellaneous tweaks and fixes.

Details:
- Fixed incorrect calling sequence in bli_cntx_init_knl.c--an instance of
bli_blksz_init_easy() that should have been bli_blksz_init().
- Fixed a bug in code that is supposed to output the list of sub-directories
in the 'config' directory when configure script is run with no arguments.
- Expanded the output of "make showconfig" to include more info from config.mk.
- Minor changes to build/auto-detect/cpuid_x86.c, mostly in preparation for
someone to add excavator and zen support.
- Added a link to the ConfigurationHowTo wiki to config_registry.
- Other minor tweaks to configure.

commit 673e5184030532c4ebd9fdeecbaa6442bb3ad54f
Merge: 2c51356a 8f150f28
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 17:37:42 2017 -0500

Merge branch 'rt' of github.com:flame/blis into rt

commit 2c51356a8b2699c99f9507c80d69c08a35d45fe3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 1 17:37:02 2017 -0500

Implemented runtime hardware detection via cpuid.

Details:
- Added runtime support for selecting an appropriate arch_t value based
on the results of the cpuid instruction (for x86_64). This allows
deferral of choosing a context (kernels, blocksizes, etc.) until
|
||
runtime, which allows BLIS to be built with support for multiple
|
||
microarchitectures. Currently, only amd64 and intel64 configurations
|
||
are registered in the config_registry; however, one could create
|
||
custom configuration families to support arbitrary sets of x86_64
|
||
microarchitectures.
|
||
- Current Intel microarchitectures supported via cpuid are knl, haswell,
|
||
sandybridge, and penryn.
|
||
- Current AMD microarchitectures supported via cpuid are: zen, excavator,
|
||
steamroller, piledriver, and bulldozer.
|
||
|
||
commit ab57b979046479bcda7f83165838a80117c2ad95
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 1 11:51:41 2017 -0500
|
||
|
||
Revert to default SIMD alignment for bulldozer.
|
||
|
||
Details:
|
||
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
|
||
config/bulldozer/bli_kernel.h. Not sure where this value came from, but
|
||
it would seem to allow for insufficient starting address alignment for
|
||
any matrices created via bli_malloc_user(), such as via
|
||
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
|
||
led us to this bug.
|
||
- This commit is a manual patch of the same fix made to the 'rt' branch
|
||
in 8f150f2.
|
||
|
||
commit 8f150f28a678c4a0c1591400177ad7cca81fcaec
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 1 11:41:45 2017 -0500
|
||
|
||
Revert to default SIMD alignment for bulldozer.
|
||
|
||
Details:
|
||
- Removed the default-overriding #define of BLIS_SIMD_ALIGN_SIZE set in
|
||
bli_family_bulldozer.h. Not sure where this value came from, but it
|
||
would seem to allow for insufficient starting address alignment for
|
||
any matrices created via bli_malloc_user(), such as via
|
||
bli_obj_create(). Thanks to Rene Sitt for reporting the behavior that
|
||
led us to this bug.
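
To make "insufficient starting address alignment" concrete, here is a hedged sketch of how one might allocate and verify an aligned buffer. It does not use bli_malloc_user() or BLIS_SIMD_ALIGN_SIZE; the function names are hypothetical, posix_memalign() stands in for whatever allocator the library uses, and the 32-byte figure in the usage note is an assumed example value, not the value from the commit.

```c
#define _POSIX_C_SOURCE 200112L
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Nonzero if p starts on a multiple of 'alignment' bytes. */
int is_aligned_sketch(const void *p, size_t alignment)
{
    return ((uintptr_t)p % alignment) == 0;
}

/* Allocate 'size' bytes starting on an 'alignment'-byte boundary.
   'alignment' must be a power of two and a multiple of sizeof(void*).
   Returns NULL on failure; release the buffer with free(). */
void *alloc_aligned_sketch(size_t size, size_t alignment)
{
    void *p = NULL;
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;
    return p;
}
```

For example, alloc_aligned_sketch(100, 32) returns a pointer for which is_aligned_sketch(p, 32) holds; if the requested alignment were smaller than what the SIMD kernels assume, aligned vector loads on the buffer could fault.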
|
||
|
||
commit e3f10557caf114441fbfff990e3ce3576c177bdc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 30 13:37:54 2017 -0500
|
||
|
||
Use perl for some substitution for OS X compatibility.
|
||
|
||
Details:
|
||
- Discovered that sed commands where the replacement string contains '\n'
|
||
are problematic with the version of sed present in OS X. For these cases
|
||
in the configure script, we instead use 'perl -pe' for
|
||
search-and-replace functionality.
|
||
- Various other minor comment/whitespace tweaks to configure.
|
||
- Removed remaining lines of code related to setting/checking variables to
|
||
track "unregistered" configurations.
|
||
|
||
commit dd45cfdfc3d8f9acf4cf7f69138d9b83dafc8842
|
||
Merge: 3e4f42a4 f60c827b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 30 12:23:05 2017 -0500
|
||
|
||
Merge branch 'master' into rt
|
||
|
||
commit f60c827ba95f452c8454fb914f5564f4895bf644
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Oct 30 10:04:42 2017 -0500
|
||
|
||
Fix CVECFLAGS for bulldozer config.
|
||
|
||
commit 3e4f42a4d2ebb37b95988933d92e561c5b2cc201
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 27 11:41:37 2017 -0500
|
||
|
||
Typecast l1mkr_t enum value prior to comparison.
|
||
|
||
Details:
|
||
- Typecast l1mkr_t enum value in bli_cntx.h to guint_t before testing for
|
||
out-of-range value. This is an attempt to pacify a strange warning from
|
||
clang on OS X that is seemingly the result of the following compiler
|
||
warning flag:
|
||
-Wtautological-constant-out-of-range-compare
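
A minimal illustration of the pattern, with a hypothetical enum in place of l1mkr_t and an unsigned cast in place of guint_t. When the compiler picks an unsigned underlying type for an enum, a test such as "id < 0" can never be true, which is the kind of comparison that warning flag reports; comparing after an explicit cast to an unsigned integer states the range check in one test.

```c
#include <assert.h>

/* Hypothetical kernel-id enum standing in for l1mkr_t. */
typedef enum {
    KR_SKETCH_A,
    KR_SKETCH_B,
    KR_SKETCH_COUNT   /* number of valid ids */
} kr_sketch_t;

/* Range check done on an unsigned integer rather than on the enum
   itself: any out-of-range value, including one that would be negative
   as a signed int, compares >= KR_SKETCH_COUNT after the cast. */
int kr_id_in_range_sketch(kr_sketch_t id)
{
    return (unsigned)id < (unsigned)KR_SKETCH_COUNT;
}
```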
|
||
|
||
commit aec6e038d942d35b81bbd723a640cce2c054fb8e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 26 16:12:36 2017 -0500
|
||
|
||
Removed associative arrays from configure.
|
||
|
||
Details:
|
||
- Implemented a replacement for associative arrays in the configure script
|
||
that does not utilize arrays, and therefore works in pre-4.0 versions of
|
||
bash. (It appears that Mac OS X will be stuck with version 3.2 indefinitely
|
||
due to bash switching to the GPL 3.0 license starting with version 4.0.)
|
||
|
||
commit 189ffbb0d37262b21acddc0d35b4a22f2cbbca94
|
||
Merge: 06e0e635 3eb44f67
|
||
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
|
||
Date: Wed Oct 25 02:00:30 2017 -0400
|
||
|
||
Merge changes Ie115b206,I7ce6cfa2,Iff59b6f4 into amd-staging
|
||
|
||
* changes:
|
||
Adding __attribute__((constructor/destructor)) for CLANG case.
|
||
Thread Safety: Move bli_init() before and bli_finalize() after main()
|
||
Thread safety: Make the global induced method status array local to thread
|
||
|
||
commit 3eb44f67618b91ae5f5f0aaaba67e38f16042ee4
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Tue Oct 24 16:36:36 2017 +0530
|
||
|
||
Adding __attribute__((constructor/destructor)) for CLANG case.
|
||
|
||
CLANG supports __attribute__, but its documentation doesn't
|
||
mention support for constructor/destructor. Compiling with
|
||
clang and testing shows that it does support this.
|
||
|
||
Change-Id: Ie115b20634c26bda475cc09c20960d687fb7050b
|
||
|
||
commit 07c352188bf5265af242255f8e6fcb97050d973d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 23 16:59:22 2017 -0500
|
||
|
||
Added "generic" configuration.
|
||
|
||
Details:
|
||
- Added a "generic" configuration that leaves the default blocksizes and
|
||
kernels unchanged. This replaces the older "reference" configuration.
|
||
Updated auto-detect script and code accordingly.
|
||
- Added support for generic configuration to arch_t (bli_type_defs.h),
|
||
bli_gks_init() (bli_gks.c), and bli_arch_config.h
|
||
- Moved bli_arch_query_id() to bli_arch.c (and prototype to bli_arch.h).
|
||
- Whitespace changes to configurations' make_defs.mk files.
|
||
|
||
commit c1a98d6f70608b02a1e6bcad6ba020a60773dace
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 23 14:24:41 2017 -0500
|
||
|
||
Minor update to .travis.yml file.
|
||
|
||
commit 75b9383f01caa8b83f8be0117e15085b0d807ba6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 20 16:41:22 2017 -0500
|
||
|
||
Minor header renaming ahead of bli_arch.c.
|
||
|
||
Details:
|
||
- Renamed the various configurations' "bli_arch_<configname>.h" header files
|
||
(replacing "arch" with "family") to free up the 'bli_arch' namespace for a
|
||
different purpose (hardware detection).
|
||
- Renamed "bli_arch.h" and "bli_arch_pre_macro_defs.h" in frame/include to
|
||
"bli_arch_config.h" and "bli_arch_config_pre.h", respectively.
|
||
|
||
commit 482af51add26d5ed103c3e3f167657f273b32c7a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 20 15:44:26 2017 -0500
|
||
|
||
Fixed 'make test' target from top-level Makefile.
|
||
|
||
Details:
|
||
- Updated the top-level Makefile's build rule for testsuite object files to
|
||
properly obtain CFLAGS via get-frame-cflags-for() function instead of
|
||
simply using the $(CFLAGS) variable (which is empty). This means that
|
||
'make test' should now work as expected.
|
||
|
||
commit 3c269f700d207efe6c04193f09d519c88c1d4045
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 20 13:57:21 2017 -0500
|
||
|
||
Makefile updates for test drivers, testsuite.
|
||
|
||
Details:
|
||
- Fixed semi-broken testsuite Makefile and very-broken test driver Makefiles,
|
||
as well as those for test/3m4m, test/thread_ranges, and test/exec_sizes
|
||
sub-directories.
|
||
- Factored out much of the top-level Makefile into common.mk. A Makefile
|
||
needs only set DIST_PATH to the relative path to the top level of the
|
||
BLIS source distribution before including common.mk in order to acquire
|
||
all of the definitions typically needed in a Makefile that tests BLIS.
|
||
|
||
commit 0557189d463446b4c32077cdcf0467fa71ca68dc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 18 15:05:27 2017 -0500
|
||
|
||
Minor updates to .travis.yml, configure script.
|
||
|
||
commit 2553734d1d62043793f4e783a027349ef6d4d563
|
||
Merge: 453deb29 37534279
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 18 13:46:50 2017 -0500
|
||
|
||
Merge branch 'master' into rt
|
||
|
||
commit 375342799cbae981c28d831793af588d7951f3f6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 18 13:41:25 2017 -0500
|
||
|
||
Removed a duplicate bli_avx512_macros.h header.
|
||
|
||
Details:
|
||
- Removed a duplicate header file that was causing problems during
|
||
installation for the 'knl' configuration. Thanks to Victor Eijkhout
|
||
for reporting this issue.
|
||
|
||
commit 453deb29068889698e274f269c9aa90eea99b527
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 18 13:29:32 2017 -0500
|
||
|
||
Implemented runtime kernel management.
|
||
|
||
Details:
|
||
- Reworked the build system around a configuration registry file, named
|
||
'config_registry', that identifies valid configuration targets, their
|
||
constituent sub-configurations, and the kernel sets that are needed by
|
||
those sub-configurations. The build system now facilitates the building
|
||
of a single library that can contain kernels and cache/register
|
||
blocksizes for multiple configurations (microarchitectures). Reference
|
||
kernels are also built on a per-configuration basis.
|
||
- Updated the Makefile to use new variables set by configure via the
|
||
config.mk.in template, such as CONFIG_LIST, KERNEL_LIST, and KCONFIG_MAP,
|
||
in determining which sub-configurations (CONFIG_LIST) and kernel sets
|
||
(KERNEL_LIST) are included in the library, and which make_defs.mk files'
|
||
CFLAGS (KCONFIG_MAP) are used when compiling kernels.
|
||
- Reorganized 'kernels' directory into a "flat" structure. Renamed kernel
|
||
functions into a standard format that includes the kernel set name
|
||
(e.g. 'haswell'). Created a "bli_kernels_<kernelset>.h" file in each
|
||
kernels sub-directory. These files exist to provide prototypes for the
|
||
kernels present in those directories.
|
||
- Reorganized reference kernels into a top-level 'ref_kernels' directory.
|
||
This directory includes a new source file, bli_cntx_ref.c (compiled on
|
||
a per-configuration basis), that defines the code needed to initialize
|
||
a reference context and a context for induced methods for the
|
||
microarchitecture in question.
|
||
- Rewrote make_defs.mk files in each configuration so that the compiler
|
||
variables (e.g. CFLAGS) are "stored" (renamed) on a per-configuration
|
||
basis.
|
||
- Modified bli_config.h.in template so that bli_config.h is generated with
|
||
#defines for the config (family) name, the sub-configurations that are
|
||
associated with the family, and the kernel sets needed by those
|
||
sub-configurations.
|
||
- Deprecated all kernel-related information in bli_kernel.h and transferred
|
||
what remains to new header files named "bli_arch_<configname>.h", which
|
||
are conditionally #included from a new header bli_arch.h. These files
|
||
are still needed to set library-wide parameters such as custom
|
||
malloc()/free() functions or SIMD alignment values.
|
||
- Added bli_cntx_init_<configname>.c files to each configuration directory.
|
||
The files contain a function, named the same as the file, that initializes
|
||
a "native" context for a particular configuration (microarchitecture). The
|
||
idea is that optimized kernels, if available, will be initialized into
|
||
these contexts. Other fields will retain pointers to reference functions,
|
||
which will be compiled on a per-configuration basis. These bli_cntx_init_*()
|
||
functions will be called during the initialization of the global kernel
|
||
structure. They are thought of as initializing for "native" execution, but
|
||
they also form the basis for contexts that use induced methods. These
|
||
functions are prototyped, along with their _ref() and _ind() brethren, by
|
||
prototype-generating macros in bli_arch.h.
|
||
- Added a new typedef enum in bli_type_defs.h to define an arch_t, which
|
||
identifies the various sub-configurations.
|
||
- Redesigned the global kernel structure (gks) around a 2D array of cntx_t
|
||
structures (pointers to cntx_t, actually). The first dimension is indexed
|
||
over arch_t and the inner dimension is the ind_t (induced method) for
|
||
each microarchitecture. When a microarchitecture (configuration) is
|
||
"registered" at init-time, the inner array for that configuration in the
|
||
2D array is initialized (and allocated, if it hasn't been already). The
|
||
cntx_t slot for BLIS_NAT is initialized immediately and those for other
|
||
induced method types are initialized and cached on-demand, as needed. At
|
||
cntx_t registration, we also store function pointers to cntx_init functions
|
||
that will initialize (a) "reference" contexts and (b) contexts for use with
|
||
induced methods. We don't cache the full contexts for reference contexts
|
||
since they are rarely needed. The functions that initialize these two kinds
|
||
of contexts are generated automatically for each targeted sub-configuration
|
||
from cpp-templatized code at compile-time. Induced method contexts that
|
||
need "stage" adjustments can still obtain them via functions in
|
||
bli_cntx_ind_stage.c.
|
||
- Added new functions and functionality to bli_cntx.c, such as for setting
|
||
the level-1f, level-1v, and packm kernels, and for converting a native
|
||
context into one for executing an induced method.
|
||
- Moved the checking of register/cache blocksize consistency from being cpp
|
||
macros in bli_kernel_macro_defs.h to being runtime checks defined in
|
||
bli_check.c and called from bli_gks_register_cntx() at the time that the
|
||
global kernel structure's internal context is initialized for a given
|
||
microarchitecture/configuration.
|
||
- Deprecated all of the old per-operation bli_*_cntx.c files and removed
|
||
the previous operation-level cntx_t_init()/_finalize() invocations.
|
||
Instead, we now query the gks for a suitable context, usually via
|
||
bli_gks_query_cntx().
|
||
- Deprecated support for the 3m2 and 3m3 induced methods. (They required
|
||
hackery that I was no longer willing to support.)
|
||
- Consolidated the 1e and 1r packm kernels for any given register blocksize
|
||
into a single kernel that will branch on the schema and support packing
|
||
to both formats.
|
||
- Added the cntx_t* argument to all packm kernel signatures.
|
||
- Deprecated the local function pointer array in all bli_packm_cxk*.c files
|
||
and instead obtain the packm kernel from the cntx_t.
|
||
- Added bli_calloc_intl(), which serves as the calloc-equivalent to
|
||
bli_malloc_intl(). Useful when we wish to allocate and initialize to
|
||
zero/NULL.
|
||
- Converted existing cpp macro functions defined in bli_blksz.h, bli_func.h,
|
||
bli_cntx.h into static functions.
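
The 2D layout of the global kernel structure described above (outer index arch_t, inner index ind_t, native slot filled at registration, other slots filled and cached on demand) can be sketched as follows. This is an illustration under assumptions, not the code in bli_gks.c: the struct, the array sizes, and the function names are hypothetical, and "converting a native context into an induced one" is reduced to copying the native entry and changing one field.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical sizes; IND_SKETCH_NAT plays the role of BLIS_NAT. */
enum { N_ARCH_SKETCH = 3, N_IND_SKETCH = 4, IND_SKETCH_NAT = N_IND_SKETCH - 1 };

typedef struct { int arch; int ind; } cntx_sketch_t;

/* Outer dimension: architecture. Inner dimension: induced method.
   An inner array stays NULL until its architecture is registered. */
static cntx_sketch_t **gks_sketch[N_ARCH_SKETCH];

/* Register an architecture: allocate the inner array if needed and
   initialize the native slot immediately. Returns 0 on success. */
int gks_register_sketch(int arch)
{
    if (gks_sketch[arch] == NULL) {
        gks_sketch[arch] = calloc(N_IND_SKETCH, sizeof *gks_sketch[arch]);
        if (gks_sketch[arch] == NULL) return -1;
    }
    if (gks_sketch[arch][IND_SKETCH_NAT] == NULL) {
        cntx_sketch_t *c = malloc(sizeof *c);
        if (c == NULL) return -1;
        c->arch = arch;
        c->ind  = IND_SKETCH_NAT;
        gks_sketch[arch][IND_SKETCH_NAT] = c;
    }
    return 0;
}

/* Query a context. Induced-method slots are derived from the native
   context on first use and cached for later queries. */
const cntx_sketch_t *gks_query_sketch(int arch, int ind)
{
    if (gks_sketch[arch] == NULL) return NULL;      /* not registered */
    if (gks_sketch[arch][ind] == NULL) {
        cntx_sketch_t *c = malloc(sizeof *c);
        if (c == NULL) return NULL;
        *c = *gks_sketch[arch][IND_SKETCH_NAT];     /* start from native */
        c->ind = ind;                               /* then specialize */
        gks_sketch[arch][ind] = c;
    }
    return gks_sketch[arch][ind];
}
```

Caching on demand keeps initialization cheap for the common case in which only the native context is ever used. The sketch omits locking and teardown.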
|
||
|
||
commit 4607aac297e55ad540cbe5fffbe02e6b1889c181
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Mon Oct 16 22:06:57 2017 +0530
|
||
|
||
Thread Safety: Move bli_init() before and bli_finalize() after main()
|
||
|
||
BLIS provides APIs to initialize and finalize its global context.
|
||
One application thread can finalize BLIS, while other threads
|
||
in the application are still using BLIS.
|
||
|
||
This issue can be solved by removing bli_finalize() from API.
|
||
One way to do this is by getting bli_finalize() to execute by default
|
||
after application exits from main().
|
||
|
||
GCC supports this behaviour with the help of __attribute__((destructor))
|
||
added to the function that needs to be executed after main exits.
|
||
|
||
Similarly bli_init() can be made to run before application enters main()
|
||
so that application need not call it.
|
||
|
||
Change-Id: I7ce6cfa28b384e92c0bdf772f3baea373fd9feac
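
The mechanism this message relies on can be shown in a few lines. This is a sketch assuming GCC or Clang; the function names and the flag are hypothetical stand-ins for bli_init(), bli_finalize(), and the library's internal state.

```c
#include <assert.h>

static int lib_initialized_sketch = 0;

/* Runs automatically before main() is entered. */
__attribute__((constructor))
static void lib_init_sketch(void)
{
    lib_initialized_sketch = 1;
}

/* Runs automatically after main() returns (or exit() is called). */
__attribute__((destructor))
static void lib_finalize_sketch(void)
{
    lib_initialized_sketch = 0;
}

/* Lets application code observe the state without touching it. */
int lib_is_initialized_sketch(void)
{
    return lib_initialized_sketch;
}
```

Because finalization happens only after main() exits, no application thread can finalize the library while another thread is still inside it, which is the race described above.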
|
||
|
||
commit 0f5ce26fc597cda6e8ae93a7526f52eb8cba01e9
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Mon Oct 16 21:07:50 2017 +0530
|
||
|
||
Thread safety: Make the global induced method status array local to thread
|
||
|
||
BLIS retains a global status array for induced methods, and provides
|
||
APIs to modify this state during runtime. So, one application thread
|
||
can modify the state, before another starts the corresponding
|
||
BLIS operation.
|
||
|
||
This patch solves this issue by making the induced method status array
|
||
local to threads.
|
||
|
||
Change-Id: Iff59b6f473771344054c010b4eda51b7aa4317fe
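
A hedged sketch of the idea, using C11 _Thread_local. The array size and the accessor names are hypothetical; the actual table lives in bli_l3_ind.c and is not reproduced here.

```c
#include <assert.h>
#include <stdbool.h>

enum { N_OPS_SKETCH = 3 };   /* hypothetical number of operations */

/* One copy of the status array per thread, zero-initialized, so a
   change made by one thread is never observed by another. */
static _Thread_local bool ind_enabled_sketch[N_OPS_SKETCH];

void ind_set_sketch(int op, bool on)
{
    ind_enabled_sketch[op] = on;
}

bool ind_get_sketch(int op)
{
    return ind_enabled_sketch[op];
}
```

With a thread-local array, a thread that enables an induced method and then starts an operation cannot have that setting changed underneath it by a different thread.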
|
||
|
||
commit b882648af87deb1b365fc6b3e94151e69c5ccfa4
|
||
Merge: 8b379069 e02d3cb8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 11 16:32:21 2017 -0500
|
||
|
||
Merge branch 'master' into rt
|
||
|
||
commit 06e0e6351acb9481225975ad9a4e0b8925336621
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Thu Sep 28 12:15:36 2017 +0530
|
||
|
||
Inner-loop parallelization is turned off by default; the JR and IR loop parameters are set to 1 by default
|
||
|
||
Change-Id: I8c3c2ecbbd636259f6ffb92768ec04148205c3e5
|
||
|
||
commit e02d3cb84190a345ebe9b32f53db03a1838976b1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 26 19:02:53 2017 -0500
|
||
|
||
Fixed a pthread typo in previous commit.
|
||
|
||
Details:
|
||
- Misnamed 'pthread_mutex_t' type in bli_memsys.c as 'thread_mutex_t'.
|
||
|
||
commit f5962a1aae0fb3c9be104d0035c0d73210e7f670
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 26 17:00:04 2017 -0500
|
||
|
||
Fixed bugs in gemm/gemmtrsm ukr tests in testsuite.
|
||
|
||
Details:
|
||
- Fixed a bug in gemmtrsm test module that was due to improper partitioning
|
||
into a k x k triangular matrix for the purposes of obtaining an mr x k
|
||
micropanel of A with which to test.
|
||
- Fixed a bug in gemm and gemmtrsm test modules that would only manifest for
|
||
very large k (depending on the product of mr x kc on that architecture).
|
||
The bug arose from the fact that the test module was triggering the
|
||
allocation of blocks from the internal memory pools, which are limited in
|
||
size. This allocation imposes an implicit assumption that the micro-
|
||
panel being tested with will fit inside, and this assumption is violated
|
||
for large values of k. Arbitrarily large k may now be tested for both
|
||
operation tests.
|
||
- Added OpenMP/pthread critical sections around the setting or getting of
|
||
statuses from the induced method operation lookup table in bli_l3_ind.c.
|
||
- Added the 'static' keyword to all pthread_mutex_t global variables in BLIS.
|
||
- Thanks to Nisanth Padinharepatt of AMD for reporting the first and third
|
||
issues.
|
||
|
||
commit 8e917b256ca2d4bcdc059fe98d86be8775c69561
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Sep 9 14:10:15 2017 -0500
|
||
|
||
Updated bibtex info for BLIS5 (3m4m) article.
|
||
|
||
commit 7be887057358df4978a4833eeae0c17e15acd9d1
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Mon Aug 28 17:38:22 2017 +0530
|
||
|
||
Merging "Adding auto hardware detection for Zen"
|
||
|
||
Change-Id: Id450fb0c4f91a5cd5cbdc06970f4f9ed28dd8520
|
||
|
||
commit e056d810d16621891ead032603de0c2105cfc0f7
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Mon Aug 28 16:44:42 2017 +0530
|
||
|
||
Bug fix for the testsuite build failing
|
||
|
||
Change-Id: I7cd8c9d187387c48b2564e45cbfb8df985e93d77
|
||
|
||
commit 83796b7caf745fafc263e9e5e1bfcf5eff00c025
|
||
Merge: 8176f4e4 d1ee7762
|
||
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
|
||
Date: Mon Aug 28 05:23:28 2017 -0400
|
||
|
||
Merge "Adding auto hardware detection for Zen" into amd-staging
|
||
|
||
commit d1ee776202b26874333af7a91b6d2686342c4c81
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Wed Aug 23 13:01:14 2017 +0530
|
||
|
||
Adding auto hardware detection for Zen
|
||
|
||
Change-Id: I40ce6705dd66b35000c4ccddffad1c5b65998caf
|
||
|
||
commit 8176f4e43872714b997f1a5f83056daadb0ff1a5
|
||
Merge: 12413018 adafe974
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Aug 28 12:21:16 2017 +0530
|
||
|
||
Resolving conflicts in bli_gemm_front.c and LICENCE
|
||
|
||
Change-Id: Id24ce53896d4c1c7ceccc3e004014a0ecceb5474
|
||
|
||
commit 57e1e5cd51e7ffe8612c96a20b6a041b55426ddb
|
||
Merge: f86ce54d d6ef56c6
|
||
Author: Nisanth M P <nisanth.padinharepatt@amd.com>
|
||
Date: Tue Aug 22 17:07:44 2017 +0530
|
||
|
||
Merge AMD authored changes
|
||
|
||
commit adafe974b4bc3fc0663bc2f6f4ce2fde71a97988
|
||
Merge: f86ce54d 7dc78b49
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Aug 15 15:17:21 2017 -0500
|
||
|
||
Merge pull request #150 from devinamatthews/vzeroupper
|
||
|
||
Add vzeroupper to Intel AVX kernels.
|
||
|
||
commit 7dc78b49f97e6b3cd6d72fcdc588ace534d0e700
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Aug 15 10:02:25 2017 -0500
|
||
|
||
Add vzeroupper to Intel AVX kernels.
|
||
|
||
commit f86ce54d6f315006984534fe29e47a2deaacc9f5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 10 16:24:28 2017 -0500
|
||
|
||
Removed trailing enum commas from bli_type_defs.h.
|
||
|
||
Details:
|
||
- Removed trailing commas from enums in bli_type_defs.h. Thanks to
|
||
Erling Andersen for pointing out this inconsistency and suggesting
|
||
the change.
|
||
|
||
commit 60a1eeb2317939d732b9eb6ff1e0d6d668c9a1e5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Aug 5 13:04:31 2017 -0500
|
||
|
||
Added edge handling to _determine_blocksize_b().
|
||
|
||
Details:
|
||
- Added explicit handling of situations where i == dim to
|
||
bli_determine_blocksize_b_sub(). This isn't actually needed by any
|
||
current use case within BLIS, but handling the situation is nonetheless
|
||
prudent. Thanks to Minh Quan for reporting this issue and requesting
|
||
the fix.
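
A sketch of why the i == dim case needs explicit handling. This is not the body of bli_determine_blocksize_b_sub(); the function name is hypothetical and the logic is an assumed simplification of a "backward" partitioner that takes the edge (remainder) block first. It assumes i <= dim and b_alg > 0.

```c
#include <assert.h>
#include <stddef.h>

/* Return the size of the next block when 'i' of 'dim' elements have
   already been consumed, using algorithmic blocksize b_alg and maximum
   blocksize b_max. */
size_t blocksize_b_sketch(size_t i, size_t dim, size_t b_alg, size_t b_max)
{
    size_t left = dim - i;

    /* Edge handling: nothing remains. Without this test, left % b_alg
       is 0 and the function would return b_alg for an empty range. */
    if (left == 0) return 0;

    size_t edge = left % b_alg;
    if (edge == 0)     return b_alg;   /* remaining work divides evenly */
    if (left <= b_max) return left;    /* one (possibly enlarged) block */
    return edge;                       /* take the remainder first */
}
```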
|
||
|
||
commit b01c80829907d50ec79977fba8e7b53cfe7db80a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Aug 4 14:17:44 2017 -0500
|
||
|
||
Fixed a minor bug in level-3 packm management.
|
||
|
||
Details:
|
||
- Fixed a bug in bli_l3_packm() that caused cntl_t-cached packed mem_t
|
||
entries to be released and then re-acquired unnecessarily. (In essence,
|
||
the "<" operands in the conditional that guards the
|
||
release-and-reacquire code block simply needed to be swapped.) The bug
|
||
should have only affected performance (rather than the computed result).
|
||
Thanks to Minh Quan for identifying and reporting the bug.
|
||
|
||
commit 8b379069fcd4811669855b1248ece831f190dff6
|
||
Merge: 1f3a5819 05925dd5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 1 15:30:40 2017 -0500
|
||
|
||
Merge branch 'master' into rt
|
||
|
||
commit 05925dd5d30e8f403bb671ce33029170d65ce7c0
|
||
Merge: 803bbef0 cecdc05d
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Aug 1 09:31:02 2017 -0500
|
||
|
||
Merge pull request #146 from devinamatthews/master
|
||
|
||
Change lsame_ signature to match lapacke.
|
||
|
||
commit cecdc05d2834786a84ff85775d3f99a958c0765a
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 31 15:19:51 2017 -0500
|
||
|
||
Change lsame_ signature to match lapacke.
|
||
|
||
commit 803bbef0a386dd0571ad389f69d55154dbfe3c50
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jul 29 20:17:05 2017 -0500
|
||
|
||
Fixed pthreads compile bug with previous commit.
|
||
|
||
Details:
|
||
- Erroneously passed family parameter into l3int_t function despite
|
||
that function not taking the parameter. Oops.
|
||
|
||
commit c63980f4ca750618f359031d0691289b1abf5146
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jul 29 14:53:39 2017 -0500
|
||
|
||
Moved 'family' field from cntx_t to cntl_t.
|
||
|
||
Details:
|
||
- Removed the family field inside the cntx_t struct and re-added it to the
|
||
cntl_t struct. Updated all accessor functions/macros accordingly, as well
|
||
as all consumers and intermediaries of the family parameter (such as
|
||
bli_l3_thread_decorator(), bli_l3_direct(), and bli_l3_prune_*()). This
|
||
change was motivated by the desire to keep the context limited, as much
|
||
as possible, to information about the computing environment. (The family
|
||
field, by contrast, is a descriptor about the operation being executed.)
|
||
- Added additional functions to bli_blksz_*() API.
|
||
- Added additional functions to bli_cntx_*() API.
|
||
- Minor updates to bli_func.c, bli_mbool.c.
|
||
- Removed 'obj' from bli_blksz_*() API names.
|
||
- Removed 'obj' from bli_cntx_*() API names.
|
||
- Removed 'obj' from bli_cntl_*(), bli_*_cntl_*() API names. Renamed routines
|
||
that operate only on a single struct to contain the "_node" suffix to
|
||
differentiate with those routines that operate on the entire tree.
|
||
- Added enums for packm and unpackm kernels to bli_type_defs.h.
|
||
- Removed BLIS_1F and BLIS_VF from bszid_t definition in bli_type_defs.h.
|
||
They weren't being used and probably never will be.
|
||
|
||
commit 07837395560d413a1ba828163b41186e21a7bcfe
|
||
Merge: ca1d1d85 ad8610b4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 21 16:49:48 2017 -0500
|
||
|
||
Merge pull request #139 from Maratyszcza/emscripten
|
||
|
||
Fix Emscripten builds
|
||
|
||
commit ad8610b4415cc7982804d74f9aba29875e9e2b6c
|
||
Merge: 8772a0b3 ca1d1d85
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 21 15:18:33 2017 -0500
|
||
|
||
Merge branch 'master' into emscripten
|
||
|
||
commit ca1d1d8560c9ab1a7e3b0ac43ac70d08075bf904
|
||
Merge: b537b5bb 733faf84
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 21 09:49:50 2017 -0500
|
||
|
||
Merge pull request #144 from devinamatthews/fix_atomics_on_bgq
|
||
|
||
Add fallbacks to __sync_* or __c11_atomic_* builtins...
|
||
|
||
commit 733faf848dcc54834fcdfbb0185dc644978d8864
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 14:50:13 2017 -0500
|
||
|
||
Clang can't make up its mind what to support.
|
||
|
||
commit 7425d0744d9e9cd29a887120e57c2b43ba287040
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 12:54:58 2017 -0500
|
||
|
||
Add default #define for __has_extension.
|
||
|
||
commit b537b5bbe8cbee459a85bac11458498ae2bce4de
|
||
Merge: 1f1ec0db 7f41bb0a
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 10:58:39 2017 -0500
|
||
|
||
Merge pull request #133 from devinamatthews/haswell-packdim
|
||
|
||
Fix prefetching in haswell ukernel
|
||
|
||
commit 8823f91a14638ce6f4e45e67df03212bb61609d6
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Jul 20 10:04:34 2017 -0500
|
||
|
||
Add fallbacks to __sync_* or __c11_atomic_* builtins when __atomic_* is not supported. Fixes #143.
|
||
|
||
commit 1f1ec0db9380b87679d5c771c4594daa1cfc5f0d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 19 15:40:48 2017 -0500
|
||
|
||
Updated ar option list used by all configurations.
|
||
|
||
Details:
|
||
- Dropped 'u' from the list of modifiers passed into the library archiver
|
||
ar. Previously, "cru" was used, while now we employ only "cr". This
|
||
change was prompted by a warning observed on Ubuntu 16.04:
|
||
|
||
ar: `u' modifier ignored since `D' is the default (see `U')
|
||
|
||
This caused me to realize that the default mode causes timestamps to be
|
||
zero, and thus the 'u' option, which causes only changed object files to
|
||
be inserted, is not applicable.
|
||
|
||
commit 5caaba2d61cbbc36d63102a0786ece28ff797f72
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 19 13:51:53 2017 -0500
|
||
|
||
Added --force-version=STRING option to configure.
|
||
|
||
Details:
|
||
- Added an option to configure that allows the user to force an arbitrary
|
||
version string at configure-time. The help text also now describes the
|
||
usage information.
|
||
- Changed the way the version string is communicated to the Makefile.
|
||
Previously, it was read into the VERSION variable from the 'version' file
|
||
via $(shell cat ...). Now, the VERSION variable is instead set in
|
||
config.mk (via a configure-substituted anchor from config.mk.in).
|
||
|
||
commit 13175c5fb70fb6a378d5fff6ecede62e5ea6a1f6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 18 17:56:00 2017 -0500
|
||
|
||
Updated openmp/pthread barriers with GNU atomics.
|
||
|
||
Details:
|
||
- Updated the non-tree openmp and pthreads barriers defined in
|
||
bli_thrcomm_openmp.c and bli_thrcomm_pthreads.c to instead call a common
|
||
implementation in bli_thrcomm.c, bli_thrcomm_barrier_atomic(). This new
|
||
implementation goes through the same motions as the previous codes, but
|
||
protects its loads and increments with GNU atomic built-ins. These atomic
|
||
statements take memory ordering parameters that allow us to specify just
|
||
enough constraints for the barrier to work as intended on weakly-ordered
|
||
hardware. The prior implementation was only guaranteed to work on systems
|
||
with strongly-ordered memory. (Thanks to Devin Matthews for suggesting
|
||
this change and his crash-course in atomics and memory ordering.)
|
||
- Removed 'volatile' from structs' barrier field declarations in
|
||
bli_thrcomm_*.h.
|
||
- Updated bli_thrcomm_pthread.? files to use renamed struct barrier fields
|
||
consistent with that of the _openmp.? files.
|
||
- Updated other bli_thrcomm_* files to rename "communicator" variables to
|
||
simply "comm".
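
The first bullet describes a barrier whose loads and increments are protected by GNU atomic built-ins with explicit memory orderings. A minimal sense-reversing barrier in that style might look like the sketch below. It is written from the description, not copied from bli_thrcomm.c; the struct, its field names, and the function name are hypothetical, and the particular orderings are one reasonable choice rather than a statement of what the library uses.

```c
#include <assert.h>

typedef struct {
    int n_threads;   /* number of threads sharing the barrier */
    int arrived;     /* threads that have reached the barrier */
    int sense;       /* flips each time the barrier opens */
} barrier_sketch_t;

void barrier_wait_sketch(barrier_sketch_t *b)
{
    /* Remember the current sense before announcing arrival. */
    int my_sense = __atomic_load_n(&b->sense, __ATOMIC_RELAXED);

    /* Atomically count this thread in. */
    int n = __atomic_add_fetch(&b->arrived, 1, __ATOMIC_ACQ_REL);

    if (n == b->n_threads) {
        /* Last arrival: reset the counter, then flip the sense with
           release semantics so waiters see the reset counter. */
        b->arrived = 0;
        __atomic_fetch_xor(&b->sense, 1, __ATOMIC_RELEASE);
    } else {
        /* Everyone else spins until the sense changes. */
        while (__atomic_load_n(&b->sense, __ATOMIC_ACQUIRE) == my_sense)
            ;
    }
}
```

The release on the flip paired with the acquire in the spin loop is what makes the barrier hold on weakly-ordered hardware; 'volatile' alone gives no such ordering guarantee, which is consistent with its removal in the second bullet.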
|
||
|
||
commit 0e58ba1b3aa84700ca51a96f1c0eed6067562fba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jul 17 19:03:22 2017 -0500

Added API to set mt environment variables.

Details:
- Renamed bli_env_get_nway() -> bli_thread_get_env().
- Added bli_thread_set_env() to allow setting environment variables
pertaining to multithreading, such as BLIS_JC_NT or BLIS_NUM_THREADS.
- Added the following convenience wrapper routines:
bli_thread_get_jc_nt()
bli_thread_get_ic_nt()
bli_thread_get_jr_nt()
bli_thread_get_ir_nt()
bli_thread_get_num_threads()
bli_thread_set_jc_nt()
bli_thread_set_ic_nt()
bli_thread_set_jr_nt()
bli_thread_set_ir_nt()
bli_thread_set_num_threads()
- Added #include "errno.h" to bli_system.h.
- This commit addresses issue #140.
- Thanks to Chris Goodyer for inspiring these updates.

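A get/set pair for an integer-valued environment variable, as described in the commit above, might look roughly like the sketch below. The helper names mt_get_env() and mt_set_env() are hypothetical and are not the BLIS functions; the use of POSIX setenv() and of -1 as the "unset or invalid" value are assumptions made for illustration.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv(); assumes a POSIX system */
#include <assert.h>
#include <errno.h>
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: returns the integer value of the environment
   variable `name`, or -1 if it is unset or not a valid base-10 integer. */
long mt_get_env( const char* name )
{
    const char* str = getenv( name );
    if ( str == NULL ) return -1;

    char* end = NULL;
    errno = 0;
    long val = strtol( str, &end, 10 );

    /* Reject overflow, empty strings, and trailing garbage. */
    if ( errno != 0 || end == str || *end != '\0' ) return -1;

    return val;
}

/* Hypothetical helper: sets `name` to the decimal string of `value`,
   overwriting any existing value. */
void mt_set_env( const char* name, long value )
{
    char buf[ 32 ];
    snprintf( buf, sizeof( buf ), "%ld", value );

    int rc = setenv( name, buf, 1 );
    assert( rc == 0 );
    (void)rc;
}
```

A convenience wrapper in the spirit of the routines listed above would then simply fix the variable name, for example by calling mt_set_env( "BLIS_JC_NT", 4 ).
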
commit 8772a0b33a90154c80d88b381dcdd66f824e041f
Author: Marat Dukhan <marat@fb.com>
Date: Thu Jul 13 21:39:24 2017 -0700

Fix Emscripten builds

commit 72c8b49bb8d3b9370b2cc37718da22f065de9c57
Merge: 70cc825b ba7cada5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 12 14:58:12 2017 -0500

Merge pull request #138 from hominhquan/membrk_set_free_fp

Set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

commit ba7cada51a238d320528e3504ed0f0a17a6b022a
Author: Minh Quan HO <mqho@kalray.eu>
Date: Fri Jul 7 10:52:05 2017 +0200

set missing free_fp in bli_membrk_init for free-ing GEN_USE buffers

The membrk's free_fp is called when releasing GEN_USE buffers, but this free_fp is
not set in bli_membrk_init

commit 1241301869957c96f16a2c6567e3ad70afa547de
Merge: 969b67e8 25ead66f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Jul 5 02:24:00 2017 -0400

Merge "Reducing the framework overhead of GEMV routines" into amd-staging

commit 25ead66fb78557f73af48bac305724d5d8aa3309
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Jun 30 12:23:19 2017 +0530

Reducing the framework overhead of GEMV routines

Change-Id: I83607ad767bff74e305e915b54b0ea34ec3e5684

commit 969b67e8800fbd5d14a086606f3b5afbf66ed093
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Jul 4 12:57:32 2017 +0530

Improved the efficiency of dGEMM for large matrices by reducing TLB load misses and, above all, L3 cache misses. This is achieved by changing the packed block sizes of matrices A and B. The optimum values are now MC_D = 510 and KC_D = 1024.

Change-Id: I2d8bdd5f62f2d1f8782ae2997f3d7a26587d1ca4

commit 70cc825b552dec05165b9d70f9e6eb33d8abb118
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 21:58:21 2017 -0500

Update LICENSE

Remove totally unnecessary first 9 lines and hopefully get Github to recognize it as 3BSD [ci skip].

commit cf54c77bc79a0f33a514be72c80a654c4e6e6f63
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Jun 6 20:23:17 2017 -0500

Add new SSI acknowledgment

commit d6ef56c6dbaf6df8ee1af1ca6a0f0792a811396a
Author: prangana <pradeep.rao@amd.com>
Date: Thu Jun 1 16:11:09 2017 +0530

Update version number

Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 897bfa0e92082c30bbb74229562d7d7327cbbac8
Author: prangana <pradeep.rao@amd.com>
Date: Thu Jun 1 16:11:09 2017 +0530

Update version number

Change-Id: Ib6e52d1d34c0791367ab9152dfab31f94deedeb4

commit 99d0ba5606d4b63e6a9c639aa78d4defc2455f79
Merge: be2c7eb8 6d17e012
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Thu Jun 1 02:19:02 2017 -0400

Merge "Checked in the small matrix code to compute GEMM called with A transpose case" into amd-staging

commit 6d17e0120fe5c127b941136ad2c0c08e91439535
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed May 24 11:48:16 2017 +0530

Checked in the small matrix code to compute GEMM called with A transpose case

Change-Id: I29f40046d43d7a4b037c1cb322503ee26495f462

commit 9d93f8481a1404695f7b78a3ced8ca47e890b649
Author: prangana <pradeep.rao@amd.com>
Date: Tue May 30 09:58:10 2017 +0530

Update Licence File

Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit be2c7eb85168937bd4318f4d05ded37620119310
Author: prangana <pradeep.rao@amd.com>
Date: Tue May 30 09:58:10 2017 +0530

Update Licence File

Change-Id: I4c5cf1690d0cef92a68400f9a89e454ab6856ad2

commit 7f41bb0a0becde6a7de7df0f99668d7b4686c3b0
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 14:49:31 2017 -0400

PACKDIM_MR=8 didn't work out, but messing with the prefetching helps 2%.

commit d87614af3f3d9187be94d6e77984b282bf890928
Author: Devin Matthews <dmatthews@gator3.ufhpc>
Date: Fri May 26 14:47:36 2017 -0400

Revert "Change PACKDIM_MR (double) for haswell to 8."

This reverts commit 681eec913d7c2ebcff637cec5c1627ced9a92b99.

commit 681eec913d7c2ebcff637cec5c1627ced9a92b99
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri May 26 12:28:09 2017 -0500

Change PACKDIM_MR (double) for haswell to 8.

commit 0a3ae0ecaa0ddcb5887005d7051fa234499f1120
Merge: 0f4e6652 6e04f9df
Author: praveeng <praveen.g@amd.com>
Date: Sat May 20 16:53:50 2017 +0530

frame/3/gemm/bli_gemm_front.c

Change-Id: I52a0fbc1d33bb948d430942323bbc5fe44e3ca13

commit 6e04f9df01d79c1b0e673943ca0d5d0a6095eb2e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 17 13:03:52 2017 -0500

Restored deleted lines from makefile fragments.

commit ec5c0c0448275280dca0991f6f33afeb73650450
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:29:44 2017 -0500

Change to /bin/sh.

All scripts checked with Debian's checkbashisms. Also check for clang first in auto-detect.sh.

commit 555ddc30d4c7e44f3f335e436c98606f56e1598b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 12:27:14 2017 -0500

Remove shebangs from makefiles.

commit f26bd7f42e0c2a47fe321b2c452644990b689654
Merge: cbf8710a 169fb05f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed May 17 11:58:41 2017 -0500

Merge pull request #128 from iotamudelta/master

Portability and clang

commit 169fb05f225c2f060265bcaa872f7f80dc638b70
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 23:11:22 2017 -0400

Fix if/else structure. Thanks to TravisCI.

commit 0579dfea0bcfbb90ebc073fcf78b92a5cf7238e1
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:58:07 2017 -0400

Restore version.

commit a75b05c23dc786a1fdc45dc1627a5ce2299f1a7b
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:23:27 2017 -0400

Mark piledriver compilable w/ clang.

commit 7541d46e2ba8659bb2e36b444edef112fefa1345
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:12:12 2017 -0400

Mark bulldozer compilable w/ clang.

commit 91f897073ec0df3330ede449c4d6af8158266ae3
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:06:59 2017 -0400

Correct error message.

commit f5131e1e49167f948bddd714bb1af1761829c212
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 22:03:23 2017 -0400

Indeed one can compile for carrizo also using clang.

commit 5fa4e9439c04f35f89dd7d26ff742cb2dadc3180
Author: J M Dieterich <dieterich@ogolem.org>
Date: Tue May 16 21:50:49 2017 -0400

A bunch of shebang fixes from unportable /bin/bash to portable /usr/bin/env bash

commit 1f3a58197e5d5f9ac862bda91e7527cbfbab5d76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon May 8 16:10:03 2017 -0500

Housekeeping, induced method file/function renames.

Details:
- Renamed all level-3 induced method files to use the "_vir.c" suffix
instead of "_ref.c". Also renamed functions within these files
accordingly.
- Renamed cpp macro definitions in frame/ind/include according to the
above changes.
- Removed frame/3/old.

commit cbf8710a1ba63e25aadaa6fc5da51ea81b3d596d
Merge: cf39d3ef fdc66f12
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Mon May 8 11:21:20 2017 -0500

Merge pull request #127 from devinamatthews/fix_blis_nt_xx

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS

commit cf39d3ef3b29b8058c39fb4638c1a734fe64aaed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 5 15:06:56 2017 -0500

Fixed a bug in norm1v, norm1m.

Details:
- Fixed a bug that manifested as improperly-computed 1-norm for vectors
and matrices. This is one of the few operations in BLIS that does not
have its own test module within the testsuite, hence why it went
undetected for so long. The bad 1-norms were being used to normalize
matrices in the testsuite after initialization, which led to some
matrices containing a combination of "large" and "small" values. This
tended to push the residuals computed after each test away from zero.
In some cases, they were off *just* enough for the testsuite to label
it a "failure". Many thanks to Jeff Hammond for reporting this bug.
(Wonky details: the bug was due to improperly-defined level-0 scalar
macros for abval2, an operation that computes the absolute square,
or complex magnitude/modulus. Certain complex domain instances of
abval2 were being incorrectly defined in terms of real-only solutions,
leading to bad results. This level-0 operation forms the basis of
norm1v/norm1m. absq2 was also affected, but almost nothing uses
this operation.)

commit 799485124f4d823e908d2e5d38b0c3a1e6172ade
Merge: 773a24ef 0df3541f
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:52:09 2017 -0500

Merge pull request #121 from jeffhammond/not-real-knl

allow KNL build without hbwmalloc (i.e. emulated)

commit fdc66f12d40754ff46179804bff592fddafbca02
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 4 10:35:22 2017 -0500

Setting any one of BLIS_NT_[IJ][CR] overrides BLIS_NUM_THREADS. Missing BLIS_NT_XX's are defaulted to 1. Fixes #123.

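The override rule stated in the commit above can be sketched as a small pure function. The type ways_t, the function resolve_ways(), and the convention that a non-positive value means "variable not set" are all hypothetical; the sketch only encodes the rule as worded in the commit message, not how BLIS reads or applies the variables.

```c
#include <assert.h>

/* Hypothetical container for the resolved per-loop ways of parallelism. */
typedef struct
{
    long jc, ic, jr, ir;
    int  use_num_threads;  /* 1 if the global thread count should apply */
} ways_t;

/* A missing per-loop value (<= 0 here, by assumption) defaults to 1. */
static long or_one( long v ) { return v > 0 ? v : 1; }

ways_t resolve_ways( long jc, long ic, long jr, long ir )
{
    ways_t w;

    /* If any per-loop value was given, the global count is overridden. */
    int any_set = ( jc > 0 ) || ( ic > 0 ) || ( jr > 0 ) || ( ir > 0 );

    w.jc = or_one( jc );
    w.ic = or_one( ic );
    w.jr = or_one( jr );
    w.ir = or_one( ir );
    w.use_num_threads = !any_set;

    return w;
}
```

For example, setting only the IC value to 4 yields 1 x 4 x 1 x 1 ways and ignores the global thread count.
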
commit 773a24efb2fa1c3a220bf0ce1dd621a3176196da
Merge: dd58c954 b8854259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:07:59 2017 -0500

Merge branch 'master' of github.com:flame/blis

commit dd58c9545c877c3f7553eaebca7b5e9720a66f5d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 3 15:04:51 2017 -0500

Disable complex 3m/4m in testsuite by default.

Details:
- Disabled testsuite tests of all level-3 implementations based on 3m
and 4m. This will improve testing runtime on Travis CI as well as for
anyone manually running the testsuite using default test parameters.
Thanks to Devin Matthews for suggesting this change.

commit 0df3541f54b7fe0c604ab2ec47ba814f12391798
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue May 2 19:25:21 2017 -0700

allow KNL build without hbwmalloc.h (i.e. emulated)

we want to be able to run BLIS KNL binaries on non-KNL machines via SDE.
although it is possible to install hbwmalloc implementation on such
systems, it is easier not to, since obviously the performance of SDE
execution is not representative so there is no reason to emulate HBW
allocation.

commit b88542591d4dd0cde366e5ae35afd3205cb81bdc
Merge: 43007f7b c2c91e09
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 19:22:41 2017 -0500

Merge pull request #107 from jeffhammond/intel-compilers-no-use-libm

never use libm with Intel compilers

commit 43007f7b65ec7926cbbfc39965ff733fa251c15f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:48:43 2017 -0500

Fixed stray parentheses in README citations.

commit a4f1d0b8801c114e9ef8be39df01e1b8d27ebcb3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:43 2017 -0500

CHANGELOG update (0.2.2)

commit 940a707ac78de975110e17c95765e65b89aa5e10 (tag: 0.2.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 16:38:42 2017 -0500

Version file update (0.2.2)

commit d5a5e003ea9b24bb6abf12e88862e8eb61ffb03d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 15:48:30 2017 -0500

Fixed a trsm1m bug that affected right-side cases.

Details:
- Fixed a bug introduced in 1c732d3 that affected trsm1m_r. The result
was nondeterministic behavior (usually segmentation faults) for certain
problem sizes beyond the 1m instance of kc (e.g. 128 on haswell). The
cause of the bug was my commenting out lines in bli_gemm1m_ukr_ref.c
which explicitly directed the virtual gemm micro-kernel to use temporary
space if the storage preference of the [real domain] gemm ukernel did
not match the storage of the output matrix C. In the context of gemm,
this handling is not needed because agreement between the storage pref
and the matrix is guaranteed by a high-level optimization in BLIS.
However, this optimization is not applied to trsm because the storage
of C is not necessarily the same as the storage of the micro-panels of
B--both of which are updated by the micro-kernel during a trsm
operation. Thus, the guarantee of storage/preference agreement is not
in place for trsm, which means we must handle that case within the
virtual gemm micro-kernel.
- Comment updates and a minor macro change to bli_trsm*_cntx_init() for
3m1, 4m1a, and 1m.

commit e80993e71f4d571e9650a8e90ed386e32059eae5
Merge: a509fbd5 ca3a7924
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:30:28 2017 -0500

Merge branch 'master' into 1m

commit ca3a7924770d6cf203cce4ca9f5482e1d0d4e961
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 2 12:09:39 2017 -0500

README.md update.

Details:
- Updated bibtex entries for 4th BLIS paper, and added entries for 5th
and 6th BLIS papers.

commit 0f4e6652dfe9b30105d3bab328ac26d9d5c11182
Merge: 42e7f6fb 6e7de6ef
Author: praveeng <praveen.g@amd.com>
Date: Wed Apr 19 17:54:10 2017 +0530

Merge master code till 2017_04_19 to amd-staging

Change-Id: Ibebe83c8ea2e7eb15798c2bcf214b7228a1c9518

commit 42e7f6fb2a531429ee600b2fe0293b67371c7ccb
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Mar 28 18:10:03 2017 +0530

fixed license attribute issues in AMD-added files

Change-Id: I303f870a777c7cd1c1af29ea0b93f3e0a27948e4

commit 5600001e973c6cea048bd3fdb28117f1d7c98b9d
Merge: 0b190293 b3ed4933
Author: prangana <pradeep.rao@amd.com>
Date: Mon Mar 20 13:56:33 2017 +0530

Fix merge conflicts after sync with release branch

Change-Id: Icf14a09f728befb69a73fff9fa79c4128e728310

commit 6e7de6ef84babb273dc5528a9b9d01f0febe394b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:10:24 2017 -0500

Minor updates to test/3m4m.

Details:
- Updated initial problem size and increment in Makefile.
- Updated code in test_gemm.c to correctly query kc from context.

commit f484c6cd4389dc7ae5b972849e12e98ad5bbf9a4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 17 12:07:27 2017 -0500

Whitespace reformatting to armv8a kernels file.

Details:
- Updated formatting of function signature/header in
kernels/armv8a/3/bli_gemm_opt_4x4.c.

commit 0b19029342ffc530fa22ef20398a26221cb8f6ec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Mar 14 14:51:31 2017 +0530

Code cleanup, removed warnings from trsm, removed unused routines in axpyv & scalv

Change-Id: I02867f394c5f416194c4b1769a6c75f39243ec81

commit 825363bd2a5a60a923d4a6d9691dc143845a9cab
Merge: 093bdb80 513944e4
Author: praveeng <praveen.g@amd.com>
Date: Wed Mar 8 15:42:49 2017 +0530

Merge code from master to amd-staging as on 2017_03_08 by praveeng

Change-Id: I80740081b2cb54c9b77a3e78b9fe540e170be23d

commit 093bdb80c86b06367e595aa17487139ae983822f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Mar 7 13:35:50 2017 +0530

Checked in Unpacked DGEMM code

Change-Id: I39dcc7b238b328f73ee2675d21a5e521d0488723

commit 33923da9a108854590d386e74b6ee66b971e7796
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Mar 6 14:31:31 2017 +0530

Added variant 10 for double precision axpyv microkernel

Change-Id: I7a20cc113a422603250bc450825c965136354974

commit bc828f7f8e3ddb9f58af07edc0b935b21759fb0f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Mar 3 14:45:35 2017 +0530

Added new axpyv (single precision) microkernel that performs 10 FMAs per loop. This gives better performance than all other implementations of axpyv

Change-Id: Ic4f0e4c67e367d67d0b24febcf34f81a70a39972

commit c9949f4603419267c10973adf1d63ec38497475d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Fri Feb 17 14:16:33 2017 +0530

Checked in DGEMMTRSM and edge case handling routine in DDOTXF

Change-Id: I65f00661af6c09b2507294fd43e0a10641c0597e

commit a509fbd5ac04fafd4e51b43d2f59ca56432dc212
Merge: 69b4846a 513944e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 17:06:16 2017 -0600

Merge branch 'master' into 1m

commit 69b4846ae9adb157c4171b52e159684db2867853
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 21 15:33:39 2017 -0600

Disabled experiment-related 1m code.

Details:
- Commented out code in frame/ind/oapi/bli_l3_3m4m1m_oapi.c that was
specifically inserted to facilitate the benchmarking of 1m block-panel
and panel-block algorithms.
- Updates to test/3m4m/Makefile, runme.sh script, and test_gemm.c to
reflect changes used/needed during benchmarking.

commit 513944e4a951d8823b4de161b86ad7a965b4d99b
Merge: 8b462a0e 0e18f68c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 20 10:04:33 2017 -0500

Merge pull request #118 from devinamatthews/master

Handle k=0 correctly in KNL dgemm ukernel.

commit 0e18f68cf12eb9189ba901a20040b1cdae417670
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Feb 20 09:03:21 2017 -0600

Handle k=0 correctly in KNL dgemm ukernel.

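To illustrate why k=0 deserves attention in a gemm microkernel, here is a minimal reference-style sketch in plain C (not the KNL assembly kernel). It computes C := beta*C + alpha*A*B on an mr x nr microtile; when k is 0 the product term is empty and the update must reduce to C := beta*C. The function name ref_dgemm_ukr() and its signature are hypothetical, and the packed layouts (A stored column by column, B stored row by row) are assumptions of the sketch.

```c
#include <assert.h>

/* Hypothetical reference microkernel.
   a: packed mr x k micro-panel, column p starts at a[ p*mr ].
   b: packed k x nr micro-panel, row p starts at b[ p*nr ].
   c: mr x nr microtile with row stride rs_c and column stride cs_c. */
void ref_dgemm_ukr( int mr, int nr, int k,
                    double alpha, const double* a, const double* b,
                    double beta, double* c, int rs_c, int cs_c )
{
    assert( mr >= 0 && nr >= 0 && k >= 0 );

    for ( int i = 0; i < mr; ++i )
    {
        for ( int j = 0; j < nr; ++j )
        {
            /* Accumulate the (i,j) element of A*B. With k == 0 this loop
               does not execute and the accumulator stays at zero. */
            double ab = 0.0;
            for ( int p = 0; p < k; ++p )
                ab += a[ p*mr + i ] * b[ p*nr + j ];

            double* cij = &c[ i*rs_c + j*cs_c ];
            *cij = beta * ( *cij ) + alpha * ab;
        }
    }
}
```

An optimized kernel that peels or unrolls its k loop has to reproduce this behavior explicitly, since its accumulators may otherwise be read before any iteration has written them.
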
commit 8b462a0e8c3e9252f0401940849e53cc772256fa
Merge: c362afc5 7d42fc07
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Feb 19 23:03:03 2017 -0500

Merge pull request #117 from devinamatthews/master

Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 7d42fc0796ef0c010375fd8e59b1240ba41ce4d2
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sun Feb 19 21:10:55 2017 -0500

Cast dim_t and inc_t parameters to 64-bit in KNL microkernels.

commit 04245c9ff7f8b3c70d61003029c964bb9a4320ee
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Fri Feb 10 14:24:30 2017 +0530

Reoptimized scalv routines - two vector multiplies are done per iteration, and these routines are enabled in bli_kernel.h

Change-Id: Ic5654508573d1f6bde2edef06aefe117e581feb5

commit c362afc525bab4050581d1b0fcea2fe4d582c608
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 9 11:54:59 2017 -0600

Added missing "level-0" BLAS [sd]cabs1_().

Details:
- Fixed issue #115 by adding implementations for scabs1_() and dcabs1_()
to the BLAS compatibility layer. Thanks to heroxbd for pointing out
their absence.

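For reference, the BLAS routine scabs1 returns |Re(z)| + |Im(z)| for a single-precision complex z (dcabs1 is the double-precision analogue). A minimal sketch of that computation follows; the type scomplex_t and the function name my_scabs1() are hypothetical stand-ins and are not the symbols added to the BLIS compatibility layer.

```c
#include <assert.h>

/* Hypothetical single-precision complex type. */
typedef struct { float real; float imag; } scomplex_t;

/* Absolute value of a float, written out to avoid depending on libm. */
static float abs_f( float x ) { return x < 0.0f ? -x : x; }

/* Returns |Re(z)| + |Im(z)|. The argument is passed by address, as is
   customary for Fortran-callable BLAS routines. */
float my_scabs1( const scomplex_t* z )
{
    assert( z != 0 );
    return abs_f( z->real ) + abs_f( z->imag );
}
```

Note that this quantity is not the complex modulus: for z = 3 - 4i it gives 7, whereas the modulus is 5.
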
commit 018180c938c32efbeaaf626ba71ec5b780664db1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 8 11:20:52 2017 -0600

Fixed a minor bug in configure (issue #114).

Details:
- Fixed a bug in the configure script whereby a non-preferred value for
--enable-threading would cause problems in common.mk vis-a-vis detecting
which threading model was chosen. Thanks to heroxbd for reporting this
issue.

commit 58b5b77e5fdb179ea465e398e416e6a00d917e05
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 21:43:34 2017 +0530

Fixed a bug in axpyv: the arguments passed to the intrinsic fmad instruction are corrected

Change-Id: If12f24c6bc74b22ac9e4acd6b9378e06d79f2f5e

commit 85de4ebf74d0a5587d5a12724eb5489d51674db3
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 14:41:04 2017 +0530

variant 4 axpyv single precision modified: explicitly used FMA intrinsics, replaced vector multiply and add operations

Change-Id: I975feef56696d479d2b9e9441b0660021cf4f6ff

commit 3fa53e8af31d634779f40258c51483ae8af494fa
Merge: b5291a44 95be7b04
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Wed Feb 8 11:46:34 2017 +0530

Merged axpyv and gemm small in bli_kernel.h
Merge branch 'amd-staging' of ssh://git.amd.com:29418/cpulibraries/er/blis into amd-staging

modified: config/zen/bli_kernel.h
modified: frame/3/gemm/bli_gemm_front.c
modified: kernels/x86_64/zen/3/bli_gemm_small_matrix.c

Change-Id: If181cf9345178c448b3530beb8bef453917fe295

commit 95be7b04709e688a4cb01fba680081e30f4258ef
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Feb 7 14:01:27 2017 +0530

Added logic for packing matrix A and prefetching matrix C in Unpacked SGEMM code

Change-Id: I99efeca9eb5b4449286ec0ec133fd554ef1bb4f0

commit b5291a445b1313e01f1e0e8102c5f3660ab07f69
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Feb 7 12:39:31 2017 +0530

Added optimization variant 4 for axpyv single precision. This performs 5 FMAs per loop, keeping the IPC always full

Change-Id: Ie77ed22584271136a257e673bcd3b1ba71136bc9

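The shape of a "5 multiply-adds per loop iteration" axpyv (y := y + alpha * x) can be sketched in portable C as below. This is not the AMD kernel, which uses vector FMA intrinsics; the function name saxpyv_unroll5() is hypothetical, the sketch works on scalars rather than vector registers, and whether the compiler fuses each multiply-add into an FMA instruction depends on the target and flags.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical unit-stride axpyv, unrolled by five. */
void saxpyv_unroll5( size_t n, float alpha, const float* x, float* y )
{
    assert( n == 0 || ( x != NULL && y != NULL ) );

    size_t i = 0;

    /* Main loop: five independent multiply-adds per iteration, which
       gives the hardware several operations to keep in flight. */
    for ( ; i + 5 <= n; i += 5 )
    {
        y[ i + 0 ] += alpha * x[ i + 0 ];
        y[ i + 1 ] += alpha * x[ i + 1 ];
        y[ i + 2 ] += alpha * x[ i + 2 ];
        y[ i + 3 ] += alpha * x[ i + 3 ];
        y[ i + 4 ] += alpha * x[ i + 4 ];
    }

    /* Cleanup loop for the remaining n % 5 elements. */
    for ( ; i < n; ++i )
        y[ i ] += alpha * x[ i ];
}
```

In a vectorized kernel each of the five statements would typically operate on a full vector register, so one iteration would cover five vectors' worth of elements.
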
commit f4bfc1662af82aa4b98185334c44835e51f1cbec
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Mon Feb 6 15:04:27 2017 +0530

Implemented new axpyv routines to improve performance for small vector sizes. Vectorization is done for vectors as small as 8 elements (single precision) or 4 elements (double precision). Because this operation has a low compute-to-memory ratio, memory operations dominate at larger sizes, so there is not much gain there. This still needs some work. Added saxpyv and daxpyv var 3 routines in the file bli_axpyv_opt_var1.c

Change-Id: Ic1b33bd5516e10113b00e44ab41b97eb19d46072

commit ddf45e71770c55ea4a58ca24ea4913fe5d8beb9b
Merge: a6ab91bc 78e1b16e
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jan 27 14:25:40 2017 -0600

Merge pull request #113 from devinamatthews/knl_thread_params

Change default threading parameters for KNL.

commit 78e1b16e16d589ed31b2e712115ee282097f114d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Jan 27 14:22:20 2017 -0600

Change default threading parameters for KNL.

commit 574472ba5a89924eca7dbd10055d0e1dcd7f4c71
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Jan 10 14:51:46 2017 +0530

checked in unpacked SGEMM optimization

Change-Id: I8e4ea374415c0c402c660b656fb076af15354181

commit 1c732d3ddc4ac0861d3b0e0dd15eb7e071615502
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jan 25 16:25:46 2017 -0600

Added 1m-specific APIs for bp, pb gemm algorithms.

Details:
- Defined bli_gemmbp_cntl_create(), bli_gemmpb_cntl_create(), with the
body of bli_gemm_cntl_create() replaced with a call to the former.
- Defined bli_cntl_free_w_thrinfo(), bli_cntl_free_wo_thrinfo(). Now,
bli_cntl_free() can check if the thread parameter is NULL, and if so,
call the latter, and otherwise call the former.
- Defined bli_gemm1mbp_cntx_init(), bli_gemm1mpb_cntx_init(), both in
terms of bli_gemm1mxx_cntx_init(), which behaves the same as
bli_gemm1m_cntx_init() did before, except that an extra bool parameter
(is_pb) is used to support both bp and pb algorithms (including to
support the anti-preference field described below).
- Added support for "anti-preference" in context. The anti_pref field,
when true, will toggle the boolean return value of routines such as
bli_cntx_l3_ukr_eff_prefers_storage_of(), which has the net effect of
causing BLIS to transpose the operation to achieve disagreement (rather
than agreement) between the storage of C and the micro-kernel output
preference. This disagreement is needed for panel-block implementations,
since they induce a transposition of the suboperation immediately before
the macro-kernel is called, which changes the apparent storage of C. For
now, anti-preference is used only with the pb algorithm for 1m (and not
with any other non-1m implementation).
- Defined new functions,
bli_cntx_l3_ukr_eff_prefers_storage_of()
bli_cntx_l3_ukr_eff_dislikes_storage_of()
bli_cntx_l3_nat_ukr_eff_prefers_storage_of()
bli_cntx_l3_nat_ukr_eff_dislikes_storage_of()
which are identical to their non-"eff" (effectively) counterparts except
that they take the anti-preference field of the context into account.
- Explicitly initialize the anti-pref field to FALSE in
bli_gks_cntx_set_l3_nat_ukr_prefs().
- Added bli_gemm_ker_var1.c, which implements a panel-block macro-kernel
in terms of the existing block-panel macro-kernel _ker_var2(). This
technique requires inducing transposes on all operands and swapping
A and B.
- Changed bli_obj_induce_trans() macro so that pack-related fields are
also changed to reflect the induced transposition.
- Added a temporary hack to bli_l3_3m4m1m_oapi.c that allows us to easily
specify the 1m algorithm (block-panel or panel-block).
- Renamed the following cntx_t-related macros:
bli_cntx_get_pack_schema_a() -> bli_cntx_get_pack_schema_a_block()
bli_cntx_get_pack_schema_b() -> bli_cntx_get_pack_schema_b_panel()
bli_cntx_get_pack_schema_c() -> bli_cntx_get_pack_schema_c_panel()
and updated all instantiations. Also updated the field names in the
cntx_t struct.
- Comment updates.

commit 41595e98eedaf3f1f93802c14dcae490402f933f
Merge: d625c49e a6ab91bc
Author: praveeng <praveen.g@amd.com>
Date: Wed Dec 7 15:13:21 2016 +0530

Merge master code as on 2016_12_07 to amd-staging

Change-Id: I5d9ecef9bff960aeb9b51ca4e4b21714e789e44f

commit d625c49e20bd3c50d6d44e330e34076cced114a3
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Nov 29 15:05:19 2016 +0530

checked-in SGEMMTRSM microkernel for Zen

Change-Id: Ib61936418dea911b2154aa99f703b66e9669f94f

commit a6ab91bc61432490fadf18d596de4589645f37dd
Merge: 145a551d 7f31a630
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 30 09:26:58 2016 -0600

Merge pull request #111 from figual/master

Fixed missing cntx argument in ARMv8 microkernels.

commit 7f31a6307b7bd35f913c895947552c3a176f789b
Author: Francisco Igual <figual@ucm.es>
Date: Sun Nov 27 14:40:47 2016 +0100

Fixed missing cntx argument in ARMv8 microkernels.

commit 126482a3b609b9ad7026ba348f6c4bf6a29be8a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 25 18:29:49 2016 -0600

Implemented the 1m method.

Details:
- Implemented the 1m method for inducing complex domain matrix
multiplication. 1m support has been added to all level-3 operations,
including trsm, and is now the default induced method when native
complex domain gemm microkernels are omitted from the configuration.
- Updated _cntx_init() operations to take a datatype parameter. This was
needed for the corresponding function for 1m (because 1m requires us
to choose between column-oriented or row-oriented execution, which
requires us to query the context for the storage preference of the
gemm microkernel, which requires knowing the datatype) but I decided
that it made sense for consistency to add the parameter to all other
cntx initialization functions as well, even though those functions
don't use the parameter.
- Updated bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs() to take
a second scalar for each blocksize entry. The semantic meaning of the
two scalars now is that the first will scale the default blocksize
while the second will scale the maximum blocksize. This allows scaling
the two independently, and was needed to support 1m, which requires
scaling for a register blocksize but not the register storage
blocksize (i.e., "packdim") analogue.
- Deprecated bli_blksz_reduce_dt_to() and defined two new functions,
bli_blksz_reduce_def_to() and bli_blksz_reduce_max_to(), for reducing
default and maximum blocksizes to some desired blocksize multiple.
These functions are needed in the updated definitions of
bli_cntx_set_blkszs() and bli_gks_cntx_set_blkszs().
- Added support for the 1e and 1r packing schemas to packm, including
1e/1r packing kernels.
- Added a minor optimization to bli_gemm_ker_var2() that allows, under
certain circumstances (specifically, real domain beta and row- or
column-stored matrix C), the real domain macrokernel and microkernel
to be called directly, rather than using the virtual microkernel
via the complex domain macrokernel, which carries a slight additional
amount of overhead.
- Added 1m support to the testsuite.
- Added 1m support to Makefile and runme.sh in test/3m4m. Also simplified
some code in test_gemm.c driver.

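The idea of inducing complex multiplication from real arithmetic rests on a standard identity: multiplying by a complex number a = ar + i*ai acts on the (real, imaginary) pair of the other operand as the real 2x2 matrix [ ar -ai ; ai ar ]. The sketch below shows only this scalar identity. It is not the BLIS 1m packing or kernel code, and the type dcomplex_t and the functions expand_2x2() and mul_via_real() are hypothetical names.

```c
#include <assert.h>

/* Hypothetical double-precision complex type. */
typedef struct { double real; double imag; } dcomplex_t;

/* Expand a complex scalar into its 2x2 real matrix representation:
     [ ar  -ai ]
     [ ai   ar ]                                                      */
void expand_2x2( dcomplex_t a, double m[2][2] )
{
    m[0][0] = a.real;  m[0][1] = -a.imag;
    m[1][0] = a.imag;  m[1][1] =  a.real;
}

/* Compute a * x using only real multiplies and adds, by applying the
   expanded matrix to the vector ( x.real, x.imag ). */
dcomplex_t mul_via_real( dcomplex_t a, dcomplex_t x )
{
    double m[2][2];
    expand_2x2( a, m );

    dcomplex_t y;
    y.real = m[0][0] * x.real + m[0][1] * x.imag;
    y.imag = m[1][0] * x.real + m[1][1] * x.imag;
    return y;
}
```

Applied element by element to one matrix operand, the same identity turns a complex matrix product into a real matrix product of twice the dimensions on that side, which is what allows a real domain gemm microkernel to do the work.
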
commit d8f13beeea90338e0ecb0a3aeaa2d59d8ebd6c36
Merge: c25a9205 145a551d
Author: praveeng <praveen.g@amd.com>
Date: Fri Nov 25 17:31:08 2016 +0530

Merge master code till 2016_11_25 to amd-staging

commit c25a9205fd8c8d8de7fd81b1e5621e7ac79f4e87
Merge: 65298762 bdc0a264
Author: praveeng <praveen.g@amd.com>
Date: Fri Nov 25 17:06:36 2016 +0530

Merge master code till Switched to simpler trsm_r 2016_11_25 to amd-staging

Change-Id: Ibf71d224d8fb6cf0bc497f84d50c27d276512cc1

commit 145a551d524ae5492667a05fc248923d922df850
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 23 17:59:06 2016 -0600

Switched to simpler trsm_r implementation.

Details:
- Disabled the implementation of trsm_r that allows the right-hand matrix
B to be triangular, and switched to the implementation that simply
transposes the operation (and thus the storage of C) in order to recast
the operation as trsm_l. This avoids the need to use trsm_rl and trsm_ru
macrokernels, which require an awkward swapping of MR and NR. For now,
the support for trsm_r macrokernels, via separate control trees, remains.
- Modified bli_config_macro_defs.h so that BLIS_RELAX_MCNR_NCMR_CONSTRAINTS
is defined by default. This is mostly a safety precaution in case someone
tries to switch back to the previous trsm_r implementation, but also
serves as a convenience on some systems where one does not naturally
choose blocksizes in a way that satisfies MC % NR = 0 and NC % MR = 0.

commit b3e58ee30307cf1e11529f2113acb9abbeda25af
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 23 17:58:26 2016 -0600

Reimplemented 4x12 haswell ukernels (real only).

Details:
- Replaced permutation-based implementations in bli_gemm_asm_d4x12.c, which
defines 4x24 single real and 4x12 double real gemm microkernels, with
broadcast-based implementations. (The previous microkernel file has been
moved to an 'old' subdirectory.)

commit 65298762ff15c45e8588e0c279a9feaa98c927a0
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Nov 22 12:15:33 2016 +0530

removed a redundant copy operation in DNRM2

Change-Id: I673b08efde4480e871779716f7715566740ad9ce

commit d6863e851adeef037e4d1476fe63bb293fb9d987
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Mon Nov 21 11:30:30 2016 +0530

checked-in DNRM2 optimizations

Change-Id: I3b31d768bd7f4fbf43042aa5a0762995c73c4522

commit bdc0a264d2fb5940bfd09298b1de823674a39053
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 16 14:13:08 2016 -0600

Adjusted stride selection of ct in macrokernels.

Details:
- Updated the changes introduced in 618f433 so that the strides of the
temporary microtile ct used in the macrokernels are determined based
on the storage preference of the microkernel (via the new functions
below), rather than the strides of c. In almost all cases, presently,
this change results in no net effect, as a high-level optimization
in the _front() functions aligns the storage of c to that of the
microkernel's preference. However, I encountered some cases where
this is not always the case in some development code that has yet
to be committed, and therefore I'm generalizing the framework code
in advance.
- Defined two new functions in bli_cntx.c:
bli_cntx_l3_ukr_prefers_rows_dt()
bli_cntx_l3_ukr_prefers_cols_dt()
which return bool_t's based on the current micro-kernel's storage
preferences. For induced methods, the preference of the underlying
real domain microkernel is returned.
- Updated definition of bli_cntx_l3_ukr_dislikes_storage_of(), and
by proxy bli_cntx_l3_ukr_prefers_storage_of(), to be in terms of
the above functions, rather than querying the preferences of the
native microkernel directly (which did the wrong thing for induced
methods).

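A minimal sketch of the stride selection described in the commit above, assuming an MR x NR temporary microtile. The type ct_strides_t and the function choose_ct_strides() are hypothetical names used only for illustration; they are not part of BLIS, and the framework's actual code may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical container for the row and column strides of ct. */
typedef struct
{
    ptrdiff_t rs; /* distance between consecutive rows    */
    ptrdiff_t cs; /* distance between consecutive columns */
} ct_strides_t;

/* Pick strides for an mr x nr temporary microtile from the microkernel's
   storage preference (illustrative; not a BLIS function). */
ct_strides_t choose_ct_strides( bool ukr_prefers_rows, ptrdiff_t mr, ptrdiff_t nr )
{
    ct_strides_t s;

    if ( ukr_prefers_rows ) { s.rs = nr; s.cs = 1;  } /* row-stored ct    */
    else                    { s.rs = 1;  s.cs = mr; } /* column-stored ct */

    return s;
}
```

For a row-preferring 6x8 microkernel this gives rs = 8 and cs = 1; the column-stored branch (rs = 1, cs = MR) corresponds to the unconditional setting that 618f433 describes as the earlier behavior.
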
commit 031978d2647cf08316858baf29c84ebba9c3133e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 16 14:04:33 2016 -0600

Fixed inactive trsm_r blocksize constraint code.

Details:
- Changed a cpp macro that was meant to prevent using certain trsm_r code
if BLIS_RELAX_MCNR_NCMR_CONSTRAINTS was defined. It was actually coded
incorrectly at first. I've now fixed its location and changed its
consequence to a compile-time #error message.

commit 9772218cae57d55c252595b01e3669d8bed84944
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Nov 16 15:19:19 2016 +0530

Added optimized DAMAX routines for Zen

Change-Id: I499c0c8f0f4ce6c19235c47b86d5608db6ba50f8

commit 9c448e30174e5eb76a94b43b30819704a5dfcb3f
Merge: 998d8240 e35d3c23
Author: Santanu Thangaraj <Santanu.Thangaraj@amd.com>
Date: Wed Nov 16 04:18:57 2016 -0500

Merge "Added new optimized micro-kernel for dotxv routine" into amd-staging

commit 998d824044adac0d54c921dcd44fb58f3d54aad2
Merge: 0d13e9a4 6b5a4032
Author: praveeng <praveen.g@amd.com>
Date: Wed Nov 16 14:22:42 2016 +0530

Merge master code till devinamatthews/omp_num_thrds 2016_11_16 to amd-staging

Change-Id: I601ff1d3ec8a680e1be039ffc7b299744e8a27c5

commit 6b5a4032d2e3ed29a272c7f738b7e3ed6657e556
Merge: 3b524a08 a8220e3a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 10 15:28:24 2016 -0600

Merge pull request #109 from devinamatthews/omp_num_threads

Add automatic loop thread assignment.

commit a8220e3a86433b5d76789e32ea7ca014a11b6d17
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Nov 10 14:19:34 2016 -0600

- Fix typo in bli_cntx.c
- Bump BLIS_DEFAULT_NR_THREAD_MAX to 4

commit e35d3c23f28784e50ee13d2e77a69d60e0c24c1f
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Thu Nov 10 14:30:53 2016 +0530

Added new optimized micro-kernel for dotxv routine

Change-Id: I2c544e9b25a454d971ad690353502a55cd668391

commit 0d13e9a4f6f2fcda08f205215240cdf86442d6c6
Merge: e044fa62 3b524a08
Author: praveeng <praveen.g@amd.com>
Date: Mon Nov 7 14:40:41 2016 +0530

bli_kernel.h

Change-Id: I425d089f79497a0de7d1622e829c3ca9edf7f091

commit c05b3862f6241486442b313eff0c8bee7b5e1274
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Fri Nov 4 15:48:02 2016 -0500

Add automatic loop thread assignment.

- Number of threads is determined by BLIS_NUM_THREADS or OMP_NUM_THREADS, but can be overridden by BLIS_XX_NT as before.
- Threads are assigned to loops (ic, jc, ir, and jr) automatically by weighted partitioning and heuristics, both of which are tunable via bli_kernel.h.
- All level-3 BLAS covered.

commit 3b524a08e3fb8380e7b8b2ba835312c51a331570
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 17:45:18 2016 -0500

Consolidated 3m1/4m1 gemmtrsm, trsm ukernel code.

Details:
- Consolidated the macros that define the lower and upper versions of the
gemmtrsm microkernels into a single macro that is instantiated twice.
Did this for both 3m1 and 4m1 microkernels.
- Consolidated lower and upper versions of the trsm microkernels for 3m1
and 4m1 into single files (each).

commit ead231aca635deb3db270f118454e4222c627f31
Merge: d25e6f8b 62987f60
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 2 13:03:50 2016 -0500

Merge pull request #108 from devinamatthews/patch-2

Update .travis.yml with additional tests

commit 62987f60a6a6ff0a75b31d0404f493593ce35ccc
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Nov 2 11:20:37 2016 -0500

Allow KNL to fail

commit 8f9010542c751ae3cbfe6121cb011d8985c1e00d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Nov 2 11:18:32 2016 -0500

Fix some problems with OSX builds:

- Update CPU detection for Intel archs (esp. Skylake)
- Allow clang for the reference config

commit d25e6f8b63c57f30b8a67dffbf4995977cf9f235
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Nov 1 14:35:15 2016 -0500

Can disable trsm_r-specific blocksize constraints.

Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
needed when handling right-side trsm by allowing the matrix on the
right (matrix B) to be triangular, because it involves swapping
register, but not cache, blocksizes (packing A by NR and B by MR)
and then swapping the operands to gemmtrsm just before that kernel
is called. It may be useful to disable these constraints if, for
example, the developer wishes to test the configuration with
a different set of cache blocksizes where only MC % MR = 0 and
NC % NR = 0 are enforced.
- In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
the enforcement of MC % NR = 0 and NC % MR = 0.

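The constraints discussed in the commit above can be sketched as a small run-time check. This is only an illustration under stated assumptions: in BLIS the enforcement happens at compile time via cpp guards in bli_kernel_macro_defs.h, and the function blkszs_ok() and its relax parameter (standing in for BLIS_RELAX_MCNR_NCMR_CONSTRAINTS) are hypothetical.

```c
#include <stdbool.h>

/* Returns true if the cache blocksizes (mc, nc) are compatible with the
   register blocksizes (mr, nr). Hypothetical helper, not a BLIS function. */
bool blkszs_ok( int mc, int nc, int mr, int nr, bool relax )
{
    /* Always enforced: MC % MR = 0 and NC % NR = 0. */
    if ( mc % mr != 0 || nc % nr != 0 ) return false;

    /* Only needed when right-side trsm swaps the register blocksizes:
       MC % NR = 0 and NC % MR = 0. Skipped when relax is true. */
    if ( !relax && ( mc % nr != 0 || nc % mr != 0 ) ) return false;

    return true;
}
```

With MR = 6 and NR = 8, for instance, MC = 102 passes only in the relaxed mode, since 102 is a multiple of 6 but not of 8.
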
commit 1a67e3688edb073a9d44c160e7b0798e08796b8a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 13:53:18 2016 -0500

Bogus commit

Need to trigger another Travis build.

commit 2cd82d67b372cad1bed50cfd99e524f1f40b4e24
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 13:25:50 2016 -0500

Some fixes for .travis.yml

- Switch to gcc-5 to support knl
- Don't run tests in parallel -- it is super slow.
- Use clang on OSX since gcc is only a zombie husk.

commit a3db4e6bdfe745083acf704ab0f51f74ea869538
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Nov 1 10:33:18 2016 -0500

Update .travis.yml with additional tests

- Test knl configuration (without running of course).
- Test openmp and pthreads threading for auto configuration with 4 threads.
- Test auto configuration with and without pthreads on OSX.
- Also, run make in parallel.

I don't know how the `addons:` section works on OSX; hopefully it is just ignored.

commit 8a11a2174a1a5b9426f13bbc5338dc86ab138cdd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 31 19:07:55 2016 -0500

Updates to non-default haswell microkernels.

Details:
- Updated s and d microkernels in bli_gemm_asm_d8x6.c to relax alignment
constraints.
- Added missing c and z microkernels, which are based on the corresponding
kernels in the d6x8 set.
- This completes the d8x6 set (which may be used for situations when it
is desirable to have a microkernel with a column preference).

commit 618f4331eba209803ecab99747872eceb1b5f091
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 31 14:40:51 2016 -0500

Align strides of ct in macrokernels to that of c.

Details:
- Previously, rs_ct and cs_ct, the strides of the temporary microtile used
primarily in the macrokernels' edge case handling, were unconditionally
set to 1 and MR, respectively. However, Devin Matthews noted that this
ought to be changed so that the strides of ct were in agreement with the
strides of C. (That is, if C was row-stored, then ct should be accessed
by rows as well.) The implicit assumption is that the strides of C
have already been adjusted, via induced transposition, if the storage
preference of the microkernel is at odds with the storage of C. So, if
the microkernel prefers row storage, the macrokernel's interior cases
would present row-stored (ideal) microkernel subproblems to the
microkernel, but for edge cases, it would still see column-stored
subproblems (not ideal). This commit fixes this issue. Thanks to Devin
for his suggestion.

commit c2c91e09b4893cb81314774557f728a95080f81e
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Tue Oct 25 21:15:26 2016 -0700

never use libm with Intel compilers

Intel compilers include a highly optimized math library (libimf) that
should be used instead of GNU libm.

yes, this change is for ALL targets, including those that are not
supported by the Intel compiler. there is no harm in doing this, and it
is future-proof in the event that the Intel compilers support other
architectures.

commit 630391002325a589063aec2ab0a7d89ef2e178c0
Merge: 956b3edf 216206c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 25 19:34:51 2016 -0500

Merge pull request #105 from devinamatthews/knl

Support for Intel Knight's Landing.

commit 216206c1d328a865c2192e35a4df6e9aff79a85b
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:56:18 2016 -0500

Fix up for merge to master.

commit 11eb7957abbcdf02d5e312898e094260eadb1209
Merge: cd5b6681 956b3edf
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:51:07 2016 -0500

Merge branch 'master' into knl

# Conflicts:
# frame/thread/bli_thread.h

commit cd5b6681838899283cd94e5427dfda206e7fbabe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 13:49:27 2016 -0500

Don't use %rbp in KNL packing kernels.

commit 956b3edf8eb09480f31f2e861c1b10f9ecbb2e52
Merge: b7e41d71 0662a3c1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 25 13:02:57 2016 -0500

Merge pull request #104 from devinamatthews/misspellings

Add flexible options for thread model (pthread/posix for pthreads etc.).

commit 0662a3c1b1f4644a86bf8e5073d1391808c91b4a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Oct 25 12:42:44 2016 -0500

Add flexible options for thread model (pthread/posix for pthreads etc.).

commit e044fa624008c161de32a39d734cddf1dd22dd41
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Oct 25 13:03:05 2016 +0530

Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault

Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b3ed4933aa0da72ad771fb0fdf1727e5ba9ad7b4
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Tue Oct 25 13:03:05 2016 +0530

Changed double precision trsm kernel macro definition to bli_dtrsm_l_int_6x8 from 6x16 : it fixes the seg fault

Change-Id: Ia8c1de5fe13a370d691570a50136d55ffb18908a

commit b7e41d71b07d2af6d22d632c70e0c5f7ce46852c
Merge: 4bd905bd 5117d444
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 24 16:47:46 2016 -0500

Merge pull request #103 from devinamatthews/patch-1

Change .align to .p2align in Bulldozer ukernels.

commit 5117d444f7f3a2bc327f067926eaf2398212edda
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Oct 24 16:20:47 2016 -0500

Change .align to .p2align in Bulldozer ukernels

Apparently OSX doesn't allow .align directives for >16B, so I've changed these to their .p2align counterparts.

commit 4bd905bd4597e0ad7bedf31e25e779d3e2dfda29
Merge: 936d5fdc 7f32dd57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 21 14:48:44 2016 -0500

Merge pull request #93 from ShadenSmith/config_check

Adds sanity check to configuration choice.

commit 936d5fdc26c6c4dab199a8d11fde948975cfa1d6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 21 14:34:27 2016 -0500

Fixed multithreading compilation bug in 970745a.

Details:
- Moved the definition of the cpp macro BLIS_ENABLE_MULTITHREADING
from bli_thread.h to bli_config_macro_defs.h. Also moved the
sanity check that OpenMP and POSIX threads are not both enabled.
- Thanks to Krzysztof Drewniak for reporting this bug.

commit d250e6a3af3af8beedcda28f508ac03e94efb3c8
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
Date: Thu Oct 20 14:34:39 2016 +0530

Merged TRSM and scalv routines into zen folder

Change-Id: Ice897bc83e8fb70b90f23cc3ce892c39883aceb9

commit 8feb0f85a674e84bec2417486e3bcea584b14c04
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 19 16:05:41 2016 -0500

Removed auto-prototyping of malloc()/free() substitutes.

Details:
- Removed the header file, bli_malloc_prototypes.h, which automatically
generated prototypes for the functions specified by the following
cpp macros:
BLIS_MALLOC_INTL
BLIS_FREE_INTL
BLIS_MALLOC_POOL
BLIS_FREE_POOL
BLIS_MALLOC_USER
BLIS_FREE_USER
These prototypes were originally provided primarily as a convenience
to those developers who specified their own malloc()/free() substitutes
for one or more of the macros above. However, we generated these prototypes
regardless, even when the default values (malloc and free) of the
macros above were used. A problem arose under certain circumstances
(e.g., gcc in C++ mode on Linux with glibc) when including blis.h that
stemmed from the "throw" specification which was added to glibc's
malloc() prototype, resulting in a prototype mismatch. Therefore, going
forward, developers who specify their own custom malloc()/free()
substitutes must also prototype those substitutes via bli_kernel.h.
Thanks to Krzysztof Drewniak for reporting this bug, and Devin Matthews
for researching the nature and potential solutions.

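As a rough illustration of what the commit above asks of developers, the sketch below prototypes a pair of custom allocation functions before pointing the user-level macros at them. BLIS_MALLOC_USER and BLIS_FREE_USER are the macro names listed above; my_malloc(), my_free(), and the my_live_allocs counter are hypothetical and exist only for this example. In a real configuration the prototypes and #defines would live in bli_kernel.h.

```c
#include <stddef.h>
#include <stdlib.h>

/* Prototypes for the hypothetical substitutes. After this commit these
   are no longer generated automatically and must be written by hand. */
void* my_malloc( size_t size );
void  my_free( void* p );

/* Point the user-level allocation macros at the substitutes. */
#define BLIS_MALLOC_USER my_malloc
#define BLIS_FREE_USER   my_free

/* Example definitions: thin wrappers that count live allocations. */
static size_t my_live_allocs = 0;

void* my_malloc( size_t size )
{
    void* p = malloc( size );
    if ( p != NULL ) ++my_live_allocs;
    return p;
}

void my_free( void* p )
{
    if ( p == NULL ) return;
    --my_live_allocs;
    free( p );
}
```
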
commit 970745a5fc7c29de3e202988e5eb104fabca4fdc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 19 15:58:03 2016 -0500

Reorganized typedefs to avoid compiler warnings.

Details:
- Relocated membrk_t definition from bli_membrk.h to bli_type_defs.h.
- Moved #include of bli_malloc.h from blis.h to bli_type_defs.h.
- Removed standalone mtx_t and mutex_t typedefs in bli_type_defs.h.
- Moved #include of bli_mutex.h from bli_thread.h to bli_type_defs.h.
- The redundant typedefs of membrk_t and mtx_t caused a warning on some C
compilers. Thanks to Tyler Smith for reporting this issue.

commit 1c2f7b57d557c05f5ef6148cccafaf0f70d910da
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Tue Oct 18 15:06:35 2016 +0530

Removed symlinks to zen kernels from haswell kernel folder and also modified the bli_kernel.h file accordingly

Change-Id: Ib3736af48e851c8243bbe10d937fb942c49ad048

commit d864ea9f4f039fe2b2dc395d0015bd9e8902bc8e
Merge: 7045fcbf 28b2af8a
Author: praveeng <praveen.g@amd.com>
Date: Fri Oct 14 17:00:57 2016 +0530

Merge master code 2016_10_14 till Added disabled code thrinfo_t structures

Change-Id: If7db98d286c1471fcd30f00757abee9b253ef987

commit 28b2af8a71133ce68774e153b6e05afb05affba8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 13 14:50:08 2016 -0500

Added disabled code to print thrinfo_t structures.

Details:
- Added cpp-guarded code to bli_thrcomm_openmp.c that allows a curious
developer to print the contents of the thrinfo_t structures of each
thread, for verification purposes or just to study the way thread
information and communicators are used in BLIS.
- Enabled some previously-disabled code in bli_l3_thrinfo.c for freeing
an array of thrinfo_t* values that is used in the new, cpp-guarded code
mentioned above.
- Removed some old commented lines from bli_gemm_front.c.

commit 11eed3f683d09e65f721567b346b0f733bff9a64
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 13 14:23:23 2016 -0500

Fixed a configure -t omp/openmp bug from fd04869.

Details:
- Forgot to update certain occurrences of "omp" in common.mk during
commit fd04869, which changed the preferred configure option string
for enabling OpenMP from "omp" to "openmp".

commit 7045fcbf0bd349ebe6cb9ac4508c6a387bb05966
Merge: 7e044900 9cda6057
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 13 12:02:28 2016 +0530

Merge master code 2016_10_13 Removed previously renamed/old files

Change-Id: I8106d371afaa0af474a8967388d44481b05de923

commit 7e04490002206d3557fcfb7dd893838a7f36916f
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Oct 12 16:43:02 2016 +0530

Checked in the SAMAX optimizations

Change-Id: I7faf8c3adf52ff01432188ad3b9866ee4b9a9dfd

commit 9cda6057eaa16a24ac8785a9fa167df6c9edba44
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 11 13:21:26 2016 -0500

Removed previously renamed/old files.

Details:
- Removed frame/base/bli_mem.c and frame/include/bli_auxinfo_macro_defs.h,
both of which were renamed/removed in 701b9aa. For some reason, these
files survived when the compose branch was merged back into master.
(Clearly, git's merging algorithm is not perfect.)
- Removed frame/base/bli_mem.c.prev (an artifact of the long-ago changed
memory allocator that I was keeping around for no particular reason).

commit 22377abd84b9e560ffe1c4e4d284eb443ddb7133
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 10 13:43:56 2016 -0500

Fixed bli_gemm() segfault on empty C matrices.

Details:
- Fixed a bug that would manifest in the form of a segmentation fault
in bli_cntl_free() when calling any level-3 operation on an empty
output matrix (ie: m = n = 0). Specifically, the code previously
assumed that the entire control tree was built prior to it being
freed. However, if the level-3 operation performs an early exit, the
control tree will be incomplete, and this scenario is now handled.
Thanks to Elmar Peise for reporting this bug.

commit 0b571cd94d9b175331c9453258a6b1389a718ae8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 6 14:48:15 2016 -0500

Fixed segfault in bli_free_align() for NULL ptrs.

Details:
- Fixed a bug in bli_free_align() caused by failing to handle NULL pointers
up-front, which led to performing pointer arithmetic on NULL pointers in
order to free the address immediately before the pointer. Thanks to Devin
Matthews for reporting this bug.

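The class of bug described in the commit above can be sketched with a generic aligned allocator that stores the original malloc() pointer immediately before the aligned address. This is not the BLIS implementation; my_malloc_align() and my_free_align() are hypothetical names, and the sketch assumes that align is a power of two.

```c
#include <stdint.h>
#include <stdlib.h>

/* Allocate size bytes aligned to align (assumed to be a power of two). */
void* my_malloc_align( size_t size, size_t align )
{
    /* The stashed pointer itself needs pointer-sized alignment. */
    if ( align < sizeof( void* ) ) align = sizeof( void* );

    /* Over-allocate: room for alignment padding plus the stashed pointer. */
    void* p_orig = malloc( size + align + sizeof( void* ) );
    if ( p_orig == NULL ) return NULL;

    /* Skip past the stash slot, then round up to a multiple of align. */
    uintptr_t addr = ( uintptr_t )p_orig + sizeof( void* );
    addr = ( addr + align - 1 ) / align * align;

    /* Store the original pointer immediately before the aligned address. */
    ( ( void** )addr )[ -1 ] = p_orig;

    return ( void* )addr;
}

/* Free memory obtained from my_malloc_align(). */
void my_free_align( void* p )
{
    /* Handle NULL up-front; without this guard, the lookup below would
       read from an address computed from a NULL pointer. */
    if ( p == NULL ) return;

    free( ( ( void** )p )[ -1 ] );
}
```
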
commit cd84fb95182514601d72c78ee0e36a394d0284d7
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 6 15:08:21 2016 +0530

syntax errors in configure file

Change-Id: Ibe8a6071aad97df550df64c009fec33a9d8f43a1

commit f2e7ea113aa93b74f1d42408d5db2c5a7b00a653
Merge: 133983c3 86969873
Author: praveeng <praveen.g@amd.com>
Date: Thu Oct 6 12:35:30 2016 +0530

conflicts merge for bli_kernel.h

Change-Id: I15d846bd34e11f86ebfd7ed091ff671a1f3366a0

commit 133983c36fa01c7acb6d666b3744f77f216314a5
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Thu Oct 6 11:26:22 2016 +0530

code clean up in bli_kernel.h

Change-Id: I11d9cdf2af8e8199209eb084f6c3a7c910b83d5d

commit 4fb9b4ef2e4cf2626a6e000a41628fb823f16da8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 14:41:35 2016 -0500

CHANGELOG update (0.2.1)

commit 866b2dde3f41760121115fb25f096d4344e8b4f9 (tag: 0.2.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 14:41:34 2016 -0500

Version file update (0.2.1)

commit 87fddeab3c8a5ccb1bbf02e5f89db1464e459ba9
Merge: 86969873 6f71cd34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 5 13:35:01 2016 -0500

Merge branch 'compose'

commit 6f71cd344951854e4cff9ea21bbdfe536e72611d (origin/compose)
Merge: c0630c40 8d55033c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 4 15:53:46 2016 -0500

Merge pull request #94 from flame/distcomm

Implemented distributed thrinfo_t management.

commit 86969873b5b861966d717d8f9f370af39e3d9de6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Oct 4 14:24:59 2016 -0500

Reclassified amaxv operation as a level-1v kernel.

Details:
- Moved amaxv from being a utility operation to being a level-1v operation.
This includes the establishment of a new amaxv kernel to live beside all
of the other level-1v kernels.
- Added two new functions to bli_part.c:
bli_acquire_mij()
bli_acquire_vi()
The first acquires a scalar object for the (i,j) element of a matrix,
and the second acquires a scalar object for the ith element of a vector.
- Added integer support to bli_getsc level-0 operation. This involved
adding integer support to the bli_*gets level-0 scalar macros.
- Added a new test module to test amaxv as a level-1v operation. The test
module works by comparing the value identified by bli_amaxv() to the
value found from a reference-like code local to the test module
source file. In other words, it (intentionally) does not guarantee the
same index is found; only the same value. This allows for different
implementations in the case where a vector contains two or more elements
containing exactly the same floating point value (or values, in the case
of the complex domain).
- Removed the directory frame/include/old/.

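The value-based comparison described for the amaxv test module above might look roughly like the following for real double-precision vectors. The names ref_amaxv() and amaxv_index_ok() are hypothetical, the sketch compares absolute values (an assumption about how "value" is interpreted), and NaN handling is ignored.

```c
#include <stdbool.h>
#include <stddef.h>

/* Absolute value without pulling in libm. */
static double abs_d( double a ) { return a < 0.0 ? -a : a; }

/* Reference-like search: index of the first element with the largest
   absolute value (illustrative name; NaNs are not considered here). */
size_t ref_amaxv( size_t n, const double* x )
{
    size_t i_max = 0;

    for ( size_t i = 1; i < n; ++i )
        if ( abs_d( x[ i ] ) > abs_d( x[ i_max ] ) ) i_max = i;

    return i_max;
}

/* Accept any index whose element matches the reference value, so ties
   between equal-magnitude elements do not cause a failure. */
bool amaxv_index_ok( size_t n, const double* x, size_t idx )
{
    if ( n == 0 || idx >= n ) return false;

    return abs_d( x[ idx ] ) == abs_d( x[ ref_amaxv( n, x ) ] );
}
```

Checking values rather than indices means an optimized kernel that scans the vector in a different order is not penalized for reporting a different, equally valid, index.
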
commit 8d55033c966feed99fcca2a58017c3ab5b1646dc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 15:20:58 2016 -0500

Implemented distributed thrinfo_t management.

Details:
- Implemented Ricardo Magana's distributed thread info/communicator
management. Rather than fully construct the thrinfo_t structures, from
root to leaf, prior to spawning threads, the threads individually
construct their thrinfo_t trees (or, chains), and do so incrementally,
as needed, reusing the same structure nodes during subsequent blocked
variant iterations. This required moving the initial creation of the
thrinfo_t structure (now, the root nodes) from the _front() functions
to the bli_l3_thread_decorator(). The incremental "growing" of the tree
is performed in the internal back-end (ie: _int()) function, and so
is mostly invisible. Also, the incremental growth of the thrinfo_t tree is
done as a function of the current and parent control tree nodes (as well
as the parent thrinfo_t node), further reinforcing the parallel
relationship between the two data structures.
- Removed the "inner" communicator from thrinfo_t structure definition,
as well as its id. Changed all APIs accordingly. Renamed
bli_thrinfo_needs_free_comms() to bli_thrinfo_needs_free_comm().
- Defined bli_l3_thrinfo_print_paths(), which prints the information
in an array of thrinfo_t* structure pointers. (Used only as a
debugging/verification tool.)
- Deprecated the following thrinfo_t creation functions:
bli_packm_thrinfo_create()
bli_l3_thrinfo_create()
because they are no longer used. bli_thrinfo_create() is now called
directly when creating thrinfo_t nodes.

commit fd04869ae4d4a3b0ebb9052557c296456bce7c0d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 14:14:11 2016 -0500

Changed configure's 'omp' threading to 'openmp'.

Details:
- Changed the configure script so that the expected string argument to the
-t (or --enable-threading=) option that enables OpenMP multithreading is
'openmp'. The previous expected string, 'omp', is still supported but
should be considered deprecated.

commit 9424af87209e4e435e2e742430945152690170b0
Merge: efa7341d c0630c40
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 27 12:51:08 2016 -0500

Merge branch 'compose'

commit 7f32dd57c6bd41c0704341752842277dd6a4c8eb
Author: Shaden Smith <shaden@cs.umn.edu>
Date: Sat Sep 17 11:33:57 2016 -0500

Adds sanity check to configuration choice.

commit efa7341df0b0115926aa8a6e8a4ebfb24fdbf11e
Merge: 121c39d4 e1453f68
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 16 11:01:57 2016 -0500

Merge pull request #92 from ShadenSmith/readme_fix

Fixes broken URL in README.md

commit e1453f68f6afd90ae9a29b7a5faa46aa79bbf741
Author: Shaden Smith <ShadenTSmith@gmail.com>
Date: Fri Sep 16 09:29:28 2016 -0500

Fixes broken URL in README.md

commit b922d7563422e14c49a4677bc6ae088a408861ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Aug 23 13:38:36 2016 -0500

Avoid compiling BLAS/CBLAS files when disabled.

Details:
- Updated the top-level Makefile, build/config.mk.in template, and
configure script so that object files corresponding to source files
belonging to the BLAS compatibility layer are not compiled (or archived)
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
to Devin Matthews for suggesting this optimization.
- Slight change to the way configure handles internal variables. Instead
of converting (overwriting) some, such as enable_blas2blis and
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
now stored in new variables that live alongside the originals (with the
suffix "_01"). This is convenient since some values need to be
sed-substituted into the config.mk.in template, which requires "yes" or
"no", while some need to be written to the bli_config.h.in template,
which requires "0" or "1".

Updated BLIS4 TOMS citation in README.md.

Added complex gemm micro-kernels for haswell.

Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).

Change-Id: I512ab90784ecbb7cdaee24928d2ccebb544ba5c1

commit 69826110bab2a064ec76457c24843d28f2581281
Merge: 64598ee4 a58dd35e
Author: Pradeep Rao <Pradeep.Rao@amd.com>
Date: Wed Sep 14 03:26:25 2016 -0400

Merge "Implemented trsm single precision for lower triangular matrices, files added bli_trsm_l_int_6x16.c; files modified bli_kernel.h to enable optimized trsm microkernel and test_trsm.c is modified to test trsm single precision" into amd-staging

commit c0630c4024b08750043a2942a3e8a037aa6b6259
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 12 13:59:02 2016 -0500

Added debugging printf()'s to bli_l3_thrinfo.c.

Details:
- Added optional printf() statements to print out thread communicator
info as the thrinfo_t structure is built in bli_l3_thrinfo.c.
- Minor changes to frame/thread/bli_thrinfo.h.

commit 7b3bf1ffcd7160ccbf6c2518af6d88f6742e4977
Merge: 35509818 121c39d4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 6 15:47:13 2016 -0500

Merge branch 'master' into compose

commit 121c39d455f2db6f7ce6802ba7f73ad5e088c68c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 5 13:11:42 2016 -0500

Added complex gemm micro-kernels for haswell.

Details:
- Defined cgemm (3x8) and zgemm (3x4) micro-kernels for haswell-based
architectures. As with their real domain brethren, these kernels prefer
row storage (though this doesn't affect most users due to high-level
optimizations in most level-3 operations that induce a transpose to
whatever storage preference the kernel may have).

commit 35509818cbea1598b123421f81c42120889a03c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Aug 31 17:34:15 2016 -0500

Added, moved some thread barriers.

Details:
- Removed thread barriers from the end of the loop bodies of
bli_gemm_blk_var1(), bli_gemm_blk_var2(), bli_trsm_blk_var1(),
and bli_trsm_blk_var2().
- Moved the thread barrier at the end of bli_packm_int() to the
end of bli_l3_packm(), and added missing barriers to that function.
- Removed the no longer necessary (and now incorrect) ochief guard
in bli_gemm3m3_packa() on the bli_obj_scalar_reset() on C.
- Thanks to Tyler Smith for help with these changes.

commit 64598ee4cfb86f64abbd4bcef5a82ba0d5565b67
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Wed Aug 31 12:54:50 2016 +0530
|
||
|
||
fixed the symlink issue
|
||
|
||
Change-Id: I2186d529f295c576597c189e1ae219bc1a83f955
|
||
|
||
commit abd61f9fa75d77a96d1491b3e035451ee73238fe
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 30 12:34:19 2016 -0500
|
||
|
||
Updated BLIS4 TOMS citation in README.md.
|
||
|
||
commit 8a2373f26ba8fcd5b2d7b2cc72cb8b2e1f841a03
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Mon Aug 29 14:10:45 2016 +0530
|
||
|
||
Norm 2 optimization
|
||
|
||
Change-Id: Ide9decaccd20bf0ccc32c9abb6556e038dceed2b
|
||
|
||
commit fdc663902347aa252ea88cf09ce24ab748958dff
|
||
Author: sthangar <Santanu.Thangaraj@amd.com>
|
||
Date: Mon Aug 29 10:43:38 2016 +0530
|
||
|
||
Placed 1 and 1f AMD optimized AVX routines under zen folder
|
||
|
||
Change-Id: I26795211ef11d232ed794ce36dd0a9c1f8706328
|
||
|
||
commit 701b9aa3ff028decbf90efac0dca5bd64fe26269
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Aug 26 19:04:45 2016 -0500
|
||
|
||
Redesigned control tree infrastructure.
|
||
|
||
Details:
|
||
- Altered control tree node struct definitions so that all nodes have the
|
||
same struct definition, whose primary fields consist of a blocksize id,
|
||
a variant function pointer, a pointer to an optional parameter struct,
|
||
and a pointer to a (single) sub-node. This unified control tree type is
|
||
now named cntl_t.
|
||
- Changed the way control tree nodes are connected, and what computation
|
||
they represent, such that, for example, packing operations are now
|
||
associated with nodes that are "inline" in the tree, rather than off-
|
||
shoot branches. The original tree for the classic Goto gemm algorithm was
|
||
expressed (roughly) as:
|
||
|
||
blk_var2 -> blk_var3 -> blk_var1 -> ker_var2
|
||
| |
|
||
-> packb -> packa
|
||
|
||
and now, the same tree would look like:
|
||
|
||
blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2
|
||
|
||
Specifically, the packb and packa nodes perform their respective packing
|
||
operations and then recurse (without any loop) to a subproblem. This means
|
||
there are now two kinds of level-3 control tree nodes: partitioning and
|
||
non-partitioning. The blocked variants are members of the former, because
|
||
they iteratively partition off submatrices and perform suboperations on
|
||
those partitions, while the packing variants belong to the latter group.
|
||
(This change has the effect of allowing greatly simplified initialization
|
||
of the nodes, which previously involved setting many unused node fields to
|
||
NULL.)
|
||
- Changed the way thrinfo_t tree nodes are arranged to mirror the new
|
||
connective structure of control trees. That is, packm nodes are no longer
|
||
off-shoot branches of the main algorithmic nodes, but rather connected
|
||
"inline".
|
||
- Simplified control tree creation functions. Partitioning nodes are created
|
||
concisely with just a few fields needing initialization. By contrast, the
|
||
packing nodes require additional parameters, which are stored in a
|
||
packm-specific struct that is tracked via the optional parameters pointer
|
||
within the control tree struct. (This parameter struct must always begin
|
||
with a uint64_t that contains the byte size of the struct. This allows
|
||
us to use a generic function to recursively copy control trees.) gemm,
|
||
herk, and trmm control tree creation continues to be consolidated into
|
||
a single function, with the operation family being used to select
|
||
among the parameter-agnostic macro-kernel wrappers. A single routine,
|
||
bli_cntl_free(), is provided to free control trees recursively, whereby
|
||
the chief thread within a group releases the blocks associated with
|
||
mem_t entries back to the memory broker from which they were acquired.
|
||
- Updated internal back-ends, e.g. bli_gemm_int(), to query and call the
|
||
function pointer stored in the current control tree node (rather than
|
||
index into a local function pointer array). Before being invoked, these
|
||
function pointers are first cast to a gemm_voft (for gemm, herk, or trmm
|
||
families) or trsm_voft (for trsm family) type, which is defined in
|
||
frame/3/bli_l3_var_oft.h.
|
||
- Retired herk and trmm internal back-ends, since all execution now flows
|
||
through gemm or trsm blocked variants.
|
||
- Merged forwards- and backwards-moving variants by querying the direction
|
||
from routines as a function of the variant's matrix operands. gemm and
|
||
herk always move forward, while trmm and trsm move in a direction that
|
||
is dependent on which operand (a or b) is triangular.
|
||
- Added functions bli_thread_get_range_mdim(), bli_thread_get_range_ndim(),
|
||
each of which takes additional arguments and hides complexity in managing
|
||
the difference between the way ranges are computed for the four families
|
||
of operations.
|
||
- Simplified level-3 blocked variants according to the above changes, so that
|
||
the only steps taken are:
|
||
1. Query partitioning direction (forwards or backwards).
|
||
2. Prune unreferenced regions, if they exist.
|
||
3. Determine the thread partitioning sub-ranges.
|
||
<begin loop>
|
||
4. Determine the partitioning blocksize (passing in the partitioning
|
||
direction)
|
||
5. Acquire the current iteration's partitions for the matrices affected
|
||
by the current variant's partitioning dimension (m, k, n).
|
||
6. Call the subproblem.
|
||
<end loop>
|
||
- Instantiate control trees once per thread, per operation invocation.
|
||
(This is a change from the previous regime in which control trees were
|
||
treated as stateless objects, initialized with the library, and shared
|
||
as read-only objects between threads.) This once-per-thread allocation
|
||
is done primarily to allow threads to use the control tree as a place
|
||
to cache certain data for use in subsequent loop iterations. Presently,
|
||
the only application of this caching is a mem_t entry for the packing
|
||
blocks checked out from the memory broker (allocator). If a non-NULL
|
||
control tree is passed in by the (expert) user, then the tree is copied
|
||
by each thread. This is done in bli_l3_thread_decorator(), in
|
||
bli_thrcomm_*.c.
|
||
- Added a new field to the context, an opid_t, which tracks the "family"
|
||
of the operation being executed. For example, gemm, hemm, and symm are
|
||
all part of the gemm family, while herk, syrk, her2k, and syr2k are
|
||
all part of the herk family. Knowing the operation's family is necessary
|
||
when conditionally executing the internal (beta) scalar reset on
|
||
C in blocked variant 3, which is needed for gemm and herk families,
|
||
but must not be performed for the trmm family (because beta has only
|
||
been applied to the current row-panel of C after the first rank-kc
|
||
iteration).
|
||
- Reexpressed 3m3 induced method blocked variant in frame/3/gemm/ind
|
||
to conform with the new control tree design, and renamed the macro-
|
||
kernel codes corresponding to 3m2 and 4m1b.
|
||
- Renamed bli_mem.c (and its APIs) to bli_memsys.c, and renamed/relocated
|
||
bli_mem_macro_defs.h from frame/include to frame/base/bli_mem.h.
|
||
- Renamed/relocated bli_auxinfo_macro_defs.h from frame/include to
|
||
frame/base/bli_auxinfo.h.
|
||
- Fixed a minor bug whereby the storage-to-ukr-preference matching
|
||
optimization in the various level-3 front-ends was not being applied
|
||
properly when the context indicated that execution would be via an
|
||
induced method. (Before, we always checked the native micro-kernel
|
||
corresponding to the datatype being executed, whereas now we check
|
||
the native micro-kernel corresponding to the datatype's real projection,
|
||
since that is the micro-kernel that is actually used by induced methods.)
|
||
- Added an option to the testsuite to skip the testing of native level-3
|
||
complex implementations. Previously, it was always tested, provided that
|
||
the c/z datatypes were enabled. However, some configurations use
|
||
reference micro-kernels for complex datatypes, and testing these
|
||
implementations can slow down the testsuite considerably.
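The unified node layout and the size-prefixed parameter struct described above can be sketched in C. This is only an illustration under stated assumptions: node_t, pack_params_t, the BSZ_* ids, and every function name below are hypothetical stand-ins, not the actual cntl_t definition or BLIS API. The sketch shows why a leading uint64_t byte size lets one generic routine copy any node's optional parameters, and builds the "inline" chain blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical blocksize ids; packing nodes do not partition, so they use BSZ_NONE. */
enum { BSZ_NONE = -1, BSZ_NC, BSZ_KC, BSZ_MC };

typedef struct node_s node_t;
typedef void (*variant_fp)(node_t *node);

/* One struct definition shared by every node (a stand-in for cntl_t). */
struct node_s {
    const char *name;      /* label used only by this illustration */
    int         bszid;     /* blocksize id */
    variant_fp  var_func;  /* variant function pointer (left NULL here) */
    void       *params;    /* optional parameter struct, or NULL */
    node_t     *sub_node;  /* single sub-node, or NULL at the leaf */
};

/* Hypothetical packing parameters: the byte size must come first. */
typedef struct {
    uint64_t size;
    int      pack_schema;
} pack_params_t;

void *pack_params_create(int schema)
{
    pack_params_t *p = malloc(sizeof *p);
    assert(p != NULL);
    p->size = sizeof *p;
    p->pack_schema = schema;
    return p;
}

node_t *node_create(const char *name, int bszid, void *params, node_t *sub)
{
    node_t *n = malloc(sizeof *n);
    assert(n != NULL);
    n->name = name;
    n->bszid = bszid;
    n->var_func = NULL;
    n->params = params;
    n->sub_node = sub;
    return n;
}

/* Generic recursive copy: reads the size prefix, so it never needs to
   know which concrete parameter struct a node carries. */
node_t *node_copy(const node_t *src)
{
    if (src == NULL) return NULL;
    node_t *dst = malloc(sizeof *dst);
    assert(dst != NULL);
    *dst = *src;
    if (src->params != NULL) {
        uint64_t size;
        memcpy(&size, src->params, sizeof size);
        dst->params = malloc((size_t)size);
        assert(dst->params != NULL);
        memcpy(dst->params, src->params, (size_t)size);
    }
    dst->sub_node = node_copy(src->sub_node);
    return dst;
}

/* Recursive free of a whole chain, including parameter structs. */
void node_free(node_t *n)
{
    if (n == NULL) return;
    node_free(n->sub_node);
    free(n->params);
    free(n);
}

int node_depth(const node_t *n)
{
    int depth = 0;
    for (; n != NULL; n = n->sub_node) ++depth;
    return depth;
}

/* blk_var2 -> blk_var3 -> packb -> blk_var1 -> packa -> ker_var2 */
node_t *build_gemm_chain(void)
{
    node_t *ker   = node_create("ker_var2", BSZ_NONE, NULL, NULL);
    node_t *packa = node_create("packa", BSZ_NONE, pack_params_create(0), ker);
    node_t *var1  = node_create("blk_var1", BSZ_MC, NULL, packa);
    node_t *packb = node_create("packb", BSZ_NONE, pack_params_create(1), var1);
    node_t *var3  = node_create("blk_var3", BSZ_KC, NULL, packb);
    return node_create("blk_var2", BSZ_NC, NULL, var3);
}
```

A per-thread copy in the spirit of the once-per-thread instantiation would call node_copy() on a shared chain and node_free() on the copy when the operation completes.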
|
||
|
||
commit a58dd35ed7b5b77a6b272655d2edd7a822b8fa87
|
||
Author: Kiran Varaganti <Kiran.Varaganti@amd.com>
|
||
Date: Fri Aug 26 14:55:12 2016 +0530
|
||
|
||
Implemented trsm single precision for lower triangular matrices. Files added: bli_trsm_l_int_6x16.c. Files modified: bli_kernel.h (to enable optimized trsm microkernel) and test_trsm.c (to test trsm single precision)
|
||
|
||
Change-Id: Ibddf989f4aad577e89558673e1038cf6ece654d9
|
||
|
||
commit 73517f522b69de429dd7f3df60a70c068149ab28
|
||
Merge: c6f5c215 50293da3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 23 13:46:59 2016 -0500
|
||
|
||
Merge branch 'master' into compose
|
||
|
||
commit 50293da38d5f2b7be9bbc94b9e85aacb6a10f672
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 23 13:38:36 2016 -0500
|
||
|
||
Avoid compiling BLAS/CBLAS files when disabled.
|
||
|
||
Details:
|
||
- Updated the top-level Makefile, build/config.mk.in template, and
|
||
configure script so that object files corresponding to source files
|
||
belonging to the BLAS compatibility layer are not compiled (or archived)
|
||
when the compatibility layer is disabled. (Same for CBLAS.) Thanks
|
||
to Devin Matthews for suggesting this optimization.
|
||
- Slight change to the way configure handles internal variables. Instead
|
||
of converting (overwriting) some, such as enable_blas2blis and
|
||
enable_cblas, from a "yes" or "no" to a "1" or "0" value, the latter are
|
||
now stored in new variables that live alongside the originals (with the
|
||
suffix "_01"). This is convenient since some values need to be
|
||
sed-substituted into the config.mk.in template, which requires "yes" or
|
||
"no", while some need to be written to the bli_config.h.in template,
|
||
which requires "0" or "1".
|
||
|
||
commit 22dd6a353ddb56614309c01533b1a94c9fd32bca
|
||
Merge: cdfb3c3f f20ed388
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Aug 23 15:15:35 2016 +0530
|
||
|
||
Merge master code as on 2016_08_23 to amd-staging branch by praveeng
|
||
|
||
Changes to be committed:
|
||
modified: frame/thread/bli_mutex_openmp.h
|
||
modified: frame/thread/bli_mutex_pthreads.h
|
||
|
||
Change-Id: Ica522edbb1d0173f53f38d5057b1f7aef73666be
|
||
|
||
commit c6f5c215ee793d03ea834469fc2adc53feaffc42
|
||
Merge: d52cb767 16a4c7a8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Aug 22 17:33:02 2016 -0500
|
||
|
||
Merge branch 'master' into compose
|
||
|
||
commit f20ed3885d628992fab88690f629a5a2bab3eb88
|
||
Merge: 02ac597e 4bc842ca
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Aug 22 15:27:33 2016 +0530
|
||
|
||
Merge branch 'master' of https://github.com/clMathLibraries/blis-amd for "Fixed bugs in bli_mutex_init() and friends."
|
||
|
||
commit 02ac597e4b9be2670d9fff65d28552f8e1ec81b3
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:11:08 2016 +0530
|
||
|
||
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
|
||
|
||
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
|
||
|
||
commit 84e41cc73c9c87ce64582acd4264b8e1b5316482
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:01:36 2016 +0530
|
||
|
||
Revert commits 8aee306
|
||
|
||
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
|
||
|
||
commit 30ccfcee82db93d0109d1571242e2db925e95d0a
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 14:14:00 2016 +0530
|
||
|
||
removed changes from readme file which are giving conflicts
|
||
|
||
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
|
||
|
||
commit aeca25cd63fc8971f8fe7809599c57853f976548
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit 6b2274864b36fd1019d97bcc4ca6dd7a57ef16d9
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit daa7a9ecb25982f2551adbd95e65f8ba97cfe944
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit 5f66a4aa05aeffcb6eb587851d78d9527319466c
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit c6cbd78d2388c08824822b91a1c36ac4349bb67f
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:11:08 2016 +0530
|
||
|
||
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
|
||
|
||
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
|
||
|
||
commit 9219a9060762525f87ebbf556d78fe8621858513
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:01:36 2016 +0530
|
||
|
||
Revert commits 8aee306
|
||
|
||
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
|
||
|
||
commit 728573296efa7cf14d2381570e116509dfe2a240
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 14:14:00 2016 +0530
|
||
|
||
removed changes from readme file which are giving conflicts
|
||
|
||
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
|
||
|
||
commit ad7862e291c240505c733a41d231b1a126ade73c
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit ad4b471a25ce77867295e5529dfc787e7c18b03f
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit 55d641363fcd8bdfdabbd7c22822fa2d0b7f3fa6
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit f3b6b15f6d591d323802bd6c81c522a02056506d
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit 16a4c7a823d60707ed9272f5d36e5c5d54c0ba4b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Aug 19 11:38:36 2016 -0500
|
||
|
||
Fixed bugs in bli_mutex_init() and friends.
|
||
|
||
Details:
|
||
- Fixed a couple of bugs that affected OpenMP and POSIX threads
|
||
configurations that resulted in compiler errors and warnings due
|
||
to type mismatch, and in the case of pthreads, a missing function
|
||
argument. The bugs are fairly recent, introduced in a017062.
|
||
|
||
commit c8e4ef93953ba2b79fb7e0973c08469c0e28a2cd
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Aug 3 16:13:03 2016 -0500
|
||
|
||
Add prefetchw to 30x8 kernel.
|
||
|
||
commit 4b5a2f3d6e7ffeb5cc2be8448554f5c2083ad68f
|
||
Merge: 380736bf 9f52a587
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Aug 3 16:09:51 2016 -0500
|
||
|
||
Merge remote-tracking branch 'origin/knl' into knl
|
||
|
||
# Conflicts:
|
||
# kernels/x86_64/knl/3/bli_dgemm_opt_24x8.c
|
||
|
||
commit 380736bfe955efbdd7274c90b6fd635688e83bc4
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Aug 3 16:08:28 2016 -0500
|
||
|
||
Add (new) 30x8 KNL kernel and fix non-scatter prefetch bug.
|
||
|
||
commit 9f52a587dee855daa73c194e41b6951416544e9a
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Aug 3 16:03:53 2016 -0500
|
||
|
||
Try prefetchw[t1] instead of regular prefetch for C.
|
||
|
||
commit 8945a1512d366bc6a8a85718d12cbf5de6f2898b
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Aug 3 11:28:24 2016 -0500
|
||
|
||
This version gets ~1550 GFLOPs on KNL with 16x4.
|
||
|
||
commit cdfb3c3f29d321033fca106aa58ab67ead90a95d
|
||
Merge: 50a2f2ef 4bc842ca
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Fri Jul 29 12:45:04 2016 +0530
|
||
|
||
Merge master code as on 2016_07_29 to amd-staging branch by praveeng
|
||
|
||
Change-Id: Ic78b84d8b8d10158fb2a612f9a64bbc7b1f9b486
|
||
|
||
commit 4bc842ca3a64e658c0808bfe4c5693a5ace97923
|
||
Merge: 117f8838 b0d510bf
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 17:32:12 2016 +0530
|
||
|
||
Merge branch 'master' of publicrepo
|
||
|
||
commit 117f8838511a478aa16137e770d27dd21f4227c5
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:11:08 2016 +0530
|
||
|
||
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
|
||
|
||
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
|
||
|
||
commit 2fcdc28f1055d385b2e662aa920fb97c472394d7
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:01:36 2016 +0530
|
||
|
||
Revert commits 8aee306
|
||
|
||
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
|
||
|
||
commit 1b5d104afe0628b8b6c0650f1e58cfb08be67004
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 14:14:00 2016 +0530
|
||
|
||
removed changes from readme file which are giving conflicts
|
||
|
||
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
|
||
|
||
commit d81273047bff56501e9413a90991d3d1f8b56a06
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit 65905c3011a11cda95761681d4ae84337e46bdb5
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit 23cca231be10fe1797aed451bcbc69d38c78bc0c
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit 922e3091702f25e3287b417719a33adbd5bbf138
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 15:00:31 2016 +0530
|
||
|
||
small modification to readme for git push test
|
||
|
||
Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a
|
||
|
||
commit b0d510bf0e4dfd177f9e4ae0069f41921e2ecdc1
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:11:08 2016 +0530
|
||
|
||
Revert commits 357c990bdd7bd5667aac5adf1bab3712973e7414
|
||
|
||
Change-Id: I12a34456d7eed93fda4369e76bcddb42ba7ccb99
|
||
|
||
commit 5ebeece5b4a8df81d59ca7558b278a4263d15128
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 28 15:01:36 2016 +0530
|
||
|
||
Revert commits 8aee306
|
||
|
||
Change-Id: I3dd999c77c6779332a40dbb84371ca487216f189
|
||
|
||
commit 6ce4c022ebdea00c2b951090e3c2e9e88735b9ce
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 16:26:36 2016 -0500
|
||
|
||
Switch back to 24x8. I could only squeeze 24.5GFLOP out of 8x24, and scalability is not improved.
|
||
|
||
commit d52cb7671509592a8078729477b40b60380518a2
|
||
Merge: 95abea46 c31b1e7b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 27 16:04:55 2016 -0500
|
||
|
||
Merge branch 'master' into compose
|
||
|
||
commit c31b1e7b9d659b96433a87e5aecb90e457a104cc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 27 15:58:07 2016 -0500
|
||
|
||
Relax alignment restrictions for sandybridge ukrs.
|
||
|
||
Details:
|
||
- Relaxed the base pointer and leading dimension alignment restrictions
|
||
in the sandybridge gemm microkernels, allowing the use of vmovups/vmovupd
|
||
instead of vmovaps/vmovapd. These changes mimic those made to the haswell
|
||
microkernels in e0d2fa0 and ee2c139.
|
||
- Updated testsuite modules as well as standalone test drivers in 'test'
|
||
directory to use DBL_MAX as the initial time candidate. Thanks to Devin
|
||
Matthews for suggesting this change.
|
||
- Inserted #include "float.h" into bli_system.h (to gain access to DBL_MAX).
|
||
- Minor update (vis-a-vis contexts) to driver code in test/3m4m.
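The DBL_MAX change above concerns best-of-N timing. A minimal sketch, assuming a hypothetical helper named best_time() (not a function from the BLIS test drivers): seeding the running minimum with DBL_MAX guarantees that the first real measurement replaces it, however slow that measurement is.

```c
#include <assert.h>
#include <float.h>
#include <stddef.h>

/* Return the smallest of n timing samples (in seconds).
   With no samples, the DBL_MAX seed is returned unchanged. */
double best_time(const double *samples, size_t n)
{
    double best = DBL_MAX;  /* initial time candidate */
    for (size_t i = 0; i < n; ++i) {
        if (samples[i] < best) best = samples[i];
    }
    return best;
}
```

In a driver, each repetition would time the operation and feed the elapsed time into the same comparison instead of reading from an array.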
|
||
|
||
commit b8f2b55532849d45d379afbdd05a52ff6100800d
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 15:22:55 2016 -0500
|
||
|
||
Try an 8x24 kernel for the hell of it.
|
||
|
||
commit 7ede5863ae3567f7c0852efc2d5cd649ca19e0f3
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 13:41:27 2016 -0600
|
||
|
||
Allocate pack buffer on MCDRAM for KNL.
|
||
|
||
commit ad89ed2e829c7b261d8ba0998a3cb83ad576ee04
|
||
Merge: 2c9de740 81e2b05f
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 11:45:40 2016 -0500
|
||
|
||
Merge branch 'knl' of github.com:devinamatthews/blis into knl
|
||
|
||
commit 2c9de740edb66c4692c200731763bbd1d3171ccb
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 11:44:54 2016 -0500
|
||
|
||
This version gets ~26GF on one core.
|
||
|
||
commit 81e2b05f31bca4e1e1676e7b533d1868d9f9be33
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Jul 27 11:39:05 2016 -0500
|
||
|
||
Add optimized packing kernels for KNL.
|
||
|
||
commit a7d8ca97b8d835c32d90ff20a565c82733f014a8
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 15:15:13 2016 -0500
|
||
|
||
All fixed.
|
||
|
||
commit 963d0393b023f4134bb0c682923faf9964c0e645
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 14:40:53 2016 -0500
|
||
|
||
Add 24xk pack kernel.
|
||
|
||
commit 117b76739afba481768897d2580f8365d3345417
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 13:53:07 2016 -0500
|
||
|
||
In the midst of debugging.
|
||
|
||
commit 8c0a4fd1d3535d608a9a309a61ffee0a73c3646f
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 13:09:24 2016 -0500
|
||
|
||
Fix some row/column confusion.
|
||
|
||
commit c44f9f96930312125b15e64c326ab5ab5cc02633
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 12:02:24 2016 -0500
|
||
|
||
Simplify displacements -- clang assembler was badly botching EVEX compressed displacements giving false alarms for instruction length.
|
||
|
||
commit e0cce177cc1b47ec9f11ac0556241feaa3564df1
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Jul 25 10:02:25 2016 -0500
|
||
|
||
Minor fixes for 8x24 KNL kernel.
|
||
|
||
commit 50a2f2efcbeb46537f1deaa8e44dc579a4e49eb8
|
||
Merge: 1aa77dfc cfd46c88
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 17:01:20 2016 +0530
|
||
|
||
Merge master code as on 2016_07_25 to amd-staging branch by praveeng
|
||
|
||
Change-Id: I84886ae241db2aac0bef6b7ef399f04aa8bca16d
|
||
|
||
commit cfd46c88d59c8f61d5e7cf768d606e4c44623584
|
||
Merge: f493bf4d a017062f
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 15:38:13 2016 +0530
|
||
|
||
Merge remote-tracking branch 'publicrepo/master'
|
||
|
||
commit f493bf4d704fe0e967783cd6e6877d3302c056a1
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 25 14:14:00 2016 +0530
|
||
|
||
removed changes from readme file which are giving conflicts
|
||
|
||
Change-Id: Ic71ad1313e1404fed444e899466043704d875af6
|
||
|
||
commit 65735bbedf75784c48bd11e05b3fdc98fc66b4bc
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Sun Jul 24 21:50:32 2016 -0500
|
||
|
||
Switch to 24x8 kernel, unrolled by 16.
|
||
|
||
commit 45d5dc97177117220bd9dd0abf85aafc185acad1
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Sun Jul 24 14:25:26 2016 -0500
|
||
|
||
Add 24x8 "KNC-style" kernel for KNL.
|
||
|
||
commit 95abea46f86816fddfc9ff0abfa52880801461be
|
||
Merge: d0dfe5b5 a017062f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jul 23 15:38:33 2016 -0500
|
||
|
||
Merge branch 'master' into compose
|
||
|
||
commit a017062fdf763037da9d971a028bb07d47aa1c8a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 22 17:02:59 2016 -0500
|
||
|
||
Integrated "memory broker" (membrk_t) abstraction.
|
||
|
||
Details:
|
||
- Integrated a patch originally authored and submitted by Ricardo Magana
|
||
of HP Enterprise. The changeset inserts use of a new object type, membrk_t,
|
||
(memory broker) that allows multiple sets of memory pools on, for example,
|
||
separate NUMA nodes, each of which has a separate memory space.
|
||
- Added membrk field to cntx_t and defined corresponding accessor macros.
|
||
- Added membrk field to mem_t object and defined corresponding accessor macros.
|
||
- Created new bli_membrk.c file, which contains the new memory broker API,
|
||
including:
|
||
bli_membrk_init(), bli_membrk_finalize()
|
||
bli_membrk_acquire_[mv](), bli_membrk_release(),
|
||
bli_membrk_init_pools(), bli_membrk_reinit_pools(),
|
||
bli_membrk_finalize_pools(),
|
||
bli_membrk_pool_size()
|
||
- In bli_mem.c, changed function calls to
|
||
bli_mem_init_pools() -> bli_membrk_init()
|
||
bli_mem_reinit_pools() -> bli_membrk_reinit()
|
||
bli_mem_finalize_pools() -> bli_membrk_finalize()
|
||
- In bli_packv_init.c, bli_packm_init.c, changed function calls to:
|
||
bli_mem_acquire_[mv]() -> bli_membrk_acquire_[mv]()
|
||
bli_mem_release() -> bli_membrk_release()
|
||
- Added bli_mutex.c and related files to frame/thread. These files define
|
||
abstract mutexes (locks) and corresponding APIs for pthreads, openmp, or
|
||
single-threaded execution. This new API is employed within functions
|
||
such as bli_membrk_acquire_[mv]() and bli_membrk_release().
|
||
|
||
commit 8ff2e069c48c12fd06b9c48c6b3aeb4ea9b0e6e1
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 16:22:26 2016 -0500
|
||
|
||
Add 4x unrolled variant for KNL microkernel.
|
||
|
||
commit 9cb2ed9b0c25f31a22c1c9719b062fa665ad7adf
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 16:10:30 2016 -0500
|
||
|
||
Get rid of one RBX update.
|
||
|
||
commit 451bde076f0320d60cd2475cfb048ac4a2b798bb
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 15:43:00 2016 -0500
|
||
|
||
Add some more knobs to twiddle for KNL microkernel.
|
||
|
||
commit 8c6e621c099521e7a4d87e007bb8224faa5f33a3
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 15:05:15 2016 -0500
|
||
|
||
Make knl conform to new kernel dir structure.
|
||
|
||
commit ce7214c6618d6f22f4ce2ee452336236916d1f30
|
||
Merge: 119d0399 ce59f811
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 14:59:53 2016 -0500
|
||
|
||
Merge remote-tracking branch 'origin/master' into knl
|
||
|
||
commit ce59f81108ec9aea918a7e77030da8acfdd397ce
|
||
Merge: ff41153f 707a2b7f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 22 14:48:14 2016 -0500
|
||
|
||
Merge pull request #88 from devinamatthews/32bit-dim_t
|
||
|
||
Handle 32-bit dim_t in 64-bit microkernels.
|
||
|
||
commit 707a2b7faca137cca7cab7b11a12c44ddaf7ad53
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 13:49:44 2016 -0500
|
||
|
||
Somehow forgot the most important microkernel.
|
||
|
||
commit 47ec045056351ac4f0791c071fa0daaa81699c8c
|
||
Merge: 08f1d6b6 ff41153f
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 13:45:23 2016 -0500
|
||
|
||
Merge remote-tracking branch 'upstream/master' into 32bit-dim_t
|
||
|
||
commit 08f1d6b6fa344275de0f675f69737145ccf6646a
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 13:44:37 2016 -0500
|
||
|
||
Use 64-bit intermediate variable for k for architectures that do 64-bit loads in case dim_t is 32-bit.
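Why a 64-bit intermediate is needed can be sketched in portable C. The commit does not show its code, so everything here is an assumption for illustration: dim32_t stands in for a dim_t configured as 32 bits, load64_from() models what a 64-bit load in a microkernel would read, and struct args is a hypothetical memory layout.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

typedef int32_t dim32_t;  /* hypothetical: dim_t configured as 32-bit */

/* Hypothetical layout: k is followed in memory by unrelated data. */
struct args {
    dim32_t k;
    int32_t neighbor;
};

/* Models a 64-bit load: reads 8 bytes starting at p. */
uint64_t load64_from(const void *p)
{
    uint64_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* The fix: widen k into a 64-bit variable first, then load that. */
uint64_t load_k_widened(const struct args *a)
{
    uint64_t k64 = (uint64_t)a->k;  /* 64-bit intermediate */
    return load64_from(&k64);
}
```

Loading 8 bytes directly from the address of a 32-bit k also picks up the four neighboring bytes, whereas the widened copy yields the value of k alone.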
|
||
|
||
commit ff41153f4eb7f38ed94bdd9a3fd81fb979f3f401
|
||
Merge: f9214ced e0d2fa0d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 22 13:21:03 2016 -0500
|
||
|
||
Merge pull request #86 from devinamatthews/haswell-vmovups
|
||
|
||
Remove alignment restrictions on C in haswell kernel.
|
||
|
||
commit e0d2fa0d835ab49366aeb790363bb2b571d36ed8
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 12:56:51 2016 -0500
|
||
|
||
Relax alignment restrictions for haswell sgemm.
|
||
|
||
commit f9214ced97392861f5a0ea72abfcf6f41faf674c
|
||
Merge: 413d62ac 08666eaa
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 22 12:16:39 2016 -0500
|
||
|
||
Merge pull request #85 from devinamatthews/qopenmp
|
||
|
||
Change -openmp to -fopenmp for icc.
|
||
|
||
commit ee2c139df6ad53c6aec8a67ab23b3b1912e8d259
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 12:06:03 2016 -0500
|
||
|
||
Remove alignment restrictions on C in haswell kernel.
|
||
|
||
commit 08666eaa20d8a31f2f92f944e5bfa7c1558c53e4
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 11:07:34 2016 -0500
|
||
|
||
Change -openmp to -fopenmp for icc.
|
||
|
||
commit 119d0399428905053265f3aca1cc8cc1fde3b363
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Jul 22 10:23:31 2016 -0500
|
||
|
||
Add 8x24 KNL kernel.
|
||
|
||
commit 1aa77dfc1dc183d16e0b6a1196d9c263f021e83d
|
||
Merge: 9101a9c8 ec9f5983
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Thu Jul 21 14:22:40 2016 +0530
|
||
|
||
Merge master code as on 2016_07_21 to amd-staging branch by praveeng
|
||
|
||
Change-Id: Ic7d0a21101358f08147736e7f1884e7409937344
|
||
|
||
commit b58cda9eba0c1e175460aae109baf792d29ba5bf
|
||
Merge: 318f063d 413d62ac
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Jul 19 14:09:09 2016 -0500
|
||
|
||
Merge remote-tracking branch 'origin/master' into knl
|
||
|
||
# Conflicts:
|
||
# frame/base/bli_threading.h
|
||
# frame/include/blis.h
|
||
# frame/thread/bli_thread.c
|
||
|
||
commit ec9f59836b32260c29ff1cd24e629c7d8de14992
|
||
Merge: 197e182f 763babe4
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Mon Jul 18 12:56:25 2016 +0530
|
||
|
||
Merge branch 'master' of https://github.com/clMathLibraries/blis-amd
|
||
|
||
commit 197e182fcbf1340fd4a202fac58bea6cfcfa9e2f
|
||
Author: praveeng <praveen.g@amd.com>
|
||
Date: Tue Jul 5 16:51:23 2016 +0530
|
||
|
||
first commit
|
||
|
||
Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2
|
||
|
||
commit 41fb32711031e7ec86b062aa7f53255d1f5905e2
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530

small modification to readme for git push test

Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit d0dfe5b5372cc7558ee9c4104b29f82eecc7ed61
Merge: 31def12e 413d62ac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 14 11:01:06 2016 -0500

Merge branch 'master' into compose

commit 9101a9c880e3934f8a63ffc7fe15f5fc1077a73d
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Jul 13 16:51:14 2016 +0530

Checked in optimized 1V kernels along with benchmark codes. Also incorporated review comments for 1F kernels

Change-Id: I035c0d39e6b0bed28e6e2041242186c49f6ed55b

commit 763babe488880b42c86c7fc207aa7665bd0ff9f7
Merge: 357c990b 413d62ac
Author: praveeng <praveen.g@amd.com>
Date: Wed Jul 13 11:57:19 2016 +0530

Merge remote-tracking branch 'publirepo/master'

commit 413d62aca28edabba56605a9f87d5b715831e1db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 12 15:02:52 2016 -0500

README update (use official ACM TOMS links).

commit dfa431f696db2df4065ea454df268a2e0bc02eac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 12 14:21:19 2016 -0500

README update (BLIS2 TOMS article now in-print).

commit 357c990bdd7bd5667aac5adf1bab3712973e7414
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 16:51:23 2016 +0530

first commit

Change-Id: Ib50c81acda3b2c1583da3d421efc0ca547ef68e2

commit 8aee306300adb099b66036f2c2f7f3996433cf49
Author: praveeng <praveen.g@amd.com>
Date: Tue Jul 5 15:00:31 2016 +0530

small modification to readme for git push test

Change-Id: I68506a49586b07eaa907f3f85304ee40d4c92d0a

commit 31def12e2629f187e40f93f6bae9e26a6c2660e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 30 15:19:20 2016 -0500

First phase of control tree redesign.

Details:
- These changes constitute the first set of changes in preparation for
revamping the structure and use of control trees in BLIS. Modifications
in this commit don't affect the control tree code yet, but rather lay
the groundwork.
- Defined wrappers for the following functions, where the wrappers
each take a direction parameter of a new enumerated type, dir_t
(BLIS_BWD or BLIS_FWD), and execute the correct underlying function.
- bli_acquire_mpart_*() and _vpart_*()
- bli_*_determine_kc_[fb]()
- bli_thread_get_range_*() and bli_thread_get_range_weighted_*()
- Consolidated all 'f' (forwards-moving) and 'b' (backwards-moving)
blocked variants for trmm and trsm, and renamed gemm and herk variants
accordingly. The direction is now queried via routines such as
bli_trmm_direct(), which determines the direction from the implied side
and uplo parameters. For gemm and herk, it is unconditionally BLIS_FWD.
- Defined wrappers to parameter-specific macrokernels for herk, trmm, and
trsm, e.g. bli_trmm_xx_ker_var2(), that execute the correct underlying
macrokernel based on the implied parameters. The same logic used to
choose the dir_t in _direct() functions is used here.
- Simplified the function pointer arrays in _int() functions given the
consolidation and dir_t querying mentioned above.
- Function signature (whitespace) reformatting for various functions.
- Removed old code in various 'old' directories.

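Illustration only (not code from the commit above): a minimal sketch of
a wrapper that takes a direction parameter and executes the matching
forwards- or backwards-moving function. Only the names dir_t, BLIS_FWD,
and BLIS_BWD come from the commit message; their definition here is
assumed, and the sketch_*() functions and the edge-partition convention
they use are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed definition of the direction type; the commit names the type
   and its two values but does not show how they are declared. */
typedef enum { BLIS_FWD, BLIS_BWD } dir_t;

/* Hypothetical forwards-moving helper: size of the partition at offset
   i when walking a dimension of length dim in steps of b. The short
   (edge) partition, if any, comes last. */
static size_t sketch_blocksize_f( size_t i, size_t dim, size_t b )
{
    size_t left = dim - i;
    return ( left < b ) ? left : b;
}

/* Hypothetical backwards-moving helper: the short (edge) partition is
   taken first so that every later partition is full-sized. */
static size_t sketch_blocksize_b( size_t i, size_t dim, size_t b )
{
    size_t edge = dim % b;
    return ( i == 0 && edge != 0 ) ? edge : b;
}

/* Wrapper: pick the underlying function from the direction parameter. */
size_t sketch_blocksize( dir_t dir, size_t i, size_t dim, size_t b )
{
    assert( b > 0 && i < dim );
    return ( dir == BLIS_FWD ) ? sketch_blocksize_f( i, dim, b )
                               : sketch_blocksize_b( i, dim, b );
}
```

With a wrapper of this shape, a single blocked variant can serve both
directions by passing the dir_t it was given instead of existing as
separate 'f' and 'b' copies.
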
commit 405c9d46344d93c3eab5572b233900b50ca50d68
Author: sthangar <Santanu.Thangaraj@amd.com>
Date: Wed Jun 22 12:18:54 2016 +0530

Check-in the fused kernels optimized for Zen

Change-Id: I7b2f467b960e7b9a285f06e47be87de122e5fa24

commit 232754feecf29452987666b9f5ebba2619bfd0b0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 21 14:25:39 2016 -0500

Fixed compiler warning in rand[vm], randn[vm].

Details:
- Fixed compiler warnings about unused variables related to the disabling
of normalization in the structured cases of the rand[vm] and randn[vm]
operations.

commit a89555d1605574f3685813dcc972b636dd61264d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 17 14:08:35 2016 -0500

Added randn[vm] operations, support in testsuite.

Details:
- Defined a new randomization operation, randn, on vectors and matrices.
The randnv and randnm operations randomize each element of the target
object with values from a narrow range of values. Presently, those
values are all integer powers of two, but they do not need to be powers
of two in order to achieve the primary goal, which is to initialize
objects that can be operated on with plenty of precision "slack"
available to allow computations that avoid roundoff. Using this method
of randomization makes it much more likely that testsuite residuals of
properly-functioning operations are close to zero, if not exactly zero.
- Updated existing randomization operations randv and randm to skip
special diagonal handling and normalization for matrices with structure.
This is now handled by the testsuite modules by explicitly calling a
testsuite function that loads the diagonal (and scales off-diagonal
elements).
- Added support for randnv and randnm in the testsuite with a new switch
in input.general that universally toggles between use of the classic
randv/randm, which use real values on the interval [-1,1], and
randnv/randnm, which use only values from a narrow range. Currently,
the narrow range is: +/-{2^0, 2^-1, 2^-2, 2^-3, 2^-4, 2^-5, 2^-6}, as
well as 0.0.
- Updated testsuite modules so that a testsuite wrapper function is called
instead of directly calling the randomization operations (such as
bli_randv() and bli_randm()). This wrapper also takes a bool_t that
indicates whether the object's elements should be normalized. (NOTE: As
alluded to above, in the test modules of triangular solve operations such
as trsv and trsm, we perform the extra step of loading the diagonal.)
- Defined a new level-0 operation, invertsc, which inverts a scalar.
- Updated the abval2ris and sqrt2ris level-0 macros to avoid an unlikely
but possible divide-by-zero.
- Updated function signature and prototype formatting in testsuite.

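Illustration only (not the randnv/randnm implementation): a minimal
sketch of why the narrow range above leaves precision "slack". Every
product of two such values is 0 or +/-2^-k with k <= 12, so a
single-precision dot product of up to 4096 terms accumulates with no
roundoff and gives the same result in any summation order. The
generator, the LCG constants, and all sketch_*() names are assumptions
made for this example.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical generator: maps r onto the narrow range, i.e. 0.0 or
   +/-2^-k for k = 0..6 (15 distinct values). */
double sketch_narrow_value( uint32_t r )
{
    uint32_t idx = r % 15u;
    if ( idx == 0u ) return 0.0;
    uint32_t k   = ( idx - 1u ) % 7u;             /* exponent 0..6 */
    double   mag = 1.0 / ( double )( 1u << k );   /* exactly 2^-k  */
    return ( idx <= 7u ) ? mag : -mag;
}

/* Single-precision dot product of two length-n vectors filled from a
   simple linear congruential generator. If reverse is nonzero, the
   terms are accumulated from last to first. */
float sketch_dot_narrow( size_t n, uint32_t seed, int reverse )
{
    float    x[ 4096 ], y[ 4096 ];
    uint32_t state = seed;
    float    sum   = 0.0f;

    assert( n <= 4096 );

    for ( size_t i = 0; i < n; ++i )
    {
        state  = state * 1664525u + 1013904223u;
        x[ i ] = ( float )sketch_narrow_value( state >> 16 );
        state  = state * 1664525u + 1013904223u;
        y[ i ] = ( float )sketch_narrow_value( state >> 16 );
    }

    /* Each partial sum is a multiple of 2^-12 with magnitude at most
       4096, which fits exactly in a 24-bit significand. */
    for ( size_t i = 0; i < n; ++i )
    {
        size_t j = reverse ? ( n - 1 - i ) : i;
        sum += x[ j ] * y[ j ];
    }
    return sum;
}
```

Because forwards and backwards accumulation agree exactly, a residual
computed from such operands can be compared against zero rather than
against a tolerance.
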
commit 318f063dcbd8b594969e401bc99146d24b01066a
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Jun 8 17:46:50 2016 -0500

Add new KNL microkernel derived from Haswell.

commit 096895c5d538a7f8817603d7cf28c52e99340def
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 6 13:32:04 2016 -0500

Reorganized code, APIs related to multithreading.

Details:
- Reorganized code and renamed files defining APIs related to multithreading.
All code that is not specific to a particular operation is now located in a
new directory: frame/thread. Code is now organized, roughly, by the
namespace to which it belongs (see below).
- Consolidated all operation-specific *_thrinfo_t object types into a single
thrinfo_t object type. Operation-specific level-3 *_thrinfo_t APIs were
also consolidated, leaving bli_l3_thrinfo_*() and bli_packm_thrinfo_*()
functions (aside from a few general purpose bli_thrinfo_*() functions).
- Renamed thread_comm_t object type to thrcomm_t.
- Renamed many of the routines and functions (and macros) for multithreading.
We now have the following API namespaces:
- bli_thrinfo_*(): functions related to thrinfo_t objects
- bli_thrcomm_*(): functions related to thrcomm_t objects.
- bli_thread_*(): general-purpose functions, such as initialization,
finalization, and computing ranges. (For now, some macros, such as
bli_thread_[io]broadcast() and bli_thread_[io]barrier() use the
bli_thread_ namespace prefix, even though bli_thrinfo_ may be more
appropriate.)
- Renamed thread-related macros so that they use a bli_ prefix.
- Renamed control tree-related macros so that they use a bli_ prefix (to be
consistent with the thread-related macros that were also renamed).
- Removed #undef BLIS_SIMD_ALIGN_SIZE from dunnington's bli_kernel.h. This
#undef was a temporary fix to some macro defaults which were being applied
in the wrong order, which was recently fixed.

commit 232530e88ff99f37abcae5b6fb5319a9a375a45f
Merge: 4bcabd1b eef37f8b
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Jun 1 15:14:10 2016 -0500

Merge commit 'refs/pull/81/head' of https://github.com/flame/blis

Conflicts:
frame/base/bli_threading_pthreads.c
frame/base/bli_threading_pthreads.h

commit 4bcabd1bf60688c38cf562459fc5e8be8b831756
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Jun 1 13:27:28 2016 -0500

Use spin locks instead of pthread barriers

commit eef37f8b4d81845a6ba4bf25586d32b50c3e8a68
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Sun May 29 22:28:13 2016 -0700

use GCC intrinsic instead of pthread_mutex for atomic increment and fetch

commit 9dcd6f05c4c3ff2ce7cd87a9951a96ebef22681e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 24 13:15:32 2016 -0500

Implemented developer-configurable malloc()/free().

Details:
- Replaced all instances of bli_malloc() and bli_free() with one of:
- bli_malloc_pool()/bli_free_pool()
- bli_malloc_user()/bli_free_user()
- bli_malloc_intl()/bli_free_intl()
each of which can be configured to call malloc()/free() substitutes,
so long as the substitute functions have the same function type
signatures as malloc() and free() defined by C's stdlib.h. The _pool()
function is called when allocating blocks for the memory pools (used
for packing buffers, primarily), the _user() function is called when
obj_t's are created (via bli_obj_create() and friends), and the _intl()
function is called for internal use by BLIS, such as when creating
control tree nodes or temporary buffers for manipulating internal data
structures. Substitutes for any of the three types of bli_malloc() may
be specified by #defining the following pairs of cpp macros in
bli_kernel.h:
- BLIS_MALLOC_POOL/BLIS_FREE_POOL
- BLIS_MALLOC_USER/BLIS_FREE_USER
- BLIS_MALLOC_INTL/BLIS_FREE_INTL
to be the name of the substitute functions. (Obviously, the object
code that contains these functions must be provided at link-time.)
These macros default to malloc() and free(). Substitute functions are
also automatically prototyped by BLIS (in bli_malloc_prototypes.h).
- Removed definitions for bli_malloc() and bli_free().
- Note that bli_malloc_pool() and bli_malloc_user() are now defined in
terms of a new function, bli_malloc_align(), which aligns memory to an
arbitrary (power of two) alignment boundary, but does so manually,
whereas before alignment was performed behind the scenes by
posix_memalign(). Currently, bli_malloc_intl() is defined in terms
of bli_malloc_noalign(), which serves as a simple wrapper to the
designated function that is passed in (e.g. BLIS_MALLOC_INTL).
Similarly, there are bli_free_align() and bli_free_noalign(), which
are used in concert with their bli_malloc_*() counterparts.

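Illustration only (not the bli_malloc_align() source): a minimal sketch
of aligning memory manually on top of a malloc()-compatible function
that is passed in. The over-allocate-and-save-the-original-pointer
technique is an assumption about how such a routine can be written, and
the sketch_*() names and typedefs are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Function types with the same signatures as malloc() and free(). */
typedef void* ( *sketch_malloc_ft )( size_t );
typedef void  ( *sketch_free_ft )( void* );

/* Return size bytes aligned to align (a nonzero power of two), using
   the allocator f that is passed in. */
void* sketch_malloc_align( sketch_malloc_ft f, size_t size, size_t align )
{
    assert( align != 0 && ( align & ( align - 1 ) ) == 0 );

    /* Over-allocate: payload, worst-case padding, and room to save the
       pointer that f originally returned. */
    unsigned char* raw = f( size + align + sizeof( void* ) );
    if ( raw == NULL ) return NULL;

    /* Round up to the next multiple of align past the saved pointer. */
    uintptr_t first   = ( uintptr_t )raw + sizeof( void* );
    uintptr_t aligned = ( first + align - 1 ) & ~( ( uintptr_t )align - 1 );
    unsigned char* p  = raw + ( aligned - ( uintptr_t )raw );

    /* Save the original pointer just below the aligned address. */
    void* orig = raw;
    memcpy( p - sizeof( void* ), &orig, sizeof( void* ) );
    return p;
}

/* Release memory obtained from sketch_malloc_align() by recovering the
   original pointer and handing it to the matching free function f. */
void sketch_free_align( sketch_free_ft f, void* p )
{
    void* orig;
    if ( p == NULL ) return;
    memcpy( &orig, ( unsigned char* )p - sizeof( void* ), sizeof( void* ) );
    f( orig );
}
```

A call such as sketch_malloc_align( malloc, 100, 64 ) paired with
sketch_free_align( free, p ) mirrors the default configuration, in
which the macros resolve to malloc() and free().
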
commit 9dd440109a9d964f5cd286e9f83c487ad703e1e4
Author: Jeff Hammond <jeff.science@gmail.com>
Date: Sat May 21 15:21:58 2016 -0700

fix 404 link to BuildSystem

Google Code is dead. Long live GitHub!

commit d309f20b7376a68efa3b864ad790c2021c071655
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 18 15:13:53 2016 -0500

Added alignment switch to testsuite.

Details:
- Added a new input parameter to input.general that globally toggles
whether testsuite tests are performed on objects whose buffers and
leading dimensions have been aligned, and changed the implementation
of libblis_test_mobj_create() to employ alignment (or not) regardless
of whether row, column, or general storage is being tested.
- Updated configure script's "--help" text to indicate default behavior
for internal integer type size and BLAS/CBLAS integer type size
options.

commit 32db0adc218ea4ae370164dbe8d23b41cd3526d3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 17 15:20:16 2016 -0500

Generate prototypes for user-defined packm kernels.

Details:
- Created template prototypes for packm kernels (in bli_l1m_ker.h), and
then redefined reference packm kernels' prototyping headers in terms of
this template, as is already done for level-1v, -1f, and -3 kernels.
- Automatically generate prototypes for user-defined packm kernels in
bli_kernel_prototypes.h (using the new template prototypes in
bli_l1m_ker.h).
- Defined packm kernel function types in bli_l1m_ft.h, including for
packm kernels specific to induced methods, which are now used in
bli_packm_cxk.c and friends rather than using a locally-defined
function type.
- In bli_packm_cxk.c, extended the array of packm kernel function
pointers out to index 31 (from the previous maximum of 17). This allows
us to store the unrolled 30xk kernel in the array for use (on knc, for
example). Note: This should have been done a long time ago.

commit e3bd5ca64ae7c190ba689396c0de687b829a11fe
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 12 20:54:13 2016 -0500

Fix SIMD definitions in KNL config, and a couple of fixes to C update.

commit 4fe02e3d497995d94d34d3fcf5af895084cfc8b9
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu May 12 20:53:58 2016 -0500

Move bli_kernel.h before bli_threading.h in order of inclusion in blis.h.

commit 4bcf1b35abea3f3dfc8f2fe462dcf155cf199e55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 11 16:09:49 2016 -0500

Fixed bli_get_range_*() bugs in trsm variants.

Details:
- Fixed incorrect calls to bli_get_range_*() from within trsm blocked
variants 1f, 2b, and 2f. The bug somehow went undetected since the
big commit (537a1f4), and, strangely, did not manifest via the BLIS
testsuite. The bug finally came to our attention when running the
libflame test suite while linking to BLIS. Thanks to Kiran Varaganti
for submitting the initial report that led to this bug.

commit 9cfa33023f123a6c17e987f72fba174ce073f0b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 11 16:02:30 2016 -0500

Minor updates to bli_f2c.h.

Details:
- Added #undef guards to certain #define statements in bli_f2c.h,
and renamed the file guard to BLIS_F2C_H. This helps when
#including "blis.h" from an application or library that already
#includes an "f2c.h" header.

commit a09a2e23eacf5328858c8318bb637c5ff3b71d08
Merge: 4dcd37eb 7c604e1c
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed May 11 10:47:11 2016 -0500

Merge pull request #76 from devinamatthews/move_simd_defs

Move default SIMD-related definitions to bli_kernel_macro_defs.h

commit 4dcd37eb1b12a6e08cc13df7b61391ef8363f5d8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue May 10 16:28:59 2016 -0500

fixing knc simd align size

commit 619dee0daec3474b4e5a55df90a61aabcae194f2
Merge: b790b3d9 7c604e1c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 12:13:24 2016 -0500

Merge branch 'move_simd_defs' into knl

commit 7c604e1cbc1609b6e12d3ee973c08b7af5035be4
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 12:11:55 2016 -0500

Move default SIMD-related definitions to bli_kernel_macro_defs.h. Otherwise, configurations which customize these fail as these are now defined in bli_kernel.h.

commit b790b3d9e1820f3b691676de48c291cae083452d
Merge: 4f8c05c9 a7be2d28
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 11:49:47 2016 -0500

Merge branch 'master' into knl

commit a7be2d28e8930b154d0da1d6929b54a96e210af6
Merge: 97b512ef 4b1e55ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 10 11:48:51 2016 -0500

Merge pull request #74 from devinamatthews/fix_common_symbols

Default-initialize all extern global variables to avoid generating common symbols.

commit 4b1e55edbfe0e1cb2e7b9428424903497cb7a841
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue May 10 10:08:47 2016 -0500

Default-initialize all extern global variables to avoid generating common symbols. Fixes #73.

commit 97b512ef62c7e25c97ed5e9eca81cd7015b2ac91
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri May 6 10:24:30 2016 -0500

Include headers from cblas.h to pull in f77_int.

Details:
- Added #include statements for certain key BLIS headers so that the
definition of f77_int is pulled in when a user compiles application
code with only #include "cblas.h" (and no other BLIS header). This
is necessary since f77_int is now used within the cblas API.

commit c3a4d39d03665135f1616588b5ef7c3e9ef5688d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 4 17:22:56 2016 -0500

Updates to haswell gemm micro-kernels.

Details:
- Added two new sets of [sd]gemm micro-kernels for haswell architectures,
one that is 4x24/4x12 (s and d) and one that is 6x16/6x8.
- Changed the haswell configuration to use the 6x16/6x8 micro-kernels
by default.
- Updated various Makefiles, in test, test/3m4m, and testsuite.

commit 0b01d355ae861754ae2da6c9a545474af010f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 15:21:10 2016 -0500

Miscellaneous cleanups, fixes to recent commits.

Details:
- Fixed a typo in bli_l1f_ref.h, introduced into bbb8569, that only
manifested when non-reference level-1f kernels were used.
- Added an #undef BLIS_SIMD_ALIGN_SIZE to bli_kernel.h of dunnington
configuration to prevent a compile-time warning until I can figure out
the proper permanent fix.
- Moved frame/1f/kernels/bli_dotxaxpyf_ref_var1.c out of the compilation
path (into 'other' directory). _ref_var2 is used by default, which is
the variant that is built on axpyf and dotxf instead of dotaxpyv.
- Removed section of frame/include/bli_config_macro_defs.h pertaining to
mixed datatype support.

commit ed7326c836f427e2f8420b015220ce293207b10c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 14:57:40 2016 -0500

Added 'restrict' to l1v/l1f code in 'kernels' dir.

Details:
- Added 'restrict' keyword to existing kernel definitions in 'kernels'
directory. These changes were meant for inclusion in bbb8569.

commit bbb8569b2a08c3bcd631d5a05eb389d01d94ac07
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 27 14:13:46 2016 -0500

Use 'restrict' in all kernel APIs; wspace changes.

Details:
- Updated level-1v, level-1f kernel function types (bli_l1?_ft.h) and
generic kernel prototypes (bli_l1?_ker.h) to use 'restrict' for all
numerical operand pointers (i.e., all pointers except the cntx_t).
- Updated level-1f reference kernel definitions to use 'restrict' for
all numerical operand pointers. (Level-1v reference kernel definitions
were already updated in bdbda6e.)
- Rewrote the level-1v and level-1f reference kernel prototypes in
bli_l1v_ref.h and bli_l1f_ref.h, respectively, to simply #include
bli_l1v_ker.h and bli_l1f_ker.h with redefined function base names
(as was already being done for the level-3 micro-kernel prototypes
in bli_l3_ref.h), rather than duplicate the signatures from the
_ker.h files.
- Added definitions to frame/include/bli_kernel_prototypes.h for axpbyv
and xpbyv, which were probably meant for inclusion in bdbda6e.
- Converted a number of instances of four spaces, as introduced in
bdbda6e, to tabs.

commit 4ea419c72c789825e1f93a1eee88219bbf873930
Merge: f1e9be2a bdbda6e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 26 12:50:45 2016 -0500

Merge pull request #70 from devinamatthews/daxpby

Give the level1v operations some love

commit bdbda6e6acc682ab1b6ca680edebd09ae12a832c
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 25 11:05:57 2016 -0500

Give the level1v operations some love:

- Add missing axpby and xpby operations (plus test cases).
- Add special case for scal2v with alpha=1.
- Add restrict qualifiers.
- Add special-case algorithms for incx=incy=1.

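Illustration only (not the kernel from the commit above): a minimal
sketch of a double-precision axpby, y := beta*y + alpha*x, written with
restrict-qualified operand pointers and a separate unit-stride branch.
The name sketch_daxpbyv() and its parameter list are hypothetical and
omit the conjugation and context arguments that real kernels would
carry; only non-negative strides are handled.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical axpby kernel: y := beta*y + alpha*x over n elements.
   The restrict qualifiers promise the compiler that the operands do
   not overlap, which makes the unit-stride loop easy to vectorize. */
void sketch_daxpbyv( size_t n,
                     const double* restrict alpha,
                     const double* restrict x, size_t incx,
                     const double* restrict beta,
                     double* restrict y, size_t incy )
{
    const double a = *alpha;
    const double b = *beta;

    assert( incx > 0 && incy > 0 );

    if ( incx == 1 && incy == 1 )
    {
        /* Special case: contiguous vectors. */
        for ( size_t i = 0; i < n; ++i )
            y[ i ] = b * y[ i ] + a * x[ i ];
    }
    else
    {
        /* General case: arbitrary positive strides. */
        for ( size_t i = 0; i < n; ++i )
            y[ i * incy ] = b * y[ i * incy ] + a * x[ i * incx ];
    }
}
```

Setting beta to 1 reduces this to axpy, and setting alpha to 1 gives
the xpby operation named in the same commit.
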
commit f1e9be2aba1a057eedb947bbae96848597777408
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 15:34:02 2016 -0500

Minor tweak to test/Makefile.

Details:
- Just committing a minor change to test/Makefile that has been lingering
in my local working copy for longer than I can remember.

commit aa0bceec277938328dabeb744680623f24fb0b61
Merge: 4136553f e2784b4c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 12:01:31 2016 -0500

Merge branch 'master' of github.com:flame/blis

commit 4136553f0d0661a668dfdb9edcd7ce1c5773dde7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 22 11:53:53 2016 -0500

Clear level-3 cntx_t's via memset() before use.

Details:
- In all level-3 operations' _cntx_init() functions, replaced calls to
bli_cntx_obj_init() with calls to bli_cntx_obj_clear(), and in all
level-3 operations' _cntx_finalize() functions, removed calls to
bli_cntx_obj_finalize(), leaving those function definitions empty.
- Changed the definition of bli_cntx_obj_clear() so that the clearing
occurs via a single call to memset().

commit 4f8c05c9e2ef4cbb82b35a3ebf1f0a0ac665830e
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Thu Apr 21 10:00:59 2016 -0500

Rearrange KNL dgemm kernel again to streamline usage of ymm register. sgemm and dgemm now both working with Intel SDE.

commit e2784b4c921f706e756df3e146e20a4cb63f53e3
Merge: dd0ab1d9 a9b6c3ab
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 20 18:34:09 2016 -0500

Merge pull request #67 from devinamatthews/cblas-f77-int

Change CBLAS integer type to f77_int

commit a9b6c3abda6222a8b240361643932e83cf726c4f
Merge: e4c54c81 dd0ab1d9
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 16:00:10 2016 -0500

Merge remote-tracking branch 'origin/master' into cblas-f77-int

# Conflicts:
# config/haswell/bli_config.h

commit e4c54c81463c2a19c9bb6b1f0f1be3fa9d018a45
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 15:56:46 2016 -0500

Change integer type in CBLAS function signatures to f77_int, and add proper const-correctness to BLAS layer.

commit dd0ab1d93f33abca6af9edd7b8e52da62dcfa5b1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 20 14:38:23 2016 -0500

Converted some bli_cntx query functions to macros.

Details:
- Commented out several datatype-aware query functions (those ending in
_dt) from bli_cntx.c, as well as their prototypes in bli_cntx.h, and
added equivalent cpp query macros to bli_cntx.h.
- Added 'bli_config.h' to .gitignore.

commit 7193230f7d35edbd1d2f77842a613971f1603463
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Wed Apr 20 09:37:30 2016 -0500

Work around missing VPMULLQ on KNL.

commit a30ccbc4c6a6e6460e78af6b5c530ee0d06f98fb
Merge: eb2f18e4 0e1a9821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 19 15:04:33 2016 -0500

Merge pull request #66 from devinamatthews/blas-configure

Add configure options and generate bli_config.h automatically.

commit bd44cf13e886069bc66c10ac0db178be96629a0d
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Apr 19 13:43:04 2016 -0500

Fix copy-paste errors in KNL kernels.

commit eb2f18e4844d985715df20798f50f9cc12e3b5ad
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 19 12:50:32 2016 -0500

More compile-time fixes to bgq gemm ukernel code.

commit 0e1a9821d860f6c1d818baf4c48d21a23726c132
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Tue Apr 19 11:44:37 2016 -0500

Add configure options and generate bli_config.h automatically.

Options to configure have been added for:
- Setting the internal BLIS and BLAS/CBLAS integer sizes.
- Enabling and disabling the BLAS and CBLAS layers.

Additionally, configure options that require defining macros (the above plus the threading model) write their macros to the automatically-generated bli_config.h file in the top-level build directory. The old bli_config.h files in the config dirs were removed, and any kernel-related macros (SIMD size and alignment etc.) were moved to bli_kernel.h. The Makefiles were also modified to find the new bli_config.h file.

Lastly, support for OMP in clang has been added (closes #56).

commit a11eec05928ddc5c43fa5dbcd35f2edd24ff35a1
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 13:13:36 2016 -0500

Add sgemm ukernels for KNL. vpmullq is not implemented on KNL -- needs workaround.

commit ff84469a4575f1ef8a0010046fde52240a312cae
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 18 12:29:09 2016 -0500

Applied various compilation fixes to bgq kernels.

commit c38e0dab05b2dc36672eab96e1248fb7fb2d785b
Merge: bd5e2296 cbcd0b73
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:21:35 2016 -0500

Merge remote-tracking branch 'origin/master' into knl

commit bd5e2296e98e042c31f1e8ece2c1ca8e4bdc2d4c
Merge: 4745def0 49f85177
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:15:22 2016 -0500

Merge remote-tracking branch 'origin/knl' into knl

commit 4745def0c87377ae83ad73ac514d7de08a96b2ac
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:15:05 2016 -0500

Add 64-bit offset vector so we can use vgatherqpd.

commit 49f85177f886f38889b60503a4e12fa7f04be1fd
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Mon Apr 18 10:14:11 2016 -0500

KNL ukernel compiles with gcc.

commit cbcd0b739dc54bd14fbb46aeda267c26725cd70f
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Mon Apr 18 03:12:57 2016 -0500

Changing ifdef for OSX pthread barriers

commit 58b2c3cf040134d1be913c585a3c6905629116c0
Author: Devin Matthews <dmatthews@utexas.edu>
Date: Sat Apr 16 16:12:24 2016 -0500

Rewrite of KNL kernel in GNU extended asm syntax.

commit dd62080cea78f3a23616200d6640e52c102b2bb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 15 11:15:41 2016 -0500

Compile-time fix to bgq l1f kernels.

Details:
- Fixed an old reference to bli_daxpyf_fusefac, which no longer exists,
by replacing it with the axpyf fusing factor (8), and cleaned up the
relevant section of config/bgq/bli_kernel.h.
- Removed most of the details of the level-3 kernels from the template
kernel code in config/template/kernels/3 and replaced it with a
reference to the relevant kernel wiki maintained on the BLIS github
website.

commit d5a915dd8d7a6ead42a68772e4420eb3647e6f1a
Merge: 4320b725 41694675
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 14 12:56:36 2016 -0500

Merge branch 'master' of github.com:flame/blis

commit 4320b725a1f8fd34101470b6cf52ad504a79c517
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 14 12:51:29 2016 -0500

Use kernel CFLAGS on "ukernels" directories.

Details:
- Updated the top-level Makefile so that the CFLAGS variable designated
for kernel source code is applied not only to source code in
directories named "kernels" but source code in any directory that
contains the substring "kernels", such as "ukernels".
- Formally disabled some code in gen-make-frag.sh script that was already
effectively disabled. The code was related to handling "noopt" and
"kernel" directories, which is now handled independently within the
top-level Makefile without needing to place these source files into
a separate makefile variable.

commit 41694675e4cb56e2e0323c7a7db48e0819606a31
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 13 15:51:08 2016 -0500

pthreads bugfixes

Getting pthreads to work on my Mac
Implemented a pthread barrier when _POSIX_BARRIER isn't defined
Now spawn n-1 threads instead of n threads so that master thread isn't just spinning the whole time
Add -lpthread instead of -pthread to LDFLAGS (for clang)

commit f756dbfa0d542cbc497724981520c83abf049c4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 13 11:25:33 2016 -0500

Removed stale #include from bgq configuration.

Details:
- Removed an old #include statement ("bli_gemm_8x8.h") from the
bli_kernel.h file in the bgq configuration. It turns out this
file was no longer needed even prior to 537a1f4.

commit 0bd4169ea75f690714e7d2912229932a75d8a7e2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 18:08:32 2016 -0500

Fixed context-broken dunnington/penryn kernels.

Details:
- Added missing context parameters to several instances where simpler
kernels, or reference kernels, are called instead of executing the
main body code contained in the kernel function in question.
- Renamed axpyv and dotv kernel files to use "opt" instead of "int"
substring, for consistency with level-1f kernels.

commit 7912af5db45b7372d19a9a3dfeb82df302a05628
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:13 2016 -0500

CHANGELOG update (0.2.0)

commit 898614a555ea0aa7de4ca07bb3cb8f5708b6a002 (tag: 0.2.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:32:09 2016 -0500

Version file update (0.2.0)

commit 537a1f4f85ce1aa008901857cb3182e6b4546d7f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 11 17:21:28 2016 -0500

Implemented runtime contexts and reorganized code.

Details:
- Retrofitted a new data structure, known as a context, into virtually
all internal APIs for computational operations in BLIS. The structure
is now present within the type-aware APIs, as well as many supporting
utility functions that require information stored in the context. User-
level object APIs were unaffected and continue to be "context-free";
however, these APIs were duplicated/mirrored so that "context-aware"
APIs now also exist, differentiated with an "_ex" suffix (for "expert").
These new context-aware object APIs (along with the lower-level, type-
aware, BLAS-like APIs) contain the address of a context as a last
parameter, after all other operands. Contexts, or specifically, cntx_t
object pointers, are passed all the way down the function stack into
the kernels and allow the code at any level to query information about
the runtime, such as kernel addresses and blocksizes, in a thread-
friendly manner (that is, one that allows thread-safety, even if the
original source of the information stored in the context changes at
run-time; see next bullet for more on this "original source" of info).
(Special thanks go to Lee Killough for suggesting the use of this kind
of data structure in discussions that transpired during the early
planning stages of BLIS, and also for suggesting such a perfectly
appropriate name.)
- Added a new API, in frame/base/bli_gks.c, to define a "global kernel
structure" (gks). This data structure and API will allow the caller to
initialize a context with the kernel addresses, blocksizes, and other
information associated with the currently active kernel configuration.
The currently active kernel configuration within the gks cannot be
changed (for now), and is initialized with the traditional cpp macros
that define kernel function names, blocksizes, and the like. However,
in the future, the gks API will be expanded to allow runtime management
of kernels and runtime parameters. The most obvious application of this
new infrastructure is the runtime detection of hardware (and the
implied selection of appropriate kernels). With contexts in place,
kernels may even be "hot swapped" at runtime within the gks. Once
execution enters a level-3 _front() function, the memory allocator will
be reinitialized on-the-fly, if necessary, to accommodate the new
kernels' blocksizes. If another application thread is executing with
another (previously loaded) kernel, it will finish in a deterministic
fashion because its kernel information was loaded into its context
before computation began, and also because the blocks it checked out
from the internal memory pools will be unaffected by the newer threads'
reinitialization of the allocator.
- Reorganized and streamlined the 'ind' directory, which contains much of
the code enabling use of induced methods for complex domain matrix
multiplication; deprecated bli_bsv_query.c and bli_ukr_query.c, as
those APIs' functionality is now mostly subsumed within the global
kernel structure.
- Updated bli_pool.c to define a new function, bli_pool_reinit_if(),
that will reinitialize a memory pool if the necessary pool block size
has increased.
- Updated bli_mem.c to use bli_pool_reinit_if() instead of
||
bli_pool_reinit() in the definition of bli_mem_pool_init(), and placed
|
||
usage of contexts where appropriate to communicate cache and register
|
||
blocksizes to bli_mem_compute_pool_block_sizes().
|
||
- Simplified control trees now that much of the information resides in
|
||
the context and/or the global kernel structure:
|
||
- Removed blocksize object pointer (blksz_t*) fields from all control
|
||
tree node definitions and replaced them with blocksize id (bszid_t)
|
||
values instead, which may be passed into a context query routine in
|
||
order to extract the corresponding blocksize from the given context.
|
||
- Removed micro-kernel function pointer (func_t*) fields from all
|
||
control tree node definitions. Now, any code that needs these function
|
||
pointers can query them from the local context, as identified by a
|
||
level-3 micro-kernel id (l3ukr_t), level-1f kernel id (l1fkr_t), or
|
||
level-1v kernel id (l1vkr_t).
|
||
- Removed blksz_t object creation and initialization, as well as kernel
|
||
function object creation and initialization, from all operation-
|
||
specific control tree initialization files (bli_*_cntl.c), since this
|
||
information will now live in the gks and, secondarily, in the context.
|
||
- Removed blocksize multiples from blksz_t objects. Now, we track
|
||
blocksize multiples for each blocksize id (bszid_t) in the context
|
||
object.
|
||
- Removed the bool_t's that were required when a func_t was initialized.
|
||
These bools are meant to allow one to track the micro-kernel's storage
|
||
preferences (by rows or columns). This preference is now tracked
|
||
separately within the gks and contexts.
|
||
- Merged and reorganized many separate-but-related functions into single
|
||
files. This reorganization affects frame/0, 1, 1d, 1m, 1f, 2, 3, and
|
||
util directories, but has the most obvious effect of allowing BLIS
|
||
to compile noticeably faster.
|
||
- Reorganized execution paths for level-1v, -1d, -1m, and -2 operations
|
||
in an attempt to reduce overhead for memory-bound operations. This
|
||
includes removal of default use of object-based variants for level-2
|
||
operations. Now, by default, level-2 operations will directly call a
|
||
low-level (non-object based) loop over a level-1v or -1f kernel.
|
||
- Converted many common query functions in bli_blksz.c (renamed from
bli_blocksize.c) and bli_func.c into cpp macros, now defined in their
respective header files.
|
||
- Defined bli_mbool.c API to create and query "multi-bools", or
|
||
heterogeneous bool_t's (one for each floating-point datatype), in the
|
||
same spirit as blksz_t and func_t.
|
||
- Introduced two key parameters of the hardware: BLIS_SIMD_NUM_REGISTERS
|
||
and BLIS_SIMD_SIZE. These values are needed in order to compute a third
|
||
new parameter, which may be set indirectly via the aforementioned
|
||
macros or directly: BLIS_STACK_BUF_MAX_SIZE. This value is used to
|
||
statically allocate memory in macro-kernels and the induced methods'
|
||
virtual kernels to be used as temporary space to hold a single
|
||
micro-tile. These values are now output by the testsuite. The default
|
||
value of BLIS_STACK_BUF_MAX_SIZE is computed as
|
||
"2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE".
|
||
- Cleaned up top-level 'kernels' directory (for example, renaming the
embarrassingly misleading "avx" and "avx2" directories to "sandybridge"
and "haswell," respectively) and gave more consistent and meaningful
names to many kernel files (as well as updating their interfaces to
conform to the new context-aware kernel APIs).
|
||
- Updated the testsuite to query blocksizes from a locally-initialized
|
||
context for test modules that need those values: axpyf, dotxf,
|
||
dotxaxpyf, gemm_ukr, gemmtrsm_ukr, and trsm_ukr.
|
||
- Reformatted many function signatures into a standard format that will
|
||
more easily facilitate future API-wide changes.
|
||
- Updated many "mxn" level-0 macros (i.e., those used to inline double loops
for level-1m-like operations on small matrices) in frame/include/level0
to use more obscure local variable names in an effort to avoid variable
shadowing. (Thanks to Devin Matthews for pointing out these gcc warnings,
which are only output using -Wshadow.)
|
||
- Added a conj argument to setm, so that its interface now mirrors that
|
||
of scalm. The semantic meaning of the conj argument is to optionally
|
||
allow implicit conjugation of the scalar prior to being populated into
|
||
the object.
|
||
- Deprecated all type-aware mixed domain and mixed precision APIs. Note
|
||
that this does not preclude supporting mixed types via the object APIs,
|
||
where it produces absolutely zero API code bloat.
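Illustrative sketch (not part of the commit): the context mechanism described above could look roughly like the following. Every name here (sk_cntx_t, sk_gks, sk_dot_ex, and so on) is a hypothetical stand-in rather than the actual BLIS definition, and the blocksize values are made up. The sketch only shows the shape of the idea: a context is initialized by copying from a global structure, queried by blocksize id, and passed as the last parameter of an "_ex" function.

```c
#include <stddef.h>

/* All names below are hypothetical stand-ins, not the BLIS definitions. */
typedef enum { SK_BSZ_MC = 0, SK_BSZ_KC, SK_BSZ_NC, SK_NUM_BSZIDS } sk_bszid_t;

typedef void (*sk_dot_ukr_t)(size_t k, const double *a, const double *b,
                             double *c);

typedef struct {
    size_t       blksz[SK_NUM_BSZIDS]; /* blocksizes, indexed by id    */
    sk_dot_ukr_t dot_ukr;              /* address of the active kernel */
} sk_cntx_t;

/* Reference kernel: c := c + a^T b. */
static void sk_dot_ref(size_t k, const double *a, const double *b, double *c)
{
    size_t i;
    for (i = 0; i < k; ++i) *c += a[i] * b[i];
}

/* Stand-in for the global kernel structure: the "original source" of info.
   The blocksize values are arbitrary. */
static sk_cntx_t sk_gks = { { 96, 256, 4096 }, sk_dot_ref };

/* Changes the global value; contexts initialized earlier are unaffected. */
void sk_gks_set_blksz(sk_bszid_t id, size_t value)
{
    sk_gks.blksz[id] = value;
}

/* Initialize a context by copying the currently active configuration. */
void sk_gks_cntx_init(sk_cntx_t *cntx)
{
    *cntx = sk_gks;
}

/* Query a blocksize by id from the given context. */
size_t sk_cntx_get_blksz(sk_bszid_t id, const sk_cntx_t *cntx)
{
    return cntx->blksz[id];
}

/* Context-aware ("_ex") form: the context is the last parameter. */
double sk_dot_ex(size_t k, const double *a, const double *b,
                 const sk_cntx_t *cntx)
{
    double c = 0.0;
    cntx->dot_ukr(k, a, b, &c);
    return c;
}

/* Context-free form: builds a local context and forwards to the _ex form. */
double sk_dot(size_t k, const double *a, const double *b)
{
    sk_cntx_t cntx;
    sk_gks_cntx_init(&cntx);
    return sk_dot_ex(k, a, b, &cntx);
}
```

Because the context holds a copy, a computation that has already started keeps using the blocksizes and kernel it was given. The sketch omits the locking that a real implementation would presumably need around reads and writes of the global structure.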
|
||
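Illustrative sketch (not part of the commit): the default for BLIS_STACK_BUF_MAX_SIZE quoted above can be worked through with assumed hardware values. The figures of 16 registers and 32 bytes per register are assumptions chosen for the arithmetic (2 * 16 * 32 = 1024 bytes, or 128 doubles), and sk_scale_microtile() is a hypothetical helper that only shows a fixed-size temporary holding one micro-tile.

```c
#include <stddef.h>

#ifndef BLIS_SIMD_NUM_REGISTERS
#define BLIS_SIMD_NUM_REGISTERS 16   /* assumed: 16 vector registers   */
#endif
#ifndef BLIS_SIMD_SIZE
#define BLIS_SIMD_SIZE 32            /* assumed: 32 bytes per register */
#endif
/* May be set directly; otherwise derived from the two values above. */
#ifndef BLIS_STACK_BUF_MAX_SIZE
#define BLIS_STACK_BUF_MAX_SIZE (2 * BLIS_SIMD_NUM_REGISTERS * BLIS_SIMD_SIZE)
#endif

/* Scales an m x n column-major micro-tile through a fixed-size temporary.
   Returns -1, leaving c untouched, if the tile does not fit in the buffer. */
int sk_scale_microtile(double alpha, double *c, int m, int n, int ldc)
{
    double ct[BLIS_STACK_BUF_MAX_SIZE / sizeof(double)];
    int    i, j;

    if ((size_t)m * (size_t)n > sizeof(ct) / sizeof(ct[0])) return -1;

    for (j = 0; j < n; ++j)
        for (i = 0; i < m; ++i)
            ct[i + j * m] = alpha * c[i + j * ldc];

    for (j = 0; j < n; ++j)
        for (i = 0; i < m; ++i)
            c[i + j * ldc] = ct[i + j * m];

    return 0;
}
```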
|
||
commit dd856c2cb75a2221a503a73dde27790c34b91570
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Apr 11 10:39:18 2016 -0500
|
||
|
||
Translated MIC kernel to KNL and cleaned up a bit. Only real change is lack of swizzle modifiers for FMA instructions (used bcast from memory instead).
|
||
|
||
commit 7f27431d3fffdda99c282ec412731d0a90cb32a7
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Apr 8 10:04:39 2016 -0500
|
||
|
||
Copy mic kernel to knl for transliteration.
|
||
|
||
commit f8f02f0334ac020021e15a415bcd33aeea01deb4
|
||
Merge: 32c92d94 d1f8e5d9
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Apr 6 11:37:05 2016 -0500
|
||
|
||
Merge branch 'master' into const_correctness
|
||
|
||
commit 32c92d945c55708da0eb63be1771f8c5430e3910
|
||
Merge: 62914ccb 20af937b
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Apr 6 11:36:02 2016 -0500
|
||
|
||
Merge branch 'master' into const_correctness
|
||
|
||
commit d1f8e5d9b2ecd054ed103f4d642d748db2d4f173
|
||
Merge: 20af937b c11d28ee
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Apr 5 12:21:27 2016 -0500
|
||
|
||
Merge pull request #60 from esauvage/master
|
||
|
||
sgemm µkernel for bulldozer : bug correction for k%4 != 0
|
||
|
||
commit c11d28eed89d65494bc4019f04d046520866c0ff
|
||
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
|
||
Date: Sat Apr 2 21:15:48 2016 +0200
|
||
|
||
cgemm µkernel for bulldozer : bug correction for k%4 != 0
|
||
|
||
commit 20af937b57f82bb3acb09418d5c0206e1b24f2c7
|
||
Merge: 36c3abb0 fc61a114
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Mar 31 14:37:30 2016 -0500
|
||
|
||
Merge pull request #59 from devinamatthews/fix_testsuite_makefile
|
||
|
||
Fix testsuite makefile
|
||
|
||
commit fc61a1143edeba4946d4b9915f1775bb08e643fc
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Mar 31 10:53:01 2016 -0500
|
||
|
||
Fix formatting in configure.
|
||
|
||
commit 26379b14de630e3a6c6eef5dfe87ff001558a8a6
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Mar 31 10:45:48 2016 -0500
|
||
|
||
Adjust paths in common.mk to support building from testsuite dir.
|
||
|
||
commit 36c3abb05fecb02d4a9ab13b2b69d133adf34583
|
||
Merge: 64b41fa5 917ce754
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Mar 31 10:26:17 2016 -0500
|
||
|
||
Merge pull request #58 from esauvage/master
|
||
|
||
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer confi…
|
||
|
||
commit 356d854fc9e34642cc46e0e02a8ceb56114878af
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Mar 30 16:33:15 2016 -0500
|
||
|
||
Make symlink to common.mk in build directory.
|
||
|
||
commit edbb8470044f82ef959583ee09613a5a985292b5
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Wed Mar 30 16:27:11 2016 -0500
|
||
|
||
Refactor out some definitions which moved from make_defs.mk to Makefile for use in testsuite Makefile.
|
||
|
||
commit 917ce75482a543fef46553efff6c246939761e59
|
||
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
|
||
Date: Wed Mar 30 22:03:09 2016 +0200
|
||
|
||
cgemm & zgemm micro-kernels for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
|
||
|
||
commit 62914ccbcdb3c594f065dcfa65bd7e7b95c79283
|
||
Merge: bbf704bf 64b41fa5
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Tue Mar 29 15:24:25 2016 -0500
|
||
|
||
Merge branch 'master' into const_correctness
|
||
|
||
commit 64b41fa554dff44b2f9ad48901b67c63836407a8
|
||
Merge: 1b09e343 0171ad58
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Mar 29 15:19:41 2016 -0500
|
||
|
||
Merge pull request #54 from devinamatthews/more_config_opts
|
||
|
||
More config opts
|
||
|
||
commit 1b09e343dfe5b48b4842e2cb96f41c8cc249bad0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Mar 29 12:55:28 2016 -0500
|
||
|
||
Updated gcc version from 4.8 to 4.9 in .travis.yml.
|
||
|
||
commit 0171ad58997b3a5a9b76301511dbe0751fffc940
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Mar 28 13:55:06 2016 -0500
|
||
|
||
Add icc and clang support for Intel architectures, fixes #47. 2bd036f fixes #49 BTW.
|
||
|
||
commit 3090fff64cc87ff2519a09f38e6b8699cf3cba11
|
||
Merge: 8624e365 4ca5d5b1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Mar 28 12:36:25 2016 -0500
|
||
|
||
Merge pull request #44 from esauvage/master
|
||
|
||
sgemm micro-kernel for FMA4 instruction set
|
||
|
||
commit e6e566426ac3ded7ef87cd8ff9be98accfdc4acc
|
||
Merge: 469429ec 8624e365
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Sat Mar 26 14:10:15 2016 -0500
|
||
|
||
Merge branch 'master' into more_config_opts
|
||
|
||
commit 8624e36543160739d954c4dbcc5a5594458f3a12
|
||
Merge: a315833f 2bd036f1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Mar 26 13:56:28 2016 -0500
|
||
|
||
Merge pull request #50 from devinamatthews/fix_noopt_avx
|
||
|
||
Fix configuration issue where instruction set flags are not specified for debug builds.
|
||
|
||
commit 469429ec34e5b1a172ce35596f9c7afdaacac131
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 20:45:41 2016 -0500
|
||
|
||
Fix LD_FLAGS -> LDFLAGS.
|
||
|
||
commit 8442d65c9ead0376fc5f2dfad62fd4862ab9b2b3
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 20:06:48 2016 -0500
|
||
|
||
Replace -march=native with specific architecture flags to support cross-compiling, and add icc support for Intel architectures.
|
||
|
||
commit 76099f20be1b49ac960f7e3c5a8296bbf4e1782d
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 17:22:58 2016 -0500
|
||
|
||
Add threading option to configure.
|
||
|
||
commit ad43eab4c7899d56d8d7caa6e2d92bc0581ea5a5
|
||
Merge: 9452bdb3 2bd036f1
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 15:00:02 2016 -0500
|
||
|
||
Merge branch 'fix_noopt_avx' into more_config_opts
|
||
|
||
commit 9452bdb3afbf2d7f898134a091d7790817e7be9c
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 14:59:50 2016 -0500
|
||
|
||
Add options for verbose make output and static/shared linking to configure.
|
||
|
||
commit 2bd036f1f9ce1ee0864365557f66d9415dd42de3
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 12:16:49 2016 -0500
|
||
|
||
Fix configuration issue where instruction set flags are not specified for debug builds.
|
||
|
||
commit bbf704bf7501411964a63a68f1af541f612cf92d
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 25 09:55:35 2016 -0500
|
||
|
||
Add missing const to bli_read_nway_from_env.
|
||
|
||
commit a315833f067944fb0bc14cf60f0c7dcb5dc897b6
|
||
Merge: 1d1a426d af92773f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Mar 24 12:30:21 2016 -0500
|
||
|
||
Merge pull request #48 from figual/master
|
||
|
||
Updated and improved ARMv8 micro-kernels.
|
||
|
||
commit af92773f4f85a2441fe0c6e3a52c31b07253d08e
|
||
Author: figual <figual@ucm.es>
|
||
Date: Wed Mar 23 22:07:02 2016 +0100
|
||
|
||
Updated and improved ARMv8 micro-kernels.
|
||
|
||
commit a4d7729776d17d9bdf2341eacd70b9770b9ba8d2
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Mon Mar 21 09:55:21 2016 -0500
|
||
|
||
Set default value for debug_type variable.
|
||
|
||
commit 0e2447fa55d8c5fa2b1fc4150073512495c5f9eb
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Mar 17 16:32:05 2016 -0500
|
||
|
||
Add const correctness to auxinfo_t struct (microkernels need update theoretically).
|
||
|
||
commit 1d1a426d18ec03754021456862a1f4d1dfec1fbf
|
||
Merge: 5a978fff d226dfa0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Mar 7 15:17:53 2016 -0600
|
||
|
||
Merge pull request #46 from devinamatthews/new-config-opts
|
||
|
||
Add several changes to the build system.
|
||
|
||
commit d226dfa05190eb477b33563b1edccf8603973336
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Sat Mar 5 16:18:14 2016 -0600
|
||
|
||
Add several changes to the build system.
|
||
|
||
1) Add -- options.
|
||
2) Add -d/--enable-debug option to enable debugging symbols with and without optimization.
|
||
3) Allow user to specify CC at configure time, and determine vendor (gcc/icc/etc.). For now configurations enforce a particular vendor.
|
||
4) Add make V=[0,1] option to control build verbosity.
|
||
|
||
commit 5a978fffdb8f09a81c89541d541d4a6830cd70a4
|
||
Merge: adb2b4e0 63e26423
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Mar 4 17:26:58 2016 -0600
|
||
|
||
Merge pull request #45 from devinamatthews/high_prec_timers
|
||
|
||
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday
|
||
|
||
commit 63e264239053b913164a849dd8a45829087eaddc
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 4 13:17:50 2016 -0600
|
||
|
||
Make sure that -lrt is linked on Linux.
|
||
|
||
commit 44fddd48dc1708a956803d1948f04429ec0d8700
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Fri Mar 4 12:36:38 2016 -0600
|
||
|
||
Add missing \.
|
||
|
||
commit 7cabd2131f953de23e7015d760b0ddfda51b1251
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Mar 3 11:43:07 2016 -0600
|
||
|
||
Use clock_gettime(CLOCK_MONOTONIC) and mach_absolute_time instead of gettimeofday.
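Illustrative sketch (not part of the commit): a wall-clock timer built on the two calls named above might look like this. The function name sk_clock() is hypothetical, and the actual BLIS implementation may differ. On older Linux systems clock_gettime() may also require linking with -lrt.

```c
#if !defined(__APPLE__) && !defined(_POSIX_C_SOURCE)
#define _POSIX_C_SOURCE 199309L   /* request clock_gettime() from <time.h> */
#endif

#include <stdint.h>
#if defined(__APPLE__)
#include <mach/mach_time.h>
#else
#include <time.h>
#endif

/* Returns elapsed seconds from a monotonic clock (arbitrary origin). */
double sk_clock(void)
{
#if defined(__APPLE__)
    static mach_timebase_info_data_t tb;
    uint64_t ticks;

    /* numer/denom converts mach ticks to nanoseconds. */
    if (tb.denom == 0) mach_timebase_info(&tb);
    ticks = mach_absolute_time();
    return (double)ticks * (double)tb.numer / (double)tb.denom * 1.0e-9;
#else
    struct timespec ts;

    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1.0e-9;
#endif
}
```

Unlike gettimeofday(), a monotonic clock is not affected by adjustments to the system time, so the difference of two readings is a safe measure of elapsed time.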
|
||
|
||
commit adb2b4e096c78e8b2f85fd372cf0d5eb04af5be8
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Mar 2 14:48:12 2016 -0600
|
||
|
||
Fixing guard for non implemented partitioning through packed matrices
|
||
|
||
commit 4ca5d5b1fd6f2e4a8b2e139c5405475239581e51
|
||
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
|
||
Date: Tue Mar 1 21:33:01 2016 +0100
|
||
|
||
sgemm micro-kernel for FMA4 instruction set (bulldozer configuration), based on x86_64/avx micro-kernel
|
||
|
||
commit 627d59b5ba06866b26f46e4434a0435b600925e3
|
||
Author: Etienne Sauvage <etienne.sauvage@gmail.com>
|
||
Date: Mon Feb 29 21:53:12 2016 +0100
|
||
|
||
symbolic link for bulldozer configuration to kernels
|
||
|
||
commit 2dc5c0ae038ed175fab85751803ada05734d1ba1
|
||
Merge: f2809fc5 3d0fae81
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Feb 29 12:22:51 2016 -0600
|
||
|
||
Merge pull request #40 from tkelman/bulldozer-symlink
|
||
|
||
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
|
||
|
||
commit f2809fc5f74466c755da6a5b4632853e634060b5
|
||
Merge: f86b94f2 8624a33c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Feb 27 13:06:03 2016 -0600
|
||
|
||
Merge pull request #39 from devinamatthews/fix_f2c_conflicts
|
||
|
||
Devin's f2c type namespace update.
|
||
|
||
Details:
|
||
- Added "bla_" prefix to f2c type names to prevent conflicts with external user code.
|
||
- Removed most of the body of bli_f2c.h, which was unused.
|
||
|
||
commit 3d0fae810d942085d8f2d389820b4e0027577db8
|
||
Author: Tony Kelman <tony@kelman.net>
|
||
Date: Thu Feb 25 23:24:03 2016 -0800
|
||
|
||
Add symlink from config/bulldozer/kernels to kernels/x86_64/bulldozer
|
||
|
||
to fix linking issue mentioned in #37 and https://groups.google.com/forum/#!topic/blis-devel/iypwljcaeEI
|
||
|
||
commit 8624a33ccc12dff6f6c4f92992ca5636af1576a6
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Feb 25 13:51:26 2016 -0600
|
||
|
||
Fix remaining f2c conflicts.
|
||
|
||
commit 372eef0b6c0a535bf88d4b46b72f61266e8491ba
|
||
Author: Devin Matthews <dmatthews@utexas.edu>
|
||
Date: Thu Feb 25 12:01:58 2016 -0600
|
||
|
||
Fixed most conflicts after hack-n-slash of bli_f2c.h, cleanup in
progress.
|
||
|
||
commit f86b94f206e2e09fa3221cc55c3dc5b05ca4775a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 23 18:12:34 2016 -0600
|
||
|
||
Included missing blas2blis integer def to CBLAS.
|
||
|
||
Details:
|
||
- Added #include "bli_config_macro_defs" to all cblas_*.c files in
|
||
compat/cblas/src. This has the effect of defining
|
||
BLIS_BLAS2BLIS_INT_TYPE_SIZE to the default value if bli_config.h does
|
||
not define it. Thanks to Tony Kelman for reporting this bug.
|
||
- In cblas_i?amax.c, changed the type of the variable 'iamax' from 'int'
|
||
to 'f77_int'. This eliminates a compiler warning and a potential
|
||
runtime bug and/or crash when the size of an int differs from the size
|
||
of f77_int (as determined by BLIS_BLAS2BLIS_INT_TYPE_SIZE).
|
||
|
||
commit 0b126de1342c11c65623bcb38e258e21e9244e3d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 13 16:29:12 2015 -0600
|
||
|
||
Consolidated packm_blk_var1 and packm_blk_var2.
|
||
|
||
Details:
|
||
- Consolidated the two blocked variants for packm into a single
|
||
implementation (packm_blk_var1) and removed the other variant.
|
||
- Updated all induced method _cntl_init() functions in frame/cntl/ind/
|
||
to use the new blocked variant 1.
|
||
- Defined two new macros, bli_is_ind_packed() and bli_is_nat_packed(),
|
||
to detect pack_t schemas for induced methods and native execution,
|
||
respectively.
|
||
|
||
commit 30e5eb29e060b97752f702d2ea5d101d950f53b2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 13 12:14:19 2015 -0600
|
||
|
||
Minor changes to treatment of rs, cs in bli_obj.c.
|
||
|
||
Details:
|
||
- Applied a patch submitted by Devin Matthews that:
|
||
- implements subtle changes to handling of somewhat unusual cases of
|
||
row and column strides to accommodate certain tensor cases, which
|
||
includes adding dimension parameters to _is_col_tilted() and
|
||
_is_row_tilted() macros,
|
||
- simplifies how buffers are sized when requesting BLIS-allocated
|
||
objects,
|
||
- re-consolidates bli_adjust_strides_*() into one function, and
|
||
- defines 'restrict' keyword as a "nothing" macro for C++ and pre-C99
|
||
environments.
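Illustrative sketch (not part of the commit): defining 'restrict' as an empty macro where the keyword is unavailable could be written as below. The exact preprocessor test used in BLIS is not shown in the log, so this condition is an assumption, and sk_axpy() is a hypothetical function that exists only to use the qualifier.

```c
/* 'restrict' is a keyword only in C99 and later; C++ and pre-C99 compilers
   do not recognize it, so define it away there. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || \
    (__STDC_VERSION__ < 199901L)
#define restrict
#endif

/* y := y + alpha * x. The qualifier promises that x and y do not alias. */
void sk_axpy(int n, double alpha, const double *restrict x,
             double *restrict y)
{
    int i;
    for (i = 0; i < n; ++i) y[i] += alpha * x[i];
}
```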
|
||
|
||
commit f0a4f41b5acf55b41707ec821c4c5f9076dfbc24
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Nov 12 15:22:50 2015 -0600
|
||
|
||
Fixed unimplemented case in core2 sgemm ukernel.
|
||
|
||
Details:
|
||
- Implemented the "beta == 0" case for general stride output for the
|
||
dunnington sgemm micro-kernel. This case had been, up until now,
|
||
identical to the "beta != 0" case, which does not work when the
|
||
output matrix has nan's and inf's. It had manifested as nan residuals
|
||
in the test suite for right-side tests of ctrsm4m1a. Thanks to Devin
|
||
Matthews for reporting this bug.
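Illustrative sketch (not part of the commit): the reason "beta == 0" needs its own branch is that 0 * NaN is NaN under IEEE arithmetic, so scaling the old output by zero does not clear it. The plain C loop below is a hypothetical stand-in for the general-stride update, not the micro-kernel itself.

```c
/* C := beta * C + alpha * AB, where AB is an m x n column-major tile and
   C has general strides (rs_c, cs_c). */
void sk_update_c(int m, int n, double alpha, const double *ab,
                 double beta, double *c, int rs_c, int cs_c)
{
    int i, j;
    for (j = 0; j < n; ++j) {
        for (i = 0; i < m; ++i) {
            double *cij = c + i * rs_c + j * cs_c;
            double  t   = alpha * ab[i + j * m];

            if (beta == 0.0)
                *cij = t;                    /* overwrite: never read C */
            else
                *cij = beta * (*cij) + t;    /* accumulate into C       */
        }
    }
}
```

With the separate branch, an output buffer that initially holds nan or inf values is overwritten cleanly instead of propagating them into the result.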
|
||
|
||
commit 42810bbfa0b8f006ecc5128d903909ec13ea63f9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Nov 12 12:07:46 2015 -0600
|
||
|
||
Fixed minor bugs for uncommon obj_create cases.
|
||
|
||
Details:
|
||
- Separated bli_adjust_strides() into _alloc() and _attach() flavors so
|
||
that the latter can avoid a test performed by the former, in which the
|
||
rs and cs are overridden and set to zero if either matrix dimension is
|
||
zero. Actually, we also disable this overriding behavior, even for the
|
||
_alloc() case, since keeping the original strides (probably) does not
|
||
hurt anything. The original code has been kept commented-out, though,
|
||
in case an unintended consequence is later discovered.
|
||
- Fixed a typo in an error check for general stride cases where rs == cs.
|
||
|
||
commit 3e6dd11467643fbc2cb45c13cec8dd6024232833
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Nov 3 10:30:08 2015 -0600
|
||
|
||
Minor re-expression in quadratic partitioning code.
|
||
|
||
Details:
|
||
- Minor change to quadratic equation solution code that avoids
|
||
recomputation of the sqrt() parameter when the compiler is not
|
||
smart enough to perform this optimization automatically.
|
||
|
||
commit 0694b722f7e4df00efb32639095a2aca80e67f52
|
||
Merge: 3e116f0a 33557ecc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 2 17:24:25 2015 -0600
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit 3e116f0a2953f50b3c068759a775ad7ffae04e49
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 2 17:18:23 2015 -0600
|
||
|
||
Fixed imaginary bug in quadratic partitioning code.
|
||
|
||
Details:
|
||
- Fixed a bug in the relatively new quadratic partitioning code that,
|
||
under the right conditions, would perform sqrt() on a negative value.
|
||
If the solution is imaginary, we discard it and use an alternate
|
||
partition width that assumes no diagonal intersection. That alternate
|
||
width is actually already computed, so, the fix was quite simple.
|
||
Thanks to Devangi Parikh for reporting this bug.
|
||
|
||
commit 33557ecccaf49b2569b7f3d7bcea52c2aab94c68
|
||
Author: Jeff Hammond <jeff.science@gmail.com>
|
||
Date: Mon Nov 2 12:18:43 2015 -0800
|
||
|
||
add Travis CI build status icon to the README
|
||
|
||
commit 4a502fbe77bd0f701108baaa559d9cfb483f88de
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 2 13:28:34 2015 -0600
|
||
|
||
Laid groundwork for runtime memory pool resizing.
|
||
|
||
Details:
|
||
- Changed bli_pool_finalize() so that the freeing begins with the block
|
||
at top_index instead of block 0. This allows us to use the function
|
||
for terminal finalization as well as temporary cleanup prior to
|
||
reinitialization. Also, clear the pool_t struct upon _pool_finalize()
|
||
in case it is called in the terminal case with some blocks still
|
||
checked out to threads (in which case the threads will see the new
|
||
block size as 0 and thus release the block as intended).
|
||
- Added bli_pool_reinit(), which calls _pool_finalize() followed by
|
||
_pool_init() with new parameters.
|
||
- Added bli_mem_reinit(), which is based on bli_pool_reinit().
|
||
- Added new wrapper, _mem_compute_pool_block_sizes(), which calls
|
||
_mem_compute_pool_block_sizes_dt().
|
||
- Updated bli_mem_release() so that the pblk_t is freed, via
|
||
_pool_free_block(), if the block size recorded in the mem_t at the
|
||
time the pblk_t was acquired is now different from the value in the
|
||
pool_t.
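Illustrative sketch (not part of the commit): the release-time check described above can be modeled with a small stack-based pool. All names are hypothetical, the layout is simplified (BLIS tracks a top_index into an array of pblk_t), and no locking is shown. The point is only that a block checked out before a reinitialization is freed on release rather than returned to a pool whose block size has changed.

```c
#include <stdlib.h>
#include <string.h>

#define SK_POOL_MAX 8

typedef struct {
    void  *blocks[SK_POOL_MAX]; /* available blocks: blocks[0..top-1] */
    int    top;
    size_t block_size;
} sk_pool_t;

typedef struct {
    void  *buf;
    size_t size;                /* pool block size at checkout time */
} sk_mem_t;

void sk_pool_init(sk_pool_t *pool, int num_blocks, size_t block_size)
{
    int i;
    pool->top        = 0;
    pool->block_size = block_size;
    for (i = 0; i < num_blocks && i < SK_POOL_MAX; ++i)
        pool->blocks[pool->top++] = malloc(block_size);
}

/* Frees only the blocks currently in the pool, then clears the struct so
   that block_size reads as 0 for any block still checked out. */
void sk_pool_finalize(sk_pool_t *pool)
{
    while (pool->top > 0) free(pool->blocks[--pool->top]);
    memset(pool, 0, sizeof(*pool));
}

void sk_pool_reinit(sk_pool_t *pool, int num_blocks, size_t block_size)
{
    sk_pool_finalize(pool);
    sk_pool_init(pool, num_blocks, block_size);
}

int sk_mem_acquire(sk_pool_t *pool, sk_mem_t *mem)
{
    if (pool->top == 0) return -1;
    mem->buf  = pool->blocks[--pool->top];
    mem->size = pool->block_size;
    return 0;
}

void sk_mem_release(sk_pool_t *pool, sk_mem_t *mem)
{
    if (mem->size != pool->block_size || pool->top == SK_POOL_MAX)
        free(mem->buf);                       /* stale block: discard */
    else
        pool->blocks[pool->top++] = mem->buf; /* current block: reuse */
    mem->buf = NULL;
}
```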
|
||
|
||
commit 37e55ca39bdbddaec03ad30d43e8ad2b3e549c96
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 30 18:25:04 2015 -0500
|
||
|
||
Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
|
||
|
||
Details:
|
||
- Fixed a family of bugs in the triangular level-3 operations for
|
||
certain complex implementations (3m1 and 4m1a) that only manifest if
|
||
one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
|
||
- Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
|
||
for the triangular case.
|
||
- Fixed the incorrect computation of imaginary stride, as stored in
|
||
the auxinfo_t struct in trmm and trsm macro-kernels.
|
||
- Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
|
||
cases where the register blocksize for the triangular matrix is
|
||
odd. Introduced a new byte-granular pointer arithmetic macro,
|
||
bli_ptr_add(), that computes the correct value.
|
||
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
|
||
terms of __typeof__, which is used by bli_ptr_add() macro.
|
||
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
|
||
for singleton problems because the inherent ambiguity of whether a
|
||
scalar is row-stored or column-stored causes the wrong parameter
|
||
combination code to be executed (by dumb luck of our checking for
|
||
row storage first).
|
||
- Added commented-out debugging lines to 3m1/4m1a and reference
|
||
micro-kernels, and trsm_ll macro-kernel.
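Illustrative sketch (not part of the commit): a byte-granular pointer-add macro built on __typeof__ might be written as follows. The names sk_ptr_add, sk_dcomplex, and sk_advance_reals are hypothetical and the real bli_ptr_add() may be defined differently. The example shows why ordinary pointer arithmetic falls short: advancing a complex pointer by an odd number of real elements is not a whole number of complex elements.

```c
#include <stddef.h>

typedef struct { double real; double imag; } sk_dcomplex;

/* Advance ptr by nbytes bytes while keeping its pointer type.
   __typeof__ is a compiler extension (gcc, clang, icc). The argument must
   be a pointer, not an array name. */
#define sk_ptr_add(ptr, nbytes) \
    ((__typeof__(ptr))((char *)(ptr) + (nbytes)))

/* Advance a complex pointer by a count of real (double) elements, which
   may be odd and therefore not expressible as "p + k". */
sk_dcomplex *sk_advance_reals(sk_dcomplex *p, size_t num_reals)
{
    return sk_ptr_add(p, num_reals * sizeof(double));
}
```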
|
||
|
||
commit 46294d80e5a79c598e200e1c8ec2a642ff839971
|
||
Merge: d3159c57 a0a7b85a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Oct 27 12:41:23 2015 -0500
|
||
|
||
Merge pull request #35 from figual/master
|
||
|
||
Fixed incomplete code in the double precision ARMv8 microkernel.
|
||
|
||
commit a0a7b85ac3e157af53cff8db0e008f4a3f90372c
|
||
Author: Francisco Igual <figual@ucm.es>
|
||
Date: Tue Oct 27 08:59:15 2015 +0000
|
||
|
||
Fixed incomplete code in the double precision ARMv8 microkernel.
|
||
|
||
commit d3159c5740c9ee7f8c0b661003aab6f00646ad6f
|
||
Merge: b489152e 7e03e45b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 21 14:54:00 2015 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit b489152e112644ec3b6d19e687231a9607f7694f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 21 14:53:17 2015 -0500
|
||
|
||
Use vzeroall in haswell micro-kernels.
|
||
|
||
commit 7e03e45bfe6c27c4fdbf06b1caa7f49e9a5fef49
|
||
Merge: 77ddb0b1 4f88c29f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 14 13:26:07 2015 -0500
|
||
|
||
Merge pull request #33 from xianyi/master
|
||
|
||
Enable Travis CI
|
||
|
||
commit 4f88c29f9e634cbb6fb22d8c88931f0ec78ad7db
|
||
Author: Zhang Xianyi <traits.zhang@gmail.com>
|
||
Date: Wed Oct 14 12:57:50 2015 -0500
|
||
|
||
Detect Intel Broadwell (using Haswell config).
|
||
|
||
commit 4b0ac1a9984a93f7ad4369b10fca63991107d9f5
|
||
Merge: fe3e355c 77ddb0b1
|
||
Author: Zhang Xianyi <traits.zhang@gmail.com>
|
||
Date: Wed Oct 14 12:51:05 2015 -0500
|
||
|
||
Merge branch 'upstream_master'
|
||
|
||
commit 77ddb0b1d31ada111dadf392766ba6d9210ed9fb
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Oct 13 12:53:06 2015 -0500
|
||
|
||
Removed flop-counting mechanism.
|
||
|
||
Details:
|
||
- Removed the optional flop-counting feature introduced in commit
|
||
7574c994.
|
||
|
||
commit 276da366187460a4c8e6e0910e79cb39ce780bfe
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 12 11:43:03 2015 -0500
|
||
|
||
Minor formatting change to README.md.
|
||
|
||
commit d17057446f5404824478e8a6cd08f242ab75544a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 12 11:39:49 2015 -0500
|
||
|
||
Added "Getting Started" section to README.md.
|
||
|
||
Details:
|
||
- Added section to README.md file containing links to wikis with brief
|
||
descriptions.
|
||
|
||
commit e7e1f2f7b601b21b50e3cdad8972cb3fe11018d3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 2 16:51:52 2015 -0500
|
||
|
||
Minor updates to CREDITS, README files.
|
||
|
||
commit 55329906ecd7ce1ab910e4d30a29354a9172e7ea
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Sep 26 20:47:19 2015 -0500
|
||
|
||
Minor edits to README.md, testsuite.
|
||
|
||
Details:
|
||
- Fixed typos in README.md.
|
||
- Fixed column heading alignment for testsuite when matlab output is
|
||
enabled.
|
||
- Minor updates to test/3m4m/runme.sh and test/3m4m/Makefile.
|
||
|
||
commit bbebdb5793a8fd6aaf257012ab0272beaa04a0de
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Sep 25 14:47:27 2015 -0500
|
||
|
||
Replaced README with README.md.
|
||
|
||
Details:
|
||
- Replaced the old (and short) README file with a much more comprehensive
|
||
version written in github-flavored markdown. The new file is based on
|
||
content taken from the old Google Code homepage.
|
||
|
||
commit e2e9d64a63485461192d9c2a6dd0183a8b71013c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Sep 24 12:14:03 2015 -0500
|
||
|
||
Load balance thread ranges for arbitrary diagonals.
|
||
|
||
Details:
|
||
- Expanded/updated interface for bli_get_range_weighted() and
|
||
bli_get_range() so that the direction of movement is specified in the
|
||
function name (e.g. bli_get_range_l2r(), bli_get_range_weighted_t2b())
|
||
and also so that the object being partitioned is passed instead of an
|
||
uplo parameter. Updated invocations in level-3 blocked variants, as
|
||
appropriate.
|
||
- (Re)implemented bli_get_range_*() and bli_get_range_weighted_*() to
|
||
carefully take into account the location of the diagonal when computing
|
||
ranges so that the area of each subpartition (which, in all present
|
||
level-3 operations, is proportional to the amount of computation
|
||
engendered) is as equal as possible.
|
||
- Added calls to a new class of routines to all non-gemm level-3 blocked
|
||
variants:
|
||
bli_<oper>_prune_unref_mparts_[mnk]()
|
||
where <oper> is herk, trmm, or trsm and [mnk] is chosen based on which
|
||
dimension is being partitioned. These routines call a more basic
|
||
routine, bli_prune_unref_mparts(), to prune unreferenced/unstored
|
||
regions from matrices and simultaneously adjust other matrices which
|
||
share the same dimension accordingly.
|
||
- Simplified herk_blk_var2f, trmm_blk_var1f/b as a result of the
new pruning routines.
|
||
- Fixed incorrect blocking factors passed into bli_get_range_*() in
|
||
bli_trsm_blk_var[12][fb].c
|
||
- Added a new test driver in test/thread_ranges that can exercise the new
|
||
bli_get_range_*() and bli_get_range_weighted_*() under a range of
|
||
conditions.
|
||
- Reimplemented m and n fields of obj_t as elements in a "dim"
|
||
array field so that dimensions could be queried via index constant
|
||
(e.g. BLIS_M, BLIS_N). Adjusted/added query and modification
|
||
macros accordingly.
|
||
- Defined mdim_t type to enumerate BLIS_M and BLIS_N indexing values.
|
||
- Added bli_round() macro, which calls C math library function round(),
|
||
and bli_round_to_mult(), which rounds a value to the nearest multiple
|
||
of some other value.
|
||
- Added miscellaneous pruning- and mdim_t-related macros.
|
||
- Renamed bli_obj_row_offset(), bli_obj_col_offset() macros to
|
||
bli_obj_row_off(), bli_obj_col_off().
|
||
|
||
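The equal-area goal described above can be illustrated with a small sketch. It assumes a square lower-triangular matrix partitioned left to right, and it ignores blocksize multiples and diagonal offsets. tri_range_l2r() is a hypothetical name, not the bli_get_range_weighted_*() implementation.

```c
#include <assert.h>

/* Hypothetical sketch: split the n columns of an n x n lower-triangular
   matrix among n_way threads so that each column range holds roughly
   the same number of stored elements. Column j stores n - j elements. */
static void tri_range_l2r(int n, int n_way, int tid, int *start, int *end)
{
    assert(n >= 0 && n_way > 0 && tid >= 0 && tid < n_way);

    long total = (long)n * (n + 1) / 2; /* stored elements overall */
    long area  = 0;                     /* elements assigned so far */
    int  j     = 0;                     /* next unassigned column */
    int  s     = 0;                     /* start of the current range */

    for (int t = 0; t <= tid; ++t) {
        /* Cumulative area that threads 0..t should cover together. */
        long target = total * (t + 1) / n_way;
        s = j;
        while (j < n && area < target) {
            area += n - j;
            ++j;
        }
    }
    *start = s;
    *end   = j;
}
```

With n = 8 and two threads, this sketch assigns columns [0,3) (21 stored elements) to thread 0 and columns [3,8) (15 elements) to thread 1, whereas an even split by width would assign 26 and 10.
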
commit fe3e355c9c5a6f65b8736b009e2d501b62a83ea1
Merge: efa641e3 4dd9dd3e
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Fri Aug 21 14:38:36 2015 -0500

Merge branch 'upstream_master'

commit efa641e36b73abee34166a252e90e28a6281d92d
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 03:15:50 2015 +0800

Try to fix the compiling bug on travis.

commit 4dd9dd3e1de626b51bfe85d9ee65f193d60e8d38
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Aug 21 11:52:37 2015 -0500

Fixed minor alignment ambiguity bug in bli_pool.c.

Details:
- Fixed a typecasting ambiguity in bli_pool_alloc_block() in which
pointer arithmetic was performed on a void* as if it were a byte
pointer (such as char*). Some compilers may have already been
interpreting this situation as intended, despite the sloppiness.
Thanks to Aleksei Rechinskii for reporting this issue.
- Redefined pointer alignment macros to typecast to uintptr_t instead of
siz_t.

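The two points above (byte arithmetic through char* rather than void*, and alignment computed on uintptr_t) can be sketched as follows. The helper names are hypothetical and are not the BLIS macros; the alignment is assumed to be a power of two.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: round a pointer up to the next multiple of
   `align`, which is assumed to be a power of two. The address is
   manipulated as a uintptr_t, the integer type intended to hold a
   converted pointer value. */
static void *align_ptr_up(void *p, size_t align)
{
    assert(align != 0 && (align & (align - 1)) == 0);
    uintptr_t addr = (uintptr_t)p;
    uintptr_t mask = (uintptr_t)align - 1;
    return (void *)((addr + mask) & ~mask);
}

/* Hypothetical helper: advance a pointer by a byte count. The cast to
   char* is needed because arithmetic on void* is not standard C, even
   though some compilers accept it as an extension. */
static void *byte_offset(void *p, size_t nbytes)
{
    return (char *)p + nbytes;
}
```
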
commit 12ffd568b04feda57147c13b67717416a01c82f8
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Sat Aug 22 00:24:28 2015 +0800

Add Travis CI.

commit ecc3ebb749e0861c27deda52b5f87236ede4901b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:12 2015 -0500

CHANGELOG update (0.1.8)

commit 47caa33485b91ea6f2a5e386e61210c90c5f489f (tag: 0.1.8)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 29 13:31:09 2015 -0500

Version file update (0.1.8)

commit ef0fbbbdb6148b96938733fce72cb4ed7dad685e
Merge: fdfe14f1 d4b89136
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:54:54 2015 -0500

Merge branch 'master' of github.com:flame/blis

commit fdfe14f1e17ba5a2f8dfa0bdb799c6b0e730211b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 9 13:52:39 2015 -0500

Added support for Intel Haswell/Broadwell.

Details:
- Added sgemm and dgemm micro-kernels, which employ 256-bit AVX vectors
and FMA instructions. (Complex support is currently provided by the
default induced method, 4m1a.)
- Added a 'haswell' configuration, which uses the aforementioned kernels.
- Inserted auto-detection support for haswell configuration in
build/auto-detect/cpuid_x86.c.
- Modified configure script to explicitly echo when automatic or manual
configuration is in progress.
- Changed beta scalar in test_gemm.c module of test suite from -1.0 to 0.9.

commit d4b891369c1eb0879ade662ff896a5b9a7fca207
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 7 10:06:53 2015 -0500

Added 'carrizo' configuration.

Details:
- Added a new configuration for AMD Excavator-based hardware also known
as Carrizo when referring to the entire APU. This configuration uses
the same micro-kernels as the piledriver, but with different
cache blocksizes.

commit 0b7255a642d56723f02d7ca1f8f21809967b8515
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:50 2015 -0500

CHANGELOG update (0.1.7)

commit 267253de8a7be546ce87626443ee38701c1d411f (tag: 0.1.7)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 12:01:49 2015 -0500

Version file update (0.1.7)

commit 7cd01b71b5e757a6774625b3c9f427f5e7664a76
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 19 11:31:53 2015 -0500

Implemented dynamic allocation for packing buffers.

Details:
- Replaced the old memory allocator, which was based on statically-
allocated arrays, with one based on a new internal pool_t type, which,
combined with a new bli_pool_*() API, provides a new abstract data
type that implements the same memory pool functionality but with blocks
from the heap (i.e., malloc() or equivalent). Hiding the details of the
pool in a separate API also allows for a much simpler bli_mem.c family
of functions.
- Added a new internal header, bli_config_macro_defs.h, which enables
sane defaults for the values previously found in bli_config. Those
values can be overridden by #defining them in bli_config.h the same
way kernel defaults can be overridden in bli_kernel.h. This file most
resembles what was previously a typical configuration's bli_config.h.
- Added a new configuration macro, BLIS_POOL_ADDR_ALIGN_SIZE, which
defaults to BLIS_PAGE_SIZE, to specify the alignment of individual
blocks in the memory pool. Also added a corresponding query routine to
the bli_info API.
- Deprecated (once again) the micro-panel alignment feature. Upon further
reflection, it seems that the goal of more predictable L1 cache
replacement behavior is outweighed by the harm caused by non-contiguous
micro-panels when k % kc != 0. I honestly don't think anyone will even
miss this feature.
- Changed bli_ukr_get_funcs() and bli_ukr_get_ref_funcs() to call
bli_cntl_init() instead of bli_init().
- Removed query functions from bli_info.c that are no longer applicable
given the dynamic memory allocator.
- Removed unnecessary definitions from configurations' bli_config.h files,
which are now pleasantly sparse.
- Fixed incorrect flop counts in addv, subv, scal2v, scal2m testsuite
modules. Thanks to Devangi Parikh for pointing out these
miscalculations.
- Comment, whitespace changes.

commit 9848f255a3bab17d1139c391cca13ff3f1ffe6ed
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 19:14:22 2015 -0500

Added early return to API-level _init() routines.

Details:
- Added conditional code that returns early from the API-level _init()
routines if the API is already initialized. Actually meant for this to
be included in 5f93cbe8.

commit 5f93cbe870f3478870e15581e7fd450dad5bba1e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 11 18:52:12 2015 -0500

Introduced API-level initialization.

Details:
- Added API-level initialization state to _const, _error, _mem, _thread,
_ind, and _cntl APIs. While this functionality will mostly go unused,
adding minuscule overhead at init-time, there will be at least one
instance in the near future where, in order to avoid an infinite loop,
a certain portion of the initialization will call a query function that
itself attempts to call bli_init(). API-level initialization will allow
this later stage to verify that an earlier stage of initialization has
completed, even if the overall call to bli_init() has not yet returned.
- Added _is_initialized() functions for each API, setting the underlying
bool_t during _init() and unsetting it during _finalize().
- Comment, whitespace changes.

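The per-API state described above can be sketched as a flag that _init() sets, _finalize() clears, and _is_initialized() reports. This is a minimal sketch under assumed names (demo_mem_*), not the BLIS source; it also includes an early return from _init() when the API is already initialized, and it is not thread-safe.

```c
#include <stdbool.h>

/* Hypothetical per-API state: one flag per API. */
static bool demo_mem_is_init    = false;
static int  demo_mem_init_count = 0;   /* counts real initializations */

bool demo_mem_is_initialized(void)
{
    return demo_mem_is_init;
}

void demo_mem_init(void)
{
    /* Early return: a second call is a no-op. */
    if (demo_mem_is_initialized()) return;

    ++demo_mem_init_count;     /* stand-in for the real setup work */
    demo_mem_is_init = true;
}

void demo_mem_finalize(void)
{
    if (!demo_mem_is_initialized()) return;

    demo_mem_is_init = false;  /* stand-in for the real teardown work */
}
```

A later initialization stage could call demo_mem_is_initialized() to confirm that this stage has completed without re-entering the top-level initializer.
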
commit ee129c6b028bc5ac88da7c74fde72c49803742ff
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 10 12:53:28 2015 -0500

Fixed bugs in _get_range(), _get_range_weighted().

Details:
- Fixed some bugs that only manifested in multithreaded instances of
some (non-gemm) level-3 operations. The bugs were related to invalid
allocation of "edge" cases to thread subpartitions. (Here, we define
an "edge" case to be one where the dimension being partitioned for
parallelism is not a whole multiple of whatever register blocksize
is needed in that dimension.) In BLIS, we always require edge cases
to be part of the bottom, right, or bottom-right subpartitions.
(This is so that zero-padding only has to happen at the bottom, right,
or bottom-right edges of micro-panels.) The previous implementations
of bli_get_range() and _get_range_weighted() did not adhere to this
implicit policy and thus produced bad ranges for some combinations of
operation, parameter cases, problem sizes, and n-way parallelism.
- As part of the above fix, the functions bli_get_range() and
_get_range_weighted() have been renamed to use _l2r, _r2l, _t2b,
and _b2t suffixes, similar to the partitioning functions. This is
an easy way to make sure that the variants are calling the right
version of each function. The function signatures have also been
changed slightly.
- Comment/whitespace updates.
- Removed unnecessary '/' from macros in bli_obj_macro_defs.h.

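The edge-case policy above can be sketched for an unweighted left-to-right partitioning: every thread receives a whole number of blocks, and only the last (rightmost) range absorbs the leftover edge. The function name and the choice to hand extra blocks to the lowest-numbered threads are assumptions made for illustration, not the bli_get_range_l2r() implementation.

```c
#include <assert.h>

/* Hypothetical sketch: partition n columns among n_way threads in
   units of the blocking factor bf. The remainder n % bf (the "edge")
   always lands in the last thread's range. */
static void range_l2r(int n, int bf, int n_way, int tid, int *start, int *end)
{
    assert(n >= 0 && bf > 0 && n_way > 0 && tid >= 0 && tid < n_way);

    int n_blk   = n / bf;        /* number of whole blocks */
    int n_left  = n % bf;        /* edge columns */
    int per_thr = n_blk / n_way; /* blocks every thread receives */
    int n_extra = n_blk % n_way; /* threads receiving one extra block */

    /* Threads 0..n_extra-1 take one extra block each (an assumption). */
    int blk_start = tid * per_thr + (tid < n_extra ? tid : n_extra);
    int blk_count = per_thr + (tid < n_extra ? 1 : 0);

    *start = blk_start * bf;
    *end   = (blk_start + blk_count) * bf;

    /* Only the rightmost range absorbs the edge. */
    if (tid == n_way - 1) *end += n_left;
}
```

For n = 22, bf = 4, and three threads, the sketch yields [0,8), [8,16), and [16,22); only the last range has a length that is not a multiple of 4.
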
commit 9135dfd69d39f3bbd75034f479f27a78dbfebcce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jun 5 13:37:44 2015 -0500

Minor updates to test/3m4m files.

commit d62ceece943b20537ec4dd99f25136b9ba2ae340
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:56:45 2015 -0500

Minor update to test/3m4m/runme.sh.

Details:
- Removed some stale script code that should have been removed
during 590bb3b8c.

commit b6ee82a3d421c9c4f1eb6848c7c6e37aa46de799
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jun 3 12:14:23 2015 -0500

Minor cleanup to bli_init() and friends.

Details:
- Spun-off initialization of global scalar constants to bli_const_init()
and of threading stuff to bli_thread_init().
- Added some missing _finalize() functions, even when there is nothing
to do.

commit 1213f5cebabc1637ce9dd45c4bfa87bb93677c29
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jun 2 13:27:47 2015 -0500

POSIX thread bugfixes/edits to bli_init.c, _mem.c.

Details:
- Fixed a sort-of bug in bli_init.c whereby the wrong pthread mutex
was used to lock access to initialization/finalization actions.
But everything worked out okay as long as bli_init() was called by
single-threaded code.
- Changed to static initialization for memory allocator mutex in
bli_mem.c, and moved mutex to that file (from bli_init.c).
- Fixed some type mismatches in bli_threading_pthreads.c that resulted
in compiler warnings.
- Fixed a small memory leak with allocated-but-never-freed (and unused)
pthread_attr_t objects.
- Whitespace changes to bli_init.c and bli_mem.c.

commit 590bb3b8c5c0389159c5a9451b6c156c5f237e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun May 24 16:02:53 2015 -0500

Backed-out adjusted dim changes to test/3m4m.

Details:
- Reverted most changes applied during commit ec25807b.

commit ec25807b26da943868f0d0517c3720e50181b8f9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 10 13:23:50 2015 -0500

Tweaks to test/3m4m to test with adjusted dims.

Details:
- Updated test/3m4m driver files to build test drivers that allow
comparison of real "asm_blis" results to complex "asm_blis" results,
except with the latter's problem sizes adjusted so that problems are
generated with equal flop counts.

commit 426b6488580a92bf071a62dc319a9c837ce39821
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 8 15:12:21 2015 -0500

Fixed a packing bug that manifested in trsm_r.

Details:
- Fixed a bug that caused a memory leak in the contiguous memory
allocator. Because packm_init() was using simple aliasing when
a subpartition object was marked as zeros by bli_acquire_mpart_*(),
the "destination" pack object's mem_t entry was being overwritten
by the corresponding field of the "source" object (which was likely
NULL). This prevented the block from being released back to the
memory allocator. But this bug only manifested when changing the
location of packing B from outside the var1 loop to inside the
var3 loop, and only for trsm with triangular B (side = right). The
bug was fixed by changing the type of alias used in packm_init()
when handling zero partition cases. Specifically, we now use
bli_obj_alias_for_packing(), which does not clobber the destination
(pack) object's mem_t field. Thanks to Devangi Parikh for this bug
report.

commit c84286d5cef48f16d83831baac1f46b9856b9a36
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 4 15:39:14 2015 -0500

More minor tweaks to test/3m4m.

Details:
- Added a line of output that forces matlab to allocate the entire array
up-front.
- Re-enabled real domain benchmarks in runme.sh, which were temporarily
disabled.

commit 309717c8ebf4ef1369f15cf41340e13c25b41573
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 19:28:49 2015 -0500

More tweaks to test/3m4m, configurations.

Details:
- Fixed incorrect number of mc_x_kc memory blocks in
sandybridge/bli_config.h.
- Enabled OpenMP multithreading in piledriver/bli_config.h.
- More updates to test/3m4m driver files.

commit 4baf3b9c69b2f648be9e46e07ccc9859dd675828
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 16:44:32 2015 -0500

Tweaked test/3m4m driver, including acml support.

Details:
- Added ACML support to test/3m4m driver Makefile and runme.sh script.

commit a32f7c49ca4ea869d2a6c66818780f4321743d67
Merge: 349e075a 4bfd1ce8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 3 08:28:11 2015 -0500

Merge pull request #23 from xianyi/master

Add auto-detecting CPU on configure stage.

commit 349e075ad6a8e2a1211d94f36d24828c9d44b052
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 2 18:12:28 2015 -0500

Tweaks to sandybridge config, test/3m4m driver.

Details:
- Enable OpenMP support by default in sandybridge's bli_config.h.
- Reorganized sandybridge's bli_kernel.h.
- Updated 3m4m Makefile, runme.sh to also test MKL implementation.

commit 4bfd1ce8ca93f93d170dd2715f0a32027b417b46
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:40:21 2015 -0500

Detect NEON for cortex-a9 and cortex-a15.

commit aa6eec4f43137057276fe6119bdbfb5c52682527
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Thu Apr 2 16:03:44 2015 -0500

Detect the CPU architecture. Support ARM cores.

Detect the CPU architecture by compiler's predefined macros.
Then, detect the CPU cores.

Support detecting x86 and ARM architectures.

commit 2947cfb749c937b0f62fac36cc92f123bd45b53c
Author: Zhang Xianyi <traits.zhang@gmail.com>
Date: Wed Apr 1 12:24:00 2015 -0500

Add auto-detecting CPU on configure stage.
e.g. /Path_to_BLIS/configure auto

For now, it only supports detecting x86 CPUs.

commit 26a4b8f6f985597f80e0174990bf541f1d9bafac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 1 10:44:54 2015 -0500

Implemented 3m2, 3m3 induced algorithms (gemm only).

Details:
- Defined a new "3ms" (separated 3m) pack schema and added appropriate
support in packm_init(), packm_blk_var2().
- Generalized packm_struc_cxk_3mi to take the imaginary stride (is_p)
as an argument instead of computing it locally. Exception: for trmm,
is_p must be computed locally, since it changes for triangular
packed matrices. Also exposed is_p in interface to dt-specific
packm_blk_var2 (and _var1, even though it does not use imaginary
stride).
- Renamed many functions/variables from _3mi to _3mis to indicate that
they work for either interleaved or separated 3m pack schemas.
- Generalized gemm and herk macro-kernels to pass in imaginary stride
rather than compute them locally.
- Added support for 3m2 and 3m3 algorithms to frame/ind, including 3m2-
and 3m3-specific virtual micro-kernels.
- Added special gemm macro-kernels to support 3m2 and 3m3.
- Added support for 3m2 and 3m3 to testsuite.
- Corrected the type of the panel dimension (pd_) in various macro-
kernels from inc_t to dim_t.
- Renamed many functions defined in bli_blocksize.c.
- Moved most induced-related macro defs from frame/include to
frame/ind/include.
- Updated the _ukernel.c files so that the micro-kernel function pointers
are obtained from the func_t objects rather than the cpp macros that
define the function names.
- Updated test/3m4m driver, Makefile, and run script.

commit ddf62ba7d2da08225b201585b85e06c967767dea
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:27:51 2015 -0500

Refuse to free the packm thread info if it uses the single threaded version

commit 016fc587584d958a0e430a56a5e2c05022ac2f17
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:23:02 2015 -0500

Don't free packm thread info if it is null

commit 00a443c529a60862a57b93e303a0b3212c9b1df4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 27 14:11:07 2015 -0500

Use bli_malloc instead of malloc for the thread info paths

commit f1a6b7d02861ccebdc500ea98778cc0f6cddad17
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 15:37:10 2015 -0500

Reorganized code for induced complex methods.

Details:
- Consolidated most of the code relating to induced complex methods
(e.g. 4mh, 4m1, 3mh, 3m1, etc.) into frame/ind. Induced methods
are now enabled on a per-operation basis. The current "available"
(enabled and implemented) implementation can then be queried on
an operation basis. Micro-kernel func_t objects as well as blksz_t
objects can also be queried in a similar manner.
- Redefined several micro-kernel and operation-related functions in
bli_info_*() API, in accordance with above changes.
- Added mr and nr fields to blksz_t object, which point to the mr
and nr blksz_t objects for each cache blocksize (and are NULL for
register blocksizes). Renamed the sub-blocksize field "sub" to
"mult" since it is really expressing a blocksize multiple.
- Updated bli_*_determine_kc_[fb]() for gemm/hemm/symm, trmm, and
trsm to correctly query mr and nr (for purposes of nudging kc).
- Introduced an enumerated opid_t in bli_type_defs.h that uniquely
identifies an operation. For now, only level-3 id values are defined,
along with a generic, catch-all BLIS_NOID value.
- Reworked testsuite so that all induced methods that are enabled
are tested (one at a time) rather than only testing the first
available method.
- Reformatted summary at the beginning of testsuite output so that
blocksize and micro-kernel info is shown for each induced method
that was requested (as well as native execution).
- Reduced the number of columns needed to display non-matlab
testsuite output (from approx. 90 to 80).

commit 8d5169ccda954e5f72944308a036dcb7ebfc9097
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 18 11:38:08 2015 -0500

Fixed bug in release of mem_t buffer.

Details:
- Fixed a bug that affects all level-2 and level-3 blocked variants. The
bug only manifested, however, if the packing of operands (A and B in
gemm, for example) spanned multiple nodes in the control tree. Until
recently, the main consumers of packm were level-3 operations, all of
which packed both input operands from blocked variant 1 (B outside of
the loop, and A within the loop). This particular usage masked a flaw
in the code whereby bli_obj_release_pack() would always release the
underlying mem_t buffer (provided it was allocated), even if the buffer
was not allocated in the current variant. This has been fixed by
replacing all calls to bli_obj_release_pack() with calls to a new
function, bli_packm_release(), which takes the same control tree node
argument passed into the object's corresponding call to packm_init()
or packv_init(). bli_packm_release() then proceeds to invoke
bli_obj_release_pack() only if the control tree node indicates that
packing was requested. Thanks to Devangi Parikh for identifying this
bug.

commit c0acca0f5182ba96fd39c9d10b34a896a6e74206
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 3 10:56:22 2015 -0600

Clarified comments in testsuite input.operations.

commit 03ba9a6b17861d9e1adc0cf924439c4d7e860d19
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 24 10:33:28 2015 -0600

Removed some 'old' directories.

commit a86db60ee270cdeb745ae7cf68f9e0becc9f522d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 23 18:42:39 2015 -0600

Extensive renaming of 3m/4m-related files, symbols.

Details:
- Renamed all remaining 3m/4m packing files and symbols to 3mi/4mi
('i' for "interleaved"). Similar changes to 3M/4M macros.
- Renamed all 3m/4m files and functions to 3m1/4m1.
- Whitespace changes.

commit 8cf8da291a0fb2f491f410969a76ec0fbda47faf
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 15:24:27 2015 -0600

Minor updates to induced complex mode management.

Details:
- Relocated bli_4mh.c, bli_4mb.c, bli_4m.c, bli_3mh.c, bli_3m.c (and
associated headers) from frame/base to frame/base/induced.
- Added bli_xm.? to frame/base/induced, which implements
bli_xm_is_enabled(), which detects whether ANY induced complex method
is currently enabled.
- The new function bli_xm_is_enabled() is now used in bli_info.c to
detect when an induced complex method is used, so we know when to
return blocksizes from one of the induced methods' blocksize objects.

commit 411e637ee7d1083a84f58f08938d51e63d7c3c9a
Merge: c2569b88 fc0b7712
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:39:25 2015 -0600

Merge branch 'master' of http://github.com/flame/blis

commit c2569b8803d4ccc1d7b6f391713461b51443601d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Fri Feb 20 20:38:19 2015 -0600

Fixed a memory leak in freeing the thread infos

commit fc0b771227abf86d81f505b324f69f6e83db1d8f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 20 11:47:44 2015 -0600

Added max(mr,nr) to kc in static mem pools.

Details:
- Changed the static memory definitions to compute the maximum register
blocksize for each datatype and add it to kc when computing the size
of blocks of A and B. This formally accounts for the nudging of kc
up to a multiple of mr or nr at runtime for triangular operations
(e.g. trmm).

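The sizing argument above can be checked with a little arithmetic: nudging kc up to a multiple of mr or nr adds at most max(mr,nr) - 1, so padding kc by max(mr,nr) always leaves enough room. The sketch below uses hypothetical helper names and example blocksizes; it is not the BLIS static pool definition.

```c
#include <assert.h>

static long max_l(long a, long b)
{
    return a > b ? a : b;
}

/* Hypothetical helper: round kc up to the next multiple of mult. */
static long nudge_up(long kc, long mult)
{
    assert(kc >= 0 && mult > 0);
    return ((kc + mult - 1) / mult) * mult;
}

/* Hypothetical helper: number of elements reserved for an mc x kc
   block of A when kc is padded by the larger register blocksize. */
static long block_a_elems(long mc, long kc, long mr, long nr)
{
    return mc * (kc + max_l(mr, nr));
}
```

For example, with kc = 256, mr = 6, and nr = 8, nudging kc to a multiple of mr gives 258, which fits within the padded value 256 + 8 = 264.
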
commit af32e3a608631953ef770341df10a14a991bf290
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Thu Feb 19 22:51:11 2015 -0600

Fixed a bug where get_range_weighted would return end = 0 for small problem sizes

commit 441d47542a64e131578d00da7404c1ed387a721c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 17:06:10 2015 -0600

Renamed 3m and 4m symbols/macros to 3mi and 4mi.

Details:
- Renamed several variables and macros from 3m/4m to 3mi/4mi. This is
because those packing schemas were always implicitly "interleaved".
This new naming scheme will make way for new schemas that separate
instead of interleave the real and imaginary (and summed) parts.
- Expanded the pack format sub-field of the pack schema field of the
info_t to 4 bits (from 3). This will allow for more schema types
going forward.
- Removed old _cntl.c files for herk3m, herk4m, trmm3m, trmm4m.

commit 518a1756ccf02122b96fc437b538604a597df42a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 19 14:27:09 2015 -0600

Fixed indexing bug for trmm3 via 3mh, 4mh.

Details:
- Fixed a bug that only affected trmm3 when performed via 3mh or 4mh,
whereby micro-panels of the triangular matrix were packed with "dead
space" between them due to failing to adjust for the fact that pointer
arithmetic was occurring in units of complex elements while the data
being packed consisted of real elements. It turns out that the macro-
kernel suffered from the same bug, meaning the panels were actually
being packed and read consistently. The only way I was able to
discover the bug in the first place was because the packed block of A
was overflowing into the beginning of the packed row panel of B using
the sandybridge configuration.

commit 493087d730f01d5169434f461644e5633f48a42f
Merge: 650d2a6f 25021299
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 18 09:45:51 2015 -0600

Merge branch 'master' of github.com:flame/blis

commit 25021299b670775df8ca9c87910c63d7e74ed946
Merge: fe2b8d39 f05a5763
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 20:03:21 2015 -0600

Merge branch 'master' of github.com:flame/blis

commit fe2b8d39a445ac848686e78c7540fd046cb95492
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 11 19:33:10 2015 -0600

Fixed an obscure bug in 3mh/3m/4mh/4m packing.

Details:
- Modified bli_packm_blk_var1.c and _var2.c to increase the triangular
case's panel increment by 1 if it would otherwise be odd. This is
particularly necessary in _var2.c when handling the interleaved 3m
or ro/io/rpi pack schemas, since division of an odd number by 2 can
happen if both the panel length and the panel packing dimension
(register packing blocksize) are odd, thus making their product odd.
- Modified bli_packm_init.c so that panel strides are increased by 1
if they would otherwise be odd, even for non-3m related packing.
- Modified the trmm and trsm macro-kernels so that triangular packed
micro-panels are traversed with this new "increment by 1 if odd"
policy.
- Added sanity checks in trmm and trsm macro-kernels that would result
in an abort() if the conditions that would lead to a "divide odd
integer by 2" scenario ever manifest.
- Defined bli_is_odd(), _is_even() macros in bli_scalar_macro_defs.h.

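The "increment by 1 if odd" policy above amounts to the following arithmetic. This is a minimal sketch with hypothetical function names; the bli_is_odd() and bli_is_even() macros themselves are not reproduced here.

```c
#include <assert.h>

/* Hypothetical stand-in for an "is odd" test on an integer. */
static int is_odd(long a)
{
    return a % 2 != 0;
}

/* Hypothetical sketch: the panel stride is the product of the panel
   length and the panel packing dimension, bumped to the next even
   value so that a later division by 2 is exact. */
static long panel_stride(long panel_len, long panel_dim)
{
    assert(panel_len >= 0 && panel_dim >= 0);
    long ps = panel_len * panel_dim;
    if (is_odd(ps)) ps += 1;
    return ps;
}
```

The product is odd only when both factors are odd (for example 3 * 5 = 15, which becomes 16); any even factor already yields an even stride.
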
commit 650d2a6ff2e593151a296ca86b5214afcc747afc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 9 14:59:20 2015 -0600

Added initial support for imaginary stride.

Details:
- Added an imaginary stride field ("is") to obj_t.
- Renamed bli_obj_set_incs() macro to bli_obj_set_strides().
- Defined bli_obj_imag_stride() and bli_obj_set_imag_stride() and
added invocations in key locations.
- Added some basic error-checking related to imaginary stride.
- For now, imaginary stride will not be exposed into the most-used
BLIS APIs such as bli_obj_create(), and certainly not the
computational APIs such as bli_dgemm().

commit f05a57634a7c8e3864b25b3335d1194c1ea1aeb9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Feb 8 19:40:34 2015 -0600

Defined gemm cntl function to query ukrs func_t.

Details:
- Added a new function, bli_gemm_cntl_ukrs(), that returns the func_t*
for the gemm micro-kernels from the leaf node of the control tree.
This allows all the func_t* fields from higher-level nodes in the tree
to be NULL, which makes the function that builds the control trees
slightly easier to read.
- Call bli_gemm_cntl_ukrs() instead of the cntl_gemm_ukrs() macro in
all bli_*_front() functions (which is needed to apply the row/column
preference optimization).
- In all level-3 bli_*_cntl_init() functions, changed the _obj_create()
function arguments corresponding to the gemm_ukrs fields in higher-
level cntl tree nodes to NULL.
- Removed some old her2k macro-kernels.

commit cefd3d5d2001264de17cf63dae541f890cb9daaf
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 5 11:09:12 2015 -0600

A couple of functions were incorrectly ifdeffed away on Xeon Phi. Fixed this

commit 7574c9947d57a19f613880e3b9f62f8c8f6df4ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 4 12:11:55 2015 -0600

Added basic flop-counting mechanism (level-3 only).

Details:
- Added optional flop counting to all level-3 front-ends, which is
enabled via BLIS_ENABLE_FLOP_COUNT. The flop count can be
reset at any time via bli_flop_count_reset() and queried via
bli_flop_count(). Caveats:
- flop counts are approximate for her[2]k, syr[2]k, trmm, and
trsm operations;
- flop counts ignore extra flops due to non-unit alpha;
- flop counts do not account for situations where beta is zero.

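A running counter with reset and query entry points, as described above, might look like the following. This is a sketch under stated assumptions: the demo_* names are hypothetical, the gemm count uses the conventional 2*m*n*k real flops (8*m*n*k in the complex domain), and, like the caveats above, it ignores alpha and beta.

```c
#include <assert.h>

/* Hypothetical global accumulator; not thread-safe. */
static double demo_flops = 0.0;

void demo_flop_count_reset(void)
{
    demo_flops = 0.0;
}

double demo_flop_count(void)
{
    return demo_flops;
}

/* Hypothetical hook a gemm front-end could call: adds 2*m*n*k flops
   for the real domain, or four times that for the complex domain. */
void demo_flop_count_gemm(long m, long n, long k, int is_complex)
{
    assert(m >= 0 && n >= 0 && k >= 0);
    double flops = 2.0 * (double)m * (double)n * (double)k;
    if (is_complex) flops *= 4.0;
    demo_flops += flops;
}
```
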
commit ceda4f27d1f1bcf19320e09848e0f2e3b9941e6c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jan 29 13:22:54 2015 -0600

Implemented bli_obj_imag_equals().

Details:
- Implemented a new function, bli_obj_imag_equals(), which compares the
imaginary part of the first argument to the second argument, which may
be a BLIS_CONSTANT or of a regular real datatype.

commit 81114824a05a9053229efd577a8a94a856deda93
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jan 6 12:15:21 2015 -0600

Minor 4m/3m consolidation to mem_pool_macro_defs.h.

Details:
- Merged the 4m and 3m definitions in bli_mem_pool_macro_defs.h to
reduce code and improve readability.

commit 36a9b7b7436d9423ba4de2a9f85cfcd43577b783
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Dec 17 21:53:50 2014 +0000

reduced the default number of MC by KC blocks for bgq

commit c60619c7c3568f044a849abbab60209aa7455423
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 17:08:22 2014 -0600

Minor tweaks for 3m4m test drivers.

Details:
- Changed gemm_kc blocksizes to be reduced by two-thirds instead of
half.
- Changed 3m4m/test_gemm.c driver to divide by 3 instead of 2 when
computing the fixed k dimension.
- Fixed runme.sh so that it would use multiple threads for s/dgemm
cases.

commit c6929ba6a5e6f633a7295e979a2b8df8c7ecdb1b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 16 11:27:50 2014 -0600

Added 4m_1b to test/3m4m test driver and script.

commit 785d480805fc0d6f4251b5499933515740b6b2a7
Merge: 9456f330 4156c088
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:34:19 2014 -0600

Merge branch 'master' of github.com:flame/blis

commit 9456f330af4617f9ee32972d51f974aa2d84f97b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 12 14:31:57 2014 -0600

Added 4m_1b implementation for gemm.

Details:
- Added yet another 4m-based implementation for complex domain level-3
operations. This method, which the 3m/4m paper identifies as Algorithm
"4m_1b", fissures the first loop around the micro-kernel so that the
real sub-panel of the current micro-panel of B is multiplied against
(both sub-panels of) all micro-panels of A, before doing the same for
the imaginary sub-panel of the micro-panel of B. For now, only gemm is
supported, and 4m_1b (labeled "4mb" within the framework) is not yet
integrated into the test suite.

commit 4156c0880d9aea4ff04a9c4fa139ba8c437d8bfb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 9 16:03:14 2014 -0600

Fixed obscure level-2 packing / general stride bug.

Details:
- Fixed a bug in certain structured level-2 operations that manifested
  only when the structured matrix was provided to BLIS as a matrix stored
  with general stride. The bug was introduced in c472993b when the
  densify field was removed from the packm control tree node and
  associated APIs. Since then, the packed object was unconditionally
  marked with an uplo field of BLIS_DENSE. This is fine for level-3
  operations where micro-panels are always densified, but in level-2
  contexts, the underlying unblocked variant (fused or unfused) of
  structured operations (e.g. trmv) still needs to know whether to
  execute its "lower" or "upper" branches of code. Since this field
  was unconditionally being set to BLIS_DENSE, the unblocked variants
  always executed the "else" branch, which happened to be the
  "lower" case code. Thus, running an upper case produced the wrong
  answer. This most obviously manifested in the form of failures for
  trmm, trmm3, and trsm in the test suite.
  The bug was fixed by setting the packed object's uplo field to
  BLIS_DENSE only if the schema indicated that micro-panels were to be
  packed. Otherwise, we can assume we are packing to regular row or
  column storage, as is the case with level-2 packing. Thanks to
  Francisco Igual for reporting the testsuite failures and ultimately
  leading us to this bug.

commit 689f60a578b461119e9ea90c74f642b9eb79addb
Merge: bef24e67 483e4d6a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Dec 7 14:03:30 2014 -0600

Merge pull request #21 from figual/master

Adding armv8a configuration and micro-kernels.

commit 483e4d6a3fdbef9d9ab47fb674c9476c70ca9f0f
Author: Francisco D. Igual <figual@ucm.es>
Date: Sun Dec 7 20:27:49 2014 +0100

Adding armv8a configuration and micro-kernels.

Only sgemm micro-kernel is fully functional at this point.

commit bef24e67e0f93579c2a80315348dc2e227f72a72
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 18:00:56 2014 -0600

Fixed a type of race condition exposed by pthreads implementation.
Lead thread of the inner thread communicator could exit subproblem, move on to the next iteration of the loop and modify a1_pack, b1_pack, or c1_pack while other threads were still using those.

Barriers were inserted to fix this.

commit 76bde44411f0e34266bab9d666a54ef22be97320
Merge: e56e6143 f3d729e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:25:24 2014 -0600

Merge branch 'master' of github.com:flame/blis

commit f3d729e504ec012e7dc7e02b2ecd42e004c6894d
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 22:25:24 2014 -0600

Added static mutex to bli_init and bli_finalize

commit d71cc797866ff502ad1127527016f463267eef80
Author: Tyler Michael Smith <tms@cs.utexas.edu>
Date: Wed Nov 26 21:35:39 2014 -0600

Refactored bli_threading files and added support for pthreads

commit e56e61438ff7fcf25a48c0b7603f18df782b50b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Nov 26 17:20:35 2014 -0600

Minor cleanups to bli_threading.h and friends.

Details:
- No longer need to define BLIS_ENABLE_MULTITHREADING manually in
  bli_config.h; it now gets defined when BLIS_ENABLE_OPENMP or
  BLIS_ENABLE_PTHREADS is defined.
- Added sanity check to prevent both BLIS_ENABLE_OPENMP and
  BLIS_ENABLE_PTHREADS from being enabled simultaneously.
- Reorganization of bli_threading*.h header files, which led to
  simplification of threading-related part of blis.h.
- Added "-fopenmp -lpthread" to LDFLAGS of sandybridge make_defs.mk
  file.

commit 3be2744cbe2c56d38c23fd818aa5c1f10cc7ea51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Nov 21 12:28:08 2014 -0600

Update to template gemm ukernel comments.

Details:
- Updated comments on alignment of a1 and b1 to match wiki.

commit 994429c6881b2ade92d9d7949bcaebfbf2cc65eb
Merge: 58796abd 694029d9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 20 13:55:35 2014 -0600

Merge pull request #20 from TimmyLiu/master

#define PASTEF773 required by cblas compatibility layer

commit 694029d9d7db857d642ab536955c0621791108c8
Author: Timmy <timmy.liu@amd.com>
Date: Wed Nov 19 15:25:14 2014 -0600

#define PASTEF773 required by cblas compatibility layer

commit 58796abda66b133346f8d523b39178afc336351f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Nov 6 14:31:52 2014 -0600

Removed KC constraint comments from _kernel.h files.

Details:
- Since 4674ca8c, the constraint that KC be a multiple of both MR and
  NR has been relaxed, and thus it was time to remove the comments
  from the top of the bli_kernel.h files of all configurations.

commit 7bbc95a54f706d43c7f7951f0e5995f86130cd52
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 29 10:52:23 2014 -0500

Added new piledriver micro-kernels.

Details:
- Added new micro-kernels for the AMD piledriver architecture (one
  for each datatype).
- Updates and tweaks to piledriver configuration.
- Added 3xk packm micro-kernel support.
- Explicitly unrolled some of the smaller packm micro-kernels.
- Added notes to avx/sandybridge and piledriver micro-kernel files
  acknowledging the influence of the corresponding kernel code in
  OpenBLAS.

commit 59613f1d5500f6279963327db2fbc84bc9135183
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 17:21:37 2014 -0500

Added separate micro-panel alignment for A and B.

Details:
- Changed the recently-added micro-panel alignment macros so that we now
  have two sets--one for micro-panels of matrix A and one for
  micro-panels of matrix B: BLIS_UPANEL_[AB]_ALIGN_SIZE_?.
- Store each set of alignment values into a separate blksz_t object in
  bli_gemm_cntl_init().
- Adjusted packm_init() to use the separate alignment values.
- Added query routines for the new alignment values to bli_info.c.
- Modified test suite output accordingly.

commit a8e12884ee1fddd3fd77ca5a68aa0cb857f3af57
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:48 2014 -0500

CHANGELOG update (0.1.6)

commit 38ea5022e4ed846112198c4e1672fcdaeb90dc71 (tag: 0.1.6)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:35:45 2014 -0500

Version file update (0.1.6)

commit a3e6341bdb0e28411f935d6b4708a6389663e004
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 11:13:28 2014 -0500

Factored common code from blocksize functions.

Details:
- Split bli_determine_blocksize_[fb]() into two functions each, the
  newer ones ending with the _sub suffix. These new sub-functions are
  now called from bli_[gemm|trmm|trsm]_determine_kc_[fb](), which
  eliminates redundant code and will allow any future tweaks to the
  core sub-functions to automatically be inherited by the
  operation-specific versions.

commit 4674ca8cffb58331ff7edf23bbe0e3f6a7558489
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 23 10:50:59 2014 -0500

Extended newly relaxed KC to hemm, symm.

Details:
- These changes were intended for the previous commit.
- Defined bli_gemm_determine_kc_[fb]() and bli_gemm_determine_kc_[fb](),
  which determine blocksizes for gemm-based operations, taking special
  care to "nudge" the kc dimension up to a multiple of MR or NR for
  hemm and symm operations, as needed.
- Changed bli_gemm_blk_var3f.c to call bli_gemm_determine_kc_f()
  instead of bli_determine_blocksize_f().
- Comment updates to bli_trmm_blocksize.c, bli_trsm_blocksize.c.

commit ab954ba6f874eaca7b001804491f866ef6b9b327
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Oct 22 17:21:58 2014 -0500

Relaxed constraint that KC be multiple of MR, NR.

Details:
- Relaxed a long-held requirement in register blocksizes that required
  the kernel programmer to choose a KC that was divisible by both MR
  and NR. This was very constraining on some architectures that did not
  use register blocksizes that were powers of two. The constraint is
  now enforced only for trmm and trsm, where it is needed, and it is
  now handled by "nudging" kc upward at runtime, if necessary, to be a
  multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](),
  which determine blocksizes for trmm and trsm, taking special care to
  "nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]()
  instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension
  unmodified if the dimension multiple is zero (to avoid division by
  zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from
  bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.

commit 95cdae65d6b88e043ee14bcd53cd2e800d7aecb4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 22 16:30:16 2014 -0500

Fixed bug in KNC microkernel where k=0 and beta != 1

commit e64dba5633fc49b768b5edc7762f2b5d8a4d0588
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Oct 20 19:23:06 2014 -0500

Re-implemented micro-panel alignment.

Details:
- This commit re-implements a feature that was removed in commit
  c2b2ab62. It was removed because, at the time, I wasn't sure how the
  micro-panel alignment feature would interact with the 4m method (when
  applied at the micro-kernel level), and so it seemed safer to disable
  the feature entirely rather than allow possible breakage. This commit
  revisits the issue and safely re-implements the feature in a way that
  is compatible with 4m, 3m, 4mh, and 3mh (and native execution).
- Modified the static memory pool to account for micro-panel alignment
  space.
- Modified packm_init and blocked variants to align whole micro-panels
  by a datatype-specific alignment value that may be set by the
  configuration. (If it is not set by the configuration, it will default
  to BLIS_SIZEOF_?.)
- Modified macro-kernels so that:
  - storage stride is handled properly given the new micro-panel
    alignment behavior;
  - indexing through 3m/4m/rih-type sub-panels, as is done by trmm and
    trsm, is more robust (e.g. will work if the applicable packing
    register blocksize is odd);
  - imaginary strides are computed and stored within auxinfo_t structs,
    which allows the virtual micro-kernels to more easily determine how
    to index into the micro-panel operands.
- Modified virtual 3m and 4m micro-kernels to use the imaginary strides
  within the auxinfo_t structs instead of panel strides.
- Deprecated the panel stride fields from the auxinfo_t structs.
- Updated test suite to print out the micro-panel alignment values.

commit add16b0e5402924301e7078e4ca5e3ef725bff0b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:49:24 2014 -0500

Added 3m4m test driver subdir of 'test'.

Details:
- Added a modified test driver for [cz]gemm that will test all 3m/4m
  as well as assembly-based and OpenBLAS implementations of gemm
  in single and multithreaded modes.

commit e171504a72406c61a173241d8bccf0a5ceb10582
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:25:59 2014 -0500

Use correct definition of bli_is_last_iter().

Details:
- As intended for previous commit, the new definition of
  bli_is_last_iter() is now disabled in favor of the old
  definition.

commit 0d954087b2b55d2f5f3c5e57d702b318ca2300f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 17 11:19:34 2014 -0500

Minor changes and fixes.

Details:
- Redefined bli_is_last_iter() to take thread_id and num_thread
  arguments, which allows the macro to correctly compute whether a
  given iteration is the last that the thread will compute in that
  particular loop. The new definition, however, remains disabled
  (commented out) until someone can look at this more closely, as
  the new definition seems to actually hurt performance slightly.
- Whitespace and related updates to level-3 macro-kernels.
- Updated test suite so that performance results in the hundreds of
  gigaflops do not disrupt the column alignment of the output.

commit d1e86e1876e433f54b501ec5a005b4ba7c5ce4e6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 13:43:47 2014 -0500

More minor tweaks to sandybridge/avx micro-kernel.

Details:
- Re-enabled use of b_next for dgemm and cgemm micro-kernels.

commit 7b6fe4cae57cb22c09c1a97595e1a201a02cbcd2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Oct 12 12:01:51 2014 -0500

Minor tweaks to sandybridge/avx micro-kernels.

Details:
- Changed the MC blocksize for zgemm micro-kernel from 128 to 64.
- Removed usage of b_next in all x86_64/avx gemm micro-kernels.

commit a6a156e9feec47154e7a0fd43bcc006b1fc04aba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 14:26:41 2014 -0500

Added cgemm ukernel for avx/sandybridge.

Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.

commit 6f8575ab2580e167a022293b76ddf0514f71b613
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Oct 10 10:01:45 2014 -0500

Added zgemm ukernel for avx/sandybridge.

Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.

commit 23ce7ee542a12ca40b4b6090ad2558d180e16d37
Merge: 99fd9a39 7a8ad47f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:41:22 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit 99fd9a39718cb7281f6fb23f9fef7cca4fe514f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 9 16:38:04 2014 -0500

Fixed two minor bugs.

Details:
- Fixed a bug in the test suite for the trsm_ukr and gemmtrsm_ukr test
  modules whereby the uplo bits of some packed matrix objects were not
  being set properly, resulting in false FAILURE results for those
  tests. Thanks to Tyler Smith for bringing this issue to my attention.
- Fixed a bug in bli_obj_alloc_buffer() that caused an unnecessary
  "not yet implemented" abort() when creating a 1x1 object with non-unit
  strides.

commit 7a8ad47fb2d100a9da93aa8cab774fcceeaab733
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Oct 8 15:52:13 2014 -0500

Minor changes to knc configuration, including a preference for row-major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0

commit 76b7c34af0c09f47d9615b18857a356acddc788a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 14:15:38 2014 -0500

Fixed a bug in the pack schema-related bit macros.

Details:
- Expanded the BLIS_PACK_SCHEMA_BITS value in bli_type_defs.h to
  include all six bits presently used in the pack schema bitfield of
  the info field of obj_t structs. Prior to this commit, the macro
  constant only included the lowest five bits, which excluded the
  "is or is not packed" bit. This manifested as a strange bug in
  probably many level-2 codes that invoked packing, though we only
  observed it in ger before fixing. Thanks to Devin Matthews for
  finding and reporting this bug.

commit a5763e332226598d70c47dfa9cad4578e15ef5f4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Oct 2 13:28:17 2014 -0500

Added extra output to bli_obj_print().

Details:
- Print extra values from info field of obj_t struct within
  bli_obj_print().

commit 9bba209fc44fbfce943ba6a51cd8278a0cb6b159
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 29 14:56:36 2014 -0500

Fixed bug when packing anywhere besides in blk_var_1 for gemm.

commit 614a4afc9272adb47e5a8b83b39d56c2804d95d6
Merge: b541b667 4a7df04e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Sep 26 10:49:57 2014 -0500

Merge branch 'master' of http://github.com/flame/blis

commit 4a7df04e8a4ffdb9561d26426afd35e4fe15b013
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:06:15 2014 -0500

Added 30xk support for packm ukernels.

Details:
- Updated bli_kernel_*_macro_defs.h headers to include default
  definitions for 30xk packm kernels.
- Extended function pointer arrays in bli_packm_cxk_*() out to 31 and
  included 30xk kernels.
- Added 30xk kernels to frame/1m/packm/ukernels/bli_packm_ref_cxk_*.c.

commit b6d4bd792e0d44ce4b28afef343f5ff3ba89c285
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 22 16:02:37 2014 -0500

Fixed missing tabs from Makefile patch.

commit 32630f9b6f0d5ba28d5b56dae4c7288a37158743
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 17:18:20 2014 -0500

Comment update to virtual micro-kernels.

commit 13447cffead7c6d137a7a3ccbf9e552ed0477467
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Sep 19 13:00:48 2014 -0500

Minor bugfix to top-level Makefile.

Details:
- Applied a patch that allows the top-level Makefile to work on certain
  systems. The patch simply separates out the source-to-object code
  generation rules for .c and .S files into two separate rules. Thanks
  to Devin Matthews for submitting this patch.

commit e80a4537846416719c067ae08a53aeda978c572d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:24:20 2014 -0500

Fixed bug introduced by bugfix in 25b258d.

Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
  a+lda because in the latter case, alignment could cancel out and
  still allow the optimized code to run when it shouldn't. Thanks
  to Devin for pointing this out.

commit 25b258d61f9c8cee64e922f4131784b6edb196dd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 10:10:49 2014 -0500

Fixed a non-fatal problem with bugfix in a68b316c.

Details:
- The bugfix in a68b316c was inadvertently checking alignment of the
  leading dimension itself, rather than the byte size of the leading
  dimension. Now, we simply check alignment of a+lda.

commit 96302d4fc81363410e41c3a3c43a65df44d97ad9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 18 09:43:40 2014 -0500

Renamed bli_info_get_*_ukr_type() functions.

Details:
- Added _string() suffix to bli_info_get_*_ukr_type() function names.
  This makes them consistent with the bli_info_get_*_impl_string()
  functions.

commit a68b316ca4852509f84ed50e01afac486bf70f58
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 17 11:10:07 2014 -0500

Fixed alignment bugs in level-1f kernels.

Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
  were attempting to compute problems with unaligned leading dimensions
  with optimized code, rather than (correctly) using the reference
  implementations. Thanks to Devin Matthews for reporting this bug.

commit 870761eb902e4866090d1d3446a345df3d6d4599
Merge: e9899be0 a2b59a37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:20:49 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit e9899be09044829e23386bd73e394f1dd7778210
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 16 18:19:32 2014 -0500

Added high-level implementations of 4m, 3m.

Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at
  high levels, respectively. APIs for trmm and trsm were NOT added due
  to the fact that these approaches are inherently incompatible with
  implementing 4m or 3m at high levels (because the input right-hand
  side matrix is overwritten).
- Added 4mh, 3mh virtual micro-kernels, and updated the existing 4m and
  3m so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware)
  to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and
  real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted
  level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in
  that order) and execute the first one that is enabled, or the native
  implementation if none are enabled.
- Added implementation query functions for each level-3 operation so
  that the user can query a string that describes the implementation
  that is currently enabled.
- Updated test suite to output implementation types for each level-3
  operation, as well as micro-kernel types for each of the five
  micro-kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing
  type) whereby the diagonal elements of the packed micro-panels could
  get tainted if the source matrix's imaginary diagonal part contained
  garbage.

commit a2b59a37f166f70a6dd5793db2530823ef590c2b
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:44:44 2014 -0500

Fixed make defs so that they actually compile for bulldozer

commit 86fc7e40764f78ec217f50216ef4fa5b57dbfbc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Sep 15 10:35:46 2014 -0500

Added bulldozer configuration and updated piledriver micro-kernel

commit 0644e61a79a57f136be5f4c47b9099cff2af06e0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:55:34 2014 -0500

Minor updates to bli_packm_init.c.

commit 9dc9b44a057a08e20ad4d423344f0ecad54c1eb2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 12:03:28 2014 -0500

Renamed bli_obj_pack_status() to _pack_schema().

Details:
- Renamed the bli_obj_pack_status() macro to bli_obj_pack_schema() in
  order to help avoid confusion as to what the macro returns.

commit cf5efdde0588a0d5b6ea57fe7d7be5000be06f8e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Sep 11 11:47:56 2014 -0500

Pass pack_t schemas into ukernels via auxinfo_t.

Details:
- Modified macro-kernels to pass the pack_t schema values for matrices
  A and B into the datatype-specific functions, where they are now
  inserted into a newly-expanded auxinfo_t struct. This gives the
  micro-kernels access to the pack_t schema values embedded in the
  control trees, which determine the precise format into which the
  matrix elements are packed.
- Updated a call to bli_packm_init_pack() in src/test_libblis.c to
  remove densify argument. Meant to include this in commit c472993b.

commit cc8d2b82775cca3c2d51bf427f4e77c8024a6d15
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:48:22 2014 -0500

Updated old test drivers in 'test'.

commit c472993bbccb69e9ffc409c79b742426c8ad2ad4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Sep 9 13:42:04 2014 -0500

Removed densify argument to packm_cntl_obj_create().

Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
  This argument was inserted very early in BLIS's development, when it
  was anticipated that the developer may sometimes wish to pack a
  Hermitian, symmetric, or triangular matrix without making it dense.
  But as it turns out, if we are packing a matrix, we always want to
  make it dense in some way or another due to the fact that the
  micro-kernel only multiplies dense micro-panels. Thus, unless/until
  there is a real need for the feature, it seems reasonable to remove it
  from the packm_cntl API.

commit 5c43ee387146cd76dc59b730dac6683a8446b834
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 15:19:29 2014 -0500

Moved trmm4m/3m_cntl files to 'old' directory.

Details:
- Meant to include this in previous commit.

commit 7b2f469d5465ed73b1ca88124bc9a1987388aa27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 8 14:49:50 2014 -0500

Retired trmm_t control tree definitions, usage.

Details:
- Replaced all trmm_t control tree instances and usage with that of
  gemm_t. This change is similar to the recent retirement of the herk_t
  control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
  assume that k is a multiple of MR (when A is triangular) or NR (when
  B is triangular). This means that bottom-right micro-panels packed for
  trmm will have different zero-padding when k is not already a multiple
  of the relevant register blocksize. While this creates a seemingly
  arbitrary and unnecessary distinction between trmm and trsm packing,
  it actually allows trmm to be handled with one control tree, instead
  of one for left and one for right side cases. Furthermore, since only
  one tree is required, it can now be handled by the gemm tree, and thus
  the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
  multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
  of which are to facilitate above-mentioned changes whereby k is no
  longer required to be a multiple of register blocksize when packing
  triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.

commit 576e9e9255a79dba9cd3c804267f51e0b4aa6e8a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Sep 7 16:12:52 2014 -0500

Retired herk_t control tree definitions, usage.

Details:
- Replaced all herk_t control tree instances and usage with that of
  gemm_t, since the two types presently have the same fields. This means
  that herk, her2k, syrk, and syr2k can simply use the gemm control tree
  as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.

commit b2fed052c9a23d858ef0afbe220b342bce9aa7f7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 17:07:25 2014 -0500

Minor code cleanup to bli_packm_struc_cxk*.c

Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
  Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
  equal to rs_p and cs_p.

commit 023ce770966b3b5a98bba729c5af1f45e15ebb97
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Sep 3 10:47:53 2014 -0500

Minor update to packm_cxk kernels.

Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
  respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
  functions. This makes the code a little easier to read since "m" and
  "n" have connotations that are not applicable here.
- Comment updates.

commit 189def3667d9218adbeec45e2801fd074341a679
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 16:23:17 2014 -0500

Retired portions of bli_kernel_3m/4m_macro_defs.h.

Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
  4m/3m-specific blocksizes after realizing that this can be done in
  bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
  are used.
- The maximum cache values for 4m/3m are still needed when computing mem
  pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
  definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
  bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
  terms of the regular register blocksizes are now in place.

commit af521ee6f2a77d61c98b833e85c09969987bc00d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Sep 1 14:06:46 2014 -0500

Changed semantics of blocksize extensions.

Details:
- Changed semantics of cache and register blocksize extensions so that
  the extended values are tracked, rather than just the marginal
  extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
  that these "max" query routines grab the maximum value for cache
  blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.

commit 07f23aefd52f5ba4960dbd46e59b180a2136b8e9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Aug 31 11:58:50 2014 -0500

Pass pack schema into packm_struc_cxk*().

Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
  the pack_t schema. This allows the implementation to more easily
  determine how the micro-panel is stored (row-stored column panel
  or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.

commit f032ba9b1186cb02184574d339565f53d733aa42
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Aug 30 16:21:20 2014 -0500
|
||
|
||
Reorganized packm implementation.
|
||
|
||
Details:
|
||
- Reorganized packm variants and structure-aware kernels so that all
|
||
routines for a given pack format (4m, 3m, regular) reside in a single
|
||
file.
|
||
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work for
both 4m and 3m, and adjusted 4m/3m _cntl_init() functions accordingly.
|
||
- Added a new packm_ker_t function pointer type to bli_kernel_type_defs.h
to facilitate function pointer typecasting in the datatype-specific
packm_blk_var2() functions.
|
||
- Deprecated _blk_var3.
|
||
- Fixed a bug in the triangular micro-panel packing facility that
|
||
affected trmm and trmm3 with unit diagonals.
|
||
|
||
commit c6793cecb70788bdf2c76ab8102504ea97be9d2a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 28 17:14:48 2014 -0500
|
||
|
||
Reorganized #includes for scalar macro headers.
|
||
|
||
Details:
|
||
- Reordered the #include statements in bli_scalar_macro_defs.h so that
|
||
conventional, ri-, and ri3-based macros are grouped together.
|
||
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.
|
||
|
||
commit b4da8907284345be4374f87a88679c4886ab866e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 28 14:10:32 2014 -0500
|
||
|
||
Whitespace, comments updates on packm_blk_var?.c.
|
||
|
||
commit 46e46a1d83da586c3dd9fd7a01eb16067abbaee1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 28 12:05:45 2014 -0500
|
||
|
||
Minor updates to packm blocked, cxk_3m/4m code.
|
||
|
||
Details:
|
||
- Added 'const' qualifier to inlined packing code that handles
|
||
micro-panel packing that is too large for an existing packm ukernel.
|
||
- Comment updates.
|
||
|
||
commit 908dc688b5979995eaacb3aa937f241551a8df00
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 28 11:55:12 2014 -0500
|
||
|
||
Pass pack schema into blocked packm routines.
|
||
|
||
Details:
|
||
- Rather than passing the packm blocked routines a boolean value that
|
||
represents whether the matrix is being packed to row or column storage,
|
||
we now pass in the pack schema itself.
|
||
|
||
commit a0ff6066e06075ab5f92b19247b39b92ed15f1bf
|
||
Merge: c4c99c48 d40b32bc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 15:56:21 2014 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit c4c99c4813bf9817592a7899c5d33412fe22313f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 15:52:22 2014 -0500
|
||
|
||
Renamed packm scalar from beta to kappa.
|
||
|
||
Details:
|
||
- The packm implementation (i.e. source files in frame/1m/packm and
|
||
frame/1m/packm/ukernels) interchangeably used the names "beta" and
|
||
"kappa" to refer to the optional scalar to be applied during packing.
|
||
This commit renames all uses of "beta" to be "kappa", since "beta"
|
||
sometimes evokes the scalar specifically on the output matrix of a
|
||
level-2 or level-3 operation.
|
||
|
||
commit d40b32bc24ffbae24123e054307b3138969bb095
|
||
Merge: 9331f794 6c25c379
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 13:46:36 2014 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit 6c25c379fadb50834146e1614f7b80c093c2aad0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 13:44:10 2014 -0500
|
||
|
||
Consolidated unpackm ukernels into single file.
|
||
|
||
Details:
|
||
- Reorganized unpackm ukernels into a single file,
|
||
bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
|
||
ukernels in commit 4cc2b46.
|
||
|
||
commit 9331f79443223fe267676ee54c439e1ed320380c
|
||
Merge: 7fc48a7d 670b6392
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 10:54:21 2014 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit 670b63926a7f4fc694abc5b1582ef8a4f367f5a8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 24 10:46:27 2014 -0500
|
||
|
||
Added whitespace to bli_obj_scalar_ routine calls.
|
||
|
||
Details:
|
||
- Added extra spaces to align arguments of
|
||
bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
|
||
the fact that the function was previously named
|
||
bli_obj_init_scalar_copy_of() and the name change, performed in
|
||
b444489f, was done via recursive sed commands which left subsequent
|
||
lines untouched.
|
||
|
||
commit 7fc48a7d920e07fd8e9528ab2565123f8f4e67f9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Aug 23 16:50:58 2014 -0500
|
||
|
||
Combined 4m/3m bits into an expanded bitfield.
|
||
|
||
Details:
|
||
- Combined the 4m/3m bits into an expanded bitfield, which will encode
|
||
the packing "format" of the micro-panels. This will allow for more
|
||
easily and compactly encoding additional formats.
|
||
- Other minor comment/whitespace updates to bli_type_defs.h.
|
||
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
|
||
format bitfield.
|
||
- Comment update to bli_kernel_post_macro_defs.h.
|
||
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.
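
A multi-bit "format" subfield of the kind described above might look like the sketch below. The macro names, the field width, and the bit position are all assumptions chosen for illustration; they are not the definitions in bli_type_defs.h.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: a 2-bit "format" subfield starting at bit 2. */
#define DEMO_FMT_SHIFT    2
#define DEMO_FMT_MASK     ( 0x3u << DEMO_FMT_SHIFT )

/* One value per packing format, instead of one bit per format. */
#define DEMO_FMT_REGULAR  ( 0x0u << DEMO_FMT_SHIFT )
#define DEMO_FMT_4M       ( 0x1u << DEMO_FMT_SHIFT )
#define DEMO_FMT_3M       ( 0x2u << DEMO_FMT_SHIFT )

/* Replace the format subfield, leaving all other bits untouched. */
uint32_t demo_info_set_format( uint32_t info, uint32_t fmt )
{
    assert( ( fmt & ~DEMO_FMT_MASK ) == 0 );

    return ( info & ~DEMO_FMT_MASK ) | fmt;
}

/* Extract the format subfield (still shifted into position). */
uint32_t demo_info_get_format( uint32_t info )
{
    return info & DEMO_FMT_MASK;
}
```

With two bits, one more format value can be added without claiming another bit of the info field, which is the compactness the commit message refers to.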
|
||
|
||
commit ef0143cc1417e4815e4cafd5a464cc83fe7a1e86
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Aug 23 14:02:27 2014 -0500
|
||
|
||
Renamed _ri, _ri3 packm ukernels to _4m, _3m.
|
||
|
||
Details:
|
||
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
|
||
helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
|
||
- Updated names of cpp macros that correspond to packm ukernels.
|
||
|
||
commit b0ccac116158b5ed3316d34798748ba0c6d78672
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 21 19:21:52 2014 -0500
|
||
|
||
Cleaned up front-end layering for 4m/3m.
|
||
|
||
Details:
|
||
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
|
||
and bli_gemm4m_entry()) to hide the control trees from the code that
|
||
decides whether to execute native or 4m-based implementations. The
|
||
layering was also applied to 3m.
|
||
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
|
||
rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
|
||
the groundwork for users to be able to change at runtime which
|
||
implementation is called by the main front-ends (e.g. bli_gemm()).
|
||
- Retired some experimental gemm code that hadn't been touched in
|
||
months.
|
||
|
||
commit bedec95451cabfa7a8906b51018a5e0572998a5e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 21 18:25:48 2014 -0500
|
||
|
||
Added bli_4m API for querying 4m enabled state.
|
||
|
||
Details:
|
||
- Added bli_4m.c (and header), which defines a simple API that can be
|
||
used to query, enable, and disable 4m-based complex support in BLIS.
|
||
The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
|
||
the variable that determines the state (enabled or disabled).
|
||
- Changed bli_info*() API so that all cache and register blocksize-
|
||
related query routines return the blksz_t objects' values as they
|
||
exist at runtime, rather than return the values as determined by the
|
||
configuration system (e.g. bli_kernel.h, or defaults for those values
|
||
not specified). This sets the foundation for being able to change
|
||
those blocksizes at runtime.
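
A minimal sketch of a query/enable/disable API whose state is initialized from a compile-time macro is shown below. Every name here is a hypothetical stand-in (the demo_ prefix marks them as such); this is not the contents of bli_4m.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a configuration macro such as
   BLIS_ENABLE_?COMPLEX_VIA_4M. It only supplies the initial state. */
#ifndef DEMO_ENABLE_COMPLEX_VIA_4M
#define DEMO_ENABLE_COMPLEX_VIA_4M 0
#endif

/* The runtime state, seeded from the compile-time default. */
static bool demo_4m_state = DEMO_ENABLE_COMPLEX_VIA_4M;

void demo_4m_enable( void )     { demo_4m_state = true;  }
void demo_4m_disable( void )    { demo_4m_state = false; }
bool demo_4m_is_enabled( void ) { return demo_4m_state;  }

typedef enum { DEMO_IMPL_NATIVE, DEMO_IMPL_4M } demo_impl_t;

/* A front-end can then branch on the runtime query rather than on
   the preprocessor macro. */
demo_impl_t demo_choose_impl( void )
{
    return demo_4m_is_enabled() ? DEMO_IMPL_4M : DEMO_IMPL_NATIVE;
}
```

Because the macro is consulted only once, at initialization, the choice of implementation can later be changed without recompiling.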
|
||
|
||
commit b541b667cabfa6d41b50ad1e49209651ee6812cc
|
||
Merge: 699a8151 dd61307f
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Aug 20 14:44:51 2014 -0500
|
||
|
||
Merge branch 'master' of http://github.com/flame/blis
|
||
|
||
Conflicts:
|
||
frame/3/trsm/bli_trsm_blk_var2b.c
|
||
frame/3/trsm/bli_trsm_blk_var2f.c
|
||
|
||
commit 699a8151ca3d5021e834a1784ef45dcc3a3d17cd
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Aug 20 14:43:17 2014 -0500
|
||
|
||
Some improvements to trsm parallelism
|
||
|
||
commit dd61307f55bb6bc762fe0ef0446479d6c0536723
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 20 09:52:16 2014 -0500
|
||
|
||
Minor update to sandybridge MC_S, KC_S.
|
||
|
||
Details:
|
||
- Changed sandybridge MC and KC for single-precision real to 128 and 384,
|
||
respectively.
|
||
- Updated comments in template configuration's gemm micro-kernel file
|
||
to document the new "contiguous row preference" macro.
|
||
|
||
commit d0eec4bddd740ce360d0f655362c551287cf925b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 19 15:49:19 2014 -0500
|
||
|
||
Added optional row preference to ukernel config.
|
||
|
||
Details:
|
||
- Added the ability for the kernel developer to indicate the gemm micro-
|
||
kernel as having a preference for accessing the micro-tile of C via
|
||
contiguous rows (as opposed to contiguous columns). This property may
|
||
be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
|
||
which may be defined or left undefined. Leaving it undefined leads to
|
||
the default assumption of column preference.
|
||
- Changed conditionals in frame/3/*/*_front.c that induce transposition
|
||
of the operation so that the transposition is induced only if there
|
||
is disagreement between the storage of C and the preference of the
|
||
micro-kernel. Previously, the only conditional that needed to be met
|
||
was that C was row-stored, which is to say that we assumed the micro-
|
||
kernel preferred column-contiguous access on C.
|
||
- Added a "prefers_contig_rows" property to func_t objects, and updated
|
||
calls to bli_func_obj_create() in _cntl.c files in order to support
|
||
the above changes.
|
||
- Removed the row-storage optimization from bli_trsm_front.c because
|
||
it is actually ineffective. This is because the right-side case of
|
||
trsm flips the A and B micro-panel operands (since BLIS only requires
|
||
left-side gemmtrsm/trsm kernels), meaning any transposition done
|
||
at the high level is then undone at the low level.
|
||
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
|
||
invocation of the bli_obj_swap() macro.
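
The old and new transposition conditions can be compared in a short sketch. The function names are hypothetical; only the logic is taken from the description above.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule: induce a transposition whenever C is row-stored, which
   implicitly assumes a column-preferring micro-kernel. */
bool demo_induce_trans_old( bool c_is_row_stored )
{
    return c_is_row_stored;
}

/* New rule: induce a transposition only when the storage of C and the
   micro-kernel's preference disagree. */
bool demo_induce_trans_new( bool c_is_row_stored, bool ukr_prefers_rows )
{
    return c_is_row_stored != ukr_prefers_rows;
}
```

For a column-preferring micro-kernel (the default when the macro is left undefined) the two rules agree, so existing configurations behave as before.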
|
||
|
||
commit 4cc2b464f29cafbfef9295b073b857fe0752f710
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Aug 15 11:49:15 2014 -0500
|
||
|
||
Reorganized packm ukernels.
|
||
|
||
Details:
|
||
- Previously, packm micro-kernels were organized by the implied register
|
||
blocksize (panel dimension) assumed by the kernel, meaning conventional,
|
||
ri, and ri3 variations of some micro-kernel size were housed in the same
|
||
file. This commit reorganizes the micro-kernels so that all sizes reside
|
||
in the same file for each format type (conventional, ri, and ri3).
|
||
|
||
commit fcc10054a11b6fc3976986f57feccf741596cbf6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 13 12:32:06 2014 -0500
|
||
|
||
Tweaks to gemm4m, gemm3m virtual ukernels.
|
||
|
||
Details:
|
||
- Fixed a potential, but as-yet unobserved bug in gemm3m that would
|
||
allow undesirable inf/NaN propagation, since C was being scaled by
|
||
beta even if it was equal to zero.
|
||
- In gemm3m micro-kernel, we now avoid copying C to the temporary
|
||
micro-tile if beta is zero.
|
||
- Rearranged computation in gemm4m so that the temporary C micro-tile
|
||
is accessed less, and C is accessed only after the micro-kernel
|
||
calls. This improves performance marginally in most situations.
|
||
- Comment updates to both gemm4m and gemm3m micro-kernels.
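
Why scaling by a zero beta is a problem can be seen in a small sketch: in IEEE arithmetic 0 * NaN and 0 * inf are both NaN, so stale values in C would leak into the result. The function below is a hypothetical illustration on a flat array, not the gemm3m micro-kernel.

```c
#include <assert.h>
#include <math.h>
#include <stddef.h>

/* Compute c := beta * c + ab for n elements. When beta is zero, c is
   overwritten rather than scaled, so any inf/NaN already present in c
   cannot propagate into the result. */
void demo_update_tile( size_t n, double beta, double* c, const double* ab )
{
    size_t i;

    if ( beta == 0.0 )
    {
        for ( i = 0; i < n; ++i ) c[ i ] = ab[ i ];
    }
    else
    {
        for ( i = 0; i < n; ++i ) c[ i ] = beta * c[ i ] + ab[ i ];
    }
}
```

The zero-beta branch also never reads c, which matches the second bullet above about not copying C to the temporary micro-tile in that case.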
|
||
|
||
commit cdcbacc2fa871317c8e7ef961ecc6d70ab22dc34
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 12 12:45:38 2014 -0500
|
||
|
||
Removed redundant redef of packm ukr prototypes.
|
||
|
||
Details:
|
||
- Removed redundant macro code that redefined packm ukernel prototypes
|
||
when the previous macro was already sufficient. This helps de-clutter
|
||
the packm ukernel prototyping headers a little bit.
|
||
|
||
commit 82dac98d9032ccb598068a55ddf23d7898491e9e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 12 12:36:25 2014 -0500
|
||
|
||
Relocated packm ukernel #includes.
|
||
|
||
Details:
|
||
- Consolidated the #include statements for packm ukernel headers from
|
||
bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
|
||
bli_packm.h.
|
||
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.
|
||
|
||
commit 7f77856e25aad5fc6f172ed3e57b6351804e31a4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 12 12:20:15 2014 -0500
|
||
|
||
Removed unused 4m/3m-related packm macro defs.
|
||
|
||
Details:
|
||
- Removed unused and unneeded s- and d-flavored macro definitions for
|
||
packm ukernels related to the complex 4m and 3m methods, as
|
||
implemented in BLIS.
|
||
|
||
commit bc1d86b2d4d436b1dfba2d0098501aaca9cbb8b5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 7 19:01:20 2014 -0500
|
||
|
||
Sandy Bridge configuration, micro-kernel update.
|
||
|
||
Details:
|
||
- Minor updates to bli_config.h and bli_kernel.h for sandybridge
|
||
configuration.
|
||
- Renamed existing AVX intrinsic-based micro-kernel file to
|
||
bli_gemm_int_d8x4.c.
|
||
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
|
||
gemm micro-kernels for single- and double-precision real.
|
||
|
||
commit 98ec95877a95242e159b2bf0c879115a59e4c6e2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 7 18:28:32 2014 -0500
|
||
|
||
Corrected comment for _obj_is_[row|col]_stored().
|
||
|
||
Details:
|
||
- Fixed a mistake in the comments introduced in the previous commit for
|
||
bli_obj_is_row_stored() and bli_obj_is_col_stored().
|
||
|
||
commit 43d5e419e1b424d2143817103dbee8ead797e8aa
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 7 18:20:40 2014 -0500
|
||
|
||
Reverted _obj_is_[row|col]_stored() macros.
|
||
|
||
Details:
|
||
- Rolled back recent changes to bli_obj_is_row_stored() and
|
||
bli_obj_is_col_stored() so that those macros now only inspect the
|
||
strides (row or column). It turns out that the more sophisticated
|
||
definitions introduced in a51e32e are not necessary, because these
|
||
"obj" macros are virtually never used on packed matrices, and when
|
||
they are, they can use bli_obj_is_[row|col]_packed() macros, which
|
||
inspect the info bitfield.
|
||
|
||
commit 45692e3ad4b7e1d05ac4302398df4efce04b4284
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 7 13:21:15 2014 -0500
|
||
|
||
Reverted some accidental changes.
|
||
|
||
Details:
|
||
- Reverted some changes that were unintentionally included in the
|
||
previous commit (9526ce98). Thanks to Tony Kelman for pointing
|
||
this out. (Note: a few select changes were not reverted.)
|
||
|
||
commit 9526ce98812be908bc4915f2849b657fb6ce1b49
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 6 14:13:46 2014 -0500
|
||
|
||
Updated copyright headers of emscripten configuration files.
|
||
|
||
commit 30833ed71d56f231ddba21e632bcbbc90b12a97c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 6 12:12:03 2014 -0500
|
||
|
||
Minor edits to configurations' make_defs.mk files.
|
||
|
||
Details:
|
||
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
|
||
is defined first and then the other two are defined in terms of
|
||
CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
|
||
little easier to read.
|
||
|
||
commit 9d61afeae2ba70fe1df07e7546f6954ea83aed12
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Aug 4 16:01:59 2014 -0500
|
||
|
||
CHANGELOG update (0.1.5)
|
||
|
||
commit bde56d0ecfd0ec20330fac290b91a6dca0cf94e9 (tag: 0.1.5)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Aug 4 16:01:58 2014 -0500
|
||
|
||
Version file update (0.1.5)
|
||
|
||
commit 4c6ceea4be35d089630986eb5b959b9e97214077
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Aug 4 15:49:59 2014 -0500
|
||
|
||
Added CBLAS compatibility layer.
|
||
|
||
Details:
|
||
- Added a new section in bli_config.h files of all configurations for
|
||
enabling CBLAS support. (Currently, the default is for the CBLAS layer
|
||
to be disabled.)
|
||
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
|
||
subdirectory 'f77_sub' holds subroutine wrappers corresponding to
|
||
subroutines found in CBLAS that allow calling some BLAS routines with
|
||
the return value passed as the last argument rather than as an actual
|
||
(function) return value. This was probably intended to allow CBLAS to
|
||
avoid the whole f2c debacle altogether. However, since BLIS does not
|
||
assume the presence of a Fortran compiler, we had to provide similar
|
||
routines in C.
|
||
- A script, integrate-cblas-tarball.sh, is included to streamline the
|
||
integration of future revisions of the CBLAS source code.
|
||
- The current tarball, cblas.tgz, that was used with the above script to
|
||
generate the present set of CBLAS source code is also included.
|
||
- Updated blis.h to include necessary CBLAS-related headers.
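
The "return value as last argument" convention used by the f77_sub wrappers can be sketched in C as follows. The demo_ names are hypothetical, and the loop assumes positive increments for brevity (BLAS also defines behavior for negative increments, which is omitted here).

```c
#include <assert.h>

/* Function-style dot product: the result is the return value. */
float demo_sdot( int n, const float* x, int incx,
                        const float* y, int incy )
{
    float rho = 0.0f;
    int   i;

    assert( incx > 0 && incy > 0 );

    for ( i = 0; i < n; ++i )
        rho += x[ i * incx ] * y[ i * incy ];

    return rho;
}

/* Subroutine-style wrapper: the result is written through the last
   argument, so no function return value crosses the call boundary. */
void demo_sdot_sub( int n, const float* x, int incx,
                           const float* y, int incy, float* rho )
{
    *rho = demo_sdot( n, x, incx, y, incy );
}
```

Passing the result by address sidesteps disagreements between compilers about how function return values are handed back across a Fortran/C boundary.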
|
||
|
||
commit caab62dac0fb0bd0d674118f409c81680db94d29
|
||
Merge: 383631b5 db97ce97
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Aug 3 14:36:18 2014 -0500
|
||
|
||
Merge pull request #19 from kevinoid/fix-install-perms-error
|
||
|
||
Fix permissions error installing to non-owned directory
|
||
|
||
commit db97ce979b88c051922c2f946ce52d523c7a12c6
|
||
Author: Kevin Locke <kevin@kevinlocke.name>
|
||
Date: Sun Aug 3 12:48:04 2014 -0600
|
||
|
||
Fix permissions error installing to non-owned directory
|
||
|
||
When installing to a directory which is not owned by the installing
|
||
user, even when the user has write permission for the directory, the
|
||
installation can fail with an error similar to the following:
|
||
|
||
Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
|
||
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
|
||
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
|
||
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1
|
||
|
||
In the example case, the error occurred because the user attempted to
|
||
install to /usr/local and /usr/local/lib is owned by root with mode 2755
|
||
which the Makefile unsuccessfully attempted to change to 0755.
|
||
|
||
Given that installing to /usr/local is likely to be quite common and the
|
||
ownership/permissions are the default for Debian and Debian-derived
|
||
Linux distributions (perhaps others as well), this commit attempts to
|
||
support that use case by using mkdir rather than install to create the
|
||
directory (which is the same approach as Automake).
|
||
|
||
Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
|
||
|
||
commit 383631b514c3d42b724640f57644eea276cc418c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 31 14:51:48 2014 -0500
|
||
|
||
Redefined bit field macros with bitshift operator.
|
||
|
||
Details:
|
||
- Redefined many of the macros that define bit fields and bit values in
|
||
the obj_t info field using the bitshift operator (<<). This makes it
|
||
easier to reorder bit fields, or expand existing bit fields, or add
|
||
new fields. The bitshifting should be evaluated by the compiler at
|
||
compile-time.
|
||
|
||
commit 137143345dc93cc9a83da5ba88b25bac7502de86
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 31 12:12:45 2014 -0500
|
||
|
||
Reimplemented unit blocksize fix in prev commit.
|
||
|
||
Details:
|
||
- Instead of inferring the storage format of the micro-panels from within
|
||
the packm variants, we now pass in a bool_t value that denotes whether
|
||
the packed matrix contains row-stored column panels or column-stored
|
||
row panels. This value can then be tested more easily inside the main
|
||
packm variant loop.
|
||
- Renumbered pack_t schema values in bli_type_defs.h so that there are
|
||
now five bits, each with different meaning:
|
||
- 4: packed or not packed?
|
||
- 3: packed for 3m?
|
||
- 2: packed for 4m?
|
||
- 1: packed to panels?
|
||
- 0: stored by rows or columns?
|
||
- Added new macros that test for status of above bits in schema bit
|
||
subfield, and renamed some existing macros related to 4m/3m.
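
The five schema bits listed above could be expressed with the bitshift operator as in the sketch below. The macro and function names are hypothetical, and the sketch deliberately does not assume which value of bit 0 denotes rows and which denotes columns, since the text does not say.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical names; the bit positions follow the list above. */
#define DEMO_SCHEMA_BIT_ROWCOL  ( 1u << 0 )  /* stored by rows or columns? */
#define DEMO_SCHEMA_BIT_PANELS  ( 1u << 1 )  /* packed to panels?          */
#define DEMO_SCHEMA_BIT_4M      ( 1u << 2 )  /* packed for 4m?             */
#define DEMO_SCHEMA_BIT_3M      ( 1u << 3 )  /* packed for 3m?             */
#define DEMO_SCHEMA_BIT_PACKED  ( 1u << 4 )  /* packed or not packed?      */

/* Test macros reduce to a mask-and-compare on the schema value. */
bool demo_is_packed( unsigned s )    { return ( s & DEMO_SCHEMA_BIT_PACKED ) != 0; }
bool demo_is_panel( unsigned s )     { return ( s & DEMO_SCHEMA_BIT_PANELS ) != 0; }
bool demo_is_4m_packed( unsigned s ) { return ( s & DEMO_SCHEMA_BIT_4M     ) != 0; }
bool demo_is_3m_packed( unsigned s ) { return ( s & DEMO_SCHEMA_BIT_3M     ) != 0; }
```

Giving each bit one meaning lets a schema value be tested for a single property without enumerating every schema that has it.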
|
||
|
||
commit a51e32ec061941cd10119ea80115c82a40b1673f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 30 10:41:48 2014 -0500
|
||
|
||
Fixed unit register blocksize brokenness.
|
||
|
||
Details:
|
||
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
|
||
and column-stored micro-panels when MR or NR is unit. When either
|
||
register blocksize (or both) is equal to one, inspecting the strides of
|
||
the affected packed micro-panel is no longer sufficient to determine
|
||
whether the micro-panel is a row-stored column panel or a column-stored
|
||
row panel (because both strides are unit). At that point, dimension
|
||
information is necessary when invoking the bli_is_row_stored_f() and
|
||
bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
|
||
Ilya Polkovnichenko for reporting this bug.
|
||
- Added panel dimensions (m and n) to obj_t, which are set in
|
||
packm_init() and then passed into the blocked variants to support the
|
||
aforementioned update.
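
The ambiguity can be reproduced with a stride-only test. The enum and function below are hypothetical and are not the bli_is_row_stored_f() or bli_is_col_stored_f() macros; they only show why strides alone stop being sufficient.

```c
#include <assert.h>

typedef enum
{
    DEMO_STORAGE_ROWS,       /* unit column stride: rows are contiguous    */
    DEMO_STORAGE_COLS,       /* unit row stride: columns are contiguous    */
    DEMO_STORAGE_AMBIGUOUS,  /* both strides unit: cannot tell from strides */
    DEMO_STORAGE_GENERAL     /* neither stride is unit                     */
} demo_storage_t;

/* Classify storage using only the row stride (rs) and column stride (cs). */
demo_storage_t demo_storage_from_strides( int rs, int cs )
{
    if ( rs == 1 && cs == 1 ) return DEMO_STORAGE_AMBIGUOUS;
    if ( cs == 1 )            return DEMO_STORAGE_ROWS;
    if ( rs == 1 )            return DEMO_STORAGE_COLS;

    return DEMO_STORAGE_GENERAL;
}
```

A column-stored micro-panel with panel dimension 8 has strides (1, 8) and is classified correctly, but with a unit register blocksize the strides collapse to (1, 1), which is the case that requires extra information.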
|
||
|
||
commit c2732272f0ac680a0ad19fa9db5d587398a1479a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 29 16:37:18 2014 -0500
|
||
|
||
Removed old/unused packm variants.
|
||
|
||
commit b97fa9a5a70fe0123e5eebd999b947461d38445f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 27 18:54:09 2014 -0500
|
||
|
||
Minor usage update to build/bump-version.sh.
|
||
|
||
commit b18ba5f62d98629cdd519ff4c96fc67ec1a62fb9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 27 18:52:05 2014 -0500
|
||
|
||
Added missing 'bla_' prefix to r_imag(), d_imag().
|
||
|
||
Details:
|
||
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
|
||
Ali for pointing out the misnamed functions.
|
||
|
||
commit af7a8e6c042cade452130a6729377f1a3ef4e19e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 27 18:20:13 2014 -0500
|
||
|
||
CHANGELOG update (0.1.4)
|
||
|
||
commit a7537071b152ecff671f8716595d37dc09e4fd51 (tag: 0.1.4)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 27 18:20:12 2014 -0500
|
||
|
||
Version file update (0.1.4)
|
||
|
||
commit acff74041bf02c7b9fdfa24b507bca782a4c5fce
|
||
Merge: cdb9413e 47b243ef
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Jul 23 15:07:30 2014 -0500
|
||
|
||
Merge branch 'master' of https://github.com/flame/blis
|
||
|
||
commit cdb9413e140f8a198666250ec88fa34b5425a9c3
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Jul 23 15:05:15 2014 -0500
|
||
|
||
Enabled threading for a couple more loops in TRSM
|
||
|
||
JC loop is now enabled for the left-sided case
|
||
IC loop is now enabled for the right-sided case
|
||
|
||
commit 47b243ef08f4101de3d936f2373343e67eaa4dd5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 23 13:41:13 2014 -0500
|
||
|
||
Call setid for early return from herk/her2k.
|
||
|
||
Details:
|
||
- Added setid call (to zero imaginary parts of diagonal elements) to
|
||
early return branches of herk_front() and her2k_front() for cases
|
||
where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
|
||
- Comment update.
|
||
|
||
commit 3e7b0db5b0e24f5fd66c60bacabc019885ddbec5
|
||
Merge: 2f8a357d ed3e33d5
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Jul 23 13:40:44 2014 -0500
|
||
|
||
Merge branch 'master' of https://github.com/flame/blis
|
||
|
||
commit 2f8a357de5fb55163a969d888cf059f24b78125c
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Wed Jul 23 13:40:12 2014 -0500
|
||
|
||
Some TRSM threading fixes/additions
|
||
|
||
commit ed3e33d548047be3283ff41268fdf716563bc542
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 22 14:40:43 2014 -0500
|
||
|
||
Tweaked behavior of herk, her2k for BLAS compat.
|
||
|
||
Details:
|
||
- Updated herk_front() and her2k_front() to explicitly set the imaginary
|
||
components of the diagonal entries of C to zero after the computation
|
||
is complete. This is needed in case downstream applications read the
|
||
full diagonal entries (i.e., including imaginary part), which could, in
|
||
the absence of this modification, accumulate numerical error from
|
||
subsequent rank-k/rank-2k updates.
|
||
- Updated BLAS compatibility wrappers for herk and her2k to return early
|
||
if:
|
||
n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
|
||
This also results in the imaginary components of diagonal entries NOT
|
||
being set to zero (see above), which is consistent with BLAS.
|
||
- Updated mkherm to use setid instead of an inlined loop over the
|
||
diagonal.
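
The early-return condition quoted above translates directly into C. The function name is hypothetical; alpha and beta are taken as real values here, as they are for herk.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when the compatibility wrapper should return without
   touching C: n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 ). */
bool demo_herk_returns_early( int n, int k, double alpha, double beta )
{
    return n == 0 || ( ( alpha == 0.0 || k == 0 ) && beta == 1.0 );
}
```

Note that alpha == 0 alone does not trigger the early return: when beta differs from one, C must still be scaled.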
|
||
|
||
commit ea59a5c93cde1467a3715abc53dda4aecf961873
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 22 14:36:02 2014 -0500
|
||
|
||
Added new level-1d operation: setid.
|
||
|
||
Details:
|
||
- Defined a new level-1d operation, setid, which sets the imaginary
|
||
elements of an object's diagonal to a single scalar. This can be
|
||
useful, for example, when trying to make the diagonal of a Hermitian
|
||
matrix real-valued.
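
The effect of such an operation can be sketched on a plain column-major array. The complex struct and the function are hypothetical stand-ins that mirror the description above; they are not the object-based BLIS interface.

```c
#include <assert.h>

/* Hypothetical double-precision complex type. */
typedef struct { double real; double imag; } demo_dcomplex;

/* Set the imaginary part of each diagonal element of an m x n
   column-major matrix (leading dimension ld) to alpha. Real parts and
   off-diagonal elements are left untouched. */
void demo_setid( int m, int n, double alpha, demo_dcomplex* a, int ld )
{
    int min_mn = ( m < n ? m : n );
    int i;

    assert( ld >= m );

    for ( i = 0; i < min_mn; ++i )
        a[ i + i * ld ].imag = alpha;
}
```

Calling it with alpha equal to zero yields the real-valued diagonal expected of a Hermitian matrix.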
|
||
|
||
commit 8965a965931318619ceaebd7c32edccf3022d0c7
|
||
Merge: 1785efb5 5b73e80b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 22 14:34:32 2014 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit 1785efb5420bc7b9c850a068cb5d99837071e877
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 22 14:33:01 2014 -0500
|
||
|
||
Minor improvements to invertd and setd.
|
||
|
||
Details:
|
||
- Added missing call to invertd_check() from front-end.
|
||
- Changed setd front-end call of scald_check() to setd_check().
|
||
|
||
commit 5b73e80b71c054c1945a06aff044ef629bc1a9a0
|
||
Merge: a41e68e0 20690fe3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 18 12:21:20 2014 -0500
|
||
|
||
Merge pull request #16 from Maratyszcza/emscripten
|
||
|
||
Emscripten port
|
||
|
||
commit a41e68e09e73b999fab0bb430a43dccfc63aab45
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 17 13:25:56 2014 -0500
|
||
|
||
Reimplemented BLIS initialization/finalization.
|
||
|
||
Details:
|
||
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
|
||
for thread-safety. Also added lots of explanatory comments.
|
||
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
|
||
suffix, and reimplemented for simplicity. Updated all invocations
|
||
in BLAS compatibility layer to use _auto() suffix.
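
A guarded initialization of the kind described in the first bullet might be sketched as below. All names are hypothetical, and this is not the code in bli_init(); it only shows the pattern of checking and setting an "initialized" flag inside an OpenMP critical section. When compiled without OpenMP support the pragma is ignored and the code runs serially.

```c
#include <assert.h>
#include <stdbool.h>

static bool demo_initialized = false;
static int  demo_setup_calls = 0;

/* Safe to call from several threads: the flag is tested and set inside
   the same critical section, so the setup work runs only once. */
void demo_init( void )
{
    #pragma omp critical (demo_init_lock)
    {
        if ( !demo_initialized )
        {
            demo_setup_calls++;      /* stand-in for real setup work */
            demo_initialized = true;
        }
    }
}

/* Tear down under the same named critical section, so that init and
   finalize cannot interleave. */
void demo_finalize( void )
{
    #pragma omp critical (demo_init_lock)
    {
        if ( demo_initialized )
            demo_initialized = false;
    }
}

bool demo_is_initialized( void ) { return demo_initialized; }
int  demo_get_setup_calls( void ) { return demo_setup_calls; }
```

Using one named critical section for both routines is a design choice of this sketch; it serializes initialization against finalization as well as against other initializers.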
|
||
|
||
commit 36358948ea75074bda32a9f8c008f835b87d21db
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 17 10:58:10 2014 -0500
|
||
|
||
Retired frame/3/gemm/other directory.
|
||
|
||
Details:
|
||
- Removed frame/3/gemm/other directory, which contained some outdated
|
||
and/or experimental variants.
|
||
|
||
commit c73261f17edf589e76bdbe297702a1fbbd69275f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 16:23:51 2014 -0500
|
||
|
||
More minor cleanups post-copyright update.
|
||
|
||
commit 2a09d24463d358be6243b24f112fad057c2aefe0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 16:17:09 2014 -0500
|
||
|
||
Reverted power7 symlinks destroyed by sed script.
|
||
|
||
Details:
|
||
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
|
||
after recursive-sed.sh mistakenly replaced them with copies of the
|
||
actual files to which they referred. Meant to include this in previous
|
||
commit.
|
||
|
||
commit 7ed415824d3b2e78541b6f64e404ca5347c06d3d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 16:14:33 2014 -0500
|
||
|
||
Updated copyright headers (continued).
|
||
|
||
Details:
|
||
- Inserted "at Austin" into third clause of license declarations.
|
||
Meant to include this change in previous commit.
|
||
|
||
commit 5c2c6c85616834ff2716ece083118201d9df6dde
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 16:05:03 2014 -0500
|
||
|
||
Updated copyright headers to contain "at Austin".
|
||
|
||
Details:
|
||
- Updated copyright headers to include "at Austin" in the name of the
|
||
University of Texas.
|
||
- Updated the copyright years of a few headers to 2014 (from 2011 and
|
||
2012).
|
||
|
||
commit fcec68cda3f6e90ae055e7304e6674c1c5c8d010
|
||
Merge: 94c0df79 4a20ed1a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 11:35:34 2014 -0500
|
||
|
||
Merge branch 'master' of github.com:flame/blis
|
||
|
||
commit 94c0df797eda377931f29a41ba6a89c0ed58daca
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 14 11:24:36 2014 -0500
|
||
|
||
Changed order of zero dim / error checking.
|
||
|
||
Details:
|
||
- Updated level-2 and level-3 internal back-ends so that the operation's
|
||
_check() function is called BEFORE any attempt to return early due to
|
||
the presence of zero dimensions. This ordering makes more sense because
|
||
(for example) object dimensions should match even if one of them is
|
||
zero. Previously, a dimension mismatch could result in an early return
|
||
with no error message.
|
||
- Updated bli_check_object_buffer() so that NULL buffers result in an
|
||
error only if the object is dimensionally non-empty (i.e., only if both
|
||
of the object's dimensions are non-zero). This allows BLIS operations
|
||
to be performed on dimensionally empty objects (i.e., where at least one
|
||
dimension is zero).
|
||
- Updated the error message associated with bli_check_object_buffer()
|
||
to mention the newly relaxed constraint mentioned above, vis-a-vis
|
||
non-zero dimensions.
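
The relaxed buffer check amounts to the predicate sketched below. The function name is hypothetical and is not bli_check_object_buffer() itself.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* A NULL buffer is an error only if the object is dimensionally
   non-empty, i.e. only if both dimensions are non-zero. */
bool demo_buffer_is_invalid( const void* buf, int m, int n )
{
    bool is_empty = ( m == 0 || n == 0 );

    return buf == NULL && !is_empty;
}
```

An m x 0 or 0 x n object with no storage therefore passes the check, since no element of it can ever be accessed.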
|
||
|
||
commit 20690fe3018ce17c8df61ce0bffecaa7911dc3a5
|
||
Author: Marat Dukhan <maratek@gmail.com>
|
||
Date: Sun Jul 13 22:50:56 2014 -0700
|
||
|
||
Emscripten port
|
||
|
||
commit 4a20ed1a3f5e9e5232df30aa0e568e6c00c56ce1
|
||
Merge: 6a515e98 8ccdfaef
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 13 17:45:01 2014 -0500
|
||
|
||
Merge pull request #14 from Maratyszcza/master
|
||
|
||
Support "make test" for PNaCl configuration
|
||
|
||
commit 6a515e988f2ae1628258a6dec2c0e9cf2d04790f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 13 17:38:33 2014 -0500
|
||
|
||
Implemented dsdot() and sdsdot() in compat layer.
|
||
|
||
Details:
|
||
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
|
||
with actual implementations. (These routines are so rarely used that
|
||
this log message will probably lead to some people learning of their
|
||
existence for the first time.)
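
For readers meeting these routines for the first time: both take single-precision vectors and accumulate the dot product in double precision. The sketch below follows the reference BLAS semantics under hypothetical names, and assumes positive increments for brevity.

```c
#include <assert.h>

/* dsdot: single-precision inputs, double-precision accumulation,
   double-precision result. */
double demo_dsdot( int n, const float* x, int incx,
                          const float* y, int incy )
{
    double rho = 0.0;
    int    i;

    assert( incx > 0 && incy > 0 );

    for ( i = 0; i < n; ++i )
        rho += ( double )x[ i * incx ] * ( double )y[ i * incy ];

    return rho;
}

/* sdsdot: as above, but a single-precision scalar sb is added to the
   accumulated sum and the result is rounded back to single precision. */
float demo_sdsdot( int n, float sb, const float* x, int incx,
                                    const float* y, int incy )
{
    return ( float )( ( double )sb + demo_dsdot( n, x, incx, y, incy ) );
}
```

The wider accumulator reduces rounding error for long vectors compared with accumulating in single precision.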
|
||
|
||
commit 255668ddd1004552c6cc65035ec6486671ce99bb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Jul 13 17:30:44 2014 -0500

Inserted gemv beta-scaling bug into compat layer.

Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
y of non-zero length and a vector x of zero length results in no action.
Given that the operation is y := beta*y + A*x, many (most?) individuals
would expect vector y to still be scaled by beta. BLIS, when called
natively, handles these cases intuitively (with beta scaling).
Unfortunately, many BLAS test suites actually check for the way this
situation is handled. Therefore, we have decided to implement this "bug"
in the compatibility layer so as to provide "bug-for-bug" compatibility
with BLAS.

commit 570a154581bdb353fa13a219c7cb3c81d3dceffd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jul 12 17:51:05 2014 -0500

Comment/formatting updates to build scripts.

Details:
- Minor updates to comments and formatting in bump-version.sh and
update-version-file.sh scripts.

commit 26cd81990631ff799791629206e068126ff9e3a1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 13:16:07 2014 -0500

Added bli_info_*() query functions.

Details:
- Added a new API family, bli_info_*(), which can be used to query
information about how BLIS was configured. Most of these values are
returned as gint_t, with the exception of the version string which
is char*.
- Changed how the testsuite driver queries information about how BLIS
was configured (from using macro constants directly to using the
new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
an actual naming conflict, but because the name 'info_t' would now be
somewhat misleading in the presence of the new bli_info API, as the
two are unrelated).

commit 970b43141697d8c31a033f59513bb59d7cc78ab0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 10 09:30:00 2014 -0500

Minor bugfixes to BLAS compatibility layer.

Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.

commit 8ccdfaef4c42ad8957af8607a1a9ee29b9277d4b
Author: Marat Dukhan <maratek@gmail.com>
Date: Tue Jul 8 23:14:36 2014 -0700

Replicated logic from testsuite/Makefile in top-level Makefile to support make test

commit caa6507ff3724c80d60987f309b8bbc5b50a9841
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:25:27 2014 -0500

Minor cleanup to standalone test drivers.

Details:
- Very minor code changes to standalone test drivers in 'test' directory.
- Added *.so files to '.gitignore'.

commit 6c65e9a58fe55990ebb99ec3986443e18af35338
Merge: cb12e456 daca500d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:13:49 2014 -0500

Merge branch 'master' of github.com:flame/blis

commit cb12e456f94c196c093e52f02a7cbca0032fc86e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Jul 8 10:07:46 2014 -0500

Fixed possible level-3 inf/NaN issue when beta=0.

Details:
- Redefined xpbys_mxn and xpbys_mxn_u/_l macros to employ a copy
(instead of scaling by beta) when beta is zero. This will stamp out
any possible infs or NaNs in the output matrix, if it happens to be
uninitialized. Thanks to Tony Kelman for isolating this bug.

commit daca500db5e2448ba0da8047b75eb0f88d9f40e3
Merge: ab3bc915 47023502
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 12:52:52 2014 -0500

Merge branch 'master' of http://github.com/flame/blis

commit 4702350278af31f662b458127777dd4d85a3192f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jul 3 11:48:23 2014 -0500

Defined _ukernel_void() wrappers to micro-kernels.

Details:
- Added wrappers for micro-kernels so that users may invoke the
micro-kernels without knowing what the function names actually are.
This is useful when an application wishes to call the micro-kernel
from a shared library instance of BLIS, where the application may not
necessarily have the luxury of grabbing the micro-kernel name(s) from
C preprocessor macros at compile-time. Also, since the wrappers use
void* pointers, one's environment does not need to be aware of some
BLIS types such as scomplex and dcomplex. These wrappers now join the
level-1 and level-1f kernel wrappers, which pre-dated this commit.
- Removed the wrapper definitions and prototypes from the micro-kernel
test suite modules, and replaced calls to them with calls to the new
wrappers mentioned above.

commit ab3bc9153b914fbaf259e15b66c91d628e7c8661
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Jul 3 11:19:43 2014 -0500

Fixed a bug for TRSM when BLIS_ENABLE_MULTITHREADING is not set but the multithreading environment variables are turned on

commit b8134b720b985783ee6a582a3eb5d6c51f00d051
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Jul 2 16:02:39 2014 -0500

Quick and dirty multithreading for TRSM

Should work fine for a small number of threads (up to 8 or maybe even 16).
However, performance is as yet untested.
This parallelizes the "JR" loop for the left sided cases
and the "IR" loop for the right sided cases.

Future work is to parallelize the outer loops as well.

commit e8ef69692831db07ddbe9485a5e504ac3f03e496
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Jul 2 14:59:27 2014 -0500

Added shared library support to build system.

Details:
- Modified top-level Makefile to support building shared (dynamic)
libraries.
- Updated most configurations' make_defs.mk files to include necessary
compiler/linker flags needed by top-level Makefile.
- Note that by default, all configurations presently do NOT build
shared libraries. To enable, one must change the value of
BLIS_ENABLE_DYNAMIC_BUILD to 'yes'.

commit b80df0f2cffb015da02e70a82b8512da9891ab67
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:52:39 2014 -0500

Added bump-version.sh script to 'build' directory.

Details:
- Added a bash script, bump-version.sh, to aid in incrementing the BLIS
version string.

commit 9ef1f1e21d083697fc730e48d7d9169c201f3da2
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500

CHANGELOG update (0.1.3)

commit 036cc634918463b1caa0fd89c9a211f2f5639af7 (tag: 0.1.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:48:17 2014 -0500

Version file update (0.1.3)

commit 09d9a3bf6763932d9f571085b2cfd1b8631eccba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 13:43:26 2014 -0500

Reverting version file to test new version script.

Details:
- Changed version file contents to 0.1.2 so that I can test out a new
version file bumping script.

commit ebb33965981dcb2b0bdee5fc7fdf6c959420f311
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 11:22:50 2014 -0500

Added 'version' file.

commit 2cb9a5501a3cbeb6692cf68e896087ba73b6af69
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:42:29 2014 -0500

Removed 'version' from .gitignore file.

commit b40dcefc5ee31f67aa3990e2e9d2ef8ed1386a25
Merge: 7101a8ee b693b0cd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Jun 23 10:39:05 2014 -0500

Merge pull request #11 from Maratyszcza/stable

[sc]axpy kernels for PNaCl

commit b693b0cddcfb41450e3c09a3ab97acb44c1ccdec
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 22 13:44:25 2014 -0700

[SC]AXPY kernels for PNaCl

commit 7101a8eec0327d6c3a7eb36eb4b0fd45c1c6d162
Merge: ad48dca2 020a831b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 19 21:46:50 2014 -0500

Merge pull request #10 from Maratyszcza/stable

Portable Native Client port

commit 020a831bc5f61744cb8354886aa679b99b1285f6
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:58:26 2014 -0700

Code clean-up in PNaCl port

commit 491be4f91ed725522f5cc7184053857c6c376ada
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:45:44 2014 -0700

Optimized dot product kernels for PNaCl

commit 4b8e71aab80182873a2e138eb07902b8d8fd5480
Author: Marat Dukhan <maratek@gmail.com>
Date: Thu Jun 19 00:43:25 2014 -0700

Use AR rcs flags for PNaCl target to avoid warning

commit 031deb2a5c718d569bde842590a791b812f4cf1d
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:11:34 2014 -0700

PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac)

commit 68a02976e3c3638f0a9821342e269a1743e3ace3
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:10:25 2014 -0700

Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features

commit 6f8462eb0ec278b89731e73ef583386a3371d095
Author: Marat Dukhan <maratek@gmail.com>
Date: Wed Jun 18 03:08:46 2014 -0700

Fix inconsistent VERBOSE macro in Makefile

commit b2ffb4de8b6872cb23537ad282e557d11dcd9c8b
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 18:41:30 2014 -0400

Reformatted PNaCl GEMM kernels

commit 6de2d472d98baa215264a776f3d5291780a6a085
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 08:44:31 2014 -0400

CGEMM and ZGEMM kernels for PNaCl

commit f064711a5e6fb3852c17c7520909b09dc27665f2
Author: Marat Dukhan <maratek@gmail.com>
Date: Sun Jun 15 06:27:37 2014 -0400

SGEMM and DGEMM kernels for PNaCl

commit ad48dca22913a363899f0bef45553898718eebb1
Merge: ee2b6792 7118f87e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Jun 14 15:10:13 2014 -0500

Merge pull request #9 from tkelman/memalign_windows

Use _aligned_malloc instead of posix_memalign on Windows

commit 7118f87e18b4941423472afc00215c1d1f2a1fcd
Author: Tony Kelman <tony@kelman.net>
Date: Sat Jun 14 06:53:20 2014 -0700

Use _aligned_malloc instead of posix_memalign on Windows

commit ee2b679281ca45fb40b2198e293bc3bc3d446632
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Jun 6 12:41:55 2014 -0500

Only include omp.h if BLIS_ENABLE_OPENMP is set

commit 19c05dfaac43c627f86e897c8c00f1f9440754aa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Jun 5 10:54:16 2014 -0500

CHANGELOG update (for 0.1.2).

commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd (tag: 0.1.2)
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Jun 2 13:40:57 2014 -0500

Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi

commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed May 21 11:34:42 2014 -0500

Fixed ldim alignment bug in core2 gemm ukernel.

Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
a segmentation fault if a column-stored matrix's starting address was
aligned, but its leading dimension was such that its second column was
unaligned. Basically, the micro-kernel was assuming that aligned load
instructions were safe when they actually were not. An extra condition
that checks the alignment of cs_c (i.e., the leading dimension in the
column storage case) has now been added. Thanks to Michael Lehn for
reporting this bug.

commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
Merge: 8c5d6071 21fb0893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue May 20 09:53:19 2014 -0500

Merge pull request #8 from tlrmchlsmth/master

Added multithreading to most level-3 operations.

commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 19 20:38:55 2014 -0700

Reverting changes to dunnington and reference configs

Now they are unchanged from the main branch of BLIS

commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 13:44:14 2014 -0500

Fixed rounding error in bli_get_range_weighted

commit 0b4b1680334528b1b60bc696537600f763198e92
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri May 16 12:23:37 2014 -0500

Fixed bug with disabling JC loop threading for right sided trmm

commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 16:20:06 2014 -0500

Disabled parallelism for right-sided TRMM JC loop

The loop has dependent iterations.

commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed May 14 14:59:04 2014 -0500

Fixed bug with bli_get_range_weighted

commit 45957cc7745e9bb1698408d72f53ef192e960820
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue May 13 17:14:46 2014 -0500

Allowed threading to be turned off

No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP

Also fixes a bug with bli_get_range_weighted

commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon May 12 17:26:19 2014 -0500

Disabled multithreading of the kc loop

commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 12:28:00 2014 -0500

Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity

commit f4fdfe8fc573553eb36795b79cdf681270dab71b
Merge: 31bb065b 8c5d6071
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 30 11:46:35 2014 -0500

Merge http://github.com/flame/blis

commit 8c5d6071e24ba10a53669390a47287e86ff354ce
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 29 12:26:12 2014 -0500

Added _check() routines for fprint[mv], rand[mv].

Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
front-ends.

commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 28 16:48:25 2014 -0500

Changed treatment of NULL object buffers.

Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
the buffer address being attached to be non-NULL. This is acceptable
because the user was already able to create and use objects with NULL
buffers (via bli_obj_create_without_buffer(), which initializes the
buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
into nearly all operations' _check() or _int_check() functions. This
allows BLIS to abort peacefully if a computational routine is called
with an object containing a NULL buffer. By contrast, under such
conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
so that zero dimensions are checked first (and if found, execution
returns with trivial or no computation). This resolves issue #7. Thanks
to Jack Poulson for reporting this bug.

commit 31bb065ba40ae0c5a614e743b8025abca012b99e
Merge: 20e24430 7c619599
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Apr 23 12:30:19 2014 -0500

Merge http://github.com/flame/blis

commit 7c61959955c8ba78160d0ed4d1979022029d963b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 17:18:36 2014 -0500

Can now query register blocksizes from blk algs.

Details:
- Added a new field to blksz_t objects that allows one to attach a
sub-object. Doing this allows us to associate a register blocksize with
any given cache blocksize. That way, the register blocksize can be
queried wherever the cache blocksize would normally be accessible
(e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
blocksizes are attached to the cache blocksizes after they are created.

commit 58671597d3d450817b2eda576c05ed6dadd8af6d
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 10 15:35:30 2014 -0500

Minor cleanups to level-2 _cntl.c files.

Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
4m/3m variants).
- Removed test/old/test_blis2.c.

commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Tue Apr 8 17:50:44 2014 +0000

Some fixes for the bgq kernels

commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:43:44 2014 -0500

Add -openmp to ldflags as well

commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:37:50 2014 -0500

Added -openmp flag to Xeon Phi build for convenience

commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:31:15 2014 -0500

Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC

commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 16:29:10 2014 -0500

Fix for tree barrier freeing bug

commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 15:09:10 2014 -0500

Bunch of minor fixes

Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8

commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 12:13:29 2014 -0500

Changed default blocking factor to default double precision MR and NR

commit ab9c7880335c281432d5809fe0dec46753d22569
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 11:38:11 2014 -0500

Added faster tree barriers necessary for performance for Xeon Phi

Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set

commit ec58a7923cccac08632670caadf3cf6ff5dce766
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 10:22:48 2014 -0500

Freeing thread info paths.

Also made herk IC and JC loops do weighted partitioning

commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
Merge: 4e3eb39a 21a0efb3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Apr 4 09:54:54 2014 -0500

Merge http://github.com/flame/blis

Conflicts:
kernels/bgq/1/bli_axpyv_opt_var1.c
kernels/bgq/1/bli_dotv_opt_var1.c

commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Fri Apr 4 14:50:03 2014 +0000

Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well

commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:38:44 2014 -0500

Fixed follow-up to issue #6.

commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 16:24:34 2014 -0500

Fixed issue #6 (incorrect 'restrict' usage).

Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
(However, there may be other instances of similar misuse elsewhere in
BLIS.) Thanks to Jeff Hammond for reporting this issue.

commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 3 12:25:45 2014 -0500

Added #include "arm_neon.h" to ARM gemm ukernel.

Details:
- Inserted #include "arm_neon.h" into gemm ukernel source file for
arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.

commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Apr 3 10:30:03 2014 -0500

Added barriers needed prior to doing scalar reset for rank-k updates.

commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 14:34:31 2014 -0500

Attempted to fix uninitialized variable warnings.

Details:
- Added initialization statements to various macros used in level 1m and
1m-like operations. I wasn't able to reproduce the reported behavior,
so hopefully this takes care of it. Thanks to Jeff Hammond for the
report.

commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 1 12:57:24 2014 -0500

Use generic paths for toolchain in POWER7.

Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.

commit 1584ae1c83c3a8c1af76acb46404747507650f19
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Fri Mar 28 15:15:48 2014 -0500

Fixed race condition involving scalar reset

commit 459dde4acc09e49380da58fb7b246db488884ad9
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 17:06:45 2014 -0500

Made barrier after packing implicit.

This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not the outer packing routines.
This allowed, for instance, computation to begin before the block of B had finished being packed.

commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 27 14:18:46 2014 -0500

Some fixes for the internal functions,
which were inappropriately having only the thread chief do some things.

commit a6fd48345424e097f71652be013aa897e098b41e
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 17:19:46 2014 +0000

Added test drivers for level 3 BLAS that run tests in parallel using MPI

commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
Author: Tyler Michael Smith <tmsmith@vestalac1.ftd.alcf.anl.gov>
Date: Wed Mar 26 15:39:05 2014 +0000

Some fixes for the bgq configuration

commit f0824a04fc75e231c3a3d7757fa4e7294173282f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 24 15:21:42 2014 -0500

Initial commit to enable threading in TRSM,

Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm

commit 23d9eab354fbc88165889832955e126772bf8488
Merge: 5d5dc2ee fd3e32a5
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:54:35 2014 -0500

Merge https://github.com/flame/blis

commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Mar 20 16:43:36 2014 -0500

Parallelized trmm and trmm3

Also fixed bugs in packm

commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 20 13:59:48 2014 -0500

Refined INSERT_GENTFUNC macro usage.

Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
exactly the number of arguments needed for the particular operation or
variant being defined. Many operations were using INSERT_GENTFUNC
macros that expected one auxiliary argument even though none were
needed. Those instances have now been updated. Most of these instances
were in the level-0 and -1v operations, as well as some operations
defined in frame/util.

commit 9b0e715f29338a1a1d6445907d2445c35f011121
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 15:47:54 2014 -0500

Minor simplifications to trmm, trsm macro-kernels.

Details:
- Simplified some code that would have allowed the diagonal of a trmm
or trsm triangular matrix to intersect the short end of a micro-panel.
This is disallowed via higher-level constraints on cache blocksizes, so
this code was never needed and only served to obfuscate.
- Updated some comments in trmm, trsm macro-kernels.

commit a3902750b9ab4923433f7e353f3669c3c419f8e4
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Mar 19 12:35:17 2014 -0500

Reorganized norm operations.

Details:
- Completely reorganized norm operations:
  - Renames:
    - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
    - absumv -> norm1v (vector 1-norm)
  - New operations:
    - norm1m (matrix 1-norm)
    - normiv, normim (infinity-norm)
    - amaxv (BLAS-like absolute maximum value index)
    - asumv (BLAS-like absolute sum)
  - Deprecated absumm, as it did not correspond to any actual norm.
    (However, an inlined version now exists in the testsuite module for
    randm.)

commit c0140cb752f27e99742f85d23be2181c00a1335e
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Wed Mar 19 11:21:16 2014 -0500

Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state

Now just performed by the master thread.

commit fb42983bd9943711baa7d1c6496de1215bb816ef
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 16:37:28 2014 -0500

Fixed a barrier bug and a thread decorator bug

commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 15:23:09 2014 -0500

Fixing function pointer issues with thread decorator

commit ec8b88f93533942d3711191873310e7ff281bda6
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 14:35:37 2014 -0500

Enabled threading for packm blocked variants 3 and 4

commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 18 13:26:27 2014 -0500

Added decorator for calling parallelized internal functions

Will allow for easy support for different threading models

commit 5296f58975f7d351f88909cc80b6d0cffd73def7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 17:15:35 2014 -0500

Fixing some bugs with herk parallelization

commit c51d0110831eb89361b4720bf7ed75edbd26ebce
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 15:00:47 2014 -0500

Initial multithreading support for HERK

commit c720b141568d1f289146bf34ded08001f2c0dfbb
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 17 11:39:32 2014 -0500

Switched to using environment variables to control threading.

The environment variables all follow the format BLIS_X_NT,
where X is the index of the loop as described in our paper
Anatomy of High Performance Many-Threaded Matrix Multiplication.
These indices are IR, JR, IC, KC, and JC.

Also enabled parallelism for hemm and symm, but these are currently untested.

commit 92233cf64274b27b2217c5cfffe75443ff6137a4
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 14:16:08 2014 -0500

Some fixes to gemm thread info tree creation,
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED

commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Tue Mar 11 12:08:17 2014 -0500

Added files specific to threading for gemm and packm operations

commit 8d8f4352a41926bc923e47be836365b6b726aff2
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:47:28 2014 -0500

Added single threaded thread info data structures specifically for gemm and packm

commit 0e8677761175189583ca7d855e24b2bbdd2dada8
Merge: 2e727a02 b3bff631
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:16:21 2014 -0500

Merge branch 'master' of https://github.com/tlrmchlsmth/blis

commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Mon Mar 10 15:14:33 2014 -0500

Modifying the thread info data structures

This change makes each operation have its own thread info type,
allowing finer control of threading in operations that have different types of suboperations

commit a770590cf21a459f04bf941c58ee2afd272cc441
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Mar 3 14:31:44 2014 -0600

Minor fixes to sumsqv, abmaxv.

Details:
- Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
the vector (or matrix) contains a NaN.
- Minor change to bli_abmaxv_unb_var1() to more closely mimic the
behavior of netlib BLAS's izamax(). There, a "less than or equal to"
operator is used in the search instead of "less than", which would
change the element index returned if there were multiple maximum values.
- Added macro function definitions for bli_isinf() and bli_isnan(), which
are currently implemented in terms of isinf() and isnan() from math.h.

commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
Merge: 2c158fb8 e8757b03
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:53:24 2014 -0600

Merge https://github.com/flame/blis

commit 2c158fb885c27f7b599dc1e85b57edd684f19223
Merge: e4738c48 c2b2ab62
Author: Tyler Smith <tms@cs.utexas.edu>
Date: Thu Feb 27 16:46:23 2014 -0600

Merge https://github.com/flame/blis

Conflicts:
frame/1m/packm/bli_packm_blk_var1.c

commit e8757b03a74f9891632242e9a90efb32150826f5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:40:07 2014 -0600

Use "%ld" as int format specifier in fprintm.

Details:
- Changed "%d" to "%ld" when printing integers via bli_fprintm().
- Meant to include this in previous commit.

commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 27 16:32:57 2014 -0600

Fixed various bugs when C99 complex is enabled.

Details:
- Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
elsewhere in the framework that were not yet set up to work properly
when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
- Extensive changes to f2c-derived files in frame/compat/f2c to allow
C99 complex storage. Most of these changes center around accessing
real and imaginary components via bli_?real()/bli_?imag() accessor
macros, and setting of values via bli_?sets() assignment macros.
(Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
was broken.)

commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Thu Feb 27 16:29:46 2014 -0600
|
||
|
||
Added support for parallelism in gemm micro-kernel
|
||
|
||
commit bfe214b633765ed40b57b330fbb84c332663aa40
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Thu Feb 27 15:53:10 2014 -0600
|
||
|
||
Fixed bug with parallel packing, and bug with allocating an array of thread infos
|
||
|
||
In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
|
||
This dependency was removed, allowing each iteration to be executed in parallel.
|
||
|
||
Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
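
A minimal sketch of the p_begin dependency mentioned above, and of one common way to remove such a dependency. This is an illustration under assumptions, not necessarily what the commit did: the function names and the fixed panel_size are hypothetical.

```c
#include <stddef.h>

/* Loop-carried form: each iteration reads the offset left behind by the
   previous iteration, so iterations cannot run independently. */
void offsets_carried_sketch(size_t n_iter, size_t panel_size, size_t *out)
{
    size_t p_begin = 0;
    for (size_t it = 0; it < n_iter; ++it) {
        out[it] = p_begin;
        p_begin += panel_size;
    }
}

/* Independent form: each offset is computed from the iteration index
   alone, so any iteration can be executed by any thread. */
void offsets_independent_sketch(size_t n_iter, size_t panel_size, size_t *out)
{
    for (size_t it = 0; it < n_iter; ++it)
        out[it] = it * panel_size;
}
```

Both forms produce the same offsets; only the second leaves the loop body free of state shared between iterations.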
|
||
|
||
commit 6193d9ceea552e67170dba45abde04c64271c705
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Thu Feb 27 14:09:19 2014 -0600
|
||
|
||
Fixed bug in thread trees
|
||
|
||
commit ac5a2de1d17ffd460b00fee9757898525a09abae
|
||
Merge: 01b125e8 bd3c7ecf
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Thu Feb 27 11:59:33 2014 -0600
|
||
|
||
Merge branch 'master' of https://github.com/tlrmchlsmth/blis
|
||
|
||
commit 01b125e815f19410e8e0611d088b84570e499e93
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Thu Feb 27 11:55:45 2014 -0600
|
||
|
||
First pass at adding parallelism to BLIS.
|
||
|
||
Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
|
||
Currently, gemm blocked variants 1f and 2f and packm blocked variant 1 are parallelized.
|
||
|
||
commit c2b2ab62707e4174892aff3ce65f36f54878fae5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Feb 26 12:46:45 2014 -0600
|
||
|
||
Deprecated panel stride alignment in bli_config.h.
|
||
|
||
Details:
|
||
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
|
||
configurations. It was already going unused in packm_init() since the
|
||
recent 4m/3m commit. This setting was rarely, if ever, useful, and its
|
||
existence only posed a potential risk for 4m/3m-based implementations.
|
||
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
|
||
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
|
||
micro-kernels.
|
||
|
||
commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 25 17:58:42 2014 -0600
|
||
|
||
CHANGELOG update (for 0.1.1).
|
||
|
||
commit fde5f1fdece19881f50b142e8611b772a647e6d2 (tag: 0.1.1)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 25 13:34:56 2014 -0600
|
||
|
||
Added extensive support for configuration defaults.
|
||
|
||
Details:
|
||
- Standard names for reference kernels (levels-1v, -1f and 3) are now
|
||
macro constants. Examples:
|
||
BLIS_SAXPYV_KERNEL_REF
|
||
BLIS_DDOTXF_KERNEL_REF
|
||
BLIS_ZGEMM_UKERNEL_REF
|
||
- Developers no longer have to name all datatype instances of a kernel
|
||
with a common base name; [sdcz] datatype flavors of each kernel or
|
||
micro-kernel (level-1v, -1f, or 3) may now be named independently.
|
||
This means you can now, if you wish, encode the datatype-specific
|
||
register blocksizes in the name of the micro-kernel functions.
|
||
- Any datatype instance of any kernel (1v, 1f, or 3) that is left
|
||
undefined in bli_kernel.h will default to the corresponding reference
|
||
implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
|
||
it will be defined to be BLIS_DGEMM_UKERNEL_REF.
|
||
- Developers no longer need to name level-1v/-1f kernels with multiple
|
||
datatype chars to match the number of types the kernel WOULD take in
|
||
a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
|
||
sufficient, as in bli_daxpyv_opt().
|
||
- There is no longer a need to define an obj_t wrapper to go along with
|
||
your level-1v/-1f kernels. The framework now provides a _kernel()
|
||
function which serves as the obj_t wrapper for whatever kernels are
|
||
specified (or defaulted to) via bli_kernel.h
|
||
- Developers no longer need to prototype their kernels, and thus no
|
||
longer need to include any prototyping headers from within
|
||
bli_kernel.h. The framework now generates kernel prototypes, with the
|
||
proper type signature, based on the kernel names defined (or defaulted
|
||
to) via bli_kernel.h.
|
||
- If the complex datatype x (of [cz]) implementation of the gemm micro-
|
||
kernel is left undefined by bli_kernel.h, but its same-precision real
|
||
domain equivalent IS defined, BLIS will use a 4m-based implementation
|
||
for the datatype x implementations of all level-3 operations, using
|
||
only the real gemm micro-kernel.
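
A minimal sketch of the defaulting rule described above, using the preprocessor. Only the macro names BLIS_DGEMM_UKERNEL and BLIS_DGEMM_UKERNEL_REF come from this entry; the function name ref_dgemm_ukr_sketch and the scalar signature are hypothetical simplifications, not the real micro-kernel interface.

```c
/* Stand-in for the framework's reference kernel name. The value on the
   right-hand side is hypothetical. */
#define BLIS_DGEMM_UKERNEL_REF ref_dgemm_ukr_sketch

/* A configuration's bli_kernel.h could define BLIS_DGEMM_UKERNEL before
   this point. Here it is deliberately left undefined. */

/* Defaulting: fall back to the reference name if nothing was chosen. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif

/* Prototype written against whichever name was selected (simplified
   scalar signature, assumed for illustration only). */
double BLIS_DGEMM_UKERNEL(double a, double b, double c);

/* Hypothetical reference implementation: c + a*b on scalars. */
double ref_dgemm_ukr_sketch(double a, double b, double c)
{
    return c + a * b;
}
```

Because the prototype is written in terms of the macro, it follows the selected name automatically, which is the same idea as generating prototypes from the names in bli_kernel.h.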
|
||
|
||
commit 15b51e990f1d21333b5f7af97c211756247336e5
|
||
Merge: 6363a9f6 fc04b5eb
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Feb 21 09:04:32 2014 -0600
|
||
|
||
Merge branch 'master' of github.com:fgvanzee/blis
|
||
|
||
commit fc04b5eb69868c341ce03f5ef1f02de4b8c121b0
|
||
Merge: b29e1c2b d1813c9d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Feb 21 09:04:13 2014 -0600
|
||
|
||
Merge pull request #3 from figual/master
|
||
|
||
New ARM armv7a kernels and Assembly file consideration in Makefile
|
||
|
||
commit d1813c9dee34410833db5061e6588ec1a6c9ecd4
|
||
Author: Francisco Igual <figual@pandaboard.(none)>
|
||
Date: Fri Feb 21 15:14:31 2014 +0100
|
||
|
||
Added new armv7a micro-kernels and configuration files from Werner Saar.
|
||
|
||
commit 0cd098c03a000ed9426a7e9135190696da8cadbc
|
||
Author: Francisco Igual <figual@pandaboard.(none)>
|
||
Date: Fri Feb 21 15:12:30 2014 +0100
|
||
|
||
o Modified Makefile to consider .S assembly microkernels.
|
||
|
||
commit 6363a9f658257fe3d814a3dce5308f807adb54a2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Feb 19 17:00:52 2014 -0600
|
||
|
||
Added level-3 support for complex via 4m-/3m.
|
||
|
||
Details:
|
||
- Added the ability to induce complex domain level-3 operations via new
|
||
virtual complex micro-kernels which are implemented via only real
|
||
domain micro-kernels. Two new implementations are provided: 4m and 3m.
|
||
4m implements complex matrix multiplication in terms of four real
|
||
matrix multiplications, whereas 3m uses only three and thus is
|
||
capable of even higher (than peak) performance. However, the 3m method
|
||
has somewhat weaker numerical properties, making it less desirable
|
||
in general.
|
||
- Further refined packing routines, which were recently revamped, and
|
||
added packing functionality for 4m and 3m.
|
||
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
|
||
into micro-panels which were packed for 4m/3m virtual kernels.
|
||
- Added 4m and 3m interfaces for each level-3 operation.
|
||
- Various other minor changes to facilitate 4m/3m methods.
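
A scalar analogue of the 4m and 3m ideas, as a minimal sketch. In the matrix case each scalar product below would be a real matrix multiplication. The type and function names are hypothetical, and the three-product formula shown is the textbook one; the log does not spell out the exact arrangement used in the framework.

```c
/* Hypothetical complex type with separate real and imaginary parts. */
typedef struct { double re, im; } cplx_sketch_t;

/* "4m" analogue: four real multiplications. */
cplx_sketch_t mul_4m_sketch(cplx_sketch_t a, cplx_sketch_t b)
{
    cplx_sketch_t c;
    c.re = a.re * b.re - a.im * b.im;
    c.im = a.re * b.im + a.im * b.re;
    return c;
}

/* "3m" analogue: three real multiplications plus extra additions.
   The imaginary part is recovered by subtraction, which is where the
   weaker numerical properties come from. */
cplx_sketch_t mul_3m_sketch(cplx_sketch_t a, cplx_sketch_t b)
{
    double t1 = a.re * b.re;
    double t2 = a.im * b.im;
    double t3 = (a.re + a.im) * (b.re + b.im);
    cplx_sketch_t c;
    c.re = t1 - t2;
    c.im = t3 - t1 - t2;
    return c;
}
```

For (1 + 2i)(3 + 4i) both functions return -5 + 10i; with inputs of widely differing magnitude the two results can differ in the last bits.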
|
||
|
||
commit b29e1c2b278c177e104c84ba462820ee8296df6c
|
||
Merge: ee60377e bd3c7ecf
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Feb 14 14:11:54 2014 -0600
|
||
|
||
Merge pull request #2 from tlrmchlsmth/master
|
||
|
||
Fixes and improvements to xeon phi implementation.
|
||
|
||
commit bd3c7ecfb54a9b9851c7d364f41c21e4cff52f6f
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Fri Feb 14 14:05:57 2014 -0600
|
||
|
||
Removing changes to input.general and input.operations
|
||
|
||
commit ce066863683cb4e910270cf8ab8e138b01ff3358
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Fri Feb 14 13:40:24 2014 -0600
|
||
|
||
Fixed more Xeon Phi bugs, especially with scattered update
|
||
|
||
commit 31134b5c7076423aee1b4f494e925f27171d97e6
|
||
Author: Tyler Smith <tms@cs.utexas.edu>
|
||
Date: Fri Feb 14 11:19:44 2014 -0600
|
||
|
||
Some fixes, changes, and improvements to the microkernel for the Xeon Phi
|
||
|
||
commit ee60377e467862b9d8a7205c45dce5cf66c78c46
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 13 14:03:31 2014 -0600
|
||
|
||
Shifted some fields in info_t.
|
||
|
||
Details:
|
||
- Shifted the pack order, pack buffer type, and structure type fields
|
||
to make room for an extra bit in the pack type/status field.
|
||
|
||
commit bd3ab1ad4cf42f8bc30ab262acf8eccb49bb1a08
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 13 09:29:55 2014 -0600
|
||
|
||
Minor fixes to trsm consistent with prev on trmm.
|
||
|
||
Details:
|
||
- Removed use of bli_min() and bli_max() that were only being used to
|
||
try to support situations where the diagonal would intersect the
|
||
short end of some micro-panels, which is a situation that is disallowed
|
||
at a higher level by various constraints on the register and cache
|
||
blocksize. This only affected trsm_ll and trsm_lu.
|
||
- Use panel stride as passed into the macro-kernel rather than compute
|
||
it via k and PACKMR/PACKNR. This affects all macro-kernels of trsm.
|
||
|
||
commit 6260b0b5f8bd248f3f66e5a1c6854bdbd9d02ad0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 13 09:19:56 2014 -0600
|
||
|
||
Fixed obscure bug in trmm_ll, trmm_lu.
|
||
|
||
Details:
|
||
- Fixed an obscure bug in left-hand trmm that would only manifest when
|
||
non-zero register blocksize extensions (PACKMR > MR or PACKNR > NR)
|
||
are used.
|
||
- Removed use of bli_min() and bli_max() that were only being used to
|
||
try to support situations where the diagonal would intersect the
|
||
short end of some micro-panels, which is a situation that is disallowed
|
||
at a higher level by various constraints on the register and cache
|
||
blocksize. This only affected trmm_ll and trmm_lu.
|
||
- Use panel stride as passed into the macro-kernel rather than compute
|
||
it via k and PACKMR/PACKNR. This affects all macro-kernels of trmm.
|
||
|
||
commit 16915c1c1e55c660bf82141cdadf7c0860d5b464
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 11 10:54:19 2014 -0600
|
||
|
||
Fixed an obscure bug in packm_cxk().
|
||
|
||
Details:
|
||
- Fixed a bug in packm_cxk() whereby the packm ukernel was being chosen
|
||
from ldp, which is always equal to PACKMR or PACKNR. The problem with
|
||
this is that the pack ukernels were implicitly assuming that the
|
||
panel dimension of the panel being packed was equal to ldp, which
|
||
is not the case when the register blocksize extensions are non-zero
|
||
(ie: when PACKMR > MR or PACKNR > NR, whichever is applicable). This
|
||
problem has been fixed by passing ldp into the pack ukernels, which
|
||
now walk through the packed micro-panel region by incrementing by this
|
||
value, rather than incrementing by the inherent panel dimension value
|
||
assumed by each packm ukernel (e.g. 4 in the case of packm_ref_4xk).
|
||
- Also fixed a very minor edge case inefficiency whereby pack ukernels
|
||
smaller than the default were not being used in edge cases, and instead
|
||
those situations were being handled by scal2m. This is related to the
|
||
issue above, because the pack ukernel itself was being chosen based on
|
||
ldp instead of the panel dimension.
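
A minimal sketch of the distinction between the panel dimension and ldp described above. The function name and signature are hypothetical and much simpler than the real packm ukernels; the point is only that the destination advances by ldp while just panel_dim elements are copied per column.

```c
#include <stddef.h>

/* Hypothetical packing helper: copies k columns of panel_dim elements
   each from a (element stride inca, column stride lda) into p, where
   consecutive packed columns are ldp elements apart (ldp >= panel_dim). */
void pack_panel_sketch(size_t panel_dim, size_t k,
                       const double *a, size_t inca, size_t lda,
                       double *p, size_t ldp)
{
    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < panel_dim; ++i)
            p[i] = a[i * inca + j * lda];
        p += ldp; /* advance by ldp, not by panel_dim */
    }
}
```

With panel_dim = 2 and ldp = 3, every third slot of the packed buffer is left untouched, which corresponds to a non-zero register blocksize extension.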
|
||
|
||
commit b7da57b282c5a5e2208946e60309d2352f55351d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 11 10:28:23 2014 -0600
|
||
|
||
Updated calls to packm_blk_var2() in testsuite.
|
||
|
||
Details:
|
||
- In ukernel testsuite modules, replaced calls to packm_blk_var2() with
|
||
_var1(). Meant to include this in previous commit.
|
||
|
||
commit c255a293e25b2223c88e8800267cd06ad2a90041
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Feb 10 14:31:24 2014 -0600
|
||
|
||
Consolidated packm_blk_var2 and var3.
|
||
|
||
Details:
|
||
- Consolidated the functionality previously supported by packm_blk_var2()
|
||
and packm_blk_var3() into a new variant, packm_blk_var1().
|
||
- Updates to packm_gen_cxk(), packm_herm_cxk(), and packm_tri_cxk()
|
||
to accommodate above changes.
|
||
- Removed packm_blk_var3() and retired packm_blk_var2() to
|
||
frame/1m/packm/old.
|
||
- Updated all level-3 _cntl_init() functions so that the new, more
|
||
versatile packm_blk_var1 is used for all level-3 matrix packing.
|
||
|
||
commit 32d8f264ae7b28155f5d7b21dcc5ecb78da2e0ab
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Feb 9 10:07:37 2014 -0600
|
||
|
||
Refactored packm variants.
|
||
|
||
Details:
|
||
- Revised packm_blk_var2() and _var3() by encapsulating the general,
|
||
hermitian/symmetric, and triangular panel-packing subproblems into
|
||
separate functions: packm_gen_cxk(), packm_herm_cxk(), and
|
||
packm_tri_cxk(), respectively. Also, homogenized the packm code as
|
||
well as the new specialized packm_*_cxk() code to further improve
|
||
readability.
|
||
|
||
commit 6c8067028707947fcdf4f856a272e15bb9ed91e3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Feb 7 11:27:15 2014 -0600
|
||
|
||
Renamed enumerated type in testsuite and modules.
|
||
|
||
Details:
|
||
- Renamed the test suite's "mt_impl_t" enumerated type to "iface_t", and
|
||
renamed all corresponding "impl" variables to "iface".
|
||
|
||
commit 6c12598b1bc567f0b08f58aebdc753a1c1390378
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 6 18:26:35 2014 -0600
|
||
|
||
Employ simpler INSERT_ macro for ref ukernels.
|
||
|
||
Details:
|
||
- Defined a new macro, INSERT_GENTFUNC_BASIC0, which takes only one
|
||
argument--the base name of the function--and employed this macro
|
||
in the reference micro-kernel files instead of the _BASIC macro,
|
||
which takes one auxiliary argument. That argument was not being
|
||
used and probably just acted to unnecessarily obfuscate.
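
A minimal sketch of a one-argument instantiation macro of the kind described above. All names ending in _SKETCH or _sketch are hypothetical, only two datatypes are shown, and the real INSERT_GENTFUNC_BASIC0 may differ in its details.

```c
/* Per-type function template: ctype is the C type, ch the type
   character, opname the base name of the function. */
#define GENTFUNC_SKETCH(ctype, ch, opname) \
ctype ch##opname(ctype x) { return x + x; }

/* One-argument inserter: only the base name is passed; the type and
   type character are supplied by the macro itself. */
#define INSERT_GENTFUNC_BASIC0_SKETCH(opname) \
GENTFUNC_SKETCH(float, s, opname) \
GENTFUNC_SKETCH(double, d, opname)

/* Expands to definitions of stwice_sketch() and dtwice_sketch(). */
INSERT_GENTFUNC_BASIC0_SKETCH(twice_sketch)
```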
|
||
|
||
commit 32cae66326b68706d0e695cfd60c9ca5bc32c534
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 6 18:06:42 2014 -0600
|
||
|
||
Fixed some instances of sloppy 'restrict' usage.
|
||
|
||
Details:
|
||
- Fixed some technical incorrectness with some usage of the 'restrict'
|
||
keyword in the reference trsm micro-kernels.
|
||
- Tweak to testsuite/Makefile that causes rebuild if libblis was
|
||
touched.
|
||
|
||
commit 7aceef7683e2a2aff3c7ec2a73508036af2e19e2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 6 17:31:19 2014 -0600
|
||
|
||
Updated comments in macro-kernels.
|
||
|
||
Details:
|
||
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
|
||
section of macro-kernels.
|
||
- Changed register blocksizes of reference configuration to MR = 8 and
|
||
NR = 4. It's always good for MR != NR in the reference configuration
|
||
since it may help uncover bugs related to non-square micro-kernels.
|
||
|
||
commit 8fd292aa78950bcdf556605718f09d13f9575abc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Feb 6 14:32:21 2014 -0600
|
||
|
||
Pass panel dimensions into macro-kernels.
|
||
|
||
Details:
|
||
- Modified the interfaces to the datatype-specific macro-kernels so that:
|
||
- pd_a and pd_b are passed in (which contain the panel dimensions of
|
||
packed panels of a and b).
|
||
- rs_a and cs_b are no longer passed in (they were guaranteed to be 1).
|
||
- Modified implementations of datatype-specific macro-kernels so pd_a,
|
||
pd_b, cs_a, and rs_b are used instead of cpp macros for MR, NR, PACKMR,
|
||
and PACKNR, respectively.
|
||
- Declare temporary c matrices (ct) as being maxmr-by-maxnr, which for now
|
||
is equivalent to being mr-by-nr. maxmr and maxnr are declared in a new
|
||
header file bli_kernel_post_macro_defs.h.
|
||
|
||
commit 3404e6657eabb017cd1580a2f1dd8e6fb13df923
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Feb 5 11:19:10 2014 -0600
|
||
|
||
Deprecated incremental blocksize macro const defs.
|
||
|
||
Details:
|
||
- Removed macro constant definitions related to incremental blocksizes
|
||
from all configurations' bli_kernel.h files. This change is minor and
|
||
is mostly a cleanup related to a previous commit.
|
||
|
||
commit 1e9afd39a63e0a58167d4439c1a0a880a4a35657
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 4 20:15:19 2014 -0600
|
||
|
||
Comment updates (removed vestiges of "bd").
|
||
|
||
commit 5cf58f7c2d5bc0d2d94d9576f7158d8f133b7aac
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Feb 4 09:15:19 2014 -0600
|
||
|
||
Added early returns for "object is zeros" case.
|
||
|
||
Details:
|
||
- Added some logic to packm_init(), pack_int() and gemm_int() so that
|
||
(a) objects marked as BLIS_ZEROS are not packed, and (b) those
|
||
objects are not computed with. This functionality is not currently
|
||
needed by any existing implementations, but may be used in the
|
||
future.
|
||
|
||
commit 6bbd4be769a9b344a55abe5ddaca1a99fd29f7b4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Feb 3 13:15:25 2014 -0600
|
||
|
||
Added 'f' on some gemm and trmm blocked variants.
|
||
|
||
Details:
|
||
- Added 'f' to some block variant files/functions to be consistent with
|
||
other file/functions' naming convention. Here, the f indicates
|
||
partitioning in the "forward" direction.
|
||
|
||
commit eb13cb2c6b182df5e2a9b88c76f50e2cee25b9e0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Feb 3 11:07:01 2014 -0600
|
||
|
||
Removed redundant non-gemm blksz_t creation.
|
||
|
||
Details:
|
||
- Removed code that creates duplicate blksz_t objects for herk, trmm,
|
||
and trsm. Instead, the gemm blksz_t objects are accessed via extern
|
||
and used directly. This reduces the amount of code associated with
|
||
each of the three _cntl_init() and _cntl_finalize() functions.
|
||
|
||
commit 0a023a7d9e58e53b8c204a5f49aa8ca9afeba938
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jan 29 14:02:08 2014 -0600
|
||
|
||
Introduced new level-3 front-end layer.
|
||
|
||
Details:
|
||
- Added new _front() functions for each level-3 operation. This is done
|
||
so that the choosing of the control tree (and *only* the choosing of
|
||
the control tree) happens in what was previously the "front end"
|
||
(e.g. bli_gemm()). That control tree is then passed into the _front()
|
||
function, which then performs up-front tasks such as parameter
|
||
checking.
|
||
|
||
commit 251c5d112196d37b183e554bc9d406104aed65fb
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jan 28 19:40:29 2014 -0600
|
||
|
||
Removed redundant hemm, her2k control trees.
|
||
|
||
Details:
|
||
- Removed code that generated a control tree specifically for hemm and
|
||
symm. Instead, the gemm control tree is now configured so that it
|
||
works for gemm, hemm, or symm.
|
||
- Retired most her2k code, as it was not being used. (Currently, her2k is
|
||
implemented as two invocations of herk.) I couldn't think of many
|
||
situations where her2k variants were needed.
|
||
- Removed some older her2k code.
|
||
|
||
commit 5a36e5bf2f59d1e85d6dbce32a07d604c5e82d11
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jan 27 11:13:00 2014 -0600
|
||
|
||
Embed func_t microkernel objects in control trees.
|
||
|
||
Details:
|
||
- Modified all control tree node definitions to include a new field of
|
||
type func_t*, which is similar to a blksz_t except that it contains
|
||
one function pointer (each typed simply as void*) for each datatype.
|
||
We use the func_t* to embed pointers to the micro-kernels to use for
|
||
the leaf-level nodes of each control tree. This change is a natural
|
||
extension of control trees and will allow more flexibility in the
|
||
future.
|
||
- Modified all macro-kernel wrappers to obtain the micro-kernel pointers
|
||
from the incoming (previously ignored) control tree node and then pass
|
||
the queried pointer into the datatype-specific macro-kernel code, which
|
||
then casts the pointer to the appropriate type (new typedefs residing
|
||
in bli_kernel_type_defs.h) and then uses the pointer to call the micro-
|
||
kernel. Thus, the micro-kernel function is no longer "hard-coded" (that
|
||
is, determined when the datatype-specific macro-kernel functions are
|
||
instantiated by the C preprocessor).
|
||
- Added macros to bli_kernel_macro_defs.h that build datatype-specific
|
||
base names if they do not exist already, and then use those to build
|
||
datatype-specific micro-kernel function names. This will allow
|
||
developers extra flexibility if they wanted to, for example, name each
|
||
of their datatype-specific micro-kernels differently (e.g. double
|
||
real might be named bli_dgemm_opt_4x4() while double complex might be
|
||
named bli_zgemm_opt_2x2()).
|
||
- Inserted appropriate code into _cntl_init() functions that allocates
|
||
and initializes a func_t object for the corresponding micro-kernels.
|
||
The gemm ukernel func_t object is created once, in bli_gemm_cntl_init(),
|
||
and then reused via extern wherever possible.
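
A minimal sketch of the idea described above: a struct holding one void* per datatype, queried and cast by the macro-kernel before the call. Every name here is hypothetical, the signature is a scalar simplification, and storing a function pointer in a void* relies on a widely supported compiler extension rather than strict ISO C.

```c
#include <stddef.h>

/* Hypothetical datatype indices (only two shown). */
enum { DT_S_SKETCH = 0, DT_D_SKETCH = 1, DT_COUNT_SKETCH = 2 };

/* One untyped function pointer per datatype. */
typedef struct { void *ptr[DT_COUNT_SKETCH]; } func_sketch_t;

/* Typed signature that the double-precision macro-kernel casts to. */
typedef double (*dukr_sketch_ft)(double a, double b, double c);

/* Hypothetical double-precision micro-kernel: c + a*b on scalars. */
double dukr_sketch(double a, double b, double c)
{
    return c + a * b;
}

/* Macro-kernel stand-in: queries the pointer at run time instead of
   naming the micro-kernel at preprocessing time. */
double dmacro_kernel_sketch(const func_sketch_t *f,
                            double a, double b, double c)
{
    dukr_sketch_ft ukr = (dukr_sketch_ft)f->ptr[DT_D_SKETCH];
    return ukr(a, b, c);
}
```

Swapping in a different micro-kernel then only requires storing a different pointer in the struct; the macro-kernel code itself is unchanged.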
|
||
|
||
commit 6cbd6f1c7f1915180aa28939833afde48665c5ae
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jan 24 10:38:29 2014 -0600
|
||
|
||
Removed commented mixed domain macro-kernel code.
|
||
|
||
Details:
|
||
- Removed commented-out code from macro-kernels that was supposed to
|
||
facilitate implementing mixed domain (complex times real) matrix
|
||
multiplication. This functionality is still (probably) possible,
|
||
but I'm getting tired of looking at the code every time I edit
|
||
a macro-kernel. Plus, there are probably ways of doing it at a
|
||
higher level, via control trees.
|
||
|
||
commit 29778be1119f1a884330d7f8dc424a2df4101d58
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jan 22 16:03:11 2014 -0600
|
||
|
||
Removed b_aux field from cntl nodes.
|
||
|
||
Details:
|
||
- Removed b_aux field from all control tree node definitions. This field
|
||
was being used in certain optimizations (incremental blocking) that were
|
||
not actually being employed within BLIS, and are probably not employed
|
||
by others.
|
||
- Updated all _cntl_obj_create() function definitions and invocations
|
||
according to above change.
|
||
- Retired bli_gemm_blk_var4.c, which was one such function that employed
|
||
incremental blocking, but which was never called by BLIS itself.
|
||
|
||
commit 06ac727a42ec9e832c7832745036702014638f99
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jan 15 16:44:52 2014 -0600
|
||
|
||
Updated some comments in level-3 front ends.
|
||
|
||
commit d628bf1da1560f1f5126a1ddfed8714f0a4b8da3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jan 15 11:40:12 2014 -0600
|
||
|
||
Consolidated pack_t enums; retired VECTOR value.
|
||
|
||
Details:
|
||
- Changed the pack_t enumerations so that BLIS_PACKED_VECTOR no longer has
|
||
its own value, and instead simply aliases to BLIS_PACKED_UNSPEC. This
|
||
makes room in the three pack_t bits of the info field of obj_t so that
|
||
two values are now unused, and may be used for other future purposes.
|
||
- Updated sloppy terminology usage in comments in level-2 front-ends.
|
||
(Replaced "is contiguous" with more accurate "has unit stride".)
|
||
|
||
commit ddc8c1c379b4787be5954802906593d7ea144452
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jan 13 14:55:43 2014 -0600
|
||
|
||
Suppress warning in Makefile (UNINSTALL_LIBS).
|
||
|
||
Details:
|
||
- Redirect errors to /dev/null when using 'find' to locate libraries that
|
||
would be uninstalled upon executing "make uninstall-old". Before, if the
|
||
Makefile was read before $(INSTALL_PREFIX)/lib existed, a "No such file
|
||
or directory" message was emitted. This message was harmless, but is now
|
||
suppressed in this situation.
|
||
|
||
commit f8f67d7251bffc05020e20527c100c8115fd5e55
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jan 10 09:06:11 2014 -0600
|
||
|
||
Typecast bli_getopt() return value in testsuite.
|
||
|
||
Details:
|
||
- In the test suite driver, inserted an explicit typecast of the return
|
||
value of bli_getopt() prior to parsing. The lack of typecast caused a
|
||
problem on at least one system whereby a return value of -1 was
|
||
interpreted as a garbage character. Thanks to Francisco Igual for finding
|
||
and submitting this fix.
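
A minimal sketch of one way a return value of -1 can turn into a garbage character. The log does not state the exact cause; the assumption here is that the value passed through a character type that is unsigned on the affected system. The function names are hypothetical.

```c
#include <limits.h>

/* Hypothetical: the parser's int result is stored in an unsigned
   character type before being examined. -1 wraps to UCHAR_MAX, so a
   later comparison against -1 can never succeed. */
int seen_through_unsigned_char_sketch(int ret)
{
    unsigned char c = (unsigned char)ret;
    return (int)c;
}

/* Hypothetical: the result is kept as an int, so -1 stays -1. */
int seen_through_int_sketch(int ret)
{
    int c = ret;
    return c;
}
```

On platforms where plain char is signed the problem would not show up, which would be consistent with it appearing on "at least one system" only.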
|
||
|
||
commit e7f154fe2ed3e10e2323cefe5d25c2c23ac902c4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jan 10 08:48:07 2014 -0600
|
||
|
||
Applied edge case fix to arm/neon microkernel.
|
||
|
||
Details:
|
||
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
|
||
double precision real gemm microkernel in kernels/arm/neon/3.
|
||
|
||
commit 89c76a8a51d070d263c13bfa5ace65769509f2b4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jan 9 12:08:37 2014 -0600
|
||
|
||
Allow building outside source distribution.
|
||
|
||
Details:
|
||
- Modified build system (mostly configure and top-level Makefile) so that
|
||
a user can build a BLIS library outside of the top-level directory of
|
||
the source distribution.
|
||
- Added "test" target to Makefile so that the user can run "make test",
|
||
which will compile, link, and run the testsuite binary. This works even
|
||
if the build directory is externally located, thanks to the test suite
|
||
binary's new -g and -o command-line options. Also, when creating the
|
||
test suite via the top-level Makefile, the linking is against the
|
||
local archive, in lib/<configname>, rather than at <install_prefix>/lib.
|
||
- Modified testsuite/Makefile so that it links against the library built
|
||
locally, in ../lib/<configname>.
|
||
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
|
||
- Various other cleanups to build system.
|
||
|
||
commit 12fa82ec12cc340ab28552997d9d50f7c98691f8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jan 8 16:09:26 2014 -0600
|
||
|
||
Implemented bli_getopt().
|
||
|
||
Details:
|
||
- Added bli_getopt.c and .h files to frame/base. These files implement
|
||
a custom version of getopt(), which may be used to parse command line
|
||
options passed into a program via argc/argv. I am implementing this
|
||
function myself, as opposed to using the version available via unistd.h,
|
||
for portability reasons, as the only requirement is string.h (which
|
||
is available via the standard C library).
|
||
- Modified test suite to allow the user to specify the file name (and/or
|
||
path) to the parameters and operations input files: -g may be used to
|
||
specify the general input file and -o to specify the operations input
|
||
file. If -g or -o or both are not given, default filenames are assumed
|
||
(as well as their existence in the current directory).
|
||
|
||
commit cafb58e86ea5cfb21b9eedc57ca8ebbf24252098
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jan 6 13:28:36 2014 -0600
|
||
|
||
Updated template micro-kernels to use auxinfo_t.
|
||
|
||
Details:
|
||
- Updated template micro-kernel implementations (located in
|
||
config/template/kernels), to adhere to the new auxinfo_t interface.
|
||
Meant to include this change in a0331fb1.
|
||
- Changed template configuration to use 64-bit integers (for both BLIS
|
||
and the BLAS compatibility layer).
|
||
|
||
commit 9ab126b499c3805045020cb89a8a5848e28d3bf5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jan 6 12:13:26 2014 -0600
|
||
|
||
Removed error checks in netlib->BLIS param mapping
|
||
|
||
Details:
|
||
- Disabled error checking in netlib-to-BLIS parameter mapping functions.
|
||
If the char value input to these functions was not one of the defined
|
||
values, bli_check_error_code() with the appropriate error code value
|
||
would be called, resulting in an abort(). This was unnecessary and
|
||
redundant since these routines are currently only used within the
|
||
BLAS compatibility layer, and they are only called AFTER parameter
|
||
checking has already been performed on the original BLAS char values.
|
||
If the application tried to override xerbla() to prevent an abort()
|
||
from being called, this error checking would still get in the way.
|
||
Thus, instead of reporting the error situation to the framework (ie:
|
||
calling abort()), an arbitrary BLIS parameter value is now chosen and
|
||
the function returns normally. Thanks to Jeff Hammond for finding and
|
||
reporting this issue.
|
||
|
||
commit 2cb13600f9f9601c60e7f96f4ca159d169ade9cb
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jan 3 12:29:13 2014 -0600
|
||
|
||
Updated year in copyright headers to 2014.
|
||
|
||
commit 290fa54e0083c9c837188b8321b13b1b282e7b0c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Dec 20 14:10:26 2013 -0600
|
||
|
||
Store variable panel strides in trmm/trsm auxinfo.
|
||
|
||
Details:
|
||
- Changed the value being stored into the auxinfo_t structure in trmm
|
||
and trsm macro-kernels. Whereas before we stored whatever value was
|
||
provided to the macro-kernel implementation via ps_a/ps_b, now we
|
||
store the stride that will advance to the next variable-length
|
||
micro-panel of the triangular matrix A (left) or B (right).
|
||
- Whitespace changes to the files affected above.
|
||
|
||
commit e3a6c7e77667fd749248df3f75f880266c3136ec
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 19 16:29:31 2013 -0600
|
||
|
||
Macroized conditionals for a2/b2 in macro-kernels.
|
||
|
||
Details:
|
||
- Replaced conditional expressions in macro-kernels related to computing
|
||
the addresses a2 and b2 (a_next and b_next) with a preprocessor macro
|
||
invocation, bli_is_last_iter(), that tests the same condition.
|
||
- Updated gemm_ukr module to use auxinfo_t argument.
|
||
- Whitespace changes in test suite ukr modules.
|
||
|
||
commit a0331fb10a50393e31d16339053b75b944132da1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 19 14:50:11 2013 -0600
|
||
|
||
Introduced auxinfo_t argument to micro-kernels.
|
||
|
||
Details:
|
||
- Removed a_next and b_next arguments to micro-kernels and replaced them
|
||
with a pointer to a new datatype, auxinfo_t, which is simply a struct
|
||
that holds a_next and b_next. The struct may hold other auxiliary
|
||
information that may be useful to a micro-kernel, such as micro-panel
|
||
stride. Micro-kernels may access struct fields via accessor macros
|
||
defined in bli_auxinfo_macro_defs.h.
|
||
- Updated all instances of micro-kernel definitions, micro-kernel calls,
|
||
as well as macro-kernels (for declaring and initializing the structs)
|
||
according to above change.
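
A minimal sketch of the interface change described above: the two trailing pointer arguments are folded into one struct that is read through accessor macros. The struct, macro, and function names are hypothetical (the real ones live in bli_auxinfo_macro_defs.h), and the micro-kernel signature is heavily simplified.

```c
/* Hypothetical auxiliary-information struct. */
typedef struct {
    const void *a_next; /* address of the next micro-panel of A */
    const void *b_next; /* address of the next micro-panel of B */
} auxinfo_sketch_t;

/* Hypothetical accessor macros. */
#define auxinfo_sketch_set_next_a(aux, p) ((aux)->a_next = (p))
#define auxinfo_sketch_set_next_b(aux, p) ((aux)->b_next = (p))
#define auxinfo_sketch_next_a(aux)        ((aux)->a_next)
#define auxinfo_sketch_next_b(aux)        ((aux)->b_next)

/* Simplified micro-kernel stand-in: takes one aux pointer instead of
   separate a_next and b_next arguments. A real kernel might use the
   addresses for prefetching; here they are only read. */
double ukr_sketch(const double *a, const double *b,
                  const auxinfo_sketch_t *aux)
{
    const double *a_next = auxinfo_sketch_next_a(aux);
    const double *b_next = auxinfo_sketch_next_b(aux);
    (void)a_next;
    (void)b_next;
    return a[0] * b[0];
}
```

Adding a further field later (for example a micro-panel stride) would then change the struct and its accessors but not the micro-kernel signature.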
|
||
|
||
commit 392428dea4001fe4384efe29f6cde32f8abeeb35
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 12 19:01:47 2013 -0600
|
||
|
||
Added "ri" scalar macros.
|
||
|
||
Details:
|
||
- Added set of basic scalar macros that take arguments' real and
|
||
imaginary components separately, named like the previous set except
|
||
with the "ris" (instead of "s") suffix.
|
||
- Redefined the previous set of scalar macros (those that take arguments
|
||
"whole") in terms of the new "ri" set.
|
||
- Renamed setris and getris macros to sets and gets.
|
||
- Renamed setimag0 macros to seti0s.
|
||
- Use bli_?1 macro instead of a local constant in bla_trmv.c, bla_trsv.c.
|
||
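The layering described above can be sketched roughly as follows: a "ris" macro takes each operand's real and imaginary components separately, and the whole-scalar "s" macro simply forwards components to it. The complex type and the macro names are hypothetical, chosen only to mirror the suffix convention mentioned in the commit.

```c
/* Hypothetical complex type, for illustration only. */
typedef struct { double real; double imag; } sketch_dcomplex;

/* "ris" flavor: y := y + a * x, with every operand passed as separate
   real and imaginary components. */
#define sketch_axpyris( ar, ai, xr, xi, yr, yi ) \
{ \
    (yr) += (ar) * (xr) - (ai) * (xi); \
    (yi) += (ai) * (xr) + (ar) * (xi); \
}

/* "s" flavor: the same operation on whole scalars, defined by
   forwarding the components to the "ris" macro. */
#define sketch_axpys( a, x, y ) \
    sketch_axpyris( (a).real, (a).imag, (x).real, (x).imag, \
                    (y).real, (y).imag )
```

Defining the arithmetic once in the "ris" form means code that keeps real and imaginary parts in separate variables can reuse it without first building a complex struct.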
|
||
commit f60c8adc2f61eaba06b892f4e73000159de93056
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 10 14:39:56 2013 -0600
|
||
|
||
Minor updates to dunnington configuration.
|
||
|
||
Details:
|
||
- Added commented alternatives to dunnington configuration's bli_kernel.h.
|
||
- Minor reformatting of optimization flag variables in make_defs.mk.
|
||
|
||
commit 4ef20150492db254b5baf2368add62e19b0ac11b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 9 18:53:03 2013 -0600
|
||
|
||
Tweaks to dunnington configuration (x86_64/core2).
|
||
|
||
Details:
|
||
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
|
||
- Enabled cache blocksize extension of up to 25% for MC and KC (for
|
||
double-precision real).
|
||
|
||
commit 5ad2ce7bf5ba3ea955e6d517bfd270e02820263b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 9 18:30:49 2013 -0600
|
||
|
||
Minor x86_64 (core2) kernel fixes.
|
||
|
||
Details:
|
||
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
|
||
for x86_64/core2 were calling the wrong reference code (l instead
|
||
of u).
|
||
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
|
||
kernels.
|
||
- Minor typecasting fix in testsuite/src/test_libblis.c.
|
||
- Makefile updates.
|
||
|
||
commit d289f5d3a9c0e1a68a17c1c32b736e282a289c4c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Dec 5 10:56:13 2013 -0600
|
||
|
||
Whitespace changes to level-2 blocked variants.
|
||
|
||
Details:
|
||
- Joined some lines in level-2 blocked variants to match formatting used
|
||
in level-3 blocked variants.
|
||
- Streamlined implementation of bli_obj_equals() in bli_query.c.
|
||
|
||
commit b444489f100d218bc8ef29b01ff8489c358559f9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Dec 3 16:08:30 2013 -0600
|
||
|
||
Added new "attached" scalar representation.
|
||
|
||
Details:
|
||
- Added infrastructure to support a new scalar representation, whereby
|
||
every object contains an internal scalar that defaults to 1.0. This
|
||
facilitates passing scalars around without having to house them in
|
||
separate objects. These "attached" scalars are stored in the internal
|
||
atom_t field of the obj_t struct, and are always stored to be the same
|
||
datatype as the object to which they are attached. Level-3 variants no
|
||
      longer take scalar arguments; however, level-3 internal back-ends still
|
||
do; this is so that the calling function can perform subproblems such
|
||
as C := C - alpha * A * B on-the-fly without needing to change either
|
||
of the scalars attached to A or B.
|
||
- Removed scalar argument from packm_int().
|
||
- Observe and apply attached scalars in scalm_int(), and removed scalar
|
||
from interface of scalm_unb_var1().
|
||
- Renamed the following functions (and corresponding invocations):
|
||
|
||
bli_obj_init_scalar_copy_of()
|
||
-> bli_obj_scalar_init_detached_copy_of()
|
||
bli_obj_init_scalar() -> bli_obj_scalar_init_detached()
|
||
bli_obj_create_scalar_with_attached_buffer()
|
||
-> bli_obj_create_1x1_with_attached_buffer()
|
||
bli_obj_scalar_equals() -> bli_obj_equals()
|
||
|
||
- Defined new functions:
|
||
|
||
bli_obj_scalar_detach()
|
||
bli_obj_scalar_attach()
|
||
bli_obj_scalar_apply_scalar()
|
||
bli_obj_scalar_reset()
|
||
bli_obj_scalar_has_nonzero_imag()
|
||
bli_obj_scalar_equals()
|
||
|
||
- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
|
||
- Renamed the following macros:
|
||
|
||
bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
|
||
bli_obj_is_scalar() -> bli_obj_is_1x1()
|
||
|
||
- Defined new macros to set and copy internal scalars between objects:
|
||
|
||
bli_obj_set_internal_scalar()
|
||
bli_obj_copy_internal_scalar()
|
||
|
||
- In level-3 internal back-ends, added conditional blocks where alpha and
|
||
beta are checked for non-unit-ness. Those values for alpha and beta are
|
||
applied to the scalars attached to aliases of A/B/C, as appropriate,
|
||
before being passed into the variant specified by the control tree.
|
||
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
|
||
alpha and/or beta.
|
||
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
|
||
attached to A and B are multiplied together to obtain alpha, while beta
|
||
is obtained directly from C.
|
||
- In level-3 front-ends, removed old function calls meant to provide
|
||
future support for mixed domain/precision. These can be added back later
|
||
once that functionality is given proper treatment. Also, removed the
|
||
      creation of copy-casts of alpha and beta since typecasting of scalars
|
||
is now implicitly handled in the internal back-ends when alpha and
|
||
beta are applied to the attached scalars.
|
||
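A minimal sketch of the "attached" scalar scheme described in this commit, restricted to real double precision. Every name below (sketch_obj_t and the helper functions) is a hypothetical stand-in; the real obj_t has many more fields and stores the scalar in an atom_t.

```c
/* Hypothetical object type; only the fields this sketch needs. */
typedef struct
{
    double* buffer; /* matrix data (not used by the sketch) */
    double  scalar; /* attached scalar */
} sketch_obj_t;

/* New objects start with an attached scalar of 1.0. */
static void sketch_obj_init( sketch_obj_t* obj, double* buffer )
{
    obj->buffer = buffer;
    obj->scalar = 1.0;
}

/* Fold an extra scalar into the one already attached, as a back-end
   might do with a non-unit alpha or beta. */
static void sketch_obj_scalar_apply( sketch_obj_t* obj, double s )
{
    obj->scalar *= s;
}

/* What a macro-kernel would do under this scheme: alpha is the product
   of the scalars attached to A and B; beta is read directly from C. */
static void sketch_get_alpha_beta( const sketch_obj_t* a,
                                   const sketch_obj_t* b,
                                   const sketch_obj_t* c,
                                   double* alpha, double* beta )
{
    *alpha = a->scalar * b->scalar;
    *beta  = c->scalar;
}
```

Because the scalars travel with the operands, intermediate variants can pass a unit scalar to subproblems without losing the caller's alpha and beta.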
|
||
commit 992de486d6f23e69a623abd15ae77d7881d13871
|
||
Merge: 9552e6ee fd4ac636
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 2 13:58:46 2013 -0600
|
||
|
||
Unimplemented kernels now call reference.
|
||
|
||
Details:
|
||
- Updated arm, bgq, loongson3a, and x86_64 kernels so that unimplemented
|
||
datatypes call the corresponding reference kernel. Previously, these
|
||
kernel functions called abort() with a "not yet implemented" error
|
||
message.
|
||
|
||
commit fd4ac636d9a55cec1476a444bd4e70def219dc8f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Dec 2 13:50:36 2013 -0600
|
||
|
||
Unimplemented kernels now call reference.
|
||
|
||
Details:
|
||
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
|
||
unimplemented kernel functions simply call the corresponding reference
|
||
implementation. (Previously, these unimplemented functions would
|
||
abort() with a "not yet implemented" message.)
|
||
|
||
commit 9552e6ee824d4345d5e908e869e071d19829819a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Nov 24 11:40:31 2013 -0600
|
||
|
||
Removed optional scaling from packm control tree.
|
||
|
||
Details:
|
||
- Removed does_scale field from packm control tree node and
|
||
bli_packm_cntl_obj_create() interface. Adjusted all invocations of
|
||
_cntl_obj_create() accordingly.
|
||
    - Redefined/renamed macros that are used in aliasing so that now,
|
||
bli_obj_alias_to() does a full alias (shallow copy) while
|
||
bli_obj_alias_for_packing() does a partial alias that preserves the
|
||
pack_mem-related fields of the aliasing (destination) object.
|
||
- Removed bli_trmm3_cntl.c, .h after realizing that the trmm control tree
|
||
will work just fine for bli_trmm3().
|
||
- Removed some commented vestiges of the typecasting functionality needed
|
||
to support heterogeneous datatypes.
|
||
|
||
commit e65c476284db9ef64b23191a21c2584b1083342f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Nov 19 10:05:35 2013 -0600
|
||
|
||
Minor updates to packm_blk_var2.c and _blk_var3.c.
|
||
|
||
Details:
|
||
- Comment updates to packm_blk_var2.c and packm_blk_var3.c.
|
||
- In packm_blk_var2(), call setm_unb_var1(), scal2m_unb_var1() directly
|
||
instead of setm(), scal2m().
|
||
|
||
commit 9e1d0d4bca48eda54301d8976f203e2544c9df3a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 18:11:07 2013 -0600
|
||
|
||
Added trsm_l, trsm_u ukernels for x86_64/core2.
|
||
|
||
Details:
|
||
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
|
||
These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
|
||
that already existed in kernels/x86_64/core2-sse3/3.
|
||
|
||
commit 85e7e02ea3a9190b6fcff5d46b00d41c79cb1242
|
||
Merge: 67761e22 70720054
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 12:02:00 2013 -0600
|
||
|
||
Merge branch 'master'. Forgot to git-pull.
|
||
|
||
commit 67761e224c92500eecf9c1540cc72bdd2fb27679
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 11:57:40 2013 -0600
|
||
|
||
Attempting to fix errors in bgq build.
|
||
|
||
Details:
|
||
- Removed restrict declaration from b_cast and c_cast from
|
||
bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
|
||
are causing problems for xlc only in those two files and no other
|
||
macro-kernels.
|
||
- Fixed (hopefully) kernel function parameter type declarations in
|
||
kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
|
||
|
||
commit 707200541d344f98cf34c9801954dbb36fbe0447
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 11:17:31 2013 -0600
|
||
|
||
Syntax error fix in x86_64/core2 gemmtrsm_u ukr.
|
||
|
||
commit bbe2b84a49e7785d4d0c514cda34adfbe66478b0
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 11:11:06 2013 -0600
|
||
|
||
Updated Makefile in test, testsuite.
|
||
|
||
Details:
|
||
- Updated Makefiles in test and testsuite directories to use the new
|
||
BLIS header installation directory scheme, which is to compile with
|
||
-I<PREFIX>/include/blis instead of -I<PREFIX>/include.
|
||
|
||
commit 9bd7fcfd436625ca2108128086671319362f4d92
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 18 10:58:09 2013 -0600
|
||
|
||
Outer-to-inner 'restrict' fix in macro-kernels.
|
||
|
||
Details:
|
||
- Fixed sloppy placement of 'restrict' pointer declarations in level-3
|
||
macro-kernels. Previously, all restricted pointers were being declared
|
||
at the outer-most function scope level. While this violates the C99
|
||
standard, very few of the compilers used with BLIS so far have seemed
|
||
to care. The lone exception has been IBM's xlc. Thanks to Tyler Smith
|
||
for identifying this bug (and suggesting the fix).
|
||
|
||
commit 50549a6a31dd26cf63a013e0ede16b2c7ce835b6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Nov 17 18:31:27 2013 -0600
|
||
|
||
Changed header install directory to include/blis.
|
||
|
||
Details:
|
||
- Changed top-level Makefile so that headers are installed to
|
||
$(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
|
||
named by version/configuration and then symlinked.)
|
||
- Added uninstall targets, including uninstall-old to clean out old
|
||
library archives.
|
||
- Added GREP makefile definitions to all configurations' make_defs.mk.
|
||
|
||
commit d70733abddfb9a95661897e1e4f3c1f3cfa7cbaa
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Nov 16 17:34:25 2013 -0600
|
||
|
||
Added ARM kernels, configurations.
|
||
|
||
Details:
|
||
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
|
||
Thanks to Francisco Igual for contributing these kernels and
|
||
configurations.
|
||
|
||
commit d37c2cff62089c86983c2f79762f4b5329037373
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 13 10:47:11 2013 -0600
|
||
|
||
Minor comment and Makefile changes.
|
||
|
||
Details:
|
||
- Added missing 'check-config' and 'check-make-defs' targets to
|
||
testsuite/Makefile.
|
||
- Removed unused 'test' target from top-level Makefile.
|
||
- Comment changes to testsuite input files.
|
||
|
||
commit 19885f893a17b91ee79bead0620d0f913392d4c5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 11 12:09:21 2013 -0600
|
||
|
||
Updated some kernel comment headers.
|
||
|
||
Details:
|
||
- Updated bgq and piledriver comment headers to use BLIS copyright header
|
||
instead of libflame.
|
||
|
||
commit 1a4d698f42981d74fe5f29b980031e1ee7dc42d5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 11 10:15:40 2013 -0600
|
||
|
||
CHANGELOG update (for 0.1.0).
|
||
|
||
commit 089048d5895a30221b6b1976c9be93ad6443420d (tag: 0.1.0)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Nov 9 17:18:00 2013 -0600
|
||
|
||
Added object wrappers to 1f test suite modules.
|
||
|
||
Details:
|
||
- Added missing object wrappers to level-1f test suite modules. This was
|
||
only apparent if you were configuring with something other than the
|
||
reference configuration.
|
||
- Commented out object-wrappers in level-1f front-ends. These were not
|
||
      working as intended unless the reference configuration was selected, because
|
||
most kernel sets, such as those in the template set, do not have object
|
||
wrappers.
|
||
- Whitespace changes to template micro-kernels.
|
||
- Comment changes to template level-1f kernel headers.
|
||
|
||
commit 9ef3752079de10124bed906b5d28479d04aa8187
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 8 17:20:47 2013 -0600
|
||
|
||
Updated template kernels wrt KernelsHowTo wiki.
|
||
|
||
Details:
|
||
- Merged latest state of KernelsHowTo wiki into template micro-kernels
|
||
located in config/template/kernels/3.
|
||
|
||
commit 376bbb59c8944e29c5c1ff6637920d8451370afa
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 8 11:17:34 2013 -0600
|
||
|
||
Removed support for duplication.
|
||
|
||
Details:
|
||
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
|
||
and all framework code.
|
||
- Updated test suite modules according to above changes.
|
||
|
||
commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Nov 7 11:36:11 2013 -0600
|
||
|
||
Added comments to testsuite/input.operations.
|
||
|
||
Details:
|
||
- Added extensive comments to the top of testsuite/input.operations,
|
||
which describe how to edit the file.
|
||
- Removed input.operations.0 and input.operations.1.
|
||
- Changed input.general to test all datatypes ("sdcz") by default.
|
||
|
||
commit a98f78b715fb256a519870071bb5266130d70b21
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 6 15:32:47 2013 -0600
|
||
|
||
Changed dim_t and inc_t to be signed integers.
|
||
|
||
Details:
|
||
- Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
|
||
This will facilitate interoperability with Fortran in the future.
|
||
(Fortran does not support unsigned integers.)
|
||
- Redefined many instances of stride-related macros so that they return
|
||
or use the absolute value of the strides, rather than the raw strides
|
||
which may now be signed. Added new macros bli_is_row_stored_f() and
|
||
bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
|
||
and changed the packm_blk_var[23] variants to use these macros instead
|
||
of the existing bli_is_row_stored(), bli_is_col_stored().
|
||
    - Added/adjusted typecasting to various functions/macros, including
|
||
bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
|
||
related macros in bli_param_macro_defs.h.
|
||
- Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
|
||
layer properly handles situations where vector increments are negative.
|
||
Thanks to Vladimir Sukharev for pointing out this issue.
|
||
- Changed type of increment parameters in bli_adjust_strides() from dim_t
|
||
to inc_t. Likewise in bli_check_matrix_strides().
|
||
- Defined bli_check_matrix_object(), which checks for negative strides.
|
||
- Redefined bli_check_scalar_object() and bli_check_vector_object() so
|
||
that they also check for negative stride.
|
||
- Added instances of bli_check_matrix_object() to various operations'
|
||
_check routines.
|
||
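The negative-increment case mentioned above can be illustrated with a short sketch. It assumes the usual reference-BLAS convention: when incx is negative, logical element 0 of an n-element vector lives at offset (n-1)*|incx| from the pointer the caller passed. The function and type names are hypothetical and do not reproduce bli_convert_blas_incv().

```c
typedef long sketch_inc_t; /* signed, like inc_t after this change */

/* Return a pointer to logical element 0 of the vector. With that
   pointer, element i is found at index i * incx for either sign. */
static const double* sketch_first_elem( long n, const double* x,
                                        sketch_inc_t incx )
{
    if ( incx < 0 && n > 0 ) return x + ( n - 1 ) * ( -incx );
    return x;
}

/* Index-weighted sum, sum_i (i+1) * x_i, used here only because its
   value depends on the order in which the elements are visited. */
static double sketch_weighted_sum( long n, const double* x,
                                   sketch_inc_t incx )
{
    const double* x0  = sketch_first_elem( n, x, incx );
    double        sum = 0.0;

    for ( long i = 0; i < n; ++i )
        sum += ( double )( i + 1 ) * x0[ i * incx ];

    return sum;
}
```

An unsigned increment type cannot represent incx = -2 at all, which is one reason a signed inc_t is needed before the compatibility layer can support this case.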
|
||
commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Nov 6 10:09:10 2013 -0600
|
||
|
||
Minor comment update to BLAS compat files.
|
||
|
||
commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 4 15:50:00 2013 -0600
|
||
|
||
Fixed bugs in scalv and setv.
|
||
|
||
Details:
|
||
- Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
|
||
a segmentation fault may occur if beta is not the same type as
|
||
the vector operand for scalv and setv.
|
||
- Changed axpyv and scal2v front-ends in a similar fashion.
|
||
|
||
commit f5953259a1842ee48e5833c22ac86e68a337bfe1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Nov 4 14:43:55 2013 -0600
|
||
|
||
Fixed a bug related to Hermitian matrix diagonals.
|
||
|
||
Details:
|
||
- Fixed a bug whereby BLIS assumed that the imaginary components of the
|
||
diagonal elements of Hermitian matrices were already zero. This property
|
||
is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
|
||
to Vladimir Sukharev for reporting this bug.
|
||
- Minor comment updates to template kernels.
|
||
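A rough sketch of enforcing the Hermitian-diagonal property at packing time, as described above. For brevity it copies a full column-major matrix into a contiguous buffer; the actual packm variant packs micro-panels, so the layout and all names here are simplifying assumptions.

```c
/* Hypothetical complex type, for illustration only. */
typedef struct { double real; double imag; } sketch_dcomplex;

/* Copy an m x m column-major matrix A (leading dimension lda) into the
   contiguous m x m buffer P, forcing the imaginary part of every
   diagonal element to zero along the way. */
static void sketch_pack_herm( long m, const sketch_dcomplex* a, long lda,
                              sketch_dcomplex* p )
{
    for ( long j = 0; j < m; ++j )
    {
        for ( long i = 0; i < m; ++i )
        {
            p[ i + j * m ] = a[ i + j * lda ];

            /* Do not trust the caller's diagonal imaginary parts. */
            if ( i == j ) p[ i + j * m ].imag = 0.0;
        }
    }
}
```

Zeroing during the copy leaves the caller's matrix untouched while guaranteeing that downstream kernels see a mathematically Hermitian operand.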
|
||
commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Nov 2 17:19:40 2013 -0500
|
||
|
||
Added scaling to abval2s, sqrt2s macros.
|
||
|
||
Details:
|
||
- Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
|
||
and overflow from squaring the real and imaginary components. (This is
|
||
the same technique used to fix recent bugs in invscals/invscaljs and
|
||
inverts.)
|
||
|
||
commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 1 10:28:04 2013 -0500
|
||
|
||
Added new dotxaxpyf variant 2.
|
||
|
||
Details:
|
||
- Added a new variant for dotxaxpyf that is based on dotxf and axpyf
|
||
kernels. By default, this variant is not used by any other operation.
|
||
|
||
commit 97f89fbcf202d72fc440b614708e352ea31633e2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Nov 1 10:16:39 2013 -0500
|
||
|
||
Fixed bug in complex invscals.
|
||
|
||
Details:
|
||
- Fixed complex inversion in invscals and invscaljs whereby the
|
||
imaginary component was being computed incorrectly.
|
||
- Use bli_fmaxabs() instead of bli_fabs() when choosing the scalar
|
||
in inverts, invscals, and invscaljs.
|
||
- Changed bli_abs() and bli_fabs() macro definitions to use "<="
|
||
operator instead of "<".
|
||
|
||
commit eda42a21d17a2742eab69ab801ed530b82488c8a
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 31 18:00:44 2013 -0500
|
||
|
||
    Defined missing symbols in bla_rotg.c.
|
||
|
||
Details:
|
||
- Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
|
||
z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
|
||
d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
|
||
these bugs.
|
||
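For reference, a local equivalent of d_sign() can be sketched in a few lines. It follows the Fortran SIGN semantics that libf2c implements: return the magnitude of the first argument with the sign of the second. The name sketch_d_sign is hypothetical, and the code is not taken from bla_rotg.c.

```c
/* Returns |*a| if *b >= 0, and -|*a| otherwise. Arguments are passed
   by pointer, following the f2c calling convention. */
static double sketch_d_sign( const double* a, const double* b )
{
    double x = ( *a >= 0 ? *a : -*a );

    return ( *b >= 0 ? x : -x );
}
```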
|
||
commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 30 14:39:01 2013 -0500
|
||
|
||
Fixed bugs in scalm and setm.
|
||
|
||
Details:
|
||
- Fixed bugs in scalm and setm that resulted in segmentation faults when
|
||
beta is not the same type as the matrix operand. Thanks to Vladimir
|
||
Sukharev for reporting this bug.
|
||
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
|
||
      and setm; namely, the alpha scalar is copy-cast to the type of the first
|
||
matrix operand.
|
||
- Changed the template and reference configurations' bli_config.h files
|
||
so that the number of memory allocator blocks of A and B are set based
|
||
on BLIS_MAX_NUM_THREADS.
|
||
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
|
||
|
||
commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 24 14:32:20 2013 -0500
|
||
|
||
Fixed over/under-flow in complex inversion.
|
||
|
||
Details:
|
||
- Fixed the complex bli_?inverts() macros, which were inverting elements
|
||
in an "unsafe" manner, such that very large and very small values were
|
||
      unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
|
||
reporting this bug.
|
||
- Comment update to bli_sumsqv_unb_var1.c.
|
||
- Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
|
||
- Changed 1.0F to 1.0 for bli_drands() macro.
|
||
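One common way to make complex inversion "safe" in the sense described above is to scale by the larger component magnitude before squaring. The sketch below shows that technique under the assumption that the input is nonzero; the type and function names are hypothetical, and the code is not the bli_?inverts() macro itself.

```c
/* Hypothetical complex type, for illustration only. */
typedef struct { double real; double imag; } sketch_dcomplex;

static double sketch_abs( double v ) { return v < 0.0 ? -v : v; }

/* Invert x in place: x := 1 / x, assuming x != 0.
   With s = max(|re|, |im|), the scaled components re/s and im/s have
   magnitude at most 1, so the intermediate t = |x|^2 / s stays in
   range where a direct re*re + im*im could overflow or underflow. */
static void sketch_inverts( sketch_dcomplex* x )
{
    double ar = sketch_abs( x->real );
    double ai = sketch_abs( x->imag );
    double s  = ( ar > ai ? ar : ai );
    double xr = x->real / s;
    double xi = x->imag / s;
    double t  = xr * x->real + xi * x->imag;

    x->real =  xr / t;
    x->imag = -xi / t;
}
```

For x = 1e200 + 1e200i the naive formula squares 1e200, which overflows to infinity and yields a zero result, whereas the scaled form returns roughly 5e-201 - 5e-201i.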
|
||
commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Oct 23 12:15:25 2013 -0500
|
||
|
||
Fixed parameter checking issue in BLAS syr[2]k.
|
||
|
||
Details:
|
||
- Fixed a minor parameter checking bug in the BLAS compatibility layer
|
||
for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
|
||
trans parameter of either operation, it is (a) allowed, and (b) treated
|
||
      as 'T' (whereas previously it was disallowed). Thanks to Vladimir
|
||
Sukharev for finding and reporting this bug.
|
||
|
||
commit a091a219bda55e56817acd4930c2aa4472e53ba5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Oct 14 10:11:29 2013 -0500
|
||
|
||
Minor fixes to piledriver configuration, ukernel.
|
||
|
||
Details:
|
||
- Applied a patch from Tyler that fixes minor staleness in the piledriver
|
||
configuration and gemm micro-kernel.
|
||
- Very minor changes to test suite input files.
|
||
|
||
commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 11 11:37:19 2013 -0500
|
||
|
||
Added Fran's Sandy Bridge kernels/configuration.
|
||
|
||
Details:
|
||
- Added a kernel directory for kernels developed by Francisco Igual for
|
||
the Sandy Bridge architecture, including a dgemm ukernel coded with
|
||
AVX intrinsics.
|
||
- Added a configuration for Sandy Bridge using values supplied by Fran.
|
||
|
||
commit 03106d650e4030d4c9831683448376f92fc52d41
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Oct 11 10:40:38 2013 -0500
|
||
|
||
Fixed minor perf bug in gemm_ker_var2.
|
||
|
||
Details:
|
||
- Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
|
||
bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next are not
|
||
      computed correctly (i.e., do not wrap around) at the edge cases. Thanks to
|
||
Tze Meng for helping me identify this bug.
|
||
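The wraparound behavior can be sketched as follows, assuming that "wrap around" means the panel after the last one is the first panel again. The macro and function names are hypothetical, and the real macro-kernel computes these addresses inline for both A and B.

```c
#include <stddef.h>

/* True on the final iteration of an n-iteration loop. */
#define sketch_is_last_iter( i, n ) ( (i) == (n) - 1 )

/* Address of the micro-panel that follows panel i, where n panels of
   ps elements each are stored contiguously starting at base. On the
   last iteration the result wraps back to the first panel instead of
   pointing one panel past the end. */
static const double* sketch_next_panel( const double* base, size_t i,
                                        size_t n, size_t ps )
{
    if ( sketch_is_last_iter( i, n ) ) return base;

    return base + ( i + 1 ) * ps;
}
```

Since a_next and b_next are typically used as prefetch hints, a wrong address at the edge costs performance rather than correctness, which matches the "minor perf bug" wording.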
|
||
commit b053337387dbdef9035be03538222670a21707ca
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 10 18:26:55 2013 -0500
|
||
|
||
Added fusing factors, MR/NR to test suite output.
|
||
|
||
Details:
|
||
- Updated the test suite driver (and modules where appropriate) so that
|
||
the level-1f fusing factors are output along with the variable dimension.
|
||
While this is not strictly necessary, since the fusing factors are output
|
||
      in the initial parameter summary, it gives the user extra reassurance
|
||
since the fusing factors appear alongside the variable dimension, which
|
||
together give a complete picture of the problem size. Similar changes were
|
||
made for outputting the register blocksizes when reporting results for the
|
||
micro-kernel test modules.
|
||
|
||
commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 10 14:20:06 2013 -0500
|
||
|
||
Added test suite modules for level-1f, 3 kernels.
|
||
|
||
Details:
|
||
- Added test modules in test suite for level-1f kernels and level-3
|
||
micro-kernels. (Duplication in the micro-kernels, for now, is NOT
|
||
supported by these test modules.)
|
||
- Added section override switches to test suite's input.operations file.
|
||
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
|
||
facilitate the level-1f test modules. Also added front-end for dupl
|
||
operation.
|
||
- Added obj_t-based check routines for level-1f operations, which are
|
||
called from the new front-ends mentioned above.
|
||
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
|
||
factors as a function of datatype, which is needed by their respective
|
||
test modules.
|
||
- Whitespace changes to bli_kernel.h of all existing configurations.
|
||
|
||
commit 680188d46bb15b9a1a2867638104939dc77ca2a1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 10 13:23:37 2013 -0500
|
||
|
||
Cleaned up old test drivers.
|
||
|
||
Details:
|
||
- Minor updates to old test drivers in preparation for our participation
|
||
in ACM TOMS's replicated results initiative.
|
||
|
||
commit 3690bdd4f95769c935c410414112102cc3e108b1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 10 11:45:33 2013 -0500
|
||
|
||
More updates to level-1f kernels for core2-sse3.
|
||
|
||
Details:
|
||
- Changed types in function signatures to match new prototypes. Meant to
|
||
include this in previous commit.
|
||
|
||
commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Oct 10 11:27:27 2013 -0500
|
||
|
||
Fixed outdated fusing factor macros in 1f kernels.
|
||
|
||
Details:
|
||
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
|
||
macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
|
||
this out.
|
||
|
||
commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Oct 1 17:01:18 2013 -0500
|
||
|
||
Added section overrides to test suite.
|
||
|
||
Details:
|
||
- Added new lines of input to the test suite's input.operations file, which
|
||
allows the user to disable entire sections (levels) of tests. Before this
|
||
      change, the user had to manually disable each operation test's "master
|
||
switch". (This is why input.operations.0 existed: to allow a more
|
||
convenient starting point for someone who only wanted to test one or a
|
||
few operations.)
|
||
|
||
commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Sep 30 12:58:18 2013 -0500
|
||
|
||
Added template implementations and other tweaks.
|
||
|
||
Details:
|
||
- Added a 'template' configuration, which contains stub implementations of the
|
||
level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
|
||
lots of in-file comments and documentation.
|
||
- Modified some variable/parameter names for some 1/1f operations. (e.g.
|
||
renaming vector length parameter from m to n.)
|
||
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
|
||
to bli_kernel.h.
|
||
    - Modified test suite to print out fusing factors for axpyf, dotxf, and
|
||
dotxaxpyf, as well as the default fusing factor (which are all equal
|
||
in the reference and template implementations).
|
||
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
|
||
reference variants were implemented in terms of front-end routines rather
|
||
      than directly in terms of the kernels. (For example, axpy2v was implemented
|
||
as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
|
||
- Changed the interface to dotxf so that it matches that of axpyf, in that
|
||
A is assumed to be m x b_n in both cases, and for dotxf A is actually used
|
||
as A^T.
|
||
- Minor variable naming and comment changes to reference micro-kernels in
|
||
frame/3/gemm/ukernels and frame/3/trsm/ukernels.
|
||
|
||
commit 97aaf220a847363b4da35935eca17790c0ef71f6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 17 10:51:36 2013 -0500
|
||
|
||
Added new kernels, configurations.
|
||
|
||
Details:
|
||
- Added various micro-kernels for the following architectures:
|
||
Intel MIC
|
||
IBM BG/Q
|
||
IBM Power7
|
||
AMD Piledriver
|
||
        Loongson 3A
|
||
and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
|
||
and Xianyi Zhang for contributing these kernels.
|
||
- Added configurations corresponding to above architectures, and renamed
|
||
"clarksville" configuration to "dunnington".
|
||
|
||
commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Sep 13 14:31:53 2013 -0500
|
||
|
||
Removed default configuration behavior.
|
||
|
||
Details:
|
||
- Changed the configure script so that it no longer defaults to the
|
||
reference configuration. This change is being made so that the
|
||
developer has a firm awareness of which configuration is being used
|
||
to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
|
||
suggested change.
|
||
|
||
commit da77e9614f54f92f703f01e3b9bd67a83280150c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Sep 13 12:00:37 2013 -0500
|
||
|
||
Minor improvements to static memory allocator.
|
||
|
||
Details:
|
||
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
|
||
a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
|
||
functionality includes computing the pool size for each datatype (using
|
||
that datatype's cache blocksizes) and using the maximum to size the
|
||
actual pool array. This addresses the somewhat common pitfall whereby a
|
||
developer updates cache blocksizes in bli_kernel.h for only one datatype
|
||
(say, single-precision real), while the memory pools are sized using the
|
||
double-precision real values. Then, when the developer attempts to link
|
||
to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
|
||
a message saying the static memory pool was exhausted. Clearly, this
|
||
message is misleading when the pool was not sized properly to begin with.
|
||
- Removed previously disabled code in bli_kernel_macro_defs.h that was
|
||
meant to check for size consistency among the various cache blocksizes.
|
||
(Obviously the memory pool size-based solution mentioned above is better.)
|
||
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
|
||
reasonable place to put these constants, rather than further crowd up
|
||
bli_config.h.
|
||
- Updated testsuite driver to output memory pool sizes for A, B, and C.
|
||
- Minor comment updates to bli_config.h.
|
||
- Removed 'flame' configuration. It was beginning to get out-of-date, and
|
||
I hadn't used it in months. We can always re-create it later.
|
||
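The pool-sizing idea above can be sketched with a few cpp macros: compute a candidate size per datatype from that datatype's blocksizes, then size the static array with the maximum. The blocksize values and every macro name below are hypothetical, and only two datatypes are shown.

```c
#include <stddef.h>

/* Hypothetical cache blocksizes, in elements. Here the
   single-precision values were enlarged while the double-precision
   values were left alone, which is the pitfall the commit describes. */
#define SKETCH_MC_S 512
#define SKETCH_KC_S 512
#define SKETCH_MC_D 256
#define SKETCH_KC_D 256

/* Bytes needed to hold one packed MC x KC block of each datatype. */
#define SKETCH_POOL_SIZE_S ( SKETCH_MC_S * SKETCH_KC_S * sizeof( float  ) )
#define SKETCH_POOL_SIZE_D ( SKETCH_MC_D * SKETCH_KC_D * sizeof( double ) )

#define SKETCH_MAX( a, b ) ( (a) > (b) ? (a) : (b) )

/* Size the pool for whichever datatype needs the most space. */
#define SKETCH_POOL_SIZE_MAX \
    SKETCH_MAX( SKETCH_POOL_SIZE_S, SKETCH_POOL_SIZE_D )

static char sketch_pool_a[ SKETCH_POOL_SIZE_MAX ];
```

With these values a pool sized from the double-precision blocksizes alone would be 512 KiB, half of what a single-precision block requires.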
|
||
commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 10 17:17:28 2013 -0500
|
||
|
||
Added ESSL and Accelerate targets to test drivers.
|
||
|
||
Details:
|
||
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
|
||
Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
|
||
/ providing this patch.
|
||
|
||
commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 10 16:35:12 2013 -0500
|
||
|
||
Various changes to treatment of integers.
|
||
|
||
Details:
|
||
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
|
||
assigned values of 32, 64, or some other value. The former two result in
|
||
defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
|
||
causes integers to be defined in terms of a default type (e.g. long int).
|
||
- Updated bli_config.h in reference and clarksville configurations according
|
||
to above changes.
|
||
- Updated test drivers in test and testsuite to avoid type warnings associated
|
||
with format specifiers not matching the types of their arguments to printf()
|
||
and scanf().
|
||
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
|
||
inclusion in d141f9eeb6d1).
|
||
- Added explicit typecasting of dim_t and inc_t to macros in
|
||
bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
|
||
- Slight changes to CREDITS and INSTALL files.
|
||
- Slight tweaks to Windows build system, mostly in the form of switching to
|
||
Windows-style CRLF newlines for certain files.
|
||
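A sketch of how a size-selecting macro like BLIS_INT_TYPE_SIZE might drive the integer typedefs. The sketch_-prefixed names are hypothetical stand-ins for the macro and for gint_t/guint_t, and the default of 64 is an assumption made only so the example compiles on its own.

```c
#include <stdint.h>

/* Normally set in a configuration header; 64 is assumed here. */
#ifndef SKETCH_INT_TYPE_SIZE
#define SKETCH_INT_TYPE_SIZE 64
#endif

#if   SKETCH_INT_TYPE_SIZE == 32
typedef int32_t           sketch_gint_t;
typedef uint32_t          sketch_guint_t;
#elif SKETCH_INT_TYPE_SIZE == 64
typedef int64_t           sketch_gint_t;
typedef uint64_t          sketch_guint_t;
#else
/* Any other value falls back to a default C integer type. */
typedef long int          sketch_gint_t;
typedef unsigned long int sketch_guint_t;
#endif
```

Fixed-width typedefs make the integer size independent of the platform's data model, which matters because long int is 32 bits on some 64-bit systems.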
|
||
commit 068437736b41d51a1f5ec47839f059bf58a20413
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Sep 9 14:07:58 2013 -0500
|
||
|
||
Fixed set-but-not-used compiler (gcc) warnings.
|
||
|
||
Details:
|
||
- Used void-casts of certain variables to appease gcc (and perhaps other
|
||
compilers) when such variables are only used in the complex instances of
|
||
the functions. Special thanks to Karl Rupp for suggesting a portable fix
|
||
for these warnings.
|
||
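A minimal sketch of the void-cast idiom mentioned above, assuming a type-generic routine whose real instance sets a variable that only the complex instances read. The function ex_scal_real() and its parameters are hypothetical and are not taken from the BLIS sources.

```c
#include <assert.h>

/* Hypothetical real-domain instance of a type-generic routine. */
static double ex_scal_real(int conj_flag, double alpha, double x)
{
    int use_conj;

    assert(conj_flag == 0 || conj_flag == 1);

    use_conj = conj_flag;  /* set here ...                                */
    (void)use_conj;        /* ... but only read by the complex instances; */
                           /* the cast silences set-but-not-used warnings */

    return alpha * x;
}
```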
|
||
commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Sep 9 13:48:52 2013 -0500
|
||
|
||
Small fix to Windows defs.mk makefile fragment.
|
||
|
||
Details:
|
||
- Commented out a !include statement that was attempting to include a
|
||
version file that does not yet exist. For now, the version string is
|
||
hard-coded into defs.mk.
|
||
|
||
commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Sep 9 13:09:16 2013 -0500
|
||
|
||
Added Windows build system.
|
||
|
||
Details:
|
||
- Added a 'windows' directory, which contains a Windows build system
|
||
similar to that of libflame's. Thanks to Martin for getting this up
|
||
and running.
|
||
- Spun off system header #includes into bli_system.h, which is included
|
||
in blis.h
|
||
- Added a Windows section to bli_clock.c (similar to libflame's).
|
||
|
||
commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Sep 9 11:04:46 2013 -0500
|
||
|
||
Edited bli_?lamch.c to avoid Windows keyword.
|
||
|
||
Details:
|
||
- Renamed "small" variable to "smnum" to avoid collision with Windows type
|
||
by the same name. This change is needed in advance of the upcoming Windows
|
||
build system.
|
||
|
||
commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Sep 4 13:36:07 2013 -0500
|
||
|
||
Switched integer typedefs (again) to C types.
|
||
|
||
Details:
|
||
- Redefined gint_t and guint_t in terms of the standard C types long int
|
||
and unsigned long int, respectively.
|
||
- Changed testsuite default max problem size to 500.
|
||
- Changed testsuite input.operations to use square problems for level-3
|
||
operation tests.
|
||
|
||
commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Sep 4 12:09:11 2013 -0500
|
||
|
||
Falling back to 32-bit integers for dim_t, etc.
|
||
|
||
Details:
|
||
- In light of recent segfaulting issues when compiling on 32-bit systems,
|
||
I've changed the default typedef for gint_t and guint_t from int64_t and
|
||
uint64_t to int32_t and uint32_t, respectively.
|
||
- Disabled 64-bit integers in the blas2blis layer for the reference
|
||
configuration.
|
||
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
|
||
to introductory output of the testsuite.
|
||
|
||
commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 3 21:58:07 2013 -0500
|
||
|
||
Applied temp fix to typecasting bug in testsuite.
|
||
|
||
Details:
|
||
- Applied a temporary fix to the typecasting bug in the testsuite driver.
|
||
The fix involves casting both numerator and denominator to unsigned long.
|
||
This fix is more voodoo than science, as I can't be sure why it even
|
||
works.
|
||
|
||
commit 9ee6e125373869c4213c017ce772c38ecefba103
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Sep 3 21:53:27 2013 -0500
|
||
|
||
Changed dimension spec for gemm in testsuite.
|
||
|
||
Details:
|
||
- Encountered a bizarre typecasting bug whereby the test suite was not
|
||
computing the proper dimension from the problem size and dimension
|
||
specification when the latter was set to -3. Will investigate.
|
||
Thanks to Fran for finding this "bug".
|
||
|
||
commit e8be081e68c385ab44d0fea8dade21d40c200b79
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 28 15:52:34 2013 -0500
|
||
|
||
Generalized matlab and file output in testsuite.
|
||
|
||
Details:
|
||
- Added a new option in input.general that allows outputting in
|
||
matlab/octave format so that one can output in matlab format
|
||
independently from outputting to files.
|
||
- Adjusted input.operations according to above.
|
||
- Added input.operations.0 and input.operations.1 with all options
|
||
disabled and enabled, respectively.
|
||
|
||
commit d352c746e5683037d41b5061dfb5ce08e1d0843b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 27 13:41:46 2013 -0500
|
||
|
||
Added single/real gemm micro-kernel for x86_64.
|
||
|
||
Details:
|
||
- Added a single-precision real gemm micro-kernel in
|
||
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
|
||
- Adjusted the single-precision real register blocksizes in
|
||
config/clarksville/bli_kernel.h to be 8x4.
|
||
- Added a missing comment to bli_packm_blk_var2.c that was present in
|
||
bli_packm_blk_var3.c
|
||
|
||
commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Aug 19 12:07:41 2013 -0500
|
||
|
||
Fixed bug in bli_acquire_mpart_t2b(), _l2r().
|
||
|
||
Details:
|
||
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
|
||
that caused incorrect partitioning when SUBPART0 was requested. This
|
||
bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
|
||
this bug.
|
||
- Removed dupl kernels from kernels/x86_64/3 directory.
|
||
- Uncommented beta == 0 optimization code in
|
||
kernels/x86_64/3/bli_gemm_opt_d4x4.c.
|
||
|
||
commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 8 14:39:35 2013 -0500
|
||
|
||
Moved init_safe(), finalize_safe() to BLAS compat.
|
||
|
||
Details:
|
||
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
|
||
BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
|
||
initializers in the BLIS layer wasn't buying us anything because the user
|
||
could still call the library with uninitialized global scalar constants,
|
||
for example. Thus, we will just have to live with the constraint that
|
||
bli_init() MUST be called before calling ANY routine with a bli_ prefix.
|
||
- Added the missing _init_safe() and _finalize_safe() calls to the level-1
|
||
BLAS compatibility wrappers.
|
||
|
||
commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 8 13:30:19 2013 -0500
|
||
|
||
Miscellaneous updates.
|
||
|
||
Details:
|
||
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
|
||
BLIS_CACHE_LINE_SIZE (typically 64).
|
||
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
|
||
kernels.
|
||
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
|
||
kernels, in that the interior and edge-case handling is expressed once
|
||
inside the loops in the n and m dimensions, rather than the edge-case
|
||
handling being "unrolled" and expressed as distinct code regions. The
|
||
previous macro-kernel now lives in retired form in the subdirectory
|
||
other/bli_gemm_ker_var2.c.old.
|
||
- Updated experimental gemm_ker_var5 according to above change.
|
||
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
|
||
applied to optimize the macro-kernel's access pattern on C when C is
|
||
row-stored.
|
||
- Various updates inside of test/exec_sizes.
|
||
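The reformulated gemm_ker_var2 structure described above, where interior and edge cases are handled once inside the m and n loops, might look roughly like the sketch below. The function name and the "covered" counter are hypothetical; the counter stands in for a micro-kernel call on an m_cur x n_cur tile.

```c
#include <assert.h>

static long ex_min(long a, long b) { return a < b ? a : b; }

/* Walks an m x n block in mr x nr micro-tiles and returns the number of
   elements visited. Edge tiles are handled by the same loop body. */
static long ex_macro_kernel_cover(long m, long n, long mr, long nr)
{
    long covered = 0;

    assert(mr > 0 && nr > 0);

    for (long j = 0; j < n; j += nr) {
        long n_cur = ex_min(nr, n - j);      /* nr, or less at the right edge  */
        for (long i = 0; i < m; i += mr) {
            long m_cur = ex_min(mr, m - i);  /* mr, or less at the bottom edge */
            covered += m_cur * n_cur;        /* stand-in for a micro-kernel call */
        }
    }
    return covered;
}
```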
|
||
commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Aug 7 12:27:04 2013 -0500
|
||
|
||
Fixed bug in interface of bla_ger_check().
|
||
|
||
Details:
|
||
- Fixed the misplaced lda parameter in the function signature of
|
||
bla_ger_check(). Thanks to Tyler for finding this bug.
|
||
|
||
commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Aug 6 12:25:51 2013 -0500
|
||
|
||
Fixed cpp guard typos in frame/compat/check files.
|
||
|
||
Details:
|
||
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
|
||
BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
|
||
- Fixed various syntax errors in the code that had yet to be compiled
|
||
due to the aforementioned bug.
|
||
|
||
commit f4ec28e723d28d998f1038f82da6986e44320ef6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Aug 1 11:24:23 2013 -0500
|
||
|
||
Added basic OpenMP-based gemm and packm files.
|
||
|
||
Details:
|
||
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
|
||
into the following auxiliary files
|
||
|
||
frame/1m/packm/other/bli_packm_blk_var2.c
|
||
frame/3/gemm/other/bli_gemm_ker_var2.c
|
||
|
||
The routine in the first file uses a basic OpenMP parallel region to
|
||
parallelize the packing of blocks of A and panels of B, while the
|
||
second uses a similar parallel region to parallelize along the n
|
||
dimension of the gemm macro-kernel.
|
||
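As a rough illustration of parallelizing along the n dimension with a basic OpenMP parallel region, the sketch below splits a column-major matrix into NR-wide column panels and processes each panel independently. It is not Tyler's code: the function name is hypothetical and scaling by alpha is only a stand-in for the macro-kernel work. Without OpenMP enabled at compile time, the pragma is ignored and the loop runs serially.

```c
#include <assert.h>

/* Scales a column-major m x n matrix by alpha, one nr-column panel at a time.
   Panels do not overlap, so iterations of the outer loop are independent. */
static void ex_scale_panels(double *c, long m, long n, long ldc,
                            long nr, double alpha)
{
    assert(nr > 0 && ldc >= m);

    long n_panels = (n + nr - 1) / nr;

    #pragma omp parallel for
    for (long p = 0; p < n_panels; ++p) {
        long j0 = p * nr;
        long j1 = (j0 + nr < n) ? j0 + nr : n;  /* last panel may be narrower */
        for (long j = j0; j < j1; ++j)
            for (long i = 0; i < m; ++i)
                c[i + j * ldc] *= alpha;
    }
}
```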
|
||
commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
|
||
Merge: 67a8b949 6e7e4523
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 26 11:14:27 2013 -0500
|
||
|
||
Merge branch 'master' of https://code.google.com/p/blis
|
||
|
||
commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 26 11:12:37 2013 -0500
|
||
|
||
Added missing cpp kernel blocksize constraints.
|
||
|
||
Details:
|
||
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
|
||
constraints on the register blocksizes relative to the cache blocksizes.
|
||
Thanks to Tyler for helping me stumble across this issue.
|
||
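Preprocessor guards of the kind described above can be sketched as follows. The EX_ macro names and values are hypothetical placeholders rather than the names used in bli_kernel_macro_defs.h, and the exact set of constraints enforced there may differ.

```c
#include <assert.h>

/* Hypothetical cache and register blocksizes. */
#define EX_MC 256
#define EX_NC 4096
#define EX_MR 8
#define EX_NR 4

/* Compile-time constraints: cache blocksizes must be whole multiples of the
   corresponding register blocksizes. */
#if ( EX_MC % EX_MR ) != 0
#error "MC must be a whole multiple of MR."
#endif
#if ( EX_NC % EX_NR ) != 0
#error "NC must be a whole multiple of NR."
#endif

/* Number of MR-row micro-panels needed to cover mc rows (mc <= EX_MC). */
static int ex_panels_per_block(int mc)
{
    assert(mc > 0 && mc <= EX_MC);
    return (mc + EX_MR - 1) / EX_MR;
}
```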
|
||
commit 6e7e452343014e8f86640874dc1dbadca4a642a1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 22 14:50:57 2013 -0500
|
||
|
||
Fixed minor warnings and misc issues.
|
||
|
||
Details:
|
||
- Fixed various warnings output by gcc 4.6.3-1, including removing some
|
||
set-but-not-used variables and addressing some instances of typecasting
|
||
of pointer types to integer types of different sizes.
|
||
|
||
commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 22 12:54:32 2013 -0500
|
||
|
||
Tightened some macros that detect datatypes.
|
||
|
||
Details:
|
||
- Modified the definitions of some macros, such as bli_is_real(), so that
|
||
the "special" bit is taken into account so that BLIS_INT is differentiated
|
||
from BLIS_FLOAT.
|
||
- Whitespace changes to bli_obj_macro_defs.h.
|
||
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
|
||
being used.
|
||
|
||
commit b33e2f4443b9043b554963320280ff7783773652
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jul 19 17:15:03 2013 -0500
|
||
|
||
CHANGELOG update (for 0.0.9).
|
||
|
||
commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 18 18:04:34 2013 -0500
|
||
|
||
Added BLAS error checking to compatibility layer.
|
||
|
||
Details:
|
||
- Added frame/compat/check directory, which now houses companion _check()
|
||
routines for each of the BLAS wrappers in frame/compat. These _check()
|
||
routines are called from the compatibility wrappers and mimic the
|
||
error-checking present in the netlib BLAS.
|
||
- Edited bla_xerbla.c so that xerbla() translates the operation string to
|
||
uppercase before printing.
|
||
- Redefined util routines in frame/compat/f2c/util in terms of level0
|
||
macros.
|
||
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
|
||
- Commented out prototypes in test/test_*.c since Fortran integers are now
|
||
int64_t by default (and the prototypes that were present in the files
|
||
used int).
|
||
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
|
||
since blis.h was already being included.
|
||
- Other minor changes to code in frame/compat/f2c.
|
||
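The uppercase translation mentioned for xerbla() could be done along these lines. This is a sketch only: ex_upper_copy() is a hypothetical helper and does not reproduce bla_xerbla.c.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Copies src into dst (capacity cap, including the terminator), converting
   each character to uppercase. Truncates if src does not fit. */
static void ex_upper_copy(char *dst, size_t cap, const char *src)
{
    size_t i;

    assert(dst != NULL && src != NULL && cap > 0);

    for (i = 0; i + 1 < cap && src[i] != '\0'; ++i)
        dst[i] = (char)toupper((unsigned char)src[i]);
    dst[i] = '\0';
}
```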
|
||
commit 4e80ad28c97273db3366428ec44020da7944964d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jul 18 17:53:31 2013 -0500
|
||
|
||
Added support for C99 complex types/arithmetic.
|
||
|
||
Details:
|
||
- Added support for C99 complex types to bli_type_defs.h and overloaded
|
||
complex arithmetic to the scalar-level macros in include/level0. This
|
||
includes a somewhat substantial reorganization and re-layering of much
|
||
of the existing machinery present in the level0 macros.
|
||
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
|
||
commented-out by default, which optionally enables the use of built-in
|
||
C99 complex types and arithmetic.
|
||
- Minor changes to clarksville and reference configs' make_defs.mk files.
|
||
- Removed a macro definition from bli_param_macro_defs.h that was not being
|
||
used (bli_proj_dt_to_real_if_imag_eq0).
|
||
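One way an optional switch such as BLIS_ENABLE_C99_COMPLEX can select between built-in C99 complex types and a struct-based fallback is sketched below. Only the macro name and the .real/.imag field names come from this log; the ex_ types, macros, and functions are hypothetical and much simpler than the level0 macro layering.

```c
#include <assert.h>

#ifdef BLIS_ENABLE_C99_COMPLEX
#include <complex.h>
typedef double complex ex_dcomplex;
#define ex_real(z) creal(z)
#define ex_imag(z) cimag(z)
static ex_dcomplex ex_make(double r, double i) { return r + i * I; }
static ex_dcomplex ex_mul(ex_dcomplex a, ex_dcomplex b) { return a * b; }
#else
/* Fallback: a plain struct with explicit complex arithmetic. */
typedef struct { double real; double imag; } ex_dcomplex;
#define ex_real(z) ((z).real)
#define ex_imag(z) ((z).imag)
static ex_dcomplex ex_make(double r, double i)
{
    ex_dcomplex z;
    z.real = r;
    z.imag = i;
    return z;
}
static ex_dcomplex ex_mul(ex_dcomplex a, ex_dcomplex b)
{
    ex_dcomplex z;
    z.real = a.real * b.real - a.imag * b.imag;
    z.imag = a.real * b.imag + a.imag * b.real;
    return z;
}
#endif
```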
|
||
commit 6072d7c848e837ba20d607f7b727438ada31bdcf
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 17 12:27:45 2013 -0500
|
||
|
||
Fixed bugs in trsm, trmm macro-kernels.
|
||
|
||
Details:
|
||
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
|
||
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
|
||
incorrectly being adjusted upward by MR, instead of NR. The rl and ru
|
||
trmm macro-kernels were updated in a similar fashion.
|
||
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
|
||
diagoffb when recomputing k to skip a zero region below where the
|
||
diagonal intersects the right side of the block. The corresponding
|
||
trmm macro-kernel was also updated.
|
||
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
|
||
needed to be placed AFTER the block that recomputes k to skip the zero
|
||
region (if present). The other three trsm macro-kernels, as well as the
|
||
trmm macro-kernels, were updated in the same manner, for consistency.
|
||
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
|
||
being updated to skip a zero region to the left of where the diagonal
|
||
of A intersects the top edge of the block.
|
||
- Comment updates to all trsm and trmm macro-kernels.
|
||
- Comment updates to bli_packm_init.c.
|
||
|
||
commit 47410a48f9b91e94ce4c67633686ffd1f2ad0275
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 10 14:53:59 2013 -0500
|
||
|
||
Added f2c'ed Givens rotation wrappers.
|
||
|
||
Details:
|
||
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
|
||
along with other wrappers for which no BLIS implementation exists.
|
||
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
|
||
rotm, and rotmg operations.
|
||
|
||
commit e5f90f3a8dbe671104bcb9d8b4e3409de01805da
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 10 13:40:12 2013 -0500
|
||
|
||
Removed copynz defs from bli_kernel.h files.
|
||
|
||
Details:
|
||
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
|
||
configuration. (Meant to include this in previous commit.)
|
||
|
||
commit aec12d90f596e8c04b1ad178258a1cd38108f59d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jul 10 13:33:30 2013 -0500
|
||
|
||
Removed copynzv, copynzm and related codes.
|
||
|
||
Details:
|
||
- Removed copynzv and copynzm operation directories. These operations
|
||
implemented a variation of copyv/m that, in the case of real source
|
||
and complex destination operands, leaves the imaginary component
|
||
untouched (rather than setting it to zero). I realize now that the
|
||
special case(s) (e.g. gemm with real A and B but complex C) that I
|
||
thought required this operation actually can be handled more simply.
|
||
- Removed level0 scalar macros implementing copynzs, copynzjs.
|
||
|
||
commit b0a0a0f274a761788531b5d281cc3b411b7124ed
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jul 9 17:15:38 2013 -0500
|
||
|
||
Added handling of restrict, stdint.h for non-C99.
|
||
|
||
Details:
|
||
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
|
||
in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
|
||
manually typedefs the types we need (which, for now, are unconditionally
|
||
int64_t and uint64_t).
|
||
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
|
||
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
|
||
nothing for C++ and non-C99.
|
||
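The two preprocessor blocks described above might be sketched as follows. The feature test and the fallback typedefs are assumptions (in particular, that long long is 64 bits wide where stdint.h is unavailable), and ex_copyv() is a hypothetical routine that only shows the restrict keyword surviving or vanishing depending on the language mode.

```c
#include <assert.h>

#if defined(__cplusplus) || \
    (defined(__STDC_VERSION__) && __STDC_VERSION__ >= 199901L)
#include <stdint.h>
#else
/* Assumption: long long is 64 bits wide on the target. */
typedef long long          int64_t;
typedef unsigned long long uint64_t;
#endif

/* For C++ and pre-C99 compilers, define restrict as nothing. */
#if defined(__cplusplus) || \
    !defined(__STDC_VERSION__) || __STDC_VERSION__ < 199901L
#define restrict
#endif

/* Copies n elements from x to y; the operands must not overlap. */
static void ex_copyv(int64_t n, const double *restrict x, double *restrict y)
{
    int64_t i;

    assert(n >= 0);

    for (i = 0; i < n; ++i)
        y[i] = x[i];
}
```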
|
||
commit 4b7e7970f1af4a1ab121e07657e2b78b9fcd7671
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 8 15:20:34 2013 -0500
|
||
|
||
Migrated integer usage to stdint.h types.
|
||
|
||
Details:
|
||
- Changed the way bli_type_defs.h defines integer types so that dim_t,
|
||
inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
|
||
integer) or guint_t (general unsigned integer).
|
||
- Renamed Fortran types fchar and fint to f77_char and f77_int.
|
||
- Define f77_int as int64_t if a new configuration variable,
|
||
BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
|
||
These types are defined in stdint.h, which is now included in blis.h.
|
||
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
|
||
in terms of scomplex.
|
||
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
|
||
of char.
|
||
- Updated bla_amax() wrappers so that the return type is defined directly
|
||
as f77_int, rather than letting the prototype-generating macro decide
|
||
the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
|
||
so I removed them. Also, changed the body of the wrapper so that a
|
||
gint_t is passed into abmaxv, which is THEN typecast to an f77_int
|
||
before returning the value.
|
||
- Updated f2c code that accessed .r and .i fields of complex and
|
||
doublecomplex types so that they use .real and .imag instead (now that
|
||
we are using scomplex and dcomplex).
|
||
|
||
commit 372501398564fdba3d5a3db86c30bc1039b185ff
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jul 8 11:24:18 2013 -0500
|
||
|
||
Added experimental bli_gemm_ker_var5().
|
||
|
||
Details:
|
||
- Added support for an experimental gemm macro-kernel that incrementally
|
||
packs one micro-panel of B at a time. This is useful for certain
|
||
special cases of gemm where m is small.
|
||
- Minor changes to default values of clarksville configuration.
|
||
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
|
||
do not yet have any use (or implementation support) for block storage.
|
||
- Comment update to bli_packm_init.c.
|
||
|
||
commit 9915d667a79f23e3a2a2516247c560e9063a1646
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sun Jul 7 13:28:39 2013 -0500
|
||
|
||
Defined "total" blocksize query functions.
|
||
|
||
Details:
|
||
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
|
||
the default blocksize plus blocksize extension (using the type or the type
|
||
of an object).
|
||
- Comment update in bli_packm_cxk.c.
|
||
|
||
commit 46d3d09d49aded1d9f1b468c83fce75e07d631dc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jun 27 13:19:56 2013 -0500
|
||
|
||
Consolidated lower/upper her[2]k blocked variants.
|
||
|
||
Details:
|
||
- Consolidated lower and upper blocked variants for herk and her2k, and
|
||
renamed the resulting variants, according to the same changes recently
|
||
made to trmm and trsm.
|
||
- Implemented support for four new subpartition types:
|
||
BLIS_SUBPART1T
|
||
BLIS_SUBPART1B
|
||
BLIS_SUBPART1L
|
||
BLIS_SUBPART1R
|
||
which correspond to "merged" partitions that include the middle "1"
|
||
partition as well as either the neighboring "0" or "2" partition. This is
|
||
used to clean up code in herk/her2k var2 that attempts to partition away
|
||
the strictly zero region above or below the diagonal of a matrix operand
|
||
that is being marched through diagonally.
|
||
- Added safeguards to herk macro-kernels that skip any leading or trailing
|
||
zero region in the panel of C that is passed in. This is now needed given
|
||
that herk/her2k var1 no longer partitions off this zero region before
|
||
calling the macro-kernel (via bli_her[2]k_int()).
|
||
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
|
||
|
||
commit 02002ef6f3d2746665982793db36714bd69bccc9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jun 24 17:08:14 2013 -0500
|
||
|
||
Added row-storage optimizations for trmm, trsm.
|
||
|
||
Details:
|
||
- Implemented algorithmic optimizations for trmm and trsm whereby the right
|
||
side case is now handled explicitly, rather than induced indirectly by
|
||
transposing and swapping strides on operands. This allows us to walk through
|
||
the output matrix with favorable access patterns no matter how it is stored,
|
||
for all parameter combinations.
|
||
- Renamed trmm and trsm blocked variants so that there is no longer a
|
||
lower/upper distinction. Instead, we simply label the variants by which
|
||
dimension is partitioned and whether the variant marches forwards or
|
||
backwards through the corresponding partitioned operands.
|
||
- Added support for row-stored packing of lower and upper triangular matrices
|
||
(as provided by bli_packm_blk_var3.c).
|
||
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
|
||
blocksize extensions (if non-zero) were not being used to appropriately size
|
||
the first iteration (ie: the bottom/right edge case).
|
||
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
|
||
whole multiples of MR AND NR. This is needed for the case of trsm_r where,
|
||
in order to reuse existing left-side gemmtrsm fused micro-kernels, the
|
||
packing of A (left-hand operand) and B (right-hand operand) is done with
|
||
NR and MR, respectively (instead of MR and NR).
|
||
|
||
commit d1e81ddc848ee47bc188735883d14582bdd0cabc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jun 13 11:14:21 2013 -0500
|
||
|
||
Minor generalizing tweaks to trmm blk var1, var2.
|
||
|
||
commit 0efb7974f104206ba3985276f2180a9b14fe9f9b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jun 12 16:40:04 2013 -0500
|
||
|
||
CHANGELOG update.
|
||
|
||
commit 5b641c3bab31eac6a1795b9f6e3f86c59651ca50 (tag: 0.0.8)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed Jun 12 16:02:12 2013 -0500
|
||
|
||
Use separate CFLAGS for "kernels" directories.
|
||
|
||
Details:
|
||
- Added a new "special" directory type: any source code within directories
|
||
named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
|
||
compiler flags. This allows the developer to specify a separate set of
|
||
flags (e.g. optimization flags) for compiling kernels while maintaining a
|
||
standard set for regular framework code.
|
||
- Fixed a bug in the top-level Makefile that was causing "noopt" code
|
||
to be compiled with the standard set of compilation flags.
|
||
- Updated make_defs.mk in reference, flame, and clarksville configurations
|
||
according to above changes.
|
||
|
||
commit 08475e7c7653ba598665071a617d10f0d8f763c2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jun 11 12:18:39 2013 -0500
|
||
|
||
Various level-3 optimizations for row storage.
|
||
|
||
Details:
|
||
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
|
||
packing from a lower or upper-stored symmetric/Hermitian matrix to column
|
||
panels (which are row-stored). Previously one could only pack to row panels
|
||
(which are column-stored).
|
||
- Implemented various optimizations in the level-3 front-ends that allow more
|
||
favorable access through row-stored matrices for gemm, hemm, herk, her2k,
|
||
symm, syrk, and syr2k.
|
||
- Cleaned up code in level-3 front-ends that has to do with setting target and
|
||
execution datatypes.
|
||
|
||
commit 05a657a6b92e8d34efa5c57ae6a18a4f35ec0841
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri Jun 7 11:04:10 2013 -0500
|
||
|
||
Added beta == 0 optimization to x86_64 ukernel.
|
||
|
||
Details:
|
||
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
|
||
from memory (nor scaled by beta).
|
||
- Fixed minor bug in test suite driver when "Test all combinations of storage
|
||
schemes?" switch is disabled, which would result in redundant tests being
|
||
executed for matrix-only (e.g. level-1m, level-3) operations if multiple
|
||
vector storage schemes were specified.
|
||
- Restored debug flags as default in clarksville configuration.
|
||
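The beta == 0 optimization can be illustrated with the update step of a micro-kernel written in plain C. This is a sketch, not the x86_64 kernel itself: ex_update_c() and its flat array arguments are hypothetical, with ab holding the already-accumulated product of the A and B micro-panels.

```c
#include <assert.h>

/* Computes c := beta * c + alpha * ab over len elements. When beta is zero,
   c is overwritten without being read or scaled. */
static void ex_update_c(double *c, const double *ab, long len,
                        double alpha, double beta)
{
    long i;

    assert(len >= 0);

    if (beta == 0.0) {
        for (i = 0; i < len; ++i)
            c[i] = alpha * ab[i];                /* c is never loaded */
    } else {
        for (i = 0; i < len; ++i)
            c[i] = beta * c[i] + alpha * ab[i];
    }
}
```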
|
||
commit f1aa6b81cc421516dd77dd0f18f7c432724e6ef2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Jun 6 13:36:06 2013 -0500
|
||
|
||
Whitespace changes to old test drivers.
|
||
|
||
Details:
|
||
- Replaced tabs with four spaces in places where indention was already
|
||
in place.
|
||
|
||
commit 9feb4c23d2e36f3d8b5417a3802c69f94b29f749
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Jun 4 14:57:46 2013 -0500
|
||
|
||
Fixed unaligned handling in axpyf, dotxaxpyf.
|
||
|
||
Details:
|
||
- Fixed over-cautious handling of unaligned operands in vector intrinsic
|
||
implementation of axpyf kernel.
|
||
- Fixed over- and under-cautious handling of unaligned operands in vector
|
||
intrinsic implementation of dotxaxpyf kernel.
|
||
|
||
commit 22b06cfcd2e3205c8325a246c2279e4b1047c066
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Jun 3 16:54:52 2013 -0500
|
||
|
||
Updated level-1/-1f [vector intrinsic] kernels.
|
||
|
||
Details:
|
||
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
|
||
handled by reference implementation (rather than aborted).
|
||
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
|
||
configuration.
|
||
- Defined bli_offset_from_alignment() macro.
|
||
- Minor edits to old test drivers.
|
||
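A hedged sketch of how an alignment-offset helper can drive the choice between a vector-intrinsic path and the reference implementation. The function names, the 16-byte alignment requirement, and the dispatch rule are assumptions for illustration; the actual bli_offset_from_alignment() macro and kernel logic may differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Returns how many bytes p lies past the previous multiple of align. */
static size_t ex_offset_from_alignment(const void *p, size_t align)
{
    assert(align > 0);
    return (size_t)((uintptr_t)p % align);
}

/* Returns 1 if the vector path may be used, or 0 if the kernel should fall
   back to the reference implementation (non-unit stride or unaligned data). */
static int ex_use_vector_path(const double *x, long incx,
                              const double *y, long incy)
{
    return incx == 1 && incy == 1 &&
           ex_offset_from_alignment(x, 16) == 0 &&
           ex_offset_from_alignment(y, 16) == 0;
}
```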
|
||
commit 0288c827d3659bb225ac9c10f168b623ed0106a2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Sat Jun 1 08:02:23 2013 -0500
|
||
|
||
Updated ukernels for x86_64.
|
||
|
||
Details:
|
||
- Tweaked micro-kernels and configuration for clarksville.
|
||
- Updated/cleaned up old test drivers in test directory.
|
||
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
|
||
recently).
|
||
|
||
commit 85a6d1c9a52c2b27c71a3a3e341c51d7ba263749
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon May 6 11:05:08 2013 -0500
|
||
|
||
Replaced axpys usage with subs in trsv.
|
||
|
||
Details:
|
||
- Replaced instances of axpys with alpha equal to -1 with subs.
|
||
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
|
||
sizeof(dcomplex).
|
||
|
||
commit 2d9c667f3c48a12cab64e5ad09d5fcb9f4c19d78
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri May 24 16:28:10 2013 -0500
|
||
|
||
Fixed x86_64 kernel bugs and other minor issues.
|
||
|
||
Details:
|
||
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
|
||
unaligned subpartitions. We were already going out of our way a bit to
|
||
handle edge cases in the first iteration for blocked variants, and this
|
||
was simply the unblocked-fused extension of that idea.
|
||
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
|
||
into account how the choice of variant needed to be altered for
|
||
upper-stored matrices (given that only lower-stored algorithms are
|
||
explicitly implemented).
|
||
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
|
||
macros to provide inlined versions of bli_determine_blocksize_[fb]() for
|
||
use by unblocked-fused variants.
|
||
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
|
||
consistency with that of the bugfix for trmv/trsv (both of which now
|
||
use the same macros).
|
||
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
|
||
vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
|
||
conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
|
||
was invalid only because the code was expecting 1 (for purposes of
|
||
performing contiguous vector loads) but got a value greater than 1 because
|
||
the column stride of the object (e.g. rho) was inflated for alignment
|
||
purposes (albeit unnecessarily since there is only one element in the
|
||
object).
|
||
- Replaced some old invocations of set0 with set0s.
|
||
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
|
||
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
|
||
- Added safeguard to test modules so that testing a problem with a zero
|
||
dimension does not result in a failure.
|
||
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
|
||
internal back-ends to correctly handle cases where output operand still
|
||
needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
|
||
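The zero-dimension handling in the last item above can be illustrated with a small reference gemm. This is a sketch assuming column-major storage with tight leading dimensions; ex_gemm() is hypothetical and is not the BLIS back-end. The point is that k = 0 must still scale C by beta rather than return early.

```c
#include <assert.h>

/* Column-major reference gemm: C := beta * C + alpha * A * B, where A is
   m x k, B is k x n, and C is m x n, all stored without padding. */
static void ex_gemm(long m, long n, long k, double alpha,
                    const double *a, const double *b,
                    double beta, double *c)
{
    long i, j, p;

    assert(m >= 0 && n >= 0 && k >= 0);

    if (m == 0 || n == 0) return;   /* C has no elements: nothing to do */

    if (k == 0) {                   /* A * B is empty, but C := beta * C */
        for (j = 0; j < n; ++j)
            for (i = 0; i < m; ++i)
                c[i + j * m] *= beta;
        return;
    }

    for (j = 0; j < n; ++j) {
        for (i = 0; i < m; ++i) {
            double sum = 0.0;
            for (p = 0; p < k; ++p)
                sum += a[i + p * m] * b[p + j * k];
            c[i + j * m] = beta * c[i + j * m] + alpha * sum;
        }
    }
}
```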
|
||
commit d57ec42b34f8447c88adeffa95cf22f8c115ad51
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri May 3 17:35:32 2013 -0500
|
||
|
||
Renamed _trans_status() macro.
|
||
|
||
Details:
|
||
- Mistakenly forgot to rename the _trans_status() macro and instances in
|
||
previous commit.
|
||
|
||
commit 9e2b227866af429a4a6fb7dbb8c457bbdda2f136
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Fri May 3 17:24:58 2013 -0500
|
||
|
||
Renamed _set_trans(), _trans_status() macros.
|
||
|
||
Details:
|
||
- Renamed the following macros:
|
||
bli_obj_set_trans() -> bli_obj_set_onlytrans()
|
||
bli_obj_trans_status() -> bli_obj_onlytrans_status()
|
||
to remove ambiguity as to which bits are read/updated.
|
||
|
||
commit 2f8174509ea9f844db11ebd9389de5168e85b132
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed May 1 15:06:30 2013 -0500
|
||
|
||
Unconditionally check memory pool(s) for errors.
|
||
|
||
Details:
|
||
- Changed bli_mem_acquire_m() in bli_mem.c so that we still check if the
|
||
memory pool is exhausted before checking out and returning a block, even
|
||
if BLIS error checking has been disabled. These errors are useful because
|
||
they likely indicate that BLIS was improperly configured for the code
|
||
being run.
|
||
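A minimal sketch of a fixed-size pool whose exhaustion check does not depend on the error-checking setting. Everything here is hypothetical (the pool layout, the flag, and returning NULL on exhaustion); it is meant only to show the shape of the change, not the contents of bli_mem.c.

```c
#include <assert.h>
#include <stddef.h>

#define EX_POOL_BLOCKS 2
#define EX_BLOCK_SIZE  64

typedef struct {
    unsigned char storage[EX_POOL_BLOCKS][EX_BLOCK_SIZE];
    int           n_free;
} ex_pool_t;

/* Stand-in for a library-wide error-checking switch (0 = disabled). */
static int ex_error_checking_enabled = 0;

static void ex_pool_init(ex_pool_t *pool)
{
    pool->n_free = EX_POOL_BLOCKS;
}

/* Checks out one block, or returns NULL if the pool is exhausted. */
static void *ex_pool_acquire(ex_pool_t *pool)
{
    /* Optional parameter checks, skipped when error checking is disabled. */
    if (ex_error_checking_enabled)
        assert(pool != NULL);

    /* The exhaustion check always runs, regardless of the switch above. */
    if (pool->n_free == 0)
        return NULL;

    pool->n_free -= 1;
    return pool->storage[pool->n_free];
}
```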
|
||
commit 75405a2b83679b6aff38d7e7425199d623a7b0a9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Wed May 1 15:00:30 2013 -0500
|
||
|
||
CHANGELOG update.
|
||
|
||
commit 6bfa96f84887dec0b4cf8be5d38dd634c2f8951d (tag: 0.0.7)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Apr 30 19:35:54 2013 -0500
|
||
|
||
Absorbed blocksize extensions into main objects.
|
||
|
||
Details:
|
||
- Revamped some parts of commit b6ef84fad1c9 by adding blocksize extension
|
||
fields to the blksz_t object rather than have them as separate structs.
|
||
- Updated all packm interfaces/invocations according to above change.
|
||
- Generalized bli_determine_blocksize_?() so that edge case optimization
|
||
happens if and only if cache blocksizes are created with non-zero
|
||
extensions.
|
||
- Updated comments in bli_kernel.h files to indicate that the edge case
|
||
blocksize extension mechanism is now available for use.
|
||
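One plausible reading of the edge-case optimization is sketched below for the forward direction: if what remains fits within the blocksize plus its extension, the final iteration absorbs all of it. The exact rule is an assumption, and ex_determine_blocksize_f() is a hypothetical stand-in for bli_determine_blocksize_f() with a simplified signature. With a zero extension the function reduces to ordinary blocking.

```c
#include <assert.h>

/* Returns the blocksize to use at offset i of a dimension of length dim,
   given default blocksize b and extension b_ext. */
static long ex_determine_blocksize_f(long i, long dim, long b, long b_ext)
{
    long remaining;

    assert(0 <= i && i < dim && b > 0 && b_ext >= 0);

    remaining = dim - i;

    /* Absorb a small trailing edge case into this iteration. */
    if (remaining <= b + b_ext)
        return remaining;

    return b;
}
```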
|
||
commit bc7c8005cedbe50961ac2a99aeeabf4e9f9a8e9e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Thu Apr 25 17:16:59 2013 -0500
|
||
|
||
Added option to disable err checking in testsuite.
|
||
|
||
Details:
|
||
- Added a new line to input.general that allows one to specify the error-
|
||
checking level to use for each BLIS experiment. The only two levels
|
||
supported for now are "no error checking" and "full error checking".
|
||
|
||
commit 096b366ddcfe386f44419ef84d8df8be13825f86
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 16:43:43 2013 -0500

Use cntl trees that block in n dimension.

Details:
- Updated _cntl.c files for each level-3 operation to induce blocked
  algorithms that first partition in the n dimension with a blocksize
  of NC. Typically this is not an issue since only very large problems
  exceed that of NC. But developers often run very large problems, and
  so this extra blocking should be the default.
- Removed some recently introduced but now unused macros from
  bli_param_macro_defs.h.

commit b6e24b23cb4dfc488c1c9c70d596539c2287f72e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 25 12:06:12 2013 -0500

Use PASTEMAC in macro-kernels (over MAC2 or MAC3).

Details:
- Replaced multi-type invocations of copys_mxn, xpbys_mxn, etc. (PASTEMAC2
  and PASTEMAC3) with those that only use a single type (PASTEMAC).
- Added extra macros to bli_adds_mxn_uplo.h and bli_xpbys_mxn_uplo.h to
  accommodate above change.
- Fixed comment typo in bli_config.h files.
- Added .nfs* pattern to .gitignore.

commit df80acf517dde180ddcc5835c6136b2fa7556d4b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:43:23 2013 -0500

Fixed computation of b_next in L3 macro-kernels.

Details:
- Restructured herk_l and herk_u macro-kernels in the image of trmm
  and trsm, in that the edge cases are captured by the main loop, rather
  than trying to have "cleanup" sections that result in four distinct
  parts (interior, bottom edge, right edge, bottom-right edge) of the
  code.
- Fixed the way b_next was being computed in the non-gemm level-3
  macro-kernels (herk, trmm, trsm). The way they are computed now matches
  that of gemm.

commit 3671528cf8efe4b445d196665143a5c50c2c6048
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 19:12:14 2013 -0500

Fixed minor bug in computing b_next in gemm.

commit db072a5b4a039a9a668ef951333ecfb5bd3a74b9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:49:10 2013 -0500

Fixed rare edge case bug in herk_l macro-kernel.

Details:
- Fixed a potential bug in herk_l at the m_left edge case. If MR was
  chosen to be much larger than NR, then one could encounter edge cases
  in the MC dimension that fall entirely below the diagonal, which
  the previous implementation of the herk_l macro-kernel was not allowing
  for.

commit 1dab11e37d1cb403cbe75b73a644c00de534f104
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 17:17:11 2013 -0500

Updated x86 gemmtrsm ukernels to use alpha.

commit 9d10d7dd9bc92a993fea7162bfa5983f75506f49
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 16:00:18 2013 -0500

Added a_next, b_next arguments to micro-kernels.

Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
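The a_next/b_next idea above can be illustrated with a minimal sketch. The function name, the SK_MR/SK_NR constants, the argument order, and the panel storage layout are all assumptions made for this illustration and are not taken from the BLIS sources; the prefetch uses the GCC/Clang __builtin_prefetch() hint and is skipped on other compilers.

```c
#include <assert.h>
#include <stddef.h>

enum { SK_MR = 4, SK_NR = 4 };  /* hypothetical register blocksizes */

/* Hypothetical micro-kernel: C := beta*C + alpha*A*B on an SK_MR x SK_NR tile.
   a      : SK_MR x k micro-panel, stored column by column (column stride SK_MR)
   b      : k x SK_NR micro-panel, stored row by row (row stride SK_NR)
   a_next : address of the next micro-panel of A (used only as a prefetch hint)
   b_next : address of the next micro-panel of B (used only as a prefetch hint) */
void dgemm_ukr_sketch(size_t k,
                      const double *restrict alpha,
                      const double *restrict a,
                      const double *restrict b,
                      const double *restrict beta,
                      double *restrict c, size_t rs_c, size_t cs_c,
                      const double *a_next,
                      const double *b_next)
{
    double ab[SK_MR * SK_NR] = { 0.0 };

    assert(rs_c > 0 && cs_c > 0);

#if defined(__GNUC__)
    /* Optional: hint that the next micro-panels will be needed soon. */
    __builtin_prefetch(a_next);
    __builtin_prefetch(b_next);
#else
    /* The addresses may simply be ignored. */
    (void)a_next;
    (void)b_next;
#endif

    /* Accumulate the rank-k product A*B into a local tile. */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < SK_NR; ++j)
            for (size_t i = 0; i < SK_MR; ++i)
                ab[i + j * SK_MR] += a[i + p * SK_MR] * b[j + p * SK_NR];

    /* Scale and write back to C using its row and column strides. */
    for (size_t j = 0; j < SK_NR; ++j)
        for (size_t i = 0; i < SK_MR; ++i)
            c[i * rs_c + j * cs_c] = *beta * c[i * rs_c + j * cs_c]
                                   + *alpha * ab[i + j * SK_MR];
}
```

A macro-kernel in this sketch would pass the addresses of the micro-panels it intends to use on its next iteration; a kernel that does not prefetch can leave both arguments unused.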

commit f3815dc84d385c514a5acaf1e925424a57be2f51
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 23 11:12:33 2013 -0500

Added code for backward edge-case blocking.

Details:
- Edited bli_determine_blocksize_b() to include experimental (and
  currently disabled) code that computes extended blocks.
- Updated comments related to above changes.
- Enabled use of x86 gemmtrsm ukernel in config/flame/bli_kernel.h.

commit 4fe1435f20e8fc7dd72f795ac58c8e236e6c631b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 22 19:00:43 2013 -0500

Updated dupl implementation to use PACKNR and NR.

Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
  explicitly to navigate b1 so that situations where PACKNR > NR are
  supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
  frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.

commit 2d6f9e83799a46d52d7901e275f8fd67f0a0edc6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:10:34 2013 -0500

Disabled blocksize checks for memory pools.

Details:
- Temporarily disabled checks that ensure that enough memory will be allocated
  by the contiguous memory allocator for all types, given that the values for
  double precision real are the ones used to allocate the space. These checks
  can easily go awry in certain situations, especially if you are developing for
  only one datatype. So for now, they are probably more trouble than they are
  worth.

commit b6ef84fad1c9884c84b7f1350a0bcdfe1737e8f2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 21 15:00:24 2013 -0500

Allow ldim of packed micro-panels != MR, NR.

Details:
- Made substantial changes throughout the framework to decouple the leading
  dimension (row or column stride) used within each packed micro-panel from
  the corresponding register blocksize. It appears advantageous on some
  systems to use, for example, packed micro-panels of A where the column
  stride is greater than MR (whereas previously it was always equal to MR).
- Changes include:
  - Added BLIS_EXTEND_[MNK]R_? macros, which specify how much extra padding
    to use when packing micro-panels of A and B.
  - Adjusted all packing routines and macro-kernels to use PACKMR and PACKNR
    where appropriate, instead of MR and NR.
  - Added pd field (panel dimension) to obj_t.
  - New interface to bli_packm_cntl_obj_create().
  - Renamed bli_obj_packed_length()/_width() macros to
    bli_obj_padded_length()/_width().
  - Removed local #defines for cache/register blocksizes in level-3 *_cntl.c.
  - Print out new cache and register blocksize extensions in test suite.
- Also added new BLIS_EXTEND_[MNK]C_? macros for future use in using a larger
  blocksize for edge cases, which can improve performance at the margins.
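As a rough illustration of a packed micro-panel whose column stride exceeds the number of rows it holds, the sketch below packs a column-major block of A with stride packmr. The function name and signature are hypothetical, and zero-filling the padding rows is an assumption of this sketch rather than something the commit message states.

```c
#include <assert.h>
#include <stddef.h>

/* Pack an m x k column-major block of A (leading dimension lda) into a
   micro-panel p whose column stride is packmr >= m. Rows m..packmr-1 of
   each packed column are padding, zero-filled here (an assumption).
   Returns the number of elements written to p. */
size_t pack_a_micropanel_sketch(size_t m, size_t k,
                                const double *a, size_t lda,
                                double *p, size_t packmr)
{
    assert(packmr >= m && lda >= m);

    for (size_t j = 0; j < k; ++j) {
        for (size_t i = 0; i < m; ++i)
            p[i + j * packmr] = a[i + j * lda];   /* copy the stored rows */
        for (size_t i = m; i < packmr; ++i)
            p[i + j * packmr] = 0.0;              /* fill the padding rows */
    }
    return packmr * k;
}
```

A micro-kernel reading such a panel would step from one column to the next by packmr elements instead of by the register blocksize, which is the decoupling the entry describes.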

commit 59fca58dbe678d79c1df0916b022afbeac7c48fa
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 19 15:26:29 2013 -0500

Fixed bug in compatibility layer (her2k/syr2k).

Details:
- Fixed a bug in the BLAS compatibility layer, specifically in bla_her2k.c
  and bla_syr2k.c, that caused incorrect computation to occur when the BLAS
  interface caller requests the [conjugate-]transpose case. Thanks to Bryan
  Marker for reporting the behavior that led to this bug.

commit 09eacbd1ab1380a95a0e9625726b45e43ed102d6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 19:39:13 2013 -0500

Changed old level3 test drivers to call front-ends.

Details:
- Changed old level-3 test drivers, in 'test' directory, to always call the
  front-end object API instead of the internal back-end with the locally
  defined control tree.

commit 83e45de23e565138b8fde06fb11cfedc973b7246
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 18:33:03 2013 -0500

Allow packm_init() to reacquire a too-small mem_t.

Details:
- Changed bli_packm_init() to react differently to a situation where a pack
  obj_t has an already-allocated mem_t entry that has a buffer that is smaller
  than what will be needed to hold the block/panel that now needs to be
  packed. Previously, this situation was treated with an abort() since I
  assumed something was horribly wrong. I have changed the code so that it now
  reacts by releasing the previous mem_t and re-acquiring a new mem_t with the
  new information. (This change was done at the request of Bryan Marker to
  facilitate code generation via DxT.)

commit a6990434173b0cf651f8521194f3aef738deb7d2
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 18 13:52:47 2013 -0500

Fixed bug in packing block of A for hemm/symm.

Details:
- Fixed a bug in bli_packm_blk_var2() that affected the packing functionality
  of hemm and symm. The bug occurs whenever attempting to pack a Hermitian or
  symmetric matrix where the block of A being packed intersects the diagonal,
  but some of its micro-panels do not intersect the diagonal and lie completely
  in the unstored region. Thanks to Francisco Igual for reporting this bug.
- Comment updates to both _blk_var2.c and _blk_var3.c.

commit c92e7590e1934f830814ab614c794215ebe0c415
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Apr 17 20:53:29 2013 -0500

Activated bli_packm_acquire_mpart_t2b().

Details:
- Removed the overly-paranoid bli_abort() from the end of
  bli_packm_acquire_mpart_t2b(), to allow others to experiment with
  partitioning through packed blocks of A. Also, and more importantly,
  changed an earlier check that was causing an erroneous (but
  coincidentally redundant) abort(). Also, updated some of the comments
  in bli_packm_part.c.

commit bea579e9f009a44e08008eb14d09f38748ab2b53
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 19:43:14 2013 -0500

Allow creation of "empty" objects.

Details:
- Modified bli_obj_alloc_buffer() to allow allocating an empty buffer, and
  modified bli_adjust_strides() to explicitly handle m = n = 0.
- Updated bli_check_matrix_strides() to allow cases where m = n = 0.

commit 7904e20f2e6908571ee5008da2a08084198eefae
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 17:37:16 2013 -0500

Fixed "root" object bug in bli_her[2]k/syr[2]k.

Details:
- Fixed an obscure bug in the front-ends for herk, her2k, syrk, and syr2k,
  that manifested as the incorrect triangle being updated. It occurred when
  the user would pass in a matrix object that was correctly marked as
  symmetric/Hermitian and lower-stored, but whose root object was never marked
  as lower (or upper). We now alias and re-assign root status for matrix C
  within the front-ends. Note that trmm and trsm were already doing this,
  albeit for a slightly different reason (to allow the internal back-end to
  choose which algorithm to run--lower or upper--based on the uplo of the root
  object for both left and right side cases). Thanks to Bryan Marker for
  leading me to this bug.

commit 19155a768dd97b57cfb59c32fa8e54a344ec66e1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 16 11:24:03 2013 -0500

Fixed overzealous type-checking in bli_getsc().

Details:
- Relaxed type checking in getsc so that the input object could be a constant
  and not just a proper floating-point type. (If it is a constant, default to
  extracting the dcomplex values.) Thanks to Bryan Marker for reporting this
  bug.
- Added definition for bli_is_constant() in bli_param_macro_defs.h.
- Comment updates to various level-0 scalar routines.

commit 2ee6bbca2953d04c967685da9735b3eaf8a4b813
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 19:27:57 2013 -0500

Fixed bug in bli_obj_is_packed() and renamed.

Details:
- This macro is used to determine whether the partitioning routines should
  call a corresponding packm_part routine instead. However, it was
  unintentionally catching matrices that were marked as "packed" by virtue
  of them simply being marked as BLIS_PACKED_UNSPEC in, say, bli_gemv().
  The macro has now been renamed to bli_obj_is_panel_packed(), and now only
  checks for row or column panel packing. (Note that I first attempted to
  fix this bug in a571af816d72.) Thanks to Bryan Marker for reporting the
  erroneous behavior that led me to this bug.

commit 99b99eebe70336b5f28039a4a084aa7f5fa7059d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 17:54:43 2013 -0500

Removed local reference ukernel blocksize macros.

Details:
- Removed locally defined gemm microkernel blocksize macros from _mxn
  reference microkernel definition and header. Meant to include this in
  a recent/previous commit (0020ef7c8271).

commit 6a538fa7b164655f41cea5b9c8d3902438bda66b
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:40:31 2013 -0500

Formatting change to mods in previous commit.

commit ea079d35591e808971d2d98a1a7d9f89bc1f7c2f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 14:31:40 2013 -0500

Set structure of objects in level-2 BLIS APIs.

Details:
- Added missing statement to set structure field of local objects in
  top-level BLIS (BLAS-like) API wrappers. Thanks to Bryan Marker for
  reporting this bug.

commit d9948c541c0446e20e249a1ccc83709ce51b7aa8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 10:21:26 2013 -0500

Tweak to test suite function string construction.

Details:
- Fixed a minor bug in the way that the test suite would construct function
  name strings when the user anchored all parameters in input.operations.
  In this case, the test driver would mistake this situation for one where
  the operation simply had no parameters to begin with, and thus would not
  include the parameter string in the function string that is output for
  every result.

commit ca9e435c57c5c7a000d2a32681dd8070ba850abd
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 15 09:59:46 2013 -0500

Fixed a bug in reference implementation of dupl.

Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.

commit 26cbd52e364bbe439e3744101cd5a6cbcb82dffd
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Apr 14 19:05:33 2013 -0500

Modified bli_kernel.h include order in blis.h.

Details:
- Delayed #include of bli_kernel.h in blis.h to prevent a situation where
  _kernel.h includes an optimized microkernel header, which uses BLIS types
  such as dim_t and inc_t, which would precede the definition of those types
  in bli_type_defs.h.
- Moved the #include of bli_kernel_macro_defs.h in bli_macro_defs.h to blis.h
  (immediately after that of bli_kernel.h).

commit 3414a23c38b0de45a8034b3dda2fc4b5a755e4e1
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:53:16 2013 -0500

CHANGELOG update.

commit ec16c52f2ecf419c749175ce0a297441c10f1c68 (tag: 0.0.6)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 16:41:16 2013 -0500

Updated INSTALL file (now redirects to website).

commit 0020ef7c82711a7ebf08e5174f939bee2563184c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 13 15:26:35 2013 -0500

Removed gemmtrsm-, trsm-specific blocksize macros.

Details:
- Modified gemmtrsm micro-kernel wrappers to use new aliased blocksize macros
  instead of operation-specific ones.
- Removed local, gemmtrsm-specific blocksize macro definitions found in
  micro-kernel header files.
  (Meant to include above changes in 31b100e7bf4a.)
- Added comments to reference gemmtrsm micro-kernel wrapper implementation.

commit 1a9f427b85bb95aaa9e54c8ff8ecad8734b361ee
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 15:25:54 2013 -0500

Added/renamed alignment constants to _config.h.

Details:
- Added new memory alignment constants:
    BLIS_HEAP_STRIDE_ALIGN_SIZE (previously assumed to be same as SYSTEM_MEM)
    BLIS_CONTIG_ADDR_ALIGN_SIZE (previously assumed to be same as PAGE_SIZE)
    BLIS_STACK_BUF_ALIGN_SIZE (previously not enforced)
  and renamed existing ones
    BLIS_SYSTEM_MEM_ALIGN_SIZE -> BLIS_HEAP_ADDR_ALIGN_SIZE
    BLIS_CONTIG_MEM_ALIGN_SIZE -> BLIS_CONTIG_STRIDE_ALIGN_SIZE
  to better convey what the alignment factor is used for (and what it is
  not used for).
- Removed BLIS_ENABLE_SYSTEM_MEM_ALIGN. Dynamic memory alignment is now
  disabled by setting BLIS_HEAP_STRIDE_ALIGN_SIZE to 1.
- Inserted instances of __attribute__((aligned(BLIS_STACK_BUF_ALIGN_SIZE)))
  into macro-kernels to specify stack alignment of temporary buffers.
- Modified test suite driver to output new constants.
- Removed bli_align_dim_to_sys() and bli_align_dim_to_cmem(). Instead, we now
  use bli_align_dim_to_size(), which takes a third argument (the desired
  alignment).
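A three-argument dimension-alignment helper of the kind described above might look like the following. This is only a sketch: the function name is hypothetical, plain size_t stands in for the BLIS integer types, and the rounding formula is an assumption about what such a helper would compute, not a copy of bli_align_dim_to_size().

```c
#include <assert.h>
#include <stddef.h>

/* Round dim up so that dim * elem_size bytes is a whole number of
   align_size-byte units, then convert back to an element count.
   With align_size == 1 the dimension is returned unchanged, which matches
   the idea of disabling alignment by setting the constant to 1. */
size_t align_dim_to_size_sketch(size_t dim, size_t elem_size, size_t align_size)
{
    assert(elem_size > 0 && align_size > 0);

    size_t bytes   = dim * elem_size;
    size_t aligned = ((bytes + align_size - 1) / align_size) * align_size;
    return aligned / elem_size;
}
```

For example, with 8-byte elements and a 64-byte alignment, a stride of 5 elements would be padded to 8 elements so that every column begins on a 64-byte boundary whenever the first one does.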

commit a77d10e87e3c0ab55ec14d74c285bc95c06285c3
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 12 11:40:55 2013 -0500

Fixed a bug in axpyv/axpym when alpha is unit.

Details:
- Fixed bug whereby axpyv and axpym were incorrectly simplifying to a copy,
  rather than an add, when alpha = 1. Thanks to Bryan Marker for identifying
  this bug.
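The distinction matters because axpy computes y := y + alpha * x, so with alpha = 1 the correct shortcut is y := y + x, not y := x. A minimal real-valued sketch of the intended special-casing follows; the function name and signature are hypothetical and do not reflect the BLIS implementation.

```c
#include <assert.h>
#include <stddef.h>

/* y := y + alpha * x for unit-stride vectors of length n. */
void daxpyv_sketch(size_t n, double alpha, const double *x, double *y)
{
    assert(x != NULL && y != NULL);

    if (alpha == 0.0)
        return;                        /* y is left unchanged */

    if (alpha == 1.0) {
        for (size_t i = 0; i < n; ++i)
            y[i] += x[i];              /* an add, not a copy */
        return;
    }

    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];          /* general case */
}
```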

commit 0495bd1d6de5995fe2fb79b321eec79e961eb7a5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:39:25 2013 -0500

Moved _POSIX_C_SOURCE def to compiler cmd line.

Details:
- Removed the #define of _POSIX_C_SOURCE in bli_config.h (for both reference
  and clarksville configurations) and added "-D_POSIX_C_SOURCE=200112L" to
  the compiler command line arguments in make_defs.mk (for both configs).
  Thanks to Devin Matthews for suggesting this change.

commit d43d1a0a2ef6de4bc57627566aef8e3fdb458b8c
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 16:28:17 2013 -0500

Appended 'f2c_' to abs, min, max macros in f2c.h.

Details:
- Renamed abs, min, max, dmin, and dmax macros in bli_f2c.h so that they
  would not conflict with anything defined by the user (or the language).
  Thanks to Devin Matthews for suggesting this fix.
- Updated all instances of the above macros accordingly.
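The kind of conflict this rename avoids can be sketched as follows. The macro bodies below are illustrative assumptions (the exact definitions in bli_f2c.h may differ), and max() and largest_abs_sketch() are hypothetical user-level functions.

```c
#include <assert.h>

/* Prefixed helper macros: they no longer claim the generic names. */
#define f2c_abs(x)    ((x) >= 0 ? (x) : -(x))
#define f2c_min(a, b) ((a) <= (b) ? (a) : (b))
#define f2c_max(a, b) ((a) >= (b) ? (a) : (b))

/* A user function named max. If a function-like macro called max were
   visible here, this definition would be mangled by the preprocessor. */
static int max(int a, int b)
{
    return a > b ? a : b;
}

/* Uses both the user's max() and the prefixed macro without any clash. */
int largest_abs_sketch(int a, int b)
{
    return max(f2c_abs(a), f2c_abs(b));
}
```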

commit 31b100e7bf4aeaa4ceafefd2b6c3102d5fbc4cbb
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 11:11:52 2013 -0500

Added new kernel blocksize macro aliases.

Details:
- Added new macros that alias level-3 cache and register blocksize macros
  to names that can be constructed via the PASTEMAC macro. These aliased
  macro definitions live inside bli_kernel_macro_defs.h, which is now
  #included after bli_kernel.h.
- Modified macro-kernels to use new aliased blocksize macros instead of
  operation-specific ones.
- Removed local, operation-specific kernel blocksize macro definitions
  (found in macro-kernel header files).

commit bd2b24ba65b36d7c07c5918a3838ce2ff57c4b48
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:35:39 2013 -0500

Updated CREDITS file.

commit 79328c15410215737f3f14cd069328cf52aa11fd
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 11 10:32:14 2013 -0500

Reverted testsuite object files' home to 'obj'.

Details:
- Removed 'obj' and 'lib' from .gitignore.
- Added testsuite/obj/.gitkeep (which is an empty file).
- Updated testsuite/Makefile accordingly.
- Thanks to Vernon Austel for pointing out the .gitkeep trick for tracking
  empty directories in git.

commit 4afe3bfd82c03e1e97b58b7d250588a0d28541e5
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 17:45:39 2013 -0500

Renamed/moved object scalar constant macros.

Details:
- Replaced scalar constant macro definitions in bli_const_defs.h with a single,
  simpler macro in bli_obj_macro_defs.h.
- Updated invocations of old macros accordingly.
- Removed bli_const_defs.h.

commit 357893f5be5c56ab7b062874005e77e614b23f06
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 9 14:48:15 2013 -0500

Applied fix from prev commit to gemmtrsm_?_ref_4x4

Details:
- Fixed hard-coded kernels in bli_gemmtrsm_l_ref_4x4.c and
  bli_gemmtrsm_u_ref_4x4.c.

commit 54988e8dca44475610bcaee5a7bc1c40e8921402
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 19:08:43 2013 -0500

Fixed a performance bug in trsm.

Details:
- Fixed a bug in the reference implementations of the gemmtrsm wrappers
  (bli_gemmtrsm_l_ref_mxn.c and bli_gemmtrsm_u_ref_mxn.c) whereby the
  reference gemm microkernel was hard-coded, and thus always called, even
  when GEMM_UKERNEL was defined to point to an optimized microkernel. This
  manifested as artificially low trsm performance for all problem sizes, but
  especially for small problem sizes as it only affected blocks of A that
  intersected the diagonal. Thanks to Mike Kistler of IBM for helping me
  find this bug.

commit a7252e40b5c351eef9a1df531ea0ef25cb5fb705
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 16:08:22 2013 -0500

Generate testsuite objects in 'src'.

Details:
- Tweaked the testsuite makefile so that object files are stored in 'src'
  rather than 'obj', since (a) the top-level .gitignore dictates that
  obj directories are to be ignored, and (b) git has problems
  tracking empty directories. Now, users do not need to create their own
  obj directories within their own local clones of BLIS.

commit 803871c55b60d3c225ad9a0607fa507a9c16aab7
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:18:42 2013 -0500

Minor formatting changes.

commit a571af816d72727e16cad37007e7043b9d6fa362
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Apr 8 15:00:13 2013 -0500

Fixed definition of bli_is_packed_object() macro.

Details:
- Changed the definition of bli_is_packed_object() so that it keys off of the
  value of the pack schema bits in the info field of obj_t, rather than
  comparing the obj_t buffer with that of the mem_t entry. This was the cause
  of a very low probability bug whereby uninitialized memory caused the macro
  to evaluate to TRUE even though the object in question was not packed.
  Thanks to Vernon Austel of IBM for helping discover this bug.
- Changed an abort() in bli_packm_part() to a not-yet-implemented.

commit 3be14c32f735ecc6169d3ab6370cf8b69162acec
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Apr 6 12:54:45 2013 -0500

Updated information in testsuite output header.

Details:
- Added to the information that is echoed at the beginning of the test suite's
  output, and also re-labeled some existing information.

commit 874707c1b183a4dd9a91dbfd4ea1522384c190df
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Apr 5 17:19:43 2013 -0500

Fixed edge case handling bug in herk macrokernels.

Details:
- Fixed a bug present in bli_herk_l_ker_var2() and bli_herk_u_ker_var2() that
  only manifests when BLIS is configured such that MR != NR. The bug involves
  incorrectly detecting edge cases, which resulted in some parts of matrix C
  potentially being skipped and not updated, depending on the problem size.
- Updated the default values of MR and NR in config/reference/bli_kernel.h to
  8 and 4, respectively, so that I can better stress the framework on a
  day-to-day basis. (The fact that they were both equal to 4 for so long is
  why I did not stumble upon this bug much sooner.)

commit 7cbda15291d3e01300e71c286b9657b7ef0708bf
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Apr 4 15:25:43 2013 -0500

Added reference microkernels for arbitrary MR, NR.

Details:
- Added a new set of reference gemm, gemmtrsm, and trsm micro-kernels that
  contain explicit loops over MR and NR, thus allowing them to be used
  unmodified by developers who want to build a reference library with
  custom register blocksizes.
- Changed config/reference/bli_kernel.h to use above ukernels by default.
- Changed interfaces of new and existing gemm, gemmtrsm, and trsm micro-kernels
  to use 'restrict' keyword.
- Added -funroll-loops option to config/reference/make_defs.mk.
- Updated comments in bli_kernel.h describing constraints on register and
  cache blocksizes.
- Updated _adds_mxn.h, _copys_mxn.h, and _xpbys_mxn.h macro files so that
  single-char macros are also defined.
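To give a flavor of a reference micro-kernel written with explicit loops over the register blocksizes, here is a hedged sketch of a lower-triangular solve. REF_MR and REF_NR, the function name, the storage layout, and the use of a plain division by the diagonal are all assumptions of this sketch and are not taken from the BLIS reference kernels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical register blocksizes; changing them requires no other edits. */
#ifndef REF_MR
#define REF_MR 2
#endif
#ifndef REF_NR
#define REF_NR 3
#endif

/* Solve L * X = B in place, overwriting B with X.
   a : REF_MR x REF_MR lower-triangular block, column-major (column stride REF_MR)
   b : REF_MR x REF_NR block, row-major (row stride REF_NR) */
void dtrsm_l_ref_sketch(const double *restrict a, double *restrict b)
{
    for (size_t i = 0; i < REF_MR; ++i) {
        const double diag = a[i + i * REF_MR];
        assert(diag != 0.0);

        for (size_t j = 0; j < REF_NR; ++j) {
            double rho = 0.0;

            /* Subtract contributions of the rows of X already computed. */
            for (size_t l = 0; l < i; ++l)
                rho += a[i + l * REF_MR] * b[l * REF_NR + j];

            b[i * REF_NR + j] = (b[i * REF_NR + j] - rho) / diag;
        }
    }
}
```

Because the loop bounds are the blocksize macros themselves, rebuilding with different REF_MR and REF_NR values needs no change to the kernel body, which is the property the entry highlights.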

commit 6684b73d5501f91d24a79e26655a42819c9b3114
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Apr 2 13:06:20 2013 -0500

Implemented amax operation and related changes.

Details:
- Implemented amax operation in BLIS.
- Activated BLAS2BLIS routine mapping for new amax BLIS implementation.
- Added integer support to [f]printv, [f]printm.
- Added integer support to level-0 copys macros.
- Updated printing of configuration information in test suite driver.
- Comment changes to _config.h files.
- Added comments to bla_dot.c to remind the reader what sdsdot()/dsdot() are
  used for.

commit fb68087f8727cd5fd656a742a110e54fb1c91db9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 15:10:16 2013 -0500

More memory alignment-related tweaks.

Details:
- Renamed BLIS_MEMORY_ALIGNMENT_SIZE to BLIS_CONTIG_MEM_ALIGN_SIZE.
- Renamed BLIS_ENABLE_MEMORY_ALIGNMENT to BLIS_ENABLE_SYSTEM_MEM_ALIGN.
- Added BLIS_SYSTEM_MEM_ALIGN_SIZE, which controls only the alignment
  passed into posix_memalign() or equivalent.
- Defined new function, bli_align_dim_to_cmem(), which applies the
  contiguous memory alignment (rather than the system/malloc alignment).

commit 9682ef61dbf9a8846c8b0826d4de24bc216cd641
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 14:14:53 2013 -0500

Always define memory alignment size cpp constant.

Details:
- Removed guard around #define for memory alignment size constant.
  Memory alignment should always be enabled, and so this value should
  always be defined.

commit 3a787cccaae16531474f34398e3c0cf4f49b8cd8
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 13:59:19 2013 -0500

Renamed memory alignment macro constant.

Details:
- Renamed all occurrences of BLIS_MEMORY_ALIGNMENT_BOUNDARY to
  BLIS_MEMORY_ALIGNMENT_SIZE.

commit 37308f9a502b56d94fa52a7df71c676a46c3be3d
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Mar 26 12:43:14 2013 -0500

Align packed panel strides with system alignment.

Details:
- Pass panel strides through bli_align_dim_to_sys() to ensure that each
  subsequent packed panel of A and B begins at an aligned address. (The
  first panel is presumably aligned to system alignment because it is
  aligned to a page boundary, which is typically much larger.)
- Rearranged code in packm_init_pack() to prevent additional conditional
  blocks as a result of the aforementioned change.
- Adjusted contiguous memory allocator so that the system memory alignment
  is used to allocate enough space for each block no matter what kind of
  register blocking is used (even if register blocksize is unit and every
  row/column needs maximal padding).
- Adjusted default blocksizes in reference configuration so that MC*KC
  and KC*NC result in identical footprints for all datatypes.

commit 40a0654ada5f256beb3da80ebba015a3c71fb61f
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:18:12 2013 -0500

CHANGELOG update.

commit b65cdc57d9e51fa00e3c03539cfb7e045707d0f4 (tag: 0.0.5)
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 20:01:49 2013 -0500

Migrated 'bl2' prefix to 'bli'.

Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.

commit 132bffcef7441f32d02cc7485aef6a0648e0ef1e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:49:36 2013 -0500

Removed several 'old' directories and files.

Details:
- Removed most of the 'old' directories scattered throughout the framework,
  which include alternate/half-baked/broken implementations.

commit 551ea4767a3ea6c263f12aaca94bc2642cee4cfa
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sun Mar 24 18:00:10 2013 -0500

Removed #include "blis2.h" from low-level headers.

Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.

commit bc7b318ed0960edeb4537797dd8c91de0d942ca9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 17:18:58 2013 -0500

Added cpp guards to conflicting libflame typedefs.

Details:
- Added cpp guards around the definitions of dim_t, scomplex, and dcomplex.
  This is a temporary hack to allow interoperability with libflame. (Similarly
  temporary changes are being made to libflame's type definitions file.)

commit f469907503fcdc24dff0174c569170e6e756e045
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:20:15 2013 -0500

Renamed MAX_PREFETCH_BYTE_OFFSET to MAX_PRELOAD_.

Details:
- Renamed BLIS_MAX_PREFETCH_BYTE_OFFSET to
  BLIS_MAX_PRELOAD_BYTE_OFFSET since "prefetch" is kind of a loaded word
  (e.g. "prefetch" instructions, which are different than the particular
  kind of prefetching/preloading referred to by this constant).

commit d1023bfbc6668a58a01ee4f82ded2319911e7b19
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:09:59 2013 -0500

Removed build/old directory.

commit 718888849c48d99f83eea6b8f83bc1998cffef7e
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 22 15:07:01 2013 -0500

Deprecated 'flame' configuration.

Details:
- Removed 'flame' configuration, as it was horribly out-of-date.
- Comment changes to bl2_blocksize.c and bl2_mem.c.

commit bba38cf4e9d28058c14483f44fa074a6d2852ad9
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Tue Mar 19 18:07:40 2013 -0500
|
||
|
||
Added missing conjbeta argument to scald.
|
||
|
||
commit 1f82b51d06d0279dded3f2b87ba59403f3ed0af6
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Mar 18 15:37:20 2013 -0500
|
||
|
||
Relocated packed mem_t dimension fields to obj_t.
|
||
|
||
Details:
|
||
- Removed the m and n (and elem_size) fields from the mem_t object, and added
|
||
m_packed and n_packed fields to obj_t. These new fields track the same as
|
||
the old ones. From an abstraction standpoint, it seemed awkward to store
|
||
those dimensions inside the mem_t.
|
||
- Updated interfaces to bl2_mem_acquire_*() so that only a byte size argument
|
||
is passed in, instead of m, n, and elem_size.
|
||
- Updated bl2_packm_init_pack() and bl2_packv_init_pack() to inline the
|
||
functionality of bl2_mem_alloc_update_m() and bl2_mem_alloc_update_v(),
|
||
respectively.
|
||
- Updated packm variants to access the packed length and width fields from
|
||
their new locations.
|
||
|
||
commit 36c782857bf9b8ac1b1dac47a70f689a4407e2cc
|
||
Author: Field G. Van Zee <field@cs.utexas.edu>
|
||
Date: Mon Mar 18 10:37:03 2013 -0500
|
||
|
||
CHANGELOG update.
|
||
|
||
commit e7d41229d3b1674e74f47d7f29fae004a745201a (tag: 0.0.4)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 17:12:36 2013 -0500

Re-implemented contiguous memory allocator.

Details:
- Completely re-wrote the contiguous memory allocator (bl2_mem.c). The new
  allocator instantiates and initializes three separate memory pool objects,
  each one associated with a separate array of contiguous memory blocks, each
  block of fixed and uniform size. (The three pools are for allocating mc-by-kc
  blocks of A, kc-by-nc panels of B, and mc-by-nc panels of C.) The pool
  objects use a stack structure internally to track which blocks in the region
  have been "checked out" to a thread and which are still available. Critical
  regions are now clearly marked and adaptable to parallel environments (e.g.
  OpenMP). Memory pools are set up when bl2_init() is called.
- Added a new field to the packm control tree node, which indicates what kind
  of packed buffer is being allocated. The enumerated type for this argument
  is defined as packbuf_t in bl2_type_defs.h.
- Updated level-3 _cntl.c files to pass in the appropriate value for a new
  packbuf_t argument to bl2_packm_cntl_obj_create().
- Moved some macros called by packm_init_pack() from bl2_obj_macro_defs.h to
  bl2_mem_macro_defs.h.
- Added BLIS_MAX_NUM_THREADS to bl2_config.h, which we use as the default
  number of blocks of A reserved for the memory allocator.
- Deprecated bl2_align_dim(). Replaced usage with that of
  bl2_align_dim_to_mult(). Turns out that typically we don't need to align
  a dimension to the system alignment, since that value has to do with
  starting addresses, whereas the values we are dealing with are unitless
  dimensions.

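The stack-based pool described in the commit above might look roughly like the sketch below. It models a single pool of fixed, uniform blocks; the names (example_pool_t, example_pool_checkout, and so on), the block count, and the block size are hypothetical and are not taken from bl2_mem.c.

```c
#include <assert.h>
#include <stddef.h>

#define EXAMPLE_NUM_BLOCKS 4    /* hypothetical number of blocks in the pool */
#define EXAMPLE_BLOCK_SIZE 1024 /* hypothetical size of each block, in bytes */

typedef struct
{
    void*  block_ptrs[EXAMPLE_NUM_BLOCKS]; /* stack of available blocks */
    size_t top;                            /* number of blocks still available */
} example_pool_t;

/* One contiguous region, carved into fixed-size blocks. */
static char example_pool_memory[EXAMPLE_NUM_BLOCKS * EXAMPLE_BLOCK_SIZE];

/* Push the address of every block onto the stack. */
void example_pool_init(example_pool_t* pool)
{
    for (size_t i = 0; i < EXAMPLE_NUM_BLOCKS; ++i)
        pool->block_ptrs[i] = example_pool_memory + i * EXAMPLE_BLOCK_SIZE;
    pool->top = EXAMPLE_NUM_BLOCKS;
}

/* Pop a block off the stack ("check out"). Returns NULL if the pool is
   exhausted. In a threaded build, this body would be a critical region. */
void* example_pool_checkout(example_pool_t* pool)
{
    if (pool->top == 0) return NULL;
    pool->top--;
    return pool->block_ptrs[pool->top];
}

/* Push a block back onto the stack ("check in"). Also a critical region. */
void example_pool_checkin(example_pool_t* pool, void* block)
{
    assert(pool->top < EXAMPLE_NUM_BLOCKS);
    pool->block_ptrs[pool->top] = block;
    pool->top++;
}
```

A stack keeps both operations O(1) and needs no searching, which fits blocks of uniform size; three such pools (for A, B, and C) would each get their own region and block size.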
commit 1e76cae00cb0a04544aaae1ade878686b238d283
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 15 12:21:42 2013 -0500

Perform her2k var1 loops in sequence.

Details:
- Changed variant 1 of her2k so that the two rank-k products are computed
  and accumulated in sequence rather than fused into one loop. This is
  necessary if BLIS is to be configured to provide only enough contiguous
  memory for one panel of B.

commit c95c270eba91ae4efc26603beddfd0292caa919b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:42:15 2013 -0600

Enhanced tracking of dimensions for mem_t objects.

Details:
- Added new fields to mem_t struct definition to track the allocated (as
  opposed to the currently used) dimensions of the memory region. This
  allows packm_init() to be more robust in situations where memory is
  already allocated but is more than needed for the current packing job.
- Updated logic in bl2_obj_set_buffer_with_cached_packm_mem() macro, used
  in packm_init(), to update the "currently used" dimensions of the mem_t
  object if the requested dimensions are smaller than the allocated
  dimensions.

commit e99281a0f41d482fddeffa239bfc8e13e6d13d4b
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Mar 7 14:00:10 2013 -0600

Fixed test suite flop formulas for ops with side.

Details:
- Fixed incorrect flop counts in test suite modules for hemm, symm, trmm,
  trmm3, and trsm.
- Comment updates in herk macro-kernels.

commit ef8cbfc44dd620fdcbdb51cdb173217194bebe31
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:47:06 2013 -0600

Added "version" to .gitignore.

Details:
- Added "version" to .gitignore file so that the file does not show up when
  running 'git status', or accidentally get pulled into the index when
  running 'git add' or 'git add --all'.

commit e9e0747c2f6c178f53ac46ab794acbb7b8c4fea8
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Sat Mar 2 12:43:54 2013 -0600

Removed version file from version control.

Details:
- Removed version file from version control to prevent git errors that occur
  when trying to pull new commits.

commit bb612f864e9c17dd9805e9446840f02259619469
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Mar 1 12:55:42 2013 -0600

Updated behavior of bl2_obj_induce_trans() macro.

Details:
- Changed bl2_obj_induce_trans() so that the transposition bit is no longer
  updated as part of the macro. All current uses of the macro have been
  coupled with instances of bl2_obj_set_trans() to clear the bit.
- Added Jed to CREDITS file.

commit f24e29b789e7314764a818ceb3063126936c986f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 18:15:41 2013 -0600

Replaced banded/packed BLAS2 stubs with f2c code.

Details:
- Retired the blas2blis wrappers that simply called abort with a "not yet
  implemented" message. This includes all of the level-2 banded and packed
  routines.
- Replaced the aforementioned with the corresponding netlib implementations
  having been run through f2c (with some customization).
- Added directories named 'attic' to build/gen-make-frags/ignore_list.

commit 1454c1a14207766dfed372b8e38b47fa384f5198
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:38:45 2013 -0600

Moved Fortran name-mangling macro to bl2_config.h.

Details:
- Moved the Fortran-77 name-mangling macros from bl2_blas_macro_defs.h to the
  configuration directory (bl2_config.h, specifically) given that it can be
  expected to be tweaked by some developers.

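For readers unfamiliar with Fortran name mangling, a minimal sketch of such a macro follows. The macro name EXAMPLE_F77_NAME and the wrapped routine are hypothetical; the trailing-underscore convention shown is the one used by many Fortran-77 compilers, but it is an assumption here, since the log does not show the macro that BLIS defines.

```c
/* Hypothetical name-mangling macro: append a trailing underscore so that
   the C symbol matches what a Fortran-77 caller is assumed to reference.
   A developer targeting a different compiler convention would edit this
   one line. */
#define EXAMPLE_F77_NAME(name) name ## _

/* Defines the C symbol example_scal_. Fortran passes arguments by
   reference, so every parameter is a pointer. */
void EXAMPLE_F77_NAME(example_scal)(const int* n, const double* alpha,
                                    double* x, const int* incx)
{
    /* x := alpha * x, for n elements with stride incx. */
    for (int i = 0; i < *n; ++i)
        x[i * (*incx)] *= *alpha;
}
```

Keeping the macro in a configuration header means the mangling scheme can be changed without touching any of the wrappers that use it.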
commit ede75693e5a36c6006087c4a7df834175b604504 (tag: 0.0.3)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 22 12:11:24 2013 -0600

Implemented blas2blis compatibility layer.

Details:
- Added the blas2blis compatibility layer, located in frame/compat. This
  includes virtually all of the BLAS, including banded and packed level-2
  operations.

- Defined bl2_init_safe(), bl2_finalize_safe(). The former allows a conditional
  initialization, which stores the "exit status" in an err_t, which is then
  read by the latter function to determine whether finalization should actually
  take place.
- Added calls to bl2_init_safe(), bl2_finalize_safe() to all level-2 and
  level-3 BLAS-like wrappers.
- Added configuration option to instruct BLIS to remain initialized whenever
  it automatically initializes itself (via bl2_init_safe()), until/unless the
  application code explicitly calls bl2_finalize().

- Added INSERT_GENTFUNC* and INSERT_GENTPROT* macros to facilitate type
  templatization of blas2blis wrappers.
- Defined level-0 scalar macro bl2_??swaps().
- Defined level-1v operation bl2_swapv().
- Defined some "Fortran" types to bl2_type_defs.h for use with BLAS
  wrappers.

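The conditional initialize/finalize pairing described in the commit above could be sketched as follows. All names (example_init_safe, example_err_t, and the status values) are hypothetical stand-ins; the actual signatures of bl2_init_safe() and bl2_finalize_safe() are not given in this log.

```c
#include <stdbool.h>

/* Hypothetical status codes recording what the init call actually did. */
typedef enum
{
    EXAMPLE_SUCCESS,
    EXAMPLE_ALREADY_INITIALIZED
} example_err_t;

static bool example_is_initialized = false;

/* Initialize only if the library is not initialized yet, and record in
   *status whether this call performed the initialization. */
void example_init_safe(example_err_t* status)
{
    if (example_is_initialized)
    {
        *status = EXAMPLE_ALREADY_INITIALIZED;
        return;
    }
    example_is_initialized = true;
    *status = EXAMPLE_SUCCESS;
}

/* Finalize only if the matching init call performed the initialization. */
void example_finalize_safe(example_err_t status)
{
    if (status == EXAMPLE_SUCCESS)
        example_is_initialized = false;
}
```

A wrapper would call example_init_safe() on entry and pass the recorded status to example_finalize_safe() on exit, so that a wrapper invoked while the library is already initialized leaves that state untouched.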
commit 995edf43e21c1868732dbdd7fee14b08730218bd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 14:30:50 2013 -0600

Updated version file. (Forgot to in prev commit).

commit e823b08aaf7b65ecc6ddc30570709ea8a4b52aa7
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 21 12:00:17 2013 -0600

Fixed some scalar types in BLAS-like Herm APIs.

Details:
- Some of the scalars of Hermitian operations, such as alpha in her,
  alpha and beta in herk, and beta in her2k, need to be real. These
  arguments were typed incorrectly as the complex types. This has been
  fixed. Note the issue was only present in the BLAS-like APIs for
  these operations (not the native object-based interfaces).

commit 5ece050a669e74ba4a711d1d4669239d22d45642
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 15:50:54 2013 -0600

Updated version file. (Forgot to in prev commit).

commit f243034b8b430d4684680ea8eddfd246e73fefc0
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 20 14:11:36 2013 -0600

Changed API of packm_init_pack() to use blksz_t.

Details:
- Changed the interface of packm_init_pack() so that mult_m and mult_n
  are passed in as type blksz_t* instead of dim_t.
- Make similar change for packv_init_pack().

commit da0c22f24107be9f33e0ea2dae52e5534b1fd0e5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Feb 15 09:59:48 2013 -0600

Minor changes to lower levels of scalm and setm.

Details:
- Removed diagx parameter from lower-level interfaces of scalm.
- Modified scalm_basic_check() to expect an object with a nonunit diagonal.
- Changed setm_unb_var1() so that having an implicit unit diagonal results
  in only the strictly lower or upper triangle of the matrix being modified.

commit 2c836adadcd2a7d7f217033ac4d7fcad03d5bd55
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:42:56 2013 -0600

Updated beta == zero semantics of mulsc.

Details:
- Updated beta == zero semantics of mulsc. Hopefully this is the last
  operation that needed updating.
- Added Devin to CREDITS file.

commit 722b66c7dcaaaa1b109e7c8b1d53fd71a9af8240
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Feb 14 10:18:00 2013 -0600

Removed some calls to setv() in test modules.

Details:
- Removed calls to setv() in test modules whose sole purpose was to
  initialize vectors to zero to ensure that nan's and inf's would not
  taint the computation. Now that beta == zero semantics have been
  updated to clear the output operand (when beta is zero), rather than
  multiply against it, these setv() calls are no longer needed.

commit e6ac623a902f776c42f85eadbf76996d9770a0db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:44:59 2013 -0600

Properly implemented beta == 0 semantics.

Details:
- Changed name of set0 and set0_mxn macros to set0s and set0s_mxn,
  respectively.
- Added code to the following operations that sets the output operand to
  zero if the corresponding scalar is zero (rather than performing the
  floating-point multiply, or in the case of setv, copying the value).
  This will prevent nan's and inf's from creeping into results from
  uninitialized memory.
  - axpy
  - dotxv
  - scalv
  - scal2v
  - setv
  - gemv
  - ger
  - hemv
  - her
  - her2
  - gemm reference ukernels

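The beta == 0 semantics described in the commit above can be illustrated with a simplified, real-valued scalv. This is only a sketch: example_scalv is a hypothetical stand-in, not the BLIS implementation, which is type-templated and supports strides.

```c
#include <math.h>   /* NAN and INFINITY, used in the usage note below */
#include <stddef.h>

/* x := beta * x for a contiguous vector of n doubles. */
void example_scalv(size_t n, double beta, double* x)
{
    if (beta == 0.0)
    {
        /* Overwrite rather than multiply: 0.0 * NaN and 0.0 * Inf are
           both NaN, so multiplying would let garbage from uninitialized
           memory survive in the output. */
        for (size_t i = 0; i < n; ++i)
            x[i] = 0.0;
        return;
    }

    for (size_t i = 0; i < n; ++i)
        x[i] *= beta;
}
```

For instance, calling example_scalv(2, 0.0, x) on a vector holding { NAN, INFINITY } leaves it holding two zeros, whereas a plain multiply would leave two NaNs.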
commit aedccbc85d491e41711a0c6eb0d246d8700a199a
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 18:29:53 2013 -0600

Fixed stale interface to packm_unb_var1().

Details:
- Removed the control tree from the interface to packm_unb_var1(), which
  I meant to do when it was un-deprecated.

commit c23135669f7a8a545e2e11ef559bf284be8bc65c
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Wed Feb 13 13:21:00 2013 -0600

Un-deprecated packm_unb_var1.c (needed by l2 ops).

Details:
- Added bl2_packm_unb_var1() back into the mix once I realized that level-2
  operations still need this routine for packing matrices. Now, whether
  level-2 operations should be packing matrices to begin with is another
  matter. But this fixes the segmentation fault one would have gotten when
  running bl2_gemv() on a general stride matrix.

commit cf49e35f9819f9d93ebdca4703ade5abab28f6f6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 18:39:35 2013 -0600

Removed cntl tree usage from packm implementation.

Details:
- Added new fields to obj_t info field:
  - invert_diag
  - pack_order_if_upper
  - pack_order_if_lower
  These fields allow packm_init() to embed information that begins
  in the control tree into the object so that the packm implementation
  does not need to use control trees at all. This is being done to aid
  Bryan's DxT code generation.
- Added macros that operate on above fields.
- Changed packm_init(), packm_blk_var2(), and packm_blk_var3() according
  to above changes.
- Made similar (but much simpler) changes to packv.
- Deprecated packm_blk_var1(), packm_unb_var1(), and packm_densify().
  These were part of prototype implementations and are no longer needed.

commit eb139ae256651af7820b93ef982626180195b87f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:39:30 2013 -0600

Replaced bl2_abs() with _fabs() where appropriate.

commit 474bac30c99928f9e87315972bcb45c632c0b7ec
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 12:23:48 2013 -0600

Removed level-0 macros projrs, grabis.

Details:
- Replaced instances of projrs and grabis macros with newer,
  more general-purpose getris.

commit 03a260a457c8964e4603a655cee0d40ac17affba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Feb 12 11:45:34 2013 -0600

Restored executable permissions to scripts.

Details:
- Restored executable (0755) permissions to scripts that were touched by
  the recursive sed script that updated the copyright headers in the
  previous commit.

commit 1274e1243775e5e705114257a43176f63635227f
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 14:37:47 2013 -0600

Updated copyright headers from 2012 to 2013.

commit 3b620cc8e90c53c79129bd9dd89ae6b77c2446f1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:38:07 2013 -0600

CHANGELOG update.

commit 768fcebaa8be0eb936a6e7a02cd8a19438c79d99 (tag: 0.0.2)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Feb 11 13:20:44 2013 -0600

Added unified test suite, and many fixes.

Details:
- Added a highly configurable, unified test suite.

- Removed DUPB configuration constant from bl2_kernel.h and macro-kernel
  header files. Now, instead, DUPB is computed as (NDUP != 1) within each
  macro-kernel. This fixes a bug in trmm/trsm whereby bp was indexed into
  incorrectly when DUPB was set to FALSE but the NDUP was still non-unit.
  By encoding both pieces of information into one constant in _kernel.h,
  it seems somewhat less likely others will encounter this bug in the
  future.
- Added level-2 cache blocksizes to _kernel.h for reference configuration,
  and defined blocksizes in _cntl.c files to these default values.

- Changed semantics of her2k and syr2k such that these operations no longer
  expect the B matrix to already be conjugate-transposed (or just transposed
  for syr2k). However, these semantics are preserved for the internal
  mechanics of the implementations, including the internal back-end and all
  blocked variants.
- Inserted checks for real-valued alpha and beta for herk/her2k and herk,
  respectively.

- Relaxed general object structure constraints in _basic_check() for gemv, ger.
- Changed her front-end to NOT copy-cast to real projection; instead, this is
  replaced by selecting either the real part or both parts within the unblocked
  algorithm implementation, depending on the value of conjh.
- Added conjh to all _check routines for her so that the code knows when to
  verify that alpha has an imaginary component equal to zero (for her, but
  not syr).
- Changed control tree for her to forgo packing.

- Added unit diagonal support to fnormm.
- Redefined real versions of abval2s macros in terms of fabs(), fabsf().
- Redefined complex versions of sqrt2s macros using the actual "complex square
  root" formula.
- Created new level-0 object-based routines, suffixed with "sc" (for "scalar").
- Defined new level-1v, -1d, and -1m versions of add and sub operations
  (two-operand add and subtract).
- Added new scalar macros:
  - getris: acquire real and imaginary components.
  - setris: set real and imaginary components.
  - addjs: addition with conjugated x.
  - subjs: subtraction with conjugated x.
- Defined new utility operations:
  - absumv: element-wise sum of absolute values for vector elements.
  - absumm: element-wise sum of absolute values for matrix elements.
  - mkherm: convert existing matrix to Hermitian.
  - mksymm: convert existing matrix to symmetric.
  - mktrim: convert existing matrix to triangular.

- Added various error checking routines.
- Added bl2_clock_min_diff(), which is used to more cleanly measure the
  wall clock time of a code block.
- Added general stride support to bl2_obj_alloc_buffer().
- Added bl2_obj_init_scalar().
- Updated parameter mapping in bl2_param_map.c.
- Added support for queriable version string.

- Fixed a bug in the her2k macro-kernels (which currently are simply
  implemented in terms of two invocations of herk) whereby beta was being
  applied to both the first and second rank-k updates, rather than only
  the first.
- Fixed a bug in trmm/trsm whereby transpose and right side cases were not
  properly implemented due to erroneous assumptions regarding aliasing and
  root objects.
- Fixed a bug in the upper triangular trsm macro-kernel in which the wrong
  MR x NR block of B was being updated.
- Fixed a bug in the inverts macro in the double real case whereby the
  value was typecast to float before inversion. This affected non-unit cases
  of dtrsm.
- Fixed a bug in the reference kernels for gemmtrsm whereby the minus one
  constant was being applied incorrectly.
- Fixed a bug in the overall treatment of non-unit alpha for trsm. The code
  now mimics the rank-k strategy of gemm, whereby alpha is applied during
  the first iteration of variant 3, with BLIS_ONE passed in instead for
  subsequent iterations. This also required passing alpha into the
  macro-kernels as well as the fused gemmtrsm micro-kernels.
- Fixed a bug in trsm_u_blk_var1 whereby the gemm macro-kernel was being
  called for blocks strictly above the diagonal. While this sounds good in
  theory, this cannot be done because gemm_ker_var2 expects row panels of
  A to be packed from top to bottom, while for trsm_u, A is actually packed
  from bottom to top due to the reverse (BR->TL) nature of the algorithm.
- Fixed a bug in packm_cxk() whereby panel packings with unit panel
  dimensions were mishandled due to incorrect arguments to the copyv kernel.
  Also changed the copyv kernel invocation to scal2v so that these edge
  cases are properly handled when scaling is requested.
- Fixed a bug in packv_int() whereby an uninitialized object is passed in
  instead of the source object.
- Fixed a bug whereby level-2 code could allocate memory dynamically via
  bl2_malloc() and then attempt to free it via bl2_mm_release(). Also fixed
  a potential future bug whereby a mem_t object that is actually no longer
  "allocated" from the static pool is mistaken for being allocated due to
  failure to NULLify the buffer when the block was most recently released.
- Fixed a bug in bl2_acquire_mpart_*() whereby the uplo field was mistakenly
  toggled when the requested subpartition needed to be "reflected" due to it
  residing in an unstored region.

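The her2k fix in the commit above (apply beta only to the first of the two rank-k updates) can be illustrated with a deliberately simplified sketch that treats C as a single real scalar and A, B as real vectors of length k. The function names are hypothetical, and real arithmetic stands in for the Hermitian case.

```c
#include <stddef.h>

/* One rank-k update on a 1x1 "matrix": c := beta * c + sum_p a[p] * b[p]. */
double example_rankk(size_t k, const double* a, const double* b,
                     double beta, double c)
{
    double acc = 0.0;
    for (size_t p = 0; p < k; ++p)
        acc += a[p] * b[p];
    return beta * c + acc;
}

/* Rank-2k update built from two rank-k updates:
   c := beta * c + a.b + b.a */
double example_rank2k(size_t k, const double* a, const double* b,
                      double beta, double c)
{
    /* beta scales the original c exactly once, in the first update ... */
    c = example_rankk(k, a, b, beta, c);
    /* ... and the second update accumulates with a scalar of one. Passing
       beta here as well would rescale the first product too. */
    c = example_rankk(k, b, a, 1.0, c);
    return c;
}
```

With a = {1, 2}, b = {3, 4}, beta = 0.5, and c = 4, the result is 0.5 * 4 + 11 + 11 = 24; applying beta in both updates would instead give 0.5 * 13 + 11 = 17.5. The same "scale once, then accumulate with one" pattern is what the trsm alpha fix in the same commit describes.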
commit be94fb84c0351602d7585269f29998e3bf83f899
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:55:21 2013 -0600

Added missing 'd' to fused gemmtrsm function name.

commit 879a179e1dee36f0c56765f2ab91a26861019b34
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Jan 4 10:37:27 2013 -0600

Added debug statements to bl2_mm_acquire_m().

Details:
- Added printf() statements to bl2_mm_acquire_m() to help debug issues
  with prematurely exhausted memory pool.
- Removed 'd' from kernel names of reference kernels in clarksville
  configuration's bl2_kernel.h

commit 806e74beb4eafeef620a555ffbb3f6779e29c7b6
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:07:50 2012 -0600

Defined Frobenius norm operations.

Details:
- Added level-0 grabis macro operation to grab imaginary component of one
  variable and copy it to the real component of another variable.
- Defined sumsqv operation, which computes the sum of the absolute squares
  of the elements of a vector. This implementation is modeled after ?lassq
  in netlib LAPACK.
- Defined fnormv and fnormm operations, which compute the Frobenius norm on
  vectors and matrices, respectively. These operations are treated as
  one-operand operations where the output norm value is the real projection
  of the datatype of the input operand. Both operations are implemented in
  terms of sumsqv.

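A scaled sum of squares in the style of the classic ?lassq routine, which the commit above cites as the model for sumsqv, might be sketched as follows for real double-precision input. The function name and interface are hypothetical; the BLIS sumsqv interface is not shown in this log.

```c
#include <stddef.h>

/* Updates (*scale, *sumsq) so that, on return,
     (*scale)^2 * (*sumsq) == (scale_in)^2 * sumsq_in + sum_i x[i]^2.
   Keeping the largest magnitude seen so far in *scale means that only
   ratios no larger than one are ever squared, which avoids overflow and
   harmful underflow. Start with *scale = 0.0 and *sumsq = 1.0. */
void example_sumsqv(size_t n, const double* x, double* scale, double* sumsq)
{
    for (size_t i = 0; i < n; ++i)
    {
        double abs_xi = (x[i] < 0.0) ? -x[i] : x[i];

        if (abs_xi == 0.0) continue; /* zeros contribute nothing */

        if (*scale < abs_xi)
        {
            /* New largest magnitude: rescale the running sum to it. */
            double ratio = *scale / abs_xi;
            *sumsq = 1.0 + (*sumsq) * ratio * ratio;
            *scale = abs_xi;
        }
        else
        {
            double ratio = abs_xi / *scale;
            *sumsq += ratio * ratio;
        }
    }
}
```

The Frobenius norm then follows as scale * sqrt(sumsq), which is presumably how fnormv and fnormm build on sumsqv. For x = {3, 4} the routine ends with scale = 4 and sumsq = 1.5625, giving a norm of 5.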
commit 66e80ce1aec099b2b2b0c4f295e38add2c921383
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 17:02:55 2012 -0600

Added GENT*R macros; tweaked bl2_machval defs.

Details:
- Added function and prototype macro-generating macros for GENTFUNCR and
  GENTPROTR, which are one-operand macros with auxiliary real projection
  types.
- Tweaked bl2_machval files to use new macros.

commit 2fecc88ca22142020573f168da715e8e9f3dd7de
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 20 11:35:14 2012 -0600

Fixed harmless macro bug in level-1m operations.

Details:
- Fixed some inconsistent usage of n_iter_max and n_iter in the two
  bl2_set_dims_incs_uplo_[12]m macros. The right thing ended up happening
  despite the bug, which is why I had not discovered it until now.

commit 8945db6ec9f82168cf72411ad408b4fdb44ae0d1
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 15:07:36 2012 -0600

Renamed x86,x86_64 kernels to indicate 'd' fusing.

Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
  to emphasize that the fusing shape is not for all datatype instances, but
  rather just for one (that of double-precision real). Other fusing shapes
  would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.

commit 6fbbdd4e194d06096ad08c5db61127be338067db
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 18 14:34:02 2012 -0600

More tweaks to _config.h, _kernel.h; smem tweaks.

Details:
- Moved kernel-related definitions from bl2_config.h to bl2_kernel.h.
- Replaced #define of _GNU_SOURCE with #define of _POSIX_C_SOURCE. This
  accomplishes the same thing (enabling posix_memalign()) without enabling
  all of the GNU extensions we don't need.
- Defined the size of the static memory pool in terms of MC, KC, and NC,
  as well as two new constants that determine how many MCxKC blocks and
  how many KCxNC blocks should be allocated (defined in bl2_config.h).
- In the case of static memory pool exhaustion, replaced the generic
  bl2_abort() with a specific error code call.

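The _POSIX_C_SOURCE change in the commit above can be illustrated with the sketch below. The value 200112L is an assumption (it is the POSIX.1-2001 level at which posix_memalign() is specified); the log does not say which value BLIS defines. The wrapper name example_malloc_aligned is hypothetical.

```c
/* Request POSIX.1-2001 interfaces, which include posix_memalign(),
   without enabling every GNU extension. This must appear before any
   system header is included in order to take effect. */
#ifndef _POSIX_C_SOURCE
#define _POSIX_C_SOURCE 200112L
#endif

#include <stdint.h>
#include <stdlib.h>

/* Returns a buffer of 'size' bytes whose address is a multiple of
   'alignment', or NULL on failure. The alignment must be a power of two
   and a multiple of sizeof(void*). Release the buffer with free(). */
void* example_malloc_aligned(size_t size, size_t alignment)
{
    void* p = NULL;

    /* posix_memalign() returns 0 on success and an error number otherwise. */
    if (posix_memalign(&p, alignment, size) != 0)
        return NULL;

    return p;
}
```

Defining _GNU_SOURCE instead would also expose posix_memalign(), but it pulls in many unrelated declarations, which is the trade-off the commit message describes.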
commit 5d8bdb21c48e8fb11bef6128a242122cc1470a99
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 16:07:36 2012 -0600

Minor reordering of bl2_config.h definitions.

commit 4a83f67490136a898f558e273b76a687aed8b893
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 17 12:35:54 2012 -0600

Consolidated configuration headers.

Details:
- Merged contents of bl2_arch.h into bl2_config.h for reference and
  clarksville configurations.
- Updated CREDITS, INSTALL, LICENSE, README files.

commit 0670c33cc14612f636ef09ede4133404ae0af6ba
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 14 12:45:26 2012 -0600

Fixed bug in reference gemm ukernels.

Details:
- Fixed a bug whereby, for the reference gemm ukernels, the matrix product
  was not correctly accumulated and scaled (by alpha) into the output matrix
  C. (Thanks to Fran for finding this bug.)
- Whitespace changes to reference trsm kernels.

commit e2e7cb2fbe615be4d375bc2dce88d03d98fadc9e
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 13 18:17:54 2012 -0600

Expanded reference packm/unpackm kernel set to 16.

Details:
- Added 10xk, 12xk, 14xk, and 16xk reference kernels for packm and
  unpackm.
- Updated bl2_[un]packm_cxk() to silently use scal2m if "out of range"
  kernel size is requested. (Thanks to Tyler for finding this bug.)
- Updated bl2_kernel.h to contain new _KERNEL definitions, according
  to above changes, for 'reference' and 'clarksville' configurations.
- Updated CHANGELOG.
- Removed "output*.m" from .gitignore.

commit 17455a8bce038dd570356ab0c5c11d9a89f20248
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 17:23:32 2012 -0600

Minor updates towards 0.0.1.

commit 7ad4ebef38b8e6eea9b6091844ba7294ec870271 (tag: 0.0.1)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 16:18:40 2012 -0600

Tweaks to get BLIS compiling again on clarksville.

Details:
- Updated header files and make_defs.mk in config/clarksville.
- Fixes to bl2_mem.c (now that SMEM_M, SMEM_N are gone).
- Moved definition of blksz_t from bl2_cntl.h to bl2_type_defs.h.
- Shuffled include statements in blis2.h.

commit cc58ea86010b1f046134d13b546c878389df9af5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 10 14:55:12 2012 -0600

Added template fragment.mk; updated .gitignore.

commit 714c527b0eb153b7e2040b79349edc8372f743fd
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 19:54:04 2012 -0600

Added 'changelog' make target; other tweaks.

Details:
- Updated CHANGELOG.
- Added 'changelog' target to Makefile that runs 'git log --decorate' and
  overwrites CHANGELOG with the output.
- Other trivial changes.

commit e4e5404d26aded4873278e85faf6f14ac32115b5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:34:53 2012 -0600

Define static memory pool size in bl2_config.h.

commit 19bb507d0de6a2bd3ce37cf616bdcd6b419ed641
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Fri Dec 7 17:18:00 2012 -0600

Refined INSTALL text; added 'showconfig' target.

Details:
- Added 'showconfig' target to Makefile.
- Added header files and ./config/<configname>/make_defs.mk as prerequisites
  to object file rules.
- Added config.mk as prerequisite to library install rules.
- Edited and added to INSTALL file.

commit 26cb659dd79636489db5a051aa60fff80273a7b9
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 15:34:53 2012 -0600

Added auto-detection of version string (via git).

Details:
- Added build/update-version-file.sh script for auto-detecting "version"
  string and updating 'version' file accordingly. (If .git directory is
  not present, then it is assumed this copy of BLIS is a downloaded
  release, in which case 'version' file is left unchanged.)
- Added invocation of update-version-file.sh to configure script.

commit b0ecd0ff52fa6ffc9e1d9eb44c365f7f009a6204
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 14:27:11 2012 -0600

Wrote first draft of INSTALL file.

commit bcbe81235a35ccfdbcc2f2319a0ca6e04f75a785 (tag: 0.0.0)
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Thu Dec 6 12:42:35 2012 -0600

Updated standalone test Makefile and other fixes.

Details:
- Major edits to test/Makefile to bring up-to-date wrt new build system;
  should no longer be broken.
- Minor edits to top-level Makefile.
- Fixed copy-and-paste bugs in
  - frame/1m/packm/ukernels/bl2_packm_ref_?xk.c
  - frame/1m/unpackm/ukernels/bl2_unpackm_ref_?xk.c

commit 2f272b40f43307909736327f49d17737c7a05d37
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Tue Dec 4 19:22:14 2012 -0600

Added build system and continued reorganization.

Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).

commit 00f3498a8943be1b387f0d5c029c8c7891687ad5
Author: Field G. Van Zee <field@cs.utexas.edu>
Date: Mon Dec 3 12:36:11 2012 -0600

Initial commit.