CHANGELOG update (for 0.1.0).

2026-05-11 17:50:00 +00:00 · 2013-11-11 10:15:40 -06:00
parent 089048d589
commit 1a4d698f42
1 changed file with 716 additions and 1 deletion
--- a/717
+++ b/717
@@ -1,4 +1,719 @@
-commit 0680916fdd532f7a4716b11a2515243b2c08d00f (HEAD, tag: 0.0.9, origin/master, origin/HEAD, master)
+commit 089048d5895a30221b6b1976c9be93ad6443420d (HEAD, tag: 0.1.0, origin/master, master)
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Sat Nov 9 17:18:00 2013 -0600
+
+    Added object wrappers to 1f test suite modules.
+    
+    Details:
+    - Added missing object wrappers to level-1f test suite modules. This was
+      only apparent if you were configuring with something other than the
+      reference configuration.
+    - Commented out object wrappers in level-1f front-ends. These were not
+      working as intended unless the reference configuration was selected,
+      because most kernel sets, such as those in the template set, do not
+      have object wrappers.
+    - Whitespace changes to template micro-kernels.
+    - Comment changes to template level-1f kernel headers.
+
+commit 9ef3752079de10124bed906b5d28479d04aa8187
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Nov 8 17:20:47 2013 -0600
+
+    Updated template kernels wrt KernelsHowTo wiki.
+    
+    Details:
+    - Merged latest state of KernelsHowTo wiki into template micro-kernels
+      located in config/template/kernels/3.
+
+commit 376bbb59c8944e29c5c1ff6637920d8451370afa
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Nov 8 11:17:34 2013 -0600
+
+    Removed support for duplication.
+    
+    Details:
+    - Removed support for duplication from the gemmtrsm/trsm micro-kernels
+      and all framework code.
+    - Updated test suite modules according to above changes.
+
+commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Nov 7 11:36:11 2013 -0600
+
+    Added comments to testsuite/input.operations.
+    
+    Details:
+    - Added extensive comments to the top of testsuite/input.operations,
+      which describe how to edit the file.
+    - Removed input.operations.0 and input.operations.1.
+    - Changed input.general to test all datatypes ("sdcz") by default.
+
+commit a98f78b715fb256a519870071bb5266130d70b21
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Nov 6 15:32:47 2013 -0600
+
+    Changed dim_t and inc_t to be signed integers.
+    
+    Details:
+    - Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
+      This will facilitate interoperability with Fortran in the future.
+      (Fortran does not support unsigned integers.)
+    - Redefined many instances of stride-related macros so that they return
+      or use the absolute value of the strides, rather than the raw strides
+      which may now be signed. Added new macros bli_is_row_stored_f() and
+      bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
+      and changed the packm_blk_var[23] variants to use these macros instead
+      of the existing bli_is_row_stored(), bli_is_col_stored().
+    - Added/adjusted typecasting to various functions/macros, including
+      bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
+      related macros in bli_param_macro_defs.h.
+    - Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
+      layer properly handles situations where vector increments are negative.
+      Thanks to Vladimir Sukharev for pointing out this issue.
+    - Changed type of increment parameters in bli_adjust_strides() from dim_t
+      to inc_t. Likewise in bli_check_matrix_strides().
+    - Defined bli_check_matrix_object(), which checks for negative strides.
+    - Redefined bli_check_scalar_object() and bli_check_vector_object() so
+      that they also check for negative stride.
+    - Added instances of bli_check_matrix_object() to various operations'
+      _check routines.
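For reference, this is the BLAS rule that a negative-increment-aware bli_convert_blas_incv() has to honor: when an increment is negative, the reference BLAS treats the first logical element as the one at the far end of the array and walks toward the beginning. The following is a minimal sketch of that convention only; `blas_first_index()` and `blas_gather()` are hypothetical helpers, not BLIS functions.

```c
/* Hypothetical helpers (not BLIS's bli_convert_blas_incv()) illustrating
   the reference BLAS convention for negative vector increments. */

/* 0-based array index of the first logical element of an n-element
   vector with increment incx. For incx < 0 the vector starts at the far
   end of the array and is traversed toward the beginning. */
long blas_first_index(long n, long incx)
{
    return (incx < 0) ? (1 - n) * incx : 0;
}

/* Copy the n logical elements of x (increment incx) into contiguous y. */
void blas_gather(long n, const double *x, long incx, double *y)
{
    long ix = blas_first_index(n, incx);
    for (long i = 0; i < n; ++i, ix += incx)
        y[i] = x[ix];
}
```

For example, with x = {1, 10, 2, 20, 3}, n = 3, and incx = -2, blas_gather() produces {3, 2, 1}; with incx = 2 it produces {1, 2, 3}.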
+
+commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Nov 6 10:09:10 2013 -0600
+
+    Minor comment update to BLAS compat files.
+
+commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Nov 4 15:50:00 2013 -0600
+
+    Fixed bugs in scalv and setv.
+    
+    Details:
+    - Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
+      a segmentation fault may occur if beta is not the same type as
+      the vector operand for scalv and setv.
+    - Changed axpyv and scal2v front-ends in a similar fashion.
+
+commit f5953259a1842ee48e5833c22ac86e68a337bfe1
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Nov 4 14:43:55 2013 -0600
+
+    Fixed a bug related to Hermitian matrix diagonals.
+    
+    Details:
+    - Fixed a bug whereby BLIS assumed that the imaginary components of the
+      diagonal elements of Hermitian matrices were already zero. This property
+      is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
+      to Vladimir Sukharev for reporting this bug.
+    - Minor comment updates to template kernels.
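The Hermitian-diagonal fix can be pictured with a simplified packing loop. This is a hypothetical sketch, not bli_packm_blk_var2: it packs a whole matrix rather than micro-panels, assumes column-major storage with only the lower triangle referenced, and uses its own complex type.

```c
typedef struct { double real; double imag; } dcomplex_sk_t;

/* Hypothetical sketch: pack an m x m Hermitian matrix, stored column-major
   in its lower triangle, into a full m x m column-major buffer p. The
   upper triangle is filled with conjugates, and the imaginary part of each
   diagonal element is forced to zero instead of being trusted, since a
   Hermitian matrix has a real diagonal. */
void pack_herm_lower_sk(long m, const dcomplex_sk_t *a, long lda,
                        dcomplex_sk_t *p)
{
    for (long j = 0; j < m; ++j) {
        for (long i = j; i < m; ++i) {
            dcomplex_sk_t v = a[i + j * lda];
            if (i == j) {
                v.imag = 0.0;                /* enforce a real diagonal */
                p[i + j * m] = v;
            } else {
                p[i + j * m] = v;            /* lower element (i, j) */
                p[j + i * m].real = v.real;  /* upper (j, i) = conj(i, j) */
                p[j + i * m].imag = -v.imag;
            }
        }
    }
}
```

Whatever the caller left in the imaginary part of a diagonal element (here it could be garbage from a previous computation), the packed copy is exactly Hermitian.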
+
+commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Sat Nov 2 17:19:40 2013 -0500
+
+    Added scaling to abval2s, sqrt2s macros.
+    
+    Details:
+    - Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
+      and overflow from squaring the real and imaginary components. (This is
+      the same technique used to fix recent bugs in invscals/invscaljs and
+      inverts.)
+
+commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Nov 1 10:28:04 2013 -0500
+
+    Added new dotxaxpyf variant 2.
+    
+    Details:
+    - Added a new variant for dotxaxpyf that is based on dotxf and axpyf
+      kernels. By default, this variant is not used by any other operation.
+
+commit 97f89fbcf202d72fc440b614708e352ea31633e2
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Nov 1 10:16:39 2013 -0500
+
+    Fixed bug in complex invscals.
+    
+    Details:
+    - Fixed complex inversion in invscals and invscaljs whereby the
+      imaginary component was being computed incorrectly.
+    - Used bli_fmaxabs() instead of bli_fabs() when choosing the scalar
+      in inverts, invscals, and invscaljs.
+    - Changed bli_abs() and bli_fabs() macro definitions to use "<="
+      operator instead of "<".
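The invscals/inverts fixes rest on choosing a scaling factor from the larger magnitude component before forming any products. A minimal sketch of a safe complex division x / y under that approach follows; `zsketch_t` and `safe_zdiv()` are hypothetical names, and this is not the BLIS macro text.

```c
#include <math.h>

typedef struct { double real; double imag; } zsketch_t;

/* Minimal sketch of an overflow/underflow-safe complex division q = x / y
   (hypothetical helper). y is scaled by s = max(|yr|, |yi|) before any
   products are formed, so |y|^2 is never computed directly.
   Assumes y != 0. */
zsketch_t safe_zdiv(zsketch_t x, zsketch_t y)
{
    double ayr = fabs(y.real);
    double ayi = fabs(y.imag);
    double s   = (ayr > ayi) ? ayr : ayi;

    double yr = y.real / s;                 /* |yr| <= 1 */
    double yi = y.imag / s;                 /* |yi| <= 1 */
    double d  = yr * y.real + yi * y.imag;  /* equals |y|^2 / s */

    /* Both components are computed from the unmodified inputs. */
    zsketch_t q;
    q.real = (x.real * yr + x.imag * yi) / d;
    q.imag = (x.imag * yr - x.real * yi) / d;
    return q;
}
```

A reciprocal is the special case x = 1 + 0i. For y = 1e300 + 1e300i, a naive formula that first forms yr*yr + yi*yi overflows to infinity and returns zero, whereas this version returns roughly 5e-301 - 5e-301i.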
+
+commit eda42a21d17a2742eab69ab801ed530b82488c8a
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 31 18:00:44 2013 -0500
+
+    Defined missing symbols in bla_rotg.c
+    
+    Details:
+    - Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
+      z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
+      d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
+      these bugs.
+
+commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Oct 30 14:39:01 2013 -0500
+
+    Fixed bugs in scalm and setm.
+    
+    Details:
+    - Fixed bugs in scalm and setm that resulted in segmentation faults when
+      beta is not the same type as the matrix operand. Thanks to Vladimir
+      Sukharev for reporting this bug.
+    - Changed axpym and scal2m front-ends in a fashion similar to that of
+      scalm and setm; namely, the alpha scalar is copy-cast to the type of
+      the first matrix operand.
+    - Changed the template and reference configurations' bli_config.h files
+      so that the number of memory allocator blocks of A and B are set based
+      on BLIS_MAX_NUM_THREADS.
+    - Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
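The copy-cast pattern behind the scalm/setm fix can be sketched as follows: a scalar carries its own datatype, and the front-end converts it into a local variable of the operand's type before the type-specific loop runs. Reinterpreting a 4-byte float buffer as an 8-byte double would read past the end of the object, which is the kind of mismatch that can produce a segmentation fault. All type and function names here are hypothetical, not BLIS's obj_t API.

```c
/* Hypothetical scalar object that records its own datatype. */
typedef enum { SK_FLOAT, SK_DOUBLE } sk_dt_t;

typedef struct {
    sk_dt_t     dt;
    const void *buf;
} sk_scalar_t;

/* Copy-cast a scalar of any supported type to double. */
double sk_copycast_to_double(const sk_scalar_t *s)
{
    if (s->dt == SK_FLOAT)
        return (double)*(const float *)s->buf;
    return *(const double *)s->buf;
}

/* Scale a double-precision vector by beta, whatever beta's stored type. */
void sk_dscalv(long n, const sk_scalar_t *beta, double *x, long incx)
{
    double beta_d = sk_copycast_to_double(beta); /* never cast buf to double* */
    for (long i = 0; i < n; ++i)
        x[i * incx] *= beta_d;
}
```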
+
+commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 24 14:32:20 2013 -0500
+
+    Fixed over/under-flow in complex inversion.
+    
+    Details:
+    - Fixed the complex bli_?inverts() macros, which were inverting elements
+      in an "unsafe" manner, such that very large and very small values were
+      unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
+      reporting this bug.
+    - Comment update to bli_sumsqv_unb_var1.c.
+    - Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
+    - Changed 1.0F to 1.0 for bli_drands() macro.
+
+commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Oct 23 12:15:25 2013 -0500
+
+    Fixed parameter checking issue in BLAS syr[2]k.
+    
+    Details:
+    - Fixed a minor parameter checking bug in the BLAS compatibility layer
+      for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
+      trans parameter of either operation, it is (a) allowed, and (b) treated
+      as 'T' (whereas previously it was disallowed). Thanks to Vladimir
+      Sukharev for finding and reporting this bug.
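The syrk/syr2k fix amounts to normalizing the trans argument for real data, where a conjugate transpose is the same as a transpose (the reference BLAS documentation for dsyrk likewise lists 'C' as equivalent to 'T'). A minimal sketch; `normalize_real_trans()` is a hypothetical helper, not part of BLIS's compatibility layer.

```c
#include <ctype.h>

/* Hypothetical sketch of how a BLAS-compatible wrapper for the real-domain
   ?syrk/?syr2k routines could normalize its trans argument. Matching is
   case-insensitive, as in the reference BLAS. Returns 0 for an illegal
   value. */
char normalize_real_trans(char trans)
{
    switch (toupper((unsigned char)trans)) {
        case 'N': return 'N';
        case 'T': return 'T';
        case 'C': return 'T'; /* conj-transpose == transpose for real data */
        default:  return 0;   /* caller would report an illegal argument */
    }
}
```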
+
+commit a091a219bda55e56817acd4930c2aa4472e53ba5
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Oct 14 10:11:29 2013 -0500
+
+    Minor fixes to piledriver configuration, ukernel.
+    
+    Details:
+    - Applied a patch from Tyler that fixes minor staleness in the piledriver
+      configuration and gemm micro-kernel.
+    - Very minor changes to test suite input files.
+
+commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Oct 11 11:37:19 2013 -0500
+
+    Added Fran's Sandy Bridge kernels/configuration.
+    
+    Details:
+    - Added a kernel directory for kernels developed by Francisco Igual for
+      the Sandy Bridge architecture, including a dgemm ukernel coded with
+      AVX intrinsics.
+    - Added a configuration for Sandy Bridge using values supplied by Fran.
+
+commit 03106d650e4030d4c9831683448376f92fc52d41
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Oct 11 10:40:38 2013 -0500
+
+    Fixed minor perf bug in gemm_ker_var2.
+    
+    Details:
+    - Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
+      bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next were not
+      computed correctly (i.e., did not wrap around) at the edge cases. Thanks to
+      Tze Meng for helping me identify this bug.
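The wraparound for a_next/b_next can be expressed as an index computation. This sketch assumes a macro-kernel whose outer loop walks n_iter panels of B (index j) and whose inner loop walks m_iter panels of A (index i); `next_panels()` is a hypothetical helper, not code from bli_gemm_ker_var2.c.

```c
/* Hypothetical sketch of choosing the "next" micro-panels that a gemm
   macro-kernel passes to the micro-kernel as prefetch hints. At the edges
   the indices wrap around: after the last A panel, the next A panel is the
   first one, paired with the next B panel; after the last B panel, B also
   wraps to its first panel. */
void next_panels(long i, long j, long m_iter, long n_iter,
                 long *i_next, long *j_next)
{
    if (i + 1 < m_iter) {
        *i_next = i + 1;  /* interior: next A panel, same B panel */
        *j_next = j;
    } else {
        *i_next = 0;      /* wrap A back to its first panel */
        *j_next = (j + 1 < n_iter) ? j + 1 : 0; /* advance or wrap B */
    }
}
```

The prefetch addresses would then be formed from these indices, e.g. a_next = a + i_next * ps_a for some panel stride ps_a (a hypothetical name).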
+
+commit b053337387dbdef9035be03538222670a21707ca
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 10 18:26:55 2013 -0500
+
+    Added fusing factors, MR/NR to test suite output.
+    
+    Details:
+    - Updated the test suite driver (and modules where appropriate) so that
+      the level-1f fusing factors are output along with the variable dimension.
+      While this is not strictly necessary, since the fusing factors are output
+      in the initial parameter summary, it offers extra reassurance to the user
+      since the fusing factors appear alongside the variable dimension, which
+      together give a complete picture of the problem size. Similar changes were
+      made for outputting the register blocksizes when reporting results for the
+      micro-kernel test modules.
+
+commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 10 14:20:06 2013 -0500
+
+    Added test suite modules for level-1f, 3 kernels.
+    
+    Details:
+    - Added test modules in test suite for level-1f kernels and level-3
+      micro-kernels. (Duplication in the micro-kernels, for now, is NOT
+      supported by these test modules.)
+    - Added section override switches to test suite's input.operations file.
+    - Added obj_t APIs for level-1f front-ends and their unblocked variants to
+      facilitate the level-1f test modules. Also added front-end for dupl
+      operation.
+    - Added obj_t-based check routines for level-1f operations, which are
+      called from the new front-ends mentioned above.
+    - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
+      factors as a function of datatype, which is needed by their respective
+      test modules.
+    - Whitespace changes to bli_kernel.h of all existing configurations.
+
+commit 680188d46bb15b9a1a2867638104939dc77ca2a1
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 10 13:23:37 2013 -0500
+
+    Cleaned up old test drivers.
+    
+    Details:
+    - Minor updates to old test drivers in preparation for our participation
+      in ACM TOMS's replicated results initiative.
+
+commit 3690bdd4f95769c935c410414112102cc3e108b1
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 10 11:45:33 2013 -0500
+
+    More updates to level-1f kernels for core2-sse3.
+    
+    Details:
+    - Changed types in function signatures to match new prototypes. Meant to
+      include this in previous commit.
+
+commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Oct 10 11:27:27 2013 -0500
+
+    Fixed outdated fusing factor macros in 1f kernels.
+    
+    Details:
+    - Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
+      macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
+      this out.
+
+commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Oct 1 17:01:18 2013 -0500
+
+    Added section overrides to test suite.
+    
+    Details:
+    - Added new lines of input to the test suite's input.operations file, which
+      allows the user to disable entire sections (levels) of tests. Before this
+      change, the user had to manually disable each operation test's "master
+      switch". (This is why input.operations.0 existed: to allow a more
+      convenient starting point for someone who only wanted to test one or a
+      few operations.)
+
+commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Sep 30 12:58:18 2013 -0500
+
+    Added template implementations and other tweaks.
+    
+    Details:
+    - Added a 'template' configuration, which contains stub implementations of the
+      level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
+      lots of in-file comments and documentation.
+    - Modified some variable/parameter names for some 1/1f operations. (e.g.
+      renaming vector length parameter from m to n.)
+    - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
+      to bli_kernel.h.
+    - Modified test suite to print out fusing factors for axpyf, dotxf, and
+      dotxaxpyf, as well as the default fusing factor (which are all equal
+      in the reference and template implementations).
+    - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
+      reference variants were implemented in terms of front-end routines rather
+      than directly in terms of the kernels. (For example, axpy2v was implemented
+      as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
+    - Changed the interface to dotxf so that it matches that of axpyf, in that
+      A is assumed to be m x b_n in both cases, and for dotxf A is actually used
+      as A^T.
+    - Minor variable naming and comment changes to reference micro-kernels in
+      frame/3/gemm/ukernels and frame/3/trsm/ukernels.
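The shared shape convention for axpyf and dotxf (A is m x b_n in both, and dotxf uses it as A^T) can be seen in two plain reference loops. This is a hypothetical real-domain sketch with column-major storage and no conjugation parameters; the `_sketch` functions are not BLIS routines.

```c
/* Hypothetical real-domain sketches. In both, A is m x b_n, column-major,
   with leading dimension lda.
   axpyf: y (length m)   += alpha * A * x (x of length b_n)
   dotxf: y (length b_n)  = beta * y + alpha * A^T * x (x of length m) */
void axpyf_sketch(long m, long b_n, double alpha, const double *a, long lda,
                  const double *x, double *y)
{
    for (long j = 0; j < b_n; ++j) {
        double ax = alpha * x[j];
        for (long i = 0; i < m; ++i)
            y[i] += ax * a[i + j * lda];
    }
}

void dotxf_sketch(long m, long b_n, double alpha, const double *a, long lda,
                  const double *x, double beta, double *y)
{
    for (long j = 0; j < b_n; ++j) {
        double dot = 0.0;
        for (long i = 0; i < m; ++i)
            dot += a[i + j * lda] * x[i];  /* column j of A dotted with x */
        y[j] = beta * y[j] + alpha * dot;
    }
}
```

Because both functions read A the same way, a fused kernel such as dotxaxpyf can stream each column of A once and use it for both the dot product and the axpy update.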
+
+commit 97aaf220a847363b4da35935eca17790c0ef71f6
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Sep 17 10:51:36 2013 -0500
+
+    Added new kernels, configurations.
+    
+    Details:
+    - Added various micro-kernels for the following architectures:
+        Intel MIC
+        IBM BG/Q
+        IBM Power7
+        AMD Piledriver
+        Loongson 3A
+      and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
+      and Xianyi Zhang for contributing these kernels.
+    - Added configurations corresponding to above architectures, and renamed
+      "clarksville" configuration to "dunnington".
+
+commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Sep 13 14:31:53 2013 -0500
+
+    Removed default configuration behavior.
+    
+    Details:
+    - Changed the configure script so that it no longer defaults to the
+      reference configuration. This change is being made so that the
+      developer has a firm awareness of which configuration is being used
+      to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
+      suggested change.
+
+commit da77e9614f54f92f703f01e3b9bd67a83280150c
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Sep 13 12:00:37 2013 -0500
+
+    Minor improvements to static memory allocator.
+    
+    Details:
+    - Expanded on cpp macro definitions from bli_mem.c and relocated them to
+      a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
+      functionality includes computing the pool size for each datatype (using
+      that datatype's cache blocksizes) and using the maximum to size the
+      actual pool array. This addresses the somewhat common pitfall whereby a
+      developer updates cache blocksizes in bli_kernel.h for only one datatype
+      (say, single-precision real), while the memory pools are sized using the
+      double-precision real values. Then, when the developer attempts to link
+      to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
+      a message saying the static memory pool was exhausted. Clearly, this
+      message is misleading when the pool was not sized properly to begin with.
+    - Removed previously disabled code in bli_kernel_macro_defs.h that was
+      meant to check for size consistency among the various cache blocksizes.
+      (Obviously the memory pool size-based solution mentioned above is better.)
+    - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
+      reasonable place to put these constants, rather than further crowding
+      bli_config.h.
+    - Updated testsuite driver to output memory pool sizes for A, B, and C.
+    - Minor comment updates to bli_config.h.
+    - Removed 'flame' configuration. It was beginning to get out-of-date, and
+      I hadn't used it in months. We can always re-create it later.
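The pool-sizing idea can be sketched with preprocessor arithmetic: compute the bytes that one packed block of A needs for each datatype, then size the pool by the largest. The blocksize values and macro names are made up for illustration and are not the definitions in bli_mem_pool_macro_defs.h.

```c
#include <stddef.h>

/* Illustrative MC x KC cache blocksizes per datatype (made-up values). */
#define SK_MC_S 256
#define SK_KC_S 512
#define SK_MC_D 128
#define SK_KC_D 256
#define SK_MC_C 128
#define SK_KC_C 256
#define SK_MC_Z  64
#define SK_KC_Z 256

#define SK_MAX(a, b) ((a) > (b) ? (a) : (b))

/* Bytes needed for one packed MC x KC block of A. */
#define SK_BLOCK_A_BYTES(mc, kc, elem_size) \
    ((size_t)(mc) * (size_t)(kc) * (size_t)(elem_size))

/* Size each pool block for the largest requirement over all datatypes. */
#define SK_POOL_BLOCK_A_BYTES \
    SK_MAX(SK_MAX(SK_BLOCK_A_BYTES(SK_MC_S, SK_KC_S, sizeof(float)),     \
                  SK_BLOCK_A_BYTES(SK_MC_D, SK_KC_D, sizeof(double))),   \
           SK_MAX(SK_BLOCK_A_BYTES(SK_MC_C, SK_KC_C, 2 * sizeof(float)), \
                  SK_BLOCK_A_BYTES(SK_MC_Z, SK_KC_Z, 2 * sizeof(double))))

size_t sk_pool_block_a_bytes(void)
{
    return SK_POOL_BLOCK_A_BYTES;
}
```

With these illustrative numbers, double precision needs 262144 bytes per block while the enlarged single-precision blocksizes need 524288, so sizing from the double-precision values alone would leave single-precision packing short.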
+
+commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Sep 10 17:17:28 2013 -0500
+
+    Added ESSL and Accelerate targets to test drivers.
+    
+    Details:
+    - Added ESSL and Accelerate (OS X) targets to standalone test drivers'
+      Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
+      / providing this patch.
+
+commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Sep 10 16:35:12 2013 -0500
+
+    Various changes to treatment of integers.
+    
+    Details:
+    - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
+      assigned values of 32, 64, or some other value. The former two result in
+      defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
+      causes integers to be defined in terms of a default type (e.g. long int).
+    - Updated bli_config.h in reference and clarksville configurations according
+      to above changes.
+    - Updated test drivers in test and testsuite to avoid type warnings associated
+      with format specifiers not matching the types of their arguments to printf()
+      and scanf().
+    - Inserted missing #include "bli_system.h" into blis.h (which was slated for
+      inclusion in d141f9eeb6d1).
+    - Added explicit typecasting of dim_t and inc_t to macros in
+      bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
+    - Slight changes to CREDITS and INSTALL files.
+    - Slight tweaks to Windows build system, mostly in the form of switching to
+      Windows-style CRLF newlines for certain files.
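Selecting integer types from a size macro such as BLIS_INT_TYPE_SIZE can be sketched with a small preprocessor chain. This is a hypothetical sketch: the default of 64 and the `_sketch` type names are assumptions for illustration, not the contents of BLIS's bli_config.h or bli_type_defs.h.

```c
#include <stdint.h>

/* Hypothetical sketch: choose the general integer types from a
   configuration macro. The default of 64 is an assumption. */
#ifndef BLIS_INT_TYPE_SIZE
#define BLIS_INT_TYPE_SIZE 64
#endif

#if BLIS_INT_TYPE_SIZE == 32
typedef int32_t  gint_sketch_t;
typedef uint32_t guint_sketch_t;
#elif BLIS_INT_TYPE_SIZE == 64
typedef int64_t  gint_sketch_t;
typedef uint64_t guint_sketch_t;
#else
/* Any other value falls back to a default C type. */
typedef long int          gint_sketch_t;
typedef unsigned long int guint_sketch_t;
#endif
```

Passing -DBLIS_INT_TYPE_SIZE=32 on the compiler command line would select the 32-bit types instead.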
+
+commit 068437736b41d51a1f5ec47839f059bf58a20413
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Sep 9 14:07:58 2013 -0500
+
+    Fixed set-but-not-used compiler (gcc) warnings.
+    
+    Details:
+    - Used void-casts of certain variables to appease gcc (and perhaps other
+      compilers) when such variables are only used in the complex instances of
+      the functions. Special thanks to Karl Rupp for suggesting a portable fix
+      for these warnings.
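The void-cast idiom looks like this in a minimal, hypothetical example. Suppose a function body is generated by a macro for every datatype, and a local such as `conjx` only matters in the complex instances. In the real instance it is set but never read, which gcc reports under -Wunused-but-set-variable (enabled by -Wall). Casting the variable to void is a standard C expression that compilers generally treat as a use.

```c
/* Hypothetical real-domain instance of a macro-generated function. */
double copyv_real_elem_sketch(double x, int do_conj)
{
    int conjx = do_conj; /* consulted only by the complex instances */
    (void)conjx;         /* portable "intentionally unused" marker */
    return x;            /* conjugating a real value is a no-op */
}
```

Unlike compiler-specific attributes, the cast needs no preprocessor guards, which is what makes it portable.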
+
+commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Sep 9 13:48:52 2013 -0500
+
+    Small fix to Windows defs.mk makefile fragment.
+    
+    Details:
+    - Commented out a !include statement that was attempting to include a
+      version file that does not yet exist. For now, the version string is
+      hard-coded into defs.mk.
+
+commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Sep 9 13:09:16 2013 -0500
+
+    Added Windows build system.
+    
+    Details:
+    - Added a 'windows' directory, which contains a Windows build system
+      similar to that of libflame's. Thanks to Martin for getting this up
+      and running.
+    - Spun off system header #includes into bli_system.h, which is included
+      in blis.h.
+    - Added a Windows section to bli_clock.c (similar to libflame's).
+
+commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Sep 9 11:04:46 2013 -0500
+
+    Edited bli_?lamch.c to avoid Windows keyword.
+    
+    Details:
+    - Renamed "small" variable to "smnum" to avoid collision with Windows type
+      by the same name. This change is needed in advance of the upcoming Windows
+      build system.
+
+commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Sep 4 13:36:07 2013 -0500
+
+    Switched integer typedefs (again) to C types.
+    
+    Details:
+    - Redefined gint_t and guint_t in terms of the standard C types long int
+      and unsigned long int, respectively.
+    - Changed testsuite default max problem size to 500.
+    - Changed testsuite input.operations to use square problems for level-3
+      operation tests.
+
+commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Sep 4 12:09:11 2013 -0500
+
+    Falling back to 32-bit integers for dim_t, etc.
+    
+    Details:
+    - In light of recent segfaulting issues when compiling on 32-bit systems,
+      I've changed the default typedef for gint_t and guint_t from int64_t and
+      uint64_t to int32_t and uint32_t, respectively.
+    - Disabled 64-bit integers in the blas2blis layer for the reference
+      configuration.
+    - Added type sizes of gint_t, guint_t, and the four floating-point datatypes
+      to introductory output of the testsuite.
+
+commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Sep 3 21:58:07 2013 -0500
+
+    Applied temp fix to typecasting bug in testsuite.
+    
+    Details:
+    - Applied a temporary fix to the typecasting bug in the testsuite driver.
+      The fix involves casting both numerator and denominator to unsigned long.
+      This fix is more voodoo than science, as I can't be sure why it even
+      works.
+
+commit 9ee6e125373869c4213c017ce772c38ecefba103
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Sep 3 21:53:27 2013 -0500
+
+    Changed dimension spec for gemm in testsuite.
+    
+    Details:
+    - Encountered a bizarre typecasting bug whereby the test suite was not
+      computing the proper dimension from the problem size and dimension
+      specification when the latter was set to -3. Will investigate.
+      Thanks to Fran for finding this "bug".
+
+commit e8be081e68c385ab44d0fea8dade21d40c200b79
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Aug 28 15:52:34 2013 -0500
+
+    Generalized matlab and file output in testsuite.
+    
+    Details:
+    - Added a new option in input.general that enables output in
+      matlab/octave format independently of output to files.
+    - Adjusted input.operations according to above.
+    - Added input.operations.0 and input.operations.1 with all options
+      disabled and enabled, respectively.
+
+commit d352c746e5683037d41b5061dfb5ce08e1d0843b
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Aug 27 13:41:46 2013 -0500
+
+    Added single/real gemm micro-kernel for x86_64.
+    
+    Details:
+    - Added a single-precision real gemm micro-kernel in
+      kernels/x86_64/3/bli_gemm_opt_d4x4.c.
+    - Adjusted the single-precision real register blocksizes in
+      config/clarksville/bli_kernel.h to be 8x4.
+    - Added a missing comment to bli_packm_blk_var2.c that was present in
+      bli_packm_blk_var3.c.
+
+commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Aug 19 12:07:41 2013 -0500
+
+    Fixed bug in bli_acquire_mpart_t2b(), _l2r().
+    
+    Details:
+    - Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
+      that caused incorrect partitioning when SUBPART0 was requested. This
+      bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
+      this bug.
+    - Removed dupl kernels from kernels/x86_64/3 directory.
+    - Uncommented beta == 0 optimization code in
+      kernels/x86_64/3/bli_gemm_opt_d4x4.c.
+
+commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Aug 8 14:39:35 2013 -0500
+
+    Moved init_safe(), finalize_safe() to BLAS compat.
+    
+    Details:
+    - Moved the bli_init_safe() and bli_finalize_safe() function calls from the
+      BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
+      initializers in the BLIS layer wasn't buying us anything because the user
+      could still call the library with uninitialized global scalar constants,
+      for example. Thus, we will just have to live with the constraint that
+      bli_init() MUST be called before calling ANY routine with a bli_ prefix.
+    - Added the missing bli_init_safe() and bli_finalize_safe() calls to the
+      level-1 BLAS compatibility wrappers.
+
+commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Aug 8 13:30:19 2013 -0500
+
+    Miscellaneous updates.
+    
+    Details:
+    - Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
+      BLIS_CACHE_LINE_SIZE (typically 64).
+    - Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
+      kernels.
+    - Reformulated gemm_ker_var2 to look more like the other level-3 macro-
+      kernels, in that the interior and edge-case handling is expressed once
+      inside the loops in the n and m dimensions, rather than the edge-case
+      handling being "unrolled" and expressed as distinct code regions. The
+      previous macro-kernel now lives in retired form in the subdirectory
+      other/bli_gemm_ker_var2.c.old.
+    - Updated experimental gemm_ker_var5 according to above change.
+    - Fixed bug in bli_her2k.c whereby incorrect transformations were being
+      applied to optimize the macro-kernel access pattern on C when C is
+      row-stored.
+    - Various updates inside of test/exec_sizes.
+
+commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Wed Aug 7 12:27:04 2013 -0500
+
+    Fixed bug in interface of bla_ger_check().
+    
+    Details:
+    - Fixed the misplaced lda parameter in the function signature of
+      bla_ger_check(). Thanks to Tyler for finding this bug.
+
+commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Tue Aug 6 12:25:51 2013 -0500
+
+    Fixed cpp guard typos in frame/compat/check files.
+    
+    Details:
+    - Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
+      BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
+    - Fixed various syntax errors in the code that had yet to be compiled
+      due to the aforementioned bug.
+
+commit f4ec28e723d28d998f1038f82da6986e44320ef6
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Thu Aug 1 11:24:23 2013 -0500
+
+    Added basic OpenMP-based gemm and packm files.
+    
+    Details:
+    - Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
+      into the following auxiliary files:
+    
+        frame/1m/packm/other/bli_packm_blk_var2.c
+        frame/3/gemm/other/bli_gemm_ker_var2.c
+    
+      The routine in the first file uses a basic OpenMP parallel region to
+      parallelize the packing of blocks of A and panels of B, while the
+      second uses a similar parallel region to parallelize along the n
+      dimension of the gemm macro-kernel.
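Parallelizing along the n dimension can be sketched with a single OpenMP loop over column panels of C. This is a minimal illustration, not Tyler's code: it omits packing and micro-kernels, uses column-major storage, and `gemm_par_n_sketch()` is a hypothetical name. Each thread updates a disjoint set of columns of C, so no synchronization is needed; without -fopenmp the pragma is ignored and the loop runs serially.

```c
/* Hypothetical sketch: C += A * B, with A m x k, B k x n, C m x n, all
   column-major. The loop over nr-wide column panels is split across
   OpenMP threads. */
void gemm_par_n_sketch(long m, long n, long k,
                       const double *a, long lda,
                       const double *b, long ldb,
                       double *c, long ldc, long nr)
{
    long n_panels = (n + nr - 1) / nr;

    #pragma omp parallel for
    for (long jp = 0; jp < n_panels; ++jp) {
        long j0 = jp * nr;
        long j1 = (j0 + nr < n) ? j0 + nr : n; /* edge panel may be narrower */
        for (long j = j0; j < j1; ++j)
            for (long p = 0; p < k; ++p) {
                double bpj = b[p + j * ldb];
                for (long i = 0; i < m; ++i)
                    c[i + j * ldc] += a[i + p * lda] * bpj;
            }
    }
}
```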
+
+commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
+Merge: 67a8b94 6e7e452
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Jul 26 11:14:27 2013 -0500
+
+    Merge branch 'master' of https://code.google.com/p/blis
+
+commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Jul 26 11:12:37 2013 -0500
+
+    Added missing cpp kernel blocksize constraints.
+    
+    Details:
+    - Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
+      constraints on the register blocksizes relative to the cache blocksizes.
+      Thanks to Tyler for helping me stumble across this issue.
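Compile-time blocksize guards of this kind can be written with #if and #error. This sketch assumes the constraint is that each cache blocksize be a whole multiple of the matching register blocksize; the macro names and values are hypothetical, not BLIS's bli_kernel.h settings.

```c
/* Illustrative double-precision blocksizes (hypothetical names/values). */
#define SK_DEFAULT_MC_D 384
#define SK_DEFAULT_NC_D 4096
#define SK_DEFAULT_MR_D 4
#define SK_DEFAULT_NR_D 4

/* Refuse to build if a cache blocksize is not a whole multiple of the
   corresponding register blocksize. */
#if ( SK_DEFAULT_MC_D % SK_DEFAULT_MR_D ) != 0
  #error "MC must be a whole multiple of MR (double precision)."
#endif
#if ( SK_DEFAULT_NC_D % SK_DEFAULT_NR_D ) != 0
  #error "NC must be a whole multiple of NR (double precision)."
#endif

/* Number of MR-row micro-panels in one MC x KC block of A. */
long sk_mc_panels_d(void)
{
    return SK_DEFAULT_MC_D / SK_DEFAULT_MR_D;
}
```

Changing SK_DEFAULT_MC_D to 386, for example, would stop compilation at the first #error, turning the misconfiguration into a build-time failure.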
+
+commit 6e7e452343014e8f86640874dc1dbadca4a642a1
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Jul 22 14:50:57 2013 -0500
+
+    Fixed minor warnings and misc issues.
+    
+    Details:
+    - Fixed various warnings output by gcc 4.6.3-1, including removing some
+      set-but-not-used variables and addressing some instances of typecasting
+      of pointer types to integer types of different sizes.
+
+commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Mon Jul 22 12:54:32 2013 -0500
+
+    Tightened some macros that detect datatypes.
+    
+    Details:
+    - Modified the definitions of some macros, such as bli_is_real(), so that
+      the "special" bit is taken into account, thereby differentiating BLIS_INT
+      from BLIS_FLOAT.
+    - Whitespace changes to bli_obj_macro_defs.h.
+    - Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
+      being used.
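Why a datatype test must consider a "special" bit can be shown with a toy bit encoding. The layout below is an assumption for illustration (bit 0 = domain, bit 1 = precision, bit 2 = special) and is not taken from bli_type_defs.h. An integer type that shares the low bits of float passes a domain-only "is real" test unless the special bit is also checked.

```c
/* Hypothetical 3-bit datatype encoding:
   bit 0: domain    (0 = real, 1 = complex)
   bit 1: precision (0 = single, 1 = double)
   bit 2: special   (set for non-floating-point types such as int) */
enum {
    SK_DOMAIN_BIT  = 0x1,
    SK_PREC_BIT    = 0x2,
    SK_SPECIAL_BIT = 0x4
};

enum {
    SK_DT_FLOAT    = 0x0,
    SK_DT_SCOMPLEX = SK_DOMAIN_BIT,
    SK_DT_DOUBLE   = SK_PREC_BIT,
    SK_DT_DCOMPLEX = SK_PREC_BIT | SK_DOMAIN_BIT,
    SK_DT_INT      = SK_SPECIAL_BIT  /* same low bits as SK_DT_FLOAT */
};

/* Loose test: looks only at the domain bit, so SK_DT_INT passes. */
int is_real_loose(int dt)
{
    return (dt & SK_DOMAIN_BIT) == 0;
}

/* Tight test: also requires the special bit to be clear. */
int is_real_tight(int dt)
{
    return (dt & (SK_DOMAIN_BIT | SK_SPECIAL_BIT)) == 0;
}
```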
+
+commit b33e2f4443b9043b554963320280ff7783773652
+Author: Field G. Van Zee <field@cs.utexas.edu>
+Date:   Fri Jul 19 17:15:03 2013 -0500
+
+    CHANGELOG update (for 0.0.9).
+
+commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
 Author: Field G. Van Zee <field@cs.utexas.edu>
 Date:   Thu Jul 18 18:04:34 2013 -0500