From 1a4d698f42981d74fe5f29b980031e1ee7dc42d5 Mon Sep 17 00:00:00 2001
From: "Field G. Van Zee"
Date: Mon, 11 Nov 2013 10:15:40 -0600
Subject: [PATCH] CHANGELOG update (for 0.1.0).

---
 CHANGELOG | 717 +++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 716 insertions(+), 1 deletion(-)

diff --git a/CHANGELOG b/CHANGELOG
index 8c39ae88b..a5aae8601 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,4 +1,719 @@
-commit 0680916fdd532f7a4716b11a2515243b2c08d00f (HEAD, tag: 0.0.9, origin/master, origin/HEAD, master)
+commit 089048d5895a30221b6b1976c9be93ad6443420d (HEAD, tag: 0.1.0, origin/master, master)
+Author: Field G. Van Zee
+Date:   Sat Nov 9 17:18:00 2013 -0600
+
+    Added object wrappers to 1f test suite modules.
+
+    Details:
+    - Added missing object wrappers to level-1f test suite modules. This was
+      only apparent if you were configuring with something other than the
+      reference configuration.
+    - Commented out object-wrappers in level-1f front-ends. These were not
+      working as intended unless the reference configuration was selected, because
+      most kernel sets, such as those in the template set, do not have object
+      wrappers.
+    - Whitespace changes to template micro-kernels.
+    - Comment changes to template level-1f kernel headers.
+
+commit 9ef3752079de10124bed906b5d28479d04aa8187
+Author: Field G. Van Zee
+Date:   Fri Nov 8 17:20:47 2013 -0600
+
+    Updated template kernels wrt KernelsHowTo wiki.
+
+    Details:
+    - Merged latest state of KernelsHowTo wiki into template micro-kernels
+      located in config/template/kernels/3.
+
+commit 376bbb59c8944e29c5c1ff6637920d8451370afa
+Author: Field G. Van Zee
+Date:   Fri Nov 8 11:17:34 2013 -0600
+
+    Removed support for duplication.
+
+    Details:
+    - Removed support for duplication from the gemmtrsm/trsm micro-kernels
+      and all framework code.
+    - Updated test suite modules according to above changes.
+
+commit 68a5910974b62b4df853fae2a68cb04df9d5a19c
+Author: Field G. Van Zee
+Date:   Thu Nov 7 11:36:11 2013 -0600
+
+    Added comments to testsuite/input.operations.
+
+    Details:
+    - Added extensive comments to the top of testsuite/input.operations,
+      which describe how to edit the file.
+    - Removed input.operations.0 and input.operations.1.
+    - Changed input.general to test all datatypes ("sdcz") by default.
+
+commit a98f78b715fb256a519870071bb5266130d70b21
+Author: Field G. Van Zee
+Date:   Wed Nov 6 15:32:47 2013 -0600
+
+    Changed dim_t and inc_t to be signed integers.
+
+    Details:
+    - Redefined dim_t and inc_t in terms of gint_t (instead of guint_t).
+      This will facilitate interoperability with Fortran in the future.
+      (Fortran does not support unsigned integers.)
+    - Redefined many instances of stride-related macros so that they return
+      or use the absolute value of the strides, rather than the raw strides
+      which may now be signed. Added new macros bli_is_row_stored_f() and
+      bli_is_col_stored_f(), which assume positive (forward-oriented) strides,
+      and changed the packm_blk_var[23] variants to use these macros instead
+      of the existing bli_is_row_stored(), bli_is_col_stored().
+    - Added/adjusted typecasting to various functions/macros, including
+      bli_obj_alloc_buffer(), bli_obj_buffer_at_off(), and various pointer-
+      related macros in bli_param_macro_defs.h.
+    - Redefined bli_convert_blas_incv() macro so that the BLAS compatibility
+      layer properly handles situations where vector increments are negative.
+      Thanks to Vladimir Sukharev for pointing out this issue.
+    - Changed type of increment parameters in bli_adjust_strides() from dim_t
+      to inc_t. Likewise in bli_check_matrix_strides().
+    - Defined bli_check_matrix_object(), which checks for negative strides.
+    - Redefined bli_check_scalar_object() and bli_check_vector_object() so
+      that they also check for negative stride.
+    - Added instances of bli_check_matrix_object() to various operations'
+      _check routines.
+
+commit 1f8afc3e08a4312cfe810be86aedeacbc57275c5
+Author: Field G. Van Zee
+Date:   Wed Nov 6 10:09:10 2013 -0600
+
+    Minor comment update to BLAS compat files.
+
+commit 1abbf768afafc158d44e4d5c4a135cfd9e277f13
+Author: Field G. Van Zee
+Date:   Mon Nov 4 15:50:00 2013 -0600
+
+    Fixed bugs in scalv and setv.
+
+    Details:
+    - Fixed bugs similar to those addressed in cca1e1f51dc6, whereby
+      a segmentation fault may occur if beta is not the same type as
+      the vector operand for scalv and setv.
+    - Changed axpyv and scal2v front-ends in a similar fashion.
+
+commit f5953259a1842ee48e5833c22ac86e68a337bfe1
+Author: Field G. Van Zee
+Date:   Mon Nov 4 14:43:55 2013 -0600
+
+    Fixed a bug related to Hermitian matrix diagonals.
+
+    Details:
+    - Fixed a bug whereby BLIS assumed that the imaginary components of the
+      diagonal elements of Hermitian matrices were already zero. This property
+      is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
+      to Vladimir Sukharev for reporting this bug.
+    - Minor comment updates to template kernels.
+
+commit d70f2b089dac8b9e4c19295dfa6014c36afee2ec
+Author: Field G. Van Zee
+Date:   Sat Nov 2 17:19:40 2013 -0500
+
+    Added scaling to abval2s, sqrt2s macros.
+
+    Details:
+    - Re-defined abval2s and sqrt2s macros to use scaling to avoid underflow
+      and overflow from squaring the real and imaginary components. (This is
+      the same technique used to fix recent bugs in invscals/invscaljs and
+      inverts.)
+
+commit c5b1ed9409ae2f71d04041eef5da9a0080b5784a
+Author: Field G. Van Zee
+Date:   Fri Nov 1 10:28:04 2013 -0500
+
+    Added new dotxaxpyf variant 2.
+
+    Details:
+    - Added a new variant for dotxaxpyf that is based on dotxf and axpyf
+      kernels. By default, this variant is not used by any other operation.
+
+commit 97f89fbcf202d72fc440b614708e352ea31633e2
+Author: Field G. Van Zee
+Date:   Fri Nov 1 10:16:39 2013 -0500
+
+    Fixed bug in complex invscals.
+
+    Details:
+    - Fixed complex inversion in invscals and invscaljs whereby the
+      imaginary component was being computed incorrectly.
+    - Used bli_fmaxabs() instead of bli_fabs() when choosing the scalar
+      in inverts, invscals, and invscaljs.
+    - Changed bli_abs() and bli_fabs() macro definitions to use "<="
+      operator instead of "<".
+
+commit eda42a21d17a2742eab69ab801ed530b82488c8a
+Author: Field G. Van Zee
+Date:   Thu Oct 31 18:00:44 2013 -0500
+
+    Defined missing symbols in bla_rotg.c
+
+    Details:
+    - Defined local equivalents of libf2c's r_sign(), d_sign(), c_abs(), and
+      z_abs(), which are needed by bla_rotg.c. Also defined r_abs() and
+      d_abs() for completeness. Thanks to Vladimir Sukharev for reporting
+      these bugs.
+
+commit cca1e1f51dc67a2c3725d5c1837256831aaf70f8
+Author: Field G. Van Zee
+Date:   Wed Oct 30 14:39:01 2013 -0500
+
+    Fixed bugs in scalm and setm.
+
+    Details:
+    - Fixed bugs in scalm and setm that resulted in segmentation faults when
+      beta is not the same type as the matrix operand. Thanks to Vladimir
+      Sukharev for reporting this bug.
+    - Changed axpym and scal2m front-ends in fashion similar to that of scalm
+      and setm; namely, the alpha scalar is copy-cast to the type of the first
+      matrix operand.
+    - Changed the template and reference configurations' bli_config.h files
+      so that the number of memory allocator blocks of A and B is set based
+      on BLIS_MAX_NUM_THREADS.
+    - Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
+
+commit 2807013a4761c2b84b3944de64d23483ad7ef2fb
+Author: Field G. Van Zee
+Date:   Thu Oct 24 14:32:20 2013 -0500
+
+    Fixed over/under-flow in complex inversion.
+
+    Details:
+    - Fixed the complex bli_?inverts() macros, which were inverting elements
+      in an "unsafe" manner, such that very large and very small values were
+      unnecessarily over/under-flowing. Thanks to Vladimir Sukharev for
+      reporting this bug.
+    - Comment update to bli_sumsqv_unb_var1.c.
+    - Removed redundant bli_min() macro in bli_scalar_macro_defs.h.
+    - Changed 1.0F to 1.0 for bli_drands() macro.
+
+commit 45a80c625f84edb2ade6ac25efe2b9c589d7e0df
+Author: Field G. Van Zee
+Date:   Wed Oct 23 12:15:25 2013 -0500
+
+    Fixed parameter checking issue in BLAS syr[2]k.
+
+    Details:
+    - Fixed a minor parameter checking bug in the BLAS compatibility layer
+      for [sd]syrk and [sd]syr2k. Specifically, if 'C' is passed in for the
+      trans parameter of either operation, it is (a) allowed, and (b) treated
+      as 'T' (whereas previously it was disallowed). Thanks to Vladimir
+      Sukharev for finding and reporting this bug.
+
+commit a091a219bda55e56817acd4930c2aa4472e53ba5
+Author: Field G. Van Zee
+Date:   Mon Oct 14 10:11:29 2013 -0500
+
+    Minor fixes to piledriver configuration, ukernel.
+
+    Details:
+    - Applied a patch from Tyler that fixes minor staleness in the piledriver
+      configuration and gemm micro-kernel.
+    - Very minor changes to test suite input files.
+
+commit dacdde27aee4fb90b14880136d7f20c6b234e2c6
+Author: Field G. Van Zee
+Date:   Fri Oct 11 11:37:19 2013 -0500
+
+    Added Fran's Sandy Bridge kernels/configuration.
+
+    Details:
+    - Added a kernel directory for kernels developed by Francisco Igual for
+      the Sandy Bridge architecture, including a dgemm ukernel coded with
+      AVX intrinsics.
+    - Added a configuration for Sandy Bridge using values supplied by Fran.
+
+commit 03106d650e4030d4c9831683448376f92fc52d41
+Author: Field G. Van Zee
+Date:   Fri Oct 11 10:40:38 2013 -0500
+
+    Fixed minor perf bug in gemm_ker_var2.
+
+    Details:
+    - Fixed a minor performance bug in bli_gemm_ker_var2.c (and the experimental
+      bli_gemm_ker_var5.c) whereby the addresses for a_next and b_next were not
+      computed correctly (i.e., did not wrap around) at the edge cases. Thanks to
+      Tze Meng for helping me identify this bug.
+
+commit b053337387dbdef9035be03538222670a21707ca
+Author: Field G. Van Zee
+Date:   Thu Oct 10 18:26:55 2013 -0500
+
+    Added fusing factors, MR/NR to test suite output.
+
+    Details:
+    - Updated the test suite driver (and modules where appropriate) so that
+      the level-1f fusing factors are output along with the variable dimension.
+      While this is not strictly necessary, since the fusing factors are output
+      in the initial parameter summary, it provides extra reassurance to the user
+      since the fusing factors appear alongside the variable dimension, which
+      together give a complete picture of the problem size. Similar changes were
+      made for outputting the register blocksizes when reporting results for the
+      micro-kernel test modules.
+
+commit be4833bd91c5a58d0bfc52daaadf7ba543a77acf
+Author: Field G. Van Zee
+Date:   Thu Oct 10 14:20:06 2013 -0500
+
+    Added test suite modules for level-1f, 3 kernels.
+
+    Details:
+    - Added test modules in test suite for level-1f kernels and level-3
+      micro-kernels. (Duplication in the micro-kernels, for now, is NOT
+      supported by these test modules.)
+    - Added section override switches to test suite's input.operations file.
+    - Added obj_t APIs for level-1f front-ends and their unblocked variants to
+      facilitate the level-1f test modules. Also added front-end for dupl
+      operation.
+    - Added obj_t-based check routines for level-1f operations, which are
+      called from the new front-ends mentioned above.
+    - Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
+      factors as a function of datatype, which are needed by their respective
+      test modules.
+    - Whitespace changes to bli_kernel.h of all existing configurations.
+
+commit 680188d46bb15b9a1a2867638104939dc77ca2a1
+Author: Field G. Van Zee
+Date:   Thu Oct 10 13:23:37 2013 -0500
+
+    Cleaned up old test drivers.
+
+    Details:
+    - Minor updates to old test drivers in preparation for our participation
+      in ACM TOMS's replicated results initiative.
+
+commit 3690bdd4f95769c935c410414112102cc3e108b1
+Author: Field G. Van Zee
+Date:   Thu Oct 10 11:45:33 2013 -0500
+
+    More updates to level-1f kernels for core2-sse3.
+
+    Details:
+    - Changed types in function signatures to match new prototypes. Meant to
+      include this in the previous commit.
+
+commit 661d5120cd7071f9b0c5cefc95f99f1361370ade
+Author: Field G. Van Zee
+Date:   Thu Oct 10 11:27:27 2013 -0500
+
+    Fixed outdated fusing factor macros in 1f kernels.
+
+    Details:
+    - Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
+      macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
+      this out.
+
+commit 73aa1e9f31d1b2a319c7e711ced6db3f9835c832
+Author: Field G. Van Zee
+Date:   Tue Oct 1 17:01:18 2013 -0500
+
+    Added section overrides to test suite.
+
+    Details:
+    - Added new lines of input to the test suite's input.operations file, which
+      allow the user to disable entire sections (levels) of tests. Before this
+      change, the user had to manually disable each operation test's "master
+      switch". (This is why input.operations.0 existed: to allow a more
+      convenient starting point for someone who only wanted to test one or a
+      few operations.)
+
+commit 5e54f46ccb76beab892d530b693e07c6bf6db7cf
+Author: Field G. Van Zee
+Date:   Mon Sep 30 12:58:18 2013 -0500
+
+    Added template implementations and other tweaks.
+
+    Details:
+    - Added a 'template' configuration, which contains stub implementations of the
+      level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
+      lots of in-file comments and documentation.
+    - Modified some variable/parameter names for some 1/1f operations. (e.g.
+      renaming vector length parameter from m to n.)
+    - Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
+      to bli_kernel.h.
+    - Modified test suite to print out fusing factors for axpyf, dotxf, and
+      dotxaxpyf, as well as the default fusing factor (which are all equal
+      in the reference and template implementations).
+    - Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
+      reference variants were implemented in terms of front-end routines rather
+      than directly in terms of the kernels. (For example, axpy2v was implemented
+      as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
+    - Changed the interface to dotxf so that it matches that of axpyf, in that
+      A is assumed to be m x b_n in both cases, and for dotxf A is actually used
+      as A^T.
+    - Minor variable naming and comment changes to reference micro-kernels in
+      frame/3/gemm/ukernels and frame/3/trsm/ukernels.
+
+commit 97aaf220a847363b4da35935eca17790c0ef71f6
+Author: Field G. Van Zee
+Date:   Tue Sep 17 10:51:36 2013 -0500
+
+    Added new kernels, configurations.
+
+    Details:
+    - Added various micro-kernels for the following architectures:
+        Intel MIC
+        IBM BG/Q
+        IBM Power7
+        AMD Piledriver
+        Loongson 3A
+      and reorganized the kernels directory. Thanks to Tyler Smith, Mike Kistler,
+      and Xianyi Zhang for contributing these kernels.
+    - Added configurations corresponding to above architectures, and renamed
+      "clarksville" configuration to "dunnington".
+
+commit fe979c5a114c877506a5697cdab1fc8cf2bcd303
+Author: Field G. Van Zee
+Date:   Fri Sep 13 14:31:53 2013 -0500
+
+    Removed default configuration behavior.
+
+    Details:
+    - Changed the configure script so that it no longer defaults to the
+      reference configuration. This change is being made so that the
+      developer has a firm awareness of which configuration is being used
+      to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
+      suggested change.
+
+commit da77e9614f54f92f703f01e3b9bd67a83280150c
+Author: Field G. Van Zee
+Date:   Fri Sep 13 12:00:37 2013 -0500
+
+    Minor improvements to static memory allocator.
+
+    Details:
+    - Expanded on cpp macro definitions from bli_mem.c and relocated them to
+      a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
+      functionality includes computing the pool size for each datatype (using
+      that datatype's cache blocksizes) and using the maximum to size the
+      actual pool array. This addresses the somewhat common pitfall whereby a
+      developer updates cache blocksizes in bli_kernel.h for only one datatype
+      (say, single-precision real), while the memory pools are sized using the
+      double-precision real values. Then, when the developer attempts to link
+      to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
+      a message saying the static memory pool was exhausted. Clearly, this
+      message is misleading when the pool was not sized properly to begin with.
+    - Removed previously disabled code in bli_kernel_macro_defs.h that was
+      meant to check for size consistency among the various cache blocksizes.
+      (Obviously the memory pool size-based solution mentioned above is better.)
+    - Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
+      reasonable place to put these constants, rather than further crowding
+      bli_config.h.
+    - Updated testsuite driver to output memory pool sizes for A, B, and C.
+    - Minor comment updates to bli_config.h.
+    - Removed 'flame' configuration. It was beginning to get out-of-date, and
+      I hadn't used it in months. We can always re-create it later.
+
+commit 631f347b7a99cb02757c534fd3ec5f723a2fdb0e
+Author: Field G. Van Zee
+Date:   Tue Sep 10 17:17:28 2013 -0500
+
+    Added ESSL and Accelerate targets to test drivers.
+
+    Details:
+    - Added ESSL and Accelerate (OS X) targets to standalone test drivers'
+      Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
+      / providing this patch.
+
+commit 7ae4d7a41d13ef5f1ceee217c000a5cf77a11128
+Author: Field G. Van Zee
+Date:   Tue Sep 10 16:35:12 2013 -0500
+
+    Various changes to treatment of integers.
+
+    Details:
+    - Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
+      assigned values of 32, 64, or some other value. The former two result in
+      defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
+      causes integers to be defined in terms of a default type (e.g. long int).
+    - Updated bli_config.h in reference and clarksville configurations according
+      to above changes.
+    - Updated test drivers in test and testsuite to avoid type warnings associated
+      with format specifiers not matching the types of their arguments to printf()
+      and scanf().
+    - Inserted missing #include "bli_system.h" into blis.h (which was slated for
+      inclusion in d141f9eeb6d1).
+    - Added explicit typecasting of dim_t and inc_t to macros in
+      bli_blas_macro_defs.h (which are used in the BLAS compatibility layer).
+    - Slight changes to CREDITS and INSTALL files.
+    - Slight tweaks to Windows build system, mostly in the form of switching to
+      Windows-style CRLF newlines for certain files.
+
+commit 068437736b41d51a1f5ec47839f059bf58a20413
+Author: Field G. Van Zee
+Date:   Mon Sep 9 14:07:58 2013 -0500
+
+    Fixed set-but-not-used compiler (gcc) warnings.
+
+    Details:
+    - Used void-casts of certain variables to appease gcc (and perhaps other
+      compilers) when such variables are only used in the complex instances of
+      the functions. Special thanks to Karl Rupp for suggesting a portable fix
+      for these warnings.
+
+commit 6dc85f63dcd5282340c9e00d585e97d70a21edc3
+Author: Field G. Van Zee
+Date:   Mon Sep 9 13:48:52 2013 -0500
+
+    Small fix to Windows defs.mk makefile fragment.
+
+    Details:
+    - Commented out a !include statement that was attempting to include a
+      version file that does not yet exist. For now, the version string is
+      hard-coded into defs.mk.
+
+commit d141f9eeb6d1de7044b7429adf52d11c6fca620c
+Author: Field G. Van Zee
+Date:   Mon Sep 9 13:09:16 2013 -0500
+
+    Added Windows build system.
+
+    Details:
+    - Added a 'windows' directory, which contains a Windows build system
+      similar to that of libflame's. Thanks to Martin for getting this up
+      and running.
+    - Spun off system header #includes into bli_system.h, which is included
+      in blis.h.
+    - Added a Windows section to bli_clock.c (similar to libflame's).
+
+commit 9b320e7406fb69e8b61a0085abe2ed89a96bdb68
+Author: Field G. Van Zee
+Date:   Mon Sep 9 11:04:46 2013 -0500
+
+    Edited bli_?lamch.c to avoid Windows keyword.
+
+    Details:
+    - Renamed "small" variable to "smnum" to avoid collision with Windows type
+      of the same name. This change is needed in advance of the upcoming Windows
+      build system.
+
+commit 9013ad6ff2e9ace35e0cf44c32795c2f3d5be628
+Author: Field G. Van Zee
+Date:   Wed Sep 4 13:36:07 2013 -0500
+
+    Switched integer typedefs (again) to C types.
+
+    Details:
+    - Redefined gint_t and guint_t in terms of the standard C types long int
+      and unsigned long int, respectively.
+    - Changed testsuite default max problem size to 500.
+    - Changed testsuite input.operations to use square problems for level-3
+      operation tests.
+
+commit 981a60cfa07abac2e93697dfe12b0f076ab00a38
+Author: Field G. Van Zee
+Date:   Wed Sep 4 12:09:11 2013 -0500
+
+    Falling back to 32-bit integers for dim_t, etc.
+
+    Details:
+    - In light of recent segfaulting issues when compiling on 32-bit systems,
+      I've changed the default typedef for gint_t and guint_t from int64_t and
+      uint64_t to int32_t and uint32_t, respectively.
+    - Disabled 64-bit integers in the blas2blis layer for the reference
+      configuration.
+    - Added type sizes of gint_t, guint_t, and the four floating-point datatypes
+      to introductory output of the testsuite.
+
+commit b776ddcd4338b34f172ef78da0ac1d771a771ab4
+Author: Field G. Van Zee
+Date:   Tue Sep 3 21:58:07 2013 -0500
+
+    Applied temp fix to typecasting bug in testsuite.
+
+    Details:
+    - Applied a temporary fix to the typecasting bug in the testsuite driver.
+      The fix involves casting both numerator and denominator to unsigned long.
+      This fix is more voodoo than science, as I can't be sure why it even
+      works.
+
+commit 9ee6e125373869c4213c017ce772c38ecefba103
+Author: Field G. Van Zee
+Date:   Tue Sep 3 21:53:27 2013 -0500
+
+    Changed dimension spec for gemm in testsuite.
+
+    Details:
+    - Encountered a bizarre typecasting bug whereby the test suite was not
+      computing the proper dimension from the problem size and dimension
+      specification when the latter was set to -3. Will investigate.
+      Thanks to Fran for finding this "bug".
+
+commit e8be081e68c385ab44d0fea8dade21d40c200b79
+Author: Field G. Van Zee
+Date:   Wed Aug 28 15:52:34 2013 -0500
+
+    Generalized matlab and file output in testsuite.
+
+    Details:
+    - Added a new option in input.general that allows outputting in
+      matlab/octave format so that one can output in matlab format
+      independently from outputting to files.
+    - Adjusted input.operations according to above.
+    - Added input.operations.0 and input.operations.1 with all options
+      disabled and enabled, respectively.
+
+commit d352c746e5683037d41b5061dfb5ce08e1d0843b
+Author: Field G. Van Zee
+Date:   Tue Aug 27 13:41:46 2013 -0500
+
+    Added single/real gemm micro-kernel for x86_64.
+
+    Details:
+    - Added a single-precision real gemm micro-kernel in
+      kernels/x86_64/3/bli_gemm_opt_d4x4.c.
+    - Adjusted the single-precision real register blocksizes in
+      config/clarksville/bli_kernel.h to be 8x4.
+    - Added a missing comment to bli_packm_blk_var2.c that was present in
+      bli_packm_blk_var3.c.
+
+commit dedda523dc5dc779ecc34e6a03dc74cb8eb220de
+Author: Field G. Van Zee
+Date:   Mon Aug 19 12:07:41 2013 -0500
+
+    Fixed bug in bli_acquire_mpart_t2b(), _l2r().
+
+    Details:
+    - Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
+      that caused incorrect partitioning when SUBPART0 was requested. This
+      bug was introduced in 46d3d09d49ad. Thanks to Bryan for isolating
+      this bug.
+    - Removed dupl kernels from kernels/x86_64/3 directory.
+    - Uncommented beta == 0 optimization code in
+      kernels/x86_64/3/bli_gemm_opt_d4x4.c.
+
+commit 12dbd2f33455e9384fe2070cbdd660fd4a7fceb5
+Author: Field G. Van Zee
+Date:   Thu Aug 8 14:39:35 2013 -0500
+
+    Moved init_safe(), finalize_safe() to BLAS compat.
+
+    Details:
+    - Moved the bli_init_safe() and bli_finalize_safe() function calls from the
+      BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
+      initializers in the BLIS layer wasn't buying us anything because the user
+      could still call the library with uninitialized global scalar constants,
+      for example. Thus, we will just have to live with the constraint that
+      bli_init() MUST be called before calling ANY routine with a bli_ prefix.
+    - Added the missing _init_safe() and finalize_safe() calls to the level-1
+      BLAS compatibility wrappers.
+
+commit 8abfe55f2ae5d89df18e1b26a5a28d94b0936683
+Author: Field G. Van Zee
+Date:   Thu Aug 8 13:30:19 2013 -0500
+
+    Miscellaneous updates.
+
+    Details:
+    - Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
+      BLIS_CACHE_LINE_SIZE (typically 64).
+    - Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
+      kernels.
+    - Reformulated gemm_ker_var2 to look more like the other level-3 macro-
+      kernels, in that the interior and edge-case handling is expressed once
+      inside the loops in the n and m dimensions, rather than the edge-case
+      handling being "unrolled" and expressed as distinct code regions. The
+      previous macro-kernel now lives in retired form in the subdirectory
+      other/bli_gemm_ker_var2.c.old.
+    - Updated experimental gemm_ker_var5 according to above change.
+    - Fixed bug in bli_her2k.c whereby incorrect transformations were being
+      applied to optimize the macro-kernel access pattern on C when C is
+      row-stored.
+    - Various updates inside of test/exec_sizes.
+
+commit 1aa05736ff49e7cc5f121acf615460fe9a87852c
+Author: Field G. Van Zee
+Date:   Wed Aug 7 12:27:04 2013 -0500
+
+    Fixed bug in interface of bla_ger_check().
+
+    Details:
+    - Fixed the misplaced lda parameter in the function signature of
+      bla_ger_check(). Thanks to Tyler for finding this bug.
+
+commit 685aad25353fb200de4ca97a8bc0feeebde51d0f
+Author: Field G. Van Zee
+Date:   Tue Aug 6 12:25:51 2013 -0500
+
+    Fixed cpp guard typos in frame/compat/check files.
+
+    Details:
+    - Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
+      BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
+    - Fixed various syntax errors in the code that had yet to be compiled
+      due to the aforementioned bug.
+
+commit f4ec28e723d28d998f1038f82da6986e44320ef6
+Author: Field G. Van Zee
+Date:   Thu Aug 1 11:24:23 2013 -0500
+
+    Added basic OpenMP-based gemm and packm files.
+
+    Details:
+    - Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
+      into the following auxiliary files
+
+        frame/1m/packm/other/bli_packm_blk_var2.c
+        frame/3/gemm/other/bli_gemm_ker_var2.c
+
+      The routine in the first file uses a basic OpenMP parallel region to
+      parallelize the packing of blocks of A and panels of B, while the
+      second uses a similar parallel region to parallelize along the n
+      dimension of the gemm macro-kernel.
+
+commit f8980edf9c318453bb1962ac4939c06bf11e6d5e
+Merge: 67a8b94 6e7e452
+Author: Field G. Van Zee
+Date:   Fri Jul 26 11:14:27 2013 -0500
+
+    Merge branch 'master' of https://code.google.com/p/blis
+
+commit 67a8b9498d13b038deb316ac163e62c5b17da2ec
+Author: Field G. Van Zee
+Date:   Fri Jul 26 11:12:37 2013 -0500
+
+    Added missing cpp kernel blocksize constraints.
+
+    Details:
+    - Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
+      constraints on the register blocksizes relative to the cache blocksizes.
+      Thanks to Tyler for helping me stumble across this issue.
+
+commit 6e7e452343014e8f86640874dc1dbadca4a642a1
+Author: Field G. Van Zee
+Date:   Mon Jul 22 14:50:57 2013 -0500
+
+    Fixed minor warnings and misc issues.
+
+    Details:
+    - Fixed various warnings output by gcc 4.6.3-1, including removing some
+      set-but-not-used variables and addressing some instances of typecasting
+      of pointer types to integer types of different sizes.
+
+commit 03f6c3599743bc837a7d40eb5b415b1bf4f2a4e9
+Author: Field G. Van Zee
+Date:   Mon Jul 22 12:54:32 2013 -0500
+
+    Tightened some macros that detect datatypes.
+
+    Details:
+    - Modified the definitions of some macros, such as bli_is_real(), so that
+      the "special" bit is taken into account and BLIS_INT is differentiated
+      from BLIS_FLOAT.
+    - Whitespace changes to bli_obj_macro_defs.h.
+    - Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
+      being used.
+
+commit b33e2f4443b9043b554963320280ff7783773652
+Author: Field G. Van Zee
+Date:   Fri Jul 19 17:15:03 2013 -0500
+
+    CHANGELOG update (for 0.0.9).
+
+commit 0680916fdd532f7a4716b11a2515243b2c08d00f (tag: 0.0.9)
 Author: Field G. Van Zee
 Date:   Thu Jul 18 18:04:34 2013 -0500