diff --git a/CHANGELOG b/CHANGELOG
index 14a619026..eb33b31ef 100644
--- a/CHANGELOG
+++ b/CHANGELOG
@@ -1,4 +1,632 @@
-commit fde5f1fdece19881f50b142e8611b772a647e6d2 (HEAD, tag: 0.1.1, origin/master, origin/HEAD, master)
+commit 00f232f8ed1f7c41619b12ebf779ebe2c3b2d3cd (HEAD, tag: 0.1.2, origin/master, master)
+Author: Tyler Smith
+Date:   Mon Jun 2 13:40:57 2014 -0500
+
+    Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi
+
+commit 3fc60e491426f6248c0feae88d971e4d1f88fb95
+Author: Field G. Van Zee
+Date:   Wed May 21 11:34:42 2014 -0500
+
+    Fixed ldim alignment bug in core2 gemm ukernel.
+
+    Details:
+    - Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
+      a segmentation fault if a column-stored matrix's starting address was
+      aligned, but its leading dimension was such that its second column was
+      unaligned. Basically, the micro-kernel was assuming that aligned load
+      instructions were safe when they actually were not. An extra condition
+      that checks the alignment of cs_c (ie: the leading dimension in the
+      column storage case) has now been added. Thanks to Michael Lehn for
+      reporting this bug.
+
+commit 77a2d8dac8b242d7a202c9aabda3927ab68cf987
+Merge: 8c5d607 21fb089
+Author: Field G. Van Zee
+Date:   Tue May 20 09:53:19 2014 -0500
+
+    Merge pull request #8 from tlrmchlsmth/master
+
+    Added multithreading to most level-3 operations.
+
+commit 21fb089387ee7c87f6dc53b0f60f68b48d3ff3e8
+Author: Tyler Smith
+Date:   Mon May 19 20:38:55 2014 -0700
+
+    Reverting changes dunnington and reference configs
+
+    Now they are unchanged from the main branch of BLIS
+
+commit 8a0ef0e0db5880730425926f8ba56b457a2ba764
+Author: Tyler Smith
+Date:   Fri May 16 13:44:14 2014 -0500
+
+    Fixed rounding error in bli_get_range_weighted
+
+commit 0b4b1680334528b1b60bc696537600f763198e92
+Author: Tyler Smith
+Date:   Fri May 16 12:23:37 2014 -0500
+
+    Fixed bug with disabling JC loop threading for right sided trmm
+
+commit 5c048a90d8dfa1dbde4e45fbc10ffcbdfe59d960
+Author: Tyler Smith
+Date:   Wed May 14 16:20:06 2014 -0500
+
+    Disabled parallelism for right-sided TRMM JC loop
+
+    The loop has dependent iterations.
+
+commit 13a4c717ed0e273359dbaf5554cc4fa70b087d71
+Author: Tyler Smith
+Date:   Wed May 14 14:59:04 2014 -0500
+
+    Fixed bug with bli_get_range_weighted
+
+commit 45957cc7745e9bb1698408d72f53ef192e960820
+Author: Tyler Smith
+Date:   Tue May 13 17:14:46 2014 -0500
+
+    Allowed threading to be turned off
+
+    No longer requires OpenMP to compile
+    Define the following in bli_config.h in order to enable multithreading:
+    BLIS_ENABLE_MULTITHREADING
+    BLIS_ENABLE_OPENMP
+
+    Also fixes a bug with bli_get_range_weighted
+
+commit bd1dc98ce599d74513a553fe3b37a2ebca1c3812
+Author: Tyler Smith
+Date:   Mon May 12 17:26:19 2014 -0500
+
+    Disabled multithreading of the kc loop
+
+commit 456df0372170bd7ca2c7e2d85365a69f1f04de88
+Author: Tyler Smith
+Date:   Wed Apr 30 12:28:00 2014 -0500
+
+    Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity
+
+commit f4fdfe8fc573553eb36795b79cdf681270dab71b
+Merge: 31bb065 8c5d607
+Author: Tyler Smith
+Date:   Wed Apr 30 11:46:35 2014 -0500
+
+    Merge http://github.com/flame/blis
+
+commit 8c5d6071e24ba10a53669390a47287e86ff354ce
+Author: Field G. Van Zee
+Date:   Tue Apr 29 12:26:12 2014 -0500
+
+    Added _check() routines for fprint[mv], rand[mv].
+
+    Details:
+    - Added _check() routines for fprintm, fprintv, randm, and randv.
+    - Added invocations to the above routines from their respective
+      front-ends.
+
+commit 262cdabcc885bcf6636f4d8bb7d320f95e81d820
+Author: Field G. Van Zee
+Date:   Mon Apr 28 16:48:25 2014 -0500
+
+    Changed treatment of NULL object buffers.
+
+    Details:
+    - Relaxed the constraint in bli_obj_attach_buffer_check(), which required
+      the buffer address being attached to be non-NULL. This is acceptable
+      because the user was already able to create and use objects with NULL
+      buffers (via bli_obj_create_without_buffer(), which initializes the
+      buffer to NULL).
+    - Inserted calls to newly defined function, bli_check_object_buffer(),
+      into nearly all operations' _check() or _int_check() functions. This
+      allows BLIS to abort peacefully if a computational routine is called
+      with an object containing a NULL buffer. By contrast, under such
+      conditions, BLAS would typically fail with a segmentation fault.
+    - Within operation front-ends, moved the calls to _check()/_int_check()
+      so that zero dimensions are checked first (and if found, execution
+      returns with trivial or no computation). This resolves issue #7. Thanks
+      to Jack Poulson for reporting this bug.
+
+commit 31bb065ba40ae0c5a614e743b8025abca012b99e
+Merge: 20e2443 7c61959
+Author: Tyler Smith
+Date:   Wed Apr 23 12:30:19 2014 -0500
+
+    Merge http://github.com/flame/blis
+
+commit 7c61959955c8ba78160d0ed4d1979022029d963b
+Author: Field G. Van Zee
+Date:   Thu Apr 10 17:18:36 2014 -0500
+
+    Can now query register blocksizes from blk algs.
+
+    Details:
+    - Added a new field to blksz_t objects that allows one to attach a
+      sub-object. Doing this allows us to associate a register blocksize with
+      any given cache blocksize. That way, the register blocksize can be
+      queried wherever the cache blocksize would normally be accessible
+      (e.g. a blocked algorithm).
+    - Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
+      blocksizes are attached to the cache blocksizes after they are created.
+
+commit 58671597d3d450817b2eda576c05ed6dadd8af6d
+Author: Field G. Van Zee
+Date:   Thu Apr 10 15:35:30 2014 -0500
+
+    Minor cleanups to level-2 _cntl.c files.
+
+    Details:
+    - Changed level-2 _cntl.c files so that the blocksizes for gemv are
+      imported and used, rather than blocksizes being declared locally.
+    - Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
+      4m/3m variants).
+    - Removed test/old/test_blis2.c.
+
+commit 20e24430a772bc0fbaf24dec2f8c544096fd3f4e
+Author: Tyler Michael Smith
+Date:   Tue Apr 8 17:50:44 2014 +0000
+
+    Some fixes for the bgq kernels
+
+commit bde697f75ec1e7f2decebee0c9bd620b4c134cd5
+Author: Tyler Smith
+Date:   Fri Apr 4 16:43:44 2014 -0500
+
+    Add -openmp to ldflags as well
+
+commit c332be8cd471eeace7b4fa4ae7443088b6a68ec3
+Author: Tyler Smith
+Date:   Fri Apr 4 16:37:50 2014 -0500
+
+    Added -openmp flag to Xeon Phi build for convenience
+
+commit e7ca9e4b4a24d585c9aec8293fc7bb79e4171ad0
+Author: Tyler Smith
+Date:   Fri Apr 4 16:31:15 2014 -0500
+
+    Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC
+
+commit 7b9b228c6fa4cfb70b1ebb855b009a036e85fac3
+Author: Tyler Smith
+Date:   Fri Apr 4 16:29:10 2014 -0500
+
+    Fix for tree barrier freeing bug
+
+commit 5ec93bd9a76096312d51c326ccde1e9bd0a436ab
+Author: Tyler Smith
+Date:   Fri Apr 4 15:09:10 2014 -0500
+
+    Bunch of minor fixes
+
+    Removed barrier after unpackm in all level3 blocked variants
+    Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)
+
+    Moved the enabling of the tree barriers into bli_config.h
+    Fed the default MR and NR for double precision into bli_get_range instead of the number 8
+
+commit 575fb9b0b08f3bdb56ccde056da619d1585617c1
+Author: Tyler Smith
+Date:   Fri Apr 4 12:13:29 2014 -0500
+
+    Changed default blocking factor to default double precision MR and NR
+
+commit ab9c7880335c281432d5809fe0dec46753d22569
+Author: Tyler Smith
+Date:   Fri Apr 4 11:38:11 2014 -0500
+
+    Added faster tree barriers necessary for performance for Xeon Phi
+
+    Fixed up some stuff in the thread info free functions
+    Disabled threading for TRSM so that it actually works when threading environment variables are set
+
+commit ec58a7923cccac08632670caadf3cf6ff5dce766
+Author: Tyler Smith
+Date:   Fri Apr 4 10:22:48 2014 -0500
+
+    Freeing thread info paths.
+
+    Also made herk IC and JC loops do weighted partitioning
+
+commit 2b6848b2397d6d84ca4e5f792fc51ad05e351a36
+Merge: 4e3eb39 21a0efb
+Author: Tyler Smith
+Date:   Fri Apr 4 09:54:54 2014 -0500
+
+    Merge http://github.com/flame/blis
+
+    Conflicts:
+        kernels/bgq/1/bli_axpyv_opt_var1.c
+        kernels/bgq/1/bli_dotv_opt_var1.c
+
+commit 4e3eb39aca4df0b9fdc003d468f368a2f2ba597d
+Author: Tyler Michael Smith
+Date:   Fri Apr 4 14:50:03 2014 +0000
+
+    Some fixes to the bgq config
+    MR and NR for double complex were wrong
+    Default fusing factor for double precision was wrong as well
+
+commit 21a0efb33d7435139e9c43c1a4787a6bff533e26
+Author: Field G. Van Zee
+Date:   Thu Apr 3 16:38:44 2014 -0500
+
+    Fixed follow-up to issue #6.
+
+commit c318157a9bee8ea6e59be16f99f65d9271fe0d27
+Author: Field G. Van Zee
+Date:   Thu Apr 3 16:24:34 2014 -0500
+
+    Fixed issue #6 (incorrect 'restrict' usage).
+
+    Details:
+    - Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
+      (However, there may be other instances of similar misuse elsewhere in
+      BLIS.) Thanks to Jeff Hammond for reporting this issue.
+
+commit b5150a1bf3bd89598e2b3aeac110eb5b44ac6c12
+Author: Field G. Van Zee
+Date:   Thu Apr 3 12:25:45 2014 -0500
+
+    Added #include "arm_neon.h" to ARM gemm ukernel.
+
+    Details:
+    - Inserted #include "arm_neon.h" into gemm ukernel source file for
+      arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.
+
+commit 2041c264517b6c590fd4f7e8253e6911b622d1c3
+Author: Tyler Smith
+Date:   Thu Apr 3 10:30:03 2014 -0500
+
+    Added barriers needed prior to doing scalar reset for rank-k updates.
+
+commit 47a90e69dfde3f4f8fdf90654248a6b499fbadbc
+Author: Field G. Van Zee
+Date:   Tue Apr 1 14:34:31 2014 -0500
+
+    Attempted to fix uninitialized variable warnings.
+
+    Details:
+    - Added initialization statements to various macros used in level 1m and
+      1m-like operations. I wasn't able to reproduce the reported behavior,
+      so hopefully this takes care of it. Thanks to Jeff Hammond for the
+      report.
+
+commit d27b4f690c14b1f836f8c7a3c0e91e09d852f02e
+Author: Field G. Van Zee
+Date:   Tue Apr 1 12:57:24 2014 -0500
+
+    Use generic paths for toolchain in POWER7.
+
+    Details:
+    - Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
+
+commit 1584ae1c83c3a8c1af76acb46404747507650f19
+Author: Tyler Smith
+Date:   Fri Mar 28 15:15:48 2014 -0500
+
+    Fixed race condition involving scalar reset
+
+commit 459dde4acc09e49380da58fb7b246db488884ad9
+Author: Tyler Smith
+Date:   Thu Mar 27 17:06:45 2014 -0500
+
+    Made barrier after packing implicit.
+
+    This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
+    but not the outer packing routines.
+    This allowed, for instance, computation to begin before the block of B had finished being packed.
+
+commit 9f78ec6e7e95fcad89a167b27cad7e2d74b6d122
+Author: Tyler Smith
+Date:   Thu Mar 27 14:18:46 2014 -0500
+
+    Some fixes for the internal functions,
+    was inappropriately only having thread chief do some things.
+
+commit a6fd48345424e097f71652be013aa897e098b41e
+Author: Tyler Michael Smith
+Date:   Wed Mar 26 17:19:46 2014 +0000
+
+    Added test drivers for level 3 BLAS that run tests in parallel using MPI
+
+commit 73b3db594864be0f9be9a0eb29bf961fa9c95f29
+Author: Tyler Michael Smith
+Date:   Wed Mar 26 15:39:05 2014 +0000
+
+    Some fixes for the bgq configuration
+
+commit f0824a04fc75e231c3a3d7757fa4e7294173282f
+Author: Tyler Smith
+Date:   Mon Mar 24 15:21:42 2014 -0500
+
+    Initial commit to enable threading in TRSM,
+
+    Also enabled weighted partitioning for herk, trmm
+    Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
+    Correctly computed a_next and b_next for gemm, herk macrokernels
+    a_next and b_next point to the current micropanels in trmm
+
+commit 23d9eab354fbc88165889832955e126772bf8488
+Merge: 5d5dc2e fd3e32a
+Author: Tyler Smith
+Date:   Thu Mar 20 16:54:35 2014 -0500
+
+    Merge https://github.com/flame/blis
+
+commit 5d5dc2eedef2f7c90d61371a1b457be5c06cf583
+Author: Tyler Smith
+Date:   Thu Mar 20 16:43:36 2014 -0500
+
+    Parallelized trmm and trmm3
+
+    Also fixed bugs in packm
+
+commit fd3e32a5f419fa412f46afe4dd1c3a26e15f3eb4
+Author: Field G. Van Zee
+Date:   Thu Mar 20 13:59:48 2014 -0500
+
+    Refined INSERT_GENTFUNC macro usage.
+
+    Details:
+    - Defined new INSERT_GENTFUNC macros so that the macro always takes
+      exactly the number of arguments needed for the particular operation or
+      variant being defined. Many operations were using INSERT_GENTFUNC
+      macros that expected one auxiliary argument even though none were
+      needed. Those instances have now been updated. Most of these instances
+      were in the level-0 and -1v operations, as well as some operations
+      defined in frame/util.
+
+commit 9b0e715f29338a1a1d6445907d2445c35f011121
+Author: Field G. Van Zee
+Date:   Wed Mar 19 15:47:54 2014 -0500
+
+    Minor simplifications to trmm, trsm macro-kernels.
+
+    Details:
+    - Simplified some code that would have allowed the diagonal of a trmm
+      or trsm triangular matrix to intersect the short end of a micro-panel.
+      This is disallowed via higher-level constraints on cache blocksizes, so
+      this code was never needed and only served to obfuscate.
+    - Updated some comments in trmm, trsm macro-kernels.
+
+commit a3902750b9ab4923433f7e353f3669c3c419f8e4
+Author: Field G. Van Zee
+Date:   Wed Mar 19 12:35:17 2014 -0500
+
+    Reorganized norm operations.
+
+    Details:
+    - Completely reorganized norm operations:
+      - Renames:
+        - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
+        - absumv -> norm1v (vector 1-norm)
+      - New operations:
+        - norm1m (matrix 1-norm)
+        - normiv, normim (infinity-norm)
+        - amaxv (BLAS-like absolute maximum value index)
+        - asumv (BLAS-like absolute sum)
+      - Deprecated absumm, as it did not correspond to any actual norm.
+        (However, an inlined version now exists in the testsuite module for
+        randm.)
+
+commit c0140cb752f27e99742f85d23be2181c00a1335e
+Author: Tyler Smith
+Date:   Wed Mar 19 11:21:16 2014 -0500
+
+    Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state
+
+    Now just performed by the master thread.
+
+commit fb42983bd9943711baa7d1c6496de1215bb816ef
+Author: Tyler Smith
+Date:   Tue Mar 18 16:37:28 2014 -0500
+
+    Fixed a barrier bug and a thread decorator bug
+
+commit aa2405f8b23d0f8d2ec04790882f2176ef2e8fd8
+Author: Tyler Smith
+Date:   Tue Mar 18 15:23:09 2014 -0500
+
+    Fixing function pointer issues with thread decorator
+
+commit ec8b88f93533942d3711191873310e7ff281bda6
+Author: Tyler Smith
+Date:   Tue Mar 18 14:35:37 2014 -0500
+
+    Enabled threading for packm blocked variants 3 and 4
+
+commit 0ac534cdf657bbf04601abfe719ba2887aab5da7
+Author: Tyler Smith
+Date:   Tue Mar 18 13:26:27 2014 -0500
+
+    Added decorator for calling parallelized internal functions
+
+    Will allow for easy support for different threading models
+
+commit 5296f58975f7d351f88909cc80b6d0cffd73def7
+Author: Tyler Smith
+Date:   Mon Mar 17 17:15:35 2014 -0500
+
+    Fixing some bugs with herk parallelization
+
+commit c51d0110831eb89361b4720bf7ed75edbd26ebce
+Author: Tyler Smith
+Date:   Mon Mar 17 15:00:47 2014 -0500
+
+    Initial multithreading support for HERK
+
+commit c720b141568d1f289146bf34ded08001f2c0dfbb
+Author: Tyler Smith
+Date:   Mon Mar 17 11:39:32 2014 -0500
+
+    Switched to using environment variables to control threading.
+
+    The environment variables all follow the format BLIS_X_NT,
+    where X is the index of the loop as described in our paper
+    Anatomy of High Performance Many-Threaded Matrix Multiplication.
+    These indices are IR, JR, IC, KC, and JC.
+
+    Also enabled parallelism for hemm and symm, but these are currently untested.
+
+commit 92233cf64274b27b2217c5cfffe75443ff6137a4
+Author: Tyler Smith
+Date:   Tue Mar 11 14:16:08 2014 -0500
+
+    Some fixes to gemm thread info tree creation,
+    Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
+    instead of BLIS_SINGLE_THREADED
+
+commit 020f80c30289d8bcaa688bf600b01fae9b23b54f
+Author: Tyler Smith
+Date:   Tue Mar 11 12:08:17 2014 -0500
+
+    Added files specific to threading for gemm and packm operations
+
+commit 8d8f4352a41926bc923e47be836365b6b726aff2
+Author: Tyler Smith
+Date:   Mon Mar 10 15:47:28 2014 -0500
+
+    Added single threaded thread info data structures specifically for gemm and packm
+
+commit 0e8677761175189583ca7d855e24b2bbdd2dada8
+Merge: 2e727a0 b3bff63
+Author: Tyler Smith
+Date:   Mon Mar 10 15:16:21 2014 -0500
+
+    Merge branch 'master' of https://github.com/tlrmchlsmth/blis
+
+commit 2e727a025a8f796d2b6bd14f489d0ee72e7d1fc7
+Author: Tyler Smith
+Date:   Mon Mar 10 15:14:33 2014 -0500
+
+    Modifying the thread info data structures
+
+    This change makes each operation have its own thread info type,
+    allowing more fine control of threading in operations that have different types of suboperations
+
+commit a770590cf21a459f04bf941c58ee2afd272cc441
+Author: Field G. Van Zee
+Date:   Mon Mar 3 14:31:44 2014 -0600
+
+    Minor fixes to sumsqv, abmaxv.
+
+    Details:
+    - Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
+      LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
+      the vector (or matrix) contains a NaN.
+    - Minor change to bli_abmaxv_unb_var1() to more closely mimic the
+      behavior of netlib BLAS's izamax(). There, a "less than or equal to"
+      operator is used in the search instead of "less than", which would
+      change the element index returned if there were multiple maximum values.
+    - Added macro function definitions for bli_isinf() and bli_isnan(), which
+      are currently implemented in terms of isinf() and isnan() from math.h.
+
+commit b3bff631eadf98b15cb422fb4a8e2f855c23e8a7
+Merge: 2c158fb e8757b0
+Author: Tyler Smith
+Date:   Thu Feb 27 16:53:24 2014 -0600
+
+    Merge https://github.com/flame/blis
+
+commit 2c158fb885c27f7b599dc1e85b57edd684f19223
+Merge: e4738c4 c2b2ab6
+Author: Tyler Smith
+Date:   Thu Feb 27 16:46:23 2014 -0600
+
+    Merge https://github.com/flame/blis
+
+    Conflicts:
+        frame/1m/packm/bli_packm_blk_var1.c
+
+commit e8757b03a74f9891632242e9a90efb32150826f5
+Author: Field G. Van Zee
+Date:   Thu Feb 27 16:40:07 2014 -0600
+
+    Use "%ld" as int format specifier in fprintm.
+
+    Details:
+    - Changed "%d" to "%ld" when printing integers via bli_fprintm().
+    - Meant to include this in previous commit.
+
+commit c663ce3b5170fee7dfb5b528b650d70c8e932cac
+Author: Field G. Van Zee
+Date:   Thu Feb 27 16:32:57 2014 -0600
+
+    Fixed various bugs when C99 complex is enabled.
+
+    Details:
+    - Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
+      elsewhere in the framework that were not yet set up to work properly
+      when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
+    - Extensive changes to f2c-derived files in frame/compat/f2c to allow
+      C99 complex storage. Most of these changes center around accessing
+      real and imaginary components via bli_?real()/bli_?imag() accessor
+      macros, and setting of values via bli_?sets() assignment macros.
+      (Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
+      was broken.)
+
+commit e4738c48e00b89391d9baa1fd0aa62d1ea2f95e6
+Author: Tyler Smith
+Date:   Thu Feb 27 16:29:46 2014 -0600
+
+    Added support for parallelism in gemm micro-kernel
+
+commit bfe214b633765ed40b57b330fbb84c332663aa40
+Author: Tyler Smith
+Date:   Thu Feb 27 15:53:10 2014 -0600
+
+    Fixed bug with parallel packing, and bug with allocating an array of thread infos
+
+    In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
+    This dependency was removed, allowing each iteration to be executed in parallel.
+
+    Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
+
+commit 6193d9ceea552e67170dba45abde04c64271c705
+Author: Tyler Smith
+Date:   Thu Feb 27 14:09:19 2014 -0600
+
+    Fixed bug in thread trees
+
+commit ac5a2de1d17ffd460b00fee9757898525a09abae
+Merge: 01b125e bd3c7ec
+Author: Tyler Smith
+Date:   Thu Feb 27 11:59:33 2014 -0600
+
+    Merge branch 'master' of https://github.com/tlrmchlsmth/blis
+
+commit 01b125e815f19410e8e0611d088b84570e499e93
+Author: Tyler Smith
+Date:   Thu Feb 27 11:55:45 2014 -0600
+
+    First pass at adding parallelism to BLIS.
+
+    Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
+    Currently, gemm blocked variants 1f and 2f and packm blocked variant 1 are parallelized.
+
+commit c2b2ab62707e4174892aff3ce65f36f54878fae5
+Author: Field G. Van Zee
+Date:   Wed Feb 26 12:46:45 2014 -0600
+
+    Deprecated panel stride alignment in bli_config.h.
+
+    Details:
+    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
+      configurations. It was already going unused in packm_init() since the
+      recent 4m/3m commit. This setting was rarely, if ever, useful, and its
+      existence only posed a potential risk for 4m/3m-based implementations.
+    - Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
+    - Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
+      micro-kernels.
+
+commit f18aee83a5ac1b14808686fc3c5a3c846a1d99b9
+Author: Field G. Van Zee
+Date:   Tue Feb 25 17:58:42 2014 -0600
+
+    CHANGELOG update (for 0.1.1).
+
+commit fde5f1fdece19881f50b142e8611b772a647e6d2 (tag: 0.1.1)
 Author: Field G. Van Zee
 Date:   Tue Feb 25 13:34:56 2014 -0600