223 Commits

Author SHA1 Message Date
Tyler Smith
5c048a90d8 Disabled parallelism for right-sided TRMM JC loop
The loop has dependent iterations.
2014-05-14 16:20:06 -05:00
Tyler Smith
13a4c717ed Fixed bug with bli_get_range_weighted 2014-05-14 14:59:04 -05:00
Tyler Smith
45957cc774 Allowed threading to be turned off
No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP

Also fixes a bug with bli_get_range_weighted
2014-05-13 17:14:46 -05:00
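The two defines named in the commit above can be sketched as a configuration fragment. The helper `threading_enabled_sketch()` is hypothetical and exists only to show how code might test for the macros at compile time; it is not part of BLIS.

```c
/* Fragment in the style of bli_config.h: both macros are taken from the
   commit message above. */
#define BLIS_ENABLE_MULTITHREADING
#define BLIS_ENABLE_OPENMP

/* Hypothetical helper (not a BLIS function): reports at run time whether
   both macros were defined at compile time. */
static int threading_enabled_sketch(void)
{
#if defined(BLIS_ENABLE_MULTITHREADING) && defined(BLIS_ENABLE_OPENMP)
    return 1;
#else
    return 0;
#endif
}
```

Leaving both macros undefined would make the helper return 0, which corresponds to the single-threaded build that no longer requires OpenMP.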
Tyler Smith
bd1dc98ce5 Disabled multithreading of the kc loop 2014-05-12 17:26:19 -05:00
Tyler Smith
456df03721 Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity 2014-04-30 12:28:00 -05:00
Tyler Smith
f4fdfe8fc5 Merge http://github.com/flame/blis 2014-04-30 11:46:35 -05:00
Field G. Van Zee
8c5d6071e2 Added _check() routines for fprint[mv], rand[mv].
Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
  front-ends.
2014-04-29 12:26:12 -05:00
Field G. Van Zee
262cdabcc8 Changed treatment of NULL object buffers.
Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
  the buffer address being attached to be non-NULL. This is acceptable
  because the user was already able to create and use objects with NULL
  buffers (via bli_obj_create_without_buffer(), which initializes the
  buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
  into nearly all operations' _check() or _int_check() functions. This
  allows BLIS to abort peacefully if a computational routine is called
  with an object containing a NULL buffer. By contrast, under such
  conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
  so that zero dimensions are checked first (and if found, execution
  returns with trivial or no computation). This resolves issue #7. Thanks
  to Jack Poulson for reporting this bug.
2014-04-28 16:48:25 -05:00
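The ordering described in the commit above (zero dimensions first, then the buffer check) might look roughly like the following. This is a minimal sketch: `obj_sketch_t`, `scal_sketch()`, and the return codes are hypothetical stand-ins, and the sketch returns an error code where the commit describes BLIS aborting.

```c
#include <stddef.h>

/* Hypothetical, simplified stand-in for a BLIS object. */
typedef struct {
    size_t  m, n;   /* dimensions */
    double *buf;    /* may be NULL, as with an object created without a buffer */
} obj_sketch_t;

enum { SKETCH_OK = 0, SKETCH_NOOP = 1, SKETCH_ERR_NULL_BUFFER = -1 };

/* Hypothetical front-end for a scaling operation. */
static int scal_sketch(double alpha, obj_sketch_t *x)
{
    /* 1. Zero dimensions are checked first: nothing to compute, so a NULL
          buffer is harmless here. */
    if (x->m == 0 || x->n == 0) return SKETCH_NOOP;

    /* 2. Only then is the buffer validated, so a NULL buffer is reported
          cleanly instead of being dereferenced. */
    if (x->buf == NULL) return SKETCH_ERR_NULL_BUFFER;

    for (size_t i = 0; i < x->m * x->n; ++i) x->buf[i] *= alpha;
    return SKETCH_OK;
}
```

With this order, an empty object that has no buffer returns early without triggering the NULL-buffer error.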
Tyler Smith
31bb065ba4 Merge http://github.com/flame/blis 2014-04-23 12:30:19 -05:00
Field G. Van Zee
7c61959955 Can now query register blocksizes from blk algs.
Details:
- Added a new field to blksz_t objects that allows one to attach a
  sub-object. Doing this allows us to associate a register blocksize with
  any given cache blocksize. That way, the register blocksize can be
  queried wherever the cache blocksize would normally be accessible
  (e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
  blocksizes are attached to the cache blocksizes after they are created.
2014-04-10 17:18:36 -05:00
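The sub-object idea in the commit above can be illustrated with a small sketch. The type `blksz_sketch_t` and both functions are hypothetical; the real `blksz_t` layout is not shown in this log and is likely more involved (for example, one value per datatype).

```c
#include <stddef.h>

/* Hypothetical, simplified blocksize object with an optional sub-object. */
typedef struct blksz_sketch {
    int                  v;    /* the blocksize value itself */
    struct blksz_sketch *sub;  /* attached sub-object, or NULL if none */
} blksz_sketch_t;

/* Attach a register blocksize to a cache blocksize. */
static void blksz_attach_sub_sketch(blksz_sketch_t *cache, blksz_sketch_t *reg)
{
    cache->sub = reg;
}

/* Query the register blocksize through the cache blocksize, falling back
   to a caller-supplied value when nothing is attached. */
static int blksz_query_sub_sketch(const blksz_sketch_t *cache, int fallback)
{
    return cache->sub != NULL ? cache->sub->v : fallback;
}
```

A blocked algorithm that only receives the cache blocksize could then recover the register blocksize through the attached pointer.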
Field G. Van Zee
58671597d3 Minor cleanups to level-2 _cntl.c files.
Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
  imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
  4m/3m variants).
- Removed test/old/test_blis2.c.
2014-04-10 15:35:30 -05:00
Tyler Smith
e7ca9e4b4a Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC 2014-04-04 16:31:15 -05:00
Tyler Smith
7b9b228c6f Fix for tree barrier freeing bug 2014-04-04 16:29:10 -05:00
Tyler Smith
5ec93bd9a7 Bunch of minor fixes
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
2014-04-04 15:09:10 -05:00
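Passing a blocking factor such as MR or NR into the range computation, as the commit above describes, keeps each thread's sub-range aligned to whole register blocks. A minimal sketch of that idea follows; `get_range_sketch()` is a hypothetical function, and the actual signature and logic of `bli_get_range` are not shown in this log.

```c
/* Hypothetical sketch: split [0, n) among nt threads so that every
   boundary except possibly the final end is a multiple of bf. */
static void get_range_sketch(long tid, long nt, long n, long bf,
                             long *start, long *end)
{
    long n_blocks = (n + bf - 1) / bf;       /* last block may be partial */
    long lo = (n_blocks * tid) / nt;         /* first block owned by tid */
    long hi = (n_blocks * (tid + 1)) / nt;   /* one past the last block */

    *start = lo * bf;
    *end   = (hi * bf < n) ? hi * bf : n;    /* clamp the final partial block */
}
```

For example, with n = 100, bf = 8, and four threads, the ranges come out as [0, 24), [24, 48), [48, 72), and [72, 100).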
Tyler Smith
575fb9b0b0 Changed default blocking factor to default double precision MR and NR 2014-04-04 12:13:29 -05:00
Tyler Smith
ab9c788033 Added faster tree barriers necessary for performance for Xeon Phi
Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set
2014-04-04 11:38:11 -05:00
Tyler Smith
ec58a7923c Freeing thread info paths.
Also made herk IC and JC loops do weighted partitioning
2014-04-04 10:22:48 -05:00
Tyler Smith
2b6848b239 Merge http://github.com/flame/blis
Conflicts:
	kernels/bgq/1/bli_axpyv_opt_var1.c
	kernels/bgq/1/bli_dotv_opt_var1.c
2014-04-04 09:54:54 -05:00
Tyler Smith
2041c26451 Added barriers needed prior to doing scalar reset for rank-k updates. 2014-04-03 10:30:03 -05:00
Field G. Van Zee
47a90e69df Attempted to fix uninitialized variable warnings.
Details:
- Added initialization statements to various macros used in level 1m and
  1m-like operations. I wasn't able to reproduce the reported behavior,
  so hopefully this takes care of it. Thanks to Jeff Hammond for the
  report.
2014-04-01 14:34:31 -05:00
Tyler Smith
1584ae1c83 Fixed race condition involving scalar reset 2014-03-28 15:15:48 -05:00
Tyler Smith
459dde4acc Made barrier after packing implicit.
This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not the outer packing routines.
This allowed computation to begin before, for instance, the block of B had finished being packed.
2014-03-27 17:06:45 -05:00
Tyler Smith
9f78ec6e7e Some fixes for the internal functions,
was inappropriately only having thread chief do some things.
2014-03-27 14:18:46 -05:00
Tyler Smith
f0824a04fc Initial commit to enable threading in TRSM,
Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm
2014-03-24 15:21:42 -05:00
Tyler Smith
23d9eab354 Merge https://github.com/flame/blis 2014-03-20 16:54:35 -05:00
Tyler Smith
5d5dc2eede Parallelized trmm and trmm3
Also fixed bugs in packm
2014-03-20 16:43:36 -05:00
Field G. Van Zee
fd3e32a5f4 Refined INSERT_GENTFUNC macro usage.
Details:
- Defined new INSERT_GENTFUNC macros so that the macro always takes
  exactly the number of arguments needed for the particular operation or
  variant being defined. Many operations were using INSERT_GENTFUNC
  macros that expected one auxiliary argument even though none were
  needed. Those instances have now been updated. Most of these instances
  were in the level-0 and -1v operations, as well as some operations
  defined in frame/util.
2014-03-20 13:59:48 -05:00
Field G. Van Zee
9b0e715f29 Minor simplifications to trmm, trsm macro-kernels.
Details:
- Simplified some code that would have allowed the diagonal of a trmm
  or trsm triangular matrix to intersect the short end of a micro-panel.
  This is disallowed via higher-level constraints on cache blocksizes, so
  this code was never needed and only served to obfuscate.
- Updated some comments in trmm, trsm macro-kernels.
2014-03-19 15:47:54 -05:00
Field G. Van Zee
a3902750b9 Reorganized norm operations.
Details:
- Completely reorganized norm operations:
  - Renames:
    - fnormsc, fnormv, fnormm -> normfsc, normfv, normfm (2-norm)
    - absumv -> norm1v (vector 1-norm)
  - New operations:
    - norm1m (matrix 1-norm)
    - normiv, normim (infinity-norm)
    - amaxv (BLAS-like absolute maximum value index)
    - asumv (BLAS-like absolute sum)
- Deprecated absumm, as it did not correspond to any actual norm.
  (However, an inlined version now exists in the testsuite module for
  randm.)
2014-03-19 12:35:17 -05:00
Tyler Smith
c0140cb752 Fixed packm variants 3 and 4 where every thread was trying to manipulate the same state
Now just performed by the master thread.
2014-03-19 11:21:16 -05:00
Tyler Smith
fb42983bd9 Fixed a barrier bug and a thread decorator bug 2014-03-18 16:37:28 -05:00
Tyler Smith
aa2405f8b2 Fixing function pointer issues with thread decorator 2014-03-18 15:23:09 -05:00
Tyler Smith
ec8b88f935 Enabled threading for packm blocked variants 3 and 4 2014-03-18 14:35:37 -05:00
Tyler Smith
0ac534cdf6 Added decorator for calling parallelized internal functions
Will allow for easy support for different threading models
2014-03-18 13:26:27 -05:00
Tyler Smith
5296f58975 Fixing some bugs with herk parallelization 2014-03-17 17:15:35 -05:00
Tyler Smith
c51d011083 Initial multithreading support for HERK 2014-03-17 15:00:47 -05:00
Tyler Smith
c720b14156 Switched to using environment variables to control threading.
The environment variables all follow the format BLIS_X_NT,
where X is the index of the loop as described in our paper
Anatomy of High Performance Many-Threaded Matrix Multiplication.
These indices are IR, JR, IC, KC, and JC.

Also enabled parallelism for hemm and symm, but these are currently untested.
2014-03-17 11:39:32 -05:00
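The BLIS_X_NT naming scheme from the commit above can be sketched as follows. The variable names follow the format given in the message; the parsing helper, the default of 1 for unset or invalid values, and the assumption that the total thread count is the product of the per-loop counts are illustrative and not taken from the BLIS source.

```c
#include <stdlib.h>

/* Hypothetical helper: parse a per-loop thread count, defaulting to 1
   when the string is missing, malformed, or less than 1. */
static long parse_nt_sketch(const char *s)
{
    char *end;
    long  v;

    if (s == NULL) return 1;
    v = strtol(s, &end, 10);
    if (end == s || *end != '\0' || v < 1) return 1;
    return v;
}

/* Hypothetical helper: multiply the per-loop counts read from the
   environment (assumption: total threads = product over all five loops). */
static long total_threads_sketch(void)
{
    static const char *names[] = {
        "BLIS_JC_NT", "BLIS_KC_NT", "BLIS_IC_NT", "BLIS_JR_NT", "BLIS_IR_NT"
    };
    long total = 1;

    for (size_t i = 0; i < sizeof names / sizeof names[0]; ++i)
        total *= parse_nt_sketch(getenv(names[i]));
    return total;
}
```

Under these assumptions, setting BLIS_JC_NT=2 and BLIS_IC_NT=4 in the shell and leaving the rest unset would give eight threads in total.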
Tyler Smith
92233cf642 Some fixes to gemm thread info tree creation,
Changed microkernel tests to use the new BLIS_PACKM_SINGLE_THREADED
instead of BLIS_SINGLE_THREADED
2014-03-11 14:16:08 -05:00
Tyler Smith
020f80c302 Added files specific to threading for gemm and packm operations 2014-03-11 12:08:17 -05:00
Tyler Smith
8d8f4352a4 Added single threaded thread info data structures specifically for gemm and packm 2014-03-10 15:47:28 -05:00
Tyler Smith
0e86777611 Merge branch 'master' of https://github.com/tlrmchlsmth/blis 2014-03-10 15:16:21 -05:00
Tyler Smith
2e727a025a Modifying the thread info data structures
This change makes each operation have its own thread info type,
allowing more fine control of threading in operations that have different types of suboperations
2014-03-10 15:14:33 -05:00
Field G. Van Zee
a770590cf2 Minor fixes to sumsqv, abmaxv.
Details:
- Minor update to bli_sumsqv_unb_var1() to bring it up-to-date with
  LAPACK 3.5.0's zlassq.f, which, starting with 3.4.2, returns NaN when
  the vector (or matrix) contains a NaN.
- Minor change to bli_abmaxv_unb_var1() to more closely mimic the
  behavior of netlib BLAS's izamax(). There, a "less than or equal to"
  operator is used in the search instead of "less than", which would
  change the element index returned if there were multiple maximum values.
- Added macro function definitions for bli_isinf() and bli_isnan(), which
  are currently implemented in terms of isinf() and isnan() from math.h.
2014-03-05 09:23:46 -06:00
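The effect of the comparison operator on the returned index, mentioned in the abmaxv item above, shows up only when the maximum absolute value occurs more than once. The sketch below uses real-valued input and hypothetical function names; it illustrates the tie-breaking behavior only and is not the BLIS or netlib code.

```c
#include <stddef.h>

static double abs_sketch(double x) { return x < 0.0 ? -x : x; }

/* Skips an element when |x[i]| <= current max, so only a strictly larger
   value replaces the maximum: the FIRST occurrence wins. Assumes n >= 1. */
static size_t amax_first_sketch(const double *x, size_t n)
{
    size_t idx = 0;
    double max = abs_sketch(x[0]);

    for (size_t i = 1; i < n; ++i) {
        double a = abs_sketch(x[i]);
        if (a <= max) continue;
        max = a;
        idx = i;
    }
    return idx;
}

/* Skips an element only when |x[i]| < current max, so an equal value also
   replaces the maximum: the LAST occurrence wins. Assumes n >= 1. */
static size_t amax_last_sketch(const double *x, size_t n)
{
    size_t idx = 0;
    double max = abs_sketch(x[0]);

    for (size_t i = 1; i < n; ++i) {
        double a = abs_sketch(x[i]);
        if (a < max) continue;
        max = a;
        idx = i;
    }
    return idx;
}
```

For the input {1, -3, 2, 3}, the first variant returns index 1 and the second returns index 3, even though both find the same maximum magnitude.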
Tyler Smith
b3bff631ea Merge https://github.com/flame/blis 2014-02-27 16:53:24 -06:00
Tyler Smith
2c158fb885 Merge https://github.com/flame/blis
Conflicts:
	frame/1m/packm/bli_packm_blk_var1.c
2014-02-27 16:46:23 -06:00
Field G. Van Zee
e8757b03a7 Use "%ld" as int format specifier in fprintm.
Details:
- Changed "%d" to "%ld" when printing integers via bli_fprintm().
- Meant to include this in previous commit.
2014-02-27 16:40:07 -06:00
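The format-specifier change above matters when the integer being printed is wider than `int`. A minimal sketch, assuming the value is a `long` (the integer type actually used by `bli_fprintm()` is not shown in this log), with a hypothetical helper name:

```c
#include <stdio.h>

/* Hypothetical helper: format a long with the matching "%ld" specifier.
   Passing a long to "%d" is undefined behavior in C, which is why the
   specifier has to follow the argument type. */
static int format_long_sketch(char *buf, size_t len, long value)
{
    return snprintf(buf, len, "%ld", value);
}
```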
Field G. Van Zee
c663ce3b51 Fixed various bugs when C99 complex is enabled.
Details:
- Fixed various bugs in packm_*_cxk(), the 4m/3m micro-kernels, and
  elsewhere in the framework that were not yet set up to work properly
  when BLIS_ENABLE_C99_COMPLEX is defined in bli_config.h
- Extensive changes to f2c-derived files in frame/compat/f2c to allow
  C99 complex storage. Most of these changes center around accessing
  real and imaginary components via bli_?real()/bli_?imag() accessor
  macros, and setting of values via bli_?sets() assignment macros.
  (Thanks to Vladimir Sukarev for pointing out that _ENABLE_C99_COMPLEX
  was broken.)
2014-02-27 16:32:57 -06:00
Tyler Smith
e4738c48e0 Added support for parallelism in gemm micro-kernel 2014-02-27 16:29:46 -06:00
Tyler Smith
bfe214b633 Fixed bug with parallel packing, and bug with allocating an array of thread infos
In packm variant 1, the variable p_begin was incremented each iteration, causing a dependency.
This dependency was removed, allowing each iteration to be executed in parallel.

Somewhere in bli_threading.c, I was allocating an array of pointers instead of an array of structs.
2014-02-27 15:53:10 -06:00
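The loop-carried dependency on p_begin described in the commit above can be illustrated with a sketch. Both functions are hypothetical and assume a fixed panel size; the actual packm variant 1 code is not shown in this log.

```c
#include <stddef.h>

/* Dependent form: p_begin is carried from one iteration to the next, so
   iteration i cannot start until iteration i-1 has updated it. */
static void panel_offsets_carried(size_t n_iter, size_t panel_size,
                                  size_t *offsets)
{
    size_t p_begin = 0;

    for (size_t i = 0; i < n_iter; ++i) {
        offsets[i] = p_begin;
        p_begin += panel_size;   /* loop-carried dependency */
    }
}

/* Independent form: each iteration derives its offset from its own index,
   so iterations can run in any order, including in parallel. */
static void panel_offsets_independent(size_t n_iter, size_t panel_size,
                                      size_t *offsets)
{
    for (size_t i = 0; i < n_iter; ++i)
        offsets[i] = i * panel_size;
}
```

Both forms produce the same offsets; only the second lets the loop body be handed to separate threads without synchronization on a shared counter.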
Tyler Smith
6193d9ceea Fixed bug in thread trees 2014-02-27 14:09:19 -06:00