Commit Graph

335 Commits

Author SHA1 Message Date
Marat Dukhan
4b8e71aab8 Use AR rcs flags for PNaCl target to avoid warning 2014-06-19 00:43:25 -07:00
Marat Dukhan
031deb2a5c PNaCl configuration: use pnacl-ar instead of ar (fixes build issue on Mac) 2014-06-18 03:11:34 -07:00
Marat Dukhan
68a02976e3 Compile pnacl configuration in GNU11 mode to avoid warning about non-standard features 2014-06-18 03:10:25 -07:00
Marat Dukhan
6f8462eb0e Fix inconsistent VERBOSE macro in Makefile 2014-06-18 03:08:46 -07:00
Marat Dukhan
b2ffb4de8b Reformatted PNaCl GEMM kernels 2014-06-15 18:41:30 -04:00
Marat Dukhan
6de2d472d9 CGEMM and ZGEMM kernels for PNaCl 2014-06-15 08:44:31 -04:00
Marat Dukhan
f064711a5e SGEMM and DGEMM kernels for PNaCl 2014-06-15 06:27:37 -04:00
Tyler Smith
ee2b679281 Only include omp.h if BLIS_ENABLE_OPENMP is set 2014-06-06 12:41:55 -05:00
Field G. Van Zee
19c05dfaac CHANGELOG update (for 0.1.2). 2014-06-05 10:54:16 -05:00
Tyler Smith
00f232f8ed Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi [tag: 0.1.2] 2014-06-02 13:40:57 -05:00
Field G. Van Zee
3fc60e4914 Fixed ldim alignment bug in core2 gemm ukernel.
Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
  a segmentation fault if a column-stored matrix's starting address was
  aligned, but its leading dimension was such that its second column was
  unaligned. Basically, the micro-kernel was assuming that aligned load
  instructions were safe when they actually were not. An extra condition
  that checks the alignment of cs_c (ie: the leading dimension in the
  column storage case) has now been added. Thanks to Michael Lehn for
  reporting this bug.
2014-05-21 11:34:42 -05:00
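The condition described above can be illustrated with a small C sketch. It assumes 16-byte aligned SSE loads on double-precision data. The helper name `columns_are_16b_aligned` is hypothetical, and this is not the actual micro-kernel code. Aligned loads are safe for every column only when two conditions hold: the starting address must be a multiple of 16, and so must the column stride in bytes (`cs_c * sizeof(double)`).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical check: may 16-byte aligned loads be used on every column of a
 * column-stored double-precision matrix c with column stride cs_c? */
static bool columns_are_16b_aligned(const double *c, ptrdiff_t cs_c)
{
    assert(c != NULL);
    /* The first column must start on a 16-byte boundary... */
    bool base_aligned = ((uintptr_t)c % 16u) == 0;
    /* ...and the stride between columns, in bytes, must be a multiple of 16,
     * otherwise the second (or a later) column starts unaligned. */
    bool ldim_aligned = (cs_c * (ptrdiff_t)sizeof(double)) % 16 == 0;
    return base_aligned && ldim_aligned;
}
```

For example, a buffer that starts on a 16-byte boundary with `cs_c == 7` passes the address test but fails the stride test. That is the case the extra condition was added to catch.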
Field G. Van Zee
77a2d8dac8 Merge pull request #8 from tlrmchlsmth/master
Added multithreading to most level-3 operations.
2014-05-20 09:53:19 -05:00
Tyler Smith
21fb089387 Reverting changes to dunnington and reference configs
Now they are unchanged from the main branch of BLIS
2014-05-19 20:38:55 -07:00
Tyler Smith
8a0ef0e0db Fixed rounding error in bli_get_range_weighted 2014-05-16 13:44:14 -05:00
Tyler Smith
0b4b168033 Fixed bug with disabling JC loop threading for right sided trmm 2014-05-16 12:23:37 -05:00
Tyler Smith
5c048a90d8 Disabled parallelism for right-sided TRMM JC loop
The loop has dependent iterations.
2014-05-14 16:20:06 -05:00
Tyler Smith
13a4c717ed Fixed bug with bli_get_range_weighted 2014-05-14 14:59:04 -05:00
Tyler Smith
45957cc774 Allowed threading to be turned off
No longer requires OpenMP to compile
Define the following in bli_config.h in order to enable multithreading:
BLIS_ENABLE_MULTITHREADING
BLIS_ENABLE_OPENMP

Also fixes a bug with bli_get_range_weighted
2014-05-13 17:14:46 -05:00
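Here is a minimal sketch of how the two macros above might gate OpenMP at compile time. It follows the spirit of the later commit that only includes omp.h when BLIS_ENABLE_OPENMP is set. The helper `sketch_max_threads` is hypothetical. Only the macro names and the standard OpenMP routine `omp_get_max_threads()` come from outside the sketch.

```c
#include <assert.h>

/* Settings named in the commit; uncomment in bli_config.h (or pass them as
 * -D flags) to enable OpenMP-based multithreading. */
/* #define BLIS_ENABLE_MULTITHREADING */
/* #define BLIS_ENABLE_OPENMP */

/* Include omp.h only when OpenMP is requested, so the code still compiles
 * with toolchains that lack OpenMP support. */
#ifdef BLIS_ENABLE_OPENMP
#include <omp.h>
#endif

/* Hypothetical helper: how many threads a parallel region may use. */
static int sketch_max_threads(void)
{
#if defined(BLIS_ENABLE_MULTITHREADING) && defined(BLIS_ENABLE_OPENMP)
    int n = omp_get_max_threads();
    assert(n >= 1);
    return n;
#else
    return 1; /* threading disabled: run single-threaded */
#endif
}
```

Defining both macros also requires the compiler's OpenMP flag (e.g. `-fopenmp` for GCC) at compile and link time. Without the macros, the code builds with no OpenMP dependency at all.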
Tyler Smith
bd1dc98ce5 Disabled multithreading of the kc loop 2014-05-12 17:26:19 -05:00
Tyler Smith
456df03721 Replaced register blocksize hack with querying the register blocksize for determining parallelism granularity 2014-04-30 12:28:00 -05:00
Tyler Smith
f4fdfe8fc5 Merge http://github.com/flame/blis 2014-04-30 11:46:35 -05:00
Field G. Van Zee
8c5d6071e2 Added _check() routines for fprint[mv], rand[mv].
Details:
- Added _check() routines for fprintm, fprintv, randm, and randv.
- Added invocations to the above routines from their respective
  front-ends.
2014-04-29 12:26:12 -05:00
Field G. Van Zee
262cdabcc8 Changed treatment of NULL object buffers.
Details:
- Relaxed the constraint in bli_obj_attach_buffer_check(), which required
  the buffer address being attached to be non-NULL. This is acceptable
  because the user was already able to create and use objects with NULL
  buffers (via bli_obj_create_without_buffer(), which initializes the
  buffer to NULL).
- Inserted calls to newly defined function, bli_check_object_buffer(),
  into nearly all operations' _check() or _int_check() functions. This
  allows BLIS to abort peacefully if a computational routine is called
  with an object containing a NULL buffer. By contrast, under such
  conditions, BLAS would typically fail with a segmentation fault.
- Within operation front-ends, moved the calls to _check()/_int_check()
  so that zero dimensions are checked first (and if found, execution
  returns with trivial or no computation). This resolves issue #7. Thanks
  to Jack Poulson for reporting this bug.
2014-04-28 16:48:25 -05:00
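The ordering change in the last bullet can be sketched in C as follows. The object type and function are hypothetical stand-ins, not BLIS's obj_t or its front-ends. Where BLIS aborts with an error message, this sketch returns an error code so that the behaviour is easy to test.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, stripped-down stand-in for a BLIS object. */
typedef struct {
    size_t m, n;      /* dimensions */
    double *buffer;   /* may be NULL, e.g. for an object created without a buffer */
} sketch_obj_t;

typedef enum { SKETCH_OK = 0, SKETCH_NULL_BUFFER = 1 } sketch_err_t;

/* Front-end sketch: x := alpha * x. Zero dimensions are checked first, then
 * the buffer, so an empty object with a NULL buffer is not an error. */
static sketch_err_t sketch_scalm(double alpha, sketch_obj_t *x)
{
    assert(x != NULL);
    /* Zero dimensions: nothing to compute, return before any buffer check. */
    if (x->m == 0 || x->n == 0) return SKETCH_OK;
    /* Non-empty object with no storage attached: fail cleanly, not segfault. */
    if (x->buffer == NULL) return SKETCH_NULL_BUFFER;
    for (size_t i = 0; i < x->m * x->n; ++i)
        x->buffer[i] *= alpha;
    return SKETCH_OK;
}
```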
Tyler Smith
31bb065ba4 Merge http://github.com/flame/blis 2014-04-23 12:30:19 -05:00
Field G. Van Zee
7c61959955 Can now query register blocksizes from blk algs.
Details:
- Added a new field to blksz_t objects that allows one to attach a
  sub-object. Doing this allows us to associate a register blocksize with
  any given cache blocksize. That way, the register blocksize can be
  queried wherever the cache blocksize would normally be accessible
  (e.g. a blocked algorithm).
- Modified bli_gemm_cntl.c (and 4m/3m variants) so that the register
  blocksizes are attached to the cache blocksizes after they are created.
2014-04-10 17:18:36 -05:00
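Below is a simplified C sketch of the attachment idea, assuming one blocksize value per datatype. The struct, enum, and function names are hypothetical and do not mirror blksz_t's actual fields. The blocksize values used with it are illustrative only.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical datatype index: single, double, scomplex, dcomplex. */
typedef enum { DT_S = 0, DT_D, DT_C, DT_Z, DT_NUM } sketch_dt_t;

/* Simplified blocksize object with an optional attached sub-object, e.g. the
 * register blocksize (MR) that belongs to a cache blocksize (MC). */
typedef struct sketch_blksz {
    int v[DT_NUM];              /* blocksize for each datatype */
    struct sketch_blksz *sub;   /* attached register blocksize, or NULL */
} sketch_blksz_t;

/* Attach a register blocksize to a cache blocksize after both exist. */
static void sketch_blksz_attach(sketch_blksz_t *cache, sketch_blksz_t *reg)
{
    assert(cache != NULL);
    cache->sub = reg;
}

/* Query the register blocksize from code that only holds the cache
 * blocksize, such as a blocked algorithm. */
static int sketch_blksz_reg_for(const sketch_blksz_t *cache, sketch_dt_t dt)
{
    assert(cache != NULL && cache->sub != NULL);
    return cache->sub->v[dt];
}
```

For instance, attaching an MR object that holds 4 for double precision to an MC object lets `sketch_blksz_reg_for(&mc, DT_D)` return 4. The blocked algorithm never has to handle the MR object directly.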
Field G. Van Zee
58671597d3 Minor cleanups to level-2 _cntl.c files.
Details:
- Changed level-2 _cntl.c files so that the blocksizes for gemv are
  imported and used, rather than blocksizes being declared locally.
- Whitespace changes to gemv_cntl.c and gemm_cntl.c files (as well as
  4m/3m variants).
- Removed test/old/test_blis2.c.
2014-04-10 15:35:30 -05:00
Tyler Michael Smith
20e24430a7 Some fixes for the bgq kernels 2014-04-08 17:50:44 +00:00
Tyler Smith
bde697f75e Add -openmp to ldflags as well 2014-04-04 16:43:44 -05:00
Tyler Smith
c332be8cd4 Added -openmp flag to Xeon Phi build for convenience 2014-04-04 16:37:50 -05:00
Tyler Smith
e7ca9e4b4a Used BLIS_DEFAULT_*_MR for rounding partitioning instead of BLIS_DEFAULT_*_MC 2014-04-04 16:31:15 -05:00
Tyler Smith
7b9b228c6f Fix for tree barrier freeing bug 2014-04-04 16:29:10 -05:00
Tyler Smith
5ec93bd9a7 Bunch of minor fixes
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
2014-04-04 15:09:10 -05:00
Tyler Smith
575fb9b0b0 Changed default blocking factor to default double precision MR and NR 2014-04-04 12:13:29 -05:00
Tyler Smith
ab9c788033 Added faster tree barriers necessary for performance for Xeon Phi
Fixed up some stuff in the thread info free functions
Disabled threading for TRSM so that it actually works when threading environment variables are set
2014-04-04 11:38:11 -05:00
Tyler Smith
ec58a7923c Freeing thread info paths.
Also made herk IC and JC loops do weighted partitioning
2014-04-04 10:22:48 -05:00
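This hypothetical C sketch shows what weighted partitioning can mean for herk, where the stored triangle makes later rows longer than earlier ones. It assumes a lower-triangular region in which row i holds i + 1 elements. It rounds interior boundaries to a multiple of a blocking factor such as MR, in line with the commits on MR-based rounding. It is not bli_get_range_weighted() itself, whose exact formula is not given in these commits.

```c
#include <assert.h>

/* Hypothetical weighted partitioning of the m rows of a lower-triangular
 * region among nt threads: thread t gets rows [*start, *end). Boundaries are
 * chosen so each thread owns roughly the same number of elements, and
 * interior boundaries are rounded to a multiple of bf. */
static void sketch_range_weighted(long m, long nt, long t, long bf,
                                  long *start, long *end)
{
    assert(m >= 0 && nt > 0 && t >= 0 && t < nt && bf > 0);
    long total = m * (m + 1) / 2;   /* elements in rows 0..m-1 */
    long bound[2];
    for (int k = 0; k < 2; ++k) {
        long tk = t + k;            /* boundary index: t (start) or t+1 (end) */
        if (tk == 0)  { bound[k] = 0; continue; }
        if (tk == nt) { bound[k] = m; continue; }
        long target = total * tk / nt;
        long r = 0;
        /* smallest r whose first r rows hold at least target elements */
        while (r * (r + 1) / 2 < target)
            ++r;
        r = ((r + bf / 2) / bf) * bf;   /* round half up to a multiple of bf */
        bound[k] = r < m ? r : m;
    }
    *start = bound[0];
    *end = bound[1];
}
```

With m = 100, four threads, and bf = 4, the boundaries come out as 0, 52, 72, 88, 100. The first thread gets 52 short rows (1378 elements) and the last gets 12 long ones (1134 elements), instead of 25 rows each.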
Tyler Smith
2b6848b239 Merge http://github.com/flame/blis
Conflicts:
	kernels/bgq/1/bli_axpyv_opt_var1.c
	kernels/bgq/1/bli_dotv_opt_var1.c
2014-04-04 09:54:54 -05:00
Tyler Michael Smith
4e3eb39aca Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
2014-04-04 14:50:03 +00:00
Field G. Van Zee
21a0efb33d Fixed follow-up to issue #6. 2014-04-03 16:38:44 -05:00
Field G. Van Zee
c318157a9b Fixed issue #6 (incorrect 'restrict' usage).
Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
  (However, there may be other instances of similar misuse elsewhere in
  BLIS.) Thanks to Jeff Hammond for reporting this issue.
2014-04-03 16:24:34 -05:00
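The commit does not spell out the misuse. The following is therefore only a generic C99 illustration of what the restrict qualifier promises. It uses a hypothetical axpyv-style routine rather than the bgq kernels.

```c
#include <assert.h>
#include <stddef.h>

/* y := y + alpha * x for n-element vectors.
 * Qualifying x and y with restrict tells the compiler that, inside this
 * function, the elements of y are not also reached through x. That promise
 * holds only if callers never pass overlapping vectors; if they do, the
 * behaviour is undefined. */
static void sketch_daxpyv(size_t n, double alpha,
                          const double *restrict x, double *restrict y)
{
    assert(n == 0 || (x != NULL && y != NULL));
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```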
Field G. Van Zee
b5150a1bf3 Added #include "arm_neon.h" to ARM gemm ukernel.
Details:
- Inserted #include "arm_neon.h" into gemm ukernel source file for
  arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.
2014-04-03 12:25:45 -05:00
Tyler Smith
2041c26451 Added barriers needed prior to doing scalar reset for rank-k updates. 2014-04-03 10:30:03 -05:00
Field G. Van Zee
47a90e69df Attempted to fix uninitialized variable warnings.
Details:
- Added initialization statements to various macros used in level 1m and
  1m-like operations. I wasn't able to reproduce the reported behavior,
  so hopefully this takes care of it. Thanks to Jeff Hammond for the
  report.
2014-04-01 14:34:31 -05:00
Field G. Van Zee
d27b4f690c Use generic paths for toolchain in POWER7.
Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
2014-04-01 12:57:24 -05:00
Tyler Smith
1584ae1c83 Fixed race condition involving scalar reset 2014-03-28 15:15:48 -05:00
Tyler Smith
459dde4acc Made barrier after packing implicit.
This also fixed a bug where barriers in the blocked variants were inserted after the inner packing routines,
but not after the outer packing routines.
This allowed, for instance, computation to begin before the block of B had finished being packed.
2014-03-27 17:06:45 -05:00
Tyler Smith
9f78ec6e7e Some fixes for the internal functions,
which were inappropriately having only the thread chief do some things.
2014-03-27 14:18:46 -05:00
Tyler Michael Smith
a6fd483454 Added test drivers for level 3 BLAS that run tests in parallel using MPI 2014-03-26 17:19:46 +00:00
Tyler Michael Smith
73b3db5948 Some fixes for the bgq configuration 2014-03-26 15:39:05 +00:00
Tyler Smith
f0824a04fc Initial commit to enable threading in TRSM,
Also enabled weighted partitioning for herk, trmm
Fixed bug where multiple threads would try to modify the same state in the internal level 3 functions
Correctly computed a_next and b_next for gemm, herk macrokernels
a_next and b_next point to the current micropanels in trmm
2014-03-24 15:21:42 -05:00
Tyler Smith
23d9eab354 Merge https://github.com/flame/blis 2014-03-20 16:54:35 -05:00