406 Commits

Author SHA1 Message Date
Field G. Van Zee
4cc2b464f2 Reorganized packm ukernels.
Details:
- Previously, packm micro-kernels were organized by the implied register
  blocksize (panel dimension) assumed by the kernel, meaning conventional,
  ri, and ri3 variations of some micro-kernel size were housed in the same
  file. This commit reorganizes the micro-kernels so that all sizes reside
  in the same file for each format type (conventional, ri, and ri3).
2014-08-15 11:49:15 -05:00
Field G. Van Zee
fcc10054a1 Tweaks to gemm4m, gemm3m virtual ukernels.
Details:
- Fixed a potential, but as-yet unobserved, bug in gemm3m that would
  allow undesirable inf/NaN propagation, since C was being scaled by
  beta even if it was equal to zero.
- In gemm3m micro-kernel, we now avoid copying C to the temporary
  micro-tile if beta is zero.
- Rearranged computation in gemm4m so that the temporary C micro-tile
  is accessed less, and C is accessed only after the micro-kernel
  calls. This improves performance marginally in most situations.
- Comment updates to both gemm4m and gemm3m micro-kernels.
2014-08-13 12:32:06 -05:00
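The first item in fcc10054a1 rests on IEEE 754 arithmetic: 0 * NaN is NaN, so scaling C by a zero beta keeps any inf/NaN already stored in C alive. A minimal sketch of the guard, assuming a hypothetical helper `ex_update_c` (this is not the gemm3m micro-kernel code):

```c
#include <stddef.h>

/* Hypothetical helper, not BLIS code: computes c[i] := beta*c[i] + ab[i].
   When beta is zero, c is overwritten rather than scaled, so an inf/NaN
   already present in c cannot propagate into the result. */
void ex_update_c(size_t n, double beta, double *c, const double *ab)
{
    for (size_t i = 0; i < n; ++i) {
        if (beta == 0.0)
            c[i] = ab[i];                 /* c is never read */
        else
            c[i] = beta * c[i] + ab[i];
    }
}
```

With this guard, calling `ex_update_c` with `beta == 0.0` on a buffer that holds NaN yields exactly the values in `ab`.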
Field G. Van Zee
cdcbacc2fa Removed redundant redef of packm ukr prototypes.
Details:
- Removed redundant macro code that redefined packm ukernel prototypes
  when the previous macro was already sufficient. This helps de-clutter
  the packm ukernel prototyping headers a little bit.
2014-08-12 12:45:38 -05:00
Field G. Van Zee
82dac98d90 Relocated packm ukernel #includes.
Details:
- Consolidated the #include statements for packm ukernel headers from
  bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
  bli_packm.h.
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.
2014-08-12 12:36:25 -05:00
Field G. Van Zee
7f77856e25 Removed unused 4m/3m-related packm macro defs.
Details:
- Removed unused and unneeded s- and d-flavored macro definitions for
  packm ukernels related to the complex 4m and 3m methods, as
  implemented in BLIS.
2014-08-12 12:20:15 -05:00
Field G. Van Zee
bc1d86b2d4 Sandy Bridge configuration, micro-kernel update.
Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
  configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
  bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
  gemm micro-kernels for single- and double-precision real.
2014-08-07 19:01:20 -05:00
Field G. Van Zee
98ec95877a Corrected comment for _obj_is_[row|col]_stored().
Details:
- Fixed a mistake in the comments introduced in the previous commit for
  bli_obj_is_row_stored() and bli_obj_is_col_stored().
2014-08-07 18:28:32 -05:00
Field G. Van Zee
43d5e419e1 Reverted _obj_is_[row|col]_stored() macros.
Details:
- Rolled back recent changes to bli_obj_is_row_stored() and
  bli_obj_is_col_stored() so that those macros now only inspect the
  strides (row or column). It turns out that the more sophisticated
  definitions introduced in a51e32e are not necessary, because these
  "obj" macros are virtually never used on packed matrices, and when
  they are, they can use bli_obj_is_[row|col]_packed() macros, which
  inspect the info bitfield.
2014-08-07 18:20:40 -05:00
Field G. Van Zee
45692e3ad4 Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
  previous commit (9526ce98). Thanks to Tony Kelman for pointing
  this out. (Note: a few select changes were not reverted.)
2014-08-07 13:21:15 -05:00
Field G. Van Zee
9526ce9881 Updated copyright headers of emscripten configuration files. 2014-08-06 14:15:34 -05:00
Field G. Van Zee
30833ed71d Minor edits to configurations' make_defs.mk files.
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
  is defined first and then the other two are defined in terms of
  CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
  little easier to read.
2014-08-06 12:12:03 -05:00
Field G. Van Zee
9d61afeae2 CHANGELOG update (0.1.5) 2014-08-04 16:01:59 -05:00
Field G. Van Zee
bde56d0ecf Version file update (0.1.5) 0.1.5 2014-08-04 16:01:58 -05:00
Field G. Van Zee
4c6ceea4be Added CBLAS compatibility layer.
Details:
- Added a new section in bli_config.h files of all configurations for
  enabling CBLAS support. (Currently, the default is for the CBLAS layer
  to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
  subdirectory 'f77_sub' holds subroutine wrappers corresponding to
  subroutines found in CBLAS that allow calling some BLAS routines with
  the return value passed as the last argument rather than as an actual
  (function) return value. This was probably intended to allow CBLAS to
  avoid the whole f2c debacle altogether. However, since BLIS does not
  assume the presence of a Fortran compiler, we had to provide similar
  routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
  integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
  generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
2014-08-04 15:49:59 -05:00
Field G. Van Zee
caab62dac0 Merge pull request #19 from kevinoid/fix-install-perms-error
Fix permissions error installing to non-owned directory
2014-08-03 14:36:18 -05:00
Kevin Locke
db97ce979b Fix permissions error installing to non-owned directory
When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:

Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1

In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.

Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
2014-08-03 12:48:04 -06:00
Field G. Van Zee
383631b514 Redefined bit field macros with bitshift operator.
Details:
- Redefined many of the macros that define bit fields and bit values in
  the obj_t info field using the bitshift operator (<<). This makes it
  easier to reorder bit fields, or expand existing bit fields, or add
  new fields. The bitshifting should be evaluated by the compiler at
  compile-time.
2014-07-31 14:51:48 -05:00
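The approach described in 383631b514 can be illustrated with a small sketch. The field names, widths, and positions below are assumptions chosen for illustration; they are not the actual obj_t info layout. Each field's position is derived from the one before it, so widening or reordering a field only requires changing one definition.

```c
#include <stdint.h>

/* Hypothetical layout: names, widths, and order are illustrative only. */
#define EX_DATATYPE_SHIFT  0
#define EX_DATATYPE_NBITS  3
#define EX_TRANS_SHIFT     (EX_DATATYPE_SHIFT + EX_DATATYPE_NBITS)
#define EX_TRANS_NBITS     1
#define EX_CONJ_SHIFT      (EX_TRANS_SHIFT + EX_TRANS_NBITS)

/* Masks and bit values are built with <<, so the compiler folds them
   into constants at compile time. */
#define EX_FIELD_MASK(shift, nbits) \
    (((UINT32_C(1) << (nbits)) - 1) << (shift))
#define EX_DATATYPE_BITS  EX_FIELD_MASK(EX_DATATYPE_SHIFT, EX_DATATYPE_NBITS)
#define EX_TRANS_BIT      (UINT32_C(1) << EX_TRANS_SHIFT)
#define EX_CONJ_BIT       (UINT32_C(1) << EX_CONJ_SHIFT)

/* Extract the datatype subfield from an info word. */
uint32_t ex_get_datatype(uint32_t info)
{
    return (info & EX_DATATYPE_BITS) >> EX_DATATYPE_SHIFT;
}

/* Return info with its datatype subfield replaced by dt. */
uint32_t ex_set_datatype(uint32_t info, uint32_t dt)
{
    return (info & ~EX_DATATYPE_BITS)
         | ((dt << EX_DATATYPE_SHIFT) & EX_DATATYPE_BITS);
}
```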
Field G. Van Zee
137143345d Reimplemented unit blocksize fix in prev commit.
Details:
- Instead of inferring the storage format of the micro-panels from within
  the packm variants, we now pass in a bool_t value that denotes whether
  the packed matrix contains row-stored column panels or column-stored
  row panels. This value can then be tested more easily inside the main
  packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
  now five bits, each with different meaning:
  - 4: packed or not packed?
  - 3: packed for 3m?
  - 2: packed for 4m?
  - 1: packed to panels?
  - 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
  subfield, and renamed some existing macros related to 4m/3m.
2014-07-31 12:12:45 -05:00
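The five-bit schema listed in 137143345d could be tested with macros and predicates along the following lines. The bit positions follow the list in the commit message; the names and the polarity of bit 0 (set meaning "columns") are assumptions for illustration, not the definitions in bli_type_defs.h.

```c
#include <stdbool.h>
#include <stdint.h>

/* Bit positions as listed in the commit message; names are hypothetical. */
#define EX_PACK_BIT        (UINT32_C(1) << 4)  /* packed or not packed?     */
#define EX_PACK_3M_BIT     (UINT32_C(1) << 3)  /* packed for 3m?            */
#define EX_PACK_4M_BIT     (UINT32_C(1) << 2)  /* packed for 4m?            */
#define EX_PACK_PANEL_BIT  (UINT32_C(1) << 1)  /* packed to panels?         */
#define EX_PACK_COL_BIT    (UINT32_C(1) << 0)  /* assumed: set = by columns */

bool ex_is_packed(uint32_t schema)       { return (schema & EX_PACK_BIT) != 0; }
bool ex_is_3m_packed(uint32_t schema)    { return (schema & EX_PACK_3M_BIT) != 0; }
bool ex_is_4m_packed(uint32_t schema)    { return (schema & EX_PACK_4M_BIT) != 0; }
bool ex_is_panel_packed(uint32_t schema) { return (schema & EX_PACK_PANEL_BIT) != 0; }
bool ex_is_col_packed(uint32_t schema)   { return (schema & EX_PACK_COL_BIT) != 0; }
```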
Field G. Van Zee
a51e32ec06 Fixed unit register blocksize brokenness.
Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
  and column-stored micro-panels when MR or NR is unit. When either
  register blocksize (or both) is equal to one, inspecting the strides of
  the affected packed micro-panel is no longer sufficient to determine
  whether the micro-panel is a row-stored column panel or a column-stored
  row panel (because both strides are unit). At that point, dimension
  information is necessary when invoking the bli_is_row_stored_f() and
  bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
  Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
  packm_init() and then passed into the blocked variants to support the
  aforementioned update.
2014-07-30 10:41:48 -05:00
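The ambiguity fixed in a51e32ec06 can be reproduced with a simplified model of micro-panel strides. The types and functions below are hypothetical and only assume that a column-stored row panel has strides (1, MR) and a row-stored column panel has strides (NR, 1).

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stride pair: rs = row stride, cs = column stride. */
typedef struct { ptrdiff_t rs, cs; } ex_strides;

/* Column-stored row panel with panel dimension mr. */
ex_strides ex_col_stored_panel(ptrdiff_t mr)
{
    ex_strides s = { 1, mr };
    return s;
}

/* Row-stored column panel with panel dimension nr. */
ex_strides ex_row_stored_panel(ptrdiff_t nr)
{
    ex_strides s = { nr, 1 };
    return s;
}

/* Stride-only storage tests, as a simplified stand-in for the macros. */
bool ex_is_row_stored(ex_strides s) { return s.cs == 1; }
bool ex_is_col_stored(ex_strides s) { return s.rs == 1; }
```

For a panel dimension of 4 the two tests disagree, as they should. For a panel dimension of 1 both kinds of panel have strides (1, 1), both tests return true, and the strides alone no longer identify the format.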
Field G. Van Zee
c2732272f0 Removed old/unused packm variants. 2014-07-29 16:37:18 -05:00
Field G. Van Zee
b97fa9a5a7 Minor usage update to build/bump-version.sh. 2014-07-27 18:54:09 -05:00
Field G. Van Zee
b18ba5f62d Added missing 'bla_' prefix to r_imag(), d_imag().
Details:
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
  Ali for pointing out the misnamed functions.
2014-07-27 18:52:05 -05:00
Field G. Van Zee
af7a8e6c04 CHANGELOG update (0.1.4) 2014-07-27 18:20:13 -05:00
Field G. Van Zee
a7537071b1 Version file update (0.1.4) 0.1.4 2014-07-27 18:20:12 -05:00
Tyler Smith
acff74041b Merge branch 'master' of https://github.com/flame/blis 2014-07-23 15:07:30 -05:00
Tyler Smith
cdb9413e14 Enabled threading for a couple more loops in TRSM
JC loop is now enabled for the left-sided case
IC loop is now enabled for the right-sided case
2014-07-23 15:05:15 -05:00
Field G. Van Zee
47b243ef08 Call setid for early return from herk/her2k.
Details:
- Added setid call (to zero imaginary parts of diagonal elements) to
  early return branches of herk_front() and her2k_front() for cases
  where alpha is zero. Thanks to Murtaza Ali for suggesting this fix.
- Comment update.
2014-07-23 13:41:13 -05:00
Tyler Smith
3e7b0db5b0 Merge branch 'master' of https://github.com/flame/blis 2014-07-23 13:40:44 -05:00
Tyler Smith
2f8a357de5 Some TRSM threading fixes/additions 2014-07-23 13:40:12 -05:00
Field G. Van Zee
ed3e33d548 Tweaked behavior of herk, her2k for BLAS compat.
Details:
- Updated herk_front() and her2k_front() to explicitly set the imaginary
  components of the diagonal entries of C to zero after the computation
  is complete. This is needed in case downstream applications read the
  full diagonal entries (i.e., including imaginary part), which could, in
  the absence of this modification, accumulate numerical error from
  subsequent rank-k/rank-2k updates.
- Updated BLAS compatibility wrappers for herk and her2k to return early
  if:
    n == 0 || ( ( alpha == 0 || k == 0 ) && beta == 1 )
  This also results in the imaginary components of diagonal entries NOT
  being set to zero (see above), which is consistent with BLAS.
- Updated mkherm to use setid instead of an inlined loop over the
  diagonal.
2014-07-22 14:40:43 -05:00
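The early-return condition quoted in ed3e33d548 can be written as a predicate. The function name is hypothetical, and the sketch assumes real-valued alpha and beta, as in herk.

```c
#include <stdbool.h>

/* Hypothetical helper: true when the herk compatibility wrapper would
   return early, leaving C (including the imaginary parts of its
   diagonal) untouched. */
bool ex_herk_returns_early(int n, int k, double alpha, double beta)
{
    return n == 0 || ((alpha == 0.0 || k == 0) && beta == 1.0);
}
```

Note that `k == 0` alone is not enough: with `beta != 1` the wrapper still has to scale C.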
Field G. Van Zee
ea59a5c93c Added new level-1d operation: setid.
Details:
- Defined a new level-1d operation, setid, which sets the imaginary
  elements of an object's diagonal to a single scalar. This can be
  useful, for example, when trying to make the diagonal of a Hermitian
  matrix real-valued.
2014-07-22 14:36:02 -05:00
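The effect of setid (ea59a5c93c) on a column-major matrix can be sketched as follows. The struct and function are hypothetical stand-ins, not the BLIS object API.

```c
#include <stddef.h>

/* Hypothetical double-precision complex type for illustration. */
typedef struct { double real, imag; } ex_dcomplex;

/* Set the imaginary part of every diagonal element of the n x n
   column-major matrix a (leading dimension lda) to alpha. Real parts
   and off-diagonal elements are left unchanged. */
void ex_setid(size_t n, ex_dcomplex *a, size_t lda, double alpha)
{
    for (size_t i = 0; i < n; ++i)
        a[i + i * lda].imag = alpha;
}
```

Calling `ex_setid(n, a, lda, 0.0)` makes the diagonal real-valued, which is the Hermitian use case mentioned in the commit message.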
Field G. Van Zee
8965a96593 Merge branch 'master' of github.com:flame/blis 2014-07-22 14:34:32 -05:00
Field G. Van Zee
1785efb542 Minor improvements to invertd and setd.
Details:
- Added missing call to invertd_check() from front-end.
- Changed setd front-end call of scald_check() to setd_check().
2014-07-22 14:33:01 -05:00
Field G. Van Zee
5b73e80b71 Merge pull request #16 from Maratyszcza/emscripten
Emscripten port
2014-07-18 12:21:20 -05:00
Field G. Van Zee
a41e68e09e Reimplemented BLIS initialization/finalization.
Details:
- Rewrote bli_init() and bli_finalize() with OpenMP critical sections
  for thread-safety. Also added lots of explanatory comments.
- Renamed bli_init_safe() and bli_finalize_safe() with the _auto()
  suffix, and reimplemented for simplicity. Updated all invocations
  in BLAS compatibility layer to use _auto() suffix.
2014-07-17 13:25:56 -05:00
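A guarded initialization in the spirit of a41e68e09e might look like the sketch below. All names are hypothetical and the body of the real bli_init() is not reproduced; the point is only that the "already initialized?" test and the setup work sit inside the same critical section. When compiled without OpenMP the pragma is ignored and the code runs serially.

```c
#include <stdbool.h>

static bool ex_is_initialized = false;
static int  ex_setup_runs     = 0;   /* how many times setup really ran */

void ex_init(void)
{
    /* Only one thread at a time may test and update the flag. */
    #pragma omp critical (ex_init_lock)
    {
        if (!ex_is_initialized) {
            ex_setup_runs++;          /* placeholder for real setup work */
            ex_is_initialized = true;
        }
    }
}

void ex_finalize(void)
{
    /* Same lock name, so init and finalize exclude each other. */
    #pragma omp critical (ex_init_lock)
    {
        if (ex_is_initialized)
            ex_is_initialized = false; /* placeholder for real teardown */
    }
}

int ex_setup_count(void) { return ex_setup_runs; }
```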
Field G. Van Zee
36358948ea Retired frame/3/gemm/other directory.
Details:
- Removed frame/3/gemm/other directory, which contained some outdated
  and/or experimental variants.
2014-07-17 10:58:10 -05:00
Field G. Van Zee
c73261f17e More minor cleanups post-copyright update. 2014-07-14 16:23:51 -05:00
Field G. Van Zee
2a09d24463 Reverted power7 symlinks destroyed by sed script.
Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
  after recursive-sed.sh mistakenly replaced them with copies of the
  actual files to which they referred. Meant to include this in previous
  commit.
2014-07-14 16:17:09 -05:00
Field G. Van Zee
7ed415824d Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
  Meant to include this change in previous commit.
2014-07-14 16:14:33 -05:00
Field G. Van Zee
5c2c6c8561 Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
  University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
  2012).
2014-07-14 16:05:03 -05:00
Field G. Van Zee
fcec68cda3 Merge branch 'master' of github.com:flame/blis 2014-07-14 11:35:34 -05:00
Field G. Van Zee
94c0df797e Changed order of zero dim / error checking.
Details:
- Updated level-2 and level-3 internal back-ends so that the operation's
  _check() function is called BEFORE any attempt to return early due to
  the presence of zero dimensions. This ordering makes more sense because
  (for example) object dimensions should match even if one of them is
  zero. Previously, a dimension mismatch could result in an early return
  with no error message.
- Updated bli_check_object_buffer() so that NULL buffers result in an
  error only if the object is dimensionally non-empty (i.e., only if both
  of the object's dimensions are non-zero). This allows BLIS operations
  to be performed on dimensionally empty objects (i.e., where at least one
  dimension is zero).
- Updated the error message associated with bli_check_object_buffer()
  to mention the newly relaxed constraint mentioned above, vis-a-vis
  non-zero dimensions.
2014-07-14 11:24:36 -05:00
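The ordering described in 94c0df797e, with checks first and the zero-dimension return afterwards, can be sketched as below. The types, error codes, and function names are hypothetical, and returning an error code is a simplification made for illustration.

```c
#include <stdbool.h>
#include <stddef.h>

typedef struct { size_t m, n; const double *buf; } ex_obj;
typedef enum { EX_SUCCESS, EX_DIM_MISMATCH, EX_NULL_BUFFER } ex_err;

bool ex_has_zero_dim(const ex_obj *o) { return o->m == 0 || o->n == 0; }

/* A NULL buffer is an error only if the object is non-empty. */
ex_err ex_check_buffer(const ex_obj *o)
{
    if (o->buf == NULL && !ex_has_zero_dim(o))
        return EX_NULL_BUFFER;
    return EX_SUCCESS;
}

/* Front-end of a hypothetical elementwise operation on x and y. */
ex_err ex_op_front(const ex_obj *x, const ex_obj *y)
{
    ex_err e;

    /* 1. Checks run first, even when a dimension is zero. */
    if (x->m != y->m || x->n != y->n) return EX_DIM_MISMATCH;
    if ((e = ex_check_buffer(x)) != EX_SUCCESS) return e;
    if ((e = ex_check_buffer(y)) != EX_SUCCESS) return e;

    /* 2. Only then return early for empty operands. */
    if (ex_has_zero_dim(x)) return EX_SUCCESS;

    /* 3. The actual computation would go here. */
    return EX_SUCCESS;
}
```

With this order, a 0 x 3 operand paired with a 2 x 3 operand is reported as a mismatch instead of returning silently, while two empty operands with NULL buffers are accepted.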
Marat Dukhan
20690fe301 Emscripten port 2014-07-13 22:50:56 -07:00
Field G. Van Zee
4a20ed1a3f Merge pull request #14 from Maratyszcza/master
Support "make test" for PNaCl configuration
2014-07-13 17:45:01 -05:00
Field G. Van Zee
6a515e988f Implemented dsdot() and sdsdot() in compat layer.
Details:
- Replaced "not yet implemented" error messages in dsdot() and sdsdot()
  with actual implementations. (These routines are so rarely used that
  this log message will probably lead to some people learning of their
  existence for the first time.)
2014-07-13 17:38:33 -05:00
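For readers meeting these routines for the first time via 6a515e988f: both take single-precision vectors and accumulate the dot product in double precision. dsdot returns the double result; sdsdot adds a single-precision scalar sb and rounds the sum to single precision. The sketch below follows the reference BLAS semantics but uses hypothetical names and C-style by-value arguments; it is not the BLIS compatibility-layer code.

```c
/* Dot product of two float vectors, accumulated and returned in double.
   Negative increments traverse the vector backwards, as in BLAS. */
double ex_dsdot(int n, const float *x, int incx, const float *y, int incy)
{
    double acc = 0.0;
    if (n <= 0) return acc;

    int ix = incx < 0 ? (1 - n) * incx : 0;
    int iy = incy < 0 ? (1 - n) * incy : 0;
    for (int i = 0; i < n; ++i, ix += incx, iy += incy)
        acc += (double)x[ix] * (double)y[iy];
    return acc;
}

/* sb plus the double-accumulated dot product, rounded to float. */
float ex_sdsdot(int n, float sb, const float *x, int incx,
                const float *y, int incy)
{
    return (float)((double)sb + ex_dsdot(n, x, incx, y, incy));
}
```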
Field G. Van Zee
255668ddd1 Inserted gemv beta-scaling bug into compat layer.
Details:
- BLAS has a peculiar bug (or feature) whereby calling gemv on a vector
  y of non-zero length and a vector x of zero length results in no action.
  Given that the operation is y := beta*y + A*x, many (most?) individuals
  would expect vector y to still be scaled by beta. BLIS, when called
  natively, handles these cases intuitively (with beta scaling).
  Unfortunately, many BLAS test suites actually check for the way this
  situation is handled. Therefore, we have decided to implement this "bug"
  in the compatibility layer so as to provide "bug-for-bug" compatibility
  with BLAS.
2014-07-13 17:30:44 -05:00
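The difference in behavior described in 255668ddd1 can be shown with a small reference-style gemv. The function and its `blas_compat` flag are hypothetical; the sketch assumes a column-major, non-transposed A, unit vector strides, and the usual y := beta*y + alpha*A*x form.

```c
#include <stdbool.h>
#include <stddef.h>

/* y := beta*y + alpha*A*x for a column-major m x n matrix A.
   With blas_compat set, a zero dimension causes an immediate return,
   so y is NOT scaled by beta (the BLAS behavior). Without it, y is
   still scaled when n == 0 (the intuitive behavior). */
void ex_gemv(size_t m, size_t n, double alpha, const double *a, size_t lda,
             const double *x, double beta, double *y, bool blas_compat)
{
    if (blas_compat && (m == 0 || n == 0))
        return;

    for (size_t i = 0; i < m; ++i) {
        double sum = 0.0;
        for (size_t j = 0; j < n; ++j)
            sum += a[i + j * lda] * x[j];
        y[i] = beta * y[i] + alpha * sum;
    }
}
```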
Field G. Van Zee
570a154581 Comment/formatting updates to build scripts.
Details:
- Minor updates to comments and formatting in bump-version.sh and
  update-version-file.sh scripts.
2014-07-12 17:51:05 -05:00
Field G. Van Zee
26cd819906 Added bli_info_*() query functions.
Details:
- Added a new API family, bli_info_*(), which can be used to query
  information about how BLIS was configured. Most of these values are
  returned as gint_t, with the exception of the version string which
  is char*.
- Changed how the testsuite driver queries information about how BLIS
  was configured (from using macro constants directly to using the
  new bli_info API).
- Removed bli_version.c and its header file.
- Added STRINGIFY_INT() macro to bli_macro_defs.h
- Renamed info_t type in bli_type_defs.h to objbits_t (not because of
  an actual naming conflict, but because the name 'info_t' would now be
  somewhat misleading in the presence of the new bli_info API, as the
  two are unrelated).
2014-07-10 13:16:07 -05:00
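A query function of the kind added in 26cd819906, together with an integer-stringifying macro, could be sketched as follows. Everything here is hypothetical: the constant, the function names, the use of `long` in place of gint_t, and the two-level macro, which is the usual C idiom for stringifying the expanded value of a macro rather than its name.

```c
/* Hypothetical configuration constant. */
#define EX_DEFAULT_MR 8

/* Two-level idiom: the outer macro expands its argument first, the
   inner one applies the # operator to the expanded value. */
#define EX_STRINGIFY_INT_(s) #s
#define EX_STRINGIFY_INT(s)  EX_STRINGIFY_INT_(s)

/* Query functions let callers (such as a test driver) read the
   configuration without touching the macro constants directly. */
long ex_info_get_default_mr(void)
{
    return EX_DEFAULT_MR;
}

const char *ex_info_get_default_mr_string(void)
{
    return EX_STRINGIFY_INT(EX_DEFAULT_MR);
}
```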
Field G. Van Zee
970b431416 Minor bugfixes to BLAS compatibility layer.
Details:
- Changed bla_amax.c so that i?amax() routines now correctly return 0
  if ( n < 1 || incx <= 0 ).
- Changed bla_rotg.c and bla_rotmg.c to use bli_fabs() macro instead of
  f2c's abs() macro for float and double cases.
- Thanks to Murtaza Ali for suggesting the two fixes above.
- Updated label of fnormv to normfv in testsuite/input.operations.
2014-07-10 09:30:00 -05:00
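The i?amax fix in 970b431416 concerns the guard at the top of the routine. A sketch for the double-precision case, with a hypothetical name and by-value arguments, returning a 1-based index as BLAS does:

```c
/* Returns the 1-based index of the first element of x with the largest
   absolute value, or 0 if n < 1 or incx <= 0. */
int ex_idamax(int n, const double *x, int incx)
{
    if (n < 1 || incx <= 0)
        return 0;

    int    best   = 1;
    double maxval = x[0] < 0.0 ? -x[0] : x[0];
    for (int i = 1; i < n; ++i) {
        double v = x[i * incx];
        if (v < 0.0) v = -v;
        if (v > maxval) {       /* strict >, so the first maximum wins */
            maxval = v;
            best   = i + 1;
        }
    }
    return best;
}
```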
Marat Dukhan
8ccdfaef4c Replicated logic from testsuite/Makefile in top-level Makefile to support make test 2014-07-08 23:14:36 -07:00