433 Commits

Author SHA1 Message Date
Field G. Van Zee
cc8d2b8277 Updated old test drivers in 'test'. 2014-09-09 13:48:22 -05:00
Field G. Van Zee
c472993bbc Removed densify argument to packm_cntl_obj_create().
Details:
- Removed the "densify" bool_t argument to bli_packm_cntl_obj_create().
  This argument was inserted very early in BLIS's development, when it
  was anticipated that the developer may sometimes wish to pack a
  Hermitian, symmetric, or triangular matrix without making it dense.
  But as it turns out, if we are packing a matrix, we always want to
  make it dense in some way or another due to the fact that the micro-
  kernel only multiplies dense micro-panels. Thus, unless/until there
  is a real need for the feature, it seems reasonable to remove it from
  the packm_cntl API.
2014-09-09 13:42:04 -05:00
Field G. Van Zee
5c43ee3871 Moved trmm4m/3m_cntl files to 'old' directory.
Details:
- Meant to include this in previous commit.
2014-09-08 15:19:29 -05:00
Field G. Van Zee
7b2f469d54 Retired trmm_t control tree definitions, usage.
Details:
- Replaced all trmm_t control tree instances and usage with that of
  gemm_t. This change is similar to the recent retirement of the herk_t
  control tree.
- Tweaked packm blocked variants so that the triangular code does NOT
  assume that k is a multiple of MR (when A is triangular) or NR (when
  B is triangular). This means that bottom-right micro-panels packed for
  trmm will have different zero-padding when k is not already a multiple
  of the relevant register blocksize. While this creates a seemingly
  arbitrary and unnecessary distinction between trmm and trsm packing,
  it actually allows trmm to be handled with one control tree, instead
  of one for left and one for right side cases. Furthermore, since only
  one tree is required, it can now be handled by the gemm tree, and thus
  the trmm control tree definitions can be disposed of entirely.
- Tweaked trmm macro-kernels so that they do NOT inflate k up to a
  multiple of MR (when A is triangular) or NR (when B is triangular).
- Misc. tweaks and cleanups to bli_packm_struc_cxk_4m.c and _3m.c, some
  of which are to facilitate above-mentioned changes whereby k is no
  longer required to be a multiple of register blocksize when packing
  triangular micro-panels.
- Adjusted trmm3 according to above changes.
- Retired trmm_t control tree creation/initialization functions.
2014-09-08 14:49:50 -05:00
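Illustration for commit 7b2f469d54: "inflating k up to a multiple of MR (or NR)" can be pictured as a round-up to the next multiple of the register blocksize. The sketch below is not BLIS code; the helper name round_up_to_multiple is hypothetical and only shows the arithmetic that the trmm macro-kernels no longer apply to k.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (not a BLIS function): round k up to the next
   multiple of blksz, where blksz stands in for MR or NR. */
size_t round_up_to_multiple(size_t k, size_t blksz)
{
    assert(blksz > 0);
    return ((k + blksz - 1) / blksz) * blksz;
}
```

With blksz = 4, a k of 10 would be inflated to 12, while a k of 8 is left unchanged.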
Field G. Van Zee
576e9e9255 Retired herk_t control tree definitions, usage.
Details:
- Replaced all herk_t control tree instances and usage with that of
  gemm_t, since the two types presently have the same fields. This means
  that herk, her2k, syrk, and syr2k can simply use the gemm control tree
  as-is, just as hemm and symm have been doing for some time now.
- Retired herk_t control tree creation/initialization functions.
- Retired many _target.c and .h files into 'old' directories.
2014-09-07 16:12:52 -05:00
Field G. Van Zee
b2fed052c9 Minor code cleanup to bli_packm_struc_cxk*.c
Details:
- Realized that we don't need to track rs_p11 and cs_p11 for
  Hermitian/symmetric case of bli_packm_struc_cxk*(). They are always
  equal to rs_p and cs_p.
2014-09-03 17:07:25 -05:00
Field G. Van Zee
023ce77096 Minor update to packm_cxk kernels.
Details:
- Changed m and n dimension parameter names to panel_dim and panel_len,
  respectively, in packm_cxk, packm_cxk_3m, packm_cxk_4m kernel wrapper
  functions. This makes the code a little easier to read since "m" and
  "n" have connotations that are not applicable here.
- Comment updates.
2014-09-03 10:47:53 -05:00
Field G. Van Zee
189def3667 Retired portions of bli_kernel_3m/4m_macro_defs.h.
Details:
- Removed sections of bli_kernel_[4m|3m]_macro_defs.h that defined
  4m/3m-specific blocksizes after realizing that this can be done in
  bli_gemm[4m|3m]_cntl.c, since that is (mostly) the only place they
  are used.
- The maximum cache values for 4m/3m are still needed when computing mem
  pool dimensions in bli_mem_pool_macro_defs.h. As a workaround, "local"
  definitions in terms of the regular cache blocksizes are now in place.
- Similarly, the register blocksizes for 4m/3m are still needed in
  bli_kernel_post_macro_defs.h. As a workaround, "local" definitions in
  terms of the regular register blocksizes are now in place.
2014-09-01 16:23:17 -05:00
Field G. Van Zee
af521ee6f2 Changed semantics of blocksize extensions.
Details:
- Changed semantics of cache and register blocksize extensions so that
  the extended values are tracked, rather than just the marginal
  extensions.
- BLIS_EXTEND_[MKN]C_? has been renamed BLIS_MAXIMUM_[MKN]C_?.
- BLIS_EXTEND_[MKN]R_? has been renamed BLIS_PACKDIM_[MKN]R_?.
- bli_blksz_ext_*() APIs have been renamed to bli_blksz_max_*(). Note
  that these "max" query routines grab the maximum value for cache
  blocksizes and the packdim value for register blocksizes.
- bli_info_*() API has been updated accordingly.
- All configurations have been updated accordingly.
2014-09-01 14:06:46 -05:00
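Illustration for commit af521ee6f2: one way to read the change in semantics is that a blocksize used to carry a default value plus a marginal extension, and now carries the default plus the extended value itself. The types and function below are hypothetical (they are not the BLIS blksz_t or any bli_blksz_*() routine) and assume that the extended value equals default plus extension.

```c
#include <assert.h>

/* Hypothetical types for illustration only. */
typedef struct { int def; int ext; } sk_blksz_old_t; /* default + marginal extension */
typedef struct { int def; int max; } sk_blksz_new_t; /* default + extended value     */

/* Assumption: the extended value is the default plus the old extension. */
sk_blksz_new_t sk_blksz_convert(sk_blksz_old_t o)
{
    assert(o.ext >= 0);
    sk_blksz_new_t n = { o.def, o.def + o.ext };
    return n;
}
```

Under this reading, a default of 256 with an extension of 64 becomes a default of 256 with a maximum of 320.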
Field G. Van Zee
07f23aefd5 Pass pack schema into packm_struc_cxk*().
Details:
- Changed the interface to the packm_struc_cxk*() kernels to include
  the pack_t schema. This allows the implementation to more easily
  determine how the micro-panel is stored (row-stored column panel
  or column-stored row panel).
- Updated packm blocked variants to pass in the schema.
- Updated packm_ker_t function pointer definition accordingly.
2014-08-31 11:58:50 -05:00
Field G. Van Zee
f032ba9b11 Reorganized packm implementation.
Details:
- Reorganized packm variants and structure-aware kernels so that all
  routines for a given pack format (4m, 3m, regular) reside in a single
  file.
- Renamed _blk_var4 to _blk_var2 and generalized so that it will work
  for both 4m and 3m, and adjusted 4m/3m _cntl_init() functions
  accordingly.
- Added a new packm_ker_t function pointer type to
  bli_kernel_type_defs.h to facilitate function pointer typecasting in
  the datatype-specific packm_blk_var2() functions.
- Deprecated _blk_var3.
- Fixed a bug in the triangular micro-panel packing facility that
  affected trmm and trmm3 with unit diagonals.
2014-08-30 16:21:20 -05:00
Field G. Van Zee
c6793cecb7 Reorganized #includes for scalar macro headers.
Details:
- Reordered the #include statements in bli_scalar_macro_defs.h so that
  conventional, ri-, and ri3-based macros are grouped together.
- Renamed bli_eqri.h (and macros within) to end with 'ris' suffix.
2014-08-28 17:14:48 -05:00
Field G. Van Zee
b4da890728 Whitespace, comments updates on packm_blk_var?.c. 2014-08-28 14:10:32 -05:00
Field G. Van Zee
46e46a1d83 Minor updates to packm blocked, cxk_3m/4m code.
Details:
- Added 'const' qualifier to inlined packing code that handles
  micro-panel packing that is too large for an existing packm ukernel.
- Comment updates.
2014-08-28 12:05:45 -05:00
Field G. Van Zee
908dc688b5 Pass pack schema into blocked packm routines.
Details:
- Rather than passing the packm blocked routines a boolean value that
  represents whether the matrix is being packed to row or column storage,
  we now pass in the pack schema itself.
2014-08-28 11:55:12 -05:00
Field G. Van Zee
a0ff6066e0 Merge branch 'master' of github.com:flame/blis 2014-08-24 15:56:21 -05:00
Field G. Van Zee
c4c99c4813 Renamed packm scalar from beta to kappa.
Details:
- The packm implementation (i.e. source files in frame/1m/packm and
  frame/1m/packm/ukernels) interchangeably used the names "beta" and
  "kappa" to refer to the optional scalar to be applied during packing.
  This commit renames all uses of "beta" to be "kappa", since "beta"
  sometimes evokes the scalar specifically on the output matrix of a
  level-2 or level-3 operation.
2014-08-24 15:52:22 -05:00
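Illustration for commit c4c99c4813: the role of kappa can be sketched as a scalar applied to every element while a micro-panel is copied into its packed buffer. The function below is a simplified, real-valued sketch and not a BLIS kernel; its name, signature, and storage layout are assumptions, and it borrows the panel_dim/panel_len naming mentioned elsewhere in this log.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical packing routine: copy a panel_dim x panel_len block of A,
   whose element (i, j) lives at a[i*inca + j*lda], into p. The packed
   panel is stored column by column with leading dimension ldp, and each
   element is scaled by kappa on the way in. */
void sk_pack_cxk(size_t panel_dim, size_t panel_len, double kappa,
                 const double *a, size_t inca, size_t lda,
                 double *p, size_t ldp)
{
    assert(ldp >= panel_dim);
    for (size_t j = 0; j < panel_len; ++j)
        for (size_t i = 0; i < panel_dim; ++i)
            p[j * ldp + i] = kappa * a[i * inca + j * lda];
}
```

Passing inca = 1 reads a column-stored source, while lda = 1 reads a row-stored one; the packed layout is the same in both cases.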
Field G. Van Zee
d40b32bc24 Merge branch 'master' of github.com:flame/blis 2014-08-24 13:46:36 -05:00
Field G. Van Zee
6c25c379fa Consolidated unpackm ukernels into single file.
Details:
- Reorganized unpackm ukernels into a single file,
  bli_unpackm_ref_cxk.c, in a manner similar to what was done for packm
  ukernels in commit 4cc2b46.
2014-08-24 13:44:10 -05:00
Field G. Van Zee
9331f79443 Merge branch 'master' of github.com:flame/blis 2014-08-24 10:54:21 -05:00
Field G. Van Zee
670b63926a Added whitespace to bli_obj_scalar_ routine calls.
Details:
- Added extra spaces to align arguments of
  bli_obj_scalar_init_detached_copy_of(). This misalignment was due to
  the fact that the function was previously named
  bli_obj_init_scalar_copy_of() and the name change, performed in
  b444489f, was done via recursive sed commands which left subsequent
  lines untouched.
2014-08-24 10:46:27 -05:00
Field G. Van Zee
7fc48a7d92 Combined 4m/3m bits into an expanded bitfield.
Details:
- Combined the 4m/3m bits into an expanded bitfield, which will encode
  the packing "format" of the micro-panels. This will allow for more
  easily and compactly encoding additional formats.
- Other minor comment/whitespace updates to bli_type_defs.h.
- Updated bli_obj_macro_defs.h and bli_param_macro_defs.h to use the new
  format bitfield.
- Comment update to bli_kernel_post_macro_defs.h.
- Whitespace changes to bli_kernel_3m_macro_defs.h, _4m_macro_defs.h.
2014-08-23 16:50:58 -05:00
Field G. Van Zee
ef0143cc14 Renamed _ri, _ri3 packm ukernels to _4m, _3m.
Details:
- Renamed packm ukernels, _cxk dispatcher, and structure-aware _cxk
  helper functions to use _4m and _3m instead of _ri and _ri3 suffixes.
- Updated names of cpp macros that correspond to packm ukernels.
2014-08-23 14:02:27 -05:00
Field G. Van Zee
b0ccac1161 Cleaned up front-end layering for 4m/3m.
Details:
- Added an extra layer to level-3 front-ends (examples: bli_gemm_entry()
  and bli_gemm4m_entry()) to hide the control trees from the code that
  decides whether to execute native or 4m-based implementations. The
  layering was also applied to 3m.
- Branch to 4m code based on the return value of bli_4m_is_enabled(),
  rather than the cpp macros BLIS_ENABLE_?COMPLEX_VIA_4M. This lays
  the groundwork for users to be able to change at runtime which
  implementation is called by the main front-ends (e.g. bli_gemm()).
- Retired some experimental gemm code that hadn't been touched in
  months.
2014-08-21 19:21:52 -05:00
Field G. Van Zee
bedec95451 Added bli_4m API for querying 4m enabled state.
Details:
- Added bli_4m.c (and header), which defines a simple API that can be
  used to query, enable, and disable 4m-based complex support in BLIS.
  The macros BLIS_ENABLE_?COMPLEX_VIA_4M are now used to initialize
  the variable that determines the state (enabled or disabled).
- Changed bli_info*() API so that all cache and register blocksize-
  related query routines return the blksz_t objects' values as they
  exist at runtime, rather than return the values as determined by the
  configuration system (e.g. bli_kernel.h, or defaults for those values
  not specified). This sets the foundation for being able to change
  those blocksizes at runtime.
2014-08-21 18:25:48 -05:00
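Illustration for commit bedec95451: a query/enable/disable API of this kind can be sketched as a single state variable that a configure-time macro only initializes. Everything below is hypothetical, including the sk_ names and the SK_ENABLE_4M macro; it is not the contents of bli_4m.c.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for a configure-time macro; it only seeds the
   initial state, which can then be changed at runtime. */
#ifdef SK_ENABLE_4M
static bool sk_4m_state = true;
#else
static bool sk_4m_state = false;
#endif

bool sk_4m_is_enabled(void) { return sk_4m_state; }
void sk_4m_enable(void)     { sk_4m_state = true; }
void sk_4m_disable(void)    { sk_4m_state = false; }

/* A front-end could branch on the runtime query rather than on the macro. */
typedef enum { SK_IMPL_NATIVE, SK_IMPL_4M } sk_impl_t;

sk_impl_t sk_select_impl(void)
{
    return sk_4m_is_enabled() ? SK_IMPL_4M : SK_IMPL_NATIVE;
}
```

Because the macro is consulted only at initialization, a caller can switch implementations between calls without rebuilding.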
Field G. Van Zee
dd61307f55 Minor update to sandybridge MC_S, KC_S.
Details:
- Changed sandybridge MC and KC for single-precision real to 128 and 384,
  respectively.
- Updated comments in template configuration's gemm micro-kernel file
  to document the new "contiguous row preference" macro.
2014-08-20 09:52:16 -05:00
Field G. Van Zee
d0eec4bddd Added optional row preference to ukernel config.
Details:
- Added the ability for the kernel developer to indicate the gemm micro-
  kernel as having a preference for accessing the micro-tile of C via
  contiguous rows (as opposed to contiguous columns). This property may
  be encoded in bli_kernel.h as BLIS_?GEMM_UKERNEL_PREFERS_CONTIG_ROWS,
  which may be defined or left undefined. Leaving it undefined leads to
  the default assumption of column preference.
- Changed conditionals in frame/3/*/*_front.c that induce transposition
  of the operation so that the transposition is induced only if there
  is disagreement between the storage of C and the preference of the
  micro-kernel. Previously, the only conditional that needed to be met
  was that C was row-stored, which is to say that we assumed the micro-
  kernel preferred column-contiguous access on C.
- Added a "prefers_contig_rows" property to func_t objects, and updated
  calls to bli_func_obj_create() in _cntl.c files in order to support
  the above changes.
- Removed the row-storage optimization from bli_trsm_front.c because
  it is actually ineffective. This is because the right-side case of
  trsm flips the A and B micro-panel operands (since BLIS only requires
  left-side gemmtrsm/trsm kernels), meaning any transposition done
  at the high level is then undone at the low level.
- Tweaked trmm, trmm3 _front.c files to eliminate a possible redundant
  invocation of the bli_obj_swap() macro.
2014-08-19 15:49:19 -05:00
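Illustration for commit d0eec4bddd: the change to the transposition conditional can be summarized as two predicates. This is a sketch under the assumption that C is either row- or column-stored (general stride is ignored); the function names are hypothetical and do not appear in the *_front.c files.

```c
#include <assert.h>
#include <stdbool.h>

/* Old rule as described in the commit: transpose whenever C is
   row-stored, i.e. the micro-kernel is assumed to prefer columns. */
bool sk_induce_trans_old(bool c_is_row_stored)
{
    return c_is_row_stored;
}

/* New rule: transpose only when the storage of C disagrees with the
   micro-kernel's preference. */
bool sk_induce_trans_new(bool c_is_row_stored, bool ukr_prefers_contig_rows)
{
    return c_is_row_stored != ukr_prefers_contig_rows;
}
```

For a micro-kernel with the default column preference the two rules agree; they differ only once a kernel declares a row preference.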
Field G. Van Zee
4cc2b464f2 Reorganized packm ukernels.
Details:
- Previously, packm micro-kernels were organized by the implied register
  blocksize (panel dimension) assumed by the kernel, meaning conventional,
  ri, and ri3 variations of some micro-kernel size were housed in the same
  file. This commit reorganizes the micro-kernels so that all sizes reside
  in the same file for each format type (conventional, ri, and ri3).
2014-08-15 11:49:15 -05:00
Field G. Van Zee
fcc10054a1 Tweaks to gemm4m, gemm3m virtual ukernels.
Details:
- Fixed a potential, but as-yet unobserved bug in gemm3m that would
  allow undesirable inf/NaN propagation, since C was being scaled by
  beta even if it was equal to zero.
- In gemm3m micro-kernel, we now avoid copying C to the temporary
  micro-tile if beta is zero.
- Rearranged computation in gemm4m so that the temporary C micro-tile
  is accessed less, and C is accessed only after the micro-kernel
  calls. This improves performance marginally in most situations.
- Comment updates to both gemm4m and gemm3m micro-kernels.
2014-08-13 12:32:06 -05:00
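Illustration for commit fcc10054a1: the inf/NaN issue arises because 0.0 * NaN is NaN under IEEE 754, so scaling C by a zero beta does not clear it. The sketch below is a simplified, real-valued update with a hypothetical name, not the gemm3m micro-kernel; it shows the general idea of overwriting C without reading it when beta is zero.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical update: c := beta * c + ab over n elements. When beta is
   exactly zero, c is overwritten without being read, so any inf/NaN
   already stored in c cannot propagate into the result. */
void sk_update_c(size_t n, double beta, const double *ab, double *c)
{
    for (size_t i = 0; i < n; ++i) {
        if (beta == 0.0)
            c[i] = ab[i];
        else
            c[i] = beta * c[i] + ab[i];
    }
}
```

Skipping the read of C in the zero-beta branch also matches the commit's second point, where C is no longer copied to the temporary micro-tile in that case.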
Field G. Van Zee
cdcbacc2fa Removed redundant redef of packm ukr prototypes.
Details:
- Removed redundant macro code that redefined packm ukernel prototypes
  when the previous macro was already sufficient. This helps de-clutter
  the packm ukernel prototyping headers a little bit.
2014-08-12 12:45:38 -05:00
Field G. Van Zee
82dac98d90 Relocated packm ukernel #includes.
Details:
- Consolidated the #include statements for packm ukernel headers from
  bli_packm_cxk.h, bli_packm_cxk_ri.h, and bli_packm_cxk_ri3.h to
  bli_packm.h.
- Comment/whitespace updates to bli_packm_blk_var3.c, _var4.c.
2014-08-12 12:36:25 -05:00
Field G. Van Zee
7f77856e25 Removed unused 4m/3m-related packm macro defs.
Details:
- Removed unused and unneeded s- and d-flavored macro definitions for
  packm ukernels related to the complex 4m and 3m methods, as
  implemented in BLIS.
2014-08-12 12:20:15 -05:00
Field G. Van Zee
bc1d86b2d4 Sandy Bridge configuration, micro-kernel update.
Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
  configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
  bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
  gemm micro-kernels for single- and double-precision real.
2014-08-07 19:01:20 -05:00
Field G. Van Zee
98ec95877a Corrected comment for _obj_is_[row|col]_stored().
Details:
- Fixed a mistake in the comments introduced in the previous commit for
  bli_obj_is_row_stored() and bli_obj_is_col_stored().
2014-08-07 18:28:32 -05:00
Field G. Van Zee
43d5e419e1 Reverted _obj_is_[row|col]_stored() macros.
Details:
- Rolled back recent changes to bli_obj_is_row_stored() and
  bli_obj_is_col_stored() so that those macros now only inspect the
  strides (row or column). It turns out that the more sophisticated
  definitions introduced in a51e32e are not necessary, because these
  "obj" macros are virtually never used on packed matrices, and when
  they are, they can use bli_obj_is_[row|col]_packed() macros, which
  inspect the info bitfield.
2014-08-07 18:20:40 -05:00
Field G. Van Zee
45692e3ad4 Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
  previous commit (9526ce98). Thanks to Tony Kelman for pointing
  this out. (Note: a few select changes were not reverted.)
2014-08-07 13:21:15 -05:00
Field G. Van Zee
9526ce9881 Updated copyright headers of emscripten configuration files. 2014-08-06 14:15:34 -05:00
Field G. Van Zee
30833ed71d Minor edits to configurations' make_defs.mk files.
Details:
- Redefined CFLAGS, CFLAGS_NOOPT, and CFLAGS_KERNELS so that CFLAGS_NOOPT
  is defined first and then the other two are defined in terms of
  CFLAGS_NOOPT. This textually cleans up the definitions and makes them a
  little easier to read.
2014-08-06 12:12:03 -05:00
Field G. Van Zee
9d61afeae2 CHANGELOG update (0.1.5) 2014-08-04 16:01:59 -05:00
Field G. Van Zee
bde56d0ecf Version file update (0.1.5) 0.1.5 2014-08-04 16:01:58 -05:00
Field G. Van Zee
4c6ceea4be Added CBLAS compatibility layer.
Details:
- Added a new section in bli_config.h files of all configurations for
  enabling CBLAS support. (Currently, the default is for the CBLAS layer
  to be disabled.)
- Added a directory, frame/compat/cblas, to house CBLAS source code. A
  subdirectory 'f77_sub' holds subroutine wrappers corresponding to
  subroutines found in CBLAS that allow calling some BLAS routines with
  the return value passed as the last argument rather than as an actual
  (function) return value. This was probably intended to allow CBLAS to
  avoid the whole f2c debacle altogether. However, since BLIS does not
  assume the presence of a Fortran compiler, we had to provide similar
  routines in C.
- A script, integrate-cblas-tarball.sh, is included to streamline the
  integration of future revisions of the CBLAS source code.
- The current tarball, cblas.tgz, that was used with the above script to
  generate the present set of CBLAS source code is also included.
- Updated blis.h to include necessary CBLAS-related headers.
2014-08-04 15:49:59 -05:00
Field G. Van Zee
caab62dac0 Merge pull request #19 from kevinoid/fix-install-perms-error
Fix permissions error installing to non-owned directory
2014-08-03 14:36:18 -05:00
Kevin Locke
db97ce979b Fix permissions error installing to non-owned directory
When installing to a directory which is not owned by the installing
user, even when the user has write permission for the directory, the
installation can fail with an error similar to the following:

Installing libblis-0.1.4-7-sandybridge.a into /usr/local/lib/
install: cannot change permissions of ‘/usr/local/lib’: Operation not permitted
Makefile:658: recipe for target '/usr/local/lib/libblis-0.1.4-7-sandybridge.a' failed
make: *** [/usr/local/lib/libblis-0.1.4-7-sandybridge.a] Error 1

In the example case, the error occurred because the user attempted to
install to /usr/local and /usr/local/lib is owned by root with mode 2755
which the Makefile unsuccessfully attempted to change to 0755.

Given that installing to /usr/local is likely to be quite common and the
ownership/permissions are the default for Debian and Debian-derived
Linux distributions (perhaps others as well), this commit attempts to
support that use case by using mkdir rather than install to create the
directory (which is the same approach as Automake).

Signed-off-by: Kevin Locke <kevin@kevinlocke.name>
2014-08-03 12:48:04 -06:00
Field G. Van Zee
383631b514 Redefined bit field macros with bitshift operator.
Details:
- Redefined many of the macros that define bit fields and bit values in
  the obj_t info field using the bitshift operator (<<). This makes it
  easier to reorder bit fields, or expand existing bit fields, or add
  new fields. The bitshifting should be evaluated by the compiler at
  compile-time.
2014-07-31 14:51:48 -05:00
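Illustration for commit 383631b514: defining each bit field as a mask shifted by a named offset means that moving or widening a field only requires changing its shift or width. The field names, widths, and positions below are hypothetical and do not reproduce the actual obj_t info layout.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical layout: each field is a mask shifted by a named offset. */
#define SK_DT_SHIFT     0
#define SK_DT_BITS      (0x7u << SK_DT_SHIFT)    /* 3-bit field */
#define SK_TRANS_SHIFT  3
#define SK_TRANS_BIT    (0x1u << SK_TRANS_SHIFT) /* 1-bit field */
#define SK_UPLO_SHIFT   4
#define SK_UPLO_BITS    (0x3u << SK_UPLO_SHIFT)  /* 2-bit field */

/* Store value into the field described by mask/shift, leaving the other
   bits of info untouched. */
uint32_t sk_set_field(uint32_t info, uint32_t mask, unsigned shift,
                      uint32_t value)
{
    return (info & ~mask) | ((value << shift) & mask);
}

/* Extract the field described by mask/shift. */
uint32_t sk_get_field(uint32_t info, uint32_t mask, unsigned shift)
{
    return (info & mask) >> shift;
}
```

Since the shifts are integer constant expressions, a compiler can fold the masks at compile time, as the commit message anticipates.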
Field G. Van Zee
137143345d Reimplemented unit blocksize fix in prev commit.
Details:
- Instead of inferring the storage format of the micro-panels from within
  the packm variants, we now pass in a bool_t value that denotes whether
  the packed matrix contains row-stored column panels or column-stored
  row panels. This value can then be tested more easily inside the main
  packm variant loop.
- Renumbered pack_t schema values in bli_type_defs.h so that there are
  now five bits, each with different meaning:
  - 4: packed or not packed?
  - 3: packed for 3m?
  - 2: packed for 4m?
  - 1: packed to panels?
  - 0: stored by rows or columns?
- Added new macros that test for status of above bits in schema bit
  subfield, and renamed some existing macros related to 4m/3m.
2014-07-31 12:12:45 -05:00
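Illustration for commit 137143345d: the five schema bits listed in the commit message can be sketched as single-bit flags with one test function each. The macro and function names are hypothetical, and the polarity of bit 0 (set meaning "rows") is an assumption, since the commit message does not state it.

```c
#include <assert.h>
#include <stdbool.h>

/* Bit positions follow the list in the commit message. Names are
   hypothetical; "set = rows" for bit 0 is an assumption. */
#define SK_PACK_BIT        (1u << 4) /* packed or not packed? */
#define SK_PACK_3M_BIT     (1u << 3) /* packed for 3m?        */
#define SK_PACK_4M_BIT     (1u << 2) /* packed for 4m?        */
#define SK_PACK_PANEL_BIT  (1u << 1) /* packed to panels?     */
#define SK_PACK_ROW_BIT    (1u << 0) /* rows (1) or columns (0), assumed */

bool sk_is_packed(unsigned s)       { return (s & SK_PACK_BIT) != 0; }
bool sk_is_3m_packed(unsigned s)    { return (s & SK_PACK_3M_BIT) != 0; }
bool sk_is_4m_packed(unsigned s)    { return (s & SK_PACK_4M_BIT) != 0; }
bool sk_is_panel_packed(unsigned s) { return (s & SK_PACK_PANEL_BIT) != 0; }
bool sk_is_row_packed(unsigned s)   { return (s & SK_PACK_ROW_BIT) != 0; }
```

A schema built as SK_PACK_BIT | SK_PACK_4M_BIT | SK_PACK_PANEL_BIT would then describe column-stored panels packed for 4m.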
Field G. Van Zee
a51e32ec06 Fixed unit register blocksize brokenness.
Details:
- Fixed a breakdown in BLIS's ability to differentiate between row-stored
  and column-stored micro-panels when MR or NR is unit. When either
  register blocksize (or both) is equal to one, inspecting the strides of
  the affected packed micro-panel is no longer sufficient to determine
  whether the micro-panel is a row-stored column panel or a column-stored
  row panel (because both strides are unit). At that point, dimension
  information is necessary when invoking the bli_is_row_stored_f() and
  bli_is_col_stored_f() macros (and their "obj" counterparts). Thanks to
  Ilya Polkovnichenko for reporting this bug.
- Added panel dimensions (m and n) to obj_t, which are set in
  packm_init() and then passed into the blocked variants to support the
  aforementioned update.
2014-07-30 10:41:48 -05:00
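Illustration for commit a51e32ec06: the ambiguity can be reproduced with stride-only predicates. The sketch assumes a packed micro-panel whose elements are contiguous along the panel dimension (row stride 1, column stride equal to the panel dimension); the function names are hypothetical and are not the bli_is_row_stored_f() or bli_is_col_stored_f() macros.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stride-only predicates. */
bool sk_is_row_stored(long rs, long cs) { (void)rs; return cs == 1; }
bool sk_is_col_stored(long rs, long cs) { (void)cs; return rs == 1; }

/* Strides of a packed micro-panel that is contiguous along the panel
   dimension: rs = 1 and cs = panel_dim (MR or NR). */
void sk_micropanel_strides(long panel_dim, long *rs, long *cs)
{
    assert(panel_dim >= 1);
    *rs = 1;
    *cs = panel_dim;
}
```

With a panel dimension of 4 only the column-stored predicate holds, but with a panel dimension of 1 both strides are unit and both predicates hold, which is why strides alone cannot settle the question.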
Field G. Van Zee
c2732272f0 Removed old/unused packm variants. 2014-07-29 16:37:18 -05:00
Field G. Van Zee
b97fa9a5a7 Minor usage update to build/bump-version.sh. 2014-07-27 18:54:09 -05:00
Field G. Van Zee
b18ba5f62d Added missing 'bla_' prefix to r_imag(), d_imag().
Details:
- Added "bla_" to f2c functions r_imag() and d_imag(). Thanks to Murtaza
  Ali for pointing out the misnamed functions.
2014-07-27 18:52:05 -05:00
Field G. Van Zee
af7a8e6c04 CHANGELOG update (0.1.4) 2014-07-27 18:20:13 -05:00