Commit Graph

84 Commits

Author SHA1 Message Date
Tyler Smith
bde697f75e Add -openmp to ldflags as well 2014-04-04 16:43:44 -05:00
Tyler Smith
c332be8cd4 Added -openmp flag to Xeon Phi build for convenience 2014-04-04 16:37:50 -05:00
Tyler Smith
5ec93bd9a7 Bunch of minor fixes
Removed barrier after unpackm in all level3 blocked variants
Now there is an implicit barrier inside unpackm that only occurs if C is packed (which is usually not the case)

Moved the enabling of the tree barriers into bli_config.h
Fed the default MR and NR for double precision into bli_get_range instead of the number 8
2014-04-04 15:09:10 -05:00
Tyler Smith
2b6848b239 Merge http://github.com/flame/blis
Conflicts:
	kernels/bgq/1/bli_axpyv_opt_var1.c
	kernels/bgq/1/bli_dotv_opt_var1.c
2014-04-04 09:54:54 -05:00
Tyler Michael Smith
4e3eb39aca Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
2014-04-04 14:50:03 +00:00
Field G. Van Zee
d27b4f690c Use generic paths for toolchain in POWER7.
Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
2014-04-01 12:57:24 -05:00
Tyler Michael Smith
73b3db5948 Some fixes for the bgq configuration 2014-03-26 15:39:05 +00:00
Tyler Smith
2c158fb885 Merge https://github.com/flame/blis
Conflicts:
	frame/1m/packm/bli_packm_blk_var1.c
2014-02-27 16:46:23 -06:00
Tyler Smith
ac5a2de1d1 Merge branch 'master' of https://github.com/tlrmchlsmth/blis 2014-02-27 11:59:33 -06:00
Tyler Smith
01b125e815 First pass at adding parallelism to BLIS.
Added a multithreading infrastructure that should be independent of multithreading implementation in the future.
Currently, gemm blocked variants 1f and 2f, and packm variant blocked variant 1 is parallelized.
2014-02-27 11:55:45 -06:00
Field G. Van Zee
c2b2ab6270 Deprecated panel stride alignment in bli_config.h.
Details:
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE from bli_config.h of all
  configurations. It was already going unused in packm_init() since the
  recent 4m/3m commit. This setting was rarely, if ever, useful, and its
  existence only posed a potential risk for 4m/3m-based implementations.
- Removed BLIS_CONTIG_STRIDE_ALIGN_SIZE usage from mem_pool_macro_defs.h.
- Updated comments regarding CONTIG_STRIDE_ALIGN_SIZE in template
  micro-kernels.
2014-02-26 12:46:45 -06:00
Field G. Van Zee
fde5f1fdec Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
  macro constants. Examples:
    BLIS_SAXPYV_KERNEL_REF
    BLIS_DDOTXF_KERNEL_REF
    BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
  with a common base name; [sdcz] datatype flavors of each kernel or
  micro-kernel (level-1v, -1f, or 3) may now be named independently.
  This means you can now, if you wish, encode the datatype-specific
  register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
  undefined in bli_kernel.h will default to the corresponding reference
  implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
  it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
  datatype chars to match the number of types the kernel WOULD take in
  a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
  sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
  your level-1v/-1f kernels. The framework now prvides a _kernel()
  function which serves as the obj_t wrapper for whatever kernels are
  specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
  longer need to include any prototyping headers from within
  bli_kernel.h. The framework now generates kernel prototypes, with the
  proper type signature, based on the kernel names defined (or defaulted
  to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
  kernel is left undefined by bli_kernel.h, but its same-precision real
  domain equivalent IS defined, BLIS will use a 4m-based implementation
  for the datatype x implementations of all level-3 operations, using
  only the real gemm micro-kernel.
2014-02-25 13:34:56 -06:00
Field G. Van Zee
15b51e990f Merge branch 'master' of github.com:fgvanzee/blis 2014-02-21 09:04:32 -06:00
Francisco Igual
d1813c9dee Added new armv7a micro-kernels and configuration files from Werner Saar. 2014-02-21 15:14:31 +01:00
Field G. Van Zee
6363a9f658 Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
  virtual complex micro-kernels which are implemented via only real
  domain micro-kernels. Two new implementations are provided: 4m and 3m.
  4m implements complex matrix multiplication in terms of four real
  matrix multiplications, where as 3m uses only three and thus is
  capable of even higher (than peak) performance. However, the 3m method
  has somewhat weaker numerical properties, making it less desirable
  in general.
- Further refined packing routines, which were recently revamped, and
  added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
  into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
2014-02-19 17:00:52 -06:00
Tyler Smith
ce06686368 Fixed more Xeon Phi bugs, especially with scattered update 2014-02-14 13:52:18 -06:00
Field G. Van Zee
7aceef7683 Updated comments in macro-kernels.
Details:
- Updated (and fixed some errors in) the "Assumptions/assertions" comment
  section of macro-kernels.
- Changed register blocksizes of reference configuration to MR = 8 and
  NR = 4. It's always good for MR != NR in the reference configuration
  since it may help uncover bugs related to non-square micro-kernels.
2014-02-06 17:31:19 -06:00
Field G. Van Zee
3404e6657e Deprecated incremental blocksize macro const defs.
Details:
- Removed macro constant definitions related to incremental blocksizes
  from all configurations' bli_kernel.h files. This change is minor and
  is mostly a cleanup related to a previous commit.
2014-02-05 11:19:10 -06:00
Field G. Van Zee
89c76a8a51 Allow building outside source distribution.
Details:
- Modified build system (mostly configure and top-level Makefile) so that
  a user can build a BLIS library outside of the top-level directory of
  the source distribution.
- Added "test" target to Makefile so that the user can run "make test",
  which will compile, link, and run the testsuite binary. This works even
  if the build directory is externally located, thanks to the test suite
  binary's new -g and -o command-line options. Also, when creating the
  test suite via the top-level Makefile, the linking is against the
  local archive, in lib/<configname>, rather than at <install_prefix>/lib.
- Modified testsuite/Makefile so that it links against the library built
  locally, in ../lib/<configname>.
- Added "-lm" to LDFLAGS of most configurations' make_defs.mk.
- Various other cleanups to build system.
2014-01-09 12:08:37 -06:00
Field G. Van Zee
cafb58e86e Updated template micro-kernels to use auxinfo_t.
Details:
- Updated template micro-kernel implementations (located in
  config/template/kernels), to adhere to the new auxinfo_t interface.
  Meant to include this change in a0331fb1.
- Changed template configuration to use 64-bit integers (for both BLIS
  and the BLAS compatibility layer).
2014-01-06 13:28:36 -06:00
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
f60c8adc2f Minor updates to dunnington configuration.
Details:
- Added commented alternatives to dunnington configuration's bli_kernel.h.
- Minor reformatting of optimization flag variables in make_defs.mk.
2013-12-10 14:39:56 -06:00
Field G. Van Zee
4ef2015049 Tweaks to dunnington configuration (x86_64/core2).
Details:
- Updated BLIS_DEFAULT_KC_D from 256 to 384.
- Enabled cache blocksize extension of up to 25% for MC and KC (for
  double-precision real).
2013-12-09 18:53:03 -06:00
Field G. Van Zee
50549a6a31 Changed header install directory to include/blis.
Details:
- Changed top-level Makefile so that headers are installed to
  $(INSTALL_PREFIX)/include/blis/. (Header directories are no longer
  named by version/configuration and then symlinked.)
- Added uninstall targets, including uninstall-old to clean out old
  library archives.
- Added GREP makefile definitions to all configurations' make_defs.mk.
2013-11-17 18:31:27 -06:00
Field G. Van Zee
d70733abdd Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
  Thanks to Francisco Igual for contributing these kernels and
  configurations.
2013-11-16 17:34:25 -06:00
Field G. Van Zee
089048d589 Added object wrappers to 1f test suite modules.
Details:
- Added missing object wrappers to level-1f test suite modules. This was
  only apparent if you were configuring with something other than the
  reference configuration.
- Commented out object-wrappers in level-1f front-ends. These were not
  working as intended the reference configuration was selected, because
  most kernel sets, such as those in the template set, do not have object
  wrappers.
- Whitespace changes to template micro-kernels.
- Comment changes to template level-1f kernel headers.
2013-11-09 17:18:00 -06:00
Field G. Van Zee
9ef3752079 Updated template kernels wrt KernelsHowTo wiki.
Details:
- Merged latest state of KernelsHowTo wiki into template micro-kernels
  located in config/template/kernels/3.
2013-11-08 17:20:47 -06:00
Field G. Van Zee
376bbb59c8 Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
  and all framework code.
- Updated test suite modules according to above changes.
2013-11-08 11:17:34 -06:00
Field G. Van Zee
f5953259a1 Fixed a bug related to Hermitian matrix diagonals.
Details:
- Fixed a bug whereby BLIS assumed that the imaginary components of the
  diagonal elements of Hermitian matrices were already zero. This property
  is now enforced when the matrix is packed (bli_packm_blk_var2). Thanks
  to Vladimir Sukharev for reporting this bug.
- Minor comment updates to template kernels.
2013-11-04 14:43:55 -06:00
Field G. Van Zee
cca1e1f51d Fixed bugs in scalm and setm.
Details:
- Fixed bugs in scalm and setm that resulted in segmentation faults when
  beta is not the same type as the matrix operand. Thanks to Vladimir
  Sukharev for reporting this bug.
- Changed axpym and scal2m front-ends in fashion similar to that of scalm
  and setm; namely, the alpha scalar is copy-cast the type of the first
  matrix operand.
- Changed the template and reference configurations' bli_config.h files
  so that the number of memory allocator blocks of A and B are set based
  on BLIS_MAX_NUM_THREADS.
- Comment updates to bli_obj.c and variable rename in bla_nrm2.c.
2013-10-30 14:39:01 -05:00
Field G. Van Zee
a091a219bd Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
  configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
2013-10-14 10:11:29 -05:00
Field G. Van Zee
dacdde27ae Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
  the Sandy Bridge architecture, including a dgemm ukernel coded with
  AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
2013-10-11 11:37:19 -05:00
Field G. Van Zee
be4833bd91 Added test suite modules for level-1f, 3 kernels.
Details:
- Added test modules in test suite for level-1f kernels and level-3
  micro-kernels. (Duplication in the micro-kernels, for now, is NOT
  supported by these test modules.)
- Added section override switches to test suite's input.operations file.
- Added obj_t APIs for level-1f front-ends and their unblocked variants to
  facilitate the level-1f test modules. Also added front-end for dupl
  operation.
- Added obj_t-based check routines for level-1f operations, which are
  called from the new front-ends mentioned above.
- Added query routines for axpyf, dotxf, and dotxaxpyf that return fusing
  factors as a function of datatype, which is needed by their respective
  test modules.
- Whitespace changes to bli_kernel.h of all existing configurations.
2013-10-10 14:20:06 -05:00
Field G. Van Zee
5e54f46ccb Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
  level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
  lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
  renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
  to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
  dotxaxpyf, as well as the default fusing factor (which are all equal
  in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
  reference variants were implemented in terms of front-end routines rather
  that directly in terms of the kernels. (For example, axpy2v was implemented
  as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
  A is assumed to be m x b_n in both cases, and for dotxf A is actually used
  as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
  frame/3/gemm/ukernels and frame/3/trsm/ukernels.
2013-09-30 12:58:18 -05:00
Field G. Van Zee
97aaf220a8 Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
    Intel MIC
    IBM BG/Q
    IBM Power7
    AMD Piledriver
    Loogson 3A
  and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
  and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
  "clarksville" configuration to "dunnington".
2013-09-17 10:51:36 -05:00
Field G. Van Zee
da77e9614f Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
  a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
  functionality includes computing the pool size for each datatype (using
  that datatype's cache blocksizes) and using the maximum to size the
  actual pool array. This addresses the somewhat common pitfall whereby a
  developer updates cache blocksizes in bli_kernel.h for only one datatype
  (say, single-precision real), while the memory pools are sized using the
  double-precision real values. Then, when the developer attempts to link
  to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
  a message saying the static memory pool was exhausted. Clearly, this
  message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
  meant to check for size consistency among the various cache blocksizes.
  (Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
  reasonable place to put these constants, rather than further crowd up
  bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
  I hadn't used it in months. We can always re-create it later.
2013-09-13 12:00:37 -05:00
Field G. Van Zee
7ae4d7a41d Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
  assigned values of 32, 64, or some other value. The former two result in
  defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
  causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
  to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
  with format specifiers not matching the types of their arguments to printf()
  and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
  inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
  bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
  Windows-style CRLF newlines for certain files.
2013-09-10 16:35:12 -05:00
Field G. Van Zee
981a60cfa0 Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
  I've changed the default typedef for gint_t and guint_t from int64_t and
  uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
  configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
  to introductory output of the testsuite.
2013-09-04 12:09:11 -05:00
Field G. Van Zee
d352c746e5 Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
  kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
  config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
  bli_packm_blk_var3.c
2013-08-27 13:41:46 -05:00
Field G. Van Zee
8abfe55f2a Miscellaneous updates.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
  BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
  kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
  kernels, in that the interior and edge-case handling is expressed once
  inside the loops in the n and m dimensions, rather than the edge-case
  handling being "unrolled" and expressed as distinct code regions. The
  previous macro-kernel now lives in retired form in the subdirectory
  other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
  applied to optimize the macro-kernel accesses pattern on C when C is
  row-stored.
- Various updates inside of test/exec_sizes.
2013-08-08 13:30:19 -05:00
Field G. Van Zee
4e80ad28c9 Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
  complex arithmetic to the scalar-level macros in include/level0. This
  includes a somewhat substantial reorganization and re-layering of much
  of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
  commented-out by default, which optionally enables the use of built-in
  C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed macro definitions from bli_param_macro_defs.h which was not being
  used (bli_proj_dt_to_real_if_imag_eq0).
2013-07-18 17:53:31 -05:00
Field G. Van Zee
e5f90f3a8d Removed copynz defs from bli_kernel.h files.
Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
  configuration. (Meant to include this in previous commit.)
2013-07-10 13:40:12 -05:00
Field G. Van Zee
4b7e7970f1 Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
  inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
  integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
  BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
  These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
  in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
  of char.
- Updated bla_amax() wrappers so that the return type is defined directly
  as f77_int, rather than letting the prototype-generating macro decide
  the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
  so I removed them. Also, changed the body of the wrapper so that a
  gint_t is passed into abmaxv, which is THEN typecast to an f77_int
  before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
  doublecomplex types so that they use .real and .imag instead (now that
  we are using scomplex and dcomplex).
2013-07-08 15:20:34 -05:00
Field G. Van Zee
3725013985 Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel incrementally
  packs one micro-panel of B at a time. This is useful for certain
  special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
  do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
2013-07-08 11:24:18 -05:00
Field G. Van Zee
02002ef6f3 Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
  side case is now handled explicitly, rather than induced indirectly by
  transposing and swapping strides on operands. This allows us to walk through
  the output matrix with favorable access patterns no matter how it is stored,
  for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
  lower/upper distinction. Instead, we simply label the variants by which
  dimension is partitioned and whether the variant marches forwards or
  backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
  (as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
  blocksize  extensions (if non-zero) were not being used to appropriately size
  the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
  whole multiples of MR AND NR. This is needed for the case of trsm_r where,
  in order to reuse existing left-side gemmtrsm fused micro-kernels, the
  packing of A (left-hand operand) and B (right-hand operand) is done with
  NR and MR, respectively (instead of MR and NR).
2013-06-24 17:08:14 -05:00
Field G. Van Zee
5b641c3bab Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
  named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
  compiler flags. This allows the developer to specify a separate set of
  flags (e.g. optimization flags) for compiling kernels while maintaining a
  standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
  to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
  according to above changes.
2013-06-12 16:02:12 -05:00
Field G. Van Zee
05a657a6b9 Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
  from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
  schemes?" switch is disabled, which would result in redundant tests being
  executed for matrix-only (e.g. level-1m, level-3) operations if multiple
  vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
2013-06-07 11:04:10 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
0288c827d3 Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
  recently).
2013-06-01 08:02:23 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00