166 Commits

Author SHA1 Message Date
Field G. Van Zee
5e54f46ccb Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
  level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
  lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
  renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
  to bli_kernel.h.
- Modified test suite to print out fusing factors for axpyf, dotxf, and
  dotxaxpyf, as well as the default fusing factor (which are all equal
  in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
  reference variants were implemented in terms of front-end routines rather
  than directly in terms of the kernels. (For example, axpy2v was implemented
  as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
  A is assumed to be m x b_n in both cases, and for dotxf A is actually used
  as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
  frame/3/gemm/ukernels and frame/3/trsm/ukernels.
2013-09-30 12:58:18 -05:00
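The dotxf interface change in 5e54f46ccb (A is m x b_n and is applied as A^T) can be illustrated with a minimal C sketch. This is not the BLIS reference kernel: the function name dotxf_sketch, the double-only signature, and the use of size_t strides are assumptions made for illustration, and conjugation parameters are omitted.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical sketch of a dotxf-style operation for doubles:
 *     y := beta * y + alpha * A^T * x
 * A is m x b_n, stored with row stride rs_a and column stride cs_a,
 * x has length m, and y has length b_n. */
static void dotxf_sketch(size_t m, size_t b_n, double alpha,
                         const double *a, size_t rs_a, size_t cs_a,
                         const double *x, double beta, double *y)
{
    assert(a != NULL && x != NULL && y != NULL);

    for (size_t j = 0; j < b_n; ++j) {
        double rho = 0.0;

        /* Dot product of column j of A with x (i.e. row j of A^T). */
        for (size_t i = 0; i < m; ++i)
            rho += a[i * rs_a + j * cs_a] * x[i];

        y[j] = beta * y[j] + alpha * rho;
    }
}
```

For a column-stored 3 x 2 matrix, a caller would pass rs_a = 1 and cs_a = 3. Each of the b_n dot products reads the same x, which is why the operation is a natural fusion of several dotxv calls.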
Field G. Van Zee
97aaf220a8 Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
    Intel MIC
    IBM BG/Q
    IBM Power7
    AMD Piledriver
    Loongson 3A
  and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
  and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
  "clarksville" configuration to "dunnington".
2013-09-17 10:51:36 -05:00
Field G. Van Zee
fe979c5a11 Removed default configuration behavior.
Details:
- Changed the configure script so that it no longer defaults to the
  reference configuration. This change is being made so that the
  developer has a firm awareness of which configuration is being used
  to configure BLIS. Thanks to Mike Kistler and Bryan Marker for this
  suggested change.
2013-09-13 14:31:53 -05:00
Field G. Van Zee
da77e9614f Minor improvements to static memory allocator.
Details:
- Expanded on cpp macro definitions from bli_mem.c and relocated them to
  a new header file, frame/include/bli_mem_pool_macro_defs.h. The expanded
  functionality includes computing the pool size for each datatype (using
  that datatype's cache blocksizes) and using the maximum to size the
  actual pool array. This addresses the somewhat common pitfall whereby a
  developer updates cache blocksizes in bli_kernel.h for only one datatype
  (say, single-precision real), while the memory pools are sized using the
  double-precision real values. Then, when the developer attempts to link
  to and run a level-3 BLIS routine (e.g. dgemm), the library aborts with
  a message saying the static memory pool was exhausted. Clearly, this
  message is misleading when the pool was not sized properly to begin with.
- Removed previously disabled code in bli_kernel_macro_defs.h that was
  meant to check for size consistency among the various cache blocksizes.
  (Obviously the memory pool size-based solution mentioned above is better.)
- Added BLIS_SIZEOF_? cpp macros to bli_type_defs.h. This seemed like a
  reasonable place to put these constants, rather than further crowd up
  bli_config.h.
- Updated testsuite driver to output memory pool sizes for A, B, and C.
- Minor comment updates to bli_config.h.
- Removed 'flame' configuration. It was beginning to get out-of-date, and
  I hadn't used it in months. We can always re-create it later.
2013-09-13 12:00:37 -05:00
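The pool-sizing idea in da77e9614f (compute a per-datatype size from that datatype's cache blocksizes, then size the array with the maximum) might look roughly like the sketch below. All macro names and blocksize values here are hypothetical; the actual definitions live in frame/include/bli_mem_pool_macro_defs.h and are not reproduced.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical element sizes in bytes (s, d, c, z datatypes). */
#define SKETCH_SIZEOF_S 4
#define SKETCH_SIZEOF_D 8
#define SKETCH_SIZEOF_C 8
#define SKETCH_SIZEOF_Z 16

/* Hypothetical cache blocksizes; only the single-precision MC was "tuned". */
#define SKETCH_MC_S 512
#define SKETCH_MC_D 128
#define SKETCH_MC_C 128
#define SKETCH_MC_Z 64
#define SKETCH_KC   256

/* Bytes needed for one packed block of A, per datatype. */
#define SKETCH_BLOCK_A_S (SKETCH_MC_S * SKETCH_KC * SKETCH_SIZEOF_S)
#define SKETCH_BLOCK_A_D (SKETCH_MC_D * SKETCH_KC * SKETCH_SIZEOF_D)
#define SKETCH_BLOCK_A_C (SKETCH_MC_C * SKETCH_KC * SKETCH_SIZEOF_C)
#define SKETCH_BLOCK_A_Z (SKETCH_MC_Z * SKETCH_KC * SKETCH_SIZEOF_Z)

#define SKETCH_MAX2(a, b) ((a) > (b) ? (a) : (b))

/* Size the pool with the largest requirement so that no datatype can
 * exhaust a pool that was sized for a different datatype. */
#define SKETCH_BLOCK_A_MAX \
    SKETCH_MAX2(SKETCH_MAX2(SKETCH_BLOCK_A_S, SKETCH_BLOCK_A_D), \
                SKETCH_MAX2(SKETCH_BLOCK_A_C, SKETCH_BLOCK_A_Z))

#define SKETCH_NUM_BLOCKS_A 2

char sketch_pool_a[SKETCH_BLOCK_A_MAX * SKETCH_NUM_BLOCKS_A];
```

With these illustrative values the single-precision block is the largest, so the pool is sized from it even though the double-precision values were left unchanged.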
Field G. Van Zee
631f347b7a Added ESSL and Accelerate targets to test drivers.
Details:
- Added ESSL and Accelerate (OS X) targets to standalone test drivers'
  Makefile in "test" directory. Thanks to Jeff Hammond for suggesting
  / providing this patch.
2013-09-10 17:17:28 -05:00
Field G. Van Zee
7ae4d7a41d Various changes to treatment of integers.
Details:
- Added a new cpp macro in bli_config.h, BLIS_INT_TYPE_SIZE, which can be
  assigned values of 32, 64, or some other value. The former two result in
  defining gint_t/guint_t in terms of 32- or 64-bit integers, while the latter
  causes integers to be defined in terms of a default type (e.g. long int).
- Updated bli_config.h in reference and clarksville configurations according
  to above changes.
- Updated test drivers in test and testsuite to avoid type warnings associated
  with format specifiers not matching the types of their arguments to printf()
  and scanf().
- Inserted missing #include "bli_system.h" into blis.h (which was slated for
  inclusion in d141f9eeb6).
- Added explicit typecasting of dim_t and inc_t to macros in
  bli_blas_macro_defs.h (which are used in BLAS compatibility layer).
- Slight changes to CREDITS and INSTALL files.
- Slight tweaks to Windows build system, mostly in the form of switching to
  Windows-style CRLF newlines for certain files.
2013-09-10 16:35:12 -05:00
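The BLIS_INT_TYPE_SIZE selection described in 7ae4d7a41d can be sketched as a preprocessor block. The macro name and the gint_t/guint_t type names come from the commit message; the exact layout of the block is an assumption, and the value 64 is chosen here only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Normally set in bli_config.h; 64 is just an example value. */
#define BLIS_INT_TYPE_SIZE 64

#if BLIS_INT_TYPE_SIZE == 32
typedef int32_t  gint_t;
typedef uint32_t guint_t;
#elif BLIS_INT_TYPE_SIZE == 64
typedef int64_t  gint_t;
typedef uint64_t guint_t;
#else
/* Any other value falls back to a default C integer type. */
typedef long int          gint_t;
typedef unsigned long int guint_t;
#endif
```

Types such as dim_t and inc_t would then be defined in terms of gint_t or guint_t, so a single setting controls integer width throughout the library.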
Field G. Van Zee
068437736b Fixed set-but-not-used compiler (gcc) warnings.
Details:
- Used void-casts of certain variables to appease gcc (and perhaps other
  compilers) when such variables are only used in the complex instances of
  the functions. Special thanks to Karl Rupp for suggesting a portable fix
  for these warnings.
2013-09-09 14:07:58 -05:00
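The void-cast technique mentioned in 068437736b can be sketched as follows. The function scal_real_sketch and its parameters are hypothetical; the point is only that a variable needed by the complex instances of a function is referenced through a (void) cast so that the real instances compile without a set-but-not-used warning.

```c
#include <assert.h>

/* Hypothetical real-domain instance of a function whose complex
 * counterpart would consult use_conj. */
static double scal_real_sketch(int conj_flag, double alpha, double x)
{
    int use_conj;

    use_conj = conj_flag;  /* set here, but only read in complex instances */
    (void)use_conj;        /* void-cast: counts as a use, generates no code */

    return alpha * x;
}
```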
Field G. Van Zee
6dc85f63dc Small fix to Windows defs.mk makefile fragment.
Details:
- Commented out a !include statement that was attempting to include a
  version file that does not yet exist. For now, the version string is
  hard-coded into defs.mk.
2013-09-09 13:48:52 -05:00
Field G. Van Zee
d141f9eeb6 Added Windows build system.
Details:
- Added a 'windows' directory, which contains a Windows build system
  similar to libflame's. Thanks to Martin for getting this up
  and running.
- Spun off system header #includes into bli_system.h, which is included
  in blis.h
- Added a Windows section to bli_clock.c (similar to libflame's).
2013-09-09 13:09:16 -05:00
Field G. Van Zee
9b320e7406 Edited bli_?lamch.c to avoid Windows keyword.
Details:
- Renamed "small" variable to "smnum" to avoid collision with Windows type
  by the same name. This change is needed in advance of the upcoming Windows
  build system.
2013-09-09 11:04:46 -05:00
Field G. Van Zee
9013ad6ff2 Switched integer typedefs (again) to C types.
Details:
- Redefined gint_t and guint_t in terms of the standard C types long int
  and unsigned long int, respectively.
- Changed testsuite default max problem size to 500.
- Changed testsuite input.operations to use square problems for level-3
  operation tests.
2013-09-04 13:36:07 -05:00
Field G. Van Zee
981a60cfa0 Falling back to 32-bit integers for dim_t, etc.
Details:
- In light of recent segfaulting issues when compiling on 32-bit systems,
  I've changed the default typedef for gint_t and guint_t from int64_t and
  uint64_t to int32_t and uint32_t, respectively.
- Disabled 64-bit integers in the blas2blis layer for the reference
  configuration.
- Added type sizes of gint_t, guint_t, and the four floating-point datatypes
  to introductory output of the testsuite.
2013-09-04 12:09:11 -05:00
Field G. Van Zee
b776ddcd43 Applied temp fix to typecasting bug in testsuite.
Details:
- Applied a temporary fix to the typecasting bug in the testsuite driver.
  The fix involves casting both numerator and denominator to unsigned long.
  This fix is more voodoo than science, as I can't be sure why it even
  works.
2013-09-03 21:58:07 -05:00
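The cause of the bug addressed in b776ddcd43 is not stated in the log. As a general illustration only, one common way a negative dimension specification can produce a wrong result in C is mixed signed/unsigned division, sketched below. The function names and the interpretation of a negative spec as "divide the problem size by its magnitude" are assumptions, not the testsuite's code.

```c
#include <assert.h>
#include <stdint.h>

/* Naive version: spec is converted to an unsigned 64-bit value before the
 * division, so -3 becomes a huge positive number and the quotient is 0. */
static uint64_t dim_naive_sketch(uint64_t p, int64_t spec)
{
    return p / spec;
}

/* Safer version: take the magnitude first, then divide in one
 * unsigned type. */
static uint64_t dim_fixed_sketch(uint64_t p, int64_t spec)
{
    uint64_t denom;

    assert(spec != 0);
    denom = (uint64_t)(spec < 0 ? -spec : spec);

    return p / denom;
}
```

Under these assumptions, a problem size of 300 with a spec of -3 yields 0 from the naive version and 100 from the fixed one.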
Field G. Van Zee
9ee6e12537 Changed dimension spec for gemm in testsuite.
Details:
- Encountered a bizarre typecasting bug whereby the test suite was not
  computing the proper dimension from the problem size and dimension
  specification when the latter was set to -3. Will investigate.
  Thanks to Fran for finding this "bug".
2013-09-03 21:53:27 -05:00
Field G. Van Zee
e8be081e68 Generalized matlab and file output in testsuite.
Details:
- Added a new option in input.general that allows outputting in
  matlab/octave format so that one can output in matlab format
  independently from outputting to files.
- Adjusted input.operations according to above.
- Added input.operations.0 and input.operations.1 with all options
  disabled and enabled, respectively.
2013-08-28 15:52:34 -05:00
Field G. Van Zee
d352c746e5 Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
  kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
  config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
  bli_packm_blk_var3.c
2013-08-27 13:41:46 -05:00
Field G. Van Zee
dedda523dc Fixed bug in bli_acquire_mpart_t2b(), _l2r().
Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
  that caused incorrect partitioning when SUBPART0 was requested. This
  bug was introduced in 46d3d09d49. Thanks to Bryan for isolating
  this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimization code in
  kernels/x86_64/3/bli_gemm_opt_d4x4.c.
2013-08-19 12:07:41 -05:00
Field G. Van Zee
12dbd2f334 Moved init_safe(), finalize_safe() to BLAS compat.
Details:
- Moved the bli_init_safe() and bli_finalize_safe() function calls from the
  BLAS-like BLIS layer to the BLAS compatibility layer. Having these auto-
  initializers in the BLIS layer wasn't buying us anything because the user
  could still call the library with uninitialized global scalar constants,
  for example. Thus, we will just have to live with the constraint that
  bli_init() MUST be called before calling ANY routine with a bli_ prefix.
- Added the missing bli_init_safe() and bli_finalize_safe() calls to the level-1
  BLAS compatibility wrappers.
2013-08-08 14:39:35 -05:00
Field G. Van Zee
8abfe55f2a Miscellaneous updates.
Details:
- Changed the BLIS_HEAP_STRIDE_ALIGN_SIZE in the configurations from 16 to
  BLIS_CACHE_LINE_SIZE (typically 64).
- Changed the use of nr in sizing of bd buffer to packnr in level-3 macro-
  kernels.
- Reformulated gemm_ker_var2 to look more like the other level-3 macro-
  kernels, in that the interior and edge-case handling is expressed once
  inside the loops in the n and m dimensions, rather than the edge-case
  handling being "unrolled" and expressed as distinct code regions. The
  previous macro-kernel now lives in retired form in the subdirectory
  other/bli_gemm_ker_var2.c.old.
- Updated experimental gemm_ker_var5 according to above change.
- Fixed bug in bli_her2k.c whereby incorrect transformations were being
  applied to optimize the macro-kernel access pattern on C when C is
  row-stored.
- Various updates inside of test/exec_sizes.
2013-08-08 13:30:19 -05:00
Field G. Van Zee
1aa05736ff Fixed bug in interface of bla_ger_check().
Details:
- Fixed the misplaced lda parameter in the function signature of
  bla_ger_check(). Thanks to Tyler for finding this bug.
2013-08-07 12:27:04 -05:00
Field G. Van Zee
685aad2535 Fixed cpp guard typos in frame/compat/check files.
Details:
- Fixed instances of BLIS_ENABLE_BLIS2BLAS that should have been
  BLIS_ENABLE_BLAS2BLIS. Thanks to Tyler for catching this.
- Fixed various syntax errors in the code that had yet to be compiled
  due to the aforementioned bug.
2013-08-06 12:25:51 -05:00
Field G. Van Zee
f4ec28e723 Added basic OpenMP-based gemm and packm files.
Details:
- Integrated Tyler's parallelized packm_blk_var2 and gemm_ker_var2
  into the following auxiliary files

    frame/1m/packm/other/bli_packm_blk_var2.c
    frame/3/gemm/other/bli_gemm_ker_var2.c

  The routine in the first file uses a basic OpenMP parallel region to
  parallelize the packing of blocks of A and panels of B, while the
  second uses a similar parallel region to parallelize along the n
  dimension of the gemm macro-kernel.
2013-08-01 11:24:23 -05:00
Field G. Van Zee
f8980edf9c Merge branch 'master' of https://code.google.com/p/blis 2013-07-26 11:14:27 -05:00
Field G. Van Zee
67a8b9498d Added missing cpp kernel blocksize constraints.
Details:
- Added missing C preprocessor guards in bli_kernel_macro_defs.h that enforce
  constraints on the register blocksizes relative to the cache blocksizes.
  Thanks to Tyler for helping me stumble across this issue.
2013-07-26 11:12:37 -05:00
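Preprocessor guards of the kind added in 67a8b9498d might look like the sketch below. The macro names and values are hypothetical stand-ins for the per-datatype blocksizes in bli_kernel.h, and the specific constraint shown (cache blocksizes being whole multiples of register blocksizes) is an assumption about what the guards enforce.

```c
#include <assert.h>

/* Hypothetical cache and register blocksizes for one datatype. */
#define SKETCH_MC 128
#define SKETCH_NC 4096
#define SKETCH_MR 4
#define SKETCH_NR 4

/* Fail at compile time rather than misbehave at run time. */
#if (SKETCH_MC % SKETCH_MR) != 0
#error "MC must be a whole multiple of MR."
#endif

#if (SKETCH_NC % SKETCH_NR) != 0
#error "NC must be a whole multiple of NR."
#endif
```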
Field G. Van Zee
6e7e452343 Fixed minor warnings and misc issues.
Details:
- Fixed various warnings output by gcc 4.6.3-1, including removing some
  set-but-not-used variables and addressing some instances of typecasting
  of pointer types to integer types of different sizes.
2013-07-22 14:50:57 -05:00
Field G. Van Zee
03f6c35997 Tightened some macros that detect datatypes.
Details:
- Modified the definitions of some macros, such as bli_is_real(), so that
  the "special" bit is taken into account so that BLIS_INT is differentiated
  from BLIS_FLOAT.
- Whitespace changes to bli_obj_macro_defs.h.
- Removed BLIS_SPECIAL_BIT definition from bli_type_defs.h, since it wasn't
  being used.
2013-07-22 12:54:32 -05:00
Field G. Van Zee
b33e2f4443 CHANGELOG update (for 0.0.9). 2013-07-19 17:15:03 -05:00
Field G. Van Zee
0680916fdd Added BLAS error checking to compatibility layer.
Details:
- Added frame/compat/check directory, which now houses companion _check()
  routines for each of the BLAS wrappers in frame/compat. These _check()
  routines are called from the compatibility wrappers and mimic the
  error-checking present in the netlib BLAS.
- Edited bla_xerbla.c so that xerbla() translates the operation string to
  uppercase before printing.
- Redefined util routines in frame/compat/f2c/util in terms of level0
  macros.
- Added prototypes for util routines, f2c routines, lsame(), and xerbla().
- Commented out prototypes in test/test_*.c since Fortran integers are now
  int64_t by default (and the prototypes that were present in the files
  used int).
- Removed redundant #include "bli_f2c.h" in bli_?lamch.c and bli_lsame.c,
  since blis.h was already being included.
- Other minor changes to code in frame/compat/f2c.
0.0.9
2013-07-18 18:04:34 -05:00
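The uppercase translation that 0680916fdd adds to xerbla() can be sketched with the standard toupper() function. The helper name upcase_op_name_sketch and its buffer-based signature are assumptions; the actual change to bla_xerbla.c is not shown in the log.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Copy src into dst, converting each character to uppercase and always
 * null-terminating. At most dst_len - 1 characters are copied. */
static void upcase_op_name_sketch(const char *src, char *dst, size_t dst_len)
{
    size_t i = 0;

    assert(src != NULL && dst != NULL && dst_len > 0);

    for (; src[i] != '\0' && i + 1 < dst_len; ++i)
        dst[i] = (char)toupper((unsigned char)src[i]);

    dst[i] = '\0';
}
```

The cast to unsigned char before calling toupper() avoids undefined behavior for characters with negative values on platforms where char is signed.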
Field G. Van Zee
4e80ad28c9 Added support for C99 complex types/arithmetic.
Details:
- Added support for C99 complex types to bli_type_defs.h and overloaded
  complex arithmetic to the scalar-level macros in include/level0. This
  includes a somewhat substantial reorganization and re-layering of much
  of the existing machinery present in the level0 macros.
- Added new #define for BLIS_ENABLE_C99_COMPLEX to bli_config.h files,
  commented-out by default, which optionally enables the use of built-in
  C99 complex types and arithmetic.
- Minor changes to clarksville and reference configs' make_defs.mk files.
- Removed a macro definition from bli_param_macro_defs.h that was not being
  used (bli_proj_dt_to_real_if_imag_eq0).
2013-07-18 17:53:31 -05:00
Field G. Van Zee
6072d7c848 Fixed bugs in trsm, trmm macro-kernels.
Details:
- Fixed a bug in trsm_rl_ker_var2() caused by incorrect edge case handling.
- Fixed a bug in trsm_rl_ker_var2() and trsm_ru_ker_var2() whereby k was
  incorrectly being adjusted upward by MR, instead of NR. The rl and ru
  trmm macro-kernels were updated in a similar fashion.
- Fixed a bug in trsm_ru_ker_var2() that was due to a missing negation on
  diagoffb when recomputing k to skip a zero region below where the
  diagonal intersects the right side of the block. The corresponding
  trmm macro-kernel was also updated.
- Fixed a bug in trsm_ru_ker_var2() where the adjustment of k (by NR)
  needed to be placed AFTER the block that recomputes k to skip the zero
  region (if present). The other three trsm macro-kernels, as well as the
  trmm macro-kernels, were updated in the same manner, for consistency.
- Fixed a bug in trmm_lu_ker_var2() in which the wrong dimension (n) was
  being updated to skip a zero region to the left of where the diagonal
  of A intersects the top edge of the block.
- Comment updates to all trsm and trmm macro-kernels.
- Comment updates to bli_packm_init.c.
2013-07-17 12:27:45 -05:00
Field G. Van Zee
47410a48f9 Added f2c'ed Givens rotation wrappers.
Details:
- Retired (for now) existing ?rot*() BLAS compatibility wrappers to 'attic'
  along with other wrappers for which no BLIS implementation exists.
- Added f2c-generated codes for applicable datatype flavors of rot, rotg,
  rotm, and rotmg operations.
2013-07-10 14:53:59 -05:00
Field G. Van Zee
e5f90f3a8d Removed copynz defs from bli_kernel.h files.
Details:
- Removed COPYNZ_KERNEL definition from the bli_kernel.h files in each
  configuration. (Meant to include this in previous commit.)
2013-07-10 13:40:12 -05:00
Field G. Van Zee
aec12d90f5 Removed copynzv, copynzm and related codes.
Details:
- Removed copynzv and copynzm operation directories. These operations
  implemented a variation of copyv/m that, in the case of real source
  and complex destination operands, leaves the imaginary component
  untouched (rather than setting it to zero). I realize now that the
  special case(s) (e.g. gemm with real A and B but complex C) that I
  thought required this operation actually can be handled more simply.
- Removed level0 scalar macros implementing copynzs, copynzjs.
2013-07-10 13:33:30 -05:00
Field G. Van Zee
b0a0a0f274 Added handling of restrict, stdint.h for non-C99.
Details:
- Removed the #include <stdint.h> from blis.h and inserted a cpp macro block
  in bli_type_defs.h that #includes <stdint.h> for C++ and C99, and otherwise
  manually typedefs the types we need (which, for now, are unconditionally
  int64_t and uint64_t).
- Moved basic typedefs to top of bli_type_defs.h, and comment changes.
- Added cpp macro block to bli_macro_defs.h that #defines restrict as
  nothing for C++ and non-C99.
2013-07-09 17:15:38 -05:00
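The restrict handling described in b0a0a0f274 can be sketched as a small preprocessor block. The exact condition used in bli_macro_defs.h is not shown in the log, so the test on __cplusplus and __STDC_VERSION__ below is an assumption, as is the axpy_sketch function used to exercise it.

```c
#include <assert.h>
#include <stddef.h>

/* For C++ and pre-C99 compilers, define restrict as nothing so that
 * code using the keyword still compiles. */
#if defined(__cplusplus) || !defined(__STDC_VERSION__) || \
    (__STDC_VERSION__ < 199901L)
#define restrict
#endif

/* Hypothetical kernel-style routine: y := y + alpha * x. */
static void axpy_sketch(size_t n, double alpha,
                        const double *restrict x, double *restrict y)
{
    assert(x != NULL && y != NULL);

    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```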
Field G. Van Zee
4b7e7970f1 Migrated integer usage to stdint.h types.
Details:
- Changed the way bli_type_defs.h defines integer types so that dim_t,
  inc_t, doff_t, etc. are all defined in terms of gint_t (general signed
  integer) or guint_t (general unsigned integer).
- Renamed Fortran types fchar and fint to f77_char and f77_int.
- Define f77_int as int64_t if a new configuration variable,
  BLIS_ENABLE_BLIS2BLAS_INT64, is defined, and int32_t otherwise.
  These types are defined in stdint.h, which is now included in blis.h.
- Renamed "complex" type in f2c files to "singlecomplex" and typedef'ed
  in terms of scomplex.
- Renamed "char" type in f2c files to "character" and typedef'ed in terms
  of char.
- Updated bla_amax() wrappers so that the return type is defined directly
  as f77_int, rather than letting the prototype-generating macro decide
  the type. This was the only use of GENTFUNC2I/GENTPROT2I-related macros,
  so I removed them. Also, changed the body of the wrapper so that a
  gint_t is passed into abmaxv, which is THEN typecast to an f77_int
  before returning the value.
- Updated f2c code that accessed .r and .i fields of complex and
  doublecomplex types so that they use .real and .imag instead (now that
  we are using scomplex and dcomplex).
2013-07-08 15:20:34 -05:00
Field G. Van Zee
3725013985 Added experimental bli_gemm_ker_var5().
Details:
- Added support for an experimental gemm macro-kernel that incrementally
  packs one micro-panel of B at a time. This is useful for certain
  special cases of gemm where m is small.
- Minor changes to default values of clarksville configuration.
- Defined BLIS_PACKED_BLOCKS as part of pack_t type, even though we
  do not yet have any use (or implementation support) for block storage.
- Comment update to bli_packm_init.c.
2013-07-08 11:24:18 -05:00
Field G. Van Zee
9915d667a7 Defined "total" blocksize query functions.
Details:
- Defined bli_blksz_total_for_type() and bli_blksz_total_for_obj() to query
  the default blocksize plus blocksize extension (using the type or the type
  of an object).
- Comment update in bli_packm_cxk.c.
2013-07-07 13:28:39 -05:00
Field G. Van Zee
46d3d09d49 Consolidated lower/upper her[2]k blocked variants.
Details:
- Consolidated lower and upper blocked variants for herk and her2k, and
  renamed the resulting variants, according to the same changes recently
  made to trmm and trsm.
- Implemented support for four new subpartition types:
    BLIS_SUBPART1T
    BLIS_SUBPART1B
    BLIS_SUBPART1L
    BLIS_SUBPART1R
  which correspond to "merged" partitions that include the middle "1"
  partition as well as either the neighboring "0" or "2" partition. This is
  used to clean up code in herk/her2k var2 that attempts to partition away
  the strictly zero region above or below the diagonal of a matrix operand
  that is being marched through diagonally.
- Added safeguards to herk macro-kernels that skip any leading or trailing
  zero region in the panel of C that is passed in. This is now needed given
  that herk/her2k var1 no longer partitions off this zero region before
  calling the macro-kernel (via bli_her[2]k_int()).
- Updated comments and other whitespace changes to trmm/trsm macro-kernels.
2013-06-27 13:19:56 -05:00
Field G. Van Zee
02002ef6f3 Added row-storage optimizations for trmm, trsm.
Details:
- Implemented algorithmic optimizations for trmm and trsm whereby the right
  side case is now handled explicitly, rather than induced indirectly by
  transposing and swapping strides on operands. This allows us to walk through
  the output matrix with favorable access patterns no matter how it is stored,
  for all parameter combinations.
- Renamed trmm and trsm blocked variants so that there is no longer a
  lower/upper distinction. Instead, we simply label the variants by which
  dimension is partitioned and whether the variant marches forwards or
  backwards through the corresponding partitioned operands.
- Added support for row-stored packing of lower and upper triangular matrices
  (as provided by bli_packm_blk_var3.c).
- Fixed a performance bug in bli_determine_blocksize_b() whereby the cache
  blocksize extensions (if non-zero) were not being used to appropriately size
  the first iteration (ie: the bottom/right edge case).
- Updated comments in bli_kernel.h to indicate that both MC and NC must be
  whole multiples of MR AND NR. This is needed for the case of trsm_r where,
  in order to reuse existing left-side gemmtrsm fused micro-kernels, the
  packing of A (left-hand operand) and B (right-hand operand) is done with
  NR and MR, respectively (instead of MR and NR).
2013-06-24 17:08:14 -05:00
Field G. Van Zee
d1e81ddc84 Minor generalizing tweaks to trmm blk var1, var2. 2013-06-13 11:14:21 -05:00
Field G. Van Zee
0efb7974f1 CHANGELOG update. 2013-06-12 16:40:04 -05:00
Field G. Van Zee
5b641c3bab Use separate CFLAGS for "kernels" directories.
Details:
- Added a new "special" directory type: any source code within directories
  named "kernels" will be compiled with a separate CFLAGS_KERNELS set of
  compiler flags. This allows the developer to specify a separate set of
  flags (e.g. optimization flags) for compiling kernels while maintaining a
  standard set for regular framework code.
- Fixed a bug in the top-level Makefile that was causing "noopt" code
  to be compiled with the standard set of compilation flags.
- Updated make_defs.mk in reference, flame, and clarksville configurations
  according to above changes.
0.0.8
2013-06-12 16:02:12 -05:00
Field G. Van Zee
08475e7c76 Various level-3 optimizations for row storage.
Details:
- Implemented remaining two cases within bli_packm_blk_var2(), which allow
  packing from a lower or upper-stored symmetric/Hermitian matrix to column
  panels (which are row-stored). Previously one could only pack to row panels
  (which are column-stored).
- Implemented various optimizations in the level-3 front-ends that allow more
  favorable access through row-stored matrices for gemm, hemm, herk, her2k,
  symm, syrk, and syr2k.
- Cleaned up code in level-3 front-ends that has to do with setting target and
  execution datatypes.
2013-06-11 12:18:39 -05:00
Field G. Van Zee
05a657a6b9 Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
  from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
  schemes?" switch is disabled, which would result in redundant tests being
  executed for matrix-only (e.g. level-1m, level-3) operations if multiple
  vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
2013-06-07 11:04:10 -05:00
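The beta == 0 optimization in 05a657a6b9 can be illustrated in plain C, independent of the x86_64 micro-kernel itself. The function update_c_sketch and its flat-array interface are hypothetical; ab stands for the already-computed product of the A and B micro-panels.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical final update step of a gemm micro-kernel:
 *     C := beta * C + alpha * AB
 * When beta is zero, C is overwritten without being read, which saves
 * memory traffic and keeps NaN or Inf values already in C from
 * propagating into the result. */
static void update_c_sketch(size_t n, double alpha, const double *ab,
                            double beta, double *c)
{
    assert(ab != NULL && c != NULL);

    if (beta == 0.0) {
        for (size_t i = 0; i < n; ++i)
            c[i] = alpha * ab[i];
    } else {
        for (size_t i = 0; i < n; ++i)
            c[i] = beta * c[i] + alpha * ab[i];
    }
}
```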
Field G. Van Zee
f1aa6b81cc Whitespace changes to old test drivers.
Details:
- Replaced tabs with four spaces in places where indention was already
  in place.
2013-06-06 13:36:06 -05:00
Field G. Van Zee
9feb4c23d2 Fixed unaligned handling in axpyf, dotxaxpyf.
Details:
- Fixed over-cautious handling of unaligned operands in vector intrinsic
  implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
  intrinsic implementation of dotxaxpyf kernel.
2013-06-04 14:57:46 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
0288c827d3 Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
  recently).
2013-06-01 08:02:23 -05:00
Field G. Van Zee
85a6d1c9a5 Replaced axpys usage with subs in trsv.
Details:
- Replaced instances of axpys with alpha equal to -1 with subs.
- Use BLIS_MAX_TYPE_SIZE to define BLIS_CONSTANT_SLOT_SIZE instead of
  sizeof(dcomplex).
2013-05-29 10:58:24 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00
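The bli_obj_vector_inc() change in 2d9c667f3c (return 1 for a 1 x 1 object regardless of its strides) can be sketched with a simplified object type. The struct obj_sketch_t and the function vector_inc_sketch are hypothetical; the real obj_t carries much more state and is accessed through macros.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a BLIS object: dimensions and strides only. */
typedef struct {
    size_t m;   /* number of rows    */
    size_t n;   /* number of columns */
    size_t rs;  /* row stride        */
    size_t cs;  /* column stride     */
} obj_sketch_t;

/* Increment between consecutive elements of a vector object. A 1 x 1
 * object always reports 1, even if its column stride was inflated for
 * alignment, so callers expecting a contiguous vector are not misled. */
static size_t vector_inc_sketch(const obj_sketch_t *o)
{
    assert(o != NULL);

    if (o->m == 1 && o->n == 1)
        return 1;

    /* Row vector: step by column stride. Column vector: row stride. */
    return (o->m == 1) ? o->cs : o->rs;
}
```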