Commit Graph

43 Commits

Author SHA1 Message Date
Tyler Michael Smith
4e3eb39aca Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
2014-04-04 14:50:03 +00:00
Tyler Michael Smith
73b3db5948 Some fixes for the bgq configuration 2014-03-26 15:39:05 +00:00
Field G. Van Zee
fde5f1fdec Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
  macro constants. Examples:
    BLIS_SAXPYV_KERNEL_REF
    BLIS_DDOTXF_KERNEL_REF
    BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
  with a common base name; [sdcz] datatype flavors of each kernel or
  micro-kernel (level-1v, -1f, or 3) may now be named independently.
  This means you can now, if you wish, encode the datatype-specific
  register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
  undefined in bli_kernel.h will default to the corresponding reference
  implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
  it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
  datatype chars to match the number of types the kernel WOULD take in
  a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
  sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
  your level-1v/-1f kernels. The framework now prvides a _kernel()
  function which serves as the obj_t wrapper for whatever kernels are
  specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
  longer need to include any prototyping headers from within
  bli_kernel.h. The framework now generates kernel prototypes, with the
  proper type signature, based on the kernel names defined (or defaulted
  to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
  kernel is left undefined by bli_kernel.h, but its same-precision real
  domain equivalent IS defined, BLIS will use a 4m-based implementation
  for the datatype x implementations of all level-3 operations, using
  only the real gemm micro-kernel.
2014-02-25 13:34:56 -06:00
Field G. Van Zee
6363a9f658 Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
  virtual complex micro-kernels which are implemented via only real
  domain micro-kernels. Two new implementations are provided: 4m and 3m.
  4m implements complex matrix multiplication in terms of four real
  matrix multiplications, where as 3m uses only three and thus is
  capable of even higher (than peak) performance. However, the 3m method
  has somewhat weaker numerical properties, making it less desirable
  in general.
- Further refined packing routines, which were recently revamped, and
  added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
  into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
2014-02-19 17:00:52 -06:00
Tyler Smith
ce06686368 Fixed more Xeon Phi bugs, especially with scattered update 2014-02-14 13:52:18 -06:00
Tyler Smith
31134b5c70 Some fixes, changes, and improvements to the microkernel to the Xeon Phi 2014-02-14 11:19:44 -06:00
Field G. Van Zee
e7f154fe2e Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
  double precision real gemm microkernel in kernels/arm/neon/3.
2014-01-10 08:48:07 -06:00
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
Field G. Van Zee
5ad2ce7bf5 Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
  for x86_64/core2 were calling the wrong reference code (l instead
  of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
  kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
2013-12-09 18:30:49 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments, however, level-3 internal back-ends stll
  do; this is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
  from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creating of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
Field G. Van Zee
fd4ac636d9 Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
  unimplemented kernel functions simply call the corresponding reference
  implementation. (Previously, these unimplemented functions would
  abort() with a "not yet implemented" message.)
2013-12-02 13:50:36 -06:00
Field G. Van Zee
9e1d0d4bca Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
  These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
  that already existed in kernels/x86_64/core2-sse3/3.
2013-11-18 18:11:07 -06:00
Field G. Van Zee
85e7e02ea3 Merge branch 'master'. Forgot to git-pull. 2013-11-18 12:02:00 -06:00
Field G. Van Zee
67761e224c Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
  bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
  are causing problems for xlc only in those two files and no other
  macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
  kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
2013-11-18 11:57:40 -06:00
Field G. Van Zee
707200541d Syntax error fix in x86_64/core2 gemmtrsm_u ukr. 2013-11-18 11:17:31 -06:00
Field G. Van Zee
d70733abdd Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
  Thanks to Francisco Igual for contributing these kernels and
  configurations.
2013-11-16 17:34:25 -06:00
Field G. Van Zee
19885f893a Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
  instead of libflame.
2013-11-11 12:09:21 -06:00
Field G. Van Zee
376bbb59c8 Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
  and all framework code.
- Updated test suite modules according to above changes.
2013-11-08 11:17:34 -06:00
Field G. Van Zee
a091a219bd Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
  configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
2013-10-14 10:11:29 -05:00
Field G. Van Zee
dacdde27ae Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
  the Sandy Bridge architecture, including a dgemm ukernel coded with
  AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
2013-10-11 11:37:19 -05:00
Field G. Van Zee
3690bdd4f9 More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
  include this in previous commit.
2013-10-10 11:45:33 -05:00
Field G. Van Zee
661d5120cd Fixed outdated fusing factor macros in 1f kernels.
Details:
- Updated level-1f kernels for x86_64 and bgq to use renamed fusing factor
  macros. Meant to include this in 5e54f46c. Thanks to Fran for pointing
  this out.
2013-10-10 11:27:27 -05:00
Field G. Van Zee
5e54f46ccb Added template implementations and other tweaks.
Details:
- Added a 'template' configuration, which contains stub implementations of the
  level 1, 1f, and 3 kernels with one datatype implemented in C for each, with
  lots of in-file comments and documentation.
- Modified some variable/parameter names for some 1/1f operations. (e.g.
  renaming vector length parameter from m to n.)
- Moved level-1f fusing factors from axpyf, dotxf, and dotxaxpyf header files
  to bli_kernel.h.
- Modifed test suite to print out fusing factors for axpyf, dotxf, and
  dotxaxpyf, as well as the default fusing factor (which are all equal
  in the reference and template implementations).
- Cleaned up some sloppiness in the level-1f unb_var1.c files whereby these
  reference variants were implemented in terms of front-end routines rather
  that directly in terms of the kernels. (For example, axpy2v was implemented
  as two calls to axpyv rather than two calls to AXPYV_KERNEL.)
- Changed the interface to dotxf so that it matches that of axpyf, in that
  A is assumed to be m x b_n in both cases, and for dotxf A is actually used
  as A^T.
- Minor variable naming and comment changes to reference micro-kernels in
  frame/3/gemm/ukernels and frame/3/trsm/ukernels.
2013-09-30 12:58:18 -05:00
Field G. Van Zee
97aaf220a8 Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
    Intel MIC
    IBM BG/Q
    IBM Power7
    AMD Piledriver
    Loogson 3A
  and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
  and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to above architectures, and renamed
  "clarksville" configuration to "dunnington".
2013-09-17 10:51:36 -05:00
Field G. Van Zee
d352c746e5 Added single/real gemm micro-kernel for x86_64.
Details:
- Added a single-precision real gemm micro-kernel in
  kernels/x86_64/3/bli_gemm_opt_d4x4.c.
- Adjusted the single-precision real register blocksizes in
  config/clarksville/bli_kernel.h to be 8x4.
- Added a missing comment to bli_packm_blk_var2.c that was present in
  bli_packm_blk_var3.c
2013-08-27 13:41:46 -05:00
Field G. Van Zee
dedda523dc Fixed bug in bli_acquire_mpart_t2b(), _l2r().
Details:
- Fixed a bug in bli_acquire_mpart_t2b() and bli_acquire_mpart_l2r()
  that cause incorrect partitioning when SUBPART0 was requested. This
  bug was introduced in 46d3d09d49. Thanks to Bryan for isolating
  this bug.
- Removed dupl kernels from kernels/x86_64/3 directory.
- Uncommented beta == 0 optimizaition code in
  kernels/x86_64/3/bli_gemm_opt_d4x4.c.
2013-08-19 12:07:41 -05:00
Field G. Van Zee
05a657a6b9 Added beta == 0 optimization to x86_64 ukernel.
Details:
- Modified x86_64 gemm microkernel so that when beta is zero, C is not read
  from memory (nor scaled by beta).
- Fixed minor bug in test suite driver when "Test all combinations of storage
  schemes?" switch is disabled, which would result in redundant tests being
  executed for matrix-only (e.g. level-1m, level-3) operations if multiple
  vector storage schemes were specified.
- Restored debug flags as default in clarksville configuration.
2013-06-07 11:04:10 -05:00
Field G. Van Zee
9feb4c23d2 Fixed unaligned handling in axpyf, dotxaxpyf.
Details:
- Fixed over-cautious handling of unaligned operands in vector instrinsic
  implementation of axpyf kernel.
- Fixed over- and under-cautious handling of unaligned operands in vector
  intrinsic implementation of dotxaxpyf kernel.
2013-06-04 14:57:46 -05:00
Field G. Van Zee
22b06cfcd2 Updated level-1/-1f [vector intrinsic] kernels.
Details:
- Updated level-1/-1f kernels so that non-unit and un-aligned cases are
  handled by reference implementation (rather than aborted).
- Added -fomit-frame-pointer to default make_defs.mk for clarksville
  configuration.
- Defined bli_offset_from_alignment() macro.
- Minor edits to old test drivers.
2013-06-03 16:54:52 -05:00
Field G. Van Zee
0288c827d3 Updated ukernels for x86_64.
Details:
- Tweaked micro-kernels and configuration for clarksville.
- Updated/cleaned up old test drivers in test directory.
- Fixed syntax bug in trsv_unb_var1 and trsv_unf_var1 (introduced
  recently).
2013-06-01 08:02:23 -05:00
Field G. Van Zee
2d9c667f3c Fixed x86_64 kernel bugs and other minor issues.
Details:
- Fixed bugs in trmv_l and trsv_u due to backwards iteration resulting in
  unaligned subpartitions. We were already going out of our way a bit to
  handle edge cases in the first iteration for blocked variants, and this
  was simply the unblocked-fused extension of that idea.
- Fixed control tree handling in her/her2/syr/syr2 that was not taking
  into account how the choice of variant needed to be altered for
  upper-stored matrices (given that only lower-stored algorithms are
  explicitly implemented).
- Added bli_determine_blocksize_dim_f(), bli_determine_blocksize_dim_b()
  macros to provide inlined versions of bli_determine_blocksize_[fb]() for
  use by unblocked-fused variants.
- Integrated new blocksize_dim macros into gemv/hemv unf variants for
  consistency with that of the bugfix for trmv/trsv (both of which now
  use the same macros).
- Modified bli_obj_vector_inc() so that 1 is returned if the object is a
  vector of length 1 (ie: 1 x 1). This fixes a bug whereby under certain
  conditions (e.g. dotv_opt_var1), an invalid increment was returned, which
  was invalid only because the code was expecting 1 (for purposes of
  performing contiguous vector loads) but got a value greater than 1 because
  the column stride of the object (e.g. rho) was inflated for alignment
  purposes (albeit unnecessarily since there is only one element in the
  object).
- Replaced some old invocations of set0 with set0s.
- Added alpha parameter to gemmtrsm ukernels for x86_64 and use accordingly.
- Fixed increment bug in cleanup loop of gemm ukernel for x86_64.
- Added safeguard to test modules so that testing a problem with a zero
  dimension does not result in a failure.
- Tweaked handling of zero dimensions in level-2 and level-3 operations'
  internal back-ends to correctly handle cases where output operand still
  needs to be scaled (e.g. by beta, in the case of gemm with k = 0).
2013-05-24 16:28:10 -05:00
Field G. Van Zee
1dab11e37d Updated x86 gemmtrsm ukernels to use alpha. 2013-04-23 17:17:11 -05:00
Field G. Van Zee
9d10d7dd9b Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm microkernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
2013-04-23 16:00:18 -05:00
Field G. Van Zee
4fe1435f20 Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to utilize PACKNR and NR
  explicitly so navigate b1 so that situations where PACKNR > NR are
  supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
  frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
2013-04-22 19:00:43 -05:00
Field G. Van Zee
ca9e435c57 Fixed a bug in reference implementation of dupl.
Details:
- Fixed a bug in reference implementation of dupl (bli_dupl_unb_var1.c),
  which resulted in incorrect duplication.
- Updated old test drivers according to recently updated packm control tree
  creation interface.
- Added 'restrict' to x86 gemm microkernel interface.
2013-04-15 09:59:46 -05:00
Field G. Van Zee
b65cdc57d9 Migrated 'bl2' prefix to 'bli'.
Details:
- Changed all filename and function prefixes from 'bl2' to 'bli'.
- Changed the "blis2.h" header filename to "blis.h" and changed all
  corresponding #include statements accordingly.
- Fixed incorrect association for Fran in CREDITS file.
2013-03-24 20:01:49 -05:00
Field G. Van Zee
551ea4767a Removed #include "blis2.h" from low-level headers.
Details:
- Removed #include of "blis2.h" from various lower-level, operation-specific
  header files throughout the framework. Given that these low-level headers
  are included within #blis2.h in a very specific order, #include'ing blis2.h
  within them directly is unnecessary.
2013-03-24 18:00:10 -05:00
Field G. Van Zee
1274e12437 Updated copyright headers from 2012 to 2013. 2013-02-11 14:37:47 -06:00
Field G. Van Zee
be94fb84c0 Added missing 'd' to fused gemmtrsm function name. 2013-01-04 10:55:21 -06:00
Field G. Van Zee
8945db6ec9 Renamed x86,x86_64 kernels to indicate 'd' fusing.
Details:
- Renamed x86 and x86_64 kernels to contain a 'd' before the fusing shape
  to emphasize that the fusing shape is not for all datatype instances, but
  rather just for one (that of double-precision real). Other fusing shapes
  would be proportional to their precision and domain "byte footprints".
- Corresponding changes to config/clarksville/bl2_kernel.h.
2012-12-18 15:07:36 -06:00
Field G. Van Zee
2f272b40f4 Added build system and continued reorganization.
Details:
- Added/renamed packm, unpackm kernels.
- Added machine value routines.
- Added param_map facility.
- Renamed AUTHORS to CREDITS.
- Added Makefile; continued to expand upon existing configure script.
- #define fuse_fac macros in operation headers if not defined already
  (by the user in bl2_kernels.h).
2012-12-04 19:22:14 -06:00
Field G. Van Zee
00f3498a89 Initial commit. 2012-12-03 12:36:11 -06:00