71 Commits

Author SHA1 Message Date
Field G. Van Zee
a6a156e9fe Added cgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based cgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.
2014-10-10 14:26:41 -05:00
Field G. Van Zee
6f8575ab25 Added zgemm ukernel for avx/sandybridge.
Details:
- Implemented AVX-based zgemm micro-kernel (via GNU extended inline
  assembly syntax).
- Updated sandybridge configuration accordingly.
2014-10-10 10:01:45 -05:00
Tyler Smith
7a8ad47fb2 Minor changes to knc configuration, including a preference for row-major storage
Also fixed a bug in the knc micro-kernel where it would fail if k == 0
2014-10-08 15:52:13 -05:00
Field G. Van Zee
e80a453784 Fixed bug introduced by bugfix in 25b258d.
Details:
- We actually need to check alignment of lda*sizeof(double) and NOT
  a+lda because in the latter case, alignment could cancel out and
  still allow the optimized code to run when it shouldn't. Thanks
  to Devin for pointing this out.
2014-09-18 10:24:20 -05:00
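Note on e80a453784: a minimal sketch of why testing the address of the second column (a+lda) is not enough. It assumes a 16-byte alignment requirement and assumes the base address is tested separately; the function names are hypothetical and are not taken from the BLIS sources. Addresses are modeled as integers so that the example needs no real buffers.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed alignment requirement (in bytes) for aligned vector loads. */
#define ALIGN_BYTES 16

/* Weaker test: looks only at the address of the second column.
 * A misaligned base and a misaligned byte stride can cancel out here. */
static bool second_column_aligned(uintptr_t a_addr, size_t lda)
{
    return (a_addr + lda * sizeof(double)) % ALIGN_BYTES == 0;
}

/* Stricter test: the base address and the column stride in bytes are
 * checked independently, so every column start is aligned. */
static bool all_columns_aligned(uintptr_t a_addr, size_t lda)
{
    return a_addr % ALIGN_BYTES == 0 &&
           (lda * sizeof(double)) % ALIGN_BYTES == 0;
}
```

With a base address of 0x1008 and lda = 3, the base is off by 8 bytes and the stride is 24 bytes, so the second column lands on 0x1020 and the weaker test passes even though the first column is unaligned.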
Field G. Van Zee
25b258d61f Fixed a non-fatal problem with bugfix in a68b316c.
Details:
- The bugfix in a68b316c was inadvertently checking alignment of the
  leading dimension itself, rather than the byte size of the leading
  dimension. Now, we simply check alignment of a+lda.
2014-09-18 10:10:49 -05:00
Field G. Van Zee
a68b316ca4 Fixed alignment bugs in level-1f kernels.
Details:
- Fixed bugs whereby the level-1f dotxf, axpyxf, and dotxaxpyf kernels
  were attempting to compute problems with unaligned leading dimensions
  with optimized code, rather than (correctly) using the reference
  implementations. Thanks to Devin Matthews for reporting this bug.
2014-09-17 11:10:07 -05:00
Tyler Smith
86fc7e4076 Added bulldozer configuration and updated piledriver micro-kernel 2014-09-15 10:35:46 -05:00
Field G. Van Zee
bc1d86b2d4 Sandy Bridge configuration, micro-kernel update.
Details:
- Minor updates to bli_config and bli_kernel.h for sandybridge
  configuration.
- Renamed existing AVX intrinsic-based micro-kernel file to
  bli_gemm_int_d8x4.c.
- Added new file, bli_gemm_asm_d8x4.c, which provides assembly-based
  gemm micro-kernels for single- and double-precision real.
2014-08-07 19:01:20 -05:00
Field G. Van Zee
45692e3ad4 Reverted some accidental changes.
Details:
- Reverted some changes that were unintentionally included in the
  previous commit (9526ce98). Thanks to Tony Kelman for pointing
  this out. (Note: a few select changes were not reverted.)
2014-08-07 13:21:15 -05:00
Field G. Van Zee
9526ce9881 Updated copyright headers of emscripten configuration files. 2014-08-06 14:15:34 -05:00
Field G. Van Zee
c73261f17e More minor cleanups post-copyright update. 2014-07-14 16:23:51 -05:00
Field G. Van Zee
2a09d24463 Reverted power7 symlinks destroyed by sed script.
Details:
- Reverted two symlinks, in kernels/power7/3/test, back to being symlinks
  after recursive-sed.sh mistakenly replaced them with copies of the
  actual files to which they referred. Meant to include this in previous
  commit.
2014-07-14 16:17:09 -05:00
Field G. Van Zee
7ed415824d Updated copyright headers (continued).
Details:
- Inserted "at Austin" into third clause of license declarations.
  Meant to include this change in previous commit.
2014-07-14 16:14:33 -05:00
Field G. Van Zee
5c2c6c8561 Updated copyright headers to contain "at Austin".
Details:
- Updated copyright headers to include "at Austin" in the name of the
  University of Texas.
- Updated the copyright years of a few headers to 2014 (from 2011 and
  2012).
2014-07-14 16:05:03 -05:00
Marat Dukhan
b693b0cddc [SC]AXPY kernels for PNaCl 2014-06-22 13:44:25 -07:00
Marat Dukhan
020a831bc5 Code clean-up in PNaCl port 2014-06-19 00:58:26 -07:00
Marat Dukhan
491be4f91e Optimized dot product kernels for PNaCl 2014-06-19 00:45:44 -07:00
Marat Dukhan
b2ffb4de8b Reformatted PNaCl GEMM kernels 2014-06-15 18:41:30 -04:00
Marat Dukhan
6de2d472d9 CGEMM and ZGEMM kernels for PNaCl 2014-06-15 08:44:31 -04:00
Marat Dukhan
f064711a5e SGEMM and DGEMM kernels for PNaCl 2014-06-15 06:27:37 -04:00
Tyler Smith
00f232f8ed Added single-precision micro-kernel for Knights Corner aka MIC aka Xeon Phi 2014-06-02 13:40:57 -05:00
Field G. Van Zee
3fc60e4914 Fixed ldim alignment bug in core2 gemm ukernel.
Details:
- Fixed a bug in the dunnington/core2 gemm micro-kernels that resulted in
  a segmentation fault if a column-stored matrix's starting address was
  aligned, but its leading dimension was such that its second column was
  unaligned. Basically, the micro-kernel was assuming that aligned load
  instructions were safe when they actually were not. An extra condition
  that checks the alignment of cs_c (i.e., the leading dimension in the
  column storage case) has now been added. Thanks to Michael Lehn for
  reporting this bug.
2014-05-21 11:34:42 -05:00
Tyler Michael Smith
20e24430a7 Some fixes for the bgq kernels 2014-04-08 17:50:44 +00:00
Tyler Smith
2b6848b239 Merge http://github.com/flame/blis
Conflicts:
	kernels/bgq/1/bli_axpyv_opt_var1.c
	kernels/bgq/1/bli_dotv_opt_var1.c
2014-04-04 09:54:54 -05:00
Tyler Michael Smith
4e3eb39aca Some fixes to the bgq config
MR and NR for double complex were wrong
Default fusing factor for double precision was wrong as well
2014-04-04 14:50:03 +00:00
Field G. Van Zee
21a0efb33d Fixed follow-up to issue #6. 2014-04-03 16:38:44 -05:00
Field G. Van Zee
c318157a9b Fixed issue #6 (incorrect 'restrict' usage).
Details:
- Fixed improper usage of restrict keyword in axpyv and dotv bgq kernels.
  (However, there may be other instances of similar misuse elsewhere in
  BLIS.) Thanks to Jeff Hammond for reporting this issue.
2014-04-03 16:24:34 -05:00
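Note on c318157a9b: the commit message does not say what form the misuse took, so the following is only a generic illustration of how the C99 restrict qualifier is placed and what it promises. The function name axpyv_sketch is hypothetical and this is not the bgq kernel.

```c
#include <stddef.h>

/* y := y + alpha * x.
 * The qualifier goes after the '*', so it applies to the pointer itself.
 * It is a promise by the caller that x and y do not overlap for the
 * duration of the call; the compiler may reorder or vectorize the loads
 * and stores on that basis. */
static void axpyv_sketch(size_t n, double alpha,
                         const double *restrict x,
                         double *restrict y)
{
    for (size_t i = 0; i < n; ++i)
        y[i] += alpha * x[i];
}
```

Calling such a function with overlapping x and y breaks the promise and is undefined behavior, which is why the qualifier should be used only where the caller can guarantee non-aliasing.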
Field G. Van Zee
b5150a1bf3 Added #include "arm_neon.h" to ARM gemm ukernel.
Details:
- Inserted #include "arm_neon.h" into gemm ukernel source file for
  arm/neon. Thanks to Jean-Michel Hautbois for suggesting this fix.
2014-04-03 12:25:45 -05:00
Field G. Van Zee
d27b4f690c Use generic paths for toolchain in POWER7.
Details:
- Fixed issue #4. Thanks to Jeff Hammond for contributing changes.
2014-04-01 12:57:24 -05:00
Tyler Michael Smith
73b3db5948 Some fixes for the bgq configuration 2014-03-26 15:39:05 +00:00
Field G. Van Zee
fde5f1fdec Added extensive support for configuration defaults.
Details:
- Standard names for reference kernels (levels-1v, -1f and 3) are now
  macro constants. Examples:
    BLIS_SAXPYV_KERNEL_REF
    BLIS_DDOTXF_KERNEL_REF
    BLIS_ZGEMM_UKERNEL_REF
- Developers no longer have to name all datatype instances of a kernel
  with a common base name; [sdcz] datatype flavors of each kernel or
  micro-kernel (level-1v, -1f, or 3) may now be named independently.
  This means you can now, if you wish, encode the datatype-specific
  register blocksizes in the name of the micro-kernel functions.
- Any datatype instances of any kernel (1v, 1f, or 3) that is left
  undefined in bli_kernel.h will default to the corresponding reference
  implementation. For example, if BLIS_DGEMM_UKERNEL is left undefined,
  it will be defined to be BLIS_DGEMM_UKERNEL_REF.
- Developers no longer need to name level-1v/-1f kernels with multiple
  datatype chars to match the number of types the kernel WOULD take in
  a mixed type environment, as in bli_dddaxpyv_opt(). Now, one char is
  sufficient, as in bli_daxpyv_opt().
- There is no longer a need to define an obj_t wrapper to go along with
  your level-1v/-1f kernels. The framework now provides a _kernel()
  function which serves as the obj_t wrapper for whatever kernels are
  specified (or defaulted to) via bli_kernel.h
- Developers no longer need to prototype their kernels, and thus no
  longer need to include any prototyping headers from within
  bli_kernel.h. The framework now generates kernel prototypes, with the
  proper type signature, based on the kernel names defined (or defaulted
  to) via bli_kernel.h.
- If the complex datatype x (of [cz]) implementation of the gemm micro-
  kernel is left undefined by bli_kernel.h, but its same-precision real
  domain equivalent IS defined, BLIS will use a 4m-based implementation
  for the datatype x implementations of all level-3 operations, using
  only the real gemm micro-kernel.
2014-02-25 13:34:56 -06:00
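Note on fde5f1fdec: the defaulting rule for undefined kernels can be sketched with the preprocessor as below. The macro names BLIS_DGEMM_UKERNEL and BLIS_DGEMM_UKERNEL_REF come from the commit message; the function names and the simplified micro-kernel signature (a 2x2 block updated by k rank-1 products) are assumptions made for illustration and do not reflect the real BLIS prototypes.

```c
/* Hypothetical reference micro-kernel: C := C + A * B for a 2x2 block,
 * with A and B packed as k consecutive 2-element slices and C stored
 * column-major. */
static void dgemm_ukernel_ref_sketch(int k, const double *a,
                                     const double *b, double *c)
{
    for (int p = 0; p < k; ++p)
        for (int j = 0; j < 2; ++j)
            for (int i = 0; i < 2; ++i)
                c[i + 2 * j] += a[i + 2 * p] * b[j + 2 * p];
}

/* Standard name for the reference kernel. */
#define BLIS_DGEMM_UKERNEL_REF dgemm_ukernel_ref_sketch

/* A configuration's bli_kernel.h would be included at this point and
 * could define BLIS_DGEMM_UKERNEL. This sketch leaves it undefined. */

/* Anything left undefined falls back to the reference implementation. */
#ifndef BLIS_DGEMM_UKERNEL
#define BLIS_DGEMM_UKERNEL BLIS_DGEMM_UKERNEL_REF
#endif

/* Framework-side call site: it refers only to the configurable name. */
static void dgemm_ukernel_dispatch(int k, const double *a,
                                   const double *b, double *c)
{
    BLIS_DGEMM_UKERNEL(k, a, b, c);
}
```

Because the call site names only BLIS_DGEMM_UKERNEL, a configuration can switch to an optimized kernel with a single #define and no change to framework code.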
Field G. Van Zee
6363a9f658 Added level-3 support for complex via 4m-/3m.
Details:
- Added the ability to induce complex domain level-3 operations via new
  virtual complex micro-kernels which are implemented via only real
  domain micro-kernels. Two new implementations are provided: 4m and 3m.
  4m implements complex matrix multiplication in terms of four real
  matrix multiplications, whereas 3m uses only three and thus is
  capable of even higher (than peak) performance. However, the 3m method
  has somewhat weaker numerical properties, making it less desirable
  in general.
- Further refined packing routines, which were recently revamped, and
  added packing functionality for 4m and 3m.
- Some modifications to trmm and trsm macro-kernels to facilitate indexing
  into micro-panels which were packed for 4m/3m virtual kernels.
- Added 4m and 3m interfaces for each level-3 operation.
- Various other minor changes to facilitate 4m/3m methods.
2014-02-19 17:00:52 -06:00
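Note on 6363a9f658: the arithmetic behind 4m and 3m can be sketched on small dense matrices as follows. This is not the BLIS implementation, which works on packed micro-panels through virtual micro-kernels; real_gemm, cgemm_4m and cgemm_3m are hypothetical names, the matrices are N x N row-major, and real and imaginary parts are held in separate arrays for clarity.

```c
#define N 2

/* Stand-in for a real-domain kernel: C := C + alpha * A * B. */
static void real_gemm(double alpha, const double *a, const double *b,
                      double *c)
{
    for (int i = 0; i < N; ++i)
        for (int j = 0; j < N; ++j)
            for (int p = 0; p < N; ++p)
                c[i * N + j] += alpha * a[i * N + p] * b[p * N + j];
}

/* 4m: four real products.
 *   Cr += Ar*Br - Ai*Bi
 *   Ci += Ar*Bi + Ai*Br */
static void cgemm_4m(const double *ar, const double *ai,
                     const double *br, const double *bi,
                     double *cr, double *ci)
{
    real_gemm( 1.0, ar, br, cr);
    real_gemm(-1.0, ai, bi, cr);
    real_gemm( 1.0, ar, bi, ci);
    real_gemm( 1.0, ai, br, ci);
}

/* 3m: three real products plus extra additions.
 *   P1 = Ar*Br, P2 = Ai*Bi, P3 = (Ar+Ai)*(Br+Bi)
 *   Cr += P1 - P2
 *   Ci += P3 - P1 - P2 */
static void cgemm_3m(const double *ar, const double *ai,
                     const double *br, const double *bi,
                     double *cr, double *ci)
{
    double sa[N * N], sb[N * N];
    double p1[N * N] = {0}, p2[N * N] = {0}, p3[N * N] = {0};

    for (int i = 0; i < N * N; ++i) {
        sa[i] = ar[i] + ai[i];
        sb[i] = br[i] + bi[i];
    }
    real_gemm(1.0, ar, br, p1);
    real_gemm(1.0, ai, bi, p2);
    real_gemm(1.0, sa, sb, p3);

    for (int i = 0; i < N * N; ++i) {
        cr[i] += p1[i] - p2[i];
        ci[i] += p3[i] - p1[i] - p2[i];
    }
}
```

The subtraction P3 - P1 - P2 in the imaginary part is where 3m can lose accuracy through cancellation, which is consistent with the weaker numerical properties mentioned in the commit message.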
Tyler Smith
ce06686368 Fixed more Xeon Phi bugs, especially with scattered update 2014-02-14 13:52:18 -06:00
Tyler Smith
31134b5c70 Some fixes, changes, and improvements to the microkernel for the Xeon Phi 2014-02-14 11:19:44 -06:00
Field G. Van Zee
e7f154fe2e Applied edge case fix to arm/neon microkernel.
Details:
- Applied an edge case bugfix, courtesy of Francisco Igual, to the current
  double precision real gemm microkernel in kernels/arm/neon/3.
2014-01-10 08:48:07 -06:00
Field G. Van Zee
2cb13600f9 Updated year in copyright headers to 2014. 2014-01-03 12:29:13 -06:00
Field G. Van Zee
a0331fb10a Introduced auxinfo_t argument to micro-kernels.
Details:
- Removed a_next and b_next arguments to micro-kernels and replaced them
  with a pointer to a new datatype, auxinfo_t, which is simply a struct
  that holds a_next and b_next. The struct may hold other auxiliary
  information that may be useful to a micro-kernel, such as micro-panel
  stride. Micro-kernels may access struct fields via accessor macros
  defined in bli_auxinfo_macro_defs.h.
- Updated all instances of micro-kernel definitions, micro-kernel calls,
  as well as macro-kernels (for declaring and initializing the structs)
  according to above change.
2013-12-19 14:50:11 -06:00
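Note on a0331fb10a: the change in calling convention can be sketched as below. The struct layout, the accessor macro names and the simplified micro-kernel signature are all hypothetical; the real definitions live in the BLIS headers (the commit names bli_auxinfo_macro_defs.h) and may differ.

```c
#include <stddef.h>

/* Hypothetical stand-in for auxinfo_t: bundles the addresses of the
 * micro-panels the next micro-kernel call will use, plus room for
 * other auxiliary data such as a micro-panel stride. */
typedef struct {
    const void *a_next;
    const void *b_next;
    size_t      panel_stride;
} auxinfo_sketch_t;

/* Hypothetical accessor macros: kernels read fields through these
 * rather than touching the struct members directly. */
#define auxinfo_sketch_next_a(aux) ((aux)->a_next)
#define auxinfo_sketch_next_b(aux) ((aux)->b_next)

/* Before: ukernel(k, a, b, c, a_next, b_next)
 * After:  ukernel(k, a, b, c, aux)
 * The arithmetic here is a plain dot product to keep the sketch short. */
static void ukernel_sketch(int k, const double *a, const double *b,
                           double *c, const auxinfo_sketch_t *aux)
{
    const double *a_next = auxinfo_sketch_next_a(aux);
    const double *b_next = auxinfo_sketch_next_b(aux);

    /* A real kernel might prefetch from a_next and b_next here;
     * this sketch only reads the addresses. */
    (void)a_next;
    (void)b_next;

    for (int p = 0; p < k; ++p)
        *c += a[p] * b[p];
}
```

Passing one struct pointer means that new auxiliary fields can be added later without changing every micro-kernel signature again.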
Field G. Van Zee
5ad2ce7bf5 Minor x86_64 (core2) kernel fixes.
Details:
- Fixed copy-and-paste bug whereby [scz]gemmtrsm_u_opt_d4x4 kernels
  for x86_64/core2 were calling the wrong reference code (l instead
  of u).
- Fixed some unused variables in x86_64/core2 dotaxpyv and dotxaxpyf
  kernels.
- Minor typecasting fix in testsuite/src/test_libblis.c.
- Makefile updates.
2013-12-09 18:30:49 -06:00
Field G. Van Zee
b444489f10 Added new "attached" scalar representation.
Details:
- Added infrastructure to support a new scalar representation, whereby
  every object contains an internal scalar that defaults to 1.0. This
  facilitates passing scalars around without having to house them in
  separate objects. These "attached" scalars are stored in the internal
  atom_t field of the obj_t struct, and are always stored to be the same
  datatype as the object to which they are attached. Level-3 variants no
  longer take scalar arguments; however, level-3 internal back-ends still
  do; this is so that the calling function can perform subproblems such
  as C := C - alpha * A * B on-the-fly without needing to change either
  of the scalars attached to A or B.
- Removed scalar argument from packm_int().
- Observe and apply attached scalars in scalm_int(), and removed scalar
  from interface of scalm_unb_var1().
- Renamed the following functions (and corresponding invocations):

   bli_obj_init_scalar_copy_of()
                           -> bli_obj_scalar_init_detached_copy_of()
   bli_obj_init_scalar()   -> bli_obj_scalar_init_detached()
   bli_obj_create_scalar_with_attached_buffer()
                           -> bli_obj_create_1x1_with_attached_buffer()
   bli_obj_scalar_equals() -> bli_obj_equals()

- Defined new functions:

   bli_obj_scalar_detach()
   bli_obj_scalar_attach()
   bli_obj_scalar_apply_scalar()
   bli_obj_scalar_reset()
   bli_obj_scalar_has_nonzero_imag()
   bli_obj_scalar_equals()

- Placed all bli_obj_scalar_* functions in a new file, bli_obj_scalar.c.
- Renamed the following macros:

   bli_obj_scalar_buffer() -> bli_obj_buffer_for_1x1()
   bli_obj_is_scalar()     -> bli_obj_is_1x1()

- Defined new macros to set and copy internal scalars between objects:

   bli_obj_set_internal_scalar()
   bli_obj_copy_internal_scalar()

- In level-3 internal back-ends, added conditional blocks where alpha and
  beta are checked for non-unit-ness. Those values for alpha and beta are
  applied to the scalars attached to aliases of A/B/C, as appropriate,
  before being passed into the variant specified by the control tree.
- In level-3 blocked variants, pass BLIS_ONE into subproblems instead of
  alpha and/or beta.
- In level-3 macro-kernels, changed how scalars are obtained. Now, scalars
  attached to A and B are multiplied together to obtain alpha, while beta
  is obtained directly from C.
- In level-3 front-ends, removed old function calls meant to provide
  future support for mixed domain/precision. These can be added back later
  once that functionality is given proper treatment. Also, removed the
  creating of copy-casts of alpha and beta since typecasting of scalars
  is now implicitly handled in the internal back-ends when alpha and
  beta are applied to the attached scalars.
2013-12-03 16:08:30 -06:00
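Note on b444489f10: the flow of attached scalars through a back-end and a macro-kernel can be sketched as follows. The obj_sketch_t type and all function names are hypothetical, operands are reduced to 1x1 so that the matrix product is a single multiplication, and the real obj_t and its atom_t field are not modeled.

```c
/* Hypothetical object: a data buffer plus an attached scalar. */
typedef struct {
    double *buffer;
    double  scalar;
} obj_sketch_t;

/* New objects carry an attached scalar of 1.0 by default. */
static obj_sketch_t obj_sketch_create(double *buffer)
{
    obj_sketch_t o = { buffer, 1.0 };
    return o;
}

/* Macro-kernel: alpha is the product of the scalars attached to A and
 * B, and beta is the scalar attached to C. */
static void gemm_macro_kernel_sketch(const obj_sketch_t *a,
                                     const obj_sketch_t *b,
                                     const obj_sketch_t *c)
{
    double alpha = a->scalar * b->scalar;
    double beta  = c->scalar;

    c->buffer[0] = beta * c->buffer[0]
                 + alpha * a->buffer[0] * b->buffer[0];
}

/* Internal back-end: still takes alpha and beta, but applies them to
 * the scalars of local aliases, leaving the caller's objects intact. */
static void gemm_backend_sketch(double alpha, const obj_sketch_t *a,
                                const obj_sketch_t *b,
                                double beta, const obj_sketch_t *c)
{
    obj_sketch_t a_alias = *a;
    obj_sketch_t c_alias = *c;

    if (alpha != 1.0) a_alias.scalar *= alpha;
    if (beta  != 1.0) c_alias.scalar *= beta;

    gemm_macro_kernel_sketch(&a_alias, b, &c_alias);
}
```

A caller can therefore request C := C - A * B by passing alpha = -1.0 to the back-end, and the scalars attached to A and B are unchanged afterwards.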
Field G. Van Zee
fd4ac636d9 Unimplemented kernels now call reference.
Details:
- Updated micro-kernels for arm, bgq, loongson3a, and x86_64 so that
  unimplemented kernel functions simply call the corresponding reference
  implementation. (Previously, these unimplemented functions would
  abort() with a "not yet implemented" message.)
2013-12-02 13:50:36 -06:00
Field G. Van Zee
9e1d0d4bca Added trsm_l, trsm_u ukernels for x86_64/core2.
Details:
- Added standalone trsm_l/trsm_u micro-kernels for x86_64 (core2).
  These kernels are based on the gemmtrsm_l/gemmtrsm_u micro-kernels
  that already existed in kernels/x86_64/core2-sse3/3.
2013-11-18 18:11:07 -06:00
Field G. Van Zee
85e7e02ea3 Merge branch 'master'. Forgot to git-pull. 2013-11-18 12:02:00 -06:00
Field G. Van Zee
67761e224c Attempting to fix errors in bgq build.
Details:
- Removed restrict declaration from b_cast and c_cast from
  bli_trsm_lu_ker_var2.c and bli_trsm_rl_ker_var2.c. Curiously, they
  are causing problems for xlc only in those two files and no other
  macro-kernels.
- Fixed (hopefully) kernel function parameter type declarations in
  kernels/bgq/1f/bli_axpyf_opt_var1.c and kernels/bgq/3/bli_gemm_8x8.c.
2013-11-18 11:57:40 -06:00
Field G. Van Zee
707200541d Syntax error fix in x86_64/core2 gemmtrsm_u ukr. 2013-11-18 11:17:31 -06:00
Field G. Van Zee
d70733abdd Added ARM kernels, configurations.
Details:
- Added kernels for ARM, and configurations for Cortex-A9 and Cortex-A15.
  Thanks to Francisco Igual for contributing these kernels and
  configurations.
2013-11-16 17:34:25 -06:00
Field G. Van Zee
19885f893a Updated some kernel comment headers.
Details:
- Updated bgq and piledriver comment headers to use BLIS copyright header
  instead of libflame.
2013-11-11 12:09:21 -06:00
Field G. Van Zee
376bbb59c8 Removed support for duplication.
Details:
- Removed support for duplication from the gemmtrsm/trsm micro-kernels
  and all framework code.
- Updated test suite modules according to above changes.
2013-11-08 11:17:34 -06:00
Field G. Van Zee
a091a219bd Minor fixes to piledriver configuration, ukernel.
Details:
- Applied a patch from Tyler that fixes minor staleness in the piledriver
  configuration and gemm micro-kernel.
- Very minor changes to test suite input files.
2013-10-14 10:11:29 -05:00
Field G. Van Zee
dacdde27ae Added Fran's Sandy Bridge kernels/configuration.
Details:
- Added a kernel directory for kernels developed by Francisco Igual for
  the Sandy Bridge architecture, including a dgemm ukernel coded with
  AVX intrinsics.
- Added a configuration for Sandy Bridge using values supplied by Fran.
2013-10-11 11:37:19 -05:00
Field G. Van Zee
3690bdd4f9 More updates to level-1f kernels for core2-sse3.
Details:
- Changed types in function signatures to match new prototypes. Meant to
  include this in previous commit.
2013-10-10 11:45:33 -05:00