Commit Graph

3 Commits

Author SHA1 Message Date
Field G. Van Zee
97aaf220a8 Added new kernels, configurations.
Details:
- Added various micro-kernels for the following architectures:
    Intel MIC
    IBM BG/Q
    IBM Power7
    AMD Piledriver
    Loongson 3A
  and reorganized kernels directory. Thanks to Tyler Smith, Mike Kistler,
  and Xianyi Zhang for contributing these kernels.
- Added configurations corresponding to the above architectures, and
  renamed the "clarksville" configuration to "dunnington".
2013-09-17 10:51:36 -05:00
Field G. Van Zee
9d10d7dd9b Added a_next, b_next arguments to micro-kernels.
Details:
- Added two more arguments to the gemm and gemmtrsm micro-kernels: the
  addresses of the next micro-panels of A and B. By passing these
  pointers into the micro-kernel, we allow the micro-kernel author to
  prefetch micro-panels of A and B as necessary (though this is
  completely optional; these addresses may also be safely ignored).
- Updated all seven macro-kernels so that they compute and pass in
  a_next and b_next. Note that ONLY the gemm macro-kernel computes
  a_next and b_next with the precise semantics we want. I will go back
  and fix the other macro-kernels in the near future.
- Added 'restrict' to various micro-kernels from which it was missing.
2013-04-23 16:00:18 -05:00
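
Illustration of the a_next / b_next idea from the commit above. This is a
minimal sketch, not the actual BLIS micro-kernel interface: the function name
toy_dgemm_ukr, the 4x4 blocking, the packed layouts, and the argument order
are all assumptions made for the example. It shows a micro-kernel that may
issue prefetches for the next micro-panels of A and B, or ignore the two
pointers entirely.

```c
#include <stddef.h>

/* Hypothetical register blocking; chosen only for this example. */
enum { MR = 4, NR = 4 };

/* Toy micro-kernel: C += A * B for one MR x NR block.
   a:      packed MR x k micro-panel, stored column by column (assumed layout)
   b:      packed k x NR micro-panel, stored row by row (assumed layout)
   a_next: address of the next micro-panel of A (hint only)
   b_next: address of the next micro-panel of B (hint only) */
void toy_dgemm_ukr(size_t k,
                   const double *restrict a,
                   const double *restrict b,
                   double *restrict c, size_t rs_c, size_t cs_c,
                   const double *a_next,
                   const double *b_next)
{
    double ab[MR * NR] = { 0.0 };

#if defined(__GNUC__)
    /* Optional: request the next micro-panels early (GCC/Clang builtin). */
    __builtin_prefetch(a_next, 0, 3);
    __builtin_prefetch(b_next, 0, 3);
#else
    /* The hints may also be safely ignored. */
    (void)a_next;
    (void)b_next;
#endif

    /* Accumulate k rank-1 updates into the local MR x NR block. */
    for (size_t p = 0; p < k; ++p)
        for (size_t j = 0; j < NR; ++j)
            for (size_t i = 0; i < MR; ++i)
                ab[i + j * MR] += a[i + p * MR] * b[j + p * NR];

    /* Add the local block into C using its row and column strides. */
    for (size_t j = 0; j < NR; ++j)
        for (size_t i = 0; i < MR; ++i)
            c[i * rs_c + j * cs_c] += ab[i + j * MR];
}
```

In this sketch the caller (the macro-kernel) is responsible for computing
a_next and b_next; the micro-kernel never dereferences them, so its result is
the same whichever addresses are passed.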
Field G. Van Zee
4fe1435f20 Updated dupl implementation to use PACKNR and NR.
Details:
- Updated frame/util/dupl/bli_dupl_unb_var1.c to use PACKNR and NR
  explicitly when navigating b1, so that situations where PACKNR > NR
  are supported.
- Moved the 4x2 and 4x4 reference micro-kernels in frame/3/gemm/ukernels and
  frame/3/trsm/ukernels to kernels/c99/.
- Updated clarksville and flame configurations.
2013-04-22 19:00:43 -05:00
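
Illustration of the PACKNR vs. NR distinction from the commit above. This is
a sketch of the indexing idea only, not the code in bli_dupl_unb_var1.c: the
function name toy_dupl, the duplication factor ndup, and the row-by-row
layout of b1 are assumptions. The point it shows is that consecutive rows of
the packed panel are PACKNR elements apart, while only the first NR elements
of each row carry data.

```c
#include <stddef.h>

/* Duplicate each useful element of a packed k x nr panel ndup times.
   b1:     packed source panel; rows are packnr elements apart (packnr >= nr)
   bd:     destination buffer with room for k * nr * ndup elements */
void toy_dupl(size_t k, size_t nr, size_t packnr, size_t ndup,
              const double *b1, double *bd)
{
    for (size_t p = 0; p < k; ++p) {
        /* Step between rows by PACKNR, not NR, to skip any padding. */
        const double *row = b1 + p * packnr;

        /* Read only the NR meaningful elements of this row. */
        for (size_t j = 0; j < nr; ++j)
            for (size_t d = 0; d < ndup; ++d)
                *bd++ = row[j];
    }
}
```

Stepping by nr instead of packnr would read padding as data whenever
PACKNR > NR, which is the situation the explicit use of both values avoids.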