blis/frame/ind
Field G. Van Zee 37e55ca39b Fixed obscure 3m1/4m1a bugs in trmm[3] and trsm.
Details:
- Fixed a family of bugs in the triangular level-3 operations for
  certain complex implementations (3m1 and 4m1a) that only manifest if
  one of the register blocksizes (PACKMR/PACKNR, actually) is odd:
  - Fixed incorrect imaginary stride computation in bli_packm_blk_var2()
    for the triangular case.
  - Fixed the incorrect computation of imaginary stride, as stored in
    the auxinfo_t struct in trmm and trsm macro-kernels.
  - Fixed incorrect pointer arithmetic in the trsm macro-kernels in the
    cases where the register blocksize for the triangular matrix is
    odd. Introduced a new byte-granular pointer arithmetic macro,
    bli_ptr_add(), that computes the correct value.
- Added cpp macro to bli_macro_defs.h for typeof() operator, defined in
  terms of __typeof__, which is used by bli_ptr_add() macro.
- Disabled the row- vs. column-storage optimization in bli_trmm_front()
  for singleton problems because the inherent ambiguity of whether a
  scalar is row-stored or column-stored causes the wrong parameter
  combination code to be executed (by dumb luck of our checking for
  row storage first).
- Added commented-out debugging lines to 3m1/4m1a and reference
  micro-kernels, and trsm_ll macro-kernel.
2015-10-30 18:25:04 -05:00