Commit Graph

1905 Commits

Author SHA1 Message Date
Field G. Van Zee
c77ddc4181 Added optional numactl usage to test/3/runme.sh. 2020-09-30 20:15:43 +00:00
Field G. Van Zee
4fd8d9fec2 Tweaked zen2 subconfig's MC cache blocksizes.
Details:
- Updated the MC cache blocksizes registered by the 'zen2' subconfig.
- Minor updates to test/3/Makefile and test/3/runme.sh.
2020-09-28 23:39:05 +00:00
Field G. Van Zee
5efcdeffd5 More minor README.md updates. 2020-09-25 14:25:24 -05:00
Field G. Van Zee
9e940f8aad Added 1m SISC bibtex to README.md.
Details:
- Added final citation info to 1m bibtex in README.md file.
- Updated draft 1m paper link.
- Changed some http to https.
2020-09-25 13:53:35 -05:00
Field G. Van Zee
e293cae2d1 Implemented sgemmsup assembly kernels.
Details:
- Created a set of single-precision real millikernels and microkernels
  comparable to the dgemmsup kernels that already exist within BLIS.
- Added prototypes for all kernels within bli_kernels_haswell.h.
- Registered entry-point millikernels in bli_cntx_init_haswell.c and
  bli_cntx_init_zen.c.
- Added sgemmsup support to the Makefile, runme.sh script, and source
  file in test/sup. This included edits that allow for separate "small"
  dimensions for single- and double-precision as well as for single-
  vs. multithreaded execution.
2020-09-15 16:09:11 -05:00
Field G. Van Zee
2765c6f37c Type saga continues; fixed sgemm ukernel signature.
Details:
- Changed double* pointers in sgemm function signature to float*. At
  this point I've lost track of whether this was my fault or another
  dormant bug like the one described in ece9f6a, but at this point I
  no longer care. It's one of those days (aka I didn't ask for this).
2020-09-12 17:48:15 -05:00
Field G. Van Zee
0779559509 Fixed missing restrict in knl sgemm prototype.
Details:
- Added a missing 'restrict' qualifier in the sgemm ukernel prototype
  for knl. (Not sure how that code was ever compiling before now.)
2020-09-12 17:37:21 -05:00
Field G. Van Zee
ece9f6a3ef Fixed dormant type bugs in bli_kernels_knl.h.
Details:
- Fixed dormant type mismatches in the use of the prototype-generating
  macros in bli_kernels_knl.h. Specifically, some float prototypes
  were incorrectly using double as their ctype. This didn't actually
  matter until the type changes in 645d771, as previously those types
  were not used since packm was prototyped with void* pointers.
2020-09-12 17:22:42 -05:00
Field G. Van Zee
8ebb3b60e1 Fixed accidental breakage in 645d771.
Details:
- In trying to clean up kappa_cast variables in the reference packm
  kernels, which I initially believed to be redundant given the other
  void* -> ctype* changes in 645d771, I accidentally ended up violating
  restrict semantics for 1e/1r packing and possibly other packm kernels.
  (Normally, my pre-commit testsuite run would have caught this, but I
  was unknowingly using an edited input.operations file in which I'd
  disabled most tests as part of unrelated work.) This commit reverts
  the kappa_cast changes in 645d771.
2020-09-12 17:00:47 -05:00
Field G. Van Zee
645d771a14 Minor packm kernel type cleanup (void* -> ctype*).
Details:
- Changed all void* function arguments in reference packm kernels to
  those of the native type (ctype*). These pointers no longer need to
  be void* and are better represented by their native types anyway.
  (See below for details.) Updated knl packm kernels accordingly.
- In the definition of the PACKM_KER_PROT prototype macro template in
  frame/1m/bli_l1m_ker_prot.h, changed the pointer types for kappa, a,
  and p from void* to ctype*. They were originally void* because these
  function signatures had to share the same type so they could all be
  stored in a single array of that shared type, from which they were
  queried and called by packm_cxk(). This is no longer how the function
  pointers are stored, and so it no longer makes sense to force the
  caller of packm kernels to use void*, only so that the implementor
  of the packm kernels can typecast back to the native datatype within
  the kernel definition. This change has no effect internally within
  BLIS because currently all packm kernels are called after querying
  the function addresses from the context and then typecasting to the
  appropriate function pointer type, which is based upon type-specific
  function pointers like float* and double*.
- Removed a comment in frame/1m/bli_l1m_ft_ker.h that was outdated and
  misleading due to changes to the handling of packm kernels since
  moving them into the context.
2020-09-12 15:31:56 -05:00
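The effect of the void* -> ctype* change in 645d771 on a prototype-generating macro can be sketched roughly as follows. This is a simplified illustration and not the actual PACKM_KER_PROT definition: the macro name DEMO_PACKM_PROT, the demo_* function names, and the reduced argument list are all hypothetical.

```c
#include <stddef.h>

/* Hypothetical, simplified prototype-generating macro. Because the pointer
   arguments use ctype* rather than void*, the compiler checks every
   instantiation against its definition, so a float kernel that is
   mistakenly prototyped with double is diagnosed at compile time. */
#define DEMO_PACKM_PROT( ctype, ch ) \
void demo_ ## ch ## packm( size_t n, const ctype* kappa, const ctype* a, ctype* p );

DEMO_PACKM_PROT( float,  s )
DEMO_PACKM_PROT( double, d )

/* Scale-and-copy "packing" of n contiguous elements: p := kappa * a. */
void demo_spackm( size_t n, const float* kappa, const float* a, float* p )
{
    for ( size_t i = 0; i < n; ++i ) p[ i ] = *kappa * a[ i ];
}

void demo_dpackm( size_t n, const double* kappa, const double* a, double* p )
{
    for ( size_t i = 0; i < n; ++i ) p[ i ] = *kappa * a[ i ];
}
```

With void* arguments, instantiating the macro with the wrong ctype would go unnoticed, which is consistent with the dormant mismatches that ece9f6a later fixed.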
Field G. Van Zee
54bf6c3554 Minor README.md update.
Details:
- Added a new entry to the "What people are saying about BLIS" section.
2020-09-10 15:42:01 -05:00
Field G. Van Zee
e50b4d4046 Minor update to README.md (SIAM Best Paper Prize). 2020-09-09 14:12:53 -05:00
Devin Matthews
a8efb72074 Merge pull request #434 from flame/intel-zdot
Add an option to change the complex return type.
2020-09-07 16:18:19 -05:00
Field G. Van Zee
97e87f2c9f Whitespace/comment updates to #434 PR. 2020-09-07 15:56:42 -05:00
Devin Matthews
b0c4da1732 Merge pull request #436 from flame/s390x
Add checks so that s390x is detected as 64-bit.
2020-09-07 15:47:54 -05:00
Field G. Van Zee
810e90ee80 Minor README.md update.
Details:
- Added HPE to list of funders.
- Changed http to https in funders' website links.
2020-09-01 16:11:40 -05:00
Devin Matthews
7d41128219 Use -O2 for all framework code. (#435)
It seems that -O3 might be causing intermittent problems with the f2c'ed packed and banded code. -O3 is retained for kernel code. Fixes #341 and fixes #342.
2020-08-13 17:50:58 -05:00
Dave Love
9c5b485d35 Don't override -mcpu with -march on ARM (#353)
* Use -mcpu for ARM
See the GCC doc about -march, -mtune, and -mcpu and maybe
https://community.arm.com/developer/tools-software/tools/b/tools-software-ides-blog/posts/compiler-flags-across-architectures-march-mtune-and-mcpu

* Fix typo in flags

* Fix typo in cortexa9 flags

* Modify cortexa53 compilation flags to fix failing BLAS check (#341)
2020-08-07 15:11:18 -05:00
Devin Matthews
c253d14a72 Also handle Intel-style complex return in CBLAS interface. 2020-08-07 09:39:04 -05:00
Devin Matthews
5d653a11a0 Update Multithreading.md
Addresses the issue raised in #426.
2020-08-06 17:58:26 -05:00
Devin Matthews
b1b5870dd3 Add checks so that s390x is detected as 64-bit. 2020-08-06 17:34:20 -05:00
Field G. Van Zee
882dcb11bf Mention example code at top of documentation docs.
Details:
- Steer the reader towards the example code section of each
  documentation doc (object and typed).
- Trivial update to examples/oapi/README, examples/tapi/README.
2020-08-06 17:28:14 -05:00
Field G. Van Zee
f4894512e5 Very minor updates to previous commit. 2020-08-06 17:20:00 -05:00
Field G. Van Zee
adedb893ae Documented mutator functions in BLISObjectAPI.md.
Details:
- Added documentation for commonly-used object mutator functions in
  BLISObjectAPI.md. Previously, only accessor functions were documented.
  Thanks to Jeff Diamond for pointing out this omission.
- Explicitly set the 'diag' property of objects in oapi example modules
  (08level2.c and 09level3.c).
2020-08-06 17:14:01 -05:00
Devin Matthews
5b5278ff49 Use #ifdef instead of #if as macro may be undefined. 2020-08-06 14:19:37 -05:00
Devin Matthews
7fdc0fc893 Add an option to change the complex return type.
ifort apparently does not return complex numbers in registers as in C/C++ (or gfortran), but instead creates a "hidden" first parameter for the return value. The option --complex-return=gnu|intel has been added, as well as a guess based on a provided FC if not specified (otherwise default to gnu). This option affects the signatures of cdotc, cdotu, zdotc, and zdotu, and a single library cannot be used with both GNU and Intel Fortran compilers. Fixes #433.
2020-08-06 14:09:23 -05:00
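The two return conventions described in 7fdc0fc can be sketched at the C level as follows. This is only an illustration of the signature difference: the demo_dcomplex type and the demo_zdotu_* names are hypothetical, and the argument lists are simplified (the real BLAS routines also take strides and pass arguments by reference).

```c
/* Hypothetical stand-in for a double-precision complex type. */
typedef struct { double real; double imag; } demo_dcomplex;

/* "gnu" convention: the result is the function's return value. */
demo_dcomplex demo_zdotu_gnu( int n, const demo_dcomplex* x, const demo_dcomplex* y )
{
    demo_dcomplex rho = { 0.0, 0.0 };
    for ( int i = 0; i < n; ++i )
    {
        /* Unconjugated complex product x[i] * y[i], accumulated into rho. */
        rho.real += x[ i ].real * y[ i ].real - x[ i ].imag * y[ i ].imag;
        rho.imag += x[ i ].real * y[ i ].imag + x[ i ].imag * y[ i ].real;
    }
    return rho;
}

/* "intel" convention: the result is written through a first parameter that
   is hidden from the Fortran caller but explicit when viewed from C. */
void demo_zdotu_intel( demo_dcomplex* rho, int n, const demo_dcomplex* x, const demo_dcomplex* y )
{
    *rho = demo_zdotu_gnu( n, x, y );
}
```

Because the two signatures differ for the same routine name, a caller built for one convention cannot safely call a library built for the other, which matches the commit's note that a single library cannot serve both compilers.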
Field G. Van Zee
6e522e5823 Mention disabling of sup in docs/Sandboxes.md.
Details:
- Added language to remind the reader to disable sup if the intended
  behavior is for the sandbox implementation to handle all problem
  sizes, even the smaller ones that would normally be handled by the
  sup code path.
2020-07-30 19:31:37 -05:00
Field G. Van Zee
00e14cb6d8 Replaced use of bool_t type with C99 bool.
Details:
- Textually replaced nearly all non-comment instances of bool_t with the
  C99 bool type. A few remaining instances, such as those in the files
  bli_herk_x_ker_var2.c, bli_trmm_xx_ker_var2.c, and
  bli_trsm_xx_ker_var2.c, were promoted to dim_t since they were being
  used not for boolean purposes but to index into an array.
- This commit constitutes the third phase of a transition toward using
  C99's bool instead of bool_t, which was raised in issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7. The second phase, which redefined the bool_t typedef
  in terms of bool (from gint_t), was implemented by commit 2c554c2.
2020-07-29 14:24:34 -05:00
Field G. Van Zee
2c554c2fce Redefined bool_t typedef in terms of C99 bool.
Details:
- Changed the typedef that defines bool_t from:

    typedef gint_t bool_t;

  where gint_t is a signed integer that forms the basis of most other
  integers in BLIS, to:

    typedef bool bool_t;

- Changed BLIS's TRUE and FALSE macro definitions from being in terms of
  integer literals:

    #define TRUE  1
    #define FALSE 0

  to being in terms of C99 boolean constants:

    #define TRUE  true
    #define FALSE false

  which are provided by stdbool.h.
- This commit constitutes the second phase of a transition toward using
  C99's bool instead of bool_t, which will address issue #420. The first
  phase, which cleaned up various typecasts in preparation for using
  bool as the basis for bool_t (instead of gint_t), was implemented by
  commit a69a4d7.
2020-07-24 15:57:19 -05:00
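One practical consequence of the typedef change in 2c554c2 is that a value converted to a C99 bool is normalized to 0 or 1, whereas an integer-based boolean keeps the original value. The sketch below illustrates this; the demo_* names are hypothetical, and plain int stands in for gint_t as an assumption.

```c
#include <stdbool.h>

typedef int  demo_int_bool_t; /* stands in for the old integer-based bool_t */
typedef bool demo_c99_bool_t; /* stands in for the new bool-based bool_t    */

/* Converting to an integer type preserves the value. */
demo_int_bool_t demo_to_int_bool( int x ) { return ( demo_int_bool_t )x; }

/* Converting to C99 bool yields 0 if x compares equal to 0, otherwise 1. */
demo_c99_bool_t demo_to_c99_bool( int x ) { return ( demo_c99_bool_t )x; }
```

This difference matters for code that used a boolean-typed variable as an array index or counter, which is presumably why such uses needed a genuine integer type before the transition could complete.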
Field G. Van Zee
e01dd12558 Fail-safe updates to Makefiles in 'test' dir.
Details:
- Updated Makefiles in test, test/3, and test/sup so that running any of
  the usual targets without having first built BLIS results in a helpful
  error message. For example, if BLIS is not yet configured, make will
  output:

    Makefile:327: *** Cannot proceed: config.mk not detected! Run
    configure first.  Stop.

  Similarly, if BLIS is configured but not yet built, make will output:

    Makefile:340: *** Cannot proceed: BLIS library not yet built! Run
    make first.  Stop.

  In previous commits, these actions would result in a rather cryptic
  make error such as:

    make: *** No rule to make target 'test_sgemm_2400_asm_blis_st.x',
    needed by 'blis-nat-st'.  Stop.
2020-07-24 15:41:46 -05:00
Devin Matthews
b4f47f7540 Add BLIS_EXPORT_BLIS to bli_abort. (#429)
Fixes #428.
2020-07-24 13:56:13 -05:00
Field G. Van Zee
a69a4d7e2f Cleaned up bool_t usage and various typecasts.
Details:
- Fixed various typecasts in

    frame/base/bli_cntx.h
    frame/base/bli_mbool.h
    frame/base/bli_rntm.h
    frame/include/bli_misc_macro_defs.h
    frame/include/bli_obj_macro_defs.h
    frame/include/bli_param_macro_defs.h

  that were missing or being done improperly/incompletely. For example,
  many return values were being typecast as
    (bool_t)x && y
  rather than
    (bool_t)(x && y)
  Thankfully, none of these deficiencies had manifested as actual bugs
  at the time of this commit.
- Changed the return type of bli_env_get_var() from dim_t to gint_t.
  This reflects the fact that bli_env_get_var() needs to be able to
  return a signed integer, and even though dim_t is currently defined
  as a signed integer, it does not intuitively appear to necessarily be
  signed by inspection (i.e., an integer named "dim_t" for matrix
  "dimension"). Also, updated use of bli_env_get_var() within
  bli_pack.c to reflect the changed return type.
- Redefined type of thrcomm_t.barrier_sense field from bool_t to gint_t
  and added comments to the bli_thrcomm_*.h files that will explain a
  planned replacement of bool_t with C99's bool type.
- Note: These changes are being made to facilitate the substitution of
  'bool' for 'bool_t', which will eliminate the namespace conflict with
  arm_sve.h as reported in issue #420. This commit implements the first
  phase of that transition. Thanks to RuQing Xu for reporting this
  issue.
- CREDITS file update.
2020-07-22 16:13:09 -05:00
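The typecast issue described in a69a4d7 comes down to operator precedence: a cast binds more tightly than &&. The sketch below uses a deliberately narrow, hypothetical type (demo_bool_t) so that the difference becomes observable; with the wide signed integer that bool_t was based on at the time, the two forms happened to agree, consistent with the commit's note that no actual bugs had manifested. It also assumes 8-bit chars.

```c
/* Hypothetical narrow "boolean" type, chosen so that the cast visibly
   changes the value. */
typedef unsigned char demo_bool_t;

/* The cast applies to x alone; the narrowed x is then combined with y. */
int demo_cast_first( int x, int y ) { return ( demo_bool_t )x && y; }

/* The whole expression is evaluated first, then the result is cast. */
int demo_cast_last( int x, int y ) { return ( demo_bool_t )( x && y ); }
```

For x = 256 and y = 1, the first form narrows 256 to 0 before the && and yields 0, while the second form yields 1.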
Field G. Van Zee
a6437a5c11 Replaced broken ref99 sandbox w/ simpler version.
Details:
- The 'ref99' sandbox was broken by multiple refactorings and internal
  API changes over the last two years. Rather than try to fix it, I've
  replaced it with a much simpler version based on var2 of gemmsup.
  Why not fix the previous implementation? It occurred to me that the
  old implementation was trying to be a lightly simplified duplication
  of what exists in the framework. Duplication aside, this sandbox
  would have worked fine if it had been completely independent of the
  framework code. The problem was that it was only partially
  independent, with many function calls calling a function in BLIS
  rather than a duplicated/simplified version within the sandbox. (And
  the reason I didn't make it fully independent to begin with was that
  it seemed unnecessarily duplicative at the time.) Maintaining two
  versions of the same implementation is problematic for obvious
  reasons, especially when it wasn't even done properly to begin with.
  This explains the reimplementation in this commit. The only catch is
  that the newer implementation is single-threaded only and does not
  perform any packing on either input matrix (A or B). Basically, it's
  only meant to be a simple placeholder that shows how you could plug
  in your own implementation. Thanks to Francisco Igual for reporting
  this brokenness.
- Updated the three reference gemmsup kernels (defined in
  ref_kernels/3/bli_gemmsup_ref.c) so that they properly handle
  conjugation of conja and/or conjb. The general storage kernel, which
  is currently identical to the column-storage kernel, is used in the
  new ref99 sandbox to provide basic support for all datatypes
  (including scomplex and dcomplex).
- Minor updates to docs/Sandboxes.md, including adding the threading
  and packing limitations to the Caveats section.
- Fixed a comment typo in bli_l3_sup_var1n2m.c (upon which the new
  sandbox implementation is based).
2020-07-20 19:21:07 -05:00
Devin Matthews
bca040be9d Merge pull request #425 from gmargari/patch-1
Update Multithreading.md
2020-07-20 09:27:30 -05:00
Giorgos Margaritis
171ecc1dc6 Update Multithreading.md 2020-07-20 12:24:06 +03:00
Field G. Van Zee
2605eb4d99 Added missing rv_d?x6 edge cases to sup kernel.
Details:
- Added support to bli_gemmsup_rv_haswell_asm_d6x8n.c for handling
  various n = 6 edge cases with a single sup kernel call. Previously,
  only n = {4,2,1} were handled explicitly as single kernel calls;
  that is, cases where n = 6 were previously being executed via two
  kernel calls (n = 4 and n = 2).
- Added commented debug line to testsuite's test_libblis.c.
2020-07-15 15:25:19 -05:00
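The effect of adding n = 6 to the explicitly handled edge cases in 2605eb4 can be sketched by counting kernel calls. This assumes a simple greedy decomposition of the leftover columns; demo_count_calls is a hypothetical helper and not part of BLIS.

```c
#include <stddef.h>

/* Count how many kernel calls a greedy decomposition needs to cover n_left
   columns, given the edge-case widths that a single call can handle. The
   widths must be positive and sorted in descending order. */
int demo_count_calls( int n_left, const int* widths, size_t n_widths )
{
    int calls = 0;
    for ( size_t i = 0; i < n_widths; ++i )
    {
        /* Use the current width as many times as it still fits. */
        while ( n_left >= widths[ i ] )
        {
            n_left -= widths[ i ];
            ++calls;
        }
    }
    return calls;
}
```

With widths {4,2,1}, n = 6 takes two calls (4 and 2); with widths {6,4,2,1}, it takes one.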
Field G. Van Zee
72f6ed0637 Declare/define static functions via BLIS_INLINE.
Details:
- Updated all static function definitions to use the cpp macro
  BLIS_INLINE instead of the static keyword. This allows blis.h to
  use a different keyword (inline) to define these functions when
  compiling with C++, which might otherwise trigger "defined but
  not used" warning messages. Thanks to Giorgos Margaritis for
  reporting this issue and Devin Matthews for suggesting the fix.
- Updated the following files, which are used by configure's
  hardware auto-detection facility, to unconditionally #define
  BLIS_INLINE to the static keyword (since we know BLIS will be
  compiled with C, not C++):
    build/detect/config/config_detect.c
    frame/base/bli_arch.c
    frame/base/bli_cpuid.c
- CREDITS file update.
2020-07-03 17:55:54 -05:00
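The macro selection described in 72f6ed0 might look something like the sketch below. DEMO_INLINE and demo_min are hypothetical names used for illustration; the actual BLIS_INLINE definition may differ in its details.

```c
/* In C the functions remain static; in C++ they are declared inline
   instead, which avoids "defined but not used" warnings there. */
#ifdef __cplusplus
  #define DEMO_INLINE inline
#else
  #define DEMO_INLINE static
#endif

/* A small header-style helper defined via the macro. */
DEMO_INLINE int demo_min( int a, int b )
{
    return ( a < b ? a : b );
}
```

Routing the storage class through one macro means the choice can be changed in a single place rather than at every function definition.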
Field G. Van Zee
5fc701ac5f Added -fomit-frame-pointer option to CKOPTFLAGS.
Details:
- Added the -fomit-frame-pointer compiler option to the CKOPTFLAGS
  variable in the following make_defs.mk files:
    config/haswell/make_defs.mk
    config/skx/make_defs.mk
  as well as comments that mention why the compiler option is needed.
  This option is needed to prevent the compiler from using the rbp
  frame register (in the very early portion of kernel code, typically
  where k_iter and k_left are defined and computed), which, as of
  1c719c9, is used explicitly by the gemmsup millikernels. Thanks to
  Devin Matthews for identifying this missing option and to Jeff
  Diamond for reporting the original bug in #417.
- The file
    config/zen/amd_config.mk
  which feeds into the make_defs.mk for both zen and zen2 subconfigs,
  was also touched, but only to add a commented-out compiler option
  (and the aforementioned explanatory comment) since that file already
  uses -fomit-frame-pointer in COPTFLAGS, which forms the basis of
  CKOPTFLAGS.
2020-07-01 15:48:58 -05:00
Field G. Van Zee
6af59b7057 Fixed disabled edge case optimization in gemmsup.
Details:
- Fixed an inadvertently disabled edge case optimization in the two
  gemmsup variants in bli_l3_sup_var1n2m.c. Background: These edge case
  optimizations allow the last millikernel operation in the jr loop to
  be executed with an inflated register blocksize if it is the last
  (or only) iteration. For example, suppose mr=6 and nr=8 and the gemmsup
  problem is m=8, n=100, k=100. (In this case, the panel-block variant
  (var1n) is executed, which places the jr loop in the m dimension.)
  In principle, this problem could be executed as two millikernels: one
  with dimensions 6x100x100, and one as 2x100x100. However, with the
  support for inflated blocksizes in the kernel, the entire 8x100x100
  problem can be passed to the millikernel function, which will then
  execute it more favorably as two 4x100x100 millikernel sub-calls.
  Now, this optimization is disabled under certain circumstances, such
  as when multithreading. Previously, the is_mt predicate was being set
  incorrectly such that it was non-zero even when running
  single-threaded.
- Upon fixing the is_mt issue above, another bit of code needed to be
  moved so that the result of the optimization could have an impact on
  the assignment of loop bounds ranges to threads.
2020-07-01 14:54:23 -05:00
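The edge case optimization described in 6af59b7 can be sketched as a partitioning decision. This is a rough model under stated assumptions, not the code in bli_l3_sup_var1n2m.c: the demo_split function, the demo_split_t type, and the mr_ext parameter (the maximum number of extra rows a single call is assumed to absorb) are all hypothetical.

```c
#include <stdbool.h>

/* Hypothetical summary of how a dimension m is split across calls. */
typedef struct { int n_calls; int last_size; } demo_split_t;

/* Split m into blocks of mr. If the optimization is enabled (not
   multithreaded) and the leftover fits within mr_ext extra rows, fold the
   leftover into the final full block instead of issuing a separate call. */
demo_split_t demo_split( int m, int mr, int mr_ext, bool is_mt )
{
    int          n_full = m / mr;
    int          m_left = m % mr;
    demo_split_t s;

    if ( !is_mt && n_full > 0 && 0 < m_left && m_left <= mr_ext )
    {
        /* Inflated final call: mr + m_left rows handled by one call. */
        s.n_calls   = n_full;
        s.last_size = mr + m_left;
    }
    else
    {
        /* Ordinary split: full blocks plus a separate edge-case call. */
        s.n_calls   = n_full + ( m_left > 0 ? 1 : 0 );
        s.last_size = ( m_left > 0 ? m_left : ( n_full > 0 ? mr : 0 ) );
    }
    return s;
}
```

For m=8 and mr=6 with an assumed mr_ext of 3, the single-threaded case yields one call of 8 rows, while a true is_mt yields two calls (6 and 2). In this model, an is_mt flag that is wrongly true forces the second path, which is how the optimization would end up disabled.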
Field G. Van Zee
b37634540f Support ldims, packing in sup/test drivers.
Details:
- Updated the test/sup source file (test_gemm.c) and Makefile to support
  building matrices with small or large leading dimensions, and updated
  runme.sh to support executing both kinds of test drivers.
- Updated runme.sh to allow for executing sup drivers with unpacked (the
  default) or packed matrices (via setting BLIS_PACK_A, BLIS_PACK_B
  environment variables), and for capturing output to files that encode
  both the leading dimension (small or large) and packing status into
  the filenames.
- Consolidated octave scripts in test/sup/octave_st, test/sup/octave_mt
  into test/sup/octave and updated the octave code in that consolidated
  directory to read the new output filename format (encoding ldim and
  packing). Also added comments and streamlined code, particularly in
  plot_panel_trxsh.m. Tested the octave scripts with octave 5.2.0.
- Moved old octave_st, octave_mt directories to test/sup/old.
2020-06-25 16:05:12 -05:00
Field G. Van Zee
ceb9b95a96 Fixed incorrect link to shiftd in BLISTypedAPI.md.
Details:
- Previously, the entry for shiftd in the Operation index section of
  BLISTypedAPI.md was incorrectly linking to the shiftd operation entry
  in BLISObjectAPI.md. This has been fixed. Thanks to Jeff Diamond for
  helping find this incorrect link.
2020-06-18 17:15:25 -05:00
Field G. Van Zee
b3c4201681 CREDITS file update. 2020-06-18 14:00:56 -05:00
Isuru Fernando
31af73c11a Expand windows instructions (#414)
* Expand windows instructions

* Windows: both static and shared don't work at the same time
2020-06-18 13:35:54 -05:00
Field G. Van Zee
b5b604e106 Ensure random objects' 1-norms are non-zero.
Details:
- Fixed an innocuous bug that manifested when running the testsuite on
  extremely small matrices with randomization via the "powers of 2 in
  narrow precision range" option enabled. When the randomization
  function emits a perfect 0.0 to fill a 1x1 matrix, the testsuite will
  then compute 0.0/0.0 during the normalization process, which leads to
  NaN residuals. The solution entails smarter implementations of randv,
  randnv, randm, and randnm, each of which will compute the 1-norm of
  the vector or matrix in question. If the object has a 1-norm of 0.0,
  the object is re-randomized until the 1-norm is not 0.0. Thanks to
  Kiran Varaganti for reporting this issue (#413).
- Updated the implementation of randm_unb_var1() so that it loops over
  a call to the randv_unb_var1() implementation directly rather than
  calling it indirectly via randv(). This was done to avoid the overhead
  of multiple calls to norm1v() when randomizing the rows/columns of a
  matrix.
- Updated comments.
2020-06-17 16:42:24 -05:00
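The re-randomization strategy described in b5b604e can be sketched as a retry loop around a 1-norm check. The demo_* names are hypothetical, and the generator is a deterministic stand-in (not the BLIS randomization routine) so that the retry behavior is reproducible.

```c
#include <assert.h>

/* 1-norm of a vector: the sum of the absolute values of its elements. */
double demo_norm1v( int n, const double* x )
{
    double norm = 0.0;
    for ( int i = 0; i < n; ++i ) norm += ( x[ i ] < 0.0 ? -x[ i ] : x[ i ] );
    return norm;
}

/* Deterministic stand-in generator: emits 0.0 twice, then 0.5 thereafter. */
double demo_gen( void )
{
    static int calls = 0;
    return ( calls++ < 2 ? 0.0 : 0.5 );
}

/* Fill x using gen(), repeating until the 1-norm is non-zero. The generator
   must eventually produce a non-zero value, or the loop never terminates. */
void demo_randv( int n, double* x, double ( *gen )( void ) )
{
    assert( n > 0 );
    do
    {
        for ( int i = 0; i < n; ++i ) x[ i ] = gen();
    }
    while ( demo_norm1v( n, x ) == 0.0 );
}
```

For a 1x1 object, the first two draws here are exactly 0.0 and are rejected, so a later normalization step never divides 0.0 by 0.0.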
Isuru Fernando
35e38fb693 Fix typo in FAQ 2020-06-16 09:08:31 -07:00
Field G. Van Zee
1c719c91a3 Bugfixes, cleanup of sup dgemm ukernels.
Details:
- Fixed a few not-really-bugs:
  - Previously, the d6x8m kernels were still prefetching the next upanel
    of A using MR*rs_a instead of ps_a (same for prefetching of next
    upanel of B in d6x8n kernels using NR*cs_b instead of ps_b). Given
    that the upanels might be packed, using ps_a or ps_b is the correct
    way to compute the prefetch address.
  - Fixed an obscure bug in the rd_d6x8m kernel that, by dumb luck,
    executed as intended even though it was based on faulty pointer
    management. Basically, in the rd_d6x8m kernel, the pointer for B
    (stored in rdx) was loaded only once, outside of the jj loop, and in
    the second iteration its new position was calculated by incrementing
    rdx by the *absolute* offset (four columns), which happened to be the
    same as the relative offset (also four columns) that was needed. It
    worked only because that loop only executed twice. A similar issue
    was fixed in the rd_d6x8n kernels.
- Various cleanups and additions, including:
  - Factored out the loading of rs_c into rdi in rd_d6x8[mn] kernels so
    that it is loaded only once outside of the loops rather than
    multiple times inside the loops.
  - Changed outer loop in rd kernels so that the jump/comparison and
    loop bounds more closely mimic what you'd see in higher-level source
    code. That is, something like:
      for( i = 0; i < 6; i+=3 )
    rather than something like:
      for( i = 0; i <= 3; i+=3 )
  - Switched row-based IO to use byte offsets instead of byte column
    strides (e.g. via rsi register), which were known to be 8 anyway
    since otherwise that conditional branch wouldn't have executed.
  - Cleaned up and homogenized prefetching a bit.
  - Updated the comments that show the before and after of the
    in-register transpositions.
  - Added comments to column-based IO cases to indicate which columns
    are being accessed/updated.
  - Added rbp register to clobber lists.
  - Removed some dead (commented out) code.
  - Fixed some copy-paste typos in comments in the rv_6x8n kernels.
  - Cleaned up whitespace (including leading ws -> tabs).
  - Moved edge case (non-milli) kernels to their own directory, d6x8,
    and split them into separate files based on the "NR" value of the
    kernels (Mx8, Mx4, Mx2, etc.).
  - Moved config-specific reference Mx1 kernels into their own file
    (e.g. bli_gemmsup_r_haswell_ref_dMx1.c) inside the d6x8 directory.
  - Added rd_dMx1 assembly kernels, which seem marginally faster than
    the corresponding reference kernels.
  - Updated comments in ref_kernels/bli_cntx_ref.c and changed to using
    the row-oriented reference kernels for all storage combos.
2020-06-04 17:21:08 -05:00
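The pointer-management bug described in 1c719c9 for the rd_d6x8m kernel can be mimicked in plain C. This sketch is an analogy, not a transcription of the assembly: the demo_* functions are hypothetical, and ints stand in for the columns of B.

```c
/* Correct: recompute the position from the base pointer each iteration. */
int demo_sum_from_base( const int* b, int n_iter, int step )
{
    int sum = 0;
    for ( int jj = 0; jj < n_iter * step; jj += step )
        sum += *( b + jj );
    return sum;
}

/* Faulty: advance a running pointer by the absolute offset jj. The pointer
   visits b+0, b+step, b+3*step, ... so it only matches the correct
   positions during the first two iterations. */
int demo_sum_running( const int* b, int n_iter, int step )
{
    int        sum = 0;
    const int* p   = b;
    for ( int jj = 0; jj < n_iter * step; jj += step )
    {
        p   += jj;   /* intended: p = b + jj */
        sum += *p;
    }
    return sum;
}
```

With a step of four and two iterations, both functions read the same elements, which mirrors why the original kernel worked by luck; a third iteration exposes the difference.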
Isuru Fernando
943a21def0 Add build instructions for Windows (#404) 2020-05-21 14:09:21 -05:00
Field G. Van Zee
fbef422f0d Separate OS X and Windows into separate FAQs.
Details:
- Separated the unified Mac OS X / Windows frequently asked question
  into two separate questions, one for each OS.
2020-05-21 10:30:41 -05:00
Guodong Xu
28be1a4265 avoid loading twice in armv8a gemm kernel (#403)
This bug happens at a corner case, when k_iter == 0 and we jump to
CONSIDERKLEFT.

In current design, first row/col. of a and b are loaded twice.

The fix is to rearrange a and b (first row/col.) loading instructions.

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2020-05-20 13:22:22 -05:00
Field G. Van Zee
d51245e58b Add support for Intel oneAPI in configure.
Details:
- Properly select cc_vendor based on the output of invoking CC with the
  --version option, including cases where CC is the variant of clang
  that is included with Intel oneAPI. (However, we continue to treat
  the compiler as clang for other purposes, not icc.) Thanks to Ajay
  Panyala and Devin Matthews for reporting on this issue via #402.
2020-05-08 18:00:54 -05:00