Minor updates to docs, Makefiles.

Details:
- Changed all occurrances of
    micro-kernel -> microkernel
    macro-kernel -> macrokernel
    micro-panel  -> micropanel
  in all markdown documents in 'docs' directory. This change is being
  made since we've reached the point in adoption and acceptance of
  BLIS's insights where words such as "microkernel" are no longer new,
  and therefore now merit being unhyphenated.
- Updated "Implementation Notes" sections of KernelsHowTo.md, which
  still contained references to nonexistent cpp macros such as
  BLIS_DEFAULT_MR_? and BLIS_PACKDIM_MR_?.
- Added 'run-fast' and 'check-fast' targets to testsuite/Makefile.
- Minor updates to Testsuite.md, including suggesting use of
  'make check' and 'make check-fast' when running from the local
  testsuite directory.
- Added a comment to top-level Makefile explaining the purpose behind
  the TESTSUITE_WRAPPER variable, which at first glance appears to serve
  no purpose.
This commit is contained in:
Field G. Van Zee
2019-01-28 16:22:23 -06:00
parent 1aa280d052
commit c665eb9b88
11 changed files with 237 additions and 194 deletions

View File

@@ -5,14 +5,14 @@ This wiki is intended to track the support for various hardware types within the
We apologize if this wiki falls out of date. For the latest support, we recommend peeking inside of the relevant sub-configuration (specifically, in the `bli_cntx_init_<configname>.c` file) and looking at which kernels are registered. You may also contact the [blis-devel](http://groups.google.com/group/blis-devel) mailing list.
## Level-3 micro-kernels
## Level-3 microkernels
The following table lists architectures for which there exist optimized level-3 micro-kernels, which micro-kernels are optimized, the name of the author or maintainer, and the current status of the micro-kernels.
The following table lists architectures for which there exist optimized level-3 microkernels, which microkernels are optimized, the name of the author or maintainer, and the current status of the microkernels.
A few remarks / reminders:
* Optimizing only the [gemm micro-kernel](KernelsHowTo.md#gemm-micro-kernel) will result in optimal performance for all [level-3 operations](BLISTypedAPI#level-3-operations) except `trsm` (which will typically achieve 60 - 80% of attainable peak performance).
* The [trsm](BLISTypedAPI#trsm) operation needs the [gemmtrsm micro-kernel(s)](KernelsHowTo.md#gemmtrsm-micro-kernels), in addition to the aforementioned [gemm micro-kernel](KernelsHowTo.md#gemm-micro-kernel), in order reach optimal performance.
* Induced complex (1m) implementations are employed in all situations where the real domain [gemm micro-kernel](KernelsHowTo.md#gemm-micro-kernel) of the corresponding precision is available. Please see our [ACM TOMS article on the 1m method](https://github.com/flame/blis#citations) for more info on this topic.
* Optimizing only the [gemm microkernel](KernelsHowTo.md#gemm-microkernel) will result in optimal performance for all [level-3 operations](BLISTypedAPI#level-3-operations) except `trsm` (which will typically achieve 60 - 80% of attainable peak performance).
* The [trsm](BLISTypedAPI#trsm) operation needs the [gemmtrsm microkernel(s)](KernelsHowTo.md#gemmtrsm-microkernels), in addition to the aforementioned [gemm microkernel](KernelsHowTo.md#gemm-microkernel), in order reach optimal performance.
* Induced complex (1m) implementations are employed in all situations where the real domain [gemm microkernel](KernelsHowTo.md#gemm-microkernel) of the corresponding precision is available. Please see our [ACM TOMS article on the 1m method](https://github.com/flame/blis#citations) for more info on this topic.
* Some microarchitectures use the same sub-configuration. This is not a typo. For example, Haswell and Broadwell systems as well as "desktop" (non-server) versions of Skylake, Kabylake, and Coffeelake all use the `haswell` sub-configuration and the kernels registered therein.
* Remember that you (usually) don't have to choose your sub-configuration manually! Instead, you can always request configure-time hardware detection via `./configure auto`. This will defer to internal logic (based on CPUID for x86_64 systems) that will attempt to choose the appropriate sub-configuration automatically.