Fix parsing in vpu_count on workstation SKX (#351)

* Fix parsing in vpu_count on workstation SKX

* Document Skylake-X as Haswell for single FMA

* Update vpu_count for Skylake and Cascade Lake models

* Support printing the configuration selected, controlled by the environment

Intended particularly for diagnosing mis-selection of SKX through
unknown, or incorrect, number of VPUs.

* Move bli_log outside the cpp condition, and use it where intended

* Add Fixme comment (Skylake D)

* Mostly superficial edits to commits towards #351.

Details:
- Moved architecture/sub-config logging-related code from bli_cpuid.c
  to bli_arch.c, tweaked names, and added more set/get layering.
- Tweaked log messages output from bli_cpuid_is_skx() in bli_cpuid.c.
- Content, whitespace changes to new bullet in HardwareSupport.md that
  relates to single-VPU Skylake-Xs.

* Fix comment typos

Co-authored-by: Field G. Van Zee <field@cs.utexas.edu>
This commit is contained in:
Dave Love
2020-01-06 20:15:48 +00:00
committed by Devrajegowda, Kiran
parent 99da76fd64
commit 291ee5f748
4 changed files with 101 additions and 20 deletions

View File

@@ -15,6 +15,7 @@ A few remarks / reminders:
* Induced complex (1m) implementations are employed in all situations where the real domain [gemm microkernel](KernelsHowTo.md#gemm-microkernel) of the corresponding precision is available, but the "native" complex domain gemm microkernel is unavailable. Note that the table below lists native kernels, so if a microarchitecture lists only `sd`, support for both `c` and `z` datatypes will be provided via the 1m method. (Note: most people cannot tell the difference between native and 1m-based performance.) Please see our [ACM TOMS article on the 1m method](https://github.com/flame/blis#citations) for more info on this topic.
* Some microarchitectures use the same sub-configuration. *This is not a typo.* For example, Haswell and Broadwell systems as well as "desktop" (non-server) versions of Skylake, Kaby Lake, and Coffee Lake all use the `haswell` sub-configuration and the kernels registered therein. Microkernels can be recycled in this manner because the key detail that determines level-3 performance outcomes is actually the vector ISA, not the microarchitecture. In the previous example, all of the microarchitectures listed support AVX2 (but not AVX-512), and therefore they can reuse the same microkernels.
* Remember that you (usually) don't have to choose your sub-configuration manually! Instead, you can always request configure-time hardware detection via `./configure auto`. This will defer to internal logic (based on CPUID for x86_64 systems) that will attempt to choose the appropriate sub-configuration automatically.
* There is a difficulty in automatically choosing the ideal sub-configuration for use on Skylake-X systems, which may have one or two FMA units. The `skx` sub-configuration is only beneficial when used on hardware with two FMA units. Otherwise the hardware is treated as a "desktop" Skylake system, which uses the `haswell` sub-configuration. Furthermore, the number of units can't be queried directly; instead, we rely on a manually-maintained list of CPU models (via logic in `frame/base/bli_cpuid.c`), which may be incorrect for new processors, particularly Gold models. In that case, you can either fix the code (and please raise an issue!) or manually target the `skx` at configure-time (i.e., `./configure [options] skx`). If your performance seems low, you can set `export BLIS_ARCH_DEBUG=1`, which will cause BLIS to output some basic debugging info to `stderr` that will reveal whether your system was detected as having one or two VPUs (FMA units).
| Vendor/Microarchitecture | BLIS sub-configuration | `gemm` | `gemmtrsm` |
|:-------------------------------------|:-----------------------|:-------|:-----------|
@@ -28,7 +29,8 @@ A few remarks / reminders:
| Intel Haswell, Broadwell (AVX/FMA3) | `haswell` | `sdcz` | `sd` |
| Intel Sky/Kaby/CoffeeLake (AVX/FMA3) | `haswell` | `sdcz` | `sd` |
| Intel Knights Landing (AVX-512/FMA3) | `knl` | `sd` | |
| Intel SkylakeX (AVX-512/FMA3) | `skx` | `sd` | |
| Intel SkylakeX (AVX-512/2×FMA3) | `skx` | `sd` | |
| Intel SkylakeX (AVX-512/1×FMA3) | `haswell` | `sdcz` | `sd` |
| ARMv7 Cortex-A9 (NEON) | `cortex-a9` | `sd` | |
| ARMv7 Cortex-A15 (NEON) | `cortex-a15` | `sd` | |
| ARMv8 Cortex-A53 (NEON) | `cortex-a53` | `sd` | |