Mirror of https://github.com/amd/blis.git, synced 2026-05-11 09:39:59 +00:00
Minor update to docs/HardwareSupport.md document.

Details:
- Added more details and clarifying language to implications of 1m and the recycling of microkernels between microarchitectures.
Committed by Devrajegowda, Kiran
parent 9b16d8e995
commit 959d8d906a
@@ -12,8 +12,8 @@ The following table lists architectures for which there exist optimized level-3
A few remarks / reminders:
* Optimizing only the [gemm microkernel](KernelsHowTo.md#gemm-microkernel) will result in optimal performance for all [level-3 operations](BLISTypedAPI#level-3-operations) except `trsm` (which will typically achieve 60 - 80% of attainable peak performance).
* The [trsm](BLISTypedAPI#trsm) operation needs the [gemmtrsm microkernel(s)](KernelsHowTo.md#gemmtrsm-microkernels), in addition to the aforementioned [gemm microkernel](KernelsHowTo.md#gemm-microkernel), in order to reach optimal performance.
-* Induced complex (1m) implementations are employed in all situations where the real domain [gemm microkernel](KernelsHowTo.md#gemm-microkernel) of the corresponding precision is available. Please see our [ACM TOMS article on the 1m method](https://github.com/flame/blis#citations) for more info on this topic.
-* Some microarchitectures use the same sub-configuration. This is not a typo. For example, Haswell and Broadwell systems as well as "desktop" (non-server) versions of Skylake, Kabylake, and Coffeelake all use the `haswell` sub-configuration and the kernels registered therein.
+* Induced complex (1m) implementations are employed in all situations where the real domain [gemm microkernel](KernelsHowTo.md#gemm-microkernel) of the corresponding precision is available, but the "native" complex domain gemm microkernel is unavailable. Note that the table below lists native kernels, so if a microarchitecture lists only `sd`, support for both `c` and `z` datatypes will be provided via the 1m method. (Note: most people cannot tell the difference between native and 1m-based performance.) Please see our [ACM TOMS article on the 1m method](https://github.com/flame/blis#citations) for more info on this topic.
+* Some microarchitectures use the same sub-configuration. *This is not a typo.* For example, Haswell and Broadwell systems as well as "desktop" (non-server) versions of Skylake, Kaby Lake, and Coffee Lake all use the `haswell` sub-configuration and the kernels registered therein. Microkernels can be recycled in this manner because the key detail that determines level-3 performance outcomes is actually the vector ISA, not the microarchitecture. In the previous example, all of the microarchitectures listed support AVX2 (but not AVX-512), and therefore they can reuse the same microkernels.
* Remember that you (usually) don't have to choose your sub-configuration manually! Instead, you can always request configure-time hardware detection via `./configure auto`. This will defer to internal logic (based on CPUID for x86_64 systems) that will attempt to choose the appropriate sub-configuration automatically.
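
The idea behind an induced complex implementation, as described in the 1m remark above, can be illustrated with a small sketch: a complex matrix product is obtained from a single real-domain matrix product by expanding the operands into real matrices first. This is not BLIS code. The function names (`real_matmul`, `expand_a`, `expand_b`, `induced_complex_matmul`) are hypothetical, the expansion shown is one standard way to embed complex arithmetic in real arithmetic, and the actual 1m implementation does this work during packing and differs in many details; see the cited article for the real method.

```python
# Illustrative sketch only; none of these names come from BLIS.

def real_matmul(a, b):
    """Plain real-domain matrix product, standing in for a real gemm kernel."""
    rows, inner, cols = len(a), len(b), len(b[0])
    return [[sum(a[i][p] * b[p][j] for p in range(inner)) for j in range(cols)]
            for i in range(rows)]

def expand_a(a):
    """Replace each complex entry of A by the 2x2 real block [[re, -im], [im, re]]."""
    out = []
    for row in a:
        out.append([v for z in row for v in (z.real, -z.imag)])
        out.append([v for z in row for v in (z.imag, z.real)])
    return out

def expand_b(b):
    """Replace each complex entry of B by the 2x1 real block [re, im]."""
    out = []
    for row in b:
        out.append([z.real for z in row])
        out.append([z.imag for z in row])
    return out

def induced_complex_matmul(a, b):
    """Complex product C = A * B computed with exactly one real matrix product."""
    c_real = real_matmul(expand_a(a), expand_b(b))
    # Rows 2i and 2i+1 of the real result hold the real and imaginary parts of row i.
    return [[complex(re, im) for re, im in zip(c_real[2 * i], c_real[2 * i + 1])]
            for i in range(len(a))]

# (1+2j)(2-1j) + (3-1j)(1j) = (4+3j) + (1+3j) = 5+6j
assert induced_complex_matmul([[1 + 2j, 3 - 1j]], [[2 - 1j], [1j]]) == [[5 + 6j]]
```

The point of the sketch is only that no complex-domain kernel is needed: once the operands are rearranged, the real-domain product does all of the arithmetic.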
| Vendor/Microarchitecture | BLIS sub-configuration | `gemm` | `gemmtrsm` |
@@ -26,7 +26,7 @@ A few remarks / reminders:
| Intel Core2 (SSE3) | `penryn` | `sd` | `d` |
| Intel Sandy/Ivy Bridge (AVX/FMA3) | `sandybridge` | `sdcz` | |
| Intel Haswell, Broadwell (AVX/FMA3) | `haswell` | `sdcz` | `sd` |
-| Intel Sky/Kaby/Coffeelake (AVX/FMA3) | `haswell` | `sdcz` | `sd` |
+| Intel Sky/Kaby/CoffeeLake (AVX/FMA3) | `haswell` | `sdcz` | `sd` |
| Intel Knights Landing (AVX-512/FMA3) | `knl` | `sd` | |
| Intel SkylakeX (AVX-512/FMA3) | `skx` | `sd` | |
| ARMv7 Cortex-A9 (NEON) | `cortex-a9` | `sd` | |
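
To make the "native versus 1m" reading of the `gemm` column concrete, here is a minimal sketch that encodes only the rows visible in this hunk and applies the rule stated in the remarks: a complex datatype that is not listed is served via 1m when the real datatype of the same precision is listed. The names `NATIVE_GEMM`, `REAL_COUNTERPART`, and `gemm_support` are hypothetical, and the `"not listed"` result is an assumption for cases the remarks do not describe.

```python
# Native gemm datatypes per sub-configuration, copied from the table rows above.
NATIVE_GEMM = {
    "penryn": "sd",
    "sandybridge": "sdcz",
    "haswell": "sdcz",
    "knl": "sd",
    "skx": "sd",
    "cortex-a9": "sd",
}

# Each complex datatype paired with the real datatype of the same precision.
REAL_COUNTERPART = {"c": "s", "z": "d"}

def gemm_support(subconfig, dtype):
    """Classify how a datatype's gemm is provided, per the remarks on native kernels and 1m."""
    native = NATIVE_GEMM[subconfig]
    if dtype in native:
        return "native"
    real = REAL_COUNTERPART.get(dtype)
    if real is not None and real in native:
        return "1m"
    # Assumption: the remarks above do not say what happens in this case.
    return "not listed"

print(gemm_support("skx", "z"))      # complex double on SkylakeX
print(gemm_support("haswell", "c"))  # complex single on Haswell
```

Under this reading, `skx` serves `z` through 1m because only `sd` is listed, while `haswell` has a native `c` kernel.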