blis/config_registry
RuQing Xu 61584deddf Added 512b SVE-based a64fx subconfig + SVE kernels.
Details:
- Added 512-bit specific 'a64fx' subconfiguration that uses empirically
  tuned block sizes by Stepan Nassyr. This subconfig also sets the sector
  cache size and enables memory-tagging code in SVE gemm kernels. This
  subconfig utilizes (16, k) and (10, k) DPACKM kernels.
- Added a vector-length agnostic 'armsve' subconfiguration that computes
  blocksizes according to the analytical model. This part is ported from 
  Stepan Nassyr's repository.
- Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE 
  at size (2*VL, 10). These kernels use unindexed FMLA instructions 
  because indexed FMLA takes 2 FMA units in many implementations.
  PS: There are indexed-FMLA kernels in Stepan Nassyr's repository.
- Implemented 512-bit SVE dpackm kernels with in-register transpose
  support for sizes (16, k) and (10, k).
- Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for 
  size (12, k). This dpackm kernel is not currently used by any 
  subconfiguration.
- Implemented several experimental dgemmsup kernels that improve
  performance in a few cases. However, these kernels generally
  underperform, so they are not currently used by any subconfig.
- Note: This commit squashes several commits submitted by RuQing Xu via
  PR #424.
2021-05-19 09:52:29 -05:00
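
The (2*VL, 10) kernel shape mentioned in the commit message can be checked with a little arithmetic. The sketch below is illustrative only and is not part of BLIS: the function name `sve_microtile` is hypothetical, and it assumes that "VL" means the number of elements held by one SVE vector register. Under that reading, a 512-bit vector of doubles gives a 16 x 10 tile, which lines up with the (16, k) and (10, k) dpackm kernels used by the a64fx subconfig.

```python
def sve_microtile(vector_bits, elem_bits, vectors_per_column=2, nr=10):
    """Return (mr, nr) for a (2*VL, 10)-style microkernel.

    Assumption: VL is the number of elements in one SVE vector register.
    """
    # SVE vector lengths are multiples of 128 bits, from 128 up to 2048.
    if vector_bits % 128 != 0 or not 128 <= vector_bits <= 2048:
        raise ValueError("unsupported SVE vector length")
    if vector_bits % elem_bits != 0:
        raise ValueError("element size must divide the vector length")
    lanes = vector_bits // elem_bits      # elements per vector register
    return vectors_per_column * lanes, nr


# 512-bit vectors (as on A64FX), double precision (64-bit elements).
print(sve_microtile(512, 64))
# 256-bit vectors, double precision.
print(sve_microtile(256, 64))
```

Because only the vector length changes between calls, the same formula covers every vector length, which is the point of a vector-length-agnostic kernel.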

#
# config_registry
#
# Please refer to the BLIS wiki on configurations for information on the
# syntax and semantics of this file [1].
#
# [1] https://github.com/flame/blis/blob/master/docs/ConfigurationHowTo.md
#
# Processor families.
x86_64: intel64 amd64
intel64: skx knl haswell sandybridge penryn generic
amd64: zen2 zen excavator steamroller piledriver bulldozer generic
# NOTE: ARM families will remain disabled until runtime hardware detection
# logic is added to BLIS.
#arm64: cortexa57 generic
#arm32: cortexa15 cortexa9 generic
# Intel architectures.
skx: skx/skx/haswell/zen
knl: knl/knl/haswell/zen
haswell: haswell/haswell/zen
sandybridge: sandybridge
penryn: penryn
# AMD architectures.
zen2: zen2/zen2/zen/haswell
zen: zen/zen/haswell
excavator: excavator/piledriver
steamroller: steamroller/piledriver
piledriver: piledriver
bulldozer: bulldozer
# ARM architectures.
armsve: armsve/armsve
a64fx: a64fx/armsve
thunderx2: thunderx2/armv8a
cortexa57: cortexa57/armv8a
cortexa53: cortexa53/armv8a
cortexa15: cortexa15/armv7a
cortexa9: cortexa9/armv7a
# IBM architectures.
power10: power10
power9: power9
bgq: bgq
# Generic architectures.
generic: generic
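
The following is a minimal sketch of how the entries above could be read programmatically. It is not the parser used by the BLIS configure script. It assumes the reading suggested by the configuration how-to referenced in the file header: a family line lists its member configurations separated by spaces, while a singleton line has the form `name: name/kernelset/...`, with the kernel set defaulting to the configuration's own name when no slash is present. The function name `parse_config_registry` is hypothetical.

```python
def parse_config_registry(text):
    """Split config_registry-style text into (families, subconfigs).

    families:   name -> list of member configuration names
    subconfigs: name -> list of kernel sets used by that configuration
    Assumption: the first slash-separated component repeats the
    configuration name, and the remaining components are kernel sets.
    """
    families = {}
    subconfigs = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()   # drop comments and blanks
        if not line:
            continue
        name, _, rhs = line.partition(":")
        name = name.strip()
        items = rhs.split()
        if not items:
            continue
        parts = items[0].split("/")
        if len(items) == 1 and parts[0] == name:
            # Singleton entry: kernel sets default to the config's own name.
            subconfigs[name] = parts[1:] or [name]
        else:
            # Family entry: a space-separated list of member configurations.
            families[name] = items
    return families, subconfigs


sample = """
# Processor families.
x86_64: intel64 amd64
#arm64: cortexa57 generic
skx: skx/skx/haswell/zen
a64fx: a64fx/armsve
penryn: penryn
"""

families, subconfigs = parse_config_registry(sample)
print(families)
print(subconfigs)
```

Under these assumptions, the new `a64fx: a64fx/armsve` entry reads as "the a64fx subconfiguration draws its kernels from the armsve kernel set", which matches the commit description.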