mirror of
https://github.com/amd/blis.git
synced 2026-05-13 18:52:14 +00:00
Details:
- Added a 512-bit-specific 'a64fx' subconfiguration that uses block sizes empirically tuned by Stepan Nassyr. This subconfig also sets the sector cache size and enables the memory-tagging code in the SVE gemm kernels. It uses the (16, k) and (10, k) DPACKM kernels.
- Added a vector-length-agnostic 'armsve' subconfiguration that computes block sizes according to the analytical model. This part is ported from Stepan Nassyr's repository.
- Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE at size (2*VL, 10). These kernels use unindexed FMLA instructions because indexed FMLA takes 2 FMA units in many implementations. PS: There are indexed-FMLA kernels in Stepan Nassyr's repository.
- Implemented 512-bit SVE dpackm kernels with in-register transpose support for sizes (16, k) and (10, k).
- Extended the 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for size (12, k). This dpackm kernel is not currently used by any subconfiguration.
- Implemented several experimental dgemmsup kernels that improve performance in a few cases. However, these dgemmsup kernels generally underperform, so they are not currently used in any subconfig.
- Note: This commit squashes several commits submitted by RuQing Xu via PR #424.
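The (2*VL, 10) kernel shape can be made concrete with a little arithmetic. This is a sketch under two assumptions. First, VL is read as the number of elements held in one SVE vector register, which agrees with the (16, k) packing kernels used for doubles on 512-bit A64FX. Second, "sh" is taken to mean half precision. The helper `sve_gemm_tile` is hypothetical and is not a BLIS function.

```python
def sve_gemm_tile(vl_bits, elem_bytes, nr=10):
    """Return (MR, NR) for a (2*VL, NR) micro-kernel tile.

    Assumption: VL counts elements per SVE vector register, so the
    kernel's MR dimension spans two full vector registers.
    """
    # SVE vector lengths are multiples of 128 bits, from 128 to 2048.
    if vl_bits % 128 != 0 or not 128 <= vl_bits <= 2048:
        raise ValueError(f"invalid SVE vector length: {vl_bits} bits")
    lanes = vl_bits // (8 * elem_bytes)  # elements per vector register
    return 2 * lanes, nr


# A64FX implements 512-bit SVE; element sizes for d, s and (assumed) sh.
for name, size in (("double", 8), ("float", 4), ("half", 2)):
    print(name, sve_gemm_tile(512, size))
```

At 512 bits a double-precision tile works out to 16 x 10. That is consistent with the (16, k) and (10, k) dpackm kernels listed above for the a64fx subconfig. The same vector-length-agnostic kernel would yield an 8 x 10 tile on a 256-bit SVE machine.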
50 lines
1.3 KiB
Plaintext
#
# config_registry
#
# Please refer to the BLIS wiki on configurations for information on the
# syntax and semantics of this file [1].
#
# [1] https://github.com/flame/blis/blob/master/docs/ConfigurationHowTo.md
#

# Processor families.
x86_64: intel64 amd64
intel64: skx knl haswell sandybridge penryn generic
amd64: zen2 zen excavator steamroller piledriver bulldozer generic
# NOTE: ARM families will remain disabled until runtime hardware detection
# logic is added to BLIS.
#arm64: cortexa57 generic
#arm32: cortexa15 cortexa9 generic

# Intel architectures.
skx: skx/skx/haswell/zen
knl: knl/knl/haswell/zen
haswell: haswell/haswell/zen
sandybridge: sandybridge
penryn: penryn

# AMD architectures.
zen2: zen2/zen2/zen/haswell
zen: zen/zen/haswell
excavator: excavator/piledriver
steamroller: steamroller/piledriver
piledriver: piledriver
bulldozer: bulldozer

# ARM architectures.
armsve: armsve/armsve
a64fx: a64fx/armsve
thunderx2: thunderx2/armv8a
cortexa57: cortexa57/armv8a
cortexa53: cortexa53/armv8a
cortexa15: cortexa15/armv7a
cortexa9: cortexa9/armv7a

# IBM architectures.
power10: power10
power9: power9
bgq: bgq

# Generic architectures.
generic: generic
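The file's two kinds of line can be illustrated with a small Python sketch. It relies on assumptions drawn from reading the file itself, not from the BLIS build scripts:

- A line whose right-hand side lists several names (e.g. `x86_64: intel64 amd64`) defines a family.
- A line such as `skx: skx/skx/haswell/zen` names a sub-configuration followed by the kernel sets it draws on.
- A bare `penryn: penryn` means the kernel set shares the configuration's name.

The helpers `parse_registry`, `expand_family`, and `kernel_sets` are hypothetical and are not part of BLIS. The authoritative rules are in ConfigurationHowTo.md [1].

```python
# Hypothetical reader for a trimmed copy of config_registry; the syntax
# interpretation is an assumption, see docs/ConfigurationHowTo.md.
REGISTRY = """\
# Processor families.
x86_64: intel64 amd64
intel64: skx knl haswell sandybridge penryn generic
amd64: zen2 zen excavator steamroller piledriver bulldozer generic
#arm64: cortexa57 generic
skx: skx/skx/haswell/zen
knl: knl/knl/haswell/zen
haswell: haswell/haswell/zen
sandybridge: sandybridge
penryn: penryn
zen2: zen2/zen2/zen/haswell
zen: zen/zen/haswell
excavator: excavator/piledriver
steamroller: steamroller/piledriver
piledriver: piledriver
bulldozer: bulldozer
armsve: armsve/armsve
a64fx: a64fx/armsve
generic: generic
"""


def parse_registry(text):
    """Map each entry name to its whitespace-separated right-hand tokens."""
    entries = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and blanks
        if not line:
            continue
        name, _, rhs = line.partition(":")
        entries[name.strip()] = rhs.split()
    return entries


def kernel_sets(entries, config):
    """Kernel sets of a 'cfg: cfg/k1/k2' line; a bare 'cfg: cfg' uses cfg."""
    parts = entries[config][0].split("/")
    return parts[1:] or [parts[0]]


def expand_family(entries, name):
    """Recursively expand a family into its leaf sub-configurations."""
    tokens = entries.get(name, [])
    # A leaf lists exactly one token whose first '/'-field is its own name.
    if len(tokens) == 1 and tokens[0].split("/")[0] == name:
        return [name]
    leaves = []
    for tok in tokens:
        for leaf in expand_family(entries, tok):
            if leaf not in leaves:  # 'generic' appears in both families
                leaves.append(leaf)
    return leaves


entries = parse_registry(REGISTRY)
print(expand_family(entries, "x86_64"))
print(kernel_sets(entries, "skx"), kernel_sets(entries, "a64fx"))
```

Under these assumptions, the commented-out `#arm64` family is ignored. A `x86_64` build would gather every Intel and AMD sub-configuration once, and `a64fx` would reuse the vector-length-agnostic `armsve` kernel set.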