mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
Details:
- Renamed the microkernels in kernels/zen/3 to kernels/haswell/3 and then updated the file contents to use the 'haswell' infix.
- Updated bli_cntx_init_zen.c and bli_cntx_init_haswell.c according to above function renames.
- Moved/updated the corresponding prototypes in bli_kernels_zen.h to bli_kernels_haswell.h.
- Updated config_registry according to above changes.
- NOTE: This rename reflects the fact that haswell microkernels are specifically written to overcome the floating-point latency for FMA instructions on Intel Haswell-like architectures, which can issue two FMA instructions per cycle. These ukernels happen to work fine on AMD Zen-based architectures. However, Zen only issues one FMA per cycle, which, while halving its floating-point throughput, gives it extra flexibility in the design of its microkernels--namely, mr and nr can be smaller and still overcome the floating-point latency for those single-issue cores. A smaller value of mr and nr allows for a larger value of kc, which may be useful in some situations. In the future, we may write such Zen-specific microkernels to take advantage of this additional flexibility.
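The latency argument in the NOTE can be illustrated with a rough back-of-the-envelope sketch. Everything numeric here is an assumption for illustration, not something stated in the commit message: an FMA latency of 5 cycles, vector registers holding 4 doubles, example tile sizes of 6x8 and 4x6, and a crude cache model in which one mr-by-kc and one kc-by-nr micropanel share a fixed byte budget. The function names are hypothetical and are not part of BLIS.

```python
def min_accumulators(fma_latency_cycles: int, fmas_per_cycle: int) -> int:
    """Independent FMA chains needed so that no FMA waits on a prior result."""
    return fma_latency_cycles * fmas_per_cycle


def covers_latency(mr: int, nr: int, vector_len: int,
                   fma_latency_cycles: int, fmas_per_cycle: int) -> bool:
    """True if an mr x nr microtile holds enough vector accumulators.

    Assumes the microtile is kept in registers as (mr * nr) / vector_len
    vector accumulators, each updated by its own chain of FMAs.
    """
    accumulators = (mr * nr) // vector_len
    return accumulators >= min_accumulators(fma_latency_cycles, fmas_per_cycle)


def max_kc(budget_bytes: int, mr: int, nr: int, elem_bytes: int) -> int:
    """Largest kc such that an mr x kc and a kc x nr micropanel fit the budget.

    This is a deliberately crude model (an assumption of this sketch).
    """
    return budget_bytes // ((mr + nr) * elem_bytes)


# Assumed figures: latency 5 cycles, 4 doubles per vector register.
dual_issue_needs = min_accumulators(5, 2)    # two FMAs per cycle
single_issue_needs = min_accumulators(5, 1)  # one FMA per cycle

print("dual-issue accumulators needed:", dual_issue_needs)
print("single-issue accumulators needed:", single_issue_needs)
print("4x6 tile enough for dual issue:", covers_latency(4, 6, 4, 5, 2))
print("4x6 tile enough for single issue:", covers_latency(4, 6, 4, 5, 1))
print("kc bound for 6x8:", max_kc(32 * 1024, 6, 8, 8))
print("kc bound for 4x6:", max_kc(32 * 1024, 4, 6, 8))
```

Under these assumptions the single-issue core needs half as many independent accumulators, so a smaller microtile still hides the latency, and the smaller tile leaves room for a larger kc within the same budget.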
42 lines
1008 B
Plaintext
#
# config_registry
#
# Please refer to the BLIS wiki on configurations for information on the
# syntax and semantics of this file [1].
#
# [1] https://github.com/flame/blis/wiki/ConfigurationHowTo
#

# Processor families.
x86_64: intel64 amd64
intel64: skx knl haswell sandybridge penryn generic
amd64: zen excavator steamroller piledriver bulldozer generic
arm64: cortexa57 generic
arm32: cortexa15 cortexa9 generic

# Intel architectures.
skx: skx/skx/haswell/zen
knl: knl/knl/haswell/zen
haswell: haswell/haswell/zen
sandybridge: sandybridge
penryn: penryn

# AMD architectures.
zen: zen/zen/haswell
excavator: excavator/piledriver
steamroller: steamroller/piledriver
piledriver: piledriver
bulldozer: bulldozer

# ARM architectures.
cortexa57: cortexa57/armv8a
cortexa53: cortexa53/armv8a
cortexa15: cortexa15/armv7a
cortexa9: cortexa9/armv7a

# IBM architectures.
bgq: bgq

# Generic architectures.
generic: generic
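The layout of the registry above can be read mechanically. The following is a minimal sketch of one way to parse it, assuming each line has the form `name: item item ...` and that an item is `subconfig` optionally followed by `/`-separated kernel sets; this is one reading of the file, and the BLIS wiki [1] remains the authority on the syntax. `parse_registry`, `expand`, and the abridged `SAMPLE` text are hypothetical names for illustration and are not part of the BLIS build system.

```python
# Abridged sample in the same format as the registry (not the full file).
SAMPLE = """\
# Processor families.
x86_64: intel64 amd64
intel64: haswell penryn generic
amd64: zen generic

haswell: haswell/haswell/zen
penryn: penryn
zen: zen/zen/haswell
generic: generic
"""


def parse_registry(text):
    """Map each name to a list of (subconfig, [kernel sets]) tuples."""
    registry = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and blank lines
        if not line:
            continue
        name, _, rhs = line.partition(":")
        entries = []
        for item in rhs.split():
            subconfig, *kernel_sets = item.split("/")
            # An empty list means no kernel set was named explicitly.
            entries.append((subconfig, kernel_sets))
        registry[name.strip()] = entries
    return registry


def expand(registry, name):
    """Resolve a family name to its leaf sub-configurations, in order."""
    entries = registry.get(name, [(name, [])])
    # Assumption: a line that lists only itself is a single configuration.
    if len(entries) == 1 and entries[0][0] == name:
        return [name]
    leaves = []
    for subconfig, _ in entries:
        for leaf in expand(registry, subconfig):
            if leaf not in leaves:
                leaves.append(leaf)
    return leaves


registry = parse_registry(SAMPLE)
print(registry["haswell"])
print(expand(registry, "x86_64"))
```

In this reading, `haswell: haswell/haswell/zen` names the `haswell` sub-configuration and pulls in both the `haswell` and `zen` kernel sets, which matches the rename described in the commit message.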