Commit Graph

8 Commits

Author SHA1 Message Date
RuQing Xu
c19db2ff82 Arm SVE Add ZGEMM 2Vx10 Unindexed 2021-10-08 12:13:07 +09:00
RuQing Xu
e13abde30b Arm SVE Add ZGEMM 2Vx7 Unindexed 2021-10-08 12:13:06 +09:00
RuQing Xu
49b9d7998e Arm SVE Add ZGEMM 2Vx8 Unindexed 2021-10-08 12:12:48 +09:00
Devin Matthews
80c5366e4a Move unused ARM SVE kernels to "old" directory. 2021-10-04 15:40:28 -05:00
Devin Matthews
13dbd5b5d3 Apply patch from @xrq-phys. 2021-10-02 16:08:05 -05:00
Devin Matthews
ae0eeeaf77 Add explicit handling for beta == 0 in armsve sd and armv7a d gemm ukrs. 2021-09-29 16:43:38 -05:00
RuQing Xu
61584deddf Added 512b SVE-based a64fx subconfig + SVE kernels.
Details:
- Added 512-bit specific 'a64fx' subconfiguration that uses empirically 
  tuned block size by Stepan Nassyr. This subconfig also sets the sector 
  cache size and enables memory-tagging code in SVE gemm kernels. This 
  subconfig utilizes (16, k) and (10, k) DPACKM kernels.
- Added a vector-length agnostic 'armsve' subconfiguration that computes
  blocksizes according to the analytical model. This part is ported from 
  Stepan Nassyr's repository.
- Implemented vector-length-agnostic [d/s/sh] gemm kernels for Arm SVE 
  at size (2*VL, 10). These kernels use unindexed FMLA instructions 
  because indexed FMLA takes 2 FMA units in many implementations.
  PS: There are indexed-FLMA kernels in Stepan Nassyr's repository.
- Implemented 512-bit SVE dpackm kernels with in-register transpose
  support for sizes (16, k) and (10, k).
- Extended 256-bit SVE dpackm kernels by Linaro Ltd. to 512-bit for 
  size (12, k). This dpackm kernel is not currently used by any 
  subconfiguration.
- Implemented several experimental dgemmsup kernels which would 
  improve performance in a few cases. However, those dgemmsup kernels 
  generally underperform hence they are not currently used in any 
  subconfig.
- Note: This commit squashes several commits submitted by RuQing Xu via
  PR #424.
2021-05-19 09:52:29 -05:00
Guodong Xu
f032d5d4a6 New kernel set for Arm SVE using assembly (#396)
Here adds two kernels for Arm SVE vector extensions.
1. a gemm  kernel for double at sizes 8x8.
2. a packm kernel for double at dimension 8xk.

To achive best performance, variable length agonostic programming
is not used. Vector length (VL) of 256 bits is mandated in both kernels.
Kernels to support other VLs can be added later.

"SVE is a vector extension for AArch64 execution mode for the A64
instruction set of the Armv8 architecture. Unlike other SIMD architectures,
SVE does not define the size of the vector registers, but constrains into
a range of possible values, from a minimum of 128 bits up to a maximum of
2048 in 128-bit wide units. Therefore, any CPU vendor can implement the
extension by choosing the vector register size that better suits the
workloads the CPU is targeting. Instructions are provided specifically
to query an implementation for its register size, to guarantee that
the applications can run on different implementations of the ISA without
the need to recompile the code."  [1]

[1] https://developer.arm.com/solutions/hpc/resources/hpc-white-papers/arm-scalable-vector-extensions-and-application-to-machine-learning

Signed-off-by: Guodong Xu <guodong.xu@linaro.org>
2020-04-29 12:08:46 -05:00