blis/kernels
RuQing Xu dfa5413966 Arm64 dgemmsup with extended MR&NR (#655)
Details:
- NEON has many vector registers, but each one is short (128 bits), so 
  this commit extends both MR and NR.
- The approach is to optionally hold the C microtile in registers by 
  columns. For sizes like 6x7m, the 'crr' kernel is therefore the 
  default, and 'rrr' is supported through an in-register transpose.
- A few asm kernels are crafted for 'rv' to complete this extended size 
  support.
- For 'rd', I still rely heavily on C99 intrinsic kernels with 
  branching, so performance may not be optimal. (Sorry for that.)
- So far, these changes only affect the 'firestorm' subconfig.
- This commit also contains row-preferential s12x8 and d8x6 gemm
  ukernels. These microkernels are templatized versions of the existing
  s8x12 and d6x8 ukernels defined in bli_gemm_armv8a_asm_d6x8.c.
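
The column-held microtile with an in-register transpose, as described in the details above, can be sketched roughly as follows. This is a minimal, portable illustration and not the code in this commit: the `v2d` type, `v2d_trn1`, `v2d_trn2`, and `store_tile_2x2` are hypothetical names, the 2-lane struct only stands in for a 128-bit NEON register of doubles, and the tile is shrunk to 2x2 to keep it short.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one 128-bit register holding two doubles. */
typedef struct { double lane[2]; } v2d;

/* Emulate a 2-lane transpose pair: trn1 picks the low lanes of both
   inputs, trn2 picks the high lanes. */
static v2d v2d_trn1(v2d a, v2d b) { v2d r = {{ a.lane[0], b.lane[0] }}; return r; }
static v2d v2d_trn2(v2d a, v2d b) { v2d r = {{ a.lane[1], b.lane[1] }}; return r; }

/* Store a 2x2 tile of C that is held column-wise in registers
   (col0, col1) to memory with row stride rs_c and column stride cs_c. */
void store_tile_2x2(v2d col0, v2d col1, double *c, size_t rs_c, size_t cs_c)
{
    assert(c != NULL);

    if (cs_c == 1) {
        /* Row-stored C: transpose in registers so that each register
           holds one row, then write each row contiguously. */
        v2d row0 = v2d_trn1(col0, col1);
        v2d row1 = v2d_trn2(col0, col1);
        c[0 * rs_c + 0] = row0.lane[0];
        c[0 * rs_c + 1] = row0.lane[1];
        c[1 * rs_c + 0] = row1.lane[0];
        c[1 * rs_c + 1] = row1.lane[1];
    } else {
        /* Column-stored (or general-stride) C: the registers already
           hold columns, so no transpose is needed. */
        for (size_t i = 0; i < 2; ++i) {
            c[i * rs_c + 0 * cs_c] = col0.lane[i];
            c[i * rs_c + 1 * cs_c] = col1.lane[i];
        }
    }
}
```

Calling `store_tile_2x2` with `rs_c = 2, cs_c = 1` takes the transpose path, while `rs_c = 1, cs_c = 2` writes the columns directly. The point of the design is that one accumulation layout can serve both output storage formats, at the cost of a few extra shuffle instructions on the transposed path.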
2022-08-29 19:07:50 -05:00