Files
blis/kernels
RuQing Xu 916e1fa8be Armv8A Rename Regs for Clang Compile: FP64 Part
- x7, x8: Used to store address for Alpha and Beta.
  As Alpha & Beta was not used in k-loops, use x0, x1 to load
  Alpha & Beta's addresses after k-loops are completed, since A & B's
  addresses are no longer needed there.
  This "ldr [addr]; -> ldr val, [addr]" would not cause much performance
  drawback since it is done outside k-loops and there are plenty of
  instructions between Alpha & Beta's loading and usage.
- x9: Used to store cs_c. x9 is multiplied by 8 into x10 and not used
  any longer. Directly loading cs_c and into x10 and scale by 8 spares
  x9 straightforwardly.
- x11, x12: Not used at all. Simply remove from clobber list.
- x13: Alike x9, loaded and scaled by 8 into x14, except that x13 is
  also used in a conditional branch so that "cmp x13, #1" needs to be
  modified into "cmp x14, #8" to completely free x13.
- x3, x4: Used to store next_a & next_b. Untouched in k-loops. Load
  these addresses into x0 and x1 after Alpha & Beta are both loaded,
  since then neigher address of A/B nor address of Alpha/Beta is needed.
2021-05-29 16:46:52 +09:00
..
2020-09-29 16:52:18 -05:00