blis/testsuite/input.general
Nicholai Tukanov b426f9e04e POWER9 DGEMM (#355)
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel 
  assumes that elements of B have been duplicated/broadcast during the
  packing step. The microkernel uses a column orientation for its 
  microtile vector registers and thus implements column storage and 
  general stride IO cases. (A row storage IO case via in-register
  transposition may be added at a future date.) It should be noted that 
  we recommend using this microkernel with gcc and *not* xlc, as issues 
  with the latter cropped up during development, including but not 
  limited to slightly incompatible vector register mnemonics in the GNU 
  extended inline assembly clobber list.
2019-11-01 17:57:03 -05:00


# ----------------------------------------------------------------------
#
# input.general
# BLIS test suite
#
# This file contains input values that control how BLIS operations are
# tested. Comments explain the purpose of each parameter as well as
# accepted values.
#
3 # Number of repeats per experiment (best result is reported)
c # Matrix storage scheme(s) to test:
# 'c' = col-major storage; 'g' = general stride storage;
# 'r' = row-major storage
cj # Vector storage scheme(s) to test:
# 'c' = colvec / unit stride; 'j' = colvec / non-unit stride;
# 'r' = rowvec / unit stride; 'i' = rowvec / non-unit stride
0 # Test all combinations of storage schemes?
1 # Perform all tests with alignment?
# '0' = do NOT align buffers/ldims; '1' = align buffers/ldims
0 # Randomize vectors and matrices using:
# '0' = real values on [-1,1];
# '1' = powers of 2 in narrow precision range
32 # General stride spacing (for cases when testing general stride)
d # Datatype(s) to test:
# 's' = single real; 'c' = single complex;
# 'd' = double real; 'z' = double complex
0 # Test gemm with mixed-domain operands?
0 # Test gemm with mixed-precision operands?
2000 # Problem size: first to test
2000 # Problem size: maximum to test
200 # Problem size: increment between experiments
# Complex level-3 implementations to test:
0 # 3mh ('1' = enable; '0' = disable)
0 # 3m1 ('1' = enable; '0' = disable)
0 # 4mh ('1' = enable; '0' = disable)
0 # 4m1b ('1' = enable; '0' = disable)
0 # 4m1a ('1' = enable; '0' = disable)
1 # 1m ('1' = enable; '0' = disable)
1 # native ('1' = enable; '0' = disable)
1 # Simulate application-level threading:
# '1' = disable / use one testsuite thread;
# 'n' = enable and use n testsuite threads
1 # Error-checking level:
# '0' = disable error checking; '1' = full error checking
i # Reaction to test failure:
# 'i' = ignore; 's' = sleep() and continue; 'a' = abort
1 # Output results in matlab/octave format? ('1' = yes; '0' = no)
0 # Output results to stdout AND files? ('1' = yes; '0' = no)
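
The layout above (one value per line, followed by a `#` comment, with comment-only lines in between) can be read positionally. The following is a minimal Python sketch of that idea, not the test suite's own parser: the function names `parse_general_input` and `problem_sizes` are hypothetical, and the assumption that problem sizes run from the first value to the maximum inclusive, in steps of the increment, is ours.

```python
# Illustrative sketch only; not part of BLIS. All names here are hypothetical.

SAMPLE = """\
# input.general
3 # Number of repeats per experiment (best result is reported)
c # Matrix storage scheme(s) to test:
# 'c' = col-major storage; 'g' = general stride storage;
cj # Vector storage scheme(s) to test:
2000 # Problem size: first to test
2000 # Problem size: maximum to test
200 # Problem size: increment between experiments
"""


def parse_general_input(text):
    """Return the value tokens of an input.general-style file, in file order."""
    values = []
    for line in text.splitlines():
        stripped = line.strip()
        # Skip blank lines and lines that consist only of a comment.
        if not stripped or stripped.startswith("#"):
            continue
        # The value is whatever precedes the first '#' on the line.
        values.append(stripped.split("#", 1)[0].strip())
    return values


def problem_sizes(first, maximum, increment):
    """Assumed semantics: first, first + increment, ... up to maximum inclusive."""
    return list(range(first, maximum + 1, increment))


if __name__ == "__main__":
    vals = parse_general_input(SAMPLE)
    print(vals)
    first, maximum, increment = (int(v) for v in vals[-3:])
    print(problem_sizes(first, maximum, increment))
```

Under that assumed inclusive-range reading, the settings in this file (first 2000, maximum 2000, increment 200) yield a single problem size of 2000 per experiment.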