Mirror of https://github.com/amd/blis.git (synced 2026-05-11 17:50:00 +00:00)
Details:
- Added "4mh" and "3mh" APIs, which implement the 4m and 3m methods at high levels, respectively. APIs for trmm and trsm were NOT added because these operations are inherently incompatible with implementing 4m or 3m at high levels (the input right-hand side matrix is overwritten).
- Added 4mh and 3mh virtual micro-kernels, and updated the existing 4m and 3m micro-kernels so that all are stylistically consistent.
- Added new "rih" packing kernels (both low-level and structure-aware) to support both 4mh and 3mh.
- Defined new pack_t schemas to support real-only, imaginary-only, and real+imaginary packing formats.
- Added various level0 scalar macros to support the rih packm kernels.
- Minor tweaks to trmm macro-kernels to facilitate 4mh and 3mh.
- Added the ability to enable/disable 4mh, 3m, and 3mh, and adjusted level-3 front-ends to check enabledness of 3mh, 3m, 4mh, and 4m (in that order) and execute the first one that is enabled, or the native implementation if none are enabled.
- Added implementation query functions for each level-3 operation so that the user can query a string that describes the implementation that is currently enabled.
- Updated test suite to output implementation types for each level-3 operation, as well as micro-kernel types for each of the five micro-kernels.
- Renamed BLIS_ENABLE_?COMPLEX_VIA_4M macros to _ENABLE_VIRTUAL_?COMPLEX.
- Fixed an obscure bug when packing Hermitian matrices (regular packing type) whereby the diagonal elements of the packed micro-panels could get tainted if the source matrix's imaginary diagonal part contained garbage.
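The selection order described in the commit message (3mh, 3m, 4mh, 4m, then native) can be illustrated with a minimal Python sketch. This is not BLIS code: the function name `select_implementation` and the dictionary of flags are hypothetical, and only the priority order is taken from the text above.

```python
# Priority order as stated in the commit message: the first enabled
# method wins, and the native implementation is the fallback.
PRIORITY = ("3mh", "3m", "4mh", "4m")


def select_implementation(enabled):
    """Return the name of the first enabled method, or "native".

    `enabled` is a hypothetical mapping from method name to bool;
    methods missing from the mapping are treated as disabled.
    """
    for method in PRIORITY:
        if enabled.get(method, False):
            return method
    return "native"


if __name__ == "__main__":
    # Flags mirroring the 0/0/0/1 settings in the input file on this page.
    flags = {"3mh": False, "3m": False, "4mh": False, "4m": True}
    print(select_implementation(flags))
```

With the flags shown, only 4m is enabled, so that is the method the sketch selects. Enabling 3m as well would make 3m take precedence, since it is checked earlier.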
37 lines
1.6 KiB
Plaintext
# ----------------------------------------------------------------------
#
# input.general
# BLIS test suite
#
# This file contains input values that control how BLIS operations are
# tested. Comments explain the purpose of each parameter as well as
# accepted values.
#

1       # Number of repeats per experiment (best result is reported)
c       # Matrix storage scheme(s) to test:
        #   'c' = col-major storage; 'g' = general stride storage;
        #   'r' = row-major storage
c       # Vector storage scheme(s) to test:
        #   'c' = colvec / unit stride; 'j' = colvec / non-unit stride;
        #   'r' = rowvec / unit stride; 'i' = rowvec / non-unit stride
0       # Test all combinations of storage schemes?
32      # General stride spacing (for cases when testing general stride)
sdcz    # Datatype(s) to test:
        #   's' = single real; 'c' = single complex;
        #   'd' = double real; 'z' = double complex
100     # Problem size: first to test
300     # Problem size: maximum to test
100     # Problem size: increment between experiments
        # Complex level-3 implementations
0       #   3mh ('1' = enable; '0' = disable)
0       #   3m  ('1' = enable; '0' = disable)
0       #   4mh ('1' = enable; '0' = disable)
1       #   4m  ('1' = enable; '0' = disable)
1       # Error-checking level:
        #   '0' = disable error checking; '1' = full error checking
i       # Reaction to test failure:
        #   'i' = ignore; 's' = sleep() and continue; 'a' = abort
0       # Output results in matlab/octave format? ('1' = yes; '0' = no)
0       # Output results to stdout AND files? ('1' = yes; '0' = no)
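To show how a positional file of this shape can be read, here is a minimal Python sketch. It is not the BLIS test suite's own parser. It assumes each value is the text before the first `#` on a line, that lines with nothing before the `#` are pure comments, and that the maximum problem size is inclusive. The field names in `FIELDS` are hypothetical labels chosen for illustration.

```python
# Hypothetical labels for the values, in the order they appear in the file.
FIELDS = [
    "n_repeats", "matrix_storage", "vector_storage", "mix_all_storage",
    "general_stride_spacing", "datatypes", "p_first", "p_max", "p_inc",
    "enable_3mh", "enable_3m", "enable_4mh", "enable_4m",
    "error_checking_level", "reaction_to_failure",
    "output_matlab_format", "output_files",
]

# Abbreviated copy of the file shown above (comments shortened).
SAMPLE = """\
# input.general (abbreviated)
1       # Number of repeats per experiment
c       # Matrix storage scheme(s) to test
        #   'c' = col-major storage
c       # Vector storage scheme(s) to test
0       # Test all combinations of storage schemes?
32      # General stride spacing
sdcz    # Datatype(s) to test
100     # Problem size: first to test
300     # Problem size: maximum to test
100     # Problem size: increment between experiments
        # Complex level-3 implementations
0       #   3mh
0       #   3m
0       #   4mh
1       #   4m
1       # Error-checking level
i       # Reaction to test failure
0       # Output results in matlab/octave format?
0       # Output results to stdout AND files?
"""


def parse_input_general(text):
    """Map each positional value to its (hypothetical) field name."""
    values = []
    for line in text.splitlines():
        # Keep only the text before the first '#'; skip comment-only lines.
        value = line.split("#", 1)[0].strip()
        if value:
            values.append(value)
    if len(values) != len(FIELDS):
        raise ValueError(f"expected {len(FIELDS)} values, got {len(values)}")
    return dict(zip(FIELDS, values))


def problem_sizes(params):
    """List the problem sizes, assuming the maximum is inclusive."""
    first, last, step = (int(params[k]) for k in ("p_first", "p_max", "p_inc"))
    return list(range(first, last + 1, step))


if __name__ == "__main__":
    params = parse_input_general(SAMPLE)
    print(params["datatypes"], problem_sizes(params))
```

Under these assumptions, the settings above describe three experiments per operation and datatype, at problem sizes 100, 200, and 300.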