mirror of
https://github.com/amd/blis.git
synced 2026-04-20 15:48:50 +00:00
Implemented and registered power9 dgemm ukernel.

Details:
- Implemented 12x6 dgemm microkernel for power9. This microkernel assumes that elements of B have been duplicated/broadcast during the packing step. The microkernel uses a column orientation for its microtile vector registers and thus implements column storage and general stride IO cases. (A row storage IO case via in-register transposition may be added at a future date.) It should be noted that we recommend using this microkernel with gcc and *not* xlc, as issues with the latter cropped up during development, including but not limited to slightly incompatible vector register mnemonics in the GNU extended inline assembly clobber list.
36 lines
642 B
Bash
Executable File
#!/bin/bash

exec_root="test"
out_root="output"
#out_root="output_square"

# Operations to test.
# l2_ops="gemv ger hemv her her2 trmv trsv"
l3_ops="gemm"
# "hemm herk her2k trmm trsm"
test_ops=" ${l3_ops}"
# "${l2_ops}"

# Implementations to test | "openblas atlas mkl"
test_impls="blis"

for im in ${test_impls}; do

	for op in ${test_ops}; do

		# Construct the name of the test executable.
		exec_name="${exec_root}_${op}_${im}.x"

		# Construct the name of the output file.
		out_file="${out_root}_${op}_${im}.m"

		echo " Running ${exec_name} > ${out_file} "

		# Run executable.
		"./${exec_name}" > "${out_file}"

		sleep 1

	done
done
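
The script above runs each driver unconditionally, so a driver that was never built yields a "command not found" error and an empty output file. A minimal sketch of a guarded variant follows. It is not part of the original script: the helper names `exec_name_for`, `out_file_for`, and `run_one` are hypothetical and chosen only for illustration, and the naming scheme (`<root>_<op>_<impl>.x` and `.m`) is the one the script already uses.

```shell
#!/bin/bash

# Hypothetical helpers (not in the original script): build the file
# names from a root, an operation, and an implementation.
exec_name_for() { printf '%s_%s_%s.x' "$1" "$2" "$3"; }
out_file_for()  { printf '%s_%s_%s.m' "$1" "$2" "$3"; }

# Hypothetical helper: run one test driver from the current directory,
# redirecting its stdout to the given output file. Returns non-zero
# without creating the output file if the driver is missing or is not
# executable.
run_one() {
	local exec_name="$1" out_file="$2"

	if [ ! -x "./${exec_name}" ]; then
		echo " Skipping ${exec_name} (not found or not executable)" >&2
		return 1
	fi

	echo " Running ${exec_name} > ${out_file} "
	"./${exec_name}" > "${out_file}"
}
```

Inside the inner loop, the run step could then read `run_one "${exec_name}" "${out_file}" || continue`, which skips the `sleep 1` for drivers that did not run.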