Details:
- In FP32 GEMM, when threading is disabled, rntm_pack_a and rntm_pack_b
were set to true by default. This leads to perf regression for smaller
sizes. Modified FP32 interface API to not overwrite the packA and
packB variables in rntm structure.
- In FP32 GEMV, Removed the decision making code based on mtag_A/B
and should_pack_A/B for packing. Matrices will be packed only
if the storage format of the matrices doesn't match the storage
format required by the kernel.
- Changed the control flow of checking the value of mtag to whether
matrix is "reordered" or "to-be-packed" or "unpacked". checking
for "reorder" first, followed by "pack". This will ensure that
packing doesn't happen when the matrix is already reordered even
though user forces packing by setting "BLIS_PACK_A/B"
-Modified python script to generate testcases based on block sizes
AMD-Internal: SWLCSG-3527