mirror of
https://github.com/amd/blis.git
synced 2026-05-12 01:59:59 +00:00
Details:
- In FP32 GEMM, when threading is disabled, rntm_pack_a and rntm_pack_b were set to true by default, which caused a performance regression for smaller sizes. Modified the FP32 interface API so that it no longer overwrites the packA and packB variables in the rntm structure.
- In FP32 GEMV, removed the decision-making code that chose packing based on mtag_A/B and should_pack_A/B. A matrix is now packed only if its storage format does not match the storage format required by the kernel.
- Changed the control flow that checks the value of mtag to determine whether a matrix is "reordered", "to-be-packed", or "unpacked". The check for "reorder" now comes first, followed by "pack". This ensures that packing does not happen when the matrix is already reordered, even if the user forces packing by setting "BLIS_PACK_A/B".
- Modified the Python script to generate test cases based on block sizes.

AMD-Internal: SWLCSG-3527
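
The check order described above can be sketched roughly as follows. This is an illustration only, not code from the library: the enum values and the functions resolve_tag and gemv_needs_pack are hypothetical names chosen for this sketch, and the real mtag values and call sites may differ.

```c
#include <stdbool.h>

/* Hypothetical tag values, for illustration only. */
typedef enum { TAG_UNPACKED, TAG_PACK, TAG_REORDERED } mat_tag_t;

/* Hypothetical storage formats, for illustration only. */
typedef enum { STORE_ROW, STORE_COL } storage_t;

/*
 * Resolve how a matrix should be handled.
 * "Reordered" is tested first, so a matrix that is already reordered
 * is never packed again, even when the user forces packing.
 */
mat_tag_t resolve_tag(bool already_reordered, bool force_pack)
{
    if (already_reordered) {
        return TAG_REORDERED;   /* takes priority over a forced pack */
    }
    if (force_pack) {
        return TAG_PACK;        /* user asked for packing */
    }
    return TAG_UNPACKED;        /* use the matrix as it is */
}

/*
 * GEMV rule from the notes above: pack only when the matrix storage
 * format differs from the format the kernel requires.
 */
bool gemv_needs_pack(storage_t actual, storage_t required_by_kernel)
{
    return actual != required_by_kernel;
}
```

With this ordering, resolve_tag(true, true) yields TAG_REORDERED, which corresponds to the case where the user forces packing on a matrix that is already reordered.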