Mirror of https://github.com/amd/blis.git
- In the initial patch, for m and n that were not multiples of MR and NR, respectively, we fell back to calling bli_dgemm_ker_var2. The macro-kernel now handles these fringe cases as well.
- Replaced the RBP register with R11 in the macro-kernel.
- Retuned MC, KC and NC for these changes.

This results in better performance for matrix sizes of m = 4000 or greater when running on a single thread.

AMD-Internal: [CPUPL-5262]
Change-Id: I66c111ceb7feee776703339680d57e8d6d5c809a
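For illustration only (this is not the code from this patch): when m is not a multiple of MR or n is not a multiple of NR, the last row or column panel of C produces micro-tiles smaller than MR x NR, and these are the fringe cases that previously went to bli_dgemm_ker_var2. The sketch below assumes hypothetical values MR = 6 and NR = 8 and a hypothetical helper `count_tiles`. It walks an m x n block the way a macro-kernel does and counts full tiles versus fringe tiles.

```c
#include <assert.h>

/* Hypothetical register-block sizes chosen for illustration only;
   they are NOT the values used by the kernel in this commit. */
#define MR 6
#define NR 8

/* Hypothetical helper: walks an m x n block of C in MR x NR micro-tiles,
   the way a macro-kernel does, and counts full tiles versus fringe tiles
   (tiles whose height mr < MR or width nr < NR). */
static void count_tiles(int m, int n, int *full, int *fringe)
{
    assert(m >= 0 && n >= 0);
    *full = 0;
    *fringe = 0;
    for (int j = 0; j < n; j += NR) {
        /* The last column panel is narrower when n % NR != 0. */
        int nr = (n - j < NR) ? n - j : NR;
        for (int i = 0; i < m; i += MR) {
            /* The last row panel is shorter when m % MR != 0. */
            int mr = (m - i < MR) ? m - i : MR;
            if (mr == MR && nr == NR)
                ++*full;   /* full-size micro-kernel path */
            else
                ++*fringe; /* fringe path: must not touch C beyond mr x nr */
        }
    }
}
```

With these assumed sizes, m = n = 4000 gives 333000 full tiles and 500 fringe tiles, because 4000 is a multiple of 8 but not of 6. Only the last row panel is a fringe panel.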