mirror of
https://github.com/amd/blis.git
synced 2026-04-19 23:28:52 +00:00
Previously, the ZGEMM implementation used `zscalv` for cases
where the M dimension of matrix A is not a multiple of 24,
resulting in a ~40% performance drop.
This commit introduces specialized edge-case handling in the
pack kernel to optimize performance for these cases.
The new packing support significantly improves performance
when M is not a multiple of 24.
- Removed reliance on `zscalv` for edge cases, addressing the
performance bottleneck.
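
The idea of handling the leftover rows inside the packing routine can be
sketched as follows. This is a minimal illustration only, not the kernel
added by this commit: `pack_a_panels`, `packed_len`, and `PANEL_ROWS` are
hypothetical names, the panel height of 24 is taken from the "multiple of
24" wording above, and zero-padding the last panel is an assumption about
how the edge is filled.

```c
#include <assert.h>
#include <complex.h>
#include <stddef.h>
#include <string.h>

/* Assumed panel height, taken from the "multiple of 24" wording. */
#define PANEL_ROWS 24

/* Number of elements the packed buffer must hold (hypothetical helper). */
static size_t packed_len(size_t m, size_t k)
{
    return ((m + PANEL_ROWS - 1) / PANEL_ROWS) * PANEL_ROWS * k;
}

/* Pack an m x k column-major matrix `a` (leading dimension lda) into
 * contiguous panels of PANEL_ROWS rows. Hypothetical sketch, not the
 * BLIS packing API. */
static void pack_a_panels(const double complex *a, size_t m, size_t k,
                          size_t lda, double complex *packed)
{
    assert(lda >= m);
    for (size_t i0 = 0; i0 < m; i0 += PANEL_ROWS) {
        /* Full panels copy PANEL_ROWS rows; the last one may be short. */
        size_t rows = (m - i0 < PANEL_ROWS) ? (m - i0) : PANEL_ROWS;
        for (size_t j = 0; j < k; ++j) {
            const double complex *src = a + i0 + j * lda;
            memcpy(packed, src, rows * sizeof *packed);
            /* Edge case: zero-fill the rows the short panel lacks, so the
             * compute kernel can always read a full panel. */
            for (size_t r = rows; r < PANEL_ROWS; ++r)
                packed[r] = 0.0;
            packed += PANEL_ROWS;
        }
    }
}
```

With M = 26, for example, this sketch produces one full panel and one
panel holding 2 real rows followed by 22 zero rows per column, all within
the same packing loop.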
AMD-Internal: [CPUPL-6677]
Co-authored-by: harsh dave <harsdave@amd.com>