mirror of
https://github.com/amd/blis.git
synced 2026-05-11 17:50:00 +00:00
* Optimize ZGEMM Packing Kernel for M-Dimension Edge Cases (cdim0 1–11)
- Introduced specialized AVX-512 assembly paths for the cdim0 edge cases (1–11), replacing the inefficient zscalv fallback.
- Refactored the cdim0 == mnr check into a switch statement on cdim0 to support multiple optimized cases.
- Added three new macros for packing column-stored A, each with a distinct masking pattern.
- Implemented 11 dedicated handlers for packing row- and column-stored A matrices,
using masked loads/stores to handle the partial data.
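The masking idea in the list above can be illustrated with a portable scalar sketch. It makes several assumptions: the packed panel has 12 rows (mnr = 12) with leading dimension 12 and is zero-padded below cdim0. One column of 12 dcomplex values spans 24 doubles, which is three 8-double zmm registers. The "three distinct masking patterns" are read here as whether the valid data ends in the first, second, or third register. This is not a description of the actual macros in bli_packm_zen4_asm_z12xk.c. The helper names (edge_masks, maskz_load8, pack_12xk_edge) are hypothetical, and maskz_load8 only emulates the zero-masking behaviour of the _mm512_maskz_loadu_pd intrinsic in plain C.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical sketch, not the BLIS kernel. One 12-row dcomplex column is
 * 12 * 2 = 24 doubles, i.e. three 512-bit registers of 8 doubles each. */
#define MNR         12
#define DBL_PER_ZMM 8
#define NUM_ZMM     3

/* Build one 8-bit lane mask per register (like an __mmask8) for cdim0 rows. */
static void edge_masks(int cdim0, uint8_t masks[NUM_ZMM])
{
    assert(cdim0 >= 1 && cdim0 <= MNR);
    int ndoubles = 2 * cdim0; /* interleaved real/imag values that are valid */
    for (int r = 0; r < NUM_ZMM; ++r) {
        int lanes = ndoubles - r * DBL_PER_ZMM;
        if (lanes >= DBL_PER_ZMM)
            masks[r] = 0xFF;                         /* register fully valid */
        else if (lanes <= 0)
            masks[r] = 0x00;                         /* register all padding */
        else
            masks[r] = (uint8_t)((1u << lanes) - 1u); /* partial register */
    }
}

/* Scalar emulation of a zero-masked 8-double load: lanes whose mask bit is
 * clear are never read from src and are set to 0.0. */
static void maskz_load8(uint8_t mask, const double *src, double dst[DBL_PER_ZMM])
{
    for (int i = 0; i < DBL_PER_ZMM; ++i)
        dst[i] = ((mask >> i) & 1u) ? src[i] : 0.0;
}

/* Pack a cdim0 x k column-stored complex block a (interleaved real/imag,
 * column stride lda in complex elements) into p, a 12 x k panel with leading
 * dimension MNR, scaling by kappa and zero-padding rows cdim0..11. */
static void pack_12xk_edge(int cdim0, int k, double kappa_r, double kappa_i,
                           const double *a, size_t lda, double *p)
{
    uint8_t masks[NUM_ZMM];
    edge_masks(cdim0, masks);
    for (int j = 0; j < k; ++j) {
        const double *col = a + 2 * (size_t)j * lda;
        double *dst = p + 2 * (size_t)j * MNR;
        for (int r = 0; r < NUM_ZMM; ++r) {
            double reg[DBL_PER_ZMM];
            maskz_load8(masks[r], col + r * DBL_PER_ZMM, reg);
            /* Multiply each (real, imag) pair in the register by kappa,
             * then store the full register into the packed panel. */
            for (int e = 0; e < DBL_PER_ZMM; e += 2) {
                double xr = reg[e], xi = reg[e + 1];
                dst[r * DBL_PER_ZMM + e]     = kappa_r * xr - kappa_i * xi;
                dst[r * DBL_PER_ZMM + e + 1] = kappa_r * xi + kappa_i * xr;
            }
        }
    }
}
```

Under this reading, cdim0 = 1..4 uses only the first register, 5..8 the first two, and 9..11 all three with a partial mask on the last one. Each group would then need its own mask combination, unlike the single full-width path taken when cdim0 == mnr.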
AMD-Internal: [CPUPL-6677]
Co-authored-by: harsh dave <harsdave@amd.com>
* Update bli_packm_zen4_asm_z12xk.c
---------
Co-authored-by: harsh dave <harsdave@amd.com>
Co-authored-by: Sharma, Shubham <Shubham.Sharma3@amd.com>