blis/kernels
Dave, Harsh b88bea6e72 Optimize ZGEMM Packing Kernel for M-Dimension Edge Cases (cdim0 1–11) (#135)
* Optimize ZGEMM Packing Kernel for M-Dimension Edge Cases (cdim0 1–11)

- Introduced specialized AVX-512 assembly paths for the cdim0 edge cases (1–11), replacing the inefficient zscalv fallback.
- Refactored the cdim0 == mnr condition into a switch statement so that each edge case can dispatch to its own optimized path.
- Added three new macros for column-stored packing, each with a distinct masking pattern.
- Implemented 11 dedicated handlers for row-stored and column-stored A matrix packing,
  using masked loads/stores to handle partial data efficiently.
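
As a rough illustration of the masking idea in the list above, here is a portable C sketch. It is not the kernel's code. It assumes a panel height of 12 double-complex elements (suggested by the z12xk file name), four such elements per 512-bit register, and zero padding of the unused rows. The names `lane_mask` and `pack_edge_panel` are hypothetical, and kappa scaling and conjugation are left out.

```c
#include <assert.h>
#include <stddef.h>

#define MR 12           /* panel height in complex elements (assumed from "z12xk") */
#define ELEMS_PER_VEC 4 /* double-complex elements per 512-bit register */

/* Hypothetical helper: bit mask over the 8 doubles of register `vec`
   (0, 1 or 2) for a panel that has only cdim0 valid rows. */
static unsigned lane_mask(int cdim0, int vec)
{
    int n = cdim0 - vec * ELEMS_PER_VEC;      /* valid elements in this register */
    if (n < 0) n = 0;
    if (n > ELEMS_PER_VEC) n = ELEMS_PER_VEC;
    return (1u << (2 * n)) - 1u;              /* two doubles per complex element */
}

/* Hypothetical scalar emulation of a masked pack: copies k columns of a
   column-stored matrix (interleaved real/imag doubles, leading dimension
   lda in complex elements) into an MR-row panel, zero-filling masked lanes. */
static void pack_edge_panel(int cdim0, int k, const double *a, int lda, double *p)
{
    assert(cdim0 >= 0 && cdim0 <= MR);
    for (int j = 0; j < k; ++j) {
        const double *src = a + (size_t)2 * j * lda;
        double *dst = p + (size_t)2 * j * MR;
        for (int vec = 0; vec < MR / ELEMS_PER_VEC; ++vec) {
            unsigned mask = lane_mask(cdim0, vec);
            for (int lane = 0; lane < 2 * ELEMS_PER_VEC; ++lane) {
                int idx = vec * 2 * ELEMS_PER_VEC + lane;
                /* Source is read only where the mask bit is set, so the
                   copy never touches memory past the last valid row. */
                dst[idx] = ((mask >> lane) & 1u) ? src[idx] : 0.0;
            }
        }
    }
}
```

With cdim0 = 5, for example, the three per-register masks come out as 0xFF, 0x03 and 0x00. A vectorized path would presumably load such values into mask registers and issue masked loads and stores instead of running the inner scalar loop.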

    AMD-Internal: [CPUPL-6677]

Co-authored-by: harsh dave <harsdave@amd.com>

* Update bli_packm_zen4_asm_z12xk.c

---------

Co-authored-by: harsh dave <harsdave@amd.com>
Co-authored-by: Sharma, Shubham <Shubham.Sharma3@amd.com>
2025-08-18 12:38:45 +05:30