Mirror of https://github.com/amd/blis.git, synced 2026-04-19 23:28:52 +00:00
Optimize ZGEMM Packing Kernel for M-Dimension Edge Cases (cdim0 1–11) (#135)
* Optimize ZGEMM Packing Kernel for M-Dimension Edge Cases (cdim0 1–11)
- Introduced specialized AVX-512 assembly paths for cdim0 edge cases (1–11), replacing the inefficient zscalv fallback.
- Refactored cdim0 == mnr condition into a switch statement to support multiple optimized cases.
- Added three new macros for column-stored packing with distinct masking patterns.
- Implemented 11 dedicated handlers for packing row- and column-stored A matrices, using efficient masked loads/stores for partial data.
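This is a rough illustration of why cdim0 values 1–11 need different masking patterns. A 512-bit register holds 8 doubles, which is 4 double-complex elements, so one 12-element column of the packed panel fits in three registers. The sketch below assumes that layout, because the commit message does not describe the register allocation. For a given cdim0, it computes the 8-bit load/store mask that each of the three registers would need, using two mask bits per complex element. The helper `zpack_col_masks` is hypothetical and is not part of BLIS.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not a BLIS API). Assumes one column of the 12-row
 * packed panel is held in three 512-bit registers of 4 dcomplex each.
 * Computes an __mmask8-style mask per register for cdim0 valid rows.
 * Each dcomplex occupies two double lanes, so it needs two mask bits. */
static void zpack_col_masks(int cdim0, uint8_t masks[3])
{
    assert(cdim0 >= 1 && cdim0 <= 12);

    for (int r = 0; r < 3; ++r) {
        int n = cdim0 - 4 * r;   /* valid elements remaining for register r */
        if (n < 0) n = 0;        /* register entirely beyond cdim0 */
        if (n > 4) n = 4;        /* register completely filled */
        masks[r] = (uint8_t)((1u << (2 * n)) - 1u);  /* n == 4 gives 0xff */
    }
}
```

For cdim0 = 5, the masks come out as 0xff, 0x03 and 0x00. The first register is full, the second holds one element, and the third is unused. cdim0 = 12, the former cdim0 == mnr case, gives three full masks.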
AMD-Internal: [CPUPL-6677]
Co-authored-by: harsh dave <harsdave@amd.com>
* Update bli_packm_zen4_asm_z12xk.c
---------
Co-authored-by: harsh dave <harsdave@amd.com>
Co-authored-by: Sharma, Shubham <Shubham.Sharma3@amd.com>
@@ -246,6 +246,10 @@ void bli_zpackm_zen4_asm_12xk
 const uint64_t lda = lda0;
 const uint64_t ldp = ldp0;
 
+// Note: k_left is currently initialized as k % 4, which ensures safe mask calculation.
+// Be cautious if modifying this logic in the future (e.g., using k % by other large values),
+// as (k_left * 2) may overflow when used in bit shifts, potentially causing undefined behavior
+// or incorrect masks for uint8_t. Ensure k_left remains within a safe range (e.g., < 128
 uint8_t mask = ((1 << (k_left*2)) - 1);
 if (mask == 0) mask = 0xff;
 
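The mask logic in this hunk can be checked in isolation. The sketch below assumes k_left = k % 4, as the added comment states, and copies the two mask lines into a standalone function. The name `k_tail_mask` is hypothetical and is not a BLIS function. With k_left in 0..3, the shift is at most 6 bits, so the result fits in uint8_t. `mask` is zero only when k_left == 0, and the code replaces that case with the all-ones mask 0xff.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical standalone copy of the mask computation in the hunk above.
 * Assumes k_left = k % 4, as the added comment states. */
static uint8_t k_tail_mask(uint64_t k)
{
    uint64_t k_left = k % 4;

    /* With k_left <= 3 the shift count is at most 6, well inside the
     * width of int, and the resulting mask fits in 8 bits. */
    assert(k_left * 2 < 8);

    uint8_t mask = (uint8_t)((1 << (k_left * 2)) - 1);
    if (mask == 0)      /* happens only when k_left == 0 */
        mask = 0xff;
    return mask;
}
```

Suppose k_left were allowed to reach 4 or more. The low 8 bits of `(1 << (k_left * 2)) - 1` would then all be ones, and truncation to uint8_t would silently yield 0xff. For a 32-bit int, a shift count of 31 or more is undefined behavior. This is the hazard the added comment describes.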