Packed A matrix stride update to account for fringe cases.

- When the A matrix is packed, it is packed in blocks of MRxKC to form a complete packed MCxKC block. If m is not a multiple of MR, the remaining m % MR rows are packed differently from the full MR blocks. As a result, the strides of the packed MR blocks and of the m % MR block differ, and the strides must be updated accordingly when calling the GEMV kernels with a packed A matrix.
- Fixes to address compiler warnings.

AMD-Internal: [SWLCSG-3359]
Change-Id: I7f47afbc9cd92536cb375431d74d9b8bca7bab44
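
To illustrate why a GEMV kernel needs different strides for the fringe block, here is a minimal sketch in C. It is not the code from this commit: the two layouts (full MR panels stored k-major, the m % MR panel stored row-major) and the names `panel_info`, `packed_a_panel`, and `gemv_packed_a` are assumptions chosen only to show the idea of selecting strides per panel.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical descriptor for one panel of a packed A block. */
typedef struct {
    size_t offset; /* start of the panel inside the packed buffer */
    size_t rs;     /* stride between consecutive rows of the panel */
    size_t cs;     /* stride between consecutive k-columns of the panel */
    size_t rows;   /* number of valid rows in the panel */
} panel_info;

/* Assumed layout, for illustration only:
 *   full MR panel  : element (i, k) lives at offset + k * mr + i
 *   m % MR fringe  : element (i, k) lives at offset + i * kc + k
 */
static panel_info packed_a_panel(size_t m, size_t mr, size_t kc, size_t panel)
{
    assert(mr > 0 && kc > 0);

    size_t full_panels = m / mr;
    panel_info p;

    /* Every panel before this one is a full MR x KC panel. */
    p.offset = panel * mr * kc;

    if (panel < full_panels) {
        p.rs = 1;
        p.cs = mr;
        p.rows = mr;
    } else {
        /* Fringe panel: different packing, hence different strides. */
        p.rs = kc;
        p.cs = 1;
        p.rows = m % mr;
    }
    return p;
}

/* y = A * x, where A is an m x kc block packed as described above. */
static void gemv_packed_a(const float *a, const float *x, float *y,
                          size_t m, size_t mr, size_t kc)
{
    size_t panels = (m + mr - 1) / mr;

    for (size_t p = 0; p < panels; ++p) {
        panel_info pi = packed_a_panel(m, mr, kc, p);

        for (size_t i = 0; i < pi.rows; ++i) {
            float acc = 0.0f;
            for (size_t k = 0; k < kc; ++k)
                acc += a[pi.offset + i * pi.rs + k * pi.cs] * x[k];
            y[p * mr + i] = acc;
        }
    }
}
```

Using the MR-panel strides for the fringe panel in this sketch would read the wrong elements, which is the kind of mismatch the stride update is described as addressing.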
2026-04-24 01:28:51 +00:00 · 2025-01-21 11:45:28 +00:00
parent 66461b8df3
commit 39289858b7
8 changed files with 55 additions and 18 deletions
--- a/bench/bench_aocl_gemm/bench_input.txt
+++ b/bench/bench_aocl_gemm/bench_input.txt
@@ -1,3 +1,4 @@
+r t n n r 1 128 64 1 128 128 *:none
 c n t n n 32 128 2 32 128 32 bf16bf16f32of32:bias=na,swish
 r n n n r 6 1 4 4 16 16 bf16s4f32of32:pre_op_scale=scalar,pre_op_scale_type=bf16,group_size=2
 r n n n r 6 1 4 4 16 16 bf16s4f32of32:pre_op_zp=vector,pre_op_scale=scalar,pre_op_scale_type=bf16,group_size=2