mirror of
https://github.com/amd/blis.git
synced 2026-05-11 09:39:59 +00:00
Details:
- Relaxed a long-held requirement in register blocksizes that required the kernel programmer to choose a KC that was divisible by both MR and NR. This was very constraining on some architectures that did not use register blocksizes that were powers of two. The constraint is now enforced only for trmm and trsm, where it is needed, and it is now handled by "nudging" kc upward at runtime, if necessary, to be a multiple of MR or NR, as needed.
- Defined bli_trmm_determine_kc_[fb]() and bli_trsm_determine_kc_[fb](), which determine blocksizes for trmm and trsm, taking special care to "nudge" the kc dimension up to a multiple of MR or NR, as needed.
- Changed bli_trmm_blk_var3[fb].c to call bli_trmm_determine_kc_[fb]() instead of bli_determine_blocksize_[fb]().
- Added safeguard to bli_align_dim_to_mult() that returns the dimension unmodified if the dimension multiple is zero (to avoid division by zero).
- Removed cpp guard/check for KC % MR == 0 and KC % NR == 0 from bli_kernel_macro_defs.h.
- Whitespace, variable name changes to bli_blocksize.c.
- Removed old commented code from bli_gemm_cntl.c.
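
The "nudging" described above, together with the zero-multiple safeguard, might look roughly like the following. This is a minimal sketch and not the actual BLIS code: the type dim_sketch_t and the functions align_dim_to_mult_sketch() and nudge_kc_sketch() are hypothetical names introduced only for illustration, and the choice between MR and NR as the multiple is left to the caller.

```c
#include <assert.h>

/* Hypothetical dimension type for this sketch (not the BLIS dim_t). */
typedef long dim_sketch_t;

/* Round dim up to the nearest multiple of mult.
   If mult is zero, return dim unmodified to avoid dividing by zero. */
static dim_sketch_t align_dim_to_mult_sketch( dim_sketch_t dim, dim_sketch_t mult )
{
    assert( dim >= 0 && mult >= 0 );

    if ( mult == 0 ) return dim;

    return ( ( dim + mult - 1 ) / mult ) * mult;
}

/* Nudge a candidate kc upward so that it is a multiple of the register
   blocksize that matters for the operation (MR or NR, chosen by the
   caller). Assumption: only trmm/trsm-like cases need this step. */
static dim_sketch_t nudge_kc_sketch( dim_sketch_t kc, dim_sketch_t mr_or_nr )
{
    return align_dim_to_mult_sketch( kc, mr_or_nr );
}
```

Under these assumptions, a kc of 256 with a register blocksize of 6 is nudged up to 258, a kc of 256 with a register blocksize of 8 stays at 256, and a multiple of zero leaves kc untouched.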