Details:
- Added cpp guards around the constraints in bli_kernel_macro_defs.h
that enforce MC % NR = 0 and NC % MR = 0. These constraints are ONLY
needed when handling right-side trsm by allowing the matrix on the
right (matrix B) to be triangular, because it involves swapping
register, but not cache, blocksizes (packing A by NR and B by MR)
and then swapping the operands to gemmtrsm just before that kernel
is called. It may be useful to disable these constraints if, for
example, the developer wishes to test the configuration with
a different set of cache blocksizes where only MC % MR = 0 and
NC % NR = 0 are enforced.
- In summary, #defining BLIS_RELAX_MCNR_NCMR_CONSTRAINTS will bypass
the enforcement of MC % NR = 0 and NC % MR = 0.