* Add padding support with transpose
Also move the check to before writing, storing is_src_valid during reading
* Add/modify instances to use wave transfer for gemm universal
The condition is changed: the vector size of both the vmem read and the
LDS write must now equal 8 in order to use the wave transfer
* Fix clang format
* Modify example
* Fix bwd data
* Add restriction for wave transfer with padding and transpose
Add a test case that shows this limitation
* Fix validity checks for 8-bit types
* Add validity check for gemm_bias_add_reduce
* Add validity check for grouped gemm tile loop
* Fix validity checks for new flavours
* Minor fixes
* Fix clang format
[ROCm/composable_kernel commit: 2e49b6b2f7]