mirror of
https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 10:09:41 +00:00
* clean up
* add multiple thread scratch to ThreadwiseTensorSliceTransfer_v3r1
* add 2 stage prefetch
* add more sanity checks to transform_tensor_descriptor
* tweak
* enabling 2 stage prefetch to existing gridwise gemm; tweak
* enabling 2 stage prefetch to existing gridwise gemm
* move gridwise gemm pipeline into class; clean up
* add some irregular tile sizes
* update CalculateHasMainK0BlockLoop for multi-stage-prefetch
* refactor gridwise gemm pipeline class
[ROCm/composable_kernel commit: 22d438ae9e]