Mirror of https://github.com/ROCm/composable_kernel.git
synced 2026-05-14 18:17:44 +00:00
* Add ThreadwiseReduction functor as the per-thread reduction API
* Use the ThreadwiseReduce API, and change how the PartitionedBlockwiseReduction API is used, to simplify the kernels
* Add comments and remove unused declarations in the kernels
* Tiny updates
[ROCm/composable_kernel commit: 82c8b9f8ee]