composable_kernel/include/ck/tensor_operation/gpu/thread
Qianfeng 70544c4c2c Improve Reduction kernel api (#152)
* Add ThreadwiseReduction functor as per-thread reduction api

* Use the ThreadwiseReduce api and adjust how the PartitionedBlockwiseReduction api is used, to simplify the kernels

* Add comments and remove useless declarations in the kernels

* Tiny updates

[ROCm/composable_kernel commit: 82c8b9f8ee]
2022-04-04 20:31:44 -05:00