composable_kernel/include/ck/tensor_operation/gpu/thread
Qianfeng 82c8b9f8ee Improve Reduction kernel api (#152)
* Add ThreadwiseReduction functor as per-thread reduction api

* Use the ThreadwiseReduce API and adjust the use of the PartitionedBlockwiseReduction API to simplify the kernels

* Add comments and remove useless declarations in the kernels

* Tiny updates
2022-04-04 20:31:44 -05:00