composable_kernel/include/ck/tensor_operation/gpu/device at 63eee2d9991b08ca286f6895dd8f90da12a62da3 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-13 10:37:42 +00:00

Qianfeng 63eee2d999 Overhaul to Reduction and its dependants (#237)

* Tiny fix in dynamic_buffer.hpp to support vectorized AtomicAdd for double type

* Update to host layer and host reduction

* Merge and remove reduction kernels

* Merge and remove reduction device interfaces and update pooling device interface

* Merge and remove useless reduction device instances

* Update to reduction profiler and reduction ctests

* Update to reduction and pooling examples and add one reduction example

* Change reduction examples to make them testable by ctest

* Add explicit pass checking for reduction and pooling examples

* Explicit assignment of tensor shapes in example reduce_blockwise_two_call

* Use atomic_add to replace atomicAdd and add atomic_add for double type

* Add reduce ctest support for double data type

* Replace to_int_vector() with C++ std::vector::assign()

* Keep DeviceReduceThreadWise separated from DeviceReduceBlockWise

* Merge DeviceReduceBlockWise and DeviceReduceMultiBlockAtomicAdd into DeviceReduceMultiBlock

* Add GetAtomicOperationZeroValue() support for AtomicMax

* Tiny change to reduce example README.md

* Fix some tiny issues due to branch merging

* Revoke previous change in dynamic_buffer.hpp and add atomic_add for double2_t

* Add reduce multiblock_atomic_add instances for fp64 to verify vectorized atomic_add on fp64

* Renaming

* Clean up the header includes in the device_reduce instance header files

2022-05-24 12:19:12 -05:00

convolution_backward_data_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

convolution_backward_weight_specialization.hpp

Example of conv bwd weight 1d/2d/3d fp32/fp16/bf16 xdl (#244)

2022-05-20 17:20:10 -05:00

convolution_forward_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

device_base.hpp

Add GetWorkSpaceSize to base arg (#253)

2022-05-24 11:13:00 -05:00

device_batched_gemm_reduce_xdl_cshuffle.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_batched_gemm_xdl.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_binary_elementwise.hpp

Hotfix eltwiseop (#242)

2022-05-19 22:02:06 -05:00

device_conv2d_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_add_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv2d_fwd_xdl_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv3d_fwd_naive_ndhwc_kzyxc_ndhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv3d_fwd_xdl_ndhwc_kzyxc_ndhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_conv_backward_weight.hpp

NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166)

2022-04-04 20:32:00 -05:00

device_conv_bwd_data.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd_bias_activation_add.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd_bias_activation.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_convnd_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Add GetWorkSpaceSize to base arg (#253)

2022-05-24 11:13:00 -05:00

device_convnd_bwd_data_xdl_ndhwc_kzyxc_ndhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_convnd_fwd_xdl_nhwc_kyxc_nhwk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_bias_activation_add.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_gemm_bias_activation.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_gemm_bias.hpp

Gemm+Reduce Fusion (#128)

2022-03-23 22:18:42 -05:00

device_gemm_reduce_xdl_cshuffle.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_reduce.hpp

Gemm reduce max (#209)

2022-05-19 21:56:56 -05:00

device_gemm_xdl_c_shuffle_bias_2d.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl_c_shuffle_bias_activation_add.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl_c_shuffle_bias_activation.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl_cshuffle.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl_splitk_c_shuffle.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl_splitk.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm_xdl.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_gemm.hpp

batched_gemm: use profiler in ctest (#163)

2022-03-30 21:32:49 -05:00

device_grouped_gemm_xdl.hpp

Refactor block to C tile map (#235)

2022-05-20 12:40:51 -05:00

device_pool2d_fwd_nhwc_nhwc.hpp

Overhaul to Reduction and its dependants (#237)

2022-05-24 12:19:12 -05:00

device_pool2d_fwd.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

device_reduce_common.hpp

Overhaul to Reduction and its dependants (#237)

2022-05-24 12:19:12 -05:00

device_reduce_multiblock.hpp

Overhaul to Reduction and its dependants (#237)

2022-05-24 12:19:12 -05:00

device_reduce_threadwise.hpp

Overhaul to Reduction and its dependants (#237)

2022-05-24 12:19:12 -05:00

device_reduce.hpp

Overhaul to Reduction and its dependants (#237)

2022-05-24 12:19:12 -05:00

gemm_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

reduction_operator_mapping.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

tensor_layout.hpp

batched_gemm: use profiler in ctest (#163)

2022-03-30 21:32:49 -05:00