composable_kernel/include/ck/tensor_operation/gpu/device at 0ffe956ab1c1a8e128c2d6e419de68fcc1a8b5ff - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-05 14:11:29 +00:00

rocking5566 0ffe956ab1 Gemm reduce max (#209)

* [What] Rename the example
[Why] Prepare to add unary reduction

* Add global operation to the parameter

* Add atomicmax

* Fix compile error

* Support atomicMax (hip library)

* Rename the reduction example

* Fix target name

* use p_d1_grid as the indicator directly

* Prevent performance issue. Let passthrough handle it.

* Implement the function template and specialize it for float2

* No need to separate into two lines

* Remove empty line

* add comment

* Fix compile error due to merge from develop

* make the implementation of atomic_max / atomic_add explicit for each datatype

* Refine typo

* For future CI test

* Fix compiler error in ckProfiler

* Merge commit 'de2769e3a6695b38a20529261273ddc5cdaab2fe'

* simply use remove_pointer

* Rename type and var

* Refine example

* Modify reducemax example

* Fix bug in reduction

* Change initialize range

* Implement F64 version of atomicMax

* Move reduction code together

* Add buffer atomic_max

* Fix coding style by clang-format

* Integrate new api of DeviceGemmReduce_Xdl_CShuffle

* Integrate Batch gemm reduction

* Fix example

* fix example

* clean up

* Fix batch gemm tensor operation

* Fix coding style

* Fix template argument

* Fix clang format

* Keep the flexibility of a different stride for each D tensor

* Fix compile error for ckProfiler

* Fix typo

* [What] Fix naming
[Why] Prepare to add out elementop

* Add DoutElementOp

Co-authored-by: Chao Liu <chao.liu2@amd.com>
Co-authored-by: rocking <chunylai@amd.com>

2022-05-19 21:56:56 -05:00

convolution_backward_data_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

convolution_forward_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

device_base.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_batched_gemm_reduce_xdl_cshuffle.hpp

Gemm reduce max (#209)

2022-05-19 21:56:56 -05:00

device_batched_gemm_xdl.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_binary_elementwise.hpp

elementwise op (#238)

2022-05-18 23:34:35 -05:00

device_conv2d_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_add_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv2d_fwd_xdl_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv3d_fwd_naive_ndhwc_kzyxc_ndhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv3d_fwd_xdl_ndhwc_kzyxc_ndhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_conv_backward_weight.hpp

NHWC Conv2d Bwd weight fp16 ckprofiler and test (#166)

2022-04-04 20:32:00 -05:00

device_conv_bwd_data.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd_bias_activation_add.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd_bias_activation.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_conv_fwd.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_convnd_bwd_data_xdl_ndhwc_kzyxc_ndhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_convnd_fwd_xdl_nhwc_kyxc_nhwk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_bias_activation_add.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_gemm_bias_activation.hpp

Reorganize files, Part 1 (#119)

2022-03-08 21:46:36 -06:00

device_gemm_bias.hpp

Gemm+Reduce Fusion (#128)

2022-03-23 22:18:42 -05:00

device_gemm_reduce_xdl_cshuffle.hpp

Gemm reduce max (#209)

2022-05-19 21:56:56 -05:00

device_gemm_reduce.hpp

Gemm reduce max (#209)

2022-05-19 21:56:56 -05:00

device_gemm_xdl_c_shuffle_bias_2d.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl_c_shuffle_bias_activation_add.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl_c_shuffle_bias_activation.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl_cshuffle.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl_splitk_c_shuffle.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl_splitk.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm_xdl.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_gemm.hpp

batched_gemm: use profiler in ctest (#163)

2022-03-30 21:32:49 -05:00

device_grouped_gemm_xdl.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_pool2d_fwd_nhwc_nhwc.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_pool2d_fwd.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

device_reduce_blockwise_second_call.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_reduce_blockwise.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_reduce_common.hpp

Reduction for int8 and bfloat16 (#125)

2022-03-22 14:35:14 -05:00

device_reduce_multiblock_atomic_add.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_reduce_multiblock_partial_reduce.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_reduce_threadwise.hpp

Add host API (#220)

2022-05-12 09:21:01 -05:00

device_reduce.hpp

Reduction for int8 and bfloat16 (#125)

2022-03-22 14:35:14 -05:00

gemm_specialization.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

reduction_operator_mapping.hpp

Compile for gfx908 and gfx90a (#130)

2022-03-31 12:33:34 -05:00

tensor_layout.hpp

batched_gemm: use profiler in ctest (#163)

2022-03-30 21:32:49 -05:00