composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-17 17:19:12 +00:00

Anthony Chang 9287b7c6b3 Grouped batched attention + permute (#412)

* grouped attn without batch validates; now move toward grouped batched attn

* grouped batched attention

* working

* remove debug logging

clean up

clean up

* reintroduce g_ prefix back to host tensor variables

* format

* rename file

* restore old file

* rename

* consolidate padded/non-padded attention example

* harmonize padding specialization in attn examples

2022-09-19 16:09:44 -05:00

impl

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

convolution_backward_data_specialization.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

convolution_backward_weight_specialization.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

convolution_forward_specialization.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_base.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_batched_contraction_multiple_d_xdl_cshuffle.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_batched_contraction_multiple_d.hpp

Add batched/grouped_gemm contraction deviceOps (#349)

2022-08-10 12:20:29 -05:00

device_batched_gemm_e_permute.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_batched_gemm_gemm_xdl_cshuffle.hpp

Grouped batched attention + permute (#412)

2022-09-19 16:09:44 -05:00

device_batched_gemm_gemm.hpp

Attention with output permutation (#370)

2022-08-23 14:52:56 -05:00

device_batched_gemm_multi_d_xdl.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_batched_gemm_multi_d.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_batched_gemm_multiple_d_gemm_multiple_d_xdl_cshuffle.hpp

batched_gemm + multiple_d + gemm + multiple_d (#394)

2022-09-14 17:54:18 -05:00

device_batched_gemm_multiple_d_gemm_multiple_d.hpp

batched_gemm + multiple_d + gemm + multiple_d (#394)

2022-09-14 17:54:18 -05:00

device_batched_gemm_reduce_xdl_cshuffle.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_batched_gemm_softmax_gemm_permute_xdl_cshuffle.hpp

Fix gemm-softmax-gemm-permute padding cases (#409)

2022-09-08 09:27:50 -05:00

device_batched_gemm_softmax_gemm_permute.hpp

Attention with output permutation (#370)

2022-08-23 14:52:56 -05:00

device_batched_gemm_softmax_gemm_xdl_cshuffle.hpp

Fused attention instances & padding tests (#395)

2022-09-06 14:38:56 -05:00

device_batched_gemm_softmax_gemm.hpp

Attention with output permutation (#370)

2022-08-23 14:52:56 -05:00

device_batched_gemm_xdl.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_batched_gemm.hpp

Gemm+Bilinear (#316)

2022-07-02 09:15:38 -05:00

device_batchnorm_forward.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_batchnorm_infer.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_cgemm_4gemm_xdl_cshuffle.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_cgemm.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_contraction_multiple_d_xdl_cshuffle.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_contraction_multiple_d.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv2d_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_add_nhwc_kyxc_nhwk.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_nhwc_kyxc_nhwk.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv2d_fwd_xdl_nhwc_kyxc_nhwk.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv3d_fwd_naive_ndhwc_kzyxc_ndhwk.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_conv3d_fwd_xdl_ndhwc_kzyxc_ndhwk.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_conv_bwd_data.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv_bwd_weight.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_conv_fwd_bias_activation_add.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_conv_fwd_bias_activation.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_conv_fwd.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_convnd_bwd_data_nwc_kxc_nwk_xdl.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_convnd_bwd_weight_nwc_kxc_nwk_xdl_cshuffle.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_elementwise_base.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_elementwise.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_gemm_bias_add_reduce_xdl_cshuffle.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_bias_e_permute_xdl.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_gemm_bias_e_permute.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_dl.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_multiple_d_multiple_r_xdl_cshuffle.hpp

Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle (#378)

2022-08-24 10:12:54 -05:00

device_gemm_multiple_d_multiple_r.hpp

Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle (#378)

2022-08-24 10:12:54 -05:00

device_gemm_multiple_d_xdl_cshuffle.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_gemm_multiple_d.hpp

Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle (#378)

2022-08-24 10:12:54 -05:00

device_gemm_reduce_xdl_cshuffle.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_reduce.hpp

Improve external interface for GEMM and GEMM+add+add+fastgelu (#311)

2022-06-30 22:11:00 -05:00

device_gemm_splitk.hpp

Improve external interface for GEMM and GEMM+add+add+fastgelu (#311)

2022-06-30 22:11:00 -05:00

device_gemm_xdl_cshuffle.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_xdl_layernorm_cshuffle.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm_xdl_skip_b_lds.hpp

Fused GEMM+GEMM (#351)

2022-08-13 09:18:58 -05:00

device_gemm_xdl_splitk_c_shuffle.hpp

Add examples of batched/grouped/SplitK Gemm for int8/bfp16/fp16/fp32 (#361)

2022-08-23 14:41:56 -05:00

device_gemm_xdl.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_gemm.hpp

Refactor the design of DeviceGemmMultipleDMultipleR_Xdl_CShuffle (#378)

2022-08-24 10:12:54 -05:00

device_grouped_contraction_multiple_d_xdl_cshuffle.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_grouped_contraction_multiple_d.hpp

Add batched/grouped_gemm contraction deviceOps (#349)

2022-08-10 12:20:29 -05:00

device_grouped_conv_bwd_data_multiple_d.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_grouped_conv_fwd_multiple_d_multiple_r_xdl_cshuffle.hpp

Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380)

2022-08-31 16:32:17 -05:00

device_grouped_conv_fwd_multiple_d_multiple_r.hpp

Add examples of Conv + reduction (data type: int4, int8, bf16, fp16, fp32) (#380)

2022-08-31 16:32:17 -05:00

device_grouped_conv_fwd_multiple_d_xdl_cshuffle.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_grouped_conv_fwd_multiple_d.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_grouped_gemm_softmax_gemm_permute_xdl_cshuffle.hpp

Grouped batched attention + permute (#412)

2022-09-19 16:09:44 -05:00

device_grouped_gemm_softmax_gemm_permute.hpp

Grouped batched attention + permute (#412)

2022-09-19 16:09:44 -05:00

device_grouped_gemm_xdl.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

device_grouped_gemm.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_layernorm_impl.hpp

Layernorm welford (#346)

2022-08-13 09:43:18 -05:00

device_multiple_reduce_multiblock.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_multiple_reduce_threadwise.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_multiple_reduce.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_normalization.hpp

Layernorm welford (#346)

2022-08-13 09:43:18 -05:00

device_pool2d_fwd_nhwc_nhwc.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_pool2d_fwd.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_reduce_common.hpp

Batchnorm-forward and Batchnorm-infer Implemented using generic kernels (#320)

2022-08-15 10:11:02 -05:00

device_reduce_multiblock.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_reduce_threadwise.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_reduce.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

device_softmax.hpp

Softmax client example (#396)

2022-09-06 12:22:48 -05:00

device_sparse_embedding3_forward_layernorm.hpp

embedding fuse layernorm (#405)

2022-09-09 10:41:15 -05:00

gemm_specialization.hpp

Implement padding and sanity checks for fused GEMM+GEMM (#376)

2022-08-23 10:01:02 -05:00

matrix_padder.hpp

batched_gemm + multiple_d + gemm + multiple_d (#394)

2022-09-14 17:54:18 -05:00

reduction_operator_mapping.hpp

add license in file (#303)

2022-06-24 23:32:43 -05:00

tensor_layout.hpp

Conv bwd data multiple d (#404)

2022-09-19 11:25:28 -05:00

tensor_specialization.hpp

Add batched/grouped_gemm contraction deviceOps (#349)

2022-08-10 12:20:29 -05:00