composable_kernel

Mirror of https://github.com/ROCm/composable_kernel.git (synced 2026-07-14 11:07:44 +00:00)

Files

linqunAMD 1749c0409e [CK][CONV] Support NCHW in class DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle (#2375)

1. When the conv specialization is 1x1 filter, stride 1, pad 0, an NCHW input is equivalent to GEMM matrix A in column-major layout, so only a minor change in the conv-to-GEMM transformer is needed to support it.
2. When the output is NKHW, it is equivalent to GEMM matrix C in column-major layout; A and B need to be swapped to get the best performance.
3. Add a new instance, device_grouped_conv_fwd_xdl_f16_nchw_instances, for NCHW.
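Points 1 and 2 can be checked with a small stand-alone sketch. It is plain C++, not CK code: it assumes a single group, small hypothetical sizes, a KCYX weight tensor (with Y = X = 1 it is a K x C row-major matrix), and a naive strided GEMM standing in for the XDL kernel. The names `View`, `strided_gemm`, `conv1x1_nchw_ref` and `layouts_agree` are illustrative only and are not part of the CK API.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Illustrative sizes (hypothetical): batch, in-channels, out-channels, spatial.
constexpr std::size_t N = 2, C = 3, K = 4, H = 2, W = 3;
constexpr std::size_t HW = H * W;

// Read-only matrix view: element (i, j) lives at p[i * rs + j * cs].
// rs = row stride, cs = column stride, so one type covers both
// row-major (cs == 1) and column-major (rs == 1) layouts.
struct View {
    const float* p;
    std::size_t rs, cs;
    float at(std::size_t i, std::size_t j) const { return p[i * rs + j * cs]; }
};

// Naive strided GEMM (stand-in for the real kernel):
// out(i, j) = sum_k a(i, k) * b(k, j), with out stored using strides (o_rs, o_cs).
void strided_gemm(std::size_t M, std::size_t Ng, std::size_t Kg,
                  View a, View b, float* out, std::size_t o_rs, std::size_t o_cs) {
    for (std::size_t i = 0; i < M; ++i)
        for (std::size_t j = 0; j < Ng; ++j) {
            float acc = 0.0f;
            for (std::size_t k = 0; k < Kg; ++k) acc += a.at(i, k) * b.at(k, j);
            out[i * o_rs + j * o_cs] = acc;
        }
}

// Direct 1x1 / stride 1 / pad 0 convolution: NCHW input, KC weights, NKHW output.
std::vector<float> conv1x1_nchw_ref(const std::vector<float>& in,
                                    const std::vector<float>& wei) {
    std::vector<float> out(N * K * HW, 0.0f);
    for (std::size_t n = 0; n < N; ++n)
        for (std::size_t k = 0; k < K; ++k)
            for (std::size_t p = 0; p < HW; ++p) {
                float acc = 0.0f;
                for (std::size_t c = 0; c < C; ++c)
                    acc += in[(n * C + c) * HW + p] * wei[k * C + c];
                out[(n * K + k) * HW + p] = acc;
            }
    return out;
}

// Returns true if both GEMM formulations reproduce the direct convolution.
bool layouts_agree() {
    std::vector<float> in(N * C * HW), wei(K * C);
    for (std::size_t i = 0; i < in.size(); ++i) in[i] = static_cast<float>(i % 7) - 3.0f;
    for (std::size_t i = 0; i < wei.size(); ++i) wei[i] = static_cast<float>(i % 5) - 2.0f;

    const std::vector<float> ref = conv1x1_nchw_ref(in, wei);
    std::vector<float> out_plain(N * K * HW), out_swapped(N * K * HW);
    assert(ref.size() == out_plain.size());

    for (std::size_t n = 0; n < N; ++n) {
        const float* x = &in[n * C * HW];  // one image: a C x HW block
        float* y_plain = &out_plain[n * K * HW];
        float* y_swapped = &out_swapped[n * K * HW];

        // Point 1: A (HW x C) is the NCHW image read column-major (rs = 1, cs = HW);
        // B (C x K) is the KC weight read column-major; C (HW x K) is written
        // column-major, which is exactly the NKHW layout.
        strided_gemm(HW, K, C, View{x, 1, HW}, View{wei.data(), 1, C}, y_plain, 1, HW);

        // Point 2: swap the operands and compute C^T = B^T * A^T (K x HW).
        // Every operand is now row-major, and the row-major C^T occupies the
        // same memory as the column-major C above.
        strided_gemm(K, HW, C, View{wei.data(), C, 1}, View{x, HW, 1}, y_swapped, HW, 1);
    }
    return out_plain == ref && out_swapped == ref;
}
```

Calling `layouts_agree()` from any `main()` should return `true`. The swapped call mirrors point 2: it turns a column-major C into a row-major C^T over the same bytes, so a kernel tuned for row-major output can presumably be reused unchanged. The sketch only checks correctness, not performance.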

2025-06-26 08:32:39 +08:00

impl

[CK][CONV] Support NCHW in class DeviceGroupedConvFwdMultipleABD_Xdl_CShuffle (#2375)

2025-06-26 08:32:39 +08:00

conv_tensor_rearrange_op.hpp

Add column to image kernel (#930)

2023-09-27 17:19:06 +02:00

convolution_backward_data_specialization.hpp

Grouped 3d conv backward data support (#799)

2023-07-18 11:01:33 -05:00

convolution_backward_weight_specialization.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

convolution_forward_specialization.hpp

Codegen hipRTC compilation (#1579)

2025-01-31 09:48:39 -08:00

device_avgpool_bwd.hpp

Average pool backward deviceOP and example (#797)

2023-08-10 12:04:35 +08:00

device_base.hpp

Rebase the PR #1520 to ROCm repo. (#1574)

2025-02-20 18:58:14 -08:00

device_batched_contraction_multiple_d.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batched_gemm_e_permute.hpp

Clean up conv example, Instances, profiler and test (#324)

2022-07-29 18:19:25 -05:00

device_batched_gemm_gemm.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batched_gemm_multi_d.hpp

Add SplitK support into Batched GEMM V3 (#1729)

2024-12-13 21:08:35 +01:00

device_batched_gemm_multiple_d_gemm_multiple_d.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batched_gemm_softmax_gemm_permute.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batched_gemm_softmax_gemm.hpp

Rebase the PR #1520 to ROCm repo. (#1574)

2025-02-20 18:58:14 -08:00

device_batched_gemm.hpp

Added Int4 mixed batch gemm support (#1839)

2025-02-10 11:17:02 +08:00

device_batchnorm_backward.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batchnorm_forward.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_batchnorm_infer.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_cgemm.hpp

Implement GetWorkSpaceSize from BaseOperator. (#1564)

2024-10-12 14:05:11 +08:00

device_contraction_multiple_abd.hpp

Add contraction_multi_abd (#972)

2023-10-17 20:17:58 -05:00

device_contraction_multiple_d.hpp

Add support for mixed precision in contraction scale and bilinear (#973)

2023-11-02 14:26:33 -07:00

device_conv_bwd_data.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_conv_fwd_bias_activation_add.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_conv_fwd_bias_activation.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_conv_fwd.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_conv_tensor_rearrange.hpp

Add support for groups in Img2Col/Col2Img (#1007)

2023-10-31 10:46:32 +01:00

device_elementwise_normalization.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_elementwise_scale.hpp

Refactor elementwise kernels (#1222)

2024-04-19 13:31:17 +02:00

device_elementwise.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_gemm_bias_e_permute.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_gemm_dequantB.hpp

Navi3 rel (#1176)

2024-03-08 17:11:51 -08:00

device_gemm_multiple_abd.hpp

Add multiple A/B support (#906)

2023-09-26 21:16:23 -05:00

device_gemm_multiple_d_ab_scale.hpp

Add MoE & FP8 Blockscale WP Kernels for GFX950 (#2297)

2025-06-12 09:25:59 +08:00

device_gemm_multiple_d_layernorm.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_gemm_multiple_d_multiple_r.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_gemm_multiple_d.hpp

Add MoE & FP8 Blockscale WP Kernels for GFX950 (#2297)

2025-06-12 09:25:59 +08:00

device_gemm_mx.hpp

Optimized GEMMs for MX FP4/8 (#2294)

2025-06-05 13:54:15 -06:00

device_gemm_reduce.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_gemm_splitk.hpp

Add splitk gemm fp16 @ fp16 with fp8 compute instances (#983)

2023-10-13 16:27:11 -05:00

device_gemm_streamk_v2.hpp

Stream-K Reduction option as Runtime parameter and Compilation Error Fix (SK- Reduction) (#2145)

2025-06-11 10:59:44 -07:00

device_gemm_streamk.hpp

initial stream-k implementation with example (#699)

2023-07-26 14:18:15 -05:00

device_gemm_v2.hpp

Ck int4 moe develop (#1949)

2025-03-10 11:16:44 +08:00

device_gemm.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_grouped_contraction_multiple_d.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_grouped_conv_bwd_data_multiple_d.hpp

Integrate universal gemm with conv bwd data and add SplitK (#1315)

2025-04-28 23:54:49 +02:00

device_grouped_conv_bwd_weight_multiple_d.hpp

Add grouped conv bwd weight multi d kernel (#1237)

2024-04-18 23:35:04 +02:00

device_grouped_conv_bwd_weight.hpp

Add conv bwd weight fp16 comp bf8 fp8 op, instances and example (#945)

2023-10-04 08:19:08 -05:00

device_grouped_conv_fwd_multiple_abd.hpp

Codegen hipRTC compilation (#1579)

2025-01-31 09:48:39 -08:00

device_grouped_conv_fwd_multiple_d.hpp

Introduce multiABD api and deprecate multiD (#1035)

2023-11-14 17:00:40 +01:00

device_grouped_conv_fwd.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_grouped_gemm_fixed_nk.hpp

Polished Grouped GEMM APIs and new BF16 instances (#1600)

2024-11-27 13:02:44 +01:00

device_grouped_gemm_multi_abd_fixed_nk.hpp

Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978)

2024-04-15 21:09:45 -05:00

device_grouped_gemm_multi_abd.hpp

Added Multi_ABD support into Gemm and GroupedGemmFixedNK (#978)

2024-04-15 21:09:45 -05:00

device_grouped_gemm_softmax_gemm_permute.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_grouped_gemm_splitk.hpp

Polished Grouped GEMM APIs and new BF16 instances (#1600)

2024-11-27 13:02:44 +01:00

device_grouped_gemm_tile_loop.hpp

Polished Grouped GEMM APIs and new BF16 instances (#1600)

2024-11-27 13:02:44 +01:00

device_grouped_gemm.hpp

Polished Grouped GEMM APIs and new BF16 instances (#1600)

2024-11-27 13:02:44 +01:00

device_max_pool_bwd.hpp

MaxPool & AvgPool bwd instances, test, ckProfiler, client example (#861)

2023-08-31 21:01:50 +08:00

device_multiple_reduce.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_normalization_bwd_data.hpp

layernorm and groupnorm backward data (#1083)

2023-12-19 04:23:11 +08:00

device_normalization_bwd_gamma_beta.hpp

Backward of gamma and beta for layernorm and groupnorm (#1013)

2023-11-10 18:02:03 +08:00

device_normalization_fwd.hpp

Layernorm4d (#1022)

2023-11-09 08:34:51 +08:00

device_permute.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_pool_fwd.hpp

Refactor pool fwd (#815)

2023-08-15 02:25:28 +08:00

device_put_element.hpp

Average pool backward deviceOP and example (#797)

2023-08-10 12:04:35 +08:00

device_reduce_multi_d.hpp

Universal gemm splitk using reduce (with multi-d) (#1341)

2024-07-19 22:01:22 +08:00

device_reduce.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

device_softmax.hpp

Revert "Grouped Gemm with looping over the tiles. (#788)" (#982)

2023-10-11 14:27:29 -05:00

device_splitk_contraction_multiple_d.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

gemm_specialization.hpp

Rebase the PR #1520 to ROCm repo. (#1574)

2025-02-20 18:58:14 -08:00

helper.hpp

Refactor transform conv to gemm fwd (#1391)

2024-07-19 09:29:25 +02:00

masking_specialization.hpp

Rebase the PR #1520 to ROCm repo. (#1574)

2025-02-20 18:58:14 -08:00

matrix_padder.hpp

CK Instance Gen (#1145)

2024-06-25 16:37:35 -05:00

reduction_operator_mapping.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

tensor_layout.hpp

Rebase the PR #1520 to ROCm repo. (#1574)

2025-02-20 18:58:14 -08:00

tensor_specialization.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00

welford_helper.hpp

update copyright headers (#726)

2023-05-31 18:46:57 -05:00