composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-07 00:04:37 +00:00

Files

JP-Fernando d8ee107a47 [rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

[CK] Unify the grouped convolution gridwise Run() functions (#4421)

## Motivation

`gridwise_gemm_wmma_cshuffle_v3.hpp` currently contains three separate
`Run()` overloads for grouped convolution, one per convolution type:
Forward, Backward weights, and Backward data.
The three functions are very similar and should be unified into a single
`Run()` function that serves all types of grouped convolution.

## Technical Details

The three old `Run<>()` functions were replaced with a single unified
function.
The new `Run<>()` function is called from the following device
implementations:

- `DeviceGroupedConvFwdMultipleABD_Wmma_CShuffle_V3`

- `DeviceGroupedConvBwdDataMultipleD_Wmma_CShuffleV3`

- `DeviceGroupedConvBwdWeightMultipleD_Wmma_CShuffleV3`

- `DeviceGroupedConvBwdWeightTwoStage_Wmma_CShuffleV3`

- `DeviceGroupedConvBwdWeight_Wmma_CShuffleV3`

The `DeviceGroupedConvFwdMultipleD_Wmma_CShuffle_V3_Large_Tensor`
implementation uses a different `Run<>()` overload and was therefore not
modified.
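
As a rough illustration of this kind of unification, the sketch below
collapses three near-identical overloads into one function template that
is selected by a compile-time tag. It is not the code from
`gridwise_gemm_wmma_cshuffle_v3.hpp`: `ConvDirection`, `BlockArgs`, and
this `Run<>()` signature are hypothetical names, and the toy dot product
only stands in for the real GEMM pipeline. The choice of which step
differs per direction (overwrite versus accumulate) is likewise an
assumption made for the example.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical tag naming the three grouped convolution types.
enum class ConvDirection { Forward, BackwardData, BackwardWeight };

// Hypothetical stand-in for the arguments a gridwise Run() receives.
struct BlockArgs
{
    const float* a; // first input operand
    const float* b; // second input operand
    float* c;       // output element
    std::size_t k;  // reduction length
};

// Single unified entry point: the shared body is written once, and the
// few direction-specific steps are chosen at compile time.
template <ConvDirection Dir>
void Run(const BlockArgs& args)
{
    assert(args.a != nullptr && args.b != nullptr && args.c != nullptr);

    // Shared part: a toy reduction standing in for the GEMM main loop.
    float acc = 0.0f;
    for(std::size_t i = 0; i < args.k; ++i)
    {
        acc += args.a[i] * args.b[i];
    }

    // Direction-specific part (assumed for illustration only):
    // backward weights accumulates into the output, the others overwrite it.
    if constexpr(Dir == ConvDirection::BackwardWeight)
    {
        *args.c += acc;
    }
    else
    {
        *args.c = acc;
    }
}
```

A caller would then pick the variant at compile time, for example
`Run<ConvDirection::Forward>(args)`, so each device implementation can
call the same function instead of its own overload.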

## Test Plan

Run the following grouped convolution tests on `gfx1201`, as this
architecture is WMMA-capable:

- `test_grouped_convnd_fwd`

- `test_grouped_convnd_bwd_weight`

- `test_grouped_convnd_bwd_data`

Compilation and testing were also executed on `gfx1100` to avoid CI
problems.

## Test Result

First part (unification of `Run<>()` function): All tests successful.

Second part (integration of single `Run<>()` function as a direct call):
All tests successful.

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-03-11 16:40:12 +00:00

codegen_device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_avgpool2d_bwd_nhwc_nhwc.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_avgpool3d_bwd_ndhwc_ndhwc.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batched_contraction_multiple_d_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_batched_contraction_multiple_d_wmma_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batched_contraction_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_e_permute_xdl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batched_gemm_gemm_wmma_cshuffle_v3_common.hpp

Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617 )

2026-01-26 10:20:30 -08:00

device_batched_gemm_gemm_wmma_cshuffle_v3.hpp

Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617 )

2026-01-26 10:20:30 -08:00

device_batched_gemm_gemm_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_multi_d_xdl.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_multiple_d_dl.hpp

Add support for RDNA1 GPUs (#3220 )

2025-11-20 10:45:57 -08:00

device_batched_gemm_multiple_d_gemm_multiple_d_wmma_cshuffle_v3.hpp

Remove code duplications in batched gemm (multi D) gemm (multi D) wmma (#3617 )

2026-01-26 10:20:30 -08:00

device_batched_gemm_multiple_d_gemm_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_multiple_d_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_batched_gemm_multiple_d_xdl_cshuffle_v3.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_reduce_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_batched_gemm_reduce_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_softmax_gemm_permute_wmma_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batched_gemm_softmax_gemm_permute_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_softmax_gemm_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_wmma_cshuffle_v3_b_scale.hpp

Remove code duplications in batched gemm wmma (#3580 )

2026-01-23 12:39:03 -08:00

device_batched_gemm_wmma_cshuffle_v3_common.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_batched_gemm_wmma_cshuffle_v3.hpp

Remove code duplications in batched gemm wmma (#3580 )

2026-01-23 12:39:03 -08:00

device_batched_gemm_xdl_fpAintB_b_scale.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batched_gemm_xdl.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_batchnorm_backward_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batchnorm_forward_impl_obsolete.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_batchnorm_forward_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_cgemm_4gemm_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_column_to_image_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_contraction_multiple_abd_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_contraction_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_contraction_utils.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_backward_weight_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_bwd_data_xdl_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_add_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_fwd_xdl_c_shuffle_bias_activation_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_fwd_xdl_c_shuffle_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv2d_fwd_xdl_nhwc_kyxc_nhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_conv3d_fwd_naive_ndhwc_kzyxc_ndhwk.hpp

[CK] Integrate GPU reference into ckProfiler for convolutions (#3379 )

2025-12-18 07:59:45 +01:00

device_conv3d_fwd_xdl_ndhwc_kzyxc_ndhwk.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_convnd_bwd_data_nwc_kxc_nwk_dl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_convnd_bwd_data_nwc_kxc_nwk_xdl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_elementwise_dynamic_vector_dims_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_elementwise_normalization_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_elementwise_scale_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_fpAintB_gemm_wmma.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_bias_add_reduce_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_gemm_bias_add_reduce_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_dl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_dpp.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_abd_wmma_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_abd_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_dl.hpp

Add support for RDNA1 GPUs (#3220 )

2025-11-20 10:45:57 -08:00

device_gemm_multiple_d_layernorm_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_gemm_multiple_d_layernorm_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_multiple_r_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_gemm_multiple_d_wmma_cshuffle_v3_ab_scale.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

device_gemm_multiple_d_wmma_cshuffle_v3_b_preshuffle.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

device_gemm_multiple_d_wmma_cshuffle_v3_blockscale_bpreshuffle.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

device_gemm_multiple_d_wmma_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_wmma_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_xdl_cshuffle_lds_direct_load.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_xdl_cshuffle_v3_ab_scale.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

device_gemm_multiple_d_xdl_cshuffle_v3_b_preshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_xdl_cshuffle_v3_blockscale_bpreshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_xdl_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_gemm_reduce_wmma_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_gemm_reduce_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_wmma_cshuffle_v3_b_preshuffle.hpp

Implement device_gemm_universal_preshuffle_instance for RDNA4 (#3429 )

2026-01-15 07:19:31 -08:00

device_gemm_wmma_cshuffle_v3_b_scale.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

device_gemm_wmma_cshuffle_v3_common.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_gemm_wmma_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_wmma_cshuffle_v3r1.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_gemm_wmma.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_lds_direct_load.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_streamk_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_v2.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_v3_b_preshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_v3_b_scale.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_v3_mx.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_gemm_xdl_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle_v3r1.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_layernorm_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_skip_b_lds.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_splitk_c_shuffle_lds_direct_load.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_splitk_c_shuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_gemm_xdl_streamk.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_gemm_xdl_waveletmodel_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_gemm_xdl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_grouped_contraction_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_conv_bwd_data_multiple_d_wmma_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

2026-03-11 16:40:12 +00:00

device_grouped_conv_bwd_data_multiple_d_wmma_cshuffle.hpp

[rocm-libraries] ROCm/rocm-libraries#4582 (commit 990a00d)

2026-02-27 03:06:29 +00:00

device_grouped_conv_bwd_data_multiple_d_xdl_cshuffle_v1.hpp

[rocm-libraries] ROCm/rocm-libraries#4582 (commit 990a00d)

2026-02-27 03:06:29 +00:00

device_grouped_conv_bwd_weight_dl.hpp

[CK_BUILDER] Instance traits for conv bwd weight algorithms (#3498 )

2025-12-31 15:41:15 -08:00

device_grouped_conv_bwd_weight_explicit.hpp

[Conv] Enable bwd weight splitk autodeduction with cap (#3656 )

2026-01-29 17:40:28 +00:00

device_grouped_conv_bwd_weight_multiple_d_wmma_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

2026-03-11 16:40:12 +00:00

device_grouped_conv_bwd_weight_multiple_d_xdl_cshuffle.hpp

[Conv] Enable bwd weight splitk autodeduction with cap (#3656 )

2026-01-29 17:40:28 +00:00

device_grouped_conv_bwd_weight_two_stage_wmma_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

2026-03-11 16:40:12 +00:00

device_grouped_conv_bwd_weight_two_stage_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_conv_bwd_weight_wmma_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

2026-03-11 16:40:12 +00:00

device_grouped_conv_bwd_weight_wmma_cshuffle.hpp

[CK_BUILDER] Instance traits for conv bwd weight algorithms (#3498 )

2025-12-31 15:41:15 -08:00

device_grouped_conv_bwd_weight_xdl_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4271 (commit 6fce58e)

2026-02-11 09:08:38 +00:00

device_grouped_conv_bwd_weight_xdl_cshuffle.hpp

[rocm-libraries] ROCm/rocm-libraries#5135 (commit 5ccc138)

2026-03-09 16:35:26 +00:00

device_grouped_conv_fwd_dl_multiple_d_nhwc_kyxc_nhwk.hpp

Add describe() method to device ops for runtime introspection (#3375 )

2025-12-14 12:49:12 -08:00

device_grouped_conv_fwd_dl_nhwc_kyxc_nhwk.hpp

Add support for RDNA1 GPUs (#3220 )

2025-11-20 10:45:57 -08:00

device_grouped_conv_fwd_multiple_abd_wmma_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4421 (commit 5bb5769)

2026-03-11 16:40:12 +00:00

device_grouped_conv_fwd_multiple_abd_xdl_cshuffle_v3.hpp

[rocm-libraries] ROCm/rocm-libraries#4273 (commit 591f504)

2026-02-08 11:35:56 +00:00

device_grouped_conv_fwd_multiple_abd_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_conv_fwd_multiple_d_multiple_r_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_conv_fwd_multiple_d_multiple_r.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_grouped_conv_fwd_multiple_d_wmma_cshuffle_v3_large_tensor.hpp

WMMA grouped conv fwd large tensor extra flavors (#3582 )

2026-01-23 12:19:51 +01:00

device_grouped_conv_fwd_multiple_d_wmma_cshuffle.hpp

Add describe() method to device ops for runtime introspection (#3375 )

2025-12-14 12:49:12 -08:00

device_grouped_conv_fwd_multiple_d_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_grouped_conv_fwd_multiple_d_xdl_large_tensor_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_conv_utils.hpp

Grouped convolution forward device implementation and base flavors for RDNA3/4 (#2964 )

2025-12-18 13:12:15 -07:00

device_grouped_gemm_multi_abd_wmma_fixed_nk.hpp

[rocm-libraries] ROCm/rocm-libraries#4425 (commit 513cf9f)

2026-02-25 05:17:08 +00:00

device_grouped_gemm_multi_abd_xdl_fixed_nk.hpp

[rocm-libraries] ROCm/rocm-libraries#4425 (commit 513cf9f)

2026-02-25 05:17:08 +00:00

device_grouped_gemm_multiple_d_dl.hpp

Add support for RDNA1 GPUs (#3220 )

2025-11-20 10:45:57 -08:00

device_grouped_gemm_multiple_d_splitk_xdl_cshuffle_two_stage.hpp

[CK grouped gemm] Fix grouped gemm two stage HasMainK0BlockLoop (#3466 )

2025-12-23 11:33:09 +01:00

device_grouped_gemm_multiple_d_wmma_cshuffle_tile_loop_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_grouped_gemm_multiple_d_xdl_cshuffle_tile_loop.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_gemm_softmax_gemm_permute_xdl_cshuffle.hpp

Improve XDL to WMMA porting for grouped conv fwd (#3456 )

2025-12-19 15:58:51 -07:00

device_grouped_gemm_wmma_fixed_nk.hpp

[rocm-libraries] ROCm/rocm-libraries#4340 (commit 70a312f)

2026-02-26 00:28:58 +00:00

device_grouped_gemm_wmma_splitk_cshuffle_v3.hpp

Padding support for wave transfer (#3537 )

2026-01-26 12:57:09 -08:00

device_grouped_gemm_xdl_fixed_nk.hpp

[rocm-libraries] ROCm/rocm-libraries#4425 (commit 513cf9f)

2026-02-25 05:17:08 +00:00

device_grouped_gemm_xdl_splitk_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_gemm_xdl.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

device_grouped_query_attention_forward_wmma.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_image_to_column_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_max_pool_bwd_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_moe_gemm_blockscale.hpp

[rocm-libraries] ROCm/rocm-libraries#4282 (commit 2050f93)

2026-02-12 17:45:52 +00:00

device_moe_gemm.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_moe_mx_gemm_bns.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_moe_mx_gemm_bpreshuffle.hpp

fix static assert (#3178 )

2025-11-20 17:27:05 -08:00

device_moe_mx_gemm.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_multi_query_attention_forward_wmma.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_multiple_reduce_multiblock.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_multiple_reduce_threadwise.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_normalization_bwd_data_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_normalization_bwd_gamma_beta_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_normalization_fwd_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_normalization_fwd_splitk_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_permute_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_pool2d_fwd_nhwc_nhwc.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_pool3d_fwd_ndhwc_ndhwc.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_put_element_impl.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_reduce_common.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_reduce_multiblock.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_reduce_threadwise_multi_d.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_reduce_threadwise.hpp

chore(copyright): update copyright header for include directory (#3224 )

2025-11-18 10:17:18 -08:00

device_softmax_impl.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

device_sparse_embeddings_forward_layernorm.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

device_splitk_contraction_multiple_d_xdl_cshuffle.hpp

[ck] add gridwise base class for in all xdl kernel (#186 ) (#3544 )

2026-01-27 12:49:47 -08:00

split_k_arg.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

split_k_offset_utils.hpp

[CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169 )

2026-01-08 08:02:02 +01:00

split_k_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00