composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Files

Erwin Terpstra d5ae81b292 Implement batched gemm add relu gemm add for rdna4 (#3391 )

* wip: test suite for batched gemm multiple d gemm multiple d, working on gridwise implementation

* wip: many fixes in implementation of batched gemm gemm multiple d

* wip: batched gemm gemm multiple d gridwise op compiling, not working yet

* fix: incorrect d0 grid indexing in batched gemm gemm multiple d

* feat: add instances for batched gemm add relu gemm add

* chore: configure instance with low vector transfer size for odd sizes

* chore: add some more validation to device batched gemm gemm multiple d, and removed template parameter that didn't really make sense

* fix: update device_batched_gemm_gemm_wmma to work with new gridwise changes

* fix: disable odd size tests on XDL archs

* chore: removed temporary logging

* chore: update some references to C tensor to E tensor

* Tentative fix for example template params

* Tentative fix for non-multi-D batched gemm gemm device impl.

* Tentative fix for xdl example template params

* Tentative fix for profiler build on gfx90a

* chore: improve device batched gemm gemm multi D comment to include all ops and dimensions

* chore: explicitly call ck::make_tuple to prevent issues when std::make_tuple would apply

* fix: make the gemm1 data types match what happens in the device op

* feat: add d0s/d1s datatypes and layouts to the device op type string

* chore: change element-wise op so addition happens in fp32

* chore: add static asserts for gemm0/gemm1 calculated wave sizes

* chore: also updated other element-wise ops to use fp32 calculations

* chore: log number of supported instances

* chore: update instance comment

* chore: disable kernel timing in example by default

* fix: gemm1 wave size calculation

* fix: make sure batched gemm multiple d gemm multiple d profiler performs correct type conversions

* chore: remove increased tolerance in batched gemm gemm multiple d example

* chore: add comment explaining that verification fails for certain input values

* chore: clarify instance comment

---------

Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>

2026-01-20 13:06:59 -08:00
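The headline commit fuses two GEMMs with elementwise add and ReLU steps in one batched operation. As a rough mental model only, a plain CPU reference of that computation might look like the sketch below. The function name and the operand names (A0, B0, D0, B1, D1) are hypothetical, and the dense row-major layouts and the exact placement of the adds and the ReLU are assumptions inferred from the commit title, not taken from the CK sources. Everything is kept in fp32 here for simplicity.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for "batched gemm add relu gemm add".
// Assumed semantics (not taken from the CK sources), per batch g:
//   E0 = ReLU(A0 * B0 + D0)   // gemm0, add D0, relu
//   E1 = E0 * B1 + D1         // gemm1, add D1
// Assumed dense row-major fp32 shapes:
//   A0: [G, M, K], B0: [G, K, N], D0: [G, M, N], B1: [G, N, O], D1: [G, M, O]
std::vector<float> batched_gemm_add_relu_gemm_add(
    const std::vector<float>& a0, const std::vector<float>& b0,
    const std::vector<float>& d0, const std::vector<float>& b1,
    const std::vector<float>& d1,
    std::size_t G, std::size_t M, std::size_t N, std::size_t K, std::size_t O)
{
    // Check that every buffer matches the assumed shapes.
    assert(a0.size() == G * M * K && b0.size() == G * K * N);
    assert(d0.size() == G * M * N && b1.size() == G * N * O);
    assert(d1.size() == G * M * O);

    std::vector<float> e0(M * N);           // per-batch intermediate (gemm0 output)
    std::vector<float> e1(G * M * O, 0.0f); // final output for all batches

    for (std::size_t g = 0; g < G; ++g) {
        const float* A0 = a0.data() + g * M * K;
        const float* B0 = b0.data() + g * K * N;
        const float* D0 = d0.data() + g * M * N;
        const float* B1 = b1.data() + g * N * O;
        const float* D1 = d1.data() + g * M * O;
        float* E1 = e1.data() + g * M * O;

        // gemm0, then add D0 and clamp negatives to zero (ReLU).
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t n = 0; n < N; ++n) {
                float acc = 0.0f;
                for (std::size_t k = 0; k < K; ++k)
                    acc += A0[m * K + k] * B0[k * N + n];
                e0[m * N + n] = std::max(acc + D0[m * N + n], 0.0f);
            }

        // gemm1 on the intermediate, then add D1.
        for (std::size_t m = 0; m < M; ++m)
            for (std::size_t o = 0; o < O; ++o) {
                float acc = 0.0f;
                for (std::size_t n = 0; n < N; ++n)
                    acc += e0[m * N + n] * B1[n * O + o];
                E1[m * O + o] = acc + D1[m * O + o];
            }
    }
    return e1;
}
```

A reference like this is mainly useful for checking a fused kernel's output on small shapes; it makes no attempt to model tiling, wave sizes, or the fp16/bf16 data types a real device instance would use.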

batchnorm_multiblock

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gemm_layernorm

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

normalization

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

block_to_ctile_map.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

epilogue_cshuffle_v3_reduce_wmma.hpp

Wmma support for gemm_bias_add_reduce (#3316 )

2026-01-07 10:27:16 -08:00

epilogue_cshuffle_v3_welford_wmma.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

epilogue_cshuffle_v3_wmma_base.hpp

Add support for direct store in epilogue and padding support for wave transfer without transpose (#3465 )

2026-01-14 11:02:19 +01:00

epilogue_cshuffle_v3_wmma.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

epilogue_direct_store.hpp

Add support for direct store in epilogue and padding support for wave transfer without transpose (#3465 )

2026-01-14 11:02:19 +01:00

gridwise_2d_multiple_reduction_multiblock.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_2d_multiple_reduction_threadwise.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_2d_reduction_multiblock.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_2d_reduction_threadwise_multi_d.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_2d_reduction_threadwise.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_ab_transfer_thread_tiles_preshuffle.hpp

Wmma support for gemm_multiply_multiply_wp (#3278 )

2025-12-03 07:38:23 -08:00

gridwise_ab_transfer_thread_tiles.hpp

Wmma support for grouped convolution bwd weight (#2947 )

2025-12-17 15:58:58 -08:00

gridwise_ab_transfer_wave_tiles_interleave.hpp

Add support for direct store in epilogue and padding support for wave transfer without transpose (#3465 )

2026-01-14 11:02:19 +01:00

gridwise_ab_transfer_wave_tiles.hpp

Add support for direct store in epilogue and padding support for wave transfer without transpose (#3465 )

2026-01-14 11:02:19 +01:00

gridwise_batched_gemm_gemm_wmma_cshuffle_v3.hpp

Implement batched gemm add relu gemm add for rdna4 (#3391 )

2026-01-20 13:06:59 -08:00

gridwise_batched_gemm_gemm_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batched_gemm_multiple_d_gemm_multiple_d_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batched_gemm_multiple_d_softmax_gemm_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batched_gemm_softmax_gemm_wmma_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batched_gemm_softmax_gemm_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batchnorm_backward_blockwise_welford.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_batchnorm_forward_blockwise_welford.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_elementwise_1d_scale.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_elementwise_2d.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_elementwise_layernorm_welford_variance.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_fpAintB_gemm_wmma.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_bias_add_reduce_xdl_cshuffle_v1.hpp

Wmma support for gemm_bias_add_reduce (#3316 )

2026-01-07 10:27:16 -08:00

gridwise_gemm_dl_multiple_d.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_dl_v1r3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_dpp.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_abd_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_d_multiple_r_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_d_wmma_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_d_xdl_cshuffle_lds_direct_load.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_d_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_multiple_d_xdl_splitk_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_pipeline_selector.hpp

Simplify includes for CK builder reflection (#3357 )

2025-12-05 07:44:10 -08:00

gridwise_gemm_pipeline_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_pipeline_v2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_pipeline_v3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_pipeline_v4_direct_load.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_reduce_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_split_k_multiple_d_xdl_cshuffle_v2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_split_k_multiple_d_xdl_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_waveletmodel.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_wmma_cshuffle_v3_ab_scale.hpp

Wmma support for grouped convolution bwd weight (#2947 )

2025-12-17 15:58:58 -08:00

gridwise_gemm_wmma_cshuffle_v3_common.hpp

Implement batched gemm bias permute for RDNA4 (#3534 )

2026-01-17 08:30:27 +01:00

gridwise_gemm_wmma_cshuffle_v3.hpp

Implement batched gemm bias permute for RDNA4 (#3534 )

2026-01-17 08:30:27 +01:00

gridwise_gemm_wmma.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_conv_v3.hpp

[CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169 )

2026-01-08 08:02:02 +01:00

gridwise_gemm_xdl_cshuffle_streamk_v3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_b_preshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_b_scale.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_multi_abd.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_multi_d_ab_scale.hpp

Wmma support for gemm_ab_scale (#3314 )

2025-12-11 09:06:20 +01:00

gridwise_gemm_xdl_cshuffle_v3_multi_d_b_preshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_multi_d_blockscale_b_preshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_multi_d.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_mx_bpreshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3_mx.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_cshuffle_v3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_layernorm_cshuffle_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdl_waveletmodel_cshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_bwd_weight.hpp

[CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169 )

2026-01-08 08:02:02 +01:00

gridwise_gemm_xdlops_skip_b_lds_v1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_splitk_lds_direct_load.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_streamk.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v2r3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v2r4.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v2r4r2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v3r1.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v3r2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_gemm_xdlops_v3r3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_moe_gemm_blockscale.hpp

moe fp8 blockscale use nt (#3524 )

2026-01-12 10:48:10 +08:00

gridwise_moe_gemm.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_moe_mx_gemm_bns.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_moe_mx_gemm_bpreshuffle.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_moe_mx_gemm.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_permute.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_put_element_1d.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_set_buffer_value.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_set_multiple_buffer_value.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_softmax.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_sparse_embeddings_forward_layernorm_builtins.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_sparse_embeddings_forward_layernorm.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

gridwise_tensor_rearrange.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00