composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-04-19 22:39:03 +00:00

Files

Erwin Terpstra d5ae81b292 Implement batched gemm add relu gemm add for rdna4 (#3391 )

* wip: test suite for batched gemm multiple d gemm multiple d, working on gridwise implenentation

* wip: many fixes in implementation of batched gemm gemm multiple d

* wip: batched gemm gemm multiple d gridwise op compiling, not working yet

* fix: incorrect d0 grid indexing in batched gemm gemm multipled

* feat: add instances for batched gemm add relu gemm add

* chore: configure instance with low vector transfer size for odd sizes

* chore: add some more validation to device batched gemm gemm multiple d, and removed template parameter that didn't really make sense

* fix: upate device_batched_gemm_gemm_wmma to work with new gridwise changes

* fix: disable odd size tests on XDL archs

* chore: removed temporary logging

* chore: update some references to C tensor to E tensor

* Tentative fix for example template params

* Tentative fix for non-multi-D batched gemm gemm device impl.

* Tentative fix for xdl example template params

* Tentative fix for profiler build on gfx90a

* chore: improve device batched gemm gemm multi D comment to include all ops and dimensions

* chore: explicitly call ck::make_tuple to prevent issues when std::make_tuple would apply

* fix: make the gemm1 data types match what happens in the device op

* feat: add d0s/d1s datatypes and layouts to the device op type string

* chore: change element-wise op so addition happens in fp32

* chore: add static asserts for gemm0/gemm1 calculated wave sizes

* chore: also updated other element-wise ops to use fp32 calculations

* chore: log number of supported instances

* chore: update instance comment

* chore: disable kernel timing in example by default

* fix: gemm1 wave size calculation

* fix: make sure batched gemm multiple d gemm multiple d profiler performs correct type conversions

* chore: remove increased tolerance in batched gemm gemm multiple d example

* chore: add comment explaining that verification fails for certain input values

* chore: clarify instance comment

---------

Co-authored-by: kiefer <kiefer.van.teutem@streamhpc.com>

2026-01-20 13:06:59 -08:00

amd_address_space.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_buffer_addressing_builtins.hpp

Update AMD buffer coherency (#3403 )

2025-12-18 10:16:22 +01:00

amd_buffer_addressing.hpp

Update AMD buffer coherency (#3403 )

2025-12-18 10:16:22 +01:00

amd_buffer_coherence.hpp

Update AMD buffer coherency (#3403 )

2025-12-18 10:16:22 +01:00

amd_ck_fp8.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_gemm_dpp.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_inline_asm.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_lds.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_smfmac.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_transpose_load.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_wave_read_first_lane.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

amd_wmma.hpp

Add support to gfx1153 and fix gfx115X WMMA config (#3496 )

2026-01-05 10:03:30 -08:00

amd_xdlops.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

array_multi_index.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

array.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

blkgemmpipe_scheduler.hpp

Simplify includes for CK builder reflection (#3357 )

2025-12-05 07:44:10 -08:00

c_style_pointer_cast.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

common_header.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

container_element_picker.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

container_helper.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

data_type.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

debug.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

dtype_fp64.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

dtype_vector.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

dynamic_buffer.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

e8m0.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

enable_if.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

env.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

f8_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

filter_tuple.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

flush_icache.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

functional2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

functional3.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

functional4.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

functional.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

generic_memory_space_atomic.hpp

[CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169 )

2026-01-08 08:02:02 +01:00

get_id.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

get_shift.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

ignore.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

inner_product_dpp8.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

inner_product.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

integral_constant.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

is_detected.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

is_known_at_compile_time.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

loop_scheduler.hpp

Simplify includes for CK builder reflection (#3357 )

2025-12-05 07:44:10 -08:00

magic_division.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

math_v2.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

math.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

multi_index.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

mxf4_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

mxf6_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

mxf8_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

mxfp_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

number.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

numeric_limits.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

numeric_utils.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

pipeline_enum.hpp

Simplify includes for CK builder reflection (#3357 )

2025-12-05 07:44:10 -08:00

random_gen.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

reduction_common.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

reduction_enums.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

reduction_functions_accumulate.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

reduction_operator.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

scaled_type_convert.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

scheduler_enum.hpp

Simplify includes for CK builder reflection (#3357 )

2025-12-05 07:44:10 -08:00

sequence_helper.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

sequence.hpp

Improve sequence sorting and add unit tests (#3376 )

2025-12-10 12:25:23 -08:00

span.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

static_buffer.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

statically_indexed_array_multi_index.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

statically_indexed_array.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

synchronization.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

thread_group.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

transpose_vectors.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

tuple_helper.hpp

Implement batched gemm add relu gemm add for rdna4 (#3391 )

2026-01-20 13:06:59 -08:00

tuple.hpp

Implement grouped gemm tile loop for RDNA4 (#3304 )

2026-01-13 07:14:23 +01:00

type_convert.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

type.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

workgroup_barrier.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00

workgroup_synchronization.hpp

chore(copyright): update copyright header for include directory (#3293 )

2025-11-26 11:00:05 -07:00