composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-07-18 09:38:17 +00:00

Files

Johannes Graner c427b9ba2a [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169)

* Take split_k into account when checking 2GB tensor limit.

* Revert "Take split_k into account when checking 2GB tensor limit."

This reverts commit adf35c91be.

* Optimize grouped conv bwd wei split_k off calc

(cherry picked from commit 2115642ee59050dabd81393c1b8f03b34adc05aa)

* Update gridwise_gemm_xdl_cshuffle_conv_v3.hpp

(cherry picked from commit 900d4d4b466f5730ae1189370d3c96267c35ea69)

* Fix tensor descriptors and stride calculations

* Don't miss half of the elements

* Fix buffer size calculations

* Disable hack if stride not divisible by k_batch

* Clean up comments

* Disallow hack in non-contiguous edge cases

* Index -> Dim

* Fix broken test

* Refactor applicability checks into separate function

* fix missed variable name

* Fix variable name in info print

* update V3 2GB check

* No more regression, use templates instead

* Code deduplication

* Regression fix for cshuffle

* arch-guarded atomic_add implementations for gfx11

* Similar for half(4|8)_t as well

* Only use both offset hacks at the same time

* Revert "arch-guarded atomic_add implementations for gfx11"

This reverts commit 3883fe6935.
This reverts commit 5311ec608d.

* Reapply "arch-guarded atomic_add implementations for gfx11"

This reverts commit 1972adeddc.

* Only remove float4 atomic_add

* Refactor to single flag

* Consolidate template parameters

* Consolidate flag in transformers

---------

Co-authored-by: Bartlomiej Kocot <barkocot@amd.com>

[ROCm/composable_kernel commit: ee2c35b92d]

2026-01-08 08:02:02 +01:00

| File | Last commit | Date |
| --- | --- | --- |
| amd_address_space.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_buffer_addressing_builtins.hpp | Update AMD buffer coherency (#3403) | 2025-12-18 10:16:22 +01:00 |
| amd_buffer_addressing.hpp | Update AMD buffer coherency (#3403) | 2025-12-18 10:16:22 +01:00 |
| amd_buffer_coherence.hpp | Update AMD buffer coherency (#3403) | 2025-12-18 10:16:22 +01:00 |
| amd_ck_fp8.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_gemm_dpp.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_inline_asm.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_lds.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_smfmac.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_transpose_load.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_wave_read_first_lane.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| amd_wmma.hpp | Add support to gfx1153 and fix gfx115X WMMA config (#3496) | 2026-01-05 10:03:30 -08:00 |
| amd_xdlops.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| array_multi_index.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| array.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| blkgemmpipe_scheduler.hpp | Simplify includes for CK builder reflection (#3357) | 2025-12-05 07:44:10 -08:00 |
| c_style_pointer_cast.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| common_header.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| container_element_picker.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| container_helper.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| data_type.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| debug.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| dtype_fp64.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| dtype_vector.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| dynamic_buffer.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| e8m0.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| enable_if.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| env.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| f8_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| filter_tuple.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| flush_icache.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| functional2.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| functional3.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| functional4.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| functional.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| generic_memory_space_atomic.hpp | [CK] Allow tensors larger than 2GB in grouped conv bwd weight (#3169) | 2026-01-08 08:02:02 +01:00 |
| get_id.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| get_shift.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| ignore.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| inner_product_dpp8.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| inner_product.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| integral_constant.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| is_detected.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| is_known_at_compile_time.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| loop_scheduler.hpp | Simplify includes for CK builder reflection (#3357) | 2025-12-05 07:44:10 -08:00 |
| magic_division.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| math_v2.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| math.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| multi_index.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| mxf4_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| mxf6_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| mxf8_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| mxfp_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| number.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| numeric_limits.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| numeric_utils.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| pipeline_enum.hpp | Simplify includes for CK builder reflection (#3357) | 2025-12-05 07:44:10 -08:00 |
| random_gen.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| reduction_common.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| reduction_enums.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| reduction_functions_accumulate.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| reduction_operator.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| scaled_type_convert.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| scheduler_enum.hpp | Simplify includes for CK builder reflection (#3357) | 2025-12-05 07:44:10 -08:00 |
| sequence_helper.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| sequence.hpp | Improve sequence sorting and add unit tests (#3376) | 2025-12-10 12:25:23 -08:00 |
| span.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| static_buffer.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| statically_indexed_array_multi_index.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| statically_indexed_array.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| synchronization.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| thread_group.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| transpose_vectors.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| tuple_helper.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| tuple.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| type_convert.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| type.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| workgroup_barrier.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |
| workgroup_synchronization.hpp | chore(copyright): update copyright header for include directory (#3293) | 2025-11-26 11:00:05 -07:00 |