ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-06-11 00:39:02 +00:00

Hosang Yoon e7e8801dc3 [rocm-libraries] ROCm/rocm-libraries#7586 (commit c18f2c7)

[CK_TILE] Use gfx11 float buffer atomics in FMHA Bwd

## Motivation

FlashAttention CK backward on gfx11 can hit out-of-bounds/tail writes in
the dQ accumulator atomic-add path when sequence rows are padded at the
tile level but not marked invalid in the DQDKDV main tensor view.

With the generic global atomic fallback, a tail element that is wrongly
treated as valid issues a real pointer-based `atomicAdd`. With the buffer
atomic path, the write goes through a buffer resource that carries bounds
information, the same backend already used by gfx9/gfx12.

This fixes the gfx11 FMHA BWD failure without changing the gfx11 default
for unrelated CK Tile kernels.

## Technical Details

This PR enables the existing CK Tile AMD buffer float atomic-add path
only for generated FMHA BWD gfx11 translation units.

gfx11 normally uses the generic global atomic fallback for
floating-point `buffer_view::atomic_add`. That fallback performs the
atomic through a raw computed pointer and depends on the software
validity predicate to avoid invalid elements. In FMHA BWD dQ
accumulation, padded tail rows can reach this path, so using the buffer
atomic backend is safer: it uses a buffer resource with base pointer,
bounds information, and an element offset, matching the backend already
used by gfx9/gfx12.
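
The difference between the two paths can be illustrated with a small host-side model. This is only a sketch: `BufferResource`, `pointer_add`, `buffer_add`, and `demo_tail_write` are hypothetical names that do not mirror the CK Tile API, the additions are plain (not atomic), and it assumes that an out-of-range buffer access is dropped rather than performed.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical host-side model; not the CK Tile buffer_view API.
struct BufferResource
{
    float* base;              // base pointer of the tensor
    std::size_t num_elements; // bounds information carried with the resource
};

// Models the generic fallback: a raw pointer add that trusts the
// software validity predicate completely.
inline void pointer_add(float* base, std::size_t offset, float value, bool is_valid)
{
    if(is_valid)
    {
        base[offset] += value;
    }
}

// Models the buffer path (assumption: out-of-range accesses are dropped).
// Returns true when the add was actually performed.
inline bool buffer_add(const BufferResource& res, std::size_t offset, float value)
{
    if(offset >= res.num_elements)
    {
        return false;
    }
    res.base[offset] += value;
    return true;
}

struct TailWriteResult
{
    float guard_after_pointer_path;
    float guard_after_buffer_path;
};

// Writes to a padded tail element through both paths and reports what
// ended up in the guard slot that sits just past the real data.
inline TailWriteResult demo_tail_write()
{
    constexpr std::size_t real_elements = 4;
    std::vector<float> storage(real_elements + 1, 0.0f); // last slot is a guard
    const std::size_t tail   = real_elements;            // padded tail offset
    const bool wrongly_valid = true;                     // predicate missed the padding

    pointer_add(storage.data(), tail, 1.0f, wrongly_valid);
    const float after_pointer = storage[tail];

    storage[tail] = 0.0f;
    const BufferResource res{storage.data(), real_elements};
    buffer_add(res, tail, 1.0f);
    const float after_buffer = storage[tail];

    return {after_pointer, after_buffer};
}
```

In this model the pointer path modifies the guard slot because the predicate was wrong, while the buffer path leaves it untouched because the offset falls outside the bounds stored in the resource.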

Enabling `CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT` globally for gfx11 is
too broad and can break unrelated gfx11 CK builds such as GEMM. Instead,
`config.hpp` now preserves an explicitly pre-defined
`CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT`, while keeping the existing
default disabled for gfx11.
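
A minimal sketch of that "preserve a pre-defined value" pattern is shown below. The actual contents of `config.hpp` are not reproduced in this description, so the guard structure is an assumption; `DEMO_TARGET_IS_GFX11` and `buffer_atomic_add_float_enabled` are hypothetical names used only for illustration.

```cpp
// Hypothetical stand-in for target detection; real code would test a
// compiler-provided architecture macro instead.
#ifndef DEMO_TARGET_IS_GFX11
#define DEMO_TARGET_IS_GFX11 1
#endif

// Keep a value that was already defined before this point (for example
// on the compiler command line); otherwise pick the per-target default.
#ifndef CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT
#if DEMO_TARGET_IS_GFX11
#define CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT 0 // default stays disabled on gfx11
#else
#define CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT 1 // assumed default elsewhere
#endif
#endif

// Small helper so the selected value can be inspected from C++ code.
constexpr bool buffer_atomic_add_float_enabled()
{
    return CK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT != 0;
}
```

Under this sketch, a translation unit compiled with `-DCK_TILE_USE_AMD_BUFFER_ATOMIC_ADD_FLOAT=1` keeps that value, while every other gfx11 translation unit still gets the disabled default.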

## Test Plan

Validated the change with the FlashAttention CK full test suite with
backward pass enabled on gfx11:
`pytest -q -s tests/test_flash_attn_ck.py`

## Test Result

FlashAttention CK gfx11 test result:
260680 passed, 152076 skipped

Co-authored-by: Po Yen Chen <PoYen.Chen@amd.com>

2026-05-30 00:10:26 +00:00

..

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

02_gemm_bilinear

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

03_gemm_bias_relu

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

04_gemm_add_add_fastgelu

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

10_convnd_fwd_multiple_d_multiple_reduce

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

11_convnd_fwd_bias

[DOCS] Documentation Addition (Readme updates) (#2495)

2025-10-16 03:10:57 -07:00

[rocm-libraries] ROCm/rocm-libraries#5030 (commit 8e02a26)

2026-03-06 09:27:27 -07:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

14_gemm_quantization

[rocm-libraries] direct push (commit 7b18234)

2026-03-12 09:47:41 +01:00

15_grouped_gemm

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

16_gemm_multi_d_multi_reduces

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

17_convnd_bwd_data

[CK] Integrate GPU reference into ckProfiler for convolutions (#3379)

2025-12-18 07:59:45 +01:00

18_batched_gemm_reduce

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

19_binary_elementwise

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

20_grouped_conv_bwd_weight

[rocm-libraries] ROCm/rocm-libraries#5652 (commit 7dc7d1d)

2026-05-18 17:46:01 +02:00

21_gemm_layernorm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

24_batched_gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

25_gemm_bias_e_permute

Implement batched gemm bias permute for RDNA4 (#3534)

2026-01-17 08:30:27 +01:00

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

27_layernorm2d_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

28_grouped_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

29_batched_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

30_grouped_conv_fwd_multiple_d

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

31_batched_gemm_gemm

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

32_batched_gemm_scale_softmax_gemm

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

33_multiple_reduce

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

36_sparse_embedding

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

37_batched_gemm_add_add_relu_gemm_add

Implement batched gemm add relu gemm add for rdna4 (#3391)

2026-01-20 13:06:59 -08:00

38_grouped_conv_bwd_data_multiple_d

[rocm-libraries] ROCm/rocm-libraries#7732 (commit b0e29d9)

2026-05-27 09:59:14 +03:00

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

40_conv2d_fwd_quantization

[rocm-libraries] ROCm/rocm-libraries#7111 (commit 651947f)

2026-05-08 07:14:14 -07:00

41_grouped_conv_conv_fwd

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

42_groupnorm_fwd

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

43_splitk_gemm_bias_e_permute

[rocm-libraries] ROCm/rocm-libraries#7760 (commit a61bc76)

2026-05-27 06:56:58 -07:00

44_elementwise_permute

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

45_elementwise_normalization

[CI, CK examples] Disable time_kernel for CI tests and examples (#3464)

2026-01-07 16:30:57 +01:00

46_gemm_add_multiply

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

47_gemm_bias_softmax_gemm_permute

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

49_maxpool2d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

51_avgpool3d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

52_im2col_col2im

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

53_layernorm2d_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

54_groupnorm_bwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

59_grouped_gemm_multi_ABD

[rocm-libraries] ROCm/rocm-libraries#4425 (commit 513cf9f)

2026-02-25 05:16:07 +00:00

60_gemm_multi_ABD

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

61_contraction_multi_ABD

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

62_convnd_activ

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

63_layernorm4d_fwd

chore(copyright) update library wide CMakeLists.txt copyright header template (#3313)

2025-11-28 13:49:54 -08:00

64_fpAintB_gemm

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

65_gemm_multiply_multiply

[rocm-libraries] ROCm/rocm-libraries#6761 (commit d19f6f1)

2026-05-27 18:55:15 +00:00

66_complex_contraction_bilinear

[rocm-libraries] ROCm/rocm-libraries#7612 (commit 5427d24)

2026-05-22 02:43:50 +00:00

67_gemm_microscaling

[rocm-libraries] ROCm/rocm-libraries#6978 (commit e58096d)

2026-05-15 06:46:51 -07:00

[CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation

2026-01-15 16:43:02 +01:00

69_gemm_add_relu

[CK][Examples] Fixing stride issues in ck examples 14/65/68/69 by workaround - Bypassing hostTensor validation

2026-01-15 16:43:02 +01:00

[rocm-libraries] ROCm/rocm-libraries#7586 (commit c18f2c7)

2026-05-30 00:10:26 +00:00

CMakeLists.txt

[rocm-libraries] direct push (commit 49b73ad)

2026-05-25 11:26:26 +02:00

README.md

Add basic documentation structure (#1715)

2024-12-04 00:46:47 +01:00

README.md

Back to the main page

Composable Kernel examples