composable_kernel/include/ck_tile/ops at b3a5e7ff64ffed95d0b2d0346dfd33675cdf406c - composable_kernel - Public git mirror

ROCm/composable_kernel

Mirror of https://github.com/ROCm/composable_kernel.git, synced 2026-06-30 03:37:38 +00:00


Ding, Yi b3a5e7ff64 [CK_TILE] Fix dq_acc per-nhead stride in FMHA BWD group mode

In group mode the dq_acc workspace layout uses physical (padded)
seqlen_q for the per-nhead stride (see FmhaBwdWorkspaceManager doc;
also matches FmhaBwdConvertQGradKernel reads). The unified-workspace
refactor inlined this stride as kargs.seqlen_q, which is the LOGICAL
length when seqlen_q_ptr is provided. As a result, for batch i and
nhead > 0 the main kernel writes dq_acc at offsets that the convert
kernel never reads, so dQ ends up zero at those positions.

Hoist physical_seqlen_q to the outer scope and use it for both the
non-deterministic and deterministic stride computations in the
dq_dram_window lambda. Batch mode is unaffected since kargs.seqlen_q
already equals the physical length there.

Fixes 135 padding-related failures in test_ck_tile_fmha_bwd_fp16
(BasicQPadding / MultiBatchPadding / PaddingWithMask / QKVPadding /
VariedPaddingRatios / ZeroLengthPadding / Deterministic /
ElementwiseBias). Verified locally: full suite 672 PASSED / 0 FAILED.
SGPR usage drops by 1; VGPR/AGPR/spill/occupancy unchanged.

2026-04-27 01:54:53 -05:00


add_rmsnorm2d_rdquant

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

batched_contraction

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

batched_transpose

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

[CK_TILE] Restructure Tile Engine's benchmarking and profiling (#4769)

2026-04-14 10:50:24 -07:00

[CK][CK TILE] Modify elementwise kernel template signature to accept independent type arguments (#6399)

2026-04-14 01:44:27 -06:00

[CK_TILE] Separate PermuteN epilogue from CShuffle epilogue into standalone file (#5863)

2026-04-14 20:22:18 +00:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

[CK_TILE] Fix dq_acc per-nhead stride in FMHA BWD group mode

2026-04-27 01:54:53 -05:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

grouped_convolution

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

image_to_column

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

CK: Remove 41 commented-out dead code blocks (~200 lines) (#6302)

2026-04-10 11:17:11 -04:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

[CK] Fix/suppress clang lifetimebound warnings with staging compiler. (#6550)

2026-04-22 15:47:47 +00:00

CK: Remove 41 commented-out dead code blocks (~200 lines) (#6302)

2026-04-10 11:17:11 -04:00

Fix redundant cast in model sensitive rmsnorm (#3681)

2026-01-30 10:52:19 +08:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Add padding to cshuffle epilogue to avoid bank conflict (#4274)

2026-02-10 22:52:00 -07:00

[CK_TILE][FMHA] Add sparse attention VSA (#3341)

2026-01-31 00:59:47 +08:00

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

Shuffle fix for gfx950 (#3491)

2026-01-13 09:21:29 -08:00

add_rmsnorm2d_rdquant.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

batched_contraction.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

batched_transpose.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

common.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

elementwise.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

epilogue.hpp

[CK_TILE] Separate PermuteN epilogue from CShuffle epilogue into standalone file (#5863)

2026-04-14 20:22:18 +00:00

flatmm.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

fmha.hpp

[CK_TILE][FMHA] Support microscaling (mxfp8 and mxfp4) on gfx950 (#4368)

2026-03-11 09:59:50 +00:00

fused_moe.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

gemm_mx.hpp

Changed the include order of the new WMMA/MFMA unification framework (#5241)

2026-03-12 09:26:58 +01:00

gemm_quant.hpp

[CK Tile] Eight Waves pipeline GEMM (#4964)

2026-03-16 09:30:54 +01:00

gemm.hpp

[CK Tile] Eight Waves pipeline GEMM (#4964)

2026-03-16 09:30:54 +01:00

grouped_convolution.hpp

Changed the include order of the new WMMA/MFMA unification framework (#5241)

2026-03-12 09:26:58 +01:00

image_to_column.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

layernorm2d.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

moe_flatmm.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

norm_reduce.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

permute.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

pooling.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

reduce.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

rmsnorm2d.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

smoothquant.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

softmax.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

sparse_attn.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

topk_softmax.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00

topk.hpp

Cleanup and refactoring related to tile loading (#4294)

2026-03-02 12:20:55 +00:00