composable_kernel/include/ck_tile/ops/fmha/pipeline at ef4ff4667d6bb3701dc68f588ff51a6ecda6e0a2 - composable_kernel - Public git mirror

ROCm/composable_kernel

mirror of https://github.com/ROCm/composable_kernel.git synced 2026-05-04 13:41:24 +00:00

Yi DING fb64a4453c [rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d)

[CK_TILE] Fix FMHA BWD register pressure by wrapping num_total_loop with amd_wave_read_first_lane (#5915)

## Motivation

In three FMHA backward pipelines, `num_total_loop` is computed without
`amd_wave_read_first_lane()`, so the compiler treats it as a VGPR even
though it is logically uniform across all lanes. This raises register
pressure, and under high pressure the compiler may reuse VGPRs across
overlapping live ranges. This was confirmed via assembly inspection: the
compiler reused `v52:v53` as both the B-matrix input for dK MFMAs and an
intermediate value for dV, producing incorrect dK/dV gradients.

## Technical Details

Wrap `num_total_loop` with `amd_wave_read_first_lane()` in three
pipelines:
- `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr`
- `block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr_iglp`
- `block_fmha_bwd_dq_dk_dv_pipeline_trload_kr_ktr_vr`

This promotes `num_total_loop` to an SGPR, eliminating the excess
register pressure and the incorrect VGPR reuse.
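
The pattern described above can be sketched on the host. This is a minimal illustration, not the pipeline code: `read_first_lane_sim`, `compute_num_total_loop`, the parameter names, and the 64-lane wave size are all hypothetical stand-ins chosen for this example. On the device, the pipelines call `amd_wave_read_first_lane()` on the loop-count expression, which tells the compiler the value is wave-uniform so it can be kept in an SGPR.

```cpp
#include <array>
#include <cassert>
#include <cstdint>

// Assumption for this sketch: a 64-lane wavefront.
constexpr int kWaveSize = 64;

// Ceiling division, used here to turn a sequence length into a tile count.
constexpr std::int32_t integer_divide_ceil(std::int32_t a, std::int32_t b)
{
    return (a + b - 1) / b;
}

// Hypothetical host-side stand-in for amd_wave_read_first_lane():
// every lane receives the value held by lane 0. On the GPU this is what
// allows a logically uniform value to live in a scalar register.
inline std::int32_t
read_first_lane_sim(const std::array<std::int32_t, kWaveSize>& per_lane)
{
    return per_lane[0];
}

// Hypothetical loop-count computation. Each simulated lane computes the
// same value, so broadcasting lane 0 does not change the result; it only
// makes the uniformity explicit.
inline std::int32_t compute_num_total_loop(std::int32_t seqlen_q_start,
                                           std::int32_t seqlen_q_end,
                                           std::int32_t tile_m)
{
    std::array<std::int32_t, kWaveSize> per_lane{};
    for(auto& v : per_lane)
    {
        v = integer_divide_ceil(seqlen_q_end - seqlen_q_start, tile_m);
    }
    return read_first_lane_sim(per_lane);
}
```

Because the wrapped value is identical in every lane, the wrap is a no-op for correctness of the loop count itself; per the commit message, its effect is on register allocation only.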

## Submission Checklist

- [x] Look over the contributing guidelines at
https://github.com/ROCm/ROCm/blob/develop/CONTRIBUTING.md#pull-requests.

2026-03-30 01:45:16 +00:00

block_fmha_batch_prefill_pipeline_qr_ks_vs_async_default_policy.hpp

Optimize batch prefill kernel performance for VECTORIZED_LAYOUT KV cache (#3657)

2026-01-29 07:18:41 +08:00

block_fmha_batch_prefill_pipeline_qr_ks_vs_async.hpp

[rocm-libraries] ROCm/rocm-libraries#4999 (commit 45f6624)

2026-03-05 01:09:12 +00:00

block_fmha_bwd_convert_dq.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_bwd_dot_do_o.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr_iglp.hpp

[rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d)

2026-03-30 01:45:16 +00:00

block_fmha_bwd_dq_dk_dv_pipeline_kr_ktr_vr.hpp

[rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d)

2026-03-30 01:45:16 +00:00

block_fmha_bwd_dq_dk_dv_pipeline_selector.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_bwd_dq_dk_dv_pipeline_trload_kr_ktr_vr.hpp

[rocm-libraries] ROCm/rocm-libraries#5915 (commit a72cf7d)

2026-03-30 01:45:16 +00:00

block_fmha_bwd_dq_dk_dv_pipeline_trload_qr_qtr_dor.hpp

[rocm-libraries] ROCm/rocm-libraries#4294 (commit 6601702)

2026-03-02 12:21:44 +00:00

block_fmha_bwd_pipeline_default_policy.hpp

[rocm-libraries] ROCm/rocm-libraries#4584 (commit 42efd1d)

2026-02-21 01:15:57 +00:00

block_fmha_bwd_pipeline_problem.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_bwd_pipeline_trload_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_appendkv_pipeline_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_appendkv_pipeline.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_pagedkv_pipeline_qr_ks_vs_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_pagedkv_pipeline_qr_ks_vs.hpp

[rocm-libraries] ROCm/rocm-libraries#4313 (commit 080ac66)

2026-03-02 01:54:46 +00:00

block_fmha_fwd_splitkv_combine_pipeline_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_splitkv_combine_pipeline.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_splitkv_pipeline_nwarp_sshuffle_qr_ks_vs.hpp

[rocm-libraries] ROCm/rocm-libraries#4313 (commit 080ac66)

2026-03-02 01:54:46 +00:00

block_fmha_fwd_splitkv_pipeline_qr_ks_vs_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_splitkv_pipeline_qr_ks_vs.hpp

[rocm-libraries] ROCm/rocm-libraries#4313 (commit 080ac66)

2026-03-02 01:54:46 +00:00

block_fmha_fwd_v3_pipeline_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_fwd_v3_pipeline.hpp

[rocm-libraries] ROCm/rocm-libraries#4302 (commit e62bd8a)

2026-03-19 09:19:06 +00:00

block_fmha_pipeline_enum.hpp

[CK_TILE][FMHA] Integrate FAv2 & FAv3 (WIP) in the single fmha_fwd() API (#3153)

2025-12-05 10:31:12 +08:00

block_fmha_pipeline_problem.hpp

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

block_fmha_pipeline_qr_ks_vs_async_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs_async_trload_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs_async_trload.hpp

[rocm-libraries] ROCm/rocm-libraries#4294 (commit 6601702)

2026-03-02 12:21:44 +00:00

block_fmha_pipeline_qr_ks_vs_async.hpp

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

block_fmha_pipeline_qr_ks_vs_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs_fp8.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs_whole_k_prefetch_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs_whole_k_prefetch.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qr_ks_vs.hpp

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

block_fmha_pipeline_qs_ks_vs_default_policy.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qs_ks_vs.hpp

chore(copyright): update copyright header for include directory (#3293)

2025-11-26 11:00:05 -07:00

block_fmha_pipeline_qx_ks_vs_custom_policy.hpp

[rocm-libraries] ROCm/rocm-libraries#4368 (commit 17f7dfc)

2026-03-11 10:00:52 +00:00

tile_fmha_shape.hpp

Shuffle fix for gfx950 (#3491)

2026-01-13 09:21:29 -08:00

tile_fmha_traits.hpp

[FMHA] Batch Prefill Support Improvements: Change KV Cache Layout & Large Page Size Support (#3442)

2026-01-05 18:41:47 +08:00